Learner corpora around the world


This list is still a work in progress. We would like it to be as comprehensive as possible. If you have a learner corpus or know of one that is not listed on this webpage, send a message to Magali Paquot and we'll add it to the list. We hope you will find the list useful for your research!

The list below only contains learner corpora, i.e. electronic collections of continuous written or spoken data produced by foreign or second language learners.
For a list of learner corpus-based datasets (treebanks, error lists, etc.), click here.

To refer to this list:

Centre for English Corpus Linguistics (date of access): Learner Corpora around the World. Louvain-la-Neuve: Université catholique de Louvain. https://uclouvain.be/en/research-institutes/ilc/cecl/learner-corpora-around-the-world.html


© 2019, Université catholique de Louvain

Learner corpora


Corpus Target language First language Medium Text type / task type Proficiency level Size in words Project director Availability
The Arabic Learner Corpus
Arabic 66 languages written and spoken Narrative and discussion Intermediate and advanced

c. 283,000

c. 3h30

Abdullah Alfaifi & Eric Atwell

The Pilot Arabic Learner Corpus Arabic English written Narrative Intermediate and advanced c. 9,000 Ghazi Abuhakema
Reem Faraj
Anna Feldman
Eileen Fitzpatrick
Montclair State University, USA
The Jinan Chinese Learner Corpus
Chinese 50 languages written Exams and assignments Beginners, intermediate and advanced

c. 6 m. Chinese characters

c. 9,000 texts

Maolin Wang

Shervin Malmasi

Mingxuan Huang

Free download upon contact with researchers.
Croatian Learner Text Corpus (CroLTeC)  Croatian 36 languages (Afrikaans, Arabic, Bulgarian, Catalan, Czech, Danish, German, English, Estonian, Persian, Finnish, French, Hebrew, Hungarian, Indonesian, Italian, Japanese, Korean, Lari, Mandinka, Dutch, Norwegian, Polish, Portuguese, Russian, Slovak, Slovenian, Spanish, Albanian, Swedish, Thai, Turkish, Ukrainian, Vietnamese, Chinese, Malay) written exam essays, argumentative and literary essays, letters, diaries, picture descriptions, book reviews, short dialogues, etc. A1-C2 c. 1 million Nives Mikelic Preradovic, University of Zagreb, Croatia  Freely available
The AKCES/CZESL corpus
(Acquisition corpora of Czech/Czech as a second language)
Czech Various written and spoken Student essays and
Various 2 m. Karel Sebesta
Charles University in Prague
Technical University in Liberec, Czech Republic
Leerdercorpus Nederlands als Vreemde Taal Dutch French written       Liesbeth Degand
Université catholique de Louvain, Belgium
Arab Learner English Corpus (ALEC) English Arabic written Essays written by freshman students as part of first level college writing course University students (second language learners)
Analysis 184749 
Narrative 67527 
Synthesis 66015 
Argumentation 192298

Dr. Inas Mahfouz
American University of Kuwait


The Aachen Corpus of Academic Writing

English German written Academic research writing Advanced

c. 240,000 words

c. 225,000 words (L1 component)

Elma Kerz, RWTH Aachen University Under development
The Advanced Learner English Corpus
English Mainly Swedish written Essays written by university students of English linguistics and English literature Advanced c. 1,3 m. Tove Larsson, Uppsala University Not freely available

The ANGLISH corpus

English French spoken Readings of texts and sentences, spontaneous oral language. Various c. 5h30 Anne Tortel
University of Provence, France.

Freely available 

Asao Kojiro’s Learner Corpus Data English Japanese written Essays and stories written or reproduced by Japanese college students.     Asao Kojiro Texts available for download

The Barcelona English Language Corpus (BELC)

English Spanish
spoken and written

4 tasks:
Written composition
Oral narrative
Oral interview

Longitudinal data (children and young adults learning English)

 Various   Carmen Muñoz
University of Barcelona, Spain

The BATMAT Corpus

English Swedish
written BA dissertations
MA dissertations
Advanced c. 2,5 m. Professor Tuija Virtanen-Ulfhielm, English language and literature, Åbo Akademi University, Finland Not publicly available
Belarussian Learner Corpus of English (BELLCE) English Russian; Belarussian written argumentative essays High intermediate to advanced unknown Anastasia Rakhuba  
The Bilingual Corpus of Chinese English Learners
English Chinese spoken and written

Spoken: National Oral English test.

Written: in-class assignments

  c. 2 m. Wen Qiufang
National Research Center for Foreign Language Education Beijing Foreign Studies University, China
The Brazilian Spoken Corpus of English Learners (BraSCEL) English Portuguese spoken Informal interview + thought-provoking picture discussion A1-C2 benchmarked to the CEFR Under development

Mateus Miranda - Mary Immaculate College, University of Limerick

The corpus (transcriptions of audio files) will be available to the scientific community upon request.
The British Academic Written English (BAWE) corpus English

Mainly L1 speakers

Also includes data produced by L2 speakers

written ESP papers 

4 levels of study (from undergraduate levels to final year and taught masters level)


c. 6,5 m. Hilary Nesi
Sheena Gardner
Warwick, UK
Paul Thompson
University of Birmingham, UK
Paul Wickens
Oxford Brookes, UK

The BAWE corpus can be accessed through the corpus analysis interface, Sketch Engine.

The BUiD Arab Learner Corpus (BALC) English Arabic written School examination essays Various c. 290,000 Mick Randall
The British University in Dubai,
United Arab Emirates
Nicholas Groom
University of Birmingham, UK
At present, copies of the current version of the corpus are available on request from mick.randall@buid.ac.ae
The Cambridge Learner Corpus (CLC) English Various written Exam scripts Various c. 50 m. Cambridge University Press and Cambridge ESOL, UK Commercial
The Corpus of Academic Learner English
English German written Various academic text types that are typically produced in university courses of English, e.g. term papers, reading reports, research plans, abstracts, reviews, and summaries. Advanced under development Marcus Callies
University of Bremen, Germany
The Corpus of English Essays Written by Asian University Students (CEEAUS) English Various written Student essays Various c. 200,000 Shin Ishikawa
Kobe University, Japan
The Chinese Academic Written English corpus
English Chinese written Dissertations written by Chinese undergraduates majoring in English linguistics or applied linguistics.   c. 400,000 David Yong Wey Lee
City University of Hong Kong, Hong Kong

The Chinese Learner English Corpus (CLEC)

English Chinese written   Various c. 1 m. Gui Shichun
Guangdong University of Foreign Studies & Yang Huizhong, Shanghai Jiaotong, China
The corpus can only be accessed by users in the Department of English at HKPU. 
The City University Corpus of Academic Spoken English (CUCASE) English


Also includes data produced by L1 speakers

multimedia     c. 2 m. David Yong Wey Lee
City University of Hong Kong, Hong Kong
The Cologne-Hanover Advanced Learner Corpus (CHALC) English German written term papers and essays Advanced c. 210,000 Ute Römer
University of Michigan, USA
The College Learners’ Spoken English Corpus
English Chinese spoken National spoken English test for non-English majors.   c. 700,000 Yang and Wei  
The Corpus Archive of Learner English in Sabah/Sarawak (CALES) English Malay written Argumentative essays Various c. 400,000

Simon Botley
Faizal Hakim
Doreen Dillah
Universiti Teknologi MARA Sarawak, Malaysia

Corpus Oral de Português como Língua Adicional-Brasil (CoPLA-BR)/ Oral Corpus of Brazilian Portuguese as an Additional Language Portuguese Various spoken Informal interview + thought-provoking picture discussion


Under development Mateus Miranda - Mary Immaculate College, University of Limerick The corpus (transcriptions of audio files) will be available to the scientific community upon request.
CORpus del ESPañol de los Italianos (CORESPI) Spanish Italian Written Written compositions A1 to B2 c.125,000

Sonia Bailini
Università Cattolica del Sacro Cuore, Milan, Italy

Online access

CORpus del ITaliano de los Españoles (CORITE) Italian Spanish Written Written compositions A1 to B2 c.103,000 Sonia Bailini
Università Cattolica del Sacro Cuore, Milan, Italy

Online access

The Corpus of Business Letters English Italian written

Tagged part: BEC1 writing tests (letters, emails, faxes, memos, reports)

Untagged part: business writing exam tests

  c. 32,000 Anna Romagnuolo  
The Corpus of Multilingual Opinion Essays by College Students (MOECS) English varied written opinion essays college students unknown Megumi Okugiri available
Corpus of writing, pronunciation, reading, and listening by learners of English as a Foreign Language English Japanese written and spoken varied beginners to advanced 29h audio + 30,000 words

Katsunori Kotani

Takehiko Yoshimi

Hiroaki Nanjo

Hitoshi Isahara

The Corpus of Young Learner Interlanguage (CYLIL) English


spoken English L2 data elicited from European School pupils.
Longitudinal data
Various c. 500,000 Alex Housen
Vrije Universiteit Brussel, Belgium
The DiSKo ("Deutsch im Studium"-Lernerkorpus) German Various written Standardized writing task from university admission language test (TestDaF), app. 400 tokens per text B1-C2 Longitudinal, under development; targeted word number ~ 180,000  Katrin Wisniewski, University of Leipzig Will be freely available online under the ANNIS architecture
The Eastern European English learner corpus English Russian
spoken Spontaneous spoken production data elicited by means of a semi-structured interview Various c. 60,000 Elena Salakhian
Eberhard Karls University of Tübingen, Germany
The EFL Teacher Corpus
English Korean
spoken Teacher talks in language classrooms Upper-intermediate to advanced c. 123,000 Ye-eun Kwon
Eun-Joo Lee
Under development
The English of Malaysian School Students corpus (EMAS) English Malay written Student essays + oral interviews various c. 500,000 Arshad Abd. Samad et al.
Universiti Putra Malaysia, Malaysia
The English Speech Corpus of Chinese Learners
English Chinese spoken Dialogue reading-aloud Middle school and college   Chen Hua
Nantong University, China
Wen Qiufang
Beijing Foreign Studies University, China
Li Aijun
Chinese Academy of Social Sciences, China
The ETS Corpus of Non-Native Written English English 11 languages written 12,100 TOEFL English essays /   Daniel Blanchard

Information about the score level is available for each essay

Samples are available

The Europarl corpus of Native Non-native and Translated Texts
English 24 EU languages written Proceedings of the European Parliament Advanced

NNS: c. 780,000

NS: c. 3 m.

Translated: c. 22m.

Sergiu Nisioi Available
English Students’ Oral Corpus in Chile (ESOC-Chile) English Spanish spoken Student Interviews B1 - B2 - C1 73631

Chinger Zapata

Universidad Católica del Norte - Chile

The corpus (audio files or plain transcriptions of audio files in .txt format) will be available to the scientific community upon request to czapata@ucn.cl
The EVA Corpus of Norwegian School English English Norwegian spoken Picture-based tasks  / c. 35,000 Angela Hasselgren
University of Bergen, Norway
The FUSE (The Finnish Upper Secondary School Corpus of Spoken English) English Finnish (possibly other L1s too, information not collected) spoken Role-tasks or mind-map tasks as part of a low-stakes, course examination in Finnish upper secondary/high schools CEFR: A2-C1 N/A  Lasse Ehrnrooth Online access
The Gachon Learner Corpus English Korean
(+ a few Chinese & Spanish speaking students) 
written Written Journal Assignments Lower intermediate c. 2,5 m. Brian Carlstrom Freely available
The Gesprochene Wissenschaftssprache kontrastiv - Multilingual corpus of spoken academic language (GeWiss) German English, Polish, Bulgarian & diverse other L1 languages spoken Academic papers, student presentations and academic oral examinations in German philology / Applied Linguistics / Language pedagogy as well as in Polish, English, and Italian philology B2, C1 1.4 m. Christian Fandrych Freely available upon registration: https://gewiss.uni-leipzig.de/index.php?id=home&L=1
The GICLE corpus (German component of ICLE) English German written Mainly non-academic argumentative essays Advanced c. 234,000    
The Giessen-Long Beach Chaplin Corpus
English German spoken Transcribed interactions between native English speakers, ESL and EFL speakers Various c. 350,000 Andreas Jucker
Sara Smith
University of Giessen, Germany
Restricted use: apply for approval to get a copy.
The Hong Kong University of Science & Technology learner corpus
English Chinese - mostly Cantonese written Untimed assignments written for EFL courses and school leaving exams University and advanced high school students c. 25 m. John Milton
Hong Kong University of Science & Technology, Hong Kong
The Indianapolis Business Learner Corpus
English Various written Job application letters and résumés of business communication students from the U.S., Belgium, Finland, Germany, and Thailand, spanning the years 1990-1998     Ulla Connor
Kristen Precht
Thomas Albin Upton
Indiana University, USA
The International Corpus of Crosslinguistic Interlanguage (ICCI) English Various written Essays (20-min in-class tasks without the use of a dictionary)  Beginner to lower-intermediate 9,000 essays Yukio Tono
Tokyo University of Foreign Studies, Japan
Freely available
The International Corpus Network of Asian Learners of English
English Chinese
written and spoken

Controlled speeches and essays

L1 productions by 350 NS

Various c. 1,8 m. Shin'ichiro Ishikawa
Kobe University, Japan
Freely available
The International Corpus of Learner English
English Various written Argumentative and literary essays High-intermediate to advanced c. 3 m. Sylviane Granger
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium
CD-Rom + handbook: order online.
The International Teaching Assistants corpus
English Various spoken Learner language from a variety of spoken classroom tasks: office hours role plays, presentations, discussions   c. 500,000 Steven L. Thorne
Paula Golombek
Jonathon Reinhardt
Pennsylvania State University, USA
The Iranian Corpus of Learner English English Farsi written Expository essays University students (English majors) 436,035 Parviz Maftoon, Parviz Birjandi, Hossein Khazaee CD-ROM, data gathered for PhD dissertation by Hossein Khazaee; this corpus is an intellectual property of Science and Research Branch, Islamic Azad University, Tehran, Iran
The ISLE speech corpus English German
spoken Recorded sentences from several blocks of differing types (reading simple sentences, using minimal pairs, giving answers to multiple choice questions) Intermediate  c. 18h ecisle@nats.informatik.uni-hamburg.de CD-Rom
The Israeli Learner Corpus of Written English English Hebrew written Argumentative and descriptive essays   c. 750,000 Tina Waldman
Kibbutzim College of Education, Israel
The Janus Pannonius University Corpus
English Hungarian written Essays and research papers University students c. 500,000 József Horváth
University of Pécs, Hungary
Searchable online
The Japanese English as a Foreign Language Learner Corpus
English Japanese written Student essays From beginning to intermediate c. 700,000

Yukio Tono, Meikai University, Japan


The JEFLL Corpus will be freely available for research, first via the web query system (already available in Japanese) and then the entire data will be distributed under license in the future.
Lancaster Corpus of Academic Written English
English various written IELTS academic writing tests (descriptive and argumentative tasks); assignments.
Longitudinal data.
The Lang-8 Learner Corpora English Various written texts from Lang-8, a social networking site for language learning / / Toshikazu Tajiri & Mamoru Komachi Available here
The LeaP Corpus : Learning Prosody in a Foreign Language English German spoken Four types of speech styles were recorded:
  • nonsense word lists
  • readings of a short story
  • retellings of the story
  • free speech in an interview situation
Various  c. 12h Ulrike Gut
Albert-Ludwigs-University Freiburg, Germany

The annotated corpus is available to the scientific community. Please contact Ulrike Gut at the University of Augsburg.

LeaP manual

The Learner Corpus of Engineering Abstracts
English Malaysian written Abstracts of the Computer and Communication Systems Engineering Final Year Projects Various

c. 550,000

998 abstracts

Helen Tan, University Putra Malaysia

Chan Swee Heng

Ain Nadzimah

Syamsiah bt Mashohor

Available. Contact: Helen Tan, University Putra Malaysia
The Learner Corpus of English for Business Communication English Chinese written Different types of business correspondence written for simulated business situations, including memos, faxes, reports, letters of enquiry and complaint letters   c. 117,500 Li Lan
Hong Kong Polytechnic University, Hong Kong
Searchable online
The Learner Corpus of Essays and Reports English  Chinese written Essays and project reports covering a range of topics from Science, IT and New Media to Nursing, Business and Economics, and the Social Sciences   c. 188,000

Sima Sengupta
Hong Kong Polytechnic University, Hong Kong


Searchable online
A Learners' Corpus of Reading Texts English French spoken Unprepared reading of English texts.
The texts are short abstracts of fiction or made-up dialogues.
 University students   Sophie Herment 
Valérie Kerfelec
Laetitia Leonarduzzi
Gabor Turcsan
Freely available
The LONGDALE project: LONGitudinal DAtabase of Learner English English Various spoken and written Range of text types/task types.
Longitudinal data.
From intermediate to advanced   Fanny Meunier
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium
Under development
The Longman Learners' Corpus English Various written Essays and exam scripts Various c. 10 m. Longman Commercial
The Louvain International Database of Spoken English Interlanguage (LINDSEI) English Various spoken Interviews and picture descriptions High-intermediate to advanced c. 800,000 Gaëtanelle Gilquin
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium
CD-Rom and handbook: order online
The Malaysian Corpus of Learner English
English Malay written       Gerry Knowles
Zuraidah Mohd. Don
University of Malaya, Malaysia
The Malaysian Corpus of Students' Argumentative Writing
English Malay
Chinese Indian
written Argumentative essays

Form 4
Form 5

c. 565,500

Seyed Ali Rezvani Kalajahi
Jayakaran Mukundan
University Putra Malaysia

Available from developers
The Michigan Corpus of Academic Spoken English (MICASE) English Mainly L1 speakers but also includes data produced by L2 speakers spoken Transcripts of academic speech events   c. 1,8 m.

Ute Römer
University of Michigan, USA


Searchable online
The Michigan Corpus of Upper-level Student Papers (MICUSP) English Semi-balanced sample of native and non-native speakers of English written ESP papers
A-grade papers or ungraded papers that have been assessed and accepted (such as research proposals), but not published
  c. 2,6 m.

Ute Römer
University of Michigan, USA


Searchable online
The Montclair Electronic Language Database
English Various written Student essays Various c. 100,000 Eileen Fitzpatrick
Milton S. Seegmiller
Montclair State University, USA

Contact Eileen Fitzpatrick.

Includes error annotations

The Multimedia Adult ESL Learner Corpus
English ESL environment multimedia Video of classroom interaction and associated written materials Beginner to upper-intermediate  

Stephen Reder
Kathryn Harris
Kristen Setzler
Portland State University, USA


The Lab School would like to share the extensive resources from MAELC with interested researchers and teacher trainers. Those interested should make inquiries to the Lab School by e-mail.
The Neungyule Interlanguage Corpus of Korean Learners of English (NICKLE) English Korean spoken and written

Written part: student essays
Spoken part: student interviews and oral speech tests transcriptions

Mainly from beginning to intermediate 

c. 890,000

c. 100,000

 Ji-Myoung Choi
Yonsei University, Seoul, Korea
The corpus will be available to the scientific community for research purposes upon request.
The Japanese Learner English Corpus
English Japanese spoken English oral proficiency interview test various 2 m. Emi Izumi
Kiyotaka Uchimoto
Hitoshi Isahara
National Institute of Information and Communications Technology, Kyoto, Japan.
Freely available (downloadable)
The NOn-native Spanish corpus of English
English Spanish written Argumentative and descriptive student essays Intermediate and upper-intermediate c. 300,000 words  Ana Diaz-Negrillo
Universidad de Granada, Spain
The NUS Corpus of Learner English English Several East Asian languages, predominantly Chinese written Student essays on a wide range of topics including environmental pollution, healthcare, etc.   various c. 1 m. Hwee Tou Ng
Siew Mei Wu
Daniel Dahlmeier
National University of Singapore, Singapore.
Freely available
The PELCRA Learner English Corpus
English Polish spoken and written Written: Argumentative, descriptive, narrative and quasi-academic essays; formal letters From beginning to post-advanced

Under development

Aim spoken:
c. 200,000

Aim written:
c.2,8 m.

Piotr Pęzik
Barbara Lewandowska-Tomaszczyk
University of Lodz, Poland

Online search engine and corpus analysis tools
The PICLE corpus (Polish component of ICLE) English Polish written Student essays Advanced c. 330,000 Przemyslaw Kaszubski
AMU, Poznan, Poland
Searchable online
The Qatar learner corpus English Arabic (mostly from Qatar) spoken Spoken interviews with Qatari learners of English     Yun Zhao Helen
Carnegie Mellon University, USA
Freely available
The Québec learner corpus English French (from Québec) written Argumentative essays Intermediate and advanced c. 250,000 Tom Cobb
Université du Québec à Montréal, Canada
The Romanian Corpus of Learner English
English Romanian written Student essays     Chitez Madalina
Zurich University, Switzerland
Russian Error-Annotated English Learner Corpus English Russian written

examination essays of the kind similar to IELTS Task 1 and Task 2, with errors annotated manually

Intermediate to Advanced

c.800,000 by November 2017 and growing (together with the old part of the corpus less consistently annotated or not annotated, available at http://realec.org/index.xhtml#/ - c.2,000,000)

Olga Vinogradova, School of Linguistics, Research University Higher School of Economics

freely available

The Russian Learner Translator Corpus
Russian written Translations produced by trainee translators Trainee translators c. 1.5 m. tokens Project directors: Andrey Kutuzov and Maria Kunilovskaya Freely available
The Santiago University Learner of English Corpus (SULEC) English Spanish spoken and written

Written: compositions or argumentative essays.

Spoken: semi-structured interviews, short oral presentations and brief story descriptions.

Various Aim: c. 1 m. words Ignacio M. Palacios Martínez, Santiago University Available after registration
The Scientext English Learner Corpus English French written Academic argumentative texts    c. 1.1 m. scientext@u-grenoble3.fr Searchable online
Second Language Research Tasks
English Various



written paragraphs

various oral tasks

Various c. 300,000

Bill Crawford (Northern Arizona University)

Kim McDonough (Concordia University)

Under development
The Seoul National University Korean-speaking English Learner Corpus (SKELC) English Korean written Student essays Various c. 900,000 Heokseung Kwon
Seoul National University
The SILS Learner Corpus of English English Various (mainly Japanese) written Student essays Basic, intermediate and advanced

 c. 3.2 m.

(first and second drafts included)

Victoria Muehleisen
Waseda University, Japan
The Soochow Colber Student Corpus (SCSC) English Chinese written Student essays   c. 227,000 Colman Bernath
Soochow University, Taiwan
The Spoken and Written English Corpus of Chinese Learners
English Chinese spoken (SECCL)
and written (WECCL)

Written: argumentative and narrative essays.

Spoken: National Spoken English Test – longitudinal data

  c. 2 m. Wen Qiufang
Liang Maocheng
Wang Lifei


The Taiwanese Corpus of Learner English
English Chinese written Journals and essays (descriptive, narrative, expository, argumentative) Intermediate to advanced c. 2 m. Rebecca Hsue-Huch Shih
Sun Yat-sen University, Taiwan
The Taiwanese learner academic writing corpus (TaiwanLAWC) English Chinese written Theses and dissertations written by Taiwanese graduate students.     Howard Chen
National Taiwan Normal University, Taiwan

The TELEC Secondary Learner Corpus

English Chinese written and spoken Compositions from secondary classroom   c. 2 m. Quentin Allan
University of Hong Kong, Hong Kong
The Telecollaborative Learner Corpus of English and German Telekorp English German written Bilingual, longitudinal database comprising computer-mediated NS-NNS interactions between approximately 200 Americans and Germans collected during six different telecollaborative partnerships from 2000-2005.   c. 1,5 m. Julie Belz
Pennsylvania State University, USA.
Not publicly available
The Ten-Thousand English Compositions of Chinese Learners
English Chinese written Essays (various topics) written in and after class, and in testing context. Also contains some collaborative writing samples. Various (mainly undergraduates) c. 1,8 m. Project initiator: Jiajin Xu, National Research Centre for Foreign Language Education, Beijing Foreign Studies University Raw texts and part-of-speech tagged texts are available
TRAWL – Tracking Written Learner Language Multilingual (English, French, German, Spanish) Norwegian writing Texts written as part of regular class work (tests, in-school writing, homework) Longitudinal corpus (beginners/advanced)  

Project director: Hildegunn Dirdal

The Tswana Learner English Corpus (TLEC) English Tswana written Argumentative essays Advanced c. 200,000 Bertus Van Rooy
North-West University, South Africa
Available in ICLE

The Undergraduate Learner Translator Corpus (ULTC)


English-Arabic or



English-Arabic or



Arabic is the native language of the learners and the main target language

Written and Spoken Translations produced by learners of translation from and into Arabic and a reference subcorpus of published translations From beginners to advanced levels Under development Reem Alfuraih Available via https://arabicparallelultc.com/
The Uppsala Student English Corpus
English Swedish written Student essays Various c. 1,200,000 Ylva Berglund Prytz
Margareta Westergren Axelsson
Uppsala University, Sweden
The corpus can be used for research and educational purposes. It can be accessed on the Internet from the Oxford Text Archive.
The Uppsala WordReference Corpus English, Spanish, French, Italian Various Written Forum posts










English learner subcorpus: 38M

English native subcorpus: 50M

Spanish learner subcorpus: 5M

Spanish native subcorpus: 22M

French learner subcorpus: 4M

French native subcorpus: 7M

Italian learner subcorpus: 1M

Italian native subcorpus: 3M

Aleksandrs Berdicevskis
Uppsala University
Freely available 
The UPF Learner Translation Corpus English Catalan written Translations written by the students of the Translation and Interpreting degree at UPF.    c. 200,000 Anna Espunya
Pompeu Fabra University, Barcelona, Spain 
The UPV Learner Corpus English Catalan written essays Various c. 150,000 Universitat Politècnica de València, Spain  
The Varieties of English for Specific Purposes dAtabase learner corpus
English Various written ESP texts (term papers, reports, MA dissertations) Various c. 220,000 (under development) Magali Paquot
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium
The Written Corpus of Learner English corpus
English Spanish written Essays Various c. 750,000 Paul Rollinson
Universidad Autonoma de Madrid, Spain
The corpus is available for free, and can be downloaded from this website. There is also a search interface to retrieve sentences and clauses.
The Yonsei English Learner Corpus (YELC) English Korean written Yonsei University English Diagnostic Tests (Part 1: Descriptive task, max. 100 words; Part 2: Argumentative task, max. 300 words) 9 levels
(A1, A1+, A2, B1, B1+, B2, B2+, C1, C2)
c. 1 m. Seok-Chae Rhee
CK Jung
Yonsei University, Korea
The YELC corpus will be available to the scientific community for research purposes from 31 March 2012.
The Young Learner Corpus of English
English Greek spoken Pedagogic Corpus of video-recorded EFL language classes.  

170 school hours (126  hours of videotaped material)

1,5 m. types

Project director: Marina Mattheoudakis, Aristotle University of Thessaloniki, Greece

Thomas Zapounidis

The Estonian Interlanguage Corpus of Tallinn University
Estonian Russian
written Spontaneously produced texts in language learning situations: argumentative and literary essays, written stories, letters, term papers, reading reports. A1-C2 c. 1 m. Project director: Pille Eslon
Tallinn University, Estonia
Restricted online access
Linguistic Basis of the Common European Framework for L2 English and L2 Finnish
Various written Various Various  

Maisa Martin, University of Jyväskylä, Finland

 Download the corpus data set here

Paths in Second Language Acquisition (TOPLING)

Various written Various Various  

Maisa Martin, University of Jyväskylä, Finland

Available (see here for instructions on how to access the corpora) 
The Advanced Finnish Learner Corpus
Finnish  Russian
written Exam essays, theses, essays and writings Advanced c. 630,000

Kirsti Siitonen, University of Turku, Finland

Ilmari Ivaska, University of Turku, Finland

The Finnish National Foreign Language Certificate Corpus (YKI) Finnish

Lappish (Sami)



Various Beginner, intermediate and advanced  

Ari Maijanen, Centre for Applied Language Studies, University of Jyväskylä, Finland

Tiina Lammervo, Centre for Applied Language Studies, University of Jyväskylä, Finland

Available with user ID and Password
The International Corpus of Learner Finnish
Finnish Various written Finnish learners’ spontaneously produced texts in language learning situations, large variety of text types Beginner, intermediate and advanced Under development

Jarmo Harri Jantunen

University of Oulu, Finland

Free download after applying for a user licence
The Chy-FLE (Cypriot Learner Corpus of French) French Modern Greek
(and Cypriot Greek)
written Argumentative and descriptive essays From intermediate to advanced c. 250,000 (under development) Freiderikos Valetopoulos
Université de Poitiers, France
In collaboration with the University of Cyprus
The COREIL corpus French
  spoken       Elisabeth Delais-Roussarie
Hiyon Yoo
Université Paris-Diderot, France
The "Dire Autrement" corpus French (Second Language) Mainly L1 speakers of English written Narrative, injunctive, persuasive and informative texts   c. 50,000 Marie-Josée Hamel
Jasmina Milicevic
Dalhousie University, Canada
Available after registration
French Interlanguage Database
French Various written Free compositions: descriptive, argumentative and narrative texts, news & mail  Intermediate   Sylviane Granger
Centre for English Corpus Linguistics
Université catholique de Louvain, Belgium
French Learner Language Oral Corpora
French Various spoken See description of the 7 corpora Various   Florence Myles
Newcastle University
Rosamund Mitchell
University of Southampton, UK

The contents of the database are being made freely available to the research community, in the form of digital sound files and related transcripts formatted using CHILDES software.

Searchable online

The InterFra corpus French Swedish spoken Interviews, retellings of video clips and picture stories Various  

Inge Bartning 
Stockholm University, Sweden.


The "Interphonologie du Français Contemporain" corpus
French Cypriot Greek
English (Canada)


spoken Reading aloud, repeating words, guided interviews, interactions between two learners. Various Under development Sylvain Detey
Waseda University, Japan
Université de Rouen, France
Isabelle Racine
Université de Genève, Switzerland
Yuji Kawaguchi
Tokyo University of Foreign Studies, Japan
Under development; samples available
The Learner Corpus French
French Dutch written

Argumentative essays
Informative texts
Journalistic texts
Formal letters

Written compositions by Flemish students of French

Intermediate to advanced c. 500,000 K.U.Leuven Campus Kortrijk, UGent and Lessius
Hans Paulussen
Under development
The Lund CEFLE Corpus (Corpus Écrit de Français Langue Étrangère) French Swedish written Descriptive and narrative essays; picture-based stories. Various c. 100,000 Malin Ågren
Lund University, Sweden
A sub-part of the corpus is available online.
The University of the West Indies learner corpus


Jamaican Creole

spoken Conversations during oral exams and in informal contexts Various   Hugues Peters
University of New South Wales, Sydney, Australia
Corpus is freely available (last updated 2017)
Comasan Labhairt ann an Gàidhlig (CLAG)
Gaelic Adult Proficiency
Gaelic Various spoken

Conversation task


Elicited oral imitation task

Question and answer activity


Roibeard Ó Maolalaigh (University of Glasgow)

Nicola Carty (University of Glasgow)


The AleSKO corpus



Also German L1 data from the FALKO corpus

written Argumentative essays    c. 13,600 Heike Zinsmeister
University of Konstanz, Germany
Margrit Breckle
Vilnius Pedagogical University, Lithuania.

Analyzing Discourse Strategies: A Computer Learner Corpus


German English
(mainly American English)
written Threaded Discussion
Longitudinal data
From beginner to intermediate-mid Under development Christina Frei
Edward Nixon
University of Pennsylvania, USA
The Corpus of Learner German (CLEG13) German English written Argumentative, free compositions
Longitudinal over 4 years, undergraduate students
Intermediate to advanced c. 320,000


Ursula Maden-Weinberger



Online access through the FALKO platform.
The corpus is also available as txt files to the scientific community. Please contact U. Maden-Weinberger at uschi@miralis.co.uk

The deL1L2IM corpus German

Russian-Belorussian bilinguals

written Instant messaging dialogues Advanced c. 52,000

Sviatlana Höhn
University of Luxembourg

The Fehlerannotiertes Lernerkorpus (‘error annotated learner corpus’)

Learner subcorpus: various

Native subcorpus: German


1. Summaries

2. Essays

3. Letters, fiction writing, journal articles, book reviews (= longitudinal data from American learners)

1. Advanced

2. Advanced

3. Beginners - advanced


1. c. 40,000 (learner subcorpus) + c. 20,000 (native subcorpus)

2. c. 150,000 (learner corpus) + c. 70,000 (native subcorpus)

3. c. 78,000 (learner subcorpus)

Anke Lüdeling
Maik Walter
Humboldt-Universität zu Berlin
Institut für deutsche Sprache und Linguistik, Germany


Online access
The KOLIPSI corpus German Italian written Two written language production tasks of a standardized test (email/letter) A2-C1 under development Andrea Abel
Aivars Glaznieks
European Academy Bolzano/Bozen, Italy
The Learning the Prosody of a Foreign Language
German Various spoken The LeaP corpus covers four different types of speech:
- read speech
- prepared speech
- free speech
- nonsense word lists
Various  62 speakers Ulrike Gut
University of Augsburg, Germany

The annotated corpus is available to the scientific community. Please contact Ulrike Gut at the University of Augsburg.


The LeKo (Lernerkorpus) corpus German         c. 55,000 Anke Lüdeling, Humboldt-Universität Berlin, Germany

Online access (password protected)

Register here

The LINCS Corpus

1. German

2. German

3. German

1. English

2. German

1. Written

2. Written

3. Written

1. Essays, examination, answers.
Longitudinal and cross-sectional data.

2. Essays

3. Teaching output

1. Intermediate to Advanced

2. Advanced

Under development Elizabeth Thoday
Heriot-Watt University Edinburgh, UK
Not currently publicly available
Multilingual Platform for the European Reference Levels: Exploring Interlanguage in Context




Various written writing tasks from standardized tests (telc/UJOP) A1 to C1 c. 280,000 Katrin Wisniewski Available
Rhodes University Deutsch als Fremdsprache (RUDaF) German  English, Afrikaans, isiXhosa, XiTsonga written Short descriptive and argumentative writing paragraphs (300 words each) A2-B2 34,000

Gwyndolen Ortner

Dr Undine S. Weber

Rhodes University, South Africa

Not available
The Telecollaborative Learner Corpus of English and German (Telekorp) German English written Bilingual, longitudinal database comprising computer-mediated NS-NNS interactions between approximately 200 Americans and Germans collected during six different telecollaborative partnerships from 2000 to 2005.   c. 1.5 m.

Julie Belz
Pennsylvania State University, USA.


Not publicly available
The Langman corpus Hungarian Chinese spoken Interviews conducted in 1994 with 11 Chinese immigrants living in Hungary.
Interviews focused on issues related to their arrival in Hungary as well as their daily life activities
    Juliet Langman
University of Texas at San Antonio, USA
Freely available
Corpus di Apprendenti di Italiano L2
Italian Various written Essays Intermediate to advanced c. 237,000 Stefania Spina, Università per Stranieri di Perugia Searchable via CQPweb
Corpus parlato di italiano L2 Italian English
spoken Transcriptions of interviews Various   Stefania Spina
Silvio Pazzaglia
Mirco Perini
Università per Stranieri di Perugia, Italy
Searchable online
The KOLIPSI corpus Italian German written Two written language production tasks of a standardized test (email/letter) A2-C1 Under development Andrea Abel
European Academy Bolzano/Bozen, Italy
The Lexicon of Spoken Italian by Foreigners
Italian Various spoken Proficiency exams of the Certification of Italian as a Foreign Language (CILS) A1-C2 c. 700,000

Francesca Gallina
Università per Stranieri di Siena, Italy

Freely available
MISTiC (Multiple Italian Student TranslatIon Corpus) Italian English, French written translations produced by trainee translators (mainly specialised texts) post-graduate trainee translators ca. 125,000 (English-Italian), ca. 50,000 (French-Italian) Sara Castagnoli, University of Bologna, Italy not available
Varietà di Apprendimento della Lingua Italiana: Corpus Online
Italian Various written   Various c. 570,000

Manuel Barbera

Carla Marello

Elisa Corino

Freely available and searchable online.
Longitudinal Corpus of Chinese Learners of Italian (LOCCLI) Italian Chinese written essays beginners and pre-intermediate 97,000  The LOCCLI is part of a joint project between Stefania Spina (University for Foreigners of Perugia, Italy) and Anna Siyanova-Chanturia (Victoria University of Wellington, New Zealand). It is freely searchable via CQPweb (registration required) from https://www.unistrapg.it/cqpweb/
Corpus of Chinese Learners of Italian (COLI) Italian Chinese written and spoken

essays and answers to open questions


intermediate and advanced 82,300


Contact: Stefania Spina

The COLI is freely searchable via CQPweb (registration required) from https://www.unistrapg.it/cqpweb/
The Korean learner corpus Korean Various written Various: letters, essays, formal writing... Beginner and intermediate c. 10,000 Seok Bae Jang
Georgetown University, USA
Sun Hee Lee
Wellesley College, USA
Sang kyu Seo
Yonsei University, South Korea
ESAM Latvian and Lithuanian Latvian and Lithuanian written   Beginner 52,000 Inga Znotiņa Available online 
The ASK corpus Norwegian German
written Essays from language tests  B1 and B2   Kari Tenfjord
University of Bergen, Norway
Apply for a licence here
The Persian Learner Corpus
Persian (Farsi) Various written Narratives and essays Intermediate and advanced Academic/Restricted online access

Saeed Safari

University of Belgrade, Faculty of Philology

Academic/Restricted online access
The Salam Farsi Learner Corpus
Persian (Farsi) Serbian written Narratives, descriptive essays Beginner and upper-intermediate Under development

Saeed Safari

University of Belgrade, Faculty of Philology

Academic, under development
Learner Corpus of Portuguese L2 (COPLE2) Portuguese 15 languages: Chinese, English, Spanish, German, Russian, French, Japanese, Italian, Dutch, Tetum, Arabic, Polish, Korean, Romanian and Swedish Written and spoken Exams and assignments A1-C1 written: 171,461
oral: 25,783
Iria del Río Available
Russian Learner Corpus Russian varied written and spoken academic and non-academic teachers and heritage speakers unknown Ekaterina Rakhilina Available online
The PIKUST pilot learner corpus Slovene Various written Mostly argumentative essays Majority advanced – but also intermediate and beginner c. 35,000 Mojca Stritar
University of Ljubljana, Slovenia
The Anglia Polytechnic University (APU) Learner Spanish Corpus Spanish Various written     c. 120,000 Anne Ife
Anglia Ruskin University, UK
Aprescrilov ("Aprender a Escribir en Lovaina") Spanish Dutch written Written assignments and tests; several text types (letters, expository, descriptive, argumentative, narrative) A1 to C1 c. 1 m.

Kris Buyse
KU Leuven, Belgium

Restricted online access

The Corpus de aprendices de español

Spanish Various written   A1 to C1

c. 575,000

CAES team

Universidade de Santiago de Compostela

Online access
Corpus Escrito del Español L2 (CEDEL2 version 1.0) Spanish English, Greek written Written compositions by learners of Spanish All proficiency levels (lower beginner to upper advanced) 802,019 words coming from 2,578 participants

Cristobal Lozano
Universidad de Granada, Spain

 Downloadable/browsable via the CEDEL2 webpage: http://cedel2.learnercorpora.com/

Corpus de textos escritos para el análisis de errores de aprendices de E/LE

Spanish Various written Essays A2 to C1 /

Cestero Mancera, A. M. 
Penadés Martínez, I.

Universidad de Alcalá de Henares

CD-ROM available
The Corpus of Taiwanese Learners of Spanish (Corpus de Aprendices Taiwaneses de Español)
Spanish Chinese written Student essays Various c. 340,000 hclu@mail.ncku.edu.tw Under development
The DIAZ corpus Spanish


spoken Semi-spontaneous (structured interviews) and experimental (structured questionnaires) Adult Spanish L2/L3 oral data Various   Lourdes Diaz Rodriguez
Universitat Pompeu Fabra, Spain
Freely available
The Japanese learner corpus of Spanish Spanish Japanese written Student essays   c. 83,400 Yoshihito Kamakura
University of Birmingham, UK
The Spanish Corpus Proficiency Level Training
Spanish English (heritage language learners) spoken Dialogues about a given set of questions Beginner to advanced   Dr Dale Koike, University of Texas, Austin Liberal Arts Instructional Technology Center

Videos are available

Spanish Learner Language Oral Corpus
Spanish English spoken Learner narratives, interviews and picture description tasks Beginner to advanced c. 50,000

Laura Dominguez
University of Southampton, UK

Searchable online
Data freely available for download
Spanish Learner Oral Corpus Spanish Various
(9+ languages - especially Portuguese, French, Italian)
spoken Semi-spontaneous interviews, narrative and descriptive tasks A2-B1 c. 50,000 words Leonardo Campillos Llanos
Laboratorio de Lingüística Informática
Universidad Autónoma de Madrid, Spain
Online access
The Tartu Learner Corpus of Spanish as a L3+ Spanish Estonian written Academic research writing Advanced c. 885,000 Mari Kruse, University of Tartu, Estonia  

The ASU corpus

Swedish  Chinese
spoken and written Transcribed audio-recorded conversations and written texts from adult learners of Swedish – longitudinal data   c. 490,000 words
(c. 415,000 spoken and c. 75,000 written)
Björn Hammarberg
Stockholm University, Sweden
Leiden Learner Corpus Multilingual (Dutch, French, Italian, Portuguese and Spanish) various written and spoken written data: short essays; oral data: picture-based story telling various 200 participants M. Carmen Parafita Couto    

The European Science Foundation Second Language Database
(ESF database)




spoken Spontaneous second language acquisition of forty adult immigrant workers living in Western Europe, and their communication with native speakers in the respective host countries Various   Wolfgang Klein
Clive Perdue
Max Planck Institut, Nijmegen, Netherlands
Freely available
The Foreign Language Examination Corpus
Multilingual Polish written Data from the Warsaw University
Certification Exams
Various Under development Piotr Banski
Romuald Gozdawa-Golebiowski
Warsaw University, Poland
The MeLLANGE Learner Translator Corpus
Multilingual various written Legal, technical, administrative and journalistic texts Trainee translators  

Natalie Kübler
Université Paris Diderot, France.


Searchable online
The MiLC Corpus



Catalan written Formal and informal letters, summaries, curriculum vitae, essays, reports, translations, synchronous and asynchronous communication exchanges, business letters   c. 150,000 Angeles Andreu Andrés et al
Universidad Politécnica de Valencia, Spain
The Multilingual Learner Corpus (MLC)



Brazilian Portuguese written Argumentative and narrative essays    Aim: c. 200,000 Stella E.O. Tagnin
University of São Paulo, Brazil
Accessible online to registered researchers
The Padova Learner Corpus



Italian CMC
(Computer-Mediated Communication)

Student work produced in blended language courses using FirstClass conferencing software.
Variety of genres: diaries, debate contributions, formal reports, résumés etc. 
Longitudinal data


  Under development Fiona Dalziel
Francesca Helm
University of Padua, Italy

The corpus PARallèle Oral en Langue Etrangère (PAROLE)




(Mainly L2 speakers but also includes data produced by L1 speakers)

Various spoken 5 oral production tasks Various   Heather Hilton
John Osborne
Marie-Jo Derive
Nejma Succo
Jean O'Donnell
Sandra Billard
Sandrine Rutigliano-Daspet
Université de Savoie, France
The University of Toronto Romance Phonetics Database



(including English, Mandarin, Russian, Spanish, etc.)
spoken Elicited production - sentence and passage reading, story narration, description of favourite meal Various   Laura Colantoni
Jeffrey Steele
University of Toronto, Canada
Password available from directors


Learner corpus-based datasets



Corpus Target language First language Medium Text type / task type Proficiency level Size in words Project director Availability
 The Treebank of Learner English
 English Various written  Sentences from the CLC FCE (annotated with syntactic trees)  Upper-intermediate

(5,124 sentences)

Yevgeni Berzak Publicly available through the UD repository ('English-ESL')