Resources

ILC Louvain-La-Neuve, Mons

Corpora

Corpor@uclouvain: Some of the corpora compiled by members of our research institute are distributed through the Corpor@uclouvain catalogue, which contains learner corpora as well as corpora of various other types.
Learner corpora around the world: The Centre for English Corpus Linguistics maintains a list of learner corpora, with relevant metadata and information about their availability for research purposes.
L2 learner corpora resource family: The CLARIN infrastructure provides access to 74 L2 learner corpora.

 

Other resources and tools 

Some of the resources and tools developed by members of our research institute can be used to compile, annotate and analyse learner corpora:

Academic Keyword List: A list of 930 academic words that can be used to explore the lexical sophistication of L2 English learner language.
CEFRLex: The CEFRLex project offers several lexical resources graded according to the Common European Framework of Reference for Languages (CEFR).
Core Metadata Schema for Learner Corpora: A list of metadata fields that can be used to describe learner corpus data. The core metadata schema is structured around eight metadata types: administrative metadata, corpus design metadata, learner, text (language sample), task, annotation, annotator, and transcriber.
FABRA: FABRA was first developed as a readability toolkit for French that aggregates a large number of readability predictor variables. In practice, the tool computes a large number of complexity measures typically used in L2 research.
fsca: An open-source R package for extracting syntactic units from dependency-parsed French texts.
Guide pratique de constitution de corpus: A set of guidelines (written in French) to help our students collect and document written and spoken corpora.
Recto-Verso: Software that automatically applies the 1990 French spelling reform to a text.
Resyf: A French lexical resource in which synonyms are graded according to their level of difficulty.
TreeTagger: A web interface that facilitates the use of TreeTagger, the part-of-speech tagger developed at the Institute for Computational Linguistics of the University of Stuttgart.
UCLouvain Error Editor (UCLEEv2): Software that facilitates the insertion of error tags and corrections into learner texts, as well as their subsequent processing.

 

Publications

CECL Papers: The CECL Papers make a series of articles, books and technical papers available to the academic community. These publications relate to activities led by the CECL (conferences, corpus collection, corpus annotation, etc.), and several of them focus on L2 research (e.g. The Louvain Error Tagging Manual).
Learner Corpus Bibliography: The Learner Corpus Bibliography (LCB) is a collection of about 2,000 references related to Learner Corpus Research. The LCB was created and maintained by the CECL for many years. In 2013, the CECL agreed to share the LCB with the Learner Corpus Association, which currently maintains it as a Zotero-based collection available to all its members.
The International Journal of Learner Corpus Research: The International Journal of Learner Corpus Research (IJLCR) is a forum for researchers who collect, annotate, and analyse computer learner corpora and/or use them to investigate topics in Second Language Acquisition and linguistic theory in general, inform foreign language teaching, develop learner-corpus-informed tools (e.g. courseware, proficiency tests, dictionaries and grammars) or conduct natural language processing tasks (e.g. annotation, automatic spell- and grammar-checking, L1 identification).