Corpora
Corpor@uclouvain | Some of the corpora compiled by members of our research institute are distributed on the Corpor@uclouvain catalogue. This catalogue contains learner corpora and corpora of various other types. |
Learner corpora around the world | The Centre for English Corpus Linguistics maintains a list of learner corpora with relevant metadata and information about their availability for research purposes |
L2 learner corpora resource family | The CLARIN infrastructure provides access to 74 L2 learner corpora |
Other resources and tools
Some of the resources and tools developed by members of our research institute can be used to compile, annotate and analyze learner corpora:
Academic Keyword List | The Academic Keyword List contains 930 academic words that can be used to explore the lexical sophistication of L2 English learner language |
CEFRLex | The CEFRLex project offers several lexical resources graded according to the levels of the Common European Framework of Reference for Languages (CEFR) |
Core Metadata Schema for Learner Corpora | This document contains a list of metadata fields that can be used to describe learner corpus data. The core metadata schema is structured around 8 metadata types: administrative metadata, corpus design metadata, learner, text (language sample), task, annotation, annotator and transcriber. |
FABRA | FABRA was first developed as a readability toolkit based on the aggregation of a large number of readability predictor variables targeting French. In practice, the tool computes a large number of complexity measures typically used in L2 research |
fsca | fsca is an open-source R package for the extraction of syntactic units from dependency-parsed French texts. |
Guide pratique de constitution de corpus | A set of guidelines (written in French) to help our students collect and document written and spoken corpora |
Recto-Verso | This software automatically applies the 1990 French spelling rectifications to a text |
Resyf | French lexical resource with synonyms graded according to their level of difficulty |
TreeTagger | Web interface that facilitates the use of TreeTagger, the part-of-speech tagger developed at the Institute for Computational Linguistics at the University of Stuttgart |
Publications
CECL papers | The CECL Papers aim to make available to the academic community a series of articles, books and technical papers related to activities (conferences, corpus collection, corpus annotation, etc.) led by the CECL. Several of these publications focus on L2 research (e.g. The Louvain Error Tagging Manual). |
Learner Corpus Bibliography | The Learner Corpus Bibliography (LCB) is a collection of c. 2000 references related to Learner Corpus Research. The LCB was created and maintained by the CECL for many years. In 2013, the CECL agreed to share the LCB with the Learner Corpus Association, which currently maintains it in the form of a Zotero-based collection available to all its members. |
The International Journal of Learner Corpus Research | The International Journal of Learner Corpus Research (IJLCR) is a forum for researchers who collect, annotate, and analyse computer learner corpora and/or use them to investigate topics in Second Language Acquisition and linguistic theory in general, inform foreign language teaching, develop learner-corpus-informed tools (e.g. courseware, proficiency tests, dictionaries and grammars) or conduct natural language processing tasks (e.g. annotation, automatic spell- and grammar-checking, L1 identification). |