CLARIN - Knowledge Centre for Learner Corpora

The CLARIN Knowledge Centre for Learner Corpora (CKL2CORPORA) was officially recognized as a CLARIN Knowledge Centre on 28 November 2022 for a period of three years.

Missions

The CLARIN Knowledge Centre for Learner Corpora offers expert knowledge on the collection and use of learner corpora (i.e. electronic collections of language data produced by second or foreign language learners) for theoretical and applied purposes. Sharing of expertise can take various forms, from answering (theoretical, methodological, technical) questions sent via the helpdesk to sharing resources and providing training services.

The K-centre builds on more than 30 years of expertise in learner corpus research (from corpus design to corpus analysis) at the Centre for English Corpus Linguistics (CECL). Among other things, CECL members launched the Learner Corpus Bibliography and maintain the Learner Corpora around the World webpage. They are the founding members of the Learner Corpus Association, launched the International Journal of Learner Corpus Research, and have published extensively on learner corpus research. Major CECL publications include The Cambridge Handbook of Learner Corpus Research (edited by S. Granger, G. Gilquin, & F. Meunier, 2015) and The Routledge Handbook of Second Language Acquisition and Corpora (edited by N. Tracy-Ventura & M. Paquot, 2021).

Currently, the K-centre relies on the expertise of staff from three research centres in the Linguistic Research Unit of the Institute for Language and Communication (ILC).

Together, the three research centres have expertise in learner corpus design (metadata, transcription, ethics), annotation (POS tagging, parsing, error annotation), and analysis. Current projects of CKL2CORPORA members include the development of a Core Metadata Schema for Learner Corpora (Paquot et al., 2024) and of FABRA (Wilkens et al., 2022), a tool originally designed for readability research that can also be used to compute a wide range of linguistic complexity measures for French.

More generally, CKL2CORPORA members and ILC colleagues specialize in applied linguistics, corpus linguistics, natural language processing, second language acquisition, and translation studies. They have expertise in a variety of linguistic topics (linguistic complexity, phraseology, discourse markers, morphology, language variation, crosslinguistic influence), theories (usage-based approaches to language acquisition, Construction Grammar), and methodologies (contrastive interlanguage analysis, discourse analysis, error analysis, etc.), which they often use to explore how languages differ from one another and how each language varies according to its context of use. This theoretical and methodological toolkit can be applied to a wide range of language varieties and registers (learner language, translation, academic writing, computer-mediated communication).

CLARIN infrastructure

The Knowledge Centre for Learner Corpora is part of CLARIN-BE, the Belgian infrastructure for language resources and technologies. CLARIN-BE operates the Belgian part of the pan-European CLARIN infrastructure. CLARIN Knowledge Centres are a cornerstone of the CLARIN Knowledge Infrastructure, and each has its own area of expertise (individual languages, language processing topics, data types, etc.).