CoNNECT is a Corpus of Native and Non-native EFL Classroom Teacher Talk. It contains transcripts of audio-recorded English lessons taught by native and non-native teachers in secondary education (classes ranging from A1 to B2 level). Data collection spanned a 26-month period, from January 2009 to March 2011. Recordings were made in French-speaking Belgium and in Britain. CoNNECT is made up of two sub-corpora: the native-English sub-corpus contains 108,988 words, while its smaller non-native counterpart contains 56,526 words, for a total of 165,514 words.
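
To make the relative size of the two sub-corpora concrete, the short sketch below recomputes the total and each sub-corpus's share of it from the word counts given above. The function and variable names are illustrative assumptions and not part of any CoNNECT tooling.

```python
# Word counts as reported in the corpus description above.
NATIVE_WORDS = 108_988
NON_NATIVE_WORDS = 56_526


def subcorpus_shares(native: int, non_native: int) -> tuple[int, float, float]:
    """Return the total word count and each sub-corpus's share of it.

    Hypothetical helper written for this illustration only.
    """
    total = native + non_native
    return total, native / total, non_native / total


total, native_share, non_native_share = subcorpus_shares(
    NATIVE_WORDS, NON_NATIVE_WORDS
)

print(f"total words:      {total:,}")
print(f"native share:     {native_share:.1%}")
print(f"non-native share: {non_native_share:.1%}")
```

On these figures the native-English sub-corpus accounts for roughly two thirds of the corpus, so raw frequencies from the two sub-corpora would normally be normalized (for example per 10,000 words) before being compared.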

The native-English recordings comprise 24 lessons, which can be categorized as follows:

  • 9 fifty-minute EFL lessons recorded in English immersion classes
  • 3 fifty-minute EFL lessons recorded in English non-immersion classes
  • 6 fifty-minute lessons recorded in CLIL immersion classes
  • 6 ninety-minute EFL lessons recorded in England

In all, 11 different English-speaking teachers were recorded (9 British, 1 American and 1 Irish).

The non-native English recordings include 14 lessons taught by 7 Francophone EFL teachers, all of which took place in the French-speaking Community of Belgium.

The corpus is used to identify salient linguistic features of native-speaker teachers’ classroom language that could be useful to non-native foreign language teachers in carrying out their most common teaching functions. The native-English lesson recordings serve as a baseline for comparison with the non-native sub-corpus.

The corpus has been transcribed according to the guidelines used for the Louvain International Database of Spoken English (LINDSEI). Extra annotation for basic prosodic features has also been added to the native-English sub-corpus.