PROCEED

PROCEED (PROcess Corpus of English in EDucation) is a new type of learner corpus. Unlike traditional written learner corpora, it does not simply give access to the final product of the writing process but reproduces the process itself, including editing operations and the use of online tools. The compilation of this ‘process learner corpus’ relies on keystroke logging and screencasting, which, by capturing keyboard and screen activity, make it possible to keep a record of all the steps involved in the writing process.

The first PROCEED data were collected in February 2017 among (mostly) French-speaking university students majoring in English at the University of Louvain. Since then, several data collection sessions have been organized. During these sessions, the students are asked to write a short argumentative essay on a computer. In addition to texts in L2 English, we also collect corresponding texts written by the same students in their L1. The corpus includes three types of data: the written texts, the keystroke log files and the screencast videos. The videos can be annotated through a multi-layer annotation system, which makes them amenable to corpus analysis. PROCEED comes with rich metadata in the form of a detailed learner profile, including cognitive measures such as fluid intelligence and working memory. These measures make it possible to account for individual variation in an unprecedented way, by assessing the impact of learners’ cognitive abilities on writing.

The corpus can provide insights into the writing process of foreign learners of English by revealing what learners do when writing a text, and how and when they do it. It makes it possible, for example, to examine how they go about structuring their texts, what textual changes they make (insertions, deletions, corrections, etc.) and what online resources they use, if any (dictionaries, corpus interfaces, secondary sources, etc.). Phenomena such as pausing behaviour or, more generally, writing fluency can also be investigated on the basis of PROCEED. These results can then be related to the learner’s sociolinguistic and cognitive profile, thanks to the many variables recorded in the metadata.

The PROCEED data have served as a basis for the following projects: "Honing students' writing skills via the use of online tools", "Comparison of L1 and L2 writing fluency on the basis of screencast videos and keystroke log files" and "Dyslexia in L2 writing: A product- and process-based approach".

Project director:
Gaëtanelle Gilquin

The Louvain PROCEED team:
Gaëtanelle Gilquin
Samantha Laporte
Laurie Radar

How to cite:
Gilquin, Gaëtanelle (2022). The Process Corpus of English in Education: Going beyond the written text. Research in Corpus Linguistics 10(1): 31-44.