This page presents some of the major projects carried out over the last few years in the Linguistics Research Unit. For a more exhaustive list, please consult the pages of the individual centers and platforms.
Research team involved: CENTAL
AMesure is a web platform designed to help writers of administrative texts write as clearly as possible. It offers the following three services:
- Give a score assessing the reading difficulty of a text;
- Identify linguistic characteristics that negatively influence the comprehension process of a given text (such as rare or specialized terms, complex syntactic structures, excessive use of abbreviations, etc.);
- Provide writing advice for each of the phenomena detected in step 2. To do this, we rely on the recommendations found in several official plain-writing guides.
Access the AMesure page here: AMesure
Research team involved: CENTAL
Archibald: ARCHIves Breeding by Automated Language Description
To date, and despite the digitalisation of media, the processing of audio-visual content in Sonuma S.A.'s archived resources (made up of the archives of RTBF and other local television stations) is greatly slowed by the documentation and description phase. Only 25% of these resources have the metadata necessary for them to be put to optimal use.
The main objective of the Archibald project is to make the archives available more quickly, and therefore to enhance their use and value, by providing an indexing platform for both audio-visual content (radio and TV programmes) and textual content (descriptions, journalists' written materials, etc.), a full-text search interface, and a set of semantic navigation tools.
In order to achieve this objective, two problems must be solved:
- The segmentation of files. Each programme is stored as a single monolithic file, with no means of navigating efficiently between its components. It is therefore not possible, for example, to select one topic from a news broadcast without browsing through the whole video file.
- Metadata augmentation. As the current information linked to each file does not take into consideration the audio and video sequences, it gives only a very partial overview of the subjects raised by each document.
Therefore, to ensure that the content can be exploited effectively, each file must be transformed and enriched so that the resulting units, i.e. the documents, can be retrieved efficiently and at a sufficient level of granularity.
Research teams involved: VALIBEL/CECL
In times of globalization and cultural openness, policies increasingly promote multilingualism as a strong social and economic asset. One way to foster multilingualism in education is Content and Language Integrated Learning (CLIL), a didactic method in which school subjects are taught in a target language other than the mainstream school language. In the French Community of Belgium, schools have been allowed to provide CLIL in Dutch, English or German since 1998. However, to this day we have only an incomplete and fragmented view of how CLIL differs from non-CLIL education and of how it impacts second/foreign language acquisition.
On the basis of a large-scale longitudinal study, this research project aims to gain insight into the linguistic, cognitive and educational aspects of CLIL and to understand how the interplay between those three perspectives may underlie L2 acquisition processes. To this end, the project concentrates on French-speaking CLIL and non-CLIL learners (control group) having Dutch or English as target language. Data are collected at different times in the last two years of primary and secondary school education. This interdisciplinary study intends to make a strong empirical and theoretical contribution to the ongoing international scientific debates on multilingualism in general and CLIL in particular.
Research team involved: CENTAL
This international project (involving four universities in Belgium, France and Sweden) hosts a collection of machine-readable graded lexical resources that describe the frequency distributions of words observed across the six levels of the CEFR scale. The lexical frequencies reported in each resource have been estimated on a corpus of L2 learning materials. Because the resources are based on materials that foreign language learners actually encounter, they can be used for pedagogical purposes. Frequencies were estimated and normalized using a procedure described in François et al. (2014). These resources, which are also available through a web interface, offer various useful perspectives for teaching and research.
Access the CEFRLex page here: CEFRLex
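To make the idea of a per-level frequency distribution concrete, the short Python sketch below counts words in a toy corpus split by CEFR level and scales the counts to a common base. It is an illustration only: the function name, the toy data and the simple "occurrences per N tokens" scaling are assumptions made here, and it does not reproduce the estimation and normalization procedure of François et al. (2014).

```python
from collections import Counter

# The six levels of the CEFR scale.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def per_level_frequencies(corpus, per=1_000_000):
    """Return {word: {level: occurrences per `per` tokens}}.

    `corpus` maps a CEFR level to the list of tokens found in learning
    materials of that level (a hypothetical input format chosen for this
    sketch). Levels with no tokens simply contribute no counts.
    """
    table = {}
    for level in CEFR_LEVELS:
        tokens = corpus.get(level, [])
        total = len(tokens)
        for word, count in Counter(tokens).items():
            # Every word gets a row with all six levels, defaulting to 0.0.
            row = table.setdefault(word, {lvl: 0.0 for lvl in CEFR_LEVELS})
            row[level] = count / total * per
    return table


# Toy data, invented for illustration only.
toy_corpus = {
    "A1": ["chat", "maison", "chat", "manger"],
    "B1": ["maison", "cependant", "manger", "cependant", "débat"],
}

freqs = per_level_frequencies(toy_corpus, per=1000)
for word in sorted(freqs):
    print(word, freqs[word])
```

Each word ends up with one value per level, so a word that is frequent in beginner materials and absent later can be told apart from one that only appears at intermediate levels.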
Discourse, populism and democracy – Tracking the uses of populism in media and political discourse (TrUMPo)
Research teams involved: VALIBEL/LASCO/ISPOLE
In contemporary democracies, not a day goes by without the word populism being used in political and media discourse. For many observers worldwide, the spread of populism is one of the main threats to democracy. This has led to the development of a booming literature on populist political parties and politicians. Nevertheless, one major aspect of populism remains understudied, namely the use of the term populist itself by political and other actors, which is the subject of a real political struggle. Populism is indeed used by some political actors to disqualify political opponents but also as a positive category/label demonstrating proximity to people’s concerns in order to gain legitimacy. These uses of populism lead to fierce debates about the role and place of the people in democracies, and about who can claim to best represent the people, since the way the word and the notion of populism are defined, used and circulated is directly related to competing conceptions of democracy.
In order to understand how the construction of this category of populism contributes to shaping our collective imagination of democracy, TrUMPo seeks to understand in which contexts and situations this notion is used, which meaning it conveys in actual discursive practices, and how it circulates in the public debate. The topic will therefore be studied from a threefold perspective: political science, communication studies and linguistics. Since TrUMPo’s aim is to understand how discourse about populism plays a crucial role in the mediatized political debate in European democracies, the project will compare these discourses in four domestic contexts: French- and Dutch-speaking Belgium, France and Spain. Project data will come from the parliamentary arena as well as mass and social media. The data will be analyzed through qualitative and quantitative methods.
The multidisciplinary project is carried out under the supervision of Prof. Min Reuchamps (ISPOLE) as coordinator together with Prof. Barbara De Cock (ILC-PLIN), Prof. Philippe Hambye (ILC-PLIN) and Prof. Sandrine Roginsky (ILC-PCOM). Further project collaborators from PLIN are Nadezda Schinova, Raül Nuevo Gascó and Romane Werner.
Research teams involved: VALIBEL/CECL
The goals of the research program were theoretical, methodological and descriptive in nature. Its main topic was the investigation of fluency and disfluency markers in different languages and modes in a contrastive perspective, focusing on English and French spoken discourse, English learner speech, and French Belgian Sign Language. The main hypothesis of the project was that there are no specific linguistic markers of fluency and disfluency but that the same markers can be signals of either type depending on the communicative situation, the discourse genre, the language, or the speakers involved.
The research was organised around four doctoral dissertations:
- Ludivine Crible (co-supervisors: Liesbeth Degand & Gaëtanelle Gilquin): Discourse Markers and (Dis)Fluency across Genres: A Contrastive Usage-Based Study in English and French (defended on 14 February 2017)
- Ingrid Notarrigo (co-supervisors: Laurence Meurant & Anne-Catherine Simon): Les marqueurs de (dis)fluence en langue des signes de Belgique francophone (LSFB) (defended on 31 August 2017)
- Amandine Dumont (co-supervisors: Sylviane Granger & Gaëtanelle Gilquin): Fluency and Disfluency: A Corpus Study of Nonnative and Native Speaker (Dis)fluency Profiles (defended on 22 May 2018)
- Iulia Grosman (co-supervisors: Anne-Catherine Simon & Liesbeth Degand): Evaluation contextuelle de la (dis)fluence en production et perception. Pratiques communicatives et formes prosodico-syntaxiques en français (to be defended in December 2018)
All research objectives have been fulfilled. From a theoretical point of view, the main advance concerns the ambivalent, contextual definition of the notions of fluency and disfluency. From a methodological point of view, the corpus-based analysis, with a common core in the four languages and modes, has proven successful in our endeavour to design comparable corpora as well as multilingual and multimodal annotation schemes, but also in the combination of corpus research and experimentation. Next to the four dissertations at the heart of the project, three additional doctoral projects were carried out in close interaction: George Christodoulides (supervisor: Anne-Catherine Simon) on "Effects of Cognitive Load on Speech Production and Perception", defended in September 2016; Silvia Gabarró Lopez (supervisor: Laurence Meurant) on "Discourse Markers in French Belgian Sign Language (LSFB) and Catalan Sign Language (LSC): Buoys, Palm-Up and Same. Variation, functions and position in discourse", defended in September 2017; Maïté Dupont (co-supervisors: Liesbeth Degand & Sylviane Granger) on "A systemic functional corpus-based analysis of contrastive connectives in English and French", to be defended in 2018. In addition, seven Master's theses were written within the framework of the project.
Research team involved: CECL
The International Corpus of Learner English contains argumentative essays written by higher intermediate to advanced learners of English from several mother tongue backgrounds. The corpus is the result of collaboration with a wide range of partner universities internationally. It is highly homogeneous as all partners have adopted the same corpus collection guidelines.
The first version was published on CD-ROM in 2002, and an expanded version, ICLEv2, was published in 2009. We are currently working towards an expanded web-based version of the corpus, ICLEv3, which will contain c. 5 million words of writing produced by learners from 26 mother tongue backgrounds (Brazilian Portuguese, Chinese, Czech, Dutch, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Korean, Lithuanian, Macedonian, Norwegian, Persian, Polish, Punjabi, Russian, Serbian, Spanish, Swedish, Tswana, Turkish, Urdu).
The ICLE has been used extensively to inform theoretical and applied research in second language acquisition, foreign language teaching, language testing and natural language processing.
- Granger, S. (2003). The International Corpus of Learner English: A new resource for foreign language learning and teaching and second language acquisition research. TESOL Quarterly 37(3), pp. 538-546.
- Granger, S., Dagneaux, E., Meunier, F., Paquot, M. (2009). The International Corpus of Learner English. Handbook and CD-ROM. Version 2. Louvain-la-Neuve: Presses universitaires de Louvain.
Research team involved: CECL
The Louvain EAP dictionary (LEAD) is a web-based English for Academic Purposes (EAP) dictionary for non-native writers (Granger and Paquot, 2010).
It contains a rich corpus-based description of c. 900 non-technical words and phrases that express key functions in academic discourse (e.g. contrast, exemplification or cause and effect), with particular focus on their phraseology (collocations and recurrent phrases). The lexical entries provide information derived from an analysis of a large corpus of academic texts (i.e. the academic component of the British National Corpus), as well as a range of home-made discipline-specific corpora and English as a Foreign Language (EFL) learner corpora representing a wide range of first language (L1) populations. Its main originality is its customisability: the content is automatically adapted to users’ needs in terms of discipline and mother tongue background.
Another key feature of the LEAD is that it makes full use of the capabilities afforded by the electronic medium in terms of multiplicity of access modes. The dictionary can be used as both a semasiological dictionary (from lexeme to meaning) and an onomasiological dictionary (from meaning/concept to lexeme) via a list of typical rhetorical or organisational functions in academic discourse. It is also a semi-bilingual dictionary, as users who have selected a particular mother tongue background can search lexical entries via their translations into that language.
The LEAD dictionary is designed as an integrated tool where the actual dictionary part is linked up to other language resources (in particular, a corpus-handling tool, discipline-specific corpora and exercises).
A beta version of the LEAD is available to all UCL staff members and students (please make sure you register with a valid UCL email address).
The LEAD is being developed within the framework of the FNRS-FRFC project “Lexicography and phraseology: onomasiological and semasiological approach to English for Academic Purposes” (2.4.501.08.F) (2008-2012).
- Granger, S. & Paquot, M. (2015). Electronic lexicography goes local. Design and structures of a needs-driven online academic writing aid. Lexicographica 31.
Research team involved: CECL
Project director: Prof. Fanny Meunier
The LONGDALE (Longitudinal Database of Learner English) project was initiated in January 2008 and aims to build a large truly longitudinal database of learner English containing data from learners from a wide range of mother tongue backgrounds. The same students are followed over a period of at least three years and data collections are organised at least once a year.
The term 'database' rather than 'corpus' is used here as our aim is to collect a wide range of data types, from fairly uncontrolled spoken or written data such as argumentative essays, narratives or informal interviews to more guided types in the form of summaries or picture descriptions. We also collect some experimental data such as grammaticality judgment tests.
The database includes comprehensive learner profile information, which is gathered during each data collection session. These variables include, inter alia, age, gender, educational background, country, language background and proficiency level. Variables pertaining to the task are also included.
Five international teams have collected data so far and future collections will soon be organised.
- de Haan, Pieter / van der Haagen, Monique 2014. A Longitudinal Study of the Syntactic Development of Very Advanced Dutch EFL Writing. In Vandelanotte, Lieven / Davidse, Kristin / Gentens, Caroline / Kimps, Ditte (eds) Recent Advances in Corpus Linguistics Developing and Exploiting Corpora. Amsterdam/New York: Rodopi, 335–349.
- Gentil, Guillaume / Meunier, Fanny (in press) A systemic functional linguistic approach to usage-based research and instruction: The case of nominalization in L2 academic writing. In Tyler, A. / Ortega, L. / Uno, M. / Park, H. I. (eds.). Usage-inspired L2 instruction: Researched pedagogy. Amsterdam: John Benjamins.
- Goutéraux, Pascale 2013. Learners of English and Conversational Proficiency. In Granger, Sylviane / Gilquin, Gaëtanelle / Meunier, Fanny (eds) Twenty Years of Learner Corpus Research: Looking Back, Moving Ahead. Louvain-la-Neuve: Presses universitaires de Louvain, 197–210.
- Littré, Damien 2015. Combining Experimental Data and Corpus Data: Intermediate French-speaking Learners and the English Present. Corpus Linguistics and Linguistic Theory 11/1, 89–126.
- Meunier, Fanny 2016. Introduction to the LONGDALE Project. In Castello, Erik / Ackerley, Katherine / Coccetta, Francesca (eds) Studies in Learner Corpus Linguistics. Research and Applications for Foreign Language Teaching and Assessment. Berlin: Peter Lang, 123-126.
- Meunier, Fanny / Littré, Damien 2013. Tracking Learners’ Progress. Adopting a Dual ‘Corpus Cum Experimental Data’ Approach. The Modern Language Journal 97/1, 61–76.
- van Vuuren, Sanne 2013. ‘Information Structural Transfer in Advanced Dutch EFL Writing: A Cross-linguistic Longitudinal Study’, in Aalberse, S. and Auer, A. (eds) Linguistics in the Netherlands 2011, Amsterdam: Benjamins, 173–87.
Research team involved: CECL
The Multilingual Student Translation (MUST) project brings together translation and foreign language researchers and teachers around two main objectives: to collect and share translations produced by students and to process them using a standardized set of tools and guidelines with a view to advancing empirical research and optimizing translation teaching.
The MUST corpus is truly multilingual; it includes both direct (L2>L1) and inverse (L1>L2) translation, and represents a range of text types, genres, registers and topics. Two key features of MUST are its rich set of standardized metadata, subdivided into three categories (translator metadata, source text metadata and translation task metadata), and its computer-aided Translation-oriented Annotation System (TAS).
The project currently includes 31 research teams from 14 countries and covers 16 languages (Chinese, French, Dutch, English, Galician, German, Greek, Italian, Lithuanian, Macedonian, Norwegian, Polish, (Brazilian) Portuguese, Russian, Slovene, Spanish). To ensure easy access for all partners in the project, all the data are collected and searchable on a web-based interface, Hypal4MUST, an adapted version of the Hypal software tool designed by Adam Obrusnik for the processing of parallel texts.
Research team involved: VALIBEL
The main objective of the COST Action is to coordinate the creation of a European portal of cross-linguistically available monolingual or parallel corpora that have been enriched and made interoperable and co-searchable through annotation of discourse relational devices and the information they convey.
Access the TextLink Website here: TextLink