Corpus linguistics: From manual to automatic analysis

lling2250  2025-2026  Louvain-la-Neuve

5.00 credits
22.5 h + 10.0 h
Q1
Language
Prerequisites
  • A bachelor's degree  
  • A solid foundation of general linguistics 
  • A solid knowledge of academic English 
Main themes
This course focuses on corpus linguistics: the use of corpora to explore theoretical questions in various areas of linguistics. The notion and specific features of a corpus of text or oral data will be defined. Several methods (both qualitative and quantitative) for answering a research question on the basis of corpus data will be introduced, along with the tools associated with these methods. The course will be hands-on: students will get practical experience with various computational tools, as well as with the statistical package R. The practical application of the concepts and methods learned will be grounded in a research project involving the full research cycle: framing of a research question, corpus construction, analyses (qualitative/quantitative), and presentation of the results (both oral and written).
Learning outcomes

At the end of this learning unit, the student is able to:

1 Build a corpus of text or oral language data for the analysis of a particular linguistic phenomenon (lexical, phonetic, syntactic, semantic, or related to discourse)
 
2 Use several tools for corpus analysis
 
3 Answer a research question using quantitative corpus analysis
 
4 Answer a research question using qualitative corpus analysis
 
5 Present the research question, method and results orally
 
6 Present the research question, method and results in a written academic paper
 
This learning unit contributes to the development and command of the following skills and learning outcomes of the ELAL programmes (ELAL learning outcomes)
 
Content
GENERAL OBJECTIVES:
(1) To be able to carry out a corpus-based linguistic study fully and autonomously;
(2) To acquire general knowledge of linguistic corpora (in French, among other languages), tools and methods.
SPECIFIC OBJECTIVES:
(1) To design a corpus answering a specific research question;
(2) To collect oral and/or written data + metadata;
(3) To "edit" the data (transcription, cleaning, formatting, encoding, etc.);
(4) To tag the data (at several levels of linguistic analysis) using appropriate software;
(5) To formulate research questions / hypotheses;
(6) To choose a methodology of analysis;
(7) To exploit / analyze a corpus according to a chosen methodology (qualitative and/or quantitative);
(8) To present the results obtained.
Teaching methods
22 hours of lectures + 10 hours of practical work (devoted to carrying out a personal research project based on the methodology seen in the lectures).
Evaluation methods
The final grade is the weighted mean of the grades obtained for the following four components:
  • Attendance and participation in lectures, which are mandatory since lectures are hands-on (10%)
  • A short homework assignment in R consisting of data manipulation and visualization, submitted at mid-term (15%)
  • Research paper of 13 pages max. (with bibliography, but without appendices) to be submitted on the first day of the exam session (50%)
  • Oral exam with presentation and defense of the research paper (25%)
For the summer (August/September) exam session, continuous assessment still applies. Students who failed one of the components will be able to retake that component or an alternative judged equivalent by the professors.
Other information
Generative artificial intelligence (AI) must be used responsibly and in accordance with academic and scientific integrity practices. Given that scientific integrity requires sources to be cited, the use of generative AI must be explicitly and thoroughly acknowledged: the student is required to state (for example in a footnote) whether they used a generative AI (and which one) in writing their answers/term paper, specifying in what capacity the generative AI was used. The student remains responsible for the content of their work, regardless of the sources/references used. Any use of a generative AI for tasks prohibiting its use will be viewed as cheating.
Bibliography
Zufferey, Sandrine (2020). Introduction to Corpus Linguistics. Wiley.
Teaching materials
  • All course materials are available on Moodle.
Faculty or entity


Programmes offering this learning unit (LU)

Title of the programme
Acronym
Credits
Prerequisites
Learning outcomes
Master [120] in French and Romance Languages and Literatures : French as a Foreign Language

Master [120] in Translation

Master [120] in Linguistics

Master [120] in Modern Languages and Literatures : German, Dutch and English

Master [120] in Modern Languages and Literatures : General