Multivariate statistics for linguistics

lling2240  2025-2026  Louvain-la-Neuve

The version you’re consulting is not final. This course description may change. The final version will be published on 1st June.
5.00 credits
22.5 h + 10.0 h
Q2
Language
Main themes
Access to this course is restricted to students who have already successfully completed an introductory statistics course, e.g. LFIAL2260.  
The course will cover the following topics:  
  • Concept and structure of a statistical model (linear or non-linear) and examples of typical linguistic research questions 
  • Regression models: introduction to different regression models (e.g. linear regression, logistic regression, etc.), variable selection algorithms, the notion of multicollinearity, model estimation, interpretation of parameters, quality of predictions 
  • Analysis of variance: presentation of different analysis of variance techniques (for parametric and non-parametric data; for independent or repeated measures; with one or two classification criteria, etc.), the logic of the F-test, multiple comparisons of means (post-hoc tests) 
  • Linear mixed models: notion of generalised linear model, random effects, hierarchical models  
  • Exploratory methods: exploration of linguistic data with methods such as principal component analysis, factor analysis, etc. 
  • Classification methods: introduction to classification models, e.g. decision trees   
  • Model validation: measures of goodness of fit, residual analysis, tests of homogeneity of variance and of sphericity, detection of outliers or influential points, variable transformation, etc. 
Learning outcomes

At the end of this learning unit, the student is able to:

1 Translate a linguistic research problem into a series of statistical questions, choose the appropriate methods, apply them and present all the results in a report. 
 
2 Understand and explain the statistical concepts underlying the different methods used in the course.  
 
3 Apply the various statistical methods covered in the course to textual data using the R software.   
 
This teaching unit contributes to the development and mastery of the following skills and outcomes of the ELAL programmes (ELAL learning outcomes): 1.4; 2.3; 2.6; 3.1; 3.2; 3.3; 3.5; 4.5; 5.1; 5.2; 5.3 
 
Content
After an introduction to the central role of statistical methods in linguistics, the course will cover various topics:   
  • Notions of statistical modelling 
  • ANOVA I - Analysis of variance with one classification criterion: Classical model, post-hoc comparisons and Kruskal-Wallis ANOVA 
  • ANOVA II - Analysis of variance with two classification criteria 
  • ANOVA for repeated measures: Classical model and Friedman ANOVA 
  • Simple and multiple regression models and residual analysis 
  • Simple and multiple logistic regression models 
  • GLM - General linear model and mixed models 
  • Exploratory multivariate statistical analyses: Principal component analysis and factor analysis 
  • Classification methods: decision trees
Teaching methods
Lectures + readings + practical sessions
Evaluation methods
The evaluation has three components:
  • continuous assessment (exercises during practical sessions (TP) and readings) (30 %)
  • written examination (30 %)
  • personal written essay (40 %)
In September, the evaluation is adapted as follows:
  • written examination (50 %)
  • personal written essay (50 %)
Generative artificial intelligence (AI) must be used responsibly and in accordance with the practices of academic and scientific integrity. Scientific integrity requires that sources be cited, and the use of AI must always be reported. The use of artificial intelligence for tasks where it is explicitly forbidden will be considered as cheating.
Other information
Course materials (available on Moodle):
  • slides;
  • articles or book chapters;
  • additional exercises.
Bibliography
Field, A. (2013). Discovering statistics using IBM SPSS statistics. Sage.
Howell, D. (2008). Méthodes statistiques en sciences humaines. Paris: De Boeck Université.
Muller, C. (1992). Initiation aux méthodes de la statistique linguistique. Champion.
Rasinger, S. M. (2008). Quantitative Research in Linguistics. New York: Continuum International Publishing Group.
Faculty or entity


Programmes / courses offering this teaching unit

Title of the programme
Acronym
Credits
Prerequisites
Learning outcomes
Master [120] in Linguistics

Master [120] in Modern Languages and Literatures: German, Dutch and English

Master [120] in Modern Languages and Literatures: General