Statistical learning. Estimation, selection and inference

ldats2450  2025-2026  Louvain-la-Neuve

5.00 credits
30.0 h + 7.5 h
Q2
Teachers
Teaching language
English
Prerequisites
Concepts and tools equivalent to those taught in the following course units:
LSTAT2020 Basic statistical software and programming
LSTAT2120 Linear models
LSTAT2100 Generalized linear models and discrete data
Main themes
The course focuses on the high-dimensional modelling framework and on the techniques that allow parameter estimation, model selection and valid inferential procedures for high-dimensional models in statistics.
Learning outcomes

At the end of this learning unit, the student is able to:

With regard to the learning outcomes (AA) framework of the Master's programme in Statistics, General orientation, this activity contributes, as a priority, to the development and acquisition of the following learning outcomes: 1.4, 1.5, 2.4, 4.3, 6.1, 6.2
 
Content
The class is focused on the presentation of key concepts of statistical learning and high-dimensional models such as:
  • Statistical learning
  • Challenges concerning high-dimensional models and differences from low-dimensional models
  • Classical variable selection techniques for linear regression models: R², adjusted R², Mallows' Cp
  • Information criteria selection: KL divergence, AIC/TIC/BIC derivation
  • Cross-validation based selection: Leave-one-out and K-fold
  • Under- and overfitting and the bias-variance trade-off
  • Ridge shrinkage: theoretical properties, bias/variance trade-off, GCV
  • Lasso shrinkage: regularization paths, LARS, coordinate descent algorithm, prediction error bounds, degrees of freedom for lasso, support recovery, stability selection, knock-offs; inference by debiasing, post-selection inference, Bayesian inference
  • Extensions of Lasso: elastic net, group lasso, adaptive lasso, fused lasso
  • Other techniques: sparse graphical models and networks, sparse PCA, sparse discriminant analysis
Teaching methods
The class consists of lectures (30h) and exercise sessions (7.5h).
The lectures and the exercise sessions (TP) are intended to be held in person.
Teaching language: English.
Evaluation methods
June Session:
  • During the semester, the student must submit 2 compulsory assignments (short, 2-3 pages maximum per assignment), together counting for 1 point of the final grade (0.5 points per assignment). The assignments are to be completed individually or in groups of 2, and one mark is assigned per group. Assignments submitted after the deadline are not considered.
  • A project (written in French or English, between 6 and 12 pages excluding appendices, using the template on Moodle) illustrating the methods of the course, worth 5 points. The written project is submitted before the exam session and discussed with the teacher during the exam session. It is evaluated on the basis of the written report and of the answers given in an oral discussion (without slides) on the results and methodology of the report, during the exam session. The project is to be completed individually or in groups of 2, and one score is awarded per group. Projects submitted after the deadline are not considered.
  • An oral exam (~45 min), in which the teacher assesses knowledge of the material covered in class (14 points), as well as the quality of the project and the homework.
Attention: Any use of artificial intelligence software to produce any part of the text, code, figures or equations included in the final project or homework is strictly forbidden. All projects and homework will be analyzed with specialized software, and infringements of this rule can result in failing the class.
The final grade for the LSTAT2450 course in June is the sum of the points obtained for the assignments, the points obtained for the project, and the points obtained for knowledge of the material covered in class.
To validate the course, the student needs a final mark of 10 or more. 
August session:
  • A project (written in French or English, between 6 and 12 pages excluding appendices, using the template on Moodle) illustrating the methods of the course, worth 5 points. The written project is submitted before the exam session and discussed with the teacher during the exam session. It is evaluated on the basis of the written report and of the answers given in an oral discussion (without slides) on the results and methodology of the report, during the exam session. The project is to be completed individually or in groups of 2, and one score is awarded per group. Projects submitted after the deadline are not considered.
  • An oral exam (~45 min), in which the teacher assesses knowledge of the material covered in class (15 points) and the quality of the project.
Attention: Any use of artificial intelligence software to produce any part of the text, code, figures or equations included in the final project or homework is strictly forbidden. All projects and homework will be analyzed with specialized software, and infringements of this rule can result in failing the class.
The final grade for the LSTAT2450 course in August is the sum of the points obtained for the project and the points obtained for knowledge of the material covered in class. The points awarded for homework do not count for the August session, as continuous assessment is only planned for work during the semester.
To validate the course, the student needs a final mark of 10 or more. 
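The point allocations for the two sessions can be summarized as a small calculation. The sketch below is only an illustration of the arithmetic stated above: the names `JUNE_MAX`, `AUGUST_MAX`, `PASS_MARK` and `final_grade` are hypothetical, and no rounding rule is applied because none is specified here.

```python
# Maximum points per component, as stated in the evaluation rules above.
JUNE_MAX = {"assignments": 1.0, "project": 5.0, "oral": 14.0}
AUGUST_MAX = {"project": 5.0, "oral": 15.0}
PASS_MARK = 10.0  # "a final mark of 10 or more"


def final_grade(points, session):
    """Sum the points earned in the components that count for a session.

    `points` maps component names to earned points (hypothetical input format).
    Components that do not count for the session (e.g. assignments in August)
    are ignored.
    """
    if session == "june":
        maxima = JUNE_MAX
    elif session == "august":
        maxima = AUGUST_MAX
    else:
        raise ValueError(f"unknown session: {session!r}")

    total = 0.0
    for component, maximum in maxima.items():
        earned = points.get(component, 0.0)
        if not 0.0 <= earned <= maximum:
            raise ValueError(f"{component} must be between 0 and {maximum}")
        total += earned
    return total


# Example with made-up scores: the same student evaluated in each session.
scores = {"assignments": 1.0, "project": 4.0, "oral": 9.0}
june = final_grade(scores, "june")
august = final_grade(scores, "august")
print(june, june >= PASS_MARK)
print(august, august >= PASS_MARK)
```

In both sessions the component maxima add up to 20 points, so the pass mark of 10 corresponds to half of the available points.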
Other information
Software: R/Python
French-friendly class.
Online resources
Moodle website of the class: LSTAT2450 - Statistical learning. Estimation, selection and inference.
https://moodle.uclouvain.be/course/view.php?id=4214
Bibliography
  • Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
  • James, G., Witten, D., Hastie, T. and Tibshirani, R. (2014). An Introduction to Statistical Learning: With Applications in R. Springer.
  • Hastie, T., Tibshirani, R. and Wainwright, M. J. (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman and Hall/CRC.
  • Wainwright, M. J. (2019). High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge University Press.
  • Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data. Springer.
Course materials
  • Course slides available during the term
Faculty or entity in charge


Programmes / courses offering this learning unit (UE)

Programme title
Acronym
Credits
Prerequisites
Learning outcomes
Master [120] in Data Science, Statistics orientation

Master [120] in Statistics, Biostatistics orientation

Master [120] in Mathematical Sciences

Master [120] in Statistics, General orientation

Master [120]: Civil Engineer in Applied Mathematics

Master [120]: Civil Engineer in Data Science

University Certificate: Statistics and Data Science (15/30 credits)

Master [120] in Data Science, Information Technology orientation