Statistical learning. Estimation, selection and inference

LSTAT2450  2020-2021  Louvain-la-Neuve

Due to the COVID-19 crisis, the information below is subject to change, in particular regarding the teaching mode (in-person, distance, or a comodal or hybrid format).
5 credits
30.0 h + 7.5 h
LSTAT2011 Éléments de mathématiques pour la statistique
LSTAT2013 Concepts de base en statistique inférentielle
LSTAT2120 Linear models
LSTAT2020 Logiciels et programmation statistique de base
Main themes
The course focuses on high-dimensional settings and on techniques that allow for parameter estimation, model selection and valid inferential procedures for high-dimensional models in statistics.

At the end of this learning unit, the student is able to:

1 With regard to the learning outcomes (AA) reference framework of the Master's programme in Statistics, general orientation, this activity contributes primarily to the development and acquisition of the following AAs: 1.4, 1.5, 2.4, 4.3, 6.1, 6.2
The class is focused on the presentation of key concepts of statistical learning and high-dimensional models such as:
  • Statistical learning
  • Challenges concerning high-dimensional models and differences from low-dimensional models
  • Classical variable selection techniques for linear regression models: R², adjusted R², Mallows' Cp
  • Information criteria selection: KL divergence, AIC/TIC/BIC derivation
  • Cross-validation based selection: Leave-one-out and K-fold
  • Under- and overfitting or the bias-variance trade-off
  • Ridge shrinkage: theoretical properties, bias/variance trade-off, GCV
  • Lasso shrinkage: regularization paths, LARS, coordinate descent algorithm, prediction error bounds, degrees of freedom for lasso, support recovery, stability selection, knock-offs; inference by debiasing, post-selection inference, Bayesian inference
  • Extensions of Lasso: elastic net, group lasso, adaptive lasso, fused lasso
  • Other techniques: sparse graphical models, sparse PCA, sparse discriminant analysis
Teaching methods

Due to the COVID-19 crisis, the information in this section is particularly likely to change.

The class consists of lectures (30h) and exercise sessions (7.5h).
Teaching language: English.
Evaluation methods

Due to the COVID-19 crisis, the information in this section is particularly likely to change.

An oral examination, in which the instructors evaluate:
  • knowledge of the concepts seen in class throughout the semester (50% of the points);
  • the quality of a data analysis/simulation project that illustrates the statistical learning methods in a concrete case (50% of the points). The project is written in French or English, between 5 and 8 pages excluding annexes, using the template on Moodle. This written project is handed in before the exam session and discussed with the instructors during the exam session. The evaluation of the project is based on the written manuscript and on the answers to questions in an oral discussion about the results and the methodology used for the report.
Failing either of the two parts results in automatic failure of the course!
To be allowed to take the examination, the student has to submit 3 compulsory homeworks (short, 1-2 pages maximum per homework). The homeworks are not graded, as they are not part of the evaluation.
Submitting fewer than 3 homeworks results in failure of the course!
Online resources
Moodle website of the class: LSTAT2450 - Statistical learning. Estimation, selection and inference.
  • Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
  • James, G., Witten, D., Hastie, T. and Tibshirani, R. (2014). An Introduction to Statistical Learning: With Applications in R. Springer.
  • Hastie, T., Tibshirani, R. and Wainwright, M. J. (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman and Hall/CRC.
  • Wainwright, M. J. (2019). High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge University Press.
  • Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data. Springer.
Teaching materials
  • Course slides available on Moodle.
Faculty or entity

Programmes / courses offering this Teaching Unit

Title of the programme
Master [120] in Data Science : Statistic

University certificate: Statistics and data science (15/30 credits)

Master [120] in Statistic: General