Data Analysis

lstat2110  2020-2021  Louvain-la-Neuve

Due to the COVID-19 crisis, the information below is subject to change, in particular the teaching mode (in-person, distance, co-modal or hybrid).
5 credits
30.0 h + 7.5 h
Q1
Teacher(s)
Language
French
Main themes
Contents:
  • Reminders of algebra and geometry useful for multivariate data analysis
  • Basic principles of factorial methods
  • Principal component analysis (PCA)
  • Canonical correlation
  • Factorial discriminant analysis (FDA)
  • Factorial correspondence analysis (FCA), simple and multiple
  • Cluster analysis
  • Data analysis in practice
Aims

At the end of this learning unit, the student is able to:

General objectives: present modern techniques for the analysis of large multivariate data sets and develop the basic tools for "data mining".

Specific objectives: at the end of this course, students should be able to:
  • manipulate and describe the information contained in large data sets;
  • understand why a given method is appropriate;
  • correctly interpret the resulting graphs and the software output;
  • solve problems with real data sets.

The contribution of this Teaching Unit to the development and command of the skills and learning outcomes of the programme(s) can be accessed at the end of this sheet, in the section entitled “Programmes/courses offering this Teaching Unit”.
Content
  • Data matrices
  • Principal component analysis
  • Classification: k-means clustering and hierarchical clustering
  • Linear discriminant analysis
  • Simple and multiple correspondence analysis
  • Principal component regression
  • Partial least squares regression
Implementation of the methods is done in the R language using the RStudio integrated development environment, and the R Markdown framework is used to combine text, mathematical formulas, R code and R output (tables, graphs).
Teaching methods

Due to the COVID-19 crisis, the information in this section is particularly likely to change.

During the lectures, the teacher presents the various statistical methods, covering the questions and data sets to which they apply, the underlying mathematical theory, and how to program them in R. Homework assignments are given, and their solutions are also discussed during the lectures.
The tutorials take place in computer rooms; their primary objective is to let students practise applying the methods to real data sets in R.
Evaluation methods

Due to the COVID-19 crisis, the information in this section is particularly likely to change.

Exam (12/20):
  • written, closed book, with the help of a formula list and a pocket calculator
  • exercises and questions involving (small) calculations, interpretation of computer output, and understanding of the main results and formulas
Tests during the lectures:
  • Test 1: Data matrices and principal component analysis
  • Test 2: Clustering and linear discriminant analysis
Participation is optional. At the discretion of the student, each test can replace the part of the exam on the same topic.
Project (8/20):
  • individually or in pairs
  • application to data, which the students find themselves
  • written report, to be submitted on one or more dates specified during the semester
  • detailed instructions will be provided in the exercise sessions and on the MoodleUCL course page
Submitting a project is required in order to take the exam and obtain an exam result. When registering for the exam a second time, a new project may be submitted.
Other information
Prerequisites:
  • vector and matrix calculus
  • Euclidean geometry: points, spaces, orthogonality, distances, angles
  • basic notions in statistics: sample mean, (co)variance, correlation, covariance matrix, conditional probabilities, normal distribution, chi-square distribution
Online resources
All teaching material is made available through the MoodleUCL course page: slides, exercises, software scripts. Links to relevant external material are also provided: online courses, videos, software documentation.
Bibliography
  • Escofier, B. and Pagès, J. (2016): Analyses factorielles simples et multiples, 5th edition, Dunod, Paris.
  • Lebart, L., Piron, M. and Morineau, A. (2006): Statistique exploratoire multidimensionnelle, 4th edition, Dunod, Paris.
  • Saporta, G. (2011): Probabilités, analyse des données et statistique, 3rd revised edition, Editions TECHNIP, Paris.
Teaching materials
  • material on Moodle
Faculty or entity


Programmes/courses offering this Teaching Unit

Title of the programme
  • Master [120] in Statistics: General
  • Additional module in Statistics and Data Sciences
  • Minor in Statistics, Actuarial Sciences and Data Sciences
  • Master [120] in Biomedical Engineering
  • Master [120] in Economics: General
  • Master [120] in Mathematical Engineering
  • Master [120] in Data Science: Statistics
  • Master [120] in Statistics: Biostatistics
  • University certificate: Statistics and Data Sciences (15/30 credits)
  • Master [120] in Mathematics