Although we do not yet know how long the social distancing measures related to the Covid-19 pandemic will last, and regardless of the changes that had to be made to the evaluation for the June 2020 session relative to what this learning unit description provides, the teachers may still adopt new evaluation methods for this learning unit. Details of these methods have been, or will be, communicated to the students by the teachers as soon as possible.
At the end of this learning unit, the student is able to:
General objectives.
- Presentation of the modern techniques for the analysis of huge multivariate data sets.
- Developing the basic tools for "data mining".
Specific objectives. At the end of this course, the students should be able to:
- Manipulate and describe the information contained in huge data sets;
- Understand why a given method is appropriate;
- Give a correct interpretation of the resulting graphs and of the software output;
- Solve problems with real data sets.
The contribution of this Teaching Unit to the development and command of the skills and learning outcomes of the programme(s) can be accessed at the end of this sheet, in the section entitled “Programmes/courses offering this Teaching Unit”.
- Data matrices
- Principal component analysis
- Clustering: k-means clustering and hierarchical clustering
- Linear discriminant analysis
- Simple and multiple correspondence analysis
- Principal component regression
- Partial least squares regression
The tutorials take place in computer rooms; their primary objective is to let the students practise applying the methods to real data sets in R.
- Test 1: Data matrices and principal component analysis
- Test 2: Clustering and linear discriminant analysis
- written, closed book, with the help of a formula list and a pocket calculator
- exercises and questions involving (small) calculations, interpretation of computer output, and understanding of the main results and formulas
- individually or in pairs
- data application, on a data set found by the students themselves
- written report in R Markdown, to be submitted before the exam session
- detailed instructions will be provided in the exercise sessions and on the MoodleUCL course page
- vector and matrix calculus
- Euclidean geometry: points, spaces, orthogonality, distances, angles
- basic notions in statistics: sample mean, (co)variance, correlation, covariance matrix, conditional probabilities, normal distribution, chi-square distribution
- Escofier, B. et Pagès, J. (2016): Analyses factorielles simples et multiples, 5e édition, Dunod, Paris.
- Lebart, L., Piron, M. et Morineau, A. (2006): Statistique exploratoire multidimensionnelle, 4e édition, Dunod, Paris.
- Saporta, G. (2011): Probabilités, analyse des données et statistique, 3e édition révisée, Editions TECHNIP, Paris.
- course material on Moodle