Data exploration and introduction to statistical inference

lmafy1101  2020-2021  Louvain-la-Neuve

Due to the COVID-19 crisis, the information below is subject to change, in particular the teaching mode (in person, distance, or a co-modal or hybrid format).
5 credits
30.0 h + 30.0 h
Command of French and of mathematics at the level of the final year of secondary school (6 h of mathematics per week).
Passive knowledge of English.
Main themes
This teaching unit provides an active introduction to the exploratory methods and fundamental principles of probabilistic and statistical modeling that are essential to the analysis of observational and experimental data. Real data will be used to present numerical summaries and graphical descriptive tools, and to let students discover and develop their practical data-processing skills using specialized software (for example, the free software R). Simulating data in software according to basic probability distributions will develop an intuitive understanding of the role of chance in modeling. Once the notions of probability and random variable have been assimilated, fundamental notions of statistical inference such as sampling and point and interval estimation will be presented empirically via simulations and applied to various real contexts. The emphasis is deliberately placed not on proving mathematical results but on the meaning and interpretation of the concepts. The teaching unit also includes a module on the analysis of uncertainties in experimental measurements: their sources, propagation and modeling. These aspects are illustrated with concrete cases from fields such as physics and biology.
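
As an illustration of presenting interval estimation empirically via simulations, the following minimal sketch repeatedly draws samples from a normal distribution and counts how often a 95% confidence interval for the mean contains the true mean. It is written in Python rather than R purely for illustration; the function name, the parameter values and the assumption of a known standard deviation are all hypothetical choices, not taken from the course material.

```python
import random
import statistics


def coverage_of_z_interval(mu=10.0, sigma=2.0, n=25, n_repeats=2000,
                           level=0.95, seed=1):
    """Estimate by simulation how often a z-interval for the mean covers mu.

    Assumes normally distributed data with a KNOWN standard deviation sigma
    (an illustrative simplification).
    """
    rng = random.Random(seed)
    # Standard normal quantile, about 1.96 for a 95% level.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(n_repeats):
        # Draw one sample of size n and compute its mean.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        # Check whether the interval around the sample mean contains mu.
        if mean - half_width <= mu <= mean + half_width:
            hits += 1
    return hits / n_repeats


coverage = coverage_of_z_interval()
print(f"Empirical coverage: {coverage:.3f}")
```

The printed proportion should be close to the nominal level of 0.95, which is the intuitive meaning of a confidence level: the procedure, not any single interval, succeeds in about 95% of repeated samples.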

At the end of this learning unit, the student is able to:

a.     Contribution of the teaching unit to the learning outcomes of the programme
With regard to the learning outcomes (AA) reference framework of the Bachelor programme in mathematics, this teaching unit helps students master:
·      as a priority, the following LOs: x.x, ....;
·      secondarily, the following LOs: x.x, .....
With regard to the learning outcomes (AA) reference framework of the Bachelor programme in physics, this teaching unit helps students master:
·      as a priority, the following LOs: A1.2, A1.5, A2.3, A3.4;
·      secondarily, the following LOs: A5.3, A6.5, A4.4.
b.     Specific learning outcomes of the teaching unit

At the end of this teaching unit, the student will be able to:

  • select descriptive statistics tools (numerical summaries and graphs) suited to summarizing a dataset effectively and answering questions about it;
  • analyze a potentially large dataset with descriptive tools using specialized software;
  • read and interpret the results of a descriptive data analysis by expressing them in the context of the study;
  • explain and define the basic concepts of probability for events and for univariate random variables;
  • perform basic probability calculations in a variety of situations;
  • use software simulations to illustrate the behaviour of random variables and the notions of sampling;
  • explain the notions of sampling and the fundamental concepts of statistical inference, and estimate the basic parameters of a random variable while quantifying uncertainty via confidence intervals;
  • identify and quantify experimental uncertainty, and present the results of an analysis or a set of measurements while indicating the degree of uncertainty;
  • explain the general objectives and concepts of linear and nonlinear modeling, and fit a linear model to a variable by the least squares method.
  • Presentation of datasets from different fields of application in science and technology, used to illustrate the course (results of laboratory experiments, weather data, survey results, clinical trials, stock market data, social and other network data, etc.).
  • Structure of statistical data and nature of the variables (discrete / continuous quantitative, nominal / ordinal qualitative).
  • Numerical tools to summarize data according to their nature: frequency tables, measures of location (mode, median, mean), measures of dispersion (range, interquartile range, standard deviation, variance, coefficient of variation), percentiles, correlation coefficient.
  • Graphical tools to summarize data: histogram, empirical distribution function, bar chart, box plot, time series plot, scatter plot (simple or matrix), Q-Q plot.
  • Random experiments and basic notions of probability theory: the definition of a probability, its elementary properties, and the calculation of probabilities of events.
  • Random variables (univariate) and their properties (moments, distribution ...). Introduction to the distributions most frequently used in analyzing the data encountered in the applications seen in the course: uniform, binomial, Poisson and normal.
  • Simple algorithms for generating random numbers from these probability distributions.
  • Principle of random sampling and estimation.
  • Intuitive concept of sampling distribution and confidence interval.
  • Application to the mean (and variance) of a normal distribution.
  • Notions of error and measurement uncertainty.
  • Quantification and expression of the uncertainty of a single or repeated measurement (including the expanded uncertainty obtained from a confidence interval).
  • Calculation of composite uncertainties in the case of sums, products and nonlinear transformations of independent or correlated measurements. Applications to laboratory measurements in various fields (physics, biology, ...).
  • Fitting of a linear or polynomial model (simple linear regression) by the least squares method.
  • Introduction to a statistical programming language (for example, R), through projects involving data management, data analysis and simulation.
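
For the item above on composite uncertainties of sums, products and nonlinear transformations, the standard first-order (linearized) propagation formula can be sketched as follows. The notation is an assumption made here for illustration: $f(x, y)$ is the derived quantity, $u_x$ and $u_y$ are the standard uncertainties of the measurements, and $\operatorname{cov}(x, y)$ is their covariance. The course may use different symbols.

```latex
\begin{align*}
  % General case: correlated measurements, first-order approximation
  u_f^2 &\approx \left(\frac{\partial f}{\partial x}\right)^2 u_x^2
         + \left(\frac{\partial f}{\partial y}\right)^2 u_y^2
         + 2\,\frac{\partial f}{\partial x}\,\frac{\partial f}{\partial y}\,
           \operatorname{cov}(x, y) \\[4pt]
  % Independent measurements, sum f = x + y
  u_{x+y}^2 &= u_x^2 + u_y^2 \\[4pt]
  % Independent measurements, product f = x y (relative uncertainties)
  \left(\frac{u_{xy}}{xy}\right)^2 &\approx
      \left(\frac{u_x}{x}\right)^2 + \left(\frac{u_y}{y}\right)^2
\end{align*}
```

The last two lines are special cases of the first with $\operatorname{cov}(x, y) = 0$: absolute uncertainties add in quadrature for a sum, and relative uncertainties add in quadrature for a product.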
Teaching methods

Due to the COVID-19 crisis, the information in this section is particularly likely to change.

The teaching unit will consist of
  • lectures that will present the subject on the basis of examples,
  • exercise sessions to systematically put into practice the notions seen in the course, on well-targeted cases and using specialized software,
  • projects that will give the student the opportunity to integrate the different tools in fields of application of mathematics and physics.
The pedagogical approach favors active learning and aims to follow the pedagogical guidelines proposed by the Faculty of Sciences.
Evaluation methods

Due to the COVID-19 crisis, the information in this section is particularly likely to change.

  • During the practicals: assignments in the form of mini-projects (case studies) to be solved in groups of two students using specialized software.
  • During the exam session: computer-based written exam.
Faculty or entity

Programmes / courses offering this teaching unit

Title of the programme
Master [120] in Data Science : Statistic

Minor in Statistics and Data Science

University Certificate: Statistics and Data Science (15/30 credits)

Minor in Statistics, Actuarial Sciences and Data Sciences

Bachelor in Physics

Bachelor in Mathematics

Additional module (Approfondissement) in Statistics and Data Science