Course content: Predictive modeling for linguists with R
The bootcamp is a hands-on introduction to statistical methods for both graduate students and seasoned researchers. It is loosely based on the third edition (2021) of Gries's textbook Statistics for Linguistics with R (see prerequisites). The course is intended for linguists who already have a basic knowledge of statistics and some experience with R, and who wish to improve their proficiency in the statistical modeling of linguistic data. Using the open-source software environment and programming language R, we will deal with:
- fundamental aspects of fixed-effects regression modeling for both numeric and binary response variables, including data exploration and preparation for modeling, model formulation and selection, and the numerical and visual interpretation and evaluation of models;
- more advanced aspects of fixed-effects regression modeling, such as contrasts for ordinal predictors, orthogonal contrasts, curvature of numeric predictors, and possibly general linear hypothesis tests;
- the theoretical foundations of mixed-effects regression modeling;
- applications of mixed-effects modeling for both numeric and binary response variables;
- tree-based methods and random forests: 'fitting' them, interpreting them with importance scores and partial dependence scores, and detecting (not just capturing) interactions.
Typical schedule
| Weekday | Schedule |
|---|---|
| Monday | 9.00-9.30 Welcome; 2.00-5.00 Class |
| Tuesday | 9.00-12.15 Class; 1.45-5.00 Class |
| Wednesday | 9.00-12.15 Class; 1.45-5.00 Class |
| Thursday | 9.00-12.15 Class; 1.45-5.00 Class |
| Friday | 9.00-12.15 Class; 1.45-5.00 Class |
Class sessions of more than two hours include a 15-minute break.