Programme

Course content: Predictive modeling for linguists with R

The bootcamp is a hands-on introduction to statistical methods for both graduate students and seasoned researchers. It is loosely based on the third edition (2021) of Gries’s textbook Statistics for linguistics with R (see prerequisites). The course is intended for linguists who already have a basic knowledge of statistics and some experience using R, and who wish to improve their proficiency in the statistical modeling of linguistic data. Using the open-source software and programming language R, we will deal with:

fundamental aspects of fixed-effects regression modeling for both numeric and binary response variables, including data exploration and preparation for modeling, model formulation and selection, and numerical and visual interpretation and evaluation of models;
more advanced aspects of fixed-effects regression modeling, such as contrasts for ordinal predictors, orthogonal contrasts, curvature of numeric predictors, and possibly general linear hypothesis tests;
the theoretical foundations of mixed-effects regression modeling;
applications of mixed-effects modeling for both numeric and binary response variables;
tree-based methods and random forests: 'fitting' them, interpreting them with importance scores and partial dependence scores, and detecting (not just capturing) interactions.

Typical schedule

Weekday	Morning	Afternoon	Evening
Monday	9.00-9.30 Welcome; 9.30-12.30 Class	2.00-5.00 Class	7.00 Welcome dinner
Tuesday	9.00-12.15 Class	1.45-5.00 Class
Wednesday	9.00-12.15 Class	1.45-5.00 Class
Thursday	9.00-12.15 Class	1.45-5.00 Class
Friday	9.00-12.15 Class	1.45-5.00 Class

Class sessions of more than two hours include a 15-minute break.