March 24, 2017

14:30 - 15:30


ISBA C115 (Seminar Room Bernoulli)

Variable selection for high-dimensional data: a Focused Information Criterion.


The last few decades have seen a large increase in datasets characterized by a number of covariates $p$ exceeding the sample size $n$, requiring the development of new estimation and inference tools. In particular, $\ell_1$-penalized procedures such as the Lasso have become extremely popular by performing estimation and variable selection simultaneously. High-dimensional data require an asymptotic framework in which $p$ is allowed to grow with $n$. A sparsity condition such as $s_0 = o(\sqrt{n/\log p})$, where $s_0$ denotes the number of relevant covariates, is usually needed to obtain reliable results, meaning that only a small number of covariates are relevant.

In this presentation, we consider a variable selection procedure initially developed in the classical low-dimensional setting, namely the Focused Information Criterion (FIC), and show how it can be extended to high-dimensional data. The FIC departs from common tools such as the AIC, the BIC and the Lasso by performing focus-driven variable selection: it selects the model that best estimates a particular quantity of interest (the focus) in terms of mean squared error (MSE). Consequently, different models can be selected for different quantities of interest, and the FIC can provide estimators with smaller MSE. An example of such a quantity of interest is the prediction for a particular new observation of the covariates. In this high-dimensional framework we distinguish two cases: (i) the considered submodel is low-dimensional and (ii) the considered submodel is high-dimensional. In the former case, we obtain an alternative low-dimensional FIC formula that can be applied directly. In the latter case, we use a desparsified estimator that allows us to derive the MSE of the focus estimator. We illustrate the performance of the high-dimensional FIC with a numerical study and a real dataset example.
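To make the focus-driven idea concrete, here is a minimal sketch (not the speaker's implementation) of FIC-style selection in a simple low-dimensional setting: for each candidate submodel we estimate the MSE of a focus parameter $\mu = x_0^\top \beta$ (the prediction at a given covariate value $x_0$) as estimated squared bias plus variance, and select the submodel minimizing this score. The data-generating setup, the plug-in bias estimate against the full-model fit, and all variable names are illustrative assumptions.

```python
# Illustrative sketch of focus-driven (FIC-style) model selection.
# Assumptions: OLS submodels, full-model fit used as an (approximately
# unbiased) reference for the plug-in bias estimate.
from itertools import combinations

import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 4
X = rng.normal(size=(n, p))
beta = np.array([2.0, 1.0, 0.0, 0.0])   # only the first two covariates matter
y = X @ beta + rng.normal(size=n)
x0 = np.array([1.0, -1.0, 0.5, 0.5])    # focus: prediction at this new point

# Full-model OLS fit, used as the reference for the bias estimate
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2 = np.sum((y - X @ beta_full) ** 2) / (n - p)
mu_full = x0 @ beta_full

def fic_score(subset):
    """Estimated MSE of the focus under the submodel: bias^2 + variance."""
    Xs = X[:, subset]
    beta_s, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    mu_s = x0[subset] @ beta_s
    # variance of the submodel focus estimator x0_s' beta_s
    var = sigma2 * x0[subset] @ np.linalg.inv(Xs.T @ Xs) @ x0[subset]
    bias2 = (mu_s - mu_full) ** 2        # crude plug-in bias estimate
    return bias2 + var

candidates = [list(c) for r in range(1, p + 1)
              for c in combinations(range(p), r)]
best = min(candidates, key=fic_score)
print("selected covariates:", best)
```

Because the score depends on $x_0$, a different focus (a different new observation) can select a different submodel, which is exactly what distinguishes the FIC from global criteria such as the AIC or BIC.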
