By Victor Hamer and Pierre Dupont.
Feature selection is a key step when dealing with large amounts of omics data, as it makes the predictive model interpretable to the domain expert. Such interpretability is, however, strongly undermined by the typical instability of current feature selection methods. Instability here refers to the fact that the selected features may differ drastically even after marginal modifications of the data.
In this paper, we investigate the possibility of a trade-off between the classification performance and the selection stability of a standard feature selection method: the Recursive Feature Elimination (RFE) algorithm.
The compromise is achieved by explicitly favoring the selection of some features through differential shrinkage. This approach lets the domain expert explicitly control the trade-off between selection stability and predictive accuracy, and thus select a particular Pareto-optimal compromise based on personal preferences.
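To illustrate the idea of differential shrinkage within RFE, the sketch below applies a per-feature ridge penalty during each elimination round: features believed relevant a priori receive a smaller penalty, are shrunk less, and therefore tend to survive elimination. This is a minimal illustration under simplifying assumptions (plain ridge regression, elimination of one feature at a time), not the paper's exact formulation; the function `rfe_with_priors` and its parameters are hypothetical names chosen for the example.

```python
import numpy as np

def rfe_with_priors(X, y, penalties, n_keep):
    """RFE with differential shrinkage (illustrative sketch).

    A smaller entry in `penalties` means the corresponding feature is
    shrunk less by the ridge penalty, so it is favored during the
    recursive elimination. This is NOT the paper's exact method, only
    a simple stand-in to show the mechanism.
    """
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        Xa = X[:, active]
        # Diagonal penalty matrix: one shrinkage weight per active feature
        P = np.diag([penalties[j] for j in active])
        # Weighted ridge solution: w = (Xa^T Xa + P)^{-1} Xa^T y
        w = np.linalg.solve(Xa.T @ Xa + P, Xa.T @ y)
        # Eliminate the feature with the smallest absolute weight
        del active[int(np.argmin(np.abs(w)))]
    return active
```

Lowering the penalty of a prior-knowledge feature makes its weight larger in each round, which stabilizes its selection across resamplings of the data.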
As a secondary contribution, we propose the use of the hypervolume metric to assess the performance of methods realizing such a compromise, and we define a corresponding confidence interval. Our approach is evaluated on prostate cancer diagnosis from microarray data and on handwritten digit recognition tasks.
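For intuition on the hypervolume metric: with two objectives to maximize (say, predictive accuracy and selection stability), the hypervolume is the area dominated by the Pareto front relative to a reference point, so a larger value means better compromises overall. The following is a minimal sketch of the standard two-dimensional computation; the function name and the choice of reference point are illustrative assumptions, not taken from the paper.

```python
def hypervolume_2d(points, ref):
    """Area dominated by a 2-D Pareto front (both objectives maximized).

    points: iterable of (accuracy, stability) pairs
    ref: reference point dominated by every Pareto-optimal point,
         e.g. (0.0, 0.0) for objectives in [0, 1] (an assumption here).
    """
    # Sweep points by decreasing first objective; dominated points
    # (those not improving the second objective) contribute nothing.
    area = 0.0
    prev_y = ref[1]
    for x, y in sorted(points, key=lambda p: p[0], reverse=True):
        if y > prev_y:
            area += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return area
```

For instance, the front {(1.0, 0.2), (0.6, 0.8)} with reference (0, 0) dominates an area of 0.56, the union of the two rectangles spanned by the points.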
Results show that the aforementioned trade-off is indeed achievable and that prior knowledge is an effective way of stabilizing the selection.
About the authors:
Victor Hamer is a PhD student in the INGI department at UCLouvain, Belgium. His current work focuses on realizing a trade-off between the classical predictive performance of feature selection methods and their selection stability.
Pierre Dupont is a Professor at the Louvain School of Engineering, UCLouvain, Belgium, and co-founder of the UCL Machine Learning Group.