November 21, 2018
12:50 - 13:50
Louvain-la-Neuve
Paul Otlet Room - Reaumur building a.327
Finding Probabilistic Rule Lists using the Minimum Description Length Principle
by John Aoga, Ph.D. student at UCLouvain
An important task in data mining is that of rule discovery in supervised data. Well-known examples include rule-based classification and subgroup discovery. Motivated by the need to succinctly describe an entire labeled dataset, rather than accurately classify the label, we propose an MDL-based supervised rule discovery task. The task concerns the discovery of a small rule list where each rule captures the probability of the Boolean target attribute being true.
Our approach is built on a novel combination of two main building blocks: (i) the use of the Minimum Description Length (MDL) principle to characterize good-and-small sets of probabilistic rules, and (ii) the use of branch-and-bound with a best-first search strategy to find better-than-greedy and optimal solutions for the proposed task.
We experimentally show the effectiveness of our approach by comparing it with other supervised rule learning algorithms on real-life datasets.
Biography: John is a 4th-year Ph.D. student at UCLouvain. His main research topics are the hybridization of CP/MIP and data mining/machine learning techniques, especially the design of new approaches that are more flexible and interpretable.