ICTEAM - Public Thesis defense - John Aoga

July 09, 2019

14:00-17:00

Louvain-la-Neuve

Place du Levant, 3

Global Constraints for Mining Sets and Sequences

pour l’obtention du grade de Docteur en sciences de l’ingénieur et technologie

The purpose of Pattern Mining (PM) is the discovery of patterns, knowledge, and information in data (textual or structured). Major concerns in the design of PM algorithms include the efficiency (in time and memory consumption) and the flexibility of the mining algorithms. The flexibility means that one can add preferences/constraints to guide the mining process in finding relevant patterns.

In recent years, the paradigm of Constraint Programming (CP) has been introduced in PM. CP is a way of solving combinatorial problem by separating its resolution into two steps: the modeling of the problem and the search for solutions. This way, CP is sufficiently flexible in the addition of new constraints.

However, existing approaches towards standard and CP-based PM represent different trade-offs between these concerns (flexibility and efficiency): standard PM methods are highly efficient but lack in terms of flexibility. On the other hand, CP-based PM methods are very flexible but less efficient.

Furthermore, standard PM typically relies on effective data structures and algorithmic improvements, where CP-based PM relies on the power of declarative tools. In this work we aim to reconnect the two approaches to maximize both flexibility and efficiency:

  • we propose new algorithms for PM using CP, in which PM problems are modeled as global constraints;
  • we show that through these global constraints one can combine ingredients from both PM and CP to build effective data structures and efficient filtering algorithms.

We experimentally demonstrate that our approaches perform well in terms of efficiency on a variety of benchmarks despite its flexibility. We show that these global constraints can be (i) combined with existing constraints in CP such as regular expression and grammar constraints, (ii) used as a building block in learning tasks such as discriminative patterns and rule-based models.