INGI Seminar

August 23, 2019

12:50-13:50

Louvain-la-Neuve

Nyquist a.164 - Maxwell Building

Enumerative algorithms for biclustering: expanding and exploring their potential in bioinformatics and neuroscience

By Rosana Veroneze, postdoc at UNICAMP, São Paulo, Brazil

Biclustering is a data analysis technique successfully applied in various domains. Biclustering, Frequent Pattern Mining (FPM), Formal Concept Analysis and Graph Theory (specifically, the problem of finding bicliques in a bipartite graph) are closely related.

In these areas, there are many algorithms for enumerating all maximal biclusters of ones in a binary dataset, and these algorithms have four interesting properties:

  • efficiency (time complexity linear in the number of biclusters and polynomial in the input size),
  • completeness (find all maximal biclusters),
  • correctness (all biclusters obey the user-defined measure of internal consistency), and
  • non-redundancy (all the obtained biclusters are maximal and the same bicluster is not enumerated more than once).
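
As a rough illustration of what "enumerating all maximal biclusters of ones" means, the following minimal Python sketch lists them for a tiny binary matrix by brute force. It is not one of the algorithms discussed in the talk: the function name maximal_biclusters_of_ones and the example matrix are hypothetical, and the search visits every subset of rows, so it is exponential and deliberately does not have the efficiency property. It only illustrates completeness, correctness and non-redundancy.

```python
from itertools import combinations


def maximal_biclusters_of_ones(matrix):
    """Brute-force enumeration of maximal all-ones biclusters.

    Hypothetical helper for illustration only: it tries every non-empty
    subset of rows, so its cost grows exponentially with the number of rows.
    Returns a set of (rows, columns) pairs of frozensets.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    found = set()  # a set, so the same bicluster is never stored twice
    for size in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), size):
            # Columns that contain a one in every selected row.
            cols = frozenset(
                j for j in range(n_cols)
                if all(matrix[i][j] == 1 for i in rows)
            )
            if not cols:
                continue
            # Extend the row set to every row that has ones in all those
            # columns; the resulting pair cannot be enlarged any further.
            closed_rows = frozenset(
                i for i in range(n_rows)
                if all(matrix[i][j] == 1 for j in cols)
            )
            found.add((closed_rows, cols))
    return found


# Hypothetical 3x3 example matrix.
MATRIX = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]

biclusters = maximal_biclusters_of_ones(MATRIX)
for rows, cols in sorted(biclusters, key=lambda b: (sorted(b[0]), sorted(b[1]))):
    print(sorted(rows), sorted(cols))
```

Each printed pair is a set of rows and a set of columns whose intersection contains only ones and that cannot be extended by any further row or column.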

Since her PhD, Rosana Veroneze has been pursuing these four properties in the development of enumerative biclustering algorithms for numerical (not only binary, but also integer or real-valued) data matrices. One of the major goals of her postdoctoral project is to address challenges that are intrinsic to enumerative algorithms.

Among these challenges, we can highlight: the constant pursuit of better algorithms in terms of runtime and memory usage; the selection (or ranking) of relevant biclusters, since enumerative algorithms can return a huge number of patterns, many of them irrelevant; the provision to the expert (e.g. a physician) of a compact set (or list) of biclusters that are both highly relevant and minimally redundant; and data preprocessing and parameter setting.

This project also intends to explore the interplay between biclustering and association rules in the context of supervised descriptive pattern mining and associative classification. Lastly, it aims to explore applications of enumerative biclustering algorithms to real-world problems, especially the analysis of brain activity data and gene expression data, since the project is linked to the Brazilian Institute of Neuroscience and Neurotechnology (BRAINN).

Rosana Veroneze holds a bachelor's degree in Systems Analysis from the Pontifical Catholic University of Campinas (2004), a master's degree in Electrical Engineering from the State University of Campinas (2011) and a doctorate in Electrical Engineering from the State University of Campinas (2016). Part of her doctoral research was carried out while visiting the University of Minnesota, in cooperation with Prof. Arindam Banerjee. She is currently a postdoctoral researcher at the University of Campinas. Her research interests include artificial intelligence, data mining and machine learning. She mainly works on the following topics: biclustering, enumeration of biclusters and association rule mining.