Semiring Rank Matrix Factorisation by Thanh Le Van, KU Leuven

22 February 2017

11:50 am

Louvain-la-Neuve

Paul Otlet room - Réaumur Building, a.327

Abstract. Rank data, in which each row is a complete or partial ranking of the available items (columns), is ubiquitous. It can represent, for instance, user preferences, gene expression levels, and the outcomes of sports events. While rank data has been analysed in the data mining literature, mining patterns in such data has so far received little attention.

In this talk, I will discuss matrix factorisation based methods for pattern set mining in rank data.

First, I will discuss a general framework called Semiring Rank Matrix Factorisation. The framework employs semiring theory rather than traditional linear algebra for matrix factorisation, which results in a more elegant way of aggregating rankings. Subsequently, I will introduce two instantiations of the framework: Sparse RMF and ranked tiling. Sparse RMF mines a set of sparse rank vectors that summarise given rank matrices succinctly and show the main categories of rankings. Ranked tiling discovers a set of data regions in a rank matrix that have high ranks. Such regions are interesting because they can reveal local associations between subsets of the rows and subsets of the columns of the given matrices.
Finally, I will discuss how to use ranked tiling to formally define the concept of driver pathways, from which we can find cancer subtypes, i.e., groups of tumour samples having the same molecular mechanism driving tumorigenesis.
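
As a rough illustration of what replacing linear algebra with a semiring can mean, the sketch below computes a matrix product in which the usual sum is replaced by max. The choice of the (max, ×) semiring, the function names and the toy matrices are assumptions made for this example and are not taken from the talk.

```python
def max_times_product(C, F):
    """Matrix product over the (max, *) semiring:
    out[i][j] = max over p of C[i][p] * F[p][j].
    Assumes non-negative entries and non-empty, rectangular inputs."""
    k = len(F)
    n_cols = len(F[0])
    return [
        [max(row[p] * F[p][j] for p in range(k)) for j in range(n_cols)]
        for row in C
    ]


def linear_product(C, F):
    """Ordinary matrix product, shown only for comparison."""
    k = len(F)
    n_cols = len(F[0])
    return [
        [sum(row[p] * F[p][j] for p in range(k)) for j in range(n_cols)]
        for row in C
    ]


# Hypothetical toy data: two rank vectors over four items
# (ranks 1..4, with 0 meaning "item not covered by this vector").
F = [
    [4, 3, 0, 1],
    [0, 2, 3, 4],
]

# Hypothetical Boolean indicator matrix: which rows use which rank vector.
C = [
    [1, 0],  # row 0 uses only the first vector
    [0, 1],  # row 1 uses only the second vector
    [1, 1],  # row 2 uses both vectors
]

semiring_result = max_times_product(C, F)
linear_result = linear_product(C, F)

# Row 2 combines both vectors: under max the entries stay within 0..4,
# whereas the ordinary sum can exceed the largest possible rank.
print(semiring_result[2])
print(linear_result[2])
```

In this toy setting, aggregating with max keeps every combined entry a valid rank, while the linear product adds overlapping ranks together and can leave the rank scale.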

Thanh obtained his master's degree at the Asian Institute of Technology (AIT) in 2007 and his PhD at KU Leuven in December 2016, under the supervision of Luc De Raedt (KU Leuven), Kathleen Marchal (Universiteit Gent) and Siegfried Nijssen (currently UC Louvain). He is interested in declarative methods for data mining using Constraint Programming and Integer Programming, and in matrix factorisation for pattern set mining in rank data and its applications in bioinformatics.
