
SEMINAR by Olga Klopp

ISBA
Louvain-la-Neuve

Olga Klopp

will give a presentation on

Assigning Topics to Documents by Successive Projections

 


Abstract:

Topic models provide a useful tool for organizing and understanding the structure of large corpora of text documents and, in particular, for discovering hidden thematic structure. Clustering documents from big unstructured corpora into topics is an important task in various fields, such as image analysis, e-commerce, social networks, and population genetics. Since the number of topics is typically substantially smaller than the size of the corpus and of the dictionary, the methods of topic modeling can lead to a dramatic dimension reduction. We study the problem of estimating the topic-document matrix, which gives the topic distribution for each document in a given corpus; that is, we focus on the clustering aspect of the problem. We introduce an algorithm that we call Successive Projection Overlapping Clustering (SPOC), inspired by the Successive Projection Algorithm for separable matrix factorization. This algorithm is simple to implement and computationally fast. We establish upper bounds on the performance of the SPOC algorithm for estimation of the topic-document matrix, as well as near-matching minimax lower bounds.
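
For readers unfamiliar with the Successive Projection Algorithm mentioned in the abstract, the following is a minimal sketch of its classical greedy form. It is not the SPOC algorithm itself, whose details are the subject of the talk. The function name successive_projection and the toy data are hypothetical, and the sketch assumes that each of the r underlying "pure" vectors appears among the inputs and that all other inputs are convex mixtures of them.

```python
def successive_projection(points, r):
    """Greedy successive projection (illustrative sketch, not SPOC).

    points: list of equal-length lists of floats, one vector per item.
    Returns the indices of up to r selected "pure" vectors.
    """
    residual = [list(p) for p in points]  # work on a copy of the input
    selected = []
    for _ in range(r):
        # Pick the vector with the largest squared Euclidean norm.
        norms = [sum(x * x for x in v) for v in residual]
        j = max(range(len(residual)), key=norms.__getitem__)
        if norms[j] == 0.0:
            break  # remaining vectors lie in the span of those selected
        selected.append(j)
        u = list(residual[j])
        # Project every vector onto the orthogonal complement of u.
        for v in residual:
            coef = sum(a * b for a, b in zip(v, u)) / norms[j]
            for k in range(len(v)):
                v[k] -= coef * u[k]
    return selected


# Toy data (hypothetical): items 1, 3 and 4 are pure, items 0 and 2 are mixtures.
points = [
    [1.5, 1.0, 0.0],   # 0.5 * item 3 + 0.5 * item 1
    [0.0, 2.0, 0.0],   # pure
    [0.75, 0.5, 0.5],  # 0.25 * item 3 + 0.25 * item 1 + 0.5 * item 4
    [3.0, 0.0, 0.0],   # pure
    [0.0, 0.0, 1.0],   # pure
]
anchors = successive_projection(points, 3)
print(anchors)  # [3, 1, 4]
```

In this toy example the three pure vectors are recovered and the two mixtures are never selected. Once such pure items are identified, the mixture weights of the remaining items can be obtained by solving a small linear system; that step is omitted here.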

 

  • Friday, 06 October 2023, 08h00
    Friday, 06 October 2023, 17h00