CENTAL Seminars

Louvain-la-Neuve

The CENTAL seminars bring together teachers, students and researchers (from academia or industry) interested in natural language processing. The seminars are free and open to all, and generally take place on Fridays from 14:00 to 15:00 in room c.142 of the Collège Erasme (the CENTAL seminar room). To be notified by email about the seminars we organise and about CENTAL news, you can subscribe to the CENTAL mailing list by entering your email address in the form.

Contact: Serge Bibauw and Anaïs Tack

 

2017-2018 Programme

 

Full programme

 



 

Friday 20 October 2017, 14:00-15:00, Collège Erasme c.142

Natalia Grabar (STL, Université de Lille, CNRS, FR)

Acquiring resources for the simplification of medical texts

One characteristic of medical texts is their use of highly specialised technical terms, which often remain incomprehensible to lay speakers. Simplifying these texts therefore requires suitable resources. We introduce two methods for acquiring such resources. The first relies on cues internal to the terms (morphological analysis of compound terms), while the second exploits cues external to the terms (reformulations made in the texts). Neither method requires parallel corpora. We describe and discuss the results.

keywords: text simplification · resource acquisition · rule definition · categorisation · medical domain

references:

  • Antoine, E., & Grabar, N. (2016). Exploitation de reformulations pour l’acquisition d’un vocabulaire expert/non expert. In Actes de la conférence conjointe JEP-TALN-RECITAL 2016, volume 2 : TALN (pp. 153–166).
  • Grabar, N., & Hamon, T. (2016). A large rated lexicon with French medical words. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016) (pp. 2643–2648).
  • Grabar, N., & Hamon, T. (2016). Exploitation de la morphologie pour l’extraction automatique de paraphrases grand public des termes médicaux. TAL, 57(1), 85–109.

slides: here

 


Friday 17 November 2017, 14:00-15:00, Auditoire MONTESQUIEU 3
Seminar co-organised with the Institut Langage et Communication (IL&C)

Naomi Baron (CTRL, American University, Washington, D.C., US)

Learning, Knowing, and Remembering in a Digital World

keywords: cognition · digital impact on memory · educational curricula · GPS

Digital tools such as the internet, search engines, and online navigation have put a wealth of information at our fingertips. Are these same tools impacting the way we use human cognitive skills to learn, know, and remember? Research suggests that availability of “google knowing” is redefining our assumptions about what kinds of data – and knowledge – are appropriately held in our own heads. These redefinitions are, in turn, reshaping academic curricula, for good or for ill.

references:

live stream and video: www.facebook.com/didaxoUlearn/videos/1174447416020956

slides and references

 


Friday 24 November 2017, 14:00-15:00, Collège Erasme c.142

Dirk De Hertog (ITEC, imec - KU Leuven, BE)

Embeddings and their use as features in supervised learning tasks

keywords: embeddings · supervised learning

This talk introduces the use and value of distributional word representations in machine learning approaches to NLP. Machine learning aims to learn how to perform specific tasks (e.g., POS tagging, named entity recognition) by deriving statistical associations between annotated examples and so-called features, i.e., meaningful pieces of information that are relevant to the problem at hand. If learning succeeds, the resulting model can be applied to similar, yet new examples. A recent development within NLP is to replace traditional ‘flat’ features with distributional ‘semantic’ representations, such as Semantic Vector Spaces (SVS) and word2vec. These methods rely on contextual information derived from large-scale corpora to build vector representations of words, effectively transforming a word into a complex data structure.
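As a rough illustration of the idea in this abstract (not code from the talk), the sketch below builds count-based context vectors from a four-sentence toy corpus and compares words by cosine similarity. The corpus, the window size and the function names `cooccurrence_vectors` and `cosine` are assumptions made up for this example; real systems would use large corpora and models such as word2vec.

```python
from collections import Counter
from math import sqrt

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            vectors.setdefault(word, Counter()).update(context)
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy corpus invented for this sketch; far too small for real use.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the dog".split(),
    "a car needs fuel".split(),
]

vectors = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])  # the two words share contexts
sim_cat_car = cosine(vectors["cat"], vectors["car"])  # the two words share none
print(sim_cat_dog, sim_cat_car)
```

With a ‘flat’ one-hot encoding, "cat", "dog" and "car" would all be equally dissimilar. With context vectors, words that occur in similar contexts end up closer together, and it is this graded similarity that a supervised learner can use as a feature.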

reference:

  • Turney, P. D., & Pantel, P. (2010). From Frequency to Meaning: Vector Space Models of Semantics. Journal of Artificial Intelligence Research, 37, 141–188. https://doi.org/10.1613/jair.2934

Friday 1 December 2017, 14:00-15:00, Collège Erasme c.142

Pierre Deville (Head of Data Science, Bisnode Group Analytics, BE)

Network Science in the era of Text Mining and Big Data

keywords: networks · big data · visualization

references:

  • Deville, P., Wang, D., Sinatra, R., Song, C., Blondel, V. D., & Barabási, A.-L. (2014). Career on the Move: Geography, Stratification, and Scientific Impact. Scientific Reports, 4, 4770. https://doi.org/10.1038/srep04770
  • Sinatra, R., Deville, P., Szell, M., Wang, D., & Barabási, A.-L. (2015). A century of physics. Nature Physics, 11(10), 791–796. https://doi.org/10.1038/nphys3494
  • Sinatra, R., Wang, D., Deville, P., Song, C., & Barabási, A.-L. (2016). Quantifying the evolution of individual scientific impact. Science, 354(6312). https://doi.org/10.1126/science.aaf5239

     


Friday 8 December 2017, 14:00-15:00, Collège Erasme c.142

Aline Villavicencio (University of Essex, UK • INF, Federal University of Rio Grande do Sul, BR)

Identifying Idiomatic Language with Distributional Semantic Models

Precise natural language understanding requires adequate treatment both of single words and of larger units. However, expressions like compound nouns may display idiomaticity: while a police car is a car used by the police, a loan shark is not a fish that can be borrowed. It is therefore important to identify which expressions are idiomatic and which are not, as the latter can be interpreted from a combination of the meanings of their component words while the former cannot. In this talk I discuss the ability of distributional semantic models (DSMs) to capture idiomaticity in compounds, by means of a large-scale multilingual evaluation of DSMs in French and English. A total of 816 DSMs were constructed and assessed in 2,856 evaluations. The results show a high correlation with human judgments about compound idiomaticity (Spearman’s ρ = .82 in one dataset), indicating that these models are able to detect idiomaticity.
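One common way to turn this idea into a number, sketched here as an assumption rather than as the exact procedure evaluated in the talk, is to compare the vector of the whole compound with the sum of its component vectors, and then correlate the resulting scores with human ratings using Spearman’s ρ. All vectors and ratings below are invented for illustration, and the helper names (`compositionality`, `ranks`, `spearman`) are hypothetical.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def compositionality(compound, modifier, head):
    """Cosine between the compound's own vector and the sum of its parts."""
    composed = [m + h for m, h in zip(modifier, head)]
    return cosine(compound, composed)

def ranks(values):
    """Ranks starting at 1; this simple version assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(xs, ys):
    """Spearman's rank correlation for tie-free data."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Invented 3-dimensional vectors (compound, modifier, head); not real DSM output.
toy_vectors = {
    "police car": ([0.9, 0.8, 0.1], [0.1, 1.0, 0.1], [1.0, 0.2, 0.0]),
    "grandfather clock": ([0.5, 0.5, 0.7], [0.0, 1.0, 0.1], [1.0, 0.0, 0.0]),
    "loan shark": ([0.0, 0.1, 1.0], [0.0, 1.0, 0.2], [1.0, 0.0, 0.1]),
}
# Invented human compositionality ratings, in the same order as toy_vectors.
human_ratings = [4.5, 2.5, 0.5]

model_scores = [compositionality(*vecs) for vecs in toy_vectors.values()]
rho = spearman(model_scores, human_ratings)
print(model_scores, rho)
```

A low score means the compound’s distribution differs from what its parts predict, which is taken as a sign of idiomaticity. With real data the ratings contain ties, so a tie-aware implementation of Spearman’s ρ would be needed.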

keywords: idiomaticity · distributional semantic models · compound nouns · multiword expressions

references:

  • Wilkens, R., Zilio, L., Cordeiro, S. R., Ramisch, C., Idiart, M., & Villavicencio, A. (2017). LexSubNC: A Dataset of Lexical Substitution for Nominal Compounds. In Proceedings of the 12th International Conference on Computational Semantics (IWCS). Montpellier.
  • Cordeiro, S., Ramisch, C., Idiart, M., & Villavicencio, A. (2016). Predicting the Compositionality of Nominal Compounds: Giving Word Embeddings a Hard Time. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Vol. 1: Long Papers, pp. 1986–1997). Berlin, Germany: Association for Computational Linguistics. https://doi.org/10.18653/v1/P16-1187
  • Ramisch, C., Cordeiro, S., Zilio, L., Idiart, M., & Villavicencio, A. (2016). How Naked is the Naked Truth? A Multilingual Lexicon of Nominal Compound Compositionality. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016) (Vol. 2: Short Papers, pp. 156–161). Berlin, Germany: Association for Computational Linguistics. http://aclweb.org/anthology/P/P16/P16-2026.pdf
  • Cordeiro, S., Ramisch, C., & Villavicencio, A. (2016). mwetoolkit+sem: Integrating Word Embeddings in the mwetoolkit for Semantic MWE Processing. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016) (pp. 1221–1225). Portorož, Slovenia: European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2016/summaries/271.html