The CENTAL seminars aim to bring together teachers, students and researchers (from academia or industry) interested in natural language processing. The seminars are free and open to all, and generally take place on Fridays from 2 p.m. to 3 p.m. If you would like to be informed by e-mail of the seminars we organize and of CENTAL news, you can subscribe to the CENTAL mailing list by entering your e-mail address in the form.
Organization 2023-2024
Calendar 2023-2024
| Upcoming seminar |
|---|
| 1 December 2023 — Regina Stodden — Doyen 22 |
27 October 2023 — Erika Lombart — Doyen 22
The Implicit on Social Networks: Between the Lines of Discussion Forums
Erika Lombart, PhD in linguistics, SHS research logistician at UNamur, scientific collaborator at the ILC
Abstract:
Implicit meaning, better known as insinuation, is everywhere. Whether we want to make ourselves better understood, to catch our listener's attention, to make sure a message goes over well or, on the contrary, to make it sting as much as possible, implicit meaning is a valuable tool that we use without even realizing it. But what about social networks? This research analyzes how implicit meaning is used and constructed in the Doctissimo discussion forums. Starting from the figures of rhetoric and pragmatics, it arrives at an innovative categorization of the forms of non-conventional implicit meaning and highlights their link with the emotional intensity of communication and their key role in the relationship management at play there.
3 November 2023 — Emmanuelle Salin — Doyen 22
Multimodal machine learning: the case of vision-language transformers
Emmanuelle Salin, PhD student at the Laboratoire d'Informatique et Systèmes, Aix Marseille Université
Abstract:
Vision-Language transformer models combine information from the textual and visual modalities to extract multimodal representations. These models can be used as a basis for many multimodal vision-language tasks. Large pre-trained models based on the transformer architecture, inspired by recent advances in Natural Language Processing, have enabled great improvement on those tasks.
In this presentation, I will give an overview of vision-language transformer models. I will introduce the different types of models, in terms of architecture and pre-training methods. I will also present the strengths and weaknesses of those different methods. Finally, I will talk about current challenges and emerging trends of research in vision-language machine learning.
17 November 2023 — Danqing Huang — Doyen 22
Diachronic Prototype Semantics of Chinese Radicals
Danqing Huang, Data manager at the ILC (UCLouvain) & affiliated researcher at the QLVL (KU Leuven)
Abstract:
Chinese radicals are the semantic components of Chinese characters that generally indicate major concepts and categories. Characters that share the same radical may be semantically linked in various ways to the broad semantic category that the radical represents, and radicals may thus be considered a categorization mechanism to distinguish lexical meanings (see Chen 2012). However, traditional studies of Chinese characters or radicals in Chinese linguistics are philological in nature (e.g. Lu & Wang 1994; Wang 1996) and tend to focus on the origin of radicals and characters, their graphemic development through time, and the symbolic connection between the character's graphemic form and its phonetic aspect. In other words, the cognitive aspect of Chinese radicals has been neglected, and prototype-based studies of Chinese radicals appear to be missing.
To fill this research gap, this study takes the perspective of Cognitive Linguistics to determine what role radicals play as a categorization mechanism in Chinese characters. Concretely, the project focuses on the FIRE character, given that FIRE is an independent character that can also be used as a radical in composite characters. The question arises as to what extent the semantic developments of the FIRE character and the FIRE radical are similar, and whether the FIRE radical can develop independently of the FIRE character. In a first case study, I therefore investigate how the senses in the internal semantic structure of the FIRE character connect as a network. In a second case study, I analyze the semantic structure and development of the FIRE radical as well as the semantic network of composite characters in which the FIRE radical is involved. Finally, I look into variant characters and paronyms incorporating the FIRE radical in order to find out the semantic functions of radicals in so-called radicalization processes, whereby a radical is added to, replaced in, or removed from a character. Although the semantic structure of the FIRE radical overlaps with that of the FIRE character to a large extent, we find that the radical features independent developments, which are due to the semasiological change of the FIRE radical, internal semantic changes within composite characters and external mechanisms such as phonetic loaning and analogy.
1 December 2023 — Regina Stodden — Doyen 22
German Text Simplification: Scarce Data and Other Challenges
Regina Stodden, PhD student in computational linguistics, Heinrich Heine University Düsseldorf
Abstract:
Text simplification is an intra-lingual translation task in which documents or sentences of a complex source text are simplified for a specific target audience. Many new models for text simplification have been proposed in recent years and months, but unfortunately, we often cannot be very sure of their quality. In most cases, we know too little about the training data and what kind of simplification we can expect from the models. In addition, we too often rely on controversial automatic evaluations, especially in languages other than English. In our view, the success of automatic text simplification systems depends at least as much on the quality of the parallel data used for training and evaluation as on the text simplification models themselves.
This talk will look at each point of the text simplification pipeline, particularly the data and annotation aspect, and discuss how it could be improved. For example, it will include i) facilitating the construction of new high-quality text simplification corpora, ii) improving existing corpora through new annotations, including annotations of a) simplification operations, b) quality assessment, and c) error operations, and iii) rethinking the current evaluation process. We will illustrate the problematic areas using German texts as an example.
15 December 2023 — Barbara Plank — More 56 (GPLO-DROIT)
Human label variation in NLP