Lexytrad’s technologies for translation and interpreting: the cases of gApp and VIP
Carlos Manuel Hidalgo Ternero (University of Malaga)
In this seminar, we present two translation and interpreting technologies developed by the Lexytrad research team (University of Malaga, Spain): gApp and VIP.
gApp is a text-preprocessing system designed to automatically detect discontinuous multiword expressions (MWEs) and convert them into their continuous forms, in order to improve the performance of current neural machine translation (NMT) systems (see Hidalgo-Ternero, 2021 and 2023; Hidalgo-Ternero & Corpas Pastor, 2020, 2023a & 2023b, among others). To test its effectiveness, several experiments have been carried out with different NMT systems (DeepL, Google Translate and ModernMT, among others) and in different language directions (ES/FR/IT>EN/DE/ES/FR/IT/PT/ZH) to assess the extent to which gApp can enhance the performance of NMT systems when they face phraseological discontinuity.
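To make the idea of "converting a discontinuous MWE into its continuous form" concrete, here is a toy sketch. It is NOT gApp's actual algorithm, which the abstract does not describe: the function name `make_continuous`, the whitespace tokenisation, the `max_gap` window and the example MWE (Spanish "tomar el pelo") are all hypothetical choices made for illustration. It finds an accepted verb form, looks for the MWE's fixed tail a few tokens later, and moves the inserted material to just after the tail.

```python
# Toy illustration only: NOT gApp's real implementation.
# An MWE is described (hypothetically) by a set of accepted verb forms
# plus the fixed tail that should follow the verb directly.

def make_continuous(tokens, verb_forms, tail, max_gap=4):
    """Move up to max_gap tokens inserted between the verb and the fixed
    tail to just after the tail, so that the MWE becomes contiguous."""
    n = len(tail)
    for i, tok in enumerate(tokens):
        if tok.lower() not in verb_forms:
            continue
        # Look for the tail 1..max_gap tokens after the verb.
        for gap in range(1, max_gap + 1):
            start = i + 1 + gap
            window = [t.lower() for t in tokens[start:start + n]]
            if window == tail:
                inserted = tokens[i + 1:start]
                return (tokens[:i + 1] + tokens[start:start + n]
                        + inserted + tokens[start + n:])
    return tokens  # no discontinuous occurrence found: leave unchanged


sentence = "Me estás tomando siempre el pelo".split()
print(" ".join(make_continuous(sentence, {"tomando"}, ["el", "pelo"])))
```

A real system would need lemmatisation, morphological variants and proper tokenisation rather than a hand-written set of surface forms; the sketch only shows the reordering step.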
VIP (Corpas Pastor, 2021), the first integrated system specifically designed to meet the needs and requirements of interpreters, aims to improve the working environment of professional interpreters and to support trainee interpreters. The development of the VIP project has shown that it is possible to provide interpreters with a complete working environment that meets all their needs and improves the quality of their work.
Corpas Pastor, G. (2021). Technology Solutions for Interpreters: The VIP System. Hermēneus. Revista de Traducción e Interpretación, 23, 91-123.
Hidalgo-Ternero, C. M. (2021). El algoritmo ReGap para la mejora de la traducción automática neuronal de expresiones pluriverbales discontinuas (FR>EN/ES). In G. Corpas Pastor, M. R. Bautista Zambrana & C. M. Hidalgo-Ternero (Eds.), Sistemas fraseológicos en contraste: enfoques computacionales y de corpus (pp. 253-270). Comares.
Hidalgo-Ternero, C. M. (2023/forthcoming). A la cabeza de la traducción automática neuronal asistida por gApp: somatismos en VIP, DeepL y Google Translate. In G. Corpas Pastor & M. Seghiri (Eds.), Aplicaciones didácticas de las tecnologías de la interpretación. Comares.
Hidalgo-Ternero, C. M., & Corpas Pastor, G. (2020). Bridging the ‘gApp’: improving neural machine translation systems for multiword expression detection. Yearbook of Phraseology, 11, 61-80. https://doi.org/10.1515/phras-2020-0005
Hidalgo-Ternero, C. M., & Corpas Pastor, G. (2023a/forthcoming). Qué se traerá gApp entre manos… O cómo mejorar la traducción automática neuronal de variantes somáticas (ES>EN/DE/FR/IT/PT). In M. Seghiri & M. Pérez Carrasco (Eds.), Aproximación a la traducción especializada. Peter Lang.
Hidalgo-Ternero, C. M., & Corpas Pastor, G. (2023b/forthcoming). ReGap: a text preprocessing algorithm to enhance MWE-aware neural machine translation systems. In J. Monti, G. Corpas Pastor & R. Mitkov (Eds.), Recent Advances in MWU in Machine Translation and Translation technology. John Benjamins Publishing Company.
Hidalgo-Ternero, C. M., & Zhou-Lian, X. (2022). Reassessing gApp: does MWE discontinuity always pose a challenge to Neural Machine Translation? In G. Corpas Pastor & R. Mitkov (Eds.), Computational and Corpus-Based Phraseology (pp. 116–132). Springer.
Anglicisation of higher education in the Wallonia-Brussels Federation (FWB): threat or opportunity?
Pauline Degrave (UCLouvain)
Higher education is seeing a rapid expansion of courses taught in English in non-English-speaking regions (English-medium instruction, EMI), notably in French-speaking Belgium. This trend raises many questions about the advantages and difficulties associated with this mode of teaching.
This presentation will assess the potential opportunities and threats of the anglicisation of higher education, drawing on findings established in the scientific literature. This synthesis of current knowledge on the subject will provide a better understanding of the concrete implications of the anglicisation of higher education and help identify practical recommendations for teaching in the FWB.
This presentation is part of the UCLouvain institutional project « FDP2 – enseignement et apprentissage de contenus disciplinaires en langue étrangère » (teaching and learning of subject content in a foreign language), currently under way (2022-2024) and led by Pauline Degrave. https://intranet.uclouvain.be/fr/myucl/administrations/adef/fdp/enseignement-et-apprentissage-de-contenus-disciplinaires-en-langue-etrangere.html
DIONE multiplier event
Ferran Suñer (UCLouvain), Kristel Van Goethem (F.R.S.-FNRS & UCLouvain) and Philipp Wasserscheidt (HU Berlin)
As part of the Erasmus+ strategic partnership DIONE, UCLouvain is organising an international multiplier event on Thursday 26 January 2023 from 13:00 to 17:00.
The event aims to disseminate the outcomes of the DIONE project. Our focus will be on introducing and explaining our concept of “micro-collaboration”. We see this as an important tool for achieving international cooperation in teaching, as it makes cross-border learning more accessible and flexible. The concept tackles questions such as:
• How can we offer more students international experiences?
• How can teachers internationalise their teaching in a self-determined way?
• How do we create a common European learning space?
• How can we turn internationalisation at home into active collaboration between students?
After a general presentation of the goals and outcomes of the DIONE project and of the concept of micro-collaboration, we will present two specific micro-collaboration kits, which are accessible online on our platform after registration.
The first kit, on “Corpus-based constructional analysis”, is a multilingual course kit spread over four sessions. It starts with a basic introduction to Construction Grammar and then introduces the method of corpus-based constructional analysis (data extraction, annotation and analysis). The students apply this method to a specific case study (the Expressive Binominal Construction, e.g. a dream of a car, a hell of a job) and analyse it from a cross-linguistic perspective.
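As a rough idea of what the data-extraction step could look like for the English half of the case study, the sketch below pulls candidate "a(n) N1 of a(n) N2" strings out of raw text with a regular expression. The pattern `EBNC` and the function `extract_binominals` are hypothetical and are not taken from the DIONE kit; a real corpus study would query a POS-tagged corpus and then annotate candidates manually, since the pattern also catches non-expressive sequences.

```python
import re

# Hypothetical surface pattern for "a(n) N1 of a(n) N2" (English only).
# It over-generates: manual annotation must filter non-expressive hits.
EBNC = re.compile(r"\b(an?)\s+(\w+)\s+of\s+(an?)\s+(\w+)\b", re.IGNORECASE)


def extract_binominals(text):
    """Return (N1, N2) pairs for every candidate match in the text."""
    return [(m.group(2), m.group(4)) for m in EBNC.finditer(text)]


print(extract_binominals("It was a hell of a job, but he got a dream of a car."))
```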
The second kit, on “Figurative language”, provides participants with an opportunity to (re)familiarise themselves with key concepts related to figurative language, to apply metaphor extraction methods (MIPVU), to identify the functions of metaphors in their own corpus and to include the functions of figurative language as an analytical category in their research projects.
13:00 - 14:00 Goals and outcomes of the DIONE project. Presentation of the concept of micro-collaboration
14:15 - 15:00 How to use the micro-collaboration kit on “Corpus-based constructional analysis”?
15:15 - 16:00 How to use the micro-collaboration kit on “Figurative language”?
16:15 - 17:00 Discussion and closing
Participation is free of charge but registration is required. Please fill in our registration form before 15 January 2023 and indicate which parts of the programme you plan to attend.
The Gradience of Lingualities: Some Research Issues for (A)typical Language Development in Diglossia
Kleanthes K. Grohmann
This talk will present the research agenda of the Cyprus Acquisition Team (CAT Lab). Cyprus is in a unique position for this kind of research, for many reasons. I aim to highlight the potential impact that the confined geographical space of this small island has on issues of language acquisition and subsequent development, from a variety of perspectives that are of immediate relevance to any study of multilingualism, even beyond Cyprus: bilectal Greek Cypriot children, multilingual children from multicultural backgrounds, and children with atypical, even impaired, language development. This line of research takes the local linguistic variety, Cypriot Greek, seriously as the native language of Greek Cypriot children. At the CAT Lab, we developed the notion of ‘(discrete) bilectalism’ to characterize speakers in diglossic environments. Our research, in particular on object clitic placement, further suggests that bilectal children undergo refinements in their grammatical system after the critical period for first language acquisition. A prominent factor is schooling, which falls within ‘sociosyntactic’ developments of language. The larger picture places bilectalism on a gradient scale ranging from monolectal, monolingual speakers to multilectal, multilingual speakers, with further differentiations and different degrees of bilingualism in between.
The presentation and the discussion will be in English.
PPaDisM – Phonetic Patterns in Discourse Markers
In this presentation, I will introduce my three-year project, carried out in collaboration with Valibel and starting in January 2023. I plan to explore fine-grained phonetic variation in discourse markers to establish patterns that allow disambiguation in speech, both at the lexical level (e.g. French "ben" vs "bain") and at the pragmatic level (e.g. additive "et" vs concessive "et"). Moreover, as markers of both fluency and disfluency, discourse markers are an ideal locus for observing phonetic characteristics (both prosodic and segmental) in relation to cognitive aspects of interaction (cognitive load, discursive intent, etc.). I therefore plan to explore self-directed (or egocentric), other-directed (or allocentric), child-directed and even robot-directed speech, in both first- and second-language production.
The presentation and the discussion will be in English.
Foreign-language needs of young university graduates in French-speaking Belgium
Pauline Degrave and Philippe Hiligsmann (UCLouvain)
The seminar will address the language skills that university graduates in French-speaking Belgium need in order to improve their chances on the job market. We will examine which languages higher-education graduates need to know and which level(s) and skills are expected. To answer these questions, a two-part study was carried out. A total of 3,300 UCLouvain alumni (classes of 2014 and 2018) answered an online questionnaire that surveyed, among other things, the languages used in the workplace. We also carried out an in-depth analysis of 2,362 job offers published on the « UCLouvain Career Center par JobTeaser » platform between July 2018 and June 2019.
During the seminar, we will present the main results of the two studies and dwell on a number of recommendations, both for drafting job offers and for developing or adapting higher-education programmes.
Degrave, P., & Hiligsmann, P. (forthcoming). De behoefte aan vreemde talen van jonge universitair opgeleiden op de werkvloer in Franstalig België. Verslagen & Mededelingen (special issue: Taal en werk), 131(2), 99-128.
Vocabulary and genre: analysis in the variation of lexical correlates of linguistic proficiency through genre in French-speaking learners of Spanish
Rocío Cuberos Vicente (Universitat de Barcelona/UCLouvain)
In this seminar, I present the project “Vocabulary and genre: analysis in the variation of lexical correlates of linguistic proficiency through genre in French-speaking learners of Spanish”. This project examines the lexical complexity of French-speaking university learners of Spanish in two genres: narrative and academic writing. Lexical complexity is defined by five dimensions that have been reported to indicate lexical proficiency in an L2: diversity, density, sophistication, and the use of collocations and of metaphors. Writing performance is operationalized as writing quality ratings assigned by experienced teachers of L2 Spanish. By comparing lexical complexity across the narrative and academic genres, the project seeks to reveal lexical differences in writing performance that result from the genre-specific language demands learners face. Two main questions drive the project. First, do L2 learners’ overall writing quality ratings and/or lexical features vary by genre? Second, which lexical features predict overall writing quality ratings within each genre? The ultimate goal is to generate findings that can inform the design of pedagogical approaches specifically attuned to the needs of French-speaking learners of Spanish as they learn to participate in two prevalent discourse genres: narrative production and academic writing. Previous research on the lexical correlates of language proficiency and text quality in L1 and L2 Spanish (L1 = Chinese, Arabic and Korean), conducted for my PhD thesis, provided a complex developmental framework for the use of these lexical measures in L1 and L2. In this seminar, I elaborate on these findings and point to directions that need further investigation.
Pleonastic constructions in German child(-directed) speech
Sarah Faidt (University of Basel)
The project investigates the role of pleonastic constructions (e.g., ins Haus rein, auf dem Baum drauf) in the acquisition of spatial language in L1 German. Earlier research suggests that pleonastic constructions play a supporting role in the development of spatial language (Bryant 2012), in the sense that they bridge the gap between syntactically simple particle constructions and more complex prepositional phrases. Although this construction type has been observed in previous language production studies (e.g., Harr 2012, Madlener et al. 2017), concrete figures on its actual frequency and development in natural language use are still missing. The project aims to fill this gap by analyzing longitudinal data from German child-adult interaction with respect to the frequency of occurrence and the functional and constructional features of pleonastic constructions. The analysis is grounded in a Construction Grammar framework and a usage-based approach to language acquisition. The results may provide deeper insight into how children master the challenging domain of spatial language in German and contribute to an understanding of pleonastic constructions within a broader network of constructions.
Bryant, D. (2012). Lokalisierungsausdrücke im Erst- und Zweitspracherwerb. Typologische, ontogenetische und kognitionspsychologische Überlegungen zur Sprachförderung in DaZ. Baltmannsweiler: Schneider.
Harr, A.-K. (2012). Language-specific factors in First Language Acquisition. The Expression of Motion Events in French and German. Berlin: Mouton de Gruyter.
Madlener, K., Skoruppa, K. & Behrens, H. (2017). Gradual development of constructional complexity in German spatial language. Cognitive Linguistics, 28(4), 757–798. https://doi.org/10.1515/cog-2016-0089
Revisiting simplification in corpus-based translation studies: Insights from readability research
Thomas François (UCLouvain) and Marie-Aude Lefer (UCLouvain)
Ever since the publication of Laviosa’s (1998a; 1998b) pioneering work, the study of lexico-syntactic simplification has held center stage in corpus-based research on the typical features of translated texts. The simplification hypothesis states that translated texts are simpler than non-translated texts. The convergence hypothesis, also discussed by Laviosa (1998a; 1998b) but less so in follow-up studies, states that translated texts are more homogeneous than original texts, i.e. they display less variance. To date, simplification has mostly been operationalized in corpus-based translation studies (CBTS) as type-token ratio, lexical density, core vocabulary coverage, list head coverage and average sentence length. Relying on these parameters, previous research has produced mixed results, with simplification varying across translation modalities, language pairs and registers. The present study sets out to revisit the simplification and convergence hypotheses through the lens of NLP-informed readability research. In particular, we rely on a larger set of simplification indicators and make use of multivariate statistical techniques. We present a simplification study of Europarl corpus data in French translated from English and in non-translated French. The results show that translated French is simpler than original French, both lexically and syntactically. We also find evidence of convergence: translators smooth out cross-speaker lexical heterogeneity in translated parliamentary proceedings.
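For readers unfamiliar with the classic CBTS indicators, the sketch below computes three of them (type-token ratio, lexical density and average sentence length) and a per-indicator variance across texts, which is one simple way to think about convergence (lower variance means more homogeneous texts). It is not the authors' feature set or pipeline: the regex tokeniser, the tiny French `FUNCTION_WORDS` list used to approximate content words, and the function names are hypothetical simplifications, and the study itself relies on a larger set of readability-based indicators and multivariate statistics.

```python
import re
import statistics

# Hypothetical, deliberately tiny French function-word list, used only to
# approximate lexical density (share of content words among all tokens).
FUNCTION_WORDS = {"le", "la", "les", "de", "des", "et", "un", "une",
                  "est", "à", "en", "que", "nous", "il"}


def tokenize(text):
    """Lower-cased word tokens (Unicode-aware \\w, so accents are kept)."""
    return re.findall(r"\w+", text.lower())


def indicators(text):
    """Classic simplification indicators for one non-empty text."""
    tokens = tokenize(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return {
        "ttr": len(set(tokens)) / len(tokens),
        "lexical_density": len(content) / len(tokens),
        "mean_sentence_length": len(tokens) / len(sentences),
    }


def spread(texts, key):
    """Population variance of one indicator across texts; lower variance
    for translated texts would be consistent with convergence."""
    return statistics.pvariance([indicators(t)[key] for t in texts])


print(indicators("Le chat dort. Le chien mange la viande."))
```

Raw type-token ratio is sensitive to text length, so comparisons of this kind are usually made on equal-sized samples or with length-corrected variants.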
Tracking the uses of populism in media and political discourse
ARC TrUMPo (UCLouvain)
In this seminar, we present the methodological steps as well as some of the preliminary research results of our interdisciplinary research project TrUMPo “Discourse, populism and democracy: Tracking the uses of populism in media and political discourse”. The project set out to analyze the uses, the meanings, and the circulation of the term populis* (i.e. populism and its derivatives) in public debates in Belgium, France, and Spain. This comparative corpus-driven study examines Dutch, French, and Spanish data from 2019 from three forums: the parliamentary arena, mass media, and social media (Twitter). We follow a mixed-methods research design and a four-step analytical procedure: (i) automatic identification of every token of populis* in each forum during the selected period; (ii) determination of around ten peaks of occurrences of populis* for each case study in order to identify the discursive events that will be analyzed in depth; (iii) creation of an annotated database in which each occurrence of populis* is contextualized in its communicative process; (iv) qualitative analysis specific to each disciplinary approach. In addition, we discuss the process of coding our data. We adopted an inductive approach to coding, which yielded our first research result: a set of analytical categories for annotating our database. Finally, we report preliminary findings of the linguistic analysis of populism.
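Steps (i) and (ii) can be pictured with a short sketch: the wildcard populis* becomes a regular expression, and peaks are approximated as the dates with the most matches. This is an assumption-laden illustration, not the TrUMPo pipeline: the `(date, text)` input format, the function names and the "top-k days" definition of a peak are hypothetical, since the abstract does not say how peaks were determined.

```python
import re
from collections import Counter

# populis* as a regex: "populis" followed by any word characters, so it
# covers e.g. French "populiste", Spanish "populismo", Dutch "populistisch".
POPULIS = re.compile(r"\bpopulis\w*", re.IGNORECASE)


def find_tokens(text):
    """Step (i): every populis* token in a text."""
    return POPULIS.findall(text)


def peak_days(documents, k=10):
    """Step (ii), simplified: documents is an iterable of (date, text)
    pairs (hypothetical format); returns the k dates with most tokens."""
    counts = Counter()
    for date, text in documents:
        n = len(find_tokens(text))
        if n:  # skip dates with no occurrences
            counts[date] += n
    return [date for date, _ in counts.most_common(k)]


print(find_tokens("Le populisme et les populistes; El populismo."))
```

A real peak analysis would more likely normalise counts by the size of each day's corpus and look for local maxima rather than simply taking the k largest raw counts.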