Discourse markers as potential markers of (dis)fluency in native French and English


Despite the growing literature on discourse markers ("nonpropositional and metadiscursive" elements, Hansen 2006: 25) on the one hand and on disfluency on the other, an approach that combines both phenomena exhaustively is still lacking. This project aims to fill that gap by investigating how French and English discourse markers contribute to the production of fluent and disfluent speech. The approach is therefore contrastive and makes use of DisFrEn, a comparable corpus gathered from existing speech material in the two languages.

A first methodological challenge is the elaboration of a cross-linguistic and operational annotation scheme that will allow a paradigmatic description of discourse markers, based on a combination of syntactic, semantic and functional parameters. Several theoretical issues are at stake regarding the definition of this heterogeneous category and the identification of potential discourse-marking elements in contextualised authentic data. One major contribution of this project will therefore be the combination of a theoretical and a more empirical, operational definition of discourse markers.

Another key aspect of this research is corpus design. Contrastive analysis of discourse markers requires a relatively large amount of speech data in French and English. While English benefits from extensive, representative speech corpora covering various registers (ICE-GB will be used here; Nelson, Wallis and Aarts 2002), French data are scattered across many smaller collections with differing formats and conventions. Our comparable corpus DisFrEn will offer balanced subcorpora, homogenised in content and form, covering no fewer than eight interactional situations, each defined through a combination of relevant metadata variables (see Crible, Dumont, Grosman and Notarrigo 2014). The resulting files will be sound-aligned and machine-readable for annotation in the EXMARaLDA suite (Schmidt et al. 2011).

Finally, this preliminary methodological work will be applied to disfluency annotation and analysis. The potential (dis)fluency markers, or "fluencemes" (Götz 2013), under investigation are coded according to Shriberg's (1994) protocol, which distinguishes between complex disfluent structures (repetitions, substitutions), simple disfluent elements (pauses, fillers, editing terms, discourse markers) and diacritics (misarticulation, truncation). The specific contribution of this research project will thus be to situate the role of (different types of) discourse markers within a typology of fluencemes, to account for the impact of contextual variables, and to identify language-specific as well as more universal mechanisms that motivate the use of discourse markers as potential markers of (dis)fluency.

This project is funded within the framework of a concerted action project on “Fluency and disfluency markers. A multimodal contrastive perspective”, whose aim is to investigate markers of fluency and disfluency in spoken and sign language, focusing on three main modalities: first language discourse (French and English), (advanced) foreign language discourse (English), and sign language (Belgian French).

Supervisors: Liesbeth Degand & Gaëtanelle Gilquin


  • Crible L., Dumont A., Grosman I. & Notarrigo I. 2014. “Situational features”. Technical Report. Université Catholique de Louvain.
  • Götz S. 2013. Fluency in Native and Nonnative English Speech. Amsterdam/Philadelphia: Benjamins.
  • Hansen M.-J. M. 2006. “A dynamic polysemy approach to the lexical semantics of discourse markers (with an exemplary analysis of French toujours)”. In K. Fischer (ed.), Approaches to discourse particles. Amsterdam: Elsevier. 21-42.
  • Nelson G., Wallis S. & Aarts B. 2002. Exploring Natural Language: Working with the British Component of the International Corpus of English. Amsterdam: John Benjamins.
  • Schmidt T., Wörner K., Hedeland H. & Lehmberg T. 2011. “New and future developments in EXMARaLDA”. German Society for Computational Linguistics and Language Technology (GSCL) 96. 253-256.
  • Shriberg E. 1994. Preliminaries to a theory of speech disfluencies. Doctoral dissertation. University of California at Berkeley.