MDMA - Model for Discourse Marker Annotation



"It has become standard in any overview article or chapter on DMs to state that reaching agreement on what makes a DM is as good as impossible, be it alone on terminological matters" (Degand et al. 2013: 5)







What context?


One of the most obvious observations in the field of Discourse Marker (DM) research is that these pragmatic elements do not form a recognized closed class. As a result, a given linguistic expression may or may not count as a DM depending on the definition adopted (Schourup 1999: 228). Although the field is flourishing, with a proliferation of case studies from various perspectives (e.g., Fischer 2006, Degand & Simon-Vandenbergen 2011), inclusive models for identifying and selecting DMs are still needed (e.g., Uygur-Distexhe 2012, Crible 2014).

For clarity, the terminology adopted in the MDMA research project uses the label "discourse markers" as a cover term for diverse elements that are sometimes distinguished, such as connectives (e.g., French donc 'so' or parce que 'because') and non-relational markers (e.g., French eh ben 'well', tu vois 'you see'). We reserve "pragmatic markers" for a broader category: any pragmatic element that contributes to the interpretation of context beyond pure semantic decoding. Another closely related, competing term is "modal particle" (e.g., German eben or doch), which refers to syntactically restricted markers of speaker stance.




What aim?

In light of this lack of consensus, the MDMA project (Model for Discourse Marker Annotation) was created in 2012 with the aim of developing an empirical method for identifying and annotating DMs in oral data (see Bolly et al., to appear). The general goal of MDMA is to cover every step of DM analysis, from identification to the description of their parameters and functions in context.




What method?

The methodology of MDMA can be described as a constant back-and-forth between theory and data. It starts with several expert coders independently selecting potential DMs. These candidates then undergo syntactic and semantic description (syntactic category and position, procedural/conceptual meaning, meaning in context, presence of a co-occurring DM, etc.) through an operational annotation model. Various visualization methods and statistical techniques (e.g., conditional trees, multiple correspondence analysis) are also used to reveal relevant clusters of features and a hierarchy of more or less predictive variables.
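As an illustration only, the parameters listed above can be pictured as one flat record per candidate DM, which is then turned into a row of categorical features of the kind that conditional trees or multiple correspondence analysis take as input. This is a minimal sketch under stated assumptions: the class names, field names and label values (e.g. "conjunction", "initial", "consequence") are hypothetical and do not reproduce the actual MDMA annotation scheme.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class MeaningType(Enum):
    """Procedural vs. conceptual meaning of the candidate DM."""
    PROCEDURAL = "procedural"
    CONCEPTUAL = "conceptual"


@dataclass
class DMAnnotation:
    """One candidate DM token with its descriptive parameters.

    Hypothetical field names and labels, not the actual MDMA scheme.
    """
    form: str                              # the candidate DM as transcribed
    coders: List[str]                      # coders who independently selected it
    syntactic_category: str                # e.g. "conjunction", "adverb"
    position: str                          # e.g. "initial", "medial", "final"
    meaning_type: MeaningType              # procedural or conceptual
    meaning_in_context: str                # hypothetical function label
    cooccurring_dm: Optional[str] = None   # adjacent DM, if any


def to_feature_row(a: DMAnnotation) -> Dict[str, str]:
    """Flatten an annotation into purely categorical features,
    suitable as one row of input for a tree model or an MCA."""
    return {
        "form": a.form,
        "n_coders": str(len(a.coders)),
        "syntactic_category": a.syntactic_category,
        "position": a.position,
        "meaning_type": a.meaning_type.value,
        "meaning_in_context": a.meaning_in_context,
        "cooccurring_dm": "yes" if a.cooccurring_dm else "no",
    }


if __name__ == "__main__":
    donc = DMAnnotation(
        form="donc",
        coders=["coder_A", "coder_B"],
        syntactic_category="conjunction",
        position="initial",
        meaning_type=MeaningType.PROCEDURAL,
        meaning_in_context="consequence",  # hypothetical label
        cooccurring_dm="eh ben",
    )
    print(to_feature_row(donc))
```

Keeping every feature categorical (even the number of coders) reflects the fact that multiple correspondence analysis operates on categorical variables; a real scheme would fix the allowed label sets in advance.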




What data?

This novel approach to DMs has produced a reliable, operational, corpus-based annotation model that can be applied to various data types in terms of genre (e.g., SMS, spontaneous speech), language (e.g., French, English) and modality (e.g., speech, gesture).




Scientific Coordination: Catherine T. Bolly
Members: Ludivine Crible, Liesbeth Degand, Deniz Uygur-Distexhe
Alumni: Federica Ciabarri, Noalig Tanguy