Linguistic and media corpus

The Institute for Language and Communication bases much of its research on large-scale linguistic and media corpora of diverse origins and formats. The database below illustrates their variety and extent.

The rigour with which corpora are designed, and with which their data and metadata are collected, is essential to answering specialised research questions.

The ongoing search for better ways to produce corpora (transcription, data cleaning, anonymisation) and to annotate them is itself a subject of research, and the ILC puts its results into practice within international networks.

Details of this research and of these networks are available on the webpages of the relevant research centres and groups, and on the ‘Prototypes, methods and innovative processes’ page.