Transcription guidelines

1. Interview identification

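Each interview is preceded by a header tag of the following form: <h nt="FR" nr="FR+three-figure number">. The nt attribute gives the code for the interviewee's mother tongue, and nr repeats that code followed by a three-figure interview number.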

e.g.  <h nt="FR" nr="FR004"> (the fourth interview with a student whose mother tongue is French)

Examples of codes for other mother tongues:

  • DUTCH = DU001
  • GERMAN = GE001
  • SPANISH = SP001
  • SWEDISH = SW001

All interviews should end with the following tag (on a separate line): </h>
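Where transcripts are checked automatically, the header and closing tag can be validated with a short script. The Python function below is a hypothetical sketch, not part of the guidelines; it assumes that each interview is stored as a list of lines and that every mother-tongue code consists of two upper-case letters, as in the examples above.

```python
import re

# Assumption: mother-tongue codes are two upper-case letters (FR, DU, GE, SP, SW).
HEADER_RE = re.compile(r'^<h nt="([A-Z]{2})" nr="([A-Z]{2})(\d{3})">$')


def check_interview_frame(lines: list[str]) -> list[str]:
    """Return the problems found in the header and closing tag of one interview."""
    if not lines:
        return ["empty transcript"]
    problems = []
    match = HEADER_RE.match(lines[0].strip())
    if match is None:
        problems.append("first line is not a valid <h nt=... nr=...> header")
    elif match.group(1) != match.group(2):
        # The code in nr must repeat the code given in nt.
        problems.append("language code in nr does not match nt")
    if lines[-1].strip() != "</h>":
        problems.append("last line is not </h>")
    return problems
```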

2. Speaker turns

Speaker turns are displayed vertically, i.e. one below the other. The letter "A" between angle brackets (<A>) always marks the start of the interviewer's turn, and the letter "B" between angle brackets (<B>) marks the start of the interviewee's (learner's) turn. The end of each turn is indicated by </A> or </B> respectively.

e.g.  <A> okay so which topic have you chosen </A>
       <B> the film or play that I thought was particularly good or bad really </B>
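The turn format can be sketched as a simple parser. This hypothetical Python function assumes that each turn fits on one line and that the text is separated from its tags by single spaces, as in the example above; the backreference \1 makes sure a turn opened with <A> is also closed with </A>.

```python
import re

# One turn per line: <A> or <B>, a space, the text, a space, the matching closing tag.
TURN_RE = re.compile(r"^<(A|B)>\s(.*)\s</\1>$")


def parse_turn(line: str) -> tuple[str, str]:
    """Split a turn line into (speaker, text); raise ValueError if it is malformed."""
    match = TURN_RE.match(line.strip())
    if match is None:
        raise ValueError(f"malformed turn: {line!r}")
    speaker = "interviewer" if match.group(1) == "A" else "interviewee"
    return speaker, match.group(2)
```

For example, parse_turn("<B> hi </B>") returns ("interviewee", "hi"), while a line opened with <A> and closed with </B> is rejected.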

3. Overlapping speech

The tag <overlap /> (with a space between "overlap" and the slash) is used to indicate the beginning of overlapping speech. It should be inserted in both of the overlapping turns. The end of overlapping speech is not indicated.

e.g.  <B> yeah I went on a bus to London once and I'll never <overlap /> do it again </B>
       <A> <overlap /> that's even worse </A>
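Two consequences of this rule can be checked mechanically: the tag must be spelt exactly <overlap />, and a marked turn should have a marked neighbour. The Python sketch below is hypothetical; the neighbour check is only a heuristic, assuming the interview is available as a list of turn strings in order.

```python
import re

OVERLAP = "<overlap />"
# Matches the tag with or without the required space, e.g. "<overlap/>".
ANY_OVERLAP_RE = re.compile(r"<overlap\s*/>")


def malformed_overlaps(turn: str) -> list[str]:
    """Overlap tags in a turn that are not written exactly as '<overlap />'."""
    return [tag for tag in ANY_OVERLAP_RE.findall(turn) if tag != OVERLAP]


def unpaired_overlaps(turns: list[str]) -> list[int]:
    """Indices of turns with an overlap tag but no tag in a neighbouring turn.

    Heuristic: overlapping speech is marked in both turns, so a marked
    turn should normally have a marked turn directly before or after it.
    """
    marked = [OVERLAP in turn for turn in turns]
    unpaired = []
    for i, has_tag in enumerate(marked):
        before = i > 0 and marked[i - 1]
        after = i + 1 < len(marked) and marked[i + 1]
        if has_tag and not (before or after):
            unpaired.append(i)
    return unpaired
```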

4. Punctuation

No punctuation marks are used to indicate sentence or clause boundaries.

5. Empty pauses

An empty pause is a silence on the tape (no sound at all) or a stretch in which the speaker is only breathing.

The following three-tier system is used: one dot for a "short" pause (< 1 second), two dots for a "medium" pause (1-3 seconds) and three dots for a "long" pause (> 3 seconds).

e.g.  <B> (erm) .. it's a British film there aren't many of those these days </B>
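If pause lengths are measured with an audio tool, the three tiers map directly onto a duration in seconds. This hypothetical Python helper assumes that a pause of exactly 1 or exactly 3 seconds counts as "medium", since the guidelines give that tier as 1-3 seconds.

```python
def pause_marker(seconds: float) -> str:
    """Map the length of an empty pause to the dot notation."""
    if seconds <= 0:
        raise ValueError("pause length must be positive")
    if seconds < 1:
        return "."    # short pause
    if seconds <= 3:
        return ".."   # medium pause (assumed inclusive of 1 and 3 seconds)
    return "..."      # long pause
```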

6. Filled pauses and backchannelling

Filled pauses and backchannelling are enclosed in round brackets and transcribed with one of the following forms only: (eh) (for a brief sound), (er), (em), (erm), (mm), (uhu) and (mhm). No other forms should be used.

e.g.  <B> yeah . well Namur was warmer (er) it was (eh) a really little town </B>
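Because the list of permitted forms is closed, stray fillers are easy to detect. The Python sketch below is hypothetical and assumes that round brackets in a turn are used only for filled pauses and backchannelling (all other markup in these guidelines uses angle brackets).

```python
import re

ALLOWED_FILLERS = {"eh", "er", "em", "erm", "mm", "uhu", "mhm"}
# Any lower-case word in round brackets, e.g. "(er)" or "(uhm)".
BRACKETED_RE = re.compile(r"\(([a-z]+)\)")


def invalid_fillers(turn: str) -> list[str]:
    """Return bracketed forms in a turn that are not on the permitted list."""
    return [form for form in BRACKETED_RE.findall(turn) if form not in ALLOWED_FILLERS]
```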

7. Unclear passages

A three-tier system is used to indicate the length of unclear passages: <X> represents an unclear syllable or sound up to one word, <XX> represents two unclear words, and <XXX> represents more than two unclear words.

e.g.  <B> <X> they're just begging <XX> there's there's honestly he did a course .. for a few weeks </B>
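The choice of tag depends only on how many words are unclear. In this hypothetical Python helper, a single unclear syllable or sound is counted as one word.

```python
def unclear_marker(words: int) -> str:
    """Return the unclear-passage tag for a given number of unclear words."""
    if words < 1:
        raise ValueError("there must be at least one unclear word")
    if words == 1:
        return "<X>"     # unclear syllable or sound, up to one word
    if words == 2:
        return "<XX>"    # two unclear words
    return "<XXX>"       # more than two unclear words
```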

If transcribers are not entirely sure of a word or word ending, they should place the symbol <?> directly after the word, with no space in between.

e.g.  <B> I went to see a<?> friend at university there and stayed </B>

Unclear proper names, such as names of towns or film titles, may be replaced by a descriptive tag such as <name of city> or <title of film>.

e.g.  <B> where else did we go (er) <name of city> it's in Bolivia </B>

8. Anonymisation

Data should be anonymised, although the names of famous people such as singers or actors can be kept. Transcribers can replace names with tags such as <first name of interviewee>, <first name and full name of interviewer> or <name of professor>.

e.g.  <A> I'm <first name of interviewer> . what's your name </A>

9. Truncated words

Truncated words are immediately followed by an equals sign.

e.g.  <B> it still resem= resembled the theatre </B>
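Truncated words can be listed by searching for an equals sign that directly follows a word. This hypothetical Python sketch also requires a space or the end of the text after the sign, so that attribute values such as nt="FR" in the interview header are not picked up.

```python
import re

# A truncated word: word characters immediately followed by "=",
# then a space or the end of the text.
TRUNCATED_RE = re.compile(r"(\w+)=(?=\s|$)")


def truncated_words(text: str) -> list[str]:
    """Return the truncated word fragments in a stretch of transcript."""
    return TRUNCATED_RE.findall(text)
```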

10. Spelling and capitalisation

British spelling conventions should be followed. Capital letters are used only where spelling conventions require them for specific words (proper names, I, Mrs, etc.), not at the beginning of turns.

11. Contracted forms

All standard contracted forms are retained as they are typical features of speech.

12. Non-standard forms

Non-standard forms that appear in the dictionary are transcribed using their dictionary spelling: cos, dunno, gonna, gotta, kinda, wanna and yeah.

13. Acronyms

If acronyms are pronounced as sequences of letters, they are transcribed as a series of upper-case letters separated by spaces.

e.g.  <B> yes not really I did sort of basic G C S E French and German </B>

If, on the other hand, acronyms are pronounced as words, they are transcribed as a series of upper-case letters not separated by spaces.

e.g.  <A> (mhm) (er) you're doing a MAELT </A>
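The two acronym conventions differ only in whether the upper-case letters are separated by spaces. This hypothetical Python helper takes the transcriber's judgement of the pronunciation as a flag, since that cannot be inferred from the spelling.

```python
def transcribe_acronym(acronym: str, spelled_out: bool) -> str:
    """Format an acronym: spaced letters if spelled out, unspaced if said as a word."""
    letters = acronym.upper()
    return " ".join(letters) if spelled_out else letters
```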

14. Dates and numbers

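Figures have to be written out in words, so that the transcription records what was actually said. This avoids the ambiguity of a figure such as "1901", which could be spoken as "nineteen oh one", "one thousand nine hundred and one" or in other ways.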

e.g.  <B> an awful lot of people complain and say well the grants were two thousand two hundred </B>
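Because the spoken form cannot be recovered from a figure, converting digits automatically would defeat the purpose of this rule; a script can only flag figures left in the transcript for the transcriber to rewrite. This hypothetical Python check should be applied to turns only, since the interview header legitimately contains digits.

```python
import re

DIGIT_RE = re.compile(r"\d+")


def figures_in_turn(turn: str) -> list[str]:
    """Return any digit sequences left in a turn; these should be written out."""
    return DIGIT_RE.findall(turn)
```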

15. Foreign words and pronunciation

Foreign words are indicated by <foreign> (before the word) and </foreign> (after the word).

e.g.  <B> we couldn't go with (er) knives and so on <foreign> enfin </foreign> we were (er) </B>

As a rule, foreign pronunciation is not noted. The exception is a word that is spelt identically in English and in the foreign language: if such a word is pronounced as a foreign word, it is also marked with the <foreign> tag.

e.g.  <B> I didn't have the (erm) . <foreign> distinction </foreign> </B>
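Stretches marked as foreign can be extracted for later analysis. The Python sketch below is hypothetical and assumes that <foreign> tags are never nested; the non-greedy group stops at the first closing tag.

```python
import re

FOREIGN_RE = re.compile(r"<foreign>\s*(.*?)\s*</foreign>")


def foreign_stretches(turn: str) -> list[str]:
    """Return the text enclosed in each <foreign> ... </foreign> pair."""
    return FOREIGN_RE.findall(turn)
```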

16. Phonetic features

(a) Syllable lengthening

A colon is added at the end of a word to indicate that its last syllable is lengthened. It is typically used with short words such as "to", "so" or "or". Colons should not be inserted within words.

e.g.  <B> that's something I'll I'll plan to: to learn </B>
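The rule that colons only appear at the end of a word can be checked by looking for a colon with word characters on both sides. This hypothetical Python sketch does not flag the[i:], because the colon there is followed by a bracket.

```python
import re

# A colon with word characters on both sides sits inside a word, e.g. "lo:ng".
INTERNAL_COLON_RE = re.compile(r"\w+:\w+")


def internal_colons(turn: str) -> list[str]:
    """Return words that wrongly contain a lengthening colon in the middle."""
    return INTERNAL_COLON_RE.findall(turn)
```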

(b) Articles

- when pronounced as [ei], the article a is transcribed as a[ei];

e.g.  <B> and it's about (erm) . life in a[ei] (eh) public school in America I think </B>

- when pronounced as [i:], the article the is transcribed as the[i:].

e.g.  <B> and the[i:] villa we were staying in was in one of the valleys </B>

17. Prosodic information: voice quality

If a stretch of speech is produced with a particular voice quality, for instance while laughing or whispering, this is marked by inserting <starts laughing> or <starts whispering> immediately before that stretch and <stops laughing> or <stops whispering> at the end of it.

e.g.  <B> <starts laughing> I don't have to assess it I only have to write it <stops laughing> </B>
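Since every <starts ...> tag needs a matching <stops ...> tag, the pairs can be checked with a stack. The Python sketch below is hypothetical; it assumes that nested stretches are closed in reverse order of opening, and it can be run on a single turn or on a whole interview if a stretch may continue across turns.

```python
import re

VOICE_RE = re.compile(r"<(starts|stops) (\w+)>")


def unbalanced_voice_tags(text: str) -> list[str]:
    """Return voice-quality tags that are not properly opened and closed."""
    open_tags: list[str] = []
    problems: list[str] = []
    for action, quality in VOICE_RE.findall(text):
        if action == "starts":
            open_tags.append(quality)
        elif open_tags and open_tags[-1] == quality:
            open_tags.pop()  # matching <stops ...> for the most recent <starts ...>
        else:
            problems.append(f"<stops {quality}> without matching <starts {quality}>")
    # Anything still open at the end was never closed.
    problems.extend(f"<starts {q}> never closed" for q in open_tags)
    return problems
```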

18. Nonverbal vocal sounds

Nonverbal vocal sounds are enclosed between angle brackets.

e.g.  <B> I hope so I've I've got some <coughs> friends out there </B>
e.g.  <B> so I went back into Breda .. and sat down again <imitates the sound of a guitar> </B>

19. Contextual comments

Non-linguistic events are indicated between angle brackets only if they are relevant to the interaction, for example if one of the participants reacts to them.

e.g.  <A> no it's true it's nice to have your own bathroom </A>
       <somebody enters the room>
       <B> hi </B>

20. Tasks

The three tasks making up the interview (set topic, free discussion and picture description) should be separated from each other using the following tags:

  • <S> before the set topic and </S> after it
  • <F> before the free discussion and </F> after it
  • <P> before the picture description and </P> after it

Each of these tags should occupy a separate line and should not interrupt a turn.

e.g.  <S>
       <A> did you . manage to choose a topic </A>
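The task tags can be checked for both placement and order. The Python sketch below is hypothetical: it assumes that all three tasks are present and appear in the order set topic, free discussion, picture description, and that the interview is stored as a list of lines.

```python
TASK_TAGS = ["<S>", "</S>", "<F>", "</F>", "<P>", "</P>"]


def task_tag_problems(lines: list[str]) -> list[str]:
    """Report task tags that are missing, out of order or not on a line of their own."""
    problems = []
    found = [line.strip() for line in lines if line.strip() in TASK_TAGS]
    if found != TASK_TAGS:
        # Assumption: every interview contains all three tasks in this order.
        problems.append(f"task tags found in order {found}, expected {TASK_TAGS}")
    for line in lines:
        stripped = line.strip()
        for tag in TASK_TAGS:
            if tag in stripped and stripped != tag:
                problems.append(f"{tag} shares a line with other text: {stripped!r}")
    return problems
```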


If you have any questions regarding these transcription guidelines, don't hesitate to get in touch with us.