Transcription conventions

The EXMARaLDA page documenting transcription conventions has been overhauled. It now lists the main documentation for all major transcription conventions and annotation guidelines that are supported by, or were developed and tested with, the EXMARaLDA tools.

If transcription and annotation practices follow one of these conventions, additional functionality in the EXMARaLDA tools becomes available through EXMARaLDA’s segmentation mechanisms.

A segmentation algorithm recognizes the specific constructs of a transcription convention, such as:

  • non-verbal descriptions are enclosed in double round brackets (HIAT and GAT), e.g. ((scratches his beard))
  • breathing is represented as °h, °hh, h°, hh°, etc. (GAT)
  • speaker turns are divided into utterances using dedicated utterance end symbols, such as period, question mark or ellipsis (HIAT and CHAT)
  • speaker turns are divided into intonation units using dedicated symbols to mark final tone movements, such as period, comma or semicolon (GAT)
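
To make the idea concrete, the following is a minimal sketch of how such constructs could be recognised with regular expressions. It is not EXMARaLDA's actual segmentation algorithm: the pattern names and the functions find_nonverbal, find_breathing and split_utterances are hypothetical, the breathing pattern assumes at most three h characters, and the utterance end symbols are reduced to period, question mark and ellipsis.

```python
import re

# Non-verbal description in double round brackets, e.g. ((scratches his beard))
NONVERBAL = re.compile(r"\(\(([^()]*)\)\)")

# GAT-style breathing: °h, °hh, ... (in-breath) or h°, hh°, ... (out-breath).
# Assumption: one to three "h" characters.
BREATHING = re.compile(r"°h{1,3}|h{1,3}°")

# Simplified utterance boundary: whitespace that follows a period,
# a question mark or an ellipsis character.
UTTERANCE_END = re.compile(r"(?<=[.?…])\s+")


def find_nonverbal(turn):
    """Return the text of every ((...)) description in a speaker turn."""
    return NONVERBAL.findall(turn)


def find_breathing(turn):
    """Return every breathing symbol found in a speaker turn."""
    return BREATHING.findall(turn)


def split_utterances(turn):
    """Split a speaker turn into utterances at the end symbols."""
    return [u for u in UTTERANCE_END.split(turn.strip()) if u]


if __name__ == "__main__":
    turn = "°h well ((scratches his beard)) I think so. Do you? hh°"
    print(find_nonverbal(turn))
    print(find_breathing(turn))
    print(split_utterances(turn))
```

A real segmentation also has to report symbols that are used incorrectly, for example an unclosed double bracket, which this sketch does not attempt.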

Applying segmentation to a given transcription in the Partitur-Editor not only checks that transcription symbols are used correctly; it is also the basis for a number of advanced analysis functions, such as:

  • Quantifying the transcript in terms of number of utterances, number of word tokens, mean length of utterance
  • Calculating word lists and word frequency lists
  • Generating output formats based on convention-specific entities (such as intonation units for GAT or utterances for HIAT)
  • Exporting tokenised versions of a transcript, for example in the ISO/TEI standard format
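
As an illustration of the first two functions in this list, the sketch below computes simple counts from a transcript that has already been split into utterances. It does not reproduce the EXMARaLDA implementation: tokenize and quantify are hypothetical names, mean length of utterance is assumed to be measured in word tokens, and the tokeniser only strips ((...)) descriptions, so convention-specific symbols such as GAT breathing would need additional handling.

```python
import re
from collections import Counter


def tokenize(utterance):
    """Return lower-cased word tokens, ignoring ((...)) descriptions."""
    cleaned = re.sub(r"\(\([^()]*\)\)", " ", utterance)
    # Runs of letters, optionally joined by an apostrophe or a hyphen.
    return re.findall(r"[^\W\d_]+(?:['’-][^\W\d_]+)*", cleaned.lower())


def quantify(utterances):
    """Count utterances and word tokens and build a word frequency list."""
    tokens_per_utterance = [tokenize(u) for u in utterances]
    n_utterances = len(utterances)
    n_tokens = sum(len(tokens) for tokens in tokens_per_utterance)
    # Mean length of utterance, here measured in word tokens.
    mlu = n_tokens / n_utterances if n_utterances else 0.0
    frequencies = Counter(
        token for tokens in tokens_per_utterance for token in tokens
    )
    return {
        "utterances": n_utterances,
        "tokens": n_tokens,
        "mlu": mlu,
        "frequencies": frequencies,
    }


if __name__ == "__main__":
    stats = quantify(["Well ((coughs)) I think so.", "Do you think so?"])
    print(stats["utterances"], stats["tokens"], stats["mlu"])
    print(stats["frequencies"].most_common(3))
```

Sorting the frequency counter, for instance with most_common(), yields a word frequency list; its keys alone give a plain word list.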

Different transcription conventions are further supported by specialised virtual keyboards that provide symbols not easily accessible through the physical keyboard.