Transcription conventions
The EXMARaLDA page documenting transcription conventions has been overhauled. It now lists the main documentation for all major transcription conventions and annotation guidelines that are supported by, or were developed and tested with, the EXMARaLDA tools.
When transcription and annotation follow one of these conventions, EXMARaLDA’s segmentation mechanisms make additional functionality available in its tools.
A segmentation algorithm recognizes the specific constructs of a transcription convention, such as:
- non-verbal descriptions are enclosed in double round brackets (HIAT and GAT), e.g. ((scratches his beard))
- breathing is represented as °h, °hh, h°, hh°, etc. (GAT)
- speaker turns are divided into utterances using dedicated utterance end symbols, such as period, question mark or ellipsis (HIAT and CHAT)
- speaker turns are divided into intonation units using dedicated symbols to mark final tone movements, such as period, comma or semicolon (GAT)
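As a rough illustration of what recognizing such constructs can look like, the following Python sketch matches three of the patterns listed above with regular expressions. It is not EXMARaLDA's segmentation code: the pattern names, the function names and the exact set of utterance end symbols (period, question mark, ellipsis) are assumptions made for this example, and real conventions define more symbols and more rules.

```python
import re

# Illustrative patterns only; these are NOT EXMARaLDA's actual segmentation rules.
NONVERBAL = re.compile(r"\(\((.+?)\)\)")       # ((scratches his beard))
BREATHING = re.compile(r"°h{1,3}|h{1,3}°")     # °h, °hh, h°, hh°
# Assumed utterance end symbols: period, question mark, ellipsis.
UTTERANCE_BREAK = re.compile(r"(?<=[.?…])\s+")


def find_nonverbal(turn):
    """Return the text of every ((...)) description in a speaker turn."""
    return NONVERBAL.findall(turn)


def find_breathing(turn):
    """Return every GAT-style breathing symbol found in a speaker turn."""
    return BREATHING.findall(turn)


def split_utterances(turn):
    """Split a speaker turn after each assumed utterance end symbol."""
    return [u for u in UTTERANCE_BREAK.split(turn.strip()) if u]
```

For example, `split_utterances("Well… I think so. Do you?")` yields three utterances, each still carrying its end symbol. A turn that lacks a closing end symbol would simply come back as one unsplit piece here, whereas a real segmentation algorithm would report it as an error.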
Applying segmentation to a transcription in the Partitur-Editor not only checks that transcription symbols are used correctly, but also provides the basis for a number of advanced analysis functions, such as:
- Quantifying the transcript in terms of number of utterances, number of word tokens, and mean length of utterance
- Calculating word lists and word frequency lists
- Generating output formats based on convention-specific entities (such as: intonation units for GAT or utterances for HIAT)
- Exporting tokenised versions of a transcript, for example in the ISO/TEI standard format
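The first two functions in this list can be sketched in a few lines of Python once a transcript has been split into utterances. This is a simplified illustration, not the Partitur-Editor's implementation: the function `quantify`, the `WORD` pattern and the choice to measure mean length of utterance in word tokens (rather than morphemes) are assumptions made for this example.

```python
import re
from collections import Counter

# Assumed word pattern: letters/digits, optionally joined by apostrophe or hyphen.
WORD = re.compile(r"\w+(?:['’-]\w+)*")


def quantify(utterances):
    """Count utterances and word tokens, and build a word frequency list.

    `utterances` is a list of strings, one per already segmented utterance.
    Mean length of utterance (MLU) is computed in word tokens here.
    """
    tokens_per_utterance = [WORD.findall(u.lower()) for u in utterances]
    all_tokens = [t for tokens in tokens_per_utterance for t in tokens]
    n_utterances = len(utterances)
    return {
        "utterances": n_utterances,
        "word_tokens": len(all_tokens),
        "mlu": len(all_tokens) / n_utterances if n_utterances else 0.0,
        "frequencies": Counter(all_tokens),
    }
```

Calling `quantify(["I think so.", "Do you think so?"])` counts two utterances and seven word tokens, giving a mean length of 3.5 tokens per utterance. The `frequencies` entry is a `Counter`, so `most_common()` returns the word frequency list in descending order.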
Different transcription conventions are further supported by specialised virtual keyboards that provide symbols not easily accessible through the physical keyboard.