User story: ExmaraldaR – Processing (Annotated) Transcripts in R

The R package ExmaraldaR is intended to allow easy processing of (annotated) transcripts in R (R Core Team 2020). R is a free platform for statistical analysis, data preparation and Natural Language Processing. It thus offers numerous options that are also of interest for the study of spoken language. With the package, one or more annotated transcriptions (*.exb) can be read in. The annotated transcripts are converted into a table object that can be used for further work (see Figure 1).

Figure 1: Table object
Figure 2: Network graph from a fictitious file

The table contains a consecutive numbering of intonation phrases (IPs) based on the GAT2 conventions (Selting et al. 2009), the speaker abbreviation (sigle), the ID of the tier, the speaker name, the transcription text, the metadata from the speaker table (optional), the timestamps of the event and the annotations. The annotations are assigned directly to the transcribed text. Different annotation formats are possible (complex annotation tags that are separated into their parts, or multiple annotation tiers). Descriptive tiers can also be integrated. In future, it should also be possible to transfer changes made in R or to the table (e.g. after an export to Excel) directly back into the underlying files to simplify post-processing. Templates for integration into R Shiny applications are available on request (see Figure 2). If you have any questions or would like to test the package, please contact timo.schuermann@uni-meunster.de.
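To illustrate the general idea of turning an annotated transcript into a table, here is a minimal sketch in Python. It is not the ExmaraldaR implementation and does not use its API. It assumes the usual EXMARaLDA basic-transcription layout (a common timeline of `tli` elements and tiers of `event` elements), uses a small hand-written sample instead of a real file, and makes the simplifying assumption that an annotation belongs to a transcription event only if both share the same speaker, start and end points. The function name `exb_to_rows` and the column names are hypothetical, and the IP numbering described above is not reproduced.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the (assumed) EXMARaLDA basic-transcription layout.
SAMPLE_EXB = """<basic-transcription>
  <head>
    <speakertable>
      <speaker id="SPK0"><abbreviation>AB</abbreviation></speaker>
    </speakertable>
  </head>
  <basic-body>
    <common-timeline>
      <tli id="T0" time="0.0"/>
      <tli id="T1" time="1.2"/>
      <tli id="T2" time="2.5"/>
    </common-timeline>
    <tier id="TIE0" speaker="SPK0" category="v" type="t">
      <event start="T0" end="T1">hello </event>
      <event start="T1" end="T2">world</event>
    </tier>
    <tier id="TIE1" speaker="SPK0" category="pos" type="a">
      <event start="T0" end="T1">ITJ</event>
      <event start="T1" end="T2">NN</event>
    </tier>
  </basic-body>
</basic-transcription>"""


def exb_to_rows(xml_text):
    """Return one dict per transcription event (hypothetical helper)."""
    root = ET.fromstring(xml_text)

    # Timeline: map each timeline item ID to its time in seconds (None if absent).
    times = {}
    for tli in root.iter("tli"):
        raw = tli.get("time")
        times[tli.get("id")] = float(raw) if raw is not None else None

    # Speaker table: map speaker IDs to their abbreviation (sigle).
    sigles = {
        spk.get("id"): spk.findtext("abbreviation", default="")
        for spk in root.iter("speaker")
    }

    # Collect annotation events, keyed by (speaker, start, end).
    annotations = {}
    for tier in root.iter("tier"):
        if tier.get("type") != "a":
            continue
        for ev in tier.iter("event"):
            key = (tier.get("speaker"), ev.get("start"), ev.get("end"))
            annotations.setdefault(key, {})[tier.get("category")] = (ev.text or "").strip()

    # Build one row per transcription event and attach matching annotations.
    rows = []
    for tier in root.iter("tier"):
        if tier.get("type") != "t":
            continue
        for ev in tier.iter("event"):
            key = (tier.get("speaker"), ev.get("start"), ev.get("end"))
            rows.append({
                "speaker": sigles.get(tier.get("speaker"), ""),
                "tier_id": tier.get("id"),
                "start": times.get(ev.get("start")),
                "end": times.get(ev.get("end")),
                "text": (ev.text or "").strip(),
                "annotations": annotations.get(key, {}),
            })
    return rows


rows = exb_to_rows(SAMPLE_EXB)
for row in rows:
    print(row)
```

Each resulting row combines the speaker abbreviation, tier ID, timestamps, text and annotations of one event, which roughly mirrors the columns described above. Annotations that span several events would need a more elaborate alignment than this exact-match lookup.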

A beta version, which is under constant development, can be found here:

https://github.com/TimoSchuer/ExmaraldaR

References

R Core Team (2020): R. A Language and Environment for Statistical Computing. Version 3.6.0. R Foundation for Statistical Computing. Available online at https://www.R-project.org.

Selting, Margret; Auer, Peter; Barth-Weingarten, D.; Bergmann, Jörg; Bergmann, Pia; Birkner, Karin et al. (2009): Gesprächsanalytisches Transkriptionssystem 2 (GAT 2). In: Gesprächsforschung 10, pp. 353–402.

New EXMARaLDA distributions

EXMARaLDA is currently undergoing major surgery: the tools have to be adapted to work with 64-bit systems, OpenJDK and Java 11+. This is necessary, among other things, to make EXMARaLDA work with video support on macOS Catalina. After a lot of fiddling with the details, preview versions are now available on the Preview Download page.

As a positive side effect, video support on Windows is also improved: the new JavaFX player can now be used to work with MPEG-4 videos.

FOLK Update in DGD 2.14

On Monday, April 27th, a new version of the Database for Spoken German (Datenbank für Gesprochenes Deutsch, DGD) went online.

Chancellor Merkel in a parliamentary debate (FOLK_E_00390)

This version contains an extension of the Research and Teaching Corpus of Spoken German (FOLK), a corpus of 300 hours / 3 million words of transcribed audio and video recordings of spoken interaction from various private, institutional and public contexts.

Transcription in FOLK is carried out with FOLKER, orthographic normalisation with OrthoNormal, both parts of the EXMARaLDA system.

EADH Workshop: Annotation of digital oral data collections in the Humanities and Social Sciences

The workshop “Annotation of digital oral data collections in the Humanities and Social Sciences” is one of the nine workshops taking place during the conference “Data in Digital Humanities”, hosted by the European Association for Digital Humanities (EADH) at the National University of Ireland, Galway on 9/Dec/2018. The workshop description reads as follows:

In many scientific fields, ranging from phonetics, applied linguistics or discourse analysis, to literary studies, sociology and history among others, annotation is the common ground for systematic and empirical analysis of oral data. While the structure and the theoretical basis for the annotation and the preferred methods of analysis might differ, the main aspects and the specific conditions pertaining to the modality of the data are shared across disciplines. In this workshop, we will first give an introduction to theoretical issues and frameworks relevant to annotation in general and discuss current methodological approaches to the annotation of oral data. In hands-on sessions we will then present and compare existing tools and web services, including editors for manual transcription and annotation such as EXMARaLDA, WebAnno or OCTRA, and their interaction with and integration of automatic web services such as WebMAUS.

Date: 07.12.2018
Time: 9:00–17:00
Place: National University of Ireland, Galway, building and room tba

Schedule
9:00 Welcome
9:15 Introduction and motivation
9:30 Principles of Transcription and Annotation
10:30 Coffee break
11:00 Workflow
12:30 Lunch
14:00 Hands-on session I: OCTRA, WebAnno
15:00 Coffee break
15:30 Hands-on session II: EXMARaLDA, WebMAUS
16:30 Discussion and Summary
17:00 End of workshop

For the workshop, please install the following software on your computer:

  • Google Chrome or Mozilla Firefox
  • EXMARaLDA

It would be very helpful if you could share some of your data with the other participants in the course. Please upload sample audio data to the following Dropbox folder:

https://www.dropbox.com/sh/gusotxe8br0ptlr/AAD0kinW3RkUSQ5u9gu7DzcYa?dl=0

Please register for the workshop by sending an email to draxler@phonetik.uni-muenchen.de by 03/Dec/2018.

More Information

Workshop Annotation of digital oral data collections in the Humanities and Social Sciences

Conference Data in Digital Humanities