EG-Spanish Corpus on Swiss repository LaRS/SWISSUbase

By Sandra Schlumpf-Thurnherr (Universität Basel)

The EG-Spanish Corpus was collected as part of the research project “Improving the visibility of Equatorial Guinea as a Spanish-speaking country”, funded by the Swiss National Science Foundation (project number 192228, https://data.snf.ch/grants/grant/192228, PI: Sandra Schlumpf-Thurnherr) and hosted at the University of Basel, Switzerland.

One of the main objectives of the research project was to conduct a systematic and methodologically broad-based data collection in Equatorial Guinea, a former colony of Spain and today the only Spanish-speaking country in central Africa. The data were collected by Sandra Schlumpf-Thurnherr and Sara Carreira during a one-month research stay in Equatorial Guinea in 2022. To capture the geographic variation of the sociolinguistic landscape within the country, we worked both on the island of Bioko, where the capital Malabo is located, and in the continental part of the country, Río Muni. In total, the EG-Spanish Corpus contains data from 186 participants (89 men, 97 women) who live in different parts of the country, both urban and rural, and represent different socio-demographic profiles in terms of age, ethnicity, place of birth, level of education, and professional situation.

The EG-Spanish Corpus is unique of its kind and therefore highly relevant for researchers interested in the Spanish language in general and, more specifically, in the linguistic situation in Equatorial Guinea. So far, it has been used for several dialectological and sociolinguistic analyses, covering the current use and values of Spanish in Equatorial Guinea; language attitudes and ideologies (e.g., Schlumpf 2025); the contact situation between Spanish and local languages such as Fang, Bubi, and Pichi; and selected morphosyntactic features of Equatoguinean Spanish (e.g., Carreira 2024).

Specifically, the EG-Spanish Corpus includes data collected through five different methods (for further details, see Schlumpf & Carreira 2024): a sociolinguistic questionnaire (all 186 participants involved in the study); a linguistic questionnaire (152 participants: 73 men, 79 women); a narrative task, which consisted of an adapted version of the story Frog, where are you? by Mercer Mayer (1969) (135 participants: 66 men, 69 women); a semi-structured interview in the form of a life story interview (62 interviews: 30 men, 32 women; 47 hours and 37 minutes in total); and a Verbal Guise Test (58 participants: 35 men, 23 women).

The narrative task and a selection of 36 interviews were transcribed using EXMARaLDA. This corresponds to a total of 393 minutes (43 925 tokens) for the 135 narrative tasks and a total of 27 hours and 52 minutes (263 698 tokens) for the selected interviews. Subsequently, POS tagging was carried out using TreeTagger. Finally, EXMARaLDA enabled (meta)linguistic searches and the annotation of concordances thanks to its integrated analysis software EXAKT (EXMARaLDA Analysis and Concordance Tool).

The written data (the sociolinguistic questionnaire, the linguistic questionnaire, and the questionnaire of the Verbal Guise Test) were collected by hand on paper due to the circumstances on site. All data were subsequently digitized and systematically prepared so that they could be processed with SPSS.


The project has now also been registered on LaRS (Language Repository of Switzerland), which is hosted at the Swiss repository SWISSUbase. This is a very important step toward increasing the visibility of Equatorial Guinea, not only within the Swiss research community, but also internationally. The repository entry provides general information about the project as well as details about the various datasets of the EG-Spanish Corpus, each of which has been assigned a permanent DOI (see below): the sociolinguistic questionnaire, the linguistic questionnaire, the semi-structured interview, the narrative task, and the Verbal Guise Test.

Although most of the research data could not be made available online for data protection reasons, metadata have been stored for each dataset. Furthermore, in the case of the narrative task, the complete transcripts are available for open-access download. In addition, interested researchers can contact the principal investigator for further information about the project and the research data.

References

Journal articles/Book chapters

Carreira, S. (2024). El español en contacto con lenguas bantúes y el francés: nuevos datos acerca de la estructura «verbo de movimiento + a / en + destino» en el español de Guinea Ecuatorial. Revista de Investigación Lingüística, 27, pp. 15-39. https://doi.org/10.6018/ril.592821

Schlumpf, S. (2025). Between the Local and the Global: Language Ideologies in Post-Colonial Equatorial Guinea. In Bürki, Y., & García Agüero, A. N. (Eds.), Language, Borders and Bordering Practices / Lenguaje, fronteras y prácticas de fronterización: Sociolinguistic Perspectives / Perspectivas sociolingüísticas. Berlin & Boston: De Gruyter, pp. 435-466. https://doi.org/10.1515/9783111034225-016

Schlumpf, S., & Carreira, S. (2024). Presentación de un corpus para el estudio del español actual en Guinea Ecuatorial. Boletín de Filología, 59(1), pp. 403-436. https://boletinfilologia.uchile.cl/index.php/BDF/article/view/75046/76420

Datasets

Schlumpf-Thurnherr, S. (2025). EG-Spanish Corpus: Sociolinguistic questionnaire (Version 1.0) [Data set]. LaRS – Language Repository of Switzerland. https://doi.org/10.48656/k5v5-2869

Schlumpf-Thurnherr, S., & Carreira, S. (2025). EG-Spanish Corpus: Linguistic questionnaire (Version 1.0) [Data set]. LaRS – Language Repository of Switzerland. https://doi.org/10.48656/kq3s-dv13

Schlumpf-Thurnherr, S. (2025). EG-Spanish Corpus: Semi-structured interview (Version 1.0) [Data set]. LaRS – Language Repository of Switzerland. https://doi.org/10.48656/g9eg-an83

Schlumpf-Thurnherr, S., & Carreira, S. (2025). EG-Spanish Corpus: Narrative task (Version 1.0) [Data set]. LaRS – Language Repository of Switzerland. https://doi.org/10.48656/8q22-3z08

Schlumpf-Thurnherr, S. (2025). EG-Spanish Corpus: Verbal Guise Test (Version 1.0) [Data set]. LaRS – Language Repository of Switzerland. https://doi.org/10.48656/m8jw-0b66