TRANSLITERATION BETWEEN SPOKEN LANGUAGE CORPORA: MOVING BETWEEN DANISH BYSOC AND SWEDISH GSLC
The paper discusses problems that arise when a spoken language corpus transcribed and formatted according to one standard is transferred into the standard and format of another corpus. Some of these problems stem from differences between the standards and formats of the two corpora. Others stem from human error and from a lack of reliability in the transcription process. The discussion is based on transfer and transliteration between two specific corpora: the Swedish GSLC (Göteborg Spoken Language Corpus) and the Danish BySoc (By Sociolingvistik Corpus). Even so, we believe that the problems documented and highlighted here are of a general kind and must be faced whenever spoken language corpora in different formats are to be compared.
Institutionen för lingvistik (Department of Linguistics)