Training for the Unexpected: Approaching Universal Phone Recognition for Computer-Assisted IPA Transcription of Low-Resource Languages
Abstract
We set out to develop a language-agnostic automatic speech recognition (ASR) model
for the phonetic transcription of speech into the International Phonetic Alphabet (IPA).
While NLP and ASR have made immense leaps in research and quality, most of
the world’s languages are still excluded from this development.
In the interest of aiding documentation and linguistic work with low-resource languages,
we examine the possibility of universal Speech-to-IPA (STIPA) transcription by
exploring the cross-lingual transfer of STIPA knowledge, as learnt from high-resource
languages, to unseen and low-resource languages in zero-shot settings. Our specific goal
is to apply and evaluate cross-lingual STIPA on the severely endangered language
Sanna (also known as “Cypriot Maronite Arabic”; described in, e.g., Borg, 2011).
Degree
Student essay
Date
2025-06-13
Author
Lee Suchardt, Jacob
Keywords
machine learning, automatic-speech-recognition, cross-lingual transfer learning, phonetic transcription, speech-to-IPA, low-resource languages, Whisper
Language
eng