Speech-to-speech translation using deep learning
Abstract
Current state-of-the-art speech-to-speech translation systems rely heavily on a
text representation for the translation. By transcoding speech to text, we lose important
information about the characteristics of the voice, such as emotion, pitch,
and accent. This thesis examines the possibility of using an LSTM neural network
model to translate speech to speech without the need for a text representation, that
is, by translating the raw audio data directly in order to preserve the characteristics
of the voice that are otherwise lost in the text transcoding step of the
translation process. As part of this research, we create a data set of phrases suitable
for speech-to-speech translation tasks. The thesis results in a proof-of-concept system;
the underlying deep neural network needs to be scaled up for the system to work better.
Degree
Student essay
Date
2017-03-17
Author
Bredmar, Fredrik
Keywords
Neural Networks
Deep Learning
LSTM
RNN
Speech-to-speech translation
Language
eng