Title :
Identification of spoken European languages
Author :
Caseiro, Diamantino ; Trancoso, Isabel
Author_Institution :
INESC/IST, INESC, Lisbon, Portugal
Abstract :
Automatic spoken language identification is the problem of identifying the language being spoken from a sample of speech by an unknown speaker. In this paper we study language identification in the context of the European languages, which allows us to study the effect of language proximity among Indo-European languages. The results reveal that this proximity has a significant impact on the identification of some languages. Current language identification systems vary in their complexity. The systems that use higher-level information perform best; however, that information is hard to collect for each new language. The system presented in this work is easily extended to new languages because it uses very little linguistic information. In fact, it needs only one language-specific phone recogniser (in our case, the Portuguese one) and is trained with speech from each of the other languages. On the SpeechDat-M corpus, which covers 6 European languages (English, French, German, Italian, Portuguese and Spanish), our system achieved an identification rate of about 79% on 5-second utterances.
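The abstract describes a single phone recogniser feeding per-language models trained on speech from each language, but it does not specify the back-end classifier. A common architecture that fits this description is phone recognition followed by language modelling (PRLM). The minimal sketch below assumes that architecture, with add-one smoothed phone bigram models per language, and uses toy phone strings in place of real Portuguese recogniser output. All names (`PhoneBigramLM`, `identify`, `identification_rate`, `lang_A`, `lang_B`, the phone inventory and training strings) are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical phone inventory standing in for the Portuguese recogniser's output labels.
PHONES = "aeiouktpsrnl"


class PhoneBigramLM:
    """Add-one smoothed phone bigram model for one target language (assumed back-end)."""

    def __init__(self, phone_set):
        # Tokens that may follow a context: every phone plus the end-of-utterance marker.
        self.next_vocab = set(phone_set) | {"</s>"}
        self.bigrams = Counter()   # counts of (previous, current) phone pairs
        self.contexts = Counter()  # counts of each phone used as a context

    def train(self, phone_sequences):
        for seq in phone_sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1
        return self

    def log_prob(self, seq):
        # Sum of smoothed bigram log-probabilities over the padded phone sequence.
        padded = ["<s>"] + list(seq) + ["</s>"]
        v = len(self.next_vocab)
        return sum(
            math.log((self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + v))
            for prev, cur in zip(padded, padded[1:])
        )


def identify(phone_seq, models):
    """Return the language whose phone model gives the decoded sequence the highest likelihood."""
    return max(models, key=lambda lang: models[lang].log_prob(phone_seq))


def identification_rate(test_set, models):
    """Fraction of (phone_seq, true_lang) pairs that are identified correctly."""
    correct = sum(identify(seq, models) == lang for seq, lang in test_set)
    return correct / len(test_set)


# Toy "decoded" training data per language (purely illustrative, not SpeechDat-M).
TRAIN = {
    "lang_A": ["tatata", "tata", "atata"],
    "lang_B": ["kokoko", "koko", "okoko"],
}
models = {lang: PhoneBigramLM(PHONES).train(seqs) for lang, seqs in TRAIN.items()}

print(identify("tatat", models))  # lang_A
```

Add-one smoothing keeps a single unseen phone pair from driving a language's score to zero; a real system would likely use higher-order n-grams and stronger smoothing, and the identification rate would be measured on held-out utterances rather than toy strings.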
Keywords :
natural language processing; speaker recognition; English language; French language; German language; Indo-European languages; Italian language; Portuguese language; Spanish language; SpeechDat-M corpus; automatic spoken European language identification system; language proximity; language specific phone recogniser; linguistic information; speaker identification; Computational modeling; Computer architecture; Europe; Hidden Markov models; Pragmatics; Speech; Speech recognition;
Conference_Title :
9th European Signal Processing Conference (EUSIPCO 1998)
Conference_Location :
Rhodes
Print_ISBN :
978-960-7620-06-4