Title :
Acoustic-support vector machines approach to detect spoken Arabic language
Author :
Eltayeb, Mohammed Osman ; Mustafa, Mohammed Elhafiz
Author_Institution :
Sudan Univ. of Sci. & Technol., Khartoum, Sudan
Abstract :
Spoken language detection is the process of accepting or rejecting a claimed language identity for a speech sample. The process is essential because it is the first phase of any complete multilingual speech processing application. However, most efforts have focused on European languages, and relatively little research addresses other languages such as Arabic. This is mainly due to the lack of tools and resources, e.g., Arabic speech corpora. Furthermore, the majority of the proposed approaches for Arabic detection are language-dependent rather than language-independent; in a language-independent approach, the model uses only the acoustic properties of the speech signal. This paper describes ongoing research to develop a language-independent Modern Standard Arabic (MSA) detector: a binary Support Vector Machines (SVM) classifier based on acoustic features of speech. In that context, the classifier assigns each speech utterance to either classA, which represents the Arabic language, or classNA, which denotes non-Arabic languages. Because most currently available speech corpora are license-restricted and their languages are selected based on population or geographical distribution, a new multilingual speech corpus with six languages is being created. Languages in this corpus have some similarity with MSA, e.g., Arabic and Hebrew. This property adds another dimension of complexity to the classification task, but it is essential, as one of the major goals of this research is to measure whether the efficiency of the MSA model is preserved at the same level when it is tested with other languages that are related to MSA or to other Arabic dialects. This paper refers to this property as the stability-against-similarity of the model.
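The binary classA/classNA decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the acoustic features, the SVM kernel, or the training procedure, so the sketch assumes a linear SVM trained by sub-gradient descent on the hinge loss, and it uses synthetic 4-dimensional vectors as hypothetical stand-ins for per-utterance acoustic features. All function names and parameter values are illustrative.

```python
import random


def make_utterances(center, count, dim, rng):
    # Synthetic stand-ins for per-utterance acoustic feature vectors.
    # Assumption: real features would be extracted from speech audio.
    return [[rng.gauss(center, 0.5) for _ in range(dim)] for _ in range(count)]


def decision(w, b, x):
    # Signed score of the linear decision function w.x + b.
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(samples, labels, lam=0.01, eta=0.01, epochs=50, seed=0):
    # Sub-gradient descent on the L2-regularised hinge loss.
    # Labels are +1 (classA, Arabic) and -1 (classNA, non-Arabic).
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    b = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            violated = y * decision(w, b, x) < 1.0
            # The regularisation term shrinks w on every step.
            w = [wj * (1.0 - eta * lam) for wj in w]
            if violated:
                # Margin violation: move the boundary toward classifying x correctly.
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
                b += eta * y
    return w, b


def predict(w, b, x):
    return "classA" if decision(w, b, x) >= 0.0 else "classNA"


rng = random.Random(42)
DIM = 4  # hypothetical feature dimension
train_a = make_utterances(2.0, 40, DIM, rng)
train_na = make_utterances(-2.0, 40, DIM, rng)
w, b = train_linear_svm(train_a + train_na, [1] * 40 + [-1] * 40)

# Evaluate on held-out synthetic utterances drawn from the same distributions.
test_a = make_utterances(2.0, 20, DIM, rng)
test_na = make_utterances(-2.0, 20, DIM, rng)
correct = sum(predict(w, b, x) == "classA" for x in test_a)
correct += sum(predict(w, b, x) == "classNA" for x in test_na)
accuracy = correct / 40
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The two synthetic clusters are deliberately well separated, so the sketch says nothing about the harder case the abstract targets, in which classNA contains languages acoustically close to MSA.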
Keywords :
acoustic signal detection; natural language processing; signal classification; speech processing; support vector machines; text analysis; Arabic speech corpora; MSA detector; MSA model; SVM classifier; acoustic properties; acoustic-support vector machine approach; binary support vector machine classifier; classA; classNA; geographical distribution; language independent modern standard Arabic detector; multilingual-enabled speech processing applications; nonArabic languages; speech acoustic feature; speech signal; speech utterance classification; spoken Arabic language detection; Detectors; Hidden Markov models; Speech; Speech processing; Support vector machines; Testing; Training; Acoustic Model; Arabic Language; Language Detection; Language Identification; Language Recognition; Modern Standard Arabic; Speech Corpus; Support Vector Machines;
Conference_Title :
Computing, Electrical and Electronics Engineering (ICCEEE), 2013 International Conference on
Conference_Location :
Khartoum
Print_ISBN :
978-1-4673-6231-3
DOI :
10.1109/ICCEEE.2013.6633994