Title :
Feature Selection Based on Mutual Information for Language Recognition
Author :
Deng, Yan ; Liu, Jia
Author_Institution :
Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
Abstract :
The prevailing approach to language recognition is parallel phoneme recognition followed by vector space modeling (PPRVSM), which uses a vector space model to describe the co-occurrence information of phones. Because the super-vectors are composed of phonetic N-Grams, the number of N-Grams, and hence the vector dimension, grows exponentially with the order N, which leads to data sparseness. In this paper, we propose a feature selection algorithm that addresses this problem by applying a maximum relevance criterion based on mutual information to select the N-Grams that are most discriminative for identifying languages. The effectiveness of the technique is demonstrated on the NIST 2005 language recognition 30-second task, where we achieve an equal error rate (EER) of 4.81%.
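As an illustration of the maximum relevance idea summarized in the abstract, the following is a minimal sketch rather than the authors' implementation. It assumes each utterance is reduced to the set of phonetic N-Grams it contains (a presence/absence simplification that the abstract does not specify), scores every N-Gram by its mutual information with the language label, and keeps the top k. The function names `mutual_information` and `select_max_relevance` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature_values, labels):
    """Mutual information I(X; Y) in bits between a discrete feature and labels."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    px = Counter(feature_values)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def select_max_relevance(docs, labels, k):
    """Rank N-Grams by mutual information with the label and keep the top k.

    Assumption: each item of `docs` is a set of N-Gram strings, so a feature
    is simply the presence (1) or absence (0) of that N-Gram in an utterance.
    """
    vocab = sorted({gram for doc in docs for gram in doc})
    scores = {}
    for gram in vocab:
        presence = [1 if gram in doc else 0 for doc in docs]
        scores[gram] = mutual_information(presence, labels)
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(vocab, key=lambda gram: (-scores[gram], gram))
    return ranked[:k], scores
```

Calling `select_max_relevance(docs, labels, k)` returns the k highest-scoring N-Grams together with the full score table. An N-Gram that occurs in all utterances of one language and in none of another scores highest, while one that is spread evenly across languages scores zero and is discarded first.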
Keywords :
natural language processing; speech recognition; data sparseness; feature selection algorithm; language recognition; maximum relevance criteria; parallel phoneme recognition; phonetic N-Grams; super vectors; vector space modeling; Hidden Markov models; Information science; Laboratories; Lattices; Mutual information; Natural languages; Probability; Space technology; Support vector machine classification; Support vector machines;
Conference_Title :
Image and Signal Processing, 2009. CISP '09. 2nd International Congress on
Conference_Location :
Tianjin
Print_ISBN :
978-1-4244-4129-7
Electronic_ISBN :
978-1-4244-4131-0
DOI :
10.1109/CISP.2009.5303829