Title :
A fusion approach to spoken language identification based on combining multiple phone recognizers and speech attribute detectors
Author :
Yannan Wang ; Jun Du ; Lirong Dai ; Chin-Hui Lee
Author_Institution :
Nat. Eng. Lab. for Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
Abstract :
We propose a fusion approach to spoken language recognition that combines multiple tokenizers, built from phone and speech attribute models trained on a collection of multilingual corpora with different front-end features. The speech attribute models are trained with bottleneck features extracted from deep neural networks, while the phone models are trained with temporal pattern neural network features. By exploring different combinations of front-end features, fundamental speech units, and tokenization models, we demonstrate that speech attribute units are complementary to phone units and improve performance when combined with conventional phone-based tokenizers. On the National Institute of Standards and Technology (NIST) 2009 language recognition evaluation task, exploiting diversity in system combination, speech attribute recognition followed by language modeling achieves an additional average relative equal error rate reduction of more than 20% when fused with state-of-the-art systems based on phone recognition followed by language modeling.
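The headline metric above, a relative equal error rate (EER) reduction after fusion, can be illustrated with a minimal sketch. This record does not describe the authors' fusion back-end or scoring tools, so everything here is an assumption: the function names (`fuse_scores`, `equal_error_rate`, `relative_reduction`) are hypothetical, the fusion is a plain weighted average of two systems' detection scores, and the EER is approximated by a simple threshold sweep. The numbers in the usage lines are illustrative only, not results from the paper.

```python
from typing import Sequence, List


def fuse_scores(a: Sequence[float], b: Sequence[float], weight: float = 0.5) -> List[float]:
    """Hypothetical score-level fusion: weighted average of two systems' scores."""
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]


def equal_error_rate(target_scores: Sequence[float],
                     nontarget_scores: Sequence[float]) -> float:
    """Approximate the EER by sweeping a threshold over all observed scores."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # Miss: a target trial scored below the threshold.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target trial scored at or above the threshold.
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (p_miss + p_fa) / 2.0
    return eer


def relative_reduction(baseline_eer: float, fused_eer: float) -> float:
    """Relative EER reduction: (baseline - fused) / baseline."""
    return (baseline_eer - fused_eer) / baseline_eer


# Illustrative usage with made-up scores and EER values.
toy_eer = equal_error_rate([0.9, 0.8, 0.4], [0.1, 0.5, 0.3])
toy_gain = relative_reduction(0.05, 0.04)  # a drop from 5% to 4% EER
```

Under this definition, a "more than 20%" relative reduction means the fused system's EER is below 0.8 times the baseline EER, whatever the absolute values are.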
Keywords :
feature extraction; neural nets; speech recognition; bottleneck feature extraction; front-end features; fusion approach; language modeling; language recognition evaluation task; multilingual corpora; phone attribute models; phone based tokenizers; phone recognition; phone recognizers; phone units; speech attribute detectors; speech attribute models; speech attribute recognition; speech attribute units; spoken language identification; spoken language recognition; temporal pattern neural network features; tokenization models; Acoustics; Hidden Markov models; NIST; Neural networks; Speech; automatic speech attribute transcription; bottleneck features; deep neural network; manner and place of articulation; phone recognition followed by language modeling; phonetic features
Conference_Title :
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location :
Singapore
DOI :
10.1109/ISCSLP.2014.6936714