Title :
A Target-Oriented Phonotactic Front-End for Spoken Language Recognition
Author :
Tong, Rong ; Ma, Bin ; Li, Haizhou ; Chng, Eng Siong
Author_Institution :
Inst. for Infocomm Res., Singapore, Singapore
Abstract :
This paper presents a strategy to optimize the phonotactic front-end for spoken language recognition. This is achieved by selecting a subset of phones from an existing phone recognizer's phone inventory such that only the phones that best discriminate each of the target languages are selected. Each such phone subset is used to construct a target-oriented phone tokenizer (TOPT). In this study, we examine different approaches to constructing such phone tokenizers for the front-end of a system based on parallel phone recognizers followed by vector space modeling (PPR-VSM). We show that the target-oriented phone tokenizers derived from language-specific phone recognizers are more effective than the original parallel phone recognizers. Our experimental results also show that the target-oriented phone tokenizers derived from universal phone recognizers achieve better performance than those derived from language-specific phone recognizers. Using the proposed target-oriented phone tokenizers as the phonotactic front-end, the language recognition system performance is significantly improved without the need for additional training samples. We achieve equal error rates (EERs) of 1.27%, 1.42%, and 2.73% on the NIST 1996, 2003, and 2007 LRE databases, respectively, for 30-s closed-set tests. This system is one of the subsystems in IIR's submission to the NIST 2007 LRE.
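The results above are reported as equal error rates. As a reminder of what that metric measures, here is a minimal sketch of how an EER can be approximated from detection scores. It is an illustration only, not the evaluation procedure used by the authors or by NIST: the function name `equal_error_rate` and the toy scores are hypothetical, and the threshold sweep is a simple approximation that picks the observed score at which the miss rate and the false-alarm rate are closest.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by trying every observed score as a threshold.

    A trial is accepted when its score is >= the threshold.
    Miss rate: fraction of target trials rejected.
    False-alarm rate: fraction of non-target trials accepted.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")

    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap = None
    best_eer = None
    for t in thresholds:
        miss = sum(s < t for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        # Keep the operating point where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_eer = (miss + false_alarm) / 2
    return best_eer


# Toy scores (made up for illustration, not taken from the paper).
toy_targets = [0.9, 0.8, 0.7, 0.3]
toy_nontargets = [0.6, 0.4, 0.2, 0.1]
toy_eer = equal_error_rate(toy_targets, toy_nontargets)
print(f"EER = {toy_eer:.2%}")
```

With these toy scores, one of four target trials and one of four non-target trials fall on the wrong side of the best threshold, so both error rates are equal there. A production evaluation would normally interpolate along the detection error trade-off curve rather than restrict itself to observed scores.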
Keywords :
speech recognition; vectors; language-specific phone recognizer; parallel phone recognizer; phone inventory; phone recognizer; spoken language recognition; target-oriented phone tokenizer; target-oriented phonotactic front-end; vector space modeling; Cepstral analysis; Error analysis; Humans; Mel frequency cepstral coefficient; NIST; Natural languages; Speech processing; System performance; Target recognition; Feature selection; parallel phone recognizer (PPR); phonotactic feature; target-oriented phone tokenizer (TOPT); universal phone recognizer
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2009.2016731