Regional Information Center for Science and Technology - A Target-Oriented Phonotactic Front-End for Spoken Language Recognition

DocumentCode :

1135700

Title :

A Target-Oriented Phonotactic Front-End for Spoken Language Recognition

Author :

Tong, Rong ; Ma, Bin ; Li, Haizhou ; Chng, Eng Siong

Author_Institution :

Inst. for Infocomm Res., Singapore, Singapore

Volume :

Issue :

fYear :

2009

Firstpage :

1335

Lastpage :

1347

Abstract :

This paper presents a strategy to optimize the phonotactic front-end for spoken language recognition. This is achieved by selecting a subset of phones from an existing phone recognizer's phone inventory such that only the phones that best discriminate each of the target languages are selected. Each such phone subset is used to construct a target-oriented phone tokenizer (TOPT). In this study, we examine different approaches to constructing such phone tokenizers for the front-end of a system based on parallel phone recognizers followed by vector space modeling (PPR-VSM). We show that the target-oriented phone tokenizers derived from language-specific phone recognizers are more effective than the original parallel phone recognizers. Our experimental results also show that the target-oriented phone tokenizers derived from universal phone recognizers achieve better performance than those derived from language-specific phone recognizers. Using the proposed target-oriented phone tokenizers as the phonotactic front-end, the language recognition system performance is significantly improved without the need for additional training samples. We achieve equal error rates (EERs) of 1.27%, 1.42%, and 2.73% on the NIST 1996, 2003, and 2007 LRE databases, respectively, for 30-s closed-set tests. This system is one of the subsystems in IIR's submission to the NIST 2007 LRE.
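The phone-selection idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the most discriminative phones are chosen, so the scoring rule used here (a smoothed absolute log-ratio of a phone's relative frequency in target versus non-target decodings), the function names, and the choice to simply drop unselected phones are all hypothetical stand-ins, not the authors' method. The phone-bigram counts at the end stand in for the phonotactic features a VSM back-end would consume.

```python
from collections import Counter
from math import log


def phone_scores(target_seqs, nontarget_seqs, smoothing=1.0):
    """HYPOTHETICAL criterion: smoothed |log ratio| of a phone's relative
    frequency in target-language decodings vs. non-target decodings."""
    tgt = Counter(p for seq in target_seqs for p in seq)
    non = Counter(p for seq in nontarget_seqs for p in seq)
    inventory = set(tgt) | set(non)
    vocab = len(inventory)
    n_tgt, n_non = sum(tgt.values()), sum(non.values())
    scores = {}
    for p in inventory:
        # Add-`smoothing` estimates so unseen phones get a finite score.
        p_tgt = (tgt[p] + smoothing) / (n_tgt + smoothing * vocab)
        p_non = (non[p] + smoothing) / (n_non + smoothing * vocab)
        scores[p] = abs(log(p_tgt / p_non))
    return scores


def select_phone_subset(target_seqs, nontarget_seqs, k):
    """Keep the k highest-scoring phones (ties broken alphabetically)."""
    scores = phone_scores(target_seqs, nontarget_seqs)
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


def tokenize_with_subset(seq, subset):
    """Illustrative assumption: phones outside the subset are dropped."""
    keep = set(subset)
    return [p for p in seq if p in keep]


def bigram_counts(seq):
    """Phone-bigram counts, a stand-in for a VSM phonotactic feature vector."""
    return Counter(zip(seq, seq[1:]))


if __name__ == "__main__":
    # Toy decodings from a single phone recognizer (made-up data).
    target = [["a", "b", "a", "c"], ["a", "a", "b"]]
    nontarget = [["c", "d", "c"], ["d", "b", "c"]]
    subset = select_phone_subset(target, nontarget, k=2)
    tokens = tokenize_with_subset(["a", "b", "d", "c", "a"], subset)
    print(subset, tokens, bigram_counts(tokens))
```

In a PPR-VSM setup, a subset like this would be selected separately for each target language, which is the sense in which each tokenizer is "target-oriented"; the abstract reports that subsets drawn from universal phone recognizers worked better than those drawn from language-specific ones.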

Keywords :

speech recognition; vectors; language-specific phone recognizer; parallel phone recognizer; phone inventory; phone recognizer; spoken language recognition; target-oriented phone tokenizer; target-oriented phonotactic front-end; vector space modeling; Cepstral analysis; Error analysis; Humans; Mel frequency cepstral coefficient; NIST; Natural languages; Speech processing; System performance; Target recognition; Feature selection; parallel phone recognizer (PPR); phonotactic feature; target-oriented phone tokenizer (TOPT); universal phone recognizer

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2009.2016731

Filename :

5165117

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1135700