Regional Information Center for Science and Technology - An Improved Framework for Recognizing Highly Imbalanced Bilingual Code-Switched Lectures with Cross-Language Acoustic Modeling and Frame-Level Language Identification

DocumentCode :

35672

Title :

An Improved Framework for Recognizing Highly Imbalanced Bilingual Code-Switched Lectures with Cross-Language Acoustic Modeling and Frame-Level Language Identification

Author :

Ching-Feng Yeh ; Lin-Shan Lee

Author_Institution :

Grad. Inst. of Commun. Eng., Nat. Taiwan Univ., Taipei, Taiwan

Volume :

Issue :

fYear :

2015

fDate :

July 2015

Firstpage :

1144

Lastpage :

1159

Abstract :

This paper considers the recognition of a widely observed type of bilingual code-switched speech: the speaker speaks primarily in the host language (usually the speaker's native language), but inserts a few words or phrases in the guest language (usually a second language) into many host-language utterances. In this case, the languages are switched back and forth within an utterance, which makes language identification difficult, and much less data are available for the guest language, which results in poor recognition accuracy for the guest-language part. Unit merging approaches on three levels of acoustic modeling (triphone models, HMM states, and Gaussians) have previously been proposed for cross-lingual data sharing for such highly imbalanced bilingual code-switched speech. In this paper, we present an improved overall framework on top of those unit merging approaches for recognizing such code-switched speech. This includes unit recovery, which reconstructs the identity of the units of the two languages after they have been merged; unit occupancy ranking, which offers much more flexible data sharing between units both across languages and within a language, based on the accumulated occupancy of the HMM states; and estimation of frame-level language posteriors using blurred posteriorgram features (BPFs) for use in decoding. We also present a complete set of experimental results comparing all approaches involved for a real-world application scenario under unified conditions, and show very good improvement achieved with the proposed approaches.

Keywords :

hidden Markov models; natural language processing; speech coding; BPF; HMM states; bilingual code switched speech; blurred posteriorgram features; cross language acoustic modeling; cross lingual data sharing; decoding; flexible data sharing; frame level language identification; host language; language identification; native language; recognizing highly imbalanced bilingual code switched lectures; utterances; Acoustics; Data models; Hidden Markov models; Merging; Speech; Speech coding; Speech recognition; Bilingual; code-switching; cross-language acoustic modeling; language identification; speech recognition;

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE/ACM Transactions on

Publisher :

IEEE

ISSN :

2329-9290

Type :

jour

DOI :

10.1109/TASLP.2015.2425214

Filename :

7090981

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=35672