Title :
An Improved Framework for Recognizing Highly Imbalanced Bilingual Code-Switched Lectures with Cross-Language Acoustic Modeling and Frame-Level Language Identification
Author :
Ching-Feng Yeh ; Lin-Shan Lee
Author_Institution :
Grad. Inst. of Commun. Eng., Nat. Taiwan Univ., Taipei, Taiwan
Abstract :
This paper considers the recognition of a widely observed type of bilingual code-switched speech: the speaker speaks primarily the host language (usually his native language), but with a few words or phrases in the guest language (usually his second language) inserted in many utterances of the host language. In this case, not only the languages are switched back and forth within an utterance so the language identification is difficult, but much less data are available for the guest language, which results in poor recognition accuracy for the guest language part. Unit merging approaches on three levels of acoustic modeling (triphone models, HMM states and Gaussians) have been proposed for cross-lingual data sharing for such highly imbalanced bilingual code-switched speech. In this paper, we present an improved overall framework on top of the previously proposed unit merging approaches for recognizing such code-switched speech. This includes unit recovery for reconstructing the identity for units of the two languages after being merged, unit occupancy ranking to offer much more flexible data sharing between units both across languages and within the language based on the accumulated occupancy of the HMM states, and estimation of frame-level language posteriors using blurred posteriorgram features (BPFs) to be used in decoding. We also present a complete set of experimental results comparing all approaches involved for a real-world application scenario under unified conditions, and show very good improvement achieved with the proposed approaches.
Keywords :
hidden Markov models; natural language processing; speech coding; BPF; HMM states; bilingual code switched speech; blurred posteriorgram features; cross language acoustic modeling; cross lingual data sharing; decoding; flexible data sharing; frame level language identification; host language; language identification; native language; recognizing highly imbalanced bilingual code switched lectures; utterances; Acoustics; Data models; Hidden Markov models; Merging; Speech; Speech coding; Speech recognition; Bilingual; code-switching; cross-language acoustic modeling; language identification; speech recognition;
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
DOI :
10.1109/TASLP.2015.2425214