  • DocumentCode
    35672
  • Title
    An Improved Framework for Recognizing Highly Imbalanced Bilingual Code-Switched Lectures with Cross-Language Acoustic Modeling and Frame-Level Language Identification
  • Author
    Ching-Feng Yeh; Lin-Shan Lee
  • Author_Institution
    Grad. Inst. of Commun. Eng., Nat. Taiwan Univ., Taipei, Taiwan
  • Volume
    23
  • Issue
    7
  • fYear
    2015
  • fDate
    Jul-15
  • Firstpage
    1144
  • Lastpage
    1159
  • Abstract
    This paper considers the recognition of a widely observed type of bilingual code-switched speech: the speaker speaks primarily in the host language (usually the speaker's native language), but inserts a few words or phrases in the guest language (usually a second language) into many host-language utterances. In this case, not only are the languages switched back and forth within an utterance, which makes language identification difficult, but much less data are available for the guest language, which results in poor recognition accuracy for the guest-language portions. Unit merging approaches at three levels of acoustic modeling (triphone models, HMM states, and Gaussians) were previously proposed for cross-lingual data sharing in such highly imbalanced bilingual code-switched speech. In this paper, we present an improved overall framework, built on top of these unit merging approaches, for recognizing such code-switched speech. It includes unit recovery, which reconstructs the identities of the units of the two languages after they have been merged; unit occupancy ranking, which offers much more flexible data sharing between units, both across languages and within a language, based on the accumulated occupancy of the HMM states; and estimation of frame-level language posteriors from blurred posteriorgram features (BPFs) for use in decoding. We also present a complete set of experimental results comparing all the approaches involved under unified conditions for a real-world application scenario, and show that the proposed approaches achieve substantial improvements.
  • Keywords
    hidden Markov models; natural language processing; speech coding; BPF; HMM states; bilingual code switched speech; blurred posteriorgram features; cross language acoustic modeling; cross lingual data sharing; decoding; flexible data sharing; frame level language identification; host language; language identification; native language; recognizing highly imbalanced bilingual code switched lectures; utterances; Acoustics; Data models; Hidden Markov models; Merging; Speech; Speech coding; Speech recognition; Bilingual; code-switching; cross-language acoustic modeling; language identification; speech recognition;
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type
    jour
  • DOI
    10.1109/TASLP.2015.2425214
  • Filename
    7090981