• DocumentCode
    672342
  • Title
    Investigation of multilingual deep neural networks for spoken term detection
  • Author
    Knill, K.M. ; Gales, Mark J.F. ; Rath, Satish Prasad ; Woodland, Philip C. ; Zhang, Chenghui ; Zhang, S.-X.
  • Author_Institution
    Dept. of Eng., Univ. of Cambridge, Cambridge, UK
  • fYear
    2013
  • fDate
    8-12 Dec. 2013
  • Firstpage
    138
  • Lastpage
    143
  • Abstract
    Developing high-performance speech processing systems for low-resource languages is challenging. One way to address the lack of resources is to use data from multiple languages. A popular recent direction is to train bottleneck features, or hybrid systems, on multilingual data for speech-to-text (STT) systems. This paper investigates the application of these multilingual approaches to spoken term detection. Experiments were run on the IARPA Babel limited language pack corpora (~10 hours/language), with 4 languages for initial multilingual system development and an additional held-out target language. STT gains achieved by using multilingual bottleneck features in a Tandem configuration are shown to carry over to keyword search (KWS). Further improvements in both STT and KWS were observed by incorporating language questions into the Tandem GMM-HMM decision trees for the training set languages. Adapted hybrid systems performed slightly worse on average than the adapted Tandem systems. A language-independent acoustic model test on the target language showed that, at a minimum, retraining or adapting the acoustic models to the target language is currently needed to achieve reasonable performance.
  • Keywords
    Gaussian processes; decision trees; hidden Markov models; mixture models; natural language processing; neural nets; speech recognition; speech synthesis; IARPA Babel limited language pack corpora; KWS; STT systems; Tandem configuration; high-performance speech processing systems; hybrid systems; initial multilingual system development; keyword search; language independent acoustic model test; low-resource languages; multilingual bottleneck features; multilingual deep neural networks; speech-to-text systems; spoken term detection; tandem GMM-HMM decision trees; training set languages; Acoustics; Decision trees; Hidden Markov models; Speech; Speech recognition; Training; Training data; Multilingual; keyword search; neural networks; speech recognition; spoken term detection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
  • Conference_Location
    Olomouc
  • Type
    conf
  • DOI
    10.1109/ASRU.2013.6707719
  • Filename
    6707719