• DocumentCode
    134334
  • Title
    A fusion approach to spoken language identification based on combining multiple phone recognizers and speech attribute detectors
  • Author
    Yannan Wang ; Jun Du ; Lirong Dai ; Chin-Hui Lee
  • Author_Institution
    Nat. Eng. Lab. for Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
  • fYear
    2014
  • fDate
    12-14 Sept. 2014
  • Firstpage
    158
  • Lastpage
    162
  • Abstract
    We propose a fusion approach to spoken language recognition that combines multiple tokenizers, using phone and speech attribute models trained on a collection of multilingual corpora with different front-end features. The speech attribute models are trained with bottleneck features extracted from deep neural networks, while the phone models are trained with temporal patterns neural network features. By exploiting different combinations of front-end features, fundamental speech units, and tokenization models, we demonstrate that speech attribute units are complementary to phone units and improve performance when combined with conventional phone-based tokenizers. On the National Institute of Standards and Technology (NIST) 2009 language recognition evaluation task, exploiting diversity in system combination, we find that speech attribute recognition followed by language modeling achieves an additional average relative equal error rate reduction of more than 20% when fused with state-of-the-art systems based on phone recognition followed by language modeling.
  • Keywords
    feature extraction; neural nets; speech recognition; bottleneck feature extraction; front-end features; fusion approach; language modeling; language recognition evaluation task; multilingual corpora; phone attribute models; phone based tokenizers; phone recognition; phone recognizers; phone units; speech attribute detectors; speech attribute models; speech attribute recognition; speech attribute units; spoken language identification; spoken language recognition; temporal pattern neural network features; tokenization models; Acoustics; Feature extraction; Hidden Markov models; NIST; Neural networks; Speech; Speech recognition; automatic speech attribute transcription; bottleneck features; deep neural network; manner and place of articulation; phone recognition followed by language modeling; phonetic features; spoken language recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
  • Conference_Location
    Singapore
  • Type
    conf
  • DOI
    10.1109/ISCSLP.2014.6936714
  • Filename
    6936714