• DocumentCode
    3423219
  • Title

    Improved phonotactic language identification using random forest language models

  • Author

    Wang, XiaoRui ; Wang, ShiJin ; Liang, JiaEn ; Xu, Bo

  • Author_Institution
    Digital Media Content Technol. Res. Center, Chinese Acad. of Sci., Beijing
  • fYear
    2008
  • fDate
    March 31 2008-April 4 2008
  • Firstpage
    4237
  • Lastpage
    4240
  • Abstract
    Recently, a new language model, the random forest language model (RFLM), was proposed and has shown encouraging results in speech recognition tasks. In this paper, we apply the RFLM to language identification tasks. We propose a shared backoff smoothing method to deal with the data sparseness problem. Experiments were conducted on a subset of the NIST 2003 language recognition evaluation data. The RFLM obtained a 15.7% relative error rate reduction compared with the standard trigram LM. The RFLM can be used as a counterpart to the n-gram LM and BTLM for system fusion. We also empirically studied the relation between system performance and the number of trees in an RFLM.
  • Keywords
    decision trees; natural language processing; speech recognition; decision tree; n-gram LM; phonotactic language identification; random forest language models; shared backoff smoothing; sparseness problem; speech recognition; system fusion; Automation; Decision trees; Decoding; History; NIST; Natural languages; Pattern recognition; Random media; Smoothing methods; Speech recognition; Decision Tree Language Models; Language Identification; Random Forest Language Models;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
  • Conference_Location
    Las Vegas, NV
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4244-1483-3
  • Electronic_ISBN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2008.4518590
  • Filename
    4518590