• DocumentCode
    3585058
  • Title

    Training candidate selection for effective rejection in open-set language identification

  • Author

    Zhang, Qian; Hansen, John H. L.

  • Author_Institution
    Center for Robust Speech Syst. (CRSS), Univ. of Texas at Dallas, Richardson, TX, USA
  • fYear
    2014
  • Firstpage
    384
  • Lastpage
    389
  • Abstract
    Research in open-set language identification (LID) generally focuses more on accurate in-set modeling than on improved out-of-set (OOS) rejection. Rejecting unknown or OOS languages is a challenge, since researchers seldom commit as much effort to OOS corpus development as to the desired in-set languages. To address this, we propose an OOS candidate selection method for universal OOS language coverage. Since effective selection requires abundant knowledge of inter-language relationships, three broad measurements across world languages are considered. The proposed OOS selection method is evaluated on a database derived from a large-scale corpus (LRE-09) with a state-of-the-art i-Vector system followed by two back-ends. The baseline system is realized using a random selection of OOS candidates. With the proposed selection method and a probabilistic linear discriminant analysis (PLDA) back-end, OOS rejection performance improves, with false alarm and miss rates achieving relative reductions of 32.6% and 4.4%, respectively. In addition, the overall classification performance, measured by an average cost function, shows relative improvements of 8.4% and 7.5% for the two back-ends.
  • Keywords
    natural language processing; probability; LID; LRE-09; OOS corpus development; OOS language rejection; OOS language selection method; OOS rejection performance improvement; PLDA back-end; average cost function; baseline system; classification performance improvement; false alarm rates; i-Vector system; in-set language modeling; interlanguage relationship knowledge; large-scale corpus; miss rates; open-set language identification; out-of-set rejection; probabilistic linear discriminative analysis back-end; relative reduction; universal OOS language coverage; unknown language rejection; world languages; Abstracts; Acoustics; Pragmatics; Speech; LRE-09; Open-set language identification; Out-of-set identification; candidate selection; i-Vector; language distance;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language Technology Workshop (SLT), 2014 IEEE
  • Type

    conf

  • DOI
    10.1109/SLT.2014.7078605
  • Filename
    7078605