• DocumentCode
    74383
  • Title
    Language identification for internet security in the Basque context: A cross-lingual approach
  • Author
    Barroso, N. ; Lopez de Ipina, Karmele ; Ezeiza, A. ; Hernandez, C.
  • Author_Institution
    Polytech. Sch., Univ. of the Basque Country, Bilbao, Spain
  • Volume
    28
  • Issue
    8
  • fYear
    2013
  • fDate
    Aug. 2013
  • Firstpage
    24
  • Lastpage
    31
  • Abstract
    The present work describes the development of a language identification (LID) system suited to handling security tasks on the Internet. The development context was the Infozazpi Internet digital radio, and the task was substantially complex because of the trilingual environment and the scarcity of language resources for Basque. To overcome these difficulties, we propose a hybrid system based on the selection of subword units by support vector machines (SVMs), multilayer perceptron (MLP) classifiers, and discriminant analysis improved with robust regularized covariance matrix estimation methods, together with stochastic automatic speech recognition (ASR) methods, namely semi-continuous hidden Markov models (SC-HMMs) and n-grams. Our new subword unit proposals and the use of triphones and cross-lingual approaches considerably improve system performance, achieving an optimal and stable LID recognition rate despite the complexity of the problem.
  • Keywords
    Internet; covariance matrices; digital radio; estimation theory; hidden Markov models; multilayer perceptrons; natural language processing; security of data; speech recognition; support vector machines; ASR tasks; Basque context; Infozazpi Internet digital radio; Internet security; LID recognition rate; LID system; MLP classifier; SC-HMM; SVM classifier; cross-lingual approach; discriminant analysis; handling security tasks; hybrid system; language identification; language resources; n-grams; robust regularized covariance matrix estimation methods; stochastic methods; subword unit proposals; subword units; system performance; trilingual environment; triphones; Automatic speech recognition; Context awareness; Hidden Markov models; Internet; Natural language processing; Security; Terminology;
  • fLanguage
    English
  • Journal_Title
    Aerospace and Electronic Systems Magazine, IEEE
  • Publisher
    IEEE
  • ISSN
    0885-8985
  • Type
    jour
  • DOI
    10.1109/MAES.2013.6575408
  • Filename
    6575408