Title :
Language identification for internet security in the Basque context: A cross-lingual approach
Author :
Barroso, N. ; Lopez de Ipina, Karmele ; Ezeiza, A. ; Hernandez, C.
Author_Institution :
Polytech. Sch., Univ. of the Basque Country, Bilbao, Spain
Abstract :
This work describes the development of a language identification (LID) system suited to security tasks on the Internet. The development context was the Infozazpi Internet digital radio, and the task was substantially complex because of the trilingual environment and the scarcity of language resources for Basque. To overcome these difficulties, we propose a hybrid system that combines subword unit selection by SVMs, MLP classifiers, and discriminant analysis improved with robust regularized covariance matrix estimation methods, together with stochastic methods for ASR tasks (SC-HMM and n-grams). Our new subword unit proposals, along with the use of triphones and cross-lingual approaches, considerably improve system performance, achieving an optimal and stable LID recognition rate despite the complexity of the problem.
Keywords :
Internet; covariance matrices; digital radio; estimation theory; hidden Markov models; multilayer perceptrons; natural language processing; security of data; speech recognition; support vector machines; ASR tasks; Basque context; Infozazpi Internet digital radio; Internet security; LID recognition rate; LID system; MLP classifier; SC-HMM; SVM classifier; cross-lingual approach; discriminant analysis; handling security tasks; hybrid system; language identification; language resources; n-grams; robust regularized covariance matrix estimation methods; stochastic methods; subword unit proposals; subword units; system performance; trilingual environment; triphones; Automatic speech recognition; Context awareness; Hidden Markov models; Internet; Natural language processing; Security; Terminology;
Journal_Title :
Aerospace and Electronic Systems Magazine, IEEE
DOI :
10.1109/MAES.2013.6575408