DocumentCode :
1659097
Title :
An efficient method of language identification using LVQ network
Author :
Xiao, Han ; Yu, Lei ; Chen, Kai
Author_Institution :
Sch. of Inf. Eng., Beijing Univ. of Posts & Telecommun., Beijing
fYear :
2008
Firstpage :
1690
Lastpage :
1694
Abstract :
This paper presents a new method for language identification based on an LVQ (learning vector quantization) network. The feature vector combines the presence of particular characters and words with statistical information on word lengths. The technique is faster than the conventional N-gram based classification approach while achieving a similar correct classification rate. In an identification experiment with 8 Roman alphabet languages, the LVQ network achieved a 97.6% correct classification rate with 500 bytes of text and was five times faster than the N-gram based approach.
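The abstract does not give the network details, so the following is only a minimal sketch of the general idea, assuming the basic LVQ1 update rule and a toy feature vector (mean word length, share of short words, and presence flags for a few marker characters). The function names, the marker character set, the learning rate and the sample sentences are all illustrative assumptions, not the authors' implementation.

```python
import math

def features(text, marker_chars="äöüßéèçñ"):
    """Toy feature vector (hypothetical choice, not the paper's):
    scaled mean word length, share of words with <= 3 letters,
    and one 0/1 flag per marker character present in the text."""
    lowered = text.lower()
    words = lowered.split()
    n = max(len(words), 1)
    mean_len = sum(len(w) for w in words) / n
    short_share = sum(1 for w in words if len(w) <= 3) / n
    flags = [1.0 if c in lowered else 0.0 for c in marker_chars]
    return [mean_len / 10.0, short_share] + flags

def nearest(prototypes, x):
    """Index of the prototype closest to x in Euclidean distance."""
    return min(range(len(prototypes)),
               key=lambda i: math.dist(prototypes[i][0], x))

def train_lvq1(samples, prototypes, lr=0.3, epochs=20):
    """LVQ1: move the winning prototype toward the sample when the
    labels match, away from it when they differ."""
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying step size
        for x, label in samples:
            w, w_label = prototypes[nearest(prototypes, x)]
            sign = 1.0 if w_label == label else -1.0
            for j in range(len(w)):
                w[j] += sign * rate * (x[j] - w[j])
    return prototypes

def classify(prototypes, text):
    """Label of the prototype nearest to the text's feature vector."""
    return prototypes[nearest(prototypes, features(text))][1]

# Tiny illustrative corpus (assumed data, far smaller than a real experiment).
corpus = [
    ("the cat sat on the mat", "en"),
    ("it is a good day for a walk", "en"),
    ("wir müssen über die Straße gehen", "de"),
    ("die Füße sind groß und müde", "de"),
]
samples = [(features(t), lang) for t, lang in corpus]
# One prototype per language, initialised from the first sample of each class.
prototypes = [[list(samples[0][0]), "en"], [list(samples[2][0]), "de"]]
train_lvq1(samples, prototypes)
print(classify(prototypes, "the dog is in the park"))
```

Classification costs one distance computation per prototype, which is one plausible reason such a network can be faster than comparing full N-gram profiles; the abstract itself does not explain where the speedup comes from.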
Keywords :
classification; feature extraction; learning (artificial intelligence); natural languages; text analysis; vector quantisation; Roman alphabet languages; feature extraction; feature vector; language identification; learning vector quantization; word lengths; Books; Data mining; Feature extraction; Frequency; Natural languages; Organizing; Statistical distributions; Statistics; Vector quantization; Web and internet services;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Signal Processing, 2008. ICSP 2008. 9th International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-2178-7
Electronic_ISBN :
978-1-4244-2179-4
Type :
conf
DOI :
10.1109/ICOSP.2008.4697462
Filename :
4697462