• DocumentCode
    1659097
  • Title

    An efficient method of language identification using LVQ network

  • Author

    Xiao, Han ; Yu, Lei ; Chen, Kai

  • Author_Institution
    Sch. of Inf. Eng., Beijing Univ. of Posts & Telecommun., Beijing
  • fYear
    2008
  • Firstpage
    1690
  • Lastpage
    1694
  • Abstract
    This paper presents a new method for language identification based on an LVQ (learning vector quantization) network. The feature vector combines the presence of particular characters and words with statistical information about word lengths. The new classification technique is faster than the conventional N-gram based classification approach while achieving a similar correct classification rate. In an identification experiment with 8 Roman alphabet languages, the LVQ network achieved a 97.6% correct classification rate with 500 bytes while running five times faster than the N-gram based approach.
  • Keywords
    classification; feature extraction; learning (artificial intelligence); natural languages; text analysis; vector quantisation; Roman alphabet languages; feature vector; language identification; learning vector quantization; word lengths; Books; Data mining; Feature extraction; Frequency; Natural languages; Organizing; Statistical distributions; Statistics; Vector quantization; Web and internet services
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing, 2008. ICSP 2008. 9th International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-2178-7
  • Electronic_ISBN
    978-1-4244-2179-4
  • Type

    conf

  • DOI
    10.1109/ICOSP.2008.4697462
  • Filename
    4697462
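
The abstract describes an LVQ classifier over a feature vector built from the presence of particular characters and words plus word-length statistics. The record does not give the actual feature set, the LVQ variant, or the training setup, so the following is only a minimal sketch under stated assumptions: the marker characters and words, the use of mean word length as the single length statistic, the LVQ1 update rule with class-mean initialisation, and the three-language toy corpus are all illustrative choices, not the authors' implementation.

```python
import random

# Hypothetical marker sets: the abstract only says "particular characters, words".
MARKER_CHARS = "äöüßéèçñãõ"
MARKER_WORDS = ("the", "und", "le", "el")


def features(text):
    """Map text to a fixed-length vector: marker flags plus scaled mean word length."""
    lowered = text.lower()
    words = lowered.split()  # whitespace split only; punctuation is not stripped
    vec = [1.0 if ch in lowered else 0.0 for ch in MARKER_CHARS]
    vec += [1.0 if w in words else 0.0 for w in MARKER_WORDS]
    mean_len = sum(len(w) for w in words) / len(words) if words else 0.0
    vec.append(mean_len / 10.0)  # rough scaling to the range of the 0/1 flags
    return vec


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(prototypes, x):
    """Index of the prototype whose weight vector is closest to x."""
    return min(range(len(prototypes)), key=lambda i: sq_dist(prototypes[i][0], x))


def train_lvq1(samples, epochs=20, lr=0.3, seed=0):
    """LVQ1 with one prototype per class, initialised to the class mean (an assumption)."""
    rng = random.Random(seed)
    by_label = {}
    for x, label in samples:
        by_label.setdefault(label, []).append(x)
    prototypes = [([sum(col) / len(xs) for col in zip(*xs)], label)
                  for label, xs in by_label.items()]
    data = list(samples)
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate
        rng.shuffle(data)
        for x, label in data:
            w, w_label = prototypes[nearest(prototypes, x)]
            # Pull the winning prototype toward x if labels match, push it away otherwise.
            sign = 1.0 if w_label == label else -1.0
            for k in range(len(w)):
                w[k] += sign * rate * (x[k] - w[k])
    return prototypes


def classify(prototypes, text):
    return prototypes[nearest(prototypes, features(text))][1]


# Toy training data, invented for illustration only.
TRAIN = [
    ("the cat sat on the mat and the dog ran", "en"),
    ("the weather is nice and the sun is out", "en"),
    ("der Hund und die Katze sind über die Straße gelaufen", "de"),
    ("das Wetter ist schön und die Sonne scheint", "de"),
    ("le chat est entré dans le café près de la gare", "fr"),
    ("le garçon a mangé le gâteau très vite", "fr"),
]
model = train_lvq1([(features(t), lang) for t, lang in TRAIN])

if __name__ == "__main__":
    for sample in ("the bird and the fish swim", "die Kinder und die Eltern"):
        print(sample, "->", classify(model, sample))
```

Classification here costs one distance computation per prototype, which is one plausible reason a scheme like this can be cheaper than comparing N-gram profiles; the speed and accuracy figures in the abstract come from the paper's own experiment and are not reproduced by this toy example.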