  • DocumentCode
    337488
  • Title
    An unsupervised approach to language identification
  • Author
    Pellegrino, F.; André-Obrecht, R.
  • Author_Institution
    IRIT, Toulouse, France
  • Volume
    2
  • fYear
    1999
  • fDate
    15-19 Mar 1999
  • Firstpage
    833
  • Abstract
    This paper presents an unsupervised approach to automatic language identification (ALI) based on vowel system modeling. The vowel system of each language is modeled by a Gaussian mixture model (GMM) trained on automatically detected vowels. Since this detection is unsupervised and language independent, no labeled data are required. The GMMs are initialized with an efficient data-driven variant of the LBG algorithm, the LBG-Rissanen (1983) algorithm. On a closed-set identification task with five languages from the OGI MLTS corpus, we achieve 79% correct identification for male speakers, using only the vowel segments detected in 45-second utterances.
  • Keywords
    Gaussian processes; acoustic signal processing; natural languages; speech processing; unsupervised learning; 45 s; Gaussian mixture model; LBG-Rissanen algorithm; OGI MLTS corpus; acoustic processing; automatic speech processing; automatically detected vowels; close set identification task; correct identification; data-driven variant LBG algorithm; language identification; language independent detection; language vowel system; male speakers; unsupervised detection; utterances; vowel segments; vowel system modeling; Acoustic signal detection; Cepstral analysis; Databases; Entropy; Hidden Markov models; Modeling; Natural languages; Speech processing; Speech recognition; Topology;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 1999)
  • Conference_Location
    Phoenix, AZ
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-5041-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.1999.759800
  • Filename
    759800