• DocumentCode
    1652141
  • Title
    Modeling characteristics of agglutinative languages with Multi-class language model for ASR system
  • Author
    Dawa, I. ; Sagisaka, Y. ; Nakamura, S.
  • fYear
    2009
  • Firstpage
    104
  • Lastpage
    109
  • Abstract
    In this paper, we discuss a new language model that takes into account the characteristics of agglutinative languages. We use Mongolian (written in the Cyrillic script used in Mongolia) as the example language for building the model. We developed a Multi-class N-gram language model based on similar-word clustering that focuses on the variable suffixes of Mongolian words. With the proposed language model, the recognition system improves performance by 6.85% over a conventional word N-gram model when using the ATRASR engine. We also confirmed that the new model is convenient for rapid development of ASR systems for resource-deficient languages, especially agglutinative languages such as Mongolian.
  • Keywords
    natural language processing; pattern clustering; speech recognition; statistical analysis; ATRASR engine; Cyrillic language system; Mongolian language; Mongolian word; agglutinative languages; automatic speech recognition system; multiclass N-gram language model; multiclass language model; resource-deficient languages; similar word clustering; variable suffix; Accuracy; Automatic speech recognition; Databases; Engines; Natural languages; Probability; Statistical analysis; Training data; Writing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Speech Database and Assessments, 2009 Oriental COCOSDA International Conference on
  • Conference_Location
    Urumqi
  • Print_ISBN
    978-1-4244-4400-7
  • Electronic_ISBN
    978-1-4244-4400-7
  • Type
    conf
  • DOI
    10.1109/ICSDA.2009.5278368
  • Filename
    5278368