• DocumentCode
    2310665
  • Title
    A Novel Approach to Overcome Data Scarcity Problem for Highly Inflecting Languages
  • Author
    Sunitha, K.V.N.; Sharada, A.
  • Author_Institution
    CSE Dept, G.Narayanamma Inst. of Technol. & Sci., Hyderabad, India
  • fYear
    2010
  • fDate
    12-13 March 2010
  • Firstpage
    290
  • Lastpage
    292
  • Abstract
    This paper proposes a lexicon model that can be used effectively in language modelling and other NLP tasks for agglutinative, highly inflecting, and compounding languages. Because of the huge number of distinct word forms, traditional methods based on full words are not very effective, and it is not straightforward to train efficient language models with good coverage of the language. The main contribution of the paper is a new data structure and an algorithm that operates on it. This approach greatly reduces the corpus size, making research in the field easier. The new data structure also speeds up the search process.
  • Keywords
    data structures; simulation languages; corpus size reduction; data scarcity problem; data structure; highly inflecting languages; search process; Buildings; Computational linguistics; Databases; Educational technology; Morphology; Proposals; Shape; Telecommunication computing; Vocabulary; Inflecting; Inverted Index; Occurrence list; Tries
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Recent Trends in Information, Telecommunication and Computing (ITC), 2010 International Conference on
  • Conference_Location
    Kochi, Kerala
  • Print_ISBN
    978-1-4244-5956-8
  • Type
    conf
  • DOI
    10.1109/ITC.2010.44
  • Filename
    5460560