Title :
A Novel Approach to Overcome Data Scarcity Problem for Highly Inflecting Languages
Author :
Sunitha, K.V.N. ; Sharada, A.
Author_Institution :
CSE Dept, G.Narayanamma Inst. of Technol. & Sci., Hyderabad, India
Abstract :
This paper proposes a lexicon model that can be used effectively in language modelling and other NLP tasks for agglutinative, highly inflecting and compounding languages. Because such languages have a huge number of distinct word forms, traditional methods based on full words are not very effective, and it is not straightforward to train efficient language models with good coverage of the language. The main contribution of the paper is a new data structure and an algorithm that operates on it. This approach greatly reduces the corpus size, thereby making research in the field easier. The new data structure also speeds up the search process.
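The record does not describe the proposed data structure itself; its keywords only name tries, an inverted index and occurrence lists. As a hedged illustration of why such structures suit highly inflecting languages, the following is a minimal generic sketch, not the authors' method: a character trie whose word-final nodes carry an occurrence list (the corpus positions at which that word form appears). Inflected forms sharing a stem share trie nodes, so the stem is stored once, and an exact lookup costs time proportional to the word length rather than the vocabulary size. The names `TrieNode`, `OccurrenceTrie`, `insert`, `lookup`, `with_prefix` and `node_count` are hypothetical, and English forms are used only as a toy example.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Child nodes keyed by the next character of the word form.
    children: dict = field(default_factory=dict)
    # Occurrence list: corpus positions (token indices) of the word ending here.
    occurrences: list = field(default_factory=list)


class OccurrenceTrie:
    """Hypothetical trie + occurrence-list index (illustration only)."""

    def __init__(self):
        self.root = TrieNode()

    def _find(self, text: str):
        # Follow one edge per character; None if the path does not exist.
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def insert(self, word: str, position: int) -> None:
        # Shared prefixes (e.g. a common stem) reuse existing nodes.
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.occurrences.append(position)

    def lookup(self, word: str) -> list:
        # Exact match in O(len(word)); a bare prefix with no occurrences gives [].
        node = self._find(word)
        return list(node.occurrences) if node else []

    def with_prefix(self, prefix: str) -> list:
        # All stored word forms beginning with `prefix` (e.g. every form of a stem).
        start = self._find(prefix)
        if start is None:
            return []
        found, stack = [], [(start, prefix)]
        while stack:
            node, text = stack.pop()
            if node.occurrences:
                found.append(text)
            for ch, child in node.children.items():
                stack.append((child, text + ch))
        return sorted(found)

    def node_count(self) -> int:
        # Nodes below the root: a rough measure of storage used.
        count, stack = 0, [self.root]
        while stack:
            node = stack.pop()
            count += len(node.children)
            stack.extend(node.children.values())
        return count


if __name__ == "__main__":
    trie = OccurrenceTrie()
    corpus = "play plays played playing player players played".split()
    for position, word in enumerate(corpus):
        trie.insert(word, position)
    print(trie.lookup("played"))       # positions of "played" in the toy corpus
    print(trie.with_prefix("player"))  # forms sharing the prefix "player"
    print(trie.node_count())           # trie nodes used for all six forms
```

For the six distinct forms in the toy corpus, the trie uses 12 nodes, whereas storing each form as a separate string takes 35 characters; the saving grows with the number of suffixes attached to a stem, which is the situation the abstract describes for agglutinative languages.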
Keywords :
data structures; simulation languages; Highly Inflecting Languages; corpus size reduction; data scarcity problem; data structure; search process; Buildings; Computational linguistics; Databases; Educational technology; Morphology; Proposals; Shape; Telecommunication computing; Vocabulary; Inflecting; Inverted Index; Occurrence list; Tries
Conference_Title :
Recent Trends in Information, Telecommunication and Computing (ITC), 2010 International Conference on
Conference_Location :
Kochi, Kerala
Print_ISBN :
978-1-4244-5956-8
DOI :
10.1109/ITC.2010.44