DocumentCode :
2310665
Title :
A Novel Approach to Overcome Data Scarcity Problem for Highly Inflecting Languages
Author :
Sunitha, K.V.N. ; Sharada, A.
Author_Institution :
CSE Dept, G.Narayanamma Inst. of Technol. & Sci., Hyderabad, India
fYear :
2010
fDate :
12-13 March 2010
Firstpage :
290
Lastpage :
292
Abstract :
This paper proposes a lexicon model that can be used effectively in language modelling and other NLP-related tasks for agglutinative, highly inflecting, and compounding languages. Because these languages have a huge number of distinct word forms, traditional methods based on full words are not very effective, and it is not straightforward to train efficient language models with good coverage of the language. The main contribution of the paper is a new data structure and an algorithm that operates on it. This approach greatly reduces the corpus size, which makes research in the field easier. The new data structure also speeds up the search process.
Keywords :
data structures; simulation languages; Highly Inflecting Languages; corpus size reduction; data scarcity problem; data structure; highly inflecting languages; search process; Buildings; Computational linguistics; Data structures; Databases; Educational technology; Morphology; Proposals; Shape; Telecommunication computing; Vocabulary; Inflecting; Inverted Index; Occurrence list; Tries;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Recent Trends in Information, Telecommunication and Computing (ITC), 2010 International Conference on
Conference_Location :
Kochi, Kerala
Print_ISBN :
978-1-4244-5956-8
Type :
conf
DOI :
10.1109/ITC.2010.44
Filename :
5460560