DocumentCode :
3578832
Title :
Using dictionary in a knowledge based algorithm for clustering short texts in Bahasa Indonesia
Author :
Thamrin, Husni ; Sabardila, Atiqa
Author_Institution :
Dept. of Inf., Univ. Muhammadiyah Surakarta, Surakarta, Indonesia
fYear :
2014
Firstpage :
1
Lastpage :
4
Abstract :
Text clustering is important in many applications of information retrieval. This paper presents a study of clustering short texts in Bahasa Indonesia using a semantic similarity approach in which a dictionary of synonyms and hyponyms provides information on word relatedness. We compare sentence similarity calculations based on lexical matching and on word similarity, using more than 250 sentences. Our experiment shows that clustering with sentence similarity based on lexical matching performs better in terms of precision and F-measure than clustering with sentence similarity based on the semantic approach.
Keywords :
dictionaries; knowledge based systems; natural language processing; pattern clustering; pattern matching; statistical analysis; text analysis; Bahasa Indonesia; F-measure; dictionary; hyponyms; information retrieval; knowledge based algorithm; lexical matching; semantic similarity approach; sentence similarity calculations; short text clustering; synonyms; word relatedness; word similarity; Clustering algorithms; Organizations; Semantics; Vectors; text clustering
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data and Software Engineering (ICODSE), 2014 International Conference on
Print_ISBN :
978-1-4799-8175-5
Type :
conf
DOI :
10.1109/ICODSE.2014.7062678
Filename :
7062678