DocumentCode :
3033499
Title :
The influence of word normalization in English document clustering
Author :
Han, Pu ; Shen, Si ; Wang, Dongbo ; Liu, Yanyun
Author_Institution :
School of Information Management, Nanjing University, Nanjing, China
Volume :
2
fYear :
2012
fDate :
25-27 May 2012
Firstpage :
116
Lastpage :
120
Abstract :
Stemming or lemmatization is a key step in English document processing. Using three clustering algorithms and two evaluation functions, the paper presents a comprehensive study of two stemming algorithms and one lemmatization algorithm. The experimental results show that the influence on performance is not remarkable; compared with the Snowball stemmer and Stanford lemmatization, the Porter stemmer achieves better entropy and purity.
Keywords :
document clustering; lemmatization; stemming;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Science and Automation Engineering (CSAE), 2012 IEEE International Conference on
Conference_Location :
Zhangjiajie, China
Print_ISBN :
978-1-4673-0088-9
Type :
conf
DOI :
10.1109/CSAE.2012.6272740
Filename :
6272740