Title :
The influence of word normalization in English document clustering
Author :
Han, Pu ; Shen, Si ; Wang, Dongbo ; Liu, Yanyun
Author_Institution :
School of Information Management, Nanjing University, Nanjing, China
Abstract :
Stemming or lemmatization is a key step in English document processing. Using three clustering algorithms and two evaluation functions, the paper presents a comprehensive study of two stemming algorithms and one lemmatization algorithm. The experimental results show that the performance differences are not remarkable, but the Porter stemmer achieves better entropy and purity than the Snowball stemmer and Stanford lemmatization.
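The entropy and purity measures named in the abstract can be illustrated with a minimal sketch. The record does not give the paper's exact formulas, so the definitions below are assumptions: purity is the fraction of documents that fall in the majority class of their cluster, and entropy is the size-weighted average of per-cluster class entropy in bits, without normalization by the number of classes. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def _group_labels(clusters, labels):
    """Collect the true class labels of the documents in each cluster."""
    groups = {}
    for cluster_id, label in zip(clusters, labels):
        groups.setdefault(cluster_id, []).append(label)
    return groups


def purity(clusters, labels):
    """Fraction of documents in the majority class of their cluster (higher is better)."""
    groups = _group_labels(clusters, labels)
    majority = sum(Counter(ys).most_common(1)[0][1] for ys in groups.values())
    return majority / len(labels)


def entropy(clusters, labels):
    """Size-weighted mean of per-cluster class entropy in bits (lower is better).

    Assumption: no normalization by log(number of classes).
    """
    groups = _group_labels(clusters, labels)
    n = len(labels)
    total = 0.0
    for ys in groups.values():
        size = len(ys)
        h = -sum((k / size) * math.log2(k / size) for k in Counter(ys).values())
        total += (size / n) * h
    return total


# Toy data: six documents, two clusters, two true classes.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "b"]
print(purity(clusters, labels), entropy(clusters, labels))
```

In a comparison like the one the abstract describes, the same two functions would be applied to the cluster assignments obtained after each normalization method, with higher purity and lower entropy indicating better clustering.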
Keywords :
document clustering; lemmatization; stemming
Conference_Title :
Computer Science and Automation Engineering (CSAE), 2012 IEEE International Conference on
Conference_Location :
Zhangjiajie, China
Print_ISBN :
978-1-4673-0088-9
DOI :
10.1109/CSAE.2012.6272740