DocumentCode :
3304815
Title :
Novel similarity measure for document clustering based on topic phrases
Author :
ELdesoky, A.E. ; Saleh, M. ; Sakr, N.A.
Author_Institution :
Dept. of Comput. & Syst., Mansoura Univ., Mansoura
fYear :
2009
fDate :
24-25 March 2009
Firstpage :
92
Lastpage :
96
Abstract :
Document clustering is a subfield of data clustering that organizes a large set of documents into groups of similar, related documents. In the traditional vector space model (VSM), researchers have treated each unique word occurring in the document set as a candidate feature. Recently, a trend has emerged that treats the phrase as a more informative feature, which helps improve the accuracy and effectiveness of document clustering. This paper proposes a new approach to computing the similarity measure of the traditional VSM: the topic phrases of the document, rather than the traditional term "word", serve as the constituent terms of the VSM. The new approach is applied to the Buckshot method, which combines the Hierarchical Agglomerative Clustering (HAC) algorithm with the K-means partitioning algorithm. Such a mechanism may raise the effectiveness of the clustering, as reflected in higher evaluation metric values.
Keywords :
document handling; pattern clustering; Buckshot method; document clustering; hierarchical agglomerative clustering algorithm; k-means partitioning algorithm; similarity measure; topic phrases; vector space model; Clustering algorithms; Clustering methods; Frequency; Humans; Information retrieval; Natural language processing; Organizing; Partitioning algorithms; Taxonomy; Text mining;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Networking and Media Convergence, 2009. ICNM 2009. International Conference on
Conference_Location :
Cairo
Print_ISBN :
978-1-4244-3776-4
Electronic_ISBN :
978-1-4244-3778-8
Type :
conf
DOI :
10.1109/ICNM.2009.4907196
Filename :
4907196