DocumentCode :
1826519
Title :
A comprehensive comparison study of document clustering for a biomedical digital library MEDLINE
Author :
Hu, Xiaohua ; Yoo, Illhoi
Author_Institution :
Coll. of Inf. Sci. & Technol., Drexel Univ., Philadelphia, PA
fYear :
2006
fDate :
38869
Firstpage :
220
Lastpage :
229
Abstract :
Document clustering has been used to improve document retrieval, document browsing, and text mining in digital libraries. In this paper, we perform a comprehensive comparison study of various document clustering approaches, including three hierarchical methods (single-link, complete-link, and average-link), Bisecting K-means, K-means, and suffix tree clustering, in terms of efficiency, effectiveness, and scalability. In addition, we apply a domain ontology to document clustering to investigate whether an ontology such as MeSH improves clustering quality for MEDLINE articles. Because an ontology is a formal, explicit specification of a shared conceptualization for a domain of interest, the use of ontologies is a natural way to solve traditional information retrieval problems such as synonym/hypernym/hyponym problems. We conducted fairly extensive experiments based on different evaluation metrics, such as misclassification index, F-measure, cluster purity, and entropy, on very large article sets from MEDLINE, the largest digital library in biomedicine.
Keywords :
bibliographic systems; data mining; document handling; information retrieval; medical information systems; ontologies (artificial intelligence); pattern clustering; text analysis; K-means; biomedical digital library MEDLINE; biomedicine; bisecting K-means; document browsing; document clustering; document retrieval; domain ontology; formal explicit specification; hierarchical methods; shared conceptualization; suffix tree clustering; text mining; Biomedical measurements; Clustering algorithms; Educational institutions; Information retrieval; Information science; Iterative algorithms; Ontologies; Partitioning algorithms; Software libraries; Text mining; comparison study; document clustering; ontology;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Digital Libraries, 2006. JCDL '06. Proceedings of the 6th ACM/IEEE-CS Joint Conference on
Conference_Location :
Chapel Hill, NC
Print_ISBN :
1-59593-354-9
Type :
conf
DOI :
10.1145/1141753.1141802
Filename :
4119128