Regional Information Center for Science and Technology - A comprehensive comparison study of document clustering for a biomedical digital library MEDLINE

DocumentCode :

1826519

Title :

A comprehensive comparison study of document clustering for a biomedical digital library MEDLINE

Author :

Hu, Xiaohua ; Yoo, Illhoi

Author_Institution :

Coll. of Inf. Sci. & Technol., Drexel Univ., Philadelphia, PA

fYear :

2006

fDate :

38869

Firstpage :

220

Lastpage :

229

Abstract :

Document clustering has been used for better document retrieval, document browsing, and text mining in digital libraries. In this paper, we perform a comprehensive comparison study of various document clustering approaches, such as three hierarchical methods (single-link, complete-link, and average-link), Bisecting K-means, K-means, and suffix tree clustering, in terms of efficiency, effectiveness, and scalability. In addition, we apply a domain ontology to document clustering to investigate whether an ontology such as MeSH improves clustering quality for MEDLINE articles. Because an ontology is a formal, explicit specification of a shared conceptualization for a domain of interest, using ontologies is a natural way to address traditional information retrieval problems such as the synonym, hypernym, and hyponym problems. We conducted fairly extensive experiments using different evaluation metrics, such as misclassification index, F-measure, cluster purity, and entropy, on very large article sets from MEDLINE, the largest digital library in biomedicine.
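The abstract names cluster purity, entropy, and F-measure as evaluation metrics but does not define them. The following is a minimal Python sketch of the commonly used textbook definitions, offered only for illustration. The function names and the input format (two parallel lists holding each document's cluster id and its true class label) are assumptions, not the authors' code. The paper may normalize entropy or weight the F-measure differently, and the misclassification index is not sketched here.

```python
import math
from collections import Counter, defaultdict


def group_by_cluster(assignments, labels):
    """Map each cluster id to a Counter of the true class labels of its members."""
    groups = defaultdict(Counter)
    for cluster_id, label in zip(assignments, labels):
        groups[cluster_id][label] += 1
    return groups


def purity(assignments, labels):
    """Fraction of documents that belong to the majority class of their cluster (higher is better)."""
    groups = group_by_cluster(assignments, labels)
    return sum(max(cnt.values()) for cnt in groups.values()) / len(labels)


def entropy(assignments, labels):
    """Cluster-size-weighted average of per-cluster class entropy, log base 2 (lower is better)."""
    groups = group_by_cluster(assignments, labels)
    n = len(labels)
    total = 0.0
    for cnt in groups.values():
        size = sum(cnt.values())
        h = -sum((k / size) * math.log2(k / size) for k in cnt.values())
        total += (size / n) * h
    return total


def f_measure(assignments, labels):
    """Clustering F-measure: best F over clusters for each class, weighted by class size."""
    groups = group_by_cluster(assignments, labels)
    class_sizes = Counter(labels)
    n = len(labels)
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for cnt in groups.values():
            n_ij = cnt.get(cls, 0)
            if n_ij == 0:
                continue
            precision = n_ij / sum(cnt.values())
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total
```

For example, with `assignments = [0, 0, 0, 1, 1, 1]` and `labels = ["a", "a", "b", "b", "b", "b"]`, cluster 0 mixes two classes, so purity is 5/6 rather than 1 and entropy is above 0. A perfect clustering gives purity 1, entropy 0, and F-measure 1.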

Keywords :

bibliographic systems; data mining; document handling; information retrieval; medical information systems; ontologies (artificial intelligence); pattern clustering; text analysis; K-means; biomedical digital library MEDLINE; biomedicine; bisecting K-means; document browsing; document clustering; document retrieval; domain ontology; formal explicit specification; hierarchical methods; shared conceptualization; suffix tree clustering; text mining; Biomedical measurements; Clustering algorithms; Educational institutions; Information retrieval; Information science; Iterative algorithms; Ontologies; Partitioning algorithms; Software libraries; Text mining; comparison study; document clustering; ontology;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Digital Libraries, 2006. JCDL '06. Proceedings of the 6th ACM/IEEE-CS Joint Conference on

Conference_Location :

Chapel Hill, NC

Print_ISBN :

1-59593-354-9

Type :

conf

DOI :

10.1145/1141753.1141802

Filename :

4119128

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1826519