Title :
Analysis of standard clustering algorithms for grouping MEDLINE abstracts into evidence-based medicine intervention categories
Author :
Vladimir Dobrynin;Yulia Balykina;Michael Kamalov
Author_Institution :
St. Petersburg State University, 7/9 Universitetskaya nab., 199034, Russia
Abstract :
The paper describes the clustering of article abstracts from MEDLINE, the largest bibliographic database for the life sciences and biomedicine, into categories that correspond to types of medical interventions, i.e. types of patient treatment. Experiments were carried out to evaluate clustering quality for the following algorithms: K-means, K-means++, hierarchical clustering, and SIB (Sequential Information Bottleneck), combined with the LSA (Latent Semantic Analysis) and MI (Mutual Information) methods, which are used to select feature vectors. The best clustering results were achieved by K-means++ together with LSA when a 210-dimensional space was chosen: Purity = 0.5719, Entropy = 1.3841, Normalized Entropy = 0.6299.
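The abstract reports three quality measures but does not define them. The following is a minimal sketch of how purity, entropy, and normalized entropy are commonly computed for a clustering against known category labels. It is not taken from the paper: the function names are hypothetical, and the use of the natural logarithm and of normalization by the logarithm of the number of categories are assumptions. The reported figures are at least consistent with that choice, since 1.3841 / 0.6299 ≈ 2.197 ≈ ln 9.

```python
import math
from collections import Counter, defaultdict


def contingency(true_labels, cluster_ids):
    """Count, for every cluster, how many items of each true category it holds."""
    table = defaultdict(Counter)
    for label, cluster in zip(true_labels, cluster_ids):
        table[cluster][label] += 1
    return table


def purity(true_labels, cluster_ids):
    """Fraction of items that belong to the majority category of their cluster."""
    table = contingency(true_labels, cluster_ids)
    majority_total = sum(max(counts.values()) for counts in table.values())
    return majority_total / len(true_labels)


def entropy(true_labels, cluster_ids):
    """Size-weighted average of per-cluster category entropy (natural log, assumed)."""
    table = contingency(true_labels, cluster_ids)
    n = len(true_labels)
    total = 0.0
    for counts in table.values():
        size = sum(counts.values())
        cluster_entropy = -sum(
            (k / size) * math.log(k / size) for k in counts.values()
        )
        total += (size / n) * cluster_entropy
    return total


def normalized_entropy(true_labels, cluster_ids):
    """Entropy divided by log(number of categories); this normalization is assumed."""
    num_categories = len(set(true_labels))
    if num_categories < 2:
        return 0.0
    return entropy(true_labels, cluster_ids) / math.log(num_categories)
```

Under these definitions a higher purity and a lower entropy both indicate clusters that align more closely with the intervention categories. For example, `purity(["drug", "drug", "surgery"], [0, 0, 1])` describes a perfect split, while placing all three abstracts in one cluster lowers purity and raises entropy.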
Keywords :
"Clustering algorithms","Entropy","Mutual information","Algorithm design and analysis","Libraries","Semantics","Information retrieval"
Conference_Title :
"Stability and Control Processes" in Memory of V.I. Zubov (SCP), 2015 International Conference
DOI :
10.1109/SCP.2015.7342223