DocumentCode :
1784899
Title :
Mining meaningful topics from massive biomedical literature
Author :
Peiyan Zhu ; Junhui Shen ; Dezhi Sun ; Ke Xu
Author_Institution :
State Key Lab. of Software Dev. Environ., Beihang Univ., Beijing, China
fYear :
2014
fDate :
2-5 Nov. 2014
Firstpage :
438
Lastpage :
443
Abstract :
There is a huge amount of biomedical and biological literature available online and in digital libraries, and the number of new research papers has grown exponentially in recent years. Mining meaningful topics from this massive body of literature is therefore both pressing and challenging. The mined topics help researchers with literature exploration and topic discovery. However, latent topics inferred by traditional topic models are not always coherent or meaningful. In this work, we propose a new methodology for mining meaningful biomedical topics that combines several off-the-shelf text mining techniques, such as part-of-speech tagging, base noun phrase chunking, K-means clustering and latent Dirichlet allocation. Building on these techniques gives our methodology scalability and implementation simplicity. We conduct comprehensive experiments on a dataset collected from PubMed. The experimental results demonstrate that our method significantly outperforms a baseline method. We also perform a qualitative analysis and present meaningful biomedical topics and multi-word expressions.
Keywords :
biomedical engineering; data mining; digital libraries; information retrieval systems; information services; medical computing; pattern clustering; text analysis; K-means clustering; PubMed dataset; base noun phrase chunking; baseline method; digital library; latent Dirichlet allocation; latent topic inference; literature exploration; massive biomedical literature; meaningful biomedical topic mining; multi-word expression; off-the-shelf text mining techniques; online biological literature; online biomedical literature; part-of-speech tagging; qualitative analysis; research paper; topic discovery; traditional topic model; Arteries; Biological system modeling; Biomedical measurement; Cancer; Diseases; Semantics; Tagging;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
Conference_Location :
Belfast
Type :
conf
DOI :
10.1109/BIBM.2014.6999197
Filename :
6999197