DocumentCode :
2891662
Title :
Supervised Link Discovery on Large-Scale Biomedical Concept Networks
Author :
Katukuri, Jayashima ; Xie, Ying ; Raghavan, Vijay V. ; Gupta, Ashish
Author_Institution :
Univ. of Louisiana at Lafayette, Lafayette, LA, USA
fYear :
2011
fDate :
12-15 Nov. 2011
Firstpage :
562
Lastpage :
568
Abstract :
Computational approaches to generating hypotheses from biomedical literature have been studied intensively in recent years. Nevertheless, automatically discovering novel, cross-silo biomedical hypotheses from large-scale literature repositories remains a challenge. To address this challenge, we first model a biomedical literature repository as a comprehensive network of biomedical concepts and formulate hypothesis generation as a process of link discovery on the concept network. We extract the relevant information from the biomedical literature corpus and generate a concept network and a concept-author matrix on a cluster using the MapReduce framework. We extract a set of heterogeneous features, such as random-walk-based features, neighborhood features, and common-author features. Because the number of potential links to consider is large in our concept network, we address the scalability problem by also extracting the features from the concept network on a cluster using the MapReduce framework. We further model link discovery as a classification problem carried out on two network snapshots taken in two consecutive time frames, such that the classification model built on the first snapshot can be tested on the second snapshot. A set of heterogeneous features, covering both topological and semantic features derived from the concept network, has been studied with respect to its impact on the accuracy of the proposed supervised link discovery process.
Keywords :
data mining; distributed processing; medical information systems; text analysis; MapReduce framework; biomedical literature; common author features; concept author matrix; cross silo biomedical hypotheses; large scale biomedical concept networks; large scale literature repositories; neighborhood features; random walk based features; supervised link discovery; text mining; Biological system modeling; Biomedical measurements; Data mining; Feature extraction; Mathematical model; Semantics; Unified modeling language; Hypotheses discovery; Link discovery; Supervised Learning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bioinformatics and Biomedicine (BIBM), 2011 IEEE International Conference on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4577-1799-4
Type :
conf
DOI :
10.1109/BIBM.2011.92
Filename :
6120502