Title :
A Data Mining Method to Extract and Rank Papers Describing Coexpression Predicates Semantically
Author :
Zhang, Chengcui ; Tiwari, Richa ; Chen, Wei-Bang
Author_Institution :
Dept. of Comput. & Inf. Sci., Univ. of Alabama at Birmingham, Birmingham, AL, USA
Abstract :
Information management and extraction in the field of biomedical research have become a necessity given the rapidly increasing volume of data published in this area. In this paper, a graphical model, Conditional Random Fields (CRFs), is used to extract a particular gene-gene relationship called "coexpression" from the existing literature. First, a CRF-based model is trained and tested on full-length papers downloaded from PubMed to label the predicates that describe the coexpression of genes. Suitable local and contextual text features at both the word and sentence levels are proposed and extracted during the pre-processing step. The classification performance of the model trained on the proposed features is compared with that of Support Vector Machines, Nearest Neighbor with generalization, and Neural Networks, and the CRF model outperforms all three. In the second experiment, the proposed ranking scheme, which is based on the classification results, is applied to the ranked lists of papers returned by PubMed and Google, respectively. Comparing our ranked results with those of PubMed and Google shows that the proposed ranking scheme distinguishes positive papers from negative papers better than either. In conclusion, this paper describes a specialized classification and ranking framework that retrieves papers that actually discuss coexpression between and among genes, based on semantic mining rather than lexical search alone.
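As a rough illustration of the two stages the abstract names (labeling sentences with a linear-chain CRF, then ranking papers from those labels), the sketch below shows Viterbi decoding over a paper's sentences followed by a simple ranking. It is not the authors' implementation: the label names (`O`, `COEXP`), the toy features, the hand-set weights (a real CRF learns them during training, which is not shown), and the scoring rule (count of sentences labeled `COEXP`) are all hypothetical assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical label set: "O" = not a coexpression predicate, "COEXP" = coexpression predicate.
LABELS = ("O", "COEXP")

# Hypothetical hand-set emission weights (feature, label) -> score; a trained CRF learns these.
EMIT_WEIGHTS: Dict[Tuple[str, str], float] = {
    ("bias", "O"): 1.0,
    ("has_coexpress_word", "COEXP"): 2.0,
    ("has_coexpress_word", "O"): -1.0,
    ("has_two_gene_names", "COEXP"): 1.0,
}

# Hypothetical transition weights: consecutive sentences tend to share a label.
TRANS_WEIGHTS: Dict[Tuple[str, str], float] = {
    ("O", "O"): 0.5,
    ("COEXP", "COEXP"): 0.5,
}


def sentence_features(sentence: str) -> List[str]:
    """Toy sentence-level features; the paper's actual features are richer."""
    feats = ["bias"]
    words = sentence.lower().split()
    if any(w.startswith(("coexpress", "co-express")) for w in words):
        feats.append("has_coexpress_word")
    # Toy gene-name heuristic: all-uppercase tokens of length >= 3 (e.g. TP53).
    tokens = [t.strip(".,;:()") for t in sentence.split()]
    if sum(1 for t in tokens if len(t) >= 3 and t.isupper()) >= 2:
        feats.append("has_two_gene_names")
    return feats


def viterbi(sentences: List[str]) -> List[str]:
    """Return the highest-scoring label sequence under the linear-chain model."""
    if not sentences:
        return []
    feats = [sentence_features(s) for s in sentences]

    def emit(i: int, y: str) -> float:
        return sum(EMIT_WEIGHTS.get((f, y), 0.0) for f in feats[i])

    # score[y] = best total score of any label path ending in label y at position i.
    score = {y: emit(0, y) for y in LABELS}
    backptrs: List[Dict[str, str]] = []
    for i in range(1, len(sentences)):
        new_score: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: score[p] + TRANS_WEIGHTS.get((p, y), 0.0))
            new_score[y] = score[prev] + TRANS_WEIGHTS.get((prev, y), 0.0) + emit(i, y)
            ptr[y] = prev
        score = new_score
        backptrs.append(ptr)

    # Follow back-pointers from the best final label to recover the full path.
    path = [max(LABELS, key=lambda y: score[y])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return list(reversed(path))


def rank_papers(papers: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Hypothetical ranking: more predicted COEXP sentences ranks a paper higher."""
    scored = [(pid, viterbi(sents).count("COEXP")) for pid, sents in papers.items()]
    return sorted(scored, key=lambda t: (-t[1], t[0]))
```

For example, a paper whose first sentence reads "BRCA1 and TP53 are coexpressed in tumor samples." is decoded as `["COEXP", "O"]` when followed by "We used microarrays.", and it ranks above a paper that mentions a single gene without any coexpression wording. The transition weights are what make the model contextual: a sentence's label depends on its neighbors, not only on its own features.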
Keywords :
data mining; information management; information retrieval; medical computing; neural nets; pattern recognition; support vector machines; biomedical research; coexpression predicates; conditional random fields; gene-gene relationship; graphical model; information extraction; nearest neighbor; neural networks; Graphical models; Hidden Markov models; Machine learning; Nearest neighbor searches; Support vector machine classification; Text mining
Conference_Title :
Data Mining Workshops, 2009. ICDMW '09. IEEE International Conference on
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4244-5384-9
Electronic_ISBN :
978-0-7695-3902-7
DOI :
10.1109/ICDMW.2009.53