Regional Information Center for Science and Technology - A Data Mining Method to Extract and Rank Papers Describing Coexpression Predicates Semantically

DocumentCode :

2774261

Title :

A Data Mining Method to Extract and Rank Papers Describing Coexpression Predicates Semantically

Author :

Zhang, Chengcui ; Tiwari, Richa ; Chen, Wei-Bang

Author_Institution :

Dept. of Comput. & Inf. Sci., Univ. of Alabama at Birmingham, Birmingham, AL, USA

fYear :

2009

fDate :

6-6 Dec. 2009

Firstpage :

483

Lastpage :

488

Abstract :

Information management and extraction in the field of biomedical research have become a necessity given the rapid increase in the amount of data published in this area. In this paper, a graphical model, Conditional Random Fields (CRF), is used to extract a particular gene-gene relationship called "coexpression" from the existing literature. First, a CRF-based model is trained and tested on full-length papers downloaded from PubMed to label the predicates that describe the coexpression of genes. Appropriate local and contextual text features at both the word and sentence levels are proposed and extracted during the pre-processing step. The classification performance of the model trained on the proposed features is compared with that of Support Vector Machines, Nearest Neighbor with generalization, and Neural Networks, and is found to outperform all of them. In a second experiment, the proposed ranking scheme, which is based on the classification results, is applied to the ranked lists of papers returned by PubMed and Google, respectively. Comparing our ranked results with those of PubMed and Google shows that the proposed ranking scheme distinguishes positive papers from negative papers better than either. In conclusion, this paper describes a specialized classification and ranking framework that retrieves papers that actually describe coexpression between and among genes, based on mining semantics rather than on lexical search alone.
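The abstract says only that the ranking scheme is "based on classification results" and does not give its formula. As a minimal sketch, assuming each sentence of a paper has already been labeled by a classifier (1 = coexpression predicate, 0 = otherwise), one could re-rank a retrieved list by the fraction of positively labeled sentences. The names `Paper`, `paper_score`, and `rank_papers` and the fraction-based score are hypothetical illustrations, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical record: an identifier plus per-sentence classifier labels.
    paper_id: str
    sentence_labels: list[int]  # 1 = labeled as a coexpression predicate, 0 = otherwise


def paper_score(paper: Paper) -> float:
    """Hypothetical score: fraction of sentences labeled as coexpression predicates."""
    if not paper.sentence_labels:
        return 0.0
    return sum(paper.sentence_labels) / len(paper.sentence_labels)


def rank_papers(papers: list[Paper]) -> list[Paper]:
    """Sort by descending score. Python's sort is stable (also with reverse=True),
    so papers with equal scores keep their original retrieval order."""
    return sorted(papers, key=paper_score, reverse=True)


if __name__ == "__main__":
    retrieved = [
        Paper("A", [0, 0, 1, 0]),  # score 0.25
        Paper("B", [1, 1, 0, 1]),  # score 0.75
        Paper("C", [0, 0, 0, 0]),  # score 0.0
    ]
    print([p.paper_id for p in rank_papers(retrieved)])  # ['B', 'A', 'C']
```

Keeping ties in the original order means the sketch falls back on the search engine's ranking whenever the classifier cannot separate two papers.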

Keywords :

data mining; information management; information retrieval; medical computing; neural nets; pattern recognition; support vector machines; biomedical research; coexpression predicates; conditional random fields; data mining; gene-gene relationship; graphical model; information extraction; information management; nearest neighbor; neural networks; support vector machines; Data mining; Graphical models; Hidden Markov models; Information retrieval; Machine learning; Nearest neighbor searches; Neural networks; Support vector machine classification; Support vector machines; Text mining;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Mining Workshops, 2009. ICDMW '09. IEEE International Conference on

Conference_Location :

Miami, FL

Print_ISBN :

978-1-4244-5384-9

Electronic_ISBN :

978-0-7695-3902-7

Type :

conf

DOI :

10.1109/ICDMW.2009.53

Filename :

5360454

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2774261