DocumentCode :
1496203
Title :
Multistage Gene Normalization and SVM-Based Ranking for Protein Interactor Extraction in Full-Text Articles
Author :
Dai, Hong-Jie ; Lai, Po-Ting ; Tsai, Richard Tzong-Han
Author_Institution :
Dept. of Comput. Sci., Nat. TsingHua Univ., Hsinchu, Taiwan
Volume :
7
Issue :
3
Year :
2010
Firstpage :
412
Lastpage :
420
Abstract :
The interactor normalization task (INT) is to identify genes that play the interactor role in protein-protein interactions (PPIs), to map these genes to unique IDs, and to rank them according to their normalized confidence. INT has two subtasks: gene normalization (GN) and interactor ranking. The main difficulties of INT GN are identifying genes across species and using full papers instead of abstracts. To tackle these problems, we developed a multistage GN algorithm and a ranking method, which exploit information in different parts of a paper. Our system achieved a promising AUC of 0.43471. Using the multistage GN algorithm, we have been able to improve system performance (AUC) by 1.719 percent compared to a one-stage GN algorithm. Our experimental results also show that with full text, versus abstract only, INT AUC performance was 22.6 percent higher.
Keywords :
bioinformatics; data mining; full-text databases; genetics; proteins; support vector machines; text analysis; SVM-based ranking; full-text article; interactor normalization task; interactor ranking; multistage GN algorithm; multistage gene normalization; protein interactor extraction; protein-protein interactions; system performance; text mining; feature evaluation and selection; mining methods and algorithms; scientific databases; Abstracting and Indexing as Topic; Algorithms; Artificial Intelligence; Databases, Genetic; Genes; Information Storage and Retrieval; Natural Language Processing; Periodicals as Topic; Protein Interaction Mapping
Language :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2010.45
Filename :
5467043