DocumentCode :
3533066
Title :
Comparison of classification methods on protein-protein interaction document classification
Author :
Xu, Guixian ; Niu, Zhendong ; Uetz, Peter ; Gao, Xu ; Liu, Hongfang
Author_Institution :
Coll. of Comput. Sci., Beijing Inst. of Technol., Beijing
fYear :
2008
fDate :
3-5 Nov. 2008
Firstpage :
83
Lastpage :
90
Abstract :
Protein-protein interaction (PPI) networks are essential for understanding the fundamental processes governing cell biology. The mining and curation of experimental PPI knowledge is critical for the analysis of high-throughput genomics and proteomics data. Several PPI knowledge bases have been generated by expensive manual curation, but they are far from comprehensive. Document classification systems have been shown to have the potential to accelerate the curation process by retrieving PPI-related documents. However, it is usually the case that only a small number of positive documents can be obtained, either manually or from PPI knowledge bases with literature-based evidence, while there is a large number of unlabeled documents, most of which are negative. Such data sets are called imbalanced. Learning from imbalanced data sets, where the number of examples of one (majority) class is much higher than that of the others, presents an important challenge to the machine learning community. It is not clear what kind of classification algorithm is suitable for PPI document classification. In this paper, we compared the performance of several document classifiers on two PPI document sets, varying both the number of positives and the ratio of the number of positives to the number of negatives (or unlabeled documents) in the experiments.
Keywords :
biology computing; document handling; genomics; learning systems; proteins; proteomics; cell biology; curation process; genomics; machine learning; protein-protein interaction document classification; proteomics data; Bioinformatics; Biological cells; Classification algorithms; Data mining; Educational institutions; Genomics; Machine learning; Protein engineering; Proteomics; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bioinformatics and Biomedicine Workshops, 2008. BIBMW 2008. IEEE International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
978-1-4244-2890-8
Type :
conf
DOI :
10.1109/BIBMW.2008.4686213
Filename :
4686213