DocumentCode
2962902
Title
Text-mining protein-protein interaction corpus using concept clustering to identify intermittency
Author
Peterson, Leif E. ; Coleman, Matthew A.
Author_Institution
Center for Biostat., Methodist Hosp. Res. Inst., Houston, TX
fYear
2008
fDate
1-8 June 2008
Firstpage
3634
Lastpage
3640
Abstract
We used human protein-protein interaction (PPI) data transformed into documents to perform text-mining via concept clusters. The advantage of text-mining PPI data is that words (proteins) that are very sparse or over-abundant can be dropped, leaving the remaining bulk of data for clustering and rule mining. Libraries of tissue-specific binary PPIs were constructed from a list of 36,137 binary PPIs in the Human Protein Reference Database (HPRD). A randomization test for intermittency in the form of spikes and holes in frequency distributions of cluster-specific word frequencies was developed using scaled factorial moments. The test was based on a permutation form of a log-linear regression model to determine differences in slopes for ln(F2) vs. ln(M) in the intermittent and null distributions. Significant intermittency (p < 0.0005) in PPI was detected for prostate and testis tissue after a Bonferroni adjustment for multiple tests. The presence of intermittency reflects spikes and holes in histograms of cluster-specific word frequencies and possibly suggests identification of novel large signal transduction pathways or networks.
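The slope comparison of ln(F2) vs. ln(M) described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common normalization F2(M) = M * sum(n_m(n_m - 1)) / (N(N - 1)) over M equal-width bins, assumes values already rescaled to [0, 1), and substitutes a simple Monte Carlo comparison against uniform null samples for the permutation log-linear regression named in the abstract. The function names `f2_moment`, `loglog_slope`, and `randomization_p_value` are hypothetical.

```python
import math
import random


def f2_moment(values, n_bins):
    """Second-order scaled factorial moment F2 for values in [0, 1).

    Assumed normalization: F2 = M * sum(n_m * (n_m - 1)) / (N * (N - 1)).
    """
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    n = len(values)
    pairs = sum(c * (c - 1) for c in counts)
    return n_bins * pairs / (n * (n - 1))


def loglog_slope(values, bin_sizes):
    """Least-squares slope of ln(F2) against ln(M) over the given bin counts."""
    points = []
    for m in bin_sizes:
        f2 = f2_moment(values, m)
        if f2 > 0:  # ln(F2) is undefined when no bin holds two or more values
            points.append((math.log(m), math.log(f2)))
    if len(points) < 2:
        raise ValueError("need at least two bin counts with F2 > 0")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def randomization_p_value(values, bin_sizes, n_null=199, seed=0):
    """One-sided p-value: share of uniform null samples whose slope is at
    least as large as the observed slope (assumed null model, not the
    paper's permutation scheme)."""
    rng = random.Random(seed)
    observed = loglog_slope(values, bin_sizes)
    exceed = 0
    for _ in range(n_null):
        null_sample = [rng.random() for _ in values]
        if loglog_slope(null_sample, bin_sizes) >= observed:
            exceed += 1
    return (exceed + 1) / (n_null + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic data with one spike at 0.3 on top of a uniform background.
    spiky = [0.3] * 100 + [rng.random() for _ in range(100)]
    bins = [2, 4, 8, 16, 32]
    print("slope:", loglog_slope(spiky, bins))
    print("p-value:", randomization_p_value(spiky, bins))
```

Under this normalization a flat distribution gives F2 close to 1 at every M, so its slope is near zero, while a spike makes F2 grow with M and yields a positive slope. The synthetic data are for illustration only and are unrelated to the HPRD libraries.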
Keywords
bioinformatics; data mining; pattern clustering; proteins; regression analysis; statistical distributions; text analysis; concept clustering; frequency distribution; human protein reference database; intermittency identification; intermittent distribution; log-linear regression model; null distribution; randomization test; scaled factorial moment; text-mining protein-protein interaction corpus; Data mining; Databases; Frequency; Genetic mutations; Histograms; Humans; Libraries; Proteins; Technological innovation; Testing;
fLanguage
English
Publisher
ieee
Conference_Titel
Neural Networks, 2008. IJCNN 2008. (IEEE World Congress on Computational Intelligence). IEEE International Joint Conference on
Conference_Location
Hong Kong
ISSN
1098-7576
Print_ISBN
978-1-4244-1820-6
Electronic_ISBN
1098-7576
Type
conf
DOI
10.1109/IJCNN.2008.4634318
Filename
4634318