DocumentCode :
2962902
Title :
Text-mining protein-protein interaction corpus using concept clustering to identify intermittency
Author :
Peterson, Leif E. ; Coleman, Matthew A.
Author_Institution :
Center for Biostat., Methodist Hosp. Res. Inst., Houston, TX
fYear :
2008
fDate :
1-8 June 2008
Firstpage :
3634
Lastpage :
3640
Abstract :
We used human protein-protein interaction (PPI) data transformed into documents to perform text-mining via concept clusters. The advantage of text-mining PPI data is that words (proteins) that are very sparse or over-abundant can be dropped, leaving the remaining bulk of data for clustering and rule mining. Libraries of tissue-specific binary PPIs were constructed from a list of 36,137 binary PPIs in the Human Protein Reference Database (HPRD). A randomization test for intermittency in the form of spikes and holes in frequency distributions of cluster-specific word frequencies was developed using scaled factorial moments. The test was based on a permutation form of a log-linear regression model to determine differences in slopes for ln(F2) vs. ln(M) in the intermittent and null distributions. Significant intermittency (p < 0.0005) in PPI was detected for prostate and testis tissue after a Bonferroni adjustment for multiple tests. The presence of intermittency reflects spikes and holes in histograms of cluster-specific word frequencies and possibly suggests identification of novel large signal transduction pathways or networks.
Keywords :
bioinformatics; data mining; pattern clustering; proteins; regression analysis; statistical distributions; text analysis; concept clustering; frequency distribution; human protein reference database; intermittency identification; intermittent distribution; log-linear regression model; null distribution; randomization test; scaled factorial moment; text-mining protein-protein interaction corpus; Data mining; Databases; Frequency; Genetic mutations; Histograms; Humans; Libraries; Proteins; Technological innovation; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks, 2008. IJCNN 2008. (IEEE World Congress on Computational Intelligence). IEEE International Joint Conference on
Conference_Location :
Hong Kong
ISSN :
1098-7576
Print_ISBN :
978-1-4244-1820-6
Electronic_ISBN :
1098-7576
Type :
conf
DOI :
10.1109/IJCNN.2008.4634318
Filename :
4634318