DocumentCode :
1814793
Title :
Discriminative discovery of transcription factor binding sites from location data
Author :
Kawada, Yuji ; Sakakibara, Yasubumi
Author_Institution :
Dept. of Biosci. & Informatics, Keio Univ., Yokohama, Japan
fYear :
2005
fDate :
8-11 Aug. 2005
Firstpage :
86
Lastpage :
89
Abstract :
The availability of genome-wide location analyses based on chromatin immunoprecipitation (ChIP) data offers new insight for the in silico analysis of transcriptional regulation. We propose a novel discriminative discovery framework for precisely identifying transcriptional regulatory motifs from both positive and negative samples (sets of upstream sequences of genes that are bound and not bound, respectively, by a transcription factor (TF)) based on genome-wide location data. In this framework, our goal is to find discriminative motifs that best explain the location data, in the sense that the motifs precisely discriminate the positive samples from the negative ones. First, to discover an initial set of substrings that discriminate between positive and negative samples, we apply a decision tree learning method that produces a text-classification tree. We extract several clusters of similar substrings from the internal nodes of the learned tree. Second, we construct an initial profile-HMM from each cluster to represent a putative motif and iteratively refine the profile-HMMs to improve discrimination accuracy. Our genome-wide experimental results on yeast show that our method successfully identifies the consensus sequences of TFs known from the literature and achieves significant discrimination performance between positive and negative samples for all the TFs, whereas most other motif detection methods perform very poorly on this discrimination problem. Our learned profile-HMMs also improve false negative predictions of ChIP data.
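The first step described above, finding substrings that separate bound from unbound upstream sequences with a decision tree, can be illustrated by the split-selection criterion at a single tree node. The following is a minimal sketch under stated assumptions: features are the presence or absence of a fixed-length substring (k-mer), and splits are scored by information gain. The abstract does not specify the split criterion or feature set, so this is not the authors' implementation, and the function names and toy sequences are hypothetical.

```python
import math
from typing import List, Set, Tuple


def entropy(pos: int, neg: int) -> float:
    """Binary entropy (in bits) of a node holding pos bound and neg unbound sequences."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct length-k substrings of seq (assumed feature set)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_substring_split(positives: List[str], negatives: List[str],
                         k: int) -> Tuple[str, float]:
    """Hypothetical helper: return the length-k substring whose presence test
    gives the largest information gain between positive (bound) and
    negative (unbound) upstream sequences, together with that gain."""
    pos_sets = [kmers(s, k) for s in positives]
    neg_sets = [kmers(s, k) for s in negatives]
    n_pos, n_neg = len(positives), len(negatives)
    n = n_pos + n_neg
    parent = entropy(n_pos, n_neg)
    candidates = set().union(*pos_sets, *neg_sets)
    best, best_gain = "", -1.0
    for sub in sorted(candidates):  # sorted so ties resolve deterministically
        p_in = sum(sub in s for s in pos_sets)   # bound sequences containing sub
        n_in = sum(sub in s for s in neg_sets)   # unbound sequences containing sub
        p_out, n_out = n_pos - p_in, n_neg - n_in
        # Weighted entropy of the two child nodes after the presence test.
        child = ((p_in + n_in) / n) * entropy(p_in, n_in) \
            + ((p_out + n_out) / n) * entropy(p_out, n_out)
        gain = parent - child
        if gain > best_gain:
            best, best_gain = sub, gain
    return best, best_gain


if __name__ == "__main__":
    bound = ["ACGTGAC", "TTCACGTG", "GCACGTGA"]      # toy positive samples
    unbound = ["TTTTAAAA", "GGGCCCAA", "ATATATAT"]   # toy negative samples
    print(best_substring_split(bound, unbound, k=4))  # ('ACGT', 1.0)
```

On this toy data, both `ACGT` and `CGTG` occur in every positive sequence and in no negative one, so each separates the samples perfectly (gain of 1 bit); the alphabetical tie-break returns `ACGT`. Applying the same selection recursively to each child node would grow a full classification tree whose internal-node substrings could then be clustered, as the abstract describes.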
Keywords :
biology computing; cellular biophysics; decision trees; genetics; hidden Markov models; microorganisms; molecular biophysics; chromatin immunoprecipitation; decision tree learning method; genome-wide location analyses; motif detecting method; profile-HMM; in silico analysis; substrings; text-classification tree; transcriptional regulation; yeast; Availability; Bioinformatics; Data mining; Decision trees; Fungi; Genomics; Hidden Markov models; Histograms; Informatics; Machine learning
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Systems Bioinformatics Conference, 2005. Proceedings. 2005 IEEE
Print_ISBN :
0-7695-2344-7
Type :
conf
DOI :
10.1109/CSB.2005.30
Filename :
1498010