Title :
Probabilistic Models for Semisupervised Discriminative Motif Discovery in DNA Sequences
Author :
Kim, Jong Kyoung ; Choi, Seungjin
Author_Institution :
Dept. of Comput. Sci., Pohang Univ. of Sci. & Technol., Pohang, South Korea
Abstract :
Methods for discriminative motif discovery in DNA sequences identify transcription factor binding sites (TFBSs), searching only for patterns that differentiate two sets (positive and negative sets) of sequences. On the one hand, discriminative methods increase the sensitivity and specificity of motif discovery compared to generative models. On the other hand, generative models can easily exploit unlabeled sequences to better detect functional motifs when labeled training samples are limited. In this paper, we develop a hybrid generative/discriminative model that enables us to make use of unlabeled sequences in the framework of discriminative motif discovery, leading to semisupervised discriminative motif discovery. Numerical experiments on yeast ChIP-chip data for discovering DNA motifs demonstrate that the best performance is obtained by a model between the purely generative and the purely discriminative extremes, and that semisupervised learning improves performance when labeled sequences are limited.
Keywords :
DNA; biology computing; learning (artificial intelligence); microorganisms; molecular biophysics; molecular configurations; physiological models; probability; DNA sequences; functional motifs; generative models; probabilistic models; semisupervised discriminative motif discovery; semisupervised learning; sensitivity; specificity; transcription factor binding sites; yeast ChIP-chip data; Biological system modeling; Computational modeling; DNA; Hybrid power systems; Joints; Numerical models; Probabilistic logic; Graphical models; hybrid generative/discriminative models; motif discovery; probabilistic models; semisupervised learning; Algorithms; Artificial Intelligence; Binding Sites; Computational Biology; Computer Simulation; DNA; DNA, Fungal; Databases, Genetic; Models, Genetic; Models, Statistical; Nucleotide Motifs; Oligonucleotide Array Sequence Analysis; Sequence Analysis, DNA; Transcription Factors;
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2010.84