DocumentCode
848552
Title
SCS: Signal, Context, and Structure Features for Genome-Wide Human Promoter Recognition
Author
Zeng, Jia ; Zhao, Xiao-Yu ; Cao, Xiao-Qin ; Yan, Hong
Author_Institution
Sch. of Comput. Sci. & Technol., Soochow Univ., Suzhou, China
Volume
7
Issue
3
fYear
2010
Firstpage
550
Lastpage
562
Abstract
This paper integrates signal, context, and structure features for genome-wide human promoter recognition, which is important for improving genome annotation and analyzing transcriptional regulation without experimental support from ESTs, cDNAs, or mRNAs. First, CpG islands are salient biological signals associated with approximately 50 percent of mammalian promoters. Second, the genomic context of promoters, captured by n-mers (sequences n bases long) and their statistics estimated from training samples, may have biological significance. Third, sequence-dependent DNA flexibility originates from DNA 3D structures and plays an important role in guiding transcription factors to the target site in promoters. Employing decision trees, we combine the above signal, context, and structure features to build a hierarchical promoter recognition system called SCS. Experimental results on controlled data sets and the entire human genome demonstrate that SCS is significantly superior in sensitivity and specificity to other state-of-the-art methods. The SCS promoter recognition system is available online as supplemental material for academic use and can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.95.
Keywords
biology computing; genomics; molecular biophysics; molecular configurations; DNA 3D structures; genome annotation; genome-wide human promoter recognition; genomic context; hierarchical promoter recognition system; salient biological signals; sequence-dependent DNA flexibility; signal, context; transcriptional regulation; Bioinformatics; Computer Society; DNA; Decision trees; Genomics; Humans; Sensitivity and specificity; Sequences; Signal analysis; Statistics; Biology and genetics; Pattern Recognition; Promoter recognition; classifier combination; feature extraction; genome analysis; Algorithms; CpG Islands; Gene Expression Regulation; Genome, Human; Genomics; Humans; Promoter Regions, Genetic; Sequence Analysis, DNA; Societies, Scientific
fLanguage
English
Journal_Title
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
1545-5963
Type
jour
DOI
10.1109/TCBB.2008.95
Filename
4609377