• DocumentCode
    2199843
  • Title
    Predicting Plant Pol-II Promoter Based on Subsequence Increment of Overlap Content Diversity
  • Author
    Zuo, Yongchun; Li, Qianzhong
  • Author_Institution
    Lab. of Theor. Biophys., Inner Mongolia Univ., Hohhot, China
  • fYear
    2009
  • fDate
    17-19 Oct. 2009
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Promoter identification is the first and most important step toward understanding the regulation of gene transcription. In this study, a new information-content feature, the subsequence increment of overlapping content diversity (IOCD), is presented for the first time to describe the subsequence content of plant Pol-II promoters. The negative datasets comprise five different regions of the complete Arabidopsis thaliana genome: coding sequences, introns, intergenic regions, 5´ untranslated regions (UTRs) and 3´ untranslated regions (UTRs). The predictive capacity of the algorithm is assessed by 10-fold cross-validation based on K-mer IOCD. The results show that IOCD describes promoter sequence content well. Further, using the interval distances between the transcription start site (TSS) and the translation initiation site (TIS), the method is applied to search the complete Arabidopsis thaliana genome, and more than ten thousand probable promoters were found.
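The abstract does not spell out the IOCD formula. The sketch below is a hedged illustration only, assuming IOCD follows the standard increment-of-diversity measure, D = N log N − Σ nᵢ log nᵢ and ΔD(X, Y) = D(X+Y) − D(X) − D(Y), applied to overlapping (step-1) K-mer counts, with a minimum-increment decision rule. The function names, the choice k = 3, and the two-class rule are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import product


def overlapping_kmer_counts(seq, k):
    """Count overlapping k-mers (window step 1); only ACGT k-mers are kept in the vector."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts[kmer] for kmer in kmers]


def build_source(seqs, k):
    """Pool k-mer counts over a training set to form a 'standard source' (assumption)."""
    total = [0] * (4 ** k)
    for s in seqs:
        total = [t + c for t, c in zip(total, overlapping_kmer_counts(s, k))]
    return total


def diversity(counts):
    """Measure of diversity D = N log N - sum(n_i log n_i), with 0 log 0 taken as 0."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return n * math.log(n) - sum(c * math.log(c) for c in counts if c > 0)


def increment_of_diversity(x, y):
    """Delta D(X, Y) = D(X + Y) - D(X) - D(Y); zero when X and Y have proportional counts."""
    mixed = [a + b for a, b in zip(x, y)]
    return diversity(mixed) - diversity(x) - diversity(y)


def classify(query, positive_source, negative_source, k=3):
    """Hypothetical rule: pick the class whose source yields the smaller increment."""
    q = overlapping_kmer_counts(query, k)
    d_pos = increment_of_diversity(q, positive_source)
    d_neg = increment_of_diversity(q, negative_source)
    return "promoter" if d_pos < d_neg else "non-promoter"
```

In practice the positive source would be pooled from training promoters and the negative source from one of the five genomic regions, with 10-fold cross-validation rebuilding both sources on each training split.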
  • Keywords
    biology computing; botany; genetics; genomics; 3´ untranslated regions; 5´ untranslated regions; Arabidopsis thaliana complete genomes; Intergenics; Introns; K-mer IOCD; gene transcription regulation; negative datasets; overlap content diversity; overlapping content diversity; plant pol-II promoter; transcription start site; translation initiation site; Bayesian methods; Bioinformatics; Biophysics; DNA; Entropy; Genomics; Laboratories; Prediction algorithms; Sequences; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Biomedical Engineering and Informatics, 2009. BMEI '09. 2nd International Conference on
  • Conference_Location
    Tianjin
  • Print_ISBN
    978-1-4244-4132-7
  • Electronic_ISBN
    978-1-4244-4134-1
  • Type
    conf
  • DOI
    10.1109/BMEI.2009.5305749
  • Filename
    5305749