• DocumentCode
    3417009
  • Title

    A new algorithm of promoter prediction and identification

  • Author

    Fang, Rongxin ; Wu, Shuanhu ; Zhang, Wenyan ; Liu, Qicheng ; Song, Yibin

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Yantai Univ., Yantai, China
  • fYear
    2011
  • fDate
    19-21 Oct. 2011
  • Firstpage
    236
  • Lastpage
    241
  • Abstract
    In this paper, an effective promoter identification algorithm is proposed. The algorithm is based on two features of promoters. (I) Promoter regions contain binding sites where RNA polymerase II binds and where transcription starts. These binding sites include core-promoter elements such as the TATA-box and the GC-box. However, the spacing structure of binding sites is not always consistent, and binding sites of the same kind often differ in structure because of nucleotide variation. (II) The positions of binding sites in a gene are not fixed; instead, they tend to fluctuate within an approximate region. Based on these two features, we first ignore the structural differences between binding sites that are caused by nucleotide variation. In other words, binding motifs that are similar in structure but appear in different forms because of nucleotide variation are treated as one binding motif. Second, we divide promoter regions into equal-length intervals and calculate the occurrence probability of binding sites in each interval. We introduce a new concept, the “Interval Weight Matrix (IWM)”, to reflect the relationship between an interval and the occurrence probability of binding sites. A new promoter identification system is then built on this basis. In tests on large sequences and in comparison with other well-known systems, the new algorithm performs much better at reducing false positives (FP).
  • Keywords
    RNA; biology computing; enzymes; genetics; molecular biophysics; molecular configurations; molecular weight; GC-box core-promoter; RNA polymerase; TATA-box core-promoter; binding motifs; binding sites; equal-length intervals; fluctuation; gene; interval weight matrix; nucleotide variation; promoter identification algorithm; promoter prediction algorithm; spacing structure; transcription; Bioinformatics; DNA; Genomics; Pulse width modulation; Sensitivity; Testing; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Computational Intelligence (IWACI), 2011 Fourth International Workshop on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-1-61284-374-2
  • Type

    conf

  • DOI
    10.1109/IWACI.2011.6160009
  • Filename
    6160009
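
  • Illustrative sketch (not from the paper)
    The abstract describes two steps: treating near-identical motif variants as one binding motif, and estimating how often that motif occurs in each equal-length interval of the promoter region. The record does not give the paper's actual formulas, so the following is only a minimal sketch under stated assumptions: variants are merged by a Hamming-distance threshold (max_mismatch), and each IWM entry is the fraction of training sequences with at least one match starting in that interval. The names hamming, motif_positions and build_iwm are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def motif_positions(seq, motif, max_mismatch):
    """Start positions where seq matches motif with at most max_mismatch
    substitutions. Assumption: this stands in for the paper's merging of
    motif variants caused by nucleotide variation."""
    k = len(motif)
    return [
        i
        for i in range(len(seq) - k + 1)
        if hamming(seq[i:i + k], motif) <= max_mismatch
    ]


def build_iwm(promoters, motifs, interval_len, max_mismatch=1):
    """Build a motif -> per-interval occurrence-probability table.

    Assumption: all promoter sequences have the same length and are aligned
    the same way; the probability is estimated as the fraction of sequences
    with at least one match starting inside the interval.
    """
    n_intervals = len(promoters[0]) // interval_len
    counts = {m: [0] * n_intervals for m in motifs}
    for seq in promoters:
        for m in motifs:
            # Count each interval at most once per sequence.
            hit_intervals = {
                pos // interval_len
                for pos in motif_positions(seq, m, max_mismatch)
                if pos // interval_len < n_intervals
            }
            for idx in hit_intervals:
                counts[m][idx] += 1
    total = len(promoters)
    return {m: [c / total for c in row] for m, row in counts.items()}


# Toy data for illustration only; these are not real promoter sequences.
toy_promoters = ["TATAAAGGGGGG", "GGGGGGTATAAA", "TATATAGGGGGG"]
toy_iwm = build_iwm(toy_promoters, ["TATAAA"], interval_len=6, max_mismatch=1)
```

    In this toy example the variant TATATA is counted together with TATAAA, so the motif's row gives a higher probability for the first interval than for the second. A scoring step that uses such a table to accept or reject candidate promoters is not described in this record and is deliberately left out.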