DocumentCode :
3417009
Title :
A new algorithm of promoter prediction and identification
Author :
Fang, Rongxin ; Wu, Shuanhu ; Zhang, Wenyan ; Liu, Qicheng ; Song, Yibin
Author_Institution :
Sch. of Comput. Sci. & Technol., Yantai Univ., Yantai, China
fYear :
2011
fDate :
19-21 Oct. 2011
Firstpage :
236
Lastpage :
241
Abstract :
In this paper, an effective promoter identification algorithm is proposed. The algorithm is based on two features of promoters. (I) Promoter regions contain binding sites where RNA polymerase II binds and where transcription starts. These binding sites include core-promoter elements such as the TATA-box and GC-box. However, the spacing structure of binding sites is not always consistent, and binding sites of the same kind often differ in structure because of nucleotide variation. (II) The positions of binding sites in the gene are not fixed; instead, they tend to fluctuate within an approximate region. Based on these two features, we first disregard structural differences between binding sites that are caused by nucleotide variation. In other words, binding motifs that are similar in structure but appear in different forms because of nucleotide variation are treated as one binding motif. Second, we divide promoter regions into equal-length intervals and calculate the occurrence probability of binding sites in each interval. We introduce a new concept, the “Interval Weight Matrix (IWM)”, to reflect the relationship between intervals and the occurrence probability of binding sites. A new promoter identification system is then proposed. Tests on large sequences and comparisons with other well-known systems show that the new algorithm performs much better at reducing false positives (FP) than those systems.
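The two steps described in the abstract (treating near-identical motif variants as one motif, then estimating per-interval occurrence probabilities) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the use of a Hamming-distance threshold (`max_mismatch`) to collapse motif variants, and the definition of the probability as the fraction of training sequences with a hit starting in an interval are all assumptions made for illustration only.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def interval_weight_matrix(
    sequences: List[str],
    motifs: List[str],
    interval_len: int,
    max_mismatch: int = 1,
) -> Dict[str, List[float]]:
    """Hypothetical IWM: for each motif, the fraction of sequences that
    contain a near-match starting in each equal-length interval."""
    if not sequences:
        raise ValueError("need at least one sequence")
    seq_len = len(sequences[0])
    if any(len(s) != seq_len for s in sequences):
        raise ValueError("sequences must be aligned to the same length")

    # Ceiling division so a trailing partial interval is still counted.
    n_intervals = (seq_len + interval_len - 1) // interval_len
    iwm: Dict[str, List[float]] = {}
    for motif in motifs:
        k = len(motif)
        counts = [0] * n_intervals
        for seq in sequences:
            # Intervals of this sequence in which a near-match starts.
            hit_intervals = set()
            for start in range(seq_len - k + 1):
                # Assumption: variants within max_mismatch substitutions
                # are treated as the same binding motif.
                if hamming(seq[start:start + k], motif) <= max_mismatch:
                    hit_intervals.add(start // interval_len)
            for idx in hit_intervals:
                counts[idx] += 1
        iwm[motif] = [c / len(sequences) for c in counts]
    return iwm


# Toy data, invented for illustration (not from the paper).
toy_sequences = ["TATAGGGG", "GGGGTAAA"]
print(interval_weight_matrix(toy_sequences, ["TATA"], interval_len=4))
# {'TATA': [0.5, 0.5]}
```

In the toy example, the exact motif `TATA` starts in the first interval of one sequence, and the one-mismatch variant `TAAA` starts in the second interval of the other, so each interval receives a weight of 0.5. With `max_mismatch=0` the variant would no longer be counted.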
Keywords :
RNA; biology computing; enzymes; genetics; molecular biophysics; molecular configurations; molecular weight; GC-box core-promoter; RNA polymerase; TATA-box core-promoter; binding motifs; binding sites; equal-length intervals; fluctuation; gene; interval weight matrix; nucleotide variation; promoter identification algorithm; promoter prediction algorithm; spacing structure; transcription; Bioinformatics; DNA; Genomics; Pulse width modulation; Sensitivity; Testing; Training;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Advanced Computational Intelligence (IWACI), 2011 Fourth International Workshop on
Conference_Location :
Wuhan
Print_ISBN :
978-1-61284-374-2
Type :
conf
DOI :
10.1109/IWACI.2011.6160009
Filename :
6160009