DocumentCode
2234323
Title
Promoter prediction in eukaryotes using soft computing techniques
Author
Premalatha, C. ; Aravindan, Chandrabose ; Kannan, K.
Author_Institution
SASTRA Univ., Thanjavur, India
fYear
2011
fDate
22-24 Sept. 2011
Firstpage
528
Lastpage
532
Abstract
In molecular biology, in silico identification of eukaryotic promoters is a challenging task. Currently available classifiers suffer from either poor sensitivity or poor specificity. In this paper, we propose a support vector machine classifier, referred to as PSVM, to recognize human Pol II promoters. Features representing k-mer frequency are extracted with a Markov model and combined with features representing other transcription signals such as the TATA box and GC box. The classifier is trained on a data set comprising 1862 promoters and 1759 non-promoters from the human genome and takes only 12 parameters to classify a given sequence as promoter or non-promoter. Of the 20 verified promoters on human chromosome 22, PSVM recognizes 18. It also correctly identifies all 14 well-annotated exons of human chromosome 22 as non-promoters. When 90% of the data is used to train PSVM, it yields a sensitivity of 93.55% and a specificity of 98.86%, which are significantly better than previously reported results and those of online promoter prediction tools such as NNPP, ProScan, and TSSG. Thus, k-mer frequency represented by a Markov model of order k, together with the TATA box, GC box, CAAT box, Init box, and CpG island, can be a valuable combination of features for predicting eukaryotic Pol II promoters.
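The abstract does not say how the Markov model turns k-mer frequencies into a feature, so the following is only a minimal sketch of one common approach: fit an order-k Markov chain on promoter sequences and another on non-promoter sequences, then score a query by its log-odds ratio. The function names (`train_markov`, `log_odds`), the add-one smoothing, and the toy sequences are assumptions for illustration, not the authors' implementation, and the sketch does not reproduce the 12-parameter PSVM feature set.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"

def train_markov(seqs, k, pseudocount=1.0):
    """Estimate P(next base | previous k bases) with additive smoothing (assumed)."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        s = s.upper()
        for i in range(k, len(s)):
            ctx, nxt = s[i - k:i], s[i]
            # Skip windows containing ambiguous bases such as N.
            if nxt in ALPHABET and all(c in ALPHABET for c in ctx):
                counts[ctx][nxt] += 1
    model = {}
    for ctx in map("".join, product(ALPHABET, repeat=k)):
        total = sum(counts[ctx].values()) + pseudocount * len(ALPHABET)
        model[ctx] = {b: (counts[ctx][b] + pseudocount) / total for b in ALPHABET}
    return model

def log_odds(seq, pos_model, neg_model, k):
    """Sum of log P_promoter / P_non-promoter over every (k+1)-mer in seq."""
    seq = seq.upper()
    score = 0.0
    for i in range(k, len(seq)):
        ctx, nxt = seq[i - k:i], seq[i]
        if ctx in pos_model and nxt in ALPHABET:
            score += math.log(pos_model[ctx][nxt] / neg_model[ctx][nxt])
    return score

# Toy data, invented for illustration only.
promoters = ["TATATATATA", "TATAAATATA"]
non_promoters = ["GCGCGCGCGC", "GGCCGGCCGG"]
K = 2
pos_model = train_markov(promoters, K)
neg_model = train_markov(non_promoters, K)
tata_score = log_odds("TATATA", pos_model, neg_model, K)
gc_score = log_odds("GCGCGC", pos_model, neg_model, K)
```

A positive score means the sequence looks more like the promoter training set under this model. In a pipeline like the one the abstract describes, a score of this kind would be one input to the SVM alongside the signal-based features (TATA box, GC box, and so on).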
Keywords
Markov processes; biology computing; molecular biophysics; pattern classification; support vector machines; CAAT box; CpG island; GC box; Init box; Markov model; NNPP; PSVM; ProScan; TATA box; TSSG; eukaryotic promoter prediction; feature extraction; human chromosome; human genome; k-mer frequency; molecular biology; online promoter prediction tools; soft computing techniques; support vector machine classifier; Bioinformatics; Genomics; Humans; Training
fLanguage
English
Publisher
IEEE
Conference_Titel
Recent Advances in Intelligent Computational Systems (RAICS), 2011 IEEE
Conference_Location
Trivandrum
Print_ISBN
978-1-4244-9478-1
Type
conf
DOI
10.1109/RAICS.2011.6069368
Filename
6069368