DocumentCode :
1936140
Title :
Human Promoter Recognition using Kullback-Leibler Divergence
Author :
Zeng, Jia ; Cao, Xiao-Qin ; Yan, Hong
Author_Institution :
City Univ. of Hong Kong, Kowloon
Volume :
6
fYear :
2007
fDate :
19-22 Aug. 2007
Firstpage :
3319
Lastpage :
3325
Abstract :
Human gene expression can be regulated at four levels: differential gene transcription, selective nuclear RNA (nRNA) processing, selective messenger RNA (mRNA) translation, and differential protein modification. At the transcription level, the promoter region plays an important role in binding RNA polymerase for the subsequent initiation of transcription. In this paper, we use the Kullback-Leibler (KL) divergence to select the most informative and discriminative word-based features for differentiating promoter and non-promoter regions in large genomic sequences. First, we assume that a nucleotide sequence is composed of random "words" of the same length. Second, from the promoter and non-promoter training samples we estimate two discrete distributions over the same set of words, and the KL divergence measures how different these two distributions are (it is a non-symmetric divergence, not a true distance). Finally, we select two groups of words, one for the class "promoter" and the other for the class "non-promoter", that have the maximum KL divergence in the training samples. Recognition then proceeds by comparing the number of word matches between the two groups of informative words. Experiments on human promoter recognition using the DBTSS database give encouraging results, with a recognition rate comparable to those of state-of-the-art promoter recognizers. Future work on combining words of different lengths is also discussed.
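The word-selection and matching steps summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions the abstract does not fix: words are taken as overlapping length-k substrings (sliding window, step 1); word probabilities use add-one (pseudocount) smoothing so that log(p/q) is always defined; the promoter group is the n words contributing the largest terms p(w) log(p(w)/q(w)) to D(P || Q), and the non-promoter group is chosen the same way from D(Q || P); ties in the match count are labeled non-promoter. All function names (words, word_distribution, kl_divergence, top_words, train, classify) and the toy sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def words(seq, k):
    """Split a sequence into overlapping length-k words (assumed step of 1)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def word_distribution(seqs, k, pseudocount=1.0):
    """Estimate a smoothed distribution over all 4**k DNA words.

    Words containing characters other than A, C, G, T (e.g. N) are ignored.
    """
    counts = Counter(w for s in seqs for w in words(s, k))
    vocab = ["".join(t) for t in product(ALPHABET, repeat=k)]
    total = sum(counts[w] for w in vocab) + pseudocount * len(vocab)
    return {w: (counts[w] + pseudocount) / total for w in vocab}


def kl_divergence(p, q):
    """D(p || q) = sum over w of p(w) * log(p(w) / q(w)), in nats."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def top_words(p, q, n):
    """The n words whose terms p(w) * log(p(w) / q(w)) add most to D(p || q)."""
    term = {w: p[w] * math.log(p[w] / q[w]) for w in p}
    return set(sorted(term, key=term.get, reverse=True)[:n])


def train(promoters, non_promoters, k, n):
    """Return (promoter_words, non_promoter_words) for word length k."""
    p = word_distribution(promoters, k)
    q = word_distribution(non_promoters, k)
    return top_words(p, q, n), top_words(q, p, n)


def classify(seq, promoter_words, non_promoter_words, k):
    """Label a sequence by which word group it matches more often."""
    ws = words(seq, k)
    hits_p = sum(w in promoter_words for w in ws)
    hits_n = sum(w in non_promoter_words for w in ws)
    # Assumed tie rule: equal match counts are labeled non-promoter.
    return "promoter" if hits_p > hits_n else "non-promoter"


if __name__ == "__main__":
    # Toy sequences invented for illustration only (not DBTSS data).
    promoters = ["TATAAAGGC", "GTATAAAAG", "CTATAAATG"]
    non_promoters = ["GCGCGCATG", "CGCGTTGCA", "ATGCGCGCC"]
    pw, nw = train(promoters, non_promoters, k=3, n=4)
    print(sorted(pw), sorted(nw))  # ['AAA', 'ATA', 'TAA', 'TAT'] ['CGC', 'GCA', 'GCG', 'TGC']
    print(classify("TTATAAATA", pw, nw, k=3))  # promoter
    print(classify("GCGCGCGCA", pw, nw, k=3))  # non-promoter
```

Because the KL divergence is not symmetric, the sketch ranks the two groups with D(P || Q) and D(Q || P) respectively; the smoothing matters because a word that never occurs in one class would otherwise make log(p/q) undefined.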
Keywords :
biology computing; genetics; macromolecules; statistical distributions; DBTSS database; Kullback-Leibler divergence; RNA polymerase; differential gene transcription; differential protein modification; discrete distributions; gene recognition; genomic sequences; human gene expression; human promoter recognition; nucleotide sequence; selective messenger RNA translation; selective nuclear RNA processing; word-based features; Bioinformatics; DNA; Gene expression; Genomics; Humans; Machine learning; Polymers; Proteins; RNA; Sequences;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics, 2007 International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-0973-0
Electronic_ISBN :
978-1-4244-0973-0
Type :
conf
DOI :
10.1109/ICMLC.2007.4370721
Filename :
4370721