Title :
Mature miRNA identification via the use of a Naive Bayes classifier
Author :
Gkirtzou, Katerina ; Tsakalides, Panagiotis ; Poirazi, Panayiota
Author_Institution :
Dept. of Comput. Sci., Univ. of Crete, Heraklion
Abstract :
MicroRNAs (miRNAs) are small single-stranded RNAs with post-transcriptional activity, on average 22 nt long, generated from endogenous hairpin-shaped transcripts. Although many computational methods are currently available for identifying miRNA genes in the genomes of various species, very few algorithms can accurately predict the functional part of the miRNA gene, namely the mature miRNA. We introduce a computational method that uses a Naive Bayes classifier to identify mature miRNA candidates based on sequence and secondary structure information of the miRNA precursor. Specifically, for each mature miRNA, we generate a set of negative examples of equal length on the respective precursor(s). The true and negative sets are then used to estimate probability distributions for sequence composition and secondary structure at each position along the RNA. The distance between these distributions is estimated using the symmetric Kullback-Leibler metric. The positions at which the two distributions differ significantly and consistently over a 10-fold cross-validation procedure are used as features for training the Naive Bayes classifier. A total of 15 classifiers were trained with true positive and negative examples from human and mouse. A performance of 76% sensitivity and 65% specificity was achieved using consensus averaging over a 10-fold cross-validation procedure. Our findings suggest that position-specific sequence and structure information, combined with a simple Bayes classifier, achieves good performance on the challenging task of mature miRNA identification.
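The pipeline described in the abstract (position-wise distributions, symmetric Kullback-Leibler scoring, feature selection, Naive Bayes scoring) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the pseudocount smoothing, the fixed `top_k` cutoff (standing in for the cross-validated significance test), and the toy sequences are all assumptions made for illustration. Only sequence composition is modelled here; secondary structure could be handled the same way with a structure alphabet.

```python
import math
from collections import Counter

ALPHABET = "ACGU"

def position_distributions(seqs, pseudocount=1.0):
    """Per-position nucleotide probabilities (additive smoothing is an assumption)."""
    total = len(seqs) + pseudocount * len(ALPHABET)
    dists = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        dists.append({a: (counts[a] + pseudocount) / total for a in ALPHABET})
    return dists

def symmetric_kl(p, q):
    """Symmetric Kullback-Leibler divergence: KL(p||q) + KL(q||p)."""
    return sum((p[a] - q[a]) * math.log(p[a] / q[a]) for a in ALPHABET)

def select_positions(pos_dists, neg_dists, top_k):
    """Keep the top_k most divergent positions (a simplification of the
    cross-validated selection described in the abstract)."""
    scores = [symmetric_kl(p, q) for p, q in zip(pos_dists, neg_dists)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])

def log_odds(seq, pos_dists, neg_dists, positions, prior_pos=0.5):
    """Naive Bayes log-odds of 'true' vs 'negative' over the selected positions."""
    score = math.log(prior_pos / (1.0 - prior_pos))
    for i in positions:
        score += math.log(pos_dists[i][seq[i]] / neg_dists[i][seq[i]])
    return score

# Made-up toy data, not real miRNA sequences.
positives = ["UGAC", "UGCA", "UGGU", "UGAU"]
negatives = ["ACAC", "CAGU", "GUCA", "AGUG"]

pos_dists = position_distributions(positives)
neg_dists = position_distributions(negatives)
positions = select_positions(pos_dists, neg_dists, top_k=2)

for candidate in ("UGAC", "ACAC"):
    score = log_odds(candidate, pos_dists, neg_dists, positions)
    print(candidate, "true" if score > 0 else "negative", round(score, 3))
```

A candidate is labelled "true" when its log-odds is positive. A consensus over several such classifiers, as the abstract reports, would average or vote over their individual scores.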
Keywords :
Bayes methods; bioinformatics; genomics; pattern classification; Naive Bayes classifier; genomes; miRNA identification; microRNA; symmetric Kullback-Leibler metric; Bayesian methods; Bioinformatics; Computer science; Genomics; Humans; Mice; Prediction algorithms; Probability distribution; Proteins; RNA;
Conference_Title :
BioInformatics and BioEngineering, 2008. BIBE 2008. 8th IEEE International Conference on
Conference_Location :
Athens
Print_ISBN :
978-1-4244-2844-1
Electronic_ISBN :
978-1-4244-2845-8
DOI :
10.1109/BIBE.2008.4696697