Title :
Improved and Promising Identification of Human MicroRNAs by Incorporating a High-Quality Negative Set
Author :
Leyi Wei ; Minghong Liao ; Yue Gao ; Rongrong Ji ; Zengyou He ; Quan Zou
Author_Institution :
School of Information Science and Technology, Xiamen University, Xiamen, China
Abstract :
MicroRNAs (miRNAs) play important roles as regulators in biological processes, and identifying (pre-)miRNAs helps in understanding these regulatory processes. Machine learning methods have been designed for pre-miRNA identification; however, most of them fail to provide reliable predictive performance on independent testing data sets. We hypothesized that this is because the training sets, especially the negative training sets, are not sufficiently representative. To generate a representative negative set, we proposed a novel negative sample selection technique and used it to collect negative samples of improved quality. Two recent classifiers rebuilt with the proposed negative set improved their predictive performance by ~6 percent, which confirmed this hypothesis. Based on the proposed negative set, we constructed a training set and developed an online system, miRNApre, specifically for human pre-miRNA identification. On updated human and non-human data sets, miRNApre achieved accuracies 34.3 and 7.6 percent higher, respectively, than those of current methods. These results suggest that miRNApre is an effective tool for pre-miRNA identification. Additionally, by integrating miRNApre, we developed a miRNA mining tool, mirnaDetect, which can be applied to find potential miRNAs in genome-scale data. On human chromosome 19 data, mirnaDetect achieved mining performance comparable to that of other existing methods.
Keywords :
RNA; biochemistry; bioinformatics; classification; data mining; genomics; information services; learning (artificial intelligence); molecular biophysics; MirnaDetect mining performance; biological process regulator; classifier predictive performance; genome-scale data; high-quality negative training set incorporation; human chromosome 19 data; human microRNA identification; human pre-miRNA identification; independent testing data sets; machine learning methods; miRNA mining tool; miRNApre accuracy; mirnaDetect; negative sample collection; negative sample quality; negative sample selection technique; nonhuman data sets; online system; representative negative set generation; training set construction; Biological processes; Data mining; Genetics; Machine learning; RNA; MicroRNA; high-quality negative set; microRNA identification; multi-level negative sample selection;
Journal_Title :
IEEE/ACM Transactions on Computational Biology and Bioinformatics
DOI :
10.1109/TCBB.2013.146