Title :
Identifying essential features for the classification of real and pseudo microRNAs precursors using fuzzy decision trees
Author :
Abu-Halaweh, Nael M. ; Harrison, Robert W.
Author_Institution :
Comput. Sci. Dept., Georgia State Univ., Atlanta, GA, USA
Abstract :
MicroRNAs play an important role in post-transcriptional gene regulation. Experimental approaches to identify microRNAs are expensive and time-consuming. Computational approaches have proven to be useful for identifying microRNA candidates. Most approaches rely on features extracted from microRNA precursors (pre-microRNAs) and their secondary structure. Selecting the appropriate set of features plays a critical role in improving the prediction accuracy of pre-microRNA candidates. This work aims to investigate the triplet elements encoding scheme and to identify essential features needed for the correct classification of pre-microRNAs. To achieve these goals, an extension of the triplet elements encoding scheme is introduced. Features extracted using the extended scheme were combined with global features introduced in the literature, and a fuzzy decision tree (FDT) was used as both a classification and a feature selection tool. Unlike previous machine-learning-based approaches, FDT produces a human-comprehensible classification model. The interpretability of the classification model provides a means to identify the essential features needed to recognize microRNA candidates and offers a better understanding of this problem. Our results indicate that the triplet elements scheme is not superior to any of its proposed extensions. Further analysis revealed that including the features extracted using the triplet elements scheme does not add any value to this classification problem but rather introduces some noisy features, and comparable classification results can be achieved by using only the six global features identified by FDT.
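The abstract names the triplet elements encoding scheme without spelling it out. As it is commonly described in the pre-microRNA literature, each position contributes one "triplet element": the nucleotide at that position combined with the paired/unpaired state of the position and its two neighbours, giving 4 x 8 = 32 possible features. The sketch below is a simplified illustration of that baseline idea only, not the extended scheme or the global features studied in the paper. The function name `triplet_features`, the dot-bracket input format, the choice to count every interior position, and the normalization by total count are all assumptions made for illustration.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
# All 8 structure triplets over {paired "(", unpaired "."}.
STRUCT_TRIPLETS = ["".join(p) for p in product("(.", repeat=3)]
# 4 nucleotides x 8 structure triplets = 32 feature names, e.g. "A(((", "U.(.".
FEATURE_NAMES = [n + s for n in NUCLEOTIDES for s in STRUCT_TRIPLETS]


def triplet_features(sequence, structure):
    """Return normalized frequencies of the 32 triplet elements.

    Hypothetical helper: `structure` is assumed to be in dot-bracket
    notation and to have the same length as `sequence`.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    seq = sequence.upper().replace("T", "U")
    # Collapse both bracket directions to "(" so each position is
    # simply paired or unpaired.
    struct = structure.replace(")", "(")
    counts = dict.fromkeys(FEATURE_NAMES, 0)
    # Assumption: only interior positions are counted, so every
    # position has a left and a right neighbour.
    for i in range(1, len(seq) - 1):
        key = seq[i] + struct[i - 1:i + 2]
        if key in counts:  # skips ambiguous bases such as N
            counts[key] += 1
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


if __name__ == "__main__":
    feats = triplet_features("GGGAAACCC", "(((...)))")
    for name, value in feats.items():
        if value > 0:
            print(name, round(value, 3))
```

A feature vector of this kind could then be concatenated with global features before being passed to a classifier; which global features to use is not specified in the abstract and is left open here.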
Keywords :
decision trees; feature extraction; fuzzy set theory; macromolecules; medical image processing; organic compounds; pattern classification; fuzzy decision trees; microRNAs precursors classification; post-transcriptional gene regulation; pre-microRNA; triplet elements encoding scheme; Biological information theory; Classification tree analysis; Diseases; Encoding; Machine learning; RNA; Support vector machine classification; Support vector machines
Conference_Title :
Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2010 IEEE Symposium on
Conference_Location :
Montreal, QC
Print_ISBN :
978-1-4244-6766-2
DOI :
10.1109/CIBCB.2010.5510430