DocumentCode
3033231
Title
Identifying essential features for the classification of real and pseudo microRNAs precursors using fuzzy decision trees
Author
Abu-Halaweh, Nael M. ; Harrison, Robert W.
Author_Institution
Comput. Sci. Dept., Georgia State Univ., Atlanta, GA, USA
fYear
2010
fDate
2-5 May 2010
Firstpage
1
Lastpage
7
Abstract
MicroRNAs play an important role in post-transcriptional gene regulation. Experimental approaches to identify microRNAs are expensive and time-consuming. Computational approaches have proven useful for identifying microRNA candidates. Most approaches rely on features extracted from microRNA precursors (pre-microRNAs) and their secondary structure. Selecting an appropriate set of features plays a critical role in improving the prediction accuracy of pre-microRNA candidates. This work aims to investigate the triplet elements encoding scheme and to identify the essential features needed for the correct classification of pre-microRNAs. To achieve these goals, an extension of the triplet elements encoding scheme is introduced. Features extracted using the extended scheme were combined with global features introduced in the literature, and a fuzzy decision tree (FDT) was used as a classification and feature selection tool. Unlike previous machine-learning-based approaches, FDT produces a human-comprehensible classification model. The interpretability of the classification model provides a means to identify the essential features needed to recognize microRNA candidates and offers a better understanding of this problem. Our results indicate that the triplet elements scheme is not superior to any of its proposed extensions. Further analysis revealed that including the features extracted using the triplet elements scheme does not add any value to this classification problem but rather introduces some noisy features, and that comparable classification results can be achieved by using only the six global features identified by FDT.
Keywords
decision trees; feature extraction; fuzzy set theory; macromolecules; medical image processing; organic compounds; pattern classification; fuzzy decision trees; microRNAs precursors classification; post-transcriptional gene regulation; pre-microRNA; triplet elements encoding scheme; Biological information theory; Classification tree analysis; Decision trees; Diseases; Encoding; Feature extraction; Machine learning; RNA; Support vector machine classification; Support vector machines
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2010 IEEE Symposium on
Conference_Location
Montreal, QC
Print_ISBN
978-1-4244-6766-2
Type
conf
DOI
10.1109/CIBCB.2010.5510430
Filename
5510430