Title :
Identification of mRNA poly(A) signal patterns
Author :
Wu, Xiaohui ; Liu, Qi ; Tang, Meishuang ; Zhang, Huanghui ; Yao, Junfeng ; Ji, Guoli
Author_Institution :
Dept. of Autom., Xiamen Univ., Xiamen, China
Abstract :
The poly(A) signal patterns surrounding the poly(A) site in the model plant Arabidopsis thaliana were generated, selected, and verified. First, candidate nucleotide patterns for the different signal regions were generated according to their conservation, using the TFxIDF index of the vector space model, which is widely used in text categorization. Then, effective features were selected with a genetic-algorithm-based wrapper feature selection method. Finally, the boosting method AdaBoost.M1 was used to verify the selected feature subset by identifying poly(A) sites. The results showed that our feature selection method significantly reduced the dimensionality of the feature space and substantially improved classifier performance. Moreover, the selected features could be used to refine the parameters of the poly(A) site recognition model, thereby greatly improving prediction accuracy. This study not only enhances our understanding of poly(A) signals but also demonstrates a concise poly(A) site recognition model built by applying a classifier to the selected feature space.
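To illustrate the TFxIDF weighting mentioned in the abstract, here is a minimal sketch of the standard text-categorization formula applied to nucleotide patterns. It assumes, for illustration only, that each sequence window is a "document", each overlapping k-mer is a "term", and k = 6; the abstract does not state how documents, terms, or pattern lengths were defined in the paper, and `kmers` and `tfidf_vectors` are hypothetical helper names.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers (candidate patterns) of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf_vectors(seqs, k=6):
    """Weight each k-mer in each sequence by TF x IDF.

    Assumption (illustrative only): each sequence is a "document" and each
    k-mer a "term", as in the vector space model for text.
    """
    docs = [Counter(kmers(s, k)) for s in seqs]
    n_docs = len(docs)

    # Document frequency: number of sequences that contain each k-mer.
    df = Counter()
    for counts in docs:
        df.update(counts.keys())

    vectors = []
    for counts in docs:
        total = sum(counts.values())
        vec = {}
        for term, c in counts.items():
            tf = c / total                       # relative frequency in this sequence
            idf = math.log(n_docs / df[term])    # rarer across sequences -> larger weight
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors


if __name__ == "__main__":
    seqs = ["AATAAAGC", "AATAAACG", "TTGTGTCA"]
    for v in tfidf_vectors(seqs):
        print({t: round(w, 3) for t, w in sorted(v.items())})
```

As in text categorization, IDF down-weights patterns that occur in many sequences, so in this toy input the shared AATAAA hexamer receives a lower weight than the k-mers unique to one sequence. The resulting sparse vectors are the kind of feature space that a wrapper selection step and a classifier could then operate on; how the paper turns these weights into conservation scores per signal region is not specified in the abstract.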
Keywords :
adaptive systems; biology computing; feature extraction; learning (artificial intelligence); pattern recognition; text analysis; Adaboost.M1; TFxIDF index; Arabidopsis thaliana; genetic algorithm based wrapper feature selection method; mRNA poly(A) signal patterns identification; prediction accuracy enhancement; text categorization; vector space model; Accuracy; Classification algorithms; Genetic algorithms; Hidden Markov models; Predictive models; Tin; Training; Adaboost.M1; Poly(A) Signal; TFxIDF; Wrapper;
Conference_Title :
Innovations in Intelligent Systems and Applications (INISTA), 2011 International Symposium on
Conference_Location :
Istanbul
Print_ISBN :
978-1-61284-919-5
DOI :
10.1109/INISTA.2011.5946151