Title :
A classification of “Gracilaria changii” protein sequences using back-propagation classifier
Author :
Mohamed, Nur Shazila ; Othman, Zulaiha Ali ; Bakar, Afarulrazi Abu
Author_Institution :
Fac. of Inf. Sci. & Technol., Univ. Kebangsaan Malaysia, Bangi, Malaysia
Abstract :
This paper focuses on classifying protein sequence families from the seaweed species Gracilaria changii using a back-propagation classifier. Protein sequence family classification infers the function of an unknown protein by analysing its structural similarity to a given family of proteins. Classifying protein sequences with a sequence alignment technique is less efficient because the entire sequence is used for classification. Data mining offers artificial intelligence techniques that are well established for classification. The purpose of this research is therefore to develop a protein sequence classifier for Gracilaria changii using a data mining approach with feature extraction. Feature extraction identifies the best features in the overall sequence. Data preparation for feature extraction uses bioinformatics tools to translate DNA to protein (batch translator) and to analyse the protein family (InterProScan). Features are then extracted from the prepared data using the 2-gram method and used to develop the classification model with a back-propagation neural network technique (RNPB). Experimental results from RNPB are then compared with those of a sequence alignment technique (HMMER). The comparison shows that the classification model produced by RNPB outperforms the sequence alignment technique, with an average accuracy over all families of 99.01% compared with 96.51%. For the specificity and sensitivity of the prediction, HMMER and the ANN were equally efficient.
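The 2-gram feature extraction mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 20-letter amino acid alphabet, the normalisation to relative frequencies, the skipping of 2-grams that contain non-standard residues, and the function name `two_gram_features` are all assumptions made for illustration. The idea is that each protein sequence becomes a fixed-length vector (here 400 values, one per ordered residue pair) that a back-propagation network could take as input.

```python
from itertools import product

# Assumed alphabet: the 20 standard amino acids (not specified in the abstract).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered residue pairs, in a fixed order so every vector is comparable.
BIGRAMS = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
INDEX = {bigram: i for i, bigram in enumerate(BIGRAMS)}


def two_gram_features(sequence):
    """Return a 400-element list of relative 2-gram frequencies.

    Hypothetical helper: overlapping 2-grams are counted, and any 2-gram
    containing a character outside AMINO_ACIDS is skipped (an assumption).
    """
    seq = sequence.upper()
    counts = [0] * len(BIGRAMS)
    total = 0
    for i in range(len(seq) - 1):
        idx = INDEX.get(seq[i:i + 2])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        # No valid 2-gram found; return an all-zero vector.
        return [0.0] * len(BIGRAMS)
    return [c / total for c in counts]


if __name__ == "__main__":
    features = two_gram_features("MKMK")
    print(len(features))            # vector length
    print(features[INDEX["MK"]])    # relative frequency of the pair "MK"
```

Because the vector length is independent of the sequence length, sequences of different lengths can share one network input layer, which is one plausible reason a 2-gram encoding suits a neural classifier.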
Keywords :
DNA; backpropagation; bioinformatics; data mining; feature extraction; neural nets; pattern classification; proteins; 2-gram method; Gracilaria changii protein sequence family classification; HMMER; RNPB; artificial intelligence technique; back-propagation neural network technique; batch translator; bioinformatics tool; data preparation; seaweed species; sequence alignment technique; structural similarity; Artificial intelligence; Artificial neural networks; Hidden Markov models; Neural networks; Protein engineering; Protein sequence; Gracilaria changii; Protein; back-propagation neural network; classification; sequence alignment
Conference_Title :
Data Mining and Optimization, 2009. DMO '09. 2nd Conference on
Conference_Location :
Kajang
Print_ISBN :
978-1-4244-4944-6
DOI :
10.1109/DMO.2009.5341902