Title :
New Method Combining Feature Weighting and Feature Selection for Protein Classification
Author :
El Haj Mohamed, Salma Aouled ; Mhamdi, Faouzi
Author_Institution :
Nat. Super. Sch. of Eng. of Tunis (ENSIT), Univ. of Tunis, Tunis, Tunisia
Abstract :
The primary structure of biological data is represented as a string of characters. Several problems in bioinformatics involve handling this type of data, such as the alignment of biological sequences, 2D/3D structure prediction, and the detection of anomalies in genes. Our work fits within the framework of the knowledge discovery from biological data (KDBD) process, specifically its pre-processing phase. We are interested in protein classification using features extracted from protein primary structures. Data mining techniques require an Individuals × Features data matrix (sequences × n-grams, in our case), which underlines the importance of the type of features used, their number, and their weighting in the protein classification problem. In this paper, we first present a new feature weighting method based on the dynamic programming of the Smith-Waterman local alignment algorithm, and then a new feature selection method. We used an SVM classifier to compute the error rates. The results show the effectiveness of this work, especially when compared with previous works.
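The abstract describes a sequences × n-grams data matrix whose entries are weighted using Smith-Waterman dynamic programming, but it does not give the exact weighting formula. The sketch below is therefore only one plausible reading, not the authors' method: it assumes the weight of an n-gram in a sequence is the best local alignment score of that n-gram against the whole sequence. The function names (`ngram_matrix`, `smith_waterman`, `sw_weighted_matrix`) and the scoring scheme (match +2, mismatch -1, linear gap -1, used instead of a substitution matrix such as BLOSUM62) are hypothetical illustrations.

```python
def ngrams(seq, n):
    """All overlapping substrings of length n in seq."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def ngram_matrix(seqs, n):
    """Plain Individuals x Features count matrix (sequences x n-grams)."""
    vocab = sorted({g for s in seqs for g in ngrams(s, n)})
    index = {g: j for j, g in enumerate(vocab)}
    matrix = []
    for s in seqs:
        row = [0] * len(vocab)
        for g in ngrams(s, n):
            row[index[g]] += 1  # count occurrences of each n-gram
        matrix.append(row)
    return vocab, matrix


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a vs b (linear gap penalty).

    Illustrative scoring values; real protein work would usually use a
    substitution matrix and affine gaps.
    """
    prev = [0] * (len(b) + 1)  # DP row for a[:i-1]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are floored at 0 so an alignment can restart.
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def sw_weighted_matrix(seqs, n, **scoring):
    """Hypothetical weighting: entry (i, j) is the best local alignment
    score of n-gram j against sequence i, instead of a raw count."""
    vocab, _ = ngram_matrix(seqs, n)
    weights = [[smith_waterman(g, s, **scoring) for g in vocab] for s in seqs]
    return vocab, weights
```

For example, with the toy sequences `["MKVL", "MKAL"]` and 2-grams, the vocabulary is `AL, KA, KV, MK, VL`. The count row for `MKVL` is `[0, 0, 1, 1, 1]`, while its alignment-weighted row is `[2, 2, 4, 4, 4]`. An n-gram that is absent but partially matched (such as `AL`, whose `L` aligns) still receives a nonzero weight. This is the kind of similarity-aware weighting that alignment-based scoring can provide over exact counts.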
Keywords :
bioinformatics; data mining; data structures; feature selection; pattern classification; proteins; support vector machines; 2D structure prediction; 3D structure prediction; KDBD process; SVM classifier; Smith-Waterman local alignment algorithm; X-features; biological sequence alignment; character string; data matrix; data mining techniques; dynamic programming; error rates; feature extraction; feature weighting; gene anomaly detection; knowledge discovery-from-biological data process; preprocessing phase; primary biological data structure; protein classification; Classification algorithms; Error analysis; KDBD;
Conference_Title :
2014 25th International Workshop on Database and Expert Systems Applications (DEXA)
Conference_Location :
Munich
Print_ISBN :
978-1-4799-5721-7
DOI :
10.1109/DEXA.2014.27