Regional Information Center for Science and Technology - New Method Combining Feature Weighting and Feature Selection for Protein Classification

DocumentCode :

174846

Title :

New Method Combining Feature Weighting and Feature Selection for Protein Classification

Author :

El Haj Mohamed, Salma Aouled ; Mhamdi, Faouzi

Author_Institution :

Nat. Super. Sch. of Eng. of Tunis (ENSIT), Univ. of Tunis, Tunis, Tunisia

fYear :

2014

fDate :

1-5 Sept. 2014

Firstpage :

Lastpage :

Abstract :

Primary biological data are represented as strings of characters. Several problems in bioinformatics involve handling this type of data, such as the alignment of biological sequences, 2D/3D structure prediction, and the detection of anomalies in genes. Our work fits within the knowledge discovery from biological data (KDBD) process, specifically its pre-processing phase. We are interested in classifying proteins using features extracted from their primary structures. Data mining techniques require a data matrix of individuals × features (sequences × n-grams, in our case), so the type of features used, their number, and their weighting all matter in the protein classification problem. In this paper we first present a new feature weighting method based on the dynamic programming of the Smith-Waterman local alignment algorithm, then a new feature selection method. We used an SVM classifier to calculate the error rates. The results show the effectiveness of this approach, especially in comparison with previous work.
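The two building blocks named in the abstract, a sequences × n-grams matrix and the Smith-Waterman local alignment score, can be illustrated with a minimal sketch. This is not the authors' method: the record does not describe how the alignment scores are turned into feature weights, so that step is left out. The function names (`ngrams`, `ngram_matrix`, `smith_waterman`), the simple match/mismatch/gap scores, and the toy sequences are all assumptions made for illustration; a real protein pipeline would more likely use a substitution matrix such as BLOSUM.

```python
from collections import Counter
from itertools import chain


def ngrams(seq, n):
    """Return all overlapping n-grams of a sequence, in order."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def ngram_matrix(sequences, n):
    """Build a sequences x n-grams count matrix.

    Returns (vocab, rows): vocab is the sorted list of n-grams seen in
    any sequence, and rows[i][j] counts vocab[j] in sequences[i].
    """
    counts = [Counter(ngrams(s, n)) for s in sequences]
    vocab = sorted(set(chain.from_iterable(counts)))
    return vocab, [[c[g] for g in vocab] for c in counts]


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between a and b.

    Uses the standard dynamic programming recurrence with a linear gap
    penalty; the scoring values are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    vocab, rows = ngram_matrix(["ACAC", "CDA"], 2)
    print(vocab, rows)
    print(smith_waterman("ACD", "MACDK"))
```

The rows of such a matrix, once weighted and reduced by a feature selection step, are the kind of input an SVM classifier would be trained on.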

Keywords :

bioinformatics; data mining; data structures; feature selection; pattern classification; proteins; support vector machines; 2D structure prediction; 3D structure prediction; KDBD process; SVM classifier; Smith-Waterman local alignment algorithm; X-features; biological sequence alignment; character string; data matrix; data mining techniques; dynamic programming; error rates; feature extraction; feature weighting; gene anomaly detection; knowledge discovery-from-biological data process; preprocessing phase; primary biological data structure; protein classification; Classification algorithms; Error analysis; KDBD

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Database and Expert Systems Applications (DEXA), 2014 25th International Workshop on

Conference_Location :

Munich

ISSN :

1529-4188

Print_ISBN :

978-1-4799-5721-7

Type :

conf

DOI :

10.1109/DEXA.2014.27

Filename :

6974826

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=174846