• DocumentCode
    174846
  • Title

    New Method Combining Feature Weighting and Feature Selection for Protein Classification

  • Author

    El Haj Mohamed, Salma Aouled ; Mhamdi, Faouzi

  • Author_Institution
    Nat. Super. Sch. of Eng. of Tunis (ENSIT), Univ. of Tunis, Tunis, Tunisia
  • fYear
    2014
  • fDate
    1-5 Sept. 2014
  • Firstpage
    51
  • Lastpage
    55
  • Abstract
    Primary biological data structures are represented as strings of characters. Several problems in bioinformatics involve handling this type of data, such as the alignment of biological sequences, 2D/3D structure prediction, and the detection of anomalies in genes. Our work falls within the knowledge discovery from biological data (KDBD) process, specifically its pre-processing phase. We are interested in protein classification using features extracted from primary structures. Data mining techniques require a data matrix of individuals × features (sequences × n-grams, in our case), so the type of features used, their number, and their weighting all matter in the protein classification problem. In this paper we first present a new feature weighting method based on the dynamic programming of the Smith-Waterman local alignment algorithm, then a new feature selection method. We used an SVM classifier to calculate the error rates. The results show the effectiveness of this approach, especially when compared with previous work.
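    The two building blocks named in the abstract, a sequences × n-grams data matrix and Smith-Waterman local alignment by dynamic programming, can be illustrated with the minimal sketch below. It is not the authors' weighting or selection method, which the abstract does not specify. The function names, the n-gram length of 3, and the match/mismatch/gap scores are assumptions chosen only for illustration.

```python
from collections import Counter


def ngram_counts(sequence, n=3):
    """Count overlapping n-grams in one sequence (n=3 is an assumed default)."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def build_matrix(sequences, n=3):
    """Build a sequences x n-grams count matrix.

    Returns the sorted n-gram vocabulary (columns) and one row per sequence.
    """
    counts = [ngram_counts(s, n) for s in sequences]
    vocab = sorted(set().union(*counts))
    matrix = [[c[g] for g in vocab] for c in counts]
    return vocab, matrix


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b (standard Smith-Waterman).

    Uses a linear gap penalty; the scoring values are illustrative assumptions.
    Only two rows of the dynamic programming table are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

    For example, `build_matrix(["MKVL", "KVLA"], n=2)` yields one row of 2-gram counts per sequence, and `smith_waterman_score` could serve as a raw similarity between an n-gram and a sequence. How such scores would be turned into feature weights is left open here.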
  • Keywords
    bioinformatics; data mining; data structures; feature selection; pattern classification; proteins; support vector machines; 2D structure prediction; 3D structure prediction; KDBD process; SVM classifier; Smith-Waterman local alignment algorithm; X-features; biological sequence alignment; character string; data matrix; data mining techniques; dynamic programming; error rates; feature extraction; feature weighting; gene anomaly detection; knowledge discovery-from-biological data process; preprocessing phase; primary biological data structure; protein classification; Classification algorithms; Error analysis; KDBD
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database and Expert Systems Applications (DEXA), 2014 25th International Workshop on
  • Conference_Location
    Munich
  • ISSN
    1529-4188
  • Print_ISBN
    978-1-4799-5721-7
  • Type

    conf

  • DOI
    10.1109/DEXA.2014.27
  • Filename
    6974826