  • DocumentCode
    671699
  • Title
    A dissimilarity-based classifier for generalized sequences by a granular computing approach
  • Author
    Rizzi, Antonello ; Possemato, Francesca ; Livi, Lorenzo ; Sebastiani, Azzurra ; Giuliani, Alessandro ; Mascioli, Fabio Massimo Frattale
  • Author_Institution
    Dept. of Inf. Eng., Electron., & Telecommun., SAPIENZA Univ. of Rome, Rome, Italy
  • fYear
    2013
  • fDate
    4-9 Aug. 2013
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    In this paper we propose a classifier for generalized sequences conceived within the granular computing framework. The classification system processes input sequences of objects through a suitable interplay of dissimilarity-based and clustering-based techniques. The core data mining engine retrieves information granules that are used to represent the input sequences as feature vectors. This representation allows the original sequence classification problem to be handled with standard pattern recognition tools. We evaluated the generalization capability of the system in a case study concerning the protein folding problem. In the considered dataset, the entire E. coli proteome was screened to predict relative protein solubility from the amino acid sequence alone. We report the analysis of the dataset under different settings, showing interesting test set classification accuracy results. The developed system also allows knowledge to be extracted from the considered training set through the analysis of the retrieved information granules.
  • Keywords
    biology computing; data mining; granular computing; pattern classification; pattern clustering; proteins; vectors; E. Coli proteome; clustering based technique; core data mining engine; dissimilarity based technique; dissimilarity-based classifier; feature vectors; generalization capability; generalized sequences; granular computing approach; information granule retrieval; input sequence representation; knowledge extraction; pattern recognition tools; protein folding problem; protein relative solubility prediction; pure amino acids sequence basis; sequence classification problem; suited interplay; test set classification; Data mining; Feature extraction; Histograms; Mathematical model; Optimization; Proteins; Training; Granular computing and modeling; Protein folding prediction; Sequence representation and classification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), The 2013 International Joint Conference on
  • Conference_Location
    Dallas, TX
  • ISSN
    2161-4393
  • Print_ISBN
    978-1-4673-6128-6
  • Type
    conf
  • DOI
    10.1109/IJCNN.2013.6707041
  • Filename
    6707041