• DocumentCode
    2961840
  • Title
    Classification of molecular structures made easy
  • Author
    Trentin, Edmondo ; Iorio, Ernesto Di
  • Author_Institution
    Dipt. di Ing. dell'Inf., Univ. di Siena, Siena
  • fYear
    2008
  • fDate
    1-8 June 2008
  • Firstpage
    3241
  • Lastpage
    3246
  • Abstract
    Several problems in bioinformatics and cheminformatics concern the classification of molecules. Relevant instances include automatic cancer detection/classification, machine-learning pathologic prediction, and automatic predictive toxicology. Molecules may be represented naturally as graphs: each node represents an atom, whilst each edge represents an atom-atom bond. Labels (in the form of real-valued vectors) are associated with nodes and edges in order to express the physical and chemical properties of the corresponding atoms and bonds, respectively. These structured data are expected to contain more information than a traditional (flat) feature vector, information that may strengthen the classification capabilities of a machine learner. This paper investigates the application of a novel Bayesian/connectionist classifier to this graphical pattern recognition task. The approach is much simpler than state-of-the-art machine learning paradigms for graphical/relational learning. It relies on the idea of describing the graph in terms of a binary relation. The posterior probability of a class given the relation is estimated as a function of probabilistic quantities modeled with a neural network, trained over individual vertex pairs in the graph. The popular and challenging Mutagenesis dataset is considered for the experimental evaluation. Despite its simplicity, the technique turns out to yield the highest recognition accuracies to date on the complete (friendly + unfriendly) dataset, outperforming complex machines (relational and graph neural nets, kernels for graphs, inductive logic programming techniques, etc.). Finally, some preliminary chemical/biological implications are hypothesized in the light of the results obtained.
  • Keywords
    Bayes methods; biology computing; graph theory; molecular biophysics; pattern classification; probability; Bayesian/connectionist classifier; Mutagenesis dataset; bioinformatics; cheminformatics; graphical pattern recognition; molecular structure; molecules classification; neural network; posterior probability; Bayesian methods; Bioinformatics; Cancer detection; Chemicals; Kernel; Logic programming; Machine learning; Neural networks; Pattern recognition; Toxicology;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks, 2008. IJCNN 2008. (IEEE World Congress on Computational Intelligence). IEEE International Joint Conference on
  • Conference_Location
    Hong Kong
  • ISSN
    1098-7576
  • Print_ISBN
    978-1-4244-1820-6
  • Electronic_ISBN
    1098-7576
  • Type
    conf
  • DOI
    10.1109/IJCNN.2008.4634258
  • Filename
    4634258