DocumentCode
2961840
Title
Classification of molecular structures made easy
Author
Trentin, Edmondo ; Iorio, Ernesto Di
Author_Institution
Dipt. di Ing. dell'Inf., Univ. di Siena, Siena
fYear
2008
fDate
1-8 June 2008
Firstpage
3241
Lastpage
3246
Abstract
Several problems in bioinformatics and cheminformatics concern the classification of molecules. Relevant instances include automatic cancer detection/classification, machine-learning-based pathology prediction, and automatic predictive toxicology. Molecules may be represented in terms of graphical structures in a natural way: each node in the graph represents an atom, whilst the edges of the graph represent the atom-atom bonds. Labels (in the form of real-valued vectors) are associated with nodes and edges in order to express physical and chemical properties of the corresponding atoms and bonds, respectively. These structured data are expected to contain more information than a traditional (flat) feature vector, information that may strengthen the classification capabilities of a machine learner. This paper investigates the application of a novel Bayesian/connectionist classifier to this graphical pattern recognition task. The approach is much simpler than state-of-the-art machine learning paradigms for graphical/relational learning. It relies on the idea of describing the graph in terms of a binary relation. The posterior probability of a class given the relation is estimated as a function of probabilistic quantities modeled with a neural network, trained over individual vertex pairs in the graph. The popular and challenging Mutagenesis dataset is considered for the experimental evaluation. Despite its simplicity, the technique yields the highest recognition accuracies to date on the complete (friendly + unfriendly) dataset, outperforming complex machines (relational and graph neural nets, kernels for graphs, inductive logic programming techniques, etc.). Finally, some preliminary chemical/biological implications are hypothesized in light of the results obtained.
Keywords
Bayes methods; biology computing; graph theory; molecular biophysics; pattern classification; probability; Bayesian/connectionist classifier; Mutagenesis dataset; bioinformatics; cheminformatics; graphical pattern recognition; molecular structure; molecules classification; neural network; posterior probability; Bayesian methods; Bioinformatics; Cancer detection; Chemicals; Kernel; Logic programming; Machine learning; Neural networks; Pattern recognition; Toxicology;
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks, 2008. IJCNN 2008. (IEEE World Congress on Computational Intelligence). IEEE International Joint Conference on
Conference_Location
Hong Kong
ISSN
1098-7576
Print_ISBN
978-1-4244-1820-6
Electronic_ISBN
1098-7576
Type
conf
DOI
10.1109/IJCNN.2008.4634258
Filename
4634258