DocumentCode :
2455932
Title :
Pre-Processing Structured Data for Standard Machine Learning Algorithms by Supervised Graph Propositionalization - A Case Study with Medicinal Chemistry Datasets
Author :
Karunaratne, Thashmee ; Boström, Henrik ; Norinder, Ulf
Author_Institution :
Dept. of Comput. & Syst. Sci., Stockholm Univ., Stockholm, Sweden
fYear :
2010
fDate :
12-14 Dec. 2010
Firstpage :
828
Lastpage :
833
Abstract :
Graph propositionalization methods can be used to transform structured and relational data into fixed-length feature vectors, enabling standard machine learning algorithms to be used for generating predictive models. It is, however, not clear how well different propositionalization methods work in conjunction with different standard machine learning algorithms. Three different graph propositionalization methods are investigated in conjunction with three standard learning algorithms: random forests, support vector machines and nearest neighbor classifiers. An experiment on 21 datasets from the domain of medicinal chemistry shows that the choice of propositionalization method may have a significant impact on the resulting accuracy. The empirical investigation further shows that, for datasets from this domain, the use of the maximal frequent item set approach for propositionalization results in the most accurate classifiers, significantly outperforming the two other graph propositionalization methods considered in this study, SUBDUE and MOSS, for all three learning methods.
Keywords :
chemistry computing; graph theory; learning (artificial intelligence); pattern classification; support vector machines; MOSS; SUBDUE; fixed-length feature vectors; medicinal chemistry datasets; nearest neighbor classifiers; random forests; standard machine learning algorithms; structured data preprocessing; supervised graph propositionalization method; support vector machines; Chemistry; Classification algorithms; Data mining; Itemsets; Kernel; Machine learning; Machine learning algorithms; graph propositionalization; k-nearest neighbor; medicinal chemistry; random forests; structured data; support-vector machines;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Applications (ICMLA), 2010 Ninth International Conference on
Conference_Location :
Washington, DC
Print_ISBN :
978-1-4244-9211-4
Type :
conf
DOI :
10.1109/ICMLA.2010.128
Filename :
5708951