Title :
Improving Training Speed of Support Vector Machines by Creating Exploitable Trends of Lagrangian Variables: An Application to DNA Splice Site Detection
Author :
Li, Jason ; Halgamuge, Saman K.
Author_Institution :
DMME, Melbourne Univ., Melbourne, VIC
Abstract :
Support vector machines are state-of-the-art machine learning algorithms that can be used for classification problems such as DNA splice site identification. However, the large number of samples in biological data sets can often lead to slow training speed. The training speed can be improved by removing non-support vectors prior to training. This paper proposes a method to predict non-support vectors with high accuracy by the use of strict-constrained gradient ascent optimisation. Unlike other data preselection methods, the proposed gradient-based method is itself a training algorithm for SVM, and is also very simple to implement. Experiments with comparable results are conducted on a DNA splice-site detection problem. Results show significant speed improvements over other algorithms. The relationship between speed improvement and cache memory size is also exploited. Generalisation capability of the proposed algorithm is also shown to be better than some other reformulated SVMs.
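The preselection idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the details of the strict-constrained formulation, so the code below is not the authors' algorithm: it swaps in plain projected gradient ascent on the bias-free SVM dual, where the only constraints are the box bounds 0 <= alpha_i <= C. The function names `dual_gradient_ascent` and `predicted_non_support_vectors`, the step-size rule, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def dual_gradient_ascent(K, y, C=1.0, n_iter=200):
    """Projected gradient ascent on the bias-free SVM dual (an assumed
    stand-in, not the paper's strict-constrained method).

    Maximises sum(alpha) - 0.5 * alpha^T Q alpha with Q_ij = y_i y_j K_ij,
    subject to 0 <= alpha_i <= C.
    """
    Q = (y[:, None] * y[None, :]) * K
    # Step 1/L, with L the largest eigenvalue of Q, keeps the ascent stable.
    step = 1.0 / np.linalg.eigvalsh(Q)[-1]
    alpha = np.zeros(len(y))
    for _ in range(n_iter):
        grad = 1.0 - Q @ alpha                         # gradient of the dual
        alpha = np.clip(alpha + step * grad, 0.0, C)   # project onto the box
    return alpha

def predicted_non_support_vectors(alpha, tol=1e-8):
    """Indices whose Lagrangian variable stayed at (numerically) zero."""
    return np.flatnonzero(alpha <= tol)

# Toy 1-D data with a linear kernel (hypothetical, for illustration only).
X = np.array([[1.0], [3.0], [-1.0], [-3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha = dual_gradient_ascent(X @ X.T, y)
drop = predicted_non_support_vectors(alpha)
X_reduced = np.delete(X, drop, axis=0)   # samples kept for full training
y_reduced = np.delete(y, drop)
```

On this toy data the two outer points (x = 3 and x = -3) end with zero multipliers and are dropped, while the two points nearest the decision boundary keep positive multipliers. A reduced set like `X_reduced` could then be passed to any standard SVM trainer. How reliably such a trend identifies non-support vectors on real splice-site data is what the paper itself evaluates.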
Keywords :
DNA; biology computing; support vector machines; DNA splice site detection; Lagrangian variables; cache memory size; strict-constrained gradient ascent optimisation; Biomedical computing; Cache memory; Kernel; Lagrangian functions; Machine learning algorithms; Physics computing; Quadratic programming; Support vector machine classification
Conference_Title :
Frontiers in the Convergence of Bioscience and Information Technologies, 2007. FBIT 2007
Conference_Location :
Jeju City
Print_ISBN :
978-0-7695-2999-8
DOI :
10.1109/FBIT.2007.56