Title :
Biosequence Classification using Sequential Pattern Mining and Optimization
Author :
Fotiadis, D.I. ; Exarchos, T.P. ; Tsipouras, M.G. ; Papaloukas, C.
Author_Institution :
Univ. of Ioannina, Ioannina
Abstract :
In this paper we present a methodology for biosequence classification that employs sequential pattern mining and optimization algorithms. In the first stage, a sequential pattern mining algorithm is applied to a set of biological sequences to extract sequential patterns. The score of each pattern with respect to each sequence is then calculated using a scoring function, and the score of each class under consideration is estimated. The pattern and class scores are then updated, each multiplied by a weight. In the second stage, an optimization technique is employed to calculate the weight values that achieve the optimal classification accuracy. The methodology is applied to the protein class and fold prediction problem. An extensive evaluation is carried out using a dataset obtained from the Protein Data Bank.
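The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the mining algorithm, the scoring function, or the optimizer, so everything concrete here is an assumption. The patterns are taken as already mined, the scoring function is assumed to be a binary subsequence match, and a small exhaustive grid search stands in for the optimization technique. All function names (`pattern_score`, `class_scores`, `optimize_weights`, and so on) are hypothetical.

```python
from itertools import product


def is_subsequence(pattern, sequence):
    """Return True if the symbols of `pattern` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(symbol in remaining for symbol in pattern)


def pattern_score(pattern, sequence):
    # ASSUMPTION: binary scoring function, 1.0 on a match and 0.0 otherwise.
    return 1.0 if is_subsequence(pattern, sequence) else 0.0


def class_scores(sequence, patterns, weights):
    """Sum the weighted pattern scores per class.

    `patterns` is a list of (pattern, class_label) pairs, assumed to come
    from a sequential pattern mining step that is not shown here.
    """
    scores = {}
    for (pattern, label), weight in zip(patterns, weights):
        scores[label] = scores.get(label, 0.0) + weight * pattern_score(pattern, sequence)
    return scores


def classify(sequence, patterns, weights):
    """Predict the class with the highest weighted score."""
    scores = class_scores(sequence, patterns, weights)
    # Ties are broken by label order so the result is deterministic.
    return max(sorted(scores), key=lambda label: scores[label])


def accuracy(data, patterns, weights):
    """Fraction of (sequence, label) pairs in `data` that are classified correctly."""
    correct = sum(classify(seq, patterns, weights) == label for seq, label in data)
    return correct / len(data)


def optimize_weights(data, patterns, grid=(0.0, 0.5, 1.0)):
    """ASSUMPTION: exhaustive grid search as a stand-in for the paper's optimizer."""
    best_weights, best_accuracy = None, -1.0
    for candidate in product(grid, repeat=len(patterns)):
        candidate_accuracy = accuracy(data, patterns, candidate)
        if candidate_accuracy > best_accuracy:
            best_weights, best_accuracy = candidate, candidate_accuracy
    return best_weights, best_accuracy


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_patterns = [("A", "alpha"), ("AC", "beta")]
    toy_data = [("AACC", "beta"), ("ACAC", "beta"), ("AGG", "alpha"), ("TAT", "alpha")]
    print(accuracy(toy_data, toy_patterns, (1.0, 1.0)))
    print(optimize_weights(toy_data, toy_patterns))
```

The grid search grows exponentially with the number of patterns, so it is only workable for toy inputs. A realistic pattern set would need a proper optimization method in its place.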
Keywords :
biology computing; data mining; optimisation; pattern classification; biosequence classification; optimization algorithm; protein data bank; scoring function; sequential pattern mining; Application software; Biology; Computer science; DNA; Information systems; Intelligent systems; Optimization methods; Proteins; Sequences; Text categorization
Conference_Title :
Information Technology Applications in Biomedicine, 2007. ITAB 2007. 6th International Special Topic Conference on
Conference_Location :
Tokyo
Print_ISBN :
978-1-4244-1868-8
Electronic_ISBN :
978-1-4244-1868-8
DOI :
10.1109/ITAB.2007.4407423