Title :
Protein Classification using Sequential Pattern Mining
Author :
Exarchos, Themis P. ; Papaloukas, Costas ; Lampros, Christos ; Fotiadis, Dimitrios I.
Author_Institution :
Dept. of Comput. Sci., Ioannina Univ.
Date :
Aug. 30 2006-Sept. 3 2006
Abstract :
Protein classification in terms of fold recognition can be employed to determine the structural and functional properties of a newly discovered protein. In this work, sequential pattern mining (SPM) is utilized for sequence-based fold recognition. One of the most efficient SPM algorithms, cSPADE, is employed for protein primary structure analysis. A classifier then uses the extracted sequential patterns to assign proteins of unknown structure to the appropriate fold category. The proposed methodology exhibited an overall accuracy of 36% in a multi-class problem of 17 candidate categories. The classification performance reaches up to 65% when the three most probable protein folds are considered.
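The pipeline outlined in the abstract (mine sequential patterns per fold, then rank candidate folds for an unknown sequence and check whether the true fold is among the top candidates) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy patterns, the fold names, and the scoring rule (fraction of a fold's patterns found in the sequence) are hypothetical and are not the classifier or the cSPADE output used in the paper. The sketch only shows what it means for a sequential pattern to occur in a primary sequence and how a "three most probable folds" evaluation might be computed.

```python
from typing import Dict, List


def contains_pattern(sequence: str, pattern: str) -> bool:
    """Return True if `pattern` occurs in `sequence` as an ordered subsequence.

    Gaps between matched residues are allowed; no gap constraint is applied.
    """
    residues = iter(sequence)
    # Membership tests on an iterator consume it up to the match,
    # so each pattern residue must be found after the previous one.
    return all(residue in residues for residue in pattern)


def rank_folds(sequence: str, fold_patterns: Dict[str, List[str]]) -> List[str]:
    """Rank folds by the fraction of their patterns found in `sequence`.

    This scoring rule is a hypothetical stand-in, not the paper's classifier.
    """
    scores = {
        fold: sum(contains_pattern(sequence, p) for p in patterns) / max(len(patterns), 1)
        for fold, patterns in fold_patterns.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def top_k_hit(ranked_folds: List[str], true_fold: str, k: int) -> bool:
    """Return True if the true fold is among the k highest-ranked folds."""
    return true_fold in ranked_folds[:k]


# Toy data (hypothetical): patterns that a miner might report per fold.
toy_patterns = {
    "fold_a": ["ACD", "GK"],
    "fold_b": ["WY", "LLE"],
    "fold_c": ["MK", "PP"],
}
ranking = rank_folds("MAKCLDGWK", toy_patterns)
```

With these toy inputs, `top_k_hit(ranking, "fold_c", 1)` is false while `top_k_hit(ranking, "fold_c", 2)` is true, which mirrors how accuracy can rise when more than the single most probable fold is considered.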
Keywords :
biology computing; data mining; molecular biophysics; pattern classification; pattern recognition; proteins; cSPADE; protein classification; protein primary structure analysis; sequence-based fold recognition; sequential pattern mining; Algorithm design and analysis; Cities and towns; Data mining; Genomics; Itemsets; Pattern recognition; Proteins; Scanning probe microscopy; Sequences; USA Councils
Conference_Title :
Engineering in Medicine and Biology Society, 2006. EMBS '06. 28th Annual International Conference of the IEEE
Conference_Location :
New York, NY
Print_ISBN :
1-4244-0032-5
Electronic_ISBN :
1557-170X
DOI :
10.1109/IEMBS.2006.260336