DocumentCode
2845060
Title
Secondary structure prediction using SVM and clustering
Author
Doong, Shing H. ; Yeh, Chi Y.
Author_Institution
Dept. of Inf. Manage., ShuTe Univ., Kaohsiung, Taiwan
fYear
2004
fDate
5-8 Dec. 2004
Firstpage
297
Lastpage
302
Abstract
Protein secondary structure can be used to help determine the tertiary structure via the fold recognition method. Predicting the secondary structure from the protein sequence has attracted the attention of many researchers. The support vector machine (SVM) is a new learning algorithm that has been successfully applied to many prediction problems. However, the algorithm takes a long time to train the prediction model when the data set is large. It therefore becomes important to revise the method so that training time is reduced while accuracy is maintained. In this study, we implement a genetic algorithm to cluster the training set before a prediction model is built. Using a position specific scoring matrix (PSSM) as part of the input, the hybrid method achieves good performance on sets of 513 nonredundant protein sequences and 294 partially redundant sequences. The results also show that clustering achieves the goal of data preprocessing differently on redundant and nonredundant sets, and it seems almost preferable to cluster the data before prediction is performed.
Keywords
biology computing; genetic algorithms; learning (artificial intelligence); pattern clustering; proteins; sequences; support vector machines; SVM; genetic algorithm; pattern clustering; position specific scoring matrix; protein secondary structure prediction; protein sequences; support vector machine; training set; Accuracy; Artificial neural networks; Clustering algorithms; Encoding; Information management; Machine learning algorithms; Prediction algorithms; Predictive models; Proteins; Support vector machines; Secondary structure prediction; clustering; position specific scoring matrix; support vector machine;
fLanguage
English
Publisher
IEEE
Conference_Titel
Hybrid Intelligent Systems, 2004. HIS '04. Fourth International Conference on
Print_ISBN
0-7695-2291-2
Type
conf
DOI
10.1109/ICHIS.2004.84
Filename
1410020