Title :
Alternate representation of distance matrices for characterization of protein structure
Author :
Marsolo, Keith ; Parthasarathy, Srinivasan
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Abstract :
The most suitable method for the automated classification of protein structures remains an open problem in computational biology. In order to classify a protein structure with any accuracy, an effective representation must be chosen. Here we present two methods of representing protein structure. The first represents the distances between the Cα atoms of a protein as a two-dimensional matrix and models the resulting surface with Zernike polynomials. The second uses a wavelet-based approach: we convert the distances between a protein's Cα atoms into a one-dimensional signal, which is then decomposed using a discrete wavelet transformation. Using the Zernike coefficients and the approximation coefficients of the wavelet decomposition as feature vectors, we test the effectiveness of our representations with two different classifiers on a dataset of more than 600 proteins taken from the 27 most-populated SCOP folds. We find that the wavelet decomposition greatly outperforms the Zernike model. With the wavelet representation, we achieve an accuracy of approximately 56%, roughly 12% higher than results reported on a similar but less challenging dataset. In addition, we can couple our structure-based feature vectors with several sequence-based properties to increase accuracy by another 5-7%. Finally, we use a multi-stage classification strategy on the combined features to increase performance to 78%, an improvement in accuracy of more than 15-20% and 34% over the highest reported sequence-based and structure-based classification results, respectively.
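The wavelet pipeline described in the abstract (Cα distances, then a one-dimensional signal, then the approximation coefficients of a discrete wavelet transform) can be illustrated with a minimal sketch. Several choices here are assumptions that the abstract does not specify: the matrix is flattened by reading its strict upper triangle row by row, the wavelet is Haar, odd-length signals are padded by repeating the last sample, and the function names and the default of three decomposition levels are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def distance_matrix(coords):
    """Pairwise Euclidean distances between C-alpha coordinates (n x 3 array)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def matrix_to_signal(dist):
    """Flatten the strict upper triangle row by row (an assumed ordering)."""
    rows, cols = np.triu_indices(dist.shape[0], k=1)
    return dist[rows, cols]


def haar_approximation(signal, levels):
    """Approximation coefficients after `levels` steps of a Haar DWT."""
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        if approx.size < 2:
            break  # nothing left to decompose
        if approx.size % 2:
            # Assumed boundary handling: repeat the last sample.
            approx = np.append(approx, approx[-1])
        # Orthonormal Haar low-pass filter: scaled sum of adjacent pairs.
        approx = (approx[0::2] + approx[1::2]) / np.sqrt(2.0)
    return approx


def wavelet_features(coords, levels=3):
    """Feature vector: approximation coefficients of the distance signal."""
    signal = matrix_to_signal(distance_matrix(coords))
    return haar_approximation(signal, levels)


if __name__ == "__main__":
    # Toy coordinates standing in for the C-alpha atoms of a 16-residue chain.
    rng = np.random.default_rng(0)
    toy_coords = rng.normal(scale=10.0, size=(16, 3))
    print(wavelet_features(toy_coords, levels=3).shape)
```

Because proteins differ in length, feature vectors produced this way differ in length as well; a real pipeline would need some normalization step (for example resampling the signal to a fixed size) before classification, which this sketch leaves out.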
Keywords :
Zernike polynomials; biology computing; data reduction; matrix algebra; pattern classification; proteins; wavelet transforms; 1D signal; 2D matrix; Zernike coefficients; Zernike model; Zernike polynomials; approximation coefficients; automated protein structure classification; computational biology; discrete wavelet transformation; distance matrices; multistage classification; structure-based feature vectors; wavelet decomposition; wavelet representation; wavelet-based approach; Biological system modeling; Computational biology; Computer science; Databases; Matrix converters; Matrix decomposition; Nuclear magnetic resonance; Polynomials; Protein engineering; Testing;
Conference_Title :
Data Mining, Fifth IEEE International Conference on
Print_ISBN :
0-7695-2278-5
DOI :
10.1109/ICDM.2005.19