Title :
Protein superfamily classification using Kernel Principal Component Analysis and Probabilistic Neural Networks
Author :
Vipsita, Swati ; Shee, Bithin Kanti ; Rath, Santanu Kumar
Author_Institution :
Dept. of Comput. Sci. & Eng., N.I.T. Rourkela, Rourkela, India
Abstract :
This paper applies a Probabilistic Neural Network (PNN) to the protein superfamily classification problem. The classification task organizes proteins into their superfamilies and helps to predict correctly the structure and function of newly discovered proteins. The two main steps in any pattern classification problem are feature selection and feature extraction. A bi-gram hashing function extracts and counts the occurrences of bi-gram patterns in long amino acid sequences. The bi-gram method maps sequences of different lengths to input vectors of the same length, but its major drawback is that the input feature vector tends to be very large. Selecting the optimal number of features remains a critical issue for any pattern classification problem. Principal Component Analysis (PCA), a powerful statistical technique, is used to reduce the dimension of the large input vector without much loss of information, thereby identifying patterns in high-dimensional data. Traditional PCA performs a linear transformation, whereas Kernel PCA (KPCA) is used when the data are distributed nonlinearly. Numerical simulations show that, for nonlinearly distributed protein data, KPCA outperforms PCA in terms of accuracy, sensitivity, and specificity.
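As an illustration of two steps named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes the 20 standard amino acid letters (giving 400 possible bi-grams), raw bi-gram counts as features, and a PNN that averages Gaussian kernels per class with a smoothing parameter `sigma`. The names `bigram_features` and `pnn_classify`, the default `sigma`, and the toy sequences are hypothetical, and the PCA/KPCA reduction step is omitted for brevity.

```python
import math
from itertools import product

# Assumption: only the 20 standard amino acid letters are considered.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BIGRAMS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 patterns
BIGRAM_INDEX = {bg: i for i, bg in enumerate(BIGRAMS)}


def bigram_features(sequence):
    """Map a sequence of any length to a fixed 400-dimensional count vector."""
    counts = [0.0] * len(BIGRAMS)
    for a, b in zip(sequence, sequence[1:]):
        idx = BIGRAM_INDEX.get(a + b)
        if idx is not None:  # bi-grams with non-standard letters are skipped
            counts[idx] += 1.0
    return counts


def pnn_classify(x, train_vectors, train_labels, sigma=1.0):
    """Return the class whose averaged Gaussian kernel response to x is largest."""
    sums, sizes = {}, {}
    for v, label in zip(train_vectors, train_labels):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
        kernel = math.exp(-sq_dist / (2.0 * sigma ** 2))
        sums[label] = sums.get(label, 0.0) + kernel
        sizes[label] = sizes.get(label, 0) + 1
    return max(sums, key=lambda c: sums[c] / sizes[c])


# Toy data (hypothetical, not real protein superfamilies).
train_seqs = ["ACACACAC", "ACACAC", "GGGGGG", "GGGG"]
train_labels = ["A", "A", "B", "B"]
train_vectors = [bigram_features(s) for s in train_seqs]
predicted = pnn_classify(bigram_features("ACAC"), train_vectors, train_labels)
```

In practice the 400-dimensional vectors would first be projected onto a smaller number of (kernel) principal components, and `sigma` would be tuned on held-out data.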
Keywords :
bioinformatics; neural nets; pattern classification; principal component analysis; proteins; PNN; bi-gram hashing function; feature selection; kernel principal component analysis; pattern classification problem; probabilistic neural networks; protein superfamily classification; Amino acids; Feature extraction; Kernel; Training; Vectors; Dimensionality Reduction; Gaussian Kernel; Precision; Sensitivity; Smoothing parameter; Specificity
Conference_Title :
India Conference (INDICON), 2011 Annual IEEE
Conference_Location :
Hyderabad
Print_ISBN :
978-1-4577-1110-7
DOI :
10.1109/INDCON.2011.6139395