Title :
G-protein Coupled Receptor Subfamilies Prediction Based on Nearest Neighbor Approach
Author :
Fayyaz, Mudassir ; Mujahid, Adnan ; Khan, Asifullah ; Choi, Tae-Sun ; Iqbal, Nadeem
Author_Institution :
Ghulam Ishaq Khan Inst. of Eng. Sci. & Technol., Swabi
Abstract :
Hydrophobicity has been considered a potential measure for predicting G-protein coupled receptor subfamilies. In the present work, we apply the fast Fourier transform to a hydrophobicity representation of each sequence to better analyze the sequence information. In our experiments, we observed that sequence-pattern-based information can be exploited in the frequency domain more easily through proximity than by increasing the margin of separation between classes. Based on this observation, a simple nearest neighbor (NN) method is used to classify the 17 subfamilies. The proposed proximity-based approach outperforms the one-against-all implementation of the support vector machine (SVM) [Y. Z. Guo, et al., Acta Biochimica et Biophysica Sinica, 37 (2005) 759], with superior performance in terms of all three measures on both the jackknife test and the independent data set. For the B, C, D and F subfamilies, the Matthews correlation coefficient and overall accuracy are 0.96 and 96.03% using the jackknife test, and 0.91 and 91.6% using the independent data set, respectively. The results validate the idea of exploiting sequence-pattern-based information in the frequency domain using proximity in terms of Euclidean distance. A side advantage is that, instead of training and saving 17 SVM models, only a single NN classifier is needed.
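The pipeline the abstract describes (hydrophobicity encoding, fast Fourier transform, nearest neighbor by Euclidean distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the hydrophobicity scale or say how sequences of different lengths are made comparable, so the Kyte-Doolittle scale, the mean removal, the zero-padded 512-point transform, the use of the magnitude spectrum, and the function names `spectrum` and `predict_nn` are all assumptions made for illustration.

```python
import numpy as np

# Kyte-Doolittle hydropathy values. ASSUMPTION: the abstract does not
# say which hydrophobicity scale was used.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def spectrum(seq, n_points=512):
    """Encode a protein sequence as hydrophobicity values and return
    the magnitude of its FFT.

    ASSUMPTION: sequences are zero-padded (or truncated) to n_points so
    that every spectrum has the same length and can be compared.
    """
    signal = np.array([KD[aa] for aa in seq], dtype=float)
    signal -= signal.mean()  # remove the constant (zero-frequency) offset
    return np.abs(np.fft.rfft(signal, n=n_points))


def predict_nn(train_seqs, train_labels, query_seq, n_points=512):
    """Return the label of the training sequence whose spectrum is
    closest to the query spectrum in Euclidean distance (1-NN)."""
    feats = np.array([spectrum(s, n_points) for s in train_seqs])
    query = spectrum(query_seq, n_points)
    dists = np.linalg.norm(feats - query, axis=1)
    return train_labels[int(np.argmin(dists))]
```

A single call such as `predict_nn(train_seqs, train_labels, query_seq)` covers all subfamilies at once, which reflects the point made in the abstract that one NN classifier replaces a set of one-against-all SVM models.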
Keywords :
Fourier transform spectra; biochemistry; biological techniques; biology computing; cellular biophysics; correlation methods; molecular biophysics; proteins; Euclidean distance; G-protein coupled receptor subfamily prediction; Matthews correlation coefficient; SVM models; fast Fourier transform; frequency domain; hydrophobicity; jackknife test; nearest neighbor approach; proximity based approach; sequence pattern based information; single NN classifier; support vector machine; Amino acids; Databases; Fast Fourier transforms; Hidden Markov models; Nearest neighbor searches; Neural networks; Proteins; Sequences; Support vector machine classification; Support vector machines; Fast Fourier Transform; G-Proteins Coupled Receptors; Multilevel classification; Nearest Neighbor Classifier
Conference_Title :
Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on
Conference_Location :
Boston, MA
Print_ISBN :
978-1-4244-1509-0
DOI :
10.1109/BIBE.2007.4375745