DocumentCode
1992693
Title
G-protein Coupled Receptor Subfamilies Prediction Based on Nearest Neighbor Approach
Author
Fayyaz, Mudassir ; Mujahid, Adnan ; Khan, Asifullah ; Choi, Tae-Sun ; Iqbal, Nadeem
Author_Institution
Ghulam Ishaq Khan Inst. of Eng. Sci. & Technol., Swabi
fYear
2007
fDate
14-17 Oct. 2007
Firstpage
1348
Lastpage
1354
Abstract
Hydrophobicity has been considered a potential measure for predicting G-protein coupled receptor (GPCR) subfamilies. In the present work, we apply the fast Fourier transform to hydrophobicity-encoded sequences to better analyze the sequence information. In our experiments, we observed that sequence-pattern-based information can easily be exploited in the frequency domain using proximity rather than by increasing the margin of separation between the classes. Based on this observation, a simple nearest neighbor (NN) method is used to classify the 17 subfamilies. The proposed proximity-based approach outperforms the one-against-all implementation of the support vector machine (SVM) [Y. Z. Guo, et al., Acta Biochimica et Biophysica Sinica, 37 (2005) 759], with superior performance in terms of all three measures on both the jackknife test and the independent data set. For the B, C, D and F subfamilies, the Matthews correlation coefficient and overall accuracy are 0.96 and 96.03% using the jackknife test, and 0.91 and 91.6% using the independent data set. The results validate the idea of exploiting sequence-pattern-based information in the frequency domain using proximity in terms of Euclidean distance. A side advantage is that, instead of training and saving 17 SVM models, only a single NN classifier is needed.
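The pipeline described in the abstract (hydrophobicity encoding, fast Fourier transform, nearest neighbor classification by Euclidean distance, jackknife evaluation) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not name a hydrophobicity scale or a spectrum length, so the Kyte-Doolittle scale, the length `n=512`, the use of the power spectrum, and all function names below are assumptions made for illustration only.

```python
import numpy as np

# Kyte-Doolittle hydropathy values. ASSUMPTION: the abstract does not say
# which hydrophobicity scale was used; this one is chosen for illustration.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fft_features(seq, n=512):
    """Map a protein sequence to a fixed-length power spectrum.

    ASSUMPTION: n and the choice of power spectrum are illustrative.
    """
    # Encode each residue as its hydrophobicity; unknown symbols map to 0.0.
    signal = np.array([KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq.upper()])
    # With n given, np.fft.fft truncates or zero-pads the signal to length n.
    spectrum = np.fft.fft(signal, n=n)
    # The input is real, so the spectrum is symmetric; keep the first half.
    return np.abs(spectrum[: n // 2]) ** 2


def nearest_neighbor(train_X, train_y, x):
    """Return the label of the training vector closest to x (Euclidean)."""
    distances = np.linalg.norm(train_X - x, axis=1)
    return train_y[int(np.argmin(distances))]


def jackknife_accuracy(X, y):
    """Leave-one-out accuracy: classify each sample using all the others."""
    X = np.asarray(X)
    y = np.asarray(y)
    correct = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i  # mask that drops sample i
        if nearest_neighbor(X[keep], y[keep], X[i]) == y[i]:
            correct += 1
    return correct / len(X)
```

A single call such as `nearest_neighbor(train_X, train_y, fft_features(query))` covers all subfamilies at once, which reflects the point made in the abstract that one NN classifier replaces 17 separately trained SVM models.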
Keywords
Fourier transform spectra; biochemistry; biological techniques; biology computing; cellular biophysics; correlation methods; molecular biophysics; proteins; Euclidean distance; G-protein coupled receptor subfamily prediction; Matthews correlation coefficient; SVM models; fast Fourier transform; frequency domain; hydrophobicity; jackknife test; nearest neighbor approach; proximity based approach; sequence pattern based information; single NN classifier; support vector machine; Amino acids; Databases; Fast Fourier transforms; Hidden Markov models; Nearest neighbor searches; Neural networks; Proteins; Sequences; Support vector machine classification; Support vector machines; Fast Fourier Transform; G-Proteins Coupled Receptors; Multilevel classification; Nearest Neighbor Classifier
fLanguage
English
Publisher
IEEE
Conference_Titel
Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on
Conference_Location
Boston, MA
Print_ISBN
978-1-4244-1509-0
Type
conf
DOI
10.1109/BIBE.2007.4375745
Filename
4375745