Title :
Prediction of the O-glycosylation by Support Vector Machines and Characteristics of the Crowded and Isolated O-glycosylation Sites
Author :
Nakajima, Yukiko ; Sakakibara, Kazutoshi ; Ito, Masahiro ; Nishikawa, Ikuko
Author_Institution :
Coll. of Inf. Sci. & Eng., Ritsumeikan Univ., Kusatsu, Japan
Abstract :
O-glycosylation of mammalian proteins is studied. It is specific to serine or threonine residues, though no consensus sequence is yet known. We have applied support vector machines (SVM) to the prediction of O-glycosylation sites from various kinds of protein information, aiming to investigate the conditions for glycosylation and to elucidate its mechanisms. In the present study, we focus on the distribution of the glycosylation sites. Many O-glycosylated sites are observed in clusters of closely spaced glycosylated sites, whereas the other sites are sparse or isolated. These two types of sites, crowded and isolated, may have different glycosylation mechanisms. We divide all O-glycosylation sites into a crowded group and an isolated group. For each group, an SVM is trained separately to predict the O-glycosylation sites. The prediction results of the two groups depend differently on the input information. The results indicate that some motifs are expected for the isolated group, while the interaction between glycosylated sites and the relative proportion of the surrounding amino acids affect glycosylation for the crowded group. We also compare the statistics of the amino acid sequences around the glycosylation sites of both groups. As a result, some amino acids (proline, valine, alanine, etc.) have high occurrence probabilities at specific positions relative to a glycosylation site, especially for isolated glycosylation. Moreover, independent component analysis (ICA) of the amino acid sequences elucidates the position-specific occurrence of the above amino acids, including the well-known proline at -1 and +3, which are found as different independent components.
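The division into crowded and isolated groups described in the abstract can be illustrated with a minimal sketch. The abstract does not state the criterion used, so the rule below (a site is crowded if another glycosylated site lies within `max_gap` residues) and the default value of `max_gap` are assumptions made only for illustration; the function name `split_crowded_isolated` is likewise hypothetical.

```python
def split_crowded_isolated(sites, max_gap=10):
    """Split glycosylation-site positions into crowded and isolated groups.

    Assumed criterion (not taken from the paper): a site is 'crowded' if
    another glycosylated site lies within max_gap residues of it along the
    sequence; otherwise it is 'isolated'.
    """
    positions = sorted(sites)
    crowded, isolated = [], []
    for i, pos in enumerate(positions):
        # Check the distance to the nearest glycosylated neighbor on each side.
        near_prev = i > 0 and pos - positions[i - 1] <= max_gap
        near_next = i + 1 < len(positions) and positions[i + 1] - pos <= max_gap
        if near_prev or near_next:
            crowded.append(pos)
        else:
            isolated.append(pos)
    return crowded, isolated
```

With such a split, a separate classifier could then be trained on each group, as the abstract describes.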
Keywords :
biology computing; independent component analysis; proteins; sequences; statistical distributions; support vector machines; O-glycosylation prediction; SVM training; amino acid; amino acid sequence; consensus sequence; crowded group; independent component analysis; information dependency; isolated O-glycosylation site distribution; mammalian protein; probability; support vector machine; Alzheimer's disease; Amino acids; Educational institutions; Independent component analysis; Information science; Probability; Protein engineering; Sequences; Statistics; Support vector machines; Independent Component Analysis; Support Vector Machine; glycosylation; prediction; protein;
Conference_Title :
Intelligent Information Hiding and Multimedia Signal Processing, 2009. IIH-MSP '09. Fifth International Conference on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4244-4717-6
Electronic_ISBN :
978-0-7695-3762-7
DOI :
10.1109/IIH-MSP.2009.154