• DocumentCode
    2092732
  • Title

    Prediction of Mucin-type O-glycosylation by Support Vector Machines

  • Author

    Nishikawa, Ikuko ; Sakamoto, Hirotaka ; Nouno, Ikue ; Sakakibara, Kazutoshi ; Ito, Masahiro

  • Author_Institution
    Ritsumeikan Univ., Kusatsu
  • fYear
    2007
  • fDate
    23-27 May 2007
  • Firstpage
    1870
  • Lastpage
    1874
  • Abstract
    Mucin-type O-glycosylation is one of the main types of mammalian protein glycosylation. It is specific to serine (Ser) and threonine (Thr) residues, but no consensus sequence is known. In this report, support vector machines (SVMs) are used to predict O-glycosylation at each Ser or Thr site in a protein sequence. Ninety-nine mammalian protein sequences are selected from UniProt8.0. A protein subsequence of a given length, centered on a Ser or Thr site, is encoded and used as input to the SVM. Three encodings are compared: sparse encoding, 5-letter encoding, and multiple encoding, which combines the sparse and 5-letter encodings. Prediction experiments show that multiple encoding is the most effective. Effective prediction requires detailed information on the amino acid residues nearest the target site, and relatively coarse information on the biochemical characteristics of residues within approximately the 15th nearest neighbors of the target site. In addition, the ratio of positive to negative training data is observed to affect performance.
  • Keywords
    association; biology computing; molecular biophysics; proteins; sugar; support vector machines; 5 letter encoding; SVM learning; UniProt8.0; amino acid residue; mammalian protein glycosylation; mammalian protein sequence; mucin type O glycosylation prediction; multiple encoding; protein subsequence length; serine; sparse encoding; support vector machine; threonine; Alzheimer's disease; Amino acids; Biological information theory; Databases; Encoding; Indium tin oxide; Lipidomics; Nearest neighbor searches; Protein sequence; Support vector machines
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Complex Medical Engineering, 2007. CME 2007. IEEE/ICME International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-1077-4
  • Electronic_ISBN
    978-1-4244-1078-1
  • Type

    conf

  • DOI
    10.1109/ICCME.2007.4382072
  • Filename
    4382072
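
The three encodings named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give the 5-letter grouping, the window sizes, or whether the central residue is encoded, so the `GROUPS` partition, the `near=3` radius, the exclusion of the central Ser/Thr, and all function names below are assumptions made for illustration. Only the outer radius of 15 residues is taken from the abstract.

```python
# Sketch of sparse, 5-letter, and multiple encoding for a window around a Ser/Thr site.
# Grouping, radii, and names are illustrative assumptions, not taken from the paper.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Hypothetical partition of the 20 residues into 5 coarse biochemical classes.
GROUPS = {
    "hydrophobic": "AVLIMC",
    "aromatic": "FWY",
    "polar": "STNQ",
    "charged": "DEKRH",
    "special": "GP",
}
GROUP_OF = {aa: i for i, members in enumerate(GROUPS.values()) for aa in members}


def one_hot(index, size):
    """Return a 0/1 list of the given size; all zeros if index is None."""
    vec = [0] * size
    if index is not None:
        vec[index] = 1
    return vec


def window(seq, center, radius):
    """Residues within `radius` of `center`, excluding the center itself.

    Positions that fall off either end of the sequence are returned as None.
    """
    residues = []
    for offset in range(-radius, radius + 1):
        if offset == 0:
            continue
        pos = center + offset
        residues.append(seq[pos] if 0 <= pos < len(seq) else None)
    return residues


def sparse_encode(residues):
    """One-hot over 20 amino acids per position (detailed information)."""
    features = []
    for r in residues:
        idx = AMINO_ACIDS.index(r) if r is not None and r in AMINO_ACIDS else None
        features.extend(one_hot(idx, len(AMINO_ACIDS)))
    return features


def five_letter_encode(residues):
    """One-hot over 5 coarse classes per position (rough biochemical information)."""
    features = []
    for r in residues:
        features.extend(one_hot(GROUP_OF.get(r), len(GROUPS)))
    return features


def multiple_encode(seq, center, near=3, far=15):
    """Concatenate a sparse encoding of the near window with a 5-letter
    encoding of the wider window, for a Ser or Thr at `center`."""
    if seq[center] not in "ST":
        raise ValueError("target site must be Ser (S) or Thr (T)")
    return sparse_encode(window(seq, center, near)) + five_letter_encode(
        window(seq, center, far)
    )
```

Under these assumptions, `multiple_encode("AAPTPAA", 3)` yields a vector of 2·3·20 + 2·15·5 = 270 features that could be passed to any SVM implementation. The abstract's remark about the positive-to-negative ratio suggests that class balance in the training set would also need to be controlled, which this sketch does not address.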