• DocumentCode
    952238
  • Title

    Gene Classification Using Codon Usage and Support Vector Machines

  • Author

    Ma, Jianmin ; Nguyen, Minh N. ; Rajapakse, Jagath C.

  • Author_Institution
    BioInf. Res. Center, Nanyang Technol. Univ., Singapore
  • Volume
    6
  • Issue
    1
  • fYear
    2009
  • Firstpage
    134
  • Lastpage
    143
  • Abstract
    A novel approach to gene classification is proposed, in which codon usage bias serves as the input feature vector for classification by support vector machines (SVMs). The DNA sequence is first converted to a 59-dimensional feature vector in which each element corresponds to the relative synonymous usage frequency of a codon. Because the input to the classifier is independent of sequence length and variance, our approach is useful when the sequences to be classified are of different lengths, a condition under which homology-based methods tend to fail. The method is demonstrated on 1,841 Human Leukocyte Antigen (HLA) sequences, which are classified into two major classes, HLA-I and HLA-II; each major class is further subdivided into sub-groups of HLA-I and HLA-II molecules. Using codon usage frequencies, a binary SVM achieved an accuracy rate of 99.3% for HLA major class classification, and a multi-class SVM achieved accuracy rates of 99.73% and 98.38% for sub-class classification of HLA-I and HLA-II molecules, respectively. The results show that gene classification based on codon usage bias is consistent with the molecular structures and biological functions of HLA molecules.
  • Keywords
    DNA; biology computing; genetics; molecular biophysics; pattern classification; support vector machines; DNA sequence; HLA-I molecules; HLA-II molecules; biological function; codon usage bias; codon usage frequencies; gene classification; human leukocyte antigen sequences; input feature vector; molecular structures; Cluster analysis; Human Leukocyte Antigen (HLA); Major Histocompatibility Complex (MHC); Relative Synonymous Codon Use (RSCU) frequency; Algorithms; Artificial Intelligence; Codon; Databases, Genetic; Discriminant Analysis; Genes; Genes, MHC Class I; Genes, MHC Class II; Genetic Code; HLA Antigens; Humans; Major Histocompatibility Complex; Normal Distribution; Pattern Recognition, Automated; Reproducibility of Results; Sequence Analysis, DNA
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2007.70240
  • Filename
    4359889
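
The feature extraction step described in the abstract can be illustrated with a minimal sketch. It assumes the conventional definition of relative synonymous codon usage (the observed count of a codon divided by the mean count over its synonymous family) and the standard genetic code; the paper's exact formula may differ. The 59 dimensions arise from the 64 codons minus the 3 stop codons and the 2 codons without synonyms (ATG and TGG). The names `FEATURE_CODONS` and `rscu_vector` are hypothetical, and returning 0.0 for an amino acid that never occurs is an assumption of this sketch, not something the record states.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group sense codons into synonymous families, skipping stop codons ("*").
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES.setdefault(aa, []).append(codon)

# Map each codon to its family, and keep only codons that have synonyms.
SYNONYMS = {codon: fam for fam in FAMILIES.values() for codon in fam}
FEATURE_CODONS = sorted(c for c, fam in SYNONYMS.items() if len(fam) > 1)


def rscu_vector(seq):
    """Return the 59 RSCU values of an in-frame coding sequence.

    Assumption: a family with no observed codons contributes 0.0.
    """
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    vector = []
    for codon in FEATURE_CODONS:
        family = SYNONYMS[codon]
        total = sum(counts[c] for c in family)
        vector.append(counts[codon] * len(family) / total if total else 0.0)
    return vector
```

Because each value is a ratio within a synonymous family, repeating a sequence leaves the vector unchanged, which is the length independence the abstract refers to. A vector of this kind could then be passed to any SVM implementation.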