Title :
Determination of the Relative Importance of Gene Function or Taxonomic Grouping to Codon Usage Bias Using Cluster Analysis and SVMs
Author :
Ma, Jianmin ; Nguyen, Minh N. ; Fogel, Gary B. ; Rajapakse, Jagath C.
Author_Institution :
BioInformatics Research Center, Nanyang Technological University, Singapore
Abstract :
The codon usage patterns of 2,552 major histocompatibility complex (MHC) sequences from 33 primate species, and of the subsets of sequences obtained by removing the species with the most abundant sequences, were examined. The correlation of gene function and species with MHC codon usage patterns was analyzed using cluster analysis and support vector machines (SVMs). The results show that gene function is the major factor and species the minor factor correlated with codon usage bias, but their interactions complicate the codon usage pattern. When the weight of the species factor increased, classification accuracy dropped accordingly. The factors of gene function and species can be adopted as feature vectors in gene classification and phylogenetic studies, respectively. Because the codon usage input to the classifier is independent of sequence length and variance, our approach is useful when the sequences to be analyzed are of different lengths, a condition under which classic homology-based approaches tend to be difficult. To focus on the phylogenetic features of the MHC sequences through codon usage analysis, we must try to minimize or even eliminate the influence of gene function.
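The length independence mentioned in the abstract can be illustrated with relative synonymous codon usage (RSCU), which the keywords list as a feature. The following is a minimal sketch, not the authors' implementation: it assumes the standard genetic code, an in-frame coding sequence, and the common RSCU definition (observed codon count divided by the mean count over that amino acid's synonymous codons). The function name `rscu` and the choice to report 0.0 for amino acids absent from the sequence are illustrative assumptions.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO)
}

# Group the 61 sense codons by the amino acid they encode (stops excluded).
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES.setdefault(aa, []).append(codon)


def rscu(seq):
    """Return a dict mapping each sense codon to its RSCU value.

    Assumption: `seq` is an in-frame coding sequence; trailing bases that
    do not form a full codon, and codons with ambiguous bases, are ignored.
    """
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    values = {}
    for aa, codons in FAMILIES.items():
        total = sum(counts[c] for c in codons)
        for c in codons:
            # Observed count relative to the uniform expectation total / n.
            # Assumption: 0.0 when the amino acid never occurs.
            values[c] = counts[c] * len(codons) / total if total else 0.0
    return values


if __name__ == "__main__":
    short = "GCTGCTGCC"   # three alanine codons
    long_ = short * 5     # same composition, five times the length
    print(rscu(short)["GCT"], rscu(long_)["GCT"])
```

Because each value is a ratio within one amino acid's codon family, sequences of different lengths map to feature vectors of the same fixed dimension, which is what allows a classifier such as an SVM to accept them without alignment.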
Keywords :
biology computing; genetics; pattern classification; pattern clustering; support vector machines; cluster analysis; codon usage bias; codon usage patterns; gene classification; gene function; histocompatibility complex sequences; phylogenetic studies; taxonomic grouping; Bioinformatics; Frequency; Immune system; Pattern analysis; Phylogeny; Principal component analysis; Proteins; Sequences; Support vector machine classification; Major Histocompatibility Complex (MHC); Principal Component Analysis (PCA); Relative Synonymous Codon Use (RSCU) frequency; Support Vector Machines (SVM)
Conference_Title :
Computational Intelligence and Bioinformatics and Computational Biology, 2006. CIBCB '06. 2006 IEEE Symposium on
Conference_Location :
Toronto, Ont.
Print_ISBN :
1-4244-0623-4
Electronic_ISBN :
1-4244-0624-2
DOI :
10.1109/CIBCB.2006.330955