• DocumentCode
    3074247
  • Title

    A Data Mining Approach to Predicting Phylum for Microbial Organisms Using Genome-Wide Sequence Data

  • Author

    Kotamarti, Rao M. ; Raiford, Douglas W. ; Raymer, Michael L. ; Dunham, Margaret H.

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Southern Methodist Univ., Dallas, TX, USA
  • fYear
    2009
  • fDate
    22-24 June 2009
  • Firstpage
    161
  • Lastpage
    167
  • Abstract
    Genomic sequencing projects are generating vast stores of data that provide opportunities and challenges in data analysis. Investigations of trends in codon usage have proven to be a rich area of study in this field. There are a number of methods for isolating codon usage bias in microbial organisms, each designed to capture a specific aspect of the bias. We posit that each species has evolved under the influence of a unique set of environmental constraints that has governed the shaping of the organism's codon usage. Analysis of codon usage data should, therefore, provide insights into the selection process at work influencing genomic composition. To this end, we describe the large-scale mining of genome-level data from several codon usage bias isolation techniques to determine whether this information can be used to predict the phylum and class to which each organism belongs. Successful prediction is an indication that the forces molding the codon usage of a given phylum/class are indeed distinctive, and that it would be of use in understanding the evolutionary forces involved. Additionally, it supports using this method to aid in, and validate, existing taxonomic classification techniques.
  • Keywords
    bioinformatics; data analysis; data mining; evolution (biological); genetics; genomics; microorganisms; data analysis; environmental constraints; evolution; genome-level data mining approach; genome-wide sequence data; genomic sequencing projects; microbial organisms; organism codon usage; phylum prediction; taxonomic classification technique; Bioinformatics; Computer science; Data analysis; Data mining; Databases; Genetic engineering; Genomics; Microorganisms; Organisms; Sequences;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and BioEngineering, 2009. BIBE '09. Ninth IEEE International Conference on
  • Conference_Location
    Taichung
  • Print_ISBN
    978-0-7695-3656-9
  • Type

    conf

  • DOI
    10.1109/BIBE.2009.14
  • Filename
    5211298