• DocumentCode
    1268021
  • Title
    Distinguishing Endogenous Retroviral LTRs from SINE Elements Using Features Extracted from Evolved Side Effect Machines
  • Author
    Ashlock, Wendy; Datta, Soupayan
  • Author_Institution
    Dept. of Comput. Sci. & Eng., York Univ., Toronto, ON, Canada
  • Volume
    9
  • Issue
    6
  • fYear
    2012
  • Firstpage
    1676
  • Lastpage
    1689
  • Abstract
    Side effect machines produce features for classifiers that distinguish different types of DNA sequences. They also have as-yet-unexploited potential to give insight into the biological features of those sequences. We introduce several innovations in the production and use of side effect machine sequence features. We compare classifiers trained on consensus sequences with classifiers trained on genomic sequences and find that genomic sequences yield more accurate results. Surprisingly, we were even able to build a classifier that distinguished consensus sequences from genomic sequences with high accuracy, suggesting that consensus sequences are not always representative of their genomic counterparts. We apply our techniques to the problem of distinguishing two types of transposable elements, solo LTRs and SINEs. Identifying these sequences is important because they affect gene expression, genome structure, and genetic diversity, and because they serve as genetic markers. The two are of similar length, neither codes for protein, and both have many nearly identical copies throughout the genome. Being able to distinguish them efficiently and automatically will aid efforts to improve genome annotation. Our approach reveals structural characteristics of the sequences that may be of interest to biologists.
  • Keywords
    biology computing; feature extraction; genetic algorithms; genetics; genomics; molecular biophysics; molecular configurations; proteins; DNA sequences; SINE elements; biological features; biologists; classifier training; endogenous retroviral LTRs; evolved side effect machines; feature extraction; gene expression; genetic diversity; genetic markers; genome structure; genomic sequences; protein codes; structural characteristics; transposable elements; Bioinformatics; DNA; Feature extraction; Genetic algorithms; Genomics; Machine learning; Endogenous retroviruses; LTR retrotransposons; SINE elements; evolutionary computing and genetic algorithms; feature evaluation and selection; machine learning; side effect machines; Algorithms; Artificial Intelligence; Cluster Analysis; Computational Biology; DNA Transposable Elements; Humans; Retroviridae; Short Interspersed Nucleotide Elements; Terminal Repeat Sequences;
  • fLanguage
    English
  • Journal_Title
    IEEE/ACM Transactions on Computational Biology and Bioinformatics
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2012.116
  • Filename
    6275431