  • DocumentCode
    106504
  • Title
    Beyond Fixed-Resolution Alignment-Free Measures for Mammalian Enhancers Sequence Comparison
  • Author
    Comin, Matteo; Verzotto, Davide
  • Author_Institution
    Dept. of Inf. Eng., Univ. of Padova, Padua, Italy
  • Volume
    11
  • Issue
    4
  • fYear
    2014
  • fDate
    July-Aug. 2014
  • Firstpage
    628
  • Lastpage
    637
  • Abstract
    Cell-type diversity is, to a large degree, driven by transcriptional regulation, i.e., by enhancers. It has recently been shown that in higher eukaryotes enhancers rarely work alone; instead, they collaborate by forming clusters of cis-regulatory modules (CRMs). Even though the binding of transcription factors is sequence-specific, identifying functionally similar enhancers is very difficult. A similarity measure that detects related regulatory sequences is crucial for understanding the functional correlation between two enhancers, and would enable large-scale analyses, clustering, and genome-wide classifications. In this paper we present Under2, a parameter-free alignment-free statistic based on variable-length words. Traditional alignment-free methods are based on fixed-length patterns, that is, they are tied to a fixed resolution; our statistic is instead built upon variable-length words and thus allows multiple resolutions. This lets it capture the great variability in CRM lengths. We evaluate several alignment-free statistics on simulated data and on real ChIP-seq sequences. The new statistic is highly successful in discriminating functionally related enhancers and outperforms fixed-resolution methods in almost all experiments. Finally, experiments on mouse enhancers show that Under2 can separate enhancers active in different tissues. Availability: http://www.dei.unipd.it/~ciompin/main/UnderIICRMS.html.
  • Keywords
    DNA; biochemistry; biology computing; cellular biophysics; genomics; molecular biophysics; molecular clusters; molecular configurations; proteins; statistical analysis; Under2; cell-type diversity; cis-regulatory module clusters; detect related regulatory sequences; fixed-resolution alignment-free measures; genome-wide classifications; high-level eukaryotes enhancers; large-scale analyses; mammalian enhancers sequence comparison; parameter-free alignment-free statistics; real ChIP-seq sequences; simulated data; tissues; transcription factor binding; transcription regulation; variable-length words; Bioinformatics; Computational biology; Computational modeling; Genomics; Alignment-free statistics; pattern discovery; regulatory sequences comparison
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2014.2306830
  • Filename
    6744577