• DocumentCode
    3259212
  • Title
    Discovering Frequent Poly-Regions in DNA Sequences
  • Author
    Papapetrou, Panagiotis; Benson, Gary; Kollios, George
  • Author_Institution
    Dept. of Comput. Sci., Boston Univ., MA
  • fYear
    2006
  • fDate
    Dec. 2006
  • Firstpage
    94
  • Lastpage
    98
  • Abstract
    The problem of discovering arrangements of regions of high occurrence of one or more items of a given alphabet in a sequence is studied, and two efficient algorithms are proposed. The first is entropy-based and uses an existing recursive segmentation technique to split the input sequence into a set of homogeneous segments. The key idea of the second approach is to use a set of sliding windows over the sequence. Each sliding window keeps a set of statistics for a sequence segment, chiefly the number of occurrences of each item in that segment. Combining these statistics efficiently yields the complete set of regions of high occurrence of the items of the given alphabet. After these regions are identified, the sequence is converted to a sequence of labeled intervals, each corresponding to a region. An efficient algorithm for mining frequent arrangements of event intervals is then applied to the converted sequence to discover frequently occurring arrangements of these regions. The proposed algorithms are tested on various DNA sequences, producing results with significant biological meaning.
  • Keywords
    DNA; biology computing; data mining; entropy; molecular biophysics; statistics; DNA sequences; biological meaning; discovering frequent poly-regions; homogeneous segments; labeled interval sequence; recursive segmentation; sequence segment; sliding windows; Bioinformatics; Computer science; Genomics; Hidden Markov models; Maximum likelihood estimation; Predictive models; Proteins; Sequences
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops, 2006. ICDM Workshops 2006. Sixth IEEE International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    0-7695-2702-7
  • Type
    conf
  • DOI
    10.1109/ICDMW.2006.63
  • Filename
    4063605
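
The sliding-window idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `poly_regions`, the single fixed window size `window`, and the `min_density` threshold are all assumptions made for illustration, whereas the paper describes a set of windows and its own way of combining their statistics. The sketch keeps per-item counts for one window, updates them as the window slides by one position, and merges overlapping or touching qualifying windows into labeled intervals.

```python
from collections import Counter


def poly_regions(seq, window, min_density):
    """Return sorted (item, start, end) intervals, end exclusive.

    An interval covers every run of overlapping or touching windows in which
    `item` makes up at least `min_density` of the window's symbols.
    Hypothetical helper for illustration only.
    """
    if window <= 0 or window > len(seq):
        return []

    counts = Counter(seq[:window])  # occurrences of each item in the window
    open_start = {}  # item -> start of the region currently being extended
    last_end = {}    # item -> end (exclusive) of its last qualifying window
    regions = []

    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide by one: drop the symbol leaving, add the one entering.
            counts[seq[start - 1]] -= 1
            counts[seq[start + window - 1]] += 1

        for item, count in counts.items():
            if count / window < min_density:
                continue
            if item in open_start and start <= last_end[item]:
                # This window overlaps or touches the open region: extend it.
                last_end[item] = start + window
            else:
                # Close any earlier region for this item and open a new one.
                if item in open_start:
                    regions.append((item, open_start[item], last_end[item]))
                open_start[item] = start
                last_end[item] = start + window

    # Flush the regions that were still open at the end of the sequence.
    for item, region_start in open_start.items():
        regions.append((item, region_start, last_end[item]))

    return sorted(regions, key=lambda r: (r[1], r[0]))


if __name__ == "__main__":
    for region in poly_regions("AAAAACGCGTTTTT", window=4, min_density=0.75):
        print(region)
```

The resulting labeled intervals are the kind of input that an interval-arrangement miner, as mentioned in the abstract, would consume; that mining step is not sketched here.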