• DocumentCode
    3134471
  • Title

    String join using precedence count matrix

  • Author

    Cao, Xia ; Tung, Anthony K H ; Ooi, Beng Chin ; Tan, Kian-Lee ; Li, Shuai Cheng

  • Author_Institution
    Dept. of Comput. Sci., National Univ. of Singapore, Singapore
  • fYear
    2004
  • fDate
    21-23 June 2004
  • Firstpage
    345
  • Lastpage
    348
  • Abstract
    In this paper, we propose a filter-and-refine string join algorithm. The filtering phase rapidly prunes strings that are not joinable, while the refinement phase employs a comprehensive algorithm to remove the remaining false alarms. The efficiency of the proposed scheme lies in the use of the precedence count matrix (PCM) for computing the edit distance between two sequences. With PCM, sequence comparison takes constant time. We also evaluated the proposed sequence join algorithm, and our study shows that it outperforms known techniques.
  • Keywords
    DNA; distributed databases; genetics; query languages; relational databases; scientific information systems; string matching; DNA sequences; constant time complexity; false alarm removal; filter-and-refine string join algorithm; genomic applications; precedence count matrix; sequence comparison; sequence edit distance computing; sequence join algorithm; string data manipulation; string pruning; string refinement; string similarity; Assembly; Bioinformatics; Computer science; Dynamic programming; Filtering algorithms; Filters; Finance; Genomics; Phase change materials
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Scientific and Statistical Database Management, 2004. Proceedings. 16th International Conference on
  • ISSN
    1099-3371
  • Print_ISBN
    0-7695-2146-0
  • Type

    conf

  • DOI
    10.1109/SSDM.2004.1311228
  • Filename
    1311228