• DocumentCode
    392433
  • Title
    Filtration of string proximity search via transformation
  • Author
    Aghili, S. Alireza ; Agrawal, Divyakant ; El Abbadi, Amr
  • Author_Institution
    Dept. of Comput. Sci., Univ. of California, Santa Barbara, CA, USA
  • fYear
    2003
  • fDate
    10-12 March 2003
  • Firstpage
    149
  • Lastpage
    157
  • Abstract
    The problem of proximity search in biological databases is addressed. We study vector transformations and apply DFT (Discrete Fourier Transformation) and DWT (Discrete Wavelet Transformation, Haar) dimensionality reduction techniques to DNA sequence proximity search in order to reduce the search time of range queries. Our empirical results on a number of Prokaryote and Eukaryote DNA contig databases demonstrate up to a 50-fold filtration ratio of the search space and up to 13 times faster filtration. The proposed transformation techniques may easily be integrated as a preprocessing phase on top of existing similarity search heuristics such as BLAST, PatternHunter, FastA, and QUASAR to efficiently prune non-relevant sequences. We study the precision of applying dimensionality reduction techniques for faster and more efficient range query searches, and discuss the imposed trade-offs.
  • Keywords
    DNA; biology computing; database management systems; discrete Fourier transforms; discrete wavelet transforms; BLAST; Eukaryote DNA contig databases; FastA; PatternHunter; Prokaryote; QUASAR; efficient pruning; nonrelevant sequences; preprocessing phase; search heuristics; search time reduction; vector transformations; Bioinformatics; Databases; Filtration; Genomics; Humans; Pattern analysis; Phylogeny; Proteins; Sequences
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Bioengineering, 2003. Proceedings. Third IEEE Symposium on
  • Print_ISBN
    0-7695-1907-5
  • Type
    conf
  • DOI
    10.1109/BIBE.2003.1188941
  • Filename
    1188941