• DocumentCode
    2564747
  • Title
    Secret sequence comparison in distributed computing environments by interval sampling
  • Author
    Kurata, K. ; Breton, V. ; Nakamura, H.
  • Author_Institution
    Res. Center for Adv. Sci. & Technol., Tokyo Univ., Japan
  • fYear
    2004
  • fDate
    7-8 Oct. 2004
  • Firstpage
    198
  • Lastpage
    205
  • Abstract
    Once a new gene has been sequenced, it must be verified whether or not it is similar to previously sequenced genes. In many cases, the organization that sequenced a potentially novel gene needs to keep the sequence itself confidential. However, to compare the potentially novel sequence with known sequences, it must either be sent as a query to public databases, or these databases must be downloaded onto a local computer. In both cases, the potentially new sequence is exposed to the public. In this work, we propose a new method, called interval sampling, to compare sequences without leaking exact information about the new sequence. To keep the exact sequence information secret, this method samples intervals (subsequences) from a sequence and hashes them. The hashed data are then made public so that the novelty of the sequence can be verified. We find that this method works well in parallel in a distributed computing environment, such as the Grid. The experimental results for 19797 H. sapiens genes and 25000 M. musculus genes show that the parallel implementation of this method performs reasonably well in terms of speed and memory usage.
  • Keywords
    biology computing; file organisation; genetics; grid computing; sampling methods; sequences; distributed computing environment; grid; h.sapiens gene; hashed data; interval sampling; m.musculus gene; public database; secret sequence comparison; Bioinformatics; Contracts; Costs; Databases; Distributed computing; Genomics; IP networks; Sampling methods; Search methods; Sequences;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence in Bioinformatics and Computational Biology, 2004. CIBCB '04. Proceedings of the 2004 IEEE Symposium on
  • Print_ISBN
    0-7803-8728-7
  • Type
    conf
  • DOI
    10.1109/CIBCB.2004.1393954
  • Filename
    1393954
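
The abstract describes interval sampling only at a high level: intervals (subsequences) are sampled from the secret sequence, hashed, and the hashes are published for comparison. The Python sketch below is one possible reading of that idea, not the authors' implementation. The function names, the fixed interval `length`, the sampling `step`, the use of SHA-256, and the choice to compare against every window of a known sequence are all assumptions made for illustration.

```python
import hashlib


def hash_interval(subseq: str) -> str:
    """Hash one interval; SHA-256 is an assumed choice of hash function."""
    return hashlib.sha256(subseq.encode("ascii")).hexdigest()


def sample_interval_hashes(seq: str, length: int, step: int) -> set:
    """Hash the intervals of `length` bases starting at every `step`-th position."""
    return {
        hash_interval(seq[i:i + length])
        for i in range(0, len(seq) - length + 1, step)
    }


def match_fraction(query_hashes: set, known_seq: str, length: int) -> float:
    """Fraction of published query hashes found among all windows of a known sequence."""
    if not query_hashes:
        return 0.0
    # Assumption: the known sequence is public, so every window can be hashed.
    known_hashes = sample_interval_hashes(known_seq, length, step=1)
    return len(query_hashes & known_hashes) / len(query_hashes)


# Hypothetical data: only `published` would leave the owner's machine.
secret = "ACGTTGCAAGGCTTAACCGGTTAACGATCG"
published = sample_interval_hashes(secret, length=8, step=4)

similar_known = "TT" + secret   # contains the secret sequence at an offset
unrelated_known = "A" * 40      # shares no 8-base interval with the secret

similar_score = match_fraction(published, similar_known, length=8)
unrelated_score = match_fraction(published, unrelated_known, length=8)
```

Under these assumptions, each known sequence can be scored independently of the others, which is consistent with the abstract's remark that the method works well in parallel.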