• DocumentCode
    2776219
  • Title

    Secret sequence comparison on public grid computing resources

  • Author

    Kurata, Ken-Ichi ; Nakamura, Hiroshi ; Breton, Vincent

  • Author_Institution
    Res. Center for Adv. Sci. & Technol., Tokyo Univ., Japan
  • Volume
    2
  • fYear
    2005
  • fDate
    9-12 May 2005
  • Firstpage
    832
  • Abstract
    Once a new gene has been sequenced, it must be checked for similarity to previously sequenced genes. In many cases, the organization that sequenced a potentially novel gene needs to keep the sequence itself confidential. However, to compare the potentially novel sequence with known sequences, it must either be sent as a query to public databases, or these databases must be downloaded onto a local computer. In both cases, the potentially new sequence is exposed to the public. In this work, we propose a novel method to compare sequences without leaking any exact sequence information to the public. This method is based on our previously proposed method for finding unique sequences in grid computing environments, which parallelizes well with reasonable performance. To keep the exact sequence information confidential, the method samples intervals (subsequences) from a sequence and hashes them. No key-based cryptosystem is used. The hashed data are made public so that the novelty of the sequence can be verified. Experimental results for 19,797 H. sapiens genes show that the parallel implementation of this method performs reasonably well in terms of speed and memory usage. This paper describes the implementation on the worldwide testbeds of the European Data Grid (EDG) and its results.
  • Keywords
    biology computing; file organisation; genetics; grid computing; sampling methods; sequences; EDG; European Data Grid; data hashing; exact sequence information; grid computing; public databases; public grid computing resources; secret gene sequence comparison; sequence interval sampling; Contracts; Costs; Cryptography; Databases; Distributed computing; Genomics; Grid computing; Sampling methods; Sequences; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing and the Grid, 2005. CCGrid 2005. IEEE International Symposium on
  • Print_ISBN
    0-7803-9074-1
  • Type

    conf

  • DOI
    10.1109/CCGRID.2005.1558648
  • Filename
    1558648
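
The interval-hashing idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not specify the sampling scheme, the interval length, or the hash function, so everything below is an assumption made for illustration: fixed-length intervals taken at a regular step, SHA-256 as the hash, and the hypothetical helper names `hashed_intervals` and `shared_fraction`. It is not the authors' implementation.

```python
from __future__ import annotations

import hashlib


def hashed_intervals(sequence: str, length: int = 8, step: int = 1) -> set[str]:
    """Hash fixed-length intervals (subsequences) of a sequence.

    Assumption: intervals start every `step` positions and are hashed
    with SHA-256; the paper's actual scheme may differ.
    """
    seq = sequence.upper()
    return {
        hashlib.sha256(seq[i:i + length].encode("ascii")).hexdigest()
        for i in range(0, len(seq) - length + 1, step)
    }


def shared_fraction(query_hashes: set[str], reference_hashes: set[str]) -> float:
    """Fraction of query interval hashes that also occur in the reference."""
    if not query_hashes:
        return 0.0
    return len(query_hashes & reference_hashes) / len(query_hashes)


# Only the hash sets are exchanged; the raw query sequence stays local.
known = "ATGGCGTACGTTAGCCTAGG"
same_as_known = hashed_intervals(known)
unrelated = hashed_intervals("T" * 20)
reference = hashed_intervals(known)

print(shared_fraction(same_as_known, reference))
print(shared_fraction(unrelated, reference))
```

A high shared fraction suggests the query resembles a known sequence, while a low one suggests novelty; the thresholds and interval length would need to be chosen for the data at hand.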