Title :
Secret sequence comparison on public grid computing resources
Author :
Kurata, Ken-Ichi ; Nakamura, Hiroshi ; Breton, Vincent
Author_Institution :
Res. Center for Adv. Sci. & Technol., Tokyo Univ., Japan
Abstract :
Once a new gene has been sequenced, it must be checked for similarity to previously sequenced genes. In many cases, the organization that sequenced a potentially novel gene needs to keep the sequence itself confidential. However, to compare the potentially novel sequence with known sequences, it must either be sent as a query to public databases, or these databases must be downloaded onto a local computer. In both cases, the potentially new sequence is exposed to the public. In this work, we propose a novel method to compare sequences without leaking any exact sequence information to the public. This method is based on our previously proposed method for finding unique sequences in grid computing environments, which parallelizes well with reasonable performance. To keep the exact sequence information confidential, the method samples intervals (subsequences) from a sequence and hashes these intervals. No key-based cryptosystem is used. The hashed data are made public so that the novelty of the sequence can be verified. Experimental results for 19797 H. sapiens genes show that the parallel implementation of this method performs reasonably well in terms of speed and memory usage. This paper describes the implementation on the worldwide testbeds of the European Data Grid (EDG) and its results.
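The interval sampling and hashing step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the interval length, the sampling scheme, the hash function, or the comparison measure, so everything concrete here is an assumption made for illustration: fixed-length intervals taken at a fixed step, SHA-256 as the unkeyed hash, and the hypothetical helper names `sample_intervals`, `hash_intervals`, and `shared_fraction`.

```python
import hashlib


def sample_intervals(sequence, length, step):
    """Return fixed-length subsequences starting every `step` positions.

    Fixed length and fixed step are assumptions; the abstract only says
    that intervals (subsequences) are sampled.
    """
    last_start = len(sequence) - length
    return [sequence[i:i + length] for i in range(0, last_start + 1, step)]


def hash_intervals(sequence, length=8, step=1):
    """Hash each sampled interval with an unkeyed hash.

    SHA-256 is an assumed choice; no key or cryptosystem is involved.
    Only the resulting set of digests would be published.
    """
    intervals = sample_intervals(sequence.upper(), length, step)
    return {hashlib.sha256(s.encode("ascii")).hexdigest() for s in intervals}


def shared_fraction(query_hashes, reference_hashes):
    """Fraction of query interval hashes that also occur in the reference."""
    if not query_hashes:
        return 0.0
    return len(query_hashes & reference_hashes) / len(query_hashes)


if __name__ == "__main__":
    # Toy sequences, not real gene data.
    secret = "ACGTACGTTAGCCGATAGGCT"
    known = "TTACGTACGTTAGCAAACCCG"
    score = shared_fraction(hash_intervals(secret), hash_intervals(known))
    print(f"shared interval hashes: {score:.2f}")
```

In this sketch only the digest sets are exchanged, so two parties can estimate overlap without either side transmitting the raw sequence. A score near 0 suggests the query shares few intervals with the reference, while a score near 1 suggests the sequence is largely already known.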
Keywords :
biology computing; file organisation; genetics; grid computing; sampling methods; sequences; EDG; European Data Grid; data hashing; exact sequence information; grid computing; public databases; public grid computing resources; secret gene sequence comparison; sequence interval sampling; Contracts; Costs; Cryptography; Databases; Distributed computing; Genomics; Grid computing; Sampling methods; Sequences; Testing;
Conference_Title :
Cluster Computing and the Grid, 2005. CCGrid 2005. IEEE International Symposium on
Print_ISBN :
0-7803-9074-1
DOI :
10.1109/CCGRID.2005.1558648