Title :
Simsearcher: a local similarity search engine for biological sequence databases
Author :
Tsai, Tian-Haw ; Lee, Suh-Yin
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan
Abstract :
An efficient local similarity search engine is developed by exploiting data mining techniques. All frequent patterns in the database are retrieved and recorded in a one-time preprocessing step. A query sequence is then checked to see whether it matches any pattern found during preprocessing. A region of the query and a region of a database sequence that both match the same pattern form a possible seed for local similarity. Finally, each such seed region pair is extended and scored to determine whether a local similarity exists with a score high enough to report. For computational efficiency, a novel clustering approach is proposed and integrated into the system, which is based on DELPHI, a local similarity search engine proposed by IBM. Extensive experiments are conducted to show the performance of our system.
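The preprocess, seed, extend, and score stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes fixed-length substrings (k-mers) as a stand-in for the mined frequent patterns, uses a simple ungapped extension with +1/-1 scoring, and leaves out the clustering step entirely. All function names, parameters, and thresholds are hypothetical.

```python
from collections import defaultdict


def mine_patterns(db, k=4, min_support=2):
    """One-time preprocessing: index k-mers found in >= min_support sequences.

    Assumption: k-mers stand in for the frequent patterns of the abstract.
    """
    occurrences = defaultdict(list)
    for seq_id, seq in enumerate(db):
        for pos in range(len(seq) - k + 1):
            occurrences[seq[pos:pos + k]].append((seq_id, pos))
    return {
        pattern: hits
        for pattern, hits in occurrences.items()
        if len({seq_id for seq_id, _ in hits}) >= min_support
    }


def best_gain(pairs, match, mismatch):
    """Return (best running score, length at which it is reached)."""
    best = run = best_len = 0
    for n, (a, b) in enumerate(pairs, start=1):
        run += match if a == b else mismatch
        if run > best:
            best, best_len = run, n
    return best, best_len


def extend_seed(query, target, q_pos, t_pos, k, match=1, mismatch=-1):
    """Ungapped extension of a seed to the left and right (illustrative scoring)."""
    right, r_len = best_gain(
        zip(query[q_pos + k:], target[t_pos + k:]), match, mismatch)
    left, l_len = best_gain(
        zip(reversed(query[:q_pos]), reversed(target[:t_pos])), match, mismatch)
    score = k * match + left + right
    return q_pos - l_len, t_pos - l_len, l_len + k + r_len, score


def search(query, db, index, k=4, min_score=6):
    """Seed on shared patterns, extend each seed, keep high-scoring regions."""
    hits = set()  # a set merges seeds that extend to the same region
    for q_pos in range(len(query) - k + 1):
        for seq_id, t_pos in index.get(query[q_pos:q_pos + k], ()):
            q_start, t_start, length, score = extend_seed(
                query, db[seq_id], q_pos, t_pos, k)
            if score >= min_score:
                hits.add((seq_id, q_start, t_start, length, score))
    return sorted(hits)
```

Here `mine_patterns` would be run once per database and its result reused for every query passed to `search`, mirroring the one-time preprocessing described above. Each reported tuple is (sequence id, query start, database start, region length, score).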
Keywords :
biology computing; data mining; pattern matching; query processing; scientific information systems; search engines; sequences; IBM DELPHI system; SimSearcher; biological sequence databases; database sequence; local similarity search engine; query sequence; Biology; Computational efficiency; Computer science; Data engineering; Databases; Genetic mutations; Information retrieval
Conference_Title :
Multimedia Software Engineering, 2003. Proceedings. Fifth International Symposium on
Conference_Location :
Taichung, Taiwan
Print_ISBN :
0-7695-2031-6
DOI :
10.1109/MMSE.2003.1254456