DocumentCode
3225043
Title
Common substring in multiple sequences using hash based technique
Author
Dheenadayalan, Kumar ; Muralidhara, V.N. ; Katru, Jayakrishna
Author_Institution
Int. Inst. of Inf. Technol., Bangalore, India
fYear
2013
fDate
23-26 June 2013
Firstpage
140
Lastpage
145
Abstract
Searching for the longest common substring in multiple sequences has great practical application in the field of bioinformatics. This paper proposes two memory-efficient solutions to the problem of finding common substrings in multiple sequences. The first algorithm combines a hashing technique with a suffix tree to find common substrings in long DNA or protein sequences. This algorithm is three times more memory efficient than alternative data structures. The k-truncated suffix tree, a variation of the suffix tree, was proposed recently to find common substrings in short sequences. The second algorithm uses hashing with separate chaining for short sequences, which offers a memory advantage of around 10 times over the k-truncated suffix tree. Both algorithms also offer great potential for parallelizing the search process, which can improve the run time of the search by a large factor.
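The abstract does not give the details of either algorithm, so the following is only a minimal sketch of the general idea behind the second one: finding substrings shared by several short sequences with a hash table that resolves collisions by separate chaining. It is not the authors' method. The function names (`common_substrings`, `longest_common_substrings`), the bucket count, and the binary search over substring length are all assumptions made for illustration.

```python
def common_substrings(seqs, k, n_buckets=1024):
    """Return the set of length-k substrings that occur in every sequence.

    Hypothetical illustration: a hash table with separate chaining, where
    each bucket is a list of [substring, sequences_seen, last_seq_index].
    """
    if not seqs or k <= 0:
        return set()
    buckets = [[] for _ in range(n_buckets)]
    for idx, seq in enumerate(seqs):
        for i in range(len(seq) - k + 1):
            sub = seq[i:i + k]
            chain = buckets[hash(sub) % n_buckets]
            for entry in chain:
                if entry[0] == sub:
                    # Count each sequence at most once per substring.
                    if entry[2] != idx:
                        entry[1] += 1
                        entry[2] = idx
                    break
            else:
                # Only substrings of the first sequence can be common to all,
                # so nothing new is stored for later sequences.
                if idx == 0:
                    chain.append([sub, 1, 0])
    return {e[0] for chain in buckets for e in chain if e[1] == len(seqs)}


def longest_common_substrings(seqs):
    """Binary-search the largest k for which a common substring exists."""
    if not seqs:
        return set()
    lo, hi = 0, min(len(s) for s in seqs)
    best = set()
    while lo < hi:
        mid = (lo + hi + 1) // 2
        found = common_substrings(seqs, mid)
        if found:
            best, lo = found, mid
        else:
            hi = mid - 1
    return best


if __name__ == "__main__":
    reads = ["ACGTACGT", "TTACGTAA", "GGACGTCC"]
    print(longest_common_substrings(reads))
```

The binary search is valid because any common substring of length k contains a common substring of length k - 1. Each sequence could in principle be scanned by a separate worker against a shared or partitioned table, which is one way the parallelization mentioned in the abstract might be approached; that is a guess, not something the record states.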
Keywords
DNA; bioinformatics; molecular biophysics; proteins; string matching; tree data structures; tree searching; data structures; hash-based technique; long DNA sequences; longest common substring search; multiple sequences; protein sequences; search process parallelization; short sequences; truncated suffix tree; Genomics; Irrigation; hashing; k-truncated suffix tree; longest common substring; suffix tree
fLanguage
English
Publisher
IEEE
Conference_Titel
Technology, Informatics, Management, Engineering, and Environment (TIME-E), 2013 International Conference on
Conference_Location
Bandung
Print_ISBN
978-1-4673-5730-2
Type
conf
DOI
10.1109/TIME-E.2013.6611980
Filename
6611980