Title :
String parsing-based similarity detection
Author :
Yang, Jia ; Speidel, Ulrich
Author_Institution :
Department of Computer Science, University of Auckland, New Zealand
Date :
29 Aug.-1 Sept. 2005
Abstract :
This paper compares the similarity-detection abilities of two string parsing algorithms from the Lempel-Ziv (LZ) family and the T-decomposition algorithm proposed by Titchener against the Hamming and Levenshtein measures. Our results show that LZ-based and T-decomposition-based measures work in a wider range of contexts. We also argue that T-decomposition-based measures represent a good compromise between accuracy and time complexity.
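As a rough illustration of the kinds of measures named in the abstract, the Python sketch below computes the Hamming and Levenshtein distances and a simple parsing-based distance. It is not the paper's method: the function names, the LZ78-style incremental parse, and the normalisation used in `parse_distance` are assumptions chosen for illustration, and T-decomposition is not shown.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def lz78_phrases(s: str) -> int:
    """Count the phrases of an LZ78-style incremental parse of s."""
    seen = set()
    phrase = ""
    count = 0
    for ch in s:
        phrase += ch
        if phrase not in seen:  # shortest phrase not seen before ends here
            seen.add(phrase)
            count += 1
            phrase = ""
    return count + (1 if phrase else 0)  # count a trailing, already-seen phrase


def parse_distance(a: str, b: str) -> float:
    """Hypothetical distance: extra phrases needed to parse a+b, normalised.

    Assumption: this mirrors the shape of a normalised compression distance,
    with phrase counts standing in for compressed sizes.
    """
    ca, cb = lz78_phrases(a), lz78_phrases(b)
    if max(ca, cb) == 0:
        return 0.0
    return (lz78_phrases(a + b) - min(ca, cb)) / max(ca, cb)
```

Unlike the Hamming distance, the parsing-based value is defined for strings of different lengths and does not depend on the two strings being aligned, which is one reason such measures can apply in more contexts.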
Keywords :
Hamming codes; computational complexity; data compression; program compilers; Hamming measure; Lempel-Ziv family; Levenshtein measure; T-decomposition algorithm; context wider range; similarity-detection ability; string parsing algorithms; Area measurement; Automata; Compression algorithms; Computer science; Data compression; Data mining; History; Length measurement; Production; Time measurement
Conference_Title :
Information Theory Workshop, 2005 IEEE
Print_ISBN :
0-7803-9480-1
DOI :
10.1109/ITW.2005.1531901