DocumentCode
3204743
Title
An efficient uniform-cost normalized edit distance algorithm
Author
Arslan, Abdullah N. ; Egecioglu, Ömer
Author_Institution
Dept. of Comput. Sci., California Univ., Santa Barbara, CA, USA
fYear
1999
fDate
1999
Firstpage
8
Lastpage
15
Abstract
A common model for computing the similarity of two strings X and Y of lengths m and n, respectively, with m ⩾ n, is to transform X into Y through a sequence of three types of edit operations: insertion, deletion, and substitution. The model assumes a given cost function which assigns a non-negative real weight to each edit operation. The amortized weight of a given edit sequence is the ratio of its weight to its length, and the minimum of this ratio over all edit sequences is the normalized edit distance. Existing algorithms for normalized edit distance computation with proven complexity bounds require O(mn^2) time in the worst case. We give an O(mn log n)-time algorithm for the problem when the cost function is uniform, i.e., the weight of each edit operation is constant within the same type, except that substitutions can have different weights depending on whether they are matching or non-matching.
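The definition above (minimize weight divided by length over all edit sequences) can be illustrated with a short Python sketch under the uniform cost model. It is not the authors' O(mn log n) algorithm, and the names `normalized_edit_distance_exact`, `normalized_edit_distance_dinkelbach`, `_parametric_min` and the cost parameters `ins`, `dele`, `sub`, `match` are hypothetical. The first function tabulates the least weight for every possible sequence length, a simple baseline whose running time is O(mn^2) for m ⩾ n, the kind of bound the abstract attributes to earlier work. The second swaps in Dinkelbach's fractional-programming iteration (the keywords mention fractional programming and ratio minimization, but the paper's exact method may differ): it repeatedly minimizes weight minus λ times length with an ordinary O(mn) dynamic program and makes no O(mn log n) guarantee. The sketch assumes every operation, matching substitutions included, counts toward the length, and uses exact `Fraction` arithmetic so the stopping test is reliable.

```python
from fractions import Fraction

# Illustrative sketch only; all names are hypothetical, not taken from the paper.
# Uniform costs: ins, dele, sub (non-matching substitution), match (matching one).
# Every operation, matching substitutions included, counts toward the length.


def normalized_edit_distance_exact(x, y, ins=1, dele=1, sub=1, match=0):
    """Min over edit sequences of weight/length, via W[i][j][k] = least weight
    turning x[:i] into y[:j] with exactly k operations (None if impossible)."""
    m, n = len(x), len(y)
    if m == 0 and n == 0:
        return Fraction(0)
    ins, dele, sub, match = (Fraction(c) for c in (ins, dele, sub, match))
    K = m + n
    W = [[[None] * (K + 1) for _ in range(n + 1)] for _ in range(m + 1)]
    W[0][0][0] = Fraction(0)
    for i in range(m + 1):
        for j in range(n + 1):
            # a sequence for (i, j) uses between max(i, j) and i + j operations
            for k in range(max(i, j, 1), i + j + 1):
                cands = []
                if i > 0 and W[i - 1][j][k - 1] is not None:
                    cands.append(W[i - 1][j][k - 1] + dele)  # delete x[i-1]
                if j > 0 and W[i][j - 1][k - 1] is not None:
                    cands.append(W[i][j - 1][k - 1] + ins)  # insert y[j-1]
                if i > 0 and j > 0 and W[i - 1][j - 1][k - 1] is not None:
                    c = match if x[i - 1] == y[j - 1] else sub
                    cands.append(W[i - 1][j - 1][k - 1] + c)  # substitute
                W[i][j][k] = min(cands) if cands else None
    return min(W[m][n][k] / k for k in range(1, K + 1) if W[m][n][k] is not None)


def _parametric_min(x, y, lam, ins, dele, sub, match):
    """Least total of (w_op - lam) over edit sequences x -> y, plus the weight
    and length of one sequence achieving it; O(mn) time."""
    m, n = len(x), len(y)
    # V[i][j] = (value, weight, length) of a best sequence turning x[:i] into y[:j]
    V = [[None] * (n + 1) for _ in range(m + 1)]
    V[0][0] = (Fraction(0), Fraction(0), 0)
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            steps = []  # (predecessor entry, cost of the final operation)
            if i > 0:
                steps.append((V[i - 1][j], dele))
            if j > 0:
                steps.append((V[i][j - 1], ins))
            if i > 0 and j > 0:
                steps.append((V[i - 1][j - 1], match if x[i - 1] == y[j - 1] else sub))
            V[i][j] = min(((v + c - lam, w + c, l + 1) for (v, w, l), c in steps),
                          key=lambda t: t[0])
    return V[m][n]


def normalized_edit_distance_dinkelbach(x, y, ins=1, dele=1, sub=1, match=0):
    """Dinkelbach-style ratio minimization (a swapped-in technique, not
    necessarily the paper's); each round is one parametric DP."""
    if not x and not y:
        return Fraction(0)
    costs = [Fraction(c) for c in (ins, dele, sub, match)]
    lam = Fraction(0)
    while True:
        value, weight, length = _parametric_min(x, y, lam, *costs)
        if value == 0:  # no sequence has weight/length below lam, so lam is optimal
            return lam
        lam = weight / length  # ratio of the sequence just found; decreases after round one
```

For example, with `sub=3` and unit insertion and deletion costs, turning "ab" into "ba" by two substitutions has ratio 6/2 = 3, whereas delete, match, insert has ratio 2/3, so both functions return `Fraction(2, 3)`. The exact table is useful mainly as a cross-check for the faster iteration on small inputs.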
Keywords
computational complexity; dynamic programming; string matching; text editing; amortized weight; complexity bounds; computational biology; cost function; deletion; edit distance; edit operations; edit sequence; error correction; fractional programming; information retrieval; insertion; large databases; optical character recognition; pattern matching; pattern recognition; ratio minimization; signal processing; strings; substitution; text processing; uniform-cost normalized edit distance algorithm; Application software; Biomedical optical imaging; Computational biology; Computer science; Costs; Dynamic programming; Optical character recognition software; Optical signal processing; Sequences; Text processing
fLanguage
English
Publisher
IEEE
Conference_Titel
String Processing and Information Retrieval Symposium, 1999 and International Workshop on Groupware
Conference_Location
Cancun
Print_ISBN
0-7695-0268-7
Type
conf
DOI
10.1109/SPIRE.1999.796572
Filename
796572