Abstract:
The edit distance between two given strings X and Y is the minimum number of edit operations that transform X into Y. Ordinarily, string editing is based on character insert, delete, and substitute operations. It has been suggested that extending this model with block edits would be useful in applications such as DNA sequence comparison and sentence similarity computation. However, existing algorithms have generally focused on the normalized edit distance, and few of them consider block swap operations at a higher level. In this paper, we introduce an extended edit distance algorithm that permits insertions, deletions, and substitutions at the character level and also permits block swap operations. Experimental results on randomly generated strings verify the algorithm's rationality and efficiency. The main contributions of this paper are an algorithm that computes the lowest edit cost for string transformation with block swaps in polynomial time, and a breaking points selection algorithm that improves computation speed.
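To illustrate how a block swap can be folded into the classic edit distance dynamic program, here is a minimal sketch under stated assumptions. It is not the paper's algorithm, and it does not reproduce the breaking points selection step. The assumed model: unit-cost insert, delete, and substitute, plus a swap of two adjacent blocks (a substring A+B of X becomes B+A in Y) at a fixed cost `swap_cost`. The swapped blocks must match exactly and may not be edited further. The function name and the `swap_cost` parameter are hypothetical. This model is still polynomial: each table cell tries every block length and split point.

```python
def block_swap_edit_distance(x: str, y: str, swap_cost: int = 1) -> int:
    """Hypothetical sketch: unit-cost insert/delete/substitute plus an
    adjacent-block swap (A+B in x -> B+A in y) at cost swap_cost.
    Assumes swapped blocks match exactly; not the paper's exact model."""
    n, m = len(x), len(y)
    INF = float("inf")
    # d[i][j] = lowest cost of transforming x[:i] into y[:j]
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all of x[:i]
    for j in range(m + 1):
        d[0][j] = j  # insert all of y[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if x[i - 1] == y[j - 1] else 1
            best = min(
                d[i - 1][j] + 1,        # delete x[i-1]
                d[i][j - 1] + 1,        # insert y[j-1]
                d[i - 1][j - 1] + sub,  # match or substitute
            )
            # Block swap: the suffix x[i-L:i] == A + B and y[j-L:j] == B + A
            for length in range(2, min(i, j) + 1):
                xs = x[i - length:i]
                ys = y[j - length:j]
                for a in range(1, length):  # a = len(A), length - a = len(B)
                    if xs[:a] == ys[length - a:] and xs[a:] == ys[:length - a]:
                        best = min(best, d[i - length][j - length] + swap_cost)
                        break  # any valid split gives the same cost
            d[i][j] = best
    return int(d[n][m])
```

For example, `block_swap_edit_distance("abcdef", "defabc")` returns 1, because a single swap of the blocks "abc" and "def" suffices. The plain character-level operations alone would need several edits. This naive version re-checks every block length and split point at every cell, so it is slow for long strings. Pruning of that search is presumably the kind of cost the paper's breaking points selection is meant to reduce, though the sketch does not attempt it.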
Keywords:
string matching; block swap operation; breaking points selection algorithm; character insert; normalized edit distance algorithm; string editing; string transformation; Conference management; Data engineering; Engineering management; Information management; Information resources; Knowledge engineering; Knowledge management; Laboratories; Resource management; Sequences; block swap; edit distance; edit operation