• DocumentCode
    1837244
  • Title
    An Edit Distance Algorithm with Block Swap
  • Author
    Xia, Tian
  • Author_Institution
    Key Lab. of Data Eng. & Knowledge Eng., Renmin Univ. of China, Beijing
  • fYear
    2008
  • fDate
    18-21 Nov. 2008
  • Firstpage
    54
  • Lastpage
    59
  • Abstract
    The edit distance between two given strings X and Y is the minimum number of edit operations that transform X into Y. Conventionally, string editing is based on character-level insert, delete, and substitute operations. It has been suggested that extending this model with block edits would be useful in applications such as DNA sequence comparison and sentence similarity computation. However, existing algorithms have generally focused on the normalized edit distance, and few of them consider block swap operations at a higher level. In this paper, we introduce an extended edit distance algorithm that permits insertions, deletions, and substitutions at the character level, and also permits block swap operations. Experimental results on randomly generated strings verify the algorithm's soundness and efficiency. The main contributions of this paper are an algorithm that computes the lowest edit cost for string transformation with block swap in polynomial time, and a breaking-points selection algorithm that improves computation speed.
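    To make the abstract's definitions concrete, here is a minimal Python sketch. It is not the paper's algorithm and does not include its breaking-points selection step. The function name `edit_distance` and the parameters `allow_swap` and `swap_cost` are hypothetical. Without swaps it computes the standard unit-cost Levenshtein distance (insert, delete, substitute). With `allow_swap=True` it adds an assumed block swap model: two adjacent blocks U V in X may become V U in Y for a fixed cost, provided both blocks are copied exactly. The paper's actual swap model and costs may differ.

```python
def edit_distance(x: str, y: str, allow_swap: bool = False, swap_cost: int = 1) -> int:
    """Minimum cost to transform x into y.

    Character level: unit-cost insert, delete and substitute (Levenshtein).
    If allow_swap is True, an ASSUMED block swap is also allowed: a segment
    U V of x (U, V non-empty and adjacent) may be rewritten as V U in y for
    swap_cost, with both blocks copied exactly. This swap model is a
    hypothetical illustration, not necessarily the model used in the paper.
    """
    n, m = len(x), len(y)
    # d[i][j] = cheapest way to turn the prefix x[:i] into the prefix y[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete every character of x[:i]
    for j in range(m + 1):
        d[0][j] = j  # insert every character of y[:j]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(
                d[i - 1][j] + 1,  # delete x[i-1]
                d[i][j - 1] + 1,  # insert y[j-1]
                d[i - 1][j - 1] + (x[i - 1] != y[j - 1]),  # match (0) or substitute (1)
            )
            if allow_swap:
                # Try every trailing segment of x[:i] and y[:j] of equal length,
                # split into U (length a) followed by V (length b).
                for length in range(2, min(i, j) + 1):
                    for a in range(1, length):
                        b = length - a
                        u = x[i - length : i - b]  # block U in x
                        v = x[i - b : i]           # block V in x
                        # y must end with V followed by U
                        if y[j - length : j - a] == v and y[j - a : j] == u:
                            best = min(best, d[i - length][j - length] + swap_cost)
            d[i][j] = best
    return d[n][m]


if __name__ == "__main__":
    print(edit_distance("abcd", "cdab"))                   # 4: character edits only
    print(edit_distance("abcd", "cdab", allow_swap=True))  # 1: swap "ab" and "cd"
```

    The swap transition keeps the dynamic program over aligned prefixes, so this brute-force version stays polynomial. However, it checks every split of every trailing segment and is far from optimized. Speeding up exactly this kind of search is the role the abstract gives to breaking-points selection.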
  • Keywords
    string matching; block swap operation; breaking points selection algorithm; character insert; normalized edit distance algorithm; string editing; string transformation; Conference management; Data engineering; Engineering management; Information management; Information resources; Knowledge engineering; Knowledge management; Laboratories; Resource management; Sequences; block swap; edit distance; edit operation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Young Computer Scientists, 2008. ICYCS 2008. The 9th International Conference for
  • Conference_Location
    Hunan
  • Print_ISBN
    978-0-7695-3398-8
  • Electronic_ISBN
    978-0-7695-3398-8
  • Type
    conf
  • DOI
    10.1109/ICYCS.2008.14
  • Filename
    4708948