• DocumentCode
    893846
  • Title
    Learning Finite-State Transducers: Evolution Versus Heuristic State Merging
  • Author
    Lucas, Simon M.; Reynolds, T. Jeff
  • Author_Institution
    Department of Computer Science, University of Essex, Colchester
  • Volume
    11
  • Issue
    3
  • fYear
    2007
  • fDate
    June 1, 2007
  • Firstpage
    308
  • Lastpage
    325
  • Abstract
    Finite-state transducers (FSTs) are finite-state machines (FSMs) that map strings in a source domain into strings in a target domain. While there are many reports in the literature of evolving FSMs, there has been much less work on evolving FSTs. In particular, the fitness functions required for evolving FSTs are generally different from those used for FSMs. In this paper, three string distance-based fitness functions are evaluated, in order of increasing computational complexity: string equality, Hamming distance, and edit distance. The fitness-distance correlation (FDC) and evolutionary performance of each fitness function are analyzed when used within a random mutation hill-climber (RMHC). Edit distance has the strongest FDC and also provides the best evolutionary performance, in that it is more likely to find the target FST within a given number of fitness function evaluations. Edit distance is also the most expensive to compute, but in most cases this extra computation is more than justified by its performance. The RMHC was compared with the best-known heuristic method for learning FSTs, the onward subsequential transducer inference algorithm (OSTIA). On noise-free data, the RMHC performs best on problems with sparse training sets and small target machines. The RMHC and OSTIA offer similar performance for large target machines and denser data sets. When noise-corrupted data is used for training, the RMHC still performs well, while OSTIA performs poorly given even small amounts of noise. The RMHC is also shown to outperform a genetic algorithm. Hence, for certain classes of FST induction problems, the RMHC presented in this paper offers the best performance of any known algorithm.
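    The three string-distance fitness measures compared in the abstract can be sketched as follows, in order of increasing computational cost. This is a minimal illustrative sketch; the function names and scoring conventions are assumptions, not taken from the paper.

    ```python
    # Sketch of the three string-distance fitness measures, in order of
    # increasing computational complexity: equality, Hamming, edit distance.
    # Names and scoring conventions are illustrative only.

    def equality_fitness(s: str, t: str) -> int:
        """String equality: 1 if the output matches the target exactly, else 0."""
        return 1 if s == t else 0

    def hamming_fitness(s: str, t: str) -> int:
        """Hamming-style score: number of matching positions; positions
        beyond the shorter string contribute nothing."""
        return sum(a == b for a, b in zip(s, t))

    def edit_distance(s: str, t: str) -> int:
        """Levenshtein edit distance via dynamic programming: minimum number
        of insertions, deletions, and substitutions turning s into t."""
        m, n = len(s), len(t)
        d = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            d[i][0] = i  # delete all of s[:i]
        for j in range(n + 1):
            d[0][j] = j  # insert all of t[:j]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if s[i - 1] == t[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,        # deletion
                              d[i][j - 1] + 1,        # insertion
                              d[i - 1][j - 1] + cost) # substitution
        return d[m][n]
    ```

    Equality and Hamming are O(n) per comparison, while edit distance is O(mn); the abstract reports that the extra cost of edit distance is usually repaid by its stronger fitness-distance correlation.
    
    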
  • Keywords
    computational complexity; evolutionary computation; finite-state machines; learning (artificial intelligence); Hamming distance; finite-state transducers; fitness-distance correlation; onward subsequential transducer inference algorithm; random mutation hill-climber; sparse training sets; string equality; Application software; Genetic mutations; Humans; Inference algorithms; Machine learning; Merging; Performance analysis; Transducers; Finite-state transducer (FST); random mutation hill-climber (RMHC); state merging; string distance; string translation
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Evolutionary Computation
  • Publisher
    IEEE
  • ISSN
    1089-778X
  • Type
    jour
  • DOI
    10.1109/TEVC.2006.880329
  • Filename
    4220679