DocumentCode
893846
Title
Learning Finite-State Transducers: Evolution Versus Heuristic State Merging
Author
Lucas, Simon M. ; Reynolds, T. Jeff
Author_Institution
Dept. of Comput. Sci., Essex Univ., Colchester
Volume
11
Issue
3
fYear
2007
fDate
1 June 2007
Firstpage
308
Lastpage
325
Abstract
Finite-state transducers (FSTs) are finite-state machines (FSMs) that map strings in a source domain into strings in a target domain. While there are many reports in the literature of evolving FSMs, there has been much less work on evolving FSTs. In particular, the fitness functions required for evolving FSTs generally differ from those used for FSMs. In this paper, three string-distance-based fitness functions are evaluated, in order of increasing computational complexity: string equality, Hamming distance, and edit distance. The fitness-distance correlation (FDC) and evolutionary performance of each fitness function are analyzed when used within a random mutation hill-climber (RMHC). Edit distance has the strongest FDC and also provides the best evolutionary performance, in that it is the most likely to find the target FST within a given number of fitness function evaluations. Edit distance is also the most expensive to compute, but in most cases this extra computation is more than justified by its performance. The RMHC was compared with the best-known heuristic method for learning FSTs, the onward subsequential transducer inference algorithm (OSTIA). On noise-free data, the RMHC performs best on problems with sparse training sets and small target machines; the RMHC and OSTIA offer similar performance for large target machines and denser data sets. When noise-corrupted data is used for training, the RMHC still performs well, while OSTIA performs poorly given even small amounts of noise. The RMHC is also shown to outperform a genetic algorithm. Hence, for certain classes of FST induction problems, the RMHC presented in this paper offers the best performance of any known algorithm.
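The following is a minimal sketch (not the authors' code) of the two ingredients the abstract describes: edit distance used as a fitness signal over a training set of (source, target) string pairs, and a generic random mutation hill-climber loop. The names `candidate_translate`, `mutate`, and `training_pairs` are hypothetical placeholders for whatever FST representation is actually evolved.

```python
import random

def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein (edit) distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def fitness(candidate_translate, training_pairs):
    """Lower is better: total edit distance between the candidate
    transducer's outputs and the target translations."""
    return sum(edit_distance(candidate_translate(src), tgt)
               for src, tgt in training_pairs)

def rmhc(initial, mutate, fitness_fn, evaluations=10_000):
    """Random mutation hill-climber: keep a single candidate, apply one
    random mutation per step, and accept the mutant if its fitness is
    no worse than the current best."""
    best, best_fit = initial, fitness_fn(initial)
    for _ in range(evaluations):
        cand = mutate(best)
        f = fitness_fn(cand)
        if f <= best_fit:
            best, best_fit = cand, f
    return best, best_fit
```

String equality and Hamming distance fitness functions would slot into the same loop by replacing `edit_distance`; the paper's point is that the cheaper measures give a weaker fitness-distance correlation.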
Keywords
computational complexity; evolutionary computation; finite state machines; learning (artificial intelligence); Hamming distance; finite-state machines; finite-state transducers; fitness-distance correlation; onward subsequential transducer inference algorithm; random mutation hill-climber; sparse training sets; string equality; Application software; Genetic mutations; Humans; Inference algorithms; Machine learning; Merging; Performance analysis; Transducers; Finite-state transducer (FST); random mutation hill-climber (RMHC); state merging; string distance; string translation
fLanguage
English
Journal_Title
IEEE Transactions on Evolutionary Computation
Publisher
IEEE
ISSN
1089-778X
Type
jour
DOI
10.1109/TEVC.2006.880329
Filename
4220679
Link To Document