Abstract:
Suppose m is a positive integer, and let M denote the finite set {1, ..., m}. Suppose $\{Y_t\}_{t=-\infty}^{\infty}$ is a stationary stochastic process assuming values in the finite set M. The problem studied in this paper is that of approximating the process $\{Y_t\}$ by a finite-memory Markov process. Two distinct approximation problems are studied. First, suppose k is some fixed integer, and the frequencies of all k-tuples of the process $\{Y_t\}$ are specified. Suppose another integer l ≤ k - 2 is specified. The problem is to find the "best possible" l-step Markov process, that is, the one whose k-tuple frequencies are as close as possible to those of the original process $\{Y_t\}$. (It is known that if l = k - 1, then the k-tuple frequencies can be reproduced exactly.) This problem can be thought of as "partial realization using a multistep Markov model." In the second problem, the process $\{Y_t\}$ to be approximated is assumed to be itself a (k-1)-step Markov process. The problem is to find the "best possible" l-step Markov process, in the sense of minimizing the Kullback-Leibler divergence rate between the two processes. This problem can be thought of as "order reduction of Markov chains." Explicit solutions are given to both problems, and it is shown that they are closely related. These problems are motivated by problems in genomics, specifically, finding genes in bacterial genomes. However, the results of applying these methods to genomics are not reported here for want of space.
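The quantities named above can be made concrete with a minimal Python sketch, built on simplifying assumptions of our own: k-tuple frequencies are estimated from a single symbol sequence using cyclic (wrap-around) windows, so that they are exactly shift-consistent; the l-step model is formed from the ratio f(u a) / f(u) of (l+1)-tuple and l-tuple frequencies; and the Kullback-Leibler divergence rate is computed between the (k-1)-step chain defined by the k-tuple frequencies and that l-step chain. The function names are hypothetical, and the marginal-based l-step chain is shown only as a natural candidate, not as the paper's explicit optimal solution.

```python
import math
from collections import Counter, defaultdict


def tuple_frequencies(seq, k):
    # Empirical k-tuple frequencies with cyclic windows (assumes len(seq) >= k),
    # so that summing over the first or the last symbol gives the same
    # (k-1)-tuple frequencies, i.e. the frequencies are shift-consistent.
    n = len(seq)
    ext = list(seq) + list(seq[:k - 1])
    counts = Counter(tuple(ext[i:i + k]) for i in range(n))
    return {w: c / n for w, c in counts.items()}


def marginal(freqs, j):
    # j-tuple frequencies obtained by summing k-tuple frequencies over the
    # trailing k - j symbols.
    out = defaultdict(float)
    for w, p in freqs.items():
        out[w[:j]] += p
    return dict(out)


def markov_transitions(freqs, l):
    # Hypothetical l-step Markov model: P(a | u) = f(u a) / f(u) for each
    # l-tuple u. Keys are (l+1)-tuples u + (a,); for l = 0 this is f(a).
    f_next = marginal(freqs, l + 1)
    f_ctx = marginal(freqs, l)
    return {w: p / f_ctx[w[:l]] for w, p in f_next.items()}


def kl_divergence_rate(freqs, l):
    # Divergence rate (natural log) between the (k-1)-step chain P defined by
    # the k-tuple frequencies and the l-step chain Q built from their marginals:
    #   sum_w f(w) * log( P(w_k | w_1..w_{k-1}) / Q(w_k | last l symbols) )
    k = len(next(iter(freqs)))
    p = markov_transitions(freqs, k - 1)
    q = markov_transitions(freqs, l)
    return sum(f * math.log(p[w] / q[w[k - 1 - l:]]) for w, f in freqs.items())


seq = [1, 2, 2, 3, 1, 1, 2, 3, 3, 2, 1, 2]  # toy sequence over M = {1, 2, 3}
f3 = tuple_frequencies(seq, 3)              # k = 3
print(kl_divergence_rate(f3, 2))            # l = k - 1
print(kl_divergence_rate(f3, 0))            # l = 0
```

With l = k - 1 the two chains coincide and the rate is exactly zero, in line with the remark that k-tuple frequencies can then be reproduced exactly. Under the shift-consistency assumption, the rate for l < k - 1 equals the conditional entropy of the next symbol given l past symbols minus that given k - 1 past symbols, so it is nonnegative.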
Keywords:
Markov processes; approximation theory; information theory; Kullback-Leibler divergence rate; Markov chains; approximation problems; bacterial genomes; finite alphabet; finite memory Markov process; finite set; genes; genomics; integer; k-tuple frequencies; l-step Markov process; multistep Markov model; order reduction; partial realization; stationary stochastic process; stochastic modelling; Approximation methods; Hidden Markov models; Vectors