Title :
On the approximate pattern occurrences in a text
Author :
Régnier, Mireille ; Szpankowski, Wojciech
Author_Institution :
Inst. Nat. de Recherche en Inf. et Autom., Le Chesnay, France
Abstract :
Consider a given pattern H and a random text T generated according to the Bernoulli model. We study the frequency of approximate occurrences of the pattern H in the random text when overlapping copies of the approximate pattern are counted separately. We provide exact and asymptotic formulae for the mean, variance and probability of occurrence, as well as asymptotic results including the central limit theorem and large deviations. Our approach is combinatorial: we first construct language expressions that characterize pattern occurrences, then translate them into generating functions, and finally use analytical methods to extract the asymptotic behaviour of the pattern frequency. Applications of these results include molecular biology, source coding, synchronization, wireless communications, approximate pattern matching, games, and stock market analysis. These findings are of particular interest to information theory (e.g., second-order properties of the relative frequency) and to molecular biology problems (e.g., finding patterns with unexpectedly high or low frequencies, and gene recognition).
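The counting convention in the abstract (overlapping approximate copies counted separately) can be illustrated with a minimal sketch. It assumes, as a labeled assumption not stated in this record, that an "approximate occurrence" is a window of the text within Hamming distance `max_dist` of H. The function names are hypothetical. The mean is obtained here by plain linearity of expectation, not by the authors' generating-function method.

```python
def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_approx(text, pattern, max_dist):
    """Count windows of `text` within Hamming distance `max_dist` of `pattern`.

    Every window start is tested, so overlapping copies are counted separately.
    (Assumption: "approximate" means bounded Hamming distance.)
    """
    m = len(pattern)
    return sum(
        hamming(text[i:i + m], pattern) <= max_dist
        for i in range(len(text) - m + 1)
    )


def expected_count(n, pattern, probs, max_dist):
    """Mean number of approximate occurrences in a Bernoulli text of length n.

    `probs` maps each symbol to its probability; symbols are i.i.d.
    """
    # dist[j] = probability of exactly j mismatches over the pattern prefix seen so far
    dist = [1.0]
    for ch in pattern:
        p = probs[ch]  # probability that a random text symbol matches `ch`
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * p            # match: mismatch count unchanged
            new[j + 1] += q * (1 - p)  # mismatch: count increases by one
        dist = new
    window_prob = sum(dist[:max_dist + 1])
    # Linearity of expectation over the n - m + 1 window positions
    return (n - len(pattern) + 1) * window_prob
```

For example, `count_approx("aaaa", "aa", 0)` counts three overlapping exact copies, and `expected_count(4, "aa", {"a": 0.5, "b": 0.5}, 0)` gives 0.75 for a uniform binary text. The variance and limit laws mentioned in the abstract depend on correlations between overlapping windows and are not captured by this sketch.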
Keywords :
approximation theory; combinatorial mathematics; pattern recognition; probability; random processes; Bernoulli model; approximate occurrences; approximate pattern matching; approximate pattern occurrences; asymptotic behaviour; asymptotic formulae; central limit theorem; combinatorial approach; exact formulae; games; gene recognition; information theory; language expression; mean; molecular biology; overlapping copies; pattern frequency; probability of occurrence; random text; relative frequency; second-order properties; source coding; stock market analysis; synchronization; variance; wireless communications; Character generation; Data mining; Frequency synchronization; Information theory; Pattern analysis; Pattern matching; Pattern recognition; Source coding; Stock markets; Wireless communication;
Conference_Title :
Compression and Complexity of Sequences 1997. Proceedings
Conference_Location :
Salerno
Print_ISBN :
0-8186-8132-2
DOI :
10.1109/SEQUEN.1997.666920