DocumentCode
1174280
Title
Discovering frequent episodes and learning hidden Markov models: a formal connection
Author
Laxman, Srivatsan ; Sastry, P.S. ; Unnikrishnan, K.P.
Author_Institution
Indian Institute of Science, Bangalore, India
Volume
17
Issue
11
fYear
2005
Firstpage
1505
Lastpage
1517
Abstract
This paper establishes a formal connection between two common but previously unconnected methods for analyzing data streams: discovering frequent episodes in a computer science framework and learning generative models in a statistics framework. We introduce a special class of discrete hidden Markov models (HMMs), called episode generating HMMs (EGHs), and associate each episode with a unique EGH. We prove that, given any two episodes, the EGH that is more likely to generate a given data sequence is the one associated with the more frequent episode. To establish this relationship, we define a new measure of the frequency of an episode, based on what we call nonoverlapping occurrences of the episode in the data. An efficient algorithm is proposed for counting the frequencies of a set of episodes. Through extensive simulations, we show that our algorithm is both effective and more efficient than current methods for frequent episode discovery. We also show how the association between frequent episodes and EGHs can be exploited to assess the significance of the frequent episodes discovered, and we illustrate empirically how this idea may be used to improve the efficiency of frequent episode discovery.
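The notion of nonoverlapping occurrences can be illustrated with a small sketch. It assumes serial episodes (an ordered tuple of event types), an event sequence already sorted by time, and no time or expiry constraints; the function name count_non_overlapped is hypothetical and is not the authors' published algorithm. Each episode keeps one automaton that advances on the next event type it is waiting for and resets when the episode completes, so counted occurrences never overlap. Because each reset happens at the earliest possible completion, this greedy pass yields the maximum number of nonoverlapping occurrences for a single serial episode.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical type alias: a serial episode is an ordered, non-empty tuple of event types.
Episode = Tuple[Hashable, ...]


def count_non_overlapped(events: Sequence[Hashable],
                         episodes: List[Episode]) -> Dict[Episode, int]:
    """Count nonoverlapping occurrences of each serial episode in one pass.

    Sketch only: ignores event times and expiry constraints, and loops over
    every episode per event instead of using an index of waiting automata.
    """
    unique = list(dict.fromkeys(episodes))  # drop duplicate episodes, keep order
    state = {ep: 0 for ep in unique}        # index of the next event type awaited
    counts = {ep: 0 for ep in unique}
    for ev in events:
        for ep in unique:
            if ep[state[ep]] == ev:
                state[ep] += 1
                if state[ep] == len(ep):    # occurrence completed
                    counts[ep] += 1
                    state[ep] = 0           # reset so occurrences cannot overlap
    return counts


seq = list("ABACBCABC")
print(count_non_overlapped(seq, [("A", "B", "C"), ("A", "C")]))
# {('A', 'B', 'C'): 2, ('A', 'C'): 2}
```

On the sequence A A B B, the episode A→B is counted once: its two possible occurrences (first A with first B, second A with second B) interleave, so only one can be counted as nonoverlapping. An event-type index of waiting automata would avoid scanning every episode per event; the sketch omits it for brevity.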
Keywords
data mining; hidden Markov models; sequences; statistical analysis; data sequence; data streams; formal connection; frequent episode discovery; generative model; statistical significance; temporal data mining; Application software; Computer science; Data analysis; Frequency measurement; Pattern analysis; Stochastic processes; Time series analysis; frequent episodes; sequential data
fLanguage
English
Journal_Title
IEEE Transactions on Knowledge and Data Engineering
Publisher
IEEE
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2005.181
Filename
1512036