DocumentCode :
2369567
Title :
Sequence modeling with mixtures of conditional maximum entropy distributions
Author :
Pavlov, Dmitry
fYear :
2003
fDate :
19-22 Nov. 2003
Firstpage :
251
Lastpage :
258
Abstract :
We present a novel approach to modeling sequences using mixtures of conditional maximum entropy (maxent) distributions. Our method generalizes the mixture of first-order Markov models by including "long-term" dependencies in the model components. These "long-term" dependencies are represented by probabilistic triggers or rules, which are frequently used in the natural language processing (NLP) domain (such as "A occurred k positions back" → "the current symbol is B" with probability P). The maxent framework is then used to create a coherent global probabilistic model from all selected triggers. We enhance this formalism by using probabilistic mixtures with maxent models as components, thus representing hidden or unobserved effects in the data. We demonstrate how our mixture of conditional maxent models can be learned from data using the generalized EM algorithm, which scales linearly in the dimensions of the data and the number of mixture components. We present empirical results on simulated and real-world data sets and demonstrate that the proposed approach yields better-quality models than mixtures of first-order Markov models while resisting the overfitting and curse of dimensionality that would inevitably arise for higher-order Markov models.
Keywords :
hidden Markov models; learning (artificial intelligence); maximum entropy methods; natural languages; optimisation; EM algorithm; Markov model; NLP; conditional maximum entropy distribution; global probabilistic model; maxent model; natural language processing; sequence modelling; Analytical models; DNA; Data mining; Entropy; Hidden Markov models; History; Natural language processing; Proteins; Resists; Sequences;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
Print_ISBN :
0-7695-1978-4
Type :
conf
DOI :
10.1109/ICDM.2003.1250927
Filename :
1250927