Title :
Identifying Patterns in Texts
Author :
Huang, Minhua ; Haralick, Robert M.
Author_Institution :
Dept. of Comput. Sci., City Univ. of New York, New York, NY, USA
Abstract :
We discuss a probabilistic graphical model for recognizing patterns in texts. The model is derived from the probability function for a sequence of categories given a sequence of symbols, under two reasonable conditional independence assumptions, and is represented as a product of combinations of conditional and marginal probability functions. Its novelty is a mathematical representation that is completely different from those of existing graphical models such as conditional random fields (CRFs), hidden Markov models (HMMs), and maximum entropy Markov models (MEMMs). Moreover, it can be used to identify various patterns in texts. So far, we have used the model to recognize NP chunks and the senses of a polysemous word in sentences, and it has achieved very promising results on standard data sets. In the future, we will use the model to extract semantic roles in a sentence.
Keywords :
pattern recognition; text analysis; NP chunks; mathematical representation; pattern identification; polysemous word; probabilistic graphical model; probability function; text patterns; Computer science; Data mining; Graphical models; Hidden Markov models; Labeling; Mathematical model; Pattern recognition; Testing; Text recognition
Conference_Title :
Semantic Computing, 2009. ICSC '09. IEEE International Conference on
Conference_Location :
Berkeley, CA
Print_ISBN :
978-1-4244-4962-0
Electronic_ISBN :
978-0-7695-3800-6
DOI :
10.1109/ICSC.2009.22