DocumentCode :
1411206
Title :
CoCITe—Coordinating Changes in Text
Author :
Wright, Jeremy ; Grothendieck, John
Author_Institution :
AT&T Labs.-Res., Florham Park, NJ, USA
Volume :
24
Issue :
1
Year :
2012
Firstpage :
15
Lastpage :
29
Abstract :
Text streams are ubiquitous and contain a wealth of information, but are typically orders of magnitude too large in scale for comprehensive human inspection. There is a need for tools that can detect and group changes occurring within text streams and substreams, in order to find, structure, and summarize these changes for presentation to human analysts. This paper describes a procedure for efficiently finding step changes, trends, bursts, and cyclic changes affecting frequencies of words, or more general lexical items, within streams of documents which may be optionally labeled with metadata. The common phenomenon of over-dispersion is accommodated using mixture distributions. A streaming implementation is described which can process data from a continuous feed. Anomalies can be detected, grouped, and rendered visually for human comprehension.
Keywords :
data mining; text analysis; CoCITe; comprehensive human inspection; continuous feed; general lexical item; human analyst; mixture distribution; over-dispersion; text streams; Data models; Dynamic programming; Heuristic algorithms; Multimedia communication; Statistical analysis; Text mining; Time frequency analysis; Statistical software; modeling structured, textual and multimedia data; text mining
Language :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2010.250
Filename :
5674040