DocumentCode :
2080094
Title :
Efficient identification of coupled entities in document collections
Author :
Sarkas, Nikos ; Angel, Albert ; Koudas, Nick ; Srivastava, Divesh
Author_Institution :
Univ. of Toronto, Toronto, ON, Canada
fYear :
2010
fDate :
1-6 March 2010
Firstpage :
769
Lastpage :
772
Abstract :
The relentless pace at which textual data are generated on-line necessitates novel paradigms for their understanding and exploration. To this end, we introduce a methodology for discovering strong entity associations in all the slices (meta-data value restrictions) of a document collection. Since related documents mention approximately the same group of core entities (people, locations, etc.), the groups of coupled entities discovered can be used to expose themes in the document collection. We devise and evaluate algorithms capable of addressing two flavors of our core problem: algorithm THR-ENT for computing all sufficiently strong entity associations and algorithm TOP-ENT for computing the top-k strongest entity associations, for each slice of the document collection.
Keywords :
text analysis; THR-ENT algorithm; TOP-ENT algorithm; coupled entity identification; document collections; metadata value restrictions; strong entity associations; textual data; Association rules; Data mining; Demography; Diseases; Information services; Internet; TV; Testing; User-generated content; Web sites;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering (ICDE), 2010 IEEE 26th International Conference on
Conference_Location :
Long Beach, CA
Print_ISBN :
978-1-4244-5445-7
Electronic_ISBN :
978-1-4244-5444-0
Type :
conf
DOI :
10.1109/ICDE.2010.5447820
Filename :
5447820