DocumentCode :
2918946
Title :
Mining Higher-Order Association Rules from Distributed Named Entity Databases
Author :
Li, Shenzhi ; Janneck, Christopher D. ; Belapurkar, Aditya P. ; Ganiz, Murat ; Yang, Xiaoning ; Dilsizian, Mark ; Wu, Tianhao ; Bright, John M. ; Pottenger, William M.
Author_Institution :
Department of Computer Science and Engineering, Lehigh University, 19 Memorial Dr., Bethlehem, PA 18015, shl3@lehigh.edu
fYear :
2007
fDate :
23-24 May 2007
Firstpage :
236
Lastpage :
243
Abstract :
The burgeoning amount of textual data in distributed sources, combined with the obstacles involved in creating and maintaining central repositories, motivates the need for effective distributed information extraction and mining techniques. Recently, as the need to mine patterns across distributed databases has grown, Distributed Association Rule Mining (D-ARM) algorithms have been developed. These algorithms, however, assume that the databases are either horizontally or vertically distributed. In the special case of databases populated with information extracted from textual data, existing D-ARM algorithms cannot discover rules based on higher-order associations between items in distributed textual documents that are neither vertically nor horizontally distributed, but rather a hybrid of the two. In this paper we present D-HOTM, a framework for Distributed Higher Order Text Mining. Unlike existing algorithms, D-HOTM requires neither full knowledge of the global schema nor that the distribution of data be horizontal or vertical. D-HOTM discovers rules based on higher-order associations between distributed database records containing the extracted entities. Two approaches to the definition and discovery of higher-order itemsets are presented. The implementation of D-HOTM is based on the TMI [20] and was tested on a cluster at the National Center for Supercomputing Applications (NCSA). Results on a real-world dataset from the Richmond, VA police department demonstrate the performance of D-HOTM and its relevance to law enforcement and homeland defense.
Keywords :
Association rules; Chemical products; Computer science; Dairy products; Data mining; Distributed databases; Drugs; Joining processes; Law enforcement; Text mining; Distributed Higher Order Text Mining; Higher Order Association
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligence and Security Informatics, 2007 IEEE
Conference_Location :
New Brunswick, NJ, USA
Electronic_ISBN :
1-4244-1329-X
Type :
conf
DOI :
10.1109/ISI.2007.379478
Filename :
4258704