DocumentCode :
140820
Title :
Complete discovery of high-quality patterns in large numerical tensors
Author :
Cerf, Loic ; Meira, Wagner
Author_Institution :
Dept. of Comput. Sci., Univ. Fed. de Minas Gerais, Belo Horizonte, Brazil
fYear :
2014
fDate :
March 31, 2014 to April 4, 2014
Firstpage :
448
Lastpage :
459
Abstract :
Many datasets are numerical tensors, i.e., associate n-tuples with numerical values. Until recently, the discovery of relevant local patterns in such numerical and multidimensional data has received little attention despite the broad applicative perspectives offered by this general framework. Even in the simpler 2-dimensional case, almost every proposal so far is either incomplete (i.e., it does not list every pattern) or relies on binning and mines Boolean tensors. In both cases, some information is lost during the process. In uncertain tensors, n-tuples satisfy the studied predicate to a certain extent and no information is lost w.r.t. the original data. Given an uncertain tensor, the closed patterns are its maximal “sub-tensors” covering n-tuples that “mostly” satisfy the predicate. Defining “mostly” is the key problem: the patterns should be both relevant given the data and efficiently extractable. The proposed complete extractor reuses the enumeration principles of the state-of-the-art miner for closed n-sets but incrementally enforces the newly designed definition. In this way, the proposed algorithm runs orders of magnitude faster than its only competitor and large datasets are tractable. The experimental section reports the discovery of dynamic patterns of influence in Twitter as well as usage patterns in a transportation network. Additional experiments on synthetic data quantitatively assess the quality of the chosen definition for the patterns.
Keywords :
Boolean algebra; data mining; set theory; social networking (online); tensors; Boolean tensors; Twitter; closed n-sets; data mining; enumeration principles; high-quality patterns; n-tuples; numerical tensors; transportation network; Data mining; Itemsets; Noise; Proposals; Radiation detectors; Tensile stress; Writing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering (ICDE), 2014 IEEE 30th International Conference on
Conference_Location :
Chicago, IL
Type :
conf
DOI :
10.1109/ICDE.2014.6816672
Filename :
6816672