DocumentCode
740513
Title
Using Copulas in Data Mining Based on the Observational Calculus
Author
Holena, Martin ; Bajer, Lukas ; Scavnicky, Martin
Author_Institution
Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod Vodárenskou věží 2, Prague, Czech Republic
Volume
27
Issue
10
fYear
2015
Firstpage
2851
Lastpage
2864
Abstract
The objective of the paper is a contribution to data mining within the framework of the observational calculus, through introducing generalized quantifiers related to copulas. Fitting copulas to multidimensional data is an increasingly important method for analyzing dependencies, and the proposed quantifiers of observational calculus assess the results of estimating the structure of joint distributions of continuous variables by means of hierarchical Archimedean copulas. To this end, the existing theory of hierarchical Archimedean copulas has been slightly extended in the paper: It has been proven that sufficient conditions for the function defining a hierarchical Archimedean copula to be indeed a copula, which have so far been rigorously established only for the special case of fully nested Archimedean copulas, hold in general. These conditions allow us to define three new generalized quantifiers, which are then thoroughly validated on four benchmark data sets and one data set from a real-world application. The paper concludes by comparing the proposed quantifiers to a more traditional approach, maximum weight spanning trees.
Keywords
Calculus; Data mining; Estimation; Generators; Joints; Labeling; Random variables; copulas; generalized quantifiers; hierarchical Archimedean copulas; joint probability distribution; observational calculus;
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
IEEE
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2015.2426705
Filename
7095574