Title :
Finding Dependency Trees from Binary Data
Author :
Sha, Chaofeng ; Tao, Dao ; Zhou, Aoying ; Qian, Weining
Author_Institution :
Dept. of Comput. Sci. & Eng., Fudan Univ., Shanghai
Abstract :
Much work has been done on finding interesting subsets of items, since this problem has broad applications in financial data analysis, e-commerce, text data mining, and other areas. Although frequent pattern mining has attracted much attention in the research community, more recent work has been devoted to analyzing more sophisticated relationships among items. For example, Chow-Liu trees and low-entropy trees have been used to summarize frequent patterns. In this paper, we consider finding a novel dependency tree from binary data, an approach that has several advantages over previous related work. First, we propose a novel distance measure between items based on information theory, which captures both the expected uncertainty in item pairs and the mutual information between them. Based on this distance measure, we present a simple yet efficient algorithm for finding dependency trees from binary data. We also show how our approach can be applied to frequent pattern summarization. Our running example on a synthetic dataset shows that our approach achieves good results compared with existing popular heuristics.
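The abstract does not give the exact distance measure or tree-building procedure, so the following is only an illustrative sketch under stated assumptions. It assumes, hypothetically, that the pairwise distance is the variation of information d(X,Y) = H(X,Y) - I(X;Y) = H(X|Y) + H(Y|X), a well-known information-theoretic distance that combines joint uncertainty and mutual information; the paper's own measure may differ. It also assumes the tree is built as a minimum spanning tree over these distances (Prim's algorithm), by analogy with the Chow-Liu construction, which uses a maximum spanning tree over mutual information. The helper names `entropy`, `pair_distance`, and `dependency_tree` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(counts, n):
    """Empirical Shannon entropy (bits) from a Counter of value frequencies."""
    return -sum(c / n * math.log2(c / n) for c in counts.values() if c)


def pair_distance(data, i, j):
    """ASSUMED distance: variation of information between binary items i and j.

    d = H(Xi, Xj) - I(Xi; Xj), which equals H(Xi|Xj) + H(Xj|Xi).
    This is a stand-in, not necessarily the paper's measure.
    """
    n = len(data)
    h_i = entropy(Counter(row[i] for row in data), n)
    h_j = entropy(Counter(row[j] for row in data), n)
    h_ij = entropy(Counter((row[i], row[j]) for row in data), n)
    mutual_info = h_i + h_j - h_ij
    return h_ij - mutual_info


def dependency_tree(data):
    """Minimum spanning tree over the assumed distances, built with Prim's algorithm.

    `data` is a list of transactions, each a list of 0/1 item indicators.
    Returns a list of (parent, child) edges.
    """
    m = len(data[0])
    # Precompute all pairwise distances, keyed by (i, j) with i < j.
    dist = {(i, j): pair_distance(data, i, j) for i, j in combinations(range(m), 2)}
    in_tree = {0}
    edges = []
    while len(in_tree) < m:
        # Cheapest edge from the current tree to any item not yet in it.
        best = min(
            ((i, j) for i in in_tree for j in range(m) if j not in in_tree),
            key=lambda e: dist[tuple(sorted(e))],
        )
        edges.append(best)
        in_tree.add(best[1])
    return edges
```

For example, with the rows `[[1, 1, 0], [1, 1, 1], [0, 0, 0], [0, 0, 1]]`, items 0 and 1 are identical, so their assumed distance is 0 and the edge `(0, 1)` is always selected, while item 2 is independent of both and sits at distance 2 bits.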
Keywords :
data mining; information theory; tree data structures; binary data; dependency trees; distance measure; frequent pattern mining; frequent pattern summarization; mutual information; synthetic dataset;
Conference_Title :
Computer and Information Technology Workshops, 2008. CIT Workshops 2008. IEEE 8th International Conference on
Conference_Location :
Sydney, NSW
Print_ISBN :
978-0-7695-3242-4
Electronic_ISBN :
978-0-7695-3239-1
DOI :
10.1109/CIT.2008.Workshops.92