DocumentCode
652358
Title
Detecting Associations in Large Dataset on MapReduce
Author
Dong Dai ; Xi Li ; Chao Wang ; Junneng Zhang ; Xuehai Zhou
Author_Institution
Comput. Sci. Coll., Univ. of Sci. & Technol. of China, Hefei, China
fYear
2013
fDate
16-18 July 2013
Firstpage
1788
Lastpage
1794
Abstract
In daily life, we are surrounded by all kinds of data. Finding the relationships within these data has become one of the greatest challenges facing data scientists. In 2011, David N. Reshef et al. took a great leap toward solving this problem. They demonstrated that the maximal information coefficient (MIC) is an effective tool for detecting different kinds of relationships between any given pair of variables, whether or not these relationships are functional. However, challenges remain: the computation is too complex and time-consuming for large datasets, which makes the algorithm impractical in real applications. In this paper, we explore possible ways to parallelize the detection of associations between variables in large datasets, and propose a high-performance MapReduce-based solution, which includes a data storage pattern, preprocessing algorithms, a distributed memory cache mechanism, and a series of MapReduce jobs. The experiments show that our parallel solution provides a linear speedup compared with the original algorithm without affecting correctness. This work makes the well-known MIC algorithm more practical for solving real problems.
Keywords
data handling; parallel processing; MapReduce based solution; MapReduce jobs; association detection; data storage pattern; distributed memory cache mechanism; large dataset; maximal information coefficient; mic algorithm; parallel solution; preprocessing algorithms; Complexity theory; Microwave integrated circuits; Mutual information; Parallel algorithms; Partitioning algorithms; Servers; Vectors; Associations; Distributed Algorithm; MapReduce; information theory;
fLanguage
English
Publisher
ieee
Conference_Titel
Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on
Conference_Location
Melbourne, VIC
Type
conf
DOI
10.1109/TrustCom.2013.222
Filename
6681053