• DocumentCode
    652358
  • Title

    Detecting Associations in Large Dataset on MapReduce

  • Author

    Dong Dai ; Xi Li ; Chao Wang ; Junneng Zhang ; Xuehai Zhou

  • Author_Institution
    Comput. Sci. Coll., Univ. of Sci. & Technol. of China, Hefei, China
  • fYear
    2013
  • fDate
    16-18 July 2013
  • Firstpage
    1788
  • Lastpage
    1794
  • Abstract
    In daily life, we are surrounded by all kinds of data. Finding the relationships among these data has become one of the biggest challenges facing data scientists. In 2011, David N. Reshef et al. took a great leap toward solving this problem. They showed that the maximal information coefficient (MIC) is an effective tool for detecting different kinds of relationships between any given pair of variables, whether or not these relationships are functional. However, challenges remain: the computation is too complex and time-consuming for large datasets, which makes the algorithm impractical in real-world use. In this paper, we explore possible ways to parallelize the detection of associations between variables in large datasets and propose a high-performance MapReduce-based solution, which includes a data storage pattern, preprocessing algorithms, a distributed memory cache mechanism, and a series of MapReduce jobs. Experiments show that our parallel solution provides a linear speedup compared with the original algorithm without affecting correctness. This work makes the well-known MIC algorithm more practical for solving real problems.
  • Keywords
    data handling; parallel processing; MapReduce based solution; MapReduce jobs; association detection; data storage pattern; distributed memory cache mechanism; large dataset; maximal information coefficient; mic algorithm; parallel solution; preprocessing algorithms; Complexity theory; Microwave integrated circuits; Mutual information; Parallel algorithms; Partitioning algorithms; Servers; Vectors; Associations; Distributed Algorithm; MapReduce; information theory;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on
  • Conference_Location
    Melbourne, VIC
  • Type

    conf

  • DOI
    10.1109/TrustCom.2013.222
  • Filename
    6681053