• DocumentCode
    46230
  • Title
    i2MapReduce: Incremental MapReduce for Mining Evolving Big Data
  • Author
    Yanfeng Zhang; Shimin Chen; Qiang Wang; Ge Yu
  • Author_Institution
    Comput. Center, Northeastern Univ., Shenyang, China
  • Volume
    27
  • Issue
    7
  • fYear
    2015
  • fDate
    July 1 2015
  • Firstpage
    1906
  • Lastpage
    1919
  • Abstract
    As new data and updates constantly arrive, the results of data mining applications become stale and obsolete over time. Incremental processing is a promising approach to refreshing mining results: it uses previously saved states to avoid the expense of recomputation from scratch. In this paper, we propose i2MapReduce, a novel incremental processing extension to MapReduce, the most widely used framework for mining big data. Compared with Incoop, the state-of-the-art prior work, i2MapReduce (i) performs key-value pair level incremental processing rather than task-level recomputation, (ii) supports not only one-step computation but also more sophisticated iterative computation, which is widely used in data mining applications, and (iii) incorporates a set of novel techniques to reduce I/O overhead for accessing preserved fine-grain computation states. We evaluate i2MapReduce using a one-step algorithm and four iterative algorithms with diverse computation characteristics. Experimental results on Amazon EC2 show significant performance improvements of i2MapReduce compared to both plain and iterative MapReduce performing recomputation.
  • Keywords
    Big Data; data handling; data mining; iterative methods; Amazon EC2; Big data mining; I/O overhead reduction; Incremental processing; fine-grain computation states; i2MapReduce; incremental MapReduce; iterative algorithm; one-step algorithm; Big data; Computational modeling; Data mining; Data models; Engines; Indexes; Programming; MapReduce; big data; iterative computation
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2015.2397438
  • Filename
    7029111