• DocumentCode
    3705111
  • Title
    Performance enhancement of distributed K-Means clustering for big Data analytics through in-memory computation
  • Author
    Shwet Ketu;Sonali Agarwal
  • Author_Institution
    Indian Institute of Information Technology, Allahabad, India
  • fYear
    2015
  • Firstpage
    318
  • Lastpage
    324
  • Abstract
    Big data analytics has recently emerged as a prominent research area in information technology, serving various data-driven domains that need effective processing of big data. It faces challenges such as inefficient storage, processing delays, low rates of information retrieval, and complex algorithms that cannot be handled and managed using traditional methods. New programming frameworks are required to help software developers deal with these challenges. This paper considers two such frameworks, Hadoop MapReduce and Apache Spark, which support on-disk and in-memory computation, respectively. Clustering is an important big data mining task used for information retrieval and knowledge discovery. We analyze the performance of distributed K-Means clustering under the in-memory and on-disk computational models; both models were adopted and an experimental analysis was performed to assess the performance enhancement.
  • Keywords
    "Sparks","Big data","Computational modeling","Data models","Programming","Java","Clustering algorithms"
  • Publisher
    IEEE
  • Conference_Titel
    Contemporary Computing (IC3), 2015 Eighth International Conference on
  • Print_ISBN
    978-1-4673-7947-2
  • Type
    conf
  • DOI
    10.1109/IC3.2015.7346700
  • Filename
    7346700