• DocumentCode
    3575391
  • Title
    Scalability, Memory Issues and Challenges in Mining Large Data Sets
  • Author
    Kolici, Vladi ; Xhafa, Fatos ; Barolli, Leonard ; Lala, Algenti
  • Author_Institution
    Polytech. Univ. of Tirana, Tirana, Albania
  • fYear
    2014
  • Firstpage
    268
  • Lastpage
    273
  • Abstract
    Data mining is an active field of research and development that aims to automatically extract "knowledge" by analyzing data sets. Knowledge can be defined in different ways, such as discovering (structured, frequent, approximate, etc.) patterns in data, grouping/clustering/bi-clustering data according to one or more criteria, finding association rules, etc. Such knowledge is then fed back to decision support systems, enabling end-users (actors) to make more informed decisions, which in economic terms could lead to advantages over traditional decision support systems. It should be noted, however, that data mining algorithms and frameworks were proposed prior to the "Big Data" explosion. While data mining algorithms have treated efficiency and computational complexity as important requirements, they did not take into account features of Big Data such as very large size, the velocity with which data is generated, variety, etc. These features are now posing issues and challenges to data mining algorithms and frameworks. In this paper we analyse some of the issues in mining large data sets, such as scalability and in-memory needs. We also show some computational results that point to such issues.
  • Keywords
    Big Data; computational complexity; data mining; decision support systems; pattern clustering; Big Data explosion; association rule; data bi-clustering; data clustering; data grouping; data mining algorithm; decision support system; knowledge extraction; large data sets; Algorithm design and analysis; Distributed databases; Memory management; Scalability; Distributed Data Mining; Hadoop; Map Reduce; Memory
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Networking and Collaborative Systems (INCoS), 2014 International Conference on
  • Print_ISBN
    978-1-4799-6386-7
  • Type
    conf
  • DOI
    10.1109/INCoS.2014.50
  • Filename
    7057101