• DocumentCode
    3307777
  • Title
    SAKU: A distributed system for data analysis in large-scale dataset based on cloud computing
  • Author
    Lei Qin; Bin Wu; Qing Ke; Yuxiao Dong
  • Author_Institution
    Sch. of Comput. Sci., Beijing Univ. of Posts & Telecommun., Beijing, China
  • Volume
    2
  • fYear
    2011
  • fDate
    26-28 July 2011
  • Firstpage
    1257
  • Lastpage
    1261
  • Abstract
    Data analysis is widely used in enterprises for its efficiency and accuracy, especially in the telecommunication industry, for tasks such as user behavior analysis and customer churn prediction. However, with the exponential growth of data, traditional data analysis tools cannot handle such large-scale datasets. Furthermore, as business becomes more complicated, there is a growing need to integrate different data analysis tools. In addition, traditional analysis tools lack visualization, which makes their results hard to understand. We propose a distributed system named SAKU that addresses these problems. In this paper, we implement several algorithms using the MapReduce framework in order to process large-scale data, and we discuss each part of the system. We also present a new report framework based on cloud computing for visualizing large-scale data. Most importantly, we apply the system to a scenario that meets real-world requirements, using a large volume of data obtained from telecom operators, which demonstrates the high efficiency and scalability of the system.
  • Keywords
    cloud computing; data analysis; data visualisation; distributed databases; very large databases; SAKU; business; data analysis tools; data visualization; distributed system; large-scale dataset; mapreduce; telecom operators; telecommunication industry; Algorithm design and analysis; Clustering algorithms; Data mining; Telecommunications; report
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-61284-180-9
  • Type
    conf
  • DOI
    10.1109/FSKD.2011.6019711
  • Filename
    6019711