• DocumentCode
    35406
  • Title
    Parallelized user clicks recognition from massive HTTP data based on dependency graph model
  • Author
    Cheng Fang ; Jun Liu ; Zhenming Lei
  • Author_Institution
    Beijing Key Lab. of Network Syst. Archit. & Convergence, Beijing Univ. of Posts & Telecommun., Beijing, China
  • Volume
    11
  • Issue
    12
  • fYear
    2014
  • fDate
    Dec. 2014
  • Firstpage
    13
  • Lastpage
    25
  • Abstract
    As website structures grow more complex and web technologies continue to advance, accurately recognizing user clicks in massive HTTP data, which is critical for web usage mining, becomes more difficult. In this paper, we propose a dependency graph model to describe the relationships between web requests. Based on this model, we design and implement a heuristic parallel algorithm that distinguishes user clicks with the assistance of cloud computing technology. We evaluate the proposed algorithm on real massive data: a 228.7 GB dataset collected from a mobile core network, covering more than three million users. The experimental results demonstrate that the proposed algorithm achieves higher accuracy than previous methods.
  • Keywords
    Web sites; data mining; graph theory; hypermedia; parallel algorithms; transport protocols; Web technologies; Web usage mining; Website structure; cloud computing; dependency graph model; heuristic parallel algorithm; massive HTTP data; mobile core network; parallelized user clicks recognition; Algorithm design and analysis; Big data; Computational modeling; Data models; Data preprocessing; Internet; graph model; massive data
  • fLanguage
    English
  • Journal_Title
    Communications, China
  • Publisher
    IEEE
  • ISSN
    1673-5447
  • Type
    jour
  • DOI
    10.1109/CC.2014.7019836
  • Filename
    7019836