• DocumentCode
    1618869
  • Title
    Bayesian analysis of online newspaper log data
  • Author
    Wettig, Hannes ; Lahtinen, Jussi ; Lepola, Tuomas ; Myllymaki, Petri ; Tirri, Henry
  • Author_Institution
    Complex Syst. Comput. Group (CoSCo), Helsinki Univ., Finland
  • fYear
    2003
  • Firstpage
    282
  • Lastpage
    287
  • Abstract
    In this paper we address the problem of analyzing Web log data collected at a typical online newspaper site. We propose a two-way clustering technique based on probability theory. On the one hand, the suggested method clusters the readers of the online newspaper into user groups with similar browsing behaviour, where the clusters are determined based solely on the click streams collected. On the other hand, the articles of the newspaper are clustered based on the reading behaviour of the users. The two-way clustering produces statistical user and page profiles that can be analyzed by domain experts for content personalization. In addition, the resulting model can be used for on-line prediction: given the user cluster of a person entering the site and the page cluster of a newspaper article, one can infer whether or not the user will look at the page in question.
  • Keywords
    Bayes methods; Internet; Web sites; belief networks; electronic publishing; information retrieval; pattern clustering; Bayesian analysis; Web log data analysis; click streams; content personalization; on-line prediction; online newspaper log data; online newspaper site; probability theory; reader clustering; reading behaviour; similar browsing behaviour; statistical page profiles; statistical user profiles; two-way clustering technique; user groups; Bayesian methods; Cleaning; Clustering algorithms; Conferences; Data mining; Demography; Internet; Law; Legal factors; Uniform resource locators;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Applications and the Internet Workshops, 2003. Proceedings. 2003 Symposium on
  • Print_ISBN
    0-7695-1873-7
  • Type
    conf
  • DOI
    10.1109/SAINTW.2003.1210173
  • Filename
    1210173