• DocumentCode
    3628790
  • Title
    Investigation of internet system user behaviour using cluster analysis
  • Author
    Dariusz Krol;Michal Scigajlo;Bogdan Trawinski
  • Author_Institution
    Wrocław University of Technology, Institute of Applied Informatics, Wyb. Wyspiańskiego 27, 50-370, Poland
  • Volume
    6
  • fYear
    2008
  • fDate
    7/1/2008 12:00:00 AM
  • Firstpage
    3408
  • Lastpage
    3412
  • Abstract
    The paper presents a method for investigating the activity of users of an information Web system by means of cluster analysis. Anonymous sessions are extracted from a Web server log and represented as 65-dimensional vectors, where each dimension corresponds to one page of the Web system. Each dimension holds a measure of the user's interest in that page during the session, calculated as the ratio of the time the user spent on the page to the total session time. The whole set of sessions is then clustered using the HCM (Hard C-Means) algorithm. The resulting clusters are treated as user activity patterns, and among them the clusters dominated by a single page are selected, i.e. those in which the user interest value for that page exceeds a given threshold, e.g. 50 per cent. The sessions of named users registered in the system are determined from an application log of user activity, and the frequencies with which these sessions fall into the individual clusters are calculated for a given period, e.g. one month. User activity can then be assessed by analysing the resulting frequencies. For example, a user's behaviour can be regarded as deviating from the normal pattern when the frequency of their sessions in a cluster dominated by a page is below a set threshold, e.g. 10 per cent. The method was evaluated using data from a cadastral Web system operated in an extranet.
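    The pipeline summarised in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names (interest_vector, hcm, dominated_clusters, user_cluster_frequencies, flag_deviation) are hypothetical; HCM is implemented as Lloyd-style hard assignment with squared Euclidean distance and randomly sampled initial centres; a cluster is assumed to be "dominated by a page" when that page's value in the cluster centre exceeds the threshold; and a toy 3-page site stands in for the 65 pages of the real system. The thresholds 0.5 and 0.10 are the example values quoted in the abstract.

```python
import random
from collections import Counter

def interest_vector(visits, n_pages):
    """visits: list of (page_index, seconds) for one session.
    Returns each page's share of the total session time (the interest measure)."""
    total = sum(t for _, t in visits)
    vec = [0.0] * n_pages
    if total == 0:
        return vec
    for page, t in visits:
        vec[page] += t / total
    return vec

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, centers):
    return min(range(len(centers)), key=lambda k: sq_dist(vec, centers[k]))

def hcm(vectors, c, iters=100, seed=0):
    """Hard C-Means: every session belongs to exactly one cluster (assumed Lloyd-style)."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, c)]
    labels = None
    for _ in range(iters):
        new_labels = [nearest(v, centers) for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(c):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old centre if a cluster becomes empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

def dominated_clusters(centers, threshold=0.5):
    """Map cluster index -> dominating page, if that page's centre value exceeds threshold."""
    dom = {}
    for k, centre in enumerate(centers):
        page = max(range(len(centre)), key=centre.__getitem__)
        if centre[page] > threshold:
            dom[k] = page
    return dom

def user_cluster_frequencies(user_vectors, centers):
    """Fraction of one named user's sessions (in a given period) falling into each cluster."""
    if not user_vectors:
        return {}
    counts = Counter(nearest(v, centers) for v in user_vectors)
    return {k: counts[k] / len(user_vectors) for k in range(len(centers))}

def flag_deviation(freqs, dominated, threshold=0.10):
    """Page-dominated clusters in which the user's session frequency is below threshold."""
    return sorted(k for k in dominated if freqs.get(k, 0.0) < threshold)

# Hypothetical toy data: anonymous sessions from the server log, 3 pages.
SESSIONS = [
    [(0, 90), (2, 10)], [(0, 80), (1, 20)], [(0, 70), (2, 30)],
    [(1, 85), (2, 15)], [(1, 95), (0, 5)], [(1, 75), (2, 25)],
]
# Hypothetical sessions of one named user over one month (from the application log).
USER_SESSIONS = [[(0, 60), (1, 40)], [(0, 100)], [(0, 50), (2, 50)]]

if __name__ == "__main__":
    centers, _ = hcm([interest_vector(s, 3) for s in SESSIONS], c=2)
    dom = dominated_clusters(centers)
    freqs = user_cluster_frequencies([interest_vector(s, 3) for s in USER_SESSIONS], centers)
    print("pages with deviating activity:", [dom[k] for k in flag_deviation(freqs, dom)])
```

    In this toy run the named user never enters the cluster dominated by page 1, so that cluster is flagged. Cluster indices depend on the random initialisation, which is why the result is reported through the dominating page rather than the raw cluster number.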
  • Keywords
    "Internet","Machine learning","Cybernetics","Clustering algorithms","Local government","Data mining","Web server"
  • Publisher
    ieee
  • Conference_Titel
    Machine Learning and Cybernetics, 2008 International Conference on
  • ISSN
    2160-133X
  • Print_ISBN
    978-1-4244-2095-7
  • Electronic_ISBN
    2160-1348
  • Type
    conf
  • DOI
    10.1109/ICMLC.2008.4620993
  • Filename
    4620993