• DocumentCode
    585853
  • Title

    Cut-off time calculation for user session identification by reference length

  • Author

    Kapusta, Jozef ; Munk, Michal ; Drlík, Martin

  • Author_Institution
    Dept. of Inf., Constantine the Philosopher Univ. in Nitra, Nitra, Slovakia
  • fYear
    2012
  • fDate
    17-19 Oct. 2012
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Discovering the behavior patterns of web site visitors is one of the tasks of web log mining. The discovered patterns, represented by sequence rules, can be used to modify and improve an organization's web site. The data for the analysis come from the web server log file. Because these data are anonymous, uniquely identifying a web site visitor is a problem. The paper deals with the less commonly used navigation-driven methods of user session identification. These methods assume that during a visit the user passes through several navigation pages until she/he finds the content page with the required information. A content page is a page on which the user spends considerably more time than on navigation pages, and it is considered to be the end of the session. The search for the next content page through navigation pages constitutes a new user session. Pages are divided into content and navigation pages by calculating the cut-off time C. Verifying that the variable representing the time a user spends on a particular page follows an exponential distribution is essential. We prepared an experiment with data from the log file of a university web server. We tested whether the time spent on web pages follows an exponential distribution, and we estimated the value of the cut-off time. The results confirm our assumption that navigation-oriented methods can be used for proper user session identification.
  • Keywords
    Internet; Web sites; behavioural sciences; data analysis; data mining; exponential distribution; Web log mining; Web pages; Web server log file; Web site visitor behavior; Web site visitor identification; content page; cut-off time calculation; exponential variable distribution; navigation-driven methods; pattern discovery; reference length; sequence rules; user behavior patterns; user session identification; Educational institutions; IP networks; Navigation; Web servers; Cut-off Time; Session Identification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Application of Information and Communication Technologies (AICT), 2012 6th International Conference on
  • Conference_Location
    Tbilisi
  • Print_ISBN
    978-1-4673-1739-9
  • Type

    conf

  • DOI
    10.1109/ICAICT.2012.6398500
  • Filename
    6398500
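
The abstract describes splitting pages into navigation and content pages by a cut-off time C, under the assumption that time spent on a page is exponentially distributed. The record itself does not state the formula, so the following is only a minimal sketch assuming the commonly cited reference-length form C = -ln(1 - gamma) / lambda, where lambda is estimated as the reciprocal of the mean observed page time and gamma is an assumed proportion of navigation pages. The function names, the gamma value, and the sample data are hypothetical and are not taken from the paper.

```python
import math


def cutoff_time(durations, gamma):
    """Estimate the cut-off time C (same unit as `durations`).

    Assumes page-view times follow an exponential distribution with
    rate lam = 1 / mean(durations), and that `gamma` is the assumed
    share of navigation pages (0 < gamma < 1). C is then the gamma
    quantile of that distribution: C = -ln(1 - gamma) / lam.
    """
    if not durations:
        raise ValueError("durations must not be empty")
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie strictly between 0 and 1")
    mean = sum(durations) / len(durations)
    if mean <= 0:
        raise ValueError("mean duration must be positive")
    lam = 1.0 / mean
    return -math.log(1.0 - gamma) / lam


def split_sessions(visits, cutoff):
    """Split one visitor's ordered (page, seconds) pairs into sessions.

    A page viewed longer than `cutoff` is treated as a content page
    and closes the current session, as the abstract describes.
    """
    sessions, current = [], []
    for page, seconds in visits:
        current.append(page)
        if seconds > cutoff:
            sessions.append(current)
            current = []
    if current:  # trailing navigation pages with no content page yet
        sessions.append(current)
    return sessions


if __name__ == "__main__":
    # Hypothetical sample: page-view times in seconds.
    times = [10, 20, 30]
    c = cutoff_time(times, gamma=0.5)
    visits = [("a", 5), ("b", 40), ("c", 3), ("d", 2)]
    print(round(c, 2), split_sessions(visits, c))
```

In this sketch the exponential assumption matters because C is read off as a quantile of the fitted distribution; if the observed times do not fit an exponential distribution, the cut-off computed this way has no clear interpretation, which is presumably why the abstract treats the distribution check as essential.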