  • DocumentCode
    124202
  • Title
    A Dynamic Approach to the Website Boundary Detection Problem Using Random Walks
  • Author
    Alshukri, Ayesh; Coenen, Frans
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Liverpool, Liverpool, UK
  • Volume
    2
  • fYear
    2014
  • fDate
    11-14 Aug. 2014
  • Firstpage
    9
  • Lastpage
    14
  • Abstract
    This paper presents an investigation into the Website Boundary Detection (WBD) problem in the dynamic context. In the dynamic context (as opposed to the static context), the web data to be considered is not fully available before the website boundary detection process starts. The dynamic approaches presented in this paper are all probabilistic and based on the concept of random walks. Three variations are considered: (i) the standard Random Walk (RW), (ii) a Self Avoiding RW and (iii) the Metropolis Hastings RW. The reported evaluation demonstrates that the proposed technique produces good WBD solutions while at the same time reducing the number of "noise" pages visited. The best performing variation was found to be the Metropolis Hastings RW.
  • Keywords
    Web sites; random functions; Metropolis Hastings RW; WBD; Website boundary detection problem; random walks; self avoiding RW; Compounds; Context; Educational institutions; Noise; Probabilistic logic; Radiation detectors; Web pages; Clustering; K-means; Metropolis Hastings Random Walk; Random Walks; Self Avoiding Random Walk; Website Boundary Detection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on
  • Conference_Location
    Warsaw
  • Type
    conf
  • DOI
    10.1109/WI-IAT.2014.74
  • Filename
    6927601
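
The three random walk variations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no algorithmic detail, so the sketch assumes the web graph is an undirected adjacency dictionary, and it uses the textbook Metropolis Hastings acceptance rule min(1, deg(u)/deg(v)), which may differ from the rule used in the paper. All function names and the toy graph are hypothetical.

```python
import random


def random_walk(graph, start, steps, rng):
    """Standard RW: move to a uniformly chosen neighbour at each step."""
    node, path = start, [start]
    for _ in range(steps):
        neighbours = graph[node]
        if not neighbours:
            break  # dead end: nothing to move to
        node = rng.choice(neighbours)
        path.append(node)
    return path


def self_avoiding_walk(graph, start, steps, rng):
    """Self Avoiding RW: never revisit a page; stop when trapped."""
    node, path, seen = start, [start], {start}
    for _ in range(steps):
        fresh = [n for n in graph[node] if n not in seen]
        if not fresh:
            break  # every neighbour has already been visited
        node = rng.choice(fresh)
        seen.add(node)
        path.append(node)
    return path


def metropolis_hastings_walk(graph, start, steps, rng):
    """Metropolis Hastings RW (assumed textbook form on an undirected graph).

    Propose a uniform neighbour v of the current node u and accept the
    move with probability min(1, deg(u) / deg(v)); otherwise stay at u.
    """
    node, path = start, [start]
    for _ in range(steps):
        neighbours = graph[node]
        if not neighbours:
            break
        candidate = rng.choice(neighbours)
        if rng.random() < len(neighbours) / len(graph[candidate]):
            node = candidate
        path.append(node)  # a rejected proposal repeats the current node
    return path


# Hypothetical toy graph: pages a, b, c are densely linked; d hangs off c.
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
rng = random.Random(0)
print(random_walk(toy_graph, "a", 10, rng))
print(self_avoiding_walk(toy_graph, "a", 10, rng))
print(metropolis_hastings_walk(toy_graph, "a", 10, rng))
```

In this textbook form the Metropolis Hastings walk damps moves toward high-degree pages, whereas the standard walk visits pages in proportion to their degree. Whether that property explains the result reported in the abstract is not stated in this record.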