Title :
Hierarchical sessionization at preprocessing level of WUM based on swarm intelligence
Author :
Hussain, Tasawar ; Asghar, Sohail ; Masood, Nayyer
Author_Institution :
Dept. of Comput. Sci., Muhammad Ali Jinnah Univ. (MAJU), Islamabad, Pakistan
Abstract :
Web-based applications are growing rapidly, and their user base is growing with them. Advances in technology have made it possible to capture users' interactions with web applications in the web server log file as web usage data. Web usage mining (WUM) is the process of discovering hidden patterns in this usage data. Because the web log contains a large amount of “irrelevant information”, the original log file cannot be used directly in the WUM process, so preprocessing the log file becomes imperative. Proper analysis of the web log file helps manage websites effectively from both the administrators' and the users' perspectives. Web log preprocessing is a necessary initial step that improves the quality and efficiency of the later steps of WUM. A number of techniques are available at the preprocessing level of WUM, such as data cleaning, data filtering, user identification, session identification, and session clustering. This paper proposes a complete preprocessing technique that prepares the web log for the extraction of user patterns. A data cleaning algorithm removes irrelevant entries from the web log, and a filtering algorithm discards attributes of no interest from the log file. Users and sessions are then identified. The proposed hierarchical sessionization algorithm generates a hierarchy of sessions, yielding unbiased hierarchical clusters from the web log file.
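The first four preprocessing steps named in the abstract (cleaning, filtering, user identification, session identification) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give their rules, so the sketch assumes the Combined Log Format, treats image/style/script requests and non-2xx responses as irrelevant, identifies a user by the (IP address, user agent) pair, and splits sessions on a 30-minute inactivity gap. All names (`clean_and_filter`, `identify_sessions`, `SESSION_TIMEOUT`) are hypothetical, and the hierarchical, swarm-based clustering step is not attempted.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed input: Combined Log Format,
# ip ident user [time] "method url protocol" status size "referrer" "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "[^"]*" "(?P<agent>[^"]*)")?'
)

# Assumption: embedded resources count as irrelevant entries.
IRRELEVANT_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
# Assumption: a 30-minute inactivity gap starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)


def clean_and_filter(lines):
    """Drop irrelevant entries and keep only the attributes used later."""
    records = []
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # malformed line
        # Cleaning: keep successful GET requests for pages only.
        if m["method"] != "GET" or not m["status"].startswith("2"):
            continue
        url = m["url"].split("?", 1)[0]
        if url.lower().endswith(IRRELEVANT_SUFFIXES):
            continue
        # Filtering: retain ip, agent, time, and url; discard the rest.
        records.append({
            "ip": m["ip"],
            "agent": m["agent"] or "",
            "time": datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
            "url": url,
        })
    return records


def identify_sessions(records):
    """Group records by assumed user key, then split on the timeout."""
    by_user = defaultdict(list)
    for rec in sorted(records, key=lambda r: r["time"]):
        by_user[(rec["ip"], rec["agent"])].append(rec)

    sessions = []
    for user, recs in by_user.items():
        current = [recs[0]]
        for prev, rec in zip(recs, recs[1:]):
            if rec["time"] - prev["time"] > SESSION_TIMEOUT:
                sessions.append((user, [r["url"] for r in current]))
                current = []
            current.append(rec)
        sessions.append((user, [r["url"] for r in current]))
    return sessions
```

Calling `identify_sessions(clean_and_filter(lines))` on an iterable of raw log lines returns a list of `(user, [urls])` pairs, which is the kind of per-session page sequence a later clustering step would take as input.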
Keywords :
Internet; Web sites; data mining; information filtering; pattern clustering; Web based application; Web log file analysis; Web log preprocessing; Web server log file; Web usage mining; Website management; data cleaning; data filtering; evolutionary change; hierarchical sessionization; pattern discovery; session clustering; session identification; swarm intelligence; user identification; user pattern extraction; Cleaning; Clustering algorithms; Data mining; Euclidean distance; Filtering; Filtering algorithms; IP networks; Hierarchical Sessionization; Particle Swarm; Preprocessing; Structured Information; Web Usage Mining;
Conference_Title :
Emerging Technologies (ICET), 2010 6th International Conference on
Conference_Location :
Islamabad
Print_ISBN :
978-1-4244-8057-9
DOI :
10.1109/ICET.2010.5638388