DocumentCode :
1130657
Title :
Data preprocessing for WUM
Author :
Tanasa, Doru ; Trousse, Brigitte
Author_Institution :
Inst. Nat. de Recherche en Inf. et Autom., Sophia-Antipolis, France
Volume :
23
Issue :
3
fYear :
2004
Firstpage :
22
Lastpage :
25
Abstract :
This paper focuses on data preprocessing for Web usage mining (WUM). WUM applies data mining procedures to analyze user access to Web sites. As with any knowledge discovery and data mining (KDD) process, WUM contains three main steps: preprocessing, knowledge extraction and results analysis. Data preprocessing tries to determine the exact list of users who accessed the Web site and to reconstitute user sessions, that is, the sequence of actions each user performed at the Web site. The preprocessing uses the Web server log files as well as the Web site map; for privacy reasons, the log files are anonymized and then joined. The preprocessing involves data fusion, data cleaning, data structuration and data summarization. It not only reduces the log file size but also increases the quality of the available data through the new data structures.
Keywords :
Web design; data mining; data privacy; data structures; file servers; WUM; Web usage mining; Web server log file; Website map; Web site user access; anonymizing log file; data analysis; data cleaning; data fusion; data preprocessing; data structuration; data summarization; joining log file; knowledge extraction; knowledge-discovery-data mining process; log file size; privacy reason; user session; Data mining; Data preprocessing; File servers; Internet; Protocols; Robotics and automation; Robots; Search engines; Software tools; Web pages;
fLanguage :
English
Journal_Title :
Potentials, IEEE
Publisher :
IEEE
ISSN :
0278-6648
Type :
jour
DOI :
10.1109/MP.2004.1341781
Filename :
1341781