Title :
A New Clustering and Preprocessing for Web Log Mining
Author :
Maheswari, B. Uma ; Sumathi, P.
Author_Institution :
Bharathiyar Univ., Coimbatore, India
fDate :
Feb. 27 2014-March 1 2014
Abstract :
The World Wide Web is a massive repository of web pages and links that provides Internet users with information on a vast range of subjects. As the Internet continues to grow rapidly, users' accesses are recorded in web logs, and web usage mining applies data mining techniques to those logs. Because of heavy usage, log files are growing quickly and becoming very large. Preprocessing plays a vital role in efficient mining because log data is normally noisy and indistinct. During preprocessing, sessions and paths are reconstructed by appending missing pages. In addition, the transactions that describe user behavior are constructed accurately by calculating the reference length of each user access by means of the byte rate. Web clustering allows several types of objects to be grouped for various purposes. By using the theory of distribution in Dempster-Shafer theory, the belief function similarity measure in the proposed algorithm gives the clustering task the ability to capture the uncertainty in Web users' navigation behavior. This paper reports experiments on the preprocessing and clustering of web logs. The experimental results indicate that the proposed algorithm performs well.
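Illustrative note: the abstract does not spell out the belief function similarity measure, so the sketch below is not the authors' algorithm. It only shows the standard Dempster rule of combination on which Dempster-Shafer reasoning rests, applied to two hypothetical pieces of evidence about which page category a user is interested in. The function names (combine, belief), the category labels and the mass values are all assumptions made for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to its mass,
    and the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            # Disjoint sets contribute to the conflict mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    norm = 1.0 - conflict
    return {s: w / norm for s, w in combined.items()}


def belief(m, hypothesis):
    """Belief in a hypothesis: total mass of all of its subsets."""
    return sum(w for s, w in m.items() if s <= hypothesis)


# Hypothetical evidence from two sessions of the same user.
NEWS = frozenset({"news"})
SPORTS = frozenset({"sports"})
EITHER = NEWS | SPORTS  # mass left uncommitted between the two categories

session_a = {NEWS: 0.6, EITHER: 0.4}
session_b = {SPORTS: 0.5, EITHER: 0.5}

fused = combine(session_a, session_b)
```

Here the mass assigned to EITHER is what lets the model express uncertainty about a user's interest instead of forcing a single category, which is the property the abstract attributes to the belief function approach.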
Keywords :
Internet; belief maintenance; data mining; inference mechanisms; pattern clustering; uncertainty handling; Dempster-Shafer theory; Web clustering; Web links; Web log mining; Web pages; Web usage mining; World Wide Web; belief function similarity measure; clustering task; data clustering; data preprocessing; log data; mining techniques; Algorithm design and analysis; Cleaning; Clustering algorithms; Data mining; Data preprocessing; IP networks; Web sites; Clustering; Data Cleaning; Dempster-Shafer; Preprocessing
Conference_Title :
Computing and Communication Technologies (WCCCT), 2014 World Congress on
Conference_Location :
Tiruchirappalli
Print_ISBN :
978-1-4799-2876-7
DOI :
10.1109/WCCCT.2014.67