Title :
Scalability of OAT
Author :
Mizher, Jason ; Dunham, Margaret H. ; Lu, Lin ; Xiao, Yongqiao
Author_Institution :
Dept. of Comput. Sci. & Eng., Southern Methodist Univ., Dallas, TX, USA
Abstract :
Summary form only given. Mining user access patterns from clickstream data has attracted much attention from the research community. However, scalability testing of the corresponding mining algorithms has been virtually ignored. The memory requirements of these algorithms may be quite large, because they often assume in-memory data structures whose size depends on the number and length of patterns. Because scalability is essential to the usefulness of Web usage mining (WUM) techniques, we propose two new sampling techniques, continuous and random, that can be applied to static-sized test datasets to examine WUM algorithm scalability. We illustrate the usefulness of these approaches by performing scalability tests on the online adaptive traversal (OAT) pattern mining algorithm. These experiments show that the OAT algorithm does adjust to the amount of available memory and that its time requirements grow at a linear rate. This paper has several results: 1. The OAT algorithm is shown to be scalable in both space and time: time grows at a linear rate, while space adapts to available memory through compression. 2. Two sampling techniques are presented that facilitate scalability experiments against fixed-size Web logs. 3. Spiders crawling the Web can have a disastrous impact on programs that collect WUM statistics and patterns.
Keywords :
Internet; data mining; information retrieval; sampling methods; OAT algorithm; OAT scalability; Web crawling; Web logs; Web usage mining; clickstream data; in-memory data structures; online adaptive traversal pattern mining; sampling techniques; scalability testing; static sized test datasets; user access pattern mining; Association rules; Computer science; Data engineering; Data mining; Data structures; Partitioning algorithms; Performance evaluation; Sampling methods; Scalability; Testing;
Conference_Title :
Computer Systems and Applications, 2005. The 3rd ACS/IEEE International Conference on
Print_ISBN :
0-7803-8735-X
DOI :
10.1109/AICCSA.2005.1387045