DocumentCode :
3248590
Title :
Online algorithms for mining semi-structured data stream
Author :
Asai, Tatsuya ; Arimura, Hiroki ; Abe, Kenji ; Kawasoe, Shinji ; Arikawa, Setsuo
Author_Institution :
Dept. of Inf., Kyushu Univ., Fukuoka, Japan
fYear :
2002
fDate :
2002
Firstpage :
27
Lastpage :
34
Abstract :
In this paper, we study an online data mining problem on streams of semi-structured data such as XML data. Modeling semi-structured data and patterns as labeled ordered trees, we present an online algorithm, StreamT, that receives fragments of unseen, possibly infinite semi-structured data in document order through a data stream, and can return the current set of frequent patterns immediately on request at any time. A crucial part of our algorithm is the incremental maintenance of the occurrences of possibly frequent patterns using a tree sweeping technique. We also give modifications of the algorithm for other online mining models. We present theoretical and empirical analyses to evaluate the performance of the algorithm.
Keywords :
data mining; data structures; hypermedia markup languages; pattern recognition; StreamT; XML data; frequent pattern discovery; incremental maintenance; online algorithm; online data mining; semi-structured data; tree sweeping technique; Algorithm design and analysis; Data communication; Data mining; Informatics; Monitoring; Pattern analysis; Performance analysis; Technology management; Web pages; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
Print_ISBN :
0-7695-1754-4
Type :
conf
DOI :
10.1109/ICDM.2002.1183882
Filename :
1183882
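The abstract describes mining a semi-structured stream that arrives in document order and answering frequent-pattern queries at any time. The sketch below is a minimal illustration of that setting, not the authors' StreamT. It assumes the stream delivers nodes in document order (preorder) as hypothetical (depth, label) pairs, and it restricts patterns to single labels and parent-child label pairs. A node arriving in document order can only attach to the rightmost path of the tree read so far, so keeping that path on a stack is enough to update occurrence counts incrementally. The class name StreamPatternCounter, the (depth, label) encoding, and the support measure (occurrences divided by nodes seen) are all illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A pattern is either a single label ("a",) or a parent-child pair ("a", "b").
# This two-node pattern class is a deliberate simplification (assumption),
# not the general labeled ordered tree patterns mined by StreamT.
Pattern = Tuple[str, ...]


class StreamPatternCounter:
    """Hypothetical online counter over a preorder stream of (depth, label) nodes."""

    def __init__(self) -> None:
        self.rightmost_path: List[str] = []  # labels from the root to the last node seen
        self.counts: Counter = Counter()     # occurrence count of each pattern
        self.nodes_seen = 0

    def push(self, depth: int, label: str) -> None:
        # In document order, a node at depth d is a child of the node at
        # depth d-1 on the current rightmost path, so drop everything deeper.
        if depth < 0 or depth > len(self.rightmost_path):
            raise ValueError("depth skips a level; stream is not in document order")
        del self.rightmost_path[depth:]
        self.nodes_seen += 1
        self.counts[(label,)] += 1
        if depth > 0:
            parent = self.rightmost_path[-1]
            self.counts[(parent, label)] += 1
        self.rightmost_path.append(label)

    def frequent(self, min_support: float) -> Dict[Pattern, int]:
        # Answer a query at any time from the counts maintained so far.
        # Support = occurrences / nodes seen (an assumed frequency measure).
        if self.nodes_seen == 0:
            return {}
        threshold = min_support * self.nodes_seen
        return {p: c for p, c in self.counts.items() if c >= threshold}


if __name__ == "__main__":
    counter = StreamPatternCounter()
    stream = [(0, "doc"), (1, "item"), (2, "name"), (1, "item"), (2, "name"), (2, "price")]
    for depth, label in stream:
        counter.push(depth, label)
    print(counter.frequent(0.3))
```

Each push does work proportional to the number of popped path entries plus a constant, so the cost is amortized over the stream. Extending this to larger tree patterns would require tracking pattern occurrences along the rightmost path, which is the part the paper addresses with its tree sweeping technique.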