DocumentCode :
2699200
Title :
Discovery of Maximally Frequent Tag Tree Patterns with Height-Constrained Variables from Semistructured Web Documents
Author :
Suzuki, Yusuke ; Miyahara, Tetsuhiro ; Shoudai, Takayoshi ; Uchida, Tomoyuki ; Nakamura, Yasuaki
Author_Institution :
Fac. of Inf. Sci., Hiroshima City Univ.
fYear :
2005
fDate :
8-9 April 2005
Firstpage :
104
Lastpage :
112
Abstract :
To realize Web information retrieval using characteristic tree-structured patterns in semistructured Web documents, methods for discovering frequent patterns or common characteristics in such documents are becoming increasingly important. We have studied methods for discovering maximally frequent tree-structured patterns in semistructured Web documents. A tag tree pattern is an edge-labeled tree with ordered children and structured variables. An edge label of a tag tree pattern is a tag or a keyword in Web documents, or a wildcard for any string. Each variable, which matches any subtree, represents a field of a Web document. A tag tree pattern is much more powerful than a usual tree-structured pattern. In order to represent tree-structured patterns with rich structural features, we introduce a new kind of variable, called a height-constrained variable. An (i, j)-height-constrained variable matches any subtree such that the trunk length of the subtree is at least i and the height of the subtree is at most j. We propose a method for generating all maximally frequent tag tree patterns with height-constrained variables and no variable-chain
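The (i, j)-height-constraint described in the abstract can be illustrated with a minimal sketch. The abstract does not define "trunk length", so the code below assumes it is the number of edges from the subtree's root to one designated leaf (called `port_leaf` here). The names `Node`, `height`, `depth_of`, and `satisfies_height_constraint` are hypothetical and are not taken from the paper, and edge labels are stored on the child node purely for convenience.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Label of the edge leading into this node (illustrative simplification
    # of an edge-labeled tree); children are kept in document order.
    label: str
    children: list = field(default_factory=list)


def height(node):
    """Number of edges on the longest path from node down to a leaf."""
    if not node.children:
        return 0
    return 1 + max(height(child) for child in node.children)


def depth_of(node, target):
    """Number of edges from node down to target, or None if target is absent.

    Nodes are compared by identity, not by label.
    """
    if node is target:
        return 0
    for child in node.children:
        d = depth_of(child, target)
        if d is not None:
            return d + 1
    return None


def satisfies_height_constraint(root, port_leaf, i, j):
    """Check the (i, j) constraint under the assumed trunk definition.

    Assumption: the trunk is the path from root to the designated leaf
    port_leaf, so trunk length is the depth of port_leaf below root.
    """
    if port_leaf.children:
        return False  # the designated node must be a leaf
    trunk_length = depth_of(root, port_leaf)
    if trunk_length is None:
        return False  # port_leaf is not in this subtree
    return trunk_length >= i and height(root) <= j
```

For example, in a subtree whose designated leaf lies two edges below the root and whose height is 2, the check passes for (i, j) = (1, 3) and fails for (3, 3) or (1, 1).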
Keywords :
Internet; data mining; document handling; information retrieval; tree data structures; Web information retrieval; characteristic tree structured patterns; edge labeled tree; frequent pattern discovery; height-constrained variables; maximally frequent tag tree patterns; semistructured Web documents; structured variables; Character generation; Conferences; Data mining; Data models; HTML; Informatics; Information retrieval; Internet; Technical Activities Guide -TAG; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Proceedings of the International Workshop on Challenges in Web Information Retrieval and Integration (WIRI '05), 2005
Conference_Location :
Tokyo
Print_ISBN :
0-7695-2414-1
Type :
conf
DOI :
10.1109/WIRI.2005.40
Filename :
1553002