DocumentCode
2699200
Title
Discovery of Maximally Frequent Tag Tree Patterns with Height-Constrained Variables from Semistructured Web Documents
Author
Suzuki, Yusuke ; Miyahara, Tetsuhiro ; Shoudai, Takayoshi ; Uchida, Tomoyuki ; Nakamura, Yasuaki
Author_Institution
Faculty of Information Sciences, Hiroshima City University
fYear
2005
fDate
8-9 April 2005
Firstpage
104
Lastpage
112
Abstract
To realize Web information retrieval using characteristic tree-structured patterns in semistructured Web documents, methods for discovering frequent patterns or common characteristics in semistructured documents are becoming increasingly important. We have studied methods for discovering maximally frequent tree-structured patterns in semistructured Web documents. A tag tree pattern is an edge-labeled tree with ordered children and structured variables. An edge label of a tag tree pattern is a tag or a keyword in Web documents, or a wildcard for any string. Each variable, which matches any subtree, represents a field of a Web document. A tag tree pattern is much more powerful than an ordinary tree-structured pattern. To represent tree-structured patterns with rich structural features, we introduce a new kind of variable, called a height-constrained variable. An (i, j)-height-constrained variable matches any subtree such that the trunk length of the subtree is at least i and the height of the subtree is at most j. We propose a method for generating all maximally frequent tag tree patterns with height-constrained variables and no variable-chain
Keywords
Internet; data mining; document handling; information retrieval; tree data structures; Web information retrieval; characteristic tree structured patterns; edge labeled tree; frequent pattern discovery; height-constrained variables; maximally frequent tag tree patterns; semistructured Web documents; structured variables; Character generation; Conferences; Data models; HTML; Informatics; Technical Activities Guide -TAG; XML
fLanguage
English
Publisher
IEEE
Conference_Titel
Proceedings of the International Workshop on Challenges in Web Information Retrieval and Integration (WIRI '05), 2005
Conference_Location
Tokyo
Print_ISBN
0-7695-2414-1
Type
conf
DOI
10.1109/WIRI.2005.40
Filename
1553002