• DocumentCode
    2699200
  • Title
    Discovery of Maximally Frequent Tag Tree Patterns with Height-Constrained Variables from Semistructured Web Documents
  • Author
    Suzuki, Yusuke ; Miyahara, Tetsuhiro ; Shoudai, Takayoshi ; Uchida, Tomoyuki ; Nakamura, Yasuaki
  • Author_Institution
    Faculty of Information Sciences, Hiroshima City University
  • fYear
    2005
  • fDate
    8-9 April 2005
  • Firstpage
    104
  • Lastpage
    112
  • Abstract
    In order to realize Web information retrieval using characteristic tree structured patterns in semistructured Web documents, methods for discovering frequent patterns or common characteristics in semistructured documents are becoming increasingly important. We have studied methods for discovering maximally frequent tree structured patterns in semistructured Web documents. A tag tree pattern is an edge labeled tree with ordered children and structured variables. An edge label of a tag tree pattern is a tag or a keyword in Web documents, or a wildcard for any string. Each variable, which matches any subtree, represents a field of a Web document. A tag tree pattern is much more powerful than a usual tree structured pattern. In order to represent tree structured patterns with rich structural features, we introduce a new kind of variable, called a height-constrained variable. An (i, j)-height-constrained variable matches any subtree such that the trunk length of the subtree is at least i and the height of the subtree is at most j. We propose a method for generating all maximally frequent tag tree patterns with height-constrained variables and no variable-chain
  • Keywords
    Internet; data mining; document handling; information retrieval; tree data structures; Web information retrieval; characteristic tree structured patterns; edge labeled tree; frequent pattern discovery; height-constrained variables; maximally frequent tag tree patterns; semistructured Web documents; structured variables; Character generation; Conferences; Data mining; Data models; HTML; Informatics; Information retrieval; Internet; Technical Activities Guide -TAG; XML
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings. International Workshop on Challenges in Web Information Retrieval and Integration, 2005. WIRI '05.
  • Conference_Location
    Tokyo
  • Print_ISBN
    0-7695-2414-1
  • Type
    conf
  • DOI
    10.1109/WIRI.2005.40
  • Filename
    1553002