• DocumentCode
    2193009
  • Title
    ATreeGrep: approximate searching in unordered trees
  • Author
    Shasha, Dennis; Wang, Jason T. L.; Shan, Huiyuan; Zhang, Kaizhong
  • Author_Institution
    Courant Inst. of Math. Sci., New York Univ., NY, USA
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    89
  • Lastpage
    98
  • Abstract
    An unordered labeled tree is a tree in which each node has a string label and the parent-child relationship is significant, but the order among siblings is unimportant. This paper presents an approach to the nearest neighbor search problem for these trees. Given a database D of unordered labeled trees and a query tree Q, the goal is to find those trees in D that "approximately" contain Q. Our approach is based on storing the paths of the trees in a suffix array and then counting the number of mismatching paths between the query tree and a data tree. To speed up a search, we use a hash-based technique to filter out unqualified data trees at an early stage of the search. Experimental results obtained by running our techniques on phylogenetic trees and synthetic data demonstrate the good performance of the proposed approach. We also discuss the use of our work in XML and scientific database management.
  • Keywords
    file organisation; hypermedia markup languages; query processing; scientific information systems; tree data structures; trees (mathematics); ATreeGrep; XML; approximate searching; data tree; database; hash-based technique; mismatching paths; nearest neighbor search problem; parent-child relationship; phylogenetic trees; query tree; scientific database management; string label; suffix array; synthetic data; tree path storage; unordered labeled tree; unqualified data tree filtering; Algorithm design and analysis; Change detection algorithms; Computer science; Decision trees; Filters; Nearest neighbor searches; Object oriented databases; Object oriented modeling; Phylogeny; XML
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Scientific and Statistical Database Management, 2002. Proceedings. 14th International Conference on
  • ISSN
    1099-3371
  • Print_ISBN
    0-7695-1632-7
  • Type
    conf
  • DOI
    10.1109/SSDM.2002.1029709
  • Filename
    1029709
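
The path-counting idea described in the abstract can be illustrated with a minimal sketch. This is not the ATreeGrep implementation: it swaps the suffix array for an in-memory multiset (`collections.Counter`) of root-to-leaf paths, it only compares paths anchored at the root of each tree, and the label-based prefilter is a simple stand-in for the paper's hash-based technique. The tree encoding `(label, children)` and every function name below are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical encoding: a tree is a (label, children) pair, where children
# is a list of trees. Sibling order carries no meaning.

def paths(tree, prefix=()):
    """Yield every root-to-leaf path of the tree as a tuple of labels."""
    label, children = tree
    path = prefix + (label,)
    if not children:
        yield path
    for child in children:
        yield from paths(child, path)

def labels(tree):
    """Return the set of labels that occur anywhere in the tree."""
    label, children = tree
    found = {label}
    for child in children:
        found |= labels(child)
    return found

def mismatches(query, data):
    """Count query paths that have no counterpart among the data tree's paths."""
    missing = Counter(paths(query)) - Counter(paths(data))
    return sum(missing.values())

def passes_filter(query, data_labels, max_mismatches):
    """Cheap prefilter (stand-in, not the paper's hash scheme): a query path
    that uses a label absent from the data tree can never match."""
    hopeless = sum(1 for p in paths(query) if not set(p) <= data_labels)
    return hopeless <= max_mismatches

def search(database, query, max_mismatches):
    """Return names of data trees within max_mismatches path mismatches."""
    hits = []
    for name, data in database.items():
        if not passes_filter(query, labels(data), max_mismatches):
            continue  # discarded before the full path comparison
        if mismatches(query, data) <= max_mismatches:
            hits.append(name)
    return hits

database = {
    "t1": ("a", [("b", []), ("c", [("d", [])])]),
    "t2": ("a", [("c", [("d", [])]), ("b", [])]),  # t1 with siblings swapped
    "t3": ("a", [("x", []), ("y", [])]),
}
query = ("a", [("b", []), ("c", [("d", [])])])

print(search(database, query, 0))
print(search(database, query, 2))
```

Because paths are compared as a multiset, `t1` and `t2` are indistinguishable to the query, which reflects the unordered-sibling semantics. A suffix array over the stored paths, as the abstract describes, would replace the per-tree `Counter` lookups with an index shared across the whole database.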