• DocumentCode
    713913
  • Title
    A sequence-based tree similarity search
  • Author
    Algergawy, Alsayed ; Klan, Friederike
  • Author_Institution
    Dept. of Distrib. Inf. Syst., Friedrich Schiller Univ. of Jena, Jena, Germany
  • fYear
    2015
  • fDate
    13-15 May 2015
  • Firstpage
    121
  • Lastpage
    126
  • Abstract
    Tree-structured data are growing pervasively, and exploiting them on the basis of similarity is essential for a broad range of applications. There is therefore a growing need for high-performance techniques that efficiently search for similar trees across large collections of trees. To this end, we present a new sequence-based approach to tree similarity search that exploits both the structural and the content characteristics of tree-structured data. In particular, we transform tree data into sequence representations using a modified Prüfer sequence that establishes a one-to-one mapping between trees and their sequence representations. We introduce a new tree sequence distance, based on the structural information of the data tree, which filters out a set of false positive candidates. We then introduce a refinement step that exploits the content information of the data trees. Preliminary experimental results show that our algorithm achieves high performance. Our method is especially suitable for accelerating similarity computation in clustering and/or classification of large numbers of trees in massive datasets.
  • Keywords
    search problems; tree data structures; trees (mathematics); content characteristics; content information; data tree; false positive candidate; high-performance technique; modified Prüfer sequence; one-to-one mapping; refinement step; sequence representation; sequence-based approach; sequence-based tree similarity search; similarity computation; structural characteristics; structural information; tree sequence distance; tree-structured data; Edit distance; Extended Prüfer sequence; Similarity measure; Tree structured data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Research Challenges in Information Science (RCIS), 2015 IEEE 9th International Conference on
  • Conference_Location
    Athens
  • Type
    conf
  • DOI
    10.1109/RCIS.2015.7128871
  • Filename
    7128871
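
The abstract relies on a one-to-one mapping between trees and sequences. As a rough illustration only, the sketch below implements the classical Prüfer code for a labeled tree on nodes 0..n-1, not the modified Prüfer sequence of the paper, whose details are not given in this record. The function names prufer_encode and prufer_decode are hypothetical and chosen for this example.

```python
import heapq


def prufer_encode(edges, n):
    """Classical Prüfer code of a labeled tree on nodes 0..n-1 (n >= 2).

    Assumption: this is the textbook construction, not the paper's
    modified variant. Repeatedly remove the smallest-labeled leaf and
    record the label of its only neighbor.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = [v for v in range(n) if len(adj[v]) == 1]
    heapq.heapify(leaves)
    seq = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)
        (neighbor,) = adj[leaf]          # a leaf has exactly one neighbor
        seq.append(neighbor)
        adj[neighbor].discard(leaf)
        adj[leaf].clear()
        if len(adj[neighbor]) == 1:      # neighbor has become a leaf
            heapq.heappush(leaves, neighbor)
    return seq


def prufer_decode(seq):
    """Rebuild the edge list of the unique tree with Prüfer code seq."""
    n = len(seq) + 2
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    # Exactly two nodes remain; they form the last edge.
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges


def normalize(edges):
    """Order-independent view of an undirected edge list."""
    return {tuple(sorted(e)) for e in edges}


tree = [(0, 3), (1, 3), (2, 3), (3, 4), (4, 5)]
code = prufer_encode(tree, 6)
restored = prufer_decode(code)
```

Because encoding followed by decoding returns the original edge set, two trees can be compared through their sequences without losing structural information, which is the property a sequence-based filter step depends on.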