• DocumentCode
    1264245
  • Title

    Fast indexing and visualization of metric data sets using slim-trees

  • Author

    Traina, Caetano, Jr. ; Traina, Agma ; Faloutsos, Christos ; Seeger, Bernhard

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Sao Paulo, Sao Carlos, Brazil
  • Volume
    14
  • Issue
    2
  • fYear
    2002
  • Firstpage
    244
  • Lastpage
    260
  • Abstract
    Many recent database applications need to deal with similarity queries. For such applications, it is important to measure the similarity between two objects using the distance between them. Focusing on this problem, this paper proposes the slim-tree, a new dynamic tree for organizing metric data sets in pages of fixed size. The slim-tree uses the triangle inequality to prune the distance calculations that are needed to answer similarity queries over objects in metric spaces. The proposed insertion algorithm uses new policies to select the nodes where incoming objects are stored. When a node overflows, the slim-tree uses a minimal spanning tree to help with the splitting. The new insertion algorithm leads to a tree with high storage utilization and improved query performance. The slim-tree is a metric access method that tackles the problem of overlaps between nodes in metric spaces and that allows one to minimize the overlap. The proposed "fat-factor" is a way to quantify whether a given tree can be improved and also to compare two trees. We show how to use the fat-factor to achieve accurate estimates of the search performance and also how to improve the performance of a metric tree through the proposed "slim-down" algorithm. This paper also presents a new tool in the slim-tree's arsenal of resources, aimed at visualizing it. Visualization is a powerful tool for interactive data mining and for the visual tracking of the behavior of a tree under updates. Finally, we present a formula to estimate the number of disk accesses in range queries. Results from experiments with real and synthetic data sets show that the new slim-tree algorithms lead to performance improvements. These results show that the slim-tree outperforms the M-tree by up to 200% for range queries. For insertion and splitting, the minimal-spanning-tree-based algorithm achieves up to 40 times faster insertions. We observed improvements of up to 40% in range queries after applying the slim-down algorithm.
  • Keywords
    data mining; data visualisation; database indexing; minimisation; multimedia databases; query processing; software performance evaluation; tree data structures; tree searching; data visualization; disk accesses; distance calculation pruning; distance metric; dynamic tree; fat-factor; fixed-size pages; incoming object storage; index structures; insertion algorithm; interactive data mining; metric access method; metric data set indexing; metric databases; minimal spanning tree; node overflow; node overlap minimization; node selection policies; node splitting; query performance; range queries; search performance; selectivity estimation; similarity queries; similarity search; slim-down algorithm; slim-trees; storage utilization; tree updates; triangle inequality; visual tracking; Indexing
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/69.991715
  • Filename
    991715
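
The abstract states that the slim-tree uses the triangle inequality to prune distance calculations during similarity queries. The sketch below illustrates only that general pruning idea on a single flat node. It is not the paper's algorithm, and the names `range_query`, `rep`, and `entries` are hypothetical. It assumes each stored object carries a precomputed distance to a node representative, and that the distance function is a true metric (Euclidean distance here).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.dist(a, b)


def range_query(rep, entries, query, radius, dist=euclidean):
    """Return (hits, computed) for a range query over one flat node.

    rep     -- hypothetical node representative
    entries -- list of (obj, d_rep_obj), where d_rep_obj = dist(rep, obj)
               was computed once when the object was stored
    computed counts how many distance evaluations were actually needed.
    """
    d_q_rep = dist(query, rep)
    computed = 1
    hits = []
    for obj, d_rep_obj in entries:
        # Triangle inequality: dist(query, obj) >= |d_q_rep - d_rep_obj|.
        # If that lower bound already exceeds the radius, obj cannot
        # qualify, so its distance to the query is never computed.
        if abs(d_q_rep - d_rep_obj) > radius:
            continue
        computed += 1
        if dist(query, obj) <= radius:
            hits.append(obj)
    return hits, computed


if __name__ == "__main__":
    rep = (0.0, 0.0)
    objects = [(1.0, 0.0), (0.0, 2.0), (5.0, 5.0), (10.0, 0.0)]
    entries = [(o, euclidean(rep, o)) for o in objects]
    print(range_query(rep, entries, query=(1.0, 1.0), radius=1.5))
```

In this toy data set the two far objects are discarded from their stored distances alone, so only three of the five possible distance evaluations are performed.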