Abstract :
We evaluate the performance of range queries in the Recursive List of Clusters (RLC) metric data structure when the metric spaces are natural language dictionaries under the Levenshtein distance. The study compares RLC with five other data structures (GNAT, H-Dsatl, LAESA, LC, and vp-trees) across six dictionaries. The natural language dictionaries (English, French, German, Italian, Portuguese, and Spanish) are characterised by the mean and the variance of their histograms of distances. The experimental results show that RLC performs well in all tested cases and, in some of them, outperforms all the other data structures. In addition, RLC is the only data structure that performs well consistently, whether the space dimension is low or high and whether the query radius is small or large.
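For readers unfamiliar with the setting, the following is a minimal Python sketch of a range query over a dictionary under the Levenshtein distance, together with the mean and variance of the pairwise distances used to characterise a dictionary. It is a brute-force baseline for illustration only, not RLC or any of the compared data structures; the function names and the toy word list are assumptions introduced here.

```python
from itertools import combinations
from statistics import mean, pvariance


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def range_query(words, query, radius):
    """Linear-scan range query: every word within `radius` of `query`.
    A metric data structure aims to return the same set while
    computing fewer distances."""
    return [w for w in words if levenshtein(w, query) <= radius]


def distance_stats(words):
    """Mean and (population) variance of all pairwise distances."""
    dists = [levenshtein(u, v) for u, v in combinations(words, 2)]
    return mean(dists), pvariance(dists)


# Hypothetical toy dictionary, for illustration only.
WORDS = ["cat", "cart", "card", "care", "dog"]

if __name__ == "__main__":
    print(range_query(WORDS, "car", 1))
    print(distance_stats(WORDS))
```

The linear scan computes one distance per dictionary word, so it serves as the reference cost that a metric index tries to improve on.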
Keywords :
data structures; dictionaries; natural languages; query processing; metric data structure; natural language dictionaries; range queries; recursive lists of clusters; DNA; Extraterrestrial measurements; Histograms; Image databases; Multimedia databases; Spatial databases; Testing