Title :
AutoPart: automating schema design for large scientific databases using data partitioning
Author :
Papadomanolakis, Stratos ; Ailamaki, Anastassia
Author_Institution :
Carnegie Mellon Univ., Pittsburgh, PA, USA
Abstract :
Database applications that use multi-terabyte datasets are becoming increasingly important for scientific fields such as astronomy and biology. Scientific databases are particularly suited for the application of automated physical design techniques, because of their data volume and the complexity of the scientific workloads. Current automated physical design tools focus on the selection of indexes and materialized views. In large-scale scientific databases, however, the data volume and the continuous insertion of new data allow for only limited indexes and materialized views. By contrast, data partitioning does not replicate data, thereby reducing space requirements and minimizing update overhead. In this paper, we present AutoPart, an algorithm that automatically partitions database tables to optimize sequential access, assuming prior knowledge of a representative workload. The resulting schema is indexed using a fraction of the space required for indexing the original schema. To evaluate AutoPart, we built an automated schema design tool that interfaces to commercial database systems. We experiment with AutoPart in the context of the Sloan Digital Sky Survey database, a real-world astronomical database, running on SQL Server 2000. Our experiments demonstrate the benefits of partitioning for large-scale systems: partitioning alone improves query execution performance by a factor of two on average. Combined with indexes, the new schema also outperforms the indexed original schema by 20% (for queries) and a factor of five (for updates), while using only half the original index space.
Keywords :
astronomy computing; data analysis; database indexing; optimisation; query processing; scientific information systems; very large databases; AutoPart; SQL Server 2000; Sloan Digital Sky Survey database; astronomical database; astronomy; automated physical design techniques; biology; data insertion; data partitioning; data volume; database applications; database table partitioning; index selection; large-scale scientific databases; materialized views; multiterabyte datasets; query execution; schema design automation; scientific workload complexity; sequential access optimization; space requirements; update overhead; Astronomy; Database systems; Indexes; Indexing; Large-scale systems; Partitioning algorithms; Query processing; Spatial databases; Stress; Telescopes;
Conference_Title :
Scientific and Statistical Database Management, 2004. Proceedings. 16th International Conference on
Print_ISBN :
0-7695-2146-0
DOI :
10.1109/SSDM.2004.1311234