Author :
Kruber, Nico; Schintke, Florian; Berlin, Michael
Abstract :
Distributed key-value stores are horizontally scalable by design. However, structured data with links between values may create hotspots or bottlenecks caused by popular keys and large index objects. These hotspots typically reduce the scalability of the key-value store, especially for operations that change data. Relational database management systems, on the other hand, are designed to handle relational data efficiently, but generally do not scale horizontally in a cost-efficient way. Combining the best of both worlds would therefore be desirable. Using a wiki as a demonstrator, we map a relational database schema to a distributed transactional key-value store. This includes solutions to the typical constraints that key-value stores impose on applications because of their limited query expressibility. It also includes the mapping of dependent tables and secondary indices to a single key-value namespace. We identify and evaluate hotspots and bottlenecks and propose improved mappings. We reduce the effects of the most prominent hotspots, i.e., secondary indices, by applying advanced partitioning schemes that both reduce the size of the indices and allow more concurrent write accesses in transactional contexts. These optimisations are generic and help to map relational schemas and the corresponding applications to transactional key-value stores in a way that preserves their horizontal scalability. With our data models for key-value stores, we get the best of both worlds for the wiki application: a horizontally scalable database serving a moderately complex relational schema. When replaying an access trace of Wikipedia on our system, our optimisations yield up to 96% fewer transaction aborts for data change operations and up to a 25-fold latency improvement for the overall operations mix (reading, changing, and creating data), compared to the basic mapping.
Keywords :
Web sites; data models; relational databases; Scalaris; Wikipedia; complex relational schema; concurrent write accesses; data change operations; data creation; data reading; dependent table mapping; distributed transactional key-value store; horizontal scalability; horizontally scalable database; horizontally scalable distributed key-value stores; index size reduction; latency improvement; query expressibility; relational data handling; relational database management systems; relational database schema mapping; scalability reduction; secondary index mapping; secondary indices; single key-value namespace; structured data; wiki application; Electronic publishing; Encyclopedias; Indexes; Internet; Scalability; DHT; P2P; key-value store; relational schema; scalable data model
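As a rough illustration of the idea of partitioning a secondary index across a single key-value namespace, the following Python sketch splits one index (for example, the list of pages in a wiki category) over several keys. It is a minimal sketch under stated assumptions, not the authors' implementation: the in-memory dict standing in for the store, the hypothetical `PartitionedIndex` class, the `name:part` key format and the CRC32-based hash partitioning are all choices made only for this example, and transactions and Scalaris itself are ignored.

```python
import zlib
from typing import Dict, List, Set


class PartitionedIndex:
    """Hypothetical sketch: one secondary index (e.g. pages in a category)
    split across several keys of a flat key-value namespace.
    A plain dict stands in for the distributed store (assumption)."""

    def __init__(self, store: Dict[str, Set[str]], name: str, partitions: int) -> None:
        self.store = store
        self.name = name
        self.partitions = partitions

    def _key(self, member: str) -> str:
        # CRC32 gives a stable partition number across runs,
        # unlike Python's per-process salted hash().
        part = zlib.crc32(member.encode("utf-8")) % self.partitions
        return f"{self.name}:{part}"

    def add(self, member: str) -> str:
        """Insert a member; only a single partition key is written.
        Returns the key that was written."""
        key = self._key(member)
        self.store.setdefault(key, set()).add(member)
        return key

    def remove(self, member: str) -> None:
        """Delete a member from the one partition that can hold it."""
        self.store.get(self._key(member), set()).discard(member)

    def members(self) -> List[str]:
        """A full read must fetch and merge every partition."""
        result: Set[str] = set()
        for part in range(self.partitions):
            result |= self.store.get(f"{self.name}:{part}", set())
        return sorted(result)


if __name__ == "__main__":
    store: Dict[str, Set[str]] = {}
    idx = PartitionedIndex(store, "cat:Physics", 4)
    for page in ["Atom", "Boson", "Quark"]:
        idx.add(page)
    print(idx.members())  # ['Atom', 'Boson', 'Quark']
```

In this sketch each insert touches only one partition key, so two transactions adding different pages will usually write different keys and are less likely to conflict; each partition also stays smaller than a single monolithic index value. The trade-off is that reading the whole index requires fetching and merging all partitions.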