• DocumentCode
    3600779
  • Title

    The Design and Implementations of Locality-Aware Approximate Queries in Hybrid Storage Systems

  • Author

    Yu Hua ; Bin Xiao ; Xue Liu ; Dan Feng

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Huazhong Univ. of Sci. & Technol., Wuhan, China
  • Volume
    26
  • Issue
    11
  • fYear
    2015
  • Firstpage
    3194
  • Lastpage
    3207
  • Abstract
    Cloud computing applications must handle huge volumes of data and therefore need accurate and fast approximate queries to enhance system scalability and improve quality of service. Locality-sensitive hashing (LSH) can support approximate queries, but it suffers from imbalanced load and space inefficiency among distributed data servers, which severely limits query accuracy and incurs long query latency between users and cloud servers. In this paper, we propose a novel scheme, called NEST, which offers easy-to-use and cost-effective approximate queries for cloud computing. The novelty of NEST is to leverage cuckoo-driven locality-sensitive hashing to find similar items, which are then placed close together through a cuckoo-driven method to obtain load-balanced buckets in hash tables. NEST hence carries out flat and manageable addressing in adjacent buckets, and obtains constant-scale query complexity even in the worst case. The benefits of NEST include increased space utilization and fast query response. Moreover, due to the salient property of flat addressing in NEST, we implement the NEST design in a real hybrid storage system, which consists of DRAM, SSD, and hard disk. The flat addressing allows efficient operations in SSD to improve system performance. We argue that a proper “division of labor” among DRAM, SSD, and hard disk in the hybrid and heterogeneous storage hierarchy is desperately needed to strike an optimal balance and remove the indexing bottleneck. Theoretical analysis and extensive experiments (on LANL and Microsoft metadata) in a large-scale cloud testbed demonstrate the salient properties of NEST in meeting the needs of approximate query service in cloud computing environments. We have released the source code of NEST as open source for public use.
  • Keywords
    cloud computing; meta data; query processing; resource allocation; storage management; DRAM; LANL metadata; LSH; Microsoft metadata; NEST design; SSD; approximate query service; cloud computing environments; cloud servers; constant-scale query complexity; cost-effective approximate queries; cuckoo-driven locality-sensitive hashing; cuckoo-driven method; distributed data servers; flat addressing; hard disk; hash tables; heterogeneous storage hierarchy; hybrid storage system; imbalanced load; indexing bottleneck; labor division; large-scale cloud testbed; load-balancing buckets; locality-aware approximate queries; open-source codes; optimal balance; quality of service; query accuracy; query latency; query response; space inefficiency; space utilization; system performance; system scalability; Artificial neural networks; Cloud computing; Complexity theory; Hard disks; Random access memory; Standards; Vectors; Hybrid storage systems; approximate queries; locality;
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2014.2367497
  • Filename
    6948249