Title :
Semantic-Aware Metadata Organization Paradigm in Next-Generation File Systems
Author :
Hua, Yu ; Jiang, Hong ; Zhu, Yifeng ; Feng, Dan ; Tian, Lei
Author_Institution :
Wuhan Nat. Lab. for Optoelectron., Huazhong Univ. of Sci. & Technol., Wuhan, China
Abstract :
Existing data storage systems based on the hierarchical directory-tree organization do not meet the scalability and functionality requirements of exponentially growing data sets and increasingly complex metadata queries in large-scale, exabyte-level file systems with billions of files. This paper proposes SmartStore, a novel decentralized semantic-aware metadata organization that exploits the semantics of files' metadata to judiciously aggregate correlated files into semantic-aware groups using information retrieval tools. The key idea of SmartStore is to limit the search scope of a complex metadata query to one group or a minimal number of semantically correlated groups, avoiding or alleviating brute-force search across the entire system. The decentralized design of SmartStore can improve system scalability and reduce query latency for complex queries, including range and top-k queries. It also facilitates semantic-aware caching and conventional filename-based point queries. We have implemented a prototype of SmartStore, and extensive experiments based on real-world traces show that it significantly improves system scalability and reduces query latency compared with database approaches. To the best of our knowledge, this is the first study on implementing complex queries in large-scale file systems.
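The abstract states SmartStore's key idea (confine a complex query to a few semantically correlated groups) only at a high level. The following is a minimal sketch of why grouping helps, not the authors' implementation. It assumes files are described by hypothetical numeric metadata attributes, replaces the information-retrieval grouping with a simple nearest-seed assignment, and assumes each group keeps per-attribute min/max bounds. Those bounds let a range query skip non-overlapping groups and let a top-k query stop once a group's lower-bound distance exceeds the current k-th best. All names (`FileMeta`, `Group`, `build_groups`, `range_query`, `top_k`) are hypothetical.

```python
import heapq
import math
from dataclasses import dataclass, field


@dataclass
class FileMeta:
    name: str
    attrs: tuple  # hypothetical numeric metadata, e.g. (size_kb, mtime_day)


@dataclass
class Group:
    members: list = field(default_factory=list)
    lo: list = field(default_factory=list)  # per-attribute minimum over members
    hi: list = field(default_factory=list)  # per-attribute maximum over members

    def add(self, f: FileMeta) -> None:
        if not self.members:
            self.lo, self.hi = list(f.attrs), list(f.attrs)
        else:
            self.lo = [min(a, b) for a, b in zip(self.lo, f.attrs)]
            self.hi = [max(a, b) for a, b in zip(self.hi, f.attrs)]
        self.members.append(f)

    def overlaps(self, qlo, qhi) -> bool:
        # True if the group's bounding box intersects the query range
        return all(lo_i <= qh and hi_i >= ql
                   for lo_i, hi_i, ql, qh in zip(self.lo, self.hi, qlo, qhi))

    def min_dist(self, point) -> float:
        # Lower bound on the Euclidean distance from point to any member
        return math.sqrt(sum(max(lo_i - p, 0, p - hi_i) ** 2
                             for lo_i, hi_i, p in zip(self.lo, self.hi, point)))


def build_groups(files, seeds):
    # Stand-in for semantic grouping: assign each file to its nearest seed
    groups = [Group() for _ in seeds]
    for f in files:
        i = min(range(len(seeds)), key=lambda j: math.dist(f.attrs, seeds[j]))
        groups[i].add(f)
    return [g for g in groups if g.members]


def range_query(groups, qlo, qhi):
    hits, visited = [], 0
    for g in groups:
        if not g.overlaps(qlo, qhi):
            continue  # whole group pruned without scanning its members
        visited += 1
        hits += [f.name for f in g.members
                 if all(a <= v <= b for v, a, b in zip(f.attrs, qlo, qhi))]
    return hits, visited


def top_k(groups, point, k):
    best = []  # max-heap of the k nearest so far, stored as (-distance, name)
    visited = 0
    for g in sorted(groups, key=lambda g: g.min_dist(point)):
        if len(best) == k and g.min_dist(point) > -best[0][0]:
            break  # no remaining group can contain a closer file
        visited += 1
        for f in g.members:
            d = math.dist(f.attrs, point)
            if len(best) < k:
                heapq.heappush(best, (-d, f.name))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, f.name))
    return [name for _, name in sorted(best, reverse=True)], visited


files = [FileMeta("a.log", (10, 1)), FileMeta("b.log", (12, 2)),
         FileMeta("c.mp4", (900, 50)), FileMeta("d.mp4", (950, 52))]
groups = build_groups(files, seeds=[(10, 1), (900, 50)])
print(range_query(groups, (0, 0), (100, 10)))
print(top_k(groups, (11, 1), 1))
```

In this toy example both queries scan only one of the two groups. The pruning is only as effective as the grouping is tight, which is why the quality of the semantic grouping matters.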
Keywords :
file organisation; meta data; query processing; SmartStore organization; data storage system; exabyte-level file system; filename-based point query; hierarchical directory-tree organization; information retrieval tools; metadata query; next-generation file system; query latency; range query; semantic-aware caching; semantic-aware metadata organization; top-k query; Computational complexity; Correlation; Indexes; Large scale integration; Organizations; Semantics; File systems; metadata management; performance evaluation; scalability
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
DOI :
10.1109/TPDS.2011.169