Author_Institution :
DEIB, Politecnico di Milano, Milan, Italy
Abstract :
With the widespread adoption of the Cloud, many providers now offer storage-as-a-service solutions. Each of these solutions is characterized by a different storage capacity and set of features. They are also offered under various business models: typically, users can choose between free plans (with a limited amount of space) and paid plans. When free-plan users run low on storage capacity, they tend to subscribe to new free plans from other providers, thus increasing so-called data fragmentation. This phenomenon greatly increases the complexity of file management. This paper proposes a solution to the data fragmentation problem by describing an innovative approach that allows a distributed file system to be deployed on top of different SaaS storage accounts offered by different providers. This approach not only lowers the complexity of data management by providing the user with a single transparent storage solution, but also provides features such as full-text search, file classification and categorization, and data analytics (MapReduce) on top of these SaaS storage accounts. Furthermore, the approach proposes a new way to address the data privacy and security issues typically associated with SaaS storage accounts.
Keywords :
cloud computing; data analysis; data privacy; security of data; storage management; MapReduce; data analytics; data fragmentation; data management complexity; data security; distributed file system; file categorization; file classification; file management complexity; full-text search; heterogeneous SaaS storage platform; single transparent storage solution; storage capacity; Distributed databases; File systems; Memory; Protocols; Security; Big Data; Hadoop Distributed File System (HDFS); Storage as a Service