• DocumentCode
    168597
  • Title
    MapReduce Analysis for Cloud-Archived Data
  • Author
    Palanisamy, Balaji ; Singh, Ashutosh ; Mandagere, Nagapramod ; Alatorre, Gabriel ; Liu, Ling
  • fYear
    2014
  • fDate
    26-29 May 2014
  • Firstpage
    51
  • Lastpage
    60
  • Abstract
    Public storage clouds have become a popular choice for archiving certain classes of enterprise data, for example, application and infrastructure logs. These logs contain sensitive information such as IP addresses and user logins, so regulatory and security requirements often require the data to be encrypted before it is moved to the cloud. To derive business value from such data, analytics systems (e.g., Hadoop/MapReduce) first download the data from these public clouds, decrypt it, and then process it at the secure enterprise site. We propose VNCache: an efficient solution for MapReduce analysis of such cloud-archived log data without requiring an a priori data transfer and loading into the local Hadoop cluster. VNCache dynamically integrates cloud-archived data into a virtual namespace at the enterprise Hadoop cluster. Through a seamless data streaming and prefetching model, Hadoop jobs can begin execution as soon as they are launched, without requiring any a priori downloading. With VNCache's accurate prefetching and caching, jobs often run on a locally cached copy of the data block, significantly improving performance. When no longer needed, data is safely evicted from the enterprise cluster, reducing the total storage footprint. Uniquely, VNCache is implemented with no changes to the Hadoop application stack.
  • Keywords
    cache storage; cloud computing; parallel programming; storage management; Hadoop application stack; Hadoop cluster; MapReduce analysis; VNCache; analytics systems; cloud-archived data; data streaming; encryption; enterprise data archiving; prefetching model; public storage clouds; regulatory requirements; security requirements; storage footprint reduction; virtual namespace; Cloud computing; Cryptography; Data models; Heuristic algorithms; Monitoring; Prefetching; Caching; Cloud Computing; Filesystem; MapReduce
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster, Cloud and Grid Computing (CCGrid), 2014 14th IEEE/ACM International Symposium on
  • Conference_Location
    Chicago, IL
  • Type
    conf
  • DOI
    10.1109/CCGrid.2014.13
  • Filename
    6846440