• DocumentCode
    2792952
  • Title
    Storage Optimization for Large-Scale Distributed Stream Processing Systems
  • Author
    Hildrum, Kirsten ; Douglis, Fred ; Wolf, Joel L. ; Yu, Philip ; Fleischer, Lisa ; Katta, Akshay
  • Author_Institution
    IBM Thomas J. Watson Res. Center, Hawthorne, NY
  • fYear
    2007
  • fDate
    26-30 March 2007
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    We consider storage in an extremely large-scale distributed computer system designed for stream processing applications. In such systems, incoming data and intermediate results may need to be stored to enable future analyses. The quantity of such data would dominate even the largest storage system. Thus, a mechanism is needed to keep the most useful data. One recently introduced approach is to employ retention value functions, which effectively assign each data object a value that changes over time. Storage space is then reclaimed automatically by deleting data of lowest current value. In such large systems, there can naturally be multiple file systems available, each with different properties. Choosing the right file system for a given incoming data stream presents a challenge. In this paper we provide a novel and effective scheme for optimizing the placement of data within a distributed storage subsystem employing retention value functions. The goal is to keep the data of highest overall value, while simultaneously balancing the read load to the file system.
  • Keywords
    optimisation; resource allocation; storage management; file system; large-scale distributed stream processing system; storage optimization; Application software; Distributed computing; Educational institutions; File systems; Information services; Internet; Large-scale systems; Relational databases; Streaming media; Web sites; Storage management; file assignment problem; load balancing; optimization; streaming systems; theory
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International
  • Conference_Location
    Long Beach, CA
  • Print_ISBN
    1-4244-0910-1
  • Electronic_ISBN
    1-4244-0910-1
  • Type
    conf
  • DOI
    10.1109/IPDPS.2007.370633
  • Filename
    4228361