• DocumentCode
    2414457
  • Title

    Distributed Stream Processing Analysis in High Availability Context

  • Author

    Gorawski, Marcin ; Marks, Pawel

  • Author_Institution
    Inst. of Comput. Sci., Silesian Univ. of Technol., Gliwice
  • fYear
    2007
  • fDate
    10-13 April 2007
  • Firstpage
    61
  • Lastpage
    68
  • Abstract
    Not so long ago, data warehouses were used to process data sets loaded periodically during the ETL (extraction, transformation and loading) process. Two kinds of ETL processes could be distinguished: full and incremental. Now we often have to process real-time data and analyse them almost on the fly, so that the analyses are always up to date. There are many possible applications for real-time data warehouses. In most cases two features are important: delivering data to the warehouse as quickly as possible, and not losing any tuple in case of failures. In this paper we propose an architecture for gathering and processing data from geographically distributed data sources. We present a theoretical analysis, a mathematical model of a data source, some rules for configuring the system modules, and the results of experiments. At the end of the paper our future plans are described briefly.
  • Keywords
    data acquisition; data warehouses; distributed processing; ETL processes; data gathering; data processing; distributed stream processing analysis; real-time data warehouses; Application software; Availability; Computer science; Data analysis; Data mining; Data warehouses; Energy consumption; Mathematical model; Meter reading; Patient monitoring
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Availability, Reliability and Security, 2007. ARES 2007. The Second International Conference on
  • Conference_Location
    Vienna
  • Print_ISBN
    0-7695-2775-2
  • Type

    conf

  • DOI
    10.1109/ARES.2007.72
  • Filename
    4159788