• DocumentCode
    822930
  • Title

    Modeling and Tracking of Transaction Flow Dynamics for Fault Detection in Complex Systems

  • Author

    Jiang, Guofei ; Chen, Haifeng ; Yoshihira, Kenji

  • Author_Institution
    NEC Labs. America Inc., Princeton, NJ
  • Volume
    3
  • Issue
    4
  • fYear
    2006
  • Firstpage
    312
  • Lastpage
    326
  • Abstract
    With the prevalence of Internet services and the increase of their complexity, there is a growing need to improve their operational reliability and availability. While a large amount of monitoring data can be collected from systems for fault analysis, it is hard to correlate this data effectively across distributed systems and observation time. In this paper, we analyze the mass characteristics of user requests and propose a novel approach to model and track transaction flow dynamics for fault detection in complex information systems. We measure the flow intensity at multiple checkpoints inside the system and apply system identification methods to model transaction flow dynamics between these measurements. With the learned analytical models, a model-based fault detection and isolation method is applied to track the flow dynamics in real time for fault detection. We also propose an algorithm to automatically search and validate the dynamic relationship between randomly selected monitoring points. Our algorithm enables systems to have self-cognition capability for system management. Our approach is tested in a real system with a list of injected faults. Experimental results demonstrate the effectiveness of our approach and algorithms.
  • Keywords
    Internet; data handling; distributed processing; fault diagnosis; information systems; monitoring; transaction processing; Internet service; complex information system; distributed system; fault analysis; fault detection; operational availability; operational reliability; self-cognition capability; system identification; system management; transaction flow dynamics; Analytical models; Availability; Fault location; Fluid flow measurement; Information analysis; Web and internet services; dynamic relationship; flow intensity and dynamics; model validation; model-based FDI; regression model
  • fLanguage
    English
  • Journal_Title
    Dependable and Secure Computing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5971
  • Type

    jour

  • DOI
    10.1109/TDSC.2006.52
  • Filename
    4012644