• DocumentCode
    86917
  • Title

    Pythia: detection, localization, and diagnosis of performance problems

  • Author

    Kanuparthy, Partha ; Lee, Daewoo ; Matthews, William ; Dovrolis, Constantine ; Zarifzadeh, Sajjad

  • Volume
    51
  • Issue
    11
  • fYear
    2013
  • fDate
    Nov. 2013
  • Firstpage
    55
  • Lastpage
    62
  • Abstract
    Performance problem diagnosis is a critical part of network operations in ISPs. Service providers use a combination of approaches to troubleshoot performance of their networks, such as active monitoring infrastructure and data collection (SNMP, Netflow, router logs, table dumps, etc.) along with customer trouble tickets. Some of these approaches, however, do not scale to wide area inter-domain networks due to unavailability of such data; moreover, troubleshooting is either reactive (e.g., driven by customer complaints) or (typically) automated using static thresholds. In this article, we describe the design and implementation of a system for root cause analysis and localization of performance problems in ISP networks. Our approach works with legacy monitoring infrastructure (e.g., perfSONAR deployments) and does not need specialized active probing tools or network data. Our system provides a language for network operators to define performance problem signatures, and provides near-real-time performance diagnosis and localization. We describe our deployment of Pythia in perfSONAR monitors in production networks in Georgia, covering over 250 inter-domain paths.
  • Keywords
    monitoring; performance evaluation; real-time systems; telecommunication network routing; wide area networks; ISP networks; Netflow; Pythia; SNMP; active monitoring infrastructure; active probing tools; customer complaints; customer trouble tickets; data collection; legacy monitoring infrastructure; localization; near-real-time performance diagnosis; network data; network operations; network operators; network performance; perfSONAR deployments; performance problem diagnosis; performance problem signatures; performance problems; root cause analysis; router logs; service providers; static thresholds; table dumps; wide area inter-domain networks; Databases; Internet service providers; Metasearch; Telecommunication network management; Time series analysis; Web and internet services;
  • fLanguage
    English
  • Journal_Title
    Communications Magazine, IEEE
  • Publisher
    IEEE
  • ISSN
    0163-6804
  • Type

    jour

  • DOI
    10.1109/MCOM.2013.6658653
  • Filename
    6658653