• DocumentCode
    1809169
  • Title
    Do we need a perfect ground-truth for benchmarking Internet traffic classifiers?
  • Author
    Rosario Oliveira, M.; Neves, Joao; Valadas, Rui; Salvador, Paulo
  • Author_Institution
    Dept. de Mat., Univ. de Lisboa, Lisbon, Portugal
  • fYear
    2015
  • fDate
    April 26 - May 1, 2015
  • Firstpage
    2452
  • Lastpage
    2460
  • Abstract
    The classification of Internet traffic using supervised or semi-supervised statistical learning techniques, both for anomaly detection and identification of Internet applications, has been impaired by difficulties in obtaining a reliable ground-truth, required both to train the classifier and to evaluate its performance. A perfect ground-truth is increasingly difficult, or sometimes impossible, to obtain due to the growing percentage of cyphered traffic, the sophistication of network attacks, and the constant updates of Internet applications. In this paper, we study the impact of the ground-truth on training the classifier and estimating its performance measures. We show both theoretically and through simulation that ground-truth imperfections can severely bias the performance estimates. We then propose a latent class model that overcomes this problem by combining estimates of several classifiers over the same dataset. The model is evaluated using a high-quality dataset that includes the most representative Internet applications and network attacks. The results show that our latent class model produces very good performance estimates under mild levels of ground-truth imperfection, and can thus be used to correctly benchmark Internet traffic classifiers when only an imperfect ground-truth is available.
  • Keywords
    Internet; learning (artificial intelligence); statistical analysis; telecommunication traffic; Internet traffic classification; ground-truth imperfection; latent class model; semisupervised statistical learning technique; Computers; Conferences; Estimation; IP networks; Internet; Standards; Training; Anomaly Detection; Identification of Internet Applications; Latent Class Models; Traffic Classification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Computer Communications (INFOCOM), 2015 IEEE Conference on
  • Conference_Location
    Kowloon
  • Type
    conf
  • DOI
    10.1109/INFOCOM.2015.7218634
  • Filename
    7218634
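The abstract's central claim, that an imperfect ground-truth can severely bias performance estimates, can be illustrated with a small sketch. The Python below is not the authors' derivation and does not reproduce their latent class model. It assumes a binary problem (for example, attack vs. normal traffic), a classifier and a reference labelling whose errors are independent given the true class, and uses hypothetical names (`Rates`, `apparent_rates`, `simulate`). It computes the sensitivity and specificity a classifier appears to have when scored against the imperfect reference, and checks the closed form by Monte Carlo simulation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Sensitivity and specificity measured against the TRUE class."""
    se: float  # P(label positive | truly positive)
    sp: float  # P(label negative | truly negative)


def apparent_rates(prev: float, clf: Rates, ref: Rates) -> Rates:
    """Rates a classifier appears to have when scored against an imperfect
    reference. Assumption (hypothetical, for illustration): classifier and
    reference errors are independent given the true class."""
    # Joint P(C=1, R=1) and marginal P(R=1), summing over the true class
    p_c1_r1 = prev * clf.se * ref.se + (1 - prev) * (1 - clf.sp) * (1 - ref.sp)
    p_r1 = prev * ref.se + (1 - prev) * (1 - ref.sp)
    # Joint P(C=0, R=0) and marginal P(R=0)
    p_c0_r0 = prev * (1 - clf.se) * (1 - ref.se) + (1 - prev) * clf.sp * ref.sp
    p_r0 = 1 - p_r1
    return Rates(se=p_c1_r1 / p_r1, sp=p_c0_r0 / p_r0)


def simulate(prev: float, clf: Rates, ref: Rates,
             n: int = 200_000, seed: int = 0) -> Rates:
    """Monte Carlo check of apparent_rates under the same assumptions."""
    rng = random.Random(seed)
    tp = pos = tn = neg = 0
    for _ in range(n):
        truth = rng.random() < prev
        # Each labeller errs independently, given the true class
        c = rng.random() < (clf.se if truth else 1 - clf.sp)
        r = rng.random() < (ref.se if truth else 1 - ref.sp)
        if r:
            pos += 1
            tp += c          # classifier agrees with a reference positive
        else:
            neg += 1
            tn += not c      # classifier agrees with a reference negative
    return Rates(se=tp / pos, sp=tn / neg)


if __name__ == "__main__":
    clf = Rates(se=0.95, sp=0.95)   # true performance of the classifier
    ref = Rates(se=0.90, sp=0.95)   # imperfect ground-truth
    print(apparent_rates(0.1, clf, ref))  # se ≈ 0.65, sp ≈ 0.94
    print(simulate(0.1, clf, ref))
```

With 10% positive traffic, a classifier whose true sensitivity is 0.95, scored against a reference with sensitivity 0.90 and specificity 0.95, appears to have a sensitivity of only about 0.65. Roughly a third of the reference's positives are truly negative; the classifier correctly rejects most of them, yet each rejection is counted as a miss.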