• DocumentCode
    555753
  • Title
    Wiki-Watchdog: Anomaly Detection in Wikipedia Through a Distributional Lens
  • Author
    Arackaparambil, Chrisil ; Yan, Guanhua
  • Volume
    1
  • fYear
    2011
  • fDate
    22-27 Aug. 2011
  • Firstpage
    257
  • Lastpage
    264
  • Abstract
    Wikipedia has become a standard source of reference online, and many people (some unknowingly) now trust this corpus of knowledge as an authority to fulfil their information requirements. In doing so, they task the human contributors of Wikipedia with maintaining the accuracy of articles, a job that these contributors have been performing admirably. We study the problem of monitoring the Wikipedia corpus with the goal of automated, online anomaly detection. We present Wiki-Watchdog, an efficient distribution-based methodology that monitors distributions of revision activity for changes. We show that using our methods it is possible to detect the activity of bots, flash events, and outages as they occur. Our methods are proposed to support the monitoring of the contributors. They are useful to speed up anomaly detection and to identify events that are hard to detect manually. We show the efficacy and the low false-positive rate of our methods by experiments on the revision history of Wikipedia. Our results show that distribution-based anomaly detection has a higher detection rate than traditional methods based on either volume or entropy alone. Unlike previous work on anomaly detection in information networks that worked with a static network graph, our methods consider the network as it evolves and monitor properties of the network for changes. Although our methodology is developed and evaluated on Wikipedia, we believe it is an effective generic anomaly detection framework in its own right.
  • Keywords
    Web sites; network theory (graphs); security of data; Wiki-watchdog; Wikipedia corpus; contributor monitoring; distribution-based methodology; distributional lens; entropy; information networks; information requirements; online anomaly detection; static network graph; Electronic publishing; Encyclopedias; Entropy; Internet; Measurement; Monitoring
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence and Intelligent Agent Technology (WI-IAT), 2011 IEEE/WIC/ACM International Conference on
  • Conference_Location
    Lyon
  • Print_ISBN
    978-1-4577-1373-6
  • Electronic_ISBN
    978-0-7695-4513-4
  • Type
    conf
  • DOI
    10.1109/WI-IAT.2011.86
  • Filename
    6036762