• DocumentCode
    54422
  • Title

    Identifying Redundancy and Exposing Provenance in Crowdsourced Data Analysis

  • Author

    Willett, Wesley ; Ginosar, Shiry ; Steinitz, Avital ; Hartmann, Bjorn ; Agrawala, Maneesh

  • Author_Institution
    INRIA, Sophia-Antipolis, France
  • Volume
    19
  • Issue
    12
  • fYear
    2013
  • fDate
    Dec. 2013
  • Firstpage
    2198
  • Lastpage
    2206
  • Abstract
    We present a system that lets analysts use paid crowd workers to explore data sets and helps analysts interactively examine and build upon workers' insights. We take advantage of the fact that, for many types of data, independent crowd workers can readily perform basic analysis tasks like examining views and generating explanations for trends and patterns. However, workers operating in parallel can often generate redundant explanations. Moreover, because workers have different competencies and domain knowledge, some responses are likely to be more plausible than others. To efficiently utilize the crowd's work, analysts must be able to quickly identify and consolidate redundant responses and determine which explanations are the most plausible. In this paper, we demonstrate several crowd-assisted techniques to help analysts make better use of crowdsourced explanations: (1) We explore crowd-assisted strategies that utilize multiple workers to detect redundant explanations. We introduce color clustering with representative selection, a strategy in which multiple workers cluster explanations and we automatically select the most-representative result, and show that it generates clusterings that are as good as those produced by experts. (2) We capture explanation provenance by introducing highlighting tasks and capturing workers' browsing behavior via an embedded web browser, and refine that provenance information via source-review tasks. We expose this information in an explanation-management interface that allows analysts to interactively filter and sort responses, select the most plausible explanations, and decide which to explore further.
  • Keywords
    data analysis; groupware; information filtering; online front-ends; pattern clustering; sorting; color clustering with representative selection; crowd-assisted techniques; crowdsourced data analysis; crowdsourced explanations; embedded Web browser; explanation provenance expose; explanation-management interface; provenance information; redundancy explanation identification; response filtering; response sorting; source-review tasks; workers browsing behavior; Clustering algorithms; Data analysis; Image color analysis; Market research; Redundancy; Social network services; Crowdsourcing; Social Data Analysis; Algorithms; Computer Graphics; Data Mining; Databases, Factual; Information Storage and Retrieval; Internet; User-Computer Interface
  • fLanguage
    English
  • Journal_Title
    Visualization and Computer Graphics, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1077-2626
  • Type

    jour

  • DOI
    10.1109/TVCG.2013.164
  • Filename
    6634191