• DocumentCode
    3717173
  • Title

    Brown Dog: Leveraging everything towards autocuration

  • Author

    Smruti Padhy;Greg Jansen;Jay Alameda;Edgar Black;Liana Diesendruck;Mike Dietze;Praveen Kumar;Rob Kooper;Jong Lee;Rui Liu;Richard Marciano;Luigi Marini;Dave Mattson;Barbara Minsker;Chris Navarro;Marcus Slavenas;William Sullivan;Jason Votava;Inna Zharnitsky

  • Author_Institution
    National Center for Supercomputing Applications University of Illinois at Urbana-Champaign
  • fYear
    2015
  • Firstpage
    493
  • Lastpage
    500
  • Abstract
    We present Brown Dog, two highly extensible services that aim to leverage any existing pieces of code, libraries, services, or standalone software (past or present) to provide users with a simple-to-use, programmable means of automated aid in the curation and indexing of distributed collections of uncurated and/or unstructured data. Such collections, which encompass large varieties of data in addition to large amounts of data, pose a significant challenge within modern-day "Big Data" efforts. The two services, the Data Access Proxy (DAP) and the Data Tilling Service (DTS), focus on format conversions and content-based analysis/extraction, respectively. They wrap relevant conversion and extraction operations within arbitrary software, manage their deployment in an elastic manner, and manage job execution from behind a deliberately compact REST API. We describe the motivation and scientific drivers for such services, the constituent components that allow arbitrary software/code to be used and managed, and lastly an evaluation of the system's capabilities and scalability.
  • Keywords
    "Data mining","Metadata","Software","Libraries","Big data","Indexing"
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (Big Data), 2015 IEEE International Conference on
  • Type

    conf

  • DOI
    10.1109/BigData.2015.7363791
  • Filename
    7363791