• DocumentCode
    633070
  • Title

    Approximate Incremental Big-Data Harmonization

  • Author

    Agarwal, Prabhakar ; Shroff, Gautam ; Malhotra, Pankaj

  • Author_Institution
    TCS Innovation Labs., Tata Consultancy Services Ltd., Noida, India
  • fYear
    2013
  • fDate
    June 27 2013-July 2 2013
  • Firstpage
    118
  • Lastpage
    125
  • Abstract
    The needs of 'big data analytics' increasingly require IT organizations to ingest, process, and extract business insights from ever larger volumes of data that arrive far more rapidly than before, as well as from new sources such as social media, mobile devices, and sensors. However, in order to extract insights from diverse information feeds from multiple, often unrelated sources, these first need to be correlated or harmonized to a common level of granularity. We formally define this commonly arising data harmonization problem. We show how to correlate disparate data sources using map-reduce, but in an approximate and/or incremental manner as often required in practice. We motivate our techniques through a real-life enterprise data-harmonization case study for which we describe our performance results on big-data technologies, namely, Map Reduce, Hadoop and PIG.
  • Keywords
    business data processing; data analysis; Hadoop; IT organizations; Map-Reduce; PIG; approximate incremental big-data harmonization; big data analytics; big-data technologies; business insight extraction; business insight ingestion; business insight processing; enterprise data-harmonization; Bismuth; Business; Correlation; Current measurement; Data mining; Indexes; Approximate-Join; BigData; ETL-MR; Harmonization; Incremental ETL; Map-Reduce
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (BigData Congress), 2013 IEEE International Congress on
  • Conference_Location
    Santa Clara, CA
  • Print_ISBN
    978-0-7695-5006-0
  • Type

    conf

  • DOI
    10.1109/BigData.Congress.2013.24
  • Filename
    6597127