DocumentCode
633070
Title
Approximate Incremental Big-Data Harmonization
Author
Agarwal, Prabhakar; Shroff, Gautam; Malhotra, Pankaj
Author_Institution
TCS Innovation Labs., Tata Consultancy Services Ltd., Noida, India
fYear
2013
fDate
June 27 2013-July 2 2013
Firstpage
118
Lastpage
125
Abstract
The needs of "big data analytics" increasingly require IT organizations to ingest, process, and extract business insights from ever larger volumes of data that arrive far more rapidly than before, as well as from new sources such as social media, mobile devices, and sensors. However, to extract insights from diverse information feeds drawn from multiple, often unrelated sources, the feeds must first be correlated or harmonized to a common level of granularity. We formally define this commonly arising data harmonization problem. We show how to correlate disparate data sources using map-reduce, in an approximate and/or incremental manner, as is often required in practice. We motivate our techniques through a real-life enterprise data-harmonization case study, for which we report performance results on big-data technologies, namely Map-Reduce, Hadoop, and PIG.
Keywords
business data processing; data analysis; Hadoop; IT organizations; Map-Reduce; PIG; approximate incremental big-data harmonization; big data analytics; big-data technologies; business insight extraction; business insight ingestion; business insight processing; enterprise data-harmonization; Bismuth; Business; Correlation; Current measurement; Data mining; Indexes; Approximate-Join; BigData; ETL-MR; Harmonization; Incremental ETL
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data (BigData Congress), 2013 IEEE International Congress on
Conference_Location
Santa Clara, CA
Print_ISBN
978-0-7695-5006-0
Type
conf
DOI
10.1109/BigData.Congress.2013.24
Filename
6597127