Title :
A Domain-Driven, Generative Data Model for Big Pet Store
Author :
Nowling, Ronald J. ; Vyas, Jay
Author_Institution :
Red Hat Inc., Raleigh, NC, USA
Abstract :
Generating large amounts of semantically-rich data for testing big data workflows is paramount for scalable performance benchmarking and quality assurance in modern machine-learning and analytics workloads. The most obvious use case for such a generative algorithm is in conjunction with a big data application blueprint, which can be used by developers (to test their emerging big data solutions) as well as end users (as a starting point for validating infrastructure installations, building novel applications, and learning analytics methods). We present a new domain-driven, generative data model for Big Pet Store, a big data application blueprint for the Hadoop ecosystem included in the Apache Bigtop distribution. We describe the model and demonstrate its ability to generate semantically-rich data at variable scale ranging from a single machine to a large cluster. We validate the model by using the generated data to answer questions about customer locations and purchasing habits for a fictional targeted advertising campaign, a common business use case.
Keywords :
Big Data; data models; learning (artificial intelligence); program testing; public domain software; quality assurance; software quality; workflow management software; Apache big top distribution; Big Data application blueprint; Big data workflow testing; BigPetStore; Hadoop ecosystem; analytics workloads; domain-driven generative data model; generative algorithm; machine-learning; scalable performance benchmarking; semantically-rich data; Benchmark testing; Big data; Data models; Generators; Hidden Markov models; Probability density function; benchmarking; big data; data generation; probabilistic models; synthetic data sets; testing
Conference_Title :
Big Data and Cloud Computing (BdCloud), 2014 IEEE Fourth International Conference on
Conference_Location :
Sydney, NSW
DOI :
10.1109/BDCloud.2014.38