DocumentCode
734236
Title
WebScalding: A Framework for Big Data Web Services
Author
Jacob, Ferosh ; Johnson, Aaron ; Javed, Faizan ; Meng Zhao ; McNair, Matt
Author_Institution
DataScience R&D, Norcross, GA, USA
fYear
2015
fDate
March 30 2015-April 2 2015
Firstpage
493
Lastpage
498
Abstract
CareerBuilder (CB) currently has 50 million active resumes and 2 million active job postings. Our team has been working to provide the most relevant jobs for job seekers and the most relevant resumes for employers and recruiters. These goals often lead to Big Data problems. In this paper, we introduce WebScalding, a Big Data framework designed and developed to solve some of the common large-scale data challenges at CB. The WebScalding framework raises the level of abstraction of Twitter's Scalding framework to adapt it to CB's unique challenges. The WebScalding framework helps users by ensuring that: 1) all internal web services are available as Cascading pipe operations, 2) these pipe operations can read from our common data sources and create a pipe assembly, and 3) the pipe assembly thus created can be executed on the CB Hadoop cluster as well as on local machines without any changes. We describe WebScalding using three case studies taken from actual internal projects, which explain how data scientists at CB who are not well versed in Big Data tools and methodologies leverage WebScalding to design, implement, and test Big Data applications. We also compare the execution time of a WebScalding program with that of its sequential Python counterpart to illustrate the superlinear speedup of WebScalding programs.
Keywords
Big Data; Internet; Web services; data handling; parallel processing; social networking (online); Big Data Web services; CB Hadoop cluster; CareerBuilder; Twitter Scalding framework; WebScalding framework; cascading pipe operations; pipe assembly; sequential Python; Encyclopedias; Libraries; Resumes; XML
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data Computing Service and Applications (BigDataService), 2015 IEEE First International Conference on
Conference_Location
Redwood City, CA
Type
conf
DOI
10.1109/BigDataService.2015.53
Filename
7184921
Link To Document