• DocumentCode
    734236
  • Title

    WebScalding: A Framework for Big Data Web Services

  • Author

    Jacob, Ferosh ; Johnson, Aaron ; Javed, Faizan ; Zhao, Meng ; McNair, Matt

  • Author_Institution
    DataScience R&D, Norcross, GA, USA
  • fYear
    2015
  • fDate
    March 30 2015-April 2 2015
  • Firstpage
    493
  • Lastpage
    498
  • Abstract
    CareerBuilder (CB) currently has 50 million active resumes and 2 million active job postings. Our team has been working to provide the most relevant jobs for job seekers and the most relevant resumes for employers and recruiters. These goals often lead to Big Data problems. In this paper, we introduce WebScalding, a Big Data framework designed and developed to solve some of the common large-scale data challenges at CB. The WebScalding framework raises the level of abstraction of Twitter's Scalding framework to adapt to CB's unique challenges. The WebScalding framework helps users by ensuring that: 1) all internal web services are available as cascading pipe operations; 2) these pipe operations can read from our common data sources and create a pipe assembly; and 3) the pipe assembly thus created can be executed in the CB Hadoop cluster as well as on local machines without any changes. We describe WebScalding using three case studies taken from actual internal projects that explain how data scientists at CB who are not well versed in Big Data tools and methodologies leverage WebScalding to design, implement, and test Big Data applications. We also compare the execution time of a WebScalding program with that of its sequential Python counterpart to illustrate the superlinear speedup of WebScalding programs.
  • Keywords
    Big Data; Internet; Web services; data handling; parallel processing; social networking (online); Big Data Web services; CB Hadoop cluster; CareerBuilder; Twitter scalding framework; WebScalding framework; cascading pipe operations; pipe assembly; sequential Python; Big data; Encyclopedias; Libraries; Resumes; Web services; XML;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data Computing Service and Applications (BigDataService), 2015 IEEE First International Conference on
  • Conference_Location
    Redwood City, CA
  • Type

    conf

  • DOI
    10.1109/BigDataService.2015.53
  • Filename
    7184921