• DocumentCode
    3274622
  • Title
    Safely Managing Data Variety in Big Data Software Development
  • Author
    Cerqueus, Thomas ; Cunha de Almeida, Eduardo ; Scherzinger, Stefanie
  • Author_Institution
    INSA-Lyon, Univ. de Lyon, Lyon, France
  • fYear
    2015
  • fDate
    23-23 May 2015
  • Firstpage
    4
  • Lastpage
    10
  • Abstract
    We consider the task of building Big Data software systems, offered as software-as-a-service. These applications are commonly backed by NoSQL data stores that address the proverbial Vs of Big Data processing: NoSQL data stores can handle large volumes of data and many systems do not enforce a global schema, to account for structural variety in data. Thus, software engineers can design the data model on the go, a flexibility that is particularly crucial in agile software development. However, NoSQL data stores commonly do not yet account for the veracity of changes when it comes to changes in the structure of persisted data. Yet this is an inevitable consequence of agile software development. In most NoSQL-based application stacks, schema evolution is completely handled within the application code, usually involving object mapper libraries. Yet simple code refactorings, such as renaming a class attribute at the source code level, can cause data loss or runtime errors once the application has been deployed to production. We address this pain point by contributing type checking rules that we have implemented within an IDE plugin. Our plugin ControVol statically type checks the object mapper class declarations against the code release history. ControVol is thus capable of detecting common yet risky cases of mismatched data and schema, and can even suggest automatic fixes.
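The failure mode named in the abstract, a renamed class attribute silently losing persisted data, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `PlayerV1`/`PlayerV2` classes, the toy `load` mapper, and the `check_release` function are hypothetical and do not reproduce ControVol's actual type checking rules or any real object mapper API.

```python
from dataclasses import dataclass, fields


# Release 1 of a hypothetical entity class.
@dataclass
class PlayerV1:
    name: str
    score: int = 0


# Release 2: the attribute `score` was renamed to `points`.
@dataclass
class PlayerV2:
    name: str
    points: int = 0


def load(cls, stored):
    """Toy mapper: copy only the attributes the current class declares."""
    declared = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in stored.items() if k in declared})


def check_release(old_cls, new_cls):
    """Compare two class declarations and report risky schema changes."""
    old = {f.name: f.type for f in fields(old_cls)}
    new = {f.name: f.type for f in fields(new_cls)}
    warnings = []
    # An attribute missing from the new release means stored values are ignored.
    for attr in sorted(old.keys() - new.keys()):
        warnings.append(f"attribute '{attr}' removed: persisted values are dropped on load")
    # An attribute kept under a different type may fail at load time.
    for attr in sorted(old.keys() & new.keys()):
        if old[attr] != new[attr]:
            warnings.append(f"attribute '{attr}' changed type: load may fail at runtime")
    return warnings


stored = {"name": "ada", "score": 42}  # entity persisted by release 1
player = load(PlayerV2, stored)        # same entity loaded by release 2
print(player)                          # the stored score is no longer visible
print(check_release(PlayerV1, PlayerV2))
```

In this sketch the mismatch is found by comparing class declarations across releases before deployment, rather than by inspecting stored entities at runtime.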
  • Keywords
    Big Data; cloud computing; programming environments; software engineering; Big Data software development; ControVol; IDE plugin; NoSQL data stores; code release history; object mapper class declarations; software-as-a-service; type checking rules; Big data; History; Java; Loading; Production; Runtime; Software; NoSQL data stores; object mapping; schema evolution; type checking;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data Software Engineering (BIGDSE), 2015 IEEE/ACM 1st International Workshop on
  • Conference_Location
    Florence
  • Type
    conf
  • DOI
    10.1109/BIGDSE.2015.9
  • Filename
    7165991