Title :
Automation of the Validation, Anonymization, and Augmentation of Big Data from a Multi-year Driving Study
Author :
Wallace, Bruce ; Goubran, Rafik ; Knoefel, Frank ; Marshall, Shawn ; Porter, Michelle ; Harlow, Madelaine ; Puli, Akshay
Author_Institution :
Syst. & Comput. Eng., Carleton Univ., Ottawa, ON, Canada
Abstract :
The Candrive/Ozcandrive project is a long-term study, now entering its sixth year, focused on improving the safety of older drivers. The study includes 256 older drivers in the Ottawa area and is an example of a longitudinal study that generates big data sensor information recorded from the participants' vehicles. This paper uses the Candrive data and proposes solutions that would enable differential privacy, including a theoretical open-access model for the data that uses k-anonymity techniques for any combination of seven parameters that have identifiable attributes. The data set is captured by an in-vehicle sensor that records Global Positioning System (GPS) and On-Board Diagnostics II (OBDII) data for every second that the vehicle is operating, so it contains hundreds to thousands of hours of data for each study vehicle. The paper discusses methods to address the challenge of transforming a large data set of raw GPS and other sensor samples into data that is ready to analyze. It presents automated methods to detect and correct issues in individual data samples, along with the tools needed to convert the raw sensor data into formats that can be processed easily. The paper provides solutions that ensure k-anonymity-based privacy of each study participant's identity for seven parameters, including the location of their home, whether revealed through vehicle location information or through a combination of the sensor information. It also presents mechanisms to augment the captured sensor data by fusing it with external data resources, adding weather information, road information from mapping sources, and day/night status. Finally, the paper presents the performance and applicability of the resulting data set for analysis within a cloud computing architecture.
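The abstract does not name the seven parameters or give the anonymization algorithm. The following is only a minimal Python sketch of the general k-anonymity check it refers to, assuming each record is a dict and using hypothetical attribute names (age_band, sex, home_cell). The sketch reports every combination of quasi-identifier attributes whose rarest value combination occurs fewer than k times.

```python
from collections import Counter
from itertools import combinations


def satisfies_k(records, attributes, k):
    """True if every observed value combination of `attributes` occurs at least k times."""
    counts = Counter(tuple(r[a] for a in attributes) for r in records)
    return all(c >= k for c in counts.values())


def k_anonymity_violations(records, quasi_identifiers, k):
    """List every non-empty combination of quasi-identifiers that breaks k-anonymity.

    An empty list means the records are k-anonymous for any combination
    of the given attributes.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    violations = []
    for size in range(1, len(quasi_identifiers) + 1):
        for combo in combinations(quasi_identifiers, size):
            if not satisfies_k(records, combo, k):
                violations.append(combo)
    return violations


# Hypothetical example records; "home_cell" stands in for a coarsened home location.
records = [
    {"age_band": "70-74", "sex": "F", "home_cell": "A"},
    {"age_band": "70-74", "sex": "F", "home_cell": "A"},
    {"age_band": "75-79", "sex": "M", "home_cell": "B"},
    {"age_band": "75-79", "sex": "M", "home_cell": "B"},
    {"age_band": "80-84", "sex": "F", "home_cell": "A"},
]
print(k_anonymity_violations(records, ["age_band", "sex", "home_cell"], k=2))
```

Dropping attributes can only merge groups, so if the full attribute set is k-anonymous then every subset is too. Enumerating all subsets is still useful because it shows which smaller combinations already fail, which helps when deciding which attributes to generalize before release.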
Keywords :
Big Data; Global Positioning System; cloud computing; data analysis; data privacy; sensor fusion; Candrive project; GPS; Global Positioning System; OBDII data; On Board Diagnostics II data; Ozcandrive project; big data anonymization; big data augmentation; big data sensor information; big data validation; cloud computing architecture; dataset analysis; differential privacy; external data resources; in-vehicle sensor; k anonymity based privacy; multiyear driving study; theoretical open access model; Data privacy; Engines; Global Positioning System; Meteorology; Privacy; Roads; Vehicles; Differential Privacy; Global Positioning System (GPS); data analytics; driving; k-Anonymity;
Conference_Title :
Big Data (BigData Congress), 2015 IEEE International Congress on
Conference_Location :
New York, NY
Print_ISBN :
978-1-4673-7277-0
DOI :
10.1109/BigDataCongress.2015.93