DocumentCode
3461657
Title
SAFAL: A MapReduce Spatio-temporal Analyzer for UNAVCO FTP Logs
Author
Hodgkinson, Kathleen ; Rezgui, Abdelmounaam
Author_Institution
Plate Boundary Obs., UNAVCO, Boulder, CO, USA
fYear
2013
fDate
3-5 Dec. 2013
Firstpage
1083
Lastpage
1090
Abstract
UNAVCO is a National Science Foundation (NSF) funded consortium that facilitates geoscience research and education using geodesy. It is responsible for the collection, archiving and distribution of data from GPS sites installed on every continent of the world. In addition to GPS data, UNAVCO collects borehole seismic, strain meter, meteorological, and digital imagery data. One of UNAVCO's largest programs is the Plate Boundary Observatory (PBO), the geodetic component of the NSF-funded EarthScope program. PBO consists of over 1100 continuous GPS sites plus 80 borehole strain and seismic sites. In this paper, we present SAFAL, a Spatio-temporal Analyzer of FTP Access Logs collected by UNAVCO's data center. We developed SAFAL using Hadoop/MapReduce. The motivation for this work was to build an efficient system able to quickly identify trends in GPS data usage. The system can process millions of lines of data in minutes. It supports queries such as: (i) what is the most downloaded GPS site, (ii) who is downloading the most data, or (iii) what periods of data are of greatest interest. Answers to these and similar queries are useful for planning network growth, allocating Web resources, and tracking hot topics in geoscience research. They may also be extremely useful in helping UNAVCO illuminate dark data.
Keywords
Global Positioning System; Internet; data acquisition; data mining; geodesy; geographic information systems; information retrieval; FTP access logs; GPS data usage; GPS sites; Hadoop; MapReduce spatio-temporal analyzer; NSF funded EarthScope program; NSF funded consortium; National Science Foundation; PBO; SAFAL; UNAVCO FTP logs; UNAVCO data center; Web usage mining; borehole seismic data; borehole strain; data archiving; data collection; data distribution; digital imagery data; geodetic component; geoscience education; geoscience research; meteorological data; plate boundary observatory; strain meter; Geoscience; Planning; Servers; US Government; MapReduce
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Science and Engineering (CSE), 2013 IEEE 16th International Conference on
Conference_Location
Sydney, NSW
Type
conf
DOI
10.1109/CSE.2013.157
Filename
6755338