Title :
Detecting environmental disasters in digital news archives
Author :
Amelia Yzaguirre;Robert Warren;Mike Smit
Author_Institution :
Dalhousie University, Halifax, Canada
Abstract :
Automatically extracting events from large bodies of unstructured or semi-structured text requires a mechanism for identifying the event, abstracting it from the text, validating the event's occurrence against some known values, and sharing the event with users effectively. Inherent in the challenge of Big Data is that it often exceeds the scale at which humans can operate effectively. In this paper, we focus on the domain of archived newspaper articles and describe a system that generates a collection of event summaries from unstructured text, extracts a geographic marker for each event, and stores both in an online database that can be searched and/or visualized using an interactive map. The system relies on text mining techniques to filter a dataset of relevant news stories from a digital news archive, and extracts 1-2 sentences describing each event to be stored in the database. We illustrate this approach with a flood database case study, automatically extracting descriptions of past flooding events in Nova Scotia, Canada, from a 20-year archive of regional newspaper articles. We validate our event extraction in two dimensions (identification of articles mentioning flood events; identification of accurate geographic markers from articles about flood events) using Amazon's Mechanical Turk (MTurk) to obtain human assessments at scale.
Keywords :
"Vocabulary","Data mining","Data visualization","Big data","Manuals","Visual databases"
Conference_Title :
Big Data (Big Data), 2015 IEEE International Conference on
DOI :
10.1109/BigData.2015.7363984