• DocumentCode
    3717363
  • Title

    Detecting environmental disasters in digital news archives

  • Author

    Amelia Yzaguirre;Robert Warren;Mike Smit

  • Author_Institution
    Dalhousie University, Halifax, Canada
  • fYear
    2015
  • Firstpage
    2027
  • Lastpage
    2035
  • Abstract
    Automatically extracting events from large, unstructured/semi-structured textual data requires a mechanism for identifying the event, abstracting it from the text, validating the event's occurrence against some known values, and sharing the event with users effectively. Inherent in the challenge of Big Data is that it often exceeds a scale at which humans can effectively operate. In this paper, we focus on the domain of archived newspaper articles, and describe a system that generates a collection of event summaries from unstructured text, extracts a geographic marker for the event, and stores both in an on-line database that can be searched and/or visualized using an interactive map. The system relies on text mining techniques to filter out a dataset of news stories from a digital news archive source and extracts 1-2 sentences from each event to be stored in the database. We illustrate this approach using a flood database case study, automatically extracting descriptions of past flooding events occurring in Nova Scotia, Canada from a 20-year archive of regional newspaper articles. We validate our event extraction in two dimensions (identification of articles mentioning flood events; identification of accurate geographic markers from articles about flood events) using Amazon's Mechanical Turk (MTurk) to obtain human assessments at scale.
  • Keywords
    "Vocabulary","Data mining","Data visualization","Big data","Manuals","Visual databases"
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (Big Data), 2015 IEEE International Conference on
  • Type

    conf

  • DOI
    10.1109/BigData.2015.7363984
  • Filename
    7363984