• DocumentCode
    2398432
  • Title
    Enhancing Twitter Data Analysis with Simple Semantic Filtering: Example in Tracking Influenza-Like Illnesses
  • Author
    Doan, Son ; Ohno-Machado, Lucila ; Collier, Nigel
  • Author_Institution
    Div. of Biomed. Inf., Univ. of California, San Diego, La Jolla, CA, USA
  • fYear
    2012
  • fDate
    27-28 Sept. 2012
  • Firstpage
    62
  • Lastpage
    71
  • Abstract
    Systems that exploit publicly available user-generated content such as Twitter messages have been successful in tracking seasonal influenza. We developed a novel filtering method for Influenza-Like Illness (ILI)-related messages using 587 million messages from Twitter micro-blogs. We first filtered messages based on syndrome keywords from the BioCaster Ontology, an extant knowledge model of laymen's terms. We then filtered the messages according to semantic features such as negation, hashtags, emoticons, humor and geography. The data covered 36 weeks of the US 2009 influenza season, from 30th August 2009 to 8th May 2010. Results showed that our system achieved the highest Pearson correlation coefficient of 98.46% (p-value < 2.2e-16), an improvement of 3.98% over the previous state-of-the-art method. The results indicate that simple NLP-based enhancements to existing approaches to mining Twitter data can increase the value of this inexpensive resource.
  • Keywords
    data mining; diseases; information filtering; medical computing; natural language processing; ontologies (artificial intelligence); social networking (online); statistical analysis; BioCaster ontology; ILI-related messages; NLP-based enhancements; Pearson correlation coefficient; Twitter data analysis enhancement; Twitter data mining; Twitter microblog message semantic filtering; emoticons; geography; hashtags; humor; knowledge model; negation; publicly-available user generated content; seasonal influenza-like illnesses tracking; semantic features; syndrome keywords; Correlation; Diseases; Lungs; Ontologies; Semantics; Surveillance; Twitter; influenza; natural language processing; social media
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Healthcare Informatics, Imaging and Systems Biology (HISB), 2012 IEEE Second International Conference on
  • Conference_Location
    San Diego, CA
  • Print_ISBN
    978-1-4673-4803-4
  • Type
    conf
  • DOI
    10.1109/HISB.2012.21
  • Filename
    6366191
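
The abstract describes a two-stage filter (syndrome keywords, then semantic features such as negation) whose weekly output is compared against reference ILI data using the Pearson correlation coefficient. The following is only a minimal sketch of that idea, not the authors' implementation. The keyword list, the negation pattern, and the function names `has_keyword`, `is_negated`, `filter_messages` and `pearson` are all assumptions made for illustration; the actual BioCaster Ontology terms and the paper's rules for hashtags, emoticons, humor and geography are not reproduced here.

```python
import math
import re

# Hypothetical keyword list for illustration only; these are NOT the
# BioCaster Ontology terms used in the paper.
SYNDROME_KEYWORDS = ("flu", "fever", "cough", "sore throat")

# Simplified negation cues (an assumption, not the paper's rule set).
NEGATION = re.compile(
    r"\b(no|not|never|don't|doesn't|didn't|without)\b", re.IGNORECASE
)


def has_keyword(text):
    """Stage 1: keep messages that mention any syndrome keyword as a whole word."""
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(keyword) + r"\b", lowered)
        for keyword in SYNDROME_KEYWORDS
    )


def is_negated(text):
    """Stage 2 (one feature only): flag messages containing a negation cue."""
    return NEGATION.search(text) is not None


def filter_messages(messages):
    """Apply the keyword stage, then drop messages flagged as negated."""
    return [m for m in messages if has_keyword(m) and not is_negated(m)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

In use, one would count the filtered messages per week and pass those counts, together with the corresponding weekly reference ILI figures, to `pearson`. The word-boundary match is a deliberate simplification: it avoids matching "flu" inside unrelated words, but it also misses variants such as "influenza" unless they are listed explicitly.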