• DocumentCode
    3472322
  • Title
    Personal Health Information detection in unstructured web documents
  • Author
    Razavi, Amir H. ; Ghazinour, Kambiz
  • Author_Institution
    Sch. of Electr. Eng. & Comput. Sci., Univ. of Ottawa, Ottawa, ON, Canada
  • fYear
    2013
  • fDate
    20-22 June 2013
  • Firstpage
    155
  • Lastpage
    160
  • Abstract
    This paper describes our study of the incidence of Personal Health Information (PHI) on the Web. PHI is usually shared under conditions of confidentiality, protection and trust, and should not be disclosed to or made available to unrelated third parties or the general public. We first analyzed the characteristics that potentially make systems successful at identifying unsolicited or unjustified PHI disclosures. We then designed and implemented an integrated Natural Language Processing/Machine Learning (NLP/ML)-based system that detects disclosures of personal health information according to those characteristics, including detected patterns. This research is a first step toward a learning system that will be trained on a limited training set, built from the output of the processing chain described in the paper, in order to detect PHI disclosures across the Web in general.
  • Keywords
    Internet; learning (artificial intelligence); medical information systems; natural language processing; machine learning system; natural language processing system; personal health information; unjustified PHI disclosure identification; unsolicited PHI disclosure identification; unstructured Web document; Chemicals; Data mining; Diseases; Drugs; Manuals; Pediatrics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer-Based Medical Systems (CBMS), 2013 IEEE 26th International Symposium on
  • Conference_Location
    Porto
  • Type
    conf
  • DOI
    10.1109/CBMS.2013.6627781
  • Filename
    6627781