DocumentCode
3472322
Title
Personal Health Information detection in unstructured web documents
Author
Razavi, Amir H.; Ghazinour, Kambiz
Author_Institution
Sch. of Electr. Eng. & Comput. Sci., Univ. of Ottawa, Ottawa, ON, Canada
fYear
2013
fDate
20-22 June 2013
Firstpage
155
Lastpage
160
Abstract
This paper describes our study of the incidence of Personal Health Information (PHI) on the Web. PHI is usually shared under conditions of confidentiality, protection and trust, and should not be disclosed or made available to unrelated third parties or the general public. We first analyzed the characteristics that could make a system successful at identifying unsolicited or unjustified PHI disclosures. We then designed and implemented an integrated Natural Language Processing/Machine Learning (NLP/ML)-based system that detects disclosures of personal health information according to those characteristics, including the detected patterns. This research is a first step toward a learning system that will be trained on a limited training set, built from the output of the processing chain described in the paper, to detect PHI disclosures across the Web in general.
Keywords
Internet; learning (artificial intelligence); medical information systems; natural language processing; machine learning system; natural language processing system; personal health information; unjustified PHI disclosure identification; unsolicited PHI disclosure identification; unstructured Web document; Chemicals; Data mining; Diseases; Drugs; Manuals; Pediatrics
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer-Based Medical Systems (CBMS), 2013 IEEE 26th International Symposium on
Conference_Location
Porto
Type
conf
DOI
10.1109/CBMS.2013.6627781
Filename
6627781