• DocumentCode
    3717270
  • Title

    Data deidentification in medical transcriptions using regular expressions and machine learning

  • Author

    Joshua Seeger;Aron Culotta;Jason Keller;Patrick van Kessel;Michael Jugovich

  • Author_Institution
    NORC at the University of Chicago, 1 North State Street, 14th Floor, Chicago, IL 60602
  • fYear
    2015
  • Firstpage
    1322
  • Lastpage
    1323
  • Abstract
    A system is developed to redact personally identifiable information (PII) from millions of medical transcriptions with very high precision, using a combination of entity recognition, regular expressions, and machine learning. The system is trained and tested on medical transcriptions that were manually redacted with an internally developed coding system, which provides double-blind classification capabilities.
  • Keywords
    "Medical services","Medical diagnostic imaging","Encoding","Pipelines","Manuals","Floors","Big data"
  • Publisher
    IEEE
  • Conference_Title
    2015 IEEE International Conference on Big Data (Big Data)
  • Type

    conf

  • DOI
    10.1109/BigData.2015.7363889
  • Filename
    7363889
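
The abstract names regular expressions as one of the three redaction components. The record does not give the paper's actual expressions, so the following is only a minimal sketch of what a regex redaction pass might look like. The pattern set `PII_PATTERNS`, the placeholder labels, and the function `redact` are all hypothetical names chosen for illustration, and the patterns cover only a few US-style formats.

```python
import re

# Hypothetical patterns for illustration only; the paper's own
# expressions are not given in this record.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # e.g. 123-45-6789
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),  # e.g. 312-555-0142
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),   # e.g. 03/14/2015
}


def redact(text):
    """Replace each match of each pattern with a bracketed label."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    sample = "Seen 03/14/2015, SSN 123-45-6789, call 312-555-0142."
    print(redact(sample))
```

A pass like this would catch only rigidly formatted identifiers. Names and free-text identifiers have no fixed shape, which is presumably why the abstract pairs regular expressions with entity recognition and machine learning.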