  • DocumentCode
    270736
  • Title
    De-identification in natural language processing
  • Author
    Vincze, Veronika; Farkas, Richárd
  • Author_Institution
    MTA-SZTE Res. Group on Artificial Intell., Univ. of Szeged, Szeged, Hungary
  • fYear
    2014
  • fDate
    26-30 May 2014
  • Firstpage
    1300
  • Lastpage
    1303
  • Abstract
    Natural language processing (NLP) systems usually require a huge amount of textual data, but the publication of such datasets is often hindered by privacy and data protection issues. Here, we discuss the questions of de-identification related to three NLP areas, namely clinical NLP, NLP for social media, and information extraction from resumes. We also illustrate how de-identification is related to named entity recognition, and we argue that de-identification tools can be successfully built on named entity recognizers.
  • Keywords
    data privacy; natural language processing; NLP areas; NLP systems; data protection; information extraction; privacy protection; social media; textual data; Databases; Educational institutions; Electronic mail; Informatics; Information retrieval; Media
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2014 37th International Convention on
  • Conference_Location
    Opatija
  • Print_ISBN
    978-953-233-081-6
  • Type
    conf
  • DOI
    10.1109/MIPRO.2014.6859768
  • Filename
    6859768