DocumentCode
270736
Title
De-identification in natural language processing
Author
Vincze, Veronika ; Farkas, Richárd
Author_Institution
MTA-SZTE Res. Group on Artificial Intell., Univ. of Szeged, Szeged, Hungary
fYear
2014
fDate
26-30 May 2014
Firstpage
1300
Lastpage
1303
Abstract
Natural language processing (NLP) systems usually require a huge amount of textual data, but the publication of such datasets is often hindered by privacy and data protection issues. Here, we discuss the questions of de-identification related to three NLP areas, namely, clinical NLP, NLP for social media and information extraction from resumes. We also illustrate how de-identification is related to named entity recognition, and we argue that de-identification tools can be successfully built on named entity recognizers.
Keywords
data privacy; natural language processing; NLP areas; NLP systems; data protection; information extraction; privacy protection; social media; textual data; Databases; Educational institutions; Electronic mail; Informatics; Information retrieval; Media
fLanguage
English
Publisher
IEEE
Conference_Titel
Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2014 37th International Convention on
Conference_Location
Opatija
Print_ISBN
978-953-233-081-6
Type
conf
DOI
10.1109/MIPRO.2014.6859768
Filename
6859768