DocumentCode :
2052238
Title :
LAIR: A Language for Automated Semantics-Aware Text Sanitization Based on Frame Semantics
Author :
Hedegaard, Steffen ; Houen, Søren ; Simonsen, Jakob Grue
Author_Institution :
Dept. of Comput. Sci., Univ. of Copenhagen (DIKU), Copenhagen, Denmark
fYear :
2009
fDate :
14-16 Sept. 2009
Firstpage :
47
Lastpage :
52
Abstract :
We present LAIR: a domain-specific language that enables users to specify actions to be taken upon encountering specific semantic frames in a text, in particular to rephrase and redact the textual content. While LAIR presupposes superficial knowledge of frames and frame semantics, it requires only limited prior programming experience. It contains neither scripting nor I/O primitives, has no general loop constructs, and is not Turing-complete. We have implemented a LAIR compiler and integrated it into a pipeline for automated redaction of web pages. We detail our experience with automated redaction of web pages for subjectively undesirable content; initial experiments suggest that a small language based on semantic recognition of undesirable terms can be highly useful as a supplement to traditional methods of text sanitization.
Keywords :
Internet; computational linguistics; natural languages; program compilers; text analysis; LAIR compiler; automated Web page redaction; automated semantics-aware text sanitization; domain-specific language; frame semantics; language for automatically inferred redaction; semantic recognition; textual content; Computer science; Data security; Domain specific languages; Government; Hospitals; Information security; Natural languages; Pipelines; Text recognition; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Semantic Computing, 2009. ICSC '09. IEEE International Conference on
Conference_Location :
Berkeley, CA
Print_ISBN :
978-1-4244-4962-0
Electronic_ISBN :
978-0-7695-3800-6
Type :
conf
DOI :
10.1109/ICSC.2009.79
Filename :
5298551