Title :
Using common-sense knowledge-base for detecting word obfuscation in adversarial communication
Author :
Agarwal, Swati ; Sureka, Ashish
Author_Institution :
Indraprastha Inst. of Inf. Technol., Delhi, India
Abstract :
Word obfuscation or substitution means replacing one word with another in a sentence to conceal the content of a text or communication. Terrorists and criminals use word obfuscation in adversarial communication to convey messages without being red-flagged by the security and intelligence agencies that intercept or scan messages (such as emails and telephone conversations). ConceptNet is a freely available semantic network represented as a directed graph whose nodes are concepts and whose edges are common-sense assertions about those concepts. We present an approach that exploits the vast amount of semantic knowledge in ConceptNet to address the technically challenging problem of detecting word substitution in adversarial communication. We frame the problem as a textual reasoning and context inference task and use ConceptNet's natural-language-processing toolkit to detect word substitution. We use ConceptNet to compute the conceptual similarity between any two given terms and define a Mean Average Conceptual Similarity (MACS) metric to identify out-of-context terms. The test bed for evaluating the proposed approach consists of the Enron email dataset (over 600,000 emails generated by 158 employees of Enron Corporation) and the Brown corpus (about a million words drawn from a wide variety of sources). We implement word substitution techniques used by previous researchers to generate a test dataset, and we conduct a series of experiments on those substitution methods to evaluate our approach. Experimental results reveal that the proposed approach is effective.
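The abstract does not give the formula for MACS, so the following is only a minimal sketch of the general idea of flagging an out-of-context term by its average pairwise similarity to the other terms in a sentence. Everything in it is an assumption made for illustration: the function names, the toy similarity table, and the rule of picking the term with the lowest mean score. It does not call ConceptNet; a real system would replace the hypothetical `similarity` function with a ConceptNet-based measure.

```python
from statistics import mean

# Hypothetical toy similarity scores in [0, 1], invented for illustration.
# They stand in for a ConceptNet-based conceptual similarity measure.
TOY_SIMILARITY = {
    frozenset({"flower", "explode"}): 0.05,
    frozenset({"flower", "airport"}): 0.05,
    frozenset({"explode", "airport"}): 0.30,
}


def similarity(a, b):
    """Return a symmetric toy similarity score; unknown pairs score 0.0."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def mean_similarity_per_term(terms, sim=similarity):
    """Map each term to its mean similarity with all other terms."""
    scores = {}
    for term in terms:
        others = [other for other in terms if other != term]
        scores[term] = mean([sim(term, other) for other in others])
    return scores


def most_out_of_context(terms, sim=similarity):
    """Return the term with the lowest mean similarity to the rest."""
    scores = mean_similarity_per_term(terms, sim)
    return min(scores, key=scores.get)


# Content words of a sentence in which "bomb" was replaced by "flower".
terms = ["flower", "explode", "airport"]
suspect = most_out_of_context(terms)
```

With the toy scores above, `flower` has the lowest mean similarity to the other content words and is returned as the candidate substitution. The input needs at least two distinct terms, since each term is scored against the others.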
Keywords :
directed graphs; electronic mail; inference mechanisms; national security; natural language processing; semantic networks; terrorism; text analysis; word processing; Brown corpus; ConceptNet; ConceptNet natural-language-processing tool-kit; Enron Corporation; Enron email dataset; MACS metric; adversarial communication; common-sense knowledge-base; context inference task; criminals; directed graph; intelligence agencies; mean average conceptual similarity metric; message scanning; security agencies; semantic knowledge; semantic network; terrorist; textual communication; textual content; textual reasoning; word obfuscation detection; word substitution techniques; Bismuth; Postal services; ConceptNet; Intelligence and Security Informatics; Natural Language Processing; Semantic Similarity; Word Substitution;
Conference_Title :
Communication Systems and Networks (COMSNETS), 2015 7th International Conference on
Conference_Location :
Bangalore
DOI :
10.1109/COMSNETS.2015.7098738