DocumentCode
1972384
Title
Digging up social structures from documents on the web
Author
Gessiou, E. ; Volanis, S. ; Athanasopoulos, Elias ; Markatos, Evangelos P. ; Ioannidis, Sotiris
Author_Institution
Polytech. Inst. of New York Univ., Brooklyn, NY, USA
fYear
2012
fDate
3-7 Dec. 2012
Firstpage
744
Lastpage
750
Abstract
We collected more than ten million Microsoft Office documents from public websites, analyzed the metadata stored in each document and extracted information related to social activities. Our analysis revealed the existence of exactly identified cliques of users that edit, revise and collaborate on industrial and military content. We also examined cliques in documents downloaded from Fortune-500 company websites. We constructed their graphs and measured their properties. The graphs contained many connected components and presented social properties. The a priori knowledge of a company's social graph may significantly assist an adversary to launch targeted attacks, such as targeted advertisements and phishing emails. Our study demonstrates the privacy risks associated with metadata by cross-correlating all members identified in a clique with users of Twitter. We show that it is possible to match authors collaborating in the creation of a document with Twitter accounts. To the best of our knowledge, this study is the first to identify individuals and create social cliques solely based on information derived from document metadata. Our study raises major concerns about the risks involved in privacy leakage due to document metadata.
Keywords
data privacy; document handling; graph theory; meta data; social networking (online); Fortune-500 company Websites; Microsoft Office documents; Twitter accounts; company social graph; document metadata; information extraction; metadata analysis; phishing emails; privacy leakage; privacy risks; public Websites; social activities; social cliques; social properties; social structures; targeted advertisements
fLanguage
English
Publisher
IEEE
Conference_Titel
Global Communications Conference (GLOBECOM), 2012 IEEE
Conference_Location
Anaheim, CA
ISSN
1930-529X
Print_ISBN
978-1-4673-0920-2
Electronic_ISBN
1930-529X
Type
conf
DOI
10.1109/GLOCOM.2012.6503202
Filename
6503202