• DocumentCode
    1972384
  • Title
    Digging up social structures from documents on the web
  • Author
    Gessiou, E. ; Volanis, S. ; Athanasopoulos, Elias ; Markatos, Evangelos P. ; Ioannidis, Sotiris
  • Author_Institution
    Polytech. Inst. of New York Univ., Brooklyn, NY, USA
  • fYear
    2012
  • fDate
    3-7 Dec. 2012
  • Firstpage
    744
  • Lastpage
    750
  • Abstract
    We collected more than ten million Microsoft Office documents from public websites, analyzed the metadata stored in each document and extracted information related to social activities. Our analysis revealed the existence of exactly identified cliques of users that edit, revise and collaborate on industrial and military content. We also examined cliques in documents downloaded from Fortune-500 company websites. We constructed their graphs and measured their properties. The graphs contained many connected components and presented social properties. The a priori knowledge of a company's social graph may significantly assist an adversary to launch targeted attacks, such as targeted advertisements and phishing emails. Our study demonstrates the privacy risks associated with metadata by cross-correlating all members identified in a clique with users of Twitter. We show that it is possible to match authors collaborating in the creation of a document with Twitter accounts. To the best of our knowledge, this study is the first to identify individuals and create social cliques solely based on information derived from document metadata. Our study raises major concerns about the risks involved in privacy leakage due to document metadata.
  • Keywords
    data privacy; document handling; graph theory; meta data; social networking (online); Fortune-500 company Websites; Microsoft Office documents; Twitter accounts; company social graph; document metadata; information extraction; metadata analysis; phishing emails; privacy leakage; privacy risks; public Websites; social activities; social cliques; social properties; social structures; targeted advertisements
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Global Communications Conference (GLOBECOM), 2012 IEEE
  • Conference_Location
    Anaheim, CA
  • ISSN
    1930-529X
  • Print_ISBN
    978-1-4673-0920-2
  • Electronic_ISBN
    1930-529X
  • Type
    conf
  • DOI
    10.1109/GLOCOM.2012.6503202
  • Filename
    6503202