• DocumentCode
    3073637
  • Title
    Adaptable N-gram classification model for data leakage prevention
  • Author
    Alneyadi, Sultan ; Sithirasenan, E. ; Muthukkumarasamy, Vallipuram
  • Author_Institution
    Sch. of Inf. & Commun. Technol., Griffith Univ., Gold Coast, QLD, Australia
  • fYear
    2013
  • fDate
    16-18 Dec. 2013
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Data confidentiality, integrity and availability are the ultimate goals of all information security mechanisms. However, most of these mechanisms do not proactively protect sensitive data; rather, they work under predefined policies and conditions to protect data in general. A few systems, such as anomaly-based intrusion detection systems (IDS), might work independently without much administrative interference, but they are not dedicated to the sensitivity of data. New mechanisms called data leakage prevention systems (DLP) have been developed to mitigate the risk of sensitive data leakage. Current DLPs mostly use data fingerprinting and exact and partial document matching to classify sensitive data. These approaches have a serious limitation: they are susceptible to data misidentification. In this paper, we investigate the use of N-gram statistical analysis for data classification. Our method uses N-gram frequencies to classify documents under distinct categories. We use simple taxicab geometry to compute the similarity between documents and existing categories. Moreover, we examine the effect of removing the most common words and connecting phrases on the overall classification. We aim to compensate for the limitations of current data classification approaches used in the field of data leakage prevention. We show that our method is capable of correctly classifying up to 90.5% of the tested documents.
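    The approach outlined in the abstract (N-gram frequency profiles compared with taxicab distance, optionally after removing common words) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the use of character N-grams, the value n=3, the relative-frequency normalization, the tiny `STOP_WORDS` list, and the function names `ngram_profile`, `taxicab_distance`, and `classify` are all assumptions, since the abstract does not specify these details.

```python
from collections import Counter

# Hypothetical stop-word list; the abstract does not say which common
# words or connecting phrases were removed.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for"}


def ngram_profile(text, n=3, remove_stop_words=False):
    """Return relative frequencies of character n-grams (assumed granularity)."""
    words = text.lower().split()
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    cleaned = " ".join(words)
    counts = Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def taxicab_distance(p, q):
    """Taxicab (L1) distance between two n-gram frequency profiles."""
    return sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in set(p) | set(q))


def classify(document, category_profiles, n=3, remove_stop_words=False):
    """Assign the document to the category whose profile is closest."""
    doc_profile = ngram_profile(document, n, remove_stop_words)
    return min(
        category_profiles,
        key=lambda name: taxicab_distance(doc_profile, category_profiles[name]),
    )


# Toy category profiles built from made-up sample text.
categories = {
    "finance": ngram_profile("bank loan interest rate bank credit"),
    "medical": ngram_profile("patient doctor hospital treatment nurse"),
}
label = classify("the bank approved the loan", categories)
```

    A smaller distance means a closer match, so the document is assigned to the category with the minimum taxicab distance. Passing `remove_stop_words=True` gives a simple way to compare classification with and without common words.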
  • Keywords
    pattern classification; security of data; statistical analysis; N-grams statistical analysis; adaptable N-gram classification model; data availability; data classification; data confidentiality; data fingerprinting; data integrity; data leakage prevention; data misidentification; information security; partial document matching; taxicab geometry; Encryption; IP networks; Radio access networks; Servers; Virtual private networks; Data leakage prevention; N-gram profiles; N-grams
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal Processing and Communication Systems (ICSPCS), 2013 7th International Conference on
  • Conference_Location
    Carrara, VIC
  • Type
    conf
  • DOI
    10.1109/ICSPCS.2013.6723919
  • Filename
    6723919