  • DocumentCode
    3626464
  • Title
    Data Preparation for User Profiling from Traffic Log
  • Author
    Marek Kumpost
  • Author_Institution
    Masaryk University
  • fYear
    2007
  • Firstpage
    89
  • Lastpage
    94
  • Abstract
    This paper presents our current work on traffic log processing. Our goal is to find an approach to modeling user behaviour based on their behavioural patterns. Since the amount of input data we have is really large, effective preprocessing is crucial for the profiling to provide significant results. This paper presents our approach to restricting the input data with respect to its relevance. We use histogram clustering to identify sets of users with similar frequencies of communication; entropy and TF-IDF (term frequency - inverse document frequency) help to select destinations that are relevant for a given set of users. The main profiling is done with preprocessed data and our experiments show that this approach to restricting the input has a positive impact on the significance of results.
  • Keywords
    "Data preprocessing","Frequency","Traffic control","Data privacy","Predictive models","Communication system security","Data security","Information security","Informatics","Histograms"
  • Publisher
    IEEE
  • Conference_Titel
    Emerging Security Information, Systems, and Technologies, 2007. SecureWare 2007. The International Conference on
  • ISSN
    2162-2108
  • Print_ISBN
    0-7695-2989-5;978-0-7695-2989-9
  • Electronic_ISBN
    2162-2116
  • Type
    conf
  • DOI
    10.1109/SECUREWARE.2007.4385316
  • Filename
    4385316