  • DocumentCode
    229350
  • Title
    User identification through command history analysis
  • Author
    Khosmood, Foaad ; Nico, Phillip L. ; Woolery, Jonathan
  • Author_Institution
    Dept. of Comput. Sci., California Polytech. State Univ., San Luis Obispo, CA, USA
  • fYear
    2014
  • fDate
    9-12 Dec. 2014
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    As any veteran of the editor wars can attest, Unix users can be fiercely and irrationally attached to the commands they use and the manner in which they use them. In this work, we investigate the problem of identifying users out of a large set of candidates (25-97) through their command-line histories. Using standard algorithms and feature sets inspired by the natural language authorship attribution literature, we demonstrate conclusively that individual users can be identified with a high degree of accuracy through their command-line behavior. Further, we report on the best-performing feature combinations, from the many thousands that are possible, both in terms of accuracy and generality. We validate our work by experimenting on three user corpora comprising data gathered over three decades at three distinct locations. These are the Greenberg user profile corpus (168 users), the Schonlau masquerading corpus (50 users) and the Cal Poly command history corpus (97 users). The first two are well-known corpora published in 1991 and 2001, respectively. The last was developed by the authors in a year-long study in 2014 and represents the most recent corpus of its kind. For a 50-user configuration, we find feature sets that can successfully identify users with over 90% accuracy on the Cal Poly corpus, the Greenberg corpus and one variant of the Schonlau corpus, and over 87% on the other Schonlau variant.
  • Keywords
    Unix; information analysis; learning (artificial intelligence); natural language processing; Cal Poly command history corpus; Schonlau corpus; Schonlau masquerading corpus; Schonlau variant; Unix user; command history analysis; command-line behavior; command-line history; editor war; feature set; natural language authorship attribution literature; standard algorithm; user configuration; user corpora; user identification; user profile corpus; Accuracy; Computer science; Decision trees; Entropy; Feature extraction; History; Semantics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence in Cyber Security (CICS), 2014 IEEE Symposium on
  • Conference_Location
    Orlando, FL
  • Type
    conf
  • DOI
    10.1109/CICYBS.2014.7013363
  • Filename
    7013363