• DocumentCode
    2918675
  • Title
    DOTS: Detection of Off-Topic Search via Result Clustering
  • Author
    Goharian, Nazli; Platt, Alana
  • Author_Institution
    Illinois Inst. of Technol., Chicago
  • fYear
    2007
  • fDate
    23-24 May 2007
  • Firstpage
    145
  • Lastpage
    151
  • Abstract
    Often, document dissemination is limited to a "need to know" basis so as to better maintain organizational trade secrets. Retrieving documents that are off-topic to a user's predefined area of information need (task) via a search engine is potentially a violation of access rights and is a concern to every private, commercial, and governmental organization. Such misuse, defined as "off-topic access to sensitive data by an authorized user", is the second most prevalent form of computer crime after viruses per a recent Computer Security Institute/Federal Bureau of Investigation study. We present a content-based off-topic detection approach that uses query result clustering to detect off-topic searches. This approach supports higher detection precision than the state of the art. Multiple methods for picking the "good" clusters are proposed, and their effect on the detection rate and precision is evaluated. A high detection precision is critical as a false access violation accusation unfairly and inappropriately subjects the user to scrutiny. Our empirical results show that clustering query results can significantly reduce such false positives.
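The abstract describes the approach only at a high level. Below is a minimal sketch of the general idea (cluster a query's results, keep the "good" clusters, and compare them with the user's task profile), written under stated assumptions rather than as the authors' method. The bag-of-words vectors, the single-pass clustering, the "keep clusters with at least `min_size` documents" selection rule, the cosine-similarity test, all threshold values, and the function names (`single_pass_cluster`, `is_off_topic`) are hypothetical illustrations, not taken from the paper.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term->weight mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Average term weights over a list of vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    n = len(vectors)
    return {t: c / n for t, c in total.items()}


def single_pass_cluster(vectors, threshold=0.3):
    """Hypothetical clustering step: assign each result to the most similar
    existing cluster centroid, or start a new cluster if none is close enough."""
    clusters = []
    for v in vectors:
        best, best_sim = None, 0.0
        for c in clusters:
            s = cosine(v, centroid(c))
            if s > best_sim:
                best, best_sim = c, s
        if best is not None and best_sim >= threshold:
            best.append(v)
        else:
            clusters.append([v])
    return clusters


def is_off_topic(result_texts, task_profile_text,
                 sim_threshold=0.2, cluster_threshold=0.3, min_size=2):
    """Hypothetical detector: flag the query as off-topic when no "good"
    result cluster is similar enough to the user's task profile."""
    if not result_texts:
        return False  # nothing retrieved, nothing to flag
    vectors = [tf_vector(t) for t in result_texts]
    clusters = single_pass_cluster(vectors, cluster_threshold)
    # Assumed "good cluster" rule: clusters with >= min_size documents,
    # falling back to the largest cluster if none qualifies.
    good = [c for c in clusters if len(c) >= min_size] or [max(clusters, key=len)]
    profile = tf_vector(task_profile_text)
    best = max(cosine(centroid(c), profile) for c in good)
    return best < sim_threshold
```

For example, with a task profile of `"network security intrusion detection firewall"`, results about network intrusion cluster together and score well against the profile, so the query is not flagged; results about baking recipes share no terms with the profile and are flagged. Requiring a cluster of several agreeing results, rather than reacting to any single retrieved document, is one plausible way such a design could trade detection rate for fewer false accusations.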
  • Keywords
    computer crime; pattern clustering; query processing; search engines; content-based off-topic detection approach; document retrieval; query result clustering; search engine; Computer crime; Computer security; Computer viruses; Humans; Information filtering; Information retrieval; Laboratories; Permission; Search engines; US Department of Transportation; Clustering; Information Retrieval; Off Topic Search; misuse detection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligence and Security Informatics, 2007 IEEE
  • Conference_Location
    New Brunswick, NJ
  • Electronic_ISBN
    1-4244-1329-X
  • Type
    conf
  • DOI
    10.1109/ISI.2007.379547
  • Filename
    4258688