• DocumentCode
    3128165
  • Title
    Ranking documents by internal variability
  • Author
    Skillicorn, D.B.; Chandrasekaran, P.K.
  • Author_Institution
    Sch. of Comput., Queen's Univ., Kingston, ON, Canada
  • fYear
    2012
  • fDate
    11-14 June 2012
  • Firstpage
    180
  • Lastpage
    182
  • Abstract
    An analyst, presented with a corpus too large to read every document, must find some selection mechanism. A model for interestingness can be used to rank the documents so that only the subset at the top of the ranking need be examined. However, in many open-source intelligence settings, such a model is not known in advance. We design three measures for ranking documents by internal variability as a weak surrogate for interestingness. Selecting those documents ranked highly by these measures selects a superset of the documents an analyst might need to read, no matter what the specific model, and reduces the size of the corpus by an order of magnitude. We also discover that many corpora contain documents that are highly variable, but not interesting, and show how to remove them.
  • Keywords
    document handling; internal variability; open source intelligence settings; ranking documents; selection mechanism; Analytical models; Bayesian methods; Educational institutions; Humans; Loss measurement; Shape; Text analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligence and Security Informatics (ISI), 2012 IEEE International Conference on
  • Conference_Location
    Arlington, VA
  • Print_ISBN
    978-1-4673-2105-1
  • Type
    conf
  • DOI
    10.1109/ISI.2012.6284292
  • Filename
    6284292