DocumentCode
3128165
Title
Ranking documents by internal variability
Author
Skillicorn, D.B. ; Chandrasekaran, P.K.
Author_Institution
Sch. of Comput., Queen's Univ., Kingston, ON, Canada
fYear
2012
fDate
11-14 June 2012
Firstpage
180
Lastpage
182
Abstract
An analyst, presented with a corpus too large to read every document, must find some selection mechanism. A model for interestingness can be used to rank the documents so that only the subset at the top of the ranking need be examined. However, in many open-source intelligence settings, such a model is not known in advance. We design three measures for ranking documents by internal variability as a weak surrogate for interestingness. Selecting those documents ranked highly by these measures selects a superset of the documents an analyst might need to read, no matter what the specific model, and reduces the size of the corpus by an order of magnitude. We also discover that many corpora contain documents that are highly variable, but not interesting, and show how to remove them.
Keywords
document handling; internal variability; open source intelligence settings; ranking documents; selection mechanism; Analytical models; Bayesian methods; Educational institutions; Humans; Loss measurement; Shape; Text analysis;
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligence and Security Informatics (ISI), 2012 IEEE International Conference on
Conference_Location
Arlington, VA
Print_ISBN
978-1-4673-2105-1
Type
conf
DOI
10.1109/ISI.2012.6284292
Filename
6284292