Title of article
Temporal contexts: Effective text classification in evolving document collections Review Article
Author/Authors
Leonardo Rocha Souza, Fernando Mourão, Hilton Mota, Thiago Salles, Marcos André Gonçalves, Wagner Meira Jr.
Issue Information
Journal with serial issue number, year 2013
Pages
22
From page
388
To page
409
Abstract
The management of the huge and growing amount of information available nowadays makes Automatic Document Classification (ADC) not only crucial but also a very challenging task. Furthermore, the dynamics inherent to classification problems, mainly on the Web, make this task even more challenging. Despite this fact, the actual impact of such temporal evolution on ADC is still poorly understood in the literature. In this context, this work aims to evaluate, characterize and exploit temporal evolution to improve ADC techniques. As a first contribution, we propose a pragmatic methodology for evaluating temporal evolution in ADC domains. Through this methodology, we can identify measurable factors associated with the degradation of ADC models over time. Going a step further, based on such analyses, we propose effective and efficient strategies to make current techniques more robust to natural shifts over time. We present a strategy, named temporal context selection, for selecting portions of the training set that minimize those factors. Our second contribution is a general algorithm, called Chronos, for determining such contexts. By instantiating Chronos, we are able to reduce uncertainty and improve overall classification accuracy. Empirical evaluations of heuristic instantiations of the algorithm, named WindowsChronos and FilterChronos, on two real document collections demonstrate the usefulness of our proposal. Comparisons against state-of-the-art ADC algorithms show that selecting temporal contexts yields improvements in classification accuracy of up to 10%. Finally, we highlight the applicability and generality of our proposal in practice, pointing to this study as a promising research direction.
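To make the idea of temporal context selection concrete, the following is a minimal sketch, assuming each training document carries a year stamp and that the "context" is simply a fixed window of years around the document being classified. The function names (`select_temporal_context`, `train_nb`, `predict_nb`, `classify_in_context`), the fixed-window rule, the multinomial Naive Bayes classifier and the toy data are all hypothetical illustrations; this is not the authors' Chronos, WindowsChronos or FilterChronos algorithm, whose selection criteria are described in the full paper.

```python
import math
from collections import Counter, defaultdict
from typing import List, Tuple

# A labeled document: (year, class label, tokens). Hypothetical representation.
Doc = Tuple[int, str, List[str]]


def select_temporal_context(train: List[Doc], ref_year: int, window: int) -> List[Doc]:
    """Keep only training documents within `window` years of `ref_year`.

    Hypothetical fixed-window rule for illustration, not the paper's Chronos.
    """
    return [d for d in train if abs(d[0] - ref_year) <= window]


def train_nb(docs: List[Doc]):
    """Fit a multinomial Naive Bayes model (add-one smoothing) on `docs`."""
    class_counts = Counter(label for _, label, _ in docs)
    term_counts = defaultdict(Counter)
    vocab = set()
    for _, label, tokens in docs:
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab


def predict_nb(model, tokens: List[str]) -> str:
    """Return the class with the highest log posterior for `tokens`."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Log prior plus the sum of smoothed log likelihoods of the tokens.
        score = math.log(n_docs / total_docs)
        denom = sum(term_counts[label].values()) + len(vocab)
        for tok in tokens:
            score += math.log((term_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def classify_in_context(train: List[Doc], doc_year: int, tokens: List[str], window: int = 1) -> str:
    """Train only on the temporal context of `doc_year`, then classify."""
    context = select_temporal_context(train, doc_year, window)
    if not context:  # No documents in the window: fall back to the full training set.
        context = train
    return predict_nb(train_nb(context), tokens)


# Toy collection in which the term "virus" shifts meaning over time.
TOY_TRAIN: List[Doc] = [
    (1990, "biology", ["virus", "cell"]),
    (1991, "biology", ["virus", "infection"]),
    (2010, "computing", ["virus", "malware"]),
    (2011, "computing", ["virus", "email"]),
]
```

With this toy data, `classify_in_context(TOY_TRAIN, 2011, ["virus"])` trains only on the 2010 and 2011 documents and returns `"computing"`, while the same query dated 1990 returns `"biology"`. Trained on the whole collection, the two classes tie for this query, which illustrates how ignoring time can blur a term whose association with a class has drifted.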
Keywords
Text Mining, Temporal Evolution, Classification
Journal title
Information Systems
Serial Year
2013
Record number
1230307