DocumentCode :
2023992
Title :
Robust Models in Information Retrieval
Author :
Lipka, Nedim ; Stein, Benno
Author_Institution :
Bauhaus-Univ. Weimar, Weimar, Germany
fYear :
2011
fDate :
Aug. 29 2011-Sept. 2 2011
Firstpage :
185
Lastpage :
189
Abstract :
Classification tasks in information retrieval deal with document collections of enormous size, which makes the ratio between the document set underlying the learning process and the set of unseen documents very small. With a ratio close to zero, the evaluation of a model-classifier combination's generalization ability with leave-n-out methods or cross-validation becomes unreliable: the generalization error of a complex model (with a more complex hypothesis structure) might be underestimated compared to the generalization error of a simple model (with a less complex hypothesis structure). Given this situation, optimizing the bias-variance-tradeoff to select among these models will lead one astray. To address this problem we introduce the idea of robust models, where one intentionally restricts the hypothesis structure within the model formation process. We observe that, despite the fact that such a robust model entails a higher test error, its efficiency "in the wild" exceeds that of the model that would normally have been chosen from the perspective of the best bias-variance-tradeoff. We present two case studies: (1) a categorization task, which demonstrates that robust models are more stable in retrieval situations when training data is scarce, and (2) a genre identification task, which underlines the practical relevance of robust models.
Keywords :
classification; information retrieval; learning (artificial intelligence); bias-variance-tradeoff; classification tasks; document collections; generalization error; learning process; robust models; Accuracy; Analytical models; Machine learning; Robustness; Training; Vocabulary; bias; machine learning; overfitting; retrieval model
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Database and Expert Systems Applications (DEXA), 2011 22nd International Workshop on
Conference_Location :
Toulouse
ISSN :
1529-4188
Print_ISBN :
978-1-4577-0982-1
Type :
conf
DOI :
10.1109/DEXA.2011.73
Filename :
6059815