Title :
Use of Web Popularity on Entity Centric Document Filtering
Author :
Vincent Bouvier;Patrice Bellot
Author_Institution :
Kware, Aix-en-Provence, France
Abstract :
Filtering pages about an entity (a person, a company, a music band, etc.) so that only interesting pages are kept is a real challenge. Interest can be qualified using criteria such as recency and novelty. Over the last decade, classification systems have been trained to detect whether a document is of interest with regard to an entity. For scalability reasons, manually annotating a training set for each tracked entity is not feasible, so some approaches strive to build entity-independent systems. These approaches achieve state-of-the-art performance, but we show that they can be improved. Time features differ from one entity to another; therefore, a single classifier cannot estimate relevant statistics from these observations. Instead of having one model per entity or one model for all entities, we propose an approach that uses one model per cluster of entities, where clusters are based on each entity's web popularity. We also introduce different strategies for automatically selecting the classification model. We test our approach on the Knowledge Base Acceleration (KBA) framework from TREC and show that it brings significant improvements over a non-cluster-based method.
Keywords :
"Correlation","Time series analysis","Feature extraction","Training data","Computational modeling","Yttrium","Elbow"
Conference_Titel :
Web Intelligence and Intelligent Agent Technology (WI-IAT), 2015 IEEE / WIC / ACM International Conference on
DOI :
10.1109/WI-IAT.2015.211