DocumentCode :
2422772
Title :
Using Progressive Filtering to Deal with Information Overload
Author :
Addis, Andrea ; Armano, Giuliano ; Vargiu, Eloisa
Author_Institution :
Univ. of Cagliari, Cagliari, Italy
fYear :
2010
fDate :
Aug. 30 2010-Sept. 3 2010
Firstpage :
20
Lastpage :
24
Abstract :
In the age of Web 2.0, people organize large collections of web pages, articles, or emails in hierarchies of topics, or arrange a large body of knowledge in ontologies. This scenario requires automatic text categorization systems that can cope with the underlying taxonomies effectively and efficiently, so that information overload and input imbalance can be suitably dealt with. In this work, we propose a hierarchical text categorization approach that decomposes a given rooted taxonomy into pipelines, one for each path that exists between the root and each node of the taxonomy, so that each pipeline can be tuned in isolation. Experiments performed on the Reuters and DMOZ data collections show that the proposed approach performs better than a flat approach in the presence of input imbalance.
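The decomposition described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the taxonomy is given as a dictionary mapping each node to its list of children; the function name `pipelines` and the toy category names are hypothetical. The sketch only enumerates the root-to-node paths; the classifiers attached to each pipeline and their tuning are not shown.

```python
from typing import Dict, List


def pipelines(taxonomy: Dict[str, List[str]], root: str) -> List[List[str]]:
    """Return one root-to-node path per node, in depth-first order.

    Assumption: `taxonomy` maps a node to its children; leaves may be absent.
    The trivial path containing only the root is included as well.
    """
    paths: List[List[str]] = []
    stack: List[List[str]] = [[root]]
    while stack:
        path = stack.pop()
        paths.append(path)
        # Push children in reverse so they are visited in their listed order.
        for child in reversed(taxonomy.get(path[-1], [])):
            stack.append(path + [child])
    return paths


# Hypothetical toy taxonomy, for illustration only.
toy = {"root": ["sports", "science"], "sports": ["tennis"]}
result = pipelines(toy, "root")
```

Each returned path corresponds to one pipeline; in a progressive-filtering setting a document would be passed along the nodes of a path in order, which is why each path can be treated as an independent unit.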
Keywords :
Internet; electronic mail; information filtering; ontologies (artificial intelligence); pipeline processing; text analysis; DMOZ data collection; Web 2.0; Web pages articles; automatic text categorization system; hierarchical text categorization approach; information overload; pipelines taxonomy; progressive filtering; Complexity theory; Conferences; Filtering; Pipelines; Taxonomy; Text categorization; Training; DMOZ; Hierarchical Text Categorization; Information Overload; Input Imbalance; Reuters Corpus;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Database and Expert Systems Applications (DEXA), 2010 Workshop on
Conference_Location :
Bilbao
ISSN :
1529-4188
Print_ISBN :
978-1-4244-8049-4
Type :
conf
DOI :
10.1109/DEXA.2010.26
Filename :
5591982