Title :
Using Progressive Filtering to Deal with Information Overload
Author :
Addis, Andrea ; Armano, Giuliano ; Vargiu, Eloisa
Author_Institution :
Univ. of Cagliari, Cagliari, Italy
fDate :
Aug. 30 2010-Sept. 3 2010
Abstract :
In the age of Web 2.0, people organize large collections of web pages, articles, or emails in hierarchies of topics, or arrange large bodies of knowledge in ontologies. This scenario requires automatic text categorization systems that can handle the underlying taxonomies effectively and efficiently, so that information overload and input imbalance are adequately addressed. In this work, we propose a hierarchical text categorization approach that decomposes a given rooted taxonomy into pipelines, one for each path between the root and each node of the taxonomy, so that each pipeline can be tuned in isolation. Experiments performed on the Reuters and DMOZ data collections show that the proposed approach outperforms a flat approach in the presence of input imbalance.
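The decomposition described in the abstract can be illustrated with a minimal Python sketch. The sketch rests on these assumptions:

- The taxonomy is a dictionary mapping each node to its children.
- The root carries no classifier.
- Every other node has a binary accept/reject classifier.

The function names `pipelines` and `progressive_filter`, the toy taxonomy, and the keyword-based classifiers are hypothetical stand-ins, not the authors' implementation. A document reaches a node only if every classifier along that node's root-to-node pipeline accepts it. Irrelevant documents are therefore filtered out progressively at the earlier, more general stages.

```python
from typing import Callable, Dict, List

# Hypothetical representation: each node maps to the list of its children.
Taxonomy = Dict[str, List[str]]


def pipelines(taxonomy: Taxonomy, root: str) -> Dict[str, List[str]]:
    """Map every non-root node to its pipeline, i.e. the path from the root to it.

    Assumption: the root has no classifier, so it is not a stage in any pipeline.
    """
    result: Dict[str, List[str]] = {}

    def visit(node: str, path: List[str]) -> None:
        for child in taxonomy.get(node, []):
            child_path = path + [child]
            result[child] = child_path
            visit(child, child_path)

    visit(root, [])
    return result


def progressive_filter(doc: str, pipeline: List[str],
                       accepts: Dict[str, Callable[[str], bool]]) -> bool:
    """Pass the document through the pipeline's stages in order.

    all() stops at the first rejecting stage, so a document rejected early
    is never shown to the deeper (more specific) classifiers.
    """
    return all(accepts[node](doc) for node in pipeline)


# Toy taxonomy and keyword-based stand-in classifiers (hypothetical).
taxonomy: Taxonomy = {"root": ["sport", "science"], "sport": ["football", "tennis"]}
accepts: Dict[str, Callable[[str], bool]] = {
    "sport": lambda d: "match" in d,
    "football": lambda d: "goal" in d,
    "tennis": lambda d: "racket" in d,
    "science": lambda d: "experiment" in d,
}

pipes = pipelines(taxonomy, "root")
print(pipes["football"])  # ['sport', 'football']
doc = "a late goal decided the match"
print(progressive_filter(doc, pipes["football"], accepts))  # True
print(progressive_filter(doc, pipes["tennis"], accepts))    # False
```

Each pipeline is a separate list of stages. The classifiers (or acceptance thresholds) along one pipeline could therefore be adjusted without touching the others. How the paper actually tunes them is not modeled here.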
Keywords :
Internet; electronic mail; information filtering; ontologies (artificial intelligence); pipeline processing; text analysis; DMOZ data collection; Web 2.0; Web pages articles; automatic text categorization system; hierarchical text categorization approach; information overload; pipelines taxonomy; progressive filtering; Complexity theory; Conferences; Filtering; Pipelines; Taxonomy; Text categorization; Training; DMOZ; Hierarchical Text Categorization; Information Overload; Input Imbalance; Reuters Corpus;
Conference_Title :
Database and Expert Systems Applications (DEXA), 2010 Workshop on
Conference_Location :
Bilbao
Print_ISBN :
978-1-4244-8049-4
DOI :
10.1109/DEXA.2010.26