DocumentCode :
3088420
Title :
Feature reduction and database maintenance in NETNEWS classification
Author :
Hsu, Wen-Lin ; Lang, Sheau-Dong
Author_Institution :
Dept. of Comput. Sci., Univ. of Central Florida, Orlando, FL, USA
fYear :
1999
fDate :
Aug 1999
Firstpage :
137
Lastpage :
144
Abstract :
We propose a statistical feature reduction technique that filters the most ambiguous articles out of the training data used for categorizing NETNEWS articles. We also incorporate a batch updating scheme that periodically maintains the term structures of the news database after training. The baseline method combines the terms of all the training articles in each newsgroup, so that each newsgroup is represented by a single vector. After training, incoming news articles are classified by their similarity to the existing newsgroup categories. Our implementation uses an inverted file to store the trained term structures of each newsgroup, and a list similar to the inverted file to buffer newly arrived articles, for efficient routing and updating. Our experimental results using real NETNEWS articles and newsgroups demonstrate that: (1) applying feature reduction to the training set improves routing accuracy, efficiency, and database storage; (2) updating improves routing accuracy; and (3) the batch technique improves the efficiency of the updating operation.
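The baseline routing and batch updating described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name NewsgroupRouter, the whitespace tokenizer, raw term-frequency weights, and cosine similarity are all assumptions made for illustration, since the abstract does not name the weighting or similarity measure. The statistical filter for ambiguous training articles is left out because the abstract does not specify the statistic.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real term extraction.
    return text.lower().split()


class NewsgroupRouter:
    """Hypothetical sketch: one term vector per newsgroup, stored as an inverted file."""

    def __init__(self):
        # Inverted file: term -> {newsgroup: accumulated term frequency}.
        self.index = defaultdict(dict)
        # Buffer with the same layout, holding newly arrived articles until a flush.
        self.buffer = defaultdict(dict)

    @staticmethod
    def _add(structure, newsgroup, text):
        # Merge one article's term counts into the given term structure.
        for term, tf in Counter(tokenize(text)).items():
            postings = structure[term]
            postings[newsgroup] = postings.get(newsgroup, 0) + tf

    def train(self, newsgroup, text):
        # Baseline method: fold every training article into its newsgroup's single vector.
        self._add(self.index, newsgroup, text)

    def buffer_article(self, newsgroup, text):
        # Newly arrived articles go to the buffer, not directly into the index.
        self._add(self.buffer, newsgroup, text)

    def flush(self):
        # Batch update: merge all buffered postings into the index in one pass.
        for term, postings in self.buffer.items():
            target = self.index[term]
            for group, tf in postings.items():
                target[group] = target.get(group, 0) + tf
        self.buffer.clear()

    def _norms(self):
        # Euclidean length of each newsgroup vector, computed from the inverted file.
        squares = defaultdict(float)
        for postings in self.index.values():
            for group, weight in postings.items():
                squares[group] += weight * weight
        return {group: math.sqrt(total) for group, total in squares.items()}

    def classify(self, text):
        # Only postings for terms in the article are visited, which is the point
        # of the inverted file. Returns None if no term is known to the index.
        dots = defaultdict(float)
        for term, tf in Counter(tokenize(text)).items():
            for group, weight in self.index.get(term, {}).items():
                dots[group] += tf * weight
        if not dots:
            return None
        norms = self._norms()
        # The article's own norm is the same for every newsgroup, so it is omitted.
        return max(dots, key=lambda group: dots[group] / norms[group])
```

In this sketch, train() builds the newsgroup vectors, classify() routes an incoming article to the most similar newsgroup, and buffer_article() followed by an occasional flush() plays the role of the batch update: buffered terms influence routing only after they have been merged into the index.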
Keywords :
database management systems; file organisation; information retrieval; software maintenance; statistical analysis; text analysis; NETNEWS articles; NETNEWS classification; ambiguous articles; baseline method; batch technique; batch updating scheme; database maintenance; database storage; feature reduction; inverted file; news database; newsgroup categories; real NETNEWS articles; routing accuracy; statistical feature reduction technique; term structures; trained term structures; training data; training set; updating operation; Buffer storage; Computer science; Data mining; Feature extraction; Information retrieval; Machine learning; Routing; Spatial databases; Text categorization; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Database Engineering and Applications, 1999. IDEAS '99. International Symposium Proceedings
Conference_Location :
Montreal, Que.
Print_ISBN :
0-7695-0265-2
Type :
conf
DOI :
10.1109/IDEAS.1999.787262
Filename :
787262