DocumentCode :
3436280
Title :
An Approach for Text Categorization in Digital Library
Author :
Wang, Tao ; Desai, Bipin C.
Author_Institution :
Concordia Univ., Montreal
fYear :
2007
fDate :
6-8 Sept. 2007
Firstpage :
21
Lastpage :
27
Abstract :
Text categorization is a very effective way to organize the enormous number of documents in digital libraries. Accurate classification of documents not only enhances document search precision but also facilitates browsing-by-topic functionality. It is, nonetheless, difficult to obtain satisfactory categorization accuracy compared with the results produced by professional catalogers. This is due largely to the complexity of the large-scale predefined category hierarchies, which makes it difficult for learning algorithms to distinguish among categories. This paper describes a top-down document classification approach that takes advantage of the hierarchical structure in two ways: identifying the number of independent local classifiers and guiding the top-down classification procedure. We evaluate the approach within the CINDI Digital Library, using the ACM Classification System as the target hierarchy. Experimental results show the promise of this approach.
Keywords :
digital libraries; text analysis; browsing-by-topic functionality; digital library; document search precision; documents classification; large-scaled category hierarchies; learning algorithms; text categorization; Chromium; Classification tree analysis; Computer science; Learning systems; Neural networks; Probabilistic logic; Software libraries; Support vector machine classification; Support vector machines; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Database Engineering and Applications Symposium, 2007. IDEAS 2007. 11th International
Conference_Location :
Banff, Alta.
ISSN :
1098-8068
Print_ISBN :
978-0-7695-2947-9
Type :
conf
DOI :
10.1109/IDEAS.2007.4318085
Filename :
4318085