Title :
An Approach for Text Categorization in Digital Library
Author :
Wang, Tao ; Desai, Bipin C.
Author_Institution :
Concordia Univ., Montreal
Abstract :
Text categorization is a very effective way to organize the enormous number of documents in digital libraries. Accurate classification of documents not only enhances document search precision but also facilitates browsing-by-topic functionality. Nonetheless, it is difficult to obtain categorization accuracy comparable to that achieved by professional catalogers. This is largely due to the complexity of predefined large-scale category hierarchies, which makes it difficult for learning algorithms to distinguish among categories. This paper describes a top-down document classification approach that takes advantage of the hierarchical structure in two specific ways: identifying the number of independent local classifiers and guiding the top-down classification procedure. Finally, we evaluate the approach within the CINDI Digital Library, using the ACM Classification System as the target hierarchy. Experimental results show the promise of this approach.
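The abstract does not specify the local learners or how they are wired together. The Python sketch below illustrates the general top-down scheme under our own assumptions: one local classifier per non-leaf category (so the number of local classifiers equals the number of internal nodes), and a toy keyword-overlap scorer standing in for whatever learning algorithm the paper actually uses. The names `Node`, `count_local_classifiers`, `local_choose`, and `classify_top_down`, as well as the category labels and keyword sets, are hypothetical and are not taken from the paper, the ACM Classification System, or CINDI.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A category in the hierarchy; a node with no children is a leaf."""
    name: str
    keywords: set[str] = field(default_factory=set)  # hypothetical term profile
    children: list["Node"] = field(default_factory=list)


def count_local_classifiers(node: Node) -> int:
    """Assumption: one independent local classifier per internal (non-leaf) node."""
    if not node.children:
        return 0
    return 1 + sum(count_local_classifiers(c) for c in node.children)


def local_choose(node: Node, tokens: set[str]) -> Node:
    """Toy stand-in for a trained local classifier: pick the child whose
    keyword profile overlaps the document most (ties go to the first child)."""
    return max(node.children, key=lambda c: len(c.keywords & tokens))


def classify_top_down(root: Node, text: str) -> list[str]:
    """Descend from the root; each local classifier picks one child until a leaf."""
    tokens = set(text.lower().split())
    path, node = [], root
    while node.children:
        node = local_choose(node, tokens)
        path.append(node.name)
    return path


# Illustrative two-level hierarchy (hypothetical labels and keywords).
root = Node("root", children=[
    Node("Information Systems", {"retrieval", "library", "database", "query"}, [
        Node("Digital Libraries", {"library", "digital", "catalog"}),
        Node("Database Management", {"database", "query", "transaction"}),
    ]),
    Node("Computing Methodologies", {"learning", "neural", "classifier"}, [
        Node("Learning", {"learning", "classifier", "training"}),
        Node("Vision", {"image", "pixel", "camera"}),
    ]),
])

print(count_local_classifiers(root))  # → 3
print(classify_top_down(root, "A digital library catalog with query support"))
# → ['Information Systems', 'Digital Libraries']
```

Because each local classifier only has to separate the few children of one node rather than every leaf category at once, errors at upper levels propagate downward; this is the usual trade-off of top-down schemes, and the sketch makes no attempt to mitigate it.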
Keywords :
digital libraries; text analysis; browsing-by-topic functionality; digital library; document search precision; documents classification; large-scaled category hierarchies; learning algorithms; text categorization; Classification tree analysis; Computer science; Learning systems; Neural networks; Probabilistic logic; Software libraries; Support vector machine classification; Support vector machines; Text categorization
Conference_Title :
Database Engineering and Applications Symposium, 2007. IDEAS 2007. 11th International
Conference_Location :
Banff, Alta.
Print_ISBN :
978-0-7695-2947-9
DOI :
10.1109/IDEAS.2007.4318085