Title :
Wikipedia in Action: Ontological Knowledge in Text Categorization
Author :
Janik, Maciej ; Kochut, Krys J.
Author_Institution :
Dept. of Comput. Sci., Univ. of Georgia, Athens, GA
Abstract :
We present a new, ontology-based approach to automatic text categorization. An important and novel aspect of this approach is that the categorization method does not require a training set, in contrast to traditional statistical and probabilistic methods. In the presented method, the ontology effectively becomes the classifier: it includes the domain concepts, organized into hierarchies of categories and interconnected by relationships, as well as instances and the connections among them. Our method focuses on (i) converting a text document into a thematic graph of the entities occurring in the document, (ii) ontological classification of the entities in the graph, and (iii) determining the overall categorization of the thematic graph and, as a result, of the document itself. In the presented experiments, we used an RDF ontology constructed from the full English version of Wikipedia. Our experiments, conducted on corpora of Reuters news articles, showed that our training-less categorization method achieved very good overall accuracy.
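The three steps listed in the abstract can be illustrated with a deliberately small sketch. It is not the authors' algorithm, which the abstract does not specify. The toy ontology (`ENTITY_CATEGORIES`, `RELATED`), the substring-based entity matching, and the degree-weighted voting rule are all assumptions introduced here purely for illustration.

```python
from collections import Counter

# Hypothetical toy ontology (an assumption, not the Wikipedia RDF ontology):
# each entity maps to its categories, and RELATED lists entity-entity links.
ENTITY_CATEGORIES = {
    "Toyota": ["Automotive"],
    "Honda": ["Automotive"],
    "Federal Reserve": ["Finance"],
    "interest rate": ["Finance"],
}
RELATED = {
    ("Toyota", "Honda"),
    ("Federal Reserve", "interest rate"),
}


def find_entities(text):
    """Step (i), part 1: spot ontology entities by naive substring matching."""
    lowered = text.lower()
    return [e for e in ENTITY_CATEGORIES if e.lower() in lowered]


def thematic_graph(entities):
    """Step (i), part 2: keep ontology links whose endpoints both occur."""
    found = set(entities)
    return {(a, b) for (a, b) in RELATED if a in found and b in found}


def categorize(text):
    """Steps (ii) and (iii): classify entities, then vote on a category.

    Assumed scoring rule: each entity votes for its categories with
    weight 1 + (number of graph edges it participates in).
    """
    entities = find_entities(text)
    edges = thematic_graph(entities)
    votes = Counter()
    for entity in entities:
        degree = sum(1 for edge in edges if entity in edge)
        for category in ENTITY_CATEGORIES[entity]:
            votes[category] += 1 + degree
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    print(categorize("Toyota and Honda reported sales while the Federal Reserve met."))
```

No training data is involved in this sketch: the category comes entirely from the ontology lookups, which is the property the abstract emphasizes. Well-connected entities count for more than isolated mentions, which is one plausible way to use the graph structure.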
Keywords :
Web sites; graph theory; ontologies (artificial intelligence); pattern classification; text analysis; RDF ontology; Reuters news articles; Wikipedia; ontological classification; ontological knowledge; text categorization; text document; thematic entities graph; Automotive engineering; Computer science; Distributed computing; Distributed information systems; Information analysis; Large-scale systems; Ontologies; Resource description framework; Text categorization; Wikipedia; ontology; text categorization; wikipedia ontology;
Conference_Title :
Semantic Computing, 2008 IEEE International Conference on
Conference_Location :
Santa Clara, CA
Print_ISBN :
978-0-7695-3279-0
Electronic_ISBN :
978-0-7695-3279-0
DOI :
10.1109/ICSC.2008.53