Title :
NASS: News Annotation Semantic System
Author :
Garrido, Angel L. ; Gómez, Oscar ; Ilarri, Sergio ; Mena, Eduardo
Author_Institution :
Grupo Heraldo - Grupo La Informacion, Pamplona, Spain
Abstract :
Media companies today face a serious problem in cataloging news because of the large number of articles received by their documentation departments. This manual work is prone to errors and omissions because staff members differ in their points of view and levels of expertise. The large number of terms in a thesaurus adds further difficulty. In this paper, we present a new method for text categorization over a corpus of newspaper articles in which the annotation must be composed of thesaurus elements. The method applies lemmatization, extracts keywords and named entities, and finally uses a combination of Support Vector Machines (SVM), ontologies, and heuristics to infer appropriate tags for the annotation. We carried out a detailed evaluation of our method with real newspaper articles and compared our tagging with the annotation performed by a real documentation department, obtaining promising results.
Keywords :
cataloguing; information science; ontologies (artificial intelligence); support vector machines; text analysis; thesauri; NASS; SVM; cataloging news; documentation departments; lemmatization; media companies; news annotation semantic system; newspaper articles; ontologies; text categorization; thesaurus elements; Data mining; Documentation; Media; Semantics; Heuristics; Information Extraction; Knowledge Discovery; Natural Language Processing; Text Mining
Conference_Title :
Tools with Artificial Intelligence (ICTAI), 2011 23rd IEEE International Conference on
Conference_Location :
Boca Raton, FL
Print_ISBN :
978-1-4577-2068-0
ISSN :
1082-3409
DOI :
10.1109/ICTAI.2011.149