DocumentCode :
2652102
Title :
NASS: News Annotation Semantic System
Author :
Garrido, Angel L. ; Gómez, Oscar ; Ilarri, Sergio ; Mena, Eduardo
Author_Institution :
Grupo Heraldo - Grupo La Informacion, Pamplona, Spain
fYear :
2011
fDate :
7-9 Nov. 2011
Firstpage :
904
Lastpage :
905
Abstract :
Media companies today face a serious problem in cataloging news because of the large number of articles received by their documentation departments. This manual labor is subject to many errors and omissions because staff members differ in point of view and level of expertise. The large number of terms in a thesaurus adds further difficulty. In this paper, we present a new method for solving the problem of text categorization over a corpus of newspaper articles where the annotation must be composed of thesaurus elements. The method consists of applying lemmatization, obtaining keywords and named entities, and finally using a combination of Support Vector Machines (SVM), ontologies and heuristics to infer appropriate tags for the annotation. We carried out a detailed evaluation of our method with real newspaper articles, and we compared our tagging with the annotation performed by a real documentation department, obtaining promising results.
Keywords :
cataloguing; information science; ontologies (artificial intelligence); support vector machines; text analysis; thesauri; NASS; SVM; cataloging news; documentation departments; lemmatization; media companies; news annotation semantic system; newspaper articles; ontologies; support vector machines; text categorization; thesaurus elements; Data mining; Documentation; Media; Ontologies; Semantics; Support vector machines; Thesauri; Heuristics; Information Extraction; Knowledge Discovery; Media; Natural Language Processing; Ontologies; SVM; Text Mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Tools with Artificial Intelligence (ICTAI), 2011 23rd IEEE International Conference on
Conference_Location :
Boca Raton, FL
ISSN :
1082-3409
Print_ISBN :
978-1-4577-2068-0
Electronic_ISBN :
1082-3409
Type :
conf
DOI :
10.1109/ICTAI.2011.149
Filename :
6103440