Title of article

An automatic approach for ontology-based feature extraction from heterogeneous textual resources

Author/Authors

Vicient, Carlos and Sánchez, David and Moreno-Yanguela, Antonio

Pages

From page

1092

To page

1106

Abstract

Data mining algorithms such as data classification or clustering methods exploit features of entities to characterise, group or classify them according to their resemblance. In the past, many feature extraction methods focused on the analysis of numerical or categorical properties. In recent years, motivated by the success of the Information Society and the WWW, which has made available enormous amounts of textual electronic resources, researchers have proposed semantic data classification and clustering methods that exploit textual data at a conceptual level. To do so, these methods rely on pre-annotated inputs in which text has been mapped to its formal semantics according to one or several knowledge structures (e.g. ontologies, taxonomies). Hence, they are hampered by the bottleneck introduced by the manual semantic mapping process. To tackle this problem, this paper presents a domain-independent, automatic and unsupervised method to detect relevant features from heterogeneous textual resources, associating them with concepts modelled in a background ontology. The method has been applied to raw text resources and also to semi-structured ones (Wikipedia articles). It has been tested in the Tourism domain, showing promising results.

Keywords

Information extraction, feature extraction, ontologies, Wikipedia

Journal title

Astroparticle Physics

Record number

2047742

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=10&DC=2047742