DocumentCode
2459125
Title
Making Unstructured Data SPARQL Using Semantic Indexing in Oracle Database
Author
Das, Souripriya ; Sundara, Seema ; Perry, Matthew ; Srinivasan, Jagannathan ; Banerjee, Jayanta ; Yalamanchi, Aravind
Author_Institution
Oracle, Nashua, NH, USA
fYear
2012
fDate
1-5 April 2012
Firstpage
1405
Lastpage
1416
Abstract
This paper describes the Semantic Indexing feature introduced in Oracle Database for indexing unstructured text (document) columns. This capability enables searching for concepts (such as people, places, organizations, and events), in addition to words or phrases, with further options for sense disambiguation and term expansion by consulting knowledge captured in OWL/RDF ontologies. The distinguishing aspects of our approach are: 1) Indexing: Instead of building a traditional inverted index of (annotated) token and/or named entity occurrences, we extract the entities, associations, and events present in text column data and store them as RDF named graphs in the Oracle Database Semantic Store. This base content can be further augmented with knowledge bases and inferred triples (obtained by applying domain-specific ontologies and rule bases). 2) Querying: Instead of relying on proprietary extensions for specifying a search, we allow users to specify a complete SPARQL query pattern that can capture arbitrarily complex relationships between query terms. We have implemented this feature by introducing a sem_contains SQL operator and the associated sem_indextype indexing scheme. The indexing scheme employs an extensible architecture that supports indexing of unstructured text using native as well as third-party text extraction tools. The paper presents a model for the semantic index and querying, describes the feature, and outlines its implementation leveraging Oracle's native support for RDF/OWL storage, inferencing, and querying. We also report a study involving use of this feature on a TREC collection of over 130,000 news articles.
Keywords
SQL; database indexing; knowledge representation languages; object-oriented databases; ontologies (artificial intelligence); query processing; text analysis; OWL-RDF ontologies; Oracle database semantic store; Oracle native support; RDF-OWL storage; SPARQL query pattern; TREC collection; association extraction; entities extraction; entity occurrences; information extraction tools; inverted index; knowledge bases; query terms; search specification; sem-contains SQL operator; sem-indextype indexing scheme; semantic indexing; sense disambiguation; term expansion; text column data; unstructured data SPARQL; unstructured text columns indexing; Data mining; Indexing; Knowledge based systems; Ontologies; Resource description framework; Semantics;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering (ICDE), 2012 IEEE 28th International Conference on
Conference_Location
Washington, DC
ISSN
1063-6382
Print_ISBN
978-1-4673-0042-1
Type
conf
DOI
10.1109/ICDE.2012.59
Filename
6228209