Title :
Making Unstructured Data SPARQL Using Semantic Indexing in Oracle Database
Author :
Das, Souripriya ; Sundara, Seema ; Perry, Matthew ; Srinivasan, Jagannathan ; Banerjee, Jayanta ; Yalamanchi, Aravind
Author_Institution :
Oracle, Nashua, NH, USA
Abstract :
This paper describes the Semantic Indexing feature introduced in Oracle Database for indexing unstructured text (document) columns. This capability enables searching for concepts (such as people, places, organizations, and events), in addition to words or phrases, with further options for sense disambiguation and term expansion by consulting knowledge captured in OWL/RDF ontologies. The distinguishing aspects of our approach are: 1) Indexing: Instead of building a traditional inverted index of (annotated) token and/or named entity occurrences, we extract the entities, associations, and events present in text column data and store them as RDF named graphs in the Oracle Database Semantic Store. This base content can be further augmented with knowledge bases and inferred triples (obtained by applying domain-specific ontologies and rule bases). 2) Querying: Instead of relying on proprietary extensions for specifying a search, we allow users to specify a complete SPARQL query pattern that can capture arbitrarily complex relationships between query terms. We have implemented this feature by introducing a sem_contains SQL operator and the associated sem_indextype indexing scheme. The indexing scheme employs an extensible architecture that supports indexing of unstructured text using native as well as third-party text extraction tools. The paper presents a model for the semantic index and querying, describes the feature, and outlines its implementation leveraging Oracle's native support for RDF/OWL storage, inferencing, and querying. We also report a study involving use of this feature on a TREC collection of over 130,000 news articles.
Keywords :
SQL; database indexing; knowledge representation languages; object-oriented databases; ontologies (artificial intelligence); query processing; text analysis; OWL-RDF ontologies; Oracle database semantic store; Oracle native support; RDF-OWL storage; SPARQL query pattern; TREC collection; association extraction; entities extraction; entity occurrences; information extraction tools; inverted index; knowledge bases; query terms; search specification; sem-contains SQL operator; sem-indextype indexing scheme; semantic indexing; sense disambiguation; term expansion; text column data; unstructured data SPARQL; unstructured text columns indexing; Data mining; Indexing; Knowledge based systems; Ontologies; Resource description framework; Semantics;
Conference_Title :
Data Engineering (ICDE), 2012 IEEE 28th International Conference on
Conference_Location :
Washington, DC
Print_ISBN :
978-1-4673-0042-1
DOI :
10.1109/ICDE.2012.59