DocumentCode :
3200217
Title :
Cichlid: Efficient Large Scale RDFS/OWL Reasoning with Spark
Author :
Rong Gu ; Shanyong Wang ; Fangfang Wang ; Chunfeng Yuan ; Yihua Huang
Author_Institution :
Nat. Key Lab. for Novel Software Technol., Nanjing Univ., Nanjing, China
fYear :
2015
fDate :
25-29 May 2015
Firstpage :
700
Lastpage :
709
Abstract :
In the era of big data, the volume of semantic data is growing rapidly. Large-scale semantic data contains a great deal of significant but often implicit information that must be derived by reasoning. Semantic data reasoning is a challenging process. On the one hand, traditional single-node reasoning systems can hardly cope with such a large amount of data because of resource limitations. On the other hand, existing large-scale reasoning systems are not very efficient or scalable because of the complexity of the reasoning process. In this paper, we propose Cichlid, an efficient distributed reasoning engine for the widely used RDFS and OWL Horst rule sets. Cichlid is built on top of Spark and implements parallel reasoning algorithms with the Spark RDD programming model. Further, we optimize the parallel RDFS reasoning algorithm in three aspects: the data partition model, the execution order of the reasoning rules, and the removal of duplicated data. Then, for the parallel OWL reasoning process, we optimize the most time-consuming parts, including the large-scale data join, the transitive closure computation, and the equivalent relation computation. In addition to the above optimizations at the reasoning-algorithm level, we also optimize the internal Spark execution mechanism by proposing an off-heap memory storage mechanism for RDDs. This system-level optimization patch has been accepted and integrated into Apache Spark 1.0. The experimental results show that Cichlid is around 10 times faster on average than state-of-the-art distributed reasoning systems on both large-scale synthetic and real-world benchmarks. The proposed reasoning algorithms and engine also achieve excellent scalability and fault tolerance.
Keywords :
Big Data; fault tolerant computing; inference mechanisms; knowledge representation languages; parallel algorithms; Apache Spark; Big Data; Cichlid; OWL Horst rule sets; Spark RDD programming model; Spark execution mechanism; data partition model; distributed reasoning engine; fault tolerance; large scale RDFS/OWL reasoning; large scale reasoning systems; large scale semantic data; off-heap memory storage mechanism; optimized parallel RDFS reasoning algorithm; parallel OWL reasoning process; parallel reasoning algorithms; semantic data reasoning; system-level optimization patch; widely-used RDFS; Cognition; Data models; OWL; Optimization; Resource description framework; Semantics; Sparks; OWL; RDFS; in-memory computing; parallel reasoning; reasoning; semantic data;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International
Conference_Location :
Hyderabad
ISSN :
1530-2075
Type :
conf
DOI :
10.1109/IPDPS.2015.14
Filename :
7161557