Abstract
MapReduce has emerged as a key component of large-scale data analysis in the cloud. However, it presents challenges for SPARQL query processing because it lacks traditional join optimization machinery such as statistics and indexes, as well as techniques for translating join-intensive workloads into efficient MapReduce workflows. Further, MapReduce is primarily a batch processing paradigm. It is therefore plausible that many workloads will include a batch of queries, or that new queries will be generated from given queries, e.g., through the rewriting of inferencing queries. Consequently, multi-query optimization deserves attention. This paper lays out a vision for rule-based multi-query optimization built on a recently proposed data model and algebra, the Nested TripleGroup Data Model and Algebra, for efficient SPARQL query processing on MapReduce.