DOI: 10.1007/978-3-030-18576-3_44
Article

Leon: A Distributed RDF Engine for Multi-query Processing

Published: 24 April 2019

Abstract

Although similar queries keep appearing in real query logs, few RDF systems address this problem. In this paper, we propose Leon, a distributed RDF system that also handles the multi-query problem. First, we apply a characteristic-set-based partitioning scheme. This scheme (i) supports fully parallel processing of joins within characteristic sets and (ii) minimizes data communication by transmitting intermediate results directly instead of broadcasting them. Then, Leon revisits the classical problem of multi-query optimization (MQO) in the context of RDF/SPARQL. In light of the NP-hardness of multi-query optimization for SPARQL, we propose a heuristic algorithm that partitions the input batch of queries into groups and discovers the common sub-queries of multiple SPARQL queries. Our MQO algorithm incorporates a subtle cost model to generate execution plans.
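To make the partitioning idea concrete, here is a minimal sketch, assuming the usual definition of a characteristic set as the set of predicates emitted by a subject. The function names, the round-robin worker assignment, and the toy triples are illustrative assumptions, not Leon's actual scheme.

```python
from collections import defaultdict


def characteristic_sets(triples):
    """Map each subject to the frozenset of predicates it emits."""
    preds = defaultdict(set)
    for s, p, _ in triples:
        preds[s].add(p)
    return {s: frozenset(ps) for s, ps in preds.items()}


def partition_by_characteristic_set(triples, num_workers):
    """Place all triples whose subject shares a characteristic set on one worker.

    Assumption: characteristic sets are assigned to workers round-robin
    in a deterministic sorted order; a real system would balance load.
    """
    cs_of = characteristic_sets(triples)
    distinct = sorted(set(cs_of.values()), key=lambda cs: sorted(cs))
    worker_of = {cs: i % num_workers for i, cs in enumerate(distinct)}
    parts = defaultdict(list)
    for t in triples:
        parts[worker_of[cs_of[t[0]]]].append(t)
    return dict(parts)


if __name__ == "__main__":
    demo = [
        ("a", "name", "x"), ("a", "age", "1"),
        ("b", "name", "y"), ("b", "age", "2"),
        ("c", "title", "z"),
    ]
    for worker, chunk in sorted(partition_by_characteristic_set(demo, 2).items()):
        print(worker, chunk)
```

Because every triple of a subject lands on the same worker, a subject-subject star join within one characteristic set can be evaluated without exchanging data between workers in this sketch.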
Our experiments with synthetic and real datasets verify that (i) Leon’s startup overhead is low; (ii) Leon consistently outperforms centralized RDF engines by 1–2 orders of magnitude and is competitive with state-of-the-art distributed RDF engines; and (iii) our MQO approach consistently demonstrates a 10× speedup over the baseline method.
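The grouping step of the MQO heuristic described in the abstract can be illustrated with a rough sketch. Everything here is an assumption made for illustration: queries are modeled as lists of triple patterns, grouping is greedy by Jaccard similarity of predicate sets with a hypothetical threshold, and shared patterns are found by plain set intersection, which presumes variables are already named consistently. It includes no cost model and is not the algorithm from the paper.

```python
def predicates(query):
    """Return the set of predicates used by a query's triple patterns."""
    return {p for _, p, _ in query}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_queries(queries, threshold=0.5):
    """Greedily put each query into the first group whose seed is similar enough."""
    groups = []
    for q in queries:
        for g in groups:
            if jaccard(predicates(q), predicates(g[0])) >= threshold:
                g.append(q)
                break
        else:
            groups.append([q])
    return groups


def common_patterns(group):
    """Triple patterns shared verbatim by every query in the group."""
    shared = set(group[0])
    for q in group[1:]:
        shared &= set(q)
    return shared


if __name__ == "__main__":
    q1 = [("?x", "name", "?n"), ("?x", "age", "?a")]
    q2 = [("?x", "name", "?n"), ("?x", "age", "?a"), ("?x", "email", "?e")]
    q3 = [("?p", "title", "?t")]
    for g in group_queries([q1, q2, q3]):
        print(len(g), "queries share", sorted(common_patterns(g)))
```

A shared sub-query found this way would be evaluated once and its result reused by every query in the group; matching patterns up to variable renaming would require subgraph matching rather than set intersection.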

Published In

Database Systems for Advanced Applications: 24th International Conference, DASFAA 2019, Chiang Mai, Thailand, April 22–25, 2019, Proceedings, Part I
Apr 2019
828 pages
ISBN: 978-3-030-18575-6
DOI: 10.1007/978-3-030-18576-3

Publisher

Springer-Verlag

Berlin, Heidelberg
