DOI: 10.1145/2631775.2631789
poster

A two-tier index architecture for fast processing large RDF data over distributed memory

Published: 01 September 2014

Abstract

We propose an efficient method for fast processing of large RDF data over distributed memory. Our approach adopts a two-tier index architecture on each computation node: (1) a lightweight primary index, to keep loading times low, and (2) a dynamic, multi-level secondary index, calculated as a by-product of query execution, to reduce or eliminate inter-machine data movement for subsequent queries that contain the same graph patterns. Experimental results on a commodity cluster show that we can load large RDF data into memory very quickly while query processing with the secondary index remains within an interactive range.
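
The two-tier idea can be illustrated with a minimal single-node sketch. This is not the authors' implementation: the class `Node`, the predicate-keyed layout of the primary index, and the method `join_on_subject` are all hypothetical names and choices made for illustration. The sketch also flattens the multi-level secondary index into a plain result cache and leaves out distribution entirely; it only shows how a secondary index can arise as a by-product of answering a query and then serve later queries with the same graph pattern.

```python
from collections import defaultdict


class Node:
    """One computation node holding a shard of triples (hypothetical layout)."""

    def __init__(self):
        # Primary index (assumed layout): predicate -> list of (subject, object).
        # It needs a single pass over the data, so loading stays cheap.
        self.primary = defaultdict(list)
        # Secondary index (simplified): graph pattern key -> cached join result.
        self.secondary = {}

    def load(self, triples):
        """Build only the primary index at load time."""
        for s, p, o in triples:
            self.primary[p].append((s, o))

    def join_on_subject(self, p1, p2):
        """Answer the pattern { ?s p1 ?a . ?s p2 ?b }.

        Returns (rows, reused), where reused is True when the rows came
        from the secondary index instead of being recomputed.
        """
        key = (p1, p2)
        if key in self.secondary:
            return self.secondary[key], True

        # Hash join on the shared subject, using the primary index only.
        by_subject = defaultdict(list)
        for s, o in self.primary.get(p1, []):
            by_subject[s].append(o)
        rows = [
            (s, a, b)
            for s, b in self.primary.get(p2, [])
            for a in by_subject.get(s, [])
        ]

        # By-product of query execution: keep the result for later queries
        # that contain the same graph pattern.
        self.secondary[key] = rows
        return rows, False


node = Node()
node.load([
    ("alice", "type", "Person"),
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
])
first, reused_first = node.join_on_subject("type", "name")
second, reused_second = node.join_on_subject("type", "name")
```

In this sketch the first call computes the join from the primary index and the second call is served from the secondary index. In a distributed setting, the step that the cache skips would be the one that redistributes triples between machines, which is where the abstract locates the saving.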

Cited By

  • (2014) A fully parallel framework for analyzing RDF data. Proceedings of the 2014 International Conference on Posters & Demonstrations Track - Volume 1272, pages 289-292. DOI: 10.5555/2878453.2878526. Online publication date: 21-Oct-2014

Published In

HT '14: Proceedings of the 25th ACM conference on Hypertext and social media
September 2014
346 pages
ISBN:9781450329545
DOI:10.1145/2631775

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. distributed rdf processing
  2. dynamic indexing

Qualifiers

  • Poster

Conference

HT '14
Acceptance Rates

HT '14 Paper Acceptance Rate 49 of 86 submissions, 57%;
Overall Acceptance Rate 378 of 1,158 submissions, 33%
