DOI: 10.5555/1325851.1325900
research-article

Scalable semantic web data management using vertical partitioning

Published: 23 September 2007

Abstract

Efficient management of RDF data is an important factor in realizing the Semantic Web vision. Performance and scalability issues are becoming increasingly pressing as Semantic Web technology is applied to real-world applications. In this paper, we examine the reasons why current data management solutions for RDF data scale poorly, and explore the fundamental scalability limitations of these approaches. We review the state of the art for improving performance for RDF databases and consider a recent suggestion, "property tables." We then discuss practically and empirically why this solution has undesirable features. As an improvement, we propose an alternative solution: vertically partitioning the RDF data. We compare the performance of vertical partitioning with prior art on queries generated by a Web-based RDF browser over a large-scale (more than 50 million triples) catalog of library data. Our results show that a vertically partitioned schema achieves similar performance to the property table technique while being much simpler to design. Further, if a column-oriented DBMS (a database architected specially for the vertically partitioned case) is used instead of a row-oriented DBMS, another order of magnitude performance improvement is observed, with query times dropping from minutes to several seconds.
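To make the idea of vertical partitioning concrete, here is a minimal in-memory sketch. It assumes the layout commonly associated with the term: one two-column (subject, object) table per property, kept sorted by subject so that two property tables can be joined on subject with a merge join. The triples, property names, and helper functions (`vertically_partition`, `merge_join`, `titles_in_language`) are hypothetical and only illustrate the idea; this is not the authors' implementation, which stored such tables in a row-oriented DBMS and in a column-oriented DBMS.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, property, object) triples, loosely
# modeled on library-catalog records. Not taken from the paper's dataset.
TRIPLES = [
    ("book1", "type", "Text"),
    ("book1", "title", "Moby Dick"),
    ("book1", "language", "English"),
    ("book2", "type", "Text"),
    ("book2", "title", "Faust"),
    ("book2", "language", "German"),
    ("map1", "type", "Cartographic"),
    ("map1", "title", "Atlas"),
]


def vertically_partition(triples):
    """Build one two-column (subject, object) table per property.

    Each table is sorted by subject (then object), so any two tables
    can be joined on subject in a single merge pass.
    """
    tables = defaultdict(list)
    for subj, prop, obj in triples:
        tables[prop].append((subj, obj))
    return {prop: sorted(rows) for prop, rows in tables.items()}


def merge_join(left, right):
    """Merge-join two subject-sorted (subject, object) tables on subject.

    Returns (subject, left_object, right_object) rows. Runs of equal
    subjects are cross-multiplied, so multi-valued properties work too.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        ls, rs = left[i][0], right[j][0]
        if ls < rs:
            i += 1
        elif ls > rs:
            j += 1
        else:
            # Find the end of the run of rows sharing this subject on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == ls:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == ls:
                j_end += 1
            for _, lobj in left[i:i_end]:
                for _, robj in right[j:j_end]:
                    out.append((ls, lobj, robj))
            i, j = i_end, j_end
    return out


def titles_in_language(tables, language):
    """Titles of all subjects whose 'language' object equals `language`."""
    # Filtering preserves the subject sort order, so the merge join still applies.
    lang_rows = [(s, o) for s, o in tables.get("language", []) if o == language]
    joined = merge_join(lang_rows, tables.get("title", []))
    return [title for _subj, _lang, title in joined]


if __name__ == "__main__":
    tables = vertically_partition(TRIPLES)
    print(sorted(tables))                         # ['language', 'title', 'type']
    print(tables["title"])
    print(titles_in_language(tables, "English"))  # ['Moby Dick']
```

In this layout a query reads only the tables for the properties it mentions (here `language` and `title`); with a single three-column triples table, the same query would instead need a self-join over all triples.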


Information

Published In

VLDB '07: Proceedings of the 33rd international conference on Very large data bases
September 2007
1443 pages
ISBN:9781595936493

Sponsors

  • Yahoo! Research
  • Google Inc.
  • SAP
  • Intel
  • Microsoft Research
  • ORACLE
  • Connex.cc
  • HP invent
  • WKO
  • IBM

Publisher

VLDB Endowment

Article Metrics

  • Downloads (last 12 months): 7
  • Downloads (last 6 weeks): 0
Reflects downloads up to 17 Oct 2024

Cited By

  • (2024) Accurate Sampling-Based Cardinality Estimation for Complex Graph Queries. ACM Transactions on Database Systems, 49:3 (1-46). DOI: 10.1145/3689209. Online publication date: 17-Sep-2024.
  • (2023) Web Science: An Interdisciplinary Approach to Understanding the Web. In Linking the World's Information (67-84). DOI: 10.1145/3591366.3591374. Online publication date: 5-Sep-2023.
  • (2023) A Survey on Mapping Semi-Structured Data and Graph Data to Relational Data. ACM Computing Surveys, 55:10 (1-38). DOI: 10.1145/3567444. Online publication date: 2-Feb-2023.
  • (2023) An Effective Framework for Enhancing Query Answering in a Heterogeneous Data Lake. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (770-780). DOI: 10.1145/3539618.3591637. Online publication date: 19-Jul-2023.
  • (2022) Large-Scale Commodity Knowledge Organization and Intelligent Query Optimization. International Journal of Mobile Computing and Multimedia Communications, 13:1 (1-25). DOI: 10.4018/IJMCMC.297965. Online publication date: 22-Apr-2022.
  • (2022) SRX: efficient management of spatial RDF data. The VLDB Journal: The International Journal on Very Large Data Bases, 28:5 (703-733). DOI: 10.1007/s00778-019-00554-z. Online publication date: 11-Mar-2022.
  • (2021) Columnar storage and list-based processing for graph database management systems. Proceedings of the VLDB Endowment, 14:11 (2491-2504). DOI: 10.14778/3476249.3476297. Online publication date: 27-Oct-2021.
  • (2021) Load Balanced Semantic Aware Distributed RDF Graph. Proceedings of the 25th International Database Engineering & Applications Symposium (127-133). DOI: 10.1145/3472163.3472167. Online publication date: 14-Jul-2021.
  • (2020) SuccinctEdge. Proceedings of the VLDB Endowment, 13:12 (2857-2860). DOI: 10.14778/3415478.3415493. Online publication date: 1-Aug-2020.
  • (2020) Scalable SPARQL querying of large RDF graphs. Proceedings of the VLDB Endowment, 4:11 (1123-1134). DOI: 10.14778/3402707.3402747. Online publication date: 3-Jun-2020.