DOI: 10.1145/2602622.2602624

Automated schema design for NoSQL databases

Published: 18 June 2014

Abstract

Selecting appropriate indices and materialized views is critical for high performance in relational databases. By example, we show that the problem of schema optimization is also highly relevant for NoSQL databases. We explore the problem of schema design in NoSQL databases with a goal of optimizing query performance while minimizing storage overhead. Our suggested approach uses the cost of executing a given workload for a given schema to guide the mapping from the application data model to a physical schema. We propose a cost-driven approach for optimization and discuss its usefulness as part of an automated schema design tool.
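
As a rough illustration of what a cost-driven choice of schema could look like, the sketch below enumerates sets of candidate physical structures, estimates the weighted cost of a workload against each set, and keeps the cheapest set that fits a storage budget. This is not the authors' algorithm: the candidate names, storage sizes, per-query costs, and the brute-force search are all hypothetical and chosen only to make the idea concrete.

```python
from itertools import combinations

# Hypothetical candidates: name -> (storage units, {query: cost when answered from it}).
CANDIDATES = {
    "users_by_id": (10, {"get_user": 1.0}),
    "posts_by_user": (40, {"user_posts": 2.0}),
    "posts_by_id": (30, {"post_with_author": 3.0}),
    # Denormalized structure: larger, but answers two queries.
    "posts_with_author": (70, {"user_posts": 2.5, "post_with_author": 1.0}),
}

# Hypothetical workload: query -> relative frequency (weights sum to 1).
WORKLOAD = {"get_user": 0.5, "user_posts": 0.3, "post_with_author": 0.2}


def workload_cost(schema, workload, candidates):
    """Weighted cost of the workload, or None if some query cannot be answered."""
    total = 0.0
    for query, weight in workload.items():
        costs = [candidates[n][1][query] for n in schema if query in candidates[n][1]]
        if not costs:
            return None
        total += weight * min(costs)  # each query uses its cheapest structure
    return total


def best_schema(candidates, workload, storage_budget):
    """Exhaustively search candidate subsets; return (schema, cost, storage) or None."""
    best = None
    names = sorted(candidates)
    for size in range(1, len(names) + 1):
        for schema in combinations(names, size):
            storage = sum(candidates[n][0] for n in schema)
            if storage > storage_budget:
                continue
            cost = workload_cost(schema, workload, candidates)
            if cost is None:
                continue
            # Prefer lower workload cost, then lower storage.
            if best is None or (cost, storage) < (best[1], best[2]):
                best = (schema, cost, storage)
    return best


if __name__ == "__main__":
    for budget in (50, 100, 120):
        print(budget, best_schema(CANDIDATES, WORKLOAD, budget))
```

Exhaustive enumeration is exponential in the number of candidates, so it only works for toy inputs like this one. The point of the sketch is the trade-off itself: a tighter storage budget forces a schema with a higher estimated workload cost, or no feasible schema at all.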


Published In

SIGMOD'14 PhD Symposium: Proceedings of the 2014 SIGMOD PhD symposium
June 2014
58 pages
ISBN:9781450329248
DOI:10.1145/2602622

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. nosql
  2. schema optimization
  3. workload modeling

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'14

Acceptance Rates

SIGMOD'14 PhD Symposium Paper Acceptance Rate 10 of 13 submissions, 77%;
Overall Acceptance Rate 40 of 60 submissions, 67%

