Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3035918.3056101acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases

Published: 09 May 2017 Publication History

Abstract

Amazon Aurora is a relational database service for OLTP workloads offered as part of Amazon Web Services (AWS). In this paper, we describe the architecture of Aurora and the design considerations leading to that architecture. We believe the central constraint in high throughput data processing has moved from compute and storage to the network. Aurora brings a novel architecture to the relational database to address this constraint, most notably by pushing redo processing to a multi-tenant scale-out storage service, purpose-built for Aurora. We describe how doing so not only reduces network traffic, but also allows for fast crash recovery, failovers to replicas without loss of data, and fault-tolerant, self-healing storage. We then describe how Aurora achieves consensus on durable state across numerous storage nodes using an efficient asynchronous scheme, avoiding expensive and chatty recovery protocols. Finally, having operated Aurora as a production service for over 18 months, we share the lessons we have learnt from our customers on what modern cloud applications expect from databases.

References

[1]
B. Calder, J. Wang, et al. Windows Azure storage: A highly available cloud storage service with strong consistency. In SOSP 201
[2]
O. Khan, R. Burns, J. Plank, W. Pierce, and C. Huang. Rethinking erasure codes for cloud file systems: Minimizing I/O for recovery and degraded reads. In FAST 2012.
[3]
P.A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency control and recovery in database systems, Chapter 7, Addison Wesley Publishing Company, ISBN 0-201-10715-5, 1997.
[4]
C. Mohan, B. Lindsay, and R. Obermarck. Transaction management in the R* distributed database management system?. ACM TODS, 11(4):378--396, 1986.
[5]
C. Mohan and B. Lindsay. Efficient commit protocols for the tree of processes model of distributed transactions. ACM SIGOPS Operating Systems Review, 19(2):40--52, 1985.
[6]
D.K. Gifford. Weighted voting for replicated data. In SOSP 1979.
[7]
C. Mohan, D.L. Haderle, B. Lindsay, H. Pirahesh, and P. Schwarz. ARIES: A transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM TODS, 17 (1): 94--162, 1992.
[8]
R. van Renesse and F. Schneider. Chain replication for supporting high throughput and availability. In OSDI 2004.
[9]
A. Kopytov. Sysbench Manual. Available at http://imysql.com/wp-content/uploads/2014/10/sysbench-manual.pdf
[10]
J. Levandoski, D. Lomet, S. Sengupta, R. Stutsman, and R. Wang. High performance transactions in deuteronomy. In CIDR 2015.
[11]
P. Bailis, A. Fekete, A. Ghodsi, J.M. Hellerstein, and I. Stoica. Scalable atomic visibility with RAMP Transactions. In SIGMOD 2014.
[12]
P. Bailis, A. Davidson, A. Fekete, A. Ghodsi, J.M. Hellerstein, and I. Stoica. Highly available transactions: virtues and limitations. In VLDB 2014.
[13]
R. Taft, E. Mansour, M. Serafini, J. Duggan, A.J. Elmore, A. Aboulnaga, A. Pavlo, and M. Stonebraker. E-Store: fine-grained elastic partitioning for distributed transaction processing systems. In VLDB 2015.
[14]
R. Woollen. The internal design of salesforce.com's multi-tenant architecture. In SoCC 2010.
[15]
S. Davidson, H. Garcia-Molina, and D. Skeen. Consistency in partitioned networks. ACM CSUR, 17(3):341--370, 1985.
[16]
S. Gilbert and N. Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News, 33(2):51--59, 2002.
[17]
D.J. Abadi. Consistency tradeoffs in modern distributed database system design: CAP is only part of the story. IEEE Computer, 45(2), 2012.
[18]
A. Adya. Weak consistency: a generalized theory and optimistic implementations for distributed transactions. PhD Thesis, MIT, 1999.
[19]
Y. Saito and M. Shapiro. Optimistic replication. ACM Comput. Surv., 37(1), Mar. 2005.
[20]
H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O'Neil, and P. O'Neil. A critique of ANSI SQL isolation levels. In SIGMOD 1995.
[21]
P. Bailis and A. Ghodsi. Eventual consistency today: limitations, extensions, and beyond. ACM Queue, 11(3), March 2013.
[22]
P. Bernstein and S. Das. Rethinking eventual consistency. In SIGMOD, 2013.
[23]
B. Cooper et al. PNUTS: Yahoo!'s hosted data serving platform. In VLDB 2008.
[24]
J. C. Corbett, J. Dean, et al. Spanner: Google's globally-distributed database. In OSDI 2012.
[25]
David K. Gifford. Information Storage in a Decentralized Computer System. Tech. rep. CSL-81--8. PhD dissertation. Xerox PARC, July 1982.
[26]
Jeffrey Dean and Sanjay Ghemawat. MapReduce: a flexible data processing tool?. CACM 53 (1):72--77, 2010.
[27]
J. M. Hellerstein, M. Stonebraker, and J. R. Hamilton. Architecture of a database system. Foundations and Trends in Databases. 1(2) pp. 141--259, 2007.
[28]
J. Gray, R. A. Lorie, G. R. Putzolu, I. L. Traiger. Granularity of locks in a shared data base. In VLDB 1975.
[29]
P-A Larson, et al. High-Performance Concurrency control mechanisms for main-memory databases. PVLDB, 5(4): 298--309, 2011.
[30]
M. Stonebraker and A. Weisberg. The VoltDB main memory DBMS. IEEE Data Eng. Bull., 36(2): 21--27, 2013.
[31]
V. Leis, A. Kemper, and T. Neumann. Exploiting hardware transactional memory in main-memory databases. In ICDE 2014.
[32]
H. Mühe, S. Wolf, A. Kemper, and T. Neumann: An evaluation of strict timestamp ordering concurrency control for main-memory database systems. In IMDM Workshop 2013.
[33]
M. Rosenblum and J. Ousterhout. The design and implementation of a log-structured file system. ACM TOCS 10(1): 26--52, 1992.
[34]
J. Levandoski, D. Lomet, S. Sengupta. LLAMA: A cache/storage subsystem for modern hardware. PVLDB 6(10): 877--888, 2013.
[35]
J. Levandoski, D. Lomet, and S. Sengupta. The Bw-Tree: A B-tree for new hardware platforms. In ICDE 2013.
[36]
M. Aguilera, J. Leners, and M. Walfish. Yesquel: scalable SQL storage for web applications. In SOSP 2015.
[37]
Percona Lab. TPC-C Benchmark over MySQL. Available at https://github.com/Percona-Lab/tpcc-mysql
[38]
P. Bernstein, C. Reid, and S. Das. Hyder -- A transactional record manager for shared flash. In CIDR 2011.
[39]
M. Aguilera, A. Merchant, M. Shah, A. Veitch, and C. Karamanolis. Sinfonia: A new paradigm for building scalable distributed systems. ACM Trans. Comput. Syst. 27(3): 2009.
[40]
M. Weiner. Sharding Pinterest: How we scaled our MySQL fleet. Pinterest Engineering Blog. Available at: https://engineering.pinterest.com/blog/sharding-pinterest-how-we-scaled-our-mysql-fleet
[41]
G. Graefe. Instant recovery for data center savings. ACM SIGMOD Record. 44(2):29--34, 2015.
[42]
J. Dean and L. Barroso. The tail at scale. CACM 56(2):74--80, 2013.

Cited By

View all
  • (2025)Practical DB-OS Co-Design with Privileged Kernel BypassProceedings of the ACM on Management of Data10.1145/37097143:1(1-27)Online publication date: 11-Feb-2025
  • (2025)Boosting OLTP Performance with Per-Page Logging on NVDIMMProceedings of the ACM on Management of Data10.1145/37096673:1(1-28)Online publication date: 11-Feb-2025
  • (2025)Nazar: Monitoring and Adapting ML Models on Mobile DevicesProceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 110.1145/3669940.3707246(746-761)Online publication date: 3-Feb-2025
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
May 2017
1810 pages
ISBN:9781450341974
DOI:10.1145/3035918
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 May 2017

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. databases
  2. distributed systems
  3. log processing
  4. oltp
  5. performance
  6. quorum models
  7. recovery
  8. replication

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'17
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)443
  • Downloads (Last 6 weeks)48
Reflects downloads up to 10 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2025)Practical DB-OS Co-Design with Privileged Kernel BypassProceedings of the ACM on Management of Data10.1145/37097143:1(1-27)Online publication date: 11-Feb-2025
  • (2025)Boosting OLTP Performance with Per-Page Logging on NVDIMMProceedings of the ACM on Management of Data10.1145/37096673:1(1-28)Online publication date: 11-Feb-2025
  • (2025)Nazar: Monitoring and Adapting ML Models on Mobile DevicesProceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 110.1145/3669940.3707246(746-761)Online publication date: 3-Feb-2025
  • (2025)Design and Analysis of Distributed Message Ordering over a Unidirectional Logical RingComputer Performance Engineering10.1007/978-3-031-80932-3_1(1-13)Online publication date: 13-Feb-2025
  • (2024)Hybrid Shared-Buffer for Multi-Master DatabasesJournal of Database Management10.4018/JDM.35692035:1(1-27)Online publication date: 11-Oct-2024
  • (2024)Resource Management in Aurora ServerlessProceedings of the VLDB Endowment10.14778/3685800.368582517:12(4038-4050)Online publication date: 8-Nov-2024
  • (2024)TDSQL: Tencent Distributed Database SystemProceedings of the VLDB Endowment10.14778/3685800.368581217:12(3869-3882)Online publication date: 8-Nov-2024
  • (2024)GaussDB: A Cloud-Native Multi-Primary Database with Compute-Memory-Storage DisaggregationProceedings of the VLDB Endowment10.14778/3685800.368580617:12(3786-3798)Online publication date: 8-Nov-2024
  • (2024)PALF: Replicated Write-Ahead Logging for Distributed DatabasesProceedings of the VLDB Endowment10.14778/3685800.368580317:12(3745-3758)Online publication date: 8-Nov-2024
  • (2024)Bf-Tree: A Modern Read-Write-Optimized Concurrent Larger-Than-Memory Range IndexProceedings of the VLDB Endowment10.14778/3681954.368201217:11(3442-3455)Online publication date: 1-Jul-2024
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media