research-article

Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases

Authors:

Alexandre Verbitski,

Murali Brahmadesam,

Sailesh Krishnamurthy,

Sandor Maurice,

Tengiz Kharatishvili,

Xiaofeng Bao

SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data

Pages 1041 - 1052

https://doi.org/10.1145/3035918.3056101

Published: 09 May 2017

Abstract

Amazon Aurora is a relational database service for OLTP workloads offered as part of Amazon Web Services (AWS). In this paper, we describe the architecture of Aurora and the design considerations leading to that architecture. We believe the central constraint in high throughput data processing has moved from compute and storage to the network. Aurora brings a novel architecture to the relational database to address this constraint, most notably by pushing redo processing to a multi-tenant scale-out storage service, purpose-built for Aurora. We describe how doing so not only reduces network traffic, but also allows for fast crash recovery, failovers to replicas without loss of data, and fault-tolerant, self-healing storage. We then describe how Aurora achieves consensus on durable state across numerous storage nodes using an efficient asynchronous scheme, avoiding expensive and chatty recovery protocols. Finally, having operated Aurora as a production service for over 18 months, we share the lessons we have learnt from our customers on what modern cloud applications expect from databases.

Index Terms

Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
  2. Dependable and fault-tolerant systems and networks
2. Information systems
  1. Data management systems
    1. Database management system engines
  2. Information storage systems
    1. Storage architectures
    2. Storage replication


Published In

SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data

May 2017

1810 pages

ISBN:9781450341974

DOI:10.1145/3035918

General Chairs: Rada Chirkova (North Carolina State University, USA), Jun Yang (Duke University, USA)

Program Chair: Dan Suciu (University of Washington, USA)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGMOD/PODS'17

Sponsor:

SIGMOD

SIGMOD/PODS'17: International Conference on Management of Data

May 14 - 19, 2017

Chicago, Illinois, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

Total Citations: 185

Total Downloads: 10,167

Downloads (last 12 months): 443

Downloads (last 6 weeks): 48

Reflects downloads up to 10 Feb 2025.

Cited By

Zhou X, Leis V, Hu J, Yu X, Stonebraker M (2025). Practical DB-OS Co-Design with Privileged Kernel Bypass. Proceedings of the ACM on Management of Data 3:1 (1-27). Online publication date: 11-Feb-2025.
https://dl.acm.org/doi/10.1145/3709714

Lee B, Moon S, Park J, Lee S (2025). Boosting OLTP Performance with Per-Page Logging on NVDIMM. Proceedings of the ACM on Management of Data 3:1 (1-28). Online publication date: 11-Feb-2025.
https://dl.acm.org/doi/10.1145/3709667

Hao W, Wang Z, Hong L, Li L, Karayanni N, Dasbach-Prisk A, Mao C, Yang J, Cidon A, Eeckhout L, Smaragdakis G, Liang K, Sampson A, Kim M, Rossbach C (2025). Nazar: Monitoring and Adapting ML Models on Mobile Devices. Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1 (746-761). Online publication date: 3-Feb-2025.
https://dl.acm.org/doi/10.1145/3669940.3707246

Liu Y, Ezhilchelvan P, Mitrani I (2025). Design and Analysis of Distributed Message Ordering over a Unidirectional Logical Ring. Computer Performance Engineering (1-13). Online publication date: 13-Feb-2025.
https://doi.org/10.1007/978-3-031-80932-3_1

Zhang R, Ye Z, Cai P, Zhou X, Cai D, Qian L (2024). Hybrid Shared-Buffer for Multi-Master Databases. Journal of Database Management 35:1 (1-27). Online publication date: 11-Oct-2024.
https://doi.org/10.4018/JDM.356920

Barnhart B, Brooker M, Chinenkov D, Hooper T, Im J, Jha P, Kraska T, Kurakula A, Kuznetsov A, McAlister G, Muthukrishnan A, Narayanan A, Terry D, Urgaonkar B, Yan J (2024). Resource Management in Aurora Serverless. Proceedings of the VLDB Endowment 17:12 (4038-4050). Online publication date: 8-Nov-2024.
https://dl.acm.org/doi/10.14778/3685800.3685825

Chen Y, Pan A, Lei H, Ye A, Han S, Tang Y, Lu W, Chai Y, Zhang F, Du X (2024). TDSQL: Tencent Distributed Database System. Proceedings of the VLDB Endowment 17:12 (3869-3882). Online publication date: 8-Nov-2024.
https://dl.acm.org/doi/10.14778/3685800.3685812

Li G, Tian W, Zhang J, Grosman R, Liu Z, Li S (2024). GaussDB: A Cloud-Native Multi-Primary Database with Compute-Memory-Storage Disaggregation. Proceedings of the VLDB Endowment 17:12 (3786-3798). Online publication date: 8-Nov-2024.
https://dl.acm.org/doi/10.14778/3685800.3685806

Han F, Liu H, Chen B, Jia D, Zhou J, Teng X, Yang C, Xi H, Tian W, Tao S, Wang S, Xu Q, Yang Z (2024). PALF: Replicated Write-Ahead Logging for Distributed Databases. Proceedings of the VLDB Endowment 17:12 (3745-3758). Online publication date: 8-Nov-2024.
https://dl.acm.org/doi/10.14778/3685800.3685803

Hao X, Chandramouli B (2024). Bf-Tree: A Modern Read-Write-Optimized Concurrent Larger-Than-Memory Range Index. Proceedings of the VLDB Endowment 17:11 (3442-3455). Online publication date: 1-Jul-2024.
https://dl.acm.org/doi/10.14778/3681954.3682012
