research-article
Open access

Detock: High Performance Multi-region Transactions at Scale

Published: 20 June 2023

Abstract

Many globally distributed data stores need to replicate data across large geographic distances. Since synchronously replicating data across such distances is slow, systems with high consistency requirements often geo-partition data and direct all linearizable requests to the primary region of the accessed data. This significantly improves performance for workloads where most transactions access data close to where they originate. However, supporting serializable multi-geo-partition transactions is a challenge, and they often degrade the performance of the whole system. This becomes even more challenging when they conflict with single-partition requests, where optimistic protocols lead to high numbers of aborts and pessimistic protocols lead to high numbers of distributed deadlocks. In this paper, we describe the design of concurrency control and deadlock resolution protocols, built within a practical, complete implementation of a geographically replicated database system called Detock. Detock enables processing strictly-serializable multi-region transactions with near-zero performance degradation at extremely high conflict and an order of magnitude higher throughput relative to state-of-the-art geo-replication approaches, while improving latency by up to a factor of 5.

Supplemental Material

  • MP4 File: Presentation video for SIGMOD 2023
  • PDF File: Read me
  • ZIP File: Source Code

Cited By

  • (2023) Caerus: Low-Latency Distributed Transactions for Geo-Replicated Systems. Proceedings of the VLDB Endowment 17:3 (469-482). https://doi.org/10.14778/3632093.3632109. Online publication date: 1-Nov-2023

Published In

Proceedings of the ACM on Management of Data  Volume 1, Issue 2
PACMMOD
June 2023
2310 pages
EISSN:2836-6573
DOI:10.1145/3605748
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deadlock resolution
  2. deterministic database
  3. multi-region database

Qualifiers

  • Research-article

Article Metrics

  • Downloads (last 12 months): 1,359
  • Downloads (last 6 weeks): 92

Reflects downloads up to 30 Aug 2024.
