DOI: 10.1145/2882903.2882951
research-article

TARDiS: A Branch-and-Merge Approach To Weak Consistency

Published: 26 June 2016

Abstract

This paper presents the design, implementation, and evaluation of TARDiS (Transactional Asynchronously Replicated Divergent Store), a transactional key-value store explicitly designed for weakly-consistent systems. Reasoning about these systems is hard, as neither causal consistency nor per-object eventual convergence allows applications to deal satisfactorily with write-write conflicts. TARDiS instead exposes as its fundamental abstraction the set of conflicting branches that arise in weakly-consistent systems. To this end, TARDiS introduces a new concurrency control mechanism: branch-on-conflict. On the one hand, TARDiS guarantees that storage will appear sequential to any thread of execution that extends a branch, keeping application logic simple. On the other, TARDiS provides applications with the tools and context necessary to merge branches atomically, when and how they want. Since branch-on-conflict in TARDiS is fast, weakly-consistent applications can benefit from adopting this paradigm not only for operations issued by different sites, but also, when appropriate, for conflicting local operations. We find that TARDiS reduces coding complexity for these applications and that judicious branch-on-conflict can improve their local throughput at each site by two to eight times.
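
The branch-on-conflict idea described in the abstract can be illustrated with a toy, single-process sketch. It is not the TARDiS implementation or API, which this page does not show: the names `BranchingStore`, `commit`, `merge`, `fork_point`, and the `resolve` callback are all hypothetical, each state stores a full snapshot for simplicity, and deletes and replication are ignored. A transaction that conflicts with later commits forks a new branch instead of aborting, and a merge hands the application the fork point plus both conflicting values.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(eq=False)
class State:
    """One committed state in a toy state DAG (full snapshot per state)."""
    sid: int
    parents: tuple
    data: dict
    written: frozenset
    children: list = field(default_factory=list)


class BranchingStore:
    """Hypothetical sketch of branch-on-conflict; not the TARDiS API."""

    def __init__(self):
        self._ids = count()
        self.root = self._add((), {}, frozenset())

    def _add(self, parents, data, written):
        state = State(next(self._ids), tuple(parents), data, written)
        for p in parents:
            p.children.append(state)
        return state

    def leaves(self):
        """Return the tips of all current branches, oldest first."""
        out, stack, seen = [], [self.root], set()
        while stack:
            s = stack.pop()
            if s.sid in seen:
                continue
            seen.add(s.sid)
            if not s.children:
                out.append(s)
            stack.extend(s.children)
        return sorted(out, key=lambda s: s.sid)

    def commit(self, start, reads, writes):
        """Commit a transaction that executed against state `start`.

        Walk forward from `start` along the unforked chain of later commits.
        If none of them wrote a key this transaction read or wrote, append to
        the end of that chain; otherwise fork a new branch at `start`.
        """
        tip, later = start, set()
        while len(tip.children) == 1:  # assumption: stop at the first fork
            tip = tip.children[0]
            later |= tip.written
        conflict = (set(reads) | set(writes)) & later
        parent = start if conflict else tip
        return self._add((parent,), {**parent.data, **writes}, frozenset(writes))

    @staticmethod
    def _ancestors(state):
        seen, stack = {}, [state]
        while stack:
            s = stack.pop()
            if s.sid not in seen:
                seen[s.sid] = s
                stack.extend(s.parents)
        return seen

    def fork_point(self, a, b):
        """Most recent common ancestor (ids increase in commit order)."""
        anc_a, anc_b = self._ancestors(a), self._ancestors(b)
        return anc_a[max(anc_a.keys() & anc_b.keys())]

    def merge(self, a, b, resolve):
        """Three-way merge of two branch tips into one new state.

        `resolve(key, base, va, vb)` is called only for keys that both
        branches changed to different values since the fork point.
        """
        base = self.fork_point(a, b).data
        merged, differing = dict(a.data), set()
        for k in a.data.keys() | b.data.keys():
            va, vb, v0 = a.data.get(k), b.data.get(k), base.get(k)
            if va == vb:
                continue
            differing.add(k)
            if va == v0:
                merged[k] = vb          # only branch b changed k
            elif vb != v0:
                merged[k] = resolve(k, v0, va, vb)  # true write-write conflict
        return self._add((a, b), merged, frozenset(differing))


store = BranchingStore()
s0 = store.commit(store.root, reads=[], writes={"x": 10, "y": 0})
t1 = store.commit(s0, reads=["x"], writes={"x": 11})  # extends s0
t3 = store.commit(s0, reads=["y"], writes={"y": 1})   # no conflict: extends t1
t2 = store.commit(s0, reads=["x"], writes={"x": 15})  # conflicts with t1: forks
print([s.sid for s in store.leaves()])                # two branch tips

# Application-level resolution: add both increments to the fork-point value.
m = store.merge(t3, t2, lambda k, base, a, b: base + (a - base) + (b - base))
print(m.data)
```

In this sketch each branch sees a sequential history (`t3` observes `t1`'s write, `t2` does not), and the conflicting updates to `x` are reconciled in a single merge step rather than key by key.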

Published In

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016
2300 pages
ISBN: 9781450335317
DOI: 10.1145/2882903
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. causal consistency
  2. databases
  3. distributed systems
  4. eventual consistency
  5. geo-distribution
  6. merging
  7. replication
  8. transactions
  9. weak consistency

Qualifiers

  • Research-article

Funding Sources

  • Google European Fellowship in Distributed Computing
  • Google Cloud Platform Credit award
  • National Science Foundation

Conference

SIGMOD/PODS'16: International Conference on Management of Data
June 26 - July 1, 2016
San Francisco, California, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 33
  • Downloads (last 6 weeks): 6
Reflects downloads up to 30 Aug 2024

Cited By

  • (2023) RALF: Accuracy-Aware Scheduling for Feature Store Maintenance. Proceedings of the VLDB Endowment 17:3 (563-576). DOI: 10.14778/3632093.3632116. Online publication date: 1-Nov-2023
  • (2022) Keep CALM and CRDT On. Proceedings of the VLDB Endowment 16:4 (856-863). DOI: 10.14778/3574245.3574268. Online publication date: 1-Dec-2022
  • (2022) Katara: synthesizing CRDTs with verified lifting. Proceedings of the ACM on Programming Languages 6:OOPSLA2 (1349-1377). DOI: 10.1145/3563336. Online publication date: 31-Oct-2022
  • (2022) Certified mergeable replicated data types. Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation (332-347). DOI: 10.1145/3519939.3523735. Online publication date: 9-Jun-2022
  • (2021) Version Reconciliation for Collaborative Databases. Proceedings of the ACM Symposium on Cloud Computing (473-488). DOI: 10.1145/3472883.3486980. Online publication date: 1-Nov-2021
  • (2020) Cloudburst. Proceedings of the VLDB Endowment 13:12 (2438-2452). DOI: 10.14778/3407790.3407836. Online publication date: 14-Sep-2020
  • (2020) Interactive checks for coordination avoidance. The VLDB Journal — The International Journal on Very Large Data Bases 30:1 (71-92). DOI: 10.1007/s00778-020-00628-3. Online publication date: 5-Sep-2020
  • (2020) Banyan: Coordination-Free Distributed Transactions over Mergeable Types. Programming Languages and Systems (231-250). DOI: 10.1007/978-3-030-64437-6_12. Online publication date: 30-Nov-2020
  • (2019) Mergeable replicated data types. Proceedings of the ACM on Programming Languages 3:OOPSLA (1-29). DOI: 10.1145/3360580. Online publication date: 10-Oct-2019
  • (2019) GoTcha: an interactive debugger for GoT-based distributed systems. Proceedings of the 2019 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software (94-110). DOI: 10.1145/3359591.3359733. Online publication date: 23-Oct-2019
