Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article
Open access

Spanner: Google’s Globally Distributed Database

Published: 01 August 2013 Publication History

Abstract

Spanner is Google’s scalable, multiversion, globally distributed, and synchronously replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This article describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: nonblocking reads in the past, lock-free snapshot transactions, and atomic schema changes, across all of Spanner.

References

[1]
Abouzeid, A., Bajda-Pawlikowski, K., Abadi, D., Silberschatz, A., and Rasin, A. 2009. Hadoopdb: An architectural hybrid of mapreduce and DBMS technologies for analytical workloads. In Proceedings of the International Conference on Very Large Data Bases. 922--933.
[2]
Adya, A., Gruber, R., Liskov, B., and Maheshwari, U. 1995. Efficient optimistic concurrency control using loosely synchronized clocks. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 23--34.
[3]
Amazon. 2012. Amazon dynamodb. http://aws.amazon.com/dynamodb.
[4]
Armbrust, M., Curtis, K., Kraska, T., Fox, A., Franklin, M., and Patterson, D. 2011. PIQL: Success-tolerant query processing in the cloud. In Proceedings of the International Conference on Very Large Data Bases. 181--192.
[5]
Baker, J., Bond, C., Corbett, J. C., Furman, J., Khorlin, A., Larson, J., Léon, J.-M., Li, Y., Lloyd, A., and Yushprakh, V. 2011. Megastore: Providing scalable, highly available storage for interactive services. In Proceedings of CIDR. 223--234.
[6]
Berenson, H., Bernstein, P., Gray, J., Melton, J., O’Neil, E., and O’Neil, P. 1995. A critique of ANSI SQL isolation levels. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 1--10.
[7]
Brantner, M., Florescu, D., Graf, D., Kossmann, D., and Kraska, T. 2008. Building a database on S3. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 251--264.
[8]
Chan, A. and Gray, R. 1985. Implementing distributed read-only transactions. IEEE Trans. Softw. Eng. SE-11, 2, 205--212.
[9]
Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., and Gruber, R. E. 2008. Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst. 26, 2, 4:1--4:26.
[10]
Cooper, B. F., Ramakrishnan, R., Srivastava, U., Silberstein, A., Bohannon, P., Jacobsen, H.-A., Puz, N., Weaver, D., and Yerneni, R. 2008. PNUTS: Yahoo!’s hosted data serving platform. In Proceedings of the International Conference on Very Large Data Bases. 1277--1288.
[11]
Cowling, J. and Liskov, B. 2012. Granola: Low-overhead distributed transaction coordination. In Proceedings of USENIX ATC. 223--236.
[12]
Dean, J. and Ghemawat, S. 2010. MapReduce: A flexible data processing tool. Comm. ACM 53, 1, 72--77.
[13]
Douceur, J. and Howell, J. 2003. Scalable Byzantine-fault-quantifying clock synchronization. Tech. rep. MSR-TR-2003-67, MS Research.
[14]
Douceur, J. R. and Howell, J. 2006. Distributed directory service in the Farsite file system. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation. 321--334.
[15]
Ghemawat, S., Gobioff, H., and Leung, S.-T. 2003. The Google file system. In Proceedings of the Symposium on Operating Systems Principles. 29--43.
[16]
Gifford, D. K. 1982. Information storage in a decentralized computer system. Tech. rep. CSL-81-8, Xerox PARC.
[17]
Glendenning, L., Beschastnikh, I., Krishnamurthy, A., and Anderson, T. 2011. Scalable consistency in scatter. In Proceedings of the Symposium on Operating Systems Principles.
[18]
Google. 2008. Protocol buffers --- Google’s data interchange format. https://code.google.com/p/protobuf.
[19]
Gray, J. and Lamport, L. 2006. Consensus on transaction commit. ACM Trans. Datab. Syst. 31, 1, 133--160.
[20]
Helland, P. 2007. Life beyond distributed transactions: An apostate’s opinion. In Proceedings of the Biennial Conference on Innovative Data Systems Research. 132--141.
[21]
Herlihy, M. P. and Wing, J. M. 1990. Linearizability: A correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst. 12, 3, 463--492.
[22]
Lamport, L. 1998. The part-time parliament. ACM Trans. Comput. Syst. 16, 2, 133--169.
[23]
Lamport, L., Malkhi, D., and Zhou, L. 2010. Reconfiguring a state machine. SIGACT News 41, 1, 63--73.
[24]
Liskov, B. 1993. Practical uses of synchronized clocks in distributed systems. Distrib. Comput. 6, 4, 211--219.
[25]
Lomet, D. B. and Li, F. 2009. Improving transaction-time DBMS performance and functionality. In Proceedings of the International Conference on Data Engineering. 581--591.
[26]
Lorch, J. R., Adya, A., Bolosky, W. J., Chaiken, R., Douceur, J. R., and Howell, J. 2006. The SMART way to migrate replicated stateful services. In Proceedings of EuroSys. 103--115.
[27]
MarkLogic. 2012. Marklogic 5 product documentation. http://community.marklogic.com/docs.
[28]
Marzullo, K. and Owicki, S. 1983. Maintaining the time in a distributed system. In Proceedings of the Symposium on Principles of Distributed Computing. 295--305.
[29]
Melnik, S., Gubarev, A., Long, J. J., Romer, G., Shivakumar, S., Tolton, M., and Vassilakis, T. 2010. Dremel: Interactive analysis of Web-scale datasets. In Proceedings of the International Conference on Very Large Data Bases. 330--339.
[30]
Mills, D. 1981. Time synchronization in DCNET hosts. Internet project rep. IEN--173, COMSAT Laboratories.
[31]
Oracle. 2012. Oracle total recall. http://www.oracle.com/technetwork/database/focus-areas/storage/total-recall-whitepaper-171749.pdf.
[32]
Pavlo, A., Paulson, E., Rasin, A., Abadi, D. J., DeWitt, D. J., Madden, S., and Stonebraker, M. 2009. A comparison of approaches to large-scale data analysis. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 165--178.
[33]
Peng, D. and Dabek, F. 2010. Large-scale incremental processing using distributed transactions and notifications. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation. 1--15.
[34]
Rosenkrantz, D. J., Stearns, R. E., and Lewis II, P. M. 1978. System level concurrency control for distributed database systems. ACM Trans. Datab. Syst. 3, 2, 178--198.
[35]
Shraer, A., Reed, B., Malkhi, D., and Junqueiera, F. 2012. Dynamic reconfiguration of primary/backup clusters. In Proceedings of USENIX ATC. 425--438.
[36]
Shute, J., Oancea, M., Ellner, S., Handy, B., Rollins, E., Samwel, B., Vingralek, R., Whipkey, C., Chen, X., Jegerlehner, B., Littlefield, K., and Tong, P. 2012. F1 --- The fault-tolerant distributed RDBMS supporting Google’s ad business. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 777--778.
[37]
Sovran, Y., Power, R., Aguilera, M. K., and Li, J. 2011. Transactional storage for geo-replicated systems. In Proceedings of the Symposium on Operating Systems Principles. 385--400.
[38]
Stonebraker, M. 2010a. Six SQL urban myths. http://highscalability.com/blog/2010/6/28/voltdb-decapitates -six-sql-urban-myths-and-delivers-internet.html.
[39]
Stonebraker, M. 2010b. Why enterprises are uninterested in NoSQL. http://cacm.acm.org/blogs/blog-cacm/99512-why-enterprises-are-uninterested-in-nosql/fulltext.
[40]
Stonebraker, M., Madden, S., Abadi, D. J., Harizopoulos, S., Hachem, N., and Helland, P. 2007. The end of an architectural era: (It’s time for a complete rewrite). In Proceedings of the International Conference on Very Large Data Bases. 1150--1160.
[41]
Thomson, A., Diamond, T., Shu-Chun Weng, T. D., Ren, K., Shao, P., and Abadi, D. J. 2012. Calvin: Fast distributed transactions for partitioned database systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 1--12.
[42]
Thusoo, A., Sarma, J. S., Jain, N., Shao, Z., Chakka, P., Zhang, N., Antony, S., Liu, H., and Murthy, R. 2010. Hive --- A petabyte scale data warehouse using Hadoop. In Proceedings of the International Conference on Data Engineering. 996--1005.
[43]
VoltDB. 2012. VoltDB resources. http://voltdb.com/resources/whitepapers.

Cited By

View all
  • (2025)Efficient distributed matrix for resolving computational intensity in remote sensingFuture Generation Computer Systems10.1016/j.future.2024.107644166(107644)Online publication date: May-2025
  • (2024)Horizontally Scalable Implementation of a Distributed DBMS Delivering Causal Consistency via the Actor ModelElectronics10.3390/electronics1317336713:17(3367)Online publication date: 24-Aug-2024
  • (2024)Asynchronous Consensus Quorum Read: Pioneering Read Optimization for Asynchronous Consensus ProtocolsElectronics10.3390/electronics1303048113:3(481)Online publication date: 23-Jan-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Computer Systems
ACM Transactions on Computer Systems  Volume 31, Issue 3
August 2013
94 pages
ISSN:0734-2071
EISSN:1557-7333
DOI:10.1145/2518037
Issue’s Table of Contents
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 August 2013
Accepted: 01 May 2013
Received: 01 January 2013
Published in TOCS Volume 31, Issue 3

Check for updates

Author Tags

  1. Distributed databases
  2. concurrency control
  3. replication
  4. time management
  5. transactions

Qualifiers

  • Research-article
  • Research
  • Refereed

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)4,706
  • Downloads (Last 6 weeks)756
Reflects downloads up to 23 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2025)Efficient distributed matrix for resolving computational intensity in remote sensingFuture Generation Computer Systems10.1016/j.future.2024.107644166(107644)Online publication date: May-2025
  • (2024)Horizontally Scalable Implementation of a Distributed DBMS Delivering Causal Consistency via the Actor ModelElectronics10.3390/electronics1317336713:17(3367)Online publication date: 24-Aug-2024
  • (2024)Asynchronous Consensus Quorum Read: Pioneering Read Optimization for Asynchronous Consensus ProtocolsElectronics10.3390/electronics1303048113:3(481)Online publication date: 23-Jan-2024
  • (2024)Consensus in Data Management: With Use Cases in Edge-Cloud and Blockchain SystemsProceedings of the VLDB Endowment10.14778/3685800.368584317:12(4233-4236)Online publication date: 1-Aug-2024
  • (2024)Differentially Private Stream Processing at ScaleProceedings of the VLDB Endowment10.14778/3685800.368583317:12(4145-4158)Online publication date: 1-Aug-2024
  • (2024)Transparent Migration from Datastore to FirestoreProceedings of the VLDB Endowment10.14778/3685800.368581917:12(3960-3972)Online publication date: 1-Aug-2024
  • (2024)Timestamp as a Service, Not an OracleProceedings of the VLDB Endowment10.14778/3641204.364121017:5(994-1006)Online publication date: 1-Jan-2024
  • (2024)OMAHA: Opportunistic Message Aggregation for pHase-based AlgorithmsFormal Aspects of Computing10.1145/369859336:4(1-23)Online publication date: 19-Oct-2024
  • (2024)ByteMQ: A Cloud-native Streaming Data Layer in ByteDanceProceedings of the 2024 ACM Symposium on Cloud Computing10.1145/3698038.3698536(774-791)Online publication date: 20-Nov-2024
  • (2024)Occam's Razor for Distributed ProtocolsProceedings of the 2024 ACM Symposium on Cloud Computing10.1145/3698038.3698514(618-636)Online publication date: 20-Nov-2024
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media