research-article

OceanBase Paetica: A Hybrid Shared-Nothing/Shared-Everything Database for Supporting Single Machine and Distributed Cluster

Published: 01 August 2023

Abstract

As the OceanBase database system evolves, it is essential to improve its adaptability to small-scale enterprises. OceanBase has demonstrated its stability and effectiveness at Ant Group and other commercial organizations, as well as in the TPC-C and TPC-H benchmarks. In this paper, we design Paetica, an integrated stand-alone and distributed architecture that addresses the overhead that OceanBase's distributed components incur in stand-alone mode. Paetica enables adaptive configuration of the database, allowing OceanBase to support both serial and parallel execution in stand-alone and distributed scenarios, and thus to be both efficient and economical. The design has been implemented in version 4.0 of OceanBase. It enables OceanBase to move from primarily serving large enterprises to truly catering to small and medium enterprises: a single OceanBase database can serve the successive stages of an enterprise's or business's development without requiring migration. Our experiments show that Paetica exhibits notable scalability and outperforms alternative stand-alone and distributed databases: it achieves linear scalability with the number of CPU cores in stand-alone mode, and it outperforms MySQL and Greenplum in the Sysbench and TPC-H evaluations.

Published In

Proceedings of the VLDB Endowment  Volume 16, Issue 12
August 2023
685 pages
ISSN:2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 August 2023
Published in PVLDB Volume 16, Issue 12

Qualifiers

  • Research-article

Cited By

  • (2024) Learning diffusions under uncertainty. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence, 20430-20437. DOI: 10.1609/aaai.v38i18.30026. Online publication date: 20-Feb-2024
  • (2024) Native Distributed Databases: Problems, Challenges and Opportunities. Proceedings of the VLDB Endowment 17:12, 4217-4220. DOI: 10.14778/3685800.3685839. Online publication date: 8-Nov-2024
  • (2024) Extremely-Compressed SSDs with I/O Behavior Prediction. ACM Transactions on Storage 20:4, 1-38. DOI: 10.1145/3677044. Online publication date: 16-Jul-2024
  • (2024) Benchtemp: A General Benchmark for Evaluating Temporal Graph Neural Networks. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 4044-4057. DOI: 10.1109/ICDE60146.2024.00310. Online publication date: 13-May-2024
  • (2024) Functionality-Aware Database Tuning via Multi-Task Learning. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 83-95. DOI: 10.1109/ICDE60146.2024.00014. Online publication date: 13-May-2024
  • (2024) Experimental Evaluation of Scalable Database Architectures for High-Performance Applications. Computing and Machine Learning, 27-38. DOI: 10.1007/978-981-97-7571-2_3. Online publication date: 25-Dec-2024
  • (2024) TreeCSS: An Efficient Framework for Vertical Federated Learning. Database Systems for Advanced Applications, 425-441. DOI: 10.1007/978-981-97-5552-3_29. Online publication date: 2-Jul-2024
  • (2024) VFDV-IM: An Efficient and Securely Vertical Federated Data Valuation. Database Systems for Advanced Applications, 409-424. DOI: 10.1007/978-981-97-5552-3_28. Online publication date: 2-Jul-2024
  • (2024) SPQO: Learning to Safely Reuse Cached Plans for Dynamic Workloads. Database Systems for Advanced Applications, 315-330. DOI: 10.1007/978-981-97-5552-3_21. Online publication date: 2-Jul-2024
  • (2024) : Query Aware Database Generation for Match Operators. Database Systems for Advanced Applications, 266-282. DOI: 10.1007/978-981-97-5552-3_18. Online publication date: 2-Jul-2024
