research-article

Alibaba Hologres: A Cloud-Native Service for Hybrid Serving/Analytical Processing

Published: 01 August 2020

Abstract

In existing big data stacks, analytical processing and knowledge serving are usually handled by separate systems. At Alibaba, we observed a new trend in which these two processes are fused: knowledge serving generates new data, and these data are fed into analytical processing, which further fine-tunes the knowledge base used for serving. Splitting this fused processing paradigm across separate systems incurs overhead such as extra data duplication, discrepant application development, and expensive system maintenance.
In this work, we propose Hologres, a cloud-native service for hybrid serving and analytical processing (HSAP). Hologres decouples the computation and storage layers, allowing each layer to scale flexibly. Tables are partitioned into self-managed shards, and each shard processes its read and write requests concurrently and independently of the other shards. Hologres leverages hybrid row/column storage to optimize the operations used in HSAP, such as point lookups, column scans, and data ingestion. We propose the execution context as a resource abstraction between system threads and user tasks. Execution contexts can be cooperatively scheduled with little context-switching overhead. Queries are parallelized and mapped to execution contexts for concurrent execution. The scheduling framework enforces resource isolation among queries and supports customizable scheduling policies. We conducted experiments comparing Hologres with existing systems specifically designed for analytical processing and serving workloads. The results show that Hologres consistently outperforms other systems in both system throughput and end-to-end query latency.
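The idea of cooperatively scheduled execution contexts can be illustrated with a minimal sketch. It is not taken from Hologres: the names `ExecutionContext`, `run_round_robin`, and `scan_shard` are hypothetical, Python generators stand in for user tasks, and a simple round-robin loop stands in for the scheduling policy. It only shows how many tasks can share one system thread by yielding voluntarily instead of being preempted.

```python
from collections import deque
from typing import Deque, Generator, List

# Hypothetical names throughout; this is not the Hologres API.
Task = Generator[None, None, None]


class ExecutionContext:
    """A lightweight unit of work that gives up control voluntarily."""

    def __init__(self, name: str, task: Task) -> None:
        self.name = name
        self._task = task

    def step(self) -> bool:
        """Run until the next yield. Return False once the task has finished."""
        try:
            next(self._task)
            return True
        except StopIteration:
            return False


def run_round_robin(contexts: List[ExecutionContext]) -> None:
    """Multiplex many contexts onto the single calling thread."""
    ready: Deque[ExecutionContext] = deque(contexts)
    while ready:
        ctx = ready.popleft()
        if ctx.step():
            ready.append(ctx)  # not finished: requeue at the back


def scan_shard(shard_id: int, rows: int, log: List[str]) -> Task:
    """Toy task: process one row per step, yielding between rows."""
    for row in range(rows):
        log.append(f"shard{shard_id}:row{row}")
        yield  # cooperative yield point; no OS-level context switch


log: List[str] = []
run_round_robin([
    ExecutionContext("ec0", scan_shard(0, 2, log)),
    ExecutionContext("ec1", scan_shard(1, 3, log)),
])
print(log)
```

The two toy scans interleave row by row on one thread. A different policy, such as weighting contexts by query priority, would only change how `ready` is ordered.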



Information

Published In

Proceedings of the VLDB Endowment, Volume 13, Issue 12
August 2020
1710 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

