research-article

AnalyticDB: real-time OLAP database system at Alibaba cloud

Published: 01 August 2019

Abstract

    With the explosion of data in scale and variety, OLAP databases play an increasingly important role in serving real-time analysis with low latency (e.g., hundreds of milliseconds), especially when incoming queries are complex and ad hoc in nature. Moreover, these systems are expected to provide high query concurrency and write throughput, and to support queries over structured and complex data types (e.g., JSON, vectors, and text).
    In this paper, we introduce AnalyticDB, a real-time OLAP database system developed at Alibaba. AnalyticDB maintains all-column indexes in an asynchronous manner with acceptable overhead, which provides low latency for complex ad hoc queries. Its storage engine extends the hybrid row-column layout for fast retrieval of both structured data and data of complex types. To handle large-scale data with high query concurrency and write throughput, AnalyticDB decouples the read and write access paths. To further reduce query latency, a novel storage-aware SQL optimizer and execution engine are developed to fully utilize the advantages of the underlying storage and indexes. AnalyticDB has been successfully deployed on Alibaba Cloud to serve numerous customers (both large and small). It is capable of holding 100 trillion rows of records, i.e., 10PB+ in size. At the same time, it is able to serve 10M+ writes and 100K+ queries per second, while completing complex queries within hundreds of milliseconds.
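To make the idea of a hybrid row-column layout concrete, the following is a minimal Python sketch of the general technique, not AnalyticDB's actual storage format, which the abstract does not detail. It assumes rows are split into fixed-size groups and that each group stores one list of values per column. The names `to_row_groups`, `scan_column`, `fetch_row`, and the `group_size` parameter are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]
RowGroup = Dict[str, List[Any]]


def to_row_groups(rows: List[Row], columns: List[str], group_size: int) -> List[RowGroup]:
    """Split rows into fixed-size groups; inside each group keep one list per column."""
    groups: List[RowGroup] = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        groups.append({col: [row.get(col) for row in chunk] for col in columns})
    return groups


def scan_column(groups: List[RowGroup], column: str) -> Iterator[Any]:
    """Analytical access: read a single column across all groups, skipping the others."""
    for group in groups:
        yield from group[column]


def fetch_row(groups: List[RowGroup], columns: List[str], group_size: int, row_id: int) -> Row:
    """Point access: every value of a row sits in one group, at the same offset."""
    group = groups[row_id // group_size]
    offset = row_id % group_size
    return {col: group[col][offset] for col in columns}


# Illustrative data mixing structured values with a JSON-like complex value.
rows = [
    {"id": 1, "price": 10.0, "attrs": {"color": "red"}},
    {"id": 2, "price": 12.5, "attrs": {"color": "blue"}},
    {"id": 3, "price": 7.25, "attrs": {"size": "L"}},
]
columns = ["id", "price", "attrs"]
groups = to_row_groups(rows, columns, group_size=2)

total_price = sum(scan_column(groups, "price"))  # touches only the "price" lists
third_row = fetch_row(groups, columns, 2, 2)     # touches only the second group
```

The trade-off this sketch illustrates is that a column scan reads contiguous values of one column, as in a column store, while a full-row lookup stays within a single group, as in a row store.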




      Published In

      Proceedings of the VLDB Endowment  Volume 12, Issue 12
      August 2019
      547 pages

      Publisher

      VLDB Endowment


      Cited By

      • (2024) Understanding the Performance Implications of the Design Principles in Storage-Disaggregated Databases. Proceedings of the ACM on Management of Data 2:3 (1-26). DOI: 10.1145/3654983. Online publication date: 30-May-2024
      • (2024) Revisiting B-tree Compression: An Experimental Study. Proceedings of the ACM on Management of Data 2:3 (1-25). DOI: 10.1145/3654972. Online publication date: 30-May-2024
      • (2024) Bouncer: Admission Control with Response Time Objectives for Low-latency Online Data Systems. Companion of the 2024 International Conference on Management of Data (400-413). DOI: 10.1145/3626246.3653384. Online publication date: 9-Jun-2024
      • (2024) Flux: Decoupled Auto-Scaling for Heterogeneous Query Workload in Alibaba AnalyticDB. Companion of the 2024 International Conference on Management of Data (255-268). DOI: 10.1145/3626246.3653381. Online publication date: 9-Jun-2024
      • (2024) ByteCard: Enhancing ByteDance's Data Warehouse with Learned Cardinality Estimation. Companion of the 2024 International Conference on Management of Data (41-54). DOI: 10.1145/3626246.3653376. Online publication date: 9-Jun-2024
      • (2023) Anser: Adaptive Information Sharing Framework of AnalyticDB. Proceedings of the VLDB Endowment 16:12 (3636-3648). DOI: 10.14778/3611540.3611553. Online publication date: 1-Aug-2023
      • (2023) Krypton: Real-Time Serving and Analytical SQL Engine at ByteDance. Proceedings of the VLDB Endowment 16:12 (3528-3542). DOI: 10.14778/3611540.3611545. Online publication date: 1-Aug-2023
      • (2023) Sieve: A Learned Data-Skipping Index for Data Analytics. Proceedings of the VLDB Endowment 16:11 (3214-3226). DOI: 10.14778/3611479.3611520. Online publication date: 24-Aug-2023
      • (2023) Exploiting Cloud Object Storage for High-Performance Analytics. Proceedings of the VLDB Endowment 16:11 (2769-2782). DOI: 10.14778/3611479.3611486. Online publication date: 24-Aug-2023
      • (2023) Efficient Approximation Framework for Attribute Recommendation. Proceedings of the ACM on Management of Data 1:4 (1-26). DOI: 10.1145/3626726. Online publication date: 12-Dec-2023
