DOI:10.1145/3183713.3190661
research-article

Pinot: Realtime OLAP for 530 Million Users

Published: 27 May 2018

Abstract

Modern users demand analytical features on fresh, real-time data. Offering these analytical features to hundreds of millions of users is a relevant problem encountered by many large-scale web companies.
Relational databases and key-value stores can be scaled to provide point lookups for a large number of users, but fall apart under the combination of high ingest rates and high query rates at low latency for analytical queries. Online analytical databases typically rely on bulk data loads and are not built to handle nonstop operation in demanding web environments. Offline analytical systems have high throughput but do not offer low query latencies, nor can they scale to serving tens of thousands of queries per second.
We present Pinot, a single system used in production at LinkedIn that can serve tens of thousands of analytical queries per second, offers near-realtime data ingestion from streaming data sources, and handles the operational requirements of large web properties. We also provide a performance comparison with Druid, a system similar to Pinot.

Published In

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
May 2018
1874 pages
ISBN:9781450347037
DOI:10.1145/3183713

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. near real time data ingestion
  2. olap
  3. parallel and distributed dbmss
  4. pinot

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '18

Acceptance Rates

SIGMOD '18 Paper Acceptance Rate 90 of 461 submissions, 20%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Cited By

  • (2024) ClickHouse - Lightning Fast Analytics for Everyone. Proceedings of the VLDB Endowment 17:12 (3731-3744). DOI:10.14778/3685800.3685802. Online publication date: 8-Nov-2024
  • (2024) μWheel: Aggregate Management for Streams and Queries. Proceedings of the 18th ACM International Conference on Distributed and Event-based Systems (54-65). DOI:10.1145/3629104.3666031. Online publication date: 24-Jun-2024
  • (2023) Krypton: Real-Time Serving and Analytical SQL Engine at ByteDance. Proceedings of the VLDB Endowment 16:12 (3528-3542). DOI:10.14778/3611540.3611545. Online publication date: 1-Aug-2023
  • (2022) Meta's next-generation realtime monitoring and analytics platform. Proceedings of the VLDB Endowment 15:12 (3522-3534). DOI:10.14778/3554821.3554841. Online publication date: 1-Aug-2022
  • (2022) User-Centric Interference-Aware Load Balancing for Cloud-Deployed Applications. IEEE Transactions on Cloud Computing 10:1 (736-748). DOI:10.1109/TCC.2019.2943560. Online publication date: 1-Jan-2022
  • (2022) From Batch Processing to Real Time Analytics: Running Presto® at Scale. 2022 IEEE 38th International Conference on Data Engineering (ICDE) (1598-1609). DOI:10.1109/ICDE53745.2022.00165. Online publication date: May-2022
  • (2022) Recent Advances in Data Engineering for Networking. IEEE Access 10 (34449-34496). DOI:10.1109/ACCESS.2022.3162863. Online publication date: 2022
  • (2022) Processing Physiological Sensor Data in Near Real-Time as Social Signals for Their Use on Social Virtual Reality Platforms. Extended Reality (44-62). DOI:10.1007/978-3-031-15553-6_4. Online publication date: 28-Aug-2022
  • (2021) LogStore. Proceedings of the 2021 International Conference on Management of Data (2464-2476). DOI:10.1145/3448016.3457565. Online publication date: 9-Jun-2021
  • (2021) Real-time Data Infrastructure at Uber. Proceedings of the 2021 International Conference on Management of Data (2503-2516). DOI:10.1145/3448016.3457552. Online publication date: 9-Jun-2021
