research-article

Couchbase analytics: NoETL for scalable NoSQL data analysis

Published: 01 August 2019

Abstract

Couchbase Server is a highly scalable document-oriented database management system. With a shared-nothing architecture, it exposes a fast key-value store with a managed cache for sub-millisecond data operations, indexing for fast queries, and a powerful query engine for executing declarative SQL-like queries. Its Query Service debuted several years ago and supports high volumes of low-latency queries and updates for JSON documents. Its recently introduced Analytics Service complements the Query Service. Couchbase Analytics, the focus of this paper, supports complex analytical queries (e.g., ad hoc joins and aggregations) over large collections of JSON documents. This paper describes the Analytics Service from the outside in, including its user model, its SQL++ based query language, and its MPP-based storage and query processing architecture. It also briefly touches on the relationship of Couchbase Analytics to Apache AsterixDB, the open source Big Data management system at the core of Couchbase Analytics.

Published In

Proceedings of the VLDB Endowment  Volume 12, Issue 12
August 2019
547 pages

Publisher

VLDB Endowment

Published in PVLDB Volume 12, Issue 12
