Apache Doris vs Elasticsearch: An In-Depth Comparative Analysis

Apache Doris excels in complex analytics with SQL support and high performance, while Elasticsearch is ideal for full-text search and real-time retrieval.

Haijun Huang

Apr. 23, 25 · Analysis

In big data analytics, Apache Doris and Elasticsearch (ES) are frequently used for real-time analytics and retrieval tasks. However, their design philosophies and technical focuses differ significantly.

This article offers a detailed comparison across six dimensions: core architecture, query language, real-time capabilities, application scenarios, performance, and enterprise practices.

1. Core Design Philosophy: MPP Architecture vs. Search Engine Architecture

Apache Doris employs a typical MPP (Massively Parallel Processing) distributed architecture tailored to high-concurrency, low-latency real-time online analytical processing (OLAP). A cluster consists of Frontend (FE) nodes, which handle metadata, query parsing, and planning, and Backend (BE) nodes, which store data and execute queries. Multi-node parallel execution and columnar storage let it manage massive datasets efficiently. This design delivers sub-second query latency, making Doris well suited to complex aggregations and analytical queries over large datasets.

In contrast, Elasticsearch is built on a full-text search engine architecture (Apache Lucene). It uses sharding and inverted indexes to prioritize fast text retrieval and filtering. ES stores data as JSON documents and indexes fields by default, with inverted indexes for text and keyword fields. This makes it excel at keyword search and log queries, but it struggles with complex analytics and large-scale aggregations.
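To make the contrast concrete, here is a toy Python sketch of the two layouts. A columnar layout lets an aggregation read only the column it needs. An inverted index maps each term to the rows that contain it, so a keyword search becomes a lookup. This is not how Doris or Lucene store data internally, since both use compressed, encoded on-disk formats. The sample records and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample log records (not taken from any benchmark).
rows = [
    {"host": "web-1", "status": 200, "latency_ms": 12, "message": "GET /index ok"},
    {"host": "web-2", "status": 500, "latency_ms": 340, "message": "POST /pay timeout"},
    {"host": "web-1", "status": 500, "latency_ms": 295, "message": "GET /pay timeout"},
    {"host": "web-2", "status": 200, "latency_ms": 18, "message": "GET /index ok"},
]

def to_columns(records):
    """Column-oriented layout: one list of values per field."""
    columns = defaultdict(list)
    for rec in records:
        for field, value in rec.items():
            columns[field].append(value)
    return dict(columns)

def build_inverted_index(records, field):
    """Inverted index: lowercase term -> sorted list of row ids containing it."""
    index = defaultdict(set)
    for row_id, rec in enumerate(records):
        for term in rec[field].lower().split():
            index[term].add(row_id)
    return {term: sorted(ids) for term, ids in index.items()}

columns = to_columns(rows)
# Analytical query: the average latency scans a single column.
avg_latency = sum(columns["latency_ms"]) / len(columns["latency_ms"])

index = build_inverted_index(rows, "message")
# Search query: rows mentioning "timeout" come from one dictionary lookup.
timeout_rows = index.get("timeout", [])
```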

The core architectural differences are summarized below:

Dimension	Apache Doris (MPP Analytical Database)	Elasticsearch (Distributed Search Engine)
Design Intent	Built for real-time data warehousing and BI as a high-throughput, parallel OLAP engine; emphasizes high-concurrency aggregation queries with low latency	Focused on full-text search and log retrieval, built on Lucene’s inverted index; excels at keyword search and filtering, and remains primarily a search engine despite supporting structured queries
Data Storage	Columnar storage with per-column encoding and compression, achieving high compression ratios (5-10×) to save space; supports multiple table models (Duplicate, Aggregate, Unique), with pre-aggregation at write time in the Aggregate model	Document storage with per-field indexes (low compression ratio, ~1.5×); new fields can be added to a mapping, but changing an existing field’s type or analysis requires reindexing
Scalability and Elasticity	Shared-nothing design scales linearly by adding nodes; supports read-write separation and multi-tenant isolation; version 3.0 adds a compute-storage decoupled mode for elastic scaling	Scales by adding nodes and distributing primary shards and replicas, but is constrained by per-node heap memory and JVM garbage collection, risking memory pressure during large queries; the thread-pool model offers limited workload isolation
Typical Features	Fully open source (Apache 2.0) and MySQL protocol compatible; no external dependencies; offers materialized views and rich SQL functions for analytics	Developed primarily by Elastic, whose licensing has changed over time (from Apache 2.0 to SSPL/Elastic License in 2021, with AGPLv3 added as an option in 2024); natively supports full-text search and near-real-time indexing; rich ecosystem (Kibana, Logstash), with some advanced features requiring paid subscriptions

Analysis: Doris’s MPP architecture provides a natural edge in big data aggregation analytics, leveraging columnar storage and vectorized execution to optimize IO and CPU usage. Features like pre-aggregation, materialized views, and a scalable design make it outperform ES in large-scale data analytics.
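The following simplified model shows how an MPP engine parallelizes a GROUP BY. Each node first aggregates its local data, and a coordinator then merges the small partial results. Threads stand in for backend nodes, and the pre-split partitions are hypothetical. Doris’s real planner is more elaborate and can, for example, shuffle rows by grouping key between nodes before the final merge.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical (host, status) rows already distributed across three nodes.
node_partitions = [
    [("web-1", 200), ("web-2", 500), ("web-1", 200)],
    [("web-2", 200), ("web-1", 500)],
    [("web-1", 200), ("web-3", 404)],
]

def local_aggregate(partition):
    """Phase 1: a node counts requests per host over its local rows only."""
    return Counter(host for host, _status in partition)

def merge(partials):
    """Phase 2: the coordinator merges the per-node partial counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Threads simulate nodes working in parallel on their own partitions.
with ThreadPoolExecutor(max_workers=len(node_partitions)) as pool:
    partials = list(pool.map(local_aggregate, node_partitions))

requests_per_host = merge(partials)
```

Only the compact partial counts cross the "network" here, not the raw rows. This is the main reason the two-phase approach scales with the number of nodes.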

Conversely, Elasticsearch’s search engine roots make it superior for instant searches and basic metrics, but it falters in complex SQL analytics and joins. Doris also offers greater schema flexibility, supporting online column and index changes, whereas changing an existing field mapping in ES typically requires costly reindexing.

Overall, Doris emphasizes analytical power and usability, while ES prioritizes retrieval, giving Doris an advantage in complex enterprise analytics.

2. Query Language: SQL vs. DSL Ease of Use and Expressiveness

Doris and ES diverge sharply in query interfaces. Doris natively supports standard SQL, while Elasticsearch’s primary interface is a JSON-based Query DSL (Domain Specific Language). Doris is compatible with the MySQL protocol and offers broad ANSI SQL support, including SELECT, WHERE, GROUP BY, ORDER BY, multi-table JOINs, subqueries, window functions, UDFs/UDAFs, and materialized views. Analysts and engineers can therefore write complex queries in familiar syntax without learning a new language.

Elasticsearch, by contrast, relies mainly on its proprietary JSON Query DSL, which expresses filtering and aggregation as nested JSON structures. This presents a steep learning curve for new users and complicates integration with traditional BI tools. Elasticsearch also provides an SQL interface, but it covers only a subset of SQL and does not support JOINs.

The comparison is detailed below:

Query Language	Apache Doris (SQL Interface)	Elasticsearch (JSON DSL)
Syntax Style	Standard SQL (MySQL-like), intuitive and readable	Proprietary DSL (JSON), nested and less intuitive
Expressiveness	Supports multi-table JOINs, subqueries, views, and UDFs for complex logic, enabling direct relational analysis	Queries can span multiple indices but cannot join them; no native JOINs or subqueries, so complex analytics require pre-processed (denormalized) data models
Learning Cost	SQL is widely known, low entry barrier; mature debugging tools available	DSL is custom, high learning threshold; error troubleshooting is challenging
Ecosystem Integration	MySQL protocol compatible, so BI tools such as Tableau and Grafana can connect through standard MySQL connectors	BI integration is more limited and typically needs dedicated connectors or drivers; Kibana provides built-in visualization

Analysis: Doris’s SQL interface excels in usability and efficiency, lowering the entry threshold by leveraging familiar syntax. For instance, aggregating log data by multiple dimensions in Doris requires a simple SQL GROUP BY, while ES demands complex, nested DSL aggregations, reducing development efficiency.
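The GROUP BY example above can be made concrete by writing the same two-dimension aggregation both ways. The Python sketch below holds the Doris query as a SQL string and builds the equivalent Elasticsearch search body by nesting one `terms` aggregation per dimension. The table/index name `logs`, the fields `host`, `status`, and `ts`, and the helper `es_terms_aggregation` are hypothetical. Nothing is sent to a server.

```python
import json

# Doris: one flat SQL statement (hypothetical table and column names).
DORIS_SQL = """
SELECT host, status, COUNT(*) AS requests
FROM logs
WHERE ts >= '2025-04-01'
GROUP BY host, status
ORDER BY requests DESC
"""

def es_terms_aggregation(dimensions, since):
    """Build an Elasticsearch search body that groups by several fields
    by nesting a `terms` aggregation inside the previous one."""
    aggs = None
    # Build from the innermost dimension outward.
    for field in reversed(dimensions):
        level = {"terms": {"field": field}}
        if aggs is not None:
            level["aggs"] = aggs
        aggs = {f"by_{field}": level}
    return {
        "size": 0,  # return aggregation buckets only, no matching documents
        "query": {"range": {"ts": {"gte": since}}},
        "aggs": aggs,
    }

body = es_terms_aggregation(["host", "status"], "2025-04-01")
print(json.dumps(body, indent=2))
```

Each additional dimension adds another level of nesting to the DSL but only another column name to the SQL. Also note that each `terms` level returns only the top 10 buckets by default, so exhaustive grouping usually needs a larger `size` or a `composite` aggregation.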

Doris’s support for JOINs and subqueries also suits data warehouse modeling (e.g., star schemas), whereas ES’s lack of JOINs forces denormalized data or application-layer processing. Doris therefore leads in ease of querying and expressive power, and integrates more readily with analytics ecosystems.
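Application-layer processing typically means fetching results from each index separately and joining them in client code. The sketch below does this with an in-memory hash join. The `orders` and `users` records and the helper names are hypothetical, and the join assumes `user_id` is unique in `users`. The Doris statement at the end expresses the same result as a single query executed by the engine.

```python
# Hypothetical rows fetched separately from two Elasticsearch indices.
orders = [
    {"order_id": 1, "user_id": "u1", "amount": 30.0},
    {"order_id": 2, "user_id": "u2", "amount": 12.5},
    {"order_id": 3, "user_id": "u1", "amount": 7.5},
]
users = [
    {"user_id": "u1", "region": "EU"},
    {"user_id": "u2", "region": "US"},
]

def hash_join(left, right, key):
    """Inner hash join in client code: build a map on `right`, probe with `left`.
    Assumes `key` is unique within `right`."""
    build = {row[key]: row for row in right}
    return [{**row, **build[row[key]]} for row in left if row[key] in build]

def revenue_by_region(joined):
    """Sum order amounts per region after the join."""
    totals = {}
    for row in joined:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals

joined = hash_join(orders, users, "user_id")
totals = revenue_by_region(joined)

# In Doris, the same result is one statement (hypothetical table names).
DORIS_EQUIVALENT = """
SELECT u.region, SUM(o.amount) AS revenue
FROM orders o JOIN users u ON o.user_id = u.user_id
GROUP BY u.region
"""
```

With large result sets, the client must page through every matching document before joining. This is why ES deployments usually denormalize at ingest time instead.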

3. Real-Time Data Processing Mechanisms: Write Architecture and Data Updates

Doris and ES take distinct approaches to real-time ingestion and querying. Elasticsearch prioritizes near-real-time search. Documents are sent through REST APIs (e.g., the Bulk API), then analyzed (tokenized) and indexed one by one. They become searchable after the next periodic refresh (every 1 second by default). This enables rapid log retrieval but makes writes expensive: CPU-intensive indexing limits throughput to roughly 2 MB/s per core, which often causes bottlenecks during traffic peaks.
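For reference, the Bulk API takes a newline-delimited JSON (NDJSON) body. An action line names the target index, and the next line holds the document source. The Python sketch below only builds such a payload; the index name `app-logs` and the sample documents are hypothetical. The body would be POSTed to the `_bulk` endpoint with `Content-Type: application/x-ndjson`.

```python
import json

def bulk_ndjson(index_name, docs):
    """Serialize documents into the NDJSON body expected by the
    Elasticsearch _bulk endpoint: an action line, then the source line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the body must end with a newline

# Hypothetical log documents.
payload = bulk_ndjson("app-logs", [
    {"host": "web-1", "status": 200, "message": "GET /index ok"},
    {"host": "web-2", "status": 500, "message": "POST /pay timeout"},
])
```

Batching reduces request overhead, but each document in the body is still analyzed and indexed individually. A common tuning knob is the index setting `index.refresh_interval`, which can be raised (e.g., to `30s`) to cut refresh overhead at the cost of slower visibility.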

Apache Doris, by contrast, uses a high-throughput batch write architecture. Data arrives in small batches, either pushed over HTTP with Stream Load or pulled continuously from message queues such as Kafka with Routine Load. It is then written efficiently in columnar format across multiple replicas. Because it avoids indexing every field, Doris achieved about 5 times ES’s write throughput in ES Rally benchmarks, and its built-in queue integration simplifies data pipelines.
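A Stream Load batch is an HTTP PUT to the FE endpoint `/api/{db}/{table}/_stream_load`. It uses basic authentication and headers that describe the format, plus a label identifying the batch. The Python sketch below builds such a request for newline-delimited JSON without sending it. The host `fe.example.com`, database `demo`, table `app_logs`, label, and default credentials are all hypothetical, and 8030 is assumed to be the FE HTTP port.

```python
import base64
import json
import urllib.request

def stream_load_request(fe_host, db, table, rows, label, user="root", password=""):
    """Build (but do not send) a Doris Stream Load request that loads
    newline-delimited JSON rows into db.table as one transactional batch."""
    body = "\n".join(json.dumps(r) for r in rows).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"http://{fe_host}:8030/api/{db}/{table}/_stream_load",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            # A label identifies the batch; reusing the label of a committed
            # load is rejected, which guards against duplicate imports.
            "label": label,
            "format": "json",
            "read_json_by_line": "true",  # one JSON object per line
            "Expect": "100-continue",
        },
    )

# Hypothetical usage: two log rows in one batch.
req = stream_load_request(
    "fe.example.com", "demo", "app_logs",
    [{"host": "web-1", "status": 200}, {"host": "web-2", "status": 500}],
    label="app_logs_batch_0001",
)
```

Actually sending the request needs a client that follows the FE’s redirect to a backend node while keeping credentials (the Doris docs use `curl --location-trusted`). Python’s default redirect handler does not follow redirects for PUT, so the request would need to be sent with a different client or directly to a backend node.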

Key differences in updates and real-time capabilities include:

Storage mechanism: Doris’s columnar storage achieves 5:1 to 10:1 compression, using ~20% of ES’s space for the same data, enhancing IO efficiency. ES’s inverted indexes yield a ~1.5:1 compression ratio, inflating storage.
Data updates: Doris’s Unique Key model supports primary key updates with minimal performance loss (<10%). In ES, an update is implemented as a delete plus a full reindex of the document, which is costly (up to a 3x performance hit). Doris’s Aggregate Key model keeps aggregates consistent as data is imported, unlike ES rollups, which are less flexible and only eventually consistent.
Query visibility: ES makes new data searchable about one second after each refresh, which is ideal for instant log retrieval. In Doris, imported data becomes visible when each import transaction commits. With batch imports this is typically within seconds to under a minute, which is sufficient for most real-time analytics.
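The update semantics of the two Doris table models above can be illustrated with a toy in-memory model. Under Unique Key, a later row with the same key replaces the earlier one. Under Aggregate Key, rows with identical key columns are merged using each value column’s aggregation function (Doris declares these in the table DDL with types such as SUM, MAX, or REPLACE). This sketch models only the observable semantics, not Doris’s storage engine; all function and column names are hypothetical.

```python
import operator

def unique_key_upsert(table, rows, key):
    """Unique Key semantics: the latest row for a key replaces earlier ones."""
    for row in rows:
        table[row[key]] = row
    return table

def aggregate_key_ingest(table, rows, keys, agg):
    """Aggregate Key semantics: rows sharing the same key columns are merged
    at ingest, combining each value column with its aggregation function."""
    for row in rows:
        k = tuple(row[c] for c in keys)
        if k not in table:
            table[k] = {col: row[col] for col in agg}
        else:
            for col, fn in agg.items():
                table[k][col] = fn(table[k][col], row[col])
    return table

# Hypothetical usage: a user changes tier; two metric rows share a key.
users = unique_key_upsert({}, [{"id": 1, "tier": "free"}, {"id": 1, "tier": "pro"}], "id")
metrics = aggregate_key_ingest(
    {},
    [
        {"day": "2025-04-01", "host": "web-1", "hits": 3, "max_ms": 120},
        {"day": "2025-04-01", "host": "web-1", "hits": 2, "max_ms": 340},
    ],
    keys=["day", "host"],
    agg={"hits": operator.add, "max_ms": max},  # like SUM and MAX columns
)
```

Because the merge happens as data arrives, a query for daily hits per host reads one pre-aggregated row instead of scanning every raw event.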

Analysis: Doris excels at high-throughput, consistent analysis, while ES focuses on fast document indexing and near-real-time retrieval. Doris’s batch writes and compression give it about 5x the write throughput, 2.3x faster queries, and one-fifth the storage of ES. This makes it well suited to high-frequency writes and fast analytics, and flexible schema evolution further strengthens its real-time capabilities.
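The one-fifth storage figure follows from the compression ratios quoted earlier (5:1 to 10:1 for Doris, about 1.5:1 for ES). The worked arithmetic below applies those ratios to a hypothetical 1,000 GB of raw logs. Doris ends up at 15% to 30% of ES’s footprint, with about 20% at a 7.5:1 ratio.

```python
def stored_size(raw_gb, compression_ratio):
    """On-disk size for a raw data size and a raw:stored compression ratio."""
    return raw_gb / compression_ratio

raw_gb = 1000.0  # hypothetical raw log volume
es = stored_size(raw_gb, 1.5)  # ES at ~1.5:1

for ratio in (5, 7.5, 10):
    doris = stored_size(raw_gb, ratio)
    print(f"Doris at {ratio}:1 -> {doris:.0f} GB, {doris / es:.0%} of ES's {es:.0f} GB")
```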

4. Typical Application Scenario Comparison: Log Analysis, BI Reporting, etc.

Doris and ES shine in different scenarios due to their architectural strengths:

Scenario	Apache Doris	Elasticsearch
Log Analysis	Excels at storing and analyzing large log volumes across many dimensions; supports long-term retention and fast aggregations and JOINs, and with inverted index support combines search and analysis in one system. Enterprises report 10x faster analytics and 60% cost savings	Ideal for real-time log search and simple statistics; fast keyword retrieval suits monitoring and troubleshooting (e.g., the ELK stack). Struggles with complex aggregations and long-term analysis due to cost and performance limits
BI Reporting	Well suited to interactive reporting and ad-hoc analysis; full SQL and JOIN support serves data warehousing and dashboards. A logistics firm saw 5-10x faster queries and 2x higher concurrency	Rarely used for BI; lacks JOINs and robust SQL, limiting complex reporting. Best for simple metrics in monitoring, not rich BI logic

Analysis: In log analysis, Doris and ES complement each other: ES handles real-time searches, while Doris manages long-term, complex analytics. For BI, Doris’s SQL and performance make it far superior, directly supporting enterprise data warehouses and reporting.

5. Performance Benchmark Comparison

Benchmarks run with ES Rally, Elastic’s benchmarking tool, highlight Doris’s edge:

[Figure: Performance comparison of write throughput, storage, and query response time. Source: "Log analysis: Elasticsearch vs Apache Doris," Apache Doris]

Doris reaches 550 MB/s write throughput (5x ES) and uses one-fifth the storage. Its queries are 2.3x faster overall, with wider gaps on some aggregations (e.g., about 1 s vs. 6-7 s when aggregating 40 million log records). Its MPP architecture also stays stable under high concurrency, whereas ES runs into memory limits.

6. Enterprise Practice Cases

360 Security Browser: Replaced ES with Doris, improving analytics speed 10x and cutting storage costs by 60%.
Tencent Music: Reduced storage by about 72% (from 697 GB to 195 GB) and boosted write throughput 4x with Doris.
Large bank: Enhanced log analysis efficiency, eliminating redundancy.
Payment firm: Achieved 4x write speed, 3x query performance, and 50% storage savings.

These cases underscore Doris’s strength in large-scale writes and complex queries, often as a complement to ES’s search capabilities.

Summary

Doris excels in complex analytics, SQL usability, and efficiency, making it ideal for unified real-time analytics platforms, while ES dominates full-text search and real-time retrieval. Enterprises can combine them, using Doris for analysis and ES for retrieval, to maximize value. Looking ahead, Doris is poised to expand further in analytics and ES in intelligent search.

Opinions expressed by DZone contributors are their own.
