Apache Doris vs Elasticsearch: An In-Depth Comparative Analysis
Apache Doris excels in complex analytics with SQL support and high performance, while Elasticsearch is ideal for full-text search and real-time retrieval.
This article offers a detailed comparison across six dimensions: core architecture, query language, real-time capabilities, application scenarios, performance, and enterprise practices.

1. Core Design Philosophy: MPP Architecture vs. Search Engine Architecture
Apache Doris employs a typical MPP (Massively Parallel Processing) distributed architecture, tailored for high-concurrency, low-latency, real-time online analytical processing (OLAP). A cluster consists of Frontend (FE) nodes, which manage metadata and plan queries, and Backend (BE) nodes, which store data and execute queries. Together they combine multi-node parallel computing with columnar storage to manage massive datasets efficiently. This design lets Doris return query results with sub-second latency, making it well suited to complex aggregations and analytical queries on large datasets.
In contrast, Elasticsearch is built on a full-text search engine architecture: data is split into shards, and inverted indexes prioritize fast text retrieval and filtering. ES stores data as JSON documents and by default indexes every field for search (text fields through an inverted index), which makes it excellent at keyword search and log queries. However, it struggles with complex analytics and large-scale aggregations.
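To make the contrast concrete, here is a toy Python sketch of the two access paths. It is not either system's actual storage engine: an inverted index maps each token to the documents that contain it, which is what makes keyword lookups fast, while a columnar layout lets an aggregation read only the columns it needs. The sample records and the whitespace tokenizer are hypothetical simplifications.

```python
from collections import defaultdict

# Hypothetical sample log records (illustrative data only).
docs = [
    {"id": 0, "msg": "disk full on node a", "status": 500, "bytes": 120},
    {"id": 1, "msg": "request ok", "status": 200, "bytes": 300},
    {"id": 2, "msg": "disk latency high", "status": 200, "bytes": 80},
]


def build_inverted_index(records, field):
    """Map each whitespace-separated token to the sorted ids of records containing it."""
    index = defaultdict(set)
    for rec in records:
        for token in rec[field].lower().split():
            index[token].add(rec["id"])
    return {term: sorted(ids) for term, ids in index.items()}


def to_columns(records):
    """Pivot row-oriented records into one list per field (a columnar layout)."""
    return {key: [rec[key] for rec in records] for key in records[0]}


inverted = build_inverted_index(docs, "msg")
columns = to_columns(docs)

# Search-engine access path: one term lookup returns the matching document ids.
print(inverted["disk"])  # [0, 2]

# Analytical access path: scan only the two columns the aggregation needs.
bytes_by_status = defaultdict(int)
for status, size in zip(columns["status"], columns["bytes"]):
    bytes_by_status[status] += size
print(dict(bytes_by_status))  # {500: 120, 200: 380}
```

The keyword lookup never touches the `bytes` values, and the aggregation never touches the `msg` text; each system is organized around making one of these paths cheap.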
The core architectural differences are summarized below:
| Architectural Philosophy | Apache Doris (MPP Analytical Database) | Elasticsearch (Distributed Search Engine) |
|---|---|---|
| Design Intent | Geared toward real-time data warehousing and BI: an OLAP engine built for high-throughput parallel computing, emphasizing high-concurrency aggregation queries and low latency | Focused on full-text search and log retrieval, built on Lucene’s inverted index; excels at keyword search and filtering, and remains primarily a search engine despite supporting structured queries |
| Data Storage | Columnar storage with per-column encoding and compression, achieving high compression ratios (5-10×) that save space; supports multiple table models (Duplicate, Aggregate, Unique), with pre-aggregation during writes | Document storage with per-field inverted indexes (low compression ratio, ~1.5×); new fields can be added, but changing an existing field’s mapping requires reindexing |
| Scalability and Elasticity | Shared-nothing design that scales linearly as nodes are added; supports strict read-write separation and multi-tenant isolation; version 3.0 introduces storage-compute separation for elastic scaling | Scales through shards and replicas but is constrained by single-node memory and JVM garbage-collection limits, risking memory exhaustion during large queries; its thread-pool model offers limited workload isolation |
| Typical Features | Fully open source (Apache 2.0) and compatible with the MySQL protocol; no external dependencies; offers materialized views and rich SQL functions for analytics | Core developed by Elastic (its license has changed over time); natively supports full-text search and near-real-time indexing; rich ecosystem (Kibana, Logstash), with some advanced features available only in paid subscription tiers |
Analysis: Doris’s MPP architecture provides a natural edge in big data aggregation analytics, leveraging columnar storage and vectorized execution to optimize IO and CPU usage. Features like pre-aggregation, materialized views, and a scalable design make it outperform ES in large-scale data analytics.
Conversely, Elasticsearch’s search engine roots make it superior for instant search and basic metrics, but it falters at complex SQL analytics and joins. Doris also offers greater schema flexibility, allowing columns and indexes to be changed online, while changing an existing ES mapping often requires costly reindexing.
Overall, Doris emphasizes analytical power and usability, while ES prioritizes retrieval, giving Doris an advantage in complex enterprise analytics.
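The pre-aggregation that the Aggregate table model performs during writes can be sketched in a few lines of Python. This illustrates the semantics only, assuming hypothetical columns `dt`, `city`, `pv`, `max_cost`, and `last_ip`. It loosely mirrors a Doris table declared with `AGGREGATE KEY(dt, city)` and value columns using `SUM`, `MAX`, and `REPLACE`, and it ignores how Doris actually merges data across load batches, compaction, and query time.

```python
# Illustrative sketch of Aggregate Key semantics, not Doris internals.
# Hypothetical value columns mapped to their aggregation behavior.
AGG_FUNCS = {
    "pv": lambda old, new: old + new,    # SUM
    "max_cost": max,                     # MAX
    "last_ip": lambda old, new: new,     # REPLACE: the later row wins
}


def pre_aggregate(rows, key_cols):
    """Merge rows that share the same key columns, one value column at a time."""
    table = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key not in table:
            # First row for this key: store its value columns as-is.
            table[key] = {col: row[col] for col in AGG_FUNCS}
        else:
            merged = table[key]
            for col, fn in AGG_FUNCS.items():
                merged[col] = fn(merged[col], row[col])
    return table


rows = [
    {"dt": "2024-01-01", "city": "BJ", "pv": 3, "max_cost": 10, "last_ip": "10.0.0.1"},
    {"dt": "2024-01-01", "city": "BJ", "pv": 2, "max_cost": 25, "last_ip": "10.0.0.2"},
    {"dt": "2024-01-01", "city": "SH", "pv": 1, "max_cost": 5, "last_ip": "10.0.0.3"},
]
result = pre_aggregate(rows, ["dt", "city"])
print(result[("2024-01-01", "BJ")])  # {'pv': 5, 'max_cost': 25, 'last_ip': '10.0.0.2'}
```

Three input rows collapse into two stored rows, so queries that group by `dt` and `city` read less data. The trade-off is that the original detail rows are no longer available.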
2. Query Language: SQL vs. DSL in Ease of Use and Expressiveness
Doris and ES diverge sharply in their query interfaces: Doris natively supports standard SQL, while Elasticsearch uses a JSON-based Query DSL (Domain Specific Language). Doris is compatible with the MySQL protocol and offers a broad set of standard SQL features, including SELECT, WHERE, GROUP BY, ORDER BY, multi-table JOINs, subqueries, and window functions, along with UDFs/UDAFs and materialized views. This comprehensive SQL support lets analysts and engineers write complex queries in familiar syntax without learning a new language.
Elasticsearch, however, employs a proprietary JSON-based DSL, distinct from SQL, requiring nested structures for filtering and aggregation. This presents a steep learning curve for new users and complicates integration with traditional BI tools.
The comparison is detailed below:
| Query Language | Apache Doris (SQL Interface) | Elasticsearch (JSON DSL) |
|---|---|---|
| Syntax Style | Standard SQL (MySQL dialect), intuitive and readable | Proprietary JSON-based DSL, deeply nested and less intuitive |
| Expressiveness | Supports multi-table JOINs, subqueries, views, and UDFs for complex logic, enabling direct analysis across related tables | No general-purpose JOINs or subqueries across indices; complex analytics require pre-processed (denormalized) data models |
| Learning Cost | SQL is widely known, so the entry barrier is low; mature debugging tools are available | The DSL is custom, with a high learning threshold; troubleshooting errors is difficult |
| Ecosystem Integration | MySQL protocol compatible, so it integrates directly with BI tools (e.g., Tableau, Grafana) | Relatively closed ecosystem; hard to integrate with BI tools without plugins; Kibana offers basic visualization |
Analysis: Doris’s SQL interface excels in usability and efficiency, lowering the entry threshold by leveraging familiar syntax. For instance, aggregating log data by multiple dimensions in Doris requires a simple SQL GROUP BY, while ES demands complex, nested DSL aggregations, reducing development efficiency.
Doris’s support for JOINs and subqueries also suits data warehouse modeling (e.g., star schemas), whereas ES’s lack of JOINs necessitates pre-denormalized data or application-layer processing. Thus, Doris outperforms in query ease and power, enhancing integration with analytics ecosystems.
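The multi-dimension aggregation example from the analysis can be made concrete with a small Python sketch that generates both query forms for the same request: summing a hypothetical `bytes` field over hypothetical `service`, `status`, and `region` fields of an `access_log` table or index. The Elasticsearch body uses the standard `terms` and `sum` aggregations; in a real index, the grouped fields would need to be `keyword` fields rather than analyzed `text` fields.

```python
import json


def build_sql(table, dims, metric):
    """Doris-style SQL: a single GROUP BY handles any number of dimensions."""
    cols = ", ".join(dims)
    return f"SELECT {cols}, SUM({metric}) AS total FROM {table} GROUP BY {cols}"


def build_es_aggs(dims, metric):
    """Elasticsearch Query DSL: each extra dimension adds a nested terms aggregation."""
    inner = {"total": {"sum": {"field": metric}}}
    # Wrap from the innermost dimension outward so dims[0] ends up outermost.
    for dim in reversed(dims):
        inner = {f"by_{dim}": {"terms": {"field": dim}, "aggs": inner}}
    return {"size": 0, "aggs": inner}  # size 0: return aggregations, not hits


dims = ["service", "status", "region"]
print(build_sql("access_log", dims, "bytes"))
print(json.dumps(build_es_aggs(dims, "bytes"), indent=2))
```

The SQL string grows by one column name per dimension, while the DSL body gains one nesting level per dimension. To be fair to Elasticsearch, its `multi_terms` and `composite` aggregations can group on several fields without manual nesting, though the request is still JSON rather than SQL.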
3. Real-Time Data Processing Mechanisms: Write Architecture and Data Updates
Doris and ES take distinct approaches to real-time data ingestion and querying. Elasticsearch prioritizes near-real-time search, indexing documents one by one and refreshing indexes frequently. Data is ingested through REST APIs (e.g., the Bulk API), tokenized, and indexed, and it becomes searchable after the next periodic refresh (default interval: 1 second). This enables rapid log retrieval but incurs high write overhead: CPU-intensive indexing limits single-core throughput to roughly 2 MB/s, often causing bottlenecks during traffic peaks.
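For reference, the Bulk API takes a newline-delimited JSON (NDJSON) body in which each document is preceded by an action line. The Python sketch below only builds that body; the index name and documents are hypothetical, and actually sending it (a `POST` to `/_bulk` with `Content-Type: application/x-ndjson`) is left out.

```python
import json


def build_bulk_body(index, docs):
    """Serialize documents into the NDJSON body expected by the _bulk endpoint.

    Each document takes two lines: an action/metadata line, then the document source.
    The body must end with a trailing newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    "app-logs",  # hypothetical index name
    [{"level": "ERROR", "msg": "disk full"}, {"level": "INFO", "msg": "ok"}],
)
print(body)
```

Even after a successful bulk request, the new documents become visible to search only after the next refresh, which is where the roughly one-second latency described above comes from.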
Apache Doris, conversely, uses a high-throughput batch write architecture. Data is imported in small batches (via Stream Load, or via Routine Load from message queues such as Kafka) and written efficiently in columnar format across multiple replicas. Because it does not build an inverted index on every field, Doris achieves about 5x the write throughput of ES in ES Rally benchmarks, and its direct queue integration simplifies ingestion pipelines.
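Stream Load, by contrast, is a plain HTTP `PUT` of a whole batch to the frontend’s `/api/{db}/{table}/_stream_load` endpoint. The sketch below builds such a request with Python’s standard library without sending it. The host, database, table, credentials, and label are hypothetical, and `8030` is assumed to be the FE HTTP port (the default). A real client must also follow the FE’s redirect to a backend node while keeping the credentials, which `curl --location-trusted` does and this sketch does not attempt.

```python
import base64
import json
import urllib.request


def build_stream_load_request(fe_host, db, table, rows, user, password, label, port=8030):
    """Build (but do not send) a Stream Load request carrying `rows` as a JSON array."""
    url = f"http://{fe_host}:{port}/api/{db}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "label": label,                # a unique label lets Doris reject duplicate retries
        "format": "json",              # the body is JSON rather than the default CSV
        "strip_outer_array": "true",   # the body is one JSON array of row objects
    }
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


# All connection details below are hypothetical placeholders.
req = build_stream_load_request(
    fe_host="fe.example.internal",
    db="demo_db",
    table="app_logs",
    rows=[{"level": "ERROR", "msg": "disk full"}, {"level": "INFO", "msg": "ok"}],
    user="loader",
    password="secret",
    label="app_logs_batch_0001",
)
print(req.get_method(), req.full_url)
```

For Kafka, Routine Load is the alternative: it is declared once with a `CREATE ROUTINE LOAD` statement and then consumes continuously, so no client code like this is needed.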
Key differences in updates and real-time capabilities include:
- Storage mechanism: Doris’s columnar storage achieves 5:1 to 10:1 compression and uses roughly 20% of the space ES needs for the same data, improving IO efficiency. ES’s inverted indexes yield a compression ratio of only ~1.5:1, inflating storage.
- Data updates: Doris’s Unique Key model supports primary-key updates with minimal performance loss (<10%). In ES, an update rewrites the whole document (the old version is marked deleted and a new one is indexed), with up to a 3x performance hit. Doris’s Aggregate Key model keeps aggregates consistent as data is imported, unlike ES’s less flexible, eventually consistent rollups.
- Query visibility: ES makes data searchable within about a second, after the next refresh, which is ideal for instant log retrieval. In Doris, imported data becomes queryable once each load transaction commits, so visibility follows the batch interval (typically seconds to under a minute), which is sufficient for most real-time analytics.
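The update semantics of the Unique Key model can be illustrated with a toy merge-on-read table in Python: each load appends rows tagged with a new version, and reads keep only the newest row per key. This is a simplified model under stated assumptions (a single in-memory list, hypothetical `user_id` and `city` columns). Doris also offers a merge-on-write mode, and the sketch says nothing about the performance figures above.

```python
from dataclasses import dataclass, field


@dataclass
class UniqueKeyTable:
    """Toy model of Unique Key semantics: the latest row for each key wins."""

    key_cols: tuple
    versions: list = field(default_factory=list)  # (version, row) pairs, append-only
    next_version: int = 0

    def load(self, rows):
        # Each load batch gets a new version number; older rows are never rewritten.
        self.next_version += 1
        for row in rows:
            self.versions.append((self.next_version, row))

    def read(self):
        # Merge on read: keep the highest-version row per key
        # (within one batch, the later row wins because of >=).
        latest = {}
        for version, row in self.versions:
            key = tuple(row[c] for c in self.key_cols)
            if key not in latest or version >= latest[key][0]:
                latest[key] = (version, row)
        return {key: row for key, (_, row) in latest.items()}


table = UniqueKeyTable(key_cols=("user_id",))
table.load([{"user_id": 1, "city": "BJ"}, {"user_id": 2, "city": "SH"}])
table.load([{"user_id": 1, "city": "GZ"}])  # upsert: same key, newer version
print(table.read())  # {(1,): {'user_id': 1, 'city': 'GZ'}, (2,): {'user_id': 2, 'city': 'SH'}}
```

Writes stay cheap because nothing is rewritten in place; the cost of resolving versions moves to reads (or, in a merge-on-write design, to the write path).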
4. Typical Application Scenario Comparison: Log Analysis, BI Reporting, etc.
| Scenario | Apache Doris | Elasticsearch |
|---|---|---|
| Log Analysis | Excels at storing and analyzing large log volumes across multiple dimensions; supports long-term retention and fast aggregations and JOINs. Enterprises report 10x faster analytics and 60% cost savings, and inverted index support (added in version 2.0) lets Doris combine search and analysis | Ideal for real-time log search and simple statistics; fast keyword retrieval suits monitoring and troubleshooting (e.g., the ELK stack). Struggles with complex aggregations and long-term analysis because of cost and performance limits |
| BI Reporting | Well suited to interactive reporting and ad-hoc analysis; full SQL and JOIN support serves data warehousing and dashboards. A logistics firm saw 5-10x faster queries and 2x higher concurrency | Rarely used for BI; the lack of JOINs and robust SQL limits complex reporting. Best for simple monitoring metrics rather than rich BI logic |
Analysis: In log analysis, Doris and ES complement each other: ES handles real-time searches, while Doris manages long-term, complex analytics. For BI, Doris’s SQL and performance make it far superior, directly supporting enterprise data warehouses and reporting.
5. Performance Benchmark Comparison
- Source: "Log analysis: Elasticsearch vs Apache Doris" (Apache Doris)
- [Figure: Performance comparison of write throughput, storage, and query response time]
6. Enterprise Practice Cases
- 360 Security Browser: Replaced ES with Doris, speeding up analytics 10x and cutting storage costs by 60%.
- Tencent Music: Reduced storage by about 72% (697 GB to 195 GB) and increased write throughput 4x with Doris.
- Large bank: Improved log analysis efficiency and eliminated redundancy.
- Payment firm: Achieved 4x faster writes, 3x faster queries, and 50% storage savings.
Summary