Elasticsearch Monitoring Cheatsheet

This document provides a cheatsheet for monitoring and tuning Elasticsearch. It lists various API endpoints and commands that can be used to collect metrics on nodes, clusters, indexes, searches, indexing, garbage collection, disk usage, memory usage, and more. It also lists default directories for configuration, logs, and data on various operating systems. The document recommends testing any tuning actions before implementing in production environments.

Uploaded by timar89

Cheatsheet: Elasticsearch Monitoring

Note:
• Windows users should download cURL to use the commands below.
• Some commands require jq to parse JSON for relevant metrics.
• For more info, visit dtdg.co/monitoring-elasticsearch

General monitoring API endpoints
METRIC DESCRIPTION
    COMMAND
Stats from all nodes
    curl 'localhost:9200/_nodes/stats'
Stats from specific nodes
    curl 'localhost:9200/_nodes/node1,node2/stats'
Stats from a specific index
    curl 'localhost:9200/<INDEX_NAME>/_stats'
Cluster-wide stats
    curl 'localhost:9200/_cluster/stats'

Thread pool queues & rejections
METRIC DESCRIPTION
    COMMAND
Number of queued threads in a thread pool
    curl 'localhost:9200/_nodes/stats/thread_pool' | jq '.nodes[] | {node_name: .name, bulk_queue: .thread_pool.bulk.queue, search_queue: .thread_pool.search.queue, index_queue: .thread_pool.index.queue}'
Number of rejected threads in a thread pool
    curl 'localhost:9200/_nodes/stats/thread_pool' | jq '.nodes[] | {node_name: .name, bulk_rejected: .thread_pool.bulk.rejected, search_rejected: .thread_pool.search.rejected, index_rejected: .thread_pool.index.rejected}'

Cluster health
METRIC DESCRIPTION
    COMMAND
Cluster status & unassigned shards
    curl 'localhost:9200/_cat/health?v'

Search performance
METRIC DESCRIPTION
    COMMAND
Total number of queries
    curl 'localhost:9200/_cat/nodes?v&h=name,searchQueryTotal'
Total time spent on queries
    curl 'localhost:9200/_cat/nodes?v&h=name,searchQueryTime'
Number of queries currently in progress
    curl 'localhost:9200/_cat/nodes?v&h=name,searchQueryCurrent'
Total number of fetches
    curl 'localhost:9200/_cat/nodes?v&h=name,searchFetchTotal'
Total time spent on fetches
    curl 'localhost:9200/_cat/nodes?v&h=name,searchFetchTime'
Number of fetches currently in progress
    curl 'localhost:9200/_cat/nodes?v&h=name,searchFetchCurrent'

Indexing performance
METRIC DESCRIPTION
    COMMAND
Total number of documents indexed
    curl 'localhost:9200/_cat/nodes?v&h=name,indexingIndexTotal'
Total time spent indexing documents
    curl 'localhost:9200/_cat/nodes?v&h=name,indexingIndexTime'
Number of documents currently being indexed
    curl 'localhost:9200/_cat/nodes?v&h=name,indexingIndexCurrent'
Total number of index flushes to disk
    curl 'localhost:9200/_cat/nodes?v&h=name,flushTotal'
Total time spent on flushing indices to disk
    curl 'localhost:9200/_cat/nodes?v&h=name,flushTotalTime'

Fielddata cache usage
METRIC DESCRIPTION
    COMMAND
Size of the fielddata cache (bytes)
    curl 'localhost:9200/_cat/nodes?v&h=name,fielddataMemory'
Number of evictions from the fielddata cache
    curl 'localhost:9200/_cat/nodes?v&h=name,fielddataEvictions'
Number of times the fielddata circuit breaker has been tripped (ES version >=1.3)
    curl 'localhost:9200/_nodes/stats/breaker' | jq '.nodes[] | {node_name: .name, fielddata: .breakers.fielddata}'

Host-level network and system metrics
METRIC DESCRIPTION
    COMMAND
Disk space total, free, available
    curl 'localhost:9200/_nodes/stats/fs' | jq '.nodes[] | {node_name: .name, disk_total_in_bytes: .fs.total.total_in_bytes, disk_free_in_bytes: .fs.total.free_in_bytes, disk_available_in_bytes: .fs.total.available_in_bytes}'
Percent of disk in use
    curl 'localhost:9200/_cat/allocation?v'
Memory
    curl 'localhost:9200/_nodes/stats/os'
CPU
    curl 'localhost:9200/_nodes/stats/os'
I/O utilization
    Consult a tool like iostat
Used file descriptors percentage
    curl 'localhost:9200/_cat/nodes?v&h=host,name,fileDescriptorPercent'
Network bytes sent/received
    curl 'localhost:9200/_nodes/stats/transport' | jq '.nodes[] | {node_name: .name, network_bytes_sent: .transport.tx_size_in_bytes, network_bytes_received: .transport.rx_size_in_bytes}'
HTTP connections currently open & total opened over time
    curl 'localhost:9200/_nodes/stats/http' | jq '.nodes[] | {node_name: .name, http_current_open: .http.current_open, http_total_opened: .http.total_opened}'
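
The byte counts returned by the `_nodes/stats/fs` command can be turned into a percentage when a single number per node is easier to alert on. The following is a minimal sketch, not part of the original cheatsheet: the function name `disk_used_percent` is hypothetical, and it assumes "in use" means total bytes minus available bytes (substitute the free byte count if that definition suits your setup better).

```shell
# disk_used_percent TOTAL_BYTES AVAILABLE_BYTES
# Prints the percentage of disk in use, rounded down to a whole number.
# Assumption: "in use" is total minus available; this helper is
# illustrative and is not an Elasticsearch command.
disk_used_percent() {
  if [ "$1" -le 0 ]; then
    echo "total bytes must be positive" >&2
    return 1
  fi
  echo $(( ($1 - $2) * 100 / $1 ))
}
```

For example, feeding it the `disk_total_in_bytes` and `disk_available_in_bytes` values from the jq output for one node gives a figure that can be compared with the output of `_cat/allocation?v`.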

Default directories
Configuration
    DEBIAN/UBUNTU: /etc/elasticsearch
    RHEL/CENTOS: /etc/elasticsearch
    ZIP OR TAR INSTALLATION: <ELASTICSEARCH INSTALLATION HOME DIRECTORY>/config
Logs
    DEBIAN/UBUNTU: /var/log/elasticsearch
    RHEL/CENTOS: /var/log/elasticsearch
    ZIP OR TAR INSTALLATION: <ELASTICSEARCH INSTALLATION HOME DIRECTORY>/logs
Data
    DEBIAN/UBUNTU: /var/lib/elasticsearch/data
    RHEL/CENTOS: /var/lib/elasticsearch
    ZIP OR TAR INSTALLATION: <ELASTICSEARCH INSTALLATION HOME DIRECTORY>/data

JVM heap usage
METRIC DESCRIPTION
    COMMAND
Garbage collection frequency and duration
    curl 'localhost:9200/_nodes/stats/jvm' | jq '.nodes[] | {node_name: .name, young_gc_count: .jvm.gc.collectors.young.collection_count, young_gc_time: .jvm.gc.collectors.young.collection_time_in_millis, old_gc_count: .jvm.gc.collectors.old.collection_count, old_gc_time: .jvm.gc.collectors.old.collection_time_in_millis}'
Percent of JVM heap currently in use
    curl 'localhost:9200/_cat/nodes?v&h=name,heapPercent'

Pending tasks
METRIC DESCRIPTION
    COMMAND
Number of pending tasks
    curl 'localhost:9200/_cluster/pending_tasks'
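
The garbage collection counts and times reported by `_nodes/stats/jvm` are cumulative since the node started, so frequency and duration over a period come from the difference between two samples. A minimal sketch of that arithmetic, assuming you have saved the count and time values from two runs of the jq command above; the function name `gc_avg_pause_ms` is hypothetical:

```shell
# gc_avg_pause_ms COUNT_BEFORE TIME_MS_BEFORE COUNT_AFTER TIME_MS_AFTER
# Prints the average milliseconds per collection between two samples
# (integer division). Prints 0 when no collection ran in between.
# Illustrative helper only; the four inputs are the cumulative
# collection_count / collection_time_in_millis values of one collector.
gc_avg_pause_ms() {
  _collections=$(( $3 - $1 ))
  _elapsed_ms=$(( $4 - $2 ))
  if [ "$_collections" -le 0 ]; then
    echo 0
  else
    echo $(( _elapsed_ms / _collections ))
  fi
}
```

Sampling at a fixed interval (for instance once a minute) also gives the collection frequency directly as the count difference per interval.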
Cheatsheet: Elasticsearch Tuning
Note:
• Windows users should download cURL to use the commands below.
• Results of each suggested action may vary depending on your particular use case and setup. Please test them out before implementing in production.
• For more info, visit dtdg.co/tuning-elasticsearch

Unassigned shards
Check which shards are unassigned:
    curl 'localhost:9200/_cat/shards' | grep UNASSIGNED
SUGGESTED ACTION
    COMMAND
Reduce number of replicas for an index (master will not assign multiple copies of a shard on the same node)
    curl -XPUT 'localhost:9200/<INDEX_NAME>/_settings' -d '{"number_of_replicas": <DESIRED NUMBER OF REPLICAS>}'
Re-enable shard allocation
    curl -XPUT 'localhost:9200/_cluster/settings' -d '{"transient": {"cluster.routing.allocation.enable": "all"}}'
Manually allocate an unassigned shard
    curl -XPOST 'localhost:9200/_cluster/reroute' -d '{"commands": [{"allocate": {"index": "<INDEX_NAME>", "shard": <SHARD_NUMBER>, "node": "<NODE_NAME>"}}]}'
Check disk usage; master node will not assign shards to any node using >85% of disk
    curl 'localhost:9200/_cat/allocation?v'
Check that every node is running the same version of Elasticsearch; master node will not assign to older version
    curl 'localhost:9200/_cat/nodes?v&h=host,name,version'

Tune the JVM heap size
Note: The Elasticsearch docs recommend setting your heap size below 50% of a node's available memory (and never going above 32GB), to leave more memory for the file system cache.
SUGGESTED ACTION
    COMMAND
Set heap size upon starting up Elasticsearch (<DESIRED_SIZE> is a value such as 3g)
    ES_HEAP_SIZE=<DESIRED_SIZE> ./bin/elasticsearch
Set heap as an environment variable (requires Elasticsearch restart)
    export ES_HEAP_SIZE=<DESIRED_SIZE>

Bulk rejections
Implement a linear or exponential backoff strategy until the bulk rejections decrease.

Backlog of pending tasks
• Allocate more resources to master-eligible nodes.
• Create a new cluster if you suspect that the current cluster's demands have outgrown the master's capabilities.
• Make sure your mappings do not allow users to create an unlimited number of new fields in documents.
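
For the bulk rejections advice above, an exponential backoff can be sketched in plain shell. This is an illustration under stated assumptions rather than a prescribed implementation: the function names `backoff_delay` and `retry_with_backoff` are hypothetical, delays are whole seconds, and the wrapped command is assumed to exit non-zero when the bulk request was rejected (how you detect a rejection depends on your client).

```shell
# backoff_delay ATTEMPT BASE_SECONDS MAX_SECONDS
# Prints the wait before the next try: BASE doubled for every attempt
# after the first, capped at MAX. Hypothetical helper.
backoff_delay() {
  _delay=$2
  _i=1
  while [ "$_i" -lt "$1" ] && [ "$_delay" -lt "$3" ]; do
    _delay=$(( _delay * 2 ))
    _i=$(( _i + 1 ))
  done
  if [ "$_delay" -gt "$3" ]; then
    _delay=$3
  fi
  echo "$_delay"
}

# retry_with_backoff MAX_ATTEMPTS BASE_SECONDS MAX_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, sleeping
# with an exponentially growing delay between tries.
retry_with_backoff() {
  _max_attempts=$1
  _base=$2
  _cap=$3
  shift 3
  _attempt=1
  while [ "$_attempt" -le "$_max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$_attempt" -lt "$_max_attempts" ]; then
      sleep "$(backoff_delay "$_attempt" "$_base" "$_cap")"
    fi
    _attempt=$(( _attempt + 1 ))
  done
  return 1
}
```

A call such as `retry_with_backoff 5 1 30 send_bulk`, where `send_bulk` is a placeholder for your own script that posts the bulk request and fails on rejection, would wait 1, 2, 4, and 8 seconds between its five tries. A linear strategy would add a fixed step instead of doubling.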

Search performance
Log slow queries in slow search log (replace with your desired thresholds):
    curl -XPUT 'localhost:9200/<INDEX_NAME>/_settings' -d '{
    "index.search.slowlog.threshold.query.warn" : "10s",
    "index.search.slowlog.threshold.fetch.debug": "500ms",
    "index.indexing.slowlog.threshold.index.info": "5s"
    }'
SUGGESTED ACTION
    COMMAND
Route high-priority, low-volume documents of a <DOC_TYPE> to the same place so only one shard will be queried
    curl -XPUT 'localhost:9200/<INDEX_NAME>' -d '{"mappings": {"<DOC_TYPE>": {"_routing": {"required": true}}}}'
Merge segments in an index
    ES versions 2.1.0+:
    curl -XPOST 'localhost:9200/<INDEX_NAME>/_forcemerge'
    ES versions prior to 2.1.0:
    curl -XPOST 'localhost:9200/<INDEX_NAME>/_optimize'

Fielddata usage
SUGGESTED ACTION
    COMMAND
Enable doc values for a non-analyzed string field (enabled by default for ES versions 2.0+)
    curl -XPUT 'localhost:9200/<INDEX_NAME>/_mapping/<DOC_TYPE>' -d '{"properties": {"<FIELD_NAME>": {"type": "string", "index": "not_analyzed", "doc_values": true }}}'

Low disk space
• General actions:
    • Turn off replication for outdated data
    • Store old data off-cluster
• If all nodes are running out of disk space:
    • Add more data-eligible nodes
• If specific nodes are running out of disk space:
    • Reindex the data into a new index with a greater number of primary shards, and make sure you have enough data nodes to evenly distribute the shards
    • Upgrade the hardware on those nodes (scale vertically)
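
The "Merge segments in an index" action under Search performance uses a different endpoint depending on the Elasticsearch version. If a script has to work across clusters of mixed versions, the choice can be made from the version string (reported, for example, by the `_cat/nodes?v&h=host,name,version` command). A small sketch, assuming a version in MAJOR.MINOR.PATCH form; the function name `merge_endpoint` is hypothetical:

```shell
# merge_endpoint ES_VERSION
# Prints "_forcemerge" for versions 2.1.0 and later, "_optimize" otherwise.
# Assumes ES_VERSION looks like MAJOR.MINOR.PATCH (e.g. 2.1.0).
# Illustrative helper, not an Elasticsearch command.
merge_endpoint() {
  _major=${1%%.*}      # text before the first dot
  _rest=${1#*.}        # text after the first dot
  _minor=${_rest%%.*}  # text before the next dot
  if [ "$_major" -gt 2 ] || { [ "$_major" -eq 2 ] && [ "$_minor" -ge 1 ]; }; then
    echo "_forcemerge"
  else
    echo "_optimize"
  fi
}
```

It could then be used as `curl -XPOST "localhost:9200/<INDEX_NAME>/$(merge_endpoint 2.3.1)"`.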

Indexing performance
SUGGESTED ACTION
    COMMAND
Bulk index documents from a JSON file
    curl -XPOST 'localhost:9200/<INDEX_NAME>/<MY_TYPE>/_bulk?pretty' --data-binary "@<YOUR_FILE>.json"
Increase refresh interval to optimize indexing, rather than making new data immediately searchable (<DESIRED_INTERVAL> is a value such as 30s)
    curl -XPUT 'localhost:9200/<INDEX_NAME>/_settings' -d '{"index": {"refresh_interval": "<DESIRED_INTERVAL>"}}'
Disable merge throttling to leave more resources for indexing, not merging
    curl -XPUT 'localhost:9200/_cluster/settings' -d '{"transient": {"indices.store.throttle.type": "none"}}'
Disable shard replication
    curl -XPUT 'localhost:9200/<INDEX_NAME>/_settings' -d '{"number_of_replicas": 0}'
Commit translog to disk less frequently
    curl -XPUT 'localhost:9200/<INDEX_NAME>/_settings' -d '{"index": {"translog": {"durability": "async"}}}'
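
The file passed to the `_bulk` command in the Indexing performance table is not a single JSON array: the bulk API expects newline-delimited JSON, with an action line before each document and a newline after the last line. As a rough sketch, assuming your documents are already stored one per line, the hypothetical helper `to_bulk_body` below adds an `{"index":{}}` action line before each one, which relies on the index and type given in the request URL:

```shell
# to_bulk_body
# Reads one JSON document per line on stdin and writes a bulk request
# body on stdout: an {"index":{}} action line followed by the document,
# each terminated by a newline. Blank input lines are skipped.
# Illustrative helper; it does not validate the JSON.
to_bulk_body() {
  while IFS= read -r _doc || [ -n "$_doc" ]; do
    if [ -n "$_doc" ]; then
      printf '%s\n%s\n' '{"index":{}}' "$_doc"
    fi
  done
}
```

For instance, `to_bulk_body < docs.ndjson > bulk.json` (both file names are placeholders) produces a file that can be sent with `--data-binary "@bulk.json"`; `--data-binary` matters because it preserves the newlines.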
Cheatsheet: Elasticsearch Monitoring with Datadog
Note:
• For metric descriptions and more info: dtdg.co/monitoring-elasticsearch

[Figure: Datadog's out-of-the-box screenboard.] Sections 1-8 correspond to the metric categories outlined below.

1. Cluster health
METRIC DESCRIPTION
    DATADOG METRIC NAME
Cluster status
    elasticsearch.cluster_status
Number of unassigned shards
    elasticsearch.unassigned_shards

2. Search performance
METRIC DESCRIPTION
    DATADOG METRIC NAME
Total number of queries
    elasticsearch.search.query.total
Total time spent on queries (s)
    elasticsearch.search.query.time
Number of queries in progress
    elasticsearch.search.query.current
Total number of fetches
    elasticsearch.search.fetch.total
Total time spent on fetches (s)
    elasticsearch.search.fetch.time
Number of fetches in progress
    elasticsearch.search.fetch.current

3. Indexing performance
METRIC DESCRIPTION
    DATADOG METRIC NAME
Total number of documents indexed
    elasticsearch.indexing.index.total
Total time spent indexing documents (s)
    elasticsearch.indexing.index.time
Number of documents currently being indexed
    elasticsearch.indexing.index.current
Total number of index flushes to disk
    elasticsearch.flush.total
Total time spent on flushing indices to disk (s)
    elasticsearch.flush.total.time

4. JVM heap usage
METRIC DESCRIPTION
    DATADOG METRIC NAME
Garbage collection frequency and duration
    jvm.gc.collectors.young.count
    jvm.gc.collectors.young.collection_time
    jvm.gc.collectors.old.count
    jvm.gc.collectors.old.collection_time
Percent of JVM heap currently in use
    jvm.mem.heap_in_use

5. Pending tasks
METRIC DESCRIPTION
    DATADOG METRIC NAME
Number of pending tasks
    elasticsearch.pending_tasks_total

6. Thread pool queues & rejections
METRIC DESCRIPTION
    DATADOG METRIC NAME
Number of queued threads in a thread pool
    elasticsearch.thread_pool.bulk.queue
    elasticsearch.thread_pool.index.queue
    elasticsearch.thread_pool.search.queue
Number of rejected threads in a thread pool
    elasticsearch.thread_pool.bulk.rejected
    elasticsearch.thread_pool.index.rejected
    elasticsearch.thread_pool.search.rejected

7. Fielddata cache usage
METRIC DESCRIPTION
    DATADOG METRIC NAME
Size of the fielddata cache (bytes)
    elasticsearch.fielddata.size
Number of evictions from the fielddata cache
    elasticsearch.fielddata.evictions
Number of times the fielddata circuit breaker has been tripped (ES version >=1.3)
    elasticsearch.breakers.fielddata.tripped

8. Host-level network and system metrics
METRIC DESCRIPTION
    DATADOG METRIC NAME
Percent of disk space in use
    system.disk.in_use
Page cache usage
    system.mem.cached
CPU
    system.cpu.system
I/O utilization
    system.io.util
Open file descriptors
    elasticsearch.process.open_fd
Network bytes sent/received
    system.net.bytes_sent
    system.net.bytes_rcvd
HTTP connections currently open & total opened over time
    elasticsearch.http.current_open
    elasticsearch.http.total_opened

Default directories
Configuration
    DEBIAN/UBUNTU: /etc/elasticsearch
    RHEL/CENTOS: /etc/elasticsearch
    ZIP OR TAR INSTALLATION: <ELASTICSEARCH INSTALLATION HOME DIRECTORY>/config
Logs
    DEBIAN/UBUNTU: /var/log/elasticsearch
    RHEL/CENTOS: /var/log/elasticsearch
    ZIP OR TAR INSTALLATION: <ELASTICSEARCH INSTALLATION HOME DIRECTORY>/logs
Data
    DEBIAN/UBUNTU: /var/lib/elasticsearch/data
    RHEL/CENTOS: /var/lib/elasticsearch
    ZIP OR TAR INSTALLATION: <ELASTICSEARCH INSTALLATION HOME DIRECTORY>/data
