Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Nested Queries

Overview

Join the Preview!

This feature is in Preview. To enable this feature, reach out to Customer Success.

By default, every metric query in Datadog consists of two layers of aggregation. Nested queries allows you to reuse the results of a previous query in a subsequent one.

Nested queries unlocks several powerful capabilities:

Multilayer aggregation

In Datadog, each metric query in Datadog is evaluated with two layers of aggregation: first by time, then by space. Multilayer aggregation allows you to apply additional layers of time or space aggregation. For more information on aggregation, see the anatomy of a metric’s query.

example of applying nested queries before and after

Multilayer time aggregation

Access multilayer time aggregation with the rollup function. Every metric query already contains an initial rollup (time aggregation) that controls the granularity of the data points displayed on the graph. For more information, see the Rollup documentation.

You can apply additional layers of time aggregation with subsequent rollups.

The first rollup supports the following aggregators:

  • avg
  • sum
  • min
  • max
  • count

Additional layers provided by multilayer time aggregation supports the following time aggregators:

  • avg
  • sum
  • min
  • max
  • count
  • arbitrary percentile pxx (p78, p99, p99.99, etc.)
  • stddev

Multilayer time aggregation can be used with the following functions:

Supported FunctionsDescription
Arithmetic operators+, -, *, /
Timeshift functions<METRIC_NAME>{*}, -<TIME_IN_SECOND>
hour_before(<METRIC_NAME>{*})
day_before(<METRIC_NAME>{*})
week_before(<METRIC_NAME>{*})
month_before(<METRIC_NAME>{*})
Top-k selectiontop(<METRIC_NAME>{*}, <LIMIT_TO>, '<BY>', '<DIR>')

Any functions not listed above cannot be combined with multilayer time aggregation.

This query first calculates the average CPU utilization for each EC2 instance grouped by env and team, rolled up into 5-minute intervals. Then multilayer time aggregation is applied to calculate the 95th percentile in time of this nested query over 30m intervals.

example of multilayer time aggregation in the JSON
example of multilayer time aggregation in the JSON

Multilayer space aggregation

After you specify tag(s) in your first layer of space aggregation to group by, access multilayer space aggregation with the Group By function.

You can apply additional layers of space aggregation with subsequent Group Bys. Note: if you do not specify tag(s) to group by in your initial space aggregation layer, multilayer space aggregation will not be available.

The first layer of space aggregation supports the following aggregators:

  • avg by
  • sum by
  • min by
  • max by

Additional layers of space aggregation support:

  • avg by
  • sum by
  • min by
  • max by
  • arbitrary percentile pXX (p75, p99, p99.99, etc.)
  • stddev by

Multilayer space aggregation can be used with the following functions:

Supported FunctionsDescription
Arithmetic operators+, -, *, /
Timeshift functions<METRIC_NAME>{*}, -<TIME_IN_SECOND>
hour_before(<METRIC_NAME>{*})
day_before(<METRIC_NAME>{*})
week_before(<METRIC_NAME>{*})
month_before(<METRIC_NAME>{*})
Top-k selectiontop(<METRIC_NAME>{*}, <LIMIT_TO>, '<BY>', '<DIR>')

Any other functions not listed above cannot be combined with multilayer space aggregation.

All space aggregators with the exception of percentile space aggregators have one argument, which is the tag key(s) you want to group by. Percentile space aggregators require two arguments:

  • The arbitrary percentile pXX
  • The tag(s) to group by

This initial query, avg:aws.ec2.cpuutilization{*} by {env,host}.rollup(avg, 300) calculates the sum of average CPU utilization, grouped by env and host every 5 minutes. Then multilayer space aggregation is applied to calculate maximum value of the average CPU utilization by env.

In the UI or JSON tab, it would look as follows:

example of multilayer space aggregation in the UI
example of multilayer space aggregation in the JSON

Percentiles and Standard Deviation for aggregated counts, rates, and gauges

You can use multilayer aggregation (time and space) to query percentiles and standard deviation from queries on counts, rates, and gauges. They allow you to better understand the variability and spread of your large datasets and allow you to better identify outliers.

Note: The percentile or standard deviation aggregators in Nested queries are calculated using the results of an existing, aggregated count, rate, or gauge metrics. For globally accurate percentiles that are computed on unaggregated, raw values of a metric, use distribution metrics instead.

We can use percentiles in multilayer time aggregation to summarize the results of our nested query (avg CPU utilization by env and team every 5 minutes) by calculating the p95th value of this nested query every 30 minutes.

example of MLA time agg percentiles using nested queries in the UI
example of MLA time agg percentiles using nested queries in the JSON

We can use percentiles in multilayer space aggregation to summarize the results of our nested query (avg CPU utilization by env and team every 5 minutes) by calculating the p95th value of this nested query for every unique env value.

In the UI or JSON tab, it would look as follows:

example of MLA space agg percentiles using nested queries in the UI
example of MLA space agg percentiles using nested queries in the JSON

Standard deviation helps measure the variability or dispersion of a dataset. The following query uses standard deviation with multilayer time aggregation to calculate the standard deviation of our nested query (sum of API request counts, averaged over 4 hour) over longer twelve-hour periods:

In the UI or JSON tab, it would look as follows:

example of standard deviation with nested queries in the UI
example of standard deviation with nested queries in the JSON

Higher resolution queries over historical time frames

Every metric query contains an initial layer of time aggregation (rollup) which controls the granularity of datapoints shown. Datadog provides default rollup time intervals that increase as your overall query timeframe grows. With nested queries, you can access more granular, high-resolution data over longer, historical timeframes.

example of higher resolution queries over historical timeframes in the UI

Historically, when querying a metric over the past month, you would see data at 4-hour granularity by default. You can use nested queries to access higher granularity data over this historical timeframe. Here’s an example query graphed over the past month where the query batch count is initially rolled up in 5 minute intervals. Then multilayer time aggregation is applied to calculate the standard deviation in time of this nested query over 4 hour intervals for a more human-readable graph.

Note: Datadog recommends that you define your initial rollup with the most granular rollup interval and use multilayer time aggregation with coarser rollup intervals to get more user-readable graphs.

In the UI or JSON tab, it would look as follows:

example of higher resolution queries using nested queries in the UI
example of higher resolution queries using nested queries in the JSON

Use nested queries with Datadog’s API

You can use nested queries functionality in our public API for querying timeseries data. Change the contents of the formula object

example of higher resolution queries using nested queries in the JSON

Further reading

Additional helpful documentation, links, and articles: