Correlate Software Performance and Resource Consumption With New Saved Views in Live Processes | Datadog

Author Yael Goldstein
Author David M. Lentz

Published: May 13, 2021

Your applications rely on third-party software running throughout your infrastructure, and it can be challenging to monitor each of these technologies individually. To give you the visibility you need, Datadog Live Processes now monitors all of your third-party workloads in one place. In this post, we’ll show you how Live Processes allows you to quickly find and share saved views, correlate performance with resource usage, and discover integrations that expand your visibility into third-party workloads.

The Live Processes Integration Metrics tab shows graphs that visualize Redis metrics.

Quickly find and share saved views within Live Processes

Live Processes shows you all of the processes running in your infrastructure, and now saved views let you quickly filter your data to focus on a single technology. Live Processes provides out-of-the-box saved views for many technologies, so you can see performance and resource usage data from third-party software throughout your infrastructure without composing queries or configuring filters.

Saved views make it easy to determine whether a resource usage pattern is specific to a single host or whether it applies to the technology as a whole. For example, the screenshot below shows the NGINX saved view, sorted to find the NGINX processes with the highest CPU usage. This view makes it clear that only one of the NGINX hosts is at 100 percent CPU utilization, so you can focus your troubleshooting there—for example, investigating that host’s configuration and processes—rather than searching for an issue that affects your entire NGINX workload.

The Live Processes view shows a summary of third-party software metrics—in this example, NGINX.
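
The reasoning behind that NGINX example can be written down as a small check. The following is a minimal sketch, not anything Live Processes runs: the host names, the CPU readings, the 90 percent threshold, and the "at most half the hosts" rule are all assumptions chosen for illustration.

```python
def find_cpu_outliers(cpu_by_host, threshold=90.0, max_share=0.5):
    """Return the hosts at or above `threshold` percent CPU, plus a flag that
    is True when only a minority of hosts (at most `max_share`) are affected.

    Both parameters are illustrative assumptions, not Datadog settings.
    """
    hot = sorted(host for host, cpu in cpu_by_host.items() if cpu >= threshold)
    host_specific = 0 < len(hot) <= len(cpu_by_host) * max_share
    return hot, host_specific


# Hypothetical CPU utilization (percent) for NGINX processes, keyed by host.
samples = {"web-1": 12.5, "web-2": 100.0, "web-3": 9.8, "web-4": 14.1}

hot_hosts, host_specific = find_cpu_outliers(samples)
print(hot_hosts, host_specific)
```

With these sample values, a single host stands out, which is the situation where troubleshooting that one host makes more sense than investigating the whole NGINX workload.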

It’s easy to customize any saved view to suit your use case. Starting with an out-of-the-box saved view, you can filter and group your data, then save your customized view under a new name. The screenshot below shows a customized view based on the out-of-the-box NGINX saved view shown above. It uses the env and availability-zone tags to focus on processes running in a specific segment of infrastructure. It also uses the team custom tag to display only processes associated with the ads team. Once you’ve customized a saved view, you can share the URL via email or Slack to facilitate cross-team troubleshooting.

A customized saved view shows NGINX resource usage scoped to a single availability zone.
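
To make the tag scoping concrete, here is a minimal sketch of how a set of tags narrows a list of processes. The tag values (`prod`, `us-east-1a`), the host names, and the process records are hypothetical, and the space-separated `key:value` string is shown only as an assumed form of a tag filter, not as the exact query the saved view stores.

```python
def build_tag_query(tags):
    """Join tag key/value pairs into a space-separated `key:value` string."""
    return " ".join(f"{key}:{value}" for key, value in tags.items())


def matches(process_tags, wanted):
    """Return True if a process carries every wanted tag with the same value."""
    return all(process_tags.get(key) == value for key, value in wanted.items())


# Hypothetical tag values; only the tag keys come from the example above.
wanted = {"env": "prod", "availability-zone": "us-east-1a", "team": "ads"}

# Hypothetical process records, each with the tags attached to it.
processes = [
    {"host": "web-1", "tags": {"env": "prod", "availability-zone": "us-east-1a", "team": "ads"}},
    {"host": "web-2", "tags": {"env": "prod", "availability-zone": "us-east-1b", "team": "ads"}},
    {"host": "web-3", "tags": {"env": "staging", "availability-zone": "us-east-1a", "team": "search"}},
]

query = build_tag_query(wanted)
scoped = [p["host"] for p in processes if matches(p["tags"], wanted)]
print(query)
print(scoped)
```

Each additional tag acts as a further restriction, so only processes that carry all three tags remain in the scoped view.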

Correlate performance with resource usage

If you need to troubleshoot degraded performance in your third-party software, you may find the cause of the problem in its resource usage. From the Live Processes view, you can correlate performance of a third-party software process (shown in the Integration Metrics tab) with its resource consumption (visualized in the Resource Metrics tab).

If these two tabs reveal a correlation between performance and resource consumption, you can use that information to guide your troubleshooting. For example, in the screenshot below, the MySQL Integration Metrics tab shows a spike in the mysql.performance.slow_queries metric, indicating increased latency in some queries at that time.

The Live Processes Integration Metrics tab visualizes a spike in MySQL's rate of slow queries.

To search for the cause of the increased latency, you can click the Resource Metrics tab to see details about MySQL’s resource consumption on this host. The screenshot below shows a spike in the host’s CPU utilization at the same time as the increased latency in the screenshot above.

The Live Processes Resource Metrics tab visualizes a spike in CPU utilization.

This correlation could mean that a CPU-intensive operation—for example, a query that requires a full table scan—is contributing to increased latency by causing MySQL to delay execution of other queries until CPU resources are available.
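
If you export both series, you can also quantify how closely they move together instead of comparing graphs by eye. The sketch below computes a Pearson correlation coefficient over two aligned series. The sample values are invented for illustration, and this is a generic statistical check rather than something Live Processes computes for you.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series.

    Returns 0.0 when either series is constant, since the coefficient is
    undefined in that case.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical samples taken at the same timestamps on one MySQL host.
slow_queries = [1, 2, 1, 2, 9, 11, 3, 1]       # slow queries per interval
cpu_percent = [20, 22, 21, 23, 88, 95, 35, 21]  # host CPU utilization

r = pearson(slow_queries, cpu_percent)
print(f"correlation: {r:.2f}")
```

A coefficient close to 1 supports the idea that the two spikes are related, while a value near 0 suggests looking elsewhere. Correlation alone does not establish that CPU pressure caused the slow queries, so treat it as a prompt for further investigation.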

If resource metrics don’t show a correlation with integration metrics, the root of the problem could be a separate process running on the same infrastructure. For example, if CPU utilization on the MySQL host rises but latency remains low, a separate process like a misconfigured log rotation—which you’ll see if you click the Related Processes tab—could be the cause.

You can easily export any graph from the Resource Metrics tab to a notebook, a new dashboard, or an existing dashboard to share it with your team for further collaboration. And you can quickly navigate from the Integration Metrics tab to the MySQL out-of-the-box dashboard to explore its most important metrics.

If you suspect that the underlying cause of an issue is an error in your application’s code, you can explore logs to uncover application activity and errors in the Logs tab, and visualize code dependencies and bottlenecks in the Traces tab. To investigate problems in the flow of data to and from your application, you can view the Network tab, then dig deeper using Network Performance Monitoring. And to explore all processes executed by the same command or running on the same host, click the Related Processes tab.

Discover integrations to expand your visibility into third-party workloads

To maximize your visibility, Live Processes automatically detects when third-party software running in your infrastructure has an integration you can enable. This helps you avoid blind spots by ensuring that you enable the integration across all of your infrastructure. If you haven’t enabled an integration on all hosts or pods where the software is running, you’ll see a notification on the Live Processes page that identifies the integration and provides a link to enable it.

The screenshot below shows the prompt you’ll see if Datadog auto-detects that one or more Memcached servers are running in your infrastructure but are not enabled for monitoring.

The Live Processes page notes that Datadog has detected Memcached running on a server, but it's not enabled for monitoring by Datadog.

Each auto-detected integration you enable has its own saved view, and you can see its integration metrics and dashboard as soon as you enable it. You’ll also find a list of auto-detected integrations on your account’s Integrations page, as shown in the screenshot below.

The Datadog integrations page shows eight auto-detected integrations, some displaying the percentage of hosts on which the integration has been installed.
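
The coverage percentage is simple set arithmetic: of the hosts where the software is running, how many also have the integration enabled? The sketch below illustrates the idea with hypothetical host names. It is an assumption about how such a figure could be derived, not a description of how Datadog calculates it.

```python
def integration_coverage(running_hosts, enabled_hosts):
    """Return (percent of running hosts with the integration enabled,
    sorted list of running hosts that still lack it).

    The percentage is None when the software is not running anywhere.
    """
    running = set(running_hosts)
    enabled = set(enabled_hosts)
    missing = sorted(running - enabled)
    if not running:
        return None, missing
    percent = 100.0 * len(running & enabled) / len(running)
    return percent, missing


# Hypothetical inventory: hosts where Memcached was detected, and hosts
# where the Memcached integration is already enabled.
running = ["cache-1", "cache-2", "cache-3", "cache-4"]
enabled = ["cache-1", "cache-3", "db-1"]

percent, missing = integration_coverage(running, enabled)
print(percent, missing)
```

The `missing` list corresponds to the blind spots described above: hosts where the software runs but no integration metrics are being collected.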

Track third-party software with Live Processes

To quickly gain visibility into the health and resource usage of your third-party software, enable Live Processes today. You’ll see out-of-the-box saved views that make it easy to track integration metrics and performance data for the third-party software you rely on—with no setup required. Datadog will even auto-detect the integrations you haven’t yet started monitoring. If you’re new to Datadog, you can get started with a .