OpenTelemetry Metrics

OpenTelemetry Metrics available in gRPC

OpenTelemetry Metrics

OpenTelemetry Metrics available in gRPC

Overview

gRPC provides support for an OpenTelemetry plugin that provides metrics that can help you -

Troubleshoot your system
Iterate on improving system performance
Setup continuous monitoring and alerting.

Background

OpenTelemetry is an observability framework to create and manage telemetry data. gRPC previously provided observability support through OpenCensus which has been sunsetted in the favor of OpenTelemetry.

Instruments

The gRPC OpenTelemetry plugin accepts a MeterProvider and depends on the OpenTelemetry API to create a Meter that identifies the gRPC library being used, for example, grpc-c++ at version 1.57.1. The following listed instruments are created using this meter. Users should employ the OpenTelemetry SDK to customize the views exported by OpenTelemetry.

More and more gRPC components are being instrumented for observability. Currently, we have the following components instrumented -

Per-call (stable, on by default) : Observe RPCs themselves (for example, latency.)
- Client Per-Call : Observe a client call
- Client Per-Attempt : Observe attempts for a client call, since a call can have multiple attempts due to retry or hedging.
- Server : Observe a call received at the server.
LB Policy : Observe various load-balancing policies
- Weighted Round Robin (experimental)
- Pick-First (experimental)
XdsClient (experimental)

NOTE Some instruments are off by default and need to be explicitly enabled from the gRPC OpenTelemetry plugin API. Experimental metrics are always off by default. (Reference C++ API)

Per-Call Metrics

Client Per-Call Instruments

Name	Type	Unit	Labels (required)	Description
grpc.client.call.duration	Histogram	s	grpc.method, grpc.target , grpc.status	This metric aims to measure the end-to-end time the gRPC library takes to complete an RPC from the application’s perspective.

Refer A66: OpenTelemetry Metrics for details.

Client Per-Attempt Instruments

Name	Type	Unit	Labels (disposition)	Description
grpc.client.attempt. started	Counter	{attempt}	grpc.method (required), grpc.target (required)	The total number of RPC attempts started, including those that have not completed.
grpc.client.attempt. duration	Histogram	s	grpc.method (required), grpc.target (required), grpc.status (required), grpc.lb.locality (optional)	End-to-end time taken to complete an RPC attempt including the time it takes to pick a subchannel.
grpc.client.attempt. sent_total_compressed_message_size	Histogram	By	grpc.method (required), grpc.target (required), grpc.status (required), grpc.lb.locality (optional)	Total bytes (compressed but not encrypted) sent across all request messages (metadata excluded) per RPC attempt; does not include grpc or transport framing bytes.
grpc.client.attempt. rcvd_total_compressed_message_size	Histogram	By	grpc.method (required), grpc.target (required), grpc.status (required), grpc.lb.locality (optional)	Total bytes (compressed but not encrypted) received across all response messages (metadata excluded) per RPC attempt; does not include grpc or transport framing bytes.

Refer A66: OpenTelemetry Metrics for details.

Server Instruments

Name	Type	Unit	Labels (required)	Description
grpc.server.call. started	Counter	{call}	grpc.method	The total number of RPCs started, including those that have not completed.
grpc.server.call. sent_total_compressed_message_size	Histogram	By	grpc.method, grpc.status	Total bytes (compressed but not encrypted) sent across all response messages (metadata excluded) per RPC; does not include grpc or transport framing bytes.
grpc.server.call. rcvd_total_compressed_message_size	Histogram	By	grpc.method, grpc.status	Total bytes (compressed but not encrypted) received across all request messages (metadata excluded) per RPC; does not include grpc or transport framing bytes.
grpc.server.call. duration	Histogram	s	grpc.method, grpc.status	This metric aims to measure the end2end time an RPC takes from the server transport’s (HTTP2/ inproc) perspective.

Refer A66: OpenTelemetry Metrics for details.

LB Policy Instruments

Weighted Round Robin LB Policy Instruments

Name	Type	Unit	Labels (disposition)	Description
grpc.lb.wrr. rr_fallback	Counter	{update}	grpc.target (required), grpc.lb.locality (optional)	EXPERIMENTAL: Number of scheduler updates in which there were not enough endpoints with valid weight, which caused the WRR policy to fall back to RR behavior.
grpc.lb.wrr. endpoint_weight_not_yet_usable	Counter	{endpoint}	grpc.target (required), grpc.lb.locality (optional)	EXPERIMENTAL: Number of endpoints from each scheduler update that don’t yet have usable weight information (i.e., either the load report has not yet been received, or it is within the blackout period).
grpc.lb.wrr. endpoint_weight_stale	Counter	{endpoint}	grpc.target (required), grpc.lb.locality (optional)	EXPERIMENTAL: Number of endpoints from each scheduler update whose latest weight is older than the expiration period.
grpc.lb.wrr. endpoint_weights	Histogram	{weight}	grpc.target (required), grpc.lb.locality (optional)	EXPERIMENTAL: Weight of an endpoint recorded every scheduler update.

Refer A78: gRPC OTel Metrics for WRR, Pick First, and XdsClient for details.

Pick First LB Policy Instruments

Name	Type	Unit	Labels (required)	Description
grpc.lb.pick_first. disconnections	Counter	{disconnection}	grpc.target	EXPERIMENTAL: Number of times the selected subchannel becomes disconnected.
grpc.lb.pick_first. connection_attempts_succeeded	Counter	{attempt}	grpc.target	EXPERIMENTAL: Number of successful connection attempts.
grpc.lb.pick_first. connection_attempts_failed	Counter	{attempt}	grpc.target	EXPERIMENTAL: Number of failed connection attempts.

Refer A78: gRPC OTel Metrics for WRR, Pick First, and XdsClient for details.

XdsClient Instruments

Name	Type	Unit	Labels (required)	Description
grpc.xds_client. connected	Gauge	{bool}	grpc.target, grpc.xds.server	EXPERIMENTAL: Whether or not the xDS client currently has a working ADS stream to the xDS server.
grpc.xds_client. server_failure	Counter	{failure}	grpc.target, grpc.xds.server	EXPERIMENTAL: A counter of xDS servers going from healthy to unhealthy.
grpc.xds_client. resource_updates_valid	Counter	{resource}	grpc.target, grpc.xds.server, grpc.xds.resource_type	EXPERIMENTAL: A counter of resources received that were considered valid, even if unchanged.
grpc.xds_client. resource_updates_invalid	Counter	{resource}	grpc.target, grpc.xds.server, grpc.xds.resource_type	EXPERIMENTAL: A counter of resources received that were considered invalid.
grpc.xds_client. resources	Gauge	{resource}	grpc.target, grpc.xds.authority, grpc.xds.cache_state, grpc.xds.resource_type	EXPERIMENTAL: Number of xDS resources.

Refer A78: gRPC OTel Metrics for WRR, Pick First, and XdsClient for details.

Labels/Attributes

With a recorded measurement for an instrument, gRPC might provide some additional information as attributes or labels. For example, grpc.client.attempt.started has the labels grpc.method and grpc.target along with each measurement that tell us the method and the target associated with the RPC attempt being observed.

NOTE Some attributes are marked as optional on the instruments. These need to be explicitly enabled from the gRPC OpenTelemetry Plugin API. (Reference C++ API)

Name	Description
grpc.method	Full gRPC method name, including package, service and method, e.g. “google.bigtable.v2.Bigtable/CheckAndMutateRow”.
grpc.status	gRPC server status code received, e.g. “OK”, “CANCELLED”, “DEADLINE_EXCEEDED”.
grpc.target	Canonicalized target URI used when creating gRPC Channel, e.g. “dns:///pubsub.googleapis.com:443”, “xds:///helloworld-gke:8000”.
grpc.lb.locality	The locality to which the traffic is being sent.
grpc.xds.server	For clients, indicates the target of the gRPC channel in which the XdsClient is used. For servers, will be the string “#server”.
grpc.xds.authority	The xDS authority. The value will be “#old” for old-style non-xdstp resource names.
grpc.xds.cache_state	Indicates the cache state of an xDS resource (“requested”, “does_not_exist”, “acked”, “nacked”, “nacked_but_cached”).
grpc.xds.resource_type	xDS resource type, such as “envoy.config.listener.v3.Listener”.

FAQ

Q. How do I get throughput or QPS (queries per second)?

Use a count aggregation on the latency histogram metrics - grpc.client.attempt.duration / grpc.client.call.duration (for clients) or grpc.server.call.duration (for servers).

Q. How do I get error rate for RPCs?

Error counts can be calculated by using a filter grpc.status != OK value on the latency histogram metrics grpc.client.attempt.duration / grpc.client.call.duration (for clients) or grpc.server.call.duration (for servers).

Language examples

Language	Example
C++	C++ Example
Go	Go Example
Java	Java Example
Python	Python Example

Additional Resources

Last modified August 5, 2024: Add Go and Java examples to OpenTelemetry Metrics Language examples. (#1342) (7b283ae)

View page source Edit this page Create child page Create documentation issue Create project issue