feat(telemetry): support gRPC transport for OTLP exporters#2325
feat(telemetry): support gRPC transport for OTLP exporters#2325aribault wants to merge 2 commits into
Conversation
setup_meter previously hard-coded the MeterProvider construction to only accept resource and metric_readers, forcing users who needed views, exemplar_filter, or shutdown_on_exit to bypass the helper entirely and reconstruct the provider from scratch. This change accepts arbitrary keyword arguments and forwards them, exposing the full MeterProvider surface without changing existing behavior. Resolves: strands-agents#843
setup_otlp_exporter and setup_meter previously hard-coded the HTTP variant
of OTLPSpanExporter and OTLPMetricExporter, leaving users who need gRPC
(common with OpenTelemetry Collector and several vendor agents) no choice
but to bypass the helpers and reconstruct the providers manually.
Both methods now accept a protocol parameter ("http/protobuf" or "grpc"),
falling back to the standard OTEL_EXPORTER_OTLP_PROTOCOL env var, then to
http/protobuf. The gRPC dependency lives in a new opt-in 'otel-grpc' extra
so existing 'otel' installs are unchanged in size and behavior.
Resolves: strands-agents#689
|
/strands review |
|
|
||
| all = ["strands-agents[a2a,anthropic,docs,gemini,litellm,llamaapi,mistral,ollama,openai,writer,sagemaker,otel]"] | ||
| all = ["strands-agents[a2a,anthropic,docs,gemini,litellm,llamaapi,mistral,ollama,openai,writer,sagemaker,otel,otel-grpc]"] | ||
| bidi-all = ["strands-agents[a2a,bidi,bidi-io,bidi-gemini,bidi-openai,docs,otel]"] |
There was a problem hiding this comment.
Issue: The bidi-all extra includes otel but not otel-grpc. Since the all extra now includes both, bidi-all appears inconsistent. Was this intentional?
Suggestion: If bidi-all is meant to be a comprehensive superset for bidirectional use cases, consider adding otel-grpc there as well for consistency, or document why it's excluded.
| from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter | ||
| resolved = _resolve_otlp_protocol(protocol) | ||
| if resolved == _GRPC_PROTOCOL: | ||
| try: |
There was a problem hiding this comment.
Issue: Behavioral change — setup_otlp_exporter previously caught all exceptions (including import failures) and logged them. Now ValueError and ImportError propagate before reaching the try/except block, which is a subtle contract change for existing callers that relied on this method never raising.
Suggestion: This is arguably the correct design (fail-fast on configuration errors vs. silently misconfiguring), but it's worth calling out in the PR description as a minor behavioral change for existing users. If someone was calling setup_otlp_exporter() without the otel extra installed, they would have previously gotten a logged exception and a no-op; now they get an ImportError.
| if resolved not in _VALID_OTLP_PROTOCOLS: | ||
| raise ValueError(f"protocol=<{resolved}> | unsupported OTLP protocol, must be one of {_VALID_OTLP_PROTOCOLS}") | ||
| return resolved | ||
|
|
There was a problem hiding this comment.
Issue: Using protocol or os.getenv(...) means that if a caller explicitly passes protocol="" (empty string), it will be treated as falsy and fall through to the env var/default, rather than raising a ValueError for the invalid value.
Suggestion: Consider using protocol if protocol is not None else os.getenv(...) to distinguish between "not provided" (None) and "provided but invalid" (empty string). This gives clearer error feedback when an empty string is accidentally passed.
| ValueError: When otlp_protocol is not a supported OTLP protocol. | ||
| ImportError: When enable_otlp_exporter=True and the optional extra | ||
| for the resolved protocol is not installed (`otel` for | ||
| http/protobuf, `otel-grpc` for grpc). |
There was a problem hiding this comment.
Issue: The docstring for setup_meter states that resource and metric_readers "cannot be overridden via this parameter", but there's no runtime guard. If a caller passes resource=my_resource or metric_readers=[...] in **provider_kwargs, they'd silently override the managed values (Python dict merge semantics with **kwargs after explicit keyword args would raise a TypeError for duplicates at the call site, which is fine — but it's worth verifying this is the intended error UX).
Actually, since resource and metric_readers are passed as explicit keyword arguments before **provider_kwargs, passing them in kwargs would raise TypeError: got multiple values for argument. This is fine behavior, but the error message won't be very user-friendly. Consider adding a check and a clearer error message, or just note this is acceptable.
| """protocol='grpc' imports the gRPC OTLPSpanExporter.""" | ||
| monkeypatch.delenv("OTEL_EXPORTER_OTLP_PROTOCOL", raising=False) | ||
| fake_modules, fake_module = _inject_grpc_trace_module() | ||
| with mock.patch.dict(sys.modules, fake_modules): |
There was a problem hiding this comment.
Issue: The test uses mock.patch.dict(sys.modules, fake_modules) with a context manager, but the StrandsTelemetry() instantiation inside the with block triggers _initialize_tracer which has side effects (setting global tracer provider, propagators). Since mock_tracer_provider fixture already patches SDKTracerProvider, this works — but the intent could be clearer.
More importantly: the assertion fake_module.OTLPSpanExporter.assert_called_once_with(foo="bar") happens outside the mock.patch.dict context. This works because the mock object persists after the context exits, but it's a subtle pattern. Consider adding a brief comment explaining why the assertion is outside the with block (the mock object is still valid even after sys.modules is restored).
|
Assessment: Comment (leans toward Approve) This is a clean, well-designed feature that addresses a real user need (issue #689). The implementation follows OTel conventions, maintains backward compatibility for existing users, and has solid test coverage. Review Themes
Nice work aligning with the standard OTel SDK resolution order and keeping the HTTP default unchanged. |
Motivation
StrandsTelemetry.setup_otlp_exporterand the OTLP-metrics path insidesetup_meterhard-code the HTTP variant ofOTLPSpanExporter/OTLPMetricExporter. Users running their telemetry through the OpenTelemetry Collector (or vendor agents that prefer gRPC) currently have to bypass the helpers entirely and reconstruct the providers themselves.This change adds first-class gRPC support while keeping HTTP as the default, matching
Unshure's direction in #689.Resolves: #689
Public API Changes
setup_otlp_exporternow accepts aprotocolparameter, andsetup_meteracceptsotlp_protocol. Resolution order is explicit arg →OTEL_EXPORTER_OTLP_PROTOCOLenv var →http/protobuf(matching the OTel SDK convention). When\"grpc\"is selected and the optional dependency is missing, anImportErrorsurfaces apip installhint pointing at the newotel-grpcextra.New Optional Dependency
A new
otel-grpcextra inpyproject.toml:The existing
otelextra is untouched (HTTP-only). Users opt into gRPC explicitly viapip install 'strands-agents[otel-grpc]'. Theallmeta-extra now includesotel-grpcso users on[all]get both transports.Existing callers (no
protocolargument,otelextra installed) keep their previous behavior with no install-size growth.