feat(tracing): enrich OTel resource, compress OTLP, annotate spans #5502
martinconic wants to merge 6 commits into `feat/opentelemetry-migration`.
🤖 AI-assisted review (Claude + Gemini second opinion): +1, looks solid and ready. Nicely addresses the resource-attribution and flag feedback from #5456. Spans are now attributable per node (…). Two small things worth confirming (non-blocking):
Addressed with comments (no behavior change):
Checklist
Description
Follow-up to the OpenTelemetry migration (#5456) that addresses the reviewers' feedback. It targets `feat/opentelemetry-migration` rather than `master`. This keeps the migration PR focused on the OpenTracing→OTel swap while the review changes are collected and reviewed here. Once merged, these commits become part of #5456.

Without these changes, every node's spans look identical in an OTLP backend, and a request cannot be followed end to end. This PR makes spans attributable per node, network and release. It also trims export bandwidth, surfaces tracing problems, and lets a value such as the postage batch id follow an upload across hops.
Resource attributes: `service.version`, `deployment.environment` (derived from the network id), `host.name`, and `service.instance.id` (the node's overlay address). Env-based options (`WithFromEnv`/`WithTelemetrySDK`/`WithHost`) are applied before the configured attributes, so configured values win. Tracing setup in `NewBee` was moved below overlay finalization so the (immutable) resource can carry the final overlay address as the instance id.

Exporter & processor: gzip compression on the HTTP and gRPC OTLP clients. The batch span processor keeps the SDK defaults and honors the standard `OTEL_BSP_*` env vars (documented, no new flags).

Observability of tracing itself: a confirmation log line is emitted once tracing is wired up (endpoint, protocol, sampling ratio). OTLP exporter errors (e.g. an unreachable collector) are routed through the OTel global error handler to the node logger instead of being silently dropped.
Span attributes & new spans: the chunk `address` is added to the netstore get/put spans and `peer_address` to the pingpong spans. Previously untraced subsystems get new spans: `postage-batch-create` (with `batch_id`), `kademlia-connect` (with `peer_address`), and `salud-round` (parenting the per-peer status snapshots).

Baggage propagation: W3C baggage now propagates across HTTP (composite TraceContext + Baggage propagator) and p2p. On p2p it travels in a separate, additive `tracing-baggage` header; the existing span-context carrier is unchanged, and peers that don't understand the header simply ignore it. The API attaches the `batch_id` as baggage when stamping a chunk, so it follows the chunk into the direct-upload/pushsync path.

Flag rename (breaking): the tracing flags drop the `otlp` infix, so `tracing-otlp-endpoint`/`-insecure`/`-ca-file`/`-protocol` become `tracing-endpoint`/`-insecure`/`-ca-file`/`-protocol`. The nested `tracing.otlp-*` config keys change the same way. `tracing-endpoint` is reused as the active OTLP collector endpoint, so it is removed from the deprecated no-op keys (`tracing-host`/`tracing-port` remain deprecated).

Open API Spec Version Changes (if applicable)
Motivation and Context (Optional)
Addresses both review passes from @darkobas2 on #5456 (resource enrichment / exporter hardening / span coverage / baggage, and the flag rename).
Related Issue (Optional)
Stacked on #5456.
Screenshots (if appropriate):
AI Disclosure