feat(otel): client-side redaction of PII and/or secrets #584
simonvdk-mistral wants to merge 12 commits into
Fold AWS, Google, JWT, PEM and Stripe key patterns into DEFAULT_TOKEN_PATTERNS and compose DEFAULT_PII_SECRET_PATTERNS from it, with tests covering each secret and the composition invariant. Generated by Mistral Vibe. Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
Use a conditional base (SpanExporter under TYPE_CHECKING, object at runtime) so linters verify the export/shutdown/force_flush overrides while keeping the OpenTelemetry SDK an optional import. Generated by Mistral Vibe. Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
Add a shared _redact_value helper so both policies scan the string elements of list/tuple attribute values instead of passing them through verbatim, preserving the container type and leaving numeric/bool sequences untouched. Generated by Mistral Vibe. Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
Rewrite test_redaction.py as pytest classes with fixtures and parametrization, and convert the TestTelemetryRedaction class in test_telemetry.py to a plain pytest class (using caplog for log assertions). Optional span attributes/context are narrowed with asserts instead of file-level pyright suppressions. Older telemetry tests are left as-is. Generated by Mistral Vibe. Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
✅ There are no secrets present in this pull request anymore. If these secrets were true positive and are still valid, we highly recommend you to revoke them. 🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
These are all fake/dummy secrets to test redaction.
Switch default_redaction_policy() from the key-oriented AttributeRedactionPolicy to the content-oriented RegexRedactionPolicy, which preserves keys/structure and redacts only matched secret/PII substrings. Update docstrings, README policy table, the example, and adapt the behavioural tests accordingly. Generated by Mistral Vibe. Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
_PRIMITIVE_TYPES: Final[tuple[type, ...]] = (str, bool, int, float)


class AttributeRedactionPolicy(RedactionPolicy):
nit: worth separating the implementations over multiple files?
from mistralai.extra.observability import AttributeRedactionPolicy, configure_telemetry


def main() -> None:
    api_key = os.environ["MISTRAL_API_KEY"]

    with Mistral(api_key=api_key) as client:
        configure_telemetry(client, redaction=AttributeRedactionPolicy())
In the various examples, worth showing how to extend the defaults and provide an example where an attribute will be redacted? e.g.
from mistralai.extra.observability import (
    AttributeRedactionPolicy,
    configure_telemetry,
)
from mistralai.extra.observability.redaction import DEFAULT_SENSITIVE_ATTRIBUTE_KEYS


def main() -> None:
    api_key = os.environ["MISTRAL_API_KEY"]
    server_url = os.environ.get("MISTRAL_SERVER_URL")

    with Mistral(api_key=api_key, server_url=server_url) as client:
        configure_telemetry(
            client,
            redaction=AttributeRedactionPolicy(
                sensitive_keys=DEFAULT_SENSITIVE_ATTRIBUTE_KEYS
                | {"telemetry.sdk.language"}
            ),
        )
rbarbadillo left a comment
generally lgtm! just a small edge-case to cover
if hook._auto_telemetry_provider is not None:
    return True
Should this early return take replace_existing into account?
if hook._auto_telemetry_provider is not None:
    if not replace_existing:
        return True
    _shutdown_telemetry_provider(hook)
I think an app can reasonably do:
configure_telemetry(client)
# ... some beautiful code ...
configure_telemetry(client, redaction=False)
but the second call keeps the first RedactingSpanExporter because we return here before rebuilding the provider. Maybe when replace_existing=True, we should shut down and recreate the auto provider so the new redaction setting actually applies.
What
Adds a client-side redaction layer for OpenTelemetry spans so PII and secrets never leave the machine. The core primitive is reusable by any OTEL application, and the Mistral SDK installs it automatically when it owns the exporter.
Also adds documentation and examples for the SDK's observability features.
The primitive
RedactingSpanExporter wraps any SpanExporter and redacts each span before delegating the actual export.
Redaction covers the whole span surface: attributes, events, links, resource attributes, span name, and status description.
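The wrap-then-delegate idea can be illustrated with a minimal, stdlib-only sketch. It is not the PR's implementation: spans are simplified to plain dicts of attributes, and the names `RedactingExporter`, `ListExporter`, `redact_value`, and the single e-mail pattern are hypothetical stand-ins chosen for illustration. It also shows string elements of list/tuple values being scanned while the container type is preserved, as one of the commits describes.

```python
import re
from typing import Any, Sequence

# Hypothetical pattern for illustration only; the PR's default patterns
# are not reproduced here.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PLACEHOLDER = "[REDACTED]"


def redact_value(value: Any) -> Any:
    """Redact matches in strings; scan list/tuple elements, keep the container type."""
    if isinstance(value, str):
        return EMAIL.sub(PLACEHOLDER, value)
    if isinstance(value, (list, tuple)):
        return type(value)(redact_value(item) for item in value)
    return value  # numbers and bools pass through untouched


class ListExporter:
    """Stand-in for a real exporter: records whatever it is asked to export."""

    def __init__(self) -> None:
        self.exported: list[dict[str, Any]] = []

    def export(self, spans: Sequence[dict[str, Any]]) -> None:
        self.exported.extend(spans)


class RedactingExporter:
    """Decorator: redacts each (simplified) span, then delegates to the inner exporter."""

    def __init__(self, inner: ListExporter) -> None:
        self._inner = inner

    def export(self, spans: Sequence[dict[str, Any]]) -> None:
        cleaned = [
            {key: redact_value(value) for key, value in span.items()}
            for span in spans
        ]
        self._inner.export(cleaned)


sink = ListExporter()
RedactingExporter(sink).export(
    [{"prompt": "mail bob@example.com", "tags": ("a@b.io", 3), "ok": True}]
)
```

In this sketch the inner exporter only ever sees the cleaned copies, which is the property the wrapper is meant to provide; a real span also carries events, links, resource attributes, a name, and a status description that would need the same treatment.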
The module requires the optional OpenTelemetry SDK (the telemetry extra) to run, but not to import, so it stays importable in environments without the extra.
Why an exporter wrapper (and not span-creation-time redaction)
Redaction is deliberately placed in a SpanExporter decorator, invoked at export time, rather than when spans/attributes are created. In dedicated mode the SDK wires the wrapper behind the BatchSpanProcessor it installs. This has real, concrete benefits:
- With BatchSpanProcessor (what the SDK installs), export runs on a dedicated background thread, in batches. The regex scanning and span rebuild happen there, not on the threads serving the user's chat/embeddings calls, so redaction adds no latency to application requests. Cost is amortized across a batch and absorbed by the exporter thread; under extreme load it manifests as export backpressure, never as slower API calls.
- It is a SpanExporter decorator with no Mistral coupling, so any OpenTelemetry application can wrap its own exporter, which is exactly what users do in global/custom-provider mode.
Caveat: the "off the hot path" property depends on an asynchronous processor. BatchSpanProcessor (installed by the SDK) gives it; a SimpleSpanProcessor would export, and therefore redact, synchronously on span end.
Policies
- RegexRedactionPolicy (default, redaction=True)
- AttributeRedactionPolicy
- CallbackRedactionPolicy (redaction=<callable>): a (key, value) -> value | None masker per attribute; return None to drop the attribute.
SDK integration
configure_telemetry gains a redaction argument:
- True (default): default (regex) policy
- False: redaction disabled
- a RedactionPolicy instance (e.g. AttributeRedactionPolicy())
- a (key, value) -> value | None callback
Redaction only applies in dedicated provider mode, where the SDK owns the exporter. In global/custom-provider modes the application owns the export pipeline, so the argument is ignored and a warning is logged; wrap your own exporter with RedactingSpanExporter in that case.
Tests
- test_redaction.py covering the policies, span rebuild, and the exporter wrapper.
- test_telemetry.py for the redaction wiring and the ignored-argument warnings.
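For reference, the callback contract described under Policies, a (key, value) -> value | None masker where None drops the attribute, might be applied along these lines. This is a stdlib-only sketch under assumptions: `apply_masker` and `mask_user_fields` are hypothetical names, and the PR's CallbackRedactionPolicy may handle details (nested values, events, links) that are not modelled here.

```python
from typing import Any, Callable, Mapping, Optional

# The masker contract from the PR description: (key, value) -> value | None.
Masker = Callable[[str, Any], Optional[Any]]


def apply_masker(attributes: Mapping[str, Any], masker: Masker) -> dict[str, Any]:
    """Run the masker over every attribute; a None result drops the attribute."""
    redacted: dict[str, Any] = {}
    for key, value in attributes.items():
        new_value = masker(key, value)
        if new_value is None:
            continue  # masker asked for this attribute to be dropped
        redacted[key] = new_value
    return redacted


def mask_user_fields(key: str, value: Any) -> Optional[Any]:
    """Hypothetical masker: drop e-mails, mask other user.* keys, keep the rest."""
    if key == "user.email":
        return None
    if key.startswith("user."):
        return "[REDACTED]"
    return value


result = apply_masker(
    {"user.email": "bob@example.com", "user.name": "Bob", "http.status_code": 200},
    mask_user_fields,
)
```

Per the description above, a callable with this shape would be passed as the redaction argument of configure_telemetry in dedicated provider mode.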