feat(sandbox): seccomp-notify DNS-pinned allowlist for Platform mode#17
Open
Ladas wants to merge 3 commits into
408aa3b to 2446c42
9ec5718 to 179d108
Ladas added a commit that referenced this pull request Jun 12, 2026
Add kernel-level network syscall interception using SECCOMP_RET_USER_NOTIF for Platform mode. Provides mandatory, syscall-level enforcement without any capabilities.

DnsPinnedAllowlist: resolve domains to IPs at sandbox creation, freeze for session lifetime (DNS rebinding prevention).

BPF filter intercepts: connect, sendto, sendmsg, recvfrom, recvmsg, bind. Validates AUDIT_ARCH to prevent x32/compat ABI bypass.

Linux syscall wrappers: notification fd ioctls, pidfd_open/pidfd_getfd for on-behalf-of operations (TOCTOU-safe), read_process_memory with read_exact (no short reads), sockaddr parser (correct endianness for sa_family, port, flowinfo), verify_socket_fd (mitigates fd-swap race), deny/allow_connect response helpers.

Code review fixes applied across all PRs:
- PR #15: gateway propagates network_enforcement to DriverSandboxSpec
- PR #15: driver uses typed enum comparison (not magic integer)
- PR #16: saturating_sub prevents underflow in Landlock skipped count
- PR #16: warn!() on TCP port restriction failure (was debug)
- PR #17: BPF arch check, recvfrom/recvmsg/bind interception, verify_socket_fd, read_exact, allow_connect rename, flowinfo endianness, safety comments on all unsafe blocks

8 tests. Compiles, 949 tests pass, clippy clean.

Ref: NVIDIA#899
2446c42 to 6078a8e
Add NetworkMode::Platform that enables the OpenShell supervisor to run without any elevated capabilities on Kubernetes platforms enforcing the restricted Pod Security Standard (e.g. OpenShift restricted-v2 SCC).

Platform Mode keeps Landlock filesystem isolation, seccomp syscall filtering, OPA policy evaluation, credential injection, and L7 inspection via a loopback CONNECT proxy. It replaces the network namespace (which requires CAP_SYS_ADMIN + CAP_NET_ADMIN) with Kubernetes NetworkPolicy for L3/L4 egress control.

Changes:
- proto: add NetworkEnforcementMode enum to SandboxPolicy (field 6) and DriverSandboxSpec (field 12), backward-compatible
- sandbox: add Platform variant to NetworkMode, wire TryFrom conversion
- sandbox: skip netns, bind proxy to loopback (127.0.0.1:3128)
- sandbox: allow AF_INET sockets in seccomp for Platform mode
- driver-k8s: zero capabilities (drop ALL), typed enum comparison
- server: propagate network_enforcement to DriverSandboxSpec

Ref: NVIDIA#899

Signed-off-by: Ladislav Smola <lsmola@redhat.com>
When Platform mode is active, apply Landlock ABI v4 network rules to restrict TCP connect to only the proxy port (default 3128). This makes the loopback CONNECT proxy mandatory at the kernel level.

Graceful degradation: if kernel ABI < v4, the network rules are skipped with a warn-level log and enforcement falls back to the cooperative proxy + NetworkPolicy.

Also fixes rules_applied underflow via saturating_sub.

Ref: NVIDIA#899

Signed-off-by: Ladislav Smola <lsmola@redhat.com>
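The underflow fix named in this commit can be illustrated with a small sketch. The function name and parameters below are hypothetical and are not taken from the PR's code; the only point is how `saturating_sub` behaves when the skipped count exceeds the total.

```rust
/// Hypothetical helper (not the PR's actual code): number of Landlock
/// rules that were applied, given how many were attempted and skipped.
pub fn rules_applied(total: usize, skipped: usize) -> usize {
    // A plain `total - skipped` panics in debug builds and wraps in
    // release builds when skipped > total. saturating_sub clamps at 0.
    total.saturating_sub(skipped)
}
```

With this shape, a skipped count that is larger than the total yields 0 instead of a panic or a wrapped value close to `usize::MAX`.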
179d108 to 59d148a
Add kernel-level network syscall interception using SECCOMP_RET_USER_NOTIF.

DnsPinnedAllowlist: DNS pinning at sandbox creation, frozen IPs.

BPF filter intercepts: connect, sendto, sendmsg, recvfrom, recvmsg, bind. Validates AUDIT_ARCH (prevents x32/compat bypass).

Syscall wrappers: notification fd ioctls, pidfd_open/pidfd_getfd (TOCTOU-safe on-behalf-of), verify_socket_fd (fd-swap mitigation), read_process_memory (read_exact), sockaddr parser (correct endianness), deny/allow_connect helpers.

Ref: NVIDIA#899

Signed-off-by: Ladislav Smola <lsmola@redhat.com>
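For reviewers unfamiliar with the endianness point in the sockaddr parser, here is a minimal sketch of the idea. It is not the PR's parser: `parse_sockaddr` and the constants are names chosen for illustration, and the byte offsets assume the Linux `sockaddr_in` / `sockaddr_in6` layouts. The family and scope id are read in native byte order, the port in network (big-endian) order, and flowinfo is decoded as big-endian on the assumption that this is what "correct endianness" refers to.

```rust
use std::net::{Ipv4Addr, Ipv6Addr, SocketAddr, SocketAddrV4, SocketAddrV6};

// Linux address family values (assumed target platform).
const AF_INET: u16 = 2;
const AF_INET6: u16 = 10;

/// Hypothetical parser for raw sockaddr bytes copied from the traced
/// process. Returns None for short buffers or unsupported families.
pub fn parse_sockaddr(buf: &[u8]) -> Option<SocketAddr> {
    if buf.len() < 8 {
        return None;
    }
    // sa_family is stored in host (native) byte order.
    let family = u16::from_ne_bytes([buf[0], buf[1]]);
    // The port is stored in network (big-endian) byte order.
    let port = u16::from_be_bytes([buf[2], buf[3]]);
    match family {
        AF_INET => {
            let ip = Ipv4Addr::new(buf[4], buf[5], buf[6], buf[7]);
            Some(SocketAddr::V4(SocketAddrV4::new(ip, port)))
        }
        AF_INET6 => {
            if buf.len() < 28 {
                return None;
            }
            // Assumption: flowinfo is decoded from network byte order.
            let flowinfo = u32::from_be_bytes([buf[4], buf[5], buf[6], buf[7]]);
            let mut octets = [0u8; 16];
            octets.copy_from_slice(&buf[8..24]);
            // scope_id is a host-order interface index.
            let scope_id = u32::from_ne_bytes([buf[24], buf[25], buf[26], buf[27]]);
            let ip = Ipv6Addr::from(octets);
            Some(SocketAddr::V6(SocketAddrV6::new(ip, port, flowinfo, scope_id)))
        }
        _ => None,
    }
}
```

Reading the port with `from_ne_bytes` instead would silently produce the wrong port on little-endian hosts (443 would become 47873), which is the class of bug the commit message calls out.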
6078a8e to 254154b
59d148a to 8354d68
Summary
Foundation for kernel-level connect() interception using seccomp-notify.
Adds the DnsPinnedAllowlist module: resolves allowed domains to IPs at sandbox creation and freezes them for the session (prevents DNS rebinding).
The notification event loop and on-behalf-of operations (pidfd_getfd) will be wired once OPA policy integration is complete.
Depends on: #16 (Landlock TCP port restriction) → #15 (Platform mode base)
2 files, +135 lines. 820 tests pass, clippy clean.
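As a rough illustration of the pinning idea described in the summary, the sketch below resolves each domain once and then answers allow/deny checks from the frozen set only. It is not the module added by this PR: the type name `PinnedAllowlist`, its methods, and the choice to match on IP alone (ignoring port and IPv4-mapped IPv6 addresses) are assumptions made for the example.

```rust
use std::collections::HashSet;
use std::net::{IpAddr, SocketAddr, ToSocketAddrs};

/// Hypothetical stand-in for DnsPinnedAllowlist; the real API may differ.
#[derive(Debug)]
pub struct PinnedAllowlist {
    pinned: HashSet<IpAddr>,
}

impl PinnedAllowlist {
    /// Resolve every domain once, at sandbox creation, and freeze the IPs.
    pub fn resolve(domains: &[&str]) -> std::io::Result<Self> {
        let mut pinned = HashSet::new();
        for domain in domains {
            // Port 0 is a placeholder; only the resolved IP is kept.
            for addr in (*domain, 0u16).to_socket_addrs()? {
                pinned.insert(addr.ip());
            }
        }
        Ok(Self { pinned })
    }

    /// Build from already-resolved IPs (handy for tests, no DNS needed).
    pub fn from_ips(ips: Vec<IpAddr>) -> Self {
        Self { pinned: ips.into_iter().collect() }
    }

    /// Check a connect() target against the frozen set. There is no
    /// re-resolution here, so a later DNS change cannot widen the set.
    pub fn is_allowed(&self, target: &SocketAddr) -> bool {
        self.pinned.contains(&target.ip())
    }
}
```

Because `is_allowed` never touches the resolver, a domain that later rebinds to an internal address is still checked against the IPs captured at creation time.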
What this PR adds
- DnsPinnedAllowlist: resolve domains, pin IPs, check connect targets

What's NOT in this PR (follow-up)
- … SECCOMP_FILTER_FLAG_NEW_LISTENER)
- … pidfd_getfd()

Ref: NVIDIA#899
Assisted-By: Claude Code