feat(sandbox): seccomp-notify DNS-pinned allowlist for Platform mode #17

Open
Ladas wants to merge 3 commits into feat/landlock-tcp-port from feat/seccomp-notify

Ladas commented Jun 12, 2026

Summary

Foundation for kernel-level connect() interception using seccomp-notify.
Adds DnsPinnedAllowlist module: resolves allowed domains to IPs at
sandbox creation, freezes them for the session (prevents DNS rebinding).

The notification event loop and on-behalf-of operations (pidfd_getfd)
will be wired once OPA policy integration is complete.

Depends on: #16 (Landlock TCP port restriction) → #15 (Platform mode base)

2 files, +135 lines. 820 tests pass, clippy clean.

What this PR adds

  • DnsPinnedAllowlist: resolve domains, pin IPs, check connect targets
  • Loopback always allowed (proxy address)
  • 4 unit tests
  • Full rustdoc (architecture, TOCTOU safety, requirements, references)
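
The pinning behaviour listed above can be sketched roughly as follows. This is a minimal illustration that uses only the standard library, not the PR's actual code: the type name `PinnedAllowlist` and the methods `resolve`, `from_ips`, and `is_allowed` are hypothetical stand-ins for whatever `DnsPinnedAllowlist` really exposes.

```rust
use std::collections::HashSet;
use std::io;
use std::net::{IpAddr, SocketAddr, ToSocketAddrs};

/// Illustrative stand-in for the PR's `DnsPinnedAllowlist`.
/// Field and method names are assumptions, not the real API.
#[derive(Debug, Default)]
pub struct PinnedAllowlist {
    pinned: HashSet<IpAddr>,
}

impl PinnedAllowlist {
    /// Resolve every domain once and freeze the resulting IPs.
    /// Later DNS changes are never observed, which is the point of pinning.
    pub fn resolve(domains: &[&str]) -> io::Result<Self> {
        let mut pinned = HashSet::new();
        for domain in domains {
            // Port 0 is a placeholder: only the resolved IPs are kept.
            for addr in (*domain, 0u16).to_socket_addrs()? {
                pinned.insert(addr.ip());
            }
        }
        Ok(Self { pinned })
    }

    /// Build directly from already-known IPs (useful in tests, no network).
    pub fn from_ips<I: IntoIterator<Item = IpAddr>>(ips: I) -> Self {
        Self { pinned: ips.into_iter().collect() }
    }

    /// Loopback is always allowed (proxy address); anything else must be pinned.
    pub fn is_allowed(&self, target: &SocketAddr) -> bool {
        let ip = target.ip();
        ip.is_loopback() || self.pinned.contains(&ip)
    }
}
```

Because the set is built once and never refreshed, a domain that later resolves to a different address cannot widen the allowlist during the session.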

What's NOT in this PR (follow-up)

  • seccomp filter installation (SECCOMP_FILTER_FLAG_NEW_LISTENER)
  • Notification event loop (async read from notification fd)
  • On-behalf-of connect via pidfd_getfd()
  • Fork-based supervisor architecture in lib.rs
  • Integration with OPA network policies

Ref: NVIDIA#899

Assisted-By: Claude Code

Ladas force-pushed the feat/seccomp-notify branch 2 times, most recently from 408aa3b to 2446c42 (June 12, 2026 16:14)
Ladas force-pushed the feat/landlock-tcp-port branch from 9ec5718 to 179d108 (June 12, 2026 16:26)
Ladas added a commit that referenced this pull request Jun 12, 2026
Add kernel-level network syscall interception using SECCOMP_RET_USER_NOTIF
for Platform mode. Provides mandatory, syscall-level enforcement without
any capabilities.

DnsPinnedAllowlist: resolve domains to IPs at sandbox creation, freeze
for session lifetime (DNS rebinding prevention).

BPF filter intercepts: connect, sendto, sendmsg, recvfrom, recvmsg,
bind. Validates AUDIT_ARCH to prevent x32/compat ABI bypass.

Linux syscall wrappers: notification fd ioctls, pidfd_open/pidfd_getfd
for on-behalf-of operations (TOCTOU-safe), read_process_memory with
read_exact (no short reads), sockaddr parser (correct endianness for
sa_family, port, flowinfo), verify_socket_fd (mitigates fd-swap race),
deny/allow_connect response helpers.

Code review fixes applied across all PRs:
- PR #15: gateway propagates network_enforcement to DriverSandboxSpec
- PR #15: driver uses typed enum comparison (not magic integer)
- PR #16: saturating_sub prevents underflow in Landlock skipped count
- PR #16: warn!() on TCP port restriction failure (was debug)
- PR #17: BPF arch check, recvfrom/recvmsg/bind interception,
  verify_socket_fd, read_exact, allow_connect rename, flowinfo
  endianness, safety comments on all unsafe blocks

8 tests. Compiles, 949 tests pass, clippy clean.

Ref: NVIDIA#899
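
To illustrate the endianness point made for the sockaddr parser, here is a hedged sketch of parsing a raw Linux `sockaddr_in` / `sockaddr_in6` buffer after it has been copied out of the target process. The function name `parse_sockaddr` and its `Option` return type are assumptions, not the PR's signature. The layout follows the Linux C structs: `sa_family` and `sin6_scope_id` are in host byte order, while the port and `sin6_flowinfo` are in network byte order.

```rust
use std::convert::TryInto;
use std::net::{Ipv4Addr, Ipv6Addr, SocketAddr, SocketAddrV4, SocketAddrV6};

const AF_INET: u16 = 2; // Linux value
const AF_INET6: u16 = 10; // Linux value

/// Parse a raw Linux socket address; returns None for short buffers
/// or address families other than AF_INET / AF_INET6.
pub fn parse_sockaddr(buf: &[u8]) -> Option<SocketAddr> {
    // sa_family is stored in host byte order.
    let family = u16::from_ne_bytes(buf.get(0..2)?.try_into().ok()?);
    // The port is stored in network (big-endian) byte order.
    let port = u16::from_be_bytes(buf.get(2..4)?.try_into().ok()?);
    match family {
        AF_INET => {
            let ip: [u8; 4] = buf.get(4..8)?.try_into().ok()?;
            Some(SocketAddr::V4(SocketAddrV4::new(Ipv4Addr::from(ip), port)))
        }
        AF_INET6 => {
            // sin6_flowinfo is also in network byte order.
            let flowinfo = u32::from_be_bytes(buf.get(4..8)?.try_into().ok()?);
            let ip: [u8; 16] = buf.get(8..24)?.try_into().ok()?;
            // sin6_scope_id is in host byte order.
            let scope_id = u32::from_ne_bytes(buf.get(24..28)?.try_into().ok()?);
            Some(SocketAddr::V6(SocketAddrV6::new(
                Ipv6Addr::from(ip),
                port,
                flowinfo,
                scope_id,
            )))
        }
        _ => None,
    }
}
```

Using `get(..)?` on every field means a truncated buffer yields `None` instead of a panic, which matters when the bytes come from an untrusted process.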
Ladas force-pushed the feat/seccomp-notify branch from 2446c42 to 6078a8e (June 12, 2026 16:28)
Add NetworkMode::Platform that enables the OpenShell supervisor to run
without any elevated capabilities on Kubernetes platforms enforcing the
restricted Pod Security Standard (e.g. OpenShift restricted-v2 SCC).

Platform Mode keeps Landlock filesystem isolation, seccomp syscall
filtering, OPA policy evaluation, credential injection, and L7
inspection via a loopback CONNECT proxy. It replaces the network
namespace (which requires CAP_SYS_ADMIN + CAP_NET_ADMIN) with
Kubernetes NetworkPolicy for L3/L4 egress control.

Changes:
- proto: add NetworkEnforcementMode enum to SandboxPolicy (field 6)
  and DriverSandboxSpec (field 12), backward-compatible
- sandbox: add Platform variant to NetworkMode, wire TryFrom conversion
- sandbox: skip netns, bind proxy to loopback (127.0.0.1:3128)
- sandbox: allow AF_INET sockets in seccomp for Platform mode
- driver-k8s: zero capabilities (drop ALL), typed enum comparison
- server: propagate network_enforcement to DriverSandboxSpec

Ref: NVIDIA#899
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
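
As a rough illustration of the "typed enum comparison" and `TryFrom` wiring mentioned in the change list above, the sketch below converts a raw proto integer into an enum and branches on the variant. The variant names, their numeric values, and the helper `required_capabilities` are all hypothetical. Only the capability names come from the commit message.

```rust
use std::convert::TryFrom;

/// Hypothetical mirror of the proto `NetworkEnforcementMode` enum.
/// Variant names and numeric values are assumptions for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NetworkEnforcementMode {
    Unspecified = 0,
    Namespace = 1,
    Platform = 2,
}

impl TryFrom<i32> for NetworkEnforcementMode {
    type Error = String;

    fn try_from(raw: i32) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(Self::Unspecified),
            1 => Ok(Self::Namespace),
            2 => Ok(Self::Platform),
            other => Err(format!("unknown network enforcement mode: {other}")),
        }
    }
}

/// Capabilities the sandbox pod would request (hypothetical helper).
pub fn required_capabilities(mode: NetworkEnforcementMode) -> &'static [&'static str] {
    // Typed comparison instead of a magic integer such as `raw == 2`.
    if mode == NetworkEnforcementMode::Platform {
        &[] // Platform mode: drop ALL, add nothing
    } else {
        // The network namespace path needs these two capabilities.
        &["SYS_ADMIN", "NET_ADMIN"]
    }
}
```

Rejecting unknown integers at the conversion boundary keeps an unrecognised value from silently selecting a mode.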
When Platform mode is active, apply Landlock ABI v4 network rules to
restrict TCP connect to only the proxy port (default 3128). This makes
the loopback CONNECT proxy mandatory at the kernel level.

Graceful degradation: if the kernel's Landlock ABI is older than v4, the
network rules are skipped with a warn-level log, and enforcement falls
back to the cooperative proxy + NetworkPolicy.

Also fixes rules_applied underflow via saturating_sub.

Ref: NVIDIA#899
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
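
The degradation path and the underflow fix described above can be modelled with a small sketch. It does not call Landlock. It only models the decision, and the names `NetEnforcement`, `plan_network_rules`, and `rules_applied` are hypothetical.

```rust
/// Which mechanism ends up enforcing TCP egress (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum NetEnforcement {
    /// Landlock ABI v4+ restricts TCP connect to the proxy port.
    KernelTcpConnect { port: u16 },
    /// Older kernels: cooperative proxy plus Kubernetes NetworkPolicy.
    CooperativeProxy,
}

/// Decide how to enforce, given the Landlock ABI version the kernel reports.
pub fn plan_network_rules(kernel_abi: u32, proxy_port: u16) -> NetEnforcement {
    if kernel_abi >= 4 {
        NetEnforcement::KernelTcpConnect { port: proxy_port }
    } else {
        // Stand-in for a warn-level log entry.
        eprintln!("warn: Landlock ABI v{kernel_abi} < v4, TCP port rules skipped");
        NetEnforcement::CooperativeProxy
    }
}

/// Count of rules actually applied; saturating_sub clamps at zero
/// instead of underflowing when `skipped` exceeds `requested`.
pub fn rules_applied(requested: usize, skipped: usize) -> usize {
    requested.saturating_sub(skipped)
}
```

With plain subtraction, `skipped > requested` would panic in debug builds and wrap in release builds. The saturating form returns 0 in both.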
Ladas force-pushed the feat/landlock-tcp-port branch from 179d108 to 59d148a (June 16, 2026 14:06)
Add kernel-level network syscall interception using SECCOMP_RET_USER_NOTIF.

DnsPinnedAllowlist: DNS pinning at sandbox creation, frozen IPs.

BPF filter intercepts: connect, sendto, sendmsg, recvfrom, recvmsg,
bind. Validates AUDIT_ARCH (prevents x32/compat bypass).

Syscall wrappers: notification fd ioctls, pidfd_open/pidfd_getfd
(TOCTOU-safe on-behalf-of), verify_socket_fd (fd-swap mitigation),
read_process_memory (read_exact), sockaddr parser (correct endianness),
deny/allow_connect helpers.

Ref: NVIDIA#899
Signed-off-by: Ladislav Smola <lsmola@redhat.com>
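
The filter behaviour described above (architecture check first, then interception of six network syscalls) can be modelled in plain Rust. This is a decision model, not BPF bytecode, and it makes several assumptions: only x86_64 is shown, a mismatched architecture or x32 call is killed rather than denied with an errno, and `Verdict` and `classify` are hypothetical names. The constants are the Linux values for `AUDIT_ARCH_X86_64`, the x32 syscall bit, and the x86_64 syscall numbers.

```rust
const AUDIT_ARCH_X86_64: u32 = 0xC000_003E;
/// x32 calls report AUDIT_ARCH_X86_64 but set this bit in the syscall number.
const X32_SYSCALL_BIT: u32 = 0x4000_0000;

/// x86_64 syscall numbers: connect, sendto, sendmsg, recvfrom, recvmsg, bind.
const INTERCEPTED: [u32; 6] = [42, 44, 46, 45, 47, 49];

/// Outcome the filter would return for a syscall (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    /// Foreign ABI: refuse rather than misread the syscall number.
    Kill,
    /// Not a network syscall of interest.
    Allow,
    /// Hand off to the supervisor (SECCOMP_RET_USER_NOTIF).
    Notify,
}

/// Model of the filter's decision for one (arch, syscall number) pair.
pub fn classify(arch: u32, nr: u32) -> Verdict {
    // Syscall numbers differ per ABI, so check the architecture first.
    if arch != AUDIT_ARCH_X86_64 {
        return Verdict::Kill;
    }
    // x32 shares the x86_64 arch value, so the number needs its own check.
    if (nr & X32_SYSCALL_BIT) != 0 {
        return Verdict::Kill;
    }
    if INTERCEPTED.contains(&nr) {
        Verdict::Notify
    } else {
        Verdict::Allow
    }
}
```

Without the two early checks, a 32-bit or x32 caller could issue `connect` under a different syscall number and never reach the notification path.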
Ladas force-pushed the feat/seccomp-notify branch from 6078a8e to 254154b (June 16, 2026 14:53)
Ladas force-pushed the feat/landlock-tcp-port branch from 59d148a to 8354d68 (June 16, 2026 15:37)