Threat Model¶
ZViz isolates a container workload with five layers of kernel-level enforcement. This page states the trust boundary explicitly so users can decide whether the guarantees match their deployment.
Trusted computing base¶
| Component | Why it is trusted |
|---|---|
| Host kernel | All enforcement primitives run in it; a kernel CVE that escapes seccomp/Landlock/namespace isolation defeats every container runtime equally. ZViz reduces but does not eliminate the host kernel surface reachable from inside a container (docs/architecture/comparison.md for the measured number). |
| ZViz parent supervisor process | Sets up the cgroup, writes the user namespace's uid/gid maps, applies Landlock and seccomp in the child before execve. Compromise of the parent before the child enters the sandbox is fatal to isolation. |
| In-kernel BPF interpreter | Evaluates the seccomp filter; assumed correct. |
| The broker process (when a profile configures one) | Mediates argument-sensitive syscalls. The default profile ships with an empty broker set, so most deployments do not need to trust a userspace broker at all. |
Attacker capabilities¶
The container workload is untrusted. The attacker model is:
- Arbitrary code execution as container-root inside the user namespace (mapped to the invoking user on the host).
- May issue any syscall, open any path the policy permits, and deliberately probe for escape primitives.
- Cannot inject code into the parent supervisor or alter the BPF program after
execve(NO_NEW_PRIVS plus the irrevocablelandlock_restrict_selfenforce this).
We do not assume the workload is benign or buggy. The security evaluation in the accompanying paper runs real escape payloads inside the runtime and records the kernel-level outcome.
In scope¶
- Container escape via the syscall interface.
- Capability escalation inside the namespace.
- Filesystem-access policy violation (paths not granted by Landlock or by an OCI
mounts[]entry). - Network policy violation (socket-domain filtering blocks
AF_PACKET/AF_NETLINK/etc. by default). - Resource exhaustion of the host (cgroup memory and pids limits).
Out of scope¶
The same boundary gVisor draws for its own model:
- Hardware side channels and firmware attacks.
- Supply-chain attacks on the runtime itself (a malicious ZViz binary is not a container-escape problem; verify the build).
- Hypervisor escape (irrelevant; ZViz does not run a hypervisor).
- Memory-primitive kernel CVEs that work through permitted syscalls (Dirty COW, Dirty Pipe). A syscall filter cannot block these; defending against them is the host kernel's job.
We do not assume the host kernel is bug-free. Instead the paper measures how much of it each runtime exposes, rather than asserting that the residual surface is safe.
What changes if you opt into the broker¶
Enabling broker-mediated syscalls (e.g. path-restricted openat) adds the
broker process to the TCB. Because SECCOMP_RET_USER_NOTIF dereferences
pointer arguments in the target's address space, naive brokers are vulnerable
to a time-of-check/time-of-use race. ZViz validates the notification id with
SECCOMP_IOCTL_NOTIF_ID_VALID before acting; profiles that ship a broker
should review this invariant.