Security Profiles¶

Profiles are the core configuration mechanism in ZViz. They define what a container can and cannot do, and how resources are limited.

What is a Profile?¶

A profile is a YAML file that specifies:

Syscalls: Which system calls are allowed, denied, or brokered
Filesystem: What paths can be read/written
Network: What network destinations are reachable
Resources: CPU, memory, and process limits
Capabilities: Which Linux capabilities are retained

Profile Structure¶

name: my-profile
version: 1.0
description: "Description of this profile's purpose"

# Syscall control
syscalls:
  allow:
    - read
    - write
    - exit_group
  deny:
    - mount
    - bpf
    - init_module
  broker:
    - openat
    - socket
    - clone

# Filesystem access
filesystem:
  readonly:
    - /usr
    - /lib
    - /etc
  writable:
    - /tmp
    - /work
  hidden:
    - /etc/shadow
    - /root

# Network policy
network:
  egress:
    allow:
      - 10.0.0.0/8
      - 172.16.0.0/12
    deny:
      - 0.0.0.0/0
  ingress:
    deny_all: true

# Resource limits
resources:
  memory_max: "512M"
  cpu_max: "100000 100000"  # quota period (100%)
  pids_max: 100
  io_max:
    - device: "8:0"
      rbps: 10485760  # 10MB/s

# Linux capabilities
capabilities:
  keep:
    - CAP_NET_BIND_SERVICE
  drop_all: true

Profile Categories¶

Syscalls¶

Syscalls are divided into three categories:

Category	Action	Use Case
`allow`	Execute directly	Safe syscalls (read, write, exit)
`deny`	Return EPERM	Dangerous syscalls (mount, bpf)
`broker`	Mediate via broker	Syscalls needing arg inspection

syscalls:
  # Fast path - no broker overhead
  allow:
    - read
    - write
    - close
    - mmap
    - munmap
    - brk

  # Blocked immediately
  deny:
    - mount
    - umount2
    - bpf
    - init_module
    - delete_module
    - reboot
    - kexec_load

  # Inspected by broker
  broker:
    - openat      # Path validation
    - socket      # Domain/type filtering
    - clone       # Flag validation
    - ioctl       # Command filtering
    - prctl       # Operation filtering

Filesystem¶

Control file and directory access:

filesystem:
  # Read-only paths
  readonly:
    - /usr
    - /lib
    - /lib64
    - /etc
    - /bin
    - /sbin

  # Writable paths
  writable:
    - /tmp
    - /var/tmp
    - /work

  # Hidden from container
  hidden:
    - /etc/shadow
    - /etc/gshadow
    - /root/.ssh

  # Executable paths (for execve)
  executable:
    - /usr/bin
    - /bin
    - /usr/local/bin

Network¶

Define network access rules:

network:
  # Outbound connections
  egress:
    allow:
      - 10.0.0.0/8       # Private networks
      - 172.16.0.0/12
      - 192.168.0.0/16
      - 169.254.169.254/32  # Cloud metadata
    deny:
      - 0.0.0.0/0        # Block public internet

  # Inbound connections
  ingress:
    deny_all: true       # No incoming connections

  # DNS proxy (optional)
  dns_proxy:
    enabled: true
    upstream: "10.0.0.53:53"

  # Socket types allowed
  sockets:
    allow:
      - tcp
      - udp
      - unix
    deny:
      - raw
      - netlink

Resources¶

Set resource limits via cgroups v2:

resources:
  # Memory limit (bytes or human-readable)
  memory_max: "256M"

  # Memory + swap limit
  memory_swap_max: "512M"

  # CPU quota: "quota period" in microseconds
  # 50000 100000 = 50% of one CPU
  cpu_max: "50000 100000"

  # Maximum number of processes
  pids_max: 50

  # I/O limits per device
  io_max:
    - device: "8:0"
      rbps: 10485760   # Read bytes/sec
      wbps: 5242880    # Write bytes/sec
      riops: 1000      # Read IOPS
      wiops: 500       # Write IOPS

Capabilities¶

Control Linux capabilities:

capabilities:
  # Drop all capabilities first
  drop_all: true

  # Then add back specific ones
  keep:
    - CAP_NET_BIND_SERVICE  # Bind to ports < 1024
    - CAP_DAC_OVERRIDE      # Bypass file permissions

Available capabilities:

Capability	Purpose
`CAP_NET_BIND_SERVICE`	Bind to privileged ports
`CAP_SYS_PTRACE`	Use ptrace
`CAP_SETUID`	Change UID
`CAP_SETGID`	Change GID
`CAP_CHOWN`	Change file ownership
`CAP_DAC_OVERRIDE`	Bypass file permissions

Security Risk

Adding capabilities increases attack surface. Only add what's absolutely necessary.

Profile Inheritance¶

Profiles can extend other profiles:

name: my-extended-profile
extends: ci-runner

# Add additional permissions
syscalls:
  allow:
    - ptrace  # For debugging

filesystem:
  writable:
    - /custom/path

Profile Compilation¶

Profiles are compiled into enforcement artifacts:

# Compile a profile
zviz compile my-profile.yaml

# Output files
my-profile.bpf       # Seccomp BPF program
my-profile.apparmor  # AppArmor profile
my-profile.nft       # nftables rules
my-profile.broker    # Broker rule table
my-profile.manifest  # Compilation manifest

Validation¶

Validate profiles before use:

# Check syntax and semantics
zviz compile --validate my-profile.yaml

# Check against host capabilities
zviz compile --check-host my-profile.yaml

Best Practices¶

1. Principle of Least Privilege¶

Start with minimal permissions and add only what's needed:

# Bad - too permissive
syscalls:
  deny: [mount, bpf]  # Implicit allow for everything else

# Good - explicit allowlist
syscalls:
  allow: [read, write, openat, close, exit_group]
  deny: ["*"]

2. Use Wildcards Carefully¶

# Dangerous - blocks all syscalls
syscalls:
  deny: ["*"]

# Better - explicit lists
syscalls:
  deny:
    - mount
    - bpf
    - init_module

3. Test in Audit Mode¶

# Run with audit mode to discover required syscalls
sudo zviz run --audit --profile my-profile container . /bin/my-app

# Review audit log
jq '.[] | select(.decision == "denied")' /var/log/zviz/audit.json

4. Document Your Profile¶

name: my-profile
description: |
  Profile for Node.js web applications.
  Allows network access to internal services only.
  Requires /work mount for application code.

# ... rest of profile