Configuration Schema Reference

Complete schema reference for the gpuemu.toml configuration file. This file defines your project, operations, kernels, validation policies, and CI settings.

Config Discovery

gpuemu searches for gpuemu.toml starting from the current directory and walking up the directory tree. Override this with the GPUEMU_CONFIG environment variable.
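The lookup described above can be sketched in a few lines of Python. This is an illustration only, not gpuemu's actual implementation: the function name `find_config` is hypothetical, and the sketch assumes that GPUEMU_CONFIG holds a path to the config file and takes precedence over the directory walk.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

CONFIG_NAME = "gpuemu.toml"
ENV_VAR = "GPUEMU_CONFIG"


def find_config(
    start: Optional[Path] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Path]:
    """Return the config path, or None if no gpuemu.toml is found."""
    env = os.environ if env is None else env

    # Assumption: the variable holds a file path and wins over discovery.
    override = env.get(ENV_VAR)
    if override:
        return Path(override)

    current = (start or Path.cwd()).resolve()
    # Check the starting directory first, then each parent up to the root.
    for directory in (current, *current.parents):
        candidate = directory / CONFIG_NAME
        if candidate.is_file():
            return candidate
    return None
```

Calling `find_config()` from a nested source directory would return the nearest gpuemu.toml above it, which is why a single file at the repository root is usually enough.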

`[project]`

Top-level project metadata.

Field	Type	Default	Description
`name`	`String`	`"unnamed"`	Project name used in reports and baselines
`version`	`Option<String>`	`None`	Optional project version string
`framework`	`Option<String>`	`None`	Default framework: `"pytorch"`, `"jax"`, or `"tensorflow"`

[project]
name = "my-kernels"
version = "0.2.1"
framework = "pytorch"

`[validation]`

Global validation settings that apply to all ops and kernels unless overridden.

Field	Type	Default	Description
`dtypes`	`Vec<String>`	`["float32"]`	Data types to validate by default
`check_nan`	`bool`	`true`	Check for NaN values in outputs
`check_inf`	`bool`	`true`	Check for Inf values in outputs
`seed`	`Option<u64>`	`None`	Global RNG seed for reproducible runs
`tolerances`	`HashMap<String, f64>`	(see below)	Per-dtype absolute tolerance thresholds

Default Tolerances

Dtype	Default Tolerance
`float32`	`1e-5`
`float16`	`1e-3`
`bfloat16`	`1e-3`

[validation]
dtypes = ["float32", "float16"]
check_nan = true
check_inf = true
seed = 42

[validation.tolerances]
float32 = 1e-5
float16 = 1e-3
bfloat16 = 1e-3

Note

The tolerances map is keyed by dtype name, and each value is an absolute tolerance given as a floating-point number. Any dtype not listed falls back to the float32 default.
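To make the fallback rule concrete, here is a minimal Python sketch of how a per-dtype tolerance might be resolved and applied. The helper names are hypothetical, and two details are assumptions rather than documented behavior: configured entries are merged over the built-in defaults, and the comparison is `|expected - actual| <= tolerance`.

```python
from typing import Mapping, Optional, Sequence

# Built-in defaults, copied from the Default Tolerances table.
DEFAULT_TOLERANCES = {"float32": 1e-5, "float16": 1e-3, "bfloat16": 1e-3}


def resolve_tolerance(
    dtype: str, configured: Optional[Mapping[str, float]] = None
) -> float:
    """Pick the absolute tolerance for a dtype."""
    # Assumption: configured entries override the defaults key by key.
    merged = {**DEFAULT_TOLERANCES, **(configured or {})}
    # Dtypes that appear nowhere fall back to the float32 default.
    return merged.get(dtype, DEFAULT_TOLERANCES["float32"])


def within_tolerance(
    expected: Sequence[float], actual: Sequence[float], tolerance: float
) -> bool:
    """Element-wise absolute comparison of two flat sequences."""
    if len(expected) != len(actual):
        return False
    return all(abs(e - a) <= tolerance for e, a in zip(expected, actual))
```

Under these assumptions, `resolve_tolerance("float64")` yields the float32 default because float64 has no entry of its own.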

`[[ops]]`

Define operations to validate. Each [[ops]] entry describes a single op with its reference implementation and validation parameters.

Field	Type	Default	Description
`name`	`String`	(required)	Unique name for this op
`module`	`Option<String>`	`None`	Python module path for the op
`reference`	`String`	(required)	Path to the reference script
`op_script`	`Option<String>`	`None`	Path to an optional op-under-test script
`input_names`	`Vec<String>`	`[]`	Names of the input tensors
`execution_mode`	`String`	`"client_side"`	Execution mode: `"client_side"` or `"daemon_side"`
`frameworks`	`Vec<String>`	`[]`	Frameworks this op supports
`tolerances`	`HashMap<String, f64>`	`{}`	Per-dtype tolerances overriding the global defaults
`invariants`	`InvariantConfig`	(see below)	Output invariant checks

[[ops]]
name = "softmax"
reference = "scripts/softmax_ref.py"
op_script = "scripts/softmax_op.py"
input_names = ["logits"]
execution_mode = "client_side"
frameworks = ["pytorch", "jax"]

[ops.tolerances]
float32 = 1e-6
float16 = 5e-4

[ops.invariants]
non_negative = true
no_nan = true

`[[ops]].invariants`

Invariant checks applied to op outputs after numerical comparison.

Field	Type	Default	Description
`non_negative`	`bool`	`false`	Assert all output values are >= 0
`shape_preserved`	`bool`	`false`	Assert output shape matches input shape
`no_nan`	`bool`	`false`	Assert no NaN values in output
`no_inf`	`bool`	`false`	Assert no Inf values in output

[ops.invariants]
non_negative = true
shape_preserved = true
no_nan = true
no_inf = true

Info

Invariant checks run independently of numerical tolerance checks. An op can pass the tolerance check but fail an invariant check.
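The four invariants can be illustrated with a short Python sketch that works on a flat list of output values. It is not gpuemu's implementation: the function name is hypothetical, and comparing the output shape against a single input shape is an assumption, since the schema does not say which input is used when an op has several.

```python
import math
from typing import List, Mapping, Sequence, Tuple


def failed_invariants(
    output: Sequence[float],
    output_shape: Tuple[int, ...],
    input_shape: Tuple[int, ...],
    invariants: Mapping[str, bool],
) -> List[str]:
    """Return the names of enabled invariants that the output violates."""
    results = {
        # NaN compares false against 0, so a NaN also fails this check here.
        "non_negative": all(v >= 0 for v in output),
        "shape_preserved": output_shape == input_shape,
        "no_nan": not any(math.isnan(v) for v in output),
        "no_inf": not any(math.isinf(v) for v in output),
    }
    # Only invariants set to true in the config are enforced.
    return [
        name
        for name, passed in results.items()
        if invariants.get(name, False) and not passed
    ]
```

Because the function never looks at a reference output, it shows why an op can be within tolerance and still fail here: a softmax that matches its reference to 1e-6 but returns a tiny negative value would still trip `non_negative`.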

`[[kernels]]`

Define compiled GPU kernels to validate and lint. Each [[kernels]] entry describes a kernel with its source, reference, and artifact-level checks.

Field	Type	Default	Description
`name`	`String`	(required)	Unique name for this kernel
`source`	`Option<String>`	`None`	Path to the kernel source file (e.g., `.cu`)
`reference`	`String`	(required)	Path to the reference script
`tolerances`	`HashMap<String, f64>`	`{}`	Per-dtype tolerances overriding the global defaults
`invariants`	`InvariantConfig`	(all `false`)	Output invariant checks (same schema as `[[ops]].invariants`)
`artifact_checks`	`ArtifactCheckConfig`	(see below)	Artifact-level resource and pattern checks

[[kernels]]
name = "fused_softmax"
source = "kernels/fused_softmax.cu"
reference = "scripts/softmax_ref.py"

[kernels.tolerances]
float32 = 1e-5

[kernels.invariants]
non_negative = true
no_nan = true

`[[kernels]].artifact_checks`

Resource usage and pattern checks applied to compiled kernel artifacts (PTX assembly).

Field	Type	Default	Description
`max_registers`	`u32`	`64`	Maximum number of registers per thread
`max_spills`	`u32`	`0`	Maximum number of register spills allowed
`max_local_memory`	`u32`	`0`	Maximum local memory usage in bytes
`required_patterns`	`Vec<String>`	`[]`	PTX patterns that must appear in the artifact
`forbidden_patterns`	`Vec<String>`	`[]`	PTX patterns that must not appear in the artifact

[kernels.artifact_checks]
max_registers = 48
max_spills = 0
max_local_memory = 0
required_patterns = ["shared.f32"]
forbidden_patterns = ["spill", "local.f32"]

Warning

max_spills = 0 (the default) is strict: the check fails if the compiler introduces any register spills. Increase the value if your kernel legitimately requires spills.
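As a rough illustration of how these limits and patterns could be evaluated, the Python sketch below checks a PTX string and a set of already-measured resource numbers against an `artifact_checks` table. Everything here is hypothetical: the class and function names are invented for the example, patterns are assumed to match as plain substrings (gpuemu may use a different matching rule), and the register, spill, and local memory figures are assumed to come from the compiler's resource report.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArtifactChecks:
    # Field names and defaults mirror the table above.
    max_registers: int = 64
    max_spills: int = 0
    max_local_memory: int = 0
    required_patterns: List[str] = field(default_factory=list)
    forbidden_patterns: List[str] = field(default_factory=list)


def artifact_violations(
    ptx: str,
    registers: int,
    spills: int,
    local_memory: int,
    checks: ArtifactChecks,
) -> List[str]:
    """Return one message per violated limit or pattern rule."""
    violations = []
    if registers > checks.max_registers:
        violations.append(f"registers {registers} > {checks.max_registers}")
    if spills > checks.max_spills:
        violations.append(f"spills {spills} > {checks.max_spills}")
    if local_memory > checks.max_local_memory:
        violations.append(f"local memory {local_memory} > {checks.max_local_memory}")
    # Assumption: patterns are matched as plain substrings of the PTX text.
    for pattern in checks.required_patterns:
        if pattern not in ptx:
            violations.append(f"missing required pattern {pattern!r}")
    for pattern in checks.forbidden_patterns:
        if pattern in ptx:
            violations.append(f"found forbidden pattern {pattern!r}")
    return violations
```

An empty list means the artifact passes. With the default `max_spills = 0`, a single reported spill is enough to produce a violation.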

`[policies]`

Global policies that govern how gpuemu treats validation results.

Field	Type	Default	Description
`fail_on_regression`	`bool`	`true`	Treat numerical regressions as failures
`warn_threshold`	`f64`	`0.1`	Tolerance delta above which a warning is emitted

[policies]
fail_on_regression = true
warn_threshold = 0.1

`[ci]`

Settings specific to CI pipeline execution (gpuemu ci).

Field	Type	Default	Description
`quick_dtypes`	`Vec<String>`	`["float32"]`	Dtypes used in `--quick` CI mode
`thorough_timeout`	`u64`	`3600`	Timeout in seconds for thorough CI runs
`parallel_jobs`	`u32`	`4`	Default number of parallel validation jobs

[ci]
quick_dtypes = ["float32"]
thorough_timeout = 3600
parallel_jobs = 4

Complete Example

A full gpuemu.toml demonstrating all sections:

[project]
name = "my-gpu-kernels"
version = "1.0.0"
framework = "pytorch"

[validation]
dtypes = ["float32", "float16", "bfloat16"]
check_nan = true
check_inf = true
seed = 42

[validation.tolerances]
float32 = 1e-5
float16 = 1e-3
bfloat16 = 1e-3

[[ops]]
name = "softmax"
reference = "scripts/softmax_ref.py"
op_script = "scripts/softmax_op.py"
input_names = ["logits"]
execution_mode = "client_side"
frameworks = ["pytorch", "jax"]

[ops.tolerances]
float32 = 1e-6
float16 = 5e-4

[ops.invariants]
non_negative = true
shape_preserved = true
no_nan = true
no_inf = false

[[ops]]
name = "layernorm"
reference = "scripts/layernorm_ref.py"
input_names = ["x", "weight", "bias"]
execution_mode = "daemon_side"
frameworks = ["pytorch"]

[ops.invariants]
no_nan = true

[[kernels]]
name = "fused_softmax"
source = "kernels/fused_softmax.cu"
reference = "scripts/softmax_ref.py"

[kernels.tolerances]
float32 = 1e-5

[kernels.invariants]
non_negative = true
no_nan = true

[kernels.artifact_checks]
max_registers = 48
max_spills = 0
max_local_memory = 0
required_patterns = ["shared.f32"]
forbidden_patterns = ["spill"]

[[kernels]]
name = "fused_layernorm"
source = "kernels/fused_layernorm.cu"
reference = "scripts/layernorm_ref.py"

[kernels.artifact_checks]
max_registers = 64
max_spills = 2
max_local_memory = 512

[policies]
fail_on_regression = true
warn_threshold = 0.05

[ci]
quick_dtypes = ["float32"]
thorough_timeout = 1800
parallel_jobs = 8
