Common Issues¶
This page covers the most frequently encountered problems when using gpuemu, along with their causes and solutions.
1. Daemon Won't Start¶
Symptom
Running gpuemu daemon start hangs, exits silently, or prints an error about the socket file.
Solution
Check if a socket file already exists from a previous crashed daemon:
If the file exists but no daemon is running, remove it manually:
Check if another daemon instance is already running:
If a daemon process exists, either use it or kill it before starting a new one.
Check the daemon logs for errors:
Common log errors include port conflicts, permission issues on ~/.gpuemu/, or corrupted database files.
2. Connection Refused / Socket Not Found¶
Symptom
CLI or Python client reports ConnectionRefused, Socket not found, or Failed to connect to daemon.
Solution
The daemon is not running. Start it:
Verify it is listening:
If you are using a custom socket path, make sure the GPUEMU_SOCKET environment variable is set correctly in both the daemon and client:
export GPUEMU_SOCKET=/path/to/custom.sock
gpuemu daemon start --background
gpuemu daemon status # Should connect to the custom socket
If the daemon is running but the client still cannot connect, check that the socket file exists and has the correct permissions:
3. Reference Script Fails¶
Symptom
Validation returns ReferenceScriptFailed with the reference script's stderr output.
Solution
Reference scripts must follow a strict protocol: read JSON from stdin, write JSON to stdout. The most common causes of failure:
Stray print() statements that corrupt stdout:
# BAD -- this output will be mixed into the JSON response
print("Debug: processing inputs")
# GOOD -- use stderr for debug output
import sys
print("Debug: processing inputs", file=sys.stderr)
Test the script manually to verify it works:
echo '{"inputs": {"a": {"data": "AAAAAAAAAIA/", "dtype": "float32", "shape": [1, 1]}}, "kwargs": {}}' | python scripts/my_ref.py
You should see clean JSON output with no extra text.
Missing dependencies: Ensure the reference script's dependencies (NumPy, etc.) are installed in the same Python environment that the daemon uses.
Syntax errors or exceptions: Run the script directly to see the full traceback:
4. Tolerance Exceeded Warnings¶
Symptom
Validation fails with messages like max absolute diff 2.3e-4 exceeds tolerance 1e-5 or you see warnings about tolerance consumption exceeding the warning threshold.
Solution
The configured tolerance is too tight for your dtype and operation. Some operations (e.g., reductions, softmax, layer normalization) accumulate more floating-point error than element-wise ops.
Use recommended tolerances based on empirical measurement:
from gpuemu.tolerances import get_recommended_tolerance
rec = get_recommended_tolerance("softmax", dtype="float32")
print(rec) # {"atol": 1e-4, "rtol": 1e-4}
Calibrate tolerances by running multiple iterations and finding the minimum safe threshold:
from gpuemu.tolerances import calibrate_tolerance
result = calibrate_tolerance(client, "softmax", dtype="float32", iterations=200)
print(result) # {"atol": 5.2e-5, "rtol": 4.8e-5}
Apply operation-specific multipliers in gpuemu.toml:
[[ops]]
name = "softmax"
reference = "scripts/softmax_ref.py"
[ops.tolerances]
float32 = { atol = 1e-4, rtol = 1e-4 } # Looser than global default
float16 = { atol = 5e-2, rtol = 5e-2 }
As a general rule: reductions, normalizations, and operations involving exp() or log() need looser tolerances than element-wise operations.
5. NaN/Inf Detected in Output¶
Symptom
Validation fails with NaN detected in output or Inf detected in output.
Solution
Check your reference implementation for common numerical hazards:
- Division by zero -- Add epsilon to denominators:
x / (y + 1e-8)instead ofx / y - Log of zero -- Clamp inputs:
np.log(np.maximum(x, 1e-8)) - Exp overflow -- Use numerically stable implementations:
log_softmaxinstead oflog(softmax(x)) - Large intermediate values -- Consider computing in higher precision, then casting down
If NaN or Inf values are expected behavior for your operation (e.g., an op that intentionally produces Inf for certain inputs), disable the checks for that specific op:
[[ops]]
name = "my_op"
reference = "scripts/my_op_ref.py"
# Disable NaN/Inf checks for this op only
check_nan = false
check_inf = false
Or disable globally in [validation] (not recommended):
6. Shape Mismatch Errors¶
Symptom
Validation fails with Shape mismatch: expected (4, 8) but got (8, 4) or similar.
Solution
The output shape from your op must exactly match the output shape from the reference script. Common causes:
Transposed output: Your op returns a transposed result compared to the reference. Check the dimension ordering in both implementations.
Missing batch dimension: Your reference script may not handle batched inputs correctly. Verify the reference works for both single and batched inputs.
Squeeze/unsqueeze mismatch: One implementation drops a size-1 dimension that the other preserves. Be explicit about shapes:
# In your reference script, be explicit about output shape
result = np.matmul(a, b)
assert result.shape == expected_shape, f"Expected {expected_shape}, got {result.shape}"
Debug by printing shapes to stderr in your reference script:
7. pynng Import Errors¶
Solution
Install pynng with the correct version:
Or install the full gpuemu Python package which includes it as a dependency:
On macOS, pynng compiles native NNG bindings and may require CMake:
In virtual environments, make sure you are installing into the correct environment:
8. Slow Validation¶
Symptom
Validation takes many seconds per op, or fuzzing runs are unacceptably slow.
Solution
Reference scripts are spawned as Python subprocesses for each validation call, which includes Python interpreter startup time. To improve performance:
Use --quick mode for faster iteration during development:
This tests only the primary dtype (typically float32) instead of all configured dtypes.
Keep reference scripts simple: Minimize imports and avoid heavy initialization. NumPy is usually sufficient -- do not import PyTorch or TensorFlow in reference scripts.
# SLOW -- imports a large framework
import torch
result = torch.matmul(torch.from_numpy(a), torch.from_numpy(b)).numpy()
# FAST -- uses NumPy directly
import numpy as np
result = np.matmul(a, b)
Use daemon-orchestrated mode for batch runs and CI. The daemon keeps reference scripts warm and batches invocations:
Increase parallel jobs in gpuemu.toml for CI:
9. Artifact Linting Fails¶
Symptom
gpuemu lint reports errors about cuobjdump not available or fails with PTX parse errors.
Solution
cuobjdump is not available: This tool is part of the CUDA toolkit and is only available on Linux with CUDA installed. On macOS, artifact inspection is optional and can be skipped:
# Skip artifact checks -- validation still works
gpuemu test # Artifact linting is separate from validation
Artifact linting is invoked explicitly via gpuemu lint or through kernel definitions in gpuemu.toml. If you do not have CUDA tools, you can safely omit [[kernels]] sections from your configuration.
PTX parse errors: Ensure the PTX content is valid and complete. The parser uses regex-based extraction and may fail on unusual PTX formatting. Check the PTX file manually:
Adjusting artifact limits: If linting fails because your kernel exceeds resource limits, adjust the thresholds to match your target architecture:
10. VS Code Extension Not Activating¶
Symptom
The gpuemu VS Code extension does not show diagnostics, the status bar item is missing, or commands are not available.
Solution
The extension requires a gpuemu.toml file in the workspace root to activate. Create one or open a directory that contains one:
Check the gpuemu.binaryPath setting: The extension needs to find the gpuemu CLI binary. Open VS Code settings and verify:
If the binary is not on your PATH, provide the full path:
Check the extension is installed and enabled: Open the Extensions panel (Ctrl+Shift+X / Cmd+Shift+X) and search for "gpuemu". Ensure it is installed and not disabled.
Check the Output panel for errors: Open View > Output, select "gpuemu" from the dropdown, and look for error messages.
11. Fuzzing Finds Too Many Failures¶
Symptom
A fuzz run produces hundreds of failures, making it difficult to know where to start.
Solution
Start with fewer iterations to get a manageable number of failures:
Tighten one issue at a time. Look at the first few failures and categorize them:
- Are they all the same dtype? Fix that dtype's tolerance first.
- Are they all large shapes? You may have an accumulation error that scales with input size.
- Are they all a specific layout? Check non-contiguous tensor handling.
Use minimization to find the simplest reproducer for each unique failure mode:
Review the failure summary to identify patterns:
Fix the root cause, then re-run fuzzing to confirm the fix and discover the next category of issues.
12. Baseline Comparison Fails¶
Symptom
Running gpuemu diff or gpuemu test with fail_on_regression = true reports baseline comparison failures even though the op is correct.
Solution
Ensure a baseline was stored before running the comparison. Baselines are created explicitly:
# Store the current results as a named baseline
gpuemu baseline store v1.0
# Later, compare against the stored baseline
gpuemu baseline diff v1.0
If no baseline exists for the tag you are comparing against, the diff will fail with BaselineNotFound.
Update the baseline after intentional changes to your op. If you have changed the implementation and the new results are correct, store a new baseline:
gpuemu test # Verify the new results are correct
gpuemu baseline store v1.1 # Store updated baseline
Check that the seed matches: Baselines are seed-specific. If you changed the global seed in gpuemu.toml, previous baselines may no longer be comparable. Use explicit seeds for baseline comparisons:
Still Stuck?¶
If your issue is not covered here:
- Check the FAQ for general questions.
- Review the Configuration reference to ensure your
gpuemu.tomlis correct. - Inspect daemon logs with
gpuemu daemon logsfor detailed error information. - Search existing issues in the GitHub repository.