Troubleshooting¶
Common issues and solutions when using numaperf.
Diagnostic Commands¶
Start by gathering system information:
# NUMA topology
numactl --hardware
lscpu | grep -i numa
# Current process NUMA stats
numastat -p $(pgrep your_app)
# Memory policy of a process
cat /proc/$(pgrep your_app)/numa_maps
# Check kernel NUMA settings
cat /proc/sys/kernel/numa_balancing
cat /proc/sys/vm/zone_reclaim_mode
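If you want the same checks from Rust rather than the shell, the proc files above can be read with std alone. A minimal sketch, assuming a Linux /proc filesystem; `read_tunable` is a hypothetical helper for illustration, not part of numaperf:

```rust
use std::fs;

/// Read a kernel tunable as a trimmed string; None if the file is
/// missing (non-Linux, restricted container, etc.).
/// Hypothetical helper for illustration, not part of numaperf.
fn read_tunable(path: &str) -> Option<String> {
    fs::read_to_string(path).ok().map(|s| s.trim().to_string())
}

fn main() {
    for path in [
        "/proc/sys/kernel/numa_balancing",
        "/proc/sys/vm/zone_reclaim_mode",
    ] {
        match read_tunable(path) {
            Some(v) => println!("{path} = {v}"),
            None => println!("{path}: not available"),
        }
    }
}
```

Returning Option instead of panicking keeps the check usable on systems where these tunables do not exist.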
Common Issues¶
"NUMA not detected" on a NUMA system¶
Symptoms: Topology::discover() returns only one node on a multi-socket system.
Causes:
- NUMA disabled in BIOS
- Kernel booted with numa=off
- Running in a VM without NUMA passthrough
Solutions:
# Check kernel command line
cat /proc/cmdline | grep numa
# If numa=off, remove it from GRUB config
# /etc/default/grub: GRUB_CMDLINE_LINUX_DEFAULT="..."
# For VMs, enable NUMA in hypervisor settings
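The cmdline check can also be done programmatically at startup. A std-only sketch; `numa_disabled` is a hypothetical helper, not a numaperf API:

```rust
use std::fs;

/// True if the kernel command line contains the exact token numa=off.
/// Hypothetical helper for illustration, not part of numaperf.
fn numa_disabled(cmdline: &str) -> bool {
    cmdline.split_whitespace().any(|t| t == "numa=off")
}

fn main() {
    let cmdline = fs::read_to_string("/proc/cmdline").unwrap_or_default();
    if numa_disabled(&cmdline) {
        println!("kernel booted with numa=off");
    } else {
        println!("no numa=off on the kernel command line");
    }
}
```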
Permission denied for memory binding¶
Symptoms: NumaError::IoError with permission denied when using MemPolicy::Bind.
Cause: MPOL_BIND strict mode requires CAP_SYS_ADMIN.
Solutions:
# Option 1: Run as root
sudo ./your_app
# Option 2: Grant capability
sudo setcap 'cap_sys_admin+ep' ./your_app
# Option 3: Use soft mode (default)
# MemPolicy::Bind will fall back to best-effort
Thread affinity not applied¶
Symptoms: Workers running on unexpected CPUs despite pinning.
Causes:
- Missing CAP_SYS_NICE
- cgroup CPU restrictions
- CPU isolation settings
Diagnostic:
# Check which CPUs the process can use
cat /proc/$(pgrep your_app)/status | grep Cpus_allowed_list
# Check cgroup restrictions
cat /sys/fs/cgroup/$(cat /proc/$(pgrep your_app)/cgroup | cut -d: -f3)/cpuset.cpus
Solutions:
# Grant capability
sudo setcap 'cap_sys_nice+ep' ./your_app
# Or adjust cgroup settings
echo "0-15" > /sys/fs/cgroup/.../cpuset.cpus
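The Cpus_allowed_list check can also run inside the process itself. A sketch assuming the usual /proc/&lt;pid&gt;/status format; `parse_cpu_list` is a hypothetical helper, not part of numaperf:

```rust
use std::fs;

/// Parse a Linux CPU list such as "0-3,8" into CPU ids.
/// Hypothetical helper for illustration, not part of numaperf.
fn parse_cpu_list(list: &str) -> Vec<usize> {
    let mut cpus = Vec::new();
    for part in list.trim().split(',') {
        if let Some((lo, hi)) = part.split_once('-') {
            if let (Ok(lo), Ok(hi)) = (lo.parse::<usize>(), hi.parse::<usize>()) {
                cpus.extend(lo..=hi);
            }
        } else if let Ok(c) = part.parse::<usize>() {
            cpus.push(c);
        }
    }
    cpus
}

fn main() {
    // Read our own affinity mask from /proc/self/status.
    if let Ok(status) = fs::read_to_string("/proc/self/status") {
        if let Some(line) = status.lines().find(|l| l.starts_with("Cpus_allowed_list")) {
            let list = line.split(':').nth(1).unwrap_or("");
            println!("allowed CPUs: {:?}", parse_cpu_list(list));
        }
    }
}
```

Comparing this list against the CPUs you tried to pin to quickly shows whether a cgroup stripped them away.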
Poor locality despite correct configuration¶
Symptoms: LocalityStats shows high remote steals even with LocalOnly policy.
Causes:
- NUMA balancing migrating pages
- Data allocated before worker starts
- Shared data structures
Diagnostic:
# Check NUMA balancing
cat /proc/sys/kernel/numa_balancing
# Should be 0
# Check where pages actually reside
grep -E "^[0-9a-f]+" /proc/$(pgrep your_app)/numa_maps | head
Solutions:
# Disable NUMA balancing
sudo sysctl -w kernel.numa_balancing=0
# In code: allocate data AFTER pinning
let _pin = ScopedPin::pin_current(cpus)?;
let data = NumaRegion::anon(size, MemPolicy::Local, ...)?;
Memory allocation fails with OOM¶
Symptoms: NumaError::IoError with out of memory when using MemPolicy::Bind.
Cause: Requested node has insufficient free memory.
Diagnostic:
# Check free memory on each node
numactl --hardware | grep -i free
Solutions:
// Option 1: Use Preferred instead of Bind
MemPolicy::Preferred(node_id)
// Option 2: Spread across multiple nodes
let mut nodes = NodeMask::new();
nodes.add(NodeId::new(0));
nodes.add(NodeId::new(1));
MemPolicy::Bind(nodes)
// Option 3: Use Interleave
MemPolicy::Interleave(all_nodes)
Executor hangs on shutdown¶
Symptoms: exec.shutdown() never returns.
Causes:
- Task that never completes
- Deadlock in task
- Task waiting for more work
Diagnostic:
# Check thread states
cat /proc/$(pgrep your_app)/task/*/stat | awk '{print $1, $3}'
# Use gdb to inspect
gdb -p $(pgrep your_app)
(gdb) info threads
(gdb) thread apply all bt
Solutions:
// Catch panics so a failed task cannot wedge shutdown
exec.submit_to_node(node, || {
    let result = std::panic::catch_unwind(|| {
        // Your task
    });
    if result.is_err() {
        log::error!("Task panicked");
    }
});
// Consider using a watchdog
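One way to sketch that watchdog with std only; `shutdown_with_timeout` is a hypothetical helper, and the closure body stands in for your real shutdown call:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Run `f` on a helper thread and wait at most `timeout` for it to finish.
/// Returns false if the deadline passed (the thread is left running).
/// Hypothetical watchdog pattern, not part of numaperf.
fn shutdown_with_timeout<F>(f: F, timeout: Duration) -> bool
where
    F: FnOnce() + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        f();
        let _ = tx.send(());
    });
    rx.recv_timeout(timeout).is_ok()
}

fn main() {
    // Replace the closure body with your real shutdown, e.g. exec.shutdown().
    let clean = shutdown_with_timeout(|| { /* exec.shutdown() */ }, Duration::from_secs(5));
    if !clean {
        eprintln!("shutdown timed out; inspect threads with gdb as shown above");
    }
}
```

If the timeout fires, the process can log, dump diagnostics, and exit instead of hanging forever.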
HardModeUnavailable errors¶
Symptoms: NumaError::HardModeUnavailable when building executor.
Diagnostic:
let caps = Capabilities::detect();
println!("{}", caps.summary());
for missing in caps.missing_for_hard_mode() {
    println!("Missing: {}", missing);
}
Solutions:
See Hard Mode for capability setup.
Performance Issues¶
High latency variance¶
Symptoms: P99 latency much higher than P50.
Causes:
- Page faults during execution
- NUMA balancing moving pages
- Work stealing from remote nodes
Solutions:
// Prefault memory
NumaRegion::anon(size, policy, opts, Prefault::Touch)?;
// Disable NUMA balancing
// sysctl -w kernel.numa_balancing=0
// Use LocalOnly steal policy
.steal_policy(StealPolicy::LocalOnly)
Lower throughput than expected¶
Symptoms: Adding NUMA awareness didn't improve performance.
Causes:
- Workload isn't memory-bandwidth bound
- Tasks too fine-grained
- Single-socket system
Diagnostic:
# Check if memory bound
perf stat -e cycles,instructions,cache-misses ./your_app
# A high cache-miss rate suggests a memory-bound workload
Solutions:
// Batch small tasks
for chunk in items.chunks(100) {
    exec.submit_to_node(node, move || {
        for item in chunk {
            process(item);
        }
    });
}
Memory usage higher than expected¶
Symptoms: Process uses more memory than data size.
Causes:
- Per-node sharding overhead
- Cache padding
- Huge page alignment
This is normal: NUMA-aware allocation trades memory for performance.
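To see why padding and alignment inflate the footprint, a rough back-of-envelope sketch; the 64-byte cache line and 2 MiB huge page are common x86-64 defaults used here as assumptions, not numaperf constants:

```rust
/// Common sizes on x86-64; assumptions for illustration, not numaperf constants.
const CACHE_LINE: usize = 64;
const HUGE_PAGE: usize = 2 * 1024 * 1024;

/// Round `len` up to a multiple of `align`.
fn round_up(len: usize, align: usize) -> usize {
    (len + align - 1) / align * align
}

fn main() {
    // An 8-byte counter padded to its own cache line: 8x overhead.
    println!("padded counter: {} bytes", round_up(8, CACHE_LINE));
    // A 1-byte region backed by a huge page still consumes 2 MiB.
    println!("huge-page region: {} bytes", round_up(1, HUGE_PAGE));
}
```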
Debug Logging¶
Enable detailed logging:
RUST_LOG=numaperf=debug ./your_app
Or in code:
env_logger::Builder::from_env(
    env_logger::Env::default().default_filter_or("numaperf=debug")
).init();
Reporting Issues¶
When reporting issues, include:
- System info: lscpu, numactl --hardware
- Kernel version: uname -a
- Capabilities: Output of Capabilities::detect().summary()
- Minimal reproduction: Smallest code that demonstrates the issue
- Error message: Full error including backtrace if available