Thread Pinning¶
Learn how to pin threads to specific CPUs for NUMA locality.
Why Pin Threads?¶
Without pinning, the OS scheduler can move threads between CPUs:
- Thread migrates to different NUMA node
- Memory that was local becomes remote
- Performance becomes unpredictable
Basic Pinning¶
```rust
use numaperf::{ScopedPin, CpuSet};

fn main() -> Result<(), numaperf::NumaError> {
    let cpus = CpuSet::parse("0-3")?; // CPUs 0, 1, 2, 3

    {
        let _pin = ScopedPin::pin_current(cpus)?;
        // Thread is now restricted to CPUs 0-3
        // Do work here...
    } // Pin automatically restored when dropped

    Ok(())
}
```
CPU Set Syntax¶
```rust
// Single CPU
let cpus = CpuSet::single(0);

// Parse from string
let cpus = CpuSet::parse("0")?;        // Just CPU 0
let cpus = CpuSet::parse("0-3")?;      // CPUs 0, 1, 2, 3
let cpus = CpuSet::parse("0,2,4")?;    // CPUs 0, 2, 4
let cpus = CpuSet::parse("0-3,8-11")?; // CPUs 0-3 and 8-11

// From topology
let node0_cpus = topo.cpu_set(NodeId::new(0));
```
Pin to NUMA Node¶
```rust
use numaperf::{ScopedPin, Topology, NodeId};

let topo = Topology::discover()?;

// Pin to all CPUs on node 0
let node0_cpus = topo.cpu_set(NodeId::new(0));
let _pin = ScopedPin::pin_current(node0_cpus)?;
```
Pin to Single CPU¶
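Pinning to a single CPU removes migration entirely, at the cost of all scheduling flexibility for that thread. A minimal sketch, built from the `CpuSet::single` and `ScopedPin::pin_current` calls shown above:

```rust
use numaperf::{ScopedPin, CpuSet};

// Restrict the current thread to exactly one CPU.
let cpus = CpuSet::single(0);
let _pin = ScopedPin::pin_current(cpus)?;
// All work here runs on CPU 0 until _pin is dropped
```

This is useful for latency-sensitive threads where even a migration within one NUMA node (and the resulting cold caches) is unacceptable.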
Check Current Affinity¶
```rust
use numaperf::get_affinity;

let current = get_affinity()?;
println!("Current affinity: {}", current);
println!("CPU count: {}", current.iter().count());
```
The Pin-Then-Allocate Pattern¶
Under the default first-touch policy, physical pages are allocated on the NUMA node of the CPU where the thread first writes them. Pinning first and allocating second therefore yields node-local memory:
```rust
use numaperf::{ScopedPin, Topology};

let topo = Topology::discover()?;

for node in topo.numa_nodes() {
    // 1. Pin to this node's CPUs
    let _pin = ScopedPin::pin_current(node.cpus().clone())?;

    // 2. Allocate - will be local to this node
    let data: Vec<u8> = vec![0; 1024 * 1024];

    // 3. Use data while pinned
    process_data(&data);
}
```
Hard Mode Pinning¶
By default, pinning is best-effort. When pinning must succeed exactly as requested or fail loudly, use hard mode:

```rust
use numaperf::{ScopedPin, HardMode, CpuSet};

let cpus = CpuSet::parse("0-3")?;

// Fails if pinning cannot be guaranteed
let _pin = ScopedPin::pin_current_with_mode(cpus, HardMode::Strict)?;
```
Worker Thread Pattern¶
Pin worker threads at spawn time:
```rust
use std::sync::Arc;
use std::thread;

use numaperf::{ScopedPin, Topology};

let topo = Arc::new(Topology::discover()?);

for node in topo.numa_nodes() {
    let cpus = node.cpus().clone();
    thread::spawn(move || {
        // Pin immediately after spawn
        let _pin = ScopedPin::pin_current(cpus).unwrap();

        // Worker loop - always runs on this node
        loop {
            // Process work...
        }
    });
}
```
Important Notes¶
ScopedPin is !Send¶
ScopedPin cannot be sent between threads:
```rust
let pin = ScopedPin::pin_current(cpus)?;

// This won't compile!
thread::spawn(move || {
    drop(pin); // Would restore wrong thread's affinity
});
```
Nested Pinning¶
Pins can be nested - each restores to its previous state:
```rust
let cpus_broad = CpuSet::parse("0-7")?;
let cpus_narrow = CpuSet::parse("0-1")?;

let _outer = ScopedPin::pin_current(cpus_broad)?;
// Pinned to 0-7
{
    let _inner = ScopedPin::pin_current(cpus_narrow)?;
    // Pinned to 0-1
}
// Back to 0-7

// When _outer is dropped: back to original affinity
```
Best Practices¶
- Pin early - Pin before allocating memory
- Use RAII - Let ScopedPin handle restoration
- Pin to nodes - Use topo.cpu_set(node_id) for node-level pinning
- Consider hard mode for production workloads
- Don't hold pins across await points in async code
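The last point deserves a sketch. Affinity is a property of the OS thread, but an async task can resume on a different worker thread after an `.await`, so a pin held across an await point would be restored on the wrong thread. The snippet below is illustrative only; the Tokio runtime it assumes is not part of numaperf:

```rust
// Hypothetical async worker - tokio is an assumption here.
async fn handle(cpus: numaperf::CpuSet) -> Result<(), numaperf::NumaError> {
    {
        let _pin = numaperf::ScopedPin::pin_current(cpus)?;
        // Do the CPU-bound, NUMA-sensitive work synchronously here.
    } // Pin dropped before any await point

    // Safe: no pin is held while the task may change worker threads.
    tokio::task::yield_now().await;
    Ok(())
}
```

Because ScopedPin is !Send, holding one across an `.await` in a future that must be Send will fail to compile, which turns this best practice into a compile-time check.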