Python API Reference¶

Complete reference for the gpuemu Python client library. This package provides a high-level interface for communicating with the gpuemu daemon, running validations, fuzz testing, and managing results.

pip install gpuemu

Client¶

The primary interface for interacting with the gpuemu daemon.

Constructor¶

Client(socket_path: str | None = None, timeout_ms: int = 30000)

Parameter	Type	Default	Description
`socket_path`	`str | None`	`None`	Path to the daemon Unix socket. Defaults to `~/.gpuemu/gpuemu.sock`.
`timeout_ms`	`int`	`30000`	Request timeout in milliseconds

Context Manager Support

The Client class supports the context manager protocol for automatic cleanup:

from gpuemu import Client

with Client() as client:
    result = client.ping()
    print(result)

Methods¶

`ping()`¶

Check connectivity with the daemon.

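def ping() -> dict

Returns a dictionary with the keys "version", "protocol_version", and "uptime_secs" if the daemon is reachable. Raises ClientError if the daemon reports an incompatible protocol version.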
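A minimal sketch of the protocol compatibility check that the source listing later on this page performs inside `ping()`, applied here to a plain response dictionary. The `PROTOCOL_VERSION` value of 1 and the helper name `upgrade_hint` are assumptions for illustration and are not part of the library.

```python
from typing import Any, Dict, Optional

# Assumed value for illustration; the real constant lives in the gpuemu client.
PROTOCOL_VERSION = 1


def upgrade_hint(response: Dict[str, Any]) -> Optional[str]:
    """Return None if versions match, else which side should be upgraded."""
    daemon_pv = response.get("protocol_version", 0)
    if daemon_pv == PROTOCOL_VERSION:
        return None
    # A newer daemon means the client is stale, and vice versa.
    return "client" if daemon_pv > PROTOCOL_VERSION else "daemon"


print(upgrade_hint({"protocol_version": 1}))
print(upgrade_hint({"protocol_version": 2}))
print(upgrade_hint({}))
```

A response with no `protocol_version` key is treated as version 0, which mirrors the `response.get("protocol_version", 0)` default in the listing.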

`validate_op()`¶

Validate an operation against its reference implementation.

def validate_op(
    op_name: str,
    inputs: dict[str, np.ndarray],
    output: np.ndarray,
    **kwargs,
) -> ValidationResult

Parameter	Type	Description
`op_name`	`str`	Name of the op (must be registered in `gpuemu.toml`)
`inputs`	`dict[str, np.ndarray]`	Named input tensors
`output`	`np.ndarray`	The output tensor to validate
`**kwargs`	`Any`	Additional keyword arguments passed to the reference script (values are sent as strings)

Returns a ValidationResult.

`get_result()`¶

Retrieve a stored validation result by seed.

def get_result(seed: int) -> ValidationResult | None

Returns None if no result is stored for the given seed.

`list_results()`¶

List recent stored validation results, up to limit entries.

def list_results(limit: int = 100) -> list[ValidationResult]

`store_baseline()`¶

Store current results as a named baseline.

def store_baseline(tag: str) -> None

`fuzz_op()`¶

Run daemon-side fuzz testing on an operation.

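def fuzz_op(
    op_name: str,
    seed: int | None = None,
    iterations: int = 100,
    fail_fast: bool = False,
    batch_sizes: list[int] | None = None,
    seq_lengths: list[int] | None = None,
    hidden_dims: list[int] | None = None,
    dtypes: list[str] | None = None,
    layouts: list[str] | None = None,
) -> FuzzResults

Parameter	Type	Default	Description
`op_name`	`str`		Name of the op to fuzz (must be registered in `gpuemu.toml`)
`seed`	`int | None`	`None`	Master seed for reproducibility; derived from the current timestamp if `None`
`iterations`	`int`	`100`	Number of test cases to generate
`fail_fast`	`bool`	`False`	Stop on the first failure
`batch_sizes`	`list[int] | None`	`None`	Batch sizes to sample; defaults to `[1, 2, 4, 8, 16, 32]`
`seq_lengths`	`list[int] | None`	`None`	Sequence lengths to sample; defaults to `[64, 128, 256, 512, 1024]`
`hidden_dims`	`list[int] | None`	`None`	Hidden dimensions to sample; defaults to `[256, 512, 768, 1024]`
`dtypes`	`list[str] | None`	`None`	Dtype strings to sample; defaults to `["float32", "float16"]`
`layouts`	`list[str] | None`	`None`	Layouts to sample; defaults to `["Contiguous", "Strided", "Transposed"]`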

Returns a FuzzResults.

`reproduce()`¶

Reproduce a specific fuzz failure.

def reproduce(seed: int) -> ReproduceResult

Returns a ReproduceResult.

`minimize()`¶

Minimize a failing test case.

def minimize(
    seed: int,
    strategy: str = "binary-search-dims",
    max_iters: int = 100,
) -> MinimizeResult

Parameter	Type	Default	Description
`seed`	`int`		Seed of the failure to minimize
`strategy`	`str`	`"binary-search-dims"`	`"binary-search-dims"` or `"binary-search-values"`; unrecognized values fall back to `"binary-search-dims"`
`max_iters`	`int`	`100`	Maximum minimization iterations

Returns a MinimizeResult.
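To illustrate what a `"binary-search-dims"` style strategy might do, here is a minimal sketch that shrinks a single dimension by binary search. It assumes failures are monotonic in size (if size `n` fails, every larger size fails too). `minimize_dim` and `still_fails` are hypothetical names, with `still_fails` standing in for re-running the op, and the daemon's actual algorithm may differ.

```python
from typing import Callable


def minimize_dim(size: int, still_fails: Callable[[int], bool], max_iters: int = 100) -> int:
    """Find the smallest size in [1, size] for which still_fails(size) is True.

    Assumes still_fails(size) is True on entry and that failure is monotonic.
    """
    low, high = 1, size  # invariant: high always fails
    for _ in range(max_iters):
        if low >= high:
            break
        mid = (low + high) // 2
        if still_fails(mid):
            high = mid      # mid fails, so the answer is mid or smaller
        else:
            low = mid + 1   # mid passes, so the answer is larger than mid
    return high


# Hypothetical bug that only appears once a dimension reaches 37 elements.
smallest = minimize_dim(1024, lambda n: n >= 37)
print(smallest)  # 37
```

If the failure is not monotonic, a search like this can still return a failing size, but not necessarily the smallest one.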

`list_failures()`¶

List stored fuzz failures.

def list_failures(limit: int = 20) -> list[ValidationResult]

`get_test_case()`¶

Retrieve a specific test case from the daemon for client-side execution.

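def get_test_case(op_name: str, seed: int | None = None) -> dict

Returns a dictionary with the keys "seed", "inputs" (mapping input names to np.ndarray), "shape", "dtype", and "layout". If seed is None, a seed is generated automatically.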

`get_test_batch()`¶

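Retrieve a batch of count test cases for client-side execution. Each entry has the same format as the dictionary returned by get_test_case().

def get_test_batch(
    op_name: str,
    count: int = 10,
    seed: int | None = None,
    op_schema: dict | None = None,
    dtypes: list[str] | None = None,
) -> list[dict]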

`submit_output()`¶

Submit the output of a client-side execution back to the daemon for validation.

def submit_output(
    op_name: str,
    seed: int,
    output: np.ndarray,
) -> ValidationResult

`fuzz_op_client_side()`¶

Run client-side fuzz testing. The daemon generates test cases, the client executes them locally, and submits results back for validation.

def fuzz_op_client_side(
    op_name: str,
    op_fn: Callable,
    iterations: int = 100,
    seed: int | None = None,
) -> FuzzResults

Parameter	Type	Description
`op_name`	`str`	Name of the op to fuzz
`op_fn`	`Callable`	The function under test
`iterations`	`int`	Number of fuzz iterations
`seed`	`int | None`	Fixed seed for reproducibility
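The generate, execute, submit cycle described above can be sketched with the lower-level methods `get_test_case()` and `submit_output()`. This is an illustration only: `FakeClient` is a hypothetical in-memory stand-in so the loop runs without a daemon, it uses plain lists instead of NumPy arrays, its `submit_output()` returns a bare bool where the real method returns a ValidationResult, and its pass/fail rule is invented. The real `fuzz_op_client_side()` may be implemented differently.

```python
from typing import Any, Callable, Dict, List


class FakeClient:
    """Hypothetical stand-in for gpuemu.Client; not part of the library."""

    def get_test_case(self, op_name: str, seed: int) -> Dict[str, Any]:
        # The real daemon generates random tensors; here the data is fixed by the seed.
        return {"seed": seed, "inputs": {"x": [float(seed), -float(seed)]}}

    def submit_output(self, op_name: str, seed: int, output: List[float]) -> bool:
        # Invented reference check: the expected result is element-wise abs().
        expected = [float(seed), float(seed)]
        return output == expected


def run_client_side(client: Any, op_name: str, op_fn: Callable, seeds: List[int]) -> Dict[str, int]:
    """Fetch each test case, run op_fn locally, and submit its output."""
    passed = 0
    for seed in seeds:
        case = client.get_test_case(op_name, seed)
        output = op_fn(**case["inputs"])  # execute the op under test locally
        if client.submit_output(op_name, case["seed"], output):
            passed += 1
    return {"total": len(seeds), "passed": passed, "failed": len(seeds) - passed}


summary = run_client_side(FakeClient(), "abs", lambda x: [abs(v) for v in x], [1, 2, 3])
print(summary)
```

Because the client is passed in as a parameter, the same loop shape would accept any object that exposes the two methods.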

Data Classes¶

`ValidationResult`¶

Result of a single validation run.

Field	Type	Description
`passed`	`bool`	Whether the validation passed
`seed`	`int`	Seed used for this validation
`op_name`	`str`	Name of the validated op
`max_diff`	`float`	Maximum absolute difference
`max_rel_diff`	`float`	Maximum relative difference
`failures`	`list[str]`	List of failure descriptions
`timestamp`	`str`	ISO 8601 timestamp
`duration_ms`	`int`	Validation duration in milliseconds
`repro_info`	`ReproductionInfo | None`	Reproduction information if the test failed

`FuzzResults`¶

Aggregated results from a fuzz testing session.

Field	Type	Description
`seed`	`int`	Root seed for this fuzz session
`total`	`int`	Total number of iterations run
`passed`	`int`	Number of passing iterations
`failed`	`int`	Number of failing iterations
`failures`	`list[ValidationResult]`	Detailed results for each failure

`ReproduceResult`¶

Result of reproducing a specific failure.

Field	Type	Description
`result`	`ValidationResult`	The validation result of the reproduction
`inputs`	`dict[str, np.ndarray]`	The input tensors that triggered the failure

`MinimizeResult`¶

Result of minimizing a failing test case.

Field	Type	Description
`original_seed`	`int`	The original failure seed
`minimized_seed`	`int`	Seed for the minimized test case
`minimized_shape`	`tuple[int, ...]`	The minimized input shape
`result`	`ValidationResult`	Validation result of the minimized case

`ReproductionInfo`¶

Metadata needed to exactly reproduce a test case.

Field	Type	Description
`seed`	`int`	RNG seed
`shape`	`tuple[int, ...]`	Input tensor shape
`strides`	`tuple[int, ...]`	Input tensor strides
`dtype`	`str`	Data type string
`layout`	`str`	Memory layout descriptor
`fuzz_config`	`FuzzConfig`	The fuzz configuration used
`input_snapshot`	`dict`	Serialized snapshot of input values

Validation Utilities¶

`validate_op()` Context Manager¶

A convenience context manager that wraps op execution with automatic validation.

from gpuemu.validation import validate_op

with validate_op("softmax", inputs={"logits": x}) as ctx:
    output = my_softmax(x)
    ctx.set_output(output)

assert ctx.result.passed

Fuzz Generators¶

Generators that yield randomized configurations for fuzz testing.

`fuzz_shapes()`¶

def fuzz_shapes(
    min_dims: int = 1,
    max_dims: int = 4,
    min_size: int = 1,
    max_size: int = 1024,
) -> Iterator[tuple[int, ...]]

Yields random tensor shapes.

`fuzz_dtypes()`¶

def fuzz_dtypes(
    include: list[str] | None = None,
    exclude: list[str] | None = None,
) -> Iterator[str]

Yields random dtype strings, optionally filtered.

`fuzz_layouts()`¶

def fuzz_layouts() -> Iterator[str]

Yields random memory layouts ("contiguous", "strided", "channels_last", etc.).

`fuzz_shapes_seeded()`¶

def fuzz_shapes_seeded(seed: int, **kwargs) -> Iterator[tuple[int, ...]]

Deterministic variant of fuzz_shapes() with a fixed seed.
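One way a seeded shape generator with these parameters could work is sketched below using the standard library's `random.Random`. The name `seeded_shapes` and the sampling scheme (uniform rank, uniform size per dimension) are assumptions; gpuemu's own generator may sample differently and will not produce the same sequence.

```python
import itertools
import random
from typing import Iterator, Tuple


def seeded_shapes(
    seed: int,
    min_dims: int = 1,
    max_dims: int = 4,
    min_size: int = 1,
    max_size: int = 1024,
) -> Iterator[Tuple[int, ...]]:
    """Yield an endless, reproducible stream of random tensor shapes."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    while True:
        ndim = rng.randint(min_dims, max_dims)  # randint is inclusive on both ends
        yield tuple(rng.randint(min_size, max_size) for _ in range(ndim))


# The same seed replays the same shapes.
first = list(itertools.islice(seeded_shapes(42), 5))
second = list(itertools.islice(seeded_shapes(42), 5))
print(first == second)
```

The generator never terminates on its own, so callers bound it with `itertools.islice` or a loop counter.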

`fuzz_dtypes_seeded()`¶

def fuzz_dtypes_seeded(seed: int, **kwargs) -> Iterator[str]

Deterministic variant of fuzz_dtypes() with a fixed seed.

`fuzz_layouts_seeded()`¶

def fuzz_layouts_seeded(seed: int) -> Iterator[str]

Deterministic variant of fuzz_layouts() with a fixed seed.

`generate_random_tensor()`¶

Generate a random tensor from a seed and specification.

def generate_random_tensor(
    seed: int,
    shape: tuple[int, ...],
    dtype: str = "float32",
    domain: tuple[float, float] = (-1.0, 1.0),
) -> np.ndarray

Parameter	Type	Default	Description
`seed`	`int`		RNG seed for reproducibility
`shape`	`tuple[int, ...]`		Tensor shape
`dtype`	`str`	`"float32"`	NumPy-compatible dtype string
`domain`	`tuple[float, float]`	`(-1.0, 1.0)`	Value range `(min, max)`

`FuzzConfig`¶

Configuration dataclass for fuzz testing sessions.

@dataclass
class FuzzConfig:
    iterations: int = 100
    seed: int | None = None
    min_dims: int = 1
    max_dims: int = 4
    min_size: int = 1
    max_size: int = 1024
    dtypes: list[str] = field(default_factory=lambda: ["float32"])
    layouts: list[str] = field(default_factory=lambda: ["contiguous"])

`SeededFuzzer`¶

A stateful fuzzer that generates reproducible test cases.

class SeededFuzzer:
    def __init__(self, seed: int, config: FuzzConfig | None = None): ...
    def next_test_case(self) -> TestCase: ...
    def run(self, op_fn: Callable) -> FuzzResults: ...

`TestCase`¶

@dataclass
class TestCase:
    seed: int
    shape: tuple[int, ...]
    dtype: str
    layout: str
    inputs: dict[str, np.ndarray]

RNG¶

Deterministic random number generation for reproducible testing.

`SeededRng`¶

A portable, seedable RNG that produces identical sequences across Python and Rust.

class SeededRng:
    def __init__(self, seed: int): ...
    def derive(self, domain: str) -> "SeededRng": ...
    def choice(self, items: list[T]) -> T: ...
    def gen_range(self, low: int, high: int) -> int: ...
    def gen_u64(self) -> int: ...
    def gen_f32(self) -> float: ...
    def randn(self, shape: tuple[int, ...]) -> np.ndarray: ...

Method	Description
`derive(domain)`	Create a child RNG scoped to a named domain
`choice(items)`	Pick a random element from a list
`gen_range(low, high)`	Generate an integer in `[low, high)`
`gen_u64()`	Generate a random unsigned 64-bit integer
`gen_f32()`	Generate a random float in `[0.0, 1.0)`
`randn(shape)`	Generate a tensor of normally distributed values

Standalone Functions¶

`derive_seed()`¶

def derive_seed(seed: int, domain: str) -> int

Derive a new seed by hashing the parent seed with a domain string.
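A minimal sketch of domain-based seed derivation, assuming SHA-256 over the parent seed and the domain string, truncated to 64 bits. The hash choice, the byte layout, and the name `derive_seed_sketch` are assumptions; the actual gpuemu derivation is not documented here, so this sketch will not reproduce gpuemu's seeds.

```python
import hashlib


def derive_seed_sketch(seed: int, domain: str) -> int:
    """Derive a child seed from a parent seed and a domain label."""
    # Fixed-width little-endian encoding keeps the input unambiguous.
    payload = (seed & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "little") + domain.encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    # Use the first 8 bytes of the digest as an unsigned 64-bit seed.
    return int.from_bytes(digest[:8], "little")


shape_seed = derive_seed_sketch(12345, "shapes")
dtype_seed = derive_seed_sketch(12345, "dtypes")
print(shape_seed != dtype_seed)
```

Deriving one seed per domain lets shape, dtype, and layout sampling each draw from its own stream while the whole session stays reproducible from a single root seed.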

`generate_seed()`¶

def generate_seed() -> int

Generate a fresh random seed from system entropy.

Tolerances¶

Utilities for managing numerical comparison tolerances.

`ToleranceConfig`¶

Configuration for a single tolerance check.

@dataclass
class ToleranceConfig:
    atol: float  # Absolute tolerance
    rtol: float  # Relative tolerance

Method	Description
`for_dtype(dtype: str)`	Return a `ToleranceConfig` appropriate for the given dtype
`strict()`	Return a strict tolerance (`atol=1e-7, rtol=1e-7`)
`relaxed()`	Return a relaxed tolerance (`atol=1e-3, rtol=1e-3`)
`scale(factor: float)`	Return a new config with tolerances scaled by `factor`
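How `atol` and `rtol` are typically combined can be sketched as follows. The comparison rule `|actual - expected| <= atol + rtol * |expected|` is the convention used by `numpy.isclose`; whether gpuemu applies exactly this rule is an assumption. `Tol` and `within` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tol:
    atol: float
    rtol: float

    def scale(self, factor: float) -> "Tol":
        """Return a new config with both tolerances multiplied by factor."""
        return Tol(self.atol * factor, self.rtol * factor)

    def within(self, actual: float, expected: float) -> bool:
        """numpy.isclose-style check; assumed, not confirmed, for gpuemu."""
        return abs(actual - expected) <= self.atol + self.rtol * abs(expected)


strict = Tol(atol=1e-7, rtol=1e-7)    # values from the strict() row above
relaxed = Tol(atol=1e-3, rtol=1e-3)   # values from the relaxed() row above

print(strict.within(1.0005, 1.0))   # False: the error of 5e-4 exceeds 2e-7
print(relaxed.within(1.0005, 1.0))  # True: the error of 5e-4 is below 2e-3
```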

`ToleranceProfile`¶

Named tolerance profiles for common use cases.

class ToleranceProfile:
    @staticmethod
    def get(name: str) -> ToleranceConfig: ...

    @staticmethod
    def for_testing() -> ToleranceConfig: ...

    @staticmethod
    def for_production() -> ToleranceConfig: ...

    @staticmethod
    def for_cross_framework() -> ToleranceConfig: ...

Profile	Description
`for_testing()`	Relaxed tolerances suitable for development
`for_production()`	Strict tolerances for production validation
`for_cross_framework()`	Tolerances accounting for cross-framework numerical variance

Standalone Functions¶

`calibrate_tolerance()`¶

def calibrate_tolerance(
    op_fn: Callable,
    ref_fn: Callable,
    shapes: list[tuple[int, ...]],
    dtype: str = "float32",
    n_samples: int = 100,
) -> ToleranceConfig

Empirically determine appropriate tolerances by running both functions on random inputs.
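The empirical approach can be sketched in plain Python: run both functions on random scalar inputs, record the worst absolute and relative error, and pad the result with a safety margin. The function name `calibrate_sketch`, the scalar inputs, the `(-1.0, 1.0)` sampling range, and the 2x `margin` are assumptions for illustration; the real function works on tensors of the given `shapes` and may use a different statistic.

```python
import math
import random
from typing import Callable, Tuple


def calibrate_sketch(
    op_fn: Callable[[float], float],
    ref_fn: Callable[[float], float],
    n_samples: int = 100,
    seed: int = 0,
    margin: float = 2.0,
) -> Tuple[float, float]:
    """Return (atol, rtol) covering the worst observed error times margin."""
    rng = random.Random(seed)
    max_abs = 0.0
    max_rel = 0.0
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        got, want = op_fn(x), ref_fn(x)
        diff = abs(got - want)
        max_abs = max(max_abs, diff)
        if want != 0.0:  # relative error is undefined at zero
            max_rel = max(max_rel, diff / abs(want))
    return max_abs * margin, max_rel * margin


def fast_exp(x: float) -> float:
    """Degree-4 Taylor polynomial, standing in for an approximate kernel."""
    return 1.0 + x + x**2 / 2.0 + x**3 / 6.0 + x**4 / 24.0


atol, rtol = calibrate_sketch(fast_exp, math.exp)
print(atol > 0.0 and rtol > 0.0)
```

Tolerances calibrated this way only cover the sampled input range, so inputs outside it may need a separate calibration run.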

`get_recommended_tolerance()`¶

def get_recommended_tolerance(
    dtype: str,
    op_type: str = "elementwise",
) -> ToleranceConfig

Return recommended tolerance values based on dtype and operation type.

Auto-generated API Documentation¶

`gpuemu.client.Client` ¶

Client for communicating with the gpuemu daemon.

Example

client = Client()
client.ping()

Source code in gpuemu/client.py

class Client:
    """Client for communicating with the gpuemu daemon.

    Example:
        >>> client = Client()
        >>> client.ping()
        {'version': '0.1.0', 'protocol_version': 1, 'uptime_secs': 123}
    """

    def __init__(
        self,
        socket_path: Optional[str] = None,
        timeout_ms: int = 30000,
    ):
        """Initialize the client.

        Args:
            socket_path: Path to the daemon socket. Defaults to ~/.gpuemu/gpuemu.sock
            timeout_ms: Timeout for requests in milliseconds.
        """
        if socket_path is None:
            socket_path = os.path.expanduser("~/.gpuemu/gpuemu.sock")

        self.socket_path = socket_path
        self.timeout_ms = timeout_ms
        self._socket = None
        self._version_checked = False

    def _ensure_connected(self):
        """Ensure we have a connection to the daemon."""
        if not HAS_PYNNG:
            raise ImportError(
                "pynng is required for gpuemu. Install with: pip install pynng"
            )

        if self._socket is None:
            sock = pynng.Req0()
            sock.recv_timeout = self.timeout_ms
            sock.send_timeout = self.timeout_ms

            socket_url = f"ipc://{self.socket_path}"
            try:
                sock.dial(socket_url)
            except pynng.exceptions.ConnectionRefused:
                raise ClientError(
                    f"Cannot connect to daemon at {self.socket_path}. "
                    "Is the daemon running? Start it with: gpuemu daemon start"
                )
            # Set the socket BEFORE the version check so the check's own request
            # reuses it (the guard prevents re-entrant reconnect/recursion).
            self._socket = sock
            if not self._version_checked:
                self._version_checked = True
                self._check_protocol_version()

        return self._socket

    def _check_protocol_version(self):
        """Verify daemon protocol version is compatible (called once on connect)."""
        try:
            ping_resp = self._send_request({"type": "Ping"})
            daemon_pv = ping_resp.get("protocol_version", 0)
            if daemon_pv != PROTOCOL_VERSION:
                raise ClientError(
                    f"Protocol version mismatch: client={PROTOCOL_VERSION}, "
                    f"daemon={daemon_pv}. Please upgrade the "
                    f"{'client' if daemon_pv > PROTOCOL_VERSION else 'daemon'}."
                )
        except ClientError:
            raise
        except Exception:
            pass

    def close(self):
        """Close the connection."""
        if self._socket is not None:
            self._socket.close()
            self._socket = None
        self._version_checked = False

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()

    def _send_request(self, request: Dict[str, Any]) -> Dict[str, Any]:
        """Send a request and return the response."""
        socket = self._ensure_connected()

        # Serialize request as JSON (simple protocol for MVP)
        request_bytes = json.dumps(request).encode("utf-8")

        try:
            socket.send(request_bytes)
            response_bytes = socket.recv()
            return json.loads(response_bytes.decode("utf-8"))
        except pynng.exceptions.Timeout:
            raise ClientError("Request timed out")
        except Exception as e:
            raise ClientError(f"Request failed: {e}")

    def ping(self) -> Dict[str, Any]:
        """Ping the daemon to check if it's alive.

        Returns:
            Dict with 'version', 'protocol_version', and 'uptime_secs'.

        Raises:
            ClientError: If the daemon has an incompatible protocol version.
        """
        response = self._send_request({"type": "Ping"})

        if response.get("type") == "Pong":
            daemon_pv = response.get("protocol_version", 0)
            if daemon_pv != PROTOCOL_VERSION:
                raise ClientError(
                    f"Protocol version mismatch: client={PROTOCOL_VERSION}, "
                    f"daemon={daemon_pv}. Please upgrade the "
                    f"{'client' if daemon_pv > PROTOCOL_VERSION else 'daemon'}."
                )
            return {
                "version": response.get("version", "unknown"),
                "protocol_version": daemon_pv,
                "uptime_secs": response.get("uptime_secs", 0),
            }
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    def validate_op(
        self,
        op_name: str,
        inputs: Dict[str, np.ndarray],
        output: np.ndarray,
        **kwargs,
    ) -> ValidationResult:
        """Validate an op output against its reference implementation.

        Args:
            op_name: Name of the op (must be registered in gpuemu.toml).
            inputs: Input tensors as numpy arrays.
            output: Output tensor to validate.
            **kwargs: Additional kwargs to pass to the reference script.

        Returns:
            ValidationResult with pass/fail status and details.
        """
        # Encode tensors for transmission
        encoded_inputs = {
            name: self._encode_tensor(arr) for name, arr in inputs.items()
        }
        encoded_output = self._encode_tensor(output)

        request = {
            "type": "ValidateOp",
            "op_name": op_name,
            "inputs": encoded_inputs,
            "output": encoded_output,
            "kwargs": {k: str(v) for k, v in kwargs.items()},
        }

        response = self._send_request(request)

        if response.get("type") == "ValidationResult":
            return ValidationResult.from_dict(response.get("result", {}))
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    def get_result(self, seed: int) -> Optional[ValidationResult]:
        """Get a stored validation result by seed.

        Args:
            seed: The seed of the validation run.

        Returns:
            ValidationResult if found, None otherwise.
        """
        request = {"type": "GetResult", "seed": seed}
        response = self._send_request(request)

        if response.get("type") == "ValidationResult":
            return ValidationResult.from_dict(response.get("result", {}))
        elif response.get("type") == "Error":
            if response.get("code") == "NotFound":
                return None
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    def list_results(self, limit: int = 100) -> List[ValidationResult]:
        """List recent validation results.

        Args:
            limit: Maximum number of results to return.

        Returns:
            List of ValidationResult objects.
        """
        request = {"type": "ListResults", "limit": limit}
        response = self._send_request(request)

        if response.get("type") == "Results":
            return [ValidationResult.from_dict(r) for r in response.get("results", [])]
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    def store_baseline(self, tag: str) -> None:
        """Store current results as a baseline.

        Args:
            tag: Tag name for the baseline.
        """
        request = {"type": "StoreBaseline", "tag": tag}
        response = self._send_request(request)

        if response.get("type") == "Ok":
            return
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    # =========================================================================
    # Phase 2: Fuzzing and Reproducibility
    # =========================================================================

    def fuzz_op(
        self,
        op_name: str,
        seed: Optional[int] = None,
        iterations: int = 100,
        fail_fast: bool = False,
        batch_sizes: Optional[List[int]] = None,
        seq_lengths: Optional[List[int]] = None,
        hidden_dims: Optional[List[int]] = None,
        dtypes: Optional[List[str]] = None,
        layouts: Optional[List[str]] = None,
    ) -> FuzzResults:
        """Fuzz test an op with random inputs.

        Args:
            op_name: Name of the op (must be registered in gpuemu.toml).
            seed: Master seed for reproducibility. If None, uses current timestamp.
            iterations: Number of test cases to generate.
            fail_fast: Stop on first failure.
            batch_sizes: List of batch sizes to use.
            seq_lengths: List of sequence lengths to use.
            hidden_dims: List of hidden dimensions to use.
            dtypes: List of dtype strings to use.
            layouts: List of layout types to use.

        Returns:
            FuzzResults with pass/fail counts and list of failures.

        Example:
            >>> results = client.fuzz_op("matmul", seed=12345, iterations=100)
            >>> print(f"Passed: {results.passed}/{results.total}")
            >>> for failure in results.failures:
            ...     print(f"  Seed {failure.seed}: {failure.failures[0]['message']}")
        """
        if seed is None:
            seed = int(time.time_ns()) & 0xFFFFFFFFFFFFFFFF

        # Build fuzz config
        fuzz_config = {
            "seed": seed,
            "shape_options": {
                "batch_sizes": batch_sizes or [1, 2, 4, 8, 16, 32],
                "seq_lengths": seq_lengths or [64, 128, 256, 512, 1024],
                "hidden_dims": hidden_dims or [256, 512, 768, 1024],
                "edge_cases": [[1], [1, 1], [1, 1, 1]],
            },
            "dtypes": dtypes or ["float32", "float16"],
            "layouts": layouts or ["Contiguous", "Strided", "Transposed"],
        }

        request = {
            "type": "FuzzOp",
            "op_name": op_name,
            "fuzz_config": fuzz_config,
            "iterations": iterations,
            "fail_fast": fail_fast,
        }

        response = self._send_request(request)

        if response.get("type") == "FuzzResults":
            return FuzzResults.from_dict(response)
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    def reproduce(self, seed: int) -> ReproduceResult:
        """Reproduce a failing test case by seed.

        Retrieves the stored failure and regenerates the exact inputs
        that caused the failure.

        Args:
            seed: The seed of the failing test case.

        Returns:
            ReproduceResult with the original result and regenerated inputs.

        Example:
            >>> repro = client.reproduce(12345)
            >>> print(f"Op: {repro.result.op_name}")
            >>> print(f"Input shape: {repro.inputs['input'].shape}")
        """
        request = {"type": "Reproduce", "seed": seed}
        response = self._send_request(request)

        if response.get("type") == "ReproduceResult":
            return ReproduceResult.from_dict(response, self._decode_tensor)
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    def minimize(
        self,
        seed: int,
        strategy: str = "binary-search-dims",
        max_iters: int = 100,
    ) -> MinimizeResult:
        """Minimize a failing test case.

        Attempts to find a smaller input that still triggers the failure.

        Args:
            seed: The seed of the failing test case.
            strategy: Minimization strategy. One of:
                - "binary-search-dims": Binary search to reduce dimensions.
                - "binary-search-values": Binary search to reduce values.
            max_iters: Maximum iterations for minimization.

        Returns:
            MinimizeResult with minimized seed, shape, and result.

        Example:
            >>> result = client.minimize(12345)
            >>> print(f"Minimized shape: {result.minimized_shape}")
        """
        # Convert strategy string to protocol enum
        strategy_map = {
            "binary-search-dims": "BinarySearchDims",
            "binary-search-values": "BinarySearchValues",
        }
        proto_strategy = strategy_map.get(strategy, "BinarySearchDims")

        request = {
            "type": "Minimize",
            "seed": seed,
            "strategy": proto_strategy,
            "max_iters": max_iters,
        }
        response = self._send_request(request)

        if response.get("type") == "MinimizeResult":
            return MinimizeResult.from_dict(response)
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    def list_failures(self, limit: int = 20) -> List[ValidationResult]:
        """List stored failures.

        Args:
            limit: Maximum number of failures to return.

        Returns:
            List of ValidationResult objects for failed tests.

        Example:
            >>> failures = client.list_failures(limit=10)
            >>> for f in failures:
            ...     print(f"Seed {f.seed}: {f.op_name}")
        """
        request = {"type": "ListFailures", "limit": limit}
        response = self._send_request(request)

        if response.get("type") == "Results":
            return [ValidationResult.from_dict(r) for r in response.get("results", [])]
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    # =========================================================================
    # Phase 3: Artifact Inspection
    # =========================================================================

    def lint_kernel(
        self, ptx_content: str, kernel_name: Optional[str] = None
    ) -> List[Dict[str, Any]]:
        """Lint PTX through the daemon's artifact analyzer.

        Extracts static metrics (registers, spills, local memory, instruction mix)
        and checks them against configured thresholds. If no kernel is registered,
        the daemon detects the kernel name from the PTX and uses default thresholds.

        Args:
            ptx_content: Raw PTX assembly text.
            kernel_name: Optional kernel name to lint (else all / detected).

        Returns:
            List of lint-result dicts, each with keys: kernel_name, passed,
            metrics (register_count, spill_count, ...), violations, timestamp.
        """
        request = {
            "type": "LintKernel",
            "kernel_name": kernel_name,
            "ptx_content": ptx_content,
        }
        response = self._send_request(request)
        if response.get("type") == "LintResults":
            return response.get("results", [])
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        raise ClientError(f"Unexpected response: {response}")

    @staticmethod
    def _encode_tensor(arr: np.ndarray) -> Dict[str, Any]:
        """Encode a numpy array for transmission."""
        return {
            "shape": list(arr.shape),
            "strides": list(arr.strides),
            "dtype": Client._numpy_dtype_to_protocol(arr.dtype),
            "data": base64.b64encode(arr.tobytes()).decode("utf-8"),
        }

    @staticmethod
    def _numpy_dtype_to_protocol(dtype: np.dtype) -> str:
        """Convert a numpy dtype to the protocol dtype string.

        Maps numpy dtypes to the Rust DType enum variant names
        (lowercase, matching serde serialization).
        """
        mapping = {
            "float16": "float16",
            "float32": "float32",
            "float64": "float64",
            "int8": "int8",
            "int16": "int16",
            "int32": "int32",
            "int64": "int64",
            "uint8": "uint8",
            "uint16": "uint16",
            "uint32": "uint32",
            "uint64": "uint64",
            "bool": "bool",
        }
        name = str(dtype)
        if name in mapping:
            return mapping[name]
        if "bfloat16" in name or "bf16" in name:
            return "bfloat16"
        return name

    @staticmethod
    def _protocol_dtype_to_numpy(dtype_str: str) -> np.dtype:
        """Convert a protocol dtype string back to a numpy dtype.

        Handles bfloat16 by falling back to float16 as proxy,
        since numpy has no native bfloat16.
        """
        mapping = {
            "float16": np.float16,
            "bfloat16": np.float16,
            "float32": np.float32,
            "float64": np.float64,
            "int8": np.int8,
            "int16": np.int16,
            "int32": np.int32,
            "int64": np.int64,
            "uint8": np.uint8,
            "uint16": np.uint16,
            "uint32": np.uint32,
            "uint64": np.uint64,
            "bool": np.bool_,
        }
        return np.dtype(mapping.get(dtype_str, np.float32))

    @staticmethod
    def _decode_tensor(data: Dict[str, Any]) -> np.ndarray:
        """Decode a numpy array from transmission format."""
        shape = tuple(data["shape"])
        dtype = Client._protocol_dtype_to_numpy(data.get("dtype", "float32"))
        raw = base64.b64decode(data["data"])
        return np.frombuffer(raw, dtype=dtype).reshape(shape).copy()

    # =========================================================================
    # Execution Modes: Client-Side Fuzzing
    # =========================================================================

    def get_test_case(self, op_name: str, seed: Optional[int] = None) -> Dict[str, Any]:
        """Get a single test case from the daemon for client-side execution.

        The daemon generates random inputs. The client runs the actual op
        on GPU and submits the output for validation via submit_output().

        Args:
            op_name: Name of the op (must be registered in gpuemu.toml).
            seed: Master seed for reproducibility. Auto-generated if None.

        Returns:
            Dict with 'seed', 'inputs' (dict of name->ndarray), 'shape', 'dtype', 'layout'.
        """
        if seed is None:
            seed = int(time.time_ns()) & 0xFFFFFFFFFFFFFFFF

        fuzz_config = {
            "seed": seed,
            "shape_options": {
                "batch_sizes": [1, 2, 4, 8],
                "seq_lengths": [64, 128, 256],
                "hidden_dims": [256, 512],
                "edge_cases": [[1], [1, 1]],
            },
            "dtypes": ["float32", "float16"],
            "layouts": ["Contiguous", "Strided"],
        }

        request = {
            "type": "GetTestCase",
            "op_name": op_name,
            "fuzz_config": fuzz_config,
        }

        response = self._send_request(request)

        if response.get("type") == "TestCase":
            inputs = {
                name: self._decode_tensor(tensor)
                for name, tensor in response.get("inputs", {}).items()
            }
            return {
                "seed": response.get("seed", 0),
                "inputs": inputs,
                "shape": response.get("shape", []),
                "dtype": response.get("dtype", "float32"),
                "layout": response.get("layout", "contiguous"),
            }
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    def get_test_batch(
        self,
        op_name: str,
        count: int = 10,
        seed: Optional[int] = None,
        op_schema: Optional[Dict[str, Any]] = None,
        dtypes: Optional[List[str]] = None,
    ) -> List[Dict[str, Any]]:
        """Get a batch of test cases from the daemon.

        Args:
            op_name: Name of the op.
            count: Number of test cases to generate.
            seed: Master seed. Auto-generated if None.
            op_schema: Optional operator-aware shape schema. When provided, the
                daemon generates per-input shapes from shared symbolic dims
                (e.g. matmul A[M,K]/B[K,N]) instead of one shape for all inputs.
                Shape: {"name", "dims": [{"name","candidates"}],
                        "inputs": [{"name","dims"}], "output": {"name","dims"}}.

        Returns:
            List of test case dicts (same format as get_test_case).
        """
        if seed is None:
            seed = int(time.time_ns()) & 0xFFFFFFFFFFFFFFFF

        fuzz_config = {
            "seed": seed,
            "shape_options": {
                "batch_sizes": [1, 2, 4, 8],
                "seq_lengths": [64, 128, 256],
                "hidden_dims": [256, 512],
                "edge_cases": [[1], [1, 1]],
            },
            "dtypes": dtypes or ["float32", "float16"],
            "layouts": ["Contiguous", "Strided"],
        }
        if op_schema is not None:
            fuzz_config["op_schema"] = op_schema

        request = {
            "type": "GetTestBatch",
            "op_name": op_name,
            "fuzz_config": fuzz_config,
            "count": count,
        }

        response = self._send_request(request)

        if response.get("type") == "TestBatch":
            cases = []
            for case_data in response.get("cases", []):
                inputs = {
                    name: self._decode_tensor(tensor)
                    for name, tensor in case_data.get("inputs", {}).items()
                }
                cases.append(
                    {
                        "seed": case_data.get("seed", 0),
                        "inputs": inputs,
                        "shape": case_data.get("shape", []),
                        "dtype": case_data.get("dtype", "float32"),
                        "layout": case_data.get("layout", "contiguous"),
                    }
                )
            return cases
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    def submit_output(
        self,
        op_name: str,
        inputs: Dict[str, np.ndarray],
        output: np.ndarray,
        seed: int,
        **kwargs,
    ) -> ValidationResult:
        """Submit an op output for validation against the reference.

        This is the core method for client-side and daemon-orchestrated
        execution modes. The client runs the actual GPU op and submits
        the result here for comparison.

        Args:
            op_name: Name of the op (must be registered in gpuemu.toml).
            inputs: Input tensors as numpy arrays.
            output: Output tensor from the op under test.
            seed: Seed of the test case (from get_test_case or get_test_batch).
            **kwargs: Additional kwargs for the reference script.

        Returns:
            ValidationResult with pass/fail status and details.
        """
        encoded_inputs = {
            name: self._encode_tensor(arr) for name, arr in inputs.items()
        }
        encoded_output = self._encode_tensor(output)

        request = {
            "type": "SubmitOutput",
            "op_name": op_name,
            "inputs": encoded_inputs,
            "output": encoded_output,
            "seed": seed,
            "kwargs": {k: str(v) for k, v in kwargs.items()},
        }

        response = self._send_request(request)

        if response.get("type") == "SubmitResult":
            return ValidationResult.from_dict(response.get("result", {}))
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    def fuzz_op_client_side(
        self,
        op_name: str,
        run_op: "Callable[[Dict[str, np.ndarray]], np.ndarray]",
        iterations: int = 100,
        seed: Optional[int] = None,
        fail_fast: bool = False,
        op_schema: Optional[Dict[str, Any]] = None,
        dtypes: Optional[List[str]] = None,
    ) -> FuzzResults:
        """Fuzz an op using client-side execution (THE RECOMMENDED DROP-IN PATH).

        This method generates random inputs via the daemon, runs the provided
        ``run_op`` callable on the client (which has GPU access), and validates
        the output against the reference script. This is how GPU developers
        should use gpuemu for fuzzing.

        Args:
            op_name: Name of the op (must be registered in gpuemu.toml).
            run_op: A callable that takes a dict of input tensors and returns
                     the output tensor. This is where you call your GPU kernel.
            iterations: Number of test cases to try.
            seed: Master seed. Auto-generated if None.
            fail_fast: Stop on first failure.
            op_schema: Optional operator-aware shape schema (see get_test_batch).
                Use for ops whose inputs have different but linked shapes
                (matmul, attention) so fuzzing covers the real operator domain.

        Returns:
            FuzzResults with pass/fail counts and list of failures.

        Example:
            >>> client = Client()
            >>> results = client.fuzz_op_client_side(
            ...     "my_flash_attention",
            ...     run_op=lambda inputs: my_flash_attn(inputs["q"], inputs["k"], inputs["v"]),
            ...     iterations=50,
            ... )
            >>> print(f"Passed: {results.passed}/{results.total}")
        """
        if seed is None:
            seed = int(time.time_ns()) & 0xFFFFFFFFFFFFFFFF

        cases = self.get_test_batch(
            op_name, count=iterations, seed=seed, op_schema=op_schema, dtypes=dtypes
        )
        total = 0
        passed = 0
        failed = 0
        failures = []

        for case in cases:
            total += 1
            try:
                output = run_op(case["inputs"])
                result = self.submit_output(
                    op_name, case["inputs"], output, case["seed"]
                )
                if result.passed:
                    passed += 1
                else:
                    failed += 1
                    failures.append(result)
                    if fail_fast:
                        break
            except Exception as e:
                failed += 1
                failures.append(
                    ValidationResult(
                        passed=False,
                        seed=case["seed"],
                        op_name=op_name,
                        max_diff=float("inf"),
                        max_rel_diff=float("inf"),
                        failures=[{"kind": "ExecutionError", "message": str(e)}],
                        timestamp=int(time.time()),
                        duration_ms=0,
                    )
                )
                if fail_fast:
                    break

        return FuzzResults(
            seed=seed,
            total=total,
            passed=passed,
            failed=failed,
            failures=failures,
        )

`__init__(socket_path=None, timeout_ms=30000)` ¶

Initialize the client.

Parameters:

Name	Type	Description	Default
`socket_path`	`Optional[str]`	Path to the daemon socket. Defaults to ~/.gpuemu/gpuemu.sock	`None`
`timeout_ms`	`int`	Timeout for requests in milliseconds.	`30000`

Source code in gpuemu/client.py

def __init__(
    self,
    socket_path: Optional[str] = None,
    timeout_ms: int = 30000,
):
    """Initialize the client.

    Args:
        socket_path: Path to the daemon socket. Defaults to ~/.gpuemu/gpuemu.sock
        timeout_ms: Timeout for requests in milliseconds.
    """
    if socket_path is None:
        socket_path = os.path.expanduser("~/.gpuemu/gpuemu.sock")

    self.socket_path = socket_path
    self.timeout_ms = timeout_ms
    self._socket = None
    self._version_checked = False

`close()` ¶

Close the connection.

Source code in gpuemu/client.py

def close(self):
    """Close the connection."""
    if self._socket is not None:
        self._socket.close()
        self._socket = None
    self._version_checked = False

`ping()` ¶

Ping the daemon to check if it's alive.

Returns:

Type	Description
`Dict[str, Any]`	Dict with 'version', 'protocol_version', and 'uptime_secs'.

Raises:

Type	Description
`ClientError`	If the daemon has an incompatible protocol version.

Source code in gpuemu/client.py

def ping(self) -> Dict[str, Any]:
    """Ping the daemon to check if it's alive.

    Returns:
        Dict with 'version', 'protocol_version', and 'uptime_secs'.

    Raises:
        ClientError: If the daemon has an incompatible protocol version.
    """
    response = self._send_request({"type": "Ping"})

    if response.get("type") == "Pong":
        daemon_pv = response.get("protocol_version", 0)
        if daemon_pv != PROTOCOL_VERSION:
            raise ClientError(
                f"Protocol version mismatch: client={PROTOCOL_VERSION}, "
                f"daemon={daemon_pv}. Please upgrade the "
                f"{'client' if daemon_pv > PROTOCOL_VERSION else 'daemon'}."
            )
        return {
            "version": response.get("version", "unknown"),
            "protocol_version": daemon_pv,
            "uptime_secs": response.get("uptime_secs", 0),
        }
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")
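
The version check in `ping()` can be exercised without a running daemon. The following is a minimal sketch that re-implements only that branch as a standalone function. The names `check_protocol` and `mismatch_message`, the local `ClientError` stand-in, and the value `PROTOCOL_VERSION = 3` are assumptions for illustration and are not part of gpuemu.

```python
PROTOCOL_VERSION = 3  # assumed value, for illustration only


class ClientError(Exception):
    """Local stand-in for gpuemu's ClientError."""


def check_protocol(response: dict, client_pv: int = PROTOCOL_VERSION) -> int:
    """Return the daemon protocol version, or raise if it differs from ours."""
    # A Pong without a protocol_version field is treated as version 0.
    daemon_pv = response.get("protocol_version", 0)
    if daemon_pv != client_pv:
        # Whichever side reports the lower version is asked to upgrade.
        side = "client" if daemon_pv > client_pv else "daemon"
        raise ClientError(
            f"Protocol version mismatch: client={client_pv}, "
            f"daemon={daemon_pv}. Please upgrade the {side}."
        )
    return daemon_pv


def mismatch_message(response: dict) -> str:
    """Return the error text for a mismatching response, or '' on success."""
    try:
        check_protocol(response)
    except ClientError as exc:
        return str(exc)
    return ""
```

One consequence worth noting: a response that omits `protocol_version` compares as version 0, so it is reported as an outdated daemon rather than accepted.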

`validate_op(op_name, inputs, output, **kwargs)` ¶

Validate an op output against its reference implementation.

Parameters:

Name	Type	Description	Default
`op_name`	`str`	Name of the op (must be registered in gpuemu.toml).	required
`inputs`	`Dict[str, ndarray]`	Input tensors as numpy arrays.	required
`output`	`ndarray`	Output tensor to validate.	required
`**kwargs`		Additional kwargs to pass to the reference script.	`{}`

Returns:

Type	Description
`ValidationResult`	ValidationResult with pass/fail status and details.

Source code in gpuemu/client.py

def validate_op(
    self,
    op_name: str,
    inputs: Dict[str, np.ndarray],
    output: np.ndarray,
    **kwargs,
) -> ValidationResult:
    """Validate an op output against its reference implementation.

    Args:
        op_name: Name of the op (must be registered in gpuemu.toml).
        inputs: Input tensors as numpy arrays.
        output: Output tensor to validate.
        **kwargs: Additional kwargs to pass to the reference script.

    Returns:
        ValidationResult with pass/fail status and details.
    """
    # Encode tensors for transmission
    encoded_inputs = {
        name: self._encode_tensor(arr) for name, arr in inputs.items()
    }
    encoded_output = self._encode_tensor(output)

    request = {
        "type": "ValidateOp",
        "op_name": op_name,
        "inputs": encoded_inputs,
        "output": encoded_output,
        "kwargs": {k: str(v) for k, v in kwargs.items()},
    }

    response = self._send_request(request)

    if response.get("type") == "ValidationResult":
        return ValidationResult.from_dict(response.get("result", {}))
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")
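
As the source above shows, `validate_op()` converts every extra keyword argument with `str()` before sending it, so the reference script receives strings rather than typed values. The sketch below reproduces only that step. `encode_kwargs` and `parse_bool` are hypothetical helper names, and how a reference script actually parses its arguments is an assumption.

```python
def encode_kwargs(**kwargs) -> dict:
    """Mirror the `{k: str(v) for k, v in kwargs.items()}` step."""
    return {k: str(v) for k, v in kwargs.items()}


def parse_bool(text: str) -> bool:
    """One way a reference script might recover a boolean flag."""
    return text == "True"


# Booleans, floats, and None all arrive as plain strings.
wire = encode_kwargs(causal=True, scale=0.125, window=None)
causal = parse_bool(wire["causal"])
scale = float(wire["scale"])
```

In particular, `None` is transmitted as the string `"None"`, not as a null value, so a reference script that distinguishes "unset" from a real value has to check for that string explicitly.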

`get_result(seed)` ¶

Get a stored validation result by seed.

Parameters:

Name	Type	Description	Default
`seed`	`int`	The seed of the validation run.	required

Returns:

Type	Description
`Optional[ValidationResult]`	ValidationResult if found, None otherwise.

Source code in gpuemu/client.py

def get_result(self, seed: int) -> Optional[ValidationResult]:
    """Get a stored validation result by seed.

    Args:
        seed: The seed of the validation run.

    Returns:
        ValidationResult if found, None otherwise.
    """
    request = {"type": "GetResult", "seed": seed}
    response = self._send_request(request)

    if response.get("type") == "ValidationResult":
        return ValidationResult.from_dict(response.get("result", {}))
    elif response.get("type") == "Error":
        if response.get("code") == "NotFound":
            return None
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")

`list_results(limit=100)` ¶

List recent validation results.

Parameters:

Name	Type	Description	Default
`limit`	`int`	Maximum number of results to return.	`100`

Returns:

Type	Description
`List[ValidationResult]`	List of ValidationResult objects.

Source code in gpuemu/client.py

def list_results(self, limit: int = 100) -> List[ValidationResult]:
    """List recent validation results.

    Args:
        limit: Maximum number of results to return.

    Returns:
        List of ValidationResult objects.
    """
    request = {"type": "ListResults", "limit": limit}
    response = self._send_request(request)

    if response.get("type") == "Results":
        return [ValidationResult.from_dict(r) for r in response.get("results", [])]
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")

`store_baseline(tag)` ¶

Store current results as a baseline.

Parameters:

Name	Type	Description	Default
`tag`	`str`	Tag name for the baseline.	required

Source code in gpuemu/client.py

def store_baseline(self, tag: str) -> None:
    """Store current results as a baseline.

    Args:
        tag: Tag name for the baseline.
    """
    request = {"type": "StoreBaseline", "tag": tag}
    response = self._send_request(request)

    if response.get("type") == "Ok":
        return
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")

`fuzz_op(op_name, seed=None, iterations=100, fail_fast=False, batch_sizes=None, seq_lengths=None, hidden_dims=None, dtypes=None, layouts=None)` ¶

Fuzz test an op with random inputs.

Parameters:

Name	Type	Description	Default
`op_name`	`str`	Name of the op (must be registered in gpuemu.toml).	required
`seed`	`Optional[int]`	Master seed for reproducibility. If None, derived from the current nanosecond timestamp (masked to 64 bits).	`None`
`iterations`	`int`	Number of test cases to generate.	`100`
`fail_fast`	`bool`	Stop on first failure.	`False`
`batch_sizes`	`Optional[List[int]]`	List of batch sizes to use.	`None`
`seq_lengths`	`Optional[List[int]]`	List of sequence lengths to use.	`None`
`hidden_dims`	`Optional[List[int]]`	List of hidden dimensions to use.	`None`
`dtypes`	`Optional[List[str]]`	List of dtype strings to use.	`None`
`layouts`	`Optional[List[str]]`	List of layout types to use.	`None`

Returns:

Type	Description
`FuzzResults`	FuzzResults with pass/fail counts and list of failures.

Example

results = client.fuzz_op("matmul", seed=12345, iterations=100)
print(f"Passed: {results.passed}/{results.total}")
for failure in results.failures:
    print(f"  Seed {failure.seed}: {failure.failures[0]['message']}")

Source code in gpuemu/client.py

def fuzz_op(
    self,
    op_name: str,
    seed: Optional[int] = None,
    iterations: int = 100,
    fail_fast: bool = False,
    batch_sizes: Optional[List[int]] = None,
    seq_lengths: Optional[List[int]] = None,
    hidden_dims: Optional[List[int]] = None,
    dtypes: Optional[List[str]] = None,
    layouts: Optional[List[str]] = None,
) -> FuzzResults:
    """Fuzz test an op with random inputs.

    Args:
        op_name: Name of the op (must be registered in gpuemu.toml).
        seed: Master seed for reproducibility. If None, uses current timestamp.
        iterations: Number of test cases to generate.
        fail_fast: Stop on first failure.
        batch_sizes: List of batch sizes to use.
        seq_lengths: List of sequence lengths to use.
        hidden_dims: List of hidden dimensions to use.
        dtypes: List of dtype strings to use.
        layouts: List of layout types to use.

    Returns:
        FuzzResults with pass/fail counts and list of failures.

    Example:
        >>> results = client.fuzz_op("matmul", seed=12345, iterations=100)
        >>> print(f"Passed: {results.passed}/{results.total}")
        >>> for failure in results.failures:
        ...     print(f"  Seed {failure.seed}: {failure.failures[0]['message']}")
    """
    if seed is None:
        seed = int(time.time_ns()) & 0xFFFFFFFFFFFFFFFF

    # Build fuzz config
    fuzz_config = {
        "seed": seed,
        "shape_options": {
            "batch_sizes": batch_sizes or [1, 2, 4, 8, 16, 32],
            "seq_lengths": seq_lengths or [64, 128, 256, 512, 1024],
            "hidden_dims": hidden_dims or [256, 512, 768, 1024],
            "edge_cases": [[1], [1, 1], [1, 1, 1]],
        },
        "dtypes": dtypes or ["float32", "float16"],
        "layouts": layouts or ["Contiguous", "Strided", "Transposed"],
    }

    request = {
        "type": "FuzzOp",
        "op_name": op_name,
        "fuzz_config": fuzz_config,
        "iterations": iterations,
        "fail_fast": fail_fast,
    }

    response = self._send_request(request)

    if response.get("type") == "FuzzResults":
        return FuzzResults.from_dict(response)
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")
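
The defaulting rules in `fuzz_op()` are easy to misread, so here is a small sketch that rebuilds a subset of the `fuzz_config` dict outside the client. `build_fuzz_config` is a hypothetical helper, not a gpuemu function; the default lists are copied from the source above.

```python
import time
from typing import List, Optional


def build_fuzz_config(
    seed: Optional[int] = None,
    batch_sizes: Optional[List[int]] = None,
    dtypes: Optional[List[str]] = None,
) -> dict:
    """Rebuild part of the fuzz_config dict that fuzz_op() sends."""
    if seed is None:
        # Nanosecond timestamp, masked so it fits an unsigned 64-bit integer.
        seed = int(time.time_ns()) & 0xFFFFFFFFFFFFFFFF
    return {
        "seed": seed,
        "shape_options": {
            # `or` treats None and [] alike: both select the defaults.
            "batch_sizes": batch_sizes or [1, 2, 4, 8, 16, 32],
        },
        "dtypes": dtypes or ["float32", "float16"],
    }


pinned = build_fuzz_config(seed=12345, batch_sizes=[3], dtypes=["float16"])
fallback = build_fuzz_config(seed=12345, batch_sizes=[])
auto = build_fuzz_config()
```

Because the defaults are applied with `or`, passing an empty list does not mean "no candidates"; it silently selects the built-in list, exactly as `None` does.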

`reproduce(seed)` ¶

Reproduce a failing test case by seed.

Retrieves the stored failure and regenerates the exact inputs that caused the failure.

Parameters:

Name	Type	Description	Default
`seed`	`int`	The seed of the failing test case.	required

Returns:

Type	Description
`ReproduceResult`	ReproduceResult with the original result and regenerated inputs.

Example

repro = client.reproduce(12345)
print(f"Op: {repro.result.op_name}")
print(f"Input shape: {repro.inputs['input'].shape}")

Source code in gpuemu/client.py

def reproduce(self, seed: int) -> ReproduceResult:
    """Reproduce a failing test case by seed.

    Retrieves the stored failure and regenerates the exact inputs
    that caused the failure.

    Args:
        seed: The seed of the failing test case.

    Returns:
        ReproduceResult with the original result and regenerated inputs.

    Example:
        >>> repro = client.reproduce(12345)
        >>> print(f"Op: {repro.result.op_name}")
        >>> print(f"Input shape: {repro.inputs['input'].shape}")
    """
    request = {"type": "Reproduce", "seed": seed}
    response = self._send_request(request)

    if response.get("type") == "ReproduceResult":
        return ReproduceResult.from_dict(response, self._decode_tensor)
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")

`minimize(seed, strategy='binary-search-dims', max_iters=100)` ¶

Minimize a failing test case.

Attempts to find a smaller input that still triggers the failure.

Parameters:

Name	Type	Description	Default
`seed`	`int`	The seed of the failing test case.	required
`strategy`	`str`	Minimization strategy: "binary-search-dims" (binary search to reduce dimensions) or "binary-search-values" (binary search to reduce values). Any other value falls back to "binary-search-dims".	`'binary-search-dims'`
`max_iters`	`int`	Maximum iterations for minimization.	`100`

Returns:

Type	Description
`MinimizeResult`	MinimizeResult with minimized seed, shape, and result.

Example

result = client.minimize(12345)
print(f"Minimized shape: {result.minimized_shape}")

Source code in gpuemu/client.py

def minimize(
    self,
    seed: int,
    strategy: str = "binary-search-dims",
    max_iters: int = 100,
) -> MinimizeResult:
    """Minimize a failing test case.

    Attempts to find a smaller input that still triggers the failure.

    Args:
        seed: The seed of the failing test case.
        strategy: Minimization strategy. One of:
            - "binary-search-dims": Binary search to reduce dimensions.
            - "binary-search-values": Binary search to reduce values.
        max_iters: Maximum iterations for minimization.

    Returns:
        MinimizeResult with minimized seed, shape, and result.

    Example:
        >>> result = client.minimize(12345)
        >>> print(f"Minimized shape: {result.minimized_shape}")
    """
    # Convert strategy string to protocol enum
    strategy_map = {
        "binary-search-dims": "BinarySearchDims",
        "binary-search-values": "BinarySearchValues",
    }
    proto_strategy = strategy_map.get(strategy, "BinarySearchDims")

    request = {
        "type": "Minimize",
        "seed": seed,
        "strategy": proto_strategy,
        "max_iters": max_iters,
    }
    response = self._send_request(request)

    if response.get("type") == "MinimizeResult":
        return MinimizeResult.from_dict(response)
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")
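
The strategy lookup in `minimize()` uses `dict.get` with a default, so a misspelled strategy name is not rejected. The sketch below isolates that lookup and pairs it with a stricter check that callers could run first. `to_proto_strategy` and `checked_strategy` are hypothetical names introduced for illustration.

```python
STRATEGY_MAP = {
    "binary-search-dims": "BinarySearchDims",
    "binary-search-values": "BinarySearchValues",
}


def to_proto_strategy(strategy: str) -> str:
    """Same lookup as minimize(): unknown names fall back silently."""
    return STRATEGY_MAP.get(strategy, "BinarySearchDims")


def checked_strategy(strategy: str) -> str:
    """Stricter variant: raise instead of falling back on unknown names."""
    if strategy not in STRATEGY_MAP:
        raise ValueError(
            f"unknown strategy {strategy!r}; expected one of {sorted(STRATEGY_MAP)}"
        )
    return strategy
```

Calling `checked_strategy(name)` before `client.minimize(seed, strategy=name)` would surface a typo such as `"binary_search_values"` immediately, instead of running a dimension search the caller did not ask for.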

`list_failures(limit=20)` ¶

List stored failures.

Parameters:

Name	Type	Description	Default
`limit`	`int`	Maximum number of failures to return.	`20`

Returns:

Type	Description
`List[ValidationResult]`	List of ValidationResult objects for failed tests.

Example

failures = client.list_failures(limit=10)
for f in failures:
    print(f"Seed {f.seed}: {f.op_name}")

Source code in gpuemu/client.py

def list_failures(self, limit: int = 20) -> List[ValidationResult]:
    """List stored failures.

    Args:
        limit: Maximum number of failures to return.

    Returns:
        List of ValidationResult objects for failed tests.

    Example:
        >>> failures = client.list_failures(limit=10)
        >>> for f in failures:
        ...     print(f"Seed {f.seed}: {f.op_name}")
    """
    request = {"type": "ListFailures", "limit": limit}
    response = self._send_request(request)

    if response.get("type") == "Results":
        return [ValidationResult.from_dict(r) for r in response.get("results", [])]
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")

`lint_kernel(ptx_content, kernel_name=None)` ¶

Lint PTX through the daemon's artifact analyzer.

Extracts static metrics (registers, spills, local memory, instruction mix) and checks them against configured thresholds. If no kernel is registered, the daemon detects the kernel name from the PTX and uses default thresholds.

Parameters:

Name	Type	Description	Default
`ptx_content`	`str`	Raw PTX assembly text.	required
`kernel_name`	`Optional[str]`	Optional kernel name to lint (else all / detected).	`None`

Returns:

Type	Description
`List[Dict[str, Any]]`	List of lint-result dicts, each with keys: kernel_name, passed, metrics (register_count, spill_count, ...), violations, timestamp.

Source code in gpuemu/client.py

def lint_kernel(
    self, ptx_content: str, kernel_name: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Lint PTX through the daemon's artifact analyzer.

    Extracts static metrics (registers, spills, local memory, instruction mix)
    and checks them against configured thresholds. If no kernel is registered,
    the daemon detects the kernel name from the PTX and uses default thresholds.

    Args:
        ptx_content: Raw PTX assembly text.
        kernel_name: Optional kernel name to lint (else all / detected).

    Returns:
        List of lint-result dicts, each with keys: kernel_name, passed,
        metrics (register_count, spill_count, ...), violations, timestamp.
    """
    request = {
        "type": "LintKernel",
        "kernel_name": kernel_name,
        "ptx_content": ptx_content,
    }
    response = self._send_request(request)
    if response.get("type") == "LintResults":
        return response.get("results", [])
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    raise ClientError(f"Unexpected response: {response}")
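
Since `lint_kernel()` returns plain dicts rather than result objects, post-processing is ordinary dictionary work. The following sketch filters a hand-written sample in the documented shape. The sample values, the kernel names, and the assumption that `violations` holds strings are all invented for illustration; a real list would come from `client.lint_kernel(ptx_text)`.

```python
from typing import Any, Dict, List

# Hypothetical sample data using the documented keys.
sample_results: List[Dict[str, Any]] = [
    {
        "kernel_name": "softmax_fwd",
        "passed": True,
        "metrics": {"register_count": 32, "spill_count": 0},
        "violations": [],
        "timestamp": 1700000000,
    },
    {
        "kernel_name": "attention_bwd",
        "passed": False,
        "metrics": {"register_count": 255, "spill_count": 12},
        "violations": ["spill_count exceeds threshold"],
        "timestamp": 1700000001,
    },
]


def failing_kernels(results: List[Dict[str, Any]]) -> List[str]:
    """Names of kernels whose lint result did not pass."""
    return [r["kernel_name"] for r in results if not r.get("passed", False)]


def spilling_kernels(results: List[Dict[str, Any]]) -> List[str]:
    """Names of kernels reporting at least one register spill."""
    return [
        r["kernel_name"]
        for r in results
        if r.get("metrics", {}).get("spill_count", 0) > 0
    ]
```

Using `.get()` with defaults keeps the helpers tolerant of results that omit a metric, which seems prudent given that the documented metric list ends in an ellipsis.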

`get_test_case(op_name, seed=None)` ¶

Get a single test case from the daemon for client-side execution.

The daemon generates random inputs. The client runs the actual op on GPU and submits the output for validation via submit_output().

Parameters:

Name	Type	Description	Default
`op_name`	`str`	Name of the op (must be registered in gpuemu.toml).	required
`seed`	`Optional[int]`	Master seed for reproducibility. Auto-generated if None.	`None`

Returns:

Type	Description
`Dict[str, Any]`	Dict with 'seed', 'inputs' (dict of name->ndarray), 'shape', 'dtype', 'layout'.

Source code in gpuemu/client.py

def get_test_case(self, op_name: str, seed: Optional[int] = None) -> Dict[str, Any]:
    """Get a single test case from the daemon for client-side execution.

    The daemon generates random inputs. The client runs the actual op
    on GPU and submits the output for validation via submit_output().

    Args:
        op_name: Name of the op (must be registered in gpuemu.toml).
        seed: Master seed for reproducibility. Auto-generated if None.

    Returns:
        Dict with 'seed', 'inputs' (dict of name->ndarray), 'shape', 'dtype', 'layout'.
    """
    if seed is None:
        seed = int(time.time_ns()) & 0xFFFFFFFFFFFFFFFF

    fuzz_config = {
        "seed": seed,
        "shape_options": {
            "batch_sizes": [1, 2, 4, 8],
            "seq_lengths": [64, 128, 256],
            "hidden_dims": [256, 512],
            "edge_cases": [[1], [1, 1]],
        },
        "dtypes": ["float32", "float16"],
        "layouts": ["Contiguous", "Strided"],
    }

    request = {
        "type": "GetTestCase",
        "op_name": op_name,
        "fuzz_config": fuzz_config,
    }

    response = self._send_request(request)

    if response.get("type") == "TestCase":
        inputs = {
            name: self._decode_tensor(tensor)
            for name, tensor in response.get("inputs", {}).items()
        }
        return {
            "seed": response.get("seed", 0),
            "inputs": inputs,
            "shape": response.get("shape", []),
            "dtype": response.get("dtype", "float32"),
            "layout": response.get("layout", "contiguous"),
        }
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")

`get_test_batch(op_name, count=10, seed=None, op_schema=None, dtypes=None)` ¶

Get a batch of test cases from the daemon.

Parameters:

Name	Type	Description	Default
`op_name`	`str`	Name of the op.	required
`count`	`int`	Number of test cases to generate.	`10`
`seed`	`Optional[int]`	Master seed. Auto-generated if None.	`None`
`op_schema`	`Optional[Dict[str, Any]]`	Optional operator-aware shape schema. When provided, the daemon generates per-input shapes from shared symbolic dims (e.g. matmul A[M,K]/B[K,N]) instead of one shape for all inputs. Shape: {"name", "dims": [{"name","candidates"}], "inputs": [{"name","dims"}], "output": {"name","dims"}}.	`None`
`dtypes`	`Optional[List[str]]`	List of dtype strings to sample from. When None or empty, ["float32", "float16"] is used.	`None`

Returns:

Type	Description
`List[Dict[str, Any]]`	List of test case dicts (same format as get_test_case).
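
To make the `op_schema` layout concrete, here is a sketch of what a matmul schema might look like, together with a small consistency check. Only the key names come from the parameter description; the value types (a list of integer `candidates` per dim, and a list of symbolic dim names per tensor), the tensor names `a`, `b`, `out`, and the helper `undeclared_dims` are assumptions.

```python
from typing import Any, Dict, List

# Assumed encoding of matmul A[M,K] @ B[K,N] -> out[M,N].
matmul_schema: Dict[str, Any] = {
    "name": "matmul",
    "dims": [
        {"name": "M", "candidates": [1, 16, 64]},
        {"name": "K", "candidates": [32, 128]},
        {"name": "N", "candidates": [1, 16, 64]},
    ],
    "inputs": [
        {"name": "a", "dims": ["M", "K"]},
        {"name": "b", "dims": ["K", "N"]},
    ],
    "output": {"name": "out", "dims": ["M", "N"]},
}


def undeclared_dims(schema: Dict[str, Any]) -> List[str]:
    """Return symbolic dims used by a tensor but missing from schema['dims']."""
    declared = {d["name"] for d in schema["dims"]}
    tensors = list(schema["inputs"]) + [schema["output"]]
    used = {dim for tensor in tensors for dim in tensor["dims"]}
    return sorted(used - declared)


# A schema that forgets to declare N, to show what the check reports.
broken_schema = dict(matmul_schema, dims=matmul_schema["dims"][:2])
```

If the assumed encoding matches the daemon's, such a dict would be passed as `op_schema=matmul_schema`. Because `K` is shared between both inputs, each generated case would give `a` and `b` a matching inner dimension.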

Source code in gpuemu/client.py

def get_test_batch(
    self,
    op_name: str,
    count: int = 10,
    seed: Optional[int] = None,
    op_schema: Optional[Dict[str, Any]] = None,
    dtypes: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Get a batch of test cases from the daemon.

    Args:
        op_name: Name of the op.
        count: Number of test cases to generate.
        seed: Master seed. Auto-generated if None.
        op_schema: Optional operator-aware shape schema. When provided, the
            daemon generates per-input shapes from shared symbolic dims
            (e.g. matmul A[M,K]/B[K,N]) instead of one shape for all inputs.
            Shape: {"name", "dims": [{"name","candidates"}],
                    "inputs": [{"name","dims"}], "output": {"name","dims"}}.

    Returns:
        List of test case dicts (same format as get_test_case).
    """
    if seed is None:
        seed = int(time.time_ns()) & 0xFFFFFFFFFFFFFFFF

    fuzz_config = {
        "seed": seed,
        "shape_options": {
            "batch_sizes": [1, 2, 4, 8],
            "seq_lengths": [64, 128, 256],
            "hidden_dims": [256, 512],
            "edge_cases": [[1], [1, 1]],
        },
        "dtypes": dtypes or ["float32", "float16"],
        "layouts": ["Contiguous", "Strided"],
    }
    if op_schema is not None:
        fuzz_config["op_schema"] = op_schema

    request = {
        "type": "GetTestBatch",
        "op_name": op_name,
        "fuzz_config": fuzz_config,
        "count": count,
    }

    response = self._send_request(request)

    if response.get("type") == "TestBatch":
        cases = []
        for case_data in response.get("cases", []):
            inputs = {
                name: self._decode_tensor(tensor)
                for name, tensor in case_data.get("inputs", {}).items()
            }
            cases.append(
                {
                    "seed": case_data.get("seed", 0),
                    "inputs": inputs,
                    "shape": case_data.get("shape", []),
                    "dtype": case_data.get("dtype", "float32"),
                    "layout": case_data.get("layout", "contiguous"),
                }
            )
        return cases
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")

`submit_output(op_name, inputs, output, seed, **kwargs)` ¶

Submit an op output for validation against the reference.

This is the core method for client-side and daemon-orchestrated execution modes. The client runs the actual GPU op and submits the result here for comparison.

Parameters:

Name	Type	Description	Default
`op_name`	`str`	Name of the op (must be registered in gpuemu.toml).	required
`inputs`	`Dict[str, ndarray]`	Input tensors as numpy arrays.	required
`output`	`ndarray`	Output tensor from the op under test.	required
`seed`	`int`	Seed of the test case (from get_test_case or get_test_batch).	required
`**kwargs`		Additional kwargs for the reference script.	`{}`

Returns:

Type	Description
`ValidationResult`	ValidationResult with pass/fail status and details.

Source code in gpuemu/client.py

def submit_output(
    self,
    op_name: str,
    inputs: Dict[str, np.ndarray],
    output: np.ndarray,
    seed: int,
    **kwargs,
) -> ValidationResult:
    """Submit an op output for validation against the reference.

    This is the core method for client-side and daemon-orchestrated
    execution modes. The client runs the actual GPU op and submits
    the result here for comparison.

    Args:
        op_name: Name of the op (must be registered in gpuemu.toml).
        inputs: Input tensors as numpy arrays.
        output: Output tensor from the op under test.
        seed: Seed of the test case (from get_test_case or get_test_batch).
        **kwargs: Additional kwargs for the reference script.

    Returns:
        ValidationResult with pass/fail status and details.
    """
    encoded_inputs = {
        name: self._encode_tensor(arr) for name, arr in inputs.items()
    }
    encoded_output = self._encode_tensor(output)

    request = {
        "type": "SubmitOutput",
        "op_name": op_name,
        "inputs": encoded_inputs,
        "output": encoded_output,
        "seed": seed,
        "kwargs": {k: str(v) for k, v in kwargs.items()},
    }

    response = self._send_request(request)

    if response.get("type") == "SubmitResult":
        return ValidationResult.from_dict(response.get("result", {}))
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")

`fuzz_op_client_side(op_name, run_op, iterations=100, seed=None, fail_fast=False, op_schema=None, dtypes=None)` ¶

Fuzz an op using client-side execution (THE RECOMMENDED DROP-IN PATH).

This method generates random inputs via the daemon, runs the provided run_op callable on the client (which has GPU access), and validates the output against the reference script. This is how GPU developers should use gpuemu for fuzzing.

Parameters:

Name	Type	Description	Default
`op_name`	`str`	Name of the op (must be registered in gpuemu.toml).	required
`run_op`	`Callable[[Dict[str, ndarray]], ndarray]`	A callable that takes a dict of input tensors and returns the output tensor. This is where you call your GPU kernel.	required
`iterations`	`int`	Number of test cases to try.	`100`
`seed`	`Optional[int]`	Master seed. Auto-generated if None.	`None`
`fail_fast`	`bool`	Stop on first failure.	`False`
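`op_schema`	`Optional[Dict[str, Any]]`	Optional operator-aware shape schema (see get_test_batch). Use for ops whose inputs have different but linked shapes (matmul, attention) so fuzzing covers the real operator domain.	`None`
`dtypes`	`Optional[List[str]]`	Optional list of dtype names, passed through to get_test_batch when generating test cases.	`None`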

Returns:

Type	Description
`FuzzResults`	FuzzResults with pass/fail counts and list of failures.

Example

client = Client()
results = client.fuzz_op_client_side(
    "my_flash_attention",
    run_op=lambda inputs: my_flash_attn(inputs["q"], inputs["k"], inputs["v"]),
    iterations=50,
)
print(f"Passed: {results.passed}/{results.total}")
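For kernels that take positional arguments, the lambda in the example can be generalized. The sketch below is one possible adapter. `bind_inputs` is a hypothetical name, not a gpuemu function, and it assumes the input names in each test case match the names registered for the op.

```python
from typing import Any, Callable, Dict, Sequence


def bind_inputs(
    kernel: Callable[..., Any], arg_names: Sequence[str]
) -> Callable[[Dict[str, Any]], Any]:
    """Adapt a positional kernel to the dict-in, tensor-out shape of run_op.

    Hypothetical helper: `arg_names` lists the input names in the order
    the kernel expects them.
    """

    def run_op(inputs: Dict[str, Any]) -> Any:
        # Fail with a clear message if the case lacks an expected input.
        missing = [name for name in arg_names if name not in inputs]
        if missing:
            raise KeyError(f"test case is missing inputs: {missing}")
        return kernel(*(inputs[name] for name in arg_names))

    return run_op
```

With this in place, the `run_op` argument in the example could be written as `bind_inputs(my_flash_attn, ["q", "k", "v"])`.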

Source code in gpuemu/client.py

def fuzz_op_client_side(
    self,
    op_name: str,
    run_op: "Callable[[Dict[str, np.ndarray]], np.ndarray]",
    iterations: int = 100,
    seed: Optional[int] = None,
    fail_fast: bool = False,
    op_schema: Optional[Dict[str, Any]] = None,
    dtypes: Optional[List[str]] = None,
) -> FuzzResults:
    """Fuzz an op using client-side execution (THE RECOMMENDED DROP-IN PATH).

    This method generates random inputs via the daemon, runs the provided
    ``run_op`` callable on the client (which has GPU access), and validates
    the output against the reference script. This is how GPU developers
    should use gpuemu for fuzzing.

    Args:
        op_name: Name of the op (must be registered in gpuemu.toml).
        run_op: A callable that takes a dict of input tensors and returns
                 the output tensor. This is where you call your GPU kernel.
        iterations: Number of test cases to try.
        seed: Master seed. Auto-generated if None.
        fail_fast: Stop on first failure.
        op_schema: Optional operator-aware shape schema (see get_test_batch).
            Use for ops whose inputs have different but linked shapes
            (matmul, attention) so fuzzing covers the real operator domain.
        dtypes: Optional list of dtype names, passed through to
            get_test_batch when generating test cases.

    Returns:
        FuzzResults with pass/fail counts and list of failures.

    Example:
        >>> client = Client()
        >>> results = client.fuzz_op_client_side(
        ...     "my_flash_attention",
        ...     run_op=lambda inputs: my_flash_attn(inputs["q"], inputs["k"], inputs["v"]),
        ...     iterations=50,
        ... )
        >>> print(f"Passed: {results.passed}/{results.total}")
    """
    if seed is None:
        seed = int(time.time_ns()) & 0xFFFFFFFFFFFFFFFF

    cases = self.get_test_batch(
        op_name, count=iterations, seed=seed, op_schema=op_schema, dtypes=dtypes
    )
    total = 0
    passed = 0
    failed = 0
    failures = []

    for case in cases:
        total += 1
        try:
            output = run_op(case["inputs"])
            result = self.submit_output(
                op_name, case["inputs"], output, case["seed"]
            )
            if result.passed:
                passed += 1
            else:
                failed += 1
                failures.append(result)
                if fail_fast:
                    break
        except Exception as e:
            failed += 1
            failures.append(
                ValidationResult(
                    passed=False,
                    seed=case["seed"],
                    op_name=op_name,
                    max_diff=float("inf"),
                    max_rel_diff=float("inf"),
                    failures=[{"kind": "ExecutionError", "message": str(e)}],
                    timestamp=int(time.time()),
                    duration_ms=0,
                )
            )
            if fail_fast:
                break

    return FuzzResults(
        seed=seed,
        total=total,
        passed=passed,
        failed=failed,
        failures=failures,
    )
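As the source above shows, an exception raised inside the loop (by `run_op` or by `submit_output`) is recorded as a failed `ValidationResult` whose `failures` list contains an entry with kind `"ExecutionError"`. The sketch below separates those crashes from the remaining failures. `split_failures` is a hypothetical helper, and it assumes that failure entries returned by the daemon are also dicts. Only the `"ExecutionError"` kind is confirmed by the source.

```python
from typing import Any, List, Tuple


def split_failures(results: Any) -> Tuple[List[Any], List[Any]]:
    """Split FuzzResults.failures into (crashes, other_failures).

    Hypothetical helper: a crash is any result carrying a failure entry
    whose "kind" is "ExecutionError".
    """
    crashes: List[Any] = []
    others: List[Any] = []
    for failure in results.failures:
        # Entries that are not dicts are treated as having no kind.
        kinds = {
            entry.get("kind")
            for entry in failure.failures
            if isinstance(entry, dict)
        }
        if "ExecutionError" in kinds:
            crashes.append(failure)
        else:
            others.append(failure)
    return crashes, others
```

Crashes usually point at the kernel wrapper or the connection to the daemon, while the remaining failures are the ones to inspect via `max_diff` and `max_rel_diff`.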
