Python API Reference

Complete reference for the gpuemu Python client library. This package provides a high-level interface for communicating with the gpuemu daemon, running validations and fuzz tests, and managing results.

pip install gpuemu

Client

The primary interface for interacting with the gpuemu daemon.

Constructor

Client(socket_path: str | None = None, timeout_ms: int = 30000)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| socket_path | str \| None | None | Path to the daemon Unix socket. Defaults to ~/.gpuemu/gpuemu.sock. |
| timeout_ms | int | 30000 | Request timeout in milliseconds. |

Context Manager Support

The Client class supports the context manager protocol for automatic cleanup:

from gpuemu import Client

with Client() as client:
    result = client.ping()
    print(result)
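
The cleanup behaviour can be illustrated without a running daemon. The sketch below uses a hypothetical `FakeClient` stand-in (not part of gpuemu) whose `__enter__` returns the object and whose `__exit__` calls `close()`, mirroring the pattern the context manager protocol relies on.

```python
class FakeClient:
    """Hypothetical stand-in for Client; it opens no real socket."""

    def __init__(self):
        self.closed = False

    def close(self):
        # A real client would release its socket here.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and when the with-body raises.
        self.close()
        return False  # do not suppress exceptions


def run_with_cleanup() -> bool:
    """Show that close() runs even when the with-body raises."""
    client = FakeClient()
    try:
        with client:
            raise RuntimeError("simulated failure inside the with block")
    except RuntimeError:
        pass
    return client.closed
```

Because `__exit__` returns `False`, the exception still propagates to the caller after the connection has been closed.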

Methods

ping()

Check connectivity with the daemon.

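def ping() -> dict[str, Any]

Returns a dict with the keys version, protocol_version, and uptime_secs if the daemon is reachable. Raises ClientError if the daemon reports an error or an incompatible protocol version.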


validate_op()

Validate an operation against its reference implementation.

def validate_op(
    op_name: str,
    inputs: dict[str, np.ndarray],
    output: np.ndarray,
    **kwargs,
) -> ValidationResult

| Parameter | Type | Description |
| --- | --- | --- |
| op_name | str | Name of the op (must be registered in gpuemu.toml) |
| inputs | dict[str, np.ndarray] | Named input tensors |
| output | np.ndarray | The output tensor to validate |
| **kwargs | Any | Additional keyword arguments passed to the reference script; values are converted to strings before transmission |

Returns a ValidationResult.


get_result()

Retrieve a stored validation result by seed.

def get_result(seed: int) -> ValidationResult | None

Returns the stored ValidationResult, or None if no result exists for the given seed.

list_results()

List recent validation results, returning at most limit entries.

def list_results(limit: int = 100) -> list[ValidationResult]

store_baseline()

Store current results as a named baseline.

def store_baseline(tag: str) -> None

fuzz_op()

Run daemon-side fuzz testing on an operation.

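def fuzz_op(
    op_name: str,
    seed: int | None = None,
    iterations: int = 100,
    fail_fast: bool = False,
    batch_sizes: list[int] | None = None,
    seq_lengths: list[int] | None = None,
    hidden_dims: list[int] | None = None,
    dtypes: list[str] | None = None,
    layouts: list[str] | None = None,
) -> FuzzResults

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| op_name | str | (required) | Name of the op to fuzz (must be registered in gpuemu.toml) |
| seed | int \| None | None | Master seed for reproducibility. If None, a seed is derived from the current timestamp. |
| iterations | int | 100 | Number of test cases to generate |
| fail_fast | bool | False | Stop on the first failure |
| batch_sizes | list[int] \| None | None | Batch sizes to sample from. Defaults to [1, 2, 4, 8, 16, 32]. |
| seq_lengths | list[int] \| None | None | Sequence lengths to sample from. Defaults to [64, 128, 256, 512, 1024]. |
| hidden_dims | list[int] \| None | None | Hidden dimensions to sample from. Defaults to [256, 512, 768, 1024]. |
| dtypes | list[str] \| None | None | Dtype strings to sample from. Defaults to ["float32", "float16"]. |
| layouts | list[str] \| None | None | Layout types to sample from. Defaults to ["Contiguous", "Strided", "Transposed"]. |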

Returns a FuzzResults.


reproduce()

Reproduce a specific fuzz failure.

def reproduce(seed: int) -> ReproduceResult

Returns a ReproduceResult.


minimize()

Minimize a failing test case.

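def minimize(
    seed: int,
    strategy: str = "binary-search-dims",
    max_iters: int = 100,
) -> MinimizeResult

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| seed | int | (required) | Seed of the failure to minimize |
| strategy | str | "binary-search-dims" | "binary-search-dims" or "binary-search-values". Unrecognized values fall back to "binary-search-dims". |
| max_iters | int | 100 | Maximum minimization iterations |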

Returns a MinimizeResult.
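
To give a feel for what a dimension-reducing strategy does, here is a minimal sketch that binary-searches a single dimension for the smallest size that still fails. It is not the daemon's implementation: `minimize_dim` and the `still_fails` predicate are hypothetical, and the search assumes that if a size fails, every larger size fails too.

```python
from typing import Callable


def minimize_dim(size: int, still_fails: Callable[[int], bool], max_iters: int = 100) -> int:
    """Return the smallest size in [1, size] for which still_fails(size) is True.

    Assumes still_fails is monotonic and that still_fails(size) is True.
    """
    low, high = 1, size  # invariant: still_fails(high) is True
    iters = 0
    while low < high and iters < max_iters:
        mid = (low + high) // 2
        if still_fails(mid):
            high = mid      # failure persists, so try something smaller
        else:
            low = mid + 1   # too small to fail, so go larger
        iters += 1
    return high
```

If the iteration budget runs out early, the returned size is still a failing one, just not necessarily the smallest.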


list_failures()

List stored fuzz failures.

def list_failures(limit: int = 20) -> list[ValidationResult]

get_test_case()

Retrieve a specific test case from the daemon for client-side execution.

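def get_test_case(op_name: str, seed: int | None = None) -> dict[str, Any]

Returns a dictionary with the keys seed, inputs (a dict mapping input names to np.ndarray), shape, dtype, and layout. If seed is None, a seed is generated automatically.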


get_test_batch()

Retrieve a batch of test cases for client-side execution.

def get_test_batch(op_name: str, seeds: list[int]) -> list[dict]

submit_output()

Submit the output of a client-side execution back to the daemon for validation.

def submit_output(
    op_name: str,
    seed: int,
    output: np.ndarray,
) -> ValidationResult

fuzz_op_client_side()

Run client-side fuzz testing. The daemon generates test cases; the client executes them locally and submits the results back for validation.

def fuzz_op_client_side(
    op_name: str,
    op_fn: Callable,
    iterations: int = 100,
    seed: int | None = None,
) -> FuzzResults

| Parameter | Type | Description |
| --- | --- | --- |
| op_name | str | Name of the op to fuzz |
| op_fn | Callable | The function under test |
| iterations | int | Number of fuzz iterations |
| seed | int \| None | Fixed seed for reproducibility |
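
The round trip described above can be sketched with the `get_test_case()` and `submit_output()` calls documented on this page. The loop below is an illustration, not the library's implementation: `StubClient` is a hypothetical in-memory stand-in for the daemon, test cases use plain lists instead of NumPy arrays, per-iteration seeds are simply `seed + i`, and calling `op_fn(**inputs)` is an assumed convention.

```python
from typing import Callable


class StubClient:
    """Hypothetical stand-in for the daemon; the reference op doubles each value."""

    def get_test_case(self, op_name: str, seed: int) -> dict:
        # Deterministic toy input derived from the seed.
        return {"seed": seed, "inputs": {"x": [seed % 7, seed % 11, seed % 13]}}

    def submit_output(self, op_name: str, seed: int, output: list) -> bool:
        expected = [2 * v for v in self.get_test_case(op_name, seed)["inputs"]["x"]]
        return output == expected  # True means the validation passed


def fuzz_client_side(client, op_name: str, op_fn: Callable, iterations: int, seed: int) -> dict:
    """Fetch each case, run op_fn locally, and submit the output for validation."""
    failed_seeds = []
    for i in range(iterations):
        case = client.get_test_case(op_name, seed + i)
        output = op_fn(**case["inputs"])
        if not client.submit_output(op_name, case["seed"], output):
            failed_seeds.append(case["seed"])
    return {
        "total": iterations,
        "passed": iterations - len(failed_seeds),
        "failed_seeds": failed_seeds,
    }


def double(x):
    return [2 * v for v in x]


def buggy_double(x):
    # Deliberately wrong whenever the first element is zero.
    return [2 * v + (1 if x[0] == 0 else 0) for v in x]
```

Recording the failing seeds is what makes a later `reproduce()` or `minimize()` call possible.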

Data Classes

ValidationResult

Result of a single validation run.

| Field | Type | Description |
| --- | --- | --- |
| passed | bool | Whether the validation passed |
| seed | int | Seed used for this validation |
| op_name | str | Name of the validated op |
| max_diff | float | Maximum absolute difference |
| max_rel_diff | float | Maximum relative difference |
| failures | list[str] | List of failure descriptions |
| timestamp | str | ISO 8601 timestamp |
| duration_ms | int | Validation duration in milliseconds |
| repro_info | ReproductionInfo \| None | Reproduction information if the test failed |

FuzzResults

Aggregated results from a fuzz testing session.

| Field | Type | Description |
| --- | --- | --- |
| seed | int | Root seed for this fuzz session |
| total | int | Total number of iterations run |
| passed | int | Number of passing iterations |
| failed | int | Number of failing iterations |
| failures | list[ValidationResult] | Detailed results for each failure |

ReproduceResult

Result of reproducing a specific failure.

| Field | Type | Description |
| --- | --- | --- |
| result | ValidationResult | The validation result of the reproduction |
| inputs | dict[str, np.ndarray] | The input tensors that triggered the failure |

MinimizeResult

Result of minimizing a failing test case.

| Field | Type | Description |
| --- | --- | --- |
| original_seed | int | The original failure seed |
| minimized_seed | int | Seed for the minimized test case |
| minimized_shape | tuple[int, ...] | The minimized input shape |
| result | ValidationResult | Validation result of the minimized case |

ReproductionInfo

Metadata needed to exactly reproduce a test case.

| Field | Type | Description |
| --- | --- | --- |
| seed | int | RNG seed |
| shape | tuple[int, ...] | Input tensor shape |
| strides | tuple[int, ...] | Input tensor strides |
| dtype | str | Data type string |
| layout | str | Memory layout descriptor |
| fuzz_config | FuzzConfig | The fuzz configuration used |
| input_snapshot | dict | Serialized snapshot of input values |

Validation Utilities

validate_op() Context Manager

A convenience context manager that wraps op execution with automatic validation.

from gpuemu.validation import validate_op

with validate_op("softmax", inputs={"logits": x}) as ctx:
    output = my_softmax(x)
    ctx.set_output(output)

assert ctx.result.passed
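
One way such a context manager could be structured is sketched below. This is an assumption about the pattern, not gpuemu's code: `validate_with` and `_Ctx` are hypothetical names, and the validator is passed in as a plain function so the example runs without a daemon.

```python
from contextlib import contextmanager


class _Ctx:
    """Holds the output recorded inside the with block and the final result."""

    def __init__(self):
        self.output = None
        self.result = None

    def set_output(self, output):
        self.output = output


@contextmanager
def validate_with(validator, inputs):
    """Yield a context object, then validate the recorded output on exit."""
    ctx = _Ctx()
    yield ctx
    # Reached only if the with-body completed without raising.
    if ctx.output is None:
        raise RuntimeError("set_output() was not called inside the with block")
    ctx.result = validator(inputs, ctx.output)


def sum_matches(inputs, output):
    # Toy validator: the output must equal the sum of the input list.
    return output == sum(inputs["x"])


def check(value):
    """Run the context manager once and return the validation result."""
    with validate_with(sum_matches, {"x": [1, 2, 3]}) as ctx:
        ctx.set_output(value)
    return ctx.result
```

Deferring validation to the exit step is what lets the result be read from `ctx` after the block has finished.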

Fuzz Generators

Generators that yield randomized configurations for fuzz testing.

fuzz_shapes()

def fuzz_shapes(
    min_dims: int = 1,
    max_dims: int = 4,
    min_size: int = 1,
    max_size: int = 1024,
) -> Iterator[tuple[int, ...]]

Yields random tensor shapes.


fuzz_dtypes()

def fuzz_dtypes(
    include: list[str] | None = None,
    exclude: list[str] | None = None,
) -> Iterator[str]

Yields random dtype strings, optionally filtered.


fuzz_layouts()

def fuzz_layouts() -> Iterator[str]

Yields random memory layouts ("contiguous", "strided", "channels_last", etc.).


fuzz_shapes_seeded()

def fuzz_shapes_seeded(seed: int, **kwargs) -> Iterator[tuple[int, ...]]

Deterministic variant of fuzz_shapes() with a fixed seed.
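
A seeded shape generator can be sketched with the standard library's `random.Random`. The function below, `sketch_fuzz_shapes`, is a hypothetical illustration that mirrors the documented parameters and treats all bounds as inclusive (an assumption); it does not reproduce gpuemu's actual sequence.

```python
import itertools
import random
from typing import Iterator


def sketch_fuzz_shapes(
    seed: int,
    min_dims: int = 1,
    max_dims: int = 4,
    min_size: int = 1,
    max_size: int = 1024,
) -> Iterator[tuple[int, ...]]:
    """Yield an endless, seed-determined stream of random shapes."""
    rng = random.Random(seed)  # private RNG, independent of global state
    while True:
        ndim = rng.randint(min_dims, max_dims)  # randint bounds are inclusive
        yield tuple(rng.randint(min_size, max_size) for _ in range(ndim))


def first_shapes(seed: int, n: int) -> list[tuple[int, ...]]:
    """Take the first n shapes for a given seed."""
    return list(itertools.islice(sketch_fuzz_shapes(seed), n))
```

Two generators built from the same seed produce the same shapes in the same order, which is the property the seeded variants are meant to provide.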


fuzz_dtypes_seeded()

def fuzz_dtypes_seeded(seed: int, **kwargs) -> Iterator[str]

Deterministic variant of fuzz_dtypes() with a fixed seed.


fuzz_layouts_seeded()

def fuzz_layouts_seeded(seed: int) -> Iterator[str]

Deterministic variant of fuzz_layouts() with a fixed seed.


generate_random_tensor()

Generate a random tensor from a seed and specification.

def generate_random_tensor(
    seed: int,
    shape: tuple[int, ...],
    dtype: str = "float32",
    domain: tuple[float, float] = (-1.0, 1.0),
) -> np.ndarray

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| seed | int | (required) | RNG seed for reproducibility |
| shape | tuple[int, ...] | (required) | Tensor shape |
| dtype | str | "float32" | NumPy-compatible dtype string |
| domain | tuple[float, float] | (-1.0, 1.0) | Value range (min, max) |

FuzzConfig

Configuration dataclass for fuzz testing sessions.

@dataclass
class FuzzConfig:
    iterations: int = 100
    seed: int | None = None
    min_dims: int = 1
    max_dims: int = 4
    min_size: int = 1
    max_size: int = 1024
    dtypes: list[str] = field(default_factory=lambda: ["float32"])
    layouts: list[str] = field(default_factory=lambda: ["contiguous"])

SeededFuzzer

A stateful fuzzer that generates reproducible test cases.

class SeededFuzzer:
    def __init__(self, seed: int, config: FuzzConfig | None = None): ...
    def next_test_case(self) -> TestCase: ...
    def run(self, op_fn: Callable) -> FuzzResults: ...

TestCase

@dataclass
class TestCase:
    seed: int
    shape: tuple[int, ...]
    dtype: str
    layout: str
    inputs: dict[str, np.ndarray]

RNG

Deterministic random number generation for reproducible testing.

SeededRng

A portable, seedable RNG that produces identical sequences across Python and Rust.

class SeededRng:
    def __init__(self, seed: int): ...
    def derive(self, domain: str) -> "SeededRng": ...
    def choice(self, items: list[T]) -> T: ...
    def gen_range(self, low: int, high: int) -> int: ...
    def gen_u64(self) -> int: ...
    def gen_f32(self) -> float: ...
    def randn(self, shape: tuple[int, ...]) -> np.ndarray: ...

| Method | Description |
| --- | --- |
| derive(domain) | Create a child RNG scoped to a named domain |
| choice(items) | Pick a random element from a list |
| gen_range(low, high) | Generate an integer in [low, high) |
| gen_u64() | Generate a random unsigned 64-bit integer |
| gen_f32() | Generate a random float in [0.0, 1.0) |
| randn(shape) | Generate a tensor of normally distributed values |
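
Cross-language reproducibility generally requires an algorithm defined purely in fixed-width integer arithmetic. The sketch below uses SplitMix64 as an example of such a generator. This page does not say which algorithm gpuemu uses, so `SketchRng` is a hypothetical illustration of the `gen_u64`, `gen_f32`, and `gen_range` semantics listed above, not the library's implementation.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


class SketchRng:
    """SplitMix64-based generator using only wrapping 64-bit integer math."""

    def __init__(self, seed: int):
        self.state = seed & MASK64

    def gen_u64(self) -> int:
        # Masking emulates the wrapping u64 arithmetic of languages like Rust.
        self.state = (self.state + 0x9E3779B97F4A7C15) & MASK64
        z = self.state
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        return z ^ (z >> 31)

    def gen_f32(self) -> float:
        # The top 24 bits fit a float32 mantissa exactly, giving [0.0, 1.0).
        return (self.gen_u64() >> 40) / float(1 << 24)

    def gen_range(self, low: int, high: int) -> int:
        # Half-open range [low, high); plain modulo has a slight bias.
        if high <= low:
            raise ValueError("high must be greater than low")
        return low + self.gen_u64() % (high - low)
```

Python integers are unbounded, so every multiplication and addition is masked back to 64 bits; without that step the sequence would diverge from a fixed-width implementation.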

Standalone Functions

derive_seed()

def derive_seed(seed: int, domain: str) -> int

Derive a new seed by hashing the parent seed with a domain string.
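
Hash-based derivation can be sketched as follows. The hash function and byte layout gpuemu actually uses are not specified here, so the SHA-256 construction in the hypothetical `sketch_derive_seed` is an assumption chosen for illustration.

```python
import hashlib


def sketch_derive_seed(seed: int, domain: str) -> int:
    """Derive a 64-bit child seed from a parent seed and a domain label."""
    # A fixed-width little-endian encoding keeps the hashed input unambiguous.
    payload = (seed & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "little") + domain.encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    # Use the first 8 bytes of the digest as the new seed.
    return int.from_bytes(digest[:8], "little")
```

Deriving separate seeds for, say, shapes and dtypes keeps the two random streams independent, so changing how many shapes are drawn does not shift the dtypes that follow.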


generate_seed()

def generate_seed() -> int

Generate a fresh random seed from system entropy.
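
As a rough sketch of drawing a seed from system entropy, the standard library's `secrets` module can be used. The 64-bit width is an assumption (suggested by `gen_u64()`), and `sketch_generate_seed` is a hypothetical name.

```python
import secrets


def sketch_generate_seed() -> int:
    """Return a fresh 64-bit seed drawn from the OS entropy source."""
    return secrets.randbits(64)
```

Logging the generated seed alongside each run keeps the run reproducible even though the seed itself was random.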


Tolerances

Utilities for managing numerical comparison tolerances.

ToleranceConfig

Configuration for a single tolerance check.

@dataclass
class ToleranceConfig:
    atol: float  # Absolute tolerance
    rtol: float  # Relative tolerance

| Method | Description |
| --- | --- |
| for_dtype(dtype: str) | Return a ToleranceConfig appropriate for the given dtype |
| strict() | Return a strict tolerance (atol=1e-7, rtol=1e-7) |
| relaxed() | Return a relaxed tolerance (atol=1e-3, rtol=1e-3) |
| scale(factor: float) | Return a new config with tolerances scaled by factor |
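
How `atol` and `rtol` combine is not spelled out on this page. A common convention, used by NumPy's `allclose`, accepts a value when |actual - expected| <= atol + rtol * |expected|. The sketch below applies that convention with a hypothetical `SketchTolerance` dataclass; whether gpuemu uses exactly this formula is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchTolerance:
    atol: float  # absolute tolerance
    rtol: float  # relative tolerance

    def scale(self, factor: float) -> "SketchTolerance":
        """Return a new config with both tolerances multiplied by factor."""
        return SketchTolerance(self.atol * factor, self.rtol * factor)

    def is_close(self, actual: float, expected: float) -> bool:
        # Same form as numpy.allclose: the bound grows with |expected|.
        return abs(actual - expected) <= self.atol + self.rtol * abs(expected)
```

The absolute term dominates near zero, where a purely relative bound would reject almost everything, while the relative term dominates for large magnitudes.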

ToleranceProfile

Named tolerance profiles for common use cases.

class ToleranceProfile:
    @staticmethod
    def get(name: str) -> ToleranceConfig: ...

    @staticmethod
    def for_testing() -> ToleranceConfig: ...

    @staticmethod
    def for_production() -> ToleranceConfig: ...

    @staticmethod
    def for_cross_framework() -> ToleranceConfig: ...

| Profile | Description |
| --- | --- |
| for_testing() | Relaxed tolerances suitable for development |
| for_production() | Strict tolerances for production validation |
| for_cross_framework() | Tolerances accounting for cross-framework numerical variance |

Standalone Functions

calibrate_tolerance()

def calibrate_tolerance(
    op_fn: Callable,
    ref_fn: Callable,
    shapes: list[tuple[int, ...]],
    dtype: str = "float32",
    n_samples: int = 100,
) -> ToleranceConfig

Empirically determine appropriate tolerances by running both functions on random inputs.
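
An empirical calibration loop might look like the sketch below. It is an illustration under stated assumptions: inputs are flat lists of floats rather than NumPy arrays of the requested shapes and dtype, `sketch_calibrate` and its `margin` parameter are hypothetical, and the returned pair is the largest observed absolute and relative difference multiplied by a safety margin.

```python
import random
from typing import Callable


def sketch_calibrate(
    op_fn: Callable,
    ref_fn: Callable,
    n_elems: int = 16,
    n_samples: int = 100,
    seed: int = 0,
    margin: float = 2.0,
) -> tuple[float, float]:
    """Return (atol, rtol) estimated from observed differences on random inputs."""
    rng = random.Random(seed)
    max_abs = 0.0
    max_rel = 0.0
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n_elems)]
        for got, want in zip(op_fn(x), ref_fn(x)):
            diff = abs(got - want)
            max_abs = max(max_abs, diff)
            if want != 0.0:
                # Skip the relative error when the reference value is exactly zero.
                max_rel = max(max_rel, diff / abs(want))
    return margin * max_abs, margin * max_rel
```

The margin leaves headroom for inputs the sampling did not cover; a larger sample count narrows that gap at the cost of run time.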


get_recommended_tolerance()

def get_recommended_tolerance(
    dtype: str,
    op_type: str = "elementwise",
) -> ToleranceConfig

Return recommended tolerance values based on dtype and operation type.


Auto-generated API Documentation

gpuemu.client.Client

Client for communicating with the gpuemu daemon.

Example

client = Client()
client.ping()

Source code in gpuemu/client.py
class Client:
    """Client for communicating with the gpuemu daemon.

    Example:
        >>> client = Client()
        >>> client.ping()
        {'version': '0.1.0', 'uptime_secs': 123}
    """

    def __init__(
        self,
        socket_path: Optional[str] = None,
        timeout_ms: int = 30000,
    ):
        """Initialize the client.

        Args:
            socket_path: Path to the daemon socket. Defaults to ~/.gpuemu/gpuemu.sock
            timeout_ms: Timeout for requests in milliseconds.
        """
        if socket_path is None:
            socket_path = os.path.expanduser("~/.gpuemu/gpuemu.sock")

        self.socket_path = socket_path
        self.timeout_ms = timeout_ms
        self._socket = None
        self._version_checked = False

    def _ensure_connected(self):
        """Ensure we have a connection to the daemon."""
        if not HAS_PYNNG:
            raise ImportError(
                "pynng is required for gpuemu. Install with: pip install pynng"
            )

        if self._socket is None:
            sock = pynng.Req0()
            sock.recv_timeout = self.timeout_ms
            sock.send_timeout = self.timeout_ms

            socket_url = f"ipc://{self.socket_path}"
            try:
                sock.dial(socket_url)
            except pynng.exceptions.ConnectionRefused:
                raise ClientError(
                    f"Cannot connect to daemon at {self.socket_path}. "
                    "Is the daemon running? Start it with: gpuemu daemon start"
                )
            # Set the socket BEFORE the version check so the check's own request
            # reuses it (the guard prevents re-entrant reconnect/recursion).
            self._socket = sock
            if not self._version_checked:
                self._version_checked = True
                self._check_protocol_version()

        return self._socket

    def _check_protocol_version(self):
        """Verify daemon protocol version is compatible (called once on connect)."""
        try:
            ping_resp = self._send_request({"type": "Ping"})
            daemon_pv = ping_resp.get("protocol_version", 0)
            if daemon_pv != PROTOCOL_VERSION:
                raise ClientError(
                    f"Protocol version mismatch: client={PROTOCOL_VERSION}, "
                    f"daemon={daemon_pv}. Please upgrade the "
                    f"{'client' if daemon_pv > PROTOCOL_VERSION else 'daemon'}."
                )
        except ClientError:
            raise
        except Exception:
            pass

    def close(self):
        """Close the connection."""
        if self._socket is not None:
            self._socket.close()
            self._socket = None
        self._version_checked = False

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()

    def _send_request(self, request: Dict[str, Any]) -> Dict[str, Any]:
        """Send a request and return the response."""
        socket = self._ensure_connected()

        # Serialize request as JSON (simple protocol for MVP)
        request_bytes = json.dumps(request).encode("utf-8")

        try:
            socket.send(request_bytes)
            response_bytes = socket.recv()
            return json.loads(response_bytes.decode("utf-8"))
        except pynng.exceptions.Timeout:
            raise ClientError("Request timed out")
        except Exception as e:
            raise ClientError(f"Request failed: {e}")

    def ping(self) -> Dict[str, Any]:
        """Ping the daemon to check if it's alive.

        Returns:
            Dict with 'version', 'protocol_version', and 'uptime_secs'.

        Raises:
            ClientError: If the daemon has an incompatible protocol version.
        """
        response = self._send_request({"type": "Ping"})

        if response.get("type") == "Pong":
            daemon_pv = response.get("protocol_version", 0)
            if daemon_pv != PROTOCOL_VERSION:
                raise ClientError(
                    f"Protocol version mismatch: client={PROTOCOL_VERSION}, "
                    f"daemon={daemon_pv}. Please upgrade the "
                    f"{'client' if daemon_pv > PROTOCOL_VERSION else 'daemon'}."
                )
            return {
                "version": response.get("version", "unknown"),
                "protocol_version": daemon_pv,
                "uptime_secs": response.get("uptime_secs", 0),
            }
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    def validate_op(
        self,
        op_name: str,
        inputs: Dict[str, np.ndarray],
        output: np.ndarray,
        **kwargs,
    ) -> ValidationResult:
        """Validate an op output against its reference implementation.

        Args:
            op_name: Name of the op (must be registered in gpuemu.toml).
            inputs: Input tensors as numpy arrays.
            output: Output tensor to validate.
            **kwargs: Additional kwargs to pass to the reference script.

        Returns:
            ValidationResult with pass/fail status and details.
        """
        # Encode tensors for transmission
        encoded_inputs = {
            name: self._encode_tensor(arr) for name, arr in inputs.items()
        }
        encoded_output = self._encode_tensor(output)

        request = {
            "type": "ValidateOp",
            "op_name": op_name,
            "inputs": encoded_inputs,
            "output": encoded_output,
            "kwargs": {k: str(v) for k, v in kwargs.items()},
        }

        response = self._send_request(request)

        if response.get("type") == "ValidationResult":
            return ValidationResult.from_dict(response.get("result", {}))
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    def get_result(self, seed: int) -> Optional[ValidationResult]:
        """Get a stored validation result by seed.

        Args:
            seed: The seed of the validation run.

        Returns:
            ValidationResult if found, None otherwise.
        """
        request = {"type": "GetResult", "seed": seed}
        response = self._send_request(request)

        if response.get("type") == "ValidationResult":
            return ValidationResult.from_dict(response.get("result", {}))
        elif response.get("type") == "Error":
            if response.get("code") == "NotFound":
                return None
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    def list_results(self, limit: int = 100) -> List[ValidationResult]:
        """List recent validation results.

        Args:
            limit: Maximum number of results to return.

        Returns:
            List of ValidationResult objects.
        """
        request = {"type": "ListResults", "limit": limit}
        response = self._send_request(request)

        if response.get("type") == "Results":
            return [ValidationResult.from_dict(r) for r in response.get("results", [])]
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    def store_baseline(self, tag: str) -> None:
        """Store current results as a baseline.

        Args:
            tag: Tag name for the baseline.
        """
        request = {"type": "StoreBaseline", "tag": tag}
        response = self._send_request(request)

        if response.get("type") == "Ok":
            return
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    # =========================================================================
    # Phase 2: Fuzzing and Reproducibility
    # =========================================================================

    def fuzz_op(
        self,
        op_name: str,
        seed: Optional[int] = None,
        iterations: int = 100,
        fail_fast: bool = False,
        batch_sizes: Optional[List[int]] = None,
        seq_lengths: Optional[List[int]] = None,
        hidden_dims: Optional[List[int]] = None,
        dtypes: Optional[List[str]] = None,
        layouts: Optional[List[str]] = None,
    ) -> FuzzResults:
        """Fuzz test an op with random inputs.

        Args:
            op_name: Name of the op (must be registered in gpuemu.toml).
            seed: Master seed for reproducibility. If None, uses current timestamp.
            iterations: Number of test cases to generate.
            fail_fast: Stop on first failure.
            batch_sizes: List of batch sizes to use.
            seq_lengths: List of sequence lengths to use.
            hidden_dims: List of hidden dimensions to use.
            dtypes: List of dtype strings to use.
            layouts: List of layout types to use.

        Returns:
            FuzzResults with pass/fail counts and list of failures.

        Example:
            >>> results = client.fuzz_op("matmul", seed=12345, iterations=100)
            >>> print(f"Passed: {results.passed}/{results.total}")
            >>> for failure in results.failures:
            ...     print(f"  Seed {failure.seed}: {failure.failures[0]['message']}")
        """
        if seed is None:
            seed = int(time.time_ns()) & 0xFFFFFFFFFFFFFFFF

        # Build fuzz config
        fuzz_config = {
            "seed": seed,
            "shape_options": {
                "batch_sizes": batch_sizes or [1, 2, 4, 8, 16, 32],
                "seq_lengths": seq_lengths or [64, 128, 256, 512, 1024],
                "hidden_dims": hidden_dims or [256, 512, 768, 1024],
                "edge_cases": [[1], [1, 1], [1, 1, 1]],
            },
            "dtypes": dtypes or ["float32", "float16"],
            "layouts": layouts or ["Contiguous", "Strided", "Transposed"],
        }

        request = {
            "type": "FuzzOp",
            "op_name": op_name,
            "fuzz_config": fuzz_config,
            "iterations": iterations,
            "fail_fast": fail_fast,
        }

        response = self._send_request(request)

        if response.get("type") == "FuzzResults":
            return FuzzResults.from_dict(response)
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    def reproduce(self, seed: int) -> ReproduceResult:
        """Reproduce a failing test case by seed.

        Retrieves the stored failure and regenerates the exact inputs
        that caused the failure.

        Args:
            seed: The seed of the failing test case.

        Returns:
            ReproduceResult with the original result and regenerated inputs.

        Example:
            >>> repro = client.reproduce(12345)
            >>> print(f"Op: {repro.result.op_name}")
            >>> print(f"Input shape: {repro.inputs['input'].shape}")
        """
        request = {"type": "Reproduce", "seed": seed}
        response = self._send_request(request)

        if response.get("type") == "ReproduceResult":
            return ReproduceResult.from_dict(response, self._decode_tensor)
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    def minimize(
        self,
        seed: int,
        strategy: str = "binary-search-dims",
        max_iters: int = 100,
    ) -> MinimizeResult:
        """Minimize a failing test case.

        Attempts to find a smaller input that still triggers the failure.

        Args:
            seed: The seed of the failing test case.
            strategy: Minimization strategy. One of:
                - "binary-search-dims": Binary search to reduce dimensions.
                - "binary-search-values": Binary search to reduce values.
            max_iters: Maximum iterations for minimization.

        Returns:
            MinimizeResult with minimized seed, shape, and result.

        Example:
            >>> result = client.minimize(12345)
            >>> print(f"Minimized shape: {result.minimized_shape}")
        """
        # Convert strategy string to protocol enum
        strategy_map = {
            "binary-search-dims": "BinarySearchDims",
            "binary-search-values": "BinarySearchValues",
        }
        proto_strategy = strategy_map.get(strategy, "BinarySearchDims")

        request = {
            "type": "Minimize",
            "seed": seed,
            "strategy": proto_strategy,
            "max_iters": max_iters,
        }
        response = self._send_request(request)

        if response.get("type") == "MinimizeResult":
            return MinimizeResult.from_dict(response)
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    def list_failures(self, limit: int = 20) -> List[ValidationResult]:
        """List stored failures.

        Args:
            limit: Maximum number of failures to return.

        Returns:
            List of ValidationResult objects for failed tests.

        Example:
            >>> failures = client.list_failures(limit=10)
            >>> for f in failures:
            ...     print(f"Seed {f.seed}: {f.op_name}")
        """
        request = {"type": "ListFailures", "limit": limit}
        response = self._send_request(request)

        if response.get("type") == "Results":
            return [ValidationResult.from_dict(r) for r in response.get("results", [])]
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    # =========================================================================
    # Phase 3: Artifact Inspection
    # =========================================================================

    def lint_kernel(
        self, ptx_content: str, kernel_name: Optional[str] = None
    ) -> List[Dict[str, Any]]:
        """Lint PTX through the daemon's artifact analyzer.

        Extracts static metrics (registers, spills, local memory, instruction mix)
        and checks them against configured thresholds. If no kernel is registered,
        the daemon detects the kernel name from the PTX and uses default thresholds.

        Args:
            ptx_content: Raw PTX assembly text.
            kernel_name: Optional kernel name to lint (else all / detected).

        Returns:
            List of lint-result dicts, each with keys: kernel_name, passed,
            metrics (register_count, spill_count, ...), violations, timestamp.
        """
        request = {
            "type": "LintKernel",
            "kernel_name": kernel_name,
            "ptx_content": ptx_content,
        }
        response = self._send_request(request)
        if response.get("type") == "LintResults":
            return response.get("results", [])
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        raise ClientError(f"Unexpected response: {response}")

    @staticmethod
    def _encode_tensor(arr: np.ndarray) -> Dict[str, Any]:
        """Encode a numpy array for transmission."""
        return {
            "shape": list(arr.shape),
            "strides": list(arr.strides),
            "dtype": Client._numpy_dtype_to_protocol(arr.dtype),
            "data": base64.b64encode(arr.tobytes()).decode("utf-8"),
        }

    @staticmethod
    def _numpy_dtype_to_protocol(dtype: np.dtype) -> str:
        """Convert a numpy dtype to the protocol dtype string.

        Maps numpy dtypes to the Rust DType enum variant names
        (lowercase, matching serde serialization).
        """
        mapping = {
            "float16": "float16",
            "float32": "float32",
            "float64": "float64",
            "int8": "int8",
            "int16": "int16",
            "int32": "int32",
            "int64": "int64",
            "uint8": "uint8",
            "uint16": "uint16",
            "uint32": "uint32",
            "uint64": "uint64",
            "bool": "bool",
        }
        name = str(dtype)
        if name in mapping:
            return mapping[name]
        if "bfloat16" in name or "bf16" in name:
            return "bfloat16"
        return name

    @staticmethod
    def _protocol_dtype_to_numpy(dtype_str: str) -> np.dtype:
        """Convert a protocol dtype string back to a numpy dtype.

        Handles bfloat16 by falling back to float16 as proxy,
        since numpy has no native bfloat16.
        """
        mapping = {
            "float16": np.float16,
            "bfloat16": np.float16,
            "float32": np.float32,
            "float64": np.float64,
            "int8": np.int8,
            "int16": np.int16,
            "int32": np.int32,
            "int64": np.int64,
            "uint8": np.uint8,
            "uint16": np.uint16,
            "uint32": np.uint32,
            "uint64": np.uint64,
            "bool": np.bool_,
        }
        return np.dtype(mapping.get(dtype_str, np.float32))

    @staticmethod
    def _decode_tensor(data: Dict[str, Any]) -> np.ndarray:
        """Decode a numpy array from transmission format."""
        shape = tuple(data["shape"])
        dtype = Client._protocol_dtype_to_numpy(data.get("dtype", "float32"))
        raw = base64.b64decode(data["data"])
        return np.frombuffer(raw, dtype=dtype).reshape(shape).copy()

    # =========================================================================
    # Execution Modes: Client-Side Fuzzing
    # =========================================================================

    def get_test_case(self, op_name: str, seed: Optional[int] = None) -> Dict[str, Any]:
        """Get a single test case from the daemon for client-side execution.

        The daemon generates random inputs. The client runs the actual op
        on GPU and submits the output for validation via submit_output().

        Args:
            op_name: Name of the op (must be registered in gpuemu.toml).
            seed: Master seed for reproducibility. Auto-generated if None.

        Returns:
            Dict with 'seed', 'inputs' (dict of name->ndarray), 'shape', 'dtype', 'layout'.
        """
        if seed is None:
            seed = int(time.time_ns()) & 0xFFFFFFFFFFFFFFFF

        fuzz_config = {
            "seed": seed,
            "shape_options": {
                "batch_sizes": [1, 2, 4, 8],
                "seq_lengths": [64, 128, 256],
                "hidden_dims": [256, 512],
                "edge_cases": [[1], [1, 1]],
            },
            "dtypes": ["float32", "float16"],
            "layouts": ["Contiguous", "Strided"],
        }

        request = {
            "type": "GetTestCase",
            "op_name": op_name,
            "fuzz_config": fuzz_config,
        }

        response = self._send_request(request)

        if response.get("type") == "TestCase":
            inputs = {
                name: self._decode_tensor(tensor)
                for name, tensor in response.get("inputs", {}).items()
            }
            return {
                "seed": response.get("seed", 0),
                "inputs": inputs,
                "shape": response.get("shape", []),
                "dtype": response.get("dtype", "float32"),
                "layout": response.get("layout", "contiguous"),
            }
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    def get_test_batch(
        self,
        op_name: str,
        count: int = 10,
        seed: Optional[int] = None,
        op_schema: Optional[Dict[str, Any]] = None,
        dtypes: Optional[List[str]] = None,
    ) -> List[Dict[str, Any]]:
        """Get a batch of test cases from the daemon.

        Args:
            op_name: Name of the op.
            count: Number of test cases to generate.
            seed: Master seed. Auto-generated if None.
            op_schema: Optional operator-aware shape schema. When provided, the
                daemon generates per-input shapes from shared symbolic dims
                (e.g. matmul A[M,K]/B[K,N]) instead of one shape for all inputs.
                Shape: {"name", "dims": [{"name","candidates"}],
                        "inputs": [{"name","dims"}], "output": {"name","dims"}}.

        Returns:
            List of test case dicts (same format as get_test_case).
        """
        if seed is None:
            seed = int(time.time_ns()) & 0xFFFFFFFFFFFFFFFF

        fuzz_config = {
            "seed": seed,
            "shape_options": {
                "batch_sizes": [1, 2, 4, 8],
                "seq_lengths": [64, 128, 256],
                "hidden_dims": [256, 512],
                "edge_cases": [[1], [1, 1]],
            },
            "dtypes": dtypes or ["float32", "float16"],
            "layouts": ["Contiguous", "Strided"],
        }
        if op_schema is not None:
            fuzz_config["op_schema"] = op_schema

        request = {
            "type": "GetTestBatch",
            "op_name": op_name,
            "fuzz_config": fuzz_config,
            "count": count,
        }

        response = self._send_request(request)

        if response.get("type") == "TestBatch":
            cases = []
            for case_data in response.get("cases", []):
                inputs = {
                    name: self._decode_tensor(tensor)
                    for name, tensor in case_data.get("inputs", {}).items()
                }
                cases.append(
                    {
                        "seed": case_data.get("seed", 0),
                        "inputs": inputs,
                        "shape": case_data.get("shape", []),
                        "dtype": case_data.get("dtype", "float32"),
                        "layout": case_data.get("layout", "contiguous"),
                    }
                )
            return cases
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    def submit_output(
        self,
        op_name: str,
        inputs: Dict[str, np.ndarray],
        output: np.ndarray,
        seed: int,
        **kwargs,
    ) -> ValidationResult:
        """Submit an op output for validation against the reference.

        This is the core method for client-side and daemon-orchestrated
        execution modes. The client runs the actual GPU op and submits
        the result here for comparison.

        Args:
            op_name: Name of the op (must be registered in gpuemu.toml).
            inputs: Input tensors as numpy arrays.
            output: Output tensor from the op under test.
            seed: Seed of the test case (from get_test_case or get_test_batch).
            **kwargs: Additional kwargs for the reference script.

        Returns:
            ValidationResult with pass/fail status and details.
        """
        encoded_inputs = {
            name: self._encode_tensor(arr) for name, arr in inputs.items()
        }
        encoded_output = self._encode_tensor(output)

        request = {
            "type": "SubmitOutput",
            "op_name": op_name,
            "inputs": encoded_inputs,
            "output": encoded_output,
            "seed": seed,
            "kwargs": {k: str(v) for k, v in kwargs.items()},
        }

        response = self._send_request(request)

        if response.get("type") == "SubmitResult":
            return ValidationResult.from_dict(response.get("result", {}))
        elif response.get("type") == "Error":
            raise ClientError(response.get("message", "Unknown error"))
        else:
            raise ClientError(f"Unexpected response: {response}")

    def fuzz_op_client_side(
        self,
        op_name: str,
        run_op: "Callable[[Dict[str, np.ndarray]], np.ndarray]",
        iterations: int = 100,
        seed: Optional[int] = None,
        fail_fast: bool = False,
        op_schema: Optional[Dict[str, Any]] = None,
        dtypes: Optional[List[str]] = None,
    ) -> FuzzResults:
        """Fuzz an op using client-side execution (THE RECOMMENDED DROP-IN PATH).

        This method generates random inputs via the daemon, runs the provided
        ``run_op`` callable on the client (which has GPU access), and validates
        the output against the reference script. This is how GPU developers
        should use gpuemu for fuzzing.

        Args:
            op_name: Name of the op (must be registered in gpuemu.toml).
            run_op: A callable that takes a dict of input tensors and returns
                     the output tensor. This is where you call your GPU kernel.
            iterations: Number of test cases to try.
            seed: Master seed. Auto-generated if None.
            fail_fast: Stop on first failure.
            op_schema: Optional operator-aware shape schema (see get_test_batch).
                Use for ops whose inputs have different but linked shapes
                (matmul, attention) so fuzzing covers the real operator domain.

        Returns:
            FuzzResults with pass/fail counts and list of failures.

        Example:
            >>> client = Client()
            >>> results = client.fuzz_op_client_side(
            ...     "my_flash_attention",
            ...     run_op=lambda inputs: my_flash_attn(inputs["q"], inputs["k"], inputs["v"]),
            ...     iterations=50,
            ... )
            >>> print(f"Passed: {results.passed}/{results.total}")
        """
        if seed is None:
            seed = int(time.time_ns()) & 0xFFFFFFFFFFFFFFFF

        cases = self.get_test_batch(
            op_name, count=iterations, seed=seed, op_schema=op_schema, dtypes=dtypes
        )
        total = 0
        passed = 0
        failed = 0
        failures = []

        for case in cases:
            total += 1
            try:
                output = run_op(case["inputs"])
                result = self.submit_output(
                    op_name, case["inputs"], output, case["seed"]
                )
                if result.passed:
                    passed += 1
                else:
                    failed += 1
                    failures.append(result)
                    if fail_fast:
                        break
            except Exception as e:
                failed += 1
                failures.append(
                    ValidationResult(
                        passed=False,
                        seed=case["seed"],
                        op_name=op_name,
                        max_diff=float("inf"),
                        max_rel_diff=float("inf"),
                        failures=[{"kind": "ExecutionError", "message": str(e)}],
                        timestamp=int(time.time()),
                        duration_ms=0,
                    )
                )
                if fail_fast:
                    break

        return FuzzResults(
            seed=seed,
            total=total,
            passed=passed,
            failed=failed,
            failures=failures,
        )

__init__(socket_path=None, timeout_ms=30000)

Initialize the client.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| socket_path | Optional[str] | Path to the daemon socket. Defaults to ~/.gpuemu/gpuemu.sock. | None |
| timeout_ms | int | Timeout for requests in milliseconds. | 30000 |

Source code in gpuemu/client.py
def __init__(
    self,
    socket_path: Optional[str] = None,
    timeout_ms: int = 30000,
):
    """Initialize the client.

    Args:
        socket_path: Path to the daemon socket. Defaults to ~/.gpuemu/gpuemu.sock
        timeout_ms: Timeout for requests in milliseconds.
    """
    if socket_path is None:
        socket_path = os.path.expanduser("~/.gpuemu/gpuemu.sock")

    self.socket_path = socket_path
    self.timeout_ms = timeout_ms
    self._socket = None
    self._version_checked = False

close()

Close the connection.

Source code in gpuemu/client.py
def close(self):
    """Close the connection."""
    if self._socket is not None:
        self._socket.close()
        self._socket = None
    self._version_checked = False

ping()

Ping the daemon to check if it's alive.

Returns:

Type Description
Dict[str, Any]

Dict with 'version', 'protocol_version', and 'uptime_secs'.

Raises:

Type Description
ClientError

If the daemon has an incompatible protocol version.

Source code in gpuemu/client.py
def ping(self) -> Dict[str, Any]:
    """Ping the daemon to check if it's alive.

    Returns:
        Dict with 'version', 'protocol_version', and 'uptime_secs'.

    Raises:
        ClientError: If the daemon has an incompatible protocol version.
    """
    response = self._send_request({"type": "Ping"})

    if response.get("type") == "Pong":
        daemon_pv = response.get("protocol_version", 0)
        if daemon_pv != PROTOCOL_VERSION:
            raise ClientError(
                f"Protocol version mismatch: client={PROTOCOL_VERSION}, "
                f"daemon={daemon_pv}. Please upgrade the "
                f"{'client' if daemon_pv > PROTOCOL_VERSION else 'daemon'}."
            )
        return {
            "version": response.get("version", "unknown"),
            "protocol_version": daemon_pv,
            "uptime_secs": response.get("uptime_secs", 0),
        }
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")
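
The version check in `ping()` can be read in isolation. The following sketch reproduces only that comparison as a standalone function, with a local `ClientError` stand-in and made-up version numbers (the real `PROTOCOL_VERSION` value is not shown in this reference).

```python
class ClientError(Exception):
    """Stand-in for gpuemu's ClientError so the sketch runs on its own."""


def check_protocol(client_pv: int, daemon_pv: int) -> None:
    """Raise if the versions differ, naming the side that should be upgraded."""
    if daemon_pv != client_pv:
        # A newer daemon means the client is stale, and vice versa.
        stale_side = "client" if daemon_pv > client_pv else "daemon"
        raise ClientError(
            f"Protocol version mismatch: client={client_pv}, "
            f"daemon={daemon_pv}. Please upgrade the {stale_side}."
        )


check_protocol(3, 3)  # equal versions: no error (version numbers are hypothetical)

try:
    check_protocol(client_pv=2, daemon_pv=3)
except ClientError as err:
    message = str(err)
    print(message)
```

Because the comparison is a strict inequality, any difference in protocol version is treated as incompatible, not only an older daemon.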

validate_op(op_name, inputs, output, **kwargs)

Validate an op output against its reference implementation.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| op_name | str | Name of the op (must be registered in gpuemu.toml). | required |
| inputs | Dict[str, ndarray] | Input tensors as numpy arrays. | required |
| output | ndarray | Output tensor to validate. | required |
| **kwargs | | Additional kwargs to pass to the reference script. | {} |

Returns:

Type Description
ValidationResult

ValidationResult with pass/fail status and details.

Source code in gpuemu/client.py
def validate_op(
    self,
    op_name: str,
    inputs: Dict[str, np.ndarray],
    output: np.ndarray,
    **kwargs,
) -> ValidationResult:
    """Validate an op output against its reference implementation.

    Args:
        op_name: Name of the op (must be registered in gpuemu.toml).
        inputs: Input tensors as numpy arrays.
        output: Output tensor to validate.
        **kwargs: Additional kwargs to pass to the reference script.

    Returns:
        ValidationResult with pass/fail status and details.
    """
    # Encode tensors for transmission
    encoded_inputs = {
        name: self._encode_tensor(arr) for name, arr in inputs.items()
    }
    encoded_output = self._encode_tensor(output)

    request = {
        "type": "ValidateOp",
        "op_name": op_name,
        "inputs": encoded_inputs,
        "output": encoded_output,
        "kwargs": {k: str(v) for k, v in kwargs.items()},
    }

    response = self._send_request(request)

    if response.get("type") == "ValidationResult":
        return ValidationResult.from_dict(response.get("result", {}))
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")
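
Note that the request is built with `{k: str(v) for k, v in kwargs.items()}`, so every extra keyword argument reaches the daemon as a string. The sketch below mirrors that step and pairs it with a hypothetical parsing helper that a reference script might use to recover Python literals. `parse_kwarg` is an assumption for illustration and is not part of gpuemu.

```python
import ast
from typing import Any, Dict


def stringify_kwargs(**kwargs: Any) -> Dict[str, str]:
    """Mirror the request-building step: every value becomes str(value)."""
    return {k: str(v) for k, v in kwargs.items()}


def parse_kwarg(text: str) -> Any:
    """Hypothetical reference-side helper: recover literals, else keep the string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


sent = stringify_kwargs(causal=True, scale=0.125, mode="mean")
print(sent)      # all values are strings at this point
received = {k: parse_kwarg(v) for k, v in sent.items()}
print(received)  # bool and float recovered, plain word kept as a string
```

Without a parsing step like this, a flag such as `causal=False` would arrive as the non-empty string "False", which is truthy in Python.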

get_result(seed)

Get a stored validation result by seed.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| seed | int | The seed of the validation run. | required |

Returns:

Type Description
Optional[ValidationResult]

ValidationResult if found, None otherwise.

Source code in gpuemu/client.py
def get_result(self, seed: int) -> Optional[ValidationResult]:
    """Get a stored validation result by seed.

    Args:
        seed: The seed of the validation run.

    Returns:
        ValidationResult if found, None otherwise.
    """
    request = {"type": "GetResult", "seed": seed}
    response = self._send_request(request)

    if response.get("type") == "ValidationResult":
        return ValidationResult.from_dict(response.get("result", {}))
    elif response.get("type") == "Error":
        if response.get("code") == "NotFound":
            return None
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")
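
`get_result()` is the one method in this section that turns an error response into a normal return value. A minimal sketch of that dispatch follows, operating on hand-written response dicts and a local `ClientError` stand-in; it returns the raw `result` dict where the real method builds a `ValidationResult`.

```python
from typing import Any, Dict, Optional


class ClientError(Exception):
    """Stand-in for gpuemu's ClientError so the sketch runs on its own."""


def parse_get_result(response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the result payload, None for NotFound, and raise otherwise."""
    kind = response.get("type")
    if kind == "ValidationResult":
        return response.get("result", {})
    if kind == "Error":
        if response.get("code") == "NotFound":
            return None  # a missing seed is not treated as a failure
        raise ClientError(response.get("message", "Unknown error"))
    raise ClientError(f"Unexpected response: {response}")


# Hand-written sample responses (illustrative, not captured daemon output).
found = parse_get_result({"type": "ValidationResult", "result": {"seed": 7}})
missing = parse_get_result({"type": "Error", "code": "NotFound"})
print(found, missing)
```

Callers should therefore check for `None` before reading attributes of the returned value.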

list_results(limit=100)

List recent validation results.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| limit | int | Maximum number of results to return. | 100 |

Returns:

Type Description
List[ValidationResult]

List of ValidationResult objects.

Source code in gpuemu/client.py
def list_results(self, limit: int = 100) -> List[ValidationResult]:
    """List recent validation results.

    Args:
        limit: Maximum number of results to return.

    Returns:
        List of ValidationResult objects.
    """
    request = {"type": "ListResults", "limit": limit}
    response = self._send_request(request)

    if response.get("type") == "Results":
        return [ValidationResult.from_dict(r) for r in response.get("results", [])]
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")

store_baseline(tag)

Store current results as a baseline.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| tag | str | Tag name for the baseline. | required |

Source code in gpuemu/client.py
def store_baseline(self, tag: str) -> None:
    """Store current results as a baseline.

    Args:
        tag: Tag name for the baseline.
    """
    request = {"type": "StoreBaseline", "tag": tag}
    response = self._send_request(request)

    if response.get("type") == "Ok":
        return
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")

fuzz_op(op_name, seed=None, iterations=100, fail_fast=False, batch_sizes=None, seq_lengths=None, hidden_dims=None, dtypes=None, layouts=None)

Fuzz test an op with random inputs.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| op_name | str | Name of the op (must be registered in gpuemu.toml). | required |
| seed | Optional[int] | Master seed for reproducibility. If None, uses current timestamp. | None |
| iterations | int | Number of test cases to generate. | 100 |
| fail_fast | bool | Stop on first failure. | False |
| batch_sizes | Optional[List[int]] | List of batch sizes to use. | None |
| seq_lengths | Optional[List[int]] | List of sequence lengths to use. | None |
| hidden_dims | Optional[List[int]] | List of hidden dimensions to use. | None |
| dtypes | Optional[List[str]] | List of dtype strings to use. | None |
| layouts | Optional[List[str]] | List of layout types to use. | None |

Returns:

Type Description
FuzzResults

FuzzResults with pass/fail counts and list of failures.

Example

results = client.fuzz_op("matmul", seed=12345, iterations=100)
print(f"Passed: {results.passed}/{results.total}")
for failure in results.failures:
    print(f"  Seed {failure.seed}: {failure.failures[0]['message']}")

Source code in gpuemu/client.py
def fuzz_op(
    self,
    op_name: str,
    seed: Optional[int] = None,
    iterations: int = 100,
    fail_fast: bool = False,
    batch_sizes: Optional[List[int]] = None,
    seq_lengths: Optional[List[int]] = None,
    hidden_dims: Optional[List[int]] = None,
    dtypes: Optional[List[str]] = None,
    layouts: Optional[List[str]] = None,
) -> FuzzResults:
    """Fuzz test an op with random inputs.

    Args:
        op_name: Name of the op (must be registered in gpuemu.toml).
        seed: Master seed for reproducibility. If None, uses current timestamp.
        iterations: Number of test cases to generate.
        fail_fast: Stop on first failure.
        batch_sizes: List of batch sizes to use.
        seq_lengths: List of sequence lengths to use.
        hidden_dims: List of hidden dimensions to use.
        dtypes: List of dtype strings to use.
        layouts: List of layout types to use.

    Returns:
        FuzzResults with pass/fail counts and list of failures.

    Example:
        >>> results = client.fuzz_op("matmul", seed=12345, iterations=100)
        >>> print(f"Passed: {results.passed}/{results.total}")
        >>> for failure in results.failures:
        ...     print(f"  Seed {failure.seed}: {failure.failures[0]['message']}")
    """
    if seed is None:
        seed = int(time.time_ns()) & 0xFFFFFFFFFFFFFFFF

    # Build fuzz config
    fuzz_config = {
        "seed": seed,
        "shape_options": {
            "batch_sizes": batch_sizes or [1, 2, 4, 8, 16, 32],
            "seq_lengths": seq_lengths or [64, 128, 256, 512, 1024],
            "hidden_dims": hidden_dims or [256, 512, 768, 1024],
            "edge_cases": [[1], [1, 1], [1, 1, 1]],
        },
        "dtypes": dtypes or ["float32", "float16"],
        "layouts": layouts or ["Contiguous", "Strided", "Transposed"],
    }

    request = {
        "type": "FuzzOp",
        "op_name": op_name,
        "fuzz_config": fuzz_config,
        "iterations": iterations,
        "fail_fast": fail_fast,
    }

    response = self._send_request(request)

    if response.get("type") == "FuzzResults":
        return FuzzResults.from_dict(response)
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")
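
The defaults in `fuzz_op()` are applied with `value or default`, and the auto-generated seed is the nanosecond clock masked to 64 bits. The sketch below reproduces a reduced version of that config builder (only two of the option lists are kept, for brevity) to show one side effect: an explicitly empty list is falsy and is replaced by the default.

```python
import time
from typing import Any, Dict, List, Optional


def build_fuzz_config(
    seed: Optional[int] = None,
    batch_sizes: Optional[List[int]] = None,
    dtypes: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Reduced copy of the config-building step in fuzz_op()."""
    if seed is None:
        # Nanosecond timestamp masked to an unsigned 64-bit range.
        seed = int(time.time_ns()) & 0xFFFFFFFFFFFFFFFF
    return {
        "seed": seed,
        "shape_options": {
            "batch_sizes": batch_sizes or [1, 2, 4, 8, 16, 32],
        },
        "dtypes": dtypes or ["float32", "float16"],
    }


explicit = build_fuzz_config(seed=12345, batch_sizes=[3], dtypes=["float64"])
emptied = build_fuzz_config(seed=12345, batch_sizes=[], dtypes=[])
print(explicit["shape_options"]["batch_sizes"])  # [3]
print(emptied["shape_options"]["batch_sizes"])   # [1, 2, 4, 8, 16, 32]
```

Passing a fixed `seed` is what makes a run repeatable; the timestamp-based default will differ on every call.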

reproduce(seed)

Reproduce a failing test case by seed.

Retrieves the stored failure and regenerates the exact inputs that caused the failure.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| seed | int | The seed of the failing test case. | required |

Returns:

Type Description
ReproduceResult

ReproduceResult with the original result and regenerated inputs.

Example

repro = client.reproduce(12345)
print(f"Op: {repro.result.op_name}")
print(f"Input shape: {repro.inputs['input'].shape}")

Source code in gpuemu/client.py
def reproduce(self, seed: int) -> ReproduceResult:
    """Reproduce a failing test case by seed.

    Retrieves the stored failure and regenerates the exact inputs
    that caused the failure.

    Args:
        seed: The seed of the failing test case.

    Returns:
        ReproduceResult with the original result and regenerated inputs.

    Example:
        >>> repro = client.reproduce(12345)
        >>> print(f"Op: {repro.result.op_name}")
        >>> print(f"Input shape: {repro.inputs['input'].shape}")
    """
    request = {"type": "Reproduce", "seed": seed}
    response = self._send_request(request)

    if response.get("type") == "ReproduceResult":
        return ReproduceResult.from_dict(response, self._decode_tensor)
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")

minimize(seed, strategy='binary-search-dims', max_iters=100)

Minimize a failing test case.

Attempts to find a smaller input that still triggers the failure.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| seed | int | The seed of the failing test case. | required |
| strategy | str | Minimization strategy. One of "binary-search-dims" (binary search to reduce dimensions) or "binary-search-values" (binary search to reduce values). | 'binary-search-dims' |
| max_iters | int | Maximum iterations for minimization. | 100 |

Returns:

Type Description
MinimizeResult

MinimizeResult with minimized seed, shape, and result.

Example

result = client.minimize(12345)
print(f"Minimized shape: {result.minimized_shape}")

Source code in gpuemu/client.py
def minimize(
    self,
    seed: int,
    strategy: str = "binary-search-dims",
    max_iters: int = 100,
) -> MinimizeResult:
    """Minimize a failing test case.

    Attempts to find a smaller input that still triggers the failure.

    Args:
        seed: The seed of the failing test case.
        strategy: Minimization strategy. One of:
            - "binary-search-dims": Binary search to reduce dimensions.
            - "binary-search-values": Binary search to reduce values.
        max_iters: Maximum iterations for minimization.

    Returns:
        MinimizeResult with minimized seed, shape, and result.

    Example:
        >>> result = client.minimize(12345)
        >>> print(f"Minimized shape: {result.minimized_shape}")
    """
    # Convert strategy string to protocol enum
    strategy_map = {
        "binary-search-dims": "BinarySearchDims",
        "binary-search-values": "BinarySearchValues",
    }
    proto_strategy = strategy_map.get(strategy, "BinarySearchDims")

    request = {
        "type": "Minimize",
        "seed": seed,
        "strategy": proto_strategy,
        "max_iters": max_iters,
    }
    response = self._send_request(request)

    if response.get("type") == "MinimizeResult":
        return MinimizeResult.from_dict(response)
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")
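
The strategy lookup in `minimize()` uses `strategy_map.get(strategy, "BinarySearchDims")`, so an unrecognized string does not raise; it falls back to the dimension strategy. The sketch below shows that behavior alongside an optional strict mode. The `strict` flag is a hypothetical addition for illustration and does not exist in gpuemu.

```python
STRATEGY_MAP = {
    "binary-search-dims": "BinarySearchDims",
    "binary-search-values": "BinarySearchValues",
}


def to_protocol_strategy(strategy: str, strict: bool = False) -> str:
    """Map a user-facing strategy name to the protocol enum name.

    With strict=False this matches the lookup in minimize(): unknown names
    fall back to "BinarySearchDims". strict=True (hypothetical) rejects them.
    """
    if strict and strategy not in STRATEGY_MAP:
        raise ValueError(
            f"Unknown strategy {strategy!r}; expected one of {sorted(STRATEGY_MAP)}"
        )
    return STRATEGY_MAP.get(strategy, "BinarySearchDims")


print(to_protocol_strategy("binary-search-values"))  # BinarySearchValues
print(to_protocol_strategy("binary-search-value"))   # typo: BinarySearchDims
```

A misspelled strategy name therefore runs a different minimization than intended without any warning, which is worth keeping in mind when comparing results.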

list_failures(limit=20)

List stored failures.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| limit | int | Maximum number of failures to return. | 20 |

Returns:

Type Description
List[ValidationResult]

List of ValidationResult objects for failed tests.

Example

failures = client.list_failures(limit=10)
for f in failures:
    print(f"Seed {f.seed}: {f.op_name}")

Source code in gpuemu/client.py
def list_failures(self, limit: int = 20) -> List[ValidationResult]:
    """List stored failures.

    Args:
        limit: Maximum number of failures to return.

    Returns:
        List of ValidationResult objects for failed tests.

    Example:
        >>> failures = client.list_failures(limit=10)
        >>> for f in failures:
        ...     print(f"Seed {f.seed}: {f.op_name}")
    """
    request = {"type": "ListFailures", "limit": limit}
    response = self._send_request(request)

    if response.get("type") == "Results":
        return [ValidationResult.from_dict(r) for r in response.get("results", [])]
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")

lint_kernel(ptx_content, kernel_name=None)

Lint PTX through the daemon's artifact analyzer.

Extracts static metrics (registers, spills, local memory, instruction mix) and checks them against configured thresholds. If no kernel is registered, the daemon detects the kernel name from the PTX and uses default thresholds.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| ptx_content | str | Raw PTX assembly text. | required |
| kernel_name | Optional[str] | Optional kernel name to lint (else all / detected). | None |

Returns:

Type Description
List[Dict[str, Any]]

List of lint-result dicts, each with keys: kernel_name, passed, metrics (register_count, spill_count, ...), violations, timestamp.

Source code in gpuemu/client.py
def lint_kernel(
    self, ptx_content: str, kernel_name: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Lint PTX through the daemon's artifact analyzer.

    Extracts static metrics (registers, spills, local memory, instruction mix)
    and checks them against configured thresholds. If no kernel is registered,
    the daemon detects the kernel name from the PTX and uses default thresholds.

    Args:
        ptx_content: Raw PTX assembly text.
        kernel_name: Optional kernel name to lint (else all / detected).

    Returns:
        List of lint-result dicts, each with keys: kernel_name, passed,
        metrics (register_count, spill_count, ...), violations, timestamp.
    """
    request = {
        "type": "LintKernel",
        "kernel_name": kernel_name,
        "ptx_content": ptx_content,
    }
    response = self._send_request(request)
    if response.get("type") == "LintResults":
        return response.get("results", [])
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    raise ClientError(f"Unexpected response: {response}")
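
Since `lint_kernel()` returns plain dicts and not result objects, callers filter them by key. The sketch below picks out the kernels that did not pass. The sample data is invented for illustration: the kernel names, metric values, and the string form of each violation are assumptions, and only the top-level key names come from the docstring above.

```python
from typing import Any, Dict, List


def failing_kernels(results: List[Dict[str, Any]]) -> List[str]:
    """Return the names of kernels whose lint result has passed == False."""
    return [
        r.get("kernel_name", "<unknown>")
        for r in results
        if not r.get("passed", False)
    ]


# Invented sample in the documented shape (not real daemon output).
sample_results = [
    {
        "kernel_name": "vec_add",
        "passed": True,
        "metrics": {"register_count": 16, "spill_count": 0},
        "violations": [],
        "timestamp": 0,
    },
    {
        "kernel_name": "fused_attn",
        "passed": False,
        "metrics": {"register_count": 255, "spill_count": 12},
        "violations": ["spill_count above threshold"],  # element type assumed
        "timestamp": 0,
    },
]

print(failing_kernels(sample_results))  # ['fused_attn']
```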

get_test_case(op_name, seed=None)

Get a single test case from the daemon for client-side execution.

The daemon generates random inputs. The client runs the actual op on GPU and submits the output for validation via submit_output().

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| op_name | str | Name of the op (must be registered in gpuemu.toml). | required |
| seed | Optional[int] | Master seed for reproducibility. Auto-generated if None. | None |

Returns:

Type Description
Dict[str, Any]

Dict with 'seed', 'inputs' (dict of name->ndarray), 'shape', 'dtype', 'layout'.

Source code in gpuemu/client.py
def get_test_case(self, op_name: str, seed: Optional[int] = None) -> Dict[str, Any]:
    """Get a single test case from the daemon for client-side execution.

    The daemon generates random inputs. The client runs the actual op
    on GPU and submits the output for validation via submit_output().

    Args:
        op_name: Name of the op (must be registered in gpuemu.toml).
        seed: Master seed for reproducibility. Auto-generated if None.

    Returns:
        Dict with 'seed', 'inputs' (dict of name->ndarray), 'shape', 'dtype', 'layout'.
    """
    if seed is None:
        seed = int(time.time_ns()) & 0xFFFFFFFFFFFFFFFF

    fuzz_config = {
        "seed": seed,
        "shape_options": {
            "batch_sizes": [1, 2, 4, 8],
            "seq_lengths": [64, 128, 256],
            "hidden_dims": [256, 512],
            "edge_cases": [[1], [1, 1]],
        },
        "dtypes": ["float32", "float16"],
        "layouts": ["Contiguous", "Strided"],
    }

    request = {
        "type": "GetTestCase",
        "op_name": op_name,
        "fuzz_config": fuzz_config,
    }

    response = self._send_request(request)

    if response.get("type") == "TestCase":
        inputs = {
            name: self._decode_tensor(tensor)
            for name, tensor in response.get("inputs", {}).items()
        }
        return {
            "seed": response.get("seed", 0),
            "inputs": inputs,
            "shape": response.get("shape", []),
            "dtype": response.get("dtype", "float32"),
            "layout": response.get("layout", "contiguous"),
        }
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")
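
Each entry in the `inputs` field of the response is a tensor payload that the client decodes from a shape, a dtype string, and base64-encoded bytes. To make that wire format concrete, the sketch below decodes a float32 payload with the standard library only. It assumes little-endian, contiguous row-major data, which this reference does not state explicitly, and the function name is illustrative.

```python
import base64
import struct
from typing import Any, Dict, List


def decode_float32_payload(payload: Dict[str, Any]) -> List[float]:
    """Decode a float32 tensor payload into a flat list of Python floats.

    Assumes little-endian, row-major bytes (an assumption of this sketch).
    """
    if payload.get("dtype", "float32") != "float32":
        raise ValueError("this sketch only handles float32")
    raw = base64.b64decode(payload["data"])
    count = len(raw) // 4  # 4 bytes per float32 element
    expected = 1
    for dim in payload["shape"]:
        expected *= dim
    if count != expected:
        raise ValueError(f"shape implies {expected} elements, got {count}")
    return list(struct.unpack(f"<{count}f", raw))


# Hand-built payload in the same layout the client encodes and decodes.
payload = {
    "shape": [2, 2],
    "dtype": "float32",
    "data": base64.b64encode(struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)).decode("utf-8"),
}
print(decode_float32_payload(payload))  # [1.0, 2.0, 3.0, 4.0]
```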

get_test_batch(op_name, count=10, seed=None, op_schema=None, dtypes=None)

Get a batch of test cases from the daemon.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| op_name | str | Name of the op. | required |
| count | int | Number of test cases to generate. | 10 |
| seed | Optional[int] | Master seed. Auto-generated if None. | None |
| op_schema | Optional[Dict[str, Any]] | Optional operator-aware shape schema. When provided, the daemon generates per-input shapes from shared symbolic dims (e.g. matmul A[M,K]/B[K,N]) instead of one shape for all inputs. Shape: {"name", "dims": [{"name","candidates"}], "inputs": [{"name","dims"}], "output": {"name","dims"}}. | None |
| dtypes | Optional[List[str]] | Dtype strings to sample from. Falls back to ["float32", "float16"] when None or empty. | None |

Returns:

Type Description
List[Dict[str, Any]]

List of test case dicts (same format as get_test_case).

Source code in gpuemu/client.py
def get_test_batch(
    self,
    op_name: str,
    count: int = 10,
    seed: Optional[int] = None,
    op_schema: Optional[Dict[str, Any]] = None,
    dtypes: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Get a batch of test cases from the daemon.

    Args:
        op_name: Name of the op.
        count: Number of test cases to generate.
        seed: Master seed. Auto-generated if None.
        op_schema: Optional operator-aware shape schema. When provided, the
            daemon generates per-input shapes from shared symbolic dims
            (e.g. matmul A[M,K]/B[K,N]) instead of one shape for all inputs.
            Shape: {"name", "dims": [{"name","candidates"}],
                    "inputs": [{"name","dims"}], "output": {"name","dims"}}.

    Returns:
        List of test case dicts (same format as get_test_case).
    """
    if seed is None:
        seed = int(time.time_ns()) & 0xFFFFFFFFFFFFFFFF

    fuzz_config = {
        "seed": seed,
        "shape_options": {
            "batch_sizes": [1, 2, 4, 8],
            "seq_lengths": [64, 128, 256],
            "hidden_dims": [256, 512],
            "edge_cases": [[1], [1, 1]],
        },
        "dtypes": dtypes or ["float32", "float16"],
        "layouts": ["Contiguous", "Strided"],
    }
    if op_schema is not None:
        fuzz_config["op_schema"] = op_schema

    request = {
        "type": "GetTestBatch",
        "op_name": op_name,
        "fuzz_config": fuzz_config,
        "count": count,
    }

    response = self._send_request(request)

    if response.get("type") == "TestBatch":
        cases = []
        for case_data in response.get("cases", []):
            inputs = {
                name: self._decode_tensor(tensor)
                for name, tensor in case_data.get("inputs", {}).items()
            }
            cases.append(
                {
                    "seed": case_data.get("seed", 0),
                    "inputs": inputs,
                    "shape": case_data.get("shape", []),
                    "dtype": case_data.get("dtype", "float32"),
                    "layout": case_data.get("layout", "contiguous"),
                }
            )
        return cases
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")

submit_output(op_name, inputs, output, seed, **kwargs)

Submit an op output for validation against the reference.

This is the core method for client-side and daemon-orchestrated execution modes. The client runs the actual GPU op and submits the result here for comparison.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| op_name | str | Name of the op (must be registered in gpuemu.toml). | required |
| inputs | Dict[str, ndarray] | Input tensors as numpy arrays. | required |
| output | ndarray | Output tensor from the op under test. | required |
| seed | int | Seed of the test case (from get_test_case or get_test_batch). | required |
| **kwargs | | Additional kwargs for the reference script. Each value is converted with str() before it is sent. | {} |

Returns:

| Type | Description |
| --- | --- |
| ValidationResult | ValidationResult with pass/fail status and details. |

Source code in gpuemu/client.py
def submit_output(
    self,
    op_name: str,
    inputs: Dict[str, np.ndarray],
    output: np.ndarray,
    seed: int,
    **kwargs,
) -> ValidationResult:
    """Submit an op output for validation against the reference.

    This is the core method for client-side and daemon-orchestrated
    execution modes. The client runs the actual GPU op and submits
    the result here for comparison.

    Args:
        op_name: Name of the op (must be registered in gpuemu.toml).
        inputs: Input tensors as numpy arrays.
        output: Output tensor from the op under test.
        seed: Seed of the test case (from get_test_case or get_test_batch).
        **kwargs: Additional kwargs for the reference script.

    Returns:
        ValidationResult with pass/fail status and details.
    """
    encoded_inputs = {
        name: self._encode_tensor(arr) for name, arr in inputs.items()
    }
    encoded_output = self._encode_tensor(output)

    request = {
        "type": "SubmitOutput",
        "op_name": op_name,
        "inputs": encoded_inputs,
        "output": encoded_output,
        "seed": seed,
        "kwargs": {k: str(v) for k, v in kwargs.items()},
    }

    response = self._send_request(request)

    if response.get("type") == "SubmitResult":
        return ValidationResult.from_dict(response.get("result", {}))
    elif response.get("type") == "Error":
        raise ClientError(response.get("message", "Unknown error"))
    else:
        raise ClientError(f"Unexpected response: {response}")
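
One consequence of the request construction above is that every extra keyword argument is passed through str() before it is sent, so the reference script sees strings rather than typed values. The sketch below mirrors that encoding and shows one way a reference script might cast values back. `decode_kwargs` and its type map are hypothetical; how the reference script actually receives kwargs is not described on this page.

```python
def encode_kwargs(**kwargs):
    # Mirrors the dict comprehension in submit_output: every value goes
    # through str() before it is placed in the request.
    return {k: str(v) for k, v in kwargs.items()}


def decode_kwargs(encoded, types):
    # Hypothetical reference-script helper: cast each string back using a
    # caller-supplied type map. bool needs special handling because
    # bool("False") evaluates to True.
    out = {}
    for key, text in encoded.items():
        kind = types.get(key, str)
        out[key] = (text == "True") if kind is bool else kind(text)
    return out


wire = encode_kwargs(causal=False, scale=0.125, window=64)
restored = decode_kwargs(wire, {"causal": bool, "scale": float, "window": int})
print(wire)
print(restored)
```

Booleans are the easy case to get wrong here: a naive `bool(value)` on the receiving side would treat the string "False" as truthy.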

fuzz_op_client_side(op_name, run_op, iterations=100, seed=None, fail_fast=False, op_schema=None, dtypes=None)

Fuzz an op using client-side execution (the recommended drop-in path).

This method generates random inputs via the daemon, runs the provided run_op callable on the client (which has GPU access), and validates the output against the reference script. This is how GPU developers should use gpuemu for fuzzing.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| op_name | str | Name of the op (must be registered in gpuemu.toml). | required |
| run_op | Callable[[Dict[str, ndarray]], ndarray] | A callable that takes a dict of input tensors and returns the output tensor. This is where you call your GPU kernel. | required |
| iterations | int | Number of test cases to try. | 100 |
| seed | Optional[int] | Master seed. Auto-generated if None. | None |
| fail_fast | bool | Stop on first failure. | False |
| op_schema | Optional[Dict[str, Any]] | Optional operator-aware shape schema (see get_test_batch). Use for ops whose inputs have different but linked shapes (matmul, attention) so fuzzing covers the real operator domain. | None |
| dtypes | Optional[List[str]] | Optional list of dtype names to sample from, passed through to get_test_batch. Falls back to ["float32", "float16"] when None or empty. | None |

Returns:

| Type | Description |
| --- | --- |
| FuzzResults | FuzzResults with pass/fail counts and list of failures. |

Example

client = Client()
results = client.fuzz_op_client_side(
    "my_flash_attention",
    run_op=lambda inputs: my_flash_attn(inputs["q"], inputs["k"], inputs["v"]),
    iterations=50,
)
print(f"Passed: {results.passed}/{results.total}")
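
The lambda in the example above unpacks the input dict by hand. For kernels with positional signatures, a small adapter can do the same thing and fail loudly when an expected input is missing. The sketch below is one possible shape for such a helper; `make_run_op` and `fake_attention` are hypothetical names introduced here and are not part of gpuemu.

```python
def make_run_op(kernel, arg_names):
    """Adapt a positional kernel to the dict-in, tensor-out callable
    that fuzz_op_client_side expects for run_op."""
    def run_op(inputs):
        missing = [name for name in arg_names if name not in inputs]
        if missing:
            raise KeyError(f"test case is missing inputs: {missing}")
        # Pass the tensors positionally, in the order the kernel expects.
        return kernel(*(inputs[name] for name in arg_names))
    return run_op


# Stand-in kernel so the sketch runs without a GPU; a real kernel would
# take and return numpy arrays.
def fake_attention(q, k, v):
    return (q, k, v)


run_op = make_run_op(fake_attention, ["q", "k", "v"])
result = run_op({"q": 1, "k": 2, "v": 3})
print(result)
```

Going by the source listing for fuzz_op_client_side, an exception raised inside run_op (including the KeyError above) is caught and recorded as a failure of kind "ExecutionError" rather than aborting the whole run, unless fail_fast is set.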

Source code in gpuemu/client.py
def fuzz_op_client_side(
    self,
    op_name: str,
    run_op: "Callable[[Dict[str, np.ndarray]], np.ndarray]",
    iterations: int = 100,
    seed: Optional[int] = None,
    fail_fast: bool = False,
    op_schema: Optional[Dict[str, Any]] = None,
    dtypes: Optional[List[str]] = None,
) -> FuzzResults:
    """Fuzz an op using client-side execution (the recommended drop-in path).

    This method generates random inputs via the daemon, runs the provided
    ``run_op`` callable on the client (which has GPU access), and validates
    the output against the reference script. This is how GPU developers
    should use gpuemu for fuzzing.

    Args:
        op_name: Name of the op (must be registered in gpuemu.toml).
        run_op: A callable that takes a dict of input tensors and returns
                 the output tensor. This is where you call your GPU kernel.
        iterations: Number of test cases to try.
        seed: Master seed. Auto-generated if None.
        fail_fast: Stop on first failure.
        op_schema: Optional operator-aware shape schema (see get_test_batch).
            Use for ops whose inputs have different but linked shapes
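            (matmul, attention) so fuzzing covers the real operator domain.
        dtypes: Optional list of dtype names to sample from, passed through
            to get_test_batch. Falls back to ["float32", "float16"] when
            None or empty.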

    Returns:
        FuzzResults with pass/fail counts and list of failures.

    Example:
        >>> client = Client()
        >>> results = client.fuzz_op_client_side(
        ...     "my_flash_attention",
        ...     run_op=lambda inputs: my_flash_attn(inputs["q"], inputs["k"], inputs["v"]),
        ...     iterations=50,
        ... )
        >>> print(f"Passed: {results.passed}/{results.total}")
    """
    if seed is None:
        seed = int(time.time_ns()) & 0xFFFFFFFFFFFFFFFF

    cases = self.get_test_batch(
        op_name, count=iterations, seed=seed, op_schema=op_schema, dtypes=dtypes
    )
    total = 0
    passed = 0
    failed = 0
    failures = []

    for case in cases:
        total += 1
        try:
            output = run_op(case["inputs"])
            result = self.submit_output(
                op_name, case["inputs"], output, case["seed"]
            )
            if result.passed:
                passed += 1
            else:
                failed += 1
                failures.append(result)
                if fail_fast:
                    break
        except Exception as e:
            failed += 1
            failures.append(
                ValidationResult(
                    passed=False,
                    seed=case["seed"],
                    op_name=op_name,
                    max_diff=float("inf"),
                    max_rel_diff=float("inf"),
                    failures=[{"kind": "ExecutionError", "message": str(e)}],
                    timestamp=int(time.time()),
                    duration_ms=0,
                )
            )
            if fail_fast:
                break

    return FuzzResults(
        seed=seed,
        total=total,
        passed=passed,
        failed=failed,
        failures=failures,
    )
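
After a run, the failures list can be condensed into a count per failure kind plus the seeds needed to look at individual cases again. The sketch below uses stand-in objects in place of FuzzResults and ValidationResult so it runs without a daemon. It assumes both expose their constructor fields as attributes, and that daemon-reported failure entries carry a "kind" key like the "ExecutionError" entries built in the listing above; neither is confirmed here.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_failures(results):
    """Count failure kinds and collect the seeds of failing cases."""
    kinds = Counter()
    for failure in results.failures:
        for entry in failure.failures:
            # Entries without a "kind" key are grouped under "Unknown".
            kinds[entry.get("kind", "Unknown")] += 1
    seeds = [failure.seed for failure in results.failures]
    return kinds, seeds


# Stand-ins shaped like FuzzResults / ValidationResult (assumed attributes).
fake_results = SimpleNamespace(
    seed=1234,
    total=3,
    passed=1,
    failed=2,
    failures=[
        SimpleNamespace(
            seed=11,
            failures=[{"kind": "ExecutionError", "message": "boom"}],
        ),
        SimpleNamespace(
            seed=12,
            failures=[{"kind": "ExecutionError", "message": "out of memory"}],
        ),
    ],
)

kinds, seeds = summarize_failures(fake_results)
print(dict(kinds), seeds)
```

Note that with fail_fast=True the loop stops at the first failure, so total reflects the number of cases actually attempted and may be smaller than iterations.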