Skip to content

CheckStream

Production-ready streaming guardrails for LLM safety and compliance.

CheckStream is a high-performance Rust guardrail platform that enforces safety, security, and regulatory compliance on LLM outputs as tokens stream in real-time with sub-10ms latency.


Why CheckStream?

Modern AI applications require real-time safety enforcement without sacrificing user experience. CheckStream provides:

  • Real-time Protection: Sub-10ms latency guardrails that work as tokens stream
  • Regulatory Compliance: Built-in support for FCA, FINRA, GDPR, HIPAA regulations
  • Production Ready: 122+ tests passing, graceful shutdown, Kubernetes health probes
  • Zero Python Dependencies: Pure Rust with Candle ML framework for reliable deployment

How It Works

CheckStream operates as a transparent proxy between your application and any LLM provider:

┌─────────────┐     ┌──────────────────────────────────────────────┐     ┌─────────────┐
│             │     │              CheckStream Proxy                │     │             │
│   Client    │────▶│  Phase 1 ──▶ Phase 2 (stream) ──▶ Phase 3   │────▶│  LLM API    │
│ Application │◀────│  Ingress     Midstream            Egress     │◀────│  Backend    │
│             │     │                                              │     │             │
└─────────────┘     └──────────────────────────────────────────────┘     └─────────────┘

Three-Phase Pipeline:

Phase When Purpose Latency
Ingress Before LLM Validate prompts, block unsafe requests ~3ms
Midstream During streaming Real-time token safety, redaction ~2ms/chunk
Egress After completion Compliance checks, audit trail Async

Key Features

Tiered Classification System

Tier Latency Method Use Case
A <2ms Pattern matching PII, prompt injection patterns
B <5ms Quantized ML Toxicity, sentiment
C <10ms Full models Complex domain classifiers

Policy-as-Code

Define safety rules in simple YAML:

policies:
  - name: block_financial_advice
    trigger:
      classifier: financial_advice
      threshold: 0.8
    action: stop
    message: "Financial advice requires suitability assessment"
    regulation: "FCA COBS 9A.2.1R"

HuggingFace Integration

Auto-download and cache models from HuggingFace Hub:

classifiers:
  toxicity:
    model: "unitary/toxic-bert"
    tier: B
    device: auto

Quick Start

# Clone and build
git clone https://github.com/Skelf-Research/checkstream
cd checkstream
cargo build --release --features ml-models

# Download models
./scripts/download_models.sh

# Run proxy
./target/release/checkstream-proxy --config config.yaml

Then point your LLM client to http://localhost:8080 instead of the upstream API.

Get started with the full installation guide


Supported Backends

CheckStream works with any OpenAI-compatible API:

  • OpenAI
  • Anthropic (via adapter)
  • Azure OpenAI
  • vLLM
  • Ollama
  • Any OpenAI-compatible endpoint

Use Cases

  • Financial Services: FCA/FINRA compliant AI assistants
  • Healthcare: HIPAA-compliant patient communication
  • Enterprise: Content moderation and brand safety
  • Security: Prompt injection and jailbreak prevention

Explore use cases


Performance

Metric Target Actual
Pattern classification <2ms ~0.5ms
ML classification (CPU) <50ms 30-50ms
Total proxy overhead <10ms 5-8ms
Throughput 1000 req/s 1000+ req/s
Memory <500MB <500MB