Policy Engine Guide

Learn to write effective safety policies for CheckStream.

Overview

The policy engine evaluates classifier outputs and decides which actions to take. Policies are defined in YAML and can be hot-reloaded without a restart.

version: "1.0"
name: "my-policy"

policies:
  - name: rule_name
    trigger: ...
    action: ...

Policy Structure

Basic Policy

policies:
  - name: block_toxicity
    trigger:
      classifier: toxicity
      threshold: 0.8
    action: stop
    message: "Content blocked for safety"

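Policy Fields

| Field | Required | Description |
|-------|----------|-------------|
| `name` | Yes | Unique rule identifier |
| `trigger` | Yes | Condition that activates the rule |
| `action` | Yes | Action, or list of actions, to take when the rule triggers |
| `phase` | No | Restrict the rule to one phase: `ingress`, `midstream`, or `egress` |
| `mode` | No | `enforce` (default), `shadow`, or `disabled` |
| `message` | No | User-facing message |
| `regulation` | No | Regulatory reference |
| `priority` | No | Evaluation order; higher values are evaluated first |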

Trigger Types

Classifier Trigger

trigger:
  classifier: toxicity
  threshold: 0.8

Threshold Range

trigger:
  classifier: toxicity
  min_threshold: 0.5    # Minimum score
  max_threshold: 0.8    # Maximum score
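
The guide does not state whether the bounds are inclusive. As a rough sketch in Python, assuming inclusive bounds and a hypothetical helper named `score_in_range`, the single-threshold and range forms above could be read like this:

```python
from typing import Optional


def score_in_range(
    score: float,
    threshold: Optional[float] = None,
    min_threshold: Optional[float] = None,
    max_threshold: Optional[float] = None,
) -> bool:
    """Return True when a classifier score satisfies the trigger bounds.

    Assumption: all bounds are inclusive, and `threshold` is a lower
    bound with no upper limit. The real engine may differ.
    """
    lower = threshold if threshold is not None else min_threshold
    if lower is not None and score < lower:
        return False
    if max_threshold is not None and score > max_threshold:
        return False
    return True


# Single threshold: fires at or above 0.8.
print(score_in_range(0.85, threshold=0.8))                         # True
# Range: fires only for scores between 0.5 and 0.8.
print(score_in_range(0.65, min_threshold=0.5, max_threshold=0.8))  # True
print(score_in_range(0.90, min_threshold=0.5, max_threshold=0.8))  # False
```

A range trigger is a natural fit for borderline scores that a stricter rule does not already handle.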

Pattern Trigger

trigger:
  pattern: '\b(password|secret)\b'
  case_insensitive: true
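
Before deploying a pattern it can help to try it on sample text. The sketch below uses Python's `re` module and assumes that CheckStream's regex dialect behaves the same way for a simple pattern like this one; `has_match` is a hypothetical helper, not part of CheckStream.

```python
import re


def has_match(text: str, pattern: str, case_insensitive: bool = False) -> bool:
    """Return True if the pattern occurs anywhere in the text."""
    flags = re.IGNORECASE if case_insensitive else 0
    return re.search(pattern, text, flags) is not None


PATTERN = r"\b(password|secret)\b"

# Matches regardless of case when case_insensitive is set.
print(has_match("My PASSWORD is hunter2", PATTERN, case_insensitive=True))   # True
# Without the flag, the upper-case word is not matched.
print(has_match("My PASSWORD is hunter2", PATTERN))                          # False
# \b requires a word boundary, so "passwords" does not match.
print(has_match("passwords are rotated weekly", PATTERN, case_insensitive=True))  # False
```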

Label Trigger

trigger:
  classifier: sentiment
  label: negative
  confidence: 0.7

Compound Triggers

All Conditions (AND)

trigger:
  all:
    - classifier: toxicity
      threshold: 0.6
    - classifier: sentiment
      label: negative
      confidence: 0.7

Any Condition (OR)

trigger:
  any:
    - classifier: toxicity
      threshold: 0.8
    - classifier: hate_speech
      threshold: 0.7

Nested Logic

trigger:
  all:
    - classifier: contains_advice
      threshold: 0.5
    - any:
        - classifier: financial_advice
          threshold: 0.7
        - classifier: medical_advice
          threshold: 0.7

NOT Condition

trigger:
  all:
    - classifier: toxicity
      threshold: 0.7
    - not:
        classifier: satire_detector
        threshold: 0.8
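
The `all`, `any`, and `not` keys compose like ordinary boolean operators. The following Python sketch shows one way such a trigger tree could be evaluated recursively. It is an illustration only: `evaluate` is a hypothetical function, leaf conditions are limited to `classifier` plus `threshold`, and a missing classifier score is assumed to count as 0.

```python
def evaluate(trigger: dict, scores: dict) -> bool:
    """Recursively evaluate a trigger tree against classifier scores."""
    if "all" in trigger:
        return all(evaluate(child, scores) for child in trigger["all"])
    if "any" in trigger:
        return any(evaluate(child, scores) for child in trigger["any"])
    if "not" in trigger:
        return not evaluate(trigger["not"], scores)
    # Leaf condition. Assumption: the comparison is inclusive (>=).
    return scores.get(trigger["classifier"], 0.0) >= trigger["threshold"]


# Mirrors the NOT example: toxic content that is not recognised as satire.
trigger = {
    "all": [
        {"classifier": "toxicity", "threshold": 0.7},
        {"not": {"classifier": "satire_detector", "threshold": 0.8}},
    ]
}

print(evaluate(trigger, {"toxicity": 0.9, "satire_detector": 0.2}))   # True
print(evaluate(trigger, {"toxicity": 0.9, "satire_detector": 0.95}))  # False
print(evaluate(trigger, {"toxicity": 0.3, "satire_detector": 0.2}))   # False
```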

Actions

Stop Action

Block the request or stop generation:

action: stop
message: "Request blocked for safety"

Redact Action

Replace content with a placeholder:

action: redact
replacement: "[CONTENT REMOVED]"

Advanced redaction:

action: redact
options:
  replacement: "[REDACTED]"
  scope: matched        # matched, sentence, paragraph, all
  preserve_length: false
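
The guide does not define what `preserve_length` does. One plausible reading, sketched below in Python for `scope: matched` only, is that the matched text is replaced by mask characters of the same length instead of the fixed placeholder. The `redact` helper, the `mask` parameter, and the sample pattern are all assumptions made for illustration.

```python
import re


def redact(
    text: str,
    pattern: str,
    replacement: str = "[REDACTED]",
    preserve_length: bool = False,
    mask: str = "*",
) -> str:
    """Replace every match of the pattern (scope: matched).

    Assumption: with preserve_length set, each match becomes a run of
    mask characters of equal length; otherwise the placeholder is used.
    """
    def substitute(match: re.Match) -> str:
        if preserve_length:
            return mask * len(match.group(0))
        return replacement

    return re.sub(pattern, substitute, text)


SSN_PATTERN = r"\b\d{3}-\d{2}-\d{4}\b"  # hypothetical example pattern

print(redact("SSN: 123-45-6789", SSN_PATTERN))
print(redact("SSN: 123-45-6789", SSN_PATTERN, preserve_length=True))
```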

Inject Action

Add content to the response:

action: inject
position: end           # start, end, inline
content: |
  ---
  *Disclaimer: This is not professional advice.*

Log Action

Record the event for analysis without blocking:

action: log
level: warn             # debug, info, warn, error
include_context: true

Audit Action

Create a compliance record:

action: audit
include:
  - input
  - output
  - classifier_scores
  - timestamp
regulation: "FCA COBS 9A.2.1R"

Multiple Actions

action:
  - type: redact
    replacement: "[PII REMOVED]"
  - type: log
    level: warn
  - type: audit
    regulation: "GDPR Article 9"

Phase-Specific Policies

Ingress Only

policies:
  - name: block_injection
    phase: ingress
    trigger:
      classifier: prompt_injection
      threshold: 0.8
    action: stop

Midstream Only

policies:
  - name: redact_pii
    phase: midstream
    trigger:
      classifier: pii_detector
      threshold: 0.9
    action: redact

Egress Only

policies:
  - name: add_disclaimer
    phase: egress
    trigger:
      classifier: financial_advice
      threshold: 0.3
    action: inject
    position: end
    content: "\n\n*Not financial advice.*"

Policy Modes

Enforce Mode (Default)

policies:
  - name: strict_safety
    mode: enforce
    trigger: ...
    action: stop

Shadow Mode (Test)

Log what the rule would do without enforcing it:

policies:
  - name: test_rule
    mode: shadow
    trigger:
      classifier: new_classifier
      threshold: 0.7
    action: stop
    # Logs trigger but doesn't block
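
To make the difference between the modes concrete, here is a small Python sketch of how a triggered rule might be resolved. It is not the engine's code: `resolve_action` and the log record fields are hypothetical, and only the `shadow_trigger` event name is taken from the log search shown later in this guide.

```python
from typing import Optional


def resolve_action(rule: dict, triggered: bool, log: list) -> Optional[str]:
    """Return the action to enforce, or None if nothing should happen."""
    mode = rule.get("mode", "enforce")  # enforce is the documented default
    if mode == "disabled" or not triggered:
        return None
    if mode == "shadow":
        # Record what would have happened, but do not enforce it.
        log.append({
            "event": "shadow_trigger",
            "rule": rule["name"],
            "would_apply": rule["action"],
        })
        return None
    return rule["action"]


log: list = []
shadow_rule = {"name": "test_rule", "mode": "shadow", "action": "stop"}
print(resolve_action(shadow_rule, True, log))  # None: nothing is blocked
print(log)                                     # one shadow_trigger record
```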

Disabled Mode

Keep the rule in the file but skip it during evaluation:

policies:
  - name: deprecated_rule
    mode: disabled
    trigger: ...
    action: ...

Priority and Ordering

Higher-priority rules are evaluated first:

policies:
  - name: critical_safety
    priority: 100
    trigger: ...
    action: stop

  - name: moderate_check
    priority: 50
    trigger: ...
    action: log

  - name: low_priority
    priority: 10
    trigger: ...
    action: audit

The first matching rule wins unless it sets `continue: true`:

policies:
  - name: log_everything
    priority: 100
    trigger:
      classifier: any
      threshold: 0
    action: log
    continue: true    # Continue to next rule

  - name: block_severe
    priority: 50
    trigger:
      classifier: toxicity
      threshold: 0.9
    action: stop      # Stops evaluation
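
The ordering rules can be summarised in a few lines of Python. This is a sketch under stated assumptions, not the engine's implementation: `run_policies` is a hypothetical name, rules without a `priority` are assumed to default to 0, ties are assumed to keep file order, and triggers are simplified to a single classifier threshold.

```python
def run_policies(policies: list, scores: dict) -> list:
    """Return (rule name, action) pairs in the order they fire."""
    fired = []
    # Highest priority first; sorted() is stable, so ties keep file order.
    ordered = sorted(policies, key=lambda r: r.get("priority", 0), reverse=True)
    for rule in ordered:
        trig = rule["trigger"]
        if scores.get(trig["classifier"], 0.0) >= trig["threshold"]:
            fired.append((rule["name"], rule["action"]))
            if not rule.get("continue", False):
                break  # first match wins unless continue is set
    return fired


policies = [
    {"name": "block_severe", "priority": 50, "action": "stop",
     "trigger": {"classifier": "toxicity", "threshold": 0.9}},
    {"name": "log_toxicity", "priority": 100, "action": "log", "continue": True,
     "trigger": {"classifier": "toxicity", "threshold": 0.5}},
]

print(run_policies(policies, {"toxicity": 0.95}))
# [('log_toxicity', 'log'), ('block_severe', 'stop')]
print(run_policies(policies, {"toxicity": 0.6}))
# [('log_toxicity', 'log')]
```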

Variables and Context

Built-in Variables

| Variable | Description |
|----------|-------------|
| `${input}` | User input text |
| `${output}` | Generated output |
| `${tenant}` | Tenant identifier |
| `${model}` | LLM model name |
| `${timestamp}` | Current timestamp |

Using Variables

policies:
  - name: audit_with_context
    trigger:
      classifier: financial_advice
      threshold: 0.5
    action: audit
    metadata:
      tenant: "${tenant}"
      model: "${model}"
      timestamp: "${timestamp}"
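
The `${name}` placeholder syntax happens to match Python's `string.Template`, which makes it easy to sketch how the metadata above might be expanded. The helper name `render_metadata` and the context values are made up for illustration, and leaving unknown variables untouched is an assumption rather than documented behaviour.

```python
from string import Template


def render_metadata(metadata: dict, context: dict) -> dict:
    """Expand ${variable} placeholders in string-valued metadata fields."""
    rendered = {}
    for key, value in metadata.items():
        if isinstance(value, str):
            # safe_substitute leaves unknown placeholders as they are.
            rendered[key] = Template(value).safe_substitute(context)
        else:
            rendered[key] = value  # booleans, numbers, etc. pass through
    return rendered


context = {
    "tenant": "acme",                      # hypothetical values
    "model": "example-model",
    "timestamp": "2024-01-01T00:00:00Z",
}
metadata = {
    "tenant": "${tenant}",
    "model": "${model}",
    "timestamp": "${timestamp}",
}

print(render_metadata(metadata, context))
```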

Real-World Examples

Financial Compliance

version: "1.0"
name: "fca-compliance"

policies:
  - name: block_specific_advice
    phase: ingress
    trigger:
      all:
        - classifier: financial_advice
          threshold: 0.8
        - pattern: '\b(buy|sell|invest)\s+(in|into)\b'
    action: stop
    message: "I cannot provide specific investment recommendations."
    regulation: "FCA COBS 9A.2.1R"

  - name: redact_projections
    phase: midstream
    trigger:
      pattern: '\b\d+%\s+(return|growth|yield)\b'
    action: redact
    replacement: "[PROJECTION REDACTED]"

  - name: add_risk_warning
    phase: egress
    trigger:
      classifier: investment_discussion
      threshold: 0.3
    action: inject
    position: end
    content: |

      ---
      **Risk Warning**: Past performance is not a guide to future performance.
      The value of investments can fall as well as rise.

Healthcare Compliance

version: "1.0"
name: "hipaa-compliance"

policies:
  - name: block_phi_requests
    phase: ingress
    trigger:
      pattern: '(patient|medical)\s+record'
    action: stop
    message: "I cannot access or discuss specific patient records."

  - name: redact_phi
    phase: midstream
    trigger:
      any:
        - classifier: pii_detector
          threshold: 0.9
        - pattern: '\b(MRN|DOB|SSN)[\s:]+\S+'
    action: redact
    replacement: "[PHI REDACTED]"

  - name: medical_disclaimer
    phase: egress
    trigger:
      classifier: medical_advice
      threshold: 0.4
    action: inject
    position: end
    content: |

      ---
      *This information is for educational purposes only and is not a substitute
      for professional medical advice. Please consult a healthcare provider.*

Content Moderation

version: "1.0"
name: "content-moderation"

policies:
  - name: block_hate_speech
    trigger:
      classifier: hate_speech
      threshold: 0.85
    action: stop
    message: "This content violates our community guidelines."

  - name: redact_profanity
    phase: midstream
    trigger:
      classifier: profanity
      threshold: 0.9
    action: redact
    replacement: "****"

  - name: flag_borderline
    trigger:
      classifier: toxicity
      min_threshold: 0.5
      max_threshold: 0.85
    action:
      - type: log
        level: warn
      - type: audit
        metadata:
          review_required: true

Testing Policies

Validate Syntax

./checkstream-proxy --validate-policy ./policies/my-policy.yaml

Test Against Input

curl http://localhost:8080/admin/test-policy \
  -H "Content-Type: application/json" \
  -d '{
    "policy": "fca-compliance",
    "text": "You should buy AAPL stock",
    "phase": "ingress"
  }'
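
The same request can be issued from a script, for example in a regression test. The sketch below builds it with Python's standard library. The endpoint and payload come from the curl command above; the helper names are hypothetical, and because the response format is not documented here, `send` simply returns the raw body.

```python
import json
import urllib.request


def build_test_request(policy: str, text: str, phase: str,
                       base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build the POST request that the curl command above sends."""
    body = json.dumps({"policy": policy, "text": text, "phase": phase})
    return urllib.request.Request(
        f"{base_url}/admin/test-policy",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


request = build_test_request("fca-compliance", "You should buy AAPL stock", "ingress")
print(request.get_method(), request.full_url)
# Call send(request) once a CheckStream proxy is running locally.
```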

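Shadow Mode Analysis

# Enable shadow mode for the new policy, then review the logs for trigger patterns.
# -h suppresses the filename prefix grep adds when searching several files,
# so jq receives plain JSON lines.
grep -h "shadow_trigger" /var/log/checkstream/*.log | jq .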

Best Practices

1. Start with shadow mode: test before enforcing.
2. Use specific patterns: avoid over-broad triggers.
3. Layer defenses: use multiple rules for important cases.
4. Document regulations: include the `regulation` field.
5. Set appropriate thresholds: balance safety against usability.
6. Use phases wisely: run fast checks at ingress and heavier checks at egress.
7. Review regularly: update thresholds based on data.

Next Steps

- Policy Language Reference: complete syntax
- Regulatory Compliance: pre-built compliance packs
- Pipeline Configuration: classifier pipelines