
Quality of Meaning (QoM)

Quality of Meaning (QoM) is MPL's system for measuring, quantifying, and enforcing semantic quality. Rather than treating quality as a binary pass/fail, QoM provides six numeric metrics that capture different dimensions of meaning fidelity. Configurable profiles set thresholds appropriate to each use case's risk level.


Overview

graph LR
    Message[MPL Envelope] --> Pipeline[QoM Pipeline]
    Pipeline --> SF[Schema Fidelity]
    Pipeline --> IC[Instruction Compliance]
    Pipeline --> G[Groundedness]
    Pipeline --> DJ[Determinism under Jitter]
    Pipeline --> OA[Ontology Adherence]
    Pipeline --> TOC[Tool Outcome Correctness]

    SF --> Report[QoM Report]
    IC --> Report
    G --> Report
    DJ --> Report
    OA --> Report
    TOC --> Report

    Report --> Decision{Meets Profile?}
    Decision -->|Yes| Allow[Allow]
    Decision -->|No| Breach[E-QOM-BREACH]

Design Philosophy

QoM treats semantic quality as a measurable, continuous signal rather than a gate. Each metric produces a value between 0.0 and 1.0. Profiles define thresholds -- the minimum acceptable scores for a given deployment context.


The Six Metrics

1. Schema Fidelity (SF)

What it measures: Whether the payload structurally conforms to its declared SType's JSON Schema.

| Property | Value |
| --- | --- |
| Score range | 0.0 or 1.0 (binary) |
| Evaluation | JSON Schema validation (draft 2020-12) |
| Mandatory | Yes -- all profiles require SF = 1.0 |
| Cost | Negligible (local validation) |

Schema Fidelity is the foundational gate. If the payload does not conform to its declared schema, no further QoM evaluation occurs. The error E-SCHEMA-FIDELITY is returned immediately.

{
  "metric": "schema_fidelity",
  "score": 1.0,
  "details": {
    "schema": "org.calendar.Event.v1",
    "validation_errors": []
  }
}

2. Instruction Compliance (IC)

What it measures: Whether the payload satisfies additional business assertions beyond schema structure.

| Property | Value |
| --- | --- |
| Score range | 0.0 to 1.0 (pass rate across assertions) |
| Evaluation | CEL expressions or JSONLogic rules |
| Mandatory | Required in qom-strict-argcheck and above |
| Cost | Low (rule evaluation) |

Assertions are defined in .cel files alongside the schema:

// assertions.cel for org.calendar.Event.v1

// End time must be after start time
timestamp(payload.end) > timestamp(payload.start)

// Title must not be all whitespace
payload.title.trim().size() > 0

// Event duration must be at most 24 hours
timestamp(payload.end) - timestamp(payload.start) <= duration("24h")

The IC score is the fraction of assertions that pass:

IC = passing_assertions / total_assertions
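
The pass-rate formula can be sketched in Python, with plain predicates standing in for the compiled CEL assertions above (function and assertion names here are illustrative, not part of the SDK):

```python
from datetime import datetime, timedelta

# Python predicates standing in for the compiled .cel assertions
ASSERTIONS = {
    "end_after_start": lambda p: datetime.fromisoformat(p["end"])
        > datetime.fromisoformat(p["start"]),
    "title_not_blank": lambda p: len(p["title"].strip()) > 0,
    "duration_at_most_24h": lambda p: (
        datetime.fromisoformat(p["end"]) - datetime.fromisoformat(p["start"])
    ) <= timedelta(hours=24),
}

def instruction_compliance(payload: dict) -> float:
    """IC = passing_assertions / total_assertions."""
    results = [bool(check(payload)) for check in ASSERTIONS.values()]
    return sum(results) / len(results)

payload = {
    "title": "Design review",
    "start": "2025-01-15T10:00:00",
    "end": "2025-01-15T11:00:00",
}
print(instruction_compliance(payload))  # 1.0
```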

3. Groundedness (G)

What it measures: Whether claims in the payload are supported by cited sources.

| Property | Value |
| --- | --- |
| Score range | 0.0 to 1.0 (claim support ratio) |
| Evaluation | Citation verification against source material |
| Mandatory | Required in qom-comprehensive |
| Cost | Medium (requires source retrieval) |

Groundedness evaluates whether factual claims can be traced to source material. This is critical for RAG (Retrieval-Augmented Generation) scenarios:

G = supported_claims / total_claims

When Groundedness Applies

Groundedness is most relevant for STypes that carry factual assertions (e.g., research summaries, medical information, financial reports). For purely structural types (e.g., calendar events), Groundedness is typically not required and scores 1.0 by default.
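
A minimal sketch of the claim-support ratio, assuming (hypothetically) that an upstream citation-verification step has already flagged each extracted claim as supported or not:

```python
def groundedness(claims: list[dict]) -> float:
    """G = supported_claims / total_claims.

    Each claim dict is assumed to carry a 'supported' flag set by an
    upstream citation-verification step (an illustrative shape, not an
    SDK type).
    """
    if not claims:
        # No factual claims: scores 1.0 by default, matching the
        # behavior for purely structural types described above.
        return 1.0
    supported = sum(1 for c in claims if c["supported"])
    return supported / len(claims)

claims = [
    {"text": "Revenue grew 12% YoY", "supported": True},
    {"text": "Churn fell to 3%", "supported": True},
    {"text": "Market share doubled", "supported": False},
]
print(groundedness(claims))  # 0.6666666666666666
```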


4. Determinism under Jitter (DJ)

What it measures: Whether repeated execution with slight input perturbation produces semantically consistent results.

| Property | Value |
| --- | --- |
| Score range | 0.0 to 1.0 (similarity across re-executions) |
| Evaluation | BLEU, ROUGE, or cosine similarity |
| Mandatory | Required in qom-comprehensive |
| Cost | High (requires multiple re-executions) |

Determinism under Jitter catches non-deterministic AI behaviors that could produce inconsistent outputs for the same logical input:

DJ = average_similarity(outputs_under_perturbation)

The evaluation process:

  1. Introduce minor perturbations to the input (rephrasing, whitespace, ordering)
  2. Re-execute the tool/agent call N times (configurable, default 3)
  3. Compare outputs using configured similarity metric
  4. Score is the average pairwise similarity
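
The averaging step (step 4) can be sketched as follows, with difflib's SequenceMatcher standing in for the configured similarity metric (BLEU, ROUGE, or embedding cosine similarity in a real pipeline):

```python
from itertools import combinations
from difflib import SequenceMatcher

def determinism_under_jitter(outputs: list[str]) -> float:
    """DJ = average pairwise similarity across re-executions.

    SequenceMatcher is a stand-in for the configured similarity metric.
    """
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 1.0  # a single execution is trivially self-consistent
    sims = [SequenceMatcher(None, a, b).ratio() for a, b in pairs]
    return sum(sims) / len(sims)

# Three re-executions under input perturbation (default N = 3)
outputs = [
    "Meeting scheduled for 10:00 on Jan 15.",
    "Meeting scheduled for 10:00 on Jan 15.",
    "Scheduled meeting: Jan 15, 10:00.",
]
score = determinism_under_jitter(outputs)
print(0.0 <= score <= 1.0)  # True
```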

Performance Impact

DJ evaluation requires multiple re-executions of the tool call. Enable it only for high-stakes scenarios where consistency is critical. The qom-comprehensive profile includes DJ but with a relaxed threshold.


5. Ontology Adherence (OA)

What it measures: Whether the payload conforms to domain-specific ontological rules beyond JSON Schema.

| Property | Value |
| --- | --- |
| Score range | 0.0 to 1.0 (rule pass rate) |
| Evaluation | SHACL shapes, OWL constraints, or custom rules |
| Mandatory | Required in qom-comprehensive |
| Cost | Medium (rule engine evaluation) |

Ontology Adherence captures domain knowledge that cannot be expressed in JSON Schema alone:

  • Medical: ICD-10 code validity, drug interaction checks
  • Financial: SWIFT code format, currency pair validity
  • Legal: Jurisdiction-specific clause requirements
{
  "metric": "ontology_adherence",
  "score": 0.95,
  "details": {
    "rules_evaluated": 20,
    "rules_passed": 19,
    "violations": [
      {
        "rule": "icd10_code_valid",
        "message": "Code Z99.99 is not a valid ICD-10 code"
      }
    ]
  }
}

6. Tool Outcome Correctness (TOC)

What it measures: Whether the side effects of a tool invocation match expectations.

| Property | Value |
| --- | --- |
| Score range | 0.0 to 1.0 (post-check pass rate) |
| Evaluation | Post-execution hooks that verify outcomes |
| Mandatory | Required in qom-outcome and above |
| Cost | Variable (depends on post-check implementation) |

TOC verifies that tools actually did what they claimed. Post-check hooks run after tool execution:

# Post-check hook for calendar event creation
async def check_event_created(tool_result, payload):
    """Verify the event actually exists in the calendar system."""
    event_id = tool_result["eventId"]
    event = await calendar_api.get_event(event_id)

    if event is None:
        return 0.0  # event was never created; all checks fail

    checks = {
        "event_exists": True,
        "title_matches": event.title == payload["title"],
        "time_matches": event.start == payload["start"],
    }

    return sum(checks.values()) / len(checks)

Metrics Summary

| # | Metric | Abbreviation | Measures | Score Type | Cost |
| --- | --- | --- | --- | --- | --- |
| 1 | Schema Fidelity | SF | Structural conformance | Binary (0/1) | Negligible |
| 2 | Instruction Compliance | IC | Business rule adherence | Continuous | Low |
| 3 | Groundedness | G | Citation support | Continuous | Medium |
| 4 | Determinism under Jitter | DJ | Output consistency | Continuous | High |
| 5 | Ontology Adherence | OA | Domain rule conformance | Continuous | Medium |
| 6 | Tool Outcome Correctness | TOC | Side-effect verification | Continuous | Variable |

QoM Profiles

Profiles define which metrics are evaluated and their minimum thresholds. Choose a profile based on your deployment's risk level:

| Profile | Metrics Required | Thresholds | Use Case |
| --- | --- | --- | --- |
| qom-basic | SF | SF = 1.0 | Development, testing, low-risk |
| qom-strict-argcheck | SF, IC | SF = 1.0, IC >= 0.97 | Production, business-critical |
| qom-outcome | SF, IC, TOC | SF = 1.0; IC, TOC >= 0.95 each | High-stakes, side-effect verification |
| qom-comprehensive | All 6 | All metrics evaluated | Mission-critical, regulated environments |

graph LR
    Basic["qom-basic<br/>SF only<br/><i>Development</i>"] --> Strict["qom-strict-argcheck<br/>SF + IC<br/><i>Production</i>"]
    Strict --> Outcome["qom-outcome<br/>SF + IC + TOC<br/><i>High-stakes</i>"]
    Outcome --> Comprehensive["qom-comprehensive<br/>All 6 metrics<br/><i>Mission-critical</i>"]

    style Basic fill:#c8e6c9
    style Strict fill:#fff9c4
    style Outcome fill:#ffe0b2
    style Comprehensive fill:#ffcdd2

Choosing a Profile

  • Development: Start with qom-basic to validate schemas without overhead
  • Production: Use qom-strict-argcheck for most business applications
  • Financial/Medical: Use qom-outcome when side effects must be verified
  • Regulated/Audited: Use qom-comprehensive for full governance coverage

Evaluation Pipeline

QoM metrics are evaluated in a defined order. Each stage can short-circuit on failure:

graph TD
    Start[Envelope Received] --> SF{1. Schema Fidelity}
    SF -->|"SF = 0.0"| Fail[E-SCHEMA-FIDELITY<br/>Reject immediately]
    SF -->|"SF = 1.0"| IC{2. Instruction Compliance}
    IC --> G{3. Groundedness}
    G --> DJ{4. Determinism under Jitter}
    DJ --> OA{5. Ontology Adherence}
    OA --> TOC{6. Tool Outcome Correctness}
    TOC --> Report[Generate QoM Report]
    Report --> Check{Meets Profile?}
    Check -->|Yes| Allow[Allow + attach report]
    Check -->|No| Breach[E-QOM-BREACH]

    style Fail fill:#ffcdd2
    style Breach fill:#ffcdd2
    style Allow fill:#c8e6c9

Short-Circuit Behavior

Schema Fidelity failure immediately rejects the message. Other metrics are evaluated based on the active profile -- if the profile does not require a metric, it is skipped (scored as 1.0 by default).
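
The short-circuit and skip behavior can be sketched as a simple ordered loop (evaluator callables and dict shapes here are illustrative assumptions, not the SDK API):

```python
METRIC_ORDER = [
    "schema_fidelity", "instruction_compliance", "groundedness",
    "determinism", "ontology_adherence", "tool_outcome_correctness",
]

def evaluate(envelope, profile: dict, evaluators: dict) -> dict:
    """Run metrics in order; SF failure rejects immediately, and metrics
    absent from the profile are skipped (scored as 1.0 by default)."""
    scores, skipped = {}, []
    for metric in METRIC_ORDER:
        if metric not in profile:
            skipped.append(metric)
            scores[metric] = 1.0  # skipped metrics default to 1.0
            continue
        scores[metric] = evaluators[metric](envelope)
        if metric == "schema_fidelity" and scores[metric] < 1.0:
            return {"error": "E-SCHEMA-FIDELITY", "scores": scores}
    meets = all(scores[m] >= profile[m] for m in profile)
    return {
        "meets_profile": meets,
        "scores": scores,
        "skipped_metrics": skipped,
        **({} if meets else {"error": "E-QOM-BREACH"}),
    }

# qom-strict-argcheck: SF = 1.0, IC >= 0.97
profile = {"schema_fidelity": 1.0, "instruction_compliance": 0.97}
evaluators = {
    "schema_fidelity": lambda e: 1.0,
    "instruction_compliance": lambda e: 0.98,
}
result = evaluate({}, profile, evaluators)
print(result["meets_profile"])  # True
```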


QoM Report Structure

Every evaluated message receives a QoM report attached to its response envelope:

{
  "qom_report": {
    "profile": "qom-strict-argcheck",
    "meets_profile": true,
    "evaluated_at": "2025-01-15T10:00:05.123Z",
    "metrics": {
      "schema_fidelity": {
        "score": 1.0,
        "details": {
          "schema": "org.calendar.Event.v1",
          "validation_errors": []
        }
      },
      "instruction_compliance": {
        "score": 0.98,
        "details": {
          "assertions_total": 50,
          "assertions_passed": 49,
          "failures": [
            {
              "assertion": "event_duration_reasonable",
              "message": "Event duration exceeds 8 hours"
            }
          ]
        }
      }
    },
    "skipped_metrics": ["groundedness", "determinism", "ontology_adherence", "tool_outcome_correctness"],
    "evaluation_duration_ms": 12
  }
}

Breach Handling

When a message fails to meet its negotiated QoM profile, the system raises an E-QOM-BREACH error:

Error Response

{
  "error": {
    "code": "E-QOM-BREACH",
    "message": "Message does not meet qom-strict-argcheck profile",
    "profile": "qom-strict-argcheck",
    "violations": [
      {
        "metric": "instruction_compliance",
        "required": 0.97,
        "actual": 0.89,
        "gap": 0.08
      }
    ],
    "retry_allowed": true,
    "retry_budget": 2
  }
}

Breach Response Strategies

| Strategy | Behavior | Configuration |
| --- | --- | --- |
| Reject | Return error to caller; no forwarding | action: reject (default) |
| Retry | Re-evaluate up to N times with regeneration | retry_budget: 3 |
| Degrade | Fall back to a less strict profile | fallback_profile: qom-basic |
| Warn | Allow but flag in audit log | action: warn |

Retry Policy

qom:
  breach_handling:
    action: reject
    retry:
      enabled: true
      budget: 3
      backoff: exponential
      base_delay_ms: 100
    fallback:
      enabled: true
      profile: qom-basic
      log_level: warn
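
A minimal sketch of the retry-then-fallback flow this configuration describes; `evaluate_once(profile)` is a hypothetical callable returning `(meets_profile, report)`, and the config shape mirrors the YAML above:

```python
import time

def handle_breach(evaluate_once, config: dict) -> dict:
    """Retry with exponential backoff, then degrade to a fallback
    profile if one is enabled; otherwise reject with E-QOM-BREACH."""
    retry = config.get("retry", {})
    budget = retry.get("budget", 0) if retry.get("enabled") else 0
    delay = retry.get("base_delay_ms", 100) / 1000.0

    for attempt in range(budget + 1):
        meets, report = evaluate_once(config["profile"])
        if meets:
            return {"action": "allow", "report": report}
        if attempt < budget:
            time.sleep(delay)
            delay *= 2  # exponential backoff

    fallback = config.get("fallback", {})
    if fallback.get("enabled"):
        meets, report = evaluate_once(fallback["profile"])
        if meets:
            # Every degradation is logged with full context in a real
            # deployment; see Profile Degradation below.
            return {"action": "degraded",
                    "profile": fallback["profile"],
                    "report": report}
    return {"action": "reject", "error": "E-QOM-BREACH"}

config = {
    "profile": "qom-strict-argcheck",
    "retry": {"enabled": True, "budget": 2, "base_delay_ms": 1},
    "fallback": {"enabled": True, "profile": "qom-basic"},
}
# Strict profile always fails, basic always passes -> degraded
outcome = handle_breach(lambda p: (p == "qom-basic", {"profile": p}), config)
print(outcome["action"])  # degraded
```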

Profile Degradation

Falling back to a weaker profile should be a last resort. Every degradation is logged with full context in the audit trail. Use this only for availability-critical systems where partial governance is better than no response.


Working with QoM in Code

Python SDK

from mpl_sdk import QomMetrics, QomProfile

# Create metrics from evaluation results
metrics = QomMetrics(
    schema_fidelity=1.0,
    instruction_compliance=0.95
)

# Load a profile
profile = QomProfile.strict_argcheck()

# Evaluate against profile
evaluation = profile.evaluate(metrics)
print(evaluation.meets_profile)  # False (IC 0.95 < required 0.97)
print(evaluation.violations)     # [Violation(metric="instruction_compliance", required=0.97, actual=0.95)]

# Check with a passing score
metrics_passing = QomMetrics(
    schema_fidelity=1.0,
    instruction_compliance=0.99
)
evaluation_pass = profile.evaluate(metrics_passing)
print(evaluation_pass.meets_profile)  # True

TypeScript SDK

import { QomMetrics, QomProfile } from '@mpl/sdk';

const metrics = new QomMetrics({
  schemaFidelity: 1.0,
  instructionCompliance: 0.95,
});

const profile = QomProfile.strictArgcheck();
const evaluation = profile.evaluate(metrics);

console.log(evaluation.meetsProfile);  // false
console.log(evaluation.violations);    // [{metric: "instructionCompliance", required: 0.97, actual: 0.95}]

Custom Profile Definition

from mpl_sdk import QomProfile, MetricThreshold

# Define a custom profile
custom_profile = QomProfile(
    name="my-org-production",
    thresholds={
        "schema_fidelity": MetricThreshold(min=1.0, required=True),
        "instruction_compliance": MetricThreshold(min=0.95, required=True),
        "groundedness": MetricThreshold(min=0.90, required=True),
        "tool_outcome_correctness": MetricThreshold(min=0.98, required=True),
    }
)

# Register in the local registry
registry.register_profile(custom_profile)

Configuring QoM Evaluation

Profile Selection Priority

QoM profiles are selected in this priority order:

  1. Envelope-level: Profile specified in the message envelope
  2. Handshake-level: Profile negotiated during AI-ALPN
  3. SType-level: Default profile declared in SType metadata
  4. Proxy-level: Default profile in proxy configuration
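
The priority order above amounts to "first non-empty source wins", as in this sketch (all parameter shapes and key names are illustrative assumptions):

```python
def select_profile(envelope: dict, handshake: dict,
                   stype_meta: dict, proxy_config: dict) -> str:
    """Resolve the active QoM profile in documented priority order:
    envelope > handshake > SType metadata > proxy default."""
    return (
        envelope.get("qom_profile")
        or handshake.get("negotiated_profile")
        or stype_meta.get("default_profile")
        or proxy_config["default_profile"]
    )

chosen = select_profile(
    envelope={},
    handshake={},
    stype_meta={"default_profile": "qom-strict-argcheck"},
    proxy_config={"default_profile": "qom-basic"},
)
print(chosen)  # qom-strict-argcheck
```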

Per-SType Configuration

# registry/stypes/org/calendar/Event/v1/metadata.json
{
  "stype": "org.calendar.Event.v1",
  "default_profile": "qom-strict-argcheck",
  "assertions_path": "assertions.cel",
  "ontology_rules_path": null,
  "post_check_hooks": []
}

Observability

QoM metrics are exposed for monitoring:

Prometheus Metrics

# Histogram of QoM scores by metric
mpl_qom_score{metric="schema_fidelity", stype="org.calendar.Event.v1"} 1.0
mpl_qom_score{metric="instruction_compliance", stype="org.calendar.Event.v1"} 0.98

# Counter of QoM breaches by profile
mpl_qom_breaches_total{profile="qom-strict-argcheck", metric="instruction_compliance"} 42

# Histogram of evaluation duration
mpl_qom_evaluation_duration_seconds{profile="qom-strict-argcheck"} 0.012

Dashboard

The MPL dashboard (http://localhost:9080) provides real-time QoM visibility:

  • Score distributions per SType
  • Breach rates over time
  • Profile compliance trends
  • Assertion failure heat maps

Next Steps

  • STypes -- Understand the schemas that Schema Fidelity validates
  • Architecture -- See how QoM fits into the protocol stack
  • Integration Modes -- Deploy QoM evaluation in your environment