RAG with QoM Tutorial¶
This tutorial demonstrates how MPL validates Retrieval-Augmented Generation (RAG) pipelines. You will learn to structure RAG queries as typed contracts, evaluate groundedness of search results, and handle QoM breaches when answers lack sufficient source support.
Goal¶
By the end of this tutorial, you will:
- Send a typed RAG query through the MPL proxy
- Validate query structure with `eval.rag.RAGQuery.v1`
- Understand groundedness scoring for RAG results
- Handle `E-QOM-BREACH` when groundedness thresholds are not met
- Compare behavior across QoM profiles
Prerequisites¶
| Requirement | Version | Check Command |
|---|---|---|
| MPL CLI | >= 0.5.0 | mpl --version |
| MPL Proxy | Running on :9443 | curl http://localhost:9443/health |
| Python SDK | >= 0.3.0 | pip show mpl-sdk |
| Registry | With eval.rag.* STypes registered | mpl schemas list --namespace eval |
Architecture¶
sequenceDiagram
participant App as Application
participant MPL as MPL Proxy (:9443)
participant RAG as RAG Pipeline
participant Vec as Vector Store
App->>MPL: RAGQuery (eval.rag.RAGQuery.v1)
Note over MPL: Validate query structure
MPL->>RAG: Forward validated query
RAG->>Vec: Retrieve documents
Vec-->>RAG: Relevant chunks
RAG-->>MPL: SearchResult (eval.rag.SearchResult.v1)
Note over MPL: Evaluate groundedness
Note over MPL: Score: supported_claims / total_claims
MPL-->>App: Result + QoM Report
The RAG STypes¶
eval.rag.RAGQuery.v1¶
The query SType defines the structure of a RAG request:
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://mpl.dev/stypes/eval/rag/RAGQuery/v1/schema.json",
"title": "RAG Query",
"description": "A structured query for retrieval-augmented generation with context parameters.",
"type": "object",
"required": ["queryId", "query", "context"],
"additionalProperties": false,
"properties": {
"queryId": {
"type": "string",
"format": "uuid",
"description": "Unique identifier for this query"
},
"query": {
"type": "string",
"minLength": 1,
"maxLength": 2000,
"description": "The natural language query"
},
"context": {
"type": "object",
"required": ["maxDocuments"],
"additionalProperties": false,
"properties": {
"maxDocuments": {
"type": "integer",
"minimum": 1,
"maximum": 50,
"description": "Maximum number of documents to retrieve"
},
"minRelevanceScore": {
"type": "number",
"minimum": 0.0,
"maximum": 1.0,
"description": "Minimum relevance score threshold for retrieved documents"
},
"timeRange": {
"type": "object",
"additionalProperties": false,
"properties": {
"start": {
"type": "string",
"format": "date-time",
"description": "Start of time range filter"
},
"end": {
"type": "string",
"format": "date-time",
"description": "End of time range filter"
}
}
}
},
"description": "Retrieval context configuration"
}
}
}
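Per the schema, only `queryId`, `query`, and `context.maxDocuments` are required; `minRelevanceScore` and `timeRange` are optional. For reference, a minimal conforming payload (values illustrative) looks like this:

```python
# Minimal payload that satisfies eval.rag.RAGQuery.v1: only the three
# required fields are present. Optional context parameters are omitted.
minimal_query = {
    "queryId": "550e8400-e29b-41d4-a716-446655440001",  # any UUID
    "query": "What is semantic governance?",
    "context": {"maxDocuments": 5},
}
```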
eval.rag.SearchResult.v1¶
The result SType structures the RAG pipeline response:
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://mpl.dev/stypes/eval/rag/SearchResult/v1/schema.json",
"title": "RAG Search Result",
"description": "A structured response from a RAG pipeline with source citations.",
"type": "object",
"required": ["queryId", "answer", "sources", "metadata"],
"additionalProperties": false,
"properties": {
"queryId": {
"type": "string",
"format": "uuid",
"description": "Matches the query's queryId"
},
"answer": {
"type": "string",
"description": "The generated answer text"
},
"sources": {
"type": "array",
"items": {
"type": "object",
"required": ["documentId", "chunk", "relevanceScore"],
"additionalProperties": false,
"properties": {
"documentId": { "type": "string" },
"chunk": { "type": "string" },
"relevanceScore": { "type": "number", "minimum": 0.0, "maximum": 1.0 }
}
},
"description": "Source documents used to generate the answer"
},
"metadata": {
"type": "object",
"required": ["documentsRetrieved", "generationModel"],
"additionalProperties": false,
"properties": {
"documentsRetrieved": { "type": "integer" },
"generationModel": { "type": "string" },
"latencyMs": { "type": "integer" }
}
}
}
}
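Here the required fields are `queryId`, `answer`, `sources`, and `metadata` (with `latencyMs` optional), and each source entry must carry `documentId`, `chunk`, and `relevanceScore`. A minimal conforming instance (values illustrative):

```python
# Minimal payload that satisfies eval.rag.SearchResult.v1.
minimal_result = {
    "queryId": "550e8400-e29b-41d4-a716-446655440001",
    "answer": "Semantic governance validates agent payloads against typed contracts.",
    "sources": [
        {
            "documentId": "doc-001",
            "chunk": "STypes validate payload structure before processing.",
            "relevanceScore": 0.9,
        }
    ],
    "metadata": {"documentsRetrieved": 1, "generationModel": "gpt-4"},
}
```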
Step 1: Send a RAG Query¶
Send a validated RAG query through the proxy:
from mpl_sdk import Client
import uuid
client = Client("http://localhost:9443")
query_id = str(uuid.uuid4())
result = await client.call(
"rag.query",
payload={
"queryId": query_id,
"query": "What are the key benefits of semantic governance for AI agents?",
"context": {
"maxDocuments": 5,
"minRelevanceScore": 0.75,
"timeRange": {
"start": "2024-01-01T00:00:00Z",
"end": "2025-12-31T23:59:59Z"
}
}
},
headers={"X-MPL-SType": "eval.rag.RAGQuery.v1"}
)
print(f"Valid: {result.valid}")
print(f"QoM passed: {result.qom_passed}")
import { MplClient } from '@mpl/sdk';
import { v4 as uuidv4 } from 'uuid';
const client = new MplClient('http://localhost:9443');
const queryId = uuidv4();
const result = await client.call('rag.query', {
payload: {
queryId,
query: 'What are the key benefits of semantic governance for AI agents?',
context: {
maxDocuments: 5,
minRelevanceScore: 0.75,
timeRange: {
start: '2024-01-01T00:00:00Z',
end: '2025-12-31T23:59:59Z',
},
},
},
headers: { 'X-MPL-SType': 'eval.rag.RAGQuery.v1' },
});
console.log(`Valid: ${result.valid}`);
console.log(`QoM passed: ${result.qomPassed}`);
curl -X POST http://localhost:9443/call \
-H "Content-Type: application/json" \
-H "X-MPL-SType: eval.rag.RAGQuery.v1" \
-d '{
"method": "rag.query",
"payload": {
"queryId": "550e8400-e29b-41d4-a716-446655440000",
"query": "What are the key benefits of semantic governance for AI agents?",
"context": {
"maxDocuments": 5,
"minRelevanceScore": 0.75,
"timeRange": {
"start": "2024-01-01T00:00:00Z",
"end": "2025-12-31T23:59:59Z"
}
}
}
}'
Step 2: Validate Query Structure¶
The proxy validates the query against the eval.rag.RAGQuery.v1 schema before forwarding. Here is an example of an invalid query:
# Invalid: maxDocuments exceeds the maximum of 50
try:
result = await client.call(
"rag.query",
payload={
"queryId": str(uuid.uuid4()),
"query": "What are the benefits?",
"context": {
"maxDocuments": 100 # Maximum is 50!
}
},
headers={"X-MPL-SType": "eval.rag.RAGQuery.v1"}
)
except client.ValidationError as e:
print(f"Error: {e.code}") # E-SCHEMA-FIDELITY
print(f"Details: {e.details}")
# "100 is greater than the maximum of 50"
Error response:
{
"success": false,
"error": {
"code": "E-SCHEMA-FIDELITY",
"message": "Payload does not conform to eval.rag.RAGQuery.v1",
"validation_errors": [
{
"path": "/context/maxDocuments",
"message": "100 is greater than the maximum of 50",
"keyword": "maximum"
}
]
}
}
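Because STypes are plain JSON Schema (draft 2020-12), you can also catch structural errors before a request ever reaches the proxy. A minimal sketch using the third-party `jsonschema` package, with a trimmed copy of the schema inlined for brevity (in practice you would fetch the full schema from the registry):

```python
import jsonschema

# Trimmed copy of eval.rag.RAGQuery.v1 (full schema shown above).
rag_query_schema = {
    "type": "object",
    "required": ["queryId", "query", "context"],
    "properties": {
        "queryId": {"type": "string"},
        "query": {"type": "string", "minLength": 1, "maxLength": 2000},
        "context": {
            "type": "object",
            "required": ["maxDocuments"],
            "properties": {
                "maxDocuments": {"type": "integer", "minimum": 1, "maximum": 50},
            },
        },
    },
}

payload = {
    "queryId": "550e8400-e29b-41d4-a716-446655440000",
    "query": "What are the benefits?",
    "context": {"maxDocuments": 100},  # invalid: maximum is 50
}

try:
    jsonschema.validate(payload, rag_query_schema)
except jsonschema.ValidationError as e:
    print(e.json_path)  # $.context.maxDocuments
    print(e.message)    # 100 is greater than the maximum of 50
```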
Step 3: Understand Groundedness Evaluation¶
When the RAG pipeline returns a response, MPL evaluates the groundedness of the answer. Groundedness measures whether claims in the generated answer are supported by the cited sources.
graph TD
Answer[Generated Answer] --> Extract[Extract Claims]
Extract --> C1[Claim 1]
Extract --> C2[Claim 2]
Extract --> C3[Claim 3]
Sources[Source Documents] --> Verify{Verify Each Claim}
C1 --> Verify
C2 --> Verify
C3 --> Verify
Verify --> Score["G = supported / total<br/>e.g., 2/3 = 0.67"]
style Score fill:#fff9c4
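The metric itself is just the ratio from the diagram. A minimal sketch, assuming claims arrive in the per-claim shape used by the QoM report below:

```python
def groundedness(claims: list[dict]) -> float:
    """Ratio of supported claims to total claims (G in the diagram)."""
    if not claims:
        return 1.0  # assumption: a claim-free answer is treated as grounded
    supported = sum(1 for c in claims if c["supported"])
    return supported / len(claims)

# Two of three claims supported, as in the diagram's example:
claims = [
    {"claim": "type-safe communication through STypes", "supported": True},
    {"claim": "measurable quality via QoM metrics", "supported": True},
    {"claim": "holds 45% market share", "supported": False},
]
print(f"G = {groundedness(claims):.2f}")  # G = 0.67
```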
Successful Response with Groundedness¶
A well-grounded response scores high:
{
"success": true,
"data": {
"queryId": "550e8400-e29b-41d4-a716-446655440000",
"answer": "Semantic governance provides three key benefits for AI agents: (1) type-safe communication through STypes, (2) measurable quality via QoM metrics, and (3) audit trails for compliance.",
"sources": [
{
"documentId": "doc-001",
"chunk": "STypes provide type-safe communication contracts between AI agents, ensuring payload structure is validated before processing.",
"relevanceScore": 0.92
},
{
"documentId": "doc-002",
"chunk": "Quality of Meaning metrics quantify semantic quality across six dimensions, giving operators measurable confidence in AI outputs.",
"relevanceScore": 0.88
},
{
"documentId": "doc-003",
"chunk": "MPL's audit trail captures semantic hashes and provenance chains, enabling full compliance visibility.",
"relevanceScore": 0.85
}
],
"metadata": {
"documentsRetrieved": 3,
"generationModel": "gpt-4",
"latencyMs": 1240
}
},
"mpl": {
"stype": "eval.rag.SearchResult.v1",
"sem_hash": "sha256:a3f2e1c...",
"qom_report": {
"profile": "qom-strict-argcheck",
"meets_profile": true,
"metrics": {
"schema_fidelity": {
"score": 1.0,
"details": { "validation_errors": [] }
},
"instruction_compliance": {
"score": 1.0,
"details": { "assertions_total": 2, "assertions_passed": 2 }
},
"groundedness": {
"score": 1.0,
"details": {
"claims_total": 3,
"claims_supported": 3,
"claims": [
{ "claim": "type-safe communication through STypes", "supported": true, "source": "doc-001" },
{ "claim": "measurable quality via QoM metrics", "supported": true, "source": "doc-002" },
{ "claim": "audit trails for compliance", "supported": true, "source": "doc-003" }
]
}
}
},
"evaluation_duration_ms": 85
}
}
}
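On the SDK side, the same report is exposed on the result object. A short sketch, assuming the attribute path `result.qom_report.metrics.groundedness` used in Step 5 and a dict-shaped `details` mirroring the JSON above:

```python
# Inspect per-claim groundedness from a successful call's QoM report.
# `result` is the return value of client.call() from Step 1.
g = result.qom_report.metrics.groundedness
print(f"Groundedness: {g.score}")
for claim in g.details["claims"]:
    status = "supported" if claim["supported"] else "UNSUPPORTED"
    print(f"  [{status}] {claim['claim']} <- {claim.get('source')}")
```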
Step 4: Handle Groundedness Breach¶
When the generated answer contains claims not supported by the retrieved sources, the groundedness score drops. If it falls below the profile threshold, the proxy raises an `E-QOM-BREACH` error.
from mpl_sdk import Client, QomProfile
client = Client(
"http://localhost:9443",
qom_profile=QomProfile.STRICT_ARGCHECK # Requires groundedness >= 0.9
)
# Simulate a poorly-grounded response from the RAG pipeline
# The answer makes claims not in the source documents
try:
result = await client.call(
"rag.query",
payload={
"queryId": str(uuid.uuid4()),
"query": "What is the market share of semantic governance tools?",
"context": {
"maxDocuments": 5,
"minRelevanceScore": 0.7
}
},
headers={"X-MPL-SType": "eval.rag.RAGQuery.v1"}
)
except client.QomBreachError as e:
print(f"Error: {e.code}") # E-QOM-BREACH
print(f"Profile: {e.profile}") # qom-strict-argcheck
print(f"Metric: {e.metric}") # groundedness
print(f"Required: {e.required}") # 0.9
print(f"Actual: {e.actual}") # 0.33
print(f"Retry allowed: {e.retry_allowed}") # True
import { MplClient, QomProfile, QomBreachError } from '@mpl/sdk';
const client = new MplClient('http://localhost:9443', {
qomProfile: QomProfile.StrictArgcheck,
});
try {
const result = await client.call('rag.query', {
payload: {
queryId: uuidv4(),
query: 'What is the market share of semantic governance tools?',
context: {
maxDocuments: 5,
minRelevanceScore: 0.7,
},
},
headers: { 'X-MPL-SType': 'eval.rag.RAGQuery.v1' },
});
} catch (error) {
if (error instanceof QomBreachError) {
console.log(`Error: ${error.code}`); // E-QOM-BREACH
console.log(`Profile: ${error.profile}`); // qom-strict-argcheck
console.log(`Metric: ${error.metric}`); // groundedness
console.log(`Required: ${error.required}`); // 0.9
console.log(`Actual: ${error.actual}`); // 0.33
}
}
Breach Response¶
{
"success": false,
"error": {
"code": "E-QOM-BREACH",
"message": "Response does not meet qom-strict-argcheck profile",
"profile": "qom-strict-argcheck",
"violations": [
{
"metric": "groundedness",
"required": 0.9,
"actual": 0.33,
"gap": 0.57
}
],
"retry_allowed": true,
"retry_budget": 2,
"details": {
"claims_total": 3,
"claims_supported": 1,
"unsupported_claims": [
"Semantic governance tools hold 45% market share",
"The market is expected to grow 200% by 2026"
]
}
}
}
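Because the breach response advertises `retry_allowed` and a `retry_budget`, a natural client-side pattern is a bounded retry loop. A sketch under the assumption that the SDK exposes both fields on the exception; only `e.retry_allowed` is shown earlier in this step, and `e.retry_budget` is assumed to mirror the JSON field above:

```python
# Bounded retry on groundedness breach. The retry policy itself is
# illustrative, not SDK behavior. `client` and `uuid` are set up as
# earlier in this step.
payload = {
    "queryId": str(uuid.uuid4()),
    "query": "What is the market share of semantic governance tools?",
    "context": {"maxDocuments": 5, "minRelevanceScore": 0.7},
}

result = None
for attempt in range(3):
    try:
        result = await client.call(
            "rag.query",
            payload=payload,
            headers={"X-MPL-SType": "eval.rag.RAGQuery.v1"},
        )
        break
    except client.QomBreachError as e:
        if not e.retry_allowed or e.retry_budget == 0:
            raise  # give up: the proxy will not accept further attempts
```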
Hallucination Detection
The groundedness metric is MPL's primary defense against AI hallucinations in RAG pipelines. When the answer makes claims not present in the retrieved sources, groundedness drops and the proxy can block the response before it reaches the end user.
Step 5: Compare QoM Profiles¶
Different profiles set different groundedness thresholds. Let's compare their behavior:
Profile Comparison¶
| Profile | Groundedness Required | Behavior |
|---|---|---|
| qom-basic | Not evaluated (1.0 by default) | Only schema is checked |
| qom-strict-argcheck | >= 0.9 | Answer must be well-grounded |
| qom-comprehensive | >= 0.95 | Near-perfect source support required |
Using qom-basic (No Groundedness Check)¶
client_basic = Client(
"http://localhost:9443",
qom_profile=QomProfile.BASIC
)
# Same query passes with qom-basic (groundedness not evaluated)
result = await client_basic.call(
"rag.query",
payload={
"queryId": str(uuid.uuid4()),
"query": "What is the market share of semantic governance tools?",
"context": { "maxDocuments": 5, "minRelevanceScore": 0.7 }
},
headers={"X-MPL-SType": "eval.rag.RAGQuery.v1"}
)
print(f"Valid: {result.valid}") # True (schema passes)
print(f"QoM passed: {result.qom_passed}") # True (groundedness not required)
curl -X POST http://localhost:9443/call \
-H "Content-Type: application/json" \
-H "X-MPL-SType: eval.rag.RAGQuery.v1" \
-H "X-MPL-QoM-Profile: qom-basic" \
-d '{
"method": "rag.query",
"payload": {
"queryId": "550e8400-e29b-41d4-a716-446655440002",
"query": "What is the market share of semantic governance tools?",
"context": { "maxDocuments": 5, "minRelevanceScore": 0.7 }
}
}'
Using qom-strict-argcheck (Strict Groundedness)¶
client_strict = Client(
"http://localhost:9443",
qom_profile=QomProfile.STRICT_ARGCHECK
)
# Same query may fail with qom-strict if groundedness < 0.9
try:
result = await client_strict.call(
"rag.query",
payload={
"queryId": str(uuid.uuid4()),
"query": "What is the market share of semantic governance tools?",
"context": { "maxDocuments": 5, "minRelevanceScore": 0.7 }
},
headers={"X-MPL-SType": "eval.rag.RAGQuery.v1"}
)
print(f"Groundedness: {result.qom_report.metrics.groundedness.score}")
except client_strict.QomBreachError as e:
print(f"Blocked! Groundedness {e.actual} < {e.required}")
curl -X POST http://localhost:9443/call \
-H "Content-Type: application/json" \
-H "X-MPL-SType: eval.rag.RAGQuery.v1" \
-H "X-MPL-QoM-Profile: qom-strict-argcheck" \
-d '{
"method": "rag.query",
"payload": {
"queryId": "550e8400-e29b-41d4-a716-446655440003",
"query": "What is the market share of semantic governance tools?",
"context": { "maxDocuments": 5, "minRelevanceScore": 0.7 }
}
}'
Choosing the Right Profile for RAG
- Development: Use `qom-basic` to iterate quickly without groundedness checks
- Staging: Use `qom-strict-argcheck` to catch hallucination patterns before production
- Production: Use `qom-strict-argcheck` or `qom-comprehensive` depending on risk tolerance
- Healthcare/Legal: Use `qom-comprehensive` with groundedness >= 0.95
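A sketch of wiring this up, mapping deployment environments to the `QomProfile` values used in this tutorial (the mapping itself is an example, not a recommendation baked into the SDK):

```python
import os
from mpl_sdk import Client, QomProfile

# Illustrative environment-to-profile mapping following the guidance above.
PROFILE_BY_ENV = {
    "development": QomProfile.BASIC,
    "staging": QomProfile.STRICT_ARGCHECK,
    "production": QomProfile.STRICT_ARGCHECK,
}

env = os.environ.get("APP_ENV", "development")
client = Client("http://localhost:9443", qom_profile=PROFILE_BY_ENV[env])
```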
Improving Groundedness¶
When your RAG pipeline produces low groundedness scores, consider these strategies:
| Strategy | Description | Impact |
|---|---|---|
| Increase maxDocuments | Retrieve more context for the generator | More evidence available |
| Raise minRelevanceScore | Only use highly relevant chunks | Higher quality evidence |
| Narrow timeRange | Focus on recent, authoritative sources | Reduces stale citations |
| Improve chunking | Better document splitting strategy | More precise chunks |
| Add citations | Instruct the model to cite sources inline | Easier verification |
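The first two strategies translate directly into the `context` block of the query. A small illustrative helper (not part of the SDK) that widens retrieval before a retry:

```python
def widen_context(context: dict) -> dict:
    """Retrieve more documents and keep only higher-relevance chunks.

    Illustrative helper, not part of the SDK; the caps respect the
    schema bounds (maxDocuments <= 50, minRelevanceScore <= 1.0).
    """
    return {
        **context,
        "maxDocuments": min(context.get("maxDocuments", 5) * 2, 50),
        "minRelevanceScore": max(context.get("minRelevanceScore", 0.7), 0.85),
    }

print(widen_context({"maxDocuments": 5, "minRelevanceScore": 0.75}))
# {'maxDocuments': 10, 'minRelevanceScore': 0.85}
```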
What You Learned¶
In this tutorial, you:
- Structured RAG queries using the `eval.rag.RAGQuery.v1` SType with context parameters
- Validated query structure and saw how schema violations are caught
- Understood groundedness as the ratio of supported claims to total claims
- Handled QoM breaches when groundedness thresholds are not met
- Compared profiles and saw how different thresholds affect behavior
Next Steps¶
- Multi-Agent Workflow -- Typed communication between agents
- Creating a Custom SType -- Build your own semantic type
- QoM Concepts -- Deep dive into all six quality metrics
- Calendar Workflow -- Start with the basics if you skipped it