# Design Decisions
This document explains the key design decisions in EmbedCache.
## Dual Interface (Library + Service)

Decision: EmbedCache works both as a Rust library and as a REST API service.

Rationale:

- Library mode enables direct integration without network overhead
- Service mode provides language-agnostic access
- The same core code serves both use cases
- Easy to start with the service, then migrate to the library for performance
## Trait-Based Extensibility

Decision: Core functionality uses traits (`ContentChunker`, `Embedder`).

Rationale:

- Easy to add custom implementations
- Testable with mocks and stubs
- Implementations can be swapped without changing callers
- Future-proof for new embedding providers
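The trait names above come from EmbedCache; the method signatures below are illustrative assumptions, not the library's actual API. The sketch shows why the pattern pays off: the pipeline depends only on the traits, so a stub embedder can replace a real model in tests without touching the caller.

```rust
// Trait names from the document; signatures are illustrative assumptions.
trait ContentChunker {
    fn chunk(&self, content: &str) -> Vec<String>;
}

trait Embedder {
    fn embed(&self, chunk: &str) -> Vec<f32>;
}

/// A stub embedder, the kind of swap-in traits make cheap:
/// tests can use this without loading a real model.
struct StubEmbedder {
    dims: usize,
}

impl Embedder for StubEmbedder {
    fn embed(&self, _chunk: &str) -> Vec<f32> {
        vec![0.0; self.dims]
    }
}

/// A chunker that splits on blank lines, as a stand-in implementation.
struct ParagraphChunker;

impl ContentChunker for ParagraphChunker {
    fn chunk(&self, content: &str) -> Vec<String> {
        content
            .split("\n\n")
            .map(|p| p.trim().to_string())
            .filter(|p| !p.is_empty())
            .collect()
    }
}

/// The pipeline depends only on the traits, so either side can be
/// swapped without changing this function.
fn embed_all(
    chunker: &dyn ContentChunker,
    embedder: &dyn Embedder,
    content: &str,
) -> Vec<Vec<f32>> {
    chunker.chunk(content).iter().map(|c| embedder.embed(c)).collect()
}

fn main() {
    let vectors = embed_all(&ParagraphChunker, &StubEmbedder { dims: 4 }, "first para\n\nsecond para");
    assert_eq!(vectors.len(), 2);
    assert_eq!(vectors[0].len(), 4);
}
```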
## SQLite for Caching

Decision: Use SQLite for caching processed content.

Rationale:

- Zero-configuration database
- Single-file deployment
- ACID compliance
- Good performance for read-heavy workloads
- WAL mode supports concurrent reads

Trade-offs:

- Not suitable for distributed caching
- Limited write throughput
- Not designed for multi-server deployments
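As a rough illustration only (this is not EmbedCache's actual schema), a single-file cache with WAL-mode concurrency can be as small as:

```sql
-- Illustrative sketch, not the real schema.
PRAGMA journal_mode = WAL;        -- concurrent readers alongside one writer

CREATE TABLE IF NOT EXISTS cache (
    key        TEXT PRIMARY KEY,  -- hash of URL + config
    content    BLOB NOT NULL,     -- processed chunks and embeddings
    created_at INTEGER NOT NULL   -- unix timestamp, e.g. for expiry
);
```

The single-writer limitation mentioned above follows directly from WAL mode: any number of readers can proceed while at most one write transaction is active.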
## Hash-Based Cache Keys

Decision: Cache keys are SHA-256 hashes of URL + config.

Rationale:

- Deterministic: the same input always produces the same key
- Fixed key length regardless of URL length
- Collision-resistant
- Config changes create new cache entries
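A minimal sketch of the key-derivation idea. EmbedCache uses SHA-256 (fixed-length, collision-resistant); the standard library's `DefaultHasher` stands in below only so the example compiles without an external crate such as `sha2`, and the config struct is a hypothetical stand-in. The property being shown is the same either way: the key is a pure function of URL + config, so changing the config yields a different key and therefore a fresh cache entry.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical config fields for illustration only.
#[derive(Hash)]
struct ChunkConfig {
    chunk_size_words: usize,
    model: &'static str,
}

/// Derive a cache key from URL + config. Real code would feed the same
/// bytes to SHA-256 instead of DefaultHasher.
fn cache_key(url: &str, config: &ChunkConfig) -> u64 {
    let mut h = DefaultHasher::new();
    url.hash(&mut h);
    config.hash(&mut h);
    h.finish()
}

fn main() {
    let cfg_a = ChunkConfig { chunk_size_words: 200, model: "bge-small" };
    let cfg_b = ChunkConfig { chunk_size_words: 400, model: "bge-small" };
    let url = "https://example.com/doc";
    // Same input, same key; changed config, new key.
    assert_eq!(cache_key(url, &cfg_a), cache_key(url, &cfg_a));
    assert_ne!(cache_key(url, &cfg_a), cache_key(url, &cfg_b));
}
```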
## FastEmbed for Embedding

Decision: Use the FastEmbed library for embedding generation.

Rationale:

- Local inference (no API costs)
- Multiple model support
- Rust-native library
- Good performance

Trade-offs:

- Memory overhead from loaded models
- Models must be downloaded on first use
- Limited to FastEmbed-supported models
## Optional LLM Chunking

Decision: LLM-based chunking is optional and falls back to word chunking.

Rationale:

- Works without LLM infrastructure
- Progressive enhancement
- Graceful degradation on failure
- Users choose the cost/quality trade-off
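The fallback pattern can be sketched as follows; the function names and the failing LLM stub are illustrative assumptions, not EmbedCache's real API. The point is the control flow: LLM chunking is attempted only when enabled, and any failure silently degrades to word chunking.

```rust
/// Stand-in for an LLM call; here it always fails, as if the backend
/// were unreachable. Names are hypothetical.
fn llm_chunk(_content: &str) -> Result<Vec<String>, String> {
    Err("llm backend unavailable".to_string())
}

/// Simple word-based chunking used as the fallback.
fn word_chunk(content: &str, words_per_chunk: usize) -> Vec<String> {
    content
        .split_whitespace()
        .collect::<Vec<_>>()
        .chunks(words_per_chunk)
        .map(|c| c.join(" "))
        .collect()
}

/// Graceful degradation: LLM chunking is an enhancement, never a requirement.
fn chunk(content: &str, use_llm: bool) -> Vec<String> {
    if use_llm {
        if let Ok(chunks) = llm_chunk(content) {
            return chunks;
        }
        // Fall through to word chunking on failure.
    }
    word_chunk(content, 3)
}

fn main() {
    // The LLM path fails, yet we still get chunks back.
    assert_eq!(chunk("a b c d", true), vec!["a b c", "d"]);
}
```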
## Async Architecture

Decision: Fully async with the Tokio runtime.

Rationale:

- Efficient handling of I/O-bound operations
- Better resource utilization
- Scales with concurrent requests
- Required for the HTTP service
## Configuration via Environment

Decision: Configure via environment variables and a .env file.

Rationale:

- 12-factor app compliance
- Easy Docker/container deployment
- No config-file parsing complexity
- Works with most deployment systems
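A minimal sketch of the pattern, using only the standard library. The variable names and defaults below are hypothetical, not EmbedCache's documented settings; a .env loader (e.g. the `dotenvy` crate) would populate the process environment before this code runs.

```rust
use std::env;

/// Hypothetical settings struct for illustration only.
#[derive(Debug)]
struct Settings {
    bind_addr: String,
    cache_path: String,
    chunk_size_words: usize,
}

impl Settings {
    /// Read each setting from the environment, falling back to a default
    /// when the variable is unset or unparsable.
    fn from_env() -> Self {
        Settings {
            bind_addr: env::var("EMBEDCACHE_BIND")
                .unwrap_or_else(|_| "127.0.0.1:8080".into()),
            cache_path: env::var("EMBEDCACHE_DB")
                .unwrap_or_else(|_| "cache.db".into()),
            chunk_size_words: env::var("EMBEDCACHE_CHUNK_WORDS")
                .ok()
                .and_then(|v| v.parse().ok())
                .unwrap_or(200),
        }
    }
}

fn main() {
    // With no variables set, every field takes its default.
    let settings = Settings::from_env();
    println!("{settings:?}");
}
```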
## Modular Source Structure

Decision: Split the code into focused modules.

Rationale:

- Clear separation of concerns
- Easier to understand and maintain
- Enables parallel development
- Focused testing
## No Built-in Authentication

Decision: Authentication is handled by a reverse proxy.

Rationale:

- Simpler core service
- Flexible auth options via the proxy
- Separation of concerns
- Production deployments typically use a proxy anyway
## Error Handling

Decision: Use `anyhow` for the library, `actix_web::Error` for handlers.

Rationale:

- `anyhow` provides rich error context
- Handler errors map to HTTP responses
- Consistent error-handling pattern
- Easy error propagation with `?`
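A std-only sketch of the `?` propagation pattern; in EmbedCache the library side uses `anyhow::Result`, which layers context onto the same flow. The error type and function names below are illustrative.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error type for illustration.
#[derive(Debug)]
struct FetchError(String);

impl fmt::Display for FetchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "fetch failed: {}", self.0)
    }
}

impl Error for FetchError {}

fn fetch(url: &str) -> Result<String, FetchError> {
    if url.starts_with("https://") {
        Ok(format!("<html>{url}</html>"))
    } else {
        Err(FetchError(format!("unsupported scheme in {url}")))
    }
}

/// `?` early-returns on Err and converts it into the caller's error type,
/// so callers see a single Result type throughout.
fn process(url: &str) -> Result<usize, Box<dyn Error>> {
    let body = fetch(url)?; // FetchError boxed automatically
    Ok(body.len())
}

fn main() {
    assert!(process("https://example.com").is_ok());
    assert!(process("ftp://example.com").is_err());
}
```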
## No Built-in Rate Limiting

Decision: Rate limiting is handled by a reverse proxy.

Rationale:

- More flexible configuration
- Proxies already handle this well
- Simpler service code
- Standard deployment pattern
## Chunk Size in Words

Decision: Chunk size is specified in words, not tokens.

Rationale:

- More intuitive for users
- Consistent across models
- Easier to reason about
- Token counts vary by model, so they are only approximate anyway
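A sketch of what "size in words" means in practice; the function name is illustrative. Because the count is over whitespace-separated words, the same setting splits the same text identically regardless of which embedding model (and tokenizer) is in use.

```rust
/// Split content into chunks of at most `chunk_size` whitespace-separated
/// words. Word counts, unlike token counts, do not depend on any model's
/// tokenizer.
fn chunk_by_words(content: &str, chunk_size: usize) -> Vec<String> {
    content
        .split_whitespace()
        .collect::<Vec<_>>()
        .chunks(chunk_size.max(1)) // guard against a zero chunk size
        .map(|w| w.join(" "))
        .collect()
}

fn main() {
    let chunks = chunk_by_words("the quick brown fox jumps over", 4);
    assert_eq!(chunks, vec!["the quick brown fox", "jumps over"]);
}
```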