Incremental Computation¶
Update strategies efficiently when new data arrives.
Overview¶
Instead of recomputing everything from scratch, sigc can incrementally update:
- Rolling statistics
- Signal values
- Portfolio weights
How It Works¶
Traditional vs Incremental¶
Traditional (full recompute): every statistic is rebuilt from the entire history each time new data arrives.
Incremental: stored state is updated with only the newest observation.
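The difference can be sketched in Python with a rolling mean (illustrative only, not sigc's internals):

```python
from collections import deque

def full_recompute(prices, window):
    """Traditional: rebuild every window mean from scratch on each update."""
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]

class IncrementalMean:
    """Incremental: maintain a running sum; each new price is O(1)."""
    def __init__(self, window):
        self.window = window
        self.buf = deque(maxlen=window)
        self.total = 0.0

    def update(self, price):
        if len(self.buf) == self.window:
            self.total -= self.buf[0]   # evict the value leaving the window
        self.buf.append(price)
        self.total += price
        return self.total / len(self.buf)
```

Both produce identical means, but the incremental version touches only one value per update instead of the whole window.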
Performance Comparison¶
| Operation | Full Recompute | Incremental | Speedup |
|---|---|---|---|
| Rolling mean (252) | O(n × 252) | O(n) | 252x |
| Rolling std (60) | O(n × 60) | O(n) | 60x |
| EMA (20) | O(n × 20) | O(n) | 20x |
Enabling Incremental Mode¶
Configuration¶
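Incremental mode is enabled under the `performance` section of the configuration (the same keys appear in the full workflow example at the end of this page):

```yaml
performance:
  incremental:
    enabled: true
    checkpoint_dir: "./checkpoints"
    checkpoint_interval: 1d
```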
CLI Flag¶
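The equivalent command-line switch might look like the following (the flag spellings here are hypothetical; check `sigc --help` for the actual names):

```shell
# Hypothetical flags -- verify against `sigc --help`
sigc run --incremental --checkpoint-dir ./checkpoints
```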
Incremental Operations¶
Rolling Mean¶
Maintains running sum:
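The O(1) update adds the value entering the window and subtracts the one leaving it. A minimal sketch:

```python
def update_rolling_mean(prev_mean, new_value, old_value, window):
    """O(1) rolling-mean update: add the incoming value,
    drop the one that just left the window."""
    return prev_mean + (new_value - old_value) / window
```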
Rolling Standard Deviation¶
Uses Welford's online algorithm:
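A sketch of Welford's algorithm for a numerically stable running mean and variance. For a fixed rolling window, sigc would additionally need a removal step for the value leaving the window; only the expanding-window core is shown here:

```python
class Welford:
    """Welford's online algorithm: update mean and variance in O(1)
    per observation without storing the full history."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self, ddof=1):
        return (self.m2 / (self.n - ddof)) ** 0.5
```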
Exponential Moving Average¶
Naturally incremental:
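Each EMA update needs only the previous EMA and the latest price, so no buffer is required. A sketch, assuming the common `alpha = 2 / (span + 1)` convention (sigc may parameterize the decay differently):

```python
def ema_update(prev_ema, price, span=20):
    """EMA is naturally incremental: the new value is a weighted blend
    of the previous EMA and the latest price."""
    alpha = 2.0 / (span + 1)
    return alpha * price + (1 - alpha) * prev_ema
```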
State Management¶
Checkpoint Files¶
Checkpoint structure:
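A checkpoint directory might be laid out as follows (the file names here are hypothetical; the actual layout is sigc-internal):

```text
checkpoints/
  momentum/
    meta.json      # last processed date, strategy hash, asset universe
    buffers.bin    # rolling-window buffers per asset
    values.bin     # last computed signal values
```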
Checkpoint Contents¶
```text
Checkpoint: momentum
Last Date: 2024-01-15
Rolling Buffers:
  - ret_60: [60 values per asset]
  - zscore_252: [252 values per asset]
Computed Values:
  - AAPL: 1.234
  - GOOGL: -0.567
  ...
```
Daemon Mode Integration¶
Automatic Incremental Updates¶
```yaml
daemon:
  enabled: true
  mode: incremental
  schedule:
    - cron: "0 16 * * 1-5"   # 4 PM weekdays
      action: update
```
Update Flow¶
1. Load checkpoint from previous run
2. Fetch new data since checkpoint
3. Update rolling statistics
4. Compute new signal values
5. Generate weights
6. Save new checkpoint
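The six steps above can be sketched as a single update function. The checkpoint format and the simple running-mean signal are illustrative, not sigc's actual internals:

```python
import json
import os

def daily_update(checkpoint_path, fetch_new_data):
    """One incremental update cycle: load state, apply new data, save state."""
    # 1. Load checkpoint from the previous run (fresh state on first run).
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    else:
        state = {"last_date": None, "sum": 0.0, "count": 0}

    # 2. Fetch only the data that arrived since the checkpoint.
    new_rows = fetch_new_data(state["last_date"])

    for date, price in new_rows:
        # 3. Update rolling statistics (a running mean here, for brevity).
        state["sum"] += price
        state["count"] += 1
        # 4. Compute the new signal value from the updated state.
        signal = price - state["sum"] / state["count"]
        # 5. Generate weights (trivially: sign of the signal).
        weight = 1.0 if signal > 0 else -1.0  # noqa: illustrative only
        state["last_date"] = date

    # 6. Save the new checkpoint for the next run.
    with open(checkpoint_path, "w") as f:
        json.dump(state, f)
    return state
```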
Memory Efficiency¶
Buffer Management¶
sigc keeps only the history that active windows require.
Example¶
If your longest lookback is 252 days, sigc keeps only 252 days in memory, not the full history.
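A bounded buffer such as Python's `deque(maxlen=...)` illustrates the idea:

```python
from collections import deque

# With a longest lookback of 252 days, only the most recent 252
# observations are retained; older ones are evicted automatically.
buffer = deque(maxlen=252)
for day in range(1000):
    buffer.append(day)

# Memory stays O(252) no matter how long the total history grows.
```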
Supported Operations¶
Fully Incremental¶
| Operation | State Size |
|---|---|
| rolling_mean | O(window) |
| rolling_std | O(window) |
| rolling_sum | O(window) |
| ema | O(1) |
| lag | O(lag) |
| diff | O(1) |
Partially Incremental¶
| Operation | Notes |
|---|---|
| rolling_corr | Requires covariance state |
| rolling_rank | Requires sorted buffer |
| quantile | Requires sorted buffer |
Non-Incremental¶
Some operations require full recompute:
| Operation | Reason |
|---|---|
| zscore (cross-sectional) | Needs all assets |
| rank (cross-sectional) | Needs all assets |
| neutralize | Needs all assets |
Handling Dependencies¶
Dependency Graph¶
```
signal a:
    emit rolling_mean(prices, 20)  // Independent
signal b:
    emit rolling_std(prices, 20)   // Independent
signal c:
    emit a / b                     // Depends on a and b
```
Update order: a and b first (in either order, or in parallel), then c.
Automatic Ordering¶
sigc automatically determines update order based on dependencies.
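Determining the order amounts to a topological sort of the dependency graph, sketched here with Python's standard `graphlib`:

```python
from graphlib import TopologicalSorter

# Each signal maps to the set of signals it depends on:
# a and b are independent; c depends on both.
deps = {"a": set(), "b": set(), "c": {"a", "b"}}

# static_order() yields dependency-free nodes first, dependents last.
order = list(TopologicalSorter(deps).static_order())
```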
Error Recovery¶
Checkpoint Validation¶
Validates:
- Checkpoint date matches expected
- All assets present
- Buffer sizes correct
Recovery Options¶
```yaml
performance:
  incremental:
    on_error: recompute   # Full recompute on error; or: fail (stop and report)
```
Full Recompute Triggers¶
Force full recompute when:
- No checkpoint exists
- Strategy changed
- Data gap detected
- Checkpoint corrupted
- Manual trigger
Best Practices¶
1. Set Checkpoint Interval¶
Balance between:
- Too frequent: disk I/O overhead
- Too rare: long recovery time
2. Monitor Checkpoint Size¶
3. Validate Periodically¶
Run a full recompute periodically and compare it against the incremental output to verify that checkpointed state has not drifted.
4. Handle Universe Changes¶
When assets are added or removed, the checkpoint no longer matches the universe; trigger a full recompute to rebuild state for the new asset set.
Example: Daily Update Workflow¶
```yaml
# config.yaml
data:
  source: "prices.parquet"

daemon:
  enabled: true
  schedule:
    - cron: "0 16 * * 1-5"
      action: update

performance:
  incremental:
    enabled: true
    checkpoint_dir: "./checkpoints"
    checkpoint_interval: 1d
```

```
# Strategy
signal momentum:
    emit zscore(ret(prices, 60))

portfolio main:
    weights = rank(momentum).long_short(top=0.2, bottom=0.2)
```
Daily execution:

```text
4:00:00 PM - Daemon triggers
4:00:01 PM - Load checkpoint
4:00:02 PM - Fetch today's prices
4:00:03 PM - Update rolling returns
4:00:04 PM - Compute new signals
4:00:05 PM - Generate weights
4:00:06 PM - Save checkpoint
4:00:07 PM - Output weights
```
Troubleshooting¶
Slow Incremental Updates¶
Check whether cross-sectional operations (zscore, rank, neutralize) are the bottleneck: they need all assets each update and cannot be computed incrementally.
Checkpoint Mismatch¶
Solution: check for missing data or run a full recompute.
Memory Issues¶
Reduce checkpoint buffer sizes by shortening the longest lookback windows your signals use.
Next Steps¶
- Memory Mapping - Large dataset handling
- Parallel Execution - Multi-core processing
- Daemon Mode - Production scheduling