Parallel Execution¶
Leverage multi-core processing for faster computation.
Overview¶
sigc automatically parallelizes:
- Cross-sectional computations (across assets)
- Independent signal calculations
- Parameter optimization
- Walk-forward windows
Configuration¶
Enable Parallelism¶
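A minimal sketch of enabling parallelism in the performance config. The `enabled` and `workers` keys mirror the advanced configuration later on this page; `auto` is the documented default-style value that sizes the pool from the machine:

YAML
performance:
  parallel:
    enabled: true
    workers: auto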
Auto-Detection¶
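With `workers: auto`, sigc sizes the thread pool from the detected hardware. A sketch, assuming (as is typical) that `auto` maps to one worker per logical core:

YAML
performance:
  parallel:
    workers: auto  # assumed: one worker per detected logical core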
What Gets Parallelized¶
Cross-Sectional Operations¶
Operations across assets run in parallel:
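For example, a cross-sectional z-score or rank is computed per date across the whole asset universe, so the universe can be split into batches across workers. A sketch using operators that appear elsewhere on this page:

Text Only
signal momentum:
    emit zscore(ret(prices, 60))   // z-score across assets; batches run in parallel

portfolio p:
    weights = rank(momentum).long_short(top=0.1, bottom=0.1)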
Independent Signals¶
Multiple signals compute simultaneously:
Text Only
signal momentum:
    emit zscore(ret(prices, 60))

signal value:
    emit zscore(book_to_market)

signal quality:
    emit zscore(roe)

// All three signals compute in parallel
signal combined:
    emit 0.33 * momentum + 0.33 * value + 0.34 * quality
Parameter Grid¶
Text Only
params:
    lookback: range(20, 120, 20)
    top_pct: range(0.1, 0.4, 0.1)

// Each parameter combination runs in parallel
portfolio optimized:
    weights = rank(momentum).long_short(top=top_pct, bottom=top_pct)
    backtest from 2020-01-01 to 2024-12-31
Walk-Forward Windows¶
Text Only
portfolio validated:
    backtest walk_forward(
        train_years = 5,
        test_years = 2,
        step_years = 2,
        parallel = true  // Windows compute in parallel
    ) from 2010-01-01 to 2024-12-31
Thread Pool Configuration¶
Basic Configuration¶
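A basic setup only needs `enabled` and a worker count; the keys are the same ones used in the advanced example that follows:

YAML
performance:
  parallel:
    enabled: true
    workers: 8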
Advanced Configuration¶
YAML
performance:
  parallel:
    enabled: true
    workers: 8

    # Thread pool settings
    thread_pool:
      name: "sigc-compute"
      stack_size_mb: 4

    # Work stealing
    work_stealing: true

    # Granularity
    min_batch_size: 100  # Min items per thread
Parallelism Strategies¶
Data Parallelism¶
Same operation on different data:
Text Only
┌─────────────────────────────────────────────────────┐
│               zscore(ret(prices, 60))               │
│                                                     │
│  Thread 1    Thread 2    Thread 3    Thread 4       │
│  ┌───────┐   ┌───────┐   ┌───────┐   ┌───────┐      │
│  │ AAPL  │   │ GOOGL │   │ AMZN  │   │ NVDA  │      │
│  │ MSFT  │   │ META  │   │ TSLA  │   │ JPM   │      │
│  │ ...   │   │ ...   │   │ ...   │   │ ...   │      │
│  └───────┘   └───────┘   └───────┘   └───────┘      │
└─────────────────────────────────────────────────────┘
Task Parallelism¶
Different operations simultaneously:
Text Only
┌─────────────────────────────────────────────────────┐
│                  Multiple Signals                   │
│                                                     │
│    Thread 1        Thread 2        Thread 3         │
│  ┌───────────┐   ┌───────────┐   ┌───────────┐      │
│  │ momentum  │   │   value   │   │  quality  │      │
│  │  signal   │   │  signal   │   │  signal   │      │
│  └───────────┘   └───────────┘   └───────────┘      │
│        │               │               │            │
│        └───────────────┼───────────────┘            │
│                        ▼                            │
│                  ┌───────────┐                      │
│                  │ combined  │                      │
│                  └───────────┘                      │
└─────────────────────────────────────────────────────┘
Performance Tuning¶
Optimal Worker Count¶
YAML
# CPU-bound: Use all cores
workers: auto
# Memory-bound: Use fewer
workers: 4
# I/O-bound: Can exceed core count
workers: 16
Batch Size¶
Smaller batches incur more scheduling overhead; larger batches leave less work to parallelize. The `min_batch_size` setting controls this trade-off.
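A sketch raising `min_batch_size` (the granularity key from the advanced configuration) for a workload dominated by many tiny per-item operations, where scheduling overhead would otherwise dominate:

YAML
performance:
  parallel:
    min_batch_size: 500  # larger batches amortize per-task scheduling overhead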
Memory Considerations¶
Measuring Performance¶
Benchmark Mode¶
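The exact invocation is not shown on this page; assuming a `--benchmark` switch on `sigc run` (hypothetical flag name), it might look like:

Bash
# Hypothetical flag; check `sigc run --help` for the actual spelling
sigc run strategy.sig --benchmark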
Output:
Text Only
Performance Benchmark:
======================
Total Time:               2.34s
  Data Loading:           0.45s (19%)
  Signal Computation:     1.52s (65%)
  Portfolio Construction: 0.25s (11%)
  Backtest Simulation:    0.12s (5%)

Parallelism:
  Workers Used:           8
  Parallel Efficiency:    85%
  Speedup vs Serial:      6.8x

Memory:
  Peak Usage:             1.2 GB
  Data Size:              850 MB
Profiling¶
Generates a detailed profile report showing where time is spent.
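Assuming a `--profile` flag (hypothetical; the real spelling may differ), profiling could be invoked as:

Bash
sigc run strategy.sig --profile   # hypothetical flag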
CLI Options¶
Bash
# Specify worker count
sigc run strategy.sig --workers 8
# Disable parallelism
sigc run strategy.sig --workers 1
# Auto-detect
sigc run strategy.sig --workers auto
Best Practices¶
1. Match Workers to Cores¶
2. Profile Before Optimizing¶
3. Consider Memory¶
More workers mean more memory usage, since each worker holds its own working set.
4. Batch Small Operations¶
For very small operations, parallelism overhead can exceed the benefit; batch them together instead.
5. Use SIMD Where Available¶
sigc uses SIMD for rolling statistics automatically.
Limitations¶
Sequential Dependencies¶
Some operations must be sequential:
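Derived signals are one case: `combined` below cannot start until its three inputs finish, so that dependency edge is sequential even though the inputs themselves run in parallel (these are the same signals as in the task-parallelism diagram above):

Text Only
signal combined:
    // Must wait for momentum, value, and quality to complete
    emit 0.33 * momentum + 0.33 * value + 0.34 * quality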
Cross-Time Dependencies¶
Memory Bandwidth¶
Parallelism limited by memory bandwidth for large datasets.
Troubleshooting¶
High CPU, Low Speedup¶
- Too many small tasks
- Memory bandwidth limited
- Lock contention
Memory Issues¶
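If peak memory is the bottleneck, reducing the worker count (as suggested for memory-bound workloads under Optimal Worker Count) trades speed for a smaller footprint:

YAML
performance:
  parallel:
    workers: 4  # fewer concurrent working sets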
Inconsistent Results¶
Ensure operations are deterministic: parallel floating-point reductions can accumulate in a different order than a serial run, producing small run-to-run differences.
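To check whether parallelism itself is the source of the inconsistency, compare against a serial baseline using the `--workers` flag documented in CLI Options:

Bash
sigc run strategy.sig --workers 1   # serial baseline
sigc run strategy.sig --workers 8   # compare outputs against the baseline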
Next Steps¶
- Incremental Computation - Efficient updates
- Memory Mapping - Large datasets
- Configuration - Full config reference