PostgreSQL Integration¶
Load data directly from PostgreSQL databases.
Basic Usage¶
Text Only
data:
source = "postgresql://localhost/marketdb"
query = "SELECT date, ticker, close, volume FROM daily_prices"
Connection String¶
Format¶
Examples¶
Text Only
// Local database
source = "postgresql://localhost/marketdb"
// With credentials
source = "postgresql://user:password@localhost/marketdb"
// Custom port
source = "postgresql://localhost:5433/marketdb"
// SSL required
source = "postgresql://user:password@host/db?sslmode=require"
Authentication¶
Connection String (Development)¶
Environment Variables (Recommended)¶
Bash
export PGUSER=your_user
export PGPASSWORD=your_password
export PGHOST=localhost
export PGDATABASE=marketdb
Password File (~/.pgpass)¶
Query Syntax¶
Basic Query¶
Text Only
data:
source = "postgresql://localhost/marketdb"
query = "SELECT date, ticker, close, volume FROM daily_prices"
With Filters¶
Text Only
data:
source = "postgresql://localhost/marketdb"
query = """
SELECT date, ticker, close, volume
FROM daily_prices
WHERE date >= '2020-01-01'
AND ticker IN ('AAPL', 'MSFT', 'GOOGL')
ORDER BY date, ticker
"""
With Joins¶
Text Only
data:
source = "postgresql://localhost/marketdb"
query = """
SELECT
p.date,
p.ticker,
p.close,
p.volume,
f.pe_ratio,
f.book_value
FROM daily_prices p
LEFT JOIN fundamentals f
ON p.ticker = f.ticker
AND p.date = f.date
ORDER BY p.date, p.ticker
"""
Using Parameters¶
Text Only
data:
source = "postgresql://localhost/marketdb"
query = """
SELECT date, ticker, close, volume
FROM daily_prices
WHERE date BETWEEN $1 AND $2
"""
options:
params = ["2020-01-01", "2024-12-31"]
Column Mapping¶
Text Only
data:
source = "postgresql://localhost/marketdb"
query = "SELECT trade_date, symbol, adj_close, shares_traded FROM prices"
columns:
trade_date: Date
symbol: Symbol
adj_close: Numeric as prices
shares_traded: Numeric as volume
Connection Options¶
SSL Mode¶
Text Only
data:
source = "postgresql://host/db"
query = "..."
options:
sslmode = "require" # disable, allow, prefer, require, verify-ca, verify-full
Connection Pool¶
Text Only
data:
source = "postgresql://host/db"
query = "..."
options:
pool_size = 5
pool_timeout = 30
Statement Timeout¶
Text Only
data:
source = "postgresql://host/db"
query = "..."
options:
statement_timeout = 60000 # 60 seconds in milliseconds
Performance¶
Index Usage¶
Ensure your tables have appropriate indexes:
SQL
-- Recommended indexes for market data
CREATE INDEX idx_prices_date ON daily_prices(date);
CREATE INDEX idx_prices_ticker ON daily_prices(ticker);
CREATE INDEX idx_prices_date_ticker ON daily_prices(date, ticker);
Partitioned Tables¶
For large datasets, use PostgreSQL partitioning:
SQL
-- Partition by date range
CREATE TABLE daily_prices (
date DATE NOT NULL,
ticker TEXT NOT NULL,
close NUMERIC,
volume BIGINT
) PARTITION BY RANGE (date);
CREATE TABLE daily_prices_2023 PARTITION OF daily_prices
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
Query Optimization¶
Text Only
// Use specific columns
query = "SELECT date, ticker, close FROM prices" // Not SELECT *
// Add date filters
query = "SELECT ... WHERE date >= '2020-01-01'"
// Use LIMIT for testing
query = "SELECT ... LIMIT 1000"
Parallel Queries¶
Text Only
data:
source = "postgresql://host/db"
query = "..."
options:
parallel_workers = 4 # Enable parallel query
Multiple Data Sources¶
Prices and Fundamentals¶
Text Only
data prices:
source = "postgresql://localhost/marketdb"
query = "SELECT date, ticker, close FROM daily_prices"
columns:
close: Numeric as prices
data fundamentals:
source = "postgresql://localhost/marketdb"
query = "SELECT date, ticker, pe_ratio, book_value FROM fundamentals"
signal combined:
momentum = zscore(ret(prices.prices, 60))
value = zscore(fundamentals.book_value)
emit 0.5 * momentum + 0.5 * value
Read Replica¶
Text Only
data:
source = "postgresql://read-replica.host/marketdb"
query = "..."
options:
target_session_attrs = "read-only"
TimescaleDB Support¶
sigc works with TimescaleDB hypertables:
Text Only
data:
source = "postgresql://localhost/marketdb"
query = """
SELECT time_bucket('1 day', date) AS date,
ticker,
last(close, date) AS close,
sum(volume) AS volume
FROM daily_prices
WHERE date >= '2020-01-01'
GROUP BY 1, 2
ORDER BY 1, 2
"""
Error Handling¶
Connection Failed¶
Check:
- PostgreSQL is running
- Host/port are correct
- Firewall allows connection
Authentication Failed¶
Check:
- Username/password correct
- User has database access
- pg_hba.conf allows connection
Query Timeout¶
Increase timeout:
Permission Denied¶
Grant access:
Best Practices¶
1. Use Read Replicas for Analysis¶
2. Add Date Filters¶
Text Only
query = """
SELECT date, ticker, close, volume
FROM daily_prices
WHERE date >= '2020-01-01' -- Always filter!
"""
3. Use Prepared Statements¶
4. Set Reasonable Timeouts¶
5. Close Connections¶
sigc automatically manages connection pooling and cleanup.
Example: Complete Setup¶
Text Only
data:
source = "postgresql://analytics:${PGPASSWORD}@db.example.com:5432/marketdb"
query = """
SELECT
p.date,
p.ticker,
p.adjusted_close AS close,
p.volume,
s.sector,
f.pe_ratio,
f.book_to_market
FROM prices p
JOIN securities s ON p.ticker = s.ticker
LEFT JOIN fundamentals f
ON p.ticker = f.ticker
AND p.date = f.date
WHERE p.date >= '2015-01-01'
ORDER BY p.date, p.ticker
"""
columns:
date: Date
ticker: Symbol
close: Numeric as prices
volume: Numeric
sector: String as sectors
pe_ratio: Numeric
book_to_market: Numeric
options:
sslmode = "require"
statement_timeout = 120000
signal momentum:
raw = zscore(ret(prices, 60))
neutral = neutralize(raw, by=sectors)
emit neutral
portfolio main:
weights = rank(momentum).long_short(top=0.2, bottom=0.2)
backtest from 2020-01-01 to 2024-12-31
Next Steps¶
- CSV Format - File-based loading
- S3 Storage - Cloud storage
- Data Quality - Validating your data