# Elo Rating System
The Elo rating system, originally developed for chess, calculates relative skill levels based on head-to-head outcomes.
## How It Works
- **Initial Rating**: All entities start at 1500 (configurable via `ELO_INITIAL_RATING`).
- **Expected Score**: Before a comparison, we calculate each entity's probability of winning.
- **Rating Update**: After the comparison, ratings adjust based on the actual outcome versus the expected score.
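As a rough sketch of the configuration step, the two settings named on this page could be read as shown below. Only the variable names `ELO_INITIAL_RATING` and `ELO_K_FACTOR` and their defaults (1500 and 32) come from this page; the helper name `load_elo_settings` and the use of `os.environ` are assumptions, and Compere's real settings loader may work differently.

```python
import os
from typing import Mapping, Optional, Tuple


# Hypothetical helper: reads the two Elo settings from an environment
# mapping, falling back to the documented defaults (1500 and 32).
def load_elo_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[float, float]:
    env = os.environ if env is None else env
    initial_rating = float(env.get("ELO_INITIAL_RATING", "1500"))
    k_factor = float(env.get("ELO_K_FACTOR", "32"))
    return initial_rating, k_factor


# Every new entity simply starts at the configured initial rating.
initial_rating, k_factor = load_elo_settings({})
ratings = {"entity-a": initial_rating, "entity-b": initial_rating}
```

Accepting a plain mapping instead of always reading `os.environ` keeps the helper easy to test with explicit values.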
## The Math
### Expected Score
The expected score (probability of winning) for entity A against entity B:

\[ E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}} \]

Where:
- \(E_A\) = Expected score for entity A
- \(R_A\) = Current rating of entity A
- \(R_B\) = Current rating of entity B
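The formula translates directly into a small function. This is a sketch; the name `expected_score` is illustrative rather than Compere's API. Because the two expected scores always sum to 1, computing one of them is enough for each comparison.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model (base 10, scale 400)."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


e_a = expected_score(1600, 1400)
e_b = expected_score(1400, 1600)
print(round(e_a, 2), round(e_b, 2))  # 0.76 0.24
```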
### Rating Update
After the comparison, entity A's new rating is:

\[ R'_A = R_A + K \, (S_A - E_A) \]

Where:
- \(R'_A\) = New rating for entity A
- \(K\) = K-factor (sensitivity, default 32)
- \(S_A\) = Actual score (1 for win, 0 for loss, 0.5 for tie)
- \(E_A\) = Expected score
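A minimal sketch of the update step follows. It applies the same rule to both entities with a shared K-factor. The function name `update_ratings` and its tuple return value are assumptions for illustration. With a shared K, whatever A gains B loses, because \(S_B = 1 - S_A\) and \(E_B = 1 - E_A\).

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def update_ratings(rating_a: float, rating_b: float, score_a: float,
                   k: float = 32) -> tuple[float, float]:
    """Return the new (rating_a, rating_b) after one comparison.

    score_a is A's actual score: 1 for a win, 0 for a loss, 0.5 for a tie.
    """
    e_a = expected_score(rating_a, rating_b)
    e_b = 1 - e_a            # expected scores sum to 1
    score_b = 1 - score_a    # actual scores sum to 1
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * (score_b - e_b)
    return new_a, new_b


new_a, new_b = update_ratings(1600, 1400, 1)  # A (1600) beats B (1400)
print(round(new_a, 2), round(new_b, 2))  # 1607.69 1392.31
```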
### Example
Entity A (rating 1600) vs. Entity B (rating 1400), with the default K = 32:
```python
# Expected scores
E_a = 1 / (1 + 10**((1400 - 1600) / 400))  # ≈ 0.76
E_b = 1 / (1 + 10**((1600 - 1400) / 400))  # ≈ 0.24

# If A wins (expected outcome)
R_a_new = 1600 + 32 * (1 - 0.76)  # ≈ 1608 (+8)
R_b_new = 1400 + 32 * (0 - 0.24)  # ≈ 1392 (-8)

# If B wins (upset!)
R_a_new = 1600 + 32 * (0 - 0.76)  # ≈ 1576 (-24)
R_b_new = 1400 + 32 * (1 - 0.24)  # ≈ 1424 (+24)
```
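> **Key Insight:** Upsets cause larger rating changes than expected outcomes. Beating a higher-rated opponent earns more points than beating a lower-rated one: in the example above, the upset moves each rating by about 24 points, while the expected result moves each by only about 8.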
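To see the asymmetry numerically, the sketch below computes how many points a 1500-rated entity gains by winning against opponents of different ratings, using the default K of 32. The helper `points_for_win` is hypothetical.

```python
def points_for_win(rating: float, opponent: float, k: float = 32) -> float:
    """Points gained by an entity rated `rating` for beating `opponent`."""
    expected = 1 / (1 + 10 ** ((opponent - rating) / 400))
    return k * (1 - expected)  # actual score is 1 for a win


for opponent in (1300, 1500, 1700):
    print(opponent, round(points_for_win(1500, opponent), 1))
# Output:
# 1300 7.7
# 1500 16.0
# 1700 24.3
```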
## K-Factor
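The K-factor controls rating volatility. Because \(|S_A - E_A| < 1\), K is an upper bound on how many points a single comparison can move a rating: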
| K-Factor | Use Case |
|---|---|
| 40-50 | New systems, rapid convergence needed |
| 32 | Standard (default) |
| 16-24 | Established rankings, stability preferred |
| 10-16 | Professional leagues, slow changes |
Configure it with the `ELO_K_FACTOR` environment variable.
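The K-factor scales every adjustment linearly, so halving K halves each rating change. The sketch below replays the upset from the example (B at 1400 beats A at 1600) under several K values drawn from the table; the function name `rating_change` is illustrative.

```python
def rating_change(rating: float, opponent: float, score: float, k: float) -> float:
    """Change in `rating` after one comparison with actual score `score`."""
    expected = 1 / (1 + 10 ** ((opponent - rating) / 400))
    return k * (score - expected)


# B (1400) upsets A (1600): B's gain under several K-factors.
for k in (16, 24, 32, 48):
    print(k, round(rating_change(1400, 1600, 1, k), 1))
# Output:
# 16 12.2
# 24 18.2
# 32 24.3
# 48 36.5
```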
## Tie Support
Elo scores a tie as half a win for each side (the same convention chess uses for draws), and Compere supports ties this way:
```python
if winner_id == entity1.id:
    score_a, score_b = 1, 0  # Entity 1 wins
elif winner_id == entity2.id:
    score_a, score_b = 0, 1  # Entity 2 wins
else:
    score_a, score_b = 0.5, 0.5  # Tie
```
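A tie moves entity A's rating by \(K(0.5 - E_A)\), which pulls the two ratings toward each other: the higher-rated entity loses points, the lower-rated one gains them, and two equally rated entities stay where they are. This lets a comparison record a genuinely equal preference instead of forcing an arbitrary winner.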
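The sketch below combines the scoring logic with the update rule and shows ties in action. The function `record_comparison`, the `ratings` dictionary keyed by id, and the use of `None` as the tie value of `winner_id` are assumptions for illustration, not Compere's actual data model.

```python
from typing import Optional


def record_comparison(ratings: dict[str, float], id1: str, id2: str,
                      winner_id: Optional[str], k: float = 32) -> None:
    """Update ratings in place; any winner_id other than id1/id2 is a tie."""
    if winner_id == id1:
        score_a, score_b = 1, 0        # entity 1 wins
    elif winner_id == id2:
        score_a, score_b = 0, 1        # entity 2 wins
    else:
        score_a, score_b = 0.5, 0.5    # tie
    r_a, r_b = ratings[id1], ratings[id2]
    e_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    e_b = 1 - e_a
    ratings[id1] = r_a + k * (score_a - e_a)
    ratings[id2] = r_b + k * (score_b - e_b)


ratings = {"a": 1600.0, "b": 1400.0, "c": 1500.0, "d": 1500.0}
record_comparison(ratings, "a", "b", None)  # tie between unequal ratings
record_comparison(ratings, "c", "d", None)  # tie between equal ratings
print({name: round(value, 2) for name, value in ratings.items()})
# {'a': 1591.69, 'b': 1408.31, 'c': 1500.0, 'd': 1500.0}
```

Here the tie moves A and B by about 8.3 points each, slightly more than the roughly 7.7 points A would have gained for a win, because a tie is an underperformance for the favorite.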
## Rating Interpretation
| Rating Range | Interpretation |
|---|---|
| 1700+ | Exceptional |
| 1600-1700 | Excellent |
| 1500-1600 | Above average |
| 1400-1500 | Below average |
| 1300-1400 | Poor |
| < 1300 | Bottom tier |
> **Note:** These ranges are a rough guide relative to your specific dataset. How far ratings spread from the 1500 starting point depends on the number and outcomes of comparisons.
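A lookup from a rating to the labels in the interpretation table might look like the sketch below. The table's row boundaries overlap (for example, 1600 appears in two rows), so this sketch assumes each lower bound is inclusive; the function name `interpret_rating` is hypothetical.

```python
import bisect

# Inclusive lower bounds and labels taken from the Rating Interpretation table.
_BOUNDS = [1300, 1400, 1500, 1600, 1700]
_LABELS = ["Bottom tier", "Poor", "Below average",
           "Above average", "Excellent", "Exceptional"]


def interpret_rating(rating: float) -> str:
    """Map a rating to its table label (lower bound of each range inclusive)."""
    return _LABELS[bisect.bisect_right(_BOUNDS, rating)]


print(interpret_rating(1550))  # Above average
```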