
Reference · methodology

How the Skill Score works

One number, 0 to 100. It answers: how often does this wallet call things right, with harder calls counting more and recent calls counting more than old ones?

Updated 2026-05-19 · Commit 0xd218418 · Read 4 min · Section R1 of 05
METHOD V2 · used since 2026-05-12 · before that: v1 (Wilson + difficulty only)

01 · The formula

For every prediction the AI judge has decided, we compute one number — the prediction's contribution to the score:

formula
contribution = difficulty_weight × recency_weight

Add these up across every alias the same wallet has ever used, then take the Wilson lower bound at 95% confidence:

formula
skill = wilson_lower_bound_95(
  weighted_hits     = Σ contribution(p) for hits,
  weighted_attempts = Σ contribution(p) for all resolved
) × 100
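
The two formulas above can be sketched in code as follows. This is a minimal illustration, not the project's implementation: the function names are hypothetical, and z = 1.96 (the two-sided 95% normal quantile) is an assumption, chosen because it reproduces the 0.39 and 0.37 figures in the worked example in section 04.

```typescript
// Illustrative sketch only; names are hypothetical, not taken from lib/leaderboard.
const Z_95 = 1.96; // assumed: two-sided 95% normal quantile

/** Wilson score lower bound, accepting non-integer (weighted) counts. */
function wilsonLowerBound95(weightedHits: number, weightedAttempts: number): number {
  if (weightedAttempts <= 0) return 0; // assumed: no resolved weight means a score of 0
  const n = weightedAttempts;
  const pHat = weightedHits / n; // weighted hit rate
  const z2 = Z_95 * Z_95;
  const centre = pHat + z2 / (2 * n);
  const margin = Z_95 * Math.sqrt((pHat * (1 - pHat)) / n + z2 / (4 * n * n));
  return (centre - margin) / (1 + z2 / n);
}

/** Skill Score on the 0 to 100 scale. */
function skillScore(weightedHits: number, weightedAttempts: number): number {
  return wilsonLowerBound95(weightedHits, weightedAttempts) * 100;
}
```

Feeding in the worked-example totals, `skillScore(6.0, 8.6)` rounds to 37 and `skillScore(10.9, 17.5)` rounds to 39.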

02 · How hard was the call?

When the AI judge decides hit or miss, it also rates how hard the call was. Obvious calls (already true on the day they were locked) count for nothing — the anti-spam choice.

Difficulty weights used in the Skill Score formula. Higher weight = call counts more, whether it's a hit or a miss.
Difficulty   Weight   Meaning
Obvious      0.0×     Already true on the day it was locked. Doesn't move the score.
Easy         0.3×     Likely outcome: a safe macro guess or near-term price call.
Real call    1.0×     Genuine uncertainty: could go either way.
Bold call    2.0×     Going against the consensus. The riskiest, worth the most.
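
As a rough sketch, the table above maps onto a lookup plus a running pair of totals. The type and function names are hypothetical, and recency is left out here (treated as 1.0) to isolate the difficulty effect.

```typescript
// Hypothetical names; only the four weights come from the table above.
type Difficulty = "obvious" | "easy" | "real" | "bold";

const DIFFICULTY_WEIGHT: Record<Difficulty, number> = {
  obvious: 0.0, // already true when locked: contributes nothing
  easy: 0.3,
  real: 1.0,
  bold: 2.0,
};

interface Totals {
  weightedHits: number;
  weightedAttempts: number;
}

/** Adds one decided call to the totals, ignoring recency (assumed weight 1.0). */
function addCall(totals: Totals, difficulty: Difficulty, hit: boolean): Totals {
  const w = DIFFICULTY_WEIGHT[difficulty];
  return {
    weightedHits: totals.weightedHits + (hit ? w : 0),
    weightedAttempts: totals.weightedAttempts + w, // misses still count as attempts
  };
}

let running: Totals = { weightedHits: 0, weightedAttempts: 0 };
running = addCall(running, "obvious", true); // no change: weight is 0
running = addCall(running, "bold", false);   // attempts grow by 2.0, hits unchanged
running = addCall(running, "easy", true);    // both grow by 0.3
```

Because the weight is added to attempts on both hits and misses, a bold miss costs as much as a bold hit earns.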

03 · Recent calls count more

Old hits fade. A prediction's contribution to the score halves every 180 days. This is the value Metaculus uses for its own forecasting tournaments.

formula
recency_weight(p) = 0.5^(age_days / 180)
[Figure: Recency decay curve. Weight (0.00 to 1.00) plotted against age in days: 1.00 today, ½ at 180d, ¼ at 360d, with the axis running to 540d.]
Recency weights at common ages. Multiply by the difficulty weight to get the per-prediction contribution.
Age            Weight
today          1.000×
90 days ago    0.707×
180 days ago   0.500×
360 days ago   0.250×
540 days ago   0.125×
720 days ago   0.063×
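
The decay formula and the table above can be reproduced with a few lines. This is a sketch under assumptions: the source refers to a constant named SKILL_HALF_LIFE_MS, whereas the constant below is a hypothetical day-based stand-in, and clamping negative ages to zero is a guess at how future-dated timestamps might be handled.

```typescript
// Hypothetical day-based constant; the real code reportedly uses SKILL_HALF_LIFE_MS.
const HALF_LIFE_DAYS = 180;

/** Weight halves every HALF_LIFE_DAYS; negative ages are clamped to 0 (assumption). */
function recencyWeight(ageDays: number): number {
  return Math.pow(0.5, Math.max(0, ageDays) / HALF_LIFE_DAYS);
}

// Ages used in the table above, formatted to three decimals.
const tableAges = [0, 90, 180, 360, 540, 720];
const tableWeights = tableAges.map((age) => recencyWeight(age).toFixed(3));
```

Multiplying any of these by a difficulty weight gives the per-prediction contribution; for example, a Bold call that is 180 days old contributes 2.0 × 0.5 = 1.0.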

Why fading matters: it makes running many fake accounts expensive. If a wallet creates 10 aliases and abandons 8 of them, the old wins on the abandoned 8 keep fading until they barely matter — pressure to either keep every alias active or watch the score drop.

04 · Worked example

17 decided predictions, computed end-to-end against the real lib/leaderboard constants. Change SKILL_HALF_LIFE_MS in the code and the numbers below update automatically — this card stays in sync with the math.

Worked example · 17 decided predictions · 0xc187…391df
Sample predictions
age    difficulty   result   contribution
12d    bold         hit      2.0× × 0.95 = 1.91
34d    real         hit      1.0× × 0.88 = 0.88
58d    real         miss     1.0× × 0.80 = 0.80
89d    easy         hit      0.3× × 0.71 = 0.21
120d   bold         hit      2.0× × 0.63 = 1.26
154d   real         hit      1.0× × 0.55 = 0.55
11 more rows
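
The six visible rows can be recomputed with the sketch below. All identifiers are hypothetical; only the ages, difficulties, outcomes, weights, and the 180-day half-life come from this page. Because 11 rows are not shown, the partial sums here are smaller than the card's totals.

```typescript
// Hypothetical types and names; data copied from the six visible sample rows.
type SampleDifficulty = "easy" | "real" | "bold";

interface SampleRow {
  ageDays: number;
  difficulty: SampleDifficulty;
  hit: boolean;
}

const SAMPLE_WEIGHT: Record<SampleDifficulty, number> = { easy: 0.3, real: 1.0, bold: 2.0 };
const SAMPLE_HALF_LIFE_DAYS = 180;

const sampleRows: SampleRow[] = [
  { ageDays: 12, difficulty: "bold", hit: true },
  { ageDays: 34, difficulty: "real", hit: true },
  { ageDays: 58, difficulty: "real", hit: false },
  { ageDays: 89, difficulty: "easy", hit: true },
  { ageDays: 120, difficulty: "bold", hit: true },
  { ageDays: 154, difficulty: "real", hit: true },
];

/** contribution = difficulty_weight × recency_weight */
function contribution(row: SampleRow): number {
  return SAMPLE_WEIGHT[row.difficulty] * Math.pow(0.5, row.ageDays / SAMPLE_HALF_LIFE_DAYS);
}

// Partial sums over the visible rows only (the other 11 rows are not listed).
let partialHits = 0;
let partialAttempts = 0;
for (const row of sampleRows) {
  const c = contribution(row);
  partialAttempts += c;           // every decided call counts as an attempt
  if (row.hit) partialHits += c;  // only hits count toward weighted hits
}
```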
Totals — no fading vs fading
Side-by-side comparison of the Skill Score components with and without the 180-day recency fading applied.
metric              no fade   with fade
weighted hits       10.9      6.0
weighted attempts   17.5      8.6
p̂ (hit rate)        0.623     0.698
Wilson 95% lower    0.39      0.37
SKILL SCORE · FINAL: 37 (down 2 from the no-fade score of 39)
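
For readers who want to check the last two rows by hand, here is the arithmetic written out. It assumes the standard Wilson lower bound with z = 1.96, which is not stated on this page but reproduces both figures.

```latex
\[
\mathrm{lower}(\hat p, n) =
  \frac{\hat p + \dfrac{z^2}{2n} - z\sqrt{\dfrac{\hat p\,(1-\hat p)}{n} + \dfrac{z^2}{4n^2}}}
       {1 + \dfrac{z^2}{n}},
  \qquad z = 1.96 \ \text{(assumed)}
\]

% With fade: weighted hits 6.0, weighted attempts n = 8.6
\[
\hat p = \frac{6.0}{8.6} \approx 0.698, \qquad
\mathrm{lower} \approx \frac{0.698 + 0.223 - 1.96 \times 0.194}{1.447}
  \approx \frac{0.541}{1.447} \approx 0.37
\]

% No fade: weighted hits 10.9, weighted attempts n = 17.5
\[
\hat p = \frac{10.9}{17.5} \approx 0.623, \qquad
\mathrm{lower} \approx \frac{0.623 + 0.110 - 1.96 \times 0.129}{1.220}
  \approx \frac{0.480}{1.220} \approx 0.39
\]
```

Fading raises the hit rate here (old misses shrink too) but also shrinks the effective sample from 17.5 to 8.6, and the smaller sample is what pulls the bound down by two points.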

05 · Who can rank

To appear on the ranked leaderboard, a wallet needs:

  • At least 3 decided predictions across all its aliases
  • At least 2 non-trivial calls (Real call or Bold call difficulty), which stops anyone from ranking off a single lucky hit

Wallets below the bar show up in the "Provisional" section instead.
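
The two thresholds above amount to a simple check. The sketch below uses hypothetical names and assumes that every decided prediction, including Obvious ones, counts toward the minimum of 3; the page does not say either way.

```typescript
// Hypothetical names; thresholds are the two listed above.
type CallDifficulty = "obvious" | "easy" | "real" | "bold";

interface DecidedPrediction {
  difficulty: CallDifficulty;
  hit: boolean;
}

const MIN_DECIDED = 3;
const MIN_HARD_CALLS = 2; // calls rated "real" or "bold"

/** True if the wallet qualifies for the ranked leaderboard, false if Provisional. */
function isRanked(walletPredictions: DecidedPrediction[]): boolean {
  const hardCalls = walletPredictions.filter(
    (p) => p.difficulty === "real" || p.difficulty === "bold"
  ).length;
  // Assumption: all decided predictions across aliases are passed in together.
  return walletPredictions.length >= MIN_DECIDED && hardCalls >= MIN_HARD_CALLS;
}
```

Note that a miss on a hard call still counts toward the second threshold in this sketch, since the rule is about difficulty rather than outcome.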

06 · One wallet, one score — aliases don't help

We compute the score from the sum of weighted hits and attempts across every alias one wallet operates. A wallet with 2 lucky aliases and 8 abandoned losing aliases sees its score dragged down by the 8 losers. Creating extra aliases stops being a strategy.

Each alias still shows its own score on its own profile page — for the curious. But the ranking number is always at the wallet level. The wallet is the anchor.
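
A small sketch of the pooling step, with made-up numbers, shows why extra aliases don't help. The interface and function names are hypothetical; the only thing taken from the text is that weighted hits and attempts are summed across all of a wallet's aliases before the score is computed.

```typescript
// Hypothetical shapes; per-alias totals are assumed to be computed already.
interface AliasTotals {
  alias: string;
  weightedHits: number;
  weightedAttempts: number;
}

interface WalletTotals {
  weightedHits: number;
  weightedAttempts: number;
}

/** Sums weighted hits and attempts across every alias of one wallet. */
function poolWallet(aliases: AliasTotals[]): WalletTotals {
  const pooled: WalletTotals = { weightedHits: 0, weightedAttempts: 0 };
  for (const a of aliases) {
    pooled.weightedHits += a.weightedHits;
    pooled.weightedAttempts += a.weightedAttempts;
  }
  return pooled;
}

// Made-up example: 2 aliases that went 3.0 for 3.0, 8 abandoned aliases at 0.5 for 3.0.
const exampleAliases: AliasTotals[] = [];
for (let i = 0; i < 2; i++) {
  exampleAliases.push({ alias: "lucky-" + i, weightedHits: 3, weightedAttempts: 3 });
}
for (let i = 0; i < 8; i++) {
  exampleAliases.push({ alias: "loser-" + i, weightedHits: 0.5, weightedAttempts: 3 });
}

const pooledTotals = poolWallet(exampleAliases);
const pooledHitRate = pooledTotals.weightedHits / pooledTotals.weightedAttempts;
```

In this example each lucky alias has a perfect record on its own, while the wallet pools to 10 weighted hits out of 30 weighted attempts, a hit rate of one third, before the Wilson bound is even applied.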

07 · Frequently asked

Can I see my raw (no-fade) score separately?
Yes. Profile pages show both, with a small "was X before fading" sub-label whenever the gap is more than 3 points. The leaderboard ranks on the fading version.
Why 180 days specifically?
That's what Metaculus uses for its own forecasting tournaments. We picked it for consistency with the broader prediction-market research. If crypto-pace data later suggests it should be shorter or longer, we'll change it.
Does making a new alias help my score?
No. The score lives at the wallet level, so misses on alias B count against the same wallet score as hits on alias A. This is on purpose: it stops anyone from cherry-picking wins across many fake accounts.
What about predictions decided years ago — do they count?
Barely. At 540 days a hit weighs 12.5%; at 720 days, 6%. The score is dominated by what you called right in the last ~360 days.
Where on the site will I see this score?
Profile headline, the "By wallet" tab on the leaderboard, and the reputation API at /api/wallet/[publisher]/stats.
How is difficulty decided?
The AI judge picks a difficulty at the same moment it decides hit or miss. Obvious = already true on the day it was locked (the anti-spam choice). Bold = going against the consensus. See the Resolution Agent page for the full prompt the AI uses.
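
The sub-label rule from the first answer above (shown only when the raw and faded scores differ by more than 3 points) might look like the sketch below. The function name is hypothetical, and two details are assumptions: that the gap is measured on rounded scores, and that it counts in either direction, since fading can also raise a score when old misses fade.

```typescript
// Hypothetical helper; only the 3-point threshold and the label wording come from the FAQ.
const SUBLABEL_GAP_POINTS = 3;

/** Returns the sub-label text, or null when the gap is too small to show. */
function fadeSubLabel(fadedScore: number, rawScore: number): string | null {
  const faded = Math.round(fadedScore); // assumed: compare displayed (rounded) scores
  const raw = Math.round(rawScore);
  const gap = Math.abs(raw - faded);    // assumed: direction does not matter
  return gap > SUBLABEL_GAP_POINTS ? "was " + raw + " before fading" : null;
}
```

Under these assumptions the worked example in section 04 (37 with fading, 39 without) would show no sub-label on a profile page, because the gap is only 2 points.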

08 · Why Wilson, not raw hit-rate?

A wallet with 3 hits out of 3 calls has a 100% hit rate, but that's a tiny sample. The Wilson lower bound at 95% gives a statistically careful answer that grows with sample size: a 3-for-3 profile scores around 30, while a 100-for-100 scores near 96. Forecasters earn the high score by being right consistently, not by getting lucky once.
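
For a perfect record the bound has a simple closed form, which makes the sample-size effect easy to see. The derivation below assumes the standard Wilson lower bound with z = 1.96; here n is the weighted attempts, so it can be smaller than the raw number of calls once difficulty and recency weights are applied.

```latex
% Setting \hat p = 1 removes the \hat p(1-\hat p) term under the square root:
\[
\mathrm{lower}(1, n)
  = \frac{1 + \dfrac{z^2}{2n} - z \cdot \dfrac{z}{2n}}{1 + \dfrac{z^2}{n}}
  = \frac{n}{n + z^2},
  \qquad z^2 = 1.96^2 \approx 3.84
\]

\[
n = 100:\ \frac{100}{103.84} \approx 0.96
\qquad
n = 3:\ \frac{3}{6.84} \approx 0.44
\qquad
n \approx 1.65:\ \frac{1.65}{5.49} \approx 0.30
\]
```

So a score near 30 for three perfect calls corresponds to weighted attempts of roughly 1.65, for example three hits whose combined difficulty and recency weights have shrunk below 1.0 each, while three hits at full weight would land nearer 44.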

The formula is Wilson (1927), best known online for ranking comments and rated items on sites such as Reddit and Yelp. We extend it to non-integer counts (difficulty weights × recency factors produce decimal values). At hackathon scale the basic formula is close enough; a fully rigorous version would use an interval designed for weighted, non-integer counts.

Related — the V4 anti-gaming kit

This Skill Score is one of four V4 anti-gaming pieces. The others: