Reference · methodology

How the Skill Score works

One number, 0 to 100. It answers: how often does this wallet call things right, with harder calls counting more and recent calls counting more than old ones?

Updated 2026-05-19 · Commit 0xd218418 · Read 4 min · Section R1 of 05

METHOD V2 · used since 2026-05-12 · before that: v1 (Wilson + difficulty only)

01 · The formula

For every prediction the AI judge has decided, we compute one number — the prediction's contribution to the score:

formula

contribution = difficulty_weight × recency_weight

Add these up across every alias the same wallet has ever used, then run the Wilson lower bound at 95% confidence:

formula

skill = wilson_lower_bound_95(
  weighted_hits     = Σ contribution(p) for hits,
  weighted_attempts = Σ contribution(p) for all resolved
) × 100
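A minimal Python sketch of the formula above. It assumes the standard two-sided Wilson score interval with z = 1.96 and no continuity correction; the function names and the `(contribution, is_hit)` input shape are illustrative and not taken from lib/leaderboard.

```python
import math

Z_95 = 1.96  # assumed z-value for a two-sided 95% interval


def wilson_lower_bound_95(weighted_hits: float, weighted_attempts: float) -> float:
    """Lower end of the Wilson score interval; accepts non-integer counts."""
    if weighted_attempts <= 0:
        return 0.0  # nothing resolved carries weight, so no evidence of skill
    n = weighted_attempts
    p_hat = min(1.0, weighted_hits / n)  # clamp against float drift
    z2 = Z_95 ** 2
    centre = p_hat + z2 / (2 * n)
    margin = Z_95 * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n))
    return (centre - margin) / (1 + z2 / n)


def skill(contributions: list[tuple[float, bool]]) -> float:
    """contributions: (contribution, is_hit) for every resolved prediction."""
    hits = sum(c for c, is_hit in contributions if is_hit)
    attempts = sum(c for c, _ in contributions)
    return wilson_lower_bound_95(hits, attempts) * 100
```

Under these assumptions, `skill([(1.0, True)] * 100)` comes out at roughly 96, and a wallet with no weighted attempts scores 0.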

02 · How hard was the call?

When the AI judge decides hit or miss, it also rates how hard the call was. Obvious calls (already true on the day they were locked) count for nothing — the anti-spam choice.

Difficulty weights used in the Skill Score formula. Higher weight = call counts more, whether it's a hit or a miss.
Difficulty	Weight	Meaning
Obvious	0.0×	Already true on the day it was locked. Doesn't move the score.
Easy	0.3×	Likely outcome — a safe macro guess or near-term price call.
Real call	1.0×	Genuine uncertainty — could go either way.
Bold call	2.0×	Going against the consensus. The riskiest, worth the most.
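To see how the weights shift the totals, here is a small Python sketch that ignores recency and applies only the difficulty weights from the table. The dictionary keys and the `weighted_totals` helper are illustrative names, not identifiers from the codebase.

```python
# Weights copied from the table above; the keys are illustrative labels.
DIFFICULTY_WEIGHT = {"obvious": 0.0, "easy": 0.3, "real": 1.0, "bold": 2.0}


def weighted_totals(calls: list[tuple[str, bool]]) -> tuple[float, float]:
    """Return (weighted_hits, weighted_attempts) for (difficulty, is_hit) pairs."""
    hits = sum(DIFFICULTY_WEIGHT[d] for d, is_hit in calls if is_hit)
    attempts = sum(DIFFICULTY_WEIGHT[d] for d, _ in calls)
    return hits, attempts


# Ten obvious hits add nothing; one missed bold call weighs as much as two real calls.
calls = [("obvious", True)] * 10 + [("real", True), ("bold", False)]
print(weighted_totals(calls))  # (1.0, 3.0)
```

Twelve calls with eleven hits collapse to 1.0 weighted hit out of 3.0 weighted attempts, which is why padding a profile with obvious calls does not help.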

03 · Recent calls count more

Old hits fade. A prediction's contribution to the score halves every 180 days. This is the value Metaculus uses for its own forecasting tournaments.

formula

recency_weight(p) = 0.5^(age_days / 180)

Recency weights at common ages. Multiply by the difficulty weight to get the per-prediction contribution.
Age	Weight
today	1.000×
90 days ago	0.707×
180 days ago	0.500×
360 days ago	0.250×
540 days ago	0.125×
720 days ago	0.063×
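The decay can be reproduced in a few lines of Python. This is a sketch of the formula as written, with the half-life expressed in days; `HALF_LIFE_DAYS` is an illustrative name (the site's own constant is `SKILL_HALF_LIFE_MS`, in milliseconds).

```python
HALF_LIFE_DAYS = 180  # half-life stated in the text


def recency_weight(age_days: float) -> float:
    """Weight halves for every HALF_LIFE_DAYS of age."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)


# Ages from the table above.
for age in (0, 90, 180, 360, 540, 720):
    print(f"{age:>3} days ago  {recency_weight(age):.4f}")
```

The printed values match the table up to rounding in the last digit.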

Why fading matters: it makes running many fake accounts expensive. If a wallet creates 10 aliases and abandons 8 of them, the old wins on the abandoned 8 keep fading until they barely matter — pressure to either keep every alias active or watch the score drop.

04 · Worked example

17 decided predictions, computed end-to-end against the real lib/leaderboard constants. Change SKILL_HALF_LIFE_MS in the code and the numbers below update automatically — this card stays in sync with the math.

Worked example · 17 decided predictions · 0xc187…391df

Sample predictions
age	diff	result	contrib
12d	bold	hit	2.0× × 0.95 = 1.91
34d	real	hit	1.0× × 0.88 = 0.88
58d	real	miss	1.0× × 0.80 = 0.80
89d	easy	hit	0.3× × 0.71 = 0.21
120d	bold	hit	2.0× × 0.63 = 1.26
154d	real	hit	1.0× × 0.55 = 0.55
… 11 more rows

Totals: no fading vs fading
Side-by-side comparison of the Skill Score components with and without the 180-day recency fading applied.
metric	no fade	with fade
weighted hits	10.9	6.0
weighted attempts	17.5	8.6
p̂ (hit rate)	0.623	0.698
Wilson 95% lower	0.39	0.37

SKILL SCORE · FINAL: 37 (−2 from the no-fade score of 39)

Read this as: fading changes how much evidence counts, not only the hit rate. Here the weighted hit rate rises from 0.623 to 0.698 once old calls fade, but the weighted attempts fall from 17.5 to 8.6, so fewer calls effectively carry weight than the raw count of 17 suggests. Wilson widens its margin on the smaller effective sample, and the lower bound drops from 0.39 to 0.37.
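The totals in the card can be checked by hand. The Python sketch below feeds the rounded table values into a plain Wilson lower bound, assuming z = 1.96 and no continuity correction; because the inputs are already rounded, the results are approximate.

```python
import math


def wilson_lower(hits: float, attempts: float, z: float = 1.96) -> float:
    """Wilson score lower bound for (possibly non-integer) weighted counts."""
    p = hits / attempts
    z2 = z * z
    centre = p + z2 / (2 * attempts)
    margin = z * math.sqrt(p * (1 - p) / attempts + z2 / (4 * attempts ** 2))
    return (centre - margin) / (1 + z2 / attempts)


# Rounded totals from the card: (label, weighted hits, weighted attempts).
for label, hits, attempts in (("no fade", 10.9, 17.5), ("with fade", 6.0, 8.6)):
    print(label, round(hits / attempts, 3), round(wilson_lower(hits, attempts), 2))
# no fade 0.623 0.39
# with fade 0.698 0.37
```

Both rows reproduce the card: scores of 39 without fading and 37 with it.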

05 · Who can rank

To appear on the ranked leaderboard, a wallet needs:

At least 3 decided predictions across all its aliases
At least 2 calls rated real call or bold call, which stops anyone from ranking off a single lucky hit

Wallets below the bar show up in the "Provisional" section instead.
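The two conditions can be expressed as a simple check. This Python sketch is illustrative only: the `Decided` record, the difficulty labels, and the constant names are assumptions, and the input is taken to be a wallet's decided predictions pooled across all its aliases.

```python
from dataclasses import dataclass


@dataclass
class Decided:
    difficulty: str  # "obvious", "easy", "real" or "bold" (illustrative labels)
    is_hit: bool


MIN_DECIDED = 3        # at least 3 decided predictions
MIN_REAL_OR_BOLD = 2   # at least 2 rated real call or bold call


def is_ranked(predictions: list[Decided]) -> bool:
    """True if the wallet clears both bars; otherwise it is listed as Provisional."""
    real_or_bold = sum(p.difficulty in ("real", "bold") for p in predictions)
    return len(predictions) >= MIN_DECIDED and real_or_bold >= MIN_REAL_OR_BOLD
```

For example, three easy hits fail the second condition, while one real call, one bold call, and one easy call pass both, regardless of outcome.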

06 · One wallet, one score — aliases don't help

We compute the score from the sum of weighted hits and attempts across every alias one wallet operates. A wallet with 2 lucky aliases and 8 abandoned losing aliases sees its score dragged down by the 8 losers. Creating extra aliases stops being a strategy.

Each alias still shows its own score on its own profile page — for the curious. But the ranking number is always at the wallet level. The wallet is the anchor.
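A short Python sketch of wallet-level pooling. The rows, wallet addresses, and alias names are made up for illustration, and `totals_by_wallet` is a hypothetical helper; the pooled totals would then go into the Wilson lower bound.

```python
from collections import defaultdict

# (wallet, alias, contribution, is_hit) rows; all values are invented for illustration.
rows = [
    ("0xAAA", "alpha", 2.0, True),
    ("0xAAA", "alpha", 1.0, True),
    ("0xAAA", "beta", 1.0, False),
    ("0xAAA", "gamma", 2.0, False),
    ("0xBBB", "solo", 1.0, True),
]


def totals_by_wallet(rows):
    """Pool weighted hits and attempts per wallet, ignoring alias boundaries."""
    totals = defaultdict(lambda: [0.0, 0.0])  # wallet -> [hits, attempts]
    for wallet, _alias, contribution, is_hit in rows:
        totals[wallet][1] += contribution
        if is_hit:
            totals[wallet][0] += contribution
    return {wallet: tuple(t) for wallet, t in totals.items()}


print(totals_by_wallet(rows))
# {'0xAAA': (3.0, 6.0), '0xBBB': (1.0, 1.0)}
```

Taken alone, alias "alpha" has 3.0 weighted hits out of 3.0 attempts; at the wallet level the misses on "beta" and "gamma" pull that down to 3.0 out of 6.0.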

07 · Frequently asked

Can I see my raw (no-fade) score separately?

Yes. Profile pages show both, with a small "was X before fading" sub-label whenever the gap is more than 3 points. The leaderboard ranks on the fading version.

Why 180 days specifically?

That's what Metaculus uses for its own forecasting tournaments. We picked it for consistency with the broader prediction-market research. If crypto-pace data later suggests it should be shorter or longer, we'll change it.

Does making a new alias help my score?

No. The score lives at the wallet level — misses on alias B drag alias A down. This is on purpose: it stops anyone from cherry-picking wins across many fake accounts.

What about predictions decided years ago — do they count?

Barely. At 540 days a call carries 12.5% of its original weight; at 720 days, about 6%. The score is dominated by the calls you made in the last ~360 days.

Where on the site will I see this score?

Profile headline, the "By wallet" tab on the leaderboard, and the reputation API at /api/wallet/[publisher]/stats.

How is difficulty decided?

The AI judge picks a difficulty at the same moment it decides hit or miss. Obvious = already true on the day it was locked (the anti-spam choice). Bold = going against the consensus. See the Resolution Agent page for the full prompt the AI uses.

08 · Why Wilson, not raw hit-rate?

A wallet with 3 hits out of 3 calls has a 100% hit rate, but that's a tiny sample. The Wilson lower bound at 95% gives a statistically careful answer that grows with sample size: a 3-for-3 profile scores around 44 at full weight, while a 100-for-100 scores near 96. Forecasters earn the high score by being right consistently, not by getting lucky once.

The formula is Wilson (1927), most famously used for ranking rated content on sites such as Reddit and Yelp. We extend it to non-integer counts, because difficulty weights × recency factors produce decimal values. At hackathon scale the basic formula is close enough; a fully rigorous version would need a different method.

Related — the V4 anti-gaming kit

This Skill Score is one of four V4 anti-gaming pieces. The others:

By-wallet leaderboard: where the score is ranked
Wallet provenance footer: shows every alias one wallet owns
Trust badges: Single · Multi · Churner · Spam

Full plan: ROADMAP_V4.md §Theme 1
