Lab · alternate cuts of today's full board.
Each view applies a different filter or sort to the same 277-batter pool,
shrinking the denominator to surface picks that the composite top-8
misses. Click any row for the Topps card.
Live Today
Batter × Game Heatmap
Pick-Rank Heat Map · Hit (green) vs Miss (red)
Rows = pick rank (#1 at top). Columns = days (most recent on the right).
Reveals whether the top picks (#1-3) convert more reliably than the
bottom picks (#6-8), or whether HRs scatter randomly across rank.
Click any cell to open the batter's card.
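As a rough illustration of what the heat map measures, the sketch below builds a rank × day grid from made-up pick records and pools the hit rate for ranks #1-3 against ranks #6-8. The `(date, pick_rank, hit_hr)` record shape and the helper names are hypothetical, not the lab's actual data model.

```python
from collections import defaultdict

# Hypothetical record: (date, pick_rank, hit_hr). Not the lab's real schema.
Pick = tuple[str, int, bool]


def rank_by_day_grid(picks: list[Pick]) -> dict[int, dict[str, bool]]:
    """Rows = pick rank, columns = date; each cell is True (HR) or False (miss)."""
    grid: dict[int, dict[str, bool]] = defaultdict(dict)
    for date, rank, hit in picks:
        grid[rank][date] = hit
    return dict(grid)


def pooled_hit_rate(grid: dict[int, dict[str, bool]], ranks: range) -> float:
    """HR conversion rate pooled over the given ranks across every day."""
    cells = [hit for r in ranks for hit in grid.get(r, {}).values()]
    return sum(cells) / len(cells) if cells else float("nan")


# Made-up picks for two days.
picks = [
    ("2024-06-01", 1, True), ("2024-06-01", 2, False), ("2024-06-01", 7, False),
    ("2024-06-02", 1, True), ("2024-06-02", 3, False), ("2024-06-02", 8, False),
]
grid = rank_by_day_grid(picks)
top = pooled_hit_rate(grid, range(1, 4))     # ranks #1-3
bottom = pooled_hit_rate(grid, range(6, 9))  # ranks #6-8
print(f"top {top:.2f} vs bottom {bottom:.2f}")  # top 0.50 vs bottom 0.00
```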
Backtest: Current Model vs History
Re-scores every batter in pick_inputs using TODAY's
score_* functions, then, for each factor, bins batters into quintiles by
that factor's score and reports the actual HR rate per bin. A factor with
lift > 1.3×, a monotonic climb across quintiles, and
AUC > 0.55 is doing real work. A factor with lift near
1.0×, a non-monotonic curve, and AUC near 0.50 is dead weight that
shouldn't be carrying composite weight.
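A minimal sketch of the per-factor backtest. It assumes "lift" means top-quintile HR rate divided by bottom-quintile HR rate (the section does not define it) and computes AUC as the probability that a random HR hitter outscores a random non-HR batter. `quintile_report` and `is_doing_work` are hypothetical helpers, not the lab's score_* functions.

```python
import math


def auc(scores: list[float], hits: list[bool]) -> float:
    """P(random HR hitter outscores random non-HR batter); ties count half.
    Equal to the Mann-Whitney U statistic divided by n_pos * n_neg."""
    pos = [s for s, h in zip(scores, hits) if h]
    neg = [s for s, h in zip(scores, hits) if not h]
    if not pos or not neg:
        return math.nan
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def quintile_report(scores: list[float], hits: list[bool]) -> dict:
    """Sort batters by factor score, split into 5 equal-count bins, report HR rate per bin.
    Assumes at least 5 batters; tied scores at a bin edge are split arbitrarily."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    rates = []
    for q in range(5):
        idx = order[q * n // 5:(q + 1) * n // 5]
        rates.append(sum(hits[i] for i in idx) / len(idx))
    # ASSUMED lift definition: top-quintile rate / bottom-quintile rate.
    if rates[0] > 0:
        lift = rates[-1] / rates[0]
    else:
        lift = math.inf if rates[-1] > 0 else math.nan
    # "Monotonic" here means non-decreasing from the lowest to the highest bin.
    monotonic = all(a <= b for a, b in zip(rates, rates[1:]))
    return {"rates": rates, "lift": lift, "monotonic": monotonic, "auc": auc(scores, hits)}


def is_doing_work(report: dict) -> bool:
    """Thresholds from the section: lift > 1.3x, monotonic, AUC > 0.55."""
    return report["lift"] > 1.3 and report["monotonic"] and report["auc"] > 0.55


# Made-up factor scores for 10 batters; True = hit a HR.
scores = list(range(10))
hits = [i in {1, 3, 5, 6, 8, 9} for i in scores]
report = quintile_report(scores, hits)
print(report["rates"], report["lift"], round(report["auc"], 3), is_doing_work(report))
```

Real boards are far larger than ten batters; the toy input only makes the bin arithmetic easy to follow.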
Factor Score Trends (Selected Picks)
Composite Score → HR Rate (full board, all dates)
Bars: counts of HRs (green) and non-HRs (gray) per composite range. Orange line: HR rate % (right axis). The HR rate climbs from low single digits in the lowest composite range to roughly 40-50% in the highest.
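The bar-and-line chart reduces to a simple tally. This sketch assumes 20-point composite ranges (0-20 through 80-100, matching the 40-60 and 60-80 bands used further down); the helper name and inputs are illustrative.

```python
def composite_histogram(composites: list[float], hits: list[bool], width: int = 20) -> dict:
    """Map 'lo-hi' range -> (HR count, non-HR count, HR rate %).
    Ranges are half-open [lo, lo + width), except a score of 100 joins the top range."""
    bins: dict[int, tuple[int, int]] = {}
    for c, h in zip(composites, hits):
        lo = min(int(c // width) * width, 100 - width)
        hr, miss = bins.get(lo, (0, 0))
        bins[lo] = (hr + h, miss + (not h))
    return {
        f"{lo}-{lo + width}": (hr, miss, 100 * hr / (hr + miss))
        for lo, (hr, miss) in sorted(bins.items())
    }


composites = [15, 35, 45, 55, 72, 88, 95]  # made-up composite scores
hits = [False, False, True, False, True, True, False]
for rng, (hr, miss, pct) in composite_histogram(composites, hits).items():
    print(rng, hr, miss, f"{pct:.0f}%")
```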
Per-Factor Mean — HR Hitters vs Misses (last 30 days)
Mean factor score for batters who hit a HR vs those who didn't. A bigger green-vs-gray gap = the factor is surfacing HR hitters. A flat pair = the factor isn't differentiating and is a candidate to drop or rework.
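The green-vs-gray comparison is a difference of group means. A minimal sketch, assuming each row is a dict of factor scores plus a boolean `hit_hr` flag (hypothetical field names, as are the `power` and `weather` factors); the 30-day filter would be applied before this call.

```python
from statistics import mean


def factor_gaps(rows: list[dict], factors: list[str]) -> list[tuple[str, float, float, float]]:
    """(factor, HR-hitter mean, miss mean, gap) sorted by |gap|, largest first.
    Raises statistics.StatisticsError if either group is empty."""
    out = []
    for f in factors:
        hr = mean(r[f] for r in rows if r["hit_hr"])
        miss = mean(r[f] for r in rows if not r["hit_hr"])
        out.append((f, hr, miss, hr - miss))
    return sorted(out, key=lambda t: abs(t[3]), reverse=True)


rows = [  # made-up factor scores
    {"power": 80, "weather": 50, "hit_hr": True},
    {"power": 70, "weather": 52, "hit_hr": True},
    {"power": 50, "weather": 51, "hit_hr": False},
    {"power": 40, "weather": 49, "hit_hr": False},
]
for name, hr_mean, miss_mean, gap in factor_gaps(rows, ["weather", "power"]):
    print(name, hr_mean, miss_mean, gap)
```

Here `power` separates the groups by 30 points while `weather` barely moves, which is the "flat pair" pattern the section flags as a candidate to drop or rework.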
Factor Decomposition — What's Inside Each Score?
For each factor, the underlying raw inputs the model is consuming. Green = HR-hitter mean,
gray = miss mean. The All view shows overall signal; band views show what differs INSIDE
a composite range (e.g., "of guys we scored 40-60, what do the HR hitters share?").
Band views can be filtered by composite band or by daily rank band.
Why Are HR Hitters Stuck at 40-60?
Same inputs as above, but compares HR hitters who scored 40-60
(the model under-rated them) with HR hitters who scored 60-80
(the model caught them). The biggest input gaps here are the signals the model fails to convert into score
for the under-rated group, which makes them candidates for a weight increase or a new feature.
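One way to rank these gaps is to standardize them so inputs on different scales (percentages, mph, counts) are comparable. The sketch below divides each mean gap by the standard deviation of both cohorts combined; that scaling, the half-open band edges, and the input names are assumptions, and the lab may show raw gaps instead.

```python
from statistics import mean, pstdev


def cohort_input_gaps(hr_hitters: list[dict], inputs: list[str],
                      under=(40, 60), caught=(60, 80)) -> list[tuple[str, float]]:
    """Standardized gap (under-rated mean - caught mean) per input, largest |gap| first.
    Bands are half-open [lo, hi); both cohorts must be non-empty."""
    a = [r for r in hr_hitters if under[0] <= r["composite"] < under[1]]
    b = [r for r in hr_hitters if caught[0] <= r["composite"] < caught[1]]
    gaps = []
    for k in inputs:
        va, vb = [r[k] for r in a], [r[k] for r in b]
        sd = pstdev(va + vb) or 1.0  # avoid dividing by zero for a constant input
        gaps.append((k, (mean(va) - mean(vb)) / sd))
    return sorted(gaps, key=lambda g: abs(g[1]), reverse=True)


hr_hitters = [  # made-up HR hitters with hypothetical raw inputs
    {"composite": 45, "barrel_pct": 10, "pull_pct": 40},
    {"composite": 55, "barrel_pct": 12, "pull_pct": 42},
    {"composite": 65, "barrel_pct": 14, "pull_pct": 41},
    {"composite": 75, "barrel_pct": 16, "pull_pct": 41},
]
print(cohort_input_gaps(hr_hitters, ["pull_pct", "barrel_pct"]))
```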
HR Hitters by Daily Rank — Where's the Tunable Cohort?
Daily rank controls for slate variance. Compares HR hitters we picked (#1-10),
HR hitters we just missed (#11-30), and
deep-board HR hitters (#31-100). The largest input gaps between
#1-10 and #11-30 are the most actionable tuning levers, since #11-30 is the cohort closest to breaking into the top-8.
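The cohort split is a bucketing of daily rank. A sketch using the section's cut points (#1-10, #11-30, #31-100); the record fields and the `hard_hit_pct` input are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Optional


def rank_cohort(daily_rank: int) -> Optional[str]:
    """Map a daily board rank to the three cohorts compared in this view."""
    if 1 <= daily_rank <= 10:
        return "picked"
    if 11 <= daily_rank <= 30:
        return "just_missed"
    if 31 <= daily_rank <= 100:
        return "deep_board"
    return None  # ranks past 100 are left out


def picked_vs_just_missed(hr_hitters: list[dict], inputs: list[str]) -> dict[str, float]:
    """Mean input gap (picked - just missed) among HR hitters only."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for r in hr_hitters:
        cohort = rank_cohort(r["daily_rank"])
        if cohort is not None:
            groups[cohort].append(r)
    return {
        k: mean(r[k] for r in groups["picked"]) - mean(r[k] for r in groups["just_missed"])
        for k in inputs
    }


hr_hitters = [  # made-up HR hitters
    {"daily_rank": 3, "hard_hit_pct": 50},
    {"daily_rank": 8, "hard_hit_pct": 54},
    {"daily_rank": 15, "hard_hit_pct": 45},
    {"daily_rank": 25, "hard_hit_pct": 47},
    {"daily_rank": 60, "hard_hit_pct": 30},
]
print(picked_vs_just_missed(hr_hitters, ["hard_hit_pct"]))
```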
Input Calibration — Score Curve vs Empirical HR Rate
For each raw input, we bin batters into 5 quantile buckets by that input and compare the empirical HR rate per bucket
to the average sub-factor score we assigned. Aligned = the
score curve climbs with HR rate (correct). Backwards = the score
climbs while HR rate drops or vice versa. Flat = no
signal in either curve. The most actionable diagnostic for fixing per-input score functions.
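A minimal sketch of the calibration check. It buckets batters by a raw input into 5 equal-count quantile buckets, then labels each curve as climbing, falling, or flat from its least-squares slope over bucket index. The slope test and the tolerances are assumptions, not the lab's actual classification rule; cases where only one curve moves are returned as `unclassified` because the section does not name them.

```python
def quantile_curves(raw: list[float], sub_scores: list[float], hits: list[bool],
                    buckets: int = 5) -> tuple[list[float], list[float]]:
    """Per quantile bucket of the raw input: empirical HR rate and mean sub-factor score."""
    order = sorted(range(len(raw)), key=lambda i: raw[i])
    n = len(order)
    rates, scores = [], []
    for b in range(buckets):
        idx = order[b * n // buckets:(b + 1) * n // buckets]
        rates.append(sum(hits[i] for i in idx) / len(idx))
        scores.append(sum(sub_scores[i] for i in idx) / len(idx))
    return rates, scores


def trend(values: list[float], tol: float) -> int:
    """+1 climbing, -1 falling, 0 flat, from the least-squares slope per bucket."""
    n = len(values)
    x_bar, y_bar = (n - 1) / 2, sum(values) / n
    slope = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values)) / sum(
        (i - x_bar) ** 2 for i in range(n))
    return 0 if abs(slope) <= tol else (1 if slope > 0 else -1)


def classify(rates: list[float], scores: list[float],
             rate_tol: float = 0.01, score_tol: float = 1.0) -> str:
    r, s = trend(rates, rate_tol), trend(scores, score_tol)
    if r == 0 and s == 0:
        return "flat"
    if r == s:
        return "aligned"
    if r == -s:
        return "backwards"
    return "unclassified"


raw = list(range(1, 11))             # made-up raw input values
sub = [10 * x for x in raw]          # made-up sub-factor scores
hits = [x in {6, 8, 9, 10} for x in raw]
rates, scores = quantile_curves(raw, sub, hits)
print(rates, scores, classify(rates, scores))
```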
Dome vs Outdoor — Is Our Dome Bias Justified?
Dome games get a flat weather score (50, neutral). Outdoor games can score higher with helping
wind but lower with hostile wind. If domes convert at a higher HR rate than outdoor games of
comparable park factor, the bias is justified. If not, we're under-rewarding outdoor wind alignment.
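Holding park factor roughly fixed is the key step in that comparison. A sketch, assuming batter-game records of `(park_factor, is_dome, hit_hr)` and 5-point park-factor bands (both hypothetical); only bands that contain both dome and outdoor games are reported.

```python
from collections import defaultdict


def dome_vs_outdoor(games, pf_width: int = 5) -> dict[int, tuple[float, float]]:
    """Park-factor band -> (dome HR rate, outdoor HR rate)."""
    cells = defaultdict(lambda: [0, 0])  # (band, is_dome) -> [HRs, batter-games]
    for pf, dome, hit in games:
        band = int(pf // pf_width) * pf_width
        cell = cells[(band, dome)]
        cell[0] += hit
        cell[1] += 1
    report = {}
    for band in sorted({b for b, _ in cells}):
        d, o = cells.get((band, True)), cells.get((band, False))
        if d and o:  # skip bands where one venue type is missing
            report[band] = (d[0] / d[1], o[0] / o[1])
    return report


games = [  # made-up batter-games
    (100, True, True), (101, True, False),
    (102, False, False), (103, False, False),
    (110, False, True),  # no dome game in this band, so it is dropped
]
print(dome_vs_outdoor(games))
```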
Wind Direction Effect — Outdoor Games Only
HR rate by wind-helping band (cosine of wind-to-CF angle × MPH). Out-blowing wind toward CF
should produce higher HR rates than in-blowing wind toward home plate. The avg_weather_score
column shows what our model is rewarding for each band — if HR rate climbs but score is flat,
the wind effect isn't being weighted enough.
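The wind-helping component is a projection of wind speed onto the home-plate-to-CF line. A sketch under two labeled assumptions: the wind direction is the bearing it blows toward (weather feeds usually give the bearing it blows from, 180 degrees opposite), and the 5 mph band cut is a placeholder, since the lab's band edges are not stated.

```python
import math
from collections import defaultdict


def wind_out_component(mph: float, toward_deg: float, cf_bearing_deg: float) -> float:
    """Signed wind speed along the home-to-CF line: + blowing out, - blowing in.
    ASSUMES toward_deg is the bearing the wind blows TOWARD."""
    return mph * math.cos(math.radians(toward_deg - cf_bearing_deg))


def wind_band(component: float, cut: float = 5.0) -> str:
    """Hypothetical 3-way banding of the signed component."""
    if component > cut:
        return "out"
    if component < -cut:
        return "in"
    return "neutral"


def wind_table(games) -> dict[str, tuple[float, float]]:
    """games: (mph, toward_deg, cf_bearing_deg, weather_score, hit_hr) per outdoor batter-game.
    Returns band -> (HR rate, avg weather score)."""
    agg = defaultdict(lambda: [0, 0, 0.0])  # HRs, games, summed weather score
    for mph, toward, cf, weather_score, hit in games:
        a = agg[wind_band(wind_out_component(mph, toward, cf))]
        a[0] += hit
        a[1] += 1
        a[2] += weather_score
    return {band: (hr / n, ws / n) for band, (hr, n, ws) in agg.items()}


games = [  # made-up outdoor batter-games
    (10, 45, 45, 70, True),    # blowing straight out to CF
    (12, 45, 45, 60, False),
    (10, 225, 45, 40, False),  # blowing straight in
]
print(wind_table(games))
```

If the HR rate climbs from `in` to `out` while the average weather score stays flat, that is the under-weighting pattern this section describes.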
Temp × Humidity Interaction — HR Rate Heatmap
Outdoor games only. Rows = temperature band, columns = humidity band. Cell shows the
empirical HR rate. Green tint = positive
interaction (the combo produces MORE HRs than an additive prediction would);
red tint = negative. The hot+humid cell should
be green if our physics intuition holds (warm humid air is less dense).
Cells with n < 5 are suppressed.
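The section does not spell out what "additive prediction" means. One common reading, assumed here, is row rate + column rate minus overall rate, with the tint showing observed rate minus that baseline. A sketch under that assumption, with hypothetical band labels:

```python
from collections import defaultdict


def interaction_grid(games, min_n: int = 5) -> dict[tuple[str, str], tuple[float, float]]:
    """games: (temp_band, humidity_band, hit_hr) per outdoor batter-game.
    Returns (temp_band, humidity_band) -> (HR rate, observed minus additive baseline).
    ASSUMED baseline: row rate + column rate - overall rate."""
    cell = defaultdict(lambda: [0, 0])
    row = defaultdict(lambda: [0, 0])
    col = defaultdict(lambda: [0, 0])
    total = [0, 0]
    for t, h, hit in games:
        for acc in (cell[(t, h)], row[t], col[h], total):
            acc[0] += hit
            acc[1] += 1
    overall = total[0] / total[1]
    out = {}
    for (t, h), (hr, n) in cell.items():
        if n < min_n:
            continue  # suppress thin cells, as the heat map does
        rate = hr / n
        expected = row[t][0] / row[t][1] + col[h][0] / col[h][1] - overall
        out[(t, h)] = (rate, rate - expected)
    return out


games = (  # made-up outdoor games, 5 per cell
    [("hot", "humid", i < 3) for i in range(5)]
    + [("hot", "dry", i < 1) for i in range(5)]
    + [("cool", "humid", i < 1) for i in range(5)]
    + [("cool", "dry", i < 1) for i in range(5)]
)
for key, (rate, interaction) in sorted(interaction_grid(games).items()):
    print(key, f"{rate:.2f}", f"{interaction:+.2f}")
```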
Elite Pitcher × Archetype Match — Is the Dampening Over-aggressive?
Pitcher vulnerability quintile (low HR/9 → "elite" on row 1) crossed with archetype
similarity quintile (high archetype match → column 5). The diagnostic cell is
row 1 × col 5: an elite pitcher facing his archetype victim. If the empirical HR rate
there is high but our matchup score is low, the elite-pitcher dampening
(×0.70 multiplier when vulnerability < 25) is firing on cases where the archetype match
is screaming "vulnerability," and we should condition the dampening on similarity.
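The proposed fix can be expressed as a gate on the existing rule. The 0.70 multiplier and the vulnerability < 25 trigger come from this section; the similarity cutoff of 80 and the function name are hypothetical placeholders for whatever threshold the row 1 × col 5 cell would justify.

```python
ELITE_VULN_CUTOFF = 25    # from the section: dampening fires below this vulnerability
ELITE_MULTIPLIER = 0.70   # from the section
SIMILARITY_CUTOFF = 80    # HYPOTHETICAL: archetype similarity that skips dampening


def dampened_matchup(matchup_score: float, vulnerability: float, similarity: float,
                     conditioned: bool = True) -> float:
    """conditioned=False mirrors the current rule (x0.70 whenever vulnerability < 25);
    conditioned=True sketches the proposed fix that spares strong archetype matches."""
    if vulnerability >= ELITE_VULN_CUTOFF:
        return matchup_score
    if conditioned and similarity >= SIMILARITY_CUTOFF:
        return matchup_score
    return matchup_score * ELITE_MULTIPLIER


print(dampened_matchup(60, vulnerability=20, similarity=90, conditioned=False))
print(dampened_matchup(60, vulnerability=20, similarity=90))
```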
Pick Composition — What Does Our Top-8 Look Like in Aggregate?
Distribution of selected picks by park factor, batting order, and dome status. Surfaces
systematic biases that aren't visible per-pick — e.g., 80% dome share, never picks from
sub-95 park-factor venues, leadoff hitters absent.
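These aggregate checks are plain shares over the selected picks. A sketch assuming pick dicts with hypothetical `park_factor`, `batting_order`, and `is_dome` fields; the three shares mirror the example biases above.

```python
from collections import Counter


def composition(picks: list[dict]) -> dict:
    """Aggregate shares across selected picks (assumes at least one pick)."""
    n = len(picks)
    return {
        "dome_share": sum(p["is_dome"] for p in picks) / n,
        "sub95_pf_share": sum(p["park_factor"] < 95 for p in picks) / n,
        "leadoff_share": sum(p["batting_order"] == 1 for p in picks) / n,
        "by_order": dict(sorted(Counter(p["batting_order"] for p in picks).items())),
    }


picks = [  # made-up selected picks
    {"park_factor": 100, "batting_order": 3, "is_dome": True},
    {"park_factor": 105, "batting_order": 4, "is_dome": True},
    {"park_factor": 92, "batting_order": 1, "is_dome": False},
    {"park_factor": 98, "batting_order": 3, "is_dome": True},
]
print(composition(picks))
```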