DingersOnly.cc

Backtest: Current Model vs History

Re-scores every batter in pick_inputs using TODAY's score_* functions, bins them into quintiles by factor score, and reports the actual HR rate per bin. A factor with lift > 1.3×, a monotonic ✓, and AUC > 0.55 is doing real work. A factor with lift near 1.0×, a non-monotonic curve, or AUC near 0.50 is dead weight that shouldn't carry composite weight.
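
The three checks above can be sketched in a few lines. This is a minimal illustration, not the site's code: the function names are hypothetical, and it assumes lift means the top-quintile HR rate divided by the overall HR rate, which the page does not define.

```python
def quintile_hr_rates(scores, hit_hr, n_bins=5):
    """Sort batters by factor score, split into equal-count bins,
    and return the HR rate of each bin from lowest to highest score."""
    pairs = sorted(zip(scores, hit_hr), key=lambda p: p[0])
    n = len(pairs)
    rates = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        rates.append(sum(y for _, y in chunk) / len(chunk) if chunk else 0.0)
    return rates


def lift(rates, overall_rate):
    # Assumption: lift = top-bin HR rate / overall HR rate.
    return rates[-1] / overall_rate if overall_rate > 0 else float("nan")


def is_monotonic(rates):
    """True when the HR rate never drops as the score bin rises."""
    return all(a <= b for a, b in zip(rates, rates[1:]))


def auc(scores, hit_hr):
    """Chance that a random HR hitter outscores a random non-HR batter
    (ties count as half a win). 0.50 means no ranking signal."""
    pos = [s for s, y in zip(scores, hit_hr) if y]
    neg = [s for s, y in zip(scores, hit_hr) if not y]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of batters, which is fine for a sketch; a rank-based formula would scale better on a full season of rows.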

Factor Score Trends (Selected Picks)

Composite Score → HR Rate (full board, all dates)

Bars: counts of HRs (green) and non-HRs (gray) per composite range. Orange line: HR rate % (right axis). The model's signal climbs from low single digits in the bottom range to roughly 40-50% in the top range.
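
As a rough sketch of the aggregation behind a chart like this, the function below counts HRs and non-HRs per composite range and derives the rate. The 20-point band width is an assumption borrowed from the 40-60 and 60-80 bands used elsewhere on this page, and the function name is hypothetical.

```python
def hr_rate_by_composite_band(rows, width=20):
    """rows: iterable of (composite_score, hit_hr) with scores in 0-100.
    Returns {(lo, hi): (hr_count, miss_count, hr_rate_pct)} for every
    band that contains at least one batter."""
    bands = {}
    for score, hit in rows:
        # A score of exactly 100 is folded into the top band.
        lo = min(int(score // width) * width, 100 - width)
        hrs, misses = bands.get(lo, (0, 0))
        bands[lo] = (hrs + 1, misses) if hit else (hrs, misses + 1)
    return {
        (lo, lo + width): (h, m, 100 * h / (h + m))
        for lo, (h, m) in sorted(bands.items())
    }
```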

Per-Factor Mean — HR Hitters vs Misses (last 30 days)

Mean factor score for batters who hit a HR vs those who didn't. A bigger green-vs-gray gap = the factor is surfacing HR hitters. A flat pair = the factor isn't differentiating and is a candidate to drop or rework.
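
A minimal sketch of the green-vs-gray comparison, assuming each row is a dict of factor scores plus a `hit_hr` flag. The row layout, the factor names in the usage, and the function name are all hypothetical.

```python
def factor_gaps(rows, factors):
    """Return [(factor, hr_mean, miss_mean, gap)] sorted by gap, largest
    first. gap = mean score of HR hitters minus mean score of misses."""
    hitters = [r for r in rows if r["hit_hr"]]
    misses = [r for r in rows if not r["hit_hr"]]
    if not hitters or not misses:
        raise ValueError("need at least one HR hitter and one miss")
    out = []
    for f in factors:
        hr_mean = sum(r[f] for r in hitters) / len(hitters)
        miss_mean = sum(r[f] for r in misses) / len(misses)
        out.append((f, hr_mean, miss_mean, hr_mean - miss_mean))
    return sorted(out, key=lambda t: t[3], reverse=True)


# Hypothetical rows: "power" separates the groups, "weather" does not.
sample = [
    {"power": 80, "weather": 50, "hit_hr": True},
    {"power": 70, "weather": 52, "hit_hr": True},
    {"power": 40, "weather": 50, "hit_hr": False},
    {"power": 50, "weather": 52, "hit_hr": False},
]
ranked = factor_gaps(sample, ["weather", "power"])
```

A factor whose gap sits near zero, like `weather` in the sample, is the "flat pair" case described above.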

Factor Decomposition — What's Inside Each Score?

For each factor, this shows the underlying raw inputs the model is consuming. Green = HR-hitter mean, gray = miss mean. The All view shows overall signal; band views show what differs INSIDE a composite range (e.g., "of guys we scored 40-60, what do the HR hitters share?").

Composite band:

Daily rank band:

Why Are HR Hitters Stuck at 40-60?

Same inputs as above, but this view compares HR hitters who scored 40-60 (model under-rated them) vs HR hitters who scored 60-80 (model caught them). The biggest input gaps here are the signals the model fails to convert into score for the under-rated group, making them candidates for a weight increase or a new feature.

HR Hitters by Daily Rank — Where's the Tunable Cohort?

Daily rank controls for slate variance. Compares HR hitters we picked (#1-10) vs HR hitters we just missed (#11-30) vs deep-board HR hitters (#31-100). The largest input gaps between #1-10 and #11-30 are the most actionable tuning levers — these are the closest cohort to breaking into the top-8.

Input Calibration — Score Curve vs Empirical HR Rate

For each raw input, we bin into 5 quantile buckets and compare the empirical HR rate per bucket to the average sub-factor score we assigned. Aligned = the score curve climbs with HR rate (correct). Backwards = the score climbs while HR rate drops or vice versa. Flat = no signal in either curve. This is the most actionable diagnostic for fixing per-input score functions.
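
One crude way to label a curve pair is to compare the direction of each curve from the bottom bucket to the top bucket. The sketch below does that; the tolerances are hypothetical, the endpoint comparison ignores wiggles in the middle buckets, and the "unclassified" label (one curve moves while the other stays flat) is an addition for completeness that the page does not define.

```python
def trend(values, flat_tol):
    """+1 if the curve rises from first to last bucket, -1 if it falls,
    0 if the change is within flat_tol."""
    delta = values[-1] - values[0]
    if abs(delta) <= flat_tol:
        return 0
    return 1 if delta > 0 else -1


def classify_calibration(hr_rates, avg_scores, rate_tol=0.01, score_tol=2.0):
    """hr_rates and avg_scores are per-bucket values, ordered from the
    lowest input bucket to the highest."""
    r = trend(hr_rates, rate_tol)
    s = trend(avg_scores, score_tol)
    if r == 0 and s == 0:
        return "flat"
    if r == s:
        return "aligned"
    if r == -s:
        return "backwards"
    return "unclassified"  # exactly one of the two curves is flat
```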

Dome vs Outdoor — Is Our Dome Bias Justified?

Dome games get a flat weather score (50, neutral). Outdoor games can score higher with helping wind but lower with hostile wind. If domes convert at a higher HR rate than outdoor games of comparable park-factor, the bias is justified. If not, we're under-rewarding outdoor wind alignment.
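
To compare like with like, the HR rates can be split by park-factor band before comparing dome and outdoor games. A minimal sketch, assuming rows of `(park_factor, is_dome, hit_hr)` and a hypothetical 5-point band width:

```python
from collections import defaultdict


def dome_vs_outdoor(rows, band_width=5):
    """rows: iterable of (park_factor, is_dome, hit_hr).
    Returns {band_lo: {"dome": rate_or_None, "outdoor": rate_or_None}},
    where None marks a band with no games of that type."""
    counts = defaultdict(lambda: {"dome": [0, 0], "outdoor": [0, 0]})
    for park_factor, is_dome, hit in rows:
        band = int(park_factor // band_width) * band_width
        cell = counts[band]["dome" if is_dome else "outdoor"]
        cell[0] += 1 if hit else 0  # HR count
        cell[1] += 1                # batter count
    return {
        band: {k: (h / n if n else None) for k, (h, n) in cells.items()}
        for band, cells in sorted(counts.items())
    }
```

Bands where one side is `None` have no comparable games and say nothing about the bias either way.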

Wind Direction Effect — Outdoor Games Only

HR rate by wind-helping band (cosine of wind-to-CF angle × MPH). Out-blowing wind toward CF should produce higher HR rates than in-blowing wind toward home plate. The avg_weather_score column shows what our model is rewarding for each band — if HR rate climbs but score is flat, the wind effect isn't being weighted enough.
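
The wind-helping value described above is a projection of wind speed onto the home-plate-to-CF line. The sketch below assumes both directions are compass bearings in degrees and that the wind bearing is the direction the wind blows toward (weather feeds often report the direction it blows from, which would need a 180° flip). The band cut points are hypothetical.

```python
import math


def wind_helping(wind_toward_deg, cf_bearing_deg, mph):
    """Positive = blowing out toward CF, negative = blowing in toward
    home plate, near zero = crosswind."""
    return mph * math.cos(math.radians(wind_toward_deg - cf_bearing_deg))


def wind_band(helping, cut=5.0):
    # Hypothetical cut points; the page does not list its band edges.
    if helping >= cut:
        return "out"
    if helping <= -cut:
        return "in"
    return "neutral"
```

For example, a 10 MPH wind blowing straight out to CF yields a helping value of 10, while the same wind blowing across the field contributes roughly zero.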

Temp × Humidity Interaction — HR Rate Heatmap

Outdoor games only. Rows = temperature band, columns = humidity band. Each cell shows the empirical HR rate. Green tint = positive interaction (the combo produces MORE HRs than the additive prediction); red tint = negative. The hot+humid cell should be green if our physics intuition holds (warm humid air is less dense). Cells with n < 5 are suppressed.
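
The page does not spell out how the additive prediction is computed. One common choice, used in the sketch below as an assumption, is row rate + column rate - overall rate; the interaction is the cell's actual rate minus that prediction. The function name is hypothetical, and the code assumes every row and column has at least one observation.

```python
def interaction_grid(hr, n, min_n=5):
    """hr[i][j] and n[i][j] are HR counts and batter counts for
    temperature band i and humidity band j. Returns a grid of
    (actual rate - additive prediction), with None for cells
    where n < min_n."""
    rows, cols = len(n), len(n[0])
    grand = sum(map(sum, hr)) / sum(map(sum, n))
    row_rate = [sum(hr[i]) / sum(n[i]) for i in range(rows)]
    col_rate = [
        sum(hr[i][j] for i in range(rows)) / sum(n[i][j] for i in range(rows))
        for j in range(cols)
    ]
    grid = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            if n[i][j] < min_n:
                out_row.append(None)  # suppressed: too few batters
            else:
                predicted = row_rate[i] + col_rate[j] - grand
                out_row.append(hr[i][j] / n[i][j] - predicted)
        grid.append(out_row)
    return grid
```

A positive value corresponds to a green cell and a negative value to a red one.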

Elite Pitcher × Archetype Match — Is the Dampening Over-aggressive?

Pitcher vulnerability quintile (low HR/9 → "elite" on row 1) crossed with archetype similarity quintile (high archetype match → column 5). The diagnostic cell is row 1 × col 5 — elite pitcher facing his archetype victim. If empirical HR rate there is high but our matchup score is low, the elite-pitcher dampening (×0.70 multiplier when vulnerability < 25) is firing on cases where archetype is screaming "vulnerability" — and we should condition the dampening on similarity.
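
The proposed fix, conditioning the dampening on similarity, could look like the sketch below. The 0.70 multiplier and the vulnerability cutoff of 25 come from the text; the function name, the 0-100 similarity scale, the similarity cutoff of 80, and the assumption that the multiplier applies directly to the matchup score are illustrative only.

```python
ELITE_VULN_CUTOFF = 25   # from the text: dampening fires below this
ELITE_DAMPENING = 0.70   # from the text


def matchup_score(base, vulnerability, similarity=None, similarity_cutoff=80):
    """Apply the elite-pitcher dampening, but skip it when archetype
    similarity is at or above similarity_cutoff (hypothetical value).
    Passing similarity=None reproduces the unconditional behavior."""
    if vulnerability < ELITE_VULN_CUTOFF:
        if similarity is not None and similarity >= similarity_cutoff:
            return base  # strong archetype match overrides the dampening
        return base * ELITE_DAMPENING
    return base
```

A hard cutoff is the simplest form; scaling the multiplier smoothly with similarity would be a natural alternative if the row 1 × col 5 cell shows a gradual effect.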

Pick Composition — What Does Our Top-8 Look Like in Aggregate?

Distribution of selected picks by park factor, batting order, and dome status. Surfaces systematic biases that aren't visible per-pick — e.g., 80% dome share, never picks from sub-95 park-factor venues, leadoff hitters absent.
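
A minimal sketch of the aggregate checks mentioned above, assuming each pick is a dict with hypothetical `is_dome`, `batting_order`, and `park_factor` fields:

```python
from collections import Counter


def pick_composition(picks):
    """Summarize selected picks: dome share, batting-order counts,
    lineup slots never picked, and share from sub-95 park factors."""
    if not picks:
        raise ValueError("no picks to summarize")
    n = len(picks)
    order_counts = Counter(p["batting_order"] for p in picks)
    return {
        "dome_share": sum(1 for p in picks if p["is_dome"]) / n,
        "order_counts": dict(order_counts),
        "missing_slots": [s for s in range(1, 10) if s not in order_counts],
        "low_park_share": sum(1 for p in picks if p["park_factor"] < 95) / n,
    }
```

A `missing_slots` list that contains 1 is the "leadoff hitters absent" pattern, and a `low_park_share` of zero is the "never picks from sub-95 venues" pattern.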

Live Today

Pick-Rank Heat Map · Hit (green) vs Miss (red)