System Analysis | Baseball

What the System Found

Across all 60 players, the system independently identified the same performance structure that sabermetrics has spent decades building, with zero baseball knowledge provided. Each outcome metric has distinct drivers, and the platform identifies them per player.

What Drives Each Outcome

The system found different primary factors for each of the four projected metrics:

Home Run Rate (HR/PA)

Opportunity + Mechanical Readiness

HR rate is driven by pitch velocity trends (43% of players) and combined factors (22%). Power output depends on seeing hittable pitches and being ready to turn on them. This was the most predictable metric, with the lowest error for 36 of 60 players.

Stat	Value
Mean Error	0.0127
Min Error	0.0005
Max Error	0.0580
Times Best Metric	36 of 60
Times Worst Metric	0 of 60

Hit Rate (H/PA)

Bat-to-Ball Skill + BABIP Noise

Hit rate is driven by zone rate patterns and swing-and-miss trends. However, it was the hardest metric to predict (worst for 24 of 60 players) because batting average is heavily influenced by BABIP, which depends on defensive alignment, batted ball luck, and sprint speed, none of which the system currently observes.

Stat	Value
Mean Error	0.0371
Min Error	0.0008
Max Error	0.1560
Times Best Metric	4 of 60
Times Worst Metric	24 of 60

Walk Rate (BB/PA)

Behavioral, Not Mechanical

Walk rate is the most "decision-driven" metric. Swing decision / plate discipline patterns were the primary driver (22% of players). Unlike the other three metrics, BB/PA reflects a hitter's strategic approach to the strike zone, not their physical bat-to-ball ability. The system also found the largest systematic bias here.

Stat	Value
Mean Error	0.0281
Min Error	0.0013
Max Error	0.0982
Systematic Bias	Over-predicted (38 of 60)

Strikeout Rate (K/PA)

Mirror Image of Hit Rate

K rate is driven by the same signals as hit rate: swing-and-miss trends (20%) and zone rate patterns (20%). This suggests that strikeouts and hits share the same underlying mechanical process: bat-to-ball contact ability. When whiff rate goes up, K/PA rises and H/PA falls.

Stat	Value
Mean Error	0.0432
Min Error	0.0000
Max Error	0.1908
Times Best Metric	9 of 60
Times Worst Metric	23 of 60
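
The per-metric tables above can be reproduced from a table of projected and actual rates. The following is a minimal sketch, assuming that "error" means the absolute difference between projected and actual rate and that a metric is a player's "best" or "worst" when it has that player's smallest or largest error. The function name, input layout, and tie-breaking (first metric in `METRICS` wins) are illustrative assumptions, not the platform's actual implementation.

```python
from statistics import mean

METRICS = ("HR/PA", "H/PA", "BB/PA", "K/PA")


def summarize_errors(projected, actual):
    """Summarize absolute projection error per metric.

    projected, actual: dict of player name -> dict of metric -> rate.
    Assumes error = |projected - actual| (an assumption, not confirmed
    by the source).
    """
    # Absolute error for every player and metric.
    errors = {
        player: {m: abs(projected[player][m] - actual[player][m]) for m in METRICS}
        for player in projected
    }
    # Each player's best (lowest-error) and worst (highest-error) metric.
    best = {p: min(METRICS, key=errors[p].get) for p in errors}
    worst = {p: max(METRICS, key=errors[p].get) for p in errors}

    summary = {}
    for m in METRICS:
        values = [errors[p][m] for p in errors]
        summary[m] = {
            "mean_error": mean(values),
            "min_error": min(values),
            "max_error": max(values),
            "times_best": sum(1 for p in best if best[p] == m),
            "times_worst": sum(1 for p in worst if worst[p] == m),
        }
    return summary
```

Called with the 60 players' projected and actual rates, `summarize_errors` would return one row per metric in the same shape as the Stat/Value tables.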

Systematic Bias Analysis

Does the system consistently over- or under-predict? Across 60 players:

Metric	Over-predicted	Under-predicted	Mean Bias	Interpretation
HR/PA	30	29	−0.0025	Nearly unbiased, slight under-prediction of power
H/PA	30	30	+0.0016	Nearly unbiased, minimal directional tendency
BB/PA	38	22	+0.0085	Largest bias: system expected more patience than 2025 showed
K/PA	33	27	−0.0007	Nearly unbiased, trivial under-prediction

Key insight: The BB/PA over-prediction (the system expected more walks than actually occurred for 38 of 60 players, or 63%) suggests a league-wide shift toward more aggressive approaches in 2025. Hitters were swinging more and walking less than their career trends predicted. This is consistent with MLB's observed trend of decreasing walk rates over recent seasons.
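
A bias table of this kind can be computed with a few lines of code. The sketch below assumes bias is defined as projected minus actual, which matches the sign convention in the table (positive mean bias means over-prediction). The function name and the choice to count exact matches as neither over nor under are assumptions. Such a convention would be one way for a row's counts to total fewer than 60.

```python
def bias_summary(projected, actual):
    """Directional bias for one metric.

    projected, actual: sequences of rates aligned by player.
    Assumes bias = projected - actual; exact matches count as neither
    over- nor under-predicted (an illustrative convention).
    """
    diffs = [p - a for p, a in zip(projected, actual)]
    over = sum(1 for d in diffs if d > 0)
    under = sum(1 for d in diffs if d < 0)
    return {
        "over": over,
        "under": under,
        "mean_bias": sum(diffs) / len(diffs),
        "over_share": over / len(diffs),
    }


# Arithmetic check on the reported BB/PA figure: 38 of 60 players.
bb_over_share = 38 / 60  # about 0.633, reported as 63%
```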

Performance Dynamics

30,141 Performance Patterns Found

73,076 Performance Shifts Mapped

Why Hot Streaks Compound

When power numbers rise, pitchers adjust their approach, which creates more hittable pitches, which drives even higher power numbers. The system found this pattern across power hitters, explaining why hot streaks build on themselves.

Why Patient Hitters Stay Patient

Higher walk rate leads to deeper counts, more off-speed exposure, and better swing selection, which drives even more walks. Patient hitters get rewarded with more information per at-bat.

Why Slumps Spiral

When strikeout rate rises, swing decisions degrade, contact quality drops, and strikeout rate rises further. The system identified this as the primary pattern behind extended cold spells.

Over 73,000 performance shift points (the tipping points between hot streaks and slumps) were mapped across all 60 players. The system identifies when and how these transitions happen.
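
The source does not say how shift points are detected. As a purely illustrative sketch, one simple way to locate tipping points in a rate series is to smooth it with a rolling mean and mark where the smoothed trend changes direction. The function names, the window size, and the sign-flip rule are all assumptions, not the system's method.

```python
def rolling_mean(values, window):
    """Trailing rolling mean; the output has len(values) - window + 1 points."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def shift_points(values, window=3):
    """Indices into the smoothed series where the trend flips direction.

    A flip is a change of sign between consecutive differences of the
    smoothed series, i.e. a local peak (hot streak ending) or a local
    trough (slump ending). Hypothetical rule for illustration only.
    """
    smooth = rolling_mean(values, window)
    deltas = [b - a for a, b in zip(smooth, smooth[1:])]
    return [
        i for i in range(1, len(deltas))
        if deltas[i - 1] * deltas[i] < 0
    ]
```

For example, on the toy series `[1, 2, 3, 4, 3, 2, 1, 2, 3]` with `window=1`, this marks the peak at index 3 and the trough at index 6. A larger window trades sensitivity for fewer spurious flips on noisy game-level data.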

What Separates Elite from Developing Projections

Factor	Elite + Strong (33 players)	Fair + Developing (9 players)
Average Accuracy	97.8%	93.6%
Average Error / PA	0.0185	0.0613
Avg Plate Appearances (2025)	601	528
Common Characteristics	Longer careers, stable approach	Young, volatile, role changes

Key finding: The primary driver of projection quality is career data depth. Players with longer careers provide more historical patterns to validate against, producing tighter forward projections. The 5 "Developing" grade players (Cal Raleigh, Jazz Chisholm Jr., Jose Siri, Josh Naylor, Nolan Arenado) were predominantly players who experienced significant role changes, position switches, or volatile recent performance that disrupted historical trends. This is not a system weakness: lower confidence is the appropriate response when a player's history is thin or disrupted.

Notable Outlier Cases

Cody Bellinger: 99.0%

Why so accurate: Bellinger's performance was driven by swing decisions (HR/PA) and long-term power production trends interconnected with medium-term batting consistency (H/PA). These are stable, persistent signals. Even his worst metric (H/PA) had only 0.024 error. The system captured his renaissance closely.

Luis Arraez: K/PA Error: 0.0000

Perfect K/PA projection. Arraez's projected strikeout rate matched his actual rate to four decimal places. His extreme contact-first approach produces the most stable, predictable K/PA signal in the sample, a finding that would interest any team evaluating plate discipline.

Jose Siri: 92.1%

Largest miss: K/PA error of 0.1908 (projected 0.183, actual 0.374). Siri's strikeout rate was roughly double what his career trends predicted. This likely reflects injury impact (fractured tibia) and role change. The system correctly flagged high uncertainty.

Cal Raleigh: H/PA Miss

Largest H/PA miss: H/PA error of 0.156 (projected .365, actual .209). Raleigh's hit rate collapsed far below career norms, and the system over-projected his contact ability. This is a case where adding batted ball direction data could have signaled the decline earlier.
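
The error figures quoted for the two largest misses can be checked directly from the projected and actual rates given above. This is a small worked check, assuming error is the absolute difference between projected and actual. The variable names are illustrative.

```python
# (projected, actual) pairs as quoted in the outlier cases above.
cases = {
    "Jose Siri K/PA": (0.183, 0.374),
    "Cal Raleigh H/PA": (0.365, 0.209),
}

# Absolute error, rounded to three decimals because the quoted inputs
# are themselves rounded to three decimals.
errors = {
    name: round(abs(projected - actual), 3)
    for name, (projected, actual) in cases.items()
}

# Ratio of Siri's actual strikeout rate to his projected rate.
siri_ratio = round(0.374 / 0.183, 2)
```

From the rounded inputs, Siri's K/PA error comes out to 0.191 (reported as 0.1908, presumably from unrounded values) and Raleigh's H/PA error to 0.156. Siri's actual strikeout rate is about 2.04 times his projection.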

Cross-Domain Implications

Insights from the baseball analysis directly improved results in healthcare, retail, and commercial real estate:

Baseball Finding	Impact on Other Industries
Performance patterns cascade predictably	Same cascading patterns found in retail cross-sell and healthcare readmission data
Hot streaks and slumps have identifiable drivers	Similar momentum effects found in retail revenue shifts and real estate price cycles
Some outcomes depend on multiple factors working together	Led to richer insights in every subsequent domain deployment
More data history = more accurate results	Applied as a quality standard across all industry deployments

Summary Statistics

60 Players Analyzed

72 Performance Indicators Per Player

97.0% Average Projection Accuracy

156 Relationships Discovered

Statistic	Value
Dataset	2015–2025 MLB Statcast (plate-appearance level)
Performance Indicators / Player	72
Performance Patterns Found	30,141
Performance Shifts Mapped	73,076
Actuals Verified Against	Baseball Reference, StatMuse

Note: All findings are data-driven. No manual curation or cherry-picking was applied. The bias analysis uses confirmed 2025 season statistics from Baseball Reference and StatMuse.


What the System Learned About Baseball