CASE STUDY · SPORTS ML
LIVE

Live win probability models for every Cubs game.

CubsStats is our analytics capability demonstration. XGBoost models trained on 304 games of MLB play-by-play. An auto-retrain pipeline. Live win probability, leverage index, and matchup analytics rendered in real time. The same stack ports to other sports, other domains, other questions.

Vertical: Sports analytics
Built for: Sabermetrics
Pricing: Internal
Timeline: Live and auto-retraining
Status: LIVE

THE PROBLEM

Analytics capability is not the same as shipped analytics.

Most ML consulting ends at notebooks and slide decks. CubsStats exists because the proof of an analytics capability is a live production system that operates without a human in the loop. The same stack scales to any sport, any operations-heavy domain, any question worth answering.

01 · DIAGNOSE

We picked the hardest baseline we could find.

Baseball play-by-play is dense, sequential, and full of edge cases that break naive ML pipelines. If the model can handle the Cubs, it can handle most operations workflows. The diagnose phase scoped the dataset, established the baseline, and set the bar for what production meant.

304
Games of training data

A full season-plus of MLB play-by-play, structured and feature-engineered for win probability modeling at every pitch.
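To make "structured and feature-engineered at every pitch" concrete, here is a minimal sketch of what one training row could look like. The `PitchState` schema, the field names, and the base-occupancy encoding are illustrative assumptions, not the actual CubsStats feature set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PitchState:
    """Game state just before a pitch (hypothetical schema for illustration)."""
    inning: int        # 1-based inning number
    is_bottom: bool    # True when the home team is batting
    outs: int          # 0, 1, or 2
    balls: int         # 0-3
    strikes: int       # 0-2
    on_first: bool
    on_second: bool
    on_third: bool
    home_score: int
    away_score: int


def to_features(state: PitchState) -> list[float]:
    """Flatten one pitch state into a numeric row for a tabular model."""
    # Encode the eight base-occupancy combinations as one integer in 0-7.
    base_state = (
        int(state.on_first)
        + 2 * int(state.on_second)
        + 4 * int(state.on_third)
    )
    return [
        float(state.inning),
        float(state.is_bottom),
        float(state.outs),
        float(state.balls),
        float(state.strikes),
        float(base_state),
        # Score differential from the home team's point of view.
        float(state.home_score - state.away_score),
    ]


def label(home_won: bool) -> int:
    """Training target: 1 if the home team won the game this pitch belongs to."""
    return int(home_won)
```

Every pitch in a game shares the same label, so a classifier trained on rows like these learns how often the home team goes on to win from each state, which is one common way to frame win probability.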

XGB
Gradient boosting baseline

XGBoost is a proven fit for tabular data with strong feature interactions. Choosing the right baseline matters more than reaching for the fanciest model.

24h
Auto retrain cadence

Models retrain themselves nightly on the latest data, with no human in the loop. A 24-hour cadence keeps drift and decay in check, so the dashboard never quietly goes stale.

The model retrains itself nightly. The dashboard does not break.

02 · REDESIGN

Treat the model like a product, not a notebook.

Four decisions distinguish a production ML system from a one-time analysis. Every one of them was a hard constraint before training started.

DECISION 01
Auto-retrain on a fixed cadence

Nightly retrains on the latest play-by-play data. The model never goes stale, and the freshness is observable.
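One way to make freshness observable is to publish the model's age alongside its predictions. The sketch below assumes the pipeline records a `last_trained` timestamp; the function names and report fields are hypothetical, and only the 24-hour interval comes from the text above.

```python
from datetime import datetime, timedelta, timezone

# The nightly cadence described above; everything else here is illustrative.
RETRAIN_INTERVAL = timedelta(hours=24)


def model_age(last_trained: datetime, now: datetime) -> timedelta:
    """Time elapsed since the model was last trained."""
    return now - last_trained


def retrain_due(
    last_trained: datetime,
    now: datetime,
    interval: timedelta = RETRAIN_INTERVAL,
) -> bool:
    """True once the model is at least one full interval old."""
    return model_age(last_trained, now) >= interval


def freshness_report(last_trained: datetime, now: datetime) -> dict:
    """A small payload a dashboard or health check could expose."""
    age = model_age(last_trained, now)
    return {
        "last_trained": last_trained.isoformat(),
        "age_hours": round(age.total_seconds() / 3600, 2),
        "stale": age >= RETRAIN_INTERVAL,
    }
```

For example, `freshness_report(last_trained, datetime.now(timezone.utc))` could back a status endpoint, so a missed retrain shows up as `"stale": True` instead of going unnoticed.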

DECISION 02
Live dashboard, not batch reports

Predictions render in real time on the public site. No daily PDF, no weekly slide deck. The model is the product.
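Leverage index, one of the live metrics, can be derived from win probability alone. The sketch below follows the common sabermetric definition: the expected absolute swing in win probability from the current situation, divided by the average swing across all situations. The outcome probabilities and the `average_swing` value are inputs the caller must supply; this is not a description of how CubsStats computes it.

```python
def expected_swing(
    current_wp: float,
    outcomes: list[tuple[float, float]],
) -> float:
    """Expected absolute change in win probability over the next event.

    `outcomes` holds (probability_of_outcome, win_probability_after) pairs.
    """
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * abs(wp_after - current_wp) for p, wp_after in outcomes)


def leverage_index(
    current_wp: float,
    outcomes: list[tuple[float, float]],
    average_swing: float,
) -> float:
    """Swing in this situation relative to the average situation (1.0 = average)."""
    if average_swing <= 0:
        raise ValueError("average_swing must be positive")
    return expected_swing(current_wp, outcomes) / average_swing
```

With made-up numbers, a tied game where the next event moves win probability to 0.6 or 0.4 with equal chance has an expected swing of 0.1. If the average swing were 0.05, the leverage index would be about 2, meaning the moment matters roughly twice as much as a typical one.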

DECISION 03
Same stack ports to other domains

The infrastructure was built so that swapping baseball for another domain is a feature engineering exercise, not a rebuild.
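A rough sketch of what "a feature engineering exercise, not a rebuild" can mean in code: the pipeline takes the featurizer and label function as parameters, so a new domain supplies two small functions and reuses everything else. All names, event fields, and the support-ticket example are hypothetical.

```python
from typing import Callable, Iterable, Sequence

# Domain-specific pieces are plain functions over a raw event record.
FeatureFn = Callable[[dict], Sequence[float]]
LabelFn = Callable[[dict], int]


def build_training_set(
    events: Iterable[dict],
    featurize: FeatureFn,
    label: LabelFn,
) -> tuple[list[list[float]], list[int]]:
    """Domain-agnostic step: turn raw events into a feature matrix and targets."""
    X: list[list[float]] = []
    y: list[int] = []
    for event in events:
        X.append(list(featurize(event)))
        y.append(label(event))
    return X, y


# Baseball plug-in (illustrative fields).
def baseball_features(event: dict) -> list[float]:
    return [float(event["inning"]), float(event["outs"]), float(event["score_diff"])]


def baseball_label(event: dict) -> int:
    return int(event["home_won"])


# A different domain swaps only these two functions (illustrative fields).
def ticket_features(event: dict) -> list[float]:
    return [float(event["days_open"]), float(event["touches"])]


def ticket_label(event: dict) -> int:
    return int(event["escalated"])
```

Calling `build_training_set(events, baseball_features, baseball_label)` and `build_training_set(events, ticket_features, ticket_label)` exercises the same pipeline code; training, retraining, and serving would sit behind the same seam.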

DECISION 04
Public proof, not internal demo

CubsStats runs at cubsstats.live where anyone can watch it work. Capability claims are easier to make when the proof is one click away.

THE RESULT

Live at cubsstats.live.

TYPICAL ML CONSULTING
Notebook deliverable
one-time analysis
  • Models that need a data scientist to refresh
  • Insights trapped in PDF or PowerPoint
  • No public proof the capability works
CUBSSTATS
Production system
live, auto-retraining
  • Auto-retrain nightly with zero human intervention
  • Public, live dashboard at cubsstats.live
  • Stack reusable across sports and operations domains
WANT THIS FOR YOUR DOMAIN?

Same methodology. Different vertical.

If you operate in a process-heavy domain and want a system like this scoped to your business, the path starts with a fifteen-minute call.
