CASE STUDY · SPORTS ML

LIVE

Live win probability models for every Cubs game.

CubsStats is our analytics capability demonstration: XGBoost models trained on 304 games of MLB play-by-play, kept current by an automated retraining pipeline, rendering live win probability, leverage index, and matchup analytics in real time. The same stack ports to other sports, other domains, other questions.

Visit cubsstats.live

Book an assessment

Vertical: Sports analytics
Built for: Sabermetrics
Pricing: Internal
Timeline: Live and auto-retraining
Status: LIVE

THE PROBLEM

Analytics capability is not the same as shipped analytics.

Most ML consulting ends at notebooks and slide decks. CubsStats exists because the proof of an analytics capability is a live production system that operates without a human in the loop. The same stack scales to any sport, any operations-heavy domain, any question worth answering.

01 · DIAGNOSE

We picked the hardest baseline
we could find.

Baseball play-by-play is dense, sequential, and full of edge cases that break naive ML pipelines. If the pipeline can handle baseball, it can handle most operations workflows. The diagnose phase scoped the dataset, established the baseline, and set the bar for what production meant.

304

Games of training data

A full season-plus of MLB play-by-play, structured and feature-engineered for win probability modeling at every pitch.
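The case study does not publish its feature set. As a purely hypothetical sketch, a per-pitch game state might be flattened into a numeric row like this before it reaches a gradient boosting model. The field names, the ordering, and the bitmask encoding of base runners are all assumptions, not the CubsStats schema.

```python
from dataclasses import dataclass

# Hypothetical column order for the feature row produced below.
FEATURE_NAMES = ("inning", "is_top", "outs", "balls", "strikes", "base_state", "home_lead")


@dataclass(frozen=True)
class PitchState:
    """Game state just before a pitch (illustrative fields only)."""
    inning: int        # 1-9, higher in extra innings
    is_top: bool       # True while the visiting team bats
    outs: int          # 0-2
    balls: int         # 0-3
    strikes: int       # 0-2
    on_first: bool
    on_second: bool
    on_third: bool
    home_score: int
    away_score: int


def to_features(s: PitchState) -> list:
    """Flatten a PitchState into a numeric row in FEATURE_NAMES order."""
    # Encode the eight base-runner configurations as a 3-bit mask: 1st=1, 2nd=2, 3rd=4.
    base_state = int(s.on_first) | (int(s.on_second) << 1) | (int(s.on_third) << 2)
    return [
        float(s.inning),
        1.0 if s.is_top else 0.0,
        float(s.outs),
        float(s.balls),
        float(s.strikes),
        float(base_state),
        float(s.home_score - s.away_score),  # signed lead from the home team's view
    ]
```

In a win probability setup, each row would be labeled with whether the home team went on to win the game, so the classifier's predicted probability for that label reads directly as the home team's win probability at that pitch.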

XGB

Gradient boosting baseline

XGBoost is the right tool for tabular data with strong feature interactions. The right baseline matters more than the fanciest model.

24h

Auto retrain cadence

Models retrain themselves nightly on the latest data. No human in the loop, and no dashboard that quietly goes stale as the data drifts.

The model retrains itself nightly. The dashboard does not break.

02 · REDESIGN

Treat the model like a product,
not a notebook.

Four decisions distinguish a production ML system from a one-time analysis. Every one of them was a hard constraint before training started.

DECISION 01

Auto-retrain on a fixed cadence

Nightly retrains on the latest play-by-play data. The model stays current, and its freshness is observable.
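The page does not say how freshness is surfaced. One minimal way to make it observable, sketched here under assumptions: record when the last retrain finished and flag the model as stale once a nightly run has been missed. The grace window, constant names, and function names are hypothetical, not CubsStats code.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Nightly cadence from the case study; the grace window is a hypothetical
# allowance for a slow data pull or a long training run.
RETRAIN_CADENCE = timedelta(hours=24)
GRACE_PERIOD = timedelta(hours=2)


def model_age(last_trained_at: datetime, now: Optional[datetime] = None) -> timedelta:
    """Time elapsed since the last successful retrain (timezone-aware datetimes)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_trained_at


def model_is_stale(last_trained_at: datetime, now: Optional[datetime] = None) -> bool:
    """True once a nightly retrain has been missed, allowing for the grace window."""
    return model_age(last_trained_at, now) > RETRAIN_CADENCE + GRACE_PERIOD
```

Exposing `model_age` on the dashboard, and alerting when `model_is_stale` flips to true, is one way to turn "the model never goes stale" into something a visitor or an operator can check.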

DECISION 02

Live dashboard, not batch reports

Predictions render in real time on the public site. No daily PDF, no weekly slide deck. The model is the product.

DECISION 03

Same stack ports to other domains

The infrastructure was built so that swapping baseball for another domain is a feature engineering exercise, not a rebuild.

DECISION 04

Public proof, not internal demo

CubsStats runs at cubsstats.live where anyone can watch it work. Capability claims are easier to make when the proof is one click away.

THE RESULT

Live at cubsstats.live.

TYPICAL ML CONSULTING

Notebook deliverable

one-time analysis

Models that need a data scientist to refresh
Insights trapped in PDF or PowerPoint
No public proof the capability works

CUBSSTATS

Production system

live, auto-retraining

Auto-retrains nightly with zero human intervention
Public, live dashboard at cubsstats.live
Stack reusable across sports and operations domains

Visit live site

WANT THIS FOR YOUR DOMAIN?

Same methodology.
Different vertical.

If you operate in a process-heavy domain and want a system like this scoped to your business, the path starts with a fifteen-minute call.

Book a Free Process Assessment

Related builds

INSURANCE

Rivl

tryrivl.com

Rivl is the agency management system built for the people the incumbents forgot: independent P&C agencies with one to ten producers, billed a flat rate instead of per seat, with AI native to the inbox.

Read the build

HEALTHCARE

ThrueLink

trythruelink.com

ThrueLink runs the patient access chain for specialty practices. Five stages, end to end, scoped to your specialty, your payer mix, and your provider count.