Live win probability models for every Cubs game.
CubsStats is our analytics capability demonstration. XGBoost models trained on 304 games of MLB play-by-play. Auto-retrain pipeline. Live win probability, leverage index, and matchup analytics rendered in real time. The same stack ports to other sports, other domains, other questions.
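Leverage index is commonly defined, following Tom Tango, as the expected absolute swing in win probability from the current game state, divided by the average swing across all game states. Below is a minimal sketch under that definition. The `Outcome` structure, the toy probabilities, and the `mean_swing` normalizer are illustrative assumptions, not CubsStats internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Outcome:
    """One possible result of the next plate appearance (hypothetical structure)."""
    probability: float  # chance this result happens from the current state
    wp_after: float     # batting team's win probability after it happens


def expected_swing(wp_now: float, outcomes: List[Outcome]) -> float:
    """Probability-weighted absolute change in win probability from this state."""
    total_p = sum(o.probability for o in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError(f"outcome probabilities sum to {total_p}, expected 1.0")
    return sum(o.probability * abs(o.wp_after - wp_now) for o in outcomes)


def leverage_index(wp_now: float, outcomes: List[Outcome], mean_swing: float) -> float:
    """Swing at this state divided by the average swing across all game states.

    1.0 means average pressure; values above 1.0 mean a higher-leverage moment.
    """
    if mean_swing <= 0:
        raise ValueError("mean_swing must be positive")
    return expected_swing(wp_now, outcomes) / mean_swing


# Two-outcome toy state: 70% the batter makes an out, 30% he reaches base.
outcomes = [Outcome(0.7, 0.40), Outcome(0.3, 0.70)]
print(round(leverage_index(0.50, outcomes, mean_swing=0.05), 2))
```

In practice the outcome probabilities and post-outcome win probabilities would come from the win probability model itself. That is why leverage index is cheap to add once live win probability exists.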
Analytics capability is not the same as shipped analytics.
Most ML consulting ends at notebooks and slide decks. CubsStats exists because the proof of an analytics capability is a live production system that operates without a human in the loop. The same stack scales to any sport, any operations-heavy domain, any question worth answering.
We picked the hardest baseline we could find.
Baseball play-by-play is dense, sequential, and full of edge cases that break naive ML pipelines. If the model can handle the Cubs, it can handle most operations workflows. The diagnose phase scoped the dataset, established the baseline, and set the bar for what production meant.
A full season-plus of MLB play-by-play, structured and feature-engineered for win probability modeling at every pitch.
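One way to make play-by-play modelable "at every pitch" is to flatten the game state before each pitch into a fixed-order numeric row. The sketch below assumes a hypothetical `PitchState` schema and a deliberately small feature set. The actual CubsStats features are not listed on this page.

```python
from dataclasses import dataclass
from typing import List

FEATURE_NAMES = ["inning", "is_top", "outs", "base_state", "balls", "strikes", "home_lead"]


@dataclass(frozen=True)
class PitchState:
    """Game state immediately before a pitch (hypothetical schema)."""
    inning: int       # 1-based inning number
    is_top: bool      # True while the visiting team bats
    outs: int         # 0-2
    on_first: bool
    on_second: bool
    on_third: bool
    balls: int        # 0-3
    strikes: int      # 0-2
    home_score: int
    away_score: int


def to_features(s: PitchState) -> List[float]:
    """Flatten one pre-pitch state into a numeric row ordered like FEATURE_NAMES."""
    # Pack the eight base-occupancy combinations into one integer, 0 (empty) to 7 (loaded).
    base_state = int(s.on_first) | (int(s.on_second) << 1) | (int(s.on_third) << 2)
    return [
        float(s.inning),
        float(s.is_top),
        float(s.outs),
        float(base_state),
        float(s.balls),
        float(s.strikes),
        float(s.home_score - s.away_score),  # score margin from the home team's side
    ]


# Bottom of the 9th, two outs, runners on first and third, full count, home team down one.
row = to_features(PitchState(9, False, 2, True, False, True, 3, 2, 3, 4))
print(dict(zip(FEATURE_NAMES, row)))
```

Tree ensembles such as XGBoost can split on the packed base-state code directly. A linear model would usually want it one-hot encoded instead.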
XGBoost is the right tool for tabular data with strong feature interactions. The right baseline matters more than the fanciest model.
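Whatever sits on top, XGBoost or otherwise, a model earns its place only by beating a trivial baseline on held-out games. The sketch below runs that check. It assumes log loss as the metric and a constant home-win-rate baseline; both are illustrative choices, not the documented CubsStats evaluation. With XGBoost's scikit-learn wrapper, `model_probs` would be `clf.predict_proba(X_test)[:, 1]`.

```python
import math
from typing import Sequence


def log_loss(y_true: Sequence[int], p_pred: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy over pitches; lower is better."""
    if not y_true or len(y_true) != len(p_pred):
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log(0) can never occur
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def constant_baseline(y_train: Sequence[int]) -> float:
    """The trivial model: predict the training-set home win rate at every pitch."""
    return sum(y_train) / len(y_train)


def beats_baseline(y_test: Sequence[int], model_probs: Sequence[float],
                   y_train: Sequence[int]) -> bool:
    """True only if the model's held-out log loss is strictly below the baseline's."""
    base_p = constant_baseline(y_train)
    return log_loss(y_test, model_probs) < log_loss(y_test, [base_p] * len(y_test))


# y = 1 when the home team went on to win the game that pitch belongs to.
print(beats_baseline([1, 0, 1], [0.8, 0.3, 0.6], y_train=[1, 0, 1, 0]))
```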
Models retrain themselves nightly on the latest data. No human in the loop. Drift gets absorbed by the next retrain instead of accumulating, and no dashboard quietly goes stale.
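Retraining with no human in the loop implies the pipeline itself decides whether tonight's model replaces the live one. Below is a minimal sketch of such a gate. It assumes both models are scored on the same recent holdout games, and `should_promote` and `nightly_retrain` are hypothetical helpers, not the CubsStats pipeline.

```python
import math
from typing import Callable, Optional, Tuple, TypeVar

Model = TypeVar("Model")


def should_promote(current_loss: Optional[float], candidate_loss: float,
                   tolerance: float = 0.0) -> bool:
    """Decide, without a human, whether tonight's model replaces the live one.

    Both losses are assumed to be scored on the same recent holdout games.
    """
    if not math.isfinite(candidate_loss):
        return False  # a broken training run never reaches the dashboard
    if current_loss is None:
        return True   # nothing is live yet
    return candidate_loss <= current_loss + tolerance


def nightly_retrain(train: Callable[[], Model],
                    evaluate: Callable[[Model], float],
                    current_loss: Optional[float]) -> Tuple[Optional[Model], float]:
    """Train a candidate, score it, and return it only if it should go live."""
    candidate = train()
    loss = evaluate(candidate)
    return (candidate if should_promote(current_loss, loss) else None), loss


# Toy run: the candidate scores 0.58 against a live model at 0.61, so it is promoted.
promoted, loss = nightly_retrain(lambda: "model-v2", lambda m: 0.58, current_loss=0.61)
print(promoted, loss)
```

Keeping the previous model live when a run fails or regresses is what lets the dashboard stay up even on a bad night.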
The model retrains itself nightly. The dashboard does not break.
Treat the model like a product, not a notebook.
Four decisions distinguish a production ML system from a one-time analysis. Every one of them was a hard constraint before training started.
Nightly retrains on the latest play-by-play data. The model never goes stale, and the freshness is observable.
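Observable freshness can be as small as publishing the model's training timestamp and a stale flag alongside every prediction. The sketch below assumes a 26-hour threshold (one nightly cycle plus slack), and `freshness_status` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

# Assumption: one nightly cycle plus two hours of slack before the model counts as stale.
MAX_AGE = timedelta(hours=26)


def freshness_status(trained_at: datetime, now: datetime,
                     max_age: timedelta = MAX_AGE) -> Dict[str, Any]:
    """Metadata a dashboard can show next to every prediction."""
    if trained_at.tzinfo is None or now.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    age = now - trained_at
    return {
        "trained_at": trained_at.isoformat(),
        "age_hours": round(age.total_seconds() / 3600, 1),
        "stale": age > max_age,
    }


trained = datetime(2024, 6, 1, 4, 0, tzinfo=timezone.utc)
print(freshness_status(trained, datetime(2024, 6, 1, 19, 5, tzinfo=timezone.utc)))
```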
Predictions render in real time on the public site. No daily PDF, no weekly slide deck. The model is the product.
The infrastructure was built so that swapping baseball for another domain is a feature engineering exercise, not a rebuild.
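One way to keep a domain swap down to feature engineering is to isolate everything sport-specific behind a single extractor interface, so the rest of the pipeline never changes. The sketch below makes that assumption. `FeatureExtractor`, both extractors, and the event field names are hypothetical.

```python
from typing import Dict, List, Protocol, Sequence


class FeatureExtractor(Protocol):
    """The only domain-specific piece: turn one raw event into a numeric row."""
    feature_names: Sequence[str]

    def extract(self, event: Dict[str, float]) -> List[float]: ...


class BaseballExtractor:
    feature_names = ("outs", "balls", "strikes", "home_lead")

    def extract(self, event: Dict[str, float]) -> List[float]:
        return [float(event["outs"]), float(event["balls"]), float(event["strikes"]),
                float(event["home_score"] - event["away_score"])]


class BasketballExtractor:
    feature_names = ("seconds_left", "home_lead", "home_possession")

    def extract(self, event: Dict[str, float]) -> List[float]:
        return [float(event["seconds_left"]),
                float(event["home_score"] - event["away_score"]),
                float(event["home_has_ball"])]


def build_matrix(extractor: FeatureExtractor,
                 events: Sequence[Dict[str, float]]) -> List[List[float]]:
    """Domain-agnostic step shared by every sport: events in, training rows out."""
    rows = [extractor.extract(e) for e in events]
    for row in rows:
        if len(row) != len(extractor.feature_names):
            raise ValueError("extractor returned a row that does not match feature_names")
    return rows


print(build_matrix(BaseballExtractor(),
                   [{"outs": 1, "balls": 2, "strikes": 1, "home_score": 3, "away_score": 1}]))
print(build_matrix(BasketballExtractor(),
                   [{"seconds_left": 30, "home_score": 100, "away_score": 102, "home_has_ball": 1}]))
```

Training, retraining, and serving only ever see `build_matrix` output, so they stay untouched when the extractor changes.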
CubsStats runs at cubsstats.live where anyone can watch it work. Capability claims are easier to make when the proof is one click away.
Live at cubsstats.live.
- Models that need a data scientist to refresh
- Insights trapped in PDF or PowerPoint
- No public proof the capability works
- Auto-retrain nightly with zero human intervention
- Public, live dashboard at cubsstats.live
- Stack reusable across sports and operations domains
Related builds
Rivl is the agency management system built for the people the incumbents forgot: independent P&C agencies with one to ten producers, priced at a flat rate instead of per seat, with AI native to the inbox.
Read the build
ThrueLink runs the patient access chain for specialty practices. Five stages, end to end, scoped to your specialty, your payer mix, and your provider count.
Read the build