CAP – Cricket Analytics Platform (Snowflake-native)
An end-to-end analytics platform that ingests openly available Cricsheet JSON, models ball-by-ball cricket in dbt, serves reporting marts for venue/player/team dashboards, and generates explainable venue-based winning strategies using Snowflake Cortex — all inside Snowflake.
High-level architecture
CAP is designed as a layered analytics platform. Raw JSON is preserved in BRONZE, structured in SILVER, stabilized in CORE (dim/fact), and pre-aggregated in REPORTING marts that power dashboards. A compact venue dossier feeds Snowflake Cortex to create strategy narratives that stay grounded in computed metrics.
JSON files are staged and loaded into a BRONZE table as VARIANT for lossless storage.
- Cricsheet JSON files staged in Snowflake.
- BRONZE: append-only VARIANT table (lossless).
- Filename lineage (match_id from filename) for traceability.
- SILVER: typed domain tables (MATCHES, DELIVERIES).
- CORE: DIM_MATCH + FCT_DELIVERY (stable grain).
- REPORTING: venue/player/team marts for dashboards.
- SnowSight dashboards for quick exploration.
- Streamlit-in-Snowflake for drilldowns and UX.
- Cortex generates venue strategy text from a dossier view.
Demo: Cortex-style venue strategy output (visual mock)
In the actual implementation, Streamlit queries reporting marts and a compact RPT_AI_VENUE_DOSSIER view, then calls Cortex to produce a concise pre-match plan. The card below is a portfolio-friendly mock that shows the shape of the output and how it changes by venue archetype.
Par is high; death overs decide matches.
- Batting: target +10–20 above par if defending
- Powerplay: keep wickets in hand, don’t over-attack
- Bowling: execute yorkers/wide lines at death
- Fielding: save 10–15 runs = match-winning
Reports & dashboards
CAP exposes dbt-built reporting marts (venue/player/team) and a Streamlit-in-Snowflake app that makes analysis interactive. Reports are grouped to match how analysts explore matches: venue → players → teams → matchups.
CAP.DBT_DEV_REPORTING.RPT_VENUE_OVERVIEWStadium behaviour, par scores, phase dynamics, and specialists.
Core KPIs for scoring + wickets by innings.
CAP.DBT_DEV_REPORTING.RPT_VENUE_OVERVIEW- Matches in sample
- Avg inns1/inns2 runs + RR
- Avg wickets by innings
- Primary KPI panel for venue page
Powerplay / middle / death run rates + wicket pressure.
CAP.DBT_DEV_REPORTING.RPT_VENUE_PHASE_RATES- RR by phase
- Wkts per over by phase
- Shows where games are won at this ground
Chase bias (toss strategy signal).
CAP.DBT_DEV_REPORTING.RPT_VENUE_CHASE_SUCCESS- Chasing win %
- Chasing wins vs matches
- AI grounding metric
Players who consistently score at this venue.
CAP.DBT_DEV_REPORTING.RPT_VENUE_BEST_BATTERS- Runs, balls, SR
- Venue sample context
- Quick scouting list
Players who take wickets / control runs at this venue.
CAP.DBT_DEV_REPORTING.RPT_VENUE_BEST_BOWLERS- Wickets, overs, economy
- Venue specialists
- Tactical planning
Season summaries, phase splits, and short-term form.
Runs, SR, boundary impact by season/format.
CAP.DBT_DEV_REPORTING.RPT_PLAYER_BATTING_SUMMARY- Runs/balls/SR
- 4s/6s + dot balls
- Filterable by match_type + season
Wickets + economy + dot-ball pressure by season/format.
CAP.DBT_DEV_REPORTING.RPT_PLAYER_BOWLING_SUMMARY- Overs/runs/wickets
- Economy + dot balls
- Format/season comparisons
Short-term form snapshot for selection/preview.
CAP.DBT_DEV_REPORTING.RPT_PLAYER_RECENT_FORM- Runs last 10
- Balls last 10
- SR last 10
Win%, run-rate balance, rivalries, and toss impact trends.
Team KPIs: wins, scoring RR, conceding RR.
CAP.DBT_DEV_REPORTING.RPT_TEAM_SUMMARY- Wins + win%
- Scoring vs conceding RR
- Season/format comparisons
Team vs team outcomes + sample size.
CAP.DBT_DEV_REPORTING.RPT_HEAD_TO_HEAD- Matches
- Wins by team
- Rivalry checks
How much toss correlates with match wins over time.
CAP.DBT_DEV_REPORTING.RPT_TOSS_IMPACT- Toss-win → match-win %
- Season trend
- Supports chase narratives
Head-to-head tables for previews and tactical planning.
H2H performance: balls, runs, SR, dismissals.
CAP.DBT_DEV_REPORTING.RPT_MATCHUPS_BATTER_BOWLER- Balls faced
- Runs + SR
- Dismissals
- Best for match previews
Streamlit in Snowflake — app pages
KPIs + par scores + phase rates + top batters/bowlers at the selected ground.
- match_type / season / venue filters
- bar chart: phase RR
- tables: top batters & bowlers
- feeds AI venue strategy generator
Batting + bowling summaries, phase splits, venue splits, recent form, and matchups.
- player selector with match_type/season filters
- SR by phase + economy by phase
- recent form (last 10 matches)
- batter vs bowler matchup drilldown
Team season KPIs, head-to-head summary and toss impact trend.
- team selector with match_type/season filters
- head-to-head table
- toss impact trend chart
A quick table browser for all reporting models to validate dbt outputs in Snowflake UI.
- select a reporting table
- limit slider
- instant preview for QA
Note: the Streamlit app uses fully-qualified table names due to Snowflake Streamlit execution constraints (no USE statements).
Tech stack
Data credits
Data used in this project is sourced from Cricsheet, which provides freely available ball-by-ball cricket data for research and analysis.
🌍 Visit cricsheet.orgCAP is intentionally designed like a real analytics platform: layered models, tested transformations, orchestration, and dashboard consumption — with an AI layer that stays grounded in computed metrics. It’s a strong demonstration of end-to-end analytics engineering in a Snowflake-native workflow.
Explore CAP on GitHub