Back to home

CAP – Cricket Analytics Platform (Snowflake-native)

An end-to-end analytics platform that ingests openly available Cricsheet JSON, models ball-by-ball cricket in dbt, serves reporting marts for venue/player/team dashboards, and generates explainable venue-based winning strategies using Snowflake Cortex — all inside Snowflake.

Portfolio project · 2025Snowflake · dbt · Airflow · Streamlit · CortexView project on GitHub
Situation
Ball-by-ball cricket data is semi-structured, nested, and high-volume. Most public dashboards flatten too early or lose lineage between raw JSON and BI metrics.
Task
Build a modern, Snowflake-native analytics platform: bronze/silver/core/reporting layers, repeatable dbt transformations, orchestration, and interactive dashboards.
Action
Implemented a layered warehouse design: BRONZE VARIANT JSON → SILVER domain tables (MATCHES, DELIVERIES) → CORE semantic models (DIM/FCT) → REPORTING marts for stadium/player/team analytics. Orchestrated dbt with Airflow (Cosmos) and built Streamlit-in-Snowflake dashboards.
Result
A portfolio-grade platform that supports fast exploration (SnowSight), product-style UX (Streamlit), and grounded AI insights (Cortex) that generate explainable winning strategies from venue patterns rather than “black-box predictions”.

High-level architecture

CAP is designed as a layered analytics platform. Raw JSON is preserved in BRONZE, structured in SILVER, stabilized in CORE (dim/fact), and pre-aggregated in REPORTING marts that power dashboards. A compact venue dossier feeds Snowflake Cortex to create strategy narratives that stay grounded in computed metrics.

1Step 1 – Land raw Cricsheet JSON in Snowflake (Bronze)

JSON files are staged and loaded into a BRONZE table as VARIANT for lossless storage.

Ingestion & Storage
  • Cricsheet JSON files staged in Snowflake.
  • BRONZE: append-only VARIANT table (lossless).
  • Filename lineage (match_id from filename) for traceability.
Transformations (dbt)
  • SILVER: typed domain tables (MATCHES, DELIVERIES).
  • CORE: DIM_MATCH + FCT_DELIVERY (stable grain).
  • REPORTING: venue/player/team marts for dashboards.
Consumption + AI
  • SnowSight dashboards for quick exploration.
  • Streamlit-in-Snowflake for drilldowns and UX.
  • Cortex generates venue strategy text from a dossier view.
Orchestration
Airflow (Cosmos) schedules dbt run/test.
Data layers
Bronze → Silver → Core → Reporting (clear separation of concerns).
AI grounding
LLM reads venue dossier metrics, not raw JSON.

Demo: Cortex-style venue strategy output (visual mock)

In the actual implementation, Streamlit queries reporting marts and a compact RPT_AI_VENUE_DOSSIER view, then calls Cortex to produce a concise pre-match plan. The card below is a portfolio-friendly mock that shows the shape of the output and how it changes by venue archetype.

(Auto-rotating every few seconds)
🏟️High scoring venue
Strategy summary from venue patterns (par score · phase RR · chase%)
Cortex-generated insight

Par is high; death overs decide matches.

  • Batting: target +10–20 above par if defending
  • Powerplay: keep wickets in hand, don’t over-attack
  • Bowling: execute yorkers/wide lines at death
  • Fielding: save 10–15 runs = match-winning
Grounded in reporting marts — not raw ball-by-ball.Explainable AI (not winner prediction)

Reports & dashboards

CAP exposes dbt-built reporting marts (venue/player/team) and a Streamlit-in-Snowflake app that makes analysis interactive. Reports are grouped to match how analysts explore matches: venue → players → teams → matchups.

Highlighted report: 🏟️ Venue intelligence · Venue Overview
Auto-rotating (nice for live walkthroughs)
Source: CAP.DBT_DEV_REPORTING.RPT_VENUE_OVERVIEW
Core KPIs for scoring + wickets by innings.
🏟️ Venue intelligence

Stadium behaviour, par scores, phase dynamics, and specialists.

Venue Overview

Core KPIs for scoring + wickets by innings.

REPORT
Source table
CAP.DBT_DEV_REPORTING.RPT_VENUE_OVERVIEW
  • Matches in sample
  • Avg inns1/inns2 runs + RR
  • Avg wickets by innings
  • Primary KPI panel for venue page
Venue Phase Rates

Powerplay / middle / death run rates + wicket pressure.

REPORT
Source table
CAP.DBT_DEV_REPORTING.RPT_VENUE_PHASE_RATES
  • RR by phase
  • Wkts per over by phase
  • Shows where games are won at this ground
Chasing Success

Chase bias (toss strategy signal).

REPORT
Source table
CAP.DBT_DEV_REPORTING.RPT_VENUE_CHASE_SUCCESS
  • Chasing win %
  • Chasing wins vs matches
  • AI grounding metric
Top Batters at Venue

Players who consistently score at this venue.

REPORT
Source table
CAP.DBT_DEV_REPORTING.RPT_VENUE_BEST_BATTERS
  • Runs, balls, SR
  • Venue sample context
  • Quick scouting list
Top Bowlers at Venue

Players who take wickets / control runs at this venue.

REPORT
Source table
CAP.DBT_DEV_REPORTING.RPT_VENUE_BEST_BOWLERS
  • Wickets, overs, economy
  • Venue specialists
  • Tactical planning
🧑‍🏫 Player intelligence

Season summaries, phase splits, and short-term form.

Player Batting Summary

Runs, SR, boundary impact by season/format.

REPORT
Source table
CAP.DBT_DEV_REPORTING.RPT_PLAYER_BATTING_SUMMARY
  • Runs/balls/SR
  • 4s/6s + dot balls
  • Filterable by match_type + season
Player Bowling Summary

Wickets + economy + dot-ball pressure by season/format.

REPORT
Source table
CAP.DBT_DEV_REPORTING.RPT_PLAYER_BOWLING_SUMMARY
  • Overs/runs/wickets
  • Economy + dot balls
  • Format/season comparisons
Recent Form (Last 10)

Short-term form snapshot for selection/preview.

REPORT
Source table
CAP.DBT_DEV_REPORTING.RPT_PLAYER_RECENT_FORM
  • Runs last 10
  • Balls last 10
  • SR last 10
🧢 Team intelligence

Win%, run-rate balance, rivalries, and toss impact trends.

Team Summary

Team KPIs: wins, scoring RR, conceding RR.

REPORT
Source table
CAP.DBT_DEV_REPORTING.RPT_TEAM_SUMMARY
  • Wins + win%
  • Scoring vs conceding RR
  • Season/format comparisons
Head-to-Head

Team vs team outcomes + sample size.

REPORT
Source table
CAP.DBT_DEV_REPORTING.RPT_HEAD_TO_HEAD
  • Matches
  • Wins by team
  • Rivalry checks
Toss Impact Trend

How much toss correlates with match wins over time.

REPORT
Source table
CAP.DBT_DEV_REPORTING.RPT_TOSS_IMPACT
  • Toss-win → match-win %
  • Season trend
  • Supports chase narratives
⚔️ Tactical matchups

Head-to-head tables for previews and tactical planning.

Batter vs Bowler Matchups

H2H performance: balls, runs, SR, dismissals.

REPORT
Source table
CAP.DBT_DEV_REPORTING.RPT_MATCHUPS_BATTER_BOWLER
  • Balls faced
  • Runs + SR
  • Dismissals
  • Best for match previews

Streamlit in Snowflake — app pages

Venue Dashboard

KPIs + par scores + phase rates + top batters/bowlers at the selected ground.

  • match_type / season / venue filters
  • bar chart: phase RR
  • tables: top batters & bowlers
  • feeds AI venue strategy generator
Player Dashboard

Batting + bowling summaries, phase splits, venue splits, recent form, and matchups.

  • player selector with match_type/season filters
  • SR by phase + economy by phase
  • recent form (last 10 matches)
  • batter vs bowler matchup drilldown
Team Dashboard

Team season KPIs, head-to-head summary and toss impact trend.

  • team selector with match_type/season filters
  • head-to-head table
  • toss impact trend chart
All Tables (Debug)

A quick table browser for all reporting models to validate dbt outputs in Snowflake UI.

  • select a reporting table
  • limit slider
  • instant preview for QA

Note: the Streamlit app uses fully-qualified table names due to Snowflake Streamlit execution constraints (no USE statements).

Tech stack

Snowflake (VARIANT, Snowpark)dbt (Silver/Core/Reporting)Apache Airflow + CosmosSnowSight dashboardsStreamlit in SnowflakeSnowflake Cortex (LLM)

Data credits

Data used in this project is sourced from Cricsheet, which provides freely available ball-by-ball cricket data for research and analysis.

🌍 Visit cricsheet.org

CAP is intentionally designed like a real analytics platform: layered models, tested transformations, orchestration, and dashboard consumption — with an AI layer that stays grounded in computed metrics. It’s a strong demonstration of end-to-end analytics engineering in a Snowflake-native workflow.

Explore CAP on GitHub