Case study

How we transformed California’s transit data infrastructure

Building the nation’s first statewide open-source transit data platform, serving 200+ agencies and processing more than a million files daily

California Department of Transportation
2021 – Present
8–12 Engineers

Measurable impact

  • 200+ Transit Agencies: Unified on single platform
  • 1M+ Files Daily: Real-time data processing
  • 100% Open Source: Complete agency ownership
  • 50TB+ Data Analyzed: Historical insights preserved

Bottom line

California now has the most comprehensive transit data infrastructure in the nation, built entirely on open-source technology that agencies own and control.

The challenge

California's 200+ transit agencies operated in complete data silos. Each agency managed its own systems, standards, and tools, which made statewide planning, coordination, and improvement nearly impossible.

Critical pain points

  • No unified view: The state couldn't see transit performance as a whole
  • Duplicated efforts: Agencies solved the same problems independently
  • Vendor lock-in: Expensive proprietary systems limited innovation
  • Limited resources: Small agencies couldn't afford modern tools

The vision

Caltrans and Cal-ITP leadership imagined a different future: a unified, open-source platform that would democratize access to data and tools, enabling every agency, regardless of size, to make data-driven decisions.

Our approach

Modern data stack

Implemented Google BigQuery, dbt, and Airflow to create a scalable, cloud-native data platform

  • Automated data pipelines
  • Real-time processing
  • Historical preservation
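As a rough illustration of the historical-preservation idea, one common pattern is to write every raw download to its own timestamped, date-partitioned path so that nothing is ever overwritten. The sketch below is an assumption for illustration only: the function name `snapshot_path`, the `dt=`/`hour=` layout, and the example agency and feed names are hypothetical and are not taken from the platform's actual code.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def snapshot_path(agency_id: str, feed_type: str, fetched_at: datetime) -> str:
    """Build a date- and hour-partitioned object path for one raw feed download.

    Hypothetical layout: the real platform's storage scheme is not described
    in this case study.
    """
    if fetched_at.tzinfo is None:
        # Refuse naive timestamps so local time is never mistaken for UTC.
        raise ValueError("fetched_at must be timezone-aware")
    ts = fetched_at.astimezone(timezone.utc)
    return str(
        PurePosixPath(feed_type)
        / f"dt={ts:%Y-%m-%d}"
        / f"hour={ts:%H}"
        / f"{agency_id}_{ts:%Y%m%dT%H%M%SZ}.bin"
    )


if __name__ == "__main__":
    # Example with made-up identifiers.
    when = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
    print(snapshot_path("agency-a", "vehicle_positions", when))
```

Because each path embeds the fetch time, repeated downloads of the same feed land in distinct objects, and a warehouse can later prune or query by the `dt=` partition.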

Open source first

Every component built on open-source tools, with all code publicly available on GitHub

  • No vendor lock-in
  • Community contributions
  • Complete transparency

Capacity building

Trained Caltrans staff to own and operate the platform, ensuring long-term sustainability

  • Hands-on training
  • Documentation
  • Gradual handoff

Ready to transform your transit data?

Email us directly at transit@jarv.us