Welcome to the codebase for the Cal-ITP data warehouse and ETL pipeline.
Documentation for this codebase lives at docs.calitp.org/data-infra

- airflow contains the local dev setup and source code for Airflow DAGs (i.e., ETLs).
- airflow/data/agencies.yml contains catalogs for all transit agencies in CA's GTFS data.
- ci contains continuous integration and deployment scripts using GitHub Actions.
- docs builds the docs site.
- kubernetes contains Helm charts, scripts, and more for deploying apps on our Kubernetes cluster.
- script contains ad hoc scripts (mostly Python).
- services contains apps that we write and deploy to Kubernetes.

- Follow the Conventional Commits standard for all commits
- Use Conventional Commit format for PR titles
- Use GitHub's draft status to indicate PRs that are not ready for review/merging
- Do not use GitHub's "update branch" button or merge the main branch back into a PR branch to update it. Instead, rebase PR branches to update them and resolve any merge conflicts.
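
As an illustration of the commit and PR title format mentioned above, here is a minimal sketch of a title check in Python. It is not part of this repository's tooling: the function name `is_conventional_title` and the `ALLOWED_TYPES` list are assumptions made for the example. The Conventional Commits specification itself only defines `feat` and `fix`; the other types follow the widely used Angular-style convention and may not match what this repo accepts.

```python
import re

# Assumed type list (hypothetical for this sketch): the spec only defines
# "feat" and "fix"; the rest come from the common Angular-style convention.
ALLOWED_TYPES = (
    "feat", "fix", "docs", "style", "refactor",
    "perf", "test", "build", "ci", "chore", "revert",
)

# Shape being checked: type, optional (scope), optional "!", then ": description"
TITLE_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)"             # commit type, e.g. "fix"
    r"(?:\((?P<scope>[^()\s]+)\))?"  # optional scope, e.g. "(airflow)"
    r"(?P<breaking>!)?"              # optional breaking-change marker
    r": (?P<description>\S.*)$"      # colon, one space, non-empty description
)


def is_conventional_title(title: str) -> bool:
    """Return True if the title has the Conventional Commits shape
    and uses one of the assumed types."""
    match = TITLE_PATTERN.match(title)
    return bool(match) and match.group("type") in ALLOWED_TYPES
```

Under these assumptions, a title such as "fix(airflow): handle missing agency feed" passes, while "Update README" does not, because it has no type prefix.
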
This repository uses black and pre-commit hooks to format code. To install the hooks locally, run
pip install pre-commit && pre-commit install
in the root of the repo.