template-ml-registry

Minimal ML template with a simple data pipeline and model registry.

Install

uv sync --group dev

Layout

data/<dataset_name>/
  ├── raw/
  │   ├── full.parquet
  │   ├── train.parquet
  │   └── test.parquet
  └── preprocessed/
      ├── full.parquet
      ├── train.parquet
      └── test.parquet

outputs/
  ├── gridsearch/              # cv_summary.json
  ├── models/                  # index.json, current_best.json, *.skops
  └── metrics/{model_id}/      # predictions.parquet, metrics.json, plots/
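
The layout above can be expressed as a small path helper. This is only a sketch: `dataset_path`, `metrics_dir`, and the `root` defaults are hypothetical names chosen for illustration, not functions the template is known to ship.

```python
from pathlib import PurePosixPath

# Stage and split names taken from the directory tree above.
STAGES = ("raw", "preprocessed")
SPLITS = ("full", "train", "test")


def dataset_path(dataset_name, stage, split, root="data"):
    """Build data/<dataset_name>/<stage>/<split>.parquet (hypothetical helper)."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return PurePosixPath(root) / dataset_name / stage / f"{split}.parquet"


def metrics_dir(model_id, root="outputs"):
    """Build outputs/metrics/<model_id>/ (hypothetical helper)."""
    return PurePosixPath(root) / "metrics" / model_id


train_path = dataset_path("example_data", "preprocessed", "train")
```

Rejecting unknown stage or split names early keeps typos from silently creating new directories.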

Configuration

Edit configs/default.toml:

[data]
dataset_name = "example_data"              # name of dataset, gets own directory
target_name = "target"                     # column name of the target variable

[eval]
random_state = 123
test_size = 0.20
cv_splits = 5
# Choose ONE scorer; by default the model with the MAX score is chosen
scoring = "neg_root_mean_squared_error"   # regression example
# scoring = "roc_auc"                     # classification example
n_jobs = -1

[search]
model_keys = ["<modelname>"]

[train]                         # optional explicit spec; if unset, train reuses registry "best"
## example for support vector machine
# model_key = "svm_rbf"         
# [train.params]
# model__C = 3.0
# model__gamma = "scale"

[predict]
model_id = "best"               # "best" or a concrete model id from the registry

Grid search uses the single metric specified in [eval].scoring. The registry records cv_score_type (scorer name) and cv_score (numeric value).
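
To illustrate why the maximum score is the right default even for an error metric: scikit-learn's `neg_*` scorers return the negated error, so a higher value always means a better model. The sketch below re-implements negated RMSE in plain Python on made-up numbers and builds a record with the two registry fields named above. The candidate names, the data, and the exact record shape are assumptions for illustration.

```python
import math


def neg_rmse(y_true, y_pred):
    """Negated root mean squared error: higher (closer to 0) is better."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return -math.sqrt(mse)


# Made-up targets and predictions from two hypothetical candidates.
y_true = [3.0, 5.0, 7.0]
candidates = {
    "model_a": [3.0, 5.0, 9.0],  # one error of size 2
    "model_b": [2.0, 6.0, 8.0],  # three errors of size 1
}

scores = {name: neg_rmse(y_true, pred) for name, pred in candidates.items()}
best = max(scores, key=scores.get)  # MAX of the negated error = smallest RMSE

# Assumed shape of a registry entry; only the two score fields come from the text.
record = {
    "model": best,
    "cv_score_type": "neg_root_mean_squared_error",
    "cv_score": scores[best],
}
```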

Checklist Before Running

Add models to src/package/models.py
Add grid to src/package/grid.py as GRID_SPACES[<model_key>]
Include desired model keys in [search].model_keys
Set the scoring method for cv in configs/*.toml
Define the metrics used to evaluate predictions in src/package/eval/metrics.py
Define the plots used to visualize predictions in src/package/eval/plots.py
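
As a rough sketch of the grid-related checklist items, `GRID_SPACES` could be a dict that maps each model key to a parameter grid in the form scikit-learn's `GridSearchCV` accepts. The grid values are made up, and `missing_grids` and `expand_grid` are hypothetical helpers; the actual contents of src/package/grid.py may differ.

```python
from itertools import product

# Assumed structure: model key -> {parameter name -> list of candidate values}.
# The "model__" prefix follows the parameter names used in the [train.params] example.
GRID_SPACES = {
    "svm_rbf": {
        "model__C": [0.3, 1.0, 3.0],
        "model__gamma": ["scale", "auto"],
    },
}


def missing_grids(model_keys, grid_spaces):
    """Return the model keys from [search].model_keys that have no grid defined."""
    return [key for key in model_keys if key not in grid_spaces]


def expand_grid(space):
    """List every parameter combination in a grid (the Cartesian product)."""
    names = sorted(space)
    return [dict(zip(names, values)) for values in product(*(space[n] for n in names))]


combos = expand_grid(GRID_SPACES["svm_rbf"])
```

Checking `missing_grids` before a search catches a model key that was added to the config but not to the grid file.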

Example Workflow

# Ingest data into raw/full
uv run package register-data --in <path-of-data>

# Copy raw to preprocessed (add transformations here)
uv run package preprocess

# Split preprocessed full into train/test
uv run package split --stage pre

# Grid search on preprocessed train
uv run package search

# Predict on preprocessed test with best model
uv run package predict --plots

# View registry (sorted by cv_score desc, then recency)
uv run package models --top 5

Commands

All commands accept --config or -c to specify a config file (default: configs/default.toml).

| Command | Description |
| --- | --- |
| `uv run package register-data --in <file>` | Read CSV/Parquet and write to `raw/full.parquet` |
| `uv run package preprocess` | Preprocess and copy `raw/full.parquet` → `preprocessed/full.parquet` |
| `uv run package split --stage {raw\|pre}` | Split `<stage>/full.parquet` into train/test |
| `uv run package search` | Run GridSearchCV on `preprocessed/train`; register the best estimators, trained on the full training data |
| `uv run package train` | Fit the best or the specified model on `preprocessed/train`; register the model artifact as `.skops` |
| `uv run package predict [--model-id <id>] [--plots] [--plots-out <dir>]` | Predict on `preprocessed/test`; save metrics and plots |
| `uv run package models [--top K]` | Show the registry: id, model, cv_score_type, cv_score, created_at, params |
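
One way to give every subcommand the same `--config`/`-c` option is a shared parent parser. The sketch below uses `argparse` purely for illustration; which CLI library the template actually uses is not stated here, and `build_parser` is a hypothetical name. Only two of the subcommands are wired up.

```python
import argparse


def build_parser():
    """Sketch of a CLI where each subcommand inherits --config/-c."""
    # Options shared by all subcommands live on a parent parser.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("-c", "--config", default="configs/default.toml")

    parser = argparse.ArgumentParser(prog="package")
    sub = parser.add_subparsers(dest="command", required=True)

    models = sub.add_parser("models", parents=[common])
    models.add_argument("--top", type=int, default=None)

    predict = sub.add_parser("predict", parents=[common])
    predict.add_argument("--model-id", default=None)
    predict.add_argument("--plots", action="store_true")
    predict.add_argument("--plots-out", default=None)
    return parser


args = build_parser().parse_args(["models", "--top", "5", "-c", "configs/other.toml"])
default_args = build_parser().parse_args(["predict", "--plots"])
```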

Model Selection

The registry stores each model together with the scorer used to evaluate it. With --model-id best, the best model is chosen only among entries whose scorer matches [eval].scoring in the provided config file. If no entry exists for that scorer, first run package search (or package train --model-id) with that scorer.
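
The selection rule can be sketched as a filter followed by a maximum. The entry fields mirror the columns shown by the models command; the in-memory list, the ids, the timestamps, and the tie-break on `created_at` (newer wins) are assumptions for illustration, and `select_best` is a hypothetical function rather than the template's own.

```python
def select_best(entries, scoring):
    """Pick the highest-scoring entry among those evaluated with `scoring`."""
    matching = [e for e in entries if e["cv_score_type"] == scoring]
    if not matching:
        raise LookupError(f"no registered model for scorer {scoring!r}")
    # Higher cv_score wins; ISO timestamps sort lexicographically, so newer breaks ties.
    return max(matching, key=lambda e: (e["cv_score"], e["created_at"]))


# Made-up registry entries.
entries = [
    {"id": "m1", "model": "svm_rbf", "cv_score_type": "neg_root_mean_squared_error",
     "cv_score": -1.20, "created_at": "2024-01-01T10:00:00", "params": {}},
    {"id": "m2", "model": "svm_rbf", "cv_score_type": "neg_root_mean_squared_error",
     "cv_score": -0.95, "created_at": "2024-01-02T10:00:00", "params": {}},
    {"id": "m3", "model": "svm_rbf", "cv_score_type": "roc_auc",
     "cv_score": 0.91, "created_at": "2024-01-03T10:00:00", "params": {}},
]

best_entry = select_best(entries, "neg_root_mean_squared_error")
```

Filtering by scorer first matters because scores from different scorers are not comparable: a `roc_auc` of 0.91 would otherwise always beat any negated error.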


jerelang/template-ml-registry
