diff --git a/articles/ai-foundry/default/agents/concepts/hosted-agents.md b/articles/ai-foundry/default/agents/concepts/hosted-agents.md index f020359308..9e2cb0df63 100644 --- a/articles/ai-foundry/default/agents/concepts/hosted-agents.md +++ b/articles/ai-foundry/default/agents/concepts/hosted-agents.md @@ -425,6 +425,17 @@ Here's an example: az cognitiveservices agent start --account-name myAccount --project-name myProject --name myAgent --agent-version 1 ``` +> [!TIP] +> - To minimize charges when idle, set `--min-replicas 0`. The service scales to zero and doesn't allocate compute when not serving requests. +> - The first request after scaling to zero incurs a cold start. To avoid cold starts, keep at least one warm replica by setting `--min-replicas 1` (or higher), trading steady-state cost for lower latency. +> - You can adjust replica settings later with a non-versioned update: `az cognitiveservices agent update --min-replicas --max-replicas `. +> +> Example (scale-to-zero when idle): +> +> ```bash +> az cognitiveservices agent start --account-name myAccount --project-name myProject --name myAgent --agent-version 1 --min-replicas 0 --max-replicas 2 +> ``` + When you start an agent: - Current status: **Stopped**