Add replica groups in dstack-service #3408

Bihan · 2025-12-20T03:14:52Z

Steps To Test

Step 1: Create replica-groups-service.yml

# replica-groups-service.yml
type: service
name: replica-groups-test
python: 3.12

replica_groups:
  - name: replica-1
    replicas: 0..2
    scaling:
      metric: rps
      target: 2
    commands:
      - echo "Group 1 - Version 0" > /tmp/version.txt
      - python3 -m http.server 8000
    resources:
      cpu: 2

  - name: replica-2
    replicas: 0..3
    scaling:
      metric: rps
      target: 2
    commands:
      - echo "Group 2 - Version 0" > /tmp/version.txt
      - python3 -m http.server 8000
    resources:
      cpu: 2

port: 8000
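The two groups above declare replicas: 0..2 and replicas: 0..3. The whole service can therefore run at most 2 + 3 = 5 replicas, which matches the replica numbers 0 to 4 in the scaled-up output further down. The sketch below shows that bookkeeping. It assumes each range is an inclusive min..max value loaded as a string such as "0..2". parse_replicas and replica_bounds are hypothetical helpers, not dstack's own parser.

```python
import re
from typing import Dict, List, Tuple, Union

# Inclusive "min..max" range such as "0..2" (hypothetical parser, not dstack's).
_RANGE = re.compile(r"^\s*(\d+)\s*\.\.\s*(\d+)\s*$")


def parse_replicas(value: Union[int, str]) -> Tuple[int, int]:
    """Return (min, max) for a fixed count (2, "2") or a range ("0..3")."""
    if isinstance(value, int):
        return value, value
    match = _RANGE.match(value)
    if match:
        lo, hi = int(match.group(1)), int(match.group(2))
    elif value.strip().isdigit():
        lo = hi = int(value)
    else:
        raise ValueError(f"invalid replicas value: {value!r}")
    if lo > hi:
        raise ValueError(f"min {lo} is greater than max {hi}")
    return lo, hi


def replica_bounds(groups: List[dict]) -> Dict[str, Tuple[int, int]]:
    """Map each replica group name to its (min, max) replica bounds."""
    return {g["name"]: parse_replicas(g["replicas"]) for g in groups}


# Values copied from replica-groups-service.yml above.
groups = [
    {"name": "replica-1", "replicas": "0..2"},
    {"name": "replica-2", "replicas": "0..3"},
]
bounds = replica_bounds(groups)
max_total = sum(hi for _, hi in bounds.values())
print(bounds)     # {'replica-1': (0, 2), 'replica-2': (0, 3)}
print(max_total)  # 5
```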

Step 2: Apply the configuration: dstack apply -f replica-groups-service.yml

Step 3: Run load_test_replica_groups.py, substituting your service URL and TOKEN

import asyncio
import aiohttp
import time

# ==== Configuration ====
URL = "<URL>"
TOKEN = "<TOKEN>"
RPS = 8            # Requests per second
DURATION = 1800    # Duration in seconds
METHOD = "GET"     # or "POST"
# =======================

HEADERS = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {TOKEN}"
}


async def send_request(session, idx):
    """Send a request and print response"""
    try:
        async with session.request(METHOD, URL, headers=HEADERS) as resp:
            text = await resp.text()
            print(f"\n[{idx}] Status: {resp.status}")
            # print small part of response (HTML preview)
            print(text[:200].strip(), "...\n")
    except Exception as e:
        print(f"[{idx}] Error: {e}")


async def run_load_test():
    total_requests = RPS * DURATION
    interval = 1.0 / RPS

    async with aiohttp.ClientSession() as session:
        start_time = time.perf_counter()
        tasks = []

        for i in range(total_requests):
            task = asyncio.create_task(send_request(session, i + 1))
            tasks.append(task)
            await asyncio.sleep(interval)

        await asyncio.gather(*tasks)
        elapsed = time.perf_counter() - start_time
        print(f"\n✅ Sent {total_requests} requests in {elapsed:.2f}s "
              f"(~{total_requests/elapsed:.2f} RPS)")


if __name__ == "__main__":
    asyncio.run(run_load_test())
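The script paces requests with a fixed asyncio.sleep(interval) after each task is created. Every iteration therefore takes at least interval plus some scheduling overhead, so over a 30-minute run the achieved rate can land slightly below RPS. If the load needs to stay close to the target, you can instead sleep until an absolute deadline for each request. This is a sketch only; paced is a hypothetical helper and not part of the PR.

```python
import asyncio
import time
from typing import Awaitable, Callable, List


async def paced(rps: float, total: int,
                make_request: Callable[[int], Awaitable[None]]) -> float:
    """Start `total` requests at `rps`, scheduling request i at
    start + i / rps so per-iteration overhead does not accumulate.
    Returns the achieved rate (requests / elapsed seconds), measured the
    same way as the original script (including waiting for responses)."""
    interval = 1.0 / rps
    start = time.perf_counter()
    tasks: List[asyncio.Task] = []
    for i in range(total):
        # Sleep only for the time remaining until this request's slot.
        delay = start + i * interval - time.perf_counter()
        if delay > 0:
            await asyncio.sleep(delay)
        tasks.append(asyncio.create_task(make_request(i + 1)))
    await asyncio.gather(*tasks)
    elapsed = time.perf_counter() - start
    return total / elapsed if elapsed > 0 else float("inf")
```

In the script, run_load_test could call await paced(RPS, RPS * DURATION, lambda i: send_request(session, i)) inside its async with aiohttp.ClientSession() as session: block, in place of the for loop.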

Expected output
Each group starts with one replica:

Submit the run replica-groups-test? [y/n]: y
 NAME                  BACKEND          GPU  PRICE    STATUS   SUBMITTED 
 replica-groups-test                    -    -        running  07:31     
    group=0 replica=0  aws (us-east-2)  -    $0.0832  running  07:32     
    group=1 replica=1  aws (us-east-2)  -    $0.0832  running  07:32

As load increases, each group scales independently within its own configured range: group 0 (replica-1) scales up to 2 replicas and group 1 (replica-2) scales up to 3.

Expected output after scaling:

NAME                  BACKEND          GPU  PRICE    STATUS   SUBMITTED  
 replica-groups-test                    -    -        running  9 mins ago 
    group=0 replica=0  aws (us-east-2)  -    $0.0832  running  8 mins ago 
            replica=2  aws (us-east-2)  -    $0.0832  running  3 mins ago 
    group=1 replica=1  aws (us-east-2)  -    $0.0832  running  8 mins ago 
            replica=3  aws (us-east-2)  -    $0.0832  running  3 mins ago 
            replica=4  aws (us-east-2)  -    $0.0832  running  3 mins ago
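The PR does not say how incoming requests are attributed to each group's rps metric. Suppose each group's autoscaler sees the service's total RPS and requests ceil(RPS / target) replicas, clamped to the group's own min..max range. Under that assumption the numbers in this PR line up. At 8 RPS, ceil(8 / 2) = 4, which is clamped to 2 for group 0 and to 3 for group 1. The 2 RPS used in Step 5 gives ceil(2 / 2) = 1 replica per group. desired_replicas is a hypothetical helper. The rule deliberately ignores any averaging window or scale-up/scale-down delays the real autoscaler may apply.

```python
import math


def desired_replicas(observed_rps: float, target_rps: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Hypothetical RPS rule: enough replicas that each one handles about
    `target_rps`, clamped to the group's [min, max] range."""
    wanted = math.ceil(observed_rps / target_rps)
    return max(min_replicas, min(max_replicas, wanted))


groups = {"replica-1": (0, 2), "replica-2": (0, 3)}  # replicas ranges from the YAML
TARGET = 2                                           # scaling.target in both groups

for load in (8, 2):  # RPS values used in Step 3 and Step 5
    plan = {name: desired_replicas(load, TARGET, lo, hi)
            for name, (lo, hi) in groups.items()}
    print(load, plan)
# 8 {'replica-1': 2, 'replica-2': 3}
# 2 {'replica-1': 1, 'replica-2': 1}
```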

Step 4: Check that each replica ran its group-specific commands.
Attach to the desired replica, then run the check from another terminal, e.g.:
dstack attach --replica 2 replica-groups-test
ssh replica-groups-test-0-2 'cat /tmp/version.txt'
Replica 2 belongs to group 0 (replica-1), so the output is: Group 1 - Version 0
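To check every replica rather than one, you can loop over the placement shown in the scaled-up table. The sketch below makes several assumptions. It generalizes the single alias replica-groups-test-0-2 into a hypothetical run-group-replica pattern. It also assumes that SSH alias is reachable for every replica, for example while dstack attach --replica N is running for each one. The helper names are hypothetical. The ssh runner is injectable, so the logic can be exercised without a live run.

```python
import subprocess
from typing import Callable, Dict, List


def host_alias(run_name: str, group: int, replica: int) -> str:
    # Hypothetical pattern generalized from "replica-groups-test-0-2"
    # (read here as group 0, replica 2).
    return f"{run_name}-{group}-{replica}"


def expected_version(group: int, version: int = 0) -> str:
    # The YAML's first group (index 0) writes "Group 1 ...", the second "Group 2 ...".
    return f"Group {group + 1} - Version {version}"


def read_version(host: str, run: Callable = subprocess.run) -> str:
    # Same command as Step 4; check=True raises if ssh exits non-zero.
    result = run(["ssh", host, "cat /tmp/version.txt"],
                 capture_output=True, text=True, check=True)
    return result.stdout.strip()


def check_replicas(run_name: str, placement: Dict[int, List[int]],
                   version: int = 0,
                   run: Callable = subprocess.run) -> Dict[str, bool]:
    """placement maps group index -> replica numbers (as in the run table).
    Returns host alias -> whether its version.txt matches its group."""
    results: Dict[str, bool] = {}
    for group, replicas in placement.items():
        for replica in replicas:
            host = host_alias(run_name, group, replica)
            results[host] = read_version(host, run) == expected_version(group, version)
    return results
```

For the scaled-up table above, the call would be check_replicas("replica-groups-test", {0: [0, 2], 1: [1, 3, 4]}).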

Step 5: Check rolling deployment.
Important:
Rolling deployments are currently affected by a race condition that also impacts the non-replica-group implementation and must be addressed separately (issue). However, when each replica group is configured with a single replica, this race condition does not affect rolling deployments.

Testing instructions:

1. Scale down each replica group to 1 replica.
2. Restart the load-testing script with RPS = 2.
3. After all groups have scaled down to a single replica, change the replica group configuration (e.g. bump Version 0 to Version 1 in each group's commands) and re-apply it:

Re-apply
dstack apply -f replica-groups-service.yml

Active run replica-groups-test already exists. Detected changes that can be updated in-place:
- Configuration properties:
  - replica_groups

Update the run? [y/n]: y
 NAME                  BACKEND          GPU  PRICE    STATUS      SUBMITTED 
 replica-groups-test                    -    -        running     07:51     
    group=0 replica=0  aws (us-east-2)  -    $0.0832  terminated  07:51     
            replica=2  aws (us-east-2)  -    $0.0832  running     07:53     
    group=1 replica=1  aws (us-east-2)  -    $0.0832  terminated  07:51     
            replica=3  aws (us-east-2)  -    $0.0832  running     07:53
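After the update, each group should have exactly one running replica (the new one), and its old replica should be terminated. The sketch below checks this by parsing the run table printed above. It assumes the layout shown in this PR: a group's first row carries group=N replica=M, continuation rows carry only replica=M, and STATUS is the token right after the $ price. Other dstack versions may print a different layout. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def replicas_by_group(table: str) -> Dict[int, List[Tuple[int, str]]]:
    """Return group index -> [(replica number, status), ...]."""
    groups: Dict[int, List[Tuple[int, str]]] = defaultdict(list)
    current: Optional[int] = None
    for line in table.splitlines():
        tokens = line.split()
        replica = next((t for t in tokens if t.startswith("replica=")), None)
        price_idx = next((i for i, t in enumerate(tokens) if t.startswith("$")), None)
        if replica is None or price_idx is None or price_idx + 1 >= len(tokens):
            continue  # header, run-level row, or unexpected layout
        group = next((t for t in tokens if t.startswith("group=")), None)
        if group is not None:
            current = int(group.split("=", 1)[1])  # continuation rows reuse it
        if current is None:
            continue
        groups[current].append((int(replica.split("=", 1)[1]), tokens[price_idx + 1]))
    return dict(groups)


def running_per_group(table: str) -> Dict[int, int]:
    """Count replicas with STATUS == running in each group."""
    return {g: sum(status == "running" for _, status in rows)
            for g, rows in replicas_by_group(table).items()}


# Table copied from the Step 5 output above.
TABLE = """\
 NAME                  BACKEND          GPU  PRICE    STATUS      SUBMITTED
 replica-groups-test                    -    -        running     07:51
    group=0 replica=0  aws (us-east-2)  -    $0.0832  terminated  07:51
            replica=2  aws (us-east-2)  -    $0.0832  running     07:53
    group=1 replica=1  aws (us-east-2)  -    $0.0832  terminated  07:51
            replica=3  aws (us-east-2)  -    $0.0832  running     07:53
"""
print(running_per_group(TABLE))  # {0: 1, 1: 1}
```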

Commits: add_replica_groups_model, Replica Groups AutoScaling, Rolling deployment and UI, Replica Groups implementation clean up

Bihan · 2025-12-20T03:15:47Z

I will resolve merge conflicts as the review continues.

Bihan · 2025-12-20T03:19:20Z

Related PRs

#3205 from @DragonStuff

peterschmidt85 · 2025-12-20T09:52:11Z

@Bihan Do we really need replica group names?

peterschmidt85 · 2025-12-20T09:52:41Z

@Bihan Also please check the conflicts with master.

peterschmidt85 · 2025-12-20T09:54:28Z

Cosmetic only: I would rename replica_groups to replicas, and rename replicas under replica_groups to count.
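Here is the suggested rename made concrete. Under this proposal, the Step 1 config would have a top-level replicas list whose entries use count instead of replicas. The sketch below converts a parsed config dict from the PR's schema to the proposed one. The function name is hypothetical. The proposed schema exists only as this review suggestion and is not implemented in the PR.

```python
import copy
from typing import Any, Dict


def to_proposed_schema(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical migration for the naming suggested in review:
    top-level replica_groups -> replicas, per-group replicas -> count.
    The input dict is left unmodified."""
    if "replica_groups" not in config:
        return copy.deepcopy(config)
    if "replicas" in config:
        # The proposal reuses the top-level key, so both cannot coexist.
        raise ValueError("config sets both replicas and replica_groups")
    new = copy.deepcopy(config)
    groups = new.pop("replica_groups")
    for group in groups:
        if "replicas" in group:
            group["count"] = group.pop("replicas")
    new["replicas"] = groups
    return new


old = {
    "type": "service",
    "name": "replica-groups-test",
    "replica_groups": [
        {"name": "replica-1", "replicas": "0..2"},
        {"name": "replica-2", "replicas": "0..3"},
    ],
    "port": 8000,
}
print(to_proposed_schema(old)["replicas"])
# [{'name': 'replica-1', 'count': '0..2'}, {'name': 'replica-2', 'count': '0..3'}]
```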

Commit 86139c5: Add replica groups in dstack-service


Bihan requested review from jvstme and peterschmidt85 December 20, 2025 03:16
