@Bihan
Collaborator

@Bihan Bihan commented Dec 20, 2025

Steps To Test

Step1: Create replica-groups-service.yml

# replica-groups-service.yml
type: service
name: replica-groups-test
python: 3.12

replica_groups:
  - name: replica-1
    replicas: 0..2
    scaling:
      metric: rps
      target: 2
    commands:
      - echo "Group 1 - Version 0" > /tmp/version.txt
      - python3 -m http.server 8000
    resources:
      cpu: 2

  - name: replica-2
    replicas: 0..3
    scaling:
      metric: rps
      target: 2
    commands:
      - echo "Group 2 - Version 0" > /tmp/version.txt
      - python3 -m http.server 8000
    resources:
      cpu: 2

port: 8000
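
As a quick sanity check on the configuration above, each replicas value can be read as an inclusive min..max range. The sketch below is not dstack code: parse_replica_range and the GROUPS dict are hypothetical names that only mirror the YAML, and only the closed min..max form used in this file is handled.

```python
def parse_replica_range(value):
    """Parse a range such as "0..2" (or a bare count such as 2) into (min, max)."""
    text = str(value)
    if ".." in text:
        low, high = text.split("..", 1)
        bounds = (int(low), int(high))
    else:
        bounds = (int(text), int(text))
    if bounds[0] < 0 or bounds[0] > bounds[1]:
        raise ValueError(f"invalid replica range: {value!r}")
    return bounds


# Hypothetical mirror of the replica_groups section above (group name -> replicas).
GROUPS = {"replica-1": "0..2", "replica-2": "0..3"}

for name, spec in GROUPS.items():
    low, high = parse_replica_range(spec)
    print(f"{name}: min={low} max={high}")
```

Both groups start from a minimum of 0, so any replica that appears is there because of the rps scaling rule, not because of a fixed floor.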

Step2: dstack apply -f replica-groups-service.yml

Step3: Run load_test_replica_groups.py, substituting your URL and TOKEN

import asyncio
import aiohttp
import time

# ==== Configuration ====
URL = "<URL>"
TOKEN = "<TOKEN>"
RPS = 8          # Requests per second
DURATION = 1800       # Duration in seconds
METHOD = "GET"     # or "POST"
# =======================

HEADERS = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {TOKEN}"
}


async def send_request(session, idx):
    """Send a request and print response"""
    try:
        async with session.request(METHOD, URL, headers=HEADERS) as resp:
            text = await resp.text()
            print(f"\n[{idx}] Status: {resp.status}")
            # print small part of response (HTML preview)
            print(text[:200].strip(), "...\n")
    except Exception as e:
        print(f"[{idx}] Error: {e}")


async def run_load_test():
    total_requests = RPS * DURATION
    interval = 1.0 / RPS

    async with aiohttp.ClientSession() as session:
        start_time = time.perf_counter()
        tasks = []

        for i in range(total_requests):
            task = asyncio.create_task(send_request(session, i + 1))
            tasks.append(task)
            await asyncio.sleep(interval)

        await asyncio.gather(*tasks)
        elapsed = time.perf_counter() - start_time
        print(f"\n✅ Sent {total_requests} requests in {elapsed:.2f}s "
              f"(~{total_requests/elapsed:.2f} RPS)")


if __name__ == "__main__":
    asyncio.run(run_load_test())

Expected Output
Each group gets one replica

Submit the run replica-groups-test? [y/n]: y
 NAME                  BACKEND          GPU  PRICE    STATUS   SUBMITTED 
 replica-groups-test                    -    -        running  07:31     
    group=0 replica=0  aws (us-east-2)  -    $0.0832  running  07:32     
    group=1 replica=1  aws (us-east-2)  -    $0.0832  running  07:32

Later, each group scales according to its own configuration:
group 0 scales up to 2 replicas (the upper bound of replicas: 0..2),
and group 1 scales up to 3 (the upper bound of replicas: 0..3).

Expected output after scaling:

 NAME                  BACKEND          GPU  PRICE    STATUS   SUBMITTED  
 replica-groups-test                    -    -        running  9 mins ago 
    group=0 replica=0  aws (us-east-2)  -    $0.0832  running  8 mins ago 
            replica=2  aws (us-east-2)  -    $0.0832  running  3 mins ago 
    group=1 replica=1  aws (us-east-2)  -    $0.0832  running  8 mins ago 
            replica=3  aws (us-east-2)  -    $0.0832  running  3 mins ago 
            replica=4  aws (us-east-2)  -    $0.0832  running  3 mins ago
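
The replica counts above can be reproduced with a rough model of rps-based scaling. This is a sketch under stated assumptions, not the autoscaler code from this PR: it assumes the desired count is ceil(rps / target) clamped to the group's replicas range, that each group is evaluated against the full request rate of the load test, and it ignores any scale-up or scale-down delays. The function name desired_replicas is hypothetical.

```python
import math


def desired_replicas(rps, target, min_replicas, max_replicas):
    """Replicas needed to keep per-replica RPS at or below target, clamped to the range."""
    wanted = math.ceil(rps / target)
    return max(min_replicas, min(max_replicas, wanted))


# Assumption: every group sees the full load-test rate (RPS = 8, target = 2).
RANGES = {"group 0": (0, 2), "group 1": (0, 3)}

for name, (low, high) in RANGES.items():
    count = desired_replicas(rps=8, target=2, min_replicas=low, max_replicas=high)
    print(f"{name}: {count} replicas")
```

Under the same assumptions, a load of RPS = 2 gives one replica per group.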

Step4: Check whether the replica-specific commands were executed.
Attach to the desired replica, for example replica 2 (which belongs to group=0, the group named replica-1):
dstack attach --replica 2 replica-groups-test
ssh replica-groups-test-0-2 'cat /tmp/version.txt'
Output: Group 1 - Version 0
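
To repeat this check for several replicas, the ssh command can be generated instead of typed by hand. The sketch below is illustrative only: the host alias pattern run-job-replica is inferred from the single example above, and ssh_check_command and EXPECTED are hypothetical names. It only builds and prints the command; it does not connect anywhere.

```python
import shlex

RUN_NAME = "replica-groups-test"

# Text that each group's first command writes, keyed by the group index
# shown in the run table (group=0 is the group named replica-1).
EXPECTED = {0: "Group 1 - Version 0", 1: "Group 2 - Version 0"}


def ssh_check_command(replica, job=0, run_name=RUN_NAME):
    """Build the ssh command that prints /tmp/version.txt on one replica."""
    # Assumed alias pattern, inferred from "replica-groups-test-0-2" above.
    host = f"{run_name}-{job}-{replica}"
    return ["ssh", host, "cat /tmp/version.txt"]


print(shlex.join(ssh_check_command(replica=2)))
print("expected:", EXPECTED[0])
```

The list form can be passed to subprocess.run after attaching to the replica, and its output compared with the EXPECTED entry for that replica's group.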

Step5: Check rolling deployment.
Important:
Rolling deployments are currently affected by a race condition that also impacts the non-replica-group implementation and must be addressed separately (issue). However, when each replica group is running a single replica, this race condition does not affect rolling deployments.

Testing instructions:

1. Scale down each replica group to 1 replica: restart the load-testing script with RPS = 2.

2. After all groups have scaled down to a single replica, re-apply the configuration:

dstack apply -f replica-groups-service.yml

Active run replica-groups-test already exists. Detected changes that can be updated in-place:
- Configuration properties:
  - replica_groups

Update the run? [y/n]: y
 NAME                  BACKEND          GPU  PRICE    STATUS      SUBMITTED 
 replica-groups-test                    -    -        running     07:51     
    group=0 replica=0  aws (us-east-2)  -    $0.0832  terminated  07:51     
            replica=2  aws (us-east-2)  -    $0.0832  running     07:53     
    group=1 replica=1  aws (us-east-2)  -    $0.0832  terminated  07:51     
            replica=3  aws (us-east-2)  -    $0.0832  running     07:53     
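
One way to confirm the rollout finished is to check that every group ends up with exactly one running replica. The sketch below parses a run table in the text layout shown above. It is a fragile, illustrative helper rather than a dstack API: it assumes columns are separated by two or more spaces and that STATUS is the fifth column, and replicas_by_group and rollout_done are hypothetical names.

```python
import re

SAMPLE = """\
 NAME                  BACKEND          GPU  PRICE    STATUS      SUBMITTED
 replica-groups-test                    -    -        running     07:51
    group=0 replica=0  aws (us-east-2)  -    $0.0832  terminated  07:51
            replica=2  aws (us-east-2)  -    $0.0832  running     07:53
    group=1 replica=1  aws (us-east-2)  -    $0.0832  terminated  07:51
            replica=3  aws (us-east-2)  -    $0.0832  running     07:53
"""


def replicas_by_group(table):
    """Return {group index: {replica number: status}} from a run table."""
    groups = {}
    current = None
    for line in table.splitlines():
        fields = re.split(r"\s{2,}", line.strip())
        match = re.fullmatch(r"(?:group=(\d+) )?replica=(\d+)", fields[0])
        if match is None or len(fields) < 5:
            continue  # header, run summary row, or blank line
        if match.group(1) is not None:
            current = int(match.group(1))  # rows without group= continue the last group
        groups.setdefault(current, {})[int(match.group(2))] = fields[4]
    return groups


def rollout_done(groups):
    """True when every group has exactly one running replica."""
    return all(
        sum(status == "running" for status in replicas.values()) == 1
        for replicas in groups.values()
    )


print(replicas_by_group(SAMPLE))
print("rollout done:", rollout_done(replicas_by_group(SAMPLE)))
```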

add_replica_groups_model

Replica Groups AutoScaling

Rolling deployment and UI

Replica Groups implementation clean up
@Bihan
Collaborator Author

Bihan commented Dec 20, 2025

I will resolve merge conflicts as the review continues.

@Bihan
Collaborator Author

Bihan commented Dec 20, 2025

Related PRs

#3205 from @DragonStuff

@peterschmidt85
Contributor

peterschmidt85 commented Dec 20, 2025

@Bihan Do we really need replica group names?

@peterschmidt85
Contributor

@Bihan Also please check the conflicts with master.

@peterschmidt85
Contributor

Cosmetics only: I would rename replica_groups to replicas and also rename replicas under replica_groups to count.
