@sivakami-projects sivakami-projects commented Nov 24, 2025

Pipeline to run repeated tests on long-running SwiftV2 AKS clusters.

Test pipeline: tests are scheduled to run every 3 hours in the centraluseuap region. Link to Pipeline
Recent test run

Testing Approach
Test Lifecycle (per stage):
Create 8 pod scenarios with PodNetwork, PodNetworkInstance, Pods
Run 9 connectivity tests (HTTP-based)
Run private endpoint tests (storage access)
Delete all resources (Phase 1: Pods, Phase 2: PNI/PN/Namespaces)
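
The two-phase cleanup in the last step can be sketched as an ordering over the created resources. This is a minimal illustration, not the code in datapath_delete_test.go: the `Resource` type, `deletionRank`, and `deletionOrder` are hypothetical names, and the order inside phase 2 (PNI, then PN, then Namespaces) is assumed from the order in which the description lists them.

```go
package main

import (
	"fmt"
	"sort"
)

// Resource is a hypothetical record for one object created by a test stage.
type Resource struct {
	Kind string // "Pod", "PodNetworkInstance", "PodNetwork", or "Namespace"
	Name string
}

// deletionRank assigns each kind its position in the cleanup sequence.
// Rank 0 is phase 1 (Pods); ranks 1-3 are phase 2 (PNI, PN, Namespaces).
// Kinds missing from the map get rank 0 and are deleted alongside Pods.
var deletionRank = map[string]int{
	"Pod":                0,
	"PodNetworkInstance": 1,
	"PodNetwork":         2,
	"Namespace":          3,
}

// deletionOrder returns a sorted copy of resources so that every Pod is
// deleted before any PodNetworkInstance, PodNetwork, or Namespace.
func deletionOrder(resources []Resource) []Resource {
	ordered := append([]Resource(nil), resources...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return deletionRank[ordered[i].Kind] < deletionRank[ordered[j].Kind]
	})
	return ordered
}

func main() {
	// Placeholder names; real names come from the test scenarios.
	created := []Resource{
		{"Namespace", "ns-a"},
		{"PodNetwork", "pn-a"},
		{"PodNetworkInstance", "pni-a"},
		{"Pod", "pod-a"},
	}
	for _, r := range deletionOrder(created) {
		fmt.Printf("delete %s/%s\n", r.Kind, r.Name)
	}
}
```

Deleting Pods first matters because, per the commit notes, removing a PNI that is still referenced fails with ReservationInUse.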

Node Selection:
Tests filter by workload-type=$WORKLOAD_TYPE AND nic-capacity labels
Ensures isolation between different workload type stages
Currently: WORKLOAD_TYPE=swiftv2-linux
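
As a rough sketch of the node filter, the two labels can be combined into one Kubernetes label selector, where a comma means logical AND. The helper name `nodeSelector` and the `nic-capacity` value used below are assumptions for illustration; the actual label values are not listed in this description.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeSelector builds a label selector string in which both requirements
// must match. An empty nicCapacity drops the second requirement.
func nodeSelector(workloadType, nicCapacity string) string {
	parts := []string{"workload-type=" + workloadType}
	if nicCapacity != "" {
		parts = append(parts, "nic-capacity="+nicCapacity)
	}
	return strings.Join(parts, ",")
}

func main() {
	// "high-nic" is a placeholder value, not taken from the PR.
	fmt.Println(nodeSelector("swiftv2-linux", "high-nic"))
}
```

A string like this could be passed to `kubectl get nodes -l <selector>`, so each stage only sees nodes carrying its own workload type.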

Files Changed
Pipeline Configuration
pipeline.yaml: Main pipeline with schedule trigger
long-running-pipeline-template.yaml: Stage definitions with VM SKU constants

Setup Scripts
create_aks.sh: AKS cluster creation with node labeling
create_vnets.sh: Customer VNet creation
create_peerings.sh: VNet peering mesh
create_storage.sh: Storage accounts with public access disabled (SA1 only)
create_nsg.sh: NSG rule application with retry logic
create_pe.sh: Private endpoint and DNS zone setup

Test Code
datapath.go: Enhanced with node label filtering, private endpoint testing
datapath_create_test.go: Resource creation scenarios
datapath_connectivity_test.go: HTTP connectivity validation
datapath_private_endpoint_test.go: Private endpoint access/isolation tests
datapath_delete_test.go: Resource cleanup

Documentation
README.md:

Reason for Change:

Issue Fixed:

Requirements:

Notes:

sivakami added 2 commits November 24, 2025 08:38
- Implemented scheduled pipeline running every 1 hour with persistent infrastructure
- Split test execution into 2 jobs: Create (with 20min wait) and Delete
- Added 8 test scenarios across 2 AKS clusters, 4 VNets, different subnets
- Implemented two-phase deletion strategy to prevent PNI ReservationInUse errors
- Added context timeouts on kubectl commands with force delete fallbacks
- Resource naming uses RG name as BUILD_ID for uniqueness across parallel setups
- Added SkipAutoDeleteTill tags to prevent automatic resource cleanup
- Conditional setup stages controlled by runSetupStages parameter
- Auto-generate RG name from location or allow custom names for parallel setups
- Added comprehensive README with setup instructions and troubleshooting
- Node selection by agentpool labels with usage tracking to prevent conflicts
- Kubernetes naming compliance (RFC 1123) for all resources

fix ginkgo flag.

Add datapath tests.

Delete old test file.

Add testcases for private endpoint.

Ginkgo run specs only on specified files.

update pipeline params.

Add ginkgo tags

Add datapath tests.

Add ginkgo build tags.

remove wait time.

set namespace.

update pod image.

Add more nsg rules to block subnets s1 and s2

test change.

Change delegated subnet address range. Use delegated interface for network connectivity tests.

Datapath test between clusters.

test.

test private endpoints.

fix private endpoint tests.

Set storage account names in output var.

set storage account name.

fix pn names.

update pe

update pe test.

update sas token generation.

Add node labels for sw2 scenario, cleanup pods on any test failure.

enable nsg tests.

update storage.

Add rules to nsg.

disable private endpoint negative test.

disable public network access on storage account with private endpoint.

wait for default nsg to be created.

disable negative test on private endpoint.

private endpoint depends on aks cluster vnets, change pipeline job dependencies.

Add node labels for each workload type and nic capacity.

make sku constant.

Update readme, set schedule for long running cluster on test branch.
@sivakami-projects sivakami-projects marked this pull request as ready for review November 24, 2025 17:07
@sivakami-projects sivakami-projects requested a review from a team as a code owner November 24, 2025 17:07
Copilot finished reviewing on behalf of sivakami-projects November 24, 2025 17:09
Copilot AI left a comment

Pull request overview

This PR adds a comprehensive long-running test pipeline for SwiftV2 pod networking on Azure AKS. The pipeline creates persistent infrastructure (2 AKS clusters, 4 VNets, storage accounts with private endpoints, NSGs) and runs scheduled tests every 3 hours to validate pod-to-pod connectivity, network security group isolation, and private endpoint access across multi-tenant scenarios.

Key Changes:

  • Adds scheduled pipeline with conditional infrastructure setup (runSetupStages parameter)
  • Implements 8 pod test scenarios across 2 clusters and 4 VNets with different NIC capacities
  • Includes 9 connectivity tests and 5 private endpoint tests with tenant isolation validation

Reviewed changes

Copilot reviewed 19 out of 20 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| .pipelines/swiftv2-long-running/pipeline.yaml | Main pipeline with 3-hour scheduled trigger and runSetupStages parameter |
| .pipelines/swiftv2-long-running/template/long-running-pipeline-template.yaml | Two-stage template: setup (conditional) and datapath tests with 4 jobs |
| .pipelines/swiftv2-long-running/scripts/*.sh | Infrastructure setup scripts for AKS, VNets, storage, NSGs, and private endpoints |
| test/integration/swiftv2/longRunningCluster/datapath*.go | Test implementation split into create, connectivity, private endpoint, and delete tests |
| test/integration/swiftv2/helpers/az_helpers.go | Azure CLI and kubectl helper functions for resource management |
| test/integration/manifests/swiftv2/long-running-cluster/*.yaml | Kubernetes resource templates for PodNetwork, PNI, and Pods |
| go.mod, go.sum | Updates to support Ginkgo v2 testing framework |
| hack/aks/Makefile | Updates for SwiftV2 cluster creation with multi-tenancy tags |
| .pipelines/swiftv2-long-running/README.md | Comprehensive documentation of pipeline architecture and test scenarios |

sleep_seconds=15
retry_count=0

while [[ $retry_count -lt $max_retries ]]; do

Contributor

What are the minimum and maximum wait times for the resource creation step? Does it wait at most 20 minutes for cluster completion?

Contributor Author

These retries are for stamp infra vnets. Cluster creation job has no timeout set.

echo "Pod IP: $(hostname -i)";
echo "Starting HTTP server on port 8080";
# Create a simple HTTP server directory

Contributor

Do we need to define this index page instead of using a library image? Please mention the use case if we are adding details to print.

// Format: pn-<rg>-<vnet-prefix>-<subnet-name>
// Example: pn-sv2-long-run-centraluseuap-a1-s1
getNamespace := func(vnetName, subnetName string) string {
// Extract vnet prefix (a1, a2, a3, b1, etc.) from cx_vnet_a1 -> a1

Contributor

nit: rename the VNets to vn1, vn2, and so on, to make the pod naming clearer.
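
To make the naming scheme in the quoted comment concrete, here is a small standalone sketch of the `pn-<rg>-<vnet-prefix>-<subnet-name>` format. `namespaceFor` is a hypothetical stand-in for the PR's `getNamespace` closure; lowercasing and replacing underscores are assumptions added to keep the result within RFC 1123 label characters, and the 63-character limit is not enforced here.

```go
package main

import (
	"fmt"
	"strings"
)

// namespaceFor derives a namespace name from the resource group, the customer
// VNet name (for example cx_vnet_a1), and the subnet name.
func namespaceFor(rg, vnetName, subnetName string) string {
	// cx_vnet_a1 -> a1
	prefix := strings.TrimPrefix(vnetName, "cx_vnet_")
	name := fmt.Sprintf("pn-%s-%s-%s", rg, prefix, subnetName)
	// RFC 1123 labels allow only lowercase alphanumerics and '-'.
	return strings.ToLower(strings.ReplaceAll(name, "_", "-"))
}

func main() {
	fmt.Println(namespaceFor("sv2-long-run-centraluseuap", "cx_vnet_a1", "s1"))
	// prints: pn-sv2-long-run-centraluseuap-a1-s1
}
```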

ginkgo.Fail(fmt.Sprintf("Missing required environment variables: RG='%s', BUILD_ID='%s'", rg, buildId))
}

ginkgo.It("deletes PodNetwork, PodNetworkInstance, and Pods", ginkgo.NodeTimeout(0), func() {

Contributor

Does it check for any pending resources such as MTPNC? MTPNC deletion sometimes gets stuck in ICMs, so it would be good to check.

Contributor Author

Let me add that.

sivakami-projects and others added 5 commits December 5, 2025 10:43
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: sivakami-projects <126191544+sivakami-projects@users.noreply.github.com>
…st.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: sivakami-projects <126191544+sivakami-projects@users.noreply.github.com>
…st.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: sivakami-projects <126191544+sivakami-projects@users.noreply.github.com>
…ity_test.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: sivakami-projects <126191544+sivakami-projects@users.noreply.github.com>
…st.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: sivakami-projects <126191544+sivakami-projects@users.noreply.github.com>