[Build] Fix Image Builder reboot failures on Ubuntu by holding snap refreshes during build #7153
Description of changes
Fix intermittent Image Builder failures on Ubuntu 22.04 and 24.04 where the build fails after the reboot step with SSM agent connectivity issues.
Failure
Image builds intermittently fail on Ubuntu 22.04 and 24.04 because the reboot of the build instance is reported as failed.
Root Cause
The failures occur when two revisions of the SSM agent snap are mounted at the same time during the build. Both revisions end up on the system because snap auto-refresh runs during the AMI build process and updates the SSM agent (installed via snap) in the background. When a reboot occurs while the snap refresh is in progress, or after it has left the system in a transitional state (multiple snap revisions mounted), the SSM agent fails to connect to SSM, so SSM marks the reboot as failed.
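As an illustration of the mitigation named in the title, the sketch below shows one way to hold snap refreshes for the duration of a build. It is not necessarily the exact change in this PR. The function names `hold_snap_refreshes` and `release_snap_refreshes` and the 24h default are hypothetical, and the sketch assumes snapd 2.58 or newer, which provides `snap refresh --hold` and `snap refresh --unhold`.

```shell
#!/usr/bin/env bash
# Sketch only: pause snap auto-refresh while an image is being built.
# Assumption: snapd >= 2.58 (`snap refresh --hold` / `--unhold`).
# Function names and the default duration are illustrative, not from the PR.
set -euo pipefail

hold_snap_refreshes() {
  # Bounded hold duration, so the hold expires even if release is never reached.
  local duration="${1:-24h}"
  # Nothing to do on images that do not ship snapd.
  if ! command -v snap >/dev/null 2>&1; then
    echo "snap not found, skipping refresh hold"
    return 0
  fi
  # Hold auto-refresh for all snaps, including the SSM agent snap.
  snap refresh --hold="${duration}"
}

release_snap_refreshes() {
  if ! command -v snap >/dev/null 2>&1; then
    return 0
  fi
  # Remove the hold so instances launched from the image refresh normally.
  snap refresh --unhold
}
```

In a build, `hold_snap_refreshes` would run before the first reboot step and `release_snap_refreshes` as one of the last steps, so the resulting AMI does not keep refreshes held. Note that a hold does not cancel a refresh that is already in progress when it is applied.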
Interesting details
The commands below were run on an instance launched from the AMI used as the parent image for the failed build.
They show that:
Tests
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.