@hgreebe hgreebe commented Dec 17, 2025

Description of changes

  • Changes how compute nodes handle in-place updates so that we no longer rely on cfn-hup running on compute nodes.
  • Replaces cfn-hup with a systemd timer that periodically checks a file in shared storage. The head node updates this file when a cluster update occurs, to signal the compute nodes to update. If the file contains a new cluster config version that has not yet been applied to the compute node, the timer's service runs an update.
  • The check-update.service has a 30-second timeout so that it does not run indefinitely if something hangs.
  • Reverts the changes for in_place_update_on_fleet_enabled from 6eda378#diff-6d6c58cce2dd575c0638ee245d9647b0dfa3cbdef86a136bd816d00538529fb4
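
For reviewers, the check described above can be sketched roughly as follows. This is an illustrative Python sketch only, not the code in this PR: the function names, the two file paths, and the `run_update` callback are all hypothetical, and it assumes the shared file simply holds the cluster config version as text. In the PR itself the 30-second limit is enforced by check-update.service, not by the check logic.

```python
from pathlib import Path
from typing import Callable, Optional


def read_version(path: Path) -> Optional[str]:
    """Return the stripped contents of a version file, or None if missing or empty."""
    try:
        text = path.read_text().strip()
    except FileNotFoundError:
        return None
    return text or None


def check_update(
    shared_file: Path,                 # hypothetical: file the head node writes in shared storage
    applied_file: Path,                # hypothetical: local record of the last applied version
    run_update: Callable[[str], bool]  # hypothetical: performs the update, returns True on success
) -> bool:
    """One timer tick: run an update only if a new, unapplied version is published.

    Returns True if an update ran and succeeded, False otherwise.
    """
    published = read_version(shared_file)
    if published is None:
        return False  # the head node has not signalled anything yet
    if published == read_version(applied_file):
        return False  # this node is already on the published version
    if not run_update(published):
        return False  # leave the applied record untouched so the next tick retries
    applied_file.write_text(published + "\n")
    return True
```

Because the applied version is recorded only after a successful update, a failed or timed-out run is retried on the next timer tick.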

Tests

  • Added unit tests covering the systemd timer and service, as well as the new update logic.
  • Ran all the update integration tests.

References

Please review the guidelines for contributing and Pull Request Instructions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@hgreebe hgreebe requested review from a team as code owners December 17, 2025 13:28
@hgreebe hgreebe force-pushed the develop branch 2 times, most recently from 5ce2f72 to 96d1cbd Compare December 17, 2025 15:38