diff --git a/CHANGELOG.md b/CHANGELOG.md
index b78134f882..9ad4ba70cd 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,7 @@ CHANGELOG
 ------
 
 **CHANGES**
+- Improve cluster update resiliency by ensuring clustermgtd is started after updates complete successfully, or after failed updates where queue reconfiguration succeeded.
 - Add chef attribute `cluster/in_place_update_on_fleet_enabled` to disable in-place updates on compute and login nodes and mitigate performance impact at scale.
 - Upgrade Slurm to version 24.11.7 (from 24.11.6).
 
@@ -24,10 +25,13 @@ CHANGELOG
 - Open MPI: openmpi40-aws-4.1.7-2 and openmpi50-aws-5.0.8-11
 
 **BUG FIXES**
+- Fix an issue where cfn-hup enters an endless loop on the head node after a rollback to a cluster state older than 24 hours, caused by cfn-signal failing to signal an expired wait condition handle.
+- Fix race condition where compute nodes could deploy the wrong cluster config version after an update failure.
+- Prevent cluster readiness check failures due to instances launched while the check is in progress.
+- Fix incorrect timestamp parsing for chef-client.log in CloudWatch Agent configuration.
 - Reduce EFA installation time for Ubuntu by ~20 minutes by only holding kernel packages for the installed kernel.
 - Add GetFunction and GetPolicy permissions to PClusterBuildImageCleanupRole to prevent AccessDenied errors during build image stack deletion.
 - Fix validation error messages when `DevSettings` is null or `DevSettings/InstanceTypesData` is missing required fields.
-- Fix an issue where cfn-hup enters an endless loop on the head node after a rollback to a cluster state older than 24 hours, caused by cfn-signal failing to signal an expired wait condition handle.
 
 3.14.0
 ------