@tomassrnka

Changes

  • add snapshot-editor tsc set and tsc clear subcommands to edit the per-vCPU tsc_khz values in vmstate snapshots
  • usage examples:
      snapshot-editor tsc set --vmstate-path <file> --tsc-khz 2500000
      snapshot-editor tsc clear --vmstate-path <file>
  • auto-detect the host TSC frequency on x86_64 (elsewhere, fall back to an explicit --tsc-khz)
  • ensure the snapshot header, version, and CRC are preserved when rewriting vmstate files (reuse the existing snapshot instead of re-emitting a new one)
  • add a unit test covering in-place set/clear behavior
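
The set/clear behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `VcpuState` here is a hypothetical, simplified stand-in for the per-vCPU state stored in a vmstate file, and the helper names `set_tsc_khz` and `clear_tsc_khz` are made up for the example. Reading and writing the vmstate file itself (header, version, CRC) is deliberately left out.

```rust
/// Hypothetical, simplified stand-in for the per-vCPU state in a vmstate file.
/// The real structure carries much more than the TSC frequency.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VcpuState {
    /// Saved guest TSC frequency in kHz, if one was recorded.
    pub tsc_khz: Option<u32>,
}

/// Stamp every vCPU with the given TSC frequency (the effect described for `tsc set`).
pub fn set_tsc_khz(vcpus: &mut [VcpuState], tsc_khz: u32) {
    for vcpu in vcpus.iter_mut() {
        vcpu.tsc_khz = Some(tsc_khz);
    }
}

/// Drop the saved TSC frequency from every vCPU (the effect described for `tsc clear`).
pub fn clear_tsc_khz(vcpus: &mut [VcpuState]) {
    for vcpu in vcpus.iter_mut() {
        vcpu.tsc_khz = None;
    }
}
```

Under these assumptions, calling `set_tsc_khz(&mut vcpus, 2_500_000)` corresponds to `--tsc-khz 2500000` (2.5 GHz expressed in kHz), and `clear_tsc_khz(&mut vcpus)` leaves every vCPU without a recorded frequency.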

Reason

  • when moving paused instances between similar CPU SKUs (e.g., GCP N4 → C4), resume can fail due to mismatched TSC frequencies; these commands let us
    clear stale TSC values and stamp the target TSC so snapshots resume cleanly
  • supports mixing instance types in https://github.com/e2b-dev/infra

Testing

  • cargo test -p snapshot-editor tsc::tests::test_tsc_set_and_clear_in_place
  • cargo check -p snapshot-editor
  • tools/devtool checkstyle

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes
    build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

Add set/clear TSC CLI subcommands and wire them into snapshot-editor.

Signed-off-by: Tomas Srnka <tomas.srnka@gmail.com>

Document snapshot-editor TSC set/clear commands with usage examples.

Signed-off-by: Tomas Srnka <tomas.srnka@gmail.com>

Rewrite TSC set/clear to always modify the input vmstate file in-place.
Update docs and changelog to reflect in-place behavior.

Signed-off-by: Tomas Srnka <tomas.srnka@gmail.com>

Add a unit test that sets and clears TSC values in-place.
Verify vCPU metadata updates when reloading the vmstate.

Signed-off-by: Tomas Srnka <tomas.srnka@gmail.com>

Allow MissingFrequency to be dead_code on x86_64 to avoid warnings.

Signed-off-by: Tomas Srnka <tomas.srnka@gmail.com>

Insert a blank line after the Added heading so mdformat style passes.

Signed-off-by: Tomas Srnka <tomas.srnka@gmail.com>

@tomassrnka tomassrnka marked this pull request as ready for review December 15, 2025 19:21
@ShadowCurse
Contributor

Hi @tomassrnka, can you elaborate on how snapshot restore fails for you?
In general, the TSC frequency stored in the snapshot is what the guest expects it to be, so changing it will confuse the guest if it relies on the TSC for timing. During snapshot restore, Firecracker sets the TSC frequency for KVM, so the kernel can emulate the correct guest frequency if the host's frequency differs. This should allow VMs to be transferred between hosts with different TSC frequencies.
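
The restore-time decision described here can be sketched as a simple comparison between the host frequency and the frequency saved in the snapshot. This is a hedged illustration rather than a copy of Firecracker's code: the function takes both frequencies as plain arguments instead of querying KVM, and the 250 parts-per-million tolerance is an assumption made for the example.

```rust
/// Assumed tolerance for expected TSC frequency variation: 250 parts per million.
/// The exact value is an assumption for this sketch.
const TSC_KHZ_TOL_NUMERATOR: i64 = 250;
const TSC_KHZ_TOL_DENOMINATOR: i64 = 1_000_000;

/// Returns true when the host TSC frequency differs from the snapshot's by more
/// than the tolerance, i.e. when the guest frequency would have to be scaled.
pub fn is_tsc_scaling_required(host_tsc_khz: u32, snapshot_tsc_khz: u32) -> bool {
    // Work in i64 so the subtraction and multiplication cannot overflow.
    let diff = (i64::from(host_tsc_khz) - i64::from(snapshot_tsc_khz)).abs();
    let tolerance =
        i64::from(snapshot_tsc_khz) * TSC_KHZ_TOL_NUMERATOR / TSC_KHZ_TOL_DENOMINATOR;
    diff > tolerance
}
```

With these assumptions, a snapshot taken at 2,500,000 kHz tolerates a host that is off by up to 625 kHz; anything beyond that means the restore path has to ask KVM to present the snapshot's frequency to the guest.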

@ilstam
Contributor

ilstam commented Dec 18, 2025

Firecracker sets the tsc frequency for kvm

Note that TSC scaling requires hardware support. If you are using nested virtualization, the host kernel also needs to be 5.15 or later. So we need to understand your failure mode.

@tomassrnka
Author

The real-world scenario we’re hitting is the following:

We create a snapshot on a C4 instance on GCP and can successfully restore it on an N4 instance. However, restoring a snapshot created on N4 back onto C4 fails due to the TSC constraint:

"kvm: user requested TSC rate below hardware speed"

Both C4 and N4 instances use the same CPU family (Intel Xeon Scalable – Emerald Rapids / Granite Rapids), so this was initially unexpected.

Unless a new TSC value is read from the destination host and the snapshot is adjusted accordingly, the restore cannot succeed.

As a result, we introduced two new calls in snapshot-editor that allow modifying the TSC values inside the snapshot before restore.

The error we observe without this adjustment is:

[PUT /snapshot/load][400] loadSnapshotBadRequest {"fault_message":"Load snapshot error: Failed to restore from snapshot: Failed to build microVM from snapshot: Could not set TSC scaling within the snapshot: Invalid argument (os error 22)"}

In other words, restoring works when the destination host's TSC frequency is lower than the one recorded in the snapshot, but not the other way around.
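
The asymmetry can be illustrated with a small model of what the quoted kernel message ("user requested TSC rate below hardware speed") suggests. This is a simplified sketch under stated assumptions, not KVM's implementation: `TscRestore` and `predict_tsc_restore` are hypothetical names, the `has_hw_scaling` flag stands in for whether hardware TSC scaling is available to KVM on the destination, and frequency tolerance is ignored.

```rust
/// Hypothetical outcome of asking KVM to present the snapshot's TSC frequency.
#[derive(Debug, PartialEq, Eq)]
pub enum TscRestore {
    /// Frequencies match, nothing to scale.
    NoScalingNeeded,
    /// Hardware TSC scaling handles the difference.
    HardwareScaling,
    /// No hardware scaling, but the requested rate is above the host's,
    /// so the guest TSC can be kept ahead in software.
    SoftwareCatchup,
    /// No hardware scaling and the requested rate is below the host's:
    /// the case matching "user requested TSC rate below hardware speed".
    Rejected,
}

/// Simplified model: predict how a restore would handle the TSC frequency.
pub fn predict_tsc_restore(snapshot_khz: u32, host_khz: u32, has_hw_scaling: bool) -> TscRestore {
    if snapshot_khz == host_khz {
        TscRestore::NoScalingNeeded
    } else if has_hw_scaling {
        TscRestore::HardwareScaling
    } else if snapshot_khz > host_khz {
        TscRestore::SoftwareCatchup
    } else {
        TscRestore::Rejected
    }
}
```

Under this model, a one-directional failure like the one reported points at hardware TSC scaling not being available to KVM on the destination, which is why the snapshot's recorded frequency matters at all.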
