
Conversation

@SS-JIA (Contributor) commented Dec 22, 2025

Summary:

A use-after-free bug in find_compute_queues() was discovered using Valgrind. The function created a local queue_priorities vector inside a loop and stored its data pointer in VkDeviceQueueCreateInfo. When the vector went out of scope, its memory was freed, but vkCreateDevice() later accessed this freed memory. Fixed by adding a queue_priorities parameter to persist the data until after vkCreateDevice() completes.

Problem

A use-after-free bug was discovered in find_compute_queues() using Valgrind.
The function created a local std::vector<float> queue_priorities inside a
loop and stored its data pointer in VkDeviceQueueCreateInfo. When the vector
went out of scope at the end of each iteration, its memory was freed. Later,
when vkCreateDevice() accessed these queue priorities, it read from freed
memory.
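
For illustration, here is a minimal sketch of the problematic pattern, assuming a loop over compute queue families (the helper name build_queue_infos_buggy, the parameter names, and the priority value are placeholders, not the actual ExecuTorch code):

```cpp
#include <vulkan/vulkan.h>

#include <vector>

// Sketch of the bug: each iteration's queue_priorities is destroyed at the
// end of the loop body, but its data pointer escapes via pQueuePriorities.
std::vector<VkDeviceQueueCreateInfo> build_queue_infos_buggy(
    const std::vector<uint32_t>& compute_family_indices,
    uint32_t queues_per_family) {
  std::vector<VkDeviceQueueCreateInfo> queue_create_infos;
  for (const uint32_t family_idx : compute_family_indices) {
    // Local vector: its heap buffer is freed when this iteration ends.
    std::vector<float> queue_priorities(queues_per_family, 1.0f);

    VkDeviceQueueCreateInfo info{};
    info.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
    info.queueFamilyIndex = family_idx;
    info.queueCount = queues_per_family;
    // Dangling pointer: this outlives queue_priorities.
    info.pQueuePriorities = queue_priorities.data();
    queue_create_infos.push_back(info);
  }
  // By the time vkCreateDevice() reads pQueuePriorities, the memory is freed.
  return queue_create_infos;
}
```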

Investigation

Valgrind reported:

  • Invalid read of size 4 at 0xb3bdd60 (freed memory)
  • Block was freed by operator delete in find_compute_queues()
  • Block was allocated by operator new in find_compute_queues()
  • Error occurred during vkCreateDevice() call

Fix

Modified find_compute_queues() to accept an additional parameter
std::vector<std::vector<float>>& queue_priorities that persists the
queue priority data until after vkCreateDevice() completes. This ensures
the memory remains valid when Vulkan needs to access it.
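
A minimal sketch of the fixed pattern, assuming the caller passes in a container that owns the priority data (function and parameter names are illustrative, not the exact source):

```cpp
#include <vulkan/vulkan.h>

#include <vector>

// Sketch of the fix: priorities are stored in a caller-owned vector, so the
// buffers stay alive until after vkCreateDevice() returns.
std::vector<VkDeviceQueueCreateInfo> build_queue_infos_fixed(
    const std::vector<uint32_t>& compute_family_indices,
    uint32_t queues_per_family,
    std::vector<std::vector<float>>& queue_priorities) {
  std::vector<VkDeviceQueueCreateInfo> queue_create_infos;
  for (const uint32_t family_idx : compute_family_indices) {
    // Construct the priority list in the caller-owned container.
    queue_priorities.emplace_back(queues_per_family, 1.0f);

    VkDeviceQueueCreateInfo info{};
    info.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
    info.queueFamilyIndex = family_idx;
    info.queueCount = queues_per_family;
    info.pQueuePriorities = queue_priorities.back().data();
    queue_create_infos.push_back(info);
  }
  return queue_create_infos;
}
```

Note that even if the outer std::vector<std::vector<float>> reallocates, pointers obtained from the inner vectors' data() remain valid, since moving a std::vector preserves its heap buffer.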

Updated all call sites (see the caller-side sketch after this list):

  • create_logical_device()
  • Adapter constructor (external device variant)
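
Hypothetical caller-side sketch showing why the extra parameter fixes the lifetime (build_queue_infos_fixed above stands in for find_compute_queues; the actual signatures in create_logical_device() and the Adapter constructor differ):

```cpp
// queue_priorities is declared in the caller, so it outlives vkCreateDevice().
VkDevice create_device_sketch(
    VkPhysicalDevice physical_device,
    const std::vector<uint32_t>& compute_family_indices) {
  std::vector<std::vector<float>> queue_priorities;
  std::vector<VkDeviceQueueCreateInfo> queue_create_infos =
      build_queue_infos_fixed(compute_family_indices, 1u, queue_priorities);

  VkDeviceCreateInfo device_create_info{};
  device_create_info.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
  device_create_info.queueCreateInfoCount =
      static_cast<uint32_t>(queue_create_infos.size());
  device_create_info.pQueueCreateInfos = queue_create_infos.data();

  VkDevice device = VK_NULL_HANDLE;
  // The priority data is still valid here because queue_priorities is in scope.
  vkCreateDevice(physical_device, &device_create_info, nullptr, &device);
  return device;
}
```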

Verification

Valgrind results before fix: 296 errors from 13 contexts, 1 Invalid read
Valgrind results after fix: 295 errors from 12 contexts, 0 Invalid reads ✓

Remaining errors are in NVIDIA drivers and third-party libraries.

cc @manuelcandales @digantdesai @cbilgin

@pytorch-bot bot added the module: vulkan label Dec 22, 2025
@pytorch-bot bot commented Dec 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16367

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Cancelled Job, 1 Unrelated Failure

As of commit 128b9e8 with merge base 0ee2f49:

CANCELLED JOB - The following job was cancelled. Please retry:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the CLA Signed label Dec 22, 2025
@github-actions bot commented

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@meta-codesync bot commented Dec 22, 2025

@SS-JIA has imported this pull request. If you are a Meta employee, you can view this in D89687612.

@meta-codesync bot merged commit c59acfb into main Dec 23, 2025
166 of 171 checks passed
@meta-codesync bot deleted the pr16367 branch December 23, 2025 03:27