vulkan: Allow non-pow2 n_experts in topk_moe #17872

jeffbolznv · 2025-12-08T21:21:07Z

Saw granite-3.0-3b-a800m-instruct-Q8_0.gguf being used at https://www.phoronix.com/review/llama-cpp-vulkan-eoy2025/3, with lower-than-expected scaling on the RTX 5090.
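For readers unfamiliar with the term in the title: a top-k MoE routing step typically applies a softmax over the per-token expert logits and keeps the k largest weights. The sketch below is a plain CPU reference showing one way such a step can accept an expert count that is not a power of two, by padding the logits to the next power of two with negative infinity so the extra lanes can never be selected. It is not the shader code from this PR. The function names, the padding strategy, and the expert count of 40 are all assumptions chosen for illustration.

```python
import math

def is_pow2(n):
    """True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0

def next_pow2(n):
    """Smallest power of two that is >= n."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()

def topk_moe_reference(logits, k):
    """Softmax over expert logits, then return the k largest (index, weight) pairs.

    Hypothetical helper: pads to a power-of-two width with -inf, which
    contributes exp(-inf) == 0.0 to the softmax and is never picked.
    """
    n_experts = len(logits)
    padded = list(logits) + [-math.inf] * (next_pow2(n_experts) - n_experts)
    m = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in padded]  # padded lanes become 0.0
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort by descending weight; ties resolve to the lower expert index.
    order = sorted(range(len(probs)), key=lambda i: (-probs[i], i))
    return [(i, probs[i]) for i in order[:k]]

# Illustrative call with a non-power-of-two expert count (assumed value).
logits = [0.1 * i for i in range(40)]
selected = topk_moe_reference(logits, 8)
print([i for i, _ in selected])
```

The padded lanes carry zero weight, so the result matches a top-k over the original 40 entries as long as k does not exceed the real expert count.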

before:

Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo>llama-bench -fa 1 -p 512 -n 128 --prio 1 -r 10 -m c:\models\granite-3.0-3b-a800m-instruct-Q8_0.gguf
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| granitemoe 3B Q8_0             |   3.34 GiB |     3.37 B | Vulkan     |  99 |  1 |           pp512 |    11545.75 ± 199.63 |
| granitemoe 3B Q8_0             |   3.34 GiB |     3.37 B | Vulkan     |  99 |  1 |           tg128 |        305.26 ± 3.35 |

ggml_vulkan: 0 = NVIDIA GeForce RTX 5090 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| granitemoe 3B Q8_0             |   3.34 GiB |     3.37 B | Vulkan     |  99 |  1 |           pp512 |   26586.25 ± 3416.45 |
| granitemoe 3B Q8_0             |   3.34 GiB |     3.37 B | Vulkan     |  99 |  1 |           tg128 |        455.17 ± 6.23 |

after:

Z:\github\jeffbolznv\llama.cpp\build\bin\RelWithDebInfo>llama-bench -fa 1 -p 512 -n 128 --prio 1 -r 10 -m c:\models\granite-3.0-3b-a800m-instruct-Q8_0.gguf
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| granitemoe 3B Q8_0             |   3.34 GiB |     3.37 B | Vulkan     |  99 |  1 |           pp512 |    11704.54 ± 179.34 |
| granitemoe 3B Q8_0             |   3.34 GiB |     3.37 B | Vulkan     |  99 |  1 |           tg128 |        325.64 ± 1.62 |

ggml_vulkan: 0 = NVIDIA GeForce RTX 5090 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
| model                          |       size |     params | backend    | ngl | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -: | --------------: | -------------------: |
| granitemoe 3B Q8_0             |   3.34 GiB |     3.37 B | Vulkan     |  99 |  1 |           pp512 |    28154.26 ± 504.16 |
| granitemoe 3B Q8_0             |   3.34 GiB |     3.37 B | Vulkan     |  99 |  1 |           tg128 |       521.35 ± 10.23 |
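To make the before/after comparison easier to read, the relative changes can be computed from the mean t/s values in the tables above. This is a small sketch using only those reported means; the helper names are made up for illustration, and the ± deviations are ignored in the arithmetic.

```python
# Mean throughput (tokens/s) copied from the llama-bench tables above.
before = {
    ("RTX 4070", "pp512"): 11545.75,
    ("RTX 4070", "tg128"): 305.26,
    ("RTX 5090", "pp512"): 26586.25,
    ("RTX 5090", "tg128"): 455.17,
}
after = {
    ("RTX 4070", "pp512"): 11704.54,
    ("RTX 4070", "tg128"): 325.64,
    ("RTX 5090", "pp512"): 28154.26,
    ("RTX 5090", "tg128"): 521.35,
}

def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

def scaling(results, test):
    """Ratio of RTX 5090 throughput to RTX 4070 throughput for one test."""
    return results[("RTX 5090", test)] / results[("RTX 4070", test)]

for (gpu, test), old in before.items():
    print(f"{gpu} {test}: {percent_change(old, after[(gpu, test)]):+.1f}%")

for test in ("pp512", "tg128"):
    print(f"5090/4070 {test}: {scaling(before, test):.2f}x -> {scaling(after, test):.2f}x")
```

By these means, tg128 improves by roughly 6.7% on the RTX 4070 and 14.5% on the RTX 5090, and the 5090/4070 ratio for tg128 goes from about 1.49x to about 1.60x. The pp512 change on the RTX 4070 (about 1.4%) is smaller than the reported run-to-run deviation, so it should not be read as a clear gain.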

vulkan: Allow non-pow2 n_experts in topk_moe

9fee3f5

jeffbolznv requested review from 0cc4m and ggerganov as code owners December 8, 2025 21:21

loci-dev mentioned this pull request Dec 8, 2025

UPSTREAM PR #17872: vulkan: Allow non-pow2 n_experts in topk_moe auroralabs-loci/llama.cpp#492

Open

github-actions bot added the testing, Vulkan, and ggml labels Dec 9, 2025
