Conversation

@jdemeule commented Dec 8, 2025

With #15906, I noticed an important regression when using the Metal backend on an eGPU.
This commit restores the previous behavior and adds an option to force its activation.
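
For illustration, the kind of gating described here could look like the following Objective-C sketch. The environment-variable and function names are hypothetical (not the identifiers from this PR), and MTLDevice.removable is used only as a plausible way to detect an eGPU:

```objc
// Illustrative sketch only: disable the new path by default on removable
// (external) GPUs, but let an environment variable force it back on.
// GGML_METAL_FORCE_FAST_PATH and ggml_metal_use_fast_path are made-up names.
#import <Metal/Metal.h>
#include <stdbool.h>
#include <stdlib.h>

static bool ggml_metal_use_fast_path(id<MTLDevice> device) {
    // Forced activation takes precedence over the device heuristic.
    if (getenv("GGML_METAL_FORCE_FAST_PATH") != NULL) {
        return true;
    }
    // MTLDevice.removable is YES for eGPUs in Thunderbolt enclosures; keep
    // the pre-#15906 behavior there, where the new path triggers DMA traffic.
    return !device.removable;
}
```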

Before #15906, llama-bench on gemma 3 gave me this kind of result:

$ ./llama-bench --model ggml-org_gemma-3-4b-it-GGUF_gemma-3-4b-it-Q4_K_M.gguf -r 1 --no-warmup
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| gemma3 4B Q4_K - Medium        |   2.31 GiB |     3.88 B | Metal,BLAS |       6 |           pp512 |         48.72 ± 0.00 |
| gemma3 4B Q4_K - Medium        |   2.31 GiB |     3.88 B | Metal,BLAS |       6 |           tg128 |          5.95 ± 0.00 |

build: 33daece86 (6440)

So above 45 t/s on the pp test, and more than 5 t/s on the tg test.

After #15906, the pp test has improved but the tg test has been divided by 2.

$ ./llama-bench --model ggml-org_gemma-3-4b-it-GGUF_gemma-3-4b-it-Q4_K_M.gguf -r 1 --no-warmup
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| gemma3 4B Q4_K - Medium        |   2.31 GiB |     3.88 B | Metal,BLAS |       6 |           pp512 |         60.66 ± 0.00 |
| gemma3 4B Q4_K - Medium        |   2.31 GiB |     3.88 B | Metal,BLAS |       6 |           tg128 |          2.84 ± 0.00 |

build: 0f0a3c285 (6441)

Launching the benchmark with "Metal System Trace" in Instruments.app reveals some usage of the DMA1 channel, which introduces a lot of latency (at least, this is how I interpret it).

With this PR, performance on eGPU is back to what it was before, and no other configuration (dGPU and M1-M5) should be impacted.

$ ./llama-bench --model ggml-org_gemma-3-4b-it-GGUF_gemma-3-4b-it-Q4_K_M.gguf -r 1 --no-warmup
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| gemma3 4B Q4_K - Medium        |   2.31 GiB |     3.88 B | Metal,BLAS |       6 |           pp512 |         47.24 ± 0.00 |
| gemma3 4B Q4_K - Medium        |   2.31 GiB |     3.88 B | Metal,BLAS |       6 |           tg128 |          6.07 ± 0.00 |

build: b0db6483b (7327)
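
On a configuration where the default heuristic guesses wrong, the option added by this PR could then be used to force the new behavior back on, e.g. (with the hypothetical variable name from the sketch above):

$ GGML_METAL_FORCE_FAST_PATH=1 ./llama-bench --model ggml-org_gemma-3-4b-it-GGUF_gemma-3-4b-it-Q4_K_M.gguf -r 1 --no-warmup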

@ggerganov (Member) commented:

I'm not familiar with the concept of eGPU - is this running on an Intel Mac?
