Commit 7007a49

Update blog
1 parent a9de2f9 commit 7007a49

1 file changed: +1 -1 lines changed

content/blog/2025-10-27-1761560082.md

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ Therefore the job of running a computation graph (like ONNX) efficiently on GPU(
 - every machine in each factory is being utilized optimally
 - account for the time it takes to move things between cities/factories/machines
 
-And most importantly, you need to focus on your overall goal, i.e. either the time it takes to produce the finished product (i.e. latency) or maximum utilisation of all your machines (i.e. throughput).
+And most importantly, you need to focus on your overall goal, i.e. either the time it takes to produce the finished product (i.e. latency), or maximum utilisation of all your machines (i.e. throughput), or maybe power efficiency.
 
 If you're supporting multiple models, then you're dealing with multiple computation graphs. And if you're supporting multiple GPU vendors (NVIDIA, AMD etc), and multiple architectures of each vendor (e.g. 3060, 4080, 5080 etc), then you're dealing with multiple factory configurations.
 
