content/blog/2025-10-27-1761560082.md
Lines changed: 1 addition & 1 deletion (+1 -1)
@@ -23,7 +23,7 @@ Therefore the job of running a computation graph (like ONNX) efficiently on GPU(
- every machine in each factory is being utilized optimally
- account for the time it takes to move things between cities/factories/machines
-And most importantly, you need to focus on your overall goal, i.e. either the time it takes to produce the finished product (i.e. latency) or maximum utilisation of all your machines (i.e. throughput).
+And most importantly, you need to focus on your overall goal, i.e. either the time it takes to produce the finished product (i.e. latency), or maximum utilisation of all your machines (i.e. throughput), or maybe power efficiency.
If you're supporting multiple models, then you're dealing with multiple computation graphs. And if you're supporting multiple GPU vendors (NVIDIA, AMD etc), and multiple architectures of each vendor (e.g. 3060, 4080, 5080 etc), then you're dealing with multiple factory configurations.