How to analyze etdump results for QNN backend? #16285
-
For QNN profiling, I would mainly look at two aspects: how much of the total execution time is spent inside the delegate (i.e., on HTP), and which operators fall back to CPU. The Qualcomm backend debugger documentation describes some related debugging and profiling workflows, though you may already be familiar with it.
-
DELEGATE_CALL is measured inside the Method::execute call. If the two times are similar, the delegate call (the execution time on HTP) is dominant.
DELEGATE_CALL covers everything that runs on HTP; individual operator events (like aten_bmm) indicate ops that fell back to CPU.
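To check this split programmatically, one option is the ExecuTorch devtools Inspector. Below is a minimal sketch, assuming the etdump was saved as etdump.etdp and that the event names match the DELEGATE_CALL / Method::execute labels above; the file name, event-block name, and exact event naming are assumptions to verify against your dump:

```python
from executorch.devtools import Inspector

# Parse the etdump produced by the runtime.
inspector = Inspector(etdump_path="etdump.etdp")

for block in inspector.event_blocks:
    # Assumption: the "Execute" block holds the per-inference timing events.
    if block.name != "Execute":
        continue
    events = {e.name: e for e in block.events if e.perf_data is not None}
    total = events["Method::execute"].perf_data.avg
    delegate = events["DELEGATE_CALL"].perf_data.avg
    print(f"Method::execute avg: {total:.3f}")
    print(f"DELEGATE_CALL avg:   {delegate:.3f} ({delegate / total:.1%} of total)")
    # Remaining operator-level events (e.g. aten_bmm) are candidates for
    # ops that fell back to CPU instead of running inside the HTP delegate.
    for name, e in events.items():
        if name not in ("Method::execute", "DELEGATE_CALL"):
            print(f"CPU fallback candidate: {name}: {e.perf_data.avg:.3f}")
```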
We can use optrace or the debugger (https://github.com/pytorch/executorch/tree/main/backends/qualcomm/debugger#qairt-visualizer), as shared by @yujiaoliang.
@haowhsu-quic @shewu-quic @winskuo-quic @DannyYuyang-quic, do we have guidance on this?
-
Hi @yujiaoliang,
Yes, you can dump QHAS and optrace to observe NPU utilization and TCM usage.
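For reference, here is a minimal sketch of opening such dumps with the QAIRT visualizer flow described in the debugger README; the view() signature and the file names are assumptions to verify against that README and your own artifacts:

```python
import qairt_visualizer

# Placeholder paths: substitute the model and the optrace/QHAS artifacts
# that your run actually produced.
qairt_visualizer.view(
    "model_binary.dlc",
    reports=["optrace.json", "qhas_summary.json"],
)
```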
-
Hello, regarding https://github.com/pytorch/executorch/tree/main/backends/qualcomm/debugger#limitation: is it currently possible to obtain the Qualcomm HTP Analysis Summary for LLM models, similar to the image you shared?
-
Hi @kimminsu38oo
Yes: QHAS and optrace are used for performance analysis, using dumps generated from the .pte, while the ExecuTorch QNN Intermediate Output Debugger is used to debug accuracy issues by comparing per-tensor outputs with CPU results.
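To illustrate the per-tensor comparison idea, here is a generic sketch (not the debugger's actual API); the file layout, op names, and error metric are hypothetical:

```python
import numpy as np

# Hypothetical layout: each op's intermediate output dumped to .npy files,
# once from the QNN run and once from the CPU reference run.
ops = ["layer0_conv", "layer0_relu", "layer1_linear"]  # placeholder names

for op in ops:
    qnn = np.load(f"qnn_outputs/{op}.npy")
    cpu = np.load(f"cpu_outputs/{op}.npy")
    # A simple per-tensor error metric helps localize where the QNN and
    # CPU results start to diverge.
    err = np.abs(qnn.astype(np.float32) - cpu.astype(np.float32)).max()
    print(f"{op}: max abs error = {err:.6f}")
```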
You can refer to the generate-optrace-and-qhas section of the debugger README.
Please note that the input order in the context binary may differ from the source model. You can check the input order in the JSON file dumped by:

```
<QNN_SDK_ROOT>/bin/x86_64-linux-clang/qnn-context-binary-utility --context_binary $1 --json_file $2
```
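Once the JSON is dumped, a small script can list the graph inputs in order. This is a sketch: the key path below is an assumption, since the exact schema depends on the QNN SDK version:

```python
import json

# JSON produced by qnn-context-binary-utility --json_file ...
with open("context_binary.json") as f:
    ctx = json.load(f)

# Key path is an assumption -- open the file once to confirm the layout
# for your QNN SDK version.
for graph in ctx["info"]["graphs"]:
    inputs = [t["info"]["name"] for t in graph["info"]["graphInputs"]]
    print(graph["info"]["graphName"], "inputs:", inputs)
```

The following shows how to generate optrace and QHAS in llama.py for stories260K.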
Reproduce command