Description
Hi,
This is more of a question than an issue, but I couldn't find documentation or source code examples that address it. We have a backend that only supports fixed-point operators, and I am evaluating executorch for deployment to our platform. I am new to using PyTorch as a deployment platform, so please bear with me if my questions are too basic.
When I use PyTorch quantization, I see that it creates a graph in the following format, where each operator is sandwiched between dequant and quant ops:
... -> dequant -> opX -> quant -> dequant -> opY -> quant -> ...
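For reference, here is a minimal sketch of the flow I believe produces such a graph (I'm using the stock XNNPACKQuantizer only as an example; the capture API and quantizer import paths have moved between releases, so treat the exact imports as assumptions):

```python
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

class Tiny(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)

    def forward(self, x):
        return torch.relu(self.linear(x))

example_inputs = (torch.randn(1, 8),)

# The capture API has changed across releases; older ones used
# torch._export.capture_pre_autograd_graph instead.
m = torch.export.export_for_training(Tiny().eval(), example_inputs).module()

quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
m = prepare_pt2e(m, quantizer)
m(*example_inputs)        # calibration pass
m = convert_pt2e(m)

m.graph.print_tabular()   # shows each op sandwiched between q/dq nodes
```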
So, when I use executorch partitioning, is the expectation that we pattern-match dequant -> opX -> quant and lower each match into a fixed-point primitive supported on the backend?
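To make the question concrete, here is a rough sketch of the kind of matching I have in mind, written directly against the FX graph. find_quantized_patterns and supported_ops are my own hypothetical names; I'm only checking the per-tensor/per-channel decomposed ops here:

```python
import torch
import torch.ao.quantization.quantize_pt2e  # noqa: F401  (registers quantized_decomposed ops)

# Decomposed quantize/dequantize ops produced by the PT2E flow.
DQ_OPS = {
    torch.ops.quantized_decomposed.dequantize_per_tensor.default,
    torch.ops.quantized_decomposed.dequantize_per_channel.default,
}
Q_OPS = {
    torch.ops.quantized_decomposed.quantize_per_tensor.default,
}

def find_quantized_patterns(graph_module, supported_ops):
    """Collect nodes forming dequant -> op -> quant triples that a
    backend partitioner could claim as one fixed-point primitive."""
    matches = []
    for node in graph_module.graph.nodes:
        if node.op != "call_function" or node.target not in supported_ops:
            continue
        # Every tensor input must come straight from a dequantize node ...
        inputs_ok = all(
            arg.op == "call_function" and arg.target in DQ_OPS
            for arg in node.args
            if isinstance(arg, torch.fx.Node)
        )
        # ... and every consumer must be a quantize node.
        outputs_ok = all(
            user.op == "call_function" and user.target in Q_OPS
            for user in node.users
        )
        if inputs_ok and outputs_ok:
            matches.append(node)
    return matches
```

Here supported_ops would be the set of aten targets the backend understands, e.g. {torch.ops.aten.relu.default}.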
Suppose I have a Python model of each fixed-point op: is there a straightforward way to run the executorch program directly in Python, substituting the Python model for the corresponding lowered module? Since the graph schema is known, it should be possible to do this myself, but I'm wondering if someone has already solved this problem.
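For what it's worth, before lowering and serialization the program is still an FX GraphModule, so something like the following torch.fx.Interpreter subclass is what I had in mind (FixedPointSim and fixed_point_impls are my own hypothetical names):

```python
import torch
from torch.fx import Interpreter

class FixedPointSim(Interpreter):
    """Run an FX graph in Python, replacing selected ops with
    fixed-point reference implementations."""

    def __init__(self, graph_module, fixed_point_impls):
        super().__init__(graph_module)
        # Map from op target to a Python reference model, e.g.
        # {torch.ops.aten.relu.default: my_fixed_point_relu}.
        self.fixed_point_impls = fixed_point_impls

    def call_function(self, target, args, kwargs):
        impl = self.fixed_point_impls.get(target)
        if impl is not None:
            return impl(*args, **kwargs)
        return super().call_function(target, args, kwargs)

# Usage (my_fixed_point_relu is a hypothetical reference model):
# out = FixedPointSim(gm, {torch.ops.aten.relu.default: my_fixed_point_relu}).run(x)
```

This only covers the pre-serialization graph, though; once the program is flattened to a .pte file I'd presumably have to interpret the serialized schema myself, which is the part I'm asking about.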
If I lower the entire graph onto the backend as a single lowered module, I suppose memory planning doesn't apply inside the lowered module - i.e., the lowered module needs to take care of memory planning for the tensors inside the module itself?
Finally, is there an example that shows how to pass already-quantized inputs to the executorch program? For example, with fixed quantization parameters for inputs and outputs, clients could pass quantized data directly without dealing with floating-point values at all. Is this possible with executorch?
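What I mean concretely: with fixed quantization parameters agreed on out of band, the client-side math is just the usual affine mapping, as in this sketch (SCALE, ZERO_POINT, etc. are placeholder values):

```python
import torch

# Fixed, known quantization parameters agreed with the client.
SCALE, ZERO_POINT = 0.05, 0
QMIN, QMAX = -128, 127

def quantize_input(x_float):
    # Client-side: float -> int8 with the fixed qparams.
    q = torch.round(x_float / SCALE) + ZERO_POINT
    return torch.clamp(q, QMIN, QMAX).to(torch.int8)

def dequantize_output(y_int8):
    # Client-side: int8 -> float, if a float view is ever needed.
    return (y_int8.to(torch.float32) - ZERO_POINT) * SCALE
```

My question is whether the compiled program can accept such int8 tensors directly - i.e., whether the leading quantize and trailing dequantize nodes can be folded into the lowered module or stripped from the graph.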
Appreciate your help with my questions. This is an impressive platform!
Thanks,
Vijay.