
Questions on deploying Quantized models ... #1141

@rvijayc


Hi,

This is more of a question than an issue, but I couldn't find documentation or source code examples that address it. We have a backend that supports only fixed-point operators, and I am evaluating ExecuTorch for deployment to our platform. I am new to using PyTorch as a deployment platform, so please bear with me if my questions are too basic.

When I use PyTorch quantization, I see that it creates a graph of the following form, where each operator is sandwiched between dequant and quant ops:

  ... -> dequant -> opX -> quant -> dequant -> opY -> quant -> ...

So, when I use ExecuTorch partitioning, is the expectation that we pattern match dequant -> opX -> quant and lower each match to a corresponding fixed-point primitive on the backend?
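To make the question concrete, here is a minimal sketch of the kind of pattern match being asked about. It is plain Python over a linear chain of nodes and uses no ExecuTorch or PyTorch APIs; the `Node` class, the op names, and the `SUPPORTED_FIXED_POINT_OPS` set are all hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str  # unique node name (hypothetical)
    op: str    # operator kind, e.g. "dequant", "conv", "quant"


# Hypothetical set of ops the backend implements in fixed point.
SUPPORTED_FIXED_POINT_OPS = {"conv", "relu"}


def find_dq_op_q(nodes: List[Node]) -> List[List[Node]]:
    """Return each dequant -> op -> quant triple whose middle op is supported."""
    matches = []
    i = 0
    while i + 2 < len(nodes):
        dq, op, q = nodes[i], nodes[i + 1], nodes[i + 2]
        if (dq.op == "dequant" and q.op == "quant"
                and op.op in SUPPORTED_FIXED_POINT_OPS):
            matches.append([dq, op, q])
            i += 3  # consume the whole triple
        else:
            i += 1  # slide the window by one node
    return matches


chain = [
    Node("dq0", "dequant"), Node("conv0", "conv"), Node("q0", "quant"),
    Node("dq1", "dequant"), Node("softmax0", "softmax"), Node("q1", "quant"),
    Node("dq2", "dequant"), Node("relu0", "relu"), Node("q2", "quant"),
]
matched = [[n.name for n in m] for m in find_dq_op_q(chain)]
print(matched)
```

In this sketch the unsupported `softmax0` triple is left unmatched, so it would stay outside the lowered partitions. A real graph is not a linear chain, so an actual partitioner would need to walk node inputs and users rather than list positions.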

Suppose I have a Python model of each fixed-point op. Is there a straightforward way to run the ExecuTorch program directly in Python by substituting the Python model for the corresponding lowered module? Since the graph schema is known, I could do this myself, but I am wondering whether someone has already solved this problem.
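The substitution idea could look roughly like the sketch below. It assumes the program has already been reduced to an ordered list of lowered-module identifiers; that list, the identifiers, the registry, and the two toy int8 ops are all hypothetical, and nothing here reads a real ExecuTorch program file.

```python
from typing import Callable, Dict, List

INT8_MIN, INT8_MAX = -128, 127

FixedPointOp = Callable[[List[int]], List[int]]


def saturate(x: int) -> int:
    """Clamp an integer to the int8 range."""
    return max(INT8_MIN, min(INT8_MAX, x))


def fixed_double(xs: List[int]) -> List[int]:
    """Toy fixed-point op: multiply by 2 with int8 saturation."""
    return [saturate(2 * x) for x in xs]


def fixed_relu(xs: List[int]) -> List[int]:
    """Toy fixed-point op: ReLU on int8 values (zero point assumed to be 0)."""
    return [max(0, x) for x in xs]


def run_program(program: List[str],
                registry: Dict[str, FixedPointOp],
                inputs: List[int]) -> List[int]:
    """Run each lowered-module id in order, using its Python reference model."""
    values = inputs
    for module_id in program:
        if module_id not in registry:
            raise KeyError(f"no Python model registered for {module_id!r}")
        values = registry[module_id](values)
    return values


# Hypothetical program: two lowered modules executed back to back.
program = ["lowered_0", "lowered_1"]
registry = {"lowered_0": fixed_double, "lowered_1": fixed_relu}
result = run_program(program, registry, [-100, -3, 50, 100])
print(result)
```

Keeping the Python models behind a registry keyed by module identifier means the same driver loop can be reused as more ops are modeled.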

If I lower the entire graph onto the backend as a single lowered module, I assume that ExecuTorch memory planning doesn't apply inside the lowered module, i.e., the lowered module has to take care of memory planning for its own internal tensors. Is that correct?

Finally, is there an example that shows how to pass already-quantized inputs to the ExecuTorch program? For example, if I use fixed quantization for inputs and outputs, clients could pass quantized inputs and receive quantized outputs directly, without having to deal with floating-point data. Is this possible with ExecuTorch?
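As an illustration of what "fixed quantization for inputs and outputs" would mean on the client side, here is a minimal sketch of affine int8 quantization with a scale and zero point agreed on ahead of time. The `SCALE` and `ZERO_POINT` values are made up for the example, and the rounding mode is an assumption (Python's `round`, which rounds ties to even); a real backend may round differently.

```python
from typing import List

INT8_MIN, INT8_MAX = -128, 127

# Hypothetical fixed quantization parameters shared by client and program.
SCALE = 0.5
ZERO_POINT = 3


def quantize(xs: List[float], scale: float, zero_point: int) -> List[int]:
    """Affine quantization: q = clamp(round(x / scale) + zero_point)."""
    return [
        max(INT8_MIN, min(INT8_MAX, round(x / scale) + zero_point))
        for x in xs
    ]


def dequantize(qs: List[int], scale: float, zero_point: int) -> List[float]:
    """Inverse mapping: x is approximately (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in qs]


# The client quantizes once and hands int8 values to the program directly.
q_inputs = quantize([-1.0, 0.3, 10.0, 100.0], SCALE, ZERO_POINT)
recovered = dequantize(q_inputs, SCALE, ZERO_POINT)
print(q_inputs, recovered)
```

With parameters fixed like this, the first dequant and the last quant in the graph become redundant from the client's point of view, which is the case the question is about.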

Appreciate your help with my questions. This is an impressive platform!

Thanks,
Vijay.


Labels: help wanted, module: quantization, triaged
