Description
Hi,
This is more of a question than an issue, but I couldn't find documentation or source code examples that address it. We have a backend that only supports fixed-point operators, and I am evaluating executorch for deployment to our platform. I am new to using PyTorch as a deployment platform, so please bear with me if my questions are too basic.
When I use PyTorch quantization, I see that it creates a graph in the following format, where each operator is sandwiched between dequant and quant ops:
... -> dequant -> opX -> quant -> dequant -> opY -> quant -> ...
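For reference, here is a minimal sketch of the flow I believe produces such a graph (I'm using the stock XNNPACKQuantizer only as an example; the capture API and quantizer import paths have moved between releases, so treat the exact imports as assumptions):

```python
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

class Tiny(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 8)

    def forward(self, x):
        return torch.relu(self.linear(x))

example_inputs = (torch.randn(1, 8),)

# The capture API has changed across releases; older ones used
# torch._export.capture_pre_autograd_graph instead.
m = torch.export.export_for_training(Tiny().eval(), example_inputs).module()

quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
m = prepare_pt2e(m, quantizer)
m(*example_inputs)        # calibration pass
m = convert_pt2e(m)

m.graph.print_tabular()   # shows each op sandwiched between q/dq nodes
```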
So, when I use executorch partitioning, is the expectation that we pattern-match dequant -> opX -> quant and lower each match into a fixed-point primitive supported on the backend?
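To make the question concrete, here is a rough sketch of the kind of matching I have in mind, written directly against the FX graph. find_quantized_patterns and supported_ops are my own hypothetical names; I'm only checking the per-tensor/per-channel decomposed ops here:

```python
import torch
import torch.ao.quantization.quantize_pt2e  # noqa: F401  (registers quantized_decomposed ops)

# Decomposed quantize/dequantize ops produced by the PT2E flow.
DQ_OPS = {
    torch.ops.quantized_decomposed.dequantize_per_tensor.default,
    torch.ops.quantized_decomposed.dequantize_per_channel.default,
}
Q_OPS = {
    torch.ops.quantized_decomposed.quantize_per_tensor.default,
}

def find_quantized_patterns(graph_module, supported_ops):
    """Collect nodes forming dequant -> op -> quant triples that a
    backend partitioner could claim as one fixed-point primitive."""
    matches = []
    for node in graph_module.graph.nodes:
        if node.op != "call_function" or node.target not in supported_ops:
            continue
        # Every tensor input must come straight from a dequantize node ...
        inputs_ok = all(
            arg.op == "call_function" and arg.target in DQ_OPS
            for arg in node.args
            if isinstance(arg, torch.fx.Node)
        )
        # ... and every consumer must be a quantize node.
        outputs_ok = all(
            user.op == "call_function" and user.target in Q_OPS
            for user in node.users
        )
        if inputs_ok and outputs_ok:
            matches.append(node)
    return matches
```

Here supported_ops would be the set of aten targets the backend understands, e.g. {torch.ops.aten.relu.default}.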
Suppose I have a Python model of each fixed-point op: is there a straightforward way to run the executorch program directly in Python, substituting the Python model for the corresponding lowered module? Since the graph schema is known, it should be possible to do this myself, but I'm wondering if someone has already solved this problem.
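For what it's worth, before lowering and serialization the program is still an FX GraphModule, so something like the following torch.fx.Interpreter subclass is what I had in mind (FixedPointSim and fixed_point_impls are my own hypothetical names):

```python
import torch
from torch.fx import Interpreter

class FixedPointSim(Interpreter):
    """Run an FX graph in Python, replacing selected ops with
    fixed-point reference implementations."""

    def __init__(self, graph_module, fixed_point_impls):
        super().__init__(graph_module)
        # Map from op target to a Python reference model, e.g.
        # {torch.ops.aten.relu.default: my_fixed_point_relu}.
        self.fixed_point_impls = fixed_point_impls

    def call_function(self, target, args, kwargs):
        impl = self.fixed_point_impls.get(target)
        if impl is not None:
            return impl(*args, **kwargs)
        return super().call_function(target, args, kwargs)

# Usage (my_fixed_point_relu is a hypothetical reference model):
# out = FixedPointSim(gm, {torch.ops.aten.relu.default: my_fixed_point_relu}).run(x)
```

This only covers the pre-serialization graph, though; once the program is flattened to a .pte file I'd presumably have to interpret the serialized schema myself, which is the part I'm asking about.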
If I lower the entire graph onto the backend as a single lowered module, I suppose memory planning doesn't apply inside the lowered module - i.e., the lowered module needs to take care of memory planning for the tensors inside the module itself?
Finally, is there an example that shows how to pass already-quantized inputs to the executorch program? For example, with fixed quantization parameters for inputs and outputs, clients could pass quantized data directly without dealing with floating-point values at all. Is this possible with executorch?
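What I mean concretely: with fixed quantization parameters agreed on out of band, the client-side math is just the usual affine mapping, as in this sketch (SCALE, ZERO_POINT, etc. are placeholder values):

```python
import torch

# Fixed, known quantization parameters agreed with the client.
SCALE, ZERO_POINT = 0.05, 0
QMIN, QMAX = -128, 127

def quantize_input(x_float):
    # Client-side: float -> int8 with the fixed qparams.
    q = torch.round(x_float / SCALE) + ZERO_POINT
    return torch.clamp(q, QMIN, QMAX).to(torch.int8)

def dequantize_output(y_int8):
    # Client-side: int8 -> float, if a float view is ever needed.
    return (y_int8.to(torch.float32) - ZERO_POINT) * SCALE
```

My question is whether the compiled program can accept such int8 tensors directly - i.e., whether the leading quantize and trailing dequantize nodes can be folded into the lowered module or stripped from the graph.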
Appreciate your help with my questions. This is an impressive platform!
Thanks,
Vijay.