End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training).
Updated May 29, 2025 - Python
A survey of modern quantization formats (e.g., MXFP8, NVFP4) and inference optimization tools (e.g., TorchAO, GemLite), illustrated through the example of Llama-3.1 inference.
Deploy AI models with an API through quantization and containerization.