Model Information
Description
This is a quantized Z-Image model that supports ComfyUI's latest "TensorCoreFP8Layout".
On a supported GPU, ComfyUI can do calculations directly in FP8 instead of dequantizing the weights and computing in BF16. This is much faster than both BF16 and classic FP8 scaled models.
It also supports the latest ComfyUI quantization features:
FP8 scaled: Higher precision than pure FP8.
Mixed precision: Important layers are kept in BF16. Higher precision than a pure FP8 scaled model.
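As a rough illustration of why a scaled FP8 tensor keeps more precision than a pure FP8 cast, here is a minimal sketch in plain Python. It assumes the E4M3 flavor of FP8 (largest finite value 448), one scale per tensor, and a saturating cast; the helper names and the sample weights are made up for the example and are not ComfyUI's implementation.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in FP8 E4M3 (assumed format)


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value, saturating at +-448 (assumption)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), FP8_E4M3_MAX)
    _, exp = math.frexp(a)  # a = m * 2**exp with 0.5 <= m < 1
    # Normal values carry 3 mantissa bits; below 2**-6 the spacing is fixed at 2**-9.
    step = 2.0 ** max(exp - 4, -9)
    return sign * min(round(a / step) * step, FP8_E4M3_MAX)


def quantize_pure(weights):
    """Cast each weight to FP8 as-is."""
    return [round_to_e4m3(w) for w in weights]


def quantize_scaled(weights):
    """Stretch the tensor over the full FP8 range, round, then scale back."""
    scale = max(abs(w) for w in weights) / FP8_E4M3_MAX
    return [round_to_e4m3(w / scale) * scale for w in weights]


def mean_abs_error(reference, approx):
    return sum(abs(r - a) for r, a in zip(reference, approx)) / len(reference)


weights = [0.02, -0.013, 0.004, 0.001, -0.0007]  # made-up small weights
pure_error = mean_abs_error(weights, quantize_pure(weights))
scaled_error = mean_abs_error(weights, quantize_scaled(weights))
print(f"pure FP8 error:   {pure_error:.6f}")
print(f"scaled FP8 error: {scaled_error:.6f}")
```

Small weights sit near the bottom of the FP8 range, where the spacing between representable values is coarse. Dividing by the scale first moves them to where FP8 has more resolution.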
Update (12/6/2025): If you also want to use torch.compile, you need the ComfyUI master branch or v0.3.77 (not released yet).
Update (12/6/2025): Added quantized qwen3 4b.
Mixed precision:
Not every layer is quantized. The early layers, the final layers, and some middle layers are still in BF16. That is why this model is about 500MB larger than the classic FP8 model.
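The size difference can be sanity-checked with simple arithmetic: an FP8 weight takes 1 byte and a BF16 weight takes 2, so each parameter kept in BF16 costs one extra byte. The sketch below ignores scales and metadata, and the total parameter count is a placeholder assumption, not a figure from this page.

```python
BYTES_FP8 = 1
BYTES_BF16 = 2


def checkpoint_bytes(total_params: int, bf16_params: int) -> int:
    """Approximate weight storage when bf16_params stay in BF16 and the rest are FP8."""
    return (total_params - bf16_params) * BYTES_FP8 + bf16_params * BYTES_BF16


def bf16_params_for_overhead(extra_bytes: int) -> int:
    """How many parameters must stay in BF16 to add extra_bytes to the file."""
    return extra_bytes // (BYTES_BF16 - BYTES_FP8)


TOTAL_PARAMS = 6_000_000_000   # placeholder assumption for the model size
EXTRA_BYTES = 500 * 10**6      # "about 500MB larger"

kept_in_bf16 = bf16_params_for_overhead(EXTRA_BYTES)
share = kept_in_bf16 / TOTAL_PARAMS
print(f"~{kept_in_bf16 / 1e6:.0f}M parameters kept in BF16 ({share:.1%} of the assumed total)")
print(f"all-FP8 size: {checkpoint_bytes(TOTAL_PARAMS, 0) / 1e9:.1f} GB")
print(f"mixed size:   {checkpoint_bytes(TOTAL_PARAMS, kept_in_bf16) / 1e9:.1f} GB")
```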
Post-training calibrated and FP8 tensor core support:
If you have a newer GPU (Nvidia: RTX 4xxx and later; AMD: gfx1200, gfx1201, gfx950):
These GPUs have hardware support for FP8 calculation. This model includes post-training calibration metadata. ComfyUI automatically reads the metadata and uses the FP8 tensor cores to do calculations directly in FP8, instead of dequantizing and computing in BF16.
On an RTX 4090, compared to the original BF16 Z-Image model:
gguf q4_K model: -26% it/s (dequantization overhead)
classic FP8 scaled model: -8% it/s (dequantization overhead)
this model: +31% it/s
this model + torch.compile: +60% it/s
On RTX 5xxx GPUs it should be faster than the numbers above because of newer tensor cores and better FP8 support, but this is not tested.
AMD GPUs are not tested.
You are welcome to share your results in the comment section.
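If you think in seconds per step rather than it/s, the RTX 4090 percentages above can be converted with a one-line formula. This is only arithmetic on the reported numbers; the function name is made up for the example.

```python
def relative_step_time(speed_change_percent: float) -> float:
    """Time per step relative to BF16, given the change in it/s in percent."""
    return 1.0 / (1.0 + speed_change_percent / 100.0)


# it/s change versus the original BF16 model, as reported for the RTX 4090
reported = {
    "gguf q4_K": -26,
    "classic FP8 scaled": -8,
    "this model": 31,
    "this model + torch.compile": 60,
}

for name, percent in reported.items():
    print(f"{name}: {relative_step_time(percent):.3f}x the BF16 time per step")
```

A value below 1.0 means each step takes less time than with the BF16 model.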
If your GPU does not have FP8 tensor cores:
This model can still save you ~50% VRAM, and it has slightly higher precision than the classic FP8 scaled model because of mixed precision.
Tips:
torch.compile is recommended.
It is a built-in PyTorch feature: no extra dependencies are required, and it is easy to use.
I recommend the "TorchCompileModelAdvanced" node from ComfyUI-KJNodes. Set "dynamic" to True.
Note (12/5/2025): This is a very new feature in ComfyUI. You will need the latest ComfyUI master branch or ComfyUI v0.3.77.
Note: The first time you use torch.compile, it has to compile the model. This usually takes about 2 minutes, and the progress bar will appear stuck at step 0. Do NOT cancel the job.
It is compatible with sage attention and similar optimizations.
ComfyUI only uses FP8 tensor cores for linear layers, not for attention, which means it is 100% compatible with all kinds of attention optimizations (sage attention etc.).
More about this tensorcorefp8 model, if you are curious:
Why can't a classic FP8 model do FP8 calculations directly?
A classic FP8 model only stores its weights in FP8; the calculations themselves are still done in BF16. To calculate directly in FP8, you need calibrated metadata to prevent overflow.
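To make the overflow problem concrete, here is a minimal sketch of a dot product whose inputs must fit the FP8 range. It assumes FP8 E4M3 (largest finite value 448) and models only the range limit as saturation, not rounding; depending on the cast, real out-of-range values may instead become NaN. The function names and numbers are made up for the example.

```python
FP8_E4M3_MAX = 448.0  # largest finite value in FP8 E4M3 (assumed format)


def clamp_to_fp8_range(x: float) -> float:
    """Model only the range limit of FP8 (saturation); rounding is ignored."""
    return max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x))


def fp8_dot(activations, weights, input_scale: float = 1.0) -> float:
    """Dot product with inputs squeezed into FP8 range, rescaled afterwards."""
    acc = sum(
        clamp_to_fp8_range(x / input_scale) * clamp_to_fp8_range(w)
        for x, w in zip(activations, weights)
    )
    return acc * input_scale


activations = [1000.0, -250.0, 30.0]  # made-up values; 1000 exceeds the FP8 range
weights = [0.5, -1.0, 0.25]

reference = sum(x * w for x, w in zip(activations, weights))
naive = fp8_dot(activations, weights)  # no scale: 1000 is clipped to 448
scale = max(abs(x) for x in activations) / FP8_E4M3_MAX  # hypothetical calibrated scale
calibrated = fp8_dot(activations, weights, input_scale=scale)

print("reference: ", reference)
print("naive:     ", naive)
print("calibrated:", calibrated)
```

Without a scale the large activation is clipped and the result is wrong. With a scale known in advance, the value fits the FP8 range and the result is recovered after multiplying back.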
How did you calibrate?
Run the model over and over. Observe everything in the model, every input/output in every layer. Collect thousands of samples.
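A minimal sketch of what such an observation loop could look like for a single layer input, assuming the statistic collected is the largest absolute value and the target format is FP8 E4M3. The page does not say which statistic was actually used, so the observer class and the sample data are hypothetical.

```python
FP8_E4M3_MAX = 448.0  # assumed target format: FP8 E4M3


class AbsMaxObserver:
    """Hypothetical helper: tracks the largest magnitude seen for one tensor."""

    def __init__(self) -> None:
        self.amax = 0.0
        self.num_samples = 0

    def observe(self, values) -> None:
        for v in values:
            self.amax = max(self.amax, abs(v))
        self.num_samples += 1

    def scale(self) -> float:
        # Map the largest observed magnitude onto the top of the FP8 range.
        return self.amax / FP8_E4M3_MAX if self.amax > 0 else 1.0


# Made-up activations standing in for one layer's input across calibration runs.
calibration_batches = [
    [0.5, -3.0, 12.0],
    [-896.0, 40.0, 7.5],
    [100.0, -0.25, 64.0],
]

observer = AbsMaxObserver()
for batch in calibration_batches:
    observer.observe(batch)

input_scale = observer.scale()
print(f"samples={observer.num_samples} amax={observer.amax} scale={input_scale}")
```

In practice one observer per layer input and output would be needed, and a percentile could replace the plain maximum to reduce the influence of outliers.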
Will you quantize other models to tensorcorefp8?
If a very different version is released, e.g. the Z-Image base model, I will make a tensorcorefp8 version of it, assuming there is no official one.
I do not accept commissions.
Z-Image [TensorCoreFP8]
Turbo
Model Details
- Type: AI Model
- Task: text-to-image
- Subtype: Safetensors / Checkpoint AI Model
- Created
- Updated
- June 5, 2026