Replies: 8 comments
-
Say goodbye to the VAE.
-
@Green-Sky Is this an alternative to VAEs, or is it an image generation model? I think the best approach would be to apply it to existing models (SDXL, SD1.5) by replacing their autoencoder with it, so the differences can be compared. The size of this new alternative (14 GB) also catches my attention. I hope that is because these are training checkpoints that include the gradients.
-
I think it's a pixel-space diffusion model (or rather pixel-space flow). So there is no need for a VAE.
-
Yes, @stduhpf is right. It is an alternative approach that does not need a VAE but requires a full finetune.
-
From what I understand, it doesn't need a latent image; instead, it works directly on the pixels of an image. In other words, whereas a VAE transforms a latent image into an RGB image, with this new alternative that step isn't necessary: the diffusion model receives the image pixels directly.
-
I looked a bit deeper into how it works, and it's not exactly doing diffusion on pixels directly. It groups the pixels into 16x16 "latent" patches with a convolution layer, and then decodes these patches back to pixels using a NeRF (Neural Radiance Field?) instead of a VAE. I don't fully understand what it all means yet.
Edit: Okay, so the trick is that the output from the DiT is no longer latent pixels, but the weights of a simple MLP that denoises the corresponding image patch.
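A rough NumPy sketch of the idea described above, not the actual Chroma Radiance code. Everything here is an assumption made for illustration: the patch embedding is written as a linear map on flattened patches (equivalent to a convolution whose kernel size and stride both equal the patch size), the DiT is skipped entirely, a random matrix `w_hyper` stands in for whatever produces the per-patch MLP weights, and the MLP is assumed to take each pixel's noisy RGB value plus its in-patch coordinate. All names and sizes are hypothetical.

```python
import numpy as np

PATCH = 16  # patch size mentioned in the discussion


def patchify(img, p=PATCH):
    """Split an (H, W, C) image into (N, p*p*C) flattened patches."""
    h, w, c = img.shape
    assert h % p == 0 and w % p == 0
    x = img.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, p * p * c)


def unpatchify(patches, h, w, c, p=PATCH):
    """Inverse of patchify: (N, p*p*C) patches back to an (H, W, C) image."""
    x = patches.reshape(h // p, w // p, p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, c)


def decode_with_patch_mlps(tokens, noisy_patches, w_hyper, hidden, p=PATCH, c=3):
    """Turn each token into the weights of a tiny MLP and run that MLP on
    every pixel of the matching patch (assumed inputs: noisy RGB + coords)."""
    n = tokens.shape[0]
    in_dim = c + 2
    params = tokens @ w_hyper                       # (N, in_dim*hidden + hidden*c)
    w1 = params[:, : in_dim * hidden].reshape(n, in_dim, hidden)
    w2 = params[:, in_dim * hidden :].reshape(n, hidden, c)

    # Per-pixel (y, x) coordinates inside a patch, scaled to [-1, 1].
    ys, xs = np.meshgrid(np.linspace(-1, 1, p), np.linspace(-1, 1, p), indexing="ij")
    coords = np.stack([ys, xs], axis=-1).reshape(1, p * p, 2)
    coords = np.broadcast_to(coords, (n, p * p, 2))

    pixels = noisy_patches.reshape(n, p * p, c)
    feats = np.concatenate([pixels, coords], axis=-1)  # (N, p*p, c+2)
    h = np.maximum(feats @ w1, 0.0)                    # batched matmul + ReLU
    out = h @ w2                                       # (N, p*p, c)
    return out.reshape(n, p * p * c)


rng = np.random.default_rng(0)
H, W, C, D, HIDDEN = 64, 64, 3, 32, 8

noisy = rng.standard_normal((H, W, C))
patches = patchify(noisy)                              # (16, 768)

# Stand-in for the convolutional patch embedding (kernel = stride = PATCH).
w_embed = rng.standard_normal((PATCH * PATCH * C, D)) * 0.02
tokens = patches @ w_embed                             # the DiT blocks would run here

# Stand-in for the head that emits per-patch MLP weights.
w_hyper = rng.standard_normal((D, (C + 2) * HIDDEN + HIDDEN * C)) * 0.02
pred_patches = decode_with_patch_mlps(tokens, patches, w_hyper, HIDDEN)
pred = unpatchify(pred_patches, H, W, C)               # same shape as the input image
```

The point of the sketch is only the data flow: tokens never get decoded by a VAE, and each patch is reconstructed by its own small network whose weights come from the transformer side.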
-
stable-diffusion.cpp now supports Chroma Radiance #910.

-
The first good and practical "pixel space" diffusion model.
(quote from https://arxiv.org/abs/2507.23268)
https://huggingface.co/lodestones/Chroma1-Radiance
https://huggingface.co/lodestones/chroma-debug-development-only/tree/main/radiance (training checkpoints)
Updates on xitter https://xcancel.com/LodestoneRock/status/1961647453113119190
ComfyUI PR: comfyanonymous/ComfyUI#9682
Images others have generated:
more:
lodestone-rock/flow@bd22c20#diff-112a6b6f5873ea3874f8dbc88f5ffebeb83ed7cdddc0a79df2ee8de31506b9e4
props to @lodestone-rock