Chroma Radiance (v0.4) #971

Green-Sky · 2025-09-09T09:07:06Z

Green-Sky
Sep 9, 2025

The first good and practical "pixel space" diffusion model.

[...] we propose to model the patch-wise decoding with neural field and present a single-scale, single-stage, efficient, end-to-end solution, coined as pixel neural field diffusion (PixelNerd)

(quote from https://arxiv.org/abs/2507.23268)

https://huggingface.co/lodestones/Chroma1-Radiance
https://huggingface.co/lodestones/chroma-debug-development-only/tree/main/radiance (training checkpoints)

Updates on xitter https://xcancel.com/LodestoneRock/status/1961647453113119190

comfy pr: comfyanonymous/ComfyUI#9682

Images others have generated:

more:
lodestone-rock/flow@bd22c20#diff-112a6b6f5873ea3874f8dbc88f5ffebeb83ed7cdddc0a79df2ee8de31506b9e4

props to @lodestone-rock

Green-Sky · 2025-09-09T09:07:32Z

Green-Sky
Sep 9, 2025
Author

Say goodbye to the VAE.

FSSRepo · 2025-09-10T18:40:18Z

FSSRepo
Sep 10, 2025

@Green-Sky Is this an alternative to VAEs or an image generation model? I think the best approach would be to apply this alternative to existing models (replacing the AutoEncoder in SDXL or SD1.5) so the differences can be compared directly. Also, the size of this new alternative (14 GB) catches my attention; I hope that is because these are training checkpoints that include the gradients.

stduhpf · 2025-09-10T20:42:24Z

stduhpf
Sep 10, 2025

@Green-Sky Is this an alternative to VAEs or an image generation model?

I think it's a pixel-space diffusion model (or rather pixel-space flow). So there is no need for a VAE.
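To make "pixel-space flow" a bit more concrete, here is a minimal NumPy sketch of a flow-matching setup applied directly to pixel values. It assumes the common rectified-flow convention (a straight line from image at t=0 to noise at t=1); the thread does not confirm which convention Chroma Radiance actually uses. The "network" is replaced by an oracle that returns the exact velocity, and all function names are made up for illustration.

```python
import numpy as np

def interpolate(x0, noise, t):
    """Straight-line path between a clean image (t=0) and pure noise (t=1)."""
    return (1.0 - t) * x0 + t * noise

def target_velocity(x0, noise):
    """Velocity the network would be trained to predict along that path (dx_t/dt)."""
    return noise - x0

def euler_sample(velocity_fn, noise, steps=8):
    """Integrate from t=1 (noise) back to t=0 (image) with plain Euler steps."""
    x = noise.copy()
    ts = np.linspace(1.0, 0.0, steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t_cur) * velocity_fn(x, t_cur)
    return x

rng = np.random.default_rng(0)
image = rng.uniform(-1.0, 1.0, size=(3, 32, 32))  # toy RGB "image", no latent involved
noise = rng.standard_normal(image.shape)

# Stand-in for the trained model: an oracle that knows the true velocity.
oracle = lambda x, t: target_velocity(image, noise)
recovered = euler_sample(oracle, noise)
print(np.allclose(recovered, image))
```

The point is only that the state being integrated is the image itself, so no VAE decode step is needed at the end.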

Green-Sky · 2025-09-11T08:32:12Z

Green-Sky
Sep 11, 2025
Author

Yes, @stduhpf is right. It is an alternative approach that does not need a VAE, but it requires a full finetune.
If I understood it correctly, it works by reducing the pixel space using DCT, the tech that powers JPEG.
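For anyone unfamiliar with DCT, here is a small NumPy sketch of the JPEG-style idea: transform an 8x8 block, keep only the low-frequency coefficients, and transform back. This is the generic textbook transform, not code from Chroma Radiance, and how exactly Radiance uses DCT is not something this sketch claims to show. The helper names are made up for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix (rows are frequencies)."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # sample index
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] /= np.sqrt(2.0)    # rescale the DC row so the matrix is orthonormal
    return m

def dct2(block):
    """2D DCT of a square block (transform rows, then columns)."""
    d = dct_matrix(block.shape[0])
    return d @ block @ d.T

def idct2(coeffs):
    """Inverse of dct2; the transpose suffices because the basis is orthonormal."""
    d = dct_matrix(coeffs.shape[0])
    return d.T @ coeffs @ d

# A smooth 8x8 block (diagonal gradient), similar to what JPEG sees in flat image areas.
y, x = np.mgrid[0:8, 0:8]
block = (x + y).astype(np.float64)

coeffs = dct2(block)
kept = np.zeros_like(coeffs)
kept[:4, :4] = coeffs[:4, :4]  # keep only the 4x4 low-frequency corner
approx = idct2(kept)

energy_kept = np.sum(kept ** 2) / np.sum(coeffs ** 2)
print("fraction of energy kept:", energy_kept)
print("max abs reconstruction error:", np.abs(approx - block).max())
```

For smooth content, most of the signal sits in a few low-frequency coefficients, which is why dropping the rest costs little.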
Here is a section from @lodestone-rock's Reddit Chroma release post ( https://old.reddit.com/r/StableDiffusion/comments/1mxwr4e/update_chroma_project_training_is_finished_the/ ):

My Beef with GAN training.

GAN is notoriously hard to train and also expensive! It’s so unstable even with a shit ton of math regularization and another mumbojumbo you throw at it. This is the reason behind 2 of the research branches: Radiance is to remove the VAE altogether because you need a GAN to train it, and Flash is to get a few-step speed without needing a GAN to make it fast.

The instability comes from its core design: it's a min-max game between two networks. You have the Generator (the artist trying to paint fakes) and the Discriminator (the critic trying to spot them). They are locked in a predator-prey cycle. If your critic gets too good, the artist can't learn anything and gives up. If the artist gets too good, it fools the critic easily and stops improving. You're trying to find a perfect, delicate balance but in reality, the training often just oscillates wildly instead of settling down.

GANs also suffer badly from mode collapse. Imagine your artist discovers one specific type of image that always fools the critic. The smartest thing for it to do is to just produce that one image over and over. It has "collapsed" onto a single or a handful of modes (a single good solution) and has completely given up on learning the true variety of the data. You sacrifice the model's diversity for a few good-looking but repetitive results.

Honestly, this is probably why you see big labs hand-wave how they train their GANs. The process can be closer to gambling than engineering. They can afford to throw massive resources at hyperparameter sweeps and just pick the one run that works. My goal is different: I want to focus on methods that produce repeatable, reproducible results that can actually benefit everyone!

That's why I'm exploring ways to get the benefits (like speed) without the GAN headache.
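(Side note, not part of the Reddit post: the min-max game between Generator and Discriminator described above is usually written as the standard GAN objective from the original GAN paper, where D tries to maximize the value and G tries to minimize it.)

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\!\left[\log\left(1 - D(G(z))\right)\right]
```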

The Holy Grail of the End-to-End Generation!

Ideally, we want a model that works directly with pixels, without compressing them into a latent space where information gets lost. Ever notice messed-up eyes or blurry details in an image? That's often the VAE hallucinating details because the original high-frequency information never made it into the latent space.

This is the whole motivation behind Chroma1-Radiance. It's an end-to-end model that operates directly in pixel space. And the neat thing about this is that it's designed to have the same computational cost as a latent space model! Based on the approach from the PixNerd paper, I've modified Chroma to work directly on pixels, aiming for the best of both worlds: full detail fidelity without the extra overhead. Still training for now but you can play around with it.
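A rough way to see why the cost can match a latent model is to count the tokens the transformer has to process. The sketch below assumes a FLUX-style latent pipeline (8x VAE downscale followed by 2x2 latent patches) and a 16x16 pixel patch for the pixel-space model; these numbers are assumptions for illustration, not taken from the quoted post, and the function names are hypothetical. It only compares sequence length and ignores the cost of the patch embedding and the per-patch decoding.

```python
def tokens_latent(height, width, vae_downscale=8, patch=2):
    """Tokens for a latent DiT: the VAE shrinks the image, then latents are patchified."""
    step = vae_downscale * patch  # image pixels covered by one token along each axis
    return (height // step) * (width // step)

def tokens_pixel(height, width, patch=16):
    """Tokens for a pixel-space DiT that patchifies the image directly."""
    return (height // patch) * (width // patch)

for size in (512, 1024):
    print(size, tokens_latent(size, size), tokens_pixel(size, size))
# prints:
# 512 1024 1024
# 1024 4096 4096
```

Under these assumptions each token covers a 16x16 pixel area in both setups, so the sequence length is the same.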

FSSRepo · 2025-09-12T21:21:09Z

FSSRepo
Sep 12, 2025

From what I understand, it doesn’t need a latent image; instead, it works directly on the pixels of an image. In other words, whereas a VAE transforms a latent image into an RGB image, with this new alternative that step isn’t necessary—the diffusion model receives the image pixels directly.

stduhpf · 2025-10-16T11:08:48Z

stduhpf
Oct 16, 2025

I looked a bit deeper into how it works, and it's not exactly doing diffusion on pixels directly. It groups the pixels into 16x16 "latent" patches with a convolution layer, and then decodes these patches back to pixels using a NeRF (Neural Radiance Field?) instead of a VAE. I don't fully understand what it all means yet.

Edit: Okay, so the trick is that the output from the DiT is no longer latent pixels, but the weights of a simple MLP that would denoise the corresponding image patch.
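Here is a toy NumPy sketch of that idea as I understand it: patches become tokens, each token is turned into the weights of a tiny per-patch MLP, and that MLP is evaluated at every pixel position inside the patch. Everything here is a stand-in: the layers are random, there is no actual DiT, the sizes are arbitrary, and the MLP only sees pixel coordinates. The real model's decoder inputs and layer sizes may differ.

```python
import numpy as np

P, HIDDEN, MLP_W = 16, 64, 8  # patch size, token width, tiny-MLP width (illustrative)
rng = np.random.default_rng(0)

def patchify(img, p=P):
    """(H, W, 3) -> (num_patches, p*p*3). A stride-p conv is a linear map on these rows."""
    h, w, c = img.shape
    x = img.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, p * p * c)

def unpatchify(patches, h, w, p=P):
    """Inverse of patchify for per-patch pixel values of shape (num_patches, p*p, 3)."""
    x = patches.reshape(h // p, w // p, p, p, 3).transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, 3)

# Local (y, x) coordinates of every pixel inside one patch, scaled to [-1, 1].
ys, xs = np.meshgrid(np.linspace(-1, 1, P), np.linspace(-1, 1, P), indexing="ij")
coords = np.stack([ys.ravel(), xs.ravel()], axis=-1)       # (P*P, 2)

# Random stand-ins for trained layers.
w_embed = rng.standard_normal((P * P * 3, HIDDEN)) * 0.02  # patch embedding
n_params = 2 * MLP_W + MLP_W * 3                           # sizes of W1 and W2
w_head = rng.standard_normal((HIDDEN, n_params)) * 0.1     # token -> MLP weights

def decode(img):
    h, w, _ = img.shape
    tokens = patchify(img) @ w_embed   # (N, HIDDEN); a real DiT would process these
    params = tokens @ w_head           # (N, n_params): one weight set per patch
    w1 = params[:, : 2 * MLP_W].reshape(-1, 2, MLP_W)
    w2 = params[:, 2 * MLP_W :].reshape(-1, MLP_W, 3)
    hidden = np.maximum(coords[None] @ w1, 0.0)  # (N, P*P, MLP_W), ReLU
    rgb = hidden @ w2                            # (N, P*P, 3): per-pixel output
    return unpatchify(rgb, h, w)

noisy = rng.standard_normal((64, 48, 3))
out = decode(noisy)
print(out.shape)  # (64, 48, 3)
```

So the transformer never has to emit pixels itself; it emits a small function per patch, and that function is queried once per pixel.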

Green-Sky · 2025-10-16T12:10:07Z

Green-Sky
Oct 16, 2025
Author

Another cool thing I saw people mention is that LoRAs for Chroma (non-Radiance) are somewhat compatible with the Radiance version. I also saw that Chroma Radiance training is continuing; we are now at v0.4 :)

Arch diagram from the repo:

(start at the bottom right)

leejet · 2025-10-22T17:10:46Z

leejet
Oct 22, 2025
Maintainer

stable-diffusion.cpp now supports Chroma Radiance #910.
