
Conversation

@rattus128 (Contributor)
Instead of running the temporal causal 3D convolutions over the full tensor, run them over at most 2 latent frames at a time (which can correspond to more real frames). This reduces the VAE's VRAM usage in the temporal dimension to a constant. For videos with any substantial number of frames, this is a major reduction in VRAM usage.
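As a minimal sketch of the idea (hypothetical code, not this PR's actual implementation): a causal temporal Conv3d applied 2 latent frames at a time, carrying the last `kernel_t - 1` input frames forward as context, produces the same result as one full-tensor pass while the convolution only ever sees a small chunk.

```python
# Hypothetical sketch: chunked causal temporal 3D convolution.
# Names (causal_full, causal_chunked) are illustrative, not ComfyUI's.
import torch
import torch.nn as nn

torch.manual_seed(0)
kernel_t = 3
conv = nn.Conv3d(4, 4, kernel_size=(kernel_t, 3, 3), padding=(0, 1, 1))

x = torch.randn(1, 4, 16, 8, 8)  # (B, C, T, H, W)

def causal_full(x):
    # Reference: zero-pad the temporal front and convolve the whole tensor.
    pad = x.new_zeros(x.shape[0], x.shape[1], kernel_t - 1, *x.shape[3:])
    return conv(torch.cat([pad, x], dim=2))

def causal_chunked(x, chunk=2):
    # Same math, but the conv only ever sees `chunk` new frames plus a
    # carried (kernel_t - 1)-frame context. Assumes chunk >= kernel_t - 1.
    carry = x.new_zeros(x.shape[0], x.shape[1], kernel_t - 1, *x.shape[3:])
    outs = []
    for t in range(0, x.shape[2], chunk):
        piece = x[:, :, t:t + chunk]
        outs.append(conv(torch.cat([carry, piece], dim=2)))
        carry = piece[:, :, -(kernel_t - 1):]  # carry last input frames forward
    # A real implementation would stream these outputs rather than keep them all.
    return torch.cat(outs, dim=2)
```

Because each output frame depends only on the current and previous `kernel_t - 1` input frames, the chunked pass reproduces the full pass exactly (up to floating-point tolerance).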

Improves at least Hunyuan 1.0 and Kandinsky VAEs.

Regression tested with SDXL (shares 2d code).

All of these 480Px81f VAE ops used to tile, and now they fit comfortably (RTX5090):

[Screenshot from 2025-11-30 09-17-50]

Primary commits:

Author: Rattus <rattus128@gmail.com>
Date:   Sun Nov 30 08:56:07 2025 +1000

    model: Add temporal roll to main VAE encoder
    
    If there are no attention layers, the block is a standard resnet, and
    VideoConv3d is asked for, substitute in the temporal rolling VAE
    algorithm. This reduces VAE VRAM usage along the temporal dimension
    (which can be a huge saving).

commit 6571c912a70a6e9233283467f85b7ee10b3d7b59
Author: Rattus <rattus128@gmail.com>
Date:   Sun Nov 30 08:56:07 2025 +1000

    model: Add temporal roll to main VAE decoder
    
    If there are no attention layers, the block is a standard resnet, and
    VideoConv3d is asked for, substitute in the temporal rolling VAE
    algorithm. This reduces VAE VRAM usage along the temporal dimension
    (which can be a huge saving).

Remove the transitive imports of VideoConv3d and Resnet and take these
from the actual implementation source.
According to git grep, this is not used now, and was not used in the
initial commit that introduced it (see below).

The temporal roll VAE is difficult to implement for this semantic (and
would defeat the purpose). Rather than implement the complex
conditional, just delete the unused feature.

(venv) rattus@rattus-box2:~/ComfyUI$ git log --oneline
220afe3 (HEAD) Initial commit.
(venv) rattus@rattus-box2:~/ComfyUI$ git grep give_pre
comfy/ldm/modules/diffusionmodules/model.py:                 resolution, z_channels, give_pre_end=False, tanh_out=False, use_linear_attn=False,
comfy/ldm/modules/diffusionmodules/model.py:        self.give_pre_end = give_pre_end
comfy/ldm/modules/diffusionmodules/model.py:        if self.give_pre_end:

(venv) rattus@rattus-box2:~/ComfyUI$ git co origin/master
Previous HEAD position was 220afe3 Initial commit.
HEAD is now at 9d8a817 Enable async offloading by default on Nvidia. (comfyanonymous#10953)
(venv) rattus@rattus-box2:~/ComfyUI$ git grep give_pre
comfy/ldm/modules/diffusionmodules/model.py:                 resolution, z_channels, give_pre_end=False, tanh_out=False, use_linear_attn=False,
comfy/ldm/modules/diffusionmodules/model.py:        self.give_pre_end = give_pre_end
comfy/ldm/modules/diffusionmodules/model.py:        if self.give_pre_end:
Move the carrying conv op to the common VAE code and give it a better
name. Roll the carry implementation logic for Resnet into the base
class and scrap the Hunyuan-specific subclass.
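A hypothetical sketch of what such a "carrying" conv op might look like once moved into common VAE code (the class name and API here are illustrative, not ComfyUI's actual code): the module keeps the trailing input frames from the previous chunk and prepends them to the next one, so a sequence of chunked calls reproduces a single full-tensor causal pass.

```python
# Illustrative sketch of a stateful "carrying" wrapper around a temporal
# Conv3d; not the PR's real class, just the technique it describes.
import torch
import torch.nn as nn

class CarryConv3d(nn.Module):  # hypothetical name
    def __init__(self, conv):
        super().__init__()
        self.conv = conv
        self.pad_t = conv.kernel_size[0] - 1  # temporal context to carry
        self.carry = None

    def reset(self):
        # Call between independent videos.
        self.carry = None

    def forward(self, x):
        if self.carry is None:
            # First chunk: causal zero padding at the temporal front.
            self.carry = x.new_zeros(x.shape[0], x.shape[1], self.pad_t,
                                     *x.shape[3:])
        x = torch.cat([self.carry, x], dim=2)
        self.carry = x[:, :, -self.pad_t:]  # last input frames for next chunk
        return self.conv(x)
```

With the carry held in the base class, each resnet block only needs to route its conv calls through this wrapper rather than reimplement the chunking logic per model.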
@yoland68 added the Core (Core team dependency) label on Dec 2, 2025
@comfyanonymous merged commit 73f5649 into comfyanonymous:master on Dec 3, 2025
12 checks passed
