Closed as not planned
Labels
Potential Bug (User is reporting a bug. This should be tested.)
Description
Custom Node Testing
- I have tried disabling custom nodes and the issue persists (see how to disable custom nodes if you need help)
Expected Behavior
The image produces a normal preview from step 2 of inference onward, and a normal output is saved after the VAE decode
Actual Behavior
The preview is black throughout inference, and only a black output is generated
Steps to Reproduce
- Install the calcuis/gguf and city96/ComfyUI-GGUF extensions for GGUF loading, and pollockjj/ComfyUI-MultiGPU for manual offloading. The offloading extension is necessary to run inference on my device, as disabling it consistently OOMs. The GGUF files could be replaced by safetensors on another device and the bug would likely still reproduce, since the text encoder is unlikely to be at fault.
- Obtain the model files: diffusion model, text encoder, TE mmproj, VAE, lightning LoRA
- Launch with: python main.py --preview-method auto --force-fp16 --disable-cuda-malloc --windows-standalone-build
- Run the workflow from the screencap (I avoid uploading images with metadata or JSON)
Debug Logs
Console (cmd in conda environment):
(comfyui) PS D:\webui-forge\ComfyUI_windows_portable\ComfyUI> .\run
D:\webui-forge\ComfyUI_windows_portable\ComfyUI>python main.py --preview-method auto --force-fp16 --disable-cuda-malloc --windows-standalone-build
Adding extra search path checkpoints D:\webui-forge\ComfyUI_windows_portable\ComfyUI\path\to\stable-diffusion-webui\models\Stable-diffusion
Adding extra search path configs D:\webui-forge\ComfyUI_windows_portable\ComfyUI\path\to\stable-diffusion-webui\models\Stable-diffusion
Adding extra search path vae D:\webui-forge\ComfyUI_windows_portable\ComfyUI\path\to\stable-diffusion-webui\models\VAE
Adding extra search path loras D:\webui-forge\ComfyUI_windows_portable\ComfyUI\path\to\stable-diffusion-webui\models\Lora
Adding extra search path loras D:\webui-forge\ComfyUI_windows_portable\ComfyUI\path\to\stable-diffusion-webui\models\LyCORIS
Adding extra search path upscale_models D:\webui-forge\ComfyUI_windows_portable\ComfyUI\path\to\stable-diffusion-webui\models\ESRGAN
Adding extra search path upscale_models D:\webui-forge\ComfyUI_windows_portable\ComfyUI\path\to\stable-diffusion-webui\models\RealESRGAN
Adding extra search path upscale_models D:\webui-forge\ComfyUI_windows_portable\ComfyUI\path\to\stable-diffusion-webui\models\SwinIR
Adding extra search path embeddings D:\webui-forge\ComfyUI_windows_portable\ComfyUI\path\to\stable-diffusion-webui\embeddings
Adding extra search path hypernetworks D:\webui-forge\ComfyUI_windows_portable\ComfyUI\path\to\stable-diffusion-webui\models\hypernetworks
Adding extra search path controlnet D:\webui-forge\ComfyUI_windows_portable\ComfyUI\path\to\stable-diffusion-webui\models\ControlNet
Adding extra search path checkpoints D:\webui-forge\webui\models\Stable-diffusion
Adding extra search path configs D:\webui-forge\webui\models\Stable-diffusion
Adding extra search path vae D:\webui-forge\webui\models\VAE
Adding extra search path loras D:\webui-forge\webui\models\Lora
Adding extra search path loras D:\webui-forge\webui\models\LyCORIS
Adding extra search path upscale_models D:\webui-forge\webui\models\ESRGAN
Adding extra search path upscale_models D:\webui-forge\webui\models\RealESRGAN
Adding extra search path upscale_models D:\webui-forge\webui\models\SwinIR
Adding extra search path embeddings D:\webui-forge\webui\embeddings
Adding extra search path hypernetworks D:\webui-forge\webui\models\hypernetworks
Adding extra search path controlnet D:\webui-forge\webui\models\ControlNet
Adding extra search path clip D:\webui-forge\webui\models\text_encoder
Checkpoint files will always be loaded safely.
Total VRAM 6144 MB, total RAM 32605 MB
pytorch version: 2.8.0+cu126
xformers version: 0.0.32.post2
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 2060 : native
Enabled pinned memory 14672.0
Using xformers attention
Python version: 3.11.13 | packaged by Anaconda, Inc. | (main, Jun 5 2025, 13:03:15) [MSC v.1929 64 bit (AMD64)]
ComfyUI version: 0.3.68
ComfyUI frontend version: 1.28.8
[Prompt Server] web root: C:\Users\USER\anaconda3\envs\comfyui\Lib\site-packages\comfyui_frontend_package\static
2025-11-06 18:27:17.392247: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
From C:\Users\USER\anaconda3\envs\comfyui\Lib\site-packages\tf_keras\src\losses.py:2976: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.
Unable to parse pyproject.toml due to lack dependency pydantic-settings, please run 'pip install -r requirements.txt': Illegal character '\n' (at line 3, column 101)
ComfyUI-GGUF: Allowing full torch compile
[MultiGPU Core Patching] Patching mm.soft_empty_cache for Comprehensive Memory Management (VRAM + CPU + Store Pruning)
[MultiGPU Core Patching] Patching mm.get_torch_device, mm.text_encoder_device, mm.unet_offload_device
[MultiGPU DEBUG] Initial current_device: cuda:0
[MultiGPU DEBUG] Initial current_text_encoder_device: cuda:0
[MultiGPU DEBUG] Initial current_unet_offload_device: cpu
[MultiGPU] Initiating custom_node Registration. . .
-----------------------------------------------
custom_node Found Nodes
-----------------------------------------------
ComfyUI-LTXVideo N 0
ComfyUI-Florence2 N 0
ComfyUI_bitsandbytes_NF4 N 0
x-flux-comfyui N 0
ComfyUI-MMAudio N 0
ComfyUI-GGUF Y 18
PuLID_ComfyUI N 0
ComfyUI-WanVideoWrapper N 0
-----------------------------------------------
[MultiGPU] Registration complete. Final mappings: CheckpointLoaderAdvancedMultiGPU, CheckpointLoaderAdvancedDisTorch2MultiGPU, UNetLoaderLP, UNETLoaderMultiGPU, VAELoaderMultiGPU, CLIPLoaderMultiGPU, DualCLIPLoaderMultiGPU, TripleCLIPLoaderMultiGPU, QuadrupleCLIPLoaderMultiGPU, CLIPVisionLoaderMultiGPU, CheckpointLoaderSimpleMultiGPU, ControlNetLoaderMultiGPU, DiffusersLoaderMultiGPU, DiffControlNetLoaderMultiGPU, UNETLoaderDisTorch2MultiGPU, VAELoaderDisTorch2MultiGPU, CLIPLoaderDisTorch2MultiGPU, DualCLIPLoaderDisTorch2MultiGPU, TripleCLIPLoaderDisTorch2MultiGPU, QuadrupleCLIPLoaderDisTorch2MultiGPU, CLIPVisionLoaderDisTorch2MultiGPU, CheckpointLoaderSimpleDisTorch2MultiGPU, ControlNetLoaderDisTorch2MultiGPU, DiffusersLoaderDisTorch2MultiGPU, DiffControlNetLoaderDisTorch2MultiGPU, UnetLoaderGGUFDisTorchMultiGPU, UnetLoaderGGUFAdvancedDisTorchMultiGPU, CLIPLoaderGGUFDisTorchMultiGPU, DualCLIPLoaderGGUFDisTorchMultiGPU, TripleCLIPLoaderGGUFDisTorchMultiGPU, QuadrupleCLIPLoaderGGUFDisTorchMultiGPU, UnetLoaderGGUFDisTorch2MultiGPU, UnetLoaderGGUFAdvancedDisTorch2MultiGPU, CLIPLoaderGGUFDisTorch2MultiGPU, DualCLIPLoaderGGUFDisTorch2MultiGPU, TripleCLIPLoaderGGUFDisTorch2MultiGPU, QuadrupleCLIPLoaderGGUFDisTorch2MultiGPU, UnetLoaderGGUFMultiGPU, UnetLoaderGGUFAdvancedMultiGPU, CLIPLoaderGGUFMultiGPU, DualCLIPLoaderGGUFMultiGPU, TripleCLIPLoaderGGUFMultiGPU, QuadrupleCLIPLoaderGGUFMultiGPU
Nvidia APEX normalization not installed, using PyTorch LayerNorm
Import times for custom nodes:
0.0 seconds: D:\webui-forge\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-GGUF
0.1 seconds: D:\webui-forge\ComfyUI_windows_portable\ComfyUI\custom_nodes\gguf
0.1 seconds: D:\webui-forge\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-MultiGPU
Context impl SQLiteImpl.
Will assume non-transactional DDL.
No target revision found.
Starting server
To see the GUI go to: http://127.0.0.1:8188
got prompt
Using xformers attention in VAE
Using xformers attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.float16
gguf qtypes: Q6_K (1), F32 (141), IQ4_XS (166), Q5_K (31)
Using mmproj 'mmproj-qwen2.5-vl-7b-it-q4_0.gguf' for 'qwen2.5-vl-7b-it-iq4_xs.gguf'.
gguf qtypes: Q4_0 (192), F32 (291), F16 (34), Q8_0 (2)
[MultiGPU Core Patching] text_encoder_device_patched returning device: cuda:0 (current_text_encoder_device=cuda:0)
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load WanVAE
0 models unloaded.
loaded partially; 128.00 MB usable, 128.00 MB loaded, 114.00 MB offloaded, lowvram patches: 0
Requested to load QwenImageTEModel_
loaded completely; 3541.80 MB usable, 4469.60 MB loaded, full load: True
[MultiGPU Core Patching] Successfully patched ModelPatcher.partially_load
gguf qtypes: F16 (1093), Q8_0 (30), Q4_K (692), Q5_K (118)
model weight dtype torch.float16, manual cast: None
model_type FLUX
[MultiGPU DisTorch V2] Full allocation string: #cuda:0;8.0;cpu
[MultiGPU DisTorch V2] GGUFModelPatcher missing 'model_patches_models' attribute, using 'model_patches_to' fallback.
Requested to load QwenImage
===============================================
DisTorch2 Model Virtual VRAM Analysis
===============================================
Object Role Original(GB) Total(GB) Virt(GB)
-----------------------------------------------
cuda:0 recip 6.00GB 14.00GB +8.00GB
cpu donor 31.84GB 23.84GB -8.00GB
-----------------------------------------------
model model 11.62GB 3.62GB -8.00GB
[MultiGPU DisTorch V2] Model size (11.62GB) is larger than 90% of available VRAM on: cuda:0 (5.40GB).
[MultiGPU DisTorch V2] To prevent an OOM error, set 'virtual_vram_gb' to at least 6.22.
==================================================
[MultiGPU DisTorch V2] Final Allocation String:
cuda:0,0.6040;cpu,0.2512
==================================================
DisTorch2 Model Device Allocations
==================================================
Device VRAM GB Dev % Model GB Dist %
--------------------------------------------------
cuda:0 6.00 60.4% 3.62 31.2%
cpu 31.84 25.1% 8.00 68.8%
--------------------------------------------------
DisTorch2 Model Layer Distribution
--------------------------------------------------
Layer Type Layers Memory (MB) % Total
--------------------------------------------------
Linear 846 12010.58 100.0%
RMSNorm 241 0.07 0.0%
LayerNorm 241 0.00 0.0%
--------------------------------------------------
DisTorch2 Model Final Device/Layer Assignments
--------------------------------------------------
Device Layers Memory (MB) % Total
--------------------------------------------------
cuda:0 (<0.01%) 484 0.82 0.0%
cuda:0 264 3874.96 32.3%
cpu 580 8134.87 67.7%
--------------------------------------------------
[MultiGPU DisTorch V2] DisTorch loading completed.
[MultiGPU DisTorch V2] Total memory: 12010.65MB
100%|████████████████████████████████████████████████████████████████████████| 3/3 [01:23<00:00, 27.73s/it]
[MultiGPU DisTorch V2] ModelPatcher missing 'model_patches_models' attribute, using 'model_patches_to' fallback.
Requested to load WanVAE
0 models unloaded.
loaded partially; 128.00 MB usable, 128.00 MB loaded, 114.00 MB offloaded, lowvram patches: 0
Prompt executed in 399.30 seconds
Other
This is not a duplicate of previous similar bugs that have a workaround:
- Since my GPU is too old to support SageAttention, that incompatibility is bypassed (many threads here & on Reddit point to it as the cause of a bug with identical output)
- I never added --fast during inference & testing
- I tested these flag combinations: --force-fp16; --force-fp16 --fp32-unet; --force-fp16 --fp32-vae. Only --force-fp16 --fp32-unet produced a normal preview and output, which implies the issue likely lies in the Qwen diffusion model's fp16 inference
As seen in the successful test run, the way to bypass this is --fp32-unet, at a tremendous speed cost (~2-5x slower)
Possibly this is an issue with the Qwen model's fp16 inference, with ComfyUI's implementation of fp16 inference, with quantization (the quants might derive from either fp32 or bf16 weights), or an fp16 problem specific to RTX 20-series cards; it needs more narrowing down. A probe like the sketch below could help tell these apart.
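For anyone reproducing this: an all-black preview and output is the classic symptom of NaN/Inf latents. Here is a minimal sketch (hypothetical, not ComfyUI's code) of why fp16 sampling can fail this way: fp16 saturates at 65504, so an intermediate activation that is harmless in fp32 can overflow to inf in fp16, and every later step, including the VAE decode, inherits the non-finite values. The latent_is_bad helper name is an assumption for illustration only:

```python
import torch

# fp16's largest finite value is 65504; fp32's is ~3.4e38.
x = torch.full((4,), 300.0)

print(torch.isinf(x.half() * x.half()).any())    # tensor(True):  300*300 = 90000 overflows fp16
print(torch.isinf(x.float() * x.float()).any())  # tensor(False): 90000 is trivial for fp32

# Hypothetical probe one could drop after a sampler step to confirm this failure mode:
def latent_is_bad(latent: torch.Tensor) -> bool:
    """True if the latent holds NaN/Inf values (the usual cause of all-black decodes)."""
    return not torch.isfinite(latent).all().item()
```

If the latents test bad under --force-fp16 but fine under --force-fp16 --fp32-unet, that would point at fp16 overflow inside the diffusion model rather than at the VAE or text encoder.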
(Apologies for having to include some extensions to deal with hardware constraints; someone might test on RTX 20-series cards with more VRAM)