From 3a8d225368bdc9e06fc5c85277ef8f5a28e1010e Mon Sep 17 00:00:00 2001
From: sium01 <156007391+sium01@users.noreply.github.com>
Date: Thu, 8 May 2025 02:02:03 +0600
Subject: [PATCH] Update README.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---

## ✅ Changes Made

- ✨ **Added emoji** to make it more visually scannable.
- 🧱 **Structured key sections** using bullet lists and tables for better readability.
- 📂 **Improved clarity** around file paths and differences between model versions.
- 🔗 **Made links more user-friendly** and clearly labeled.
- 🎭 **Wrapped up with a poetic closing line** for extra flair.

---
 cosmos/README.md | 70 ++++++++++++++++++++++++++++++------------------
 1 file changed, 44 insertions(+), 26 deletions(-)

diff --git a/cosmos/README.md b/cosmos/README.md
index 5730bfd..e039f85 100644
--- a/cosmos/README.md
+++ b/cosmos/README.md
@@ -1,50 +1,68 @@
-# Nvidia Cosmos Models
+# 🌌 Nvidia Cosmos Models for ComfyUI
 
-[Nvidia Cosmos](https://www.nvidia.com/en-us/ai/cosmos/) is a family of "World Models". ComfyUI currently supports specifically the 7B and 14B text to video diffusion models and the 7B and 14B image to video diffusion models.
+[Nvidia Cosmos](https://www.nvidia.com/en-us/ai/cosmos/) is a powerful family of **"World Models"** for text-to-video and image-to-video generation.
+ComfyUI currently supports the **7B** and **14B** Cosmos models for both **Text2Video** and **Image2Video** diffusion workflows.
 
-## Files to Download
+---
 
-You will first need:
+## 📦 Required Files & Setup
 
-#### Text encoder and VAE:
+### 🧠 Text Encoder & VAE
 
-[oldt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/comfyanonymous/cosmos_1.0_text_encoder_and_VAE_ComfyUI/tree/main/text_encoders) goes in: ComfyUI/models/text_encoders/
+Download the following files and place them in the specified directories:
 
-[cosmos_cv8x8x8_1.0.safetensors](https://huggingface.co/comfyanonymous/cosmos_1.0_text_encoder_and_VAE_ComfyUI/blob/main/vae/cosmos_cv8x8x8_1.0.safetensors) goes in: ComfyUI/models/vae/
+| File | Destination Folder |
+|------|---------------------|
+| [`oldt5_xxl_fp8_e4m3fn_scaled.safetensors`](https://huggingface.co/comfyanonymous/cosmos_1.0_text_encoder_and_VAE_ComfyUI/tree/main/text_encoders) | `ComfyUI/models/text_encoders/` |
+| [`cosmos_cv8x8x8_1.0.safetensors`](https://huggingface.co/comfyanonymous/cosmos_1.0_text_encoder_and_VAE_ComfyUI/blob/main/vae/cosmos_cv8x8x8_1.0.safetensors) | `ComfyUI/models/vae/` |
 
-Note: oldt5_xxl is not the same as the t5xxl used in flux and other models.
-oldt5_xxl is t5xxl 1.0 while the one used in flux and others is t5xxl 1.1
+> ⚠️ `oldt5_xxl` is **not** the same as `t5xxl` used in models like Flux.
+> `oldt5_xxl` = T5XXL **1.0**, while Flux uses **1.1**.
 
-#### Video Models
+---
 
-The video models can be found [in safetensors format here.](https://huggingface.co/mcmonkey/cosmos-1.0/tree/main)
+### 🎥 Video Diffusion Models
 
-The workflows on this page use [Cosmos-1_0-Diffusion-7B-Text2World.safetensors](https://huggingface.co/mcmonkey/cosmos-1.0/blob/main/Cosmos-1_0-Diffusion-7B-Text2World.safetensors) and [Cosmos-1_0-Diffusion-7B-Video2World.safetensors](https://huggingface.co/mcmonkey/cosmos-1.0/blob/main/Cosmos-1_0-Diffusion-7B-Video2World.safetensors)
+All `.safetensors` models go into:
+`ComfyUI/models/diffusion_models/`
 
-These files go in: ComfyUI/models/diffusion_models
+| Model | Download |
+|-------|----------|
+| Cosmos 7B - Text to Video | [Cosmos-1_0-Diffusion-7B-Text2World.safetensors](https://huggingface.co/mcmonkey/cosmos-1.0/blob/main/Cosmos-1_0-Diffusion-7B-Text2World.safetensors) |
+| Cosmos 7B - Image/Video to Video | [Cosmos-1_0-Diffusion-7B-Video2World.safetensors](https://huggingface.co/mcmonkey/cosmos-1.0/blob/main/Cosmos-1_0-Diffusion-7B-Video2World.safetensors) |
 
-Note: "Text to World" means Text to video and "Video to World" means image/video to video.
+> 💡 “Text to World” = **Text ➜ Video**
+> “Video to World” = **Image/Video ➜ Video**
 
-If you want the original diffusion models in .pt format instead of the repacked safetensors the official links are: [7B-Text2World](https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-7B-Text2World) [7B-Video2World](https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-7B-Video2World) [14B-Text2World](https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-14B-Text2World) [14B-Video2World](https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-14B-Video2World)
+#### 🔁 Optional: Original `.pt` Versions
 
-## Workflows
+- [7B - Text2World (.pt)](https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-7B-Text2World)
+- [7B - Video2World (.pt)](https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-7B-Video2World)
+- [14B - Text2World (.pt)](https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-14B-Text2World)
+- [14B - Video2World (.pt)](https://huggingface.co/nvidia/Cosmos-1.0-Diffusion-14B-Video2World)
 
-### Text to Video
+---
 
-This workflow requires the 7B text to video model that you can download above.
+## 🧪 Example Workflows
 
-![Example](text_to_video_cosmos_7B.webp)
+### 📝 Text ➜ Video (7B)
 
-[Workflow in Json format](text_to_video_cosmos_7B.json)
+Generate dynamic video scenes straight from your prompts.
 
-### Image to Video
+![Text to Video Example](text_to_video_cosmos_7B.webp)
+📄 [Download JSON Workflow](text_to_video_cosmos_7B.json)
 
-This model supports generating a video from 1 or more images. If more than one image is fed it will use them all as a guide and continue the motion. You can also do basic interpolation by setting one or more start_image and end_image which works best if those images are similar to each other.
+---
 
-This workflow requires the 7B image to video model that you can download above.
+### 🖼️ Image(s) ➜ Video (7B)
 
-This model is trained primarily on realistic videos but in this example you can see that it also works decently on anime.
+- Feed in one or more images; multiple images are all used as a guide to continue the motion.
+- Basic interpolation: set `start_image` and `end_image` (works best when the two images are similar).
+- Trained on realistic video data, but also handles **anime** fairly well!
 
-![Example](image_to_video_cosmos_7B.webp)
+![Image to Video Example](image_to_video_cosmos_7B.webp)
+📄 [Download JSON Workflow](image_to_video_cosmos_7B.json)
 
-[Workflow in Json format](image_to_video_cosmos_7B.json)
+---
+
+✨ With the power of Cosmos + ComfyUI, you're not just prompting—you're animating entire **worlds**.
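
The downloads listed in the README above can also be scripted. Below is a minimal sketch using the `huggingface_hub` Python package (an assumption, not part of the patch); the repo IDs, filenames, and destination folders come from the tables above, and `MODELS_DIR` is a placeholder you may need to adjust for your install.

```python
# Sketch: fetch the Cosmos files into the folders the README lists.
# Assumes `pip install huggingface_hub` and that the script runs in the
# directory that contains your ComfyUI/ folder; adjust MODELS_DIR otherwise.
from huggingface_hub import hf_hub_download

MODELS_DIR = "ComfyUI/models"  # assumed install location

# Text encoder and VAE: the repo subfolders (text_encoders/, vae/) match
# ComfyUI's folder names, so downloading into MODELS_DIR lands them correctly.
hf_hub_download(
    repo_id="comfyanonymous/cosmos_1.0_text_encoder_and_VAE_ComfyUI",
    filename="text_encoders/oldt5_xxl_fp8_e4m3fn_scaled.safetensors",
    local_dir=MODELS_DIR,
)
hf_hub_download(
    repo_id="comfyanonymous/cosmos_1.0_text_encoder_and_VAE_ComfyUI",
    filename="vae/cosmos_cv8x8x8_1.0.safetensors",
    local_dir=MODELS_DIR,
)

# 7B diffusion models used by the example workflows.
for name in (
    "Cosmos-1_0-Diffusion-7B-Text2World.safetensors",
    "Cosmos-1_0-Diffusion-7B-Video2World.safetensors",
):
    hf_hub_download(
        repo_id="mcmonkey/cosmos-1.0",
        filename=name,
        local_dir=f"{MODELS_DIR}/diffusion_models",
    )
```

Run from the directory containing `ComfyUI/`, this should leave each file in the destination folder given in the tables; verify the paths against your own setup before relying on it.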