Add Predict2.5 inferences in gallery with examples #32

mpatekarhub · 2025-11-07T16:58:48Z

Add gallery.md with Text2World, Image2World, and Video2World examples
Include sample prompts, inputs, JSON structures, and results
Add all required asset files (images and videos)

Description

Brief description of the changes in this PR.

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Code refactoring
Performance improvement

Changes Made

Testing

I have tested the changes locally
Documentation builds successfully
Pre-commit hooks pass
Examples run without errors
Links and references are valid

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
Any dependent changes have been merged and published

Additional Notes

Any additional information that reviewers should know.

jingyijin2

Would really like to see more diversity and great results. Will tag people from multiple groups in search of good examples.

jingyijin2 · 2025-11-09T22:03:14Z

docs/recipes/inference/predict2_5/gallery.md

+
+## Model Details
+
+| Model Name | Prompt | Input | JSON Structure | Results |


This 6-column table is difficult to navigate. Let's make a table with 2-3 columns with just:

- Prompt
- Input Media (None for t2w, the image for i2w, and the video for v2w)
- Results

Thank you!

jingyijin2 · 2025-11-09T22:05:05Z

docs/recipes/inference/predict2_5/gallery.md

+| Text2World | The video is shot from a pedestrian's perspective, showing a matte black van swiftly turning right at a stop sign. It is dusk, windy, and there is a cyclist approaching the intersection, waiting for the van to complete its turn. | Text prompt only | `{"inference_type": "text2world", "name": "TextOutput_3", "prompt": "The video is shot from a pedestrian's perspective, showing a matte black van swiftly turning right at a stop sign. It is dusk, windy, and there is a cyclist approaching the intersection, waiting for the van to complete its turn."}` | ![T3](samples/assets/T3_output.mp4) |
+| Text2World | The video is shot from the driver's view, showing several black SUVs steadily proceeding onto the highway. It is morning, bright and clear. A motorcycle merges into the lane ahead, and another vehicle signals to enter from an adjacent on-ramp. | Text prompt only | `{"inference_type": "text2world", "name": "TextOutput_4", "prompt": "The video is shot from the driver's view, showing several black SUVs steadily proceeding onto the highway. It is morning, bright and clear. A motorcycle merges into the lane ahead, and another vehicle signals to enter from an adjacent on-ramp."}` | ![T4](samples/assets/T4_output.mp4) |
+| Text2World | The video is shot from the driver's view, showing many red trucks steadily driving down a steep slope. It is evening, cloudy and windy. There is a car approaching from behind, and fallen leaves cover parts of the road, reducing traction and requiring careful control. | Text prompt only | `{"inference_type": "text2world", "name": "TextOutput_1", "prompt": "The video is shot from the driver's view, showing many red trucks steadily driving down a steep slope. It is evening, cloudy and windy. There is a car approaching from behind, and fallen leaves cover parts of the road, reducing traction and requiring careful control."}` | ![T5](samples/assets/T5_output.mp4) |
+| Image2World | The video is taken from the perspective of a vehicle's dashboard camera, showing the view of the road ahead. The sky is clear and blue, indicating good weather conditions. The road is lined with green trees and bushes, adding a touch of nature to the urban setting. There are multiple vehicles on the road, including cars and trucks, all moving in the same direction. The vehicles vary in color, with shades of red, blue, and white being prominent. The traffic lights are visible, showing a green light, indicating that it is safe to proceed. The video does not show any accidents or unusual events, just a typical day on the road. | Text prompt + ![I1](samples/assets/I1_input.jpg) | `{"inference_type": "image2world", "input_path": "I1_input.jpg", "guidance": 7, "seed": 0, "prompt": "The video is taken from the perspective of a vehicle's dashboard camera, showing the view of the road ahead. The sky is clear and blue, indicating good weather conditions. The road is lined with green trees and bushes, adding a touch of nature to the urban setting. There are multiple vehicles on the road, including cars and trucks, all moving in the same direction. The vehicles vary in color, with shades of red, blue, and white being prominent. The traffic lights are visible, showing a green light, indicating that it is safe to proceed. The video does not show any accidents or unusual events, just a typical day on the road."}` | [IO1](samples/assets/IO1_output.mp4) |


Since the guidance value and seed are the same across all the examples, could we state before the table that these are the default values used throughout?
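One way to act on this suggestion is to separate the shared values from each example's JSON before rendering, so they can be listed once ahead of the table. The sketch below is only an illustration: `DEFAULTS` and `split_defaults` are hypothetical names, and the values `guidance: 7` and `seed: 0` are taken from the Image2World row in the diff rather than from any confirmed model default.

```python
# Shared values as they appear in the Image2World example in this diff.
# Whether these are the model's actual defaults is an assumption.
DEFAULTS = {"guidance": 7, "seed": 0}


def split_defaults(config, defaults=DEFAULTS):
    """Split a config into (overrides, shared).

    shared    -> keys whose value matches the given defaults
    overrides -> everything else, i.e. what is specific to this example
    """
    shared = {k: v for k, v in config.items() if k in defaults and defaults[k] == v}
    overrides = {k: v for k, v in config.items() if k not in shared}
    return overrides, shared


example = {
    "inference_type": "image2world",
    "input_path": "I1_input.jpg",
    "guidance": 7,
    "seed": 0,
    "prompt": "The video is taken from the perspective of a vehicle's dashboard camera.",
}
overrides, shared = split_defaults(example)
```

With this split, `shared` can be printed once above the table and only `overrides` needs to appear per example.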

jingyijin2 · 2025-11-09T22:25:04Z

docs/recipes/inference/predict2_5/gallery.md

+
+| Model Name | Prompt | Input | JSON Structure | Results |
+|------------|--------|-------|----------------|---------|
+| Text2World | The video is shot from a pedestrian's perspective, showing two departing black trucks quickly turn left at an intersection. It is afternoon, with fog reducing visibility. Nearby, another vehicle is attempting to make a right turn while a pedestrian waits to cross. | Text prompt only | `{"inference_type": "text2world", "name": "TextOutput_1", "prompt": "The video is shot from a pedestrian's perspective, showing two departing black trucks quickly turn left at an intersection. It is afternoon, with fog reducing visibility. Nearby, another vehicle is attempting to make a right turn while a pedestrian waits to cross."}` | ![T1](samples/assets/T1_output.mp4) |


Please use these tags to play the video directly instead of including as links:
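The exact tags the reviewer suggested are not preserved in this thread. As a rough sketch, assuming a plain HTML5 `<video>` element with the `controls` attribute is acceptable, the Markdown links to `.mp4` files could be rewritten mechanically. The helper name `to_video_tags` and its optional `width` parameter are hypothetical.

```python
import re

# Matches both ![alt](path.mp4) and [alt](path.mp4); other file types are left alone.
VIDEO_LINK = re.compile(r"!?\[([^\]]*)\]\(([^)\s]+\.mp4)\)")


def to_video_tags(markdown, width=None):
    """Replace Markdown links/images that point at .mp4 files with <video> tags."""

    def repl(match):
        src = match.group(2)
        width_attr = f' width="{width}"' if width else ""
        return f'<video src="{src}" controls{width_attr}></video>'

    return VIDEO_LINK.sub(repl, markdown)


row = "| ![T1](samples/assets/T1_output.mp4) |"
converted = to_video_tags(row)
```

Image inputs such as `![I1](samples/assets/I1_input.jpg)` do not match the pattern, so they pass through unchanged.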

jingyijin2 · 2025-11-09T22:26:04Z

docs/recipes/inference/predict2_5/gallery.md

+
+Lets see some examples of above models with their details such as prompts, inputs, etc.
+
+## Model Details


Please split the examples into 3 subsections for t2w, i2w, and v2w.
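If the gallery were generated from the JSON example files rather than written by hand, the split could be sketched as below. This is an assumption-laden illustration: `SECTIONS`, `render_gallery`, and the `result` field are hypothetical, while `inference_type`, `prompt`, and `input_path` follow the JSON shown in the diff. It also uses the three-column layout requested in the earlier comment.

```python
# Order and titles of the three subsections (t2w, i2w, v2w).
SECTIONS = [
    ("text2world", "Text2World"),
    ("image2world", "Image2World"),
    ("video2world", "Video2World"),
]


def render_gallery(examples):
    """Emit one Markdown subsection with a 3-column table per inference type."""
    lines = []
    for key, title in SECTIONS:
        rows = [ex for ex in examples if ex["inference_type"] == key]
        if not rows:
            continue  # skip subsections that have no examples
        lines += [
            f"## {title}",
            "",
            "| Prompt | Input Media | Results |",
            "|--------|-------------|---------|",
        ]
        for ex in rows:
            prompt = ex["prompt"].replace("|", "\\|")  # keep table cells intact
            media = ex.get("input_path", "None")       # t2w has no input media
            lines.append(f"| {prompt} | {media} | {ex['result']} |")
        lines.append("")
    return "\n".join(lines)
```

Iterating over a fixed `SECTIONS` list keeps the subsection order stable regardless of the order in which the example files are loaded.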

- Separate the examples by model
- Replace image syntax `![](video.mp4)` with HTML5 `<video>` tags
- Enables inline video playback on GitHub instead of broken thumbnails
- Text2World: All output videos now playable (T1-T5)
- Image2World: All output videos now playable (IO1-IO5)
- Video2World: Both input and output videos now playable (V1-V5, VO1-VO5)
- Fix table formatting: remove double pipes throughout
- Update JSON example files

mpatekarhub

Hello,
I have gone through the feedback and made the changes accordingly.
I can see that it is visible locally.

Add Predict2.5 inference gallery with examples

2718a95

- Add gallery.md with Text2World, Image2World, and Video2World examples
- Include sample prompts, inputs, JSON structures, and results
- Add all required asset files (images and videos)

jingyijin2 reviewed Nov 9, 2025


Mahesh Patekar added 5 commits November 10, 2025 13:44

Fix typo and formatting in Predict2.5 gallery

6244659

- Separate the examples so they can be viewed clearly
- Add JSON example files for all models (Text2World, Image2World, Video2World)

Update video and image sizes to full resolution

7e0e9e8

- Set all videos to width 4096px for high-quality display
- Convert Image2World input images from markdown to HTML img tags
- Standardize all media sizes across the gallery

Add Predict2.5 gallery page with enhanced layout

1477a64

- Restructured gallery.md to match NVIDIA Cosmos Cookbook style
- Added Base Parameters sections for all three models (Text2World, Image2World, Video2World)
- Added inference_type and name fields to parameter configurations

Improve gallery.md formatting and organization

3152870

- Reorganize common parameters with proper table formatting
- Standardize parameter display across Text2World, Image2World, and Video2World

mpatekarhub commented Dec 4, 2025
