@mpatekarhub mpatekarhub commented Nov 7, 2025

  • Add gallery.md with Text2World, Image2World, and Video2World examples
  • Include sample prompts, inputs, JSON structures, and results
  • Add all required asset files (images and videos)

Description

Brief description of the changes in this PR.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Code refactoring
  • Performance improvement

Changes Made

  • Added/updated documentation
  • Added/updated examples
  • Fixed bugs or issues
  • Improved code quality
  • Updated dependencies

Testing

  • I have tested the changes locally
  • Documentation builds successfully
  • Pre-commit hooks pass
  • Examples run without errors
  • Links and references are valid

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

Any additional information that reviewers should know.


@jingyijin2 jingyijin2 left a comment

Would really like to see more diversity and great results. Will tag people from multiple groups in search of good examples.


## Model Details

| Model Name | Prompt | Input | JSON Structure | Results |

This 6-column table is difficult to navigate. Let's make a table with 2-3 columns with just:

  • Prompt
  • Input Media (None for t2w, the image for i2w, and the video for v2w)
  • Results

Thank you!
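
As an illustration of the narrower layout requested above, the rows could be generated from the example data rather than edited by hand. This is only a sketch: the `render_gallery_table` helper and the `prompt`, `input_media`, and `result` keys are hypothetical names, not part of the PR.

```python
def render_gallery_table(examples):
    """Render examples as a 3-column Markdown table: Prompt, Input Media, Results.

    Each example is a dict with hypothetical keys "prompt", "input_media"
    (optional), and "result".
    """
    lines = [
        "| Prompt | Input Media | Results |",
        "|--------|-------------|---------|",
    ]
    for ex in examples:
        # Text2World examples have no input media, so show "None".
        media = ex.get("input_media") or "None"
        # Escape pipes so a prompt cannot break the table row.
        prompt = ex["prompt"].replace("|", "\\|")
        lines.append(f"| {prompt} | {media} | {ex['result']} |")
    return "\n".join(lines)
```

Dropping the Model Name and JSON Structure columns keeps each row to the three items listed in the comment.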

| Text2World | The video is shot from a pedestrian's perspective, showing a matte black van swiftly turning right at a stop sign. It is dusk, windy, and there is a cyclist approaching the intersection, waiting for the van to complete its turn. | Text prompt only | `{"inference_type": "text2world", "name": "TextOutput_3", "prompt": "The video is shot from a pedestrian's perspective, showing a matte black van swiftly turning right at a stop sign. It is dusk, windy, and there is a cyclist approaching the intersection, waiting for the van to complete its turn."}` | ![T3](samples/assets/T3_output.mp4) |
| Text2World | The video is shot from the driver's view, showing several black SUVs steadily proceeding onto the highway. It is morning, bright and clear. A motorcycle merges into the lane ahead, and another vehicle signals to enter from an adjacent on-ramp. | Text prompt only | `{"inference_type": "text2world", "name": "TextOutput_4", "prompt": "The video is shot from the driver's view, showing several black SUVs steadily proceeding onto the highway. It is morning, bright and clear. A motorcycle merges into the lane ahead, and another vehicle signals to enter from an adjacent on-ramp."}` | ![T4](samples/assets/T4_output.mp4) |
| Text2World | The video is shot from the driver's view, showing many red trucks steadily driving down a steep slope. It is evening, cloudy and windy. There is a car approaching from behind, and fallen leaves cover parts of the road, reducing traction and requiring careful control. | Text prompt only | `{"inference_type": "text2world", "name": "TextOutput_1", "prompt": "The video is shot from the driver's view, showing many red trucks steadily driving down a steep slope. It is evening, cloudy and windy. There is a car approaching from behind, and fallen leaves cover parts of the road, reducing traction and requiring careful control."}` | ![T5](samples/assets/T5_output.mp4) |
| Image2World | The video is taken from the perspective of a vehicle's dashboard camera, showing the view of the road ahead. The sky is clear and blue, indicating good weather conditions. The road is lined with green trees and bushes, adding a touch of nature to the urban setting. There are multiple vehicles on the road, including cars and trucks, all moving in the same direction. The vehicles vary in color, with shades of red, blue, and white being prominent. The traffic lights are visible, showing a green light, indicating that it is safe to proceed. The video does not show any accidents or unusual events, just a typical day on the road. | Text prompt + ![I1](samples/assets/I1_input.jpg) | `{"inference_type": "image2world", "input_path": "I1_input.jpg", "guidance": 7, "seed": 0, "prompt": "The video is taken from the perspective of a vehicle's dashboard camera, showing the view of the road ahead. The sky is clear and blue, indicating good weather conditions. The road is lined with green trees and bushes, adding a touch of nature to the urban setting. There are multiple vehicles on the road, including cars and trucks, all moving in the same direction. The vehicles vary in color, with shades of red, blue, and white being prominent. The traffic lights are visible, showing a green light, indicating that it is safe to proceed. The video does not show any accidents or unusual events, just a typical day on the road."}` | [IO1](samples/assets/IO1_output.mp4) |

Since the guidance value and seed are the same across all the examples, could we state before the table that these are the default values used throughout?
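
One way to act on this, sketched under assumptions: keep the shared values in a single place and merge them into each example's JSON. The values `guidance: 7` and `seed: 0` come from the Image2World row quoted above. The helper names `build_config` and `strip_defaults` are hypothetical, and treating these values as gallery-wide defaults is the suggestion here, not a confirmed model default.

```python
# Shared values taken from the quoted Image2World row; assumed to apply
# to every example in the gallery.
DEFAULT_PARAMS = {"guidance": 7, "seed": 0}


def build_config(example, defaults=DEFAULT_PARAMS):
    """Merge shared defaults into one example; the example wins on conflict."""
    return {**defaults, **example}


def strip_defaults(config, defaults=DEFAULT_PARAMS):
    """Drop keys that merely repeat a shared default, for a compact table cell."""
    return {
        key: value
        for key, value in config.items()
        if not (key in defaults and defaults[key] == value)
    }
```

With this split, the defaults can be documented once before the table, and each row only needs the fields that differ.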


| Model Name | Prompt | Input | JSON Structure | Results |
|------------|--------|-------|----------------|---------|
| Text2World | The video is shot from a pedestrian's perspective, showing two departing black trucks quickly turn left at an intersection. It is afternoon, with fog reducing visibility. Nearby, another vehicle is attempting to make a right turn while a pedestrian waits to cross. | Text prompt only | `{"inference_type": "text2world", "name": "TextOutput_1", "prompt": "The video is shot from a pedestrian's perspective, showing two departing black trucks quickly turn left at an intersection. It is afternoon, with fog reducing visibility. Nearby, another vehicle is attempting to make a right turn while a pedestrian waits to cross."}` | ![T1](samples/assets/T1_output.mp4) |

Please use HTML5 `<video>` tags to play the videos directly instead of including them as links.
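
A rough sketch of that conversion, assuming the Markdown image syntax seen in the quoted rows (for example `![T1](samples/assets/T1_output.mp4)`) is what gets replaced. The `to_video_tags` helper and its optional `width` parameter are hypothetical, and the exact attributes the gallery should use are not specified in this thread.

```python
import re

# Matches Markdown image syntax whose target ends in .mp4,
# e.g. ![T1](samples/assets/T1_output.mp4)
VIDEO_IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+\.mp4)\)")


def to_video_tags(markdown, width=None):
    """Replace ![alt](file.mp4) with an HTML5 <video> tag that has controls."""

    def replace(match):
        src = match.group(2)
        width_attr = f' width="{width}"' if width else ""
        return f'<video src="{src}"{width_attr} controls></video>'

    return VIDEO_IMAGE.sub(replace, markdown)
```

Only image-style references to `.mp4` files are rewritten. Ordinary images and plain links, such as the `[IO1](...)` link in the Image2World row, are left untouched and would need separate handling.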


Let's look at some examples of the above models, with details such as prompts and inputs.

## Model Details

Please split the examples into 3 subsections for t2w, i2w, and v2w.
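
If the examples are kept as JSON configs, the split could be driven by the `inference_type` field shown in the quoted rows. A minimal sketch, with assumptions: `group_by_model` is a hypothetical helper, and the `"video2world"` value is inferred by analogy with `"text2world"` and `"image2world"`, since no Video2World JSON is quoted here.

```python
from collections import defaultdict

# Section titles keyed by inference_type. The "video2world" key is an
# assumption; only the other two values appear in the quoted JSON.
SECTIONS = {
    "text2world": "Text2World",
    "image2world": "Image2World",
    "video2world": "Video2World",
}


def group_by_model(configs):
    """Return (section title, configs) pairs in t2w, i2w, v2w order."""
    groups = defaultdict(list)
    for cfg in configs:
        groups[cfg["inference_type"]].append(cfg)
    # Keep a fixed section order and skip sections with no examples.
    return [(title, groups[key]) for key, title in SECTIONS.items() if groups[key]]
```

Each returned pair can then become one subsection with its own table.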

Mahesh Patekar added 5 commits November 10, 2025 13:44
- Separate the examples so they are easier to view
- Add JSON example files for all models (Text2World, Image2World, Video2World)
- Separate the examples by model
- Replace image syntax ![](video.mp4) with HTML5 <video> tags
- Enables inline video playback on GitHub instead of broken thumbnails
- Text2World: All output videos now playable (T1-T5)
- Image2World: All output videos now playable (IO1-IO5)
- Video2World: Both input and output videos now playable (V1-V5, VO1-VO5)
- Fix table formatting: remove double pipes throughout
- Update JSON example files
- Set all videos to width 4096px for high-quality display
- Convert Image2World input images from markdown to HTML img tags
- Standardize all media sizes across the gallery
- Restructured gallery.md to match NVIDIA Cosmos Cookbook style
- Added Base Parameters sections for all three models (Text2World, Image2World, Video2World)
- Added inference_type and name fields to parameter configurations
- Reorganize common parameters with proper table formatting
- Standardize parameter display across Text2World, Image2World, and Video2World

@mpatekarhub mpatekarhub left a comment

Hello,
I have gone through the feedback and made the changes accordingly.
I can see that it's visible locally.
