Add Predict2.5 inferences in gallery with examples #32
base: mahesh/gallery
- Add gallery.md with Text2World, Image2World, and Video2World examples
- Include sample prompts, inputs, JSON structures, and results
- Add all required asset files (images and videos)
jingyijin2
left a comment
Would really like to see more diversity and great results. Will tag people from multiple groups in search of good examples.
## Model Details

| Model Name | Prompt | Input | JSON Structure | Results |
This 6-column table is difficult to navigate. Let's make a table with 2-3 columns with just:
- Prompt
- Input Media (None for t2w, the image for i2w, and the video for v2w)
- Results
Thank you!
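A minimal sketch of how the requested three-column layout could be generated from the example JSON entries. The `inference_type`, `input_path`, and `prompt` fields come from the JSON shown in this PR; the `result` key, the `video2world` branch, and the shortened prompts are assumptions made only for illustration.

```python
def input_media_cell(example):
    """Input Media cell: None for t2w, the image for i2w, the video for v2w."""
    kind = example["inference_type"]
    if kind == "text2world":
        return "None"
    if kind == "image2world":
        return f""
    # Assumption: anything else is a video2world entry with a video input.
    return f"[input video]({example['input_path']})"


def render_table(examples):
    """Render a Markdown table with only Prompt, Input Media, and Results."""
    lines = ["| Prompt | Input Media | Results |", "|---|---|---|"]
    for ex in examples:
        prompt = ex["prompt"].replace("|", "\\|")  # keep pipes from breaking cells
        # "result" is a hypothetical key holding the output video path.
        lines.append(f"| {prompt} | {input_media_cell(ex)} | [result]({ex['result']}) |")
    return "\n".join(lines)


# Prompts are abbreviated placeholders, not the full gallery prompts.
examples = [
    {"inference_type": "text2world",
     "prompt": "A van turns right at a stop sign.",
     "result": "samples/assets/T3.mp4"},
    {"inference_type": "image2world", "input_path": "I1_input.jpg",
     "prompt": "Dashboard view of the road ahead.",
     "result": "samples/assets/IO1_output.mp4"},
]
table = render_table(examples)
print(table)
```

Generating the table from the JSON entries keeps the prompt text in one place, so the separate JSON Structure column is no longer needed.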
| Text2World | The video is shot from a pedestrian's perspective, showing a matte black van swiftly turning right at a stop sign. It is dusk, windy, and there is a cyclist approaching the intersection, waiting for the van to complete its turn. | Text prompt only | `{"inference_type": "text2world", "name": "TextOutput_3", "prompt": "The video is shot from a pedestrian's perspective, showing a matte black van swiftly turning right at a stop sign. It is dusk, windy, and there is a cyclist approaching the intersection, waiting for the van to complete its turn."}` |  |
| Text2World | The video is shot from the driver's view, showing several black SUVs steadily proceeding onto the highway. It is morning, bright and clear. A motorcycle merges into the lane ahead, and another vehicle signals to enter from an adjacent on-ramp. | Text prompt only | `{"inference_type": "text2world", "name": "TextOutput_4", "prompt": "The video is shot from the driver's view, showing several black SUVs steadily proceeding onto the highway. It is morning, bright and clear. A motorcycle merges into the lane ahead, and another vehicle signals to enter from an adjacent on-ramp."}` |  |
| Text2World | The video is shot from the driver's view, showing many red trucks steadily driving down a steep slope. It is evening, cloudy and windy. There is a car approaching from behind, and fallen leaves cover parts of the road, reducing traction and requiring careful control. | Text prompt only | `{"inference_type": "text2world", "name": "TextOutput_1", "prompt": "The video is shot from the driver's view, showing many red trucks steadily driving down a steep slope. It is evening, cloudy and windy. There is a car approaching from behind, and fallen leaves cover parts of the road, reducing traction and requiring careful control."}` |  |
| Image2World | The video is taken from the perspective of a vehicle's dashboard camera, showing the view of the road ahead. The sky is clear and blue, indicating good weather conditions. The road is lined with green trees and bushes, adding a touch of nature to the urban setting. There are multiple vehicles on the road, including cars and trucks, all moving in the same direction. The vehicles vary in color, with shades of red, blue, and white being prominent. The traffic lights are visible, showing a green light, indicating that it is safe to proceed. The video does not show any accidents or unusual events, just a typical day on the road. | Text prompt +  | `{"inference_type": "image2world", "input_path": "I1_input.jpg", "guidance": 7, "seed": 0, "prompt": "The video is taken from the perspective of a vehicle's dashboard camera, showing the view of the road ahead. The sky is clear and blue, indicating good weather conditions. The road is lined with green trees and bushes, adding a touch of nature to the urban setting. There are multiple vehicles on the road, including cars and trucks, all moving in the same direction. The vehicles vary in color, with shades of red, blue, and white being prominent. The traffic lights are visible, showing a green light, indicating that it is safe to proceed. The video does not show any accidents or unusual events, just a typical day on the road."}` | [IO1](samples/assets/IO1_output.mp4) |
Since the guidance value and seed are the same for all the examples, could we state before the table that these are the default values used across the examples?
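One way to find which values are safe to hoist above the table is to compare the example JSON entries and keep only the keys whose values never change. The sketch below assumes the entries are loaded as Python dicts; `guidance: 7` and `seed: 0` come from the Image2World example in this PR, while the second entry (`I2_input.jpg` and its prompt) is a hypothetical placeholder.

```python
def shared_params(examples, per_example_keys=("prompt", "name", "input_path")):
    """Return the parameters that have the same value in every example."""
    first = examples[0]
    return {
        key: value
        for key, value in first.items()
        if key not in per_example_keys
        and all(key in ex and ex[key] == value for ex in examples)
    }


examples = [
    {"inference_type": "image2world", "input_path": "I1_input.jpg",
     "guidance": 7, "seed": 0, "prompt": "Dashboard view of the road ahead."},
    # Hypothetical second entry, used only to show the comparison.
    {"inference_type": "image2world", "input_path": "I2_input.jpg",
     "guidance": 7, "seed": 0, "prompt": "Placeholder prompt."},
]
defaults = shared_params(examples)
print(defaults)
```

Anything returned here could be listed once as a default before the table, leaving only the prompt and input media in each row.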
| Model Name | Prompt | Input | JSON Structure | Results |
|------------|--------|-------|----------------|---------|
| Text2World | The video is shot from a pedestrian's perspective, showing two departing black trucks quickly turn left at an intersection. It is afternoon, with fog reducing visibility. Nearby, another vehicle is attempting to make a right turn while a pedestrian waits to cross. | Text prompt only | `{"inference_type": "text2world", "name": "TextOutput_1", "prompt": "The video is shot from a pedestrian's perspective, showing two departing black trucks quickly turn left at an intersection. It is afternoon, with fog reducing visibility. Nearby, another vehicle is attempting to make a right turn while a pedestrian waits to cross."}` |  |
Please use these tags to play the video directly instead of including as links:
Lets see some examples of above models with their details such as prompts, inputs, etc.

## Model Details
Please split the examples into 3 subsections for t2w, i2w, and v2w.
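A small sketch of the split, assuming each example carries the `inference_type` field seen in the JSON above. The `video2world` value and the section titles are assumptions; only `text2world` and `image2world` appear in the excerpts on this page.

```python
# Assumed mapping from inference_type values to subsection titles.
SECTION_TITLES = {
    "text2world": "Text2World",
    "image2world": "Image2World",
    "video2world": "Video2World",  # assumed value, not shown in the excerpts
}


def group_by_model(examples):
    """Bucket examples by inference_type, keeping the t2w, i2w, v2w order."""
    groups = {kind: [] for kind in SECTION_TITLES}
    for ex in examples:
        groups[ex["inference_type"]].append(ex)
    return groups


examples = [
    {"inference_type": "image2world", "name": "example_b"},  # hypothetical name
    {"inference_type": "text2world", "name": "TextOutput_3"},
    {"inference_type": "text2world", "name": "TextOutput_4"},
]
groups = group_by_model(examples)
for kind, items in groups.items():
    print(SECTION_TITLES[kind], [ex["name"] for ex in items])
```

Each bucket can then be rendered as its own subsection, with an empty bucket signalling a model that still needs examples.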
- Separate the examples so they can be viewed clearly
- Add JSON example files for all models (Text2World, Image2World, Video2World)

- Separate the examples by model
- Replace image syntax with HTML5 <video> tags
- Enables inline video playback on GitHub instead of broken thumbnails
- Text2World: All output videos now playable (T1-T5)
- Image2World: All output videos now playable (IO1-IO5)
- Video2World: Both input and output videos now playable (V1-V5, VO1-VO5)
- Fix table formatting: remove double pipes throughout
- Update JSON example files

- Set all videos to width 4096px for high-quality display
- Convert Image2World input images from markdown to HTML img tags
- Standardize all media sizes across the gallery

- Restructured gallery.md to match NVIDIA Cosmos Cookbook style
- Added Base Parameters sections for all three models (Text2World, Image2World, Video2World)
- Added inference_type and name fields to parameter configurations

- Reorganize common parameters with proper table formatting
- Standardize parameter display across Text2World, Image2World, and Video2World
mpatekarhub
left a comment
Hello,
The feedback has been reviewed and the changes have been made accordingly.
I can see that it's visible locally.