[FLUX] Add FLUX inference test in CI #1969

wwwjn · 2025-10-30T22:46:15Z

Model test time breakdown:

A further time breakdown of the "flux validation test":

Model initialization takes 2-3 min (FLUX model + T5-XXL encoder + CLIP encoder)
Validation run (validation steps = default; each step validates the model, and images are generated at steps 1, 5, and 10): 5 min 30 s

In this PR, the following changes are made to save time:

Merge the FLUX-related tests into a single run to save FLUX model setup time
Reduce the validation steps to 5, using a smaller validation set for the integration test

tianyu-l · 2025-10-30T22:52:10Z

torchtitan/models/flux/inference/infer.py

        # Create mapping from local indices to global prompt indices
-        global_ids = list(range(global_rank, total_prompts, world_size))
+        if single_prompt_mode:
+            # In single prompt mode, all ranks process the same prompt (index 0)


Sorry, I somehow feel this PR is doing some overkill stuff. Would it be simpler if we just provided a truncated version of prompts.txt in the test assets, say 1 or 2 prompts on each rank? Our goal is to make the CI job lighter. Users should be fine if they always have to specify prompts in a .txt file? I just feel that dealing with two paths isn't elegant.

Thanks! That would be easier. I think specifying prompts in a .txt file is simpler, so I will remove the single prompt path. However, there is a minor bug to fix: if the number of prompts is smaller than the number of ranks, some ranks will get 0 prompts. During the forward pass of the T5/CLIP encoders, the program will hang because FSDP is applied to the encoders: the ranks with no prompts never run forward, so the all_gather on the other ranks never completes.

I will modify the PR to fix the bug and always use the prompts.txt file.

Sounds good! I think we can just error out early in that case, for simplicity.
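For illustration, a minimal sketch of the early error suggested above. Only the round-robin `range(global_rank, total_prompts, world_size)` expression comes from the diff in this thread; the function name `assign_prompt_indices` and the error message are hypothetical and not taken from the PR.

```python
def assign_prompt_indices(total_prompts: int, global_rank: int, world_size: int) -> list[int]:
    """Return the global prompt indices this rank should process (round-robin).

    Hypothetical helper: fails fast when there are fewer prompts than ranks,
    because a rank with zero prompts would skip the encoder forward pass and
    leave the other ranks waiting in the FSDP all_gather.
    """
    if total_prompts < world_size:
        raise ValueError(
            f"Got {total_prompts} prompts for {world_size} ranks; "
            "every rank needs at least one prompt."
        )
    # Same expression as in the diff: rank r takes prompts r, r + W, r + 2W, ...
    return list(range(global_rank, total_prompts, world_size))
```

Because `total_prompts` and `world_size` are the same on every rank, all ranks raise together, so the job should fail immediately instead of hanging until the collective times out.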

tianyu-l · 2025-11-20T00:26:08Z

torchtitan/models/flux/inference/prompts.txt

Instead of removing all of them, we can create a test asset with only a small portion of them.

tianyu-l

using smaller validation set for integration test

where are you doing this?

also you mentioned some difficulties when combining CP with validation, is it no longer a blocker?

wwwjn · 2025-11-25T21:02:22Z

using smaller validation set for integration test

where are you doing this?

Here I set validation.steps = 5 (the default is 48, which is calculated from the validation dataset size).
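As a rough sketch of how an explicit step count can override one derived from the dataset size: the function below is hypothetical, and both the "non-positive means derive from the dataset" convention and the ceiling-division formula are assumptions for illustration, not the logic in this PR.

```python
import math


def resolve_validation_steps(configured_steps: int, dataset_size: int, global_batch_size: int) -> int:
    """Pick the number of validation steps (hypothetical helper).

    Assumption: a positive configured value is used as-is; otherwise the
    count is derived so that one pass covers the whole validation set.
    """
    if configured_steps > 0:
        return configured_steps
    # Ceiling division so a final partial batch still counts as a step.
    return math.ceil(dataset_size / global_batch_size)
```

With these assumptions, `resolve_validation_steps(5, 100, 8)` returns 5 regardless of the dataset size, while `resolve_validation_steps(-1, 100, 8)` falls back to the derived value of 13.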

also you mentioned some difficulties when combining CP with validation, is it no longer a blocker?

The error was an NCCL timeout while generating images. I then found that it happens occasionally in both the CI Docker container and on my local dev machine, so I think it could be because each rank takes different inputs and therefore generates at a different speed.

tianyu-l

sgtm

wwwjn requested review from fegin, tianyu-l and wconstab as code owners October 30, 2025 22:46

meta-cla bot added the CLA Signed label Oct 30, 2025

wwwjn changed the title ~~[FLUXAdd single prompt mode for FLUX inference and inference CI test~~ [FLUX] Add single prompt mode for FLUX inference and inference CI test Oct 30, 2025

tianyu-l reviewed Oct 30, 2025

wwwjn force-pushed the flux-tests branch 2 times, most recently from 882f29e to 8205bfd Compare November 19, 2025 06:51

wwwjn changed the title ~~[FLUX] Add single prompt mode for FLUX inference and inference CI test~~ [FLUX] Add FLUX inference test in CI Nov 19, 2025

tianyu-l reviewed Nov 20, 2025

wwwjn added 10 commits November 24, 2025 07:59

add single prompt mode

5085f99

add flux

2ceab76

revert

882ad32

fix format

b0c1602

remove commands

d5a5a88

trigger the CI flow

5f6dbc4

merge flux tests

6321841

lint

9935e96

test FLUX CP + validation

4f4918c

test changes

c446bfa

wwwjn force-pushed the flux-tests branch from afb9b89 to c446bfa Compare November 24, 2025 16:31

fix typo

44fed4d

wwwjn requested a review from tianyu-l November 25, 2025 14:45

tianyu-l reviewed Nov 25, 2025

tianyu-l approved these changes Nov 25, 2025

wwwjn merged commit cbdb311 into main Nov 25, 2025
9 checks passed
