Skip to content

Conversation

@NuojCheng
Copy link
Collaborator

@NuojCheng NuojCheng commented Nov 8, 2025

Introduce a new config hide_profiler_step_metric to omit metric logging for profiler start/stop and subsequent steps.

This change temporarily removes metric logging for four specific steps:

  1. The step where the profiler is activated.
  2. The step immediately following profiler activation.
  3. The step where the profiler is deactivated.
  4. The step immediately following profiler deactivation.

These steps produce outlier timings because profiler activation/deactivation overhead is currently included in the step time, which skews MFU and other benchmark metrics. This is a short-term fix to address the benchmarking inaccuracies described in b/456828037.

A long-term solution, moving the profiler to a separate subprocess to isolate its overhead, is planned.

Tests

  1. Tested with the command:
    smoke_train skip_first_n_steps_for_profiler=10 profiler_steps=3 profiler=xplane steps=20

The log difference can be viewed here: https://diff.googleplex.com/#key=JIA9QUANVfdW

  1. Tested with the command:
    smoke_train skip_first_n_steps_for_profiler=10 profiler_steps=3 profiler=xplane steps=20 hide_profiler_step_metric=true

The log difference can be viewed here: https://diff.googleplex.com/#key=LbBZkoUsDAW5

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@NuojCheng NuojCheng force-pushed the chengnuojin-correct-mfu branch 3 times, most recently from 1ba9aa5 to 96b00f2 Compare November 10, 2025 22:58
@NuojCheng NuojCheng force-pushed the chengnuojin-correct-mfu branch 2 times, most recently from c239d2a to 7f8ed85 Compare November 11, 2025 00:44
Copy link
Collaborator

@gagika gagika left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, one minor comment

@NuojCheng NuojCheng force-pushed the chengnuojin-correct-mfu branch from 7f8ed85 to 55557db Compare November 11, 2025 05:06
@copybara-service copybara-service bot merged commit 8fd6f2a into main Nov 11, 2025
39 checks passed
@copybara-service copybara-service bot deleted the chengnuojin-correct-mfu branch November 11, 2025 06:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants