Conversation

@imreddyTeja
Member

@imreddyTeja imreddyTeja commented Nov 25, 2025

Purpose

closes #1582

To-do

  • There is still SYPD fluctuation. I think this may be due to diagnostics

See buildkite runs here
There is still variation, but the final SYPD now matches the reported final value, and there is no longer a sharp SYPD increase in the first few reports.

Content

  • Add a wall time callback to the coupler
  • Add a new CLI argument, atmos_log_progress, that enables the ClimaAtmos wall time callback and disables the coupler wall time callback. By default, the coupler wall time callback is used.
  • Disable NVTX GC annotations. They increase TTFX, and we are not using them.
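
For readers unfamiliar with the metric, here is a minimal sketch of how an SYPD (simulated years per day) estimate can be derived from elapsed wall time. The function name `estimate_sypd`, the constants, and the 365.25-day year are assumptions made for illustration; this is not the code added in this PR.

```julia
# Hypothetical helper, not taken from ClimaCoupler or ClimaAtmos.
const SECONDS_PER_DAY = 86_400.0
# Assumption: a year is 365.25 days; the actual callback may use another convention.
const SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

"""
    estimate_sypd(simulated_seconds, wall_seconds)

Estimate simulated years per day from the simulated time covered so far
and the wall-clock time it took, both in seconds.
"""
function estimate_sypd(simulated_seconds::Real, wall_seconds::Real)
    wall_seconds > 0 || throw(ArgumentError("wall_seconds must be positive"))
    simulated_years = simulated_seconds / SECONDS_PER_YEAR
    wall_days = wall_seconds / SECONDS_PER_DAY
    return simulated_years / wall_days
end
```

For example, `estimate_sypd(10 * SECONDS_PER_DAY, 600.0)` (ten simulated days in ten minutes of wall time) gives about 3.94 under these assumptions.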

  • I have read and checked the items on the review checklist.

@juliasloan25
Member

> There is still variation, but the final sypd matches up with the reported final, and it no longer has a sharp SYPD increase in the first few reports

Thanks for this change! It will be nice not to rely on ClimaAtmos's SYPD calculation. It looks like the last SYPD and the reported final SYPD are still a little different. It's only by 2.5-3%, so I think it's okay, but it's good to be aware of.

Comment on lines +514 to +515
callbacks =
config_dict["atmos_log_progress"] ? (checkpoint_cb,) : (checkpoint_cb, walltime_cb)
Member


To avoid constructing the walltime_cb when we don't use it, you could put the whole section you added above inside an if statement, like this:

if !config_dict["atmos_log_progress"]
    <construct walltime_cb>
    callbacks = (checkpoint_cb, walltime_cb)
else
    callbacks = (checkpoint_cb,)
end

checkpoint_cb = TimeManager.Callback(schedule_checkpoint, Checkpointer.checkpoint_sims)

callbacks = (checkpoint_cb,)
tot_steps = Int(ceil(float(tspan[2] - tspan[1]) / float(Δt_cpl)))
Member


Could you add a comment above this section explaining what's being done? Actually, it might be better to move this all into a function in the TimeManager module; then we can describe it in the docstring.

E.g., we can explain that this logs the simulation progress so far, along with an SYPD estimate. We can also explain how often it gets output: every (power of 2) steps, or every (5% of simulation) steps.
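
As a rough illustration of what such a documented function could look like, here is a minimal sketch of the reporting schedule described above. The name `should_report`, the way the two conditions are combined, and the rounding of the 5% interval are all assumptions; the actual logic in the PR may differ.

```julia
# Hypothetical sketch; `should_report` is not an existing TimeManager function.
"""
    should_report(step, tot_steps)

Return `true` when progress should be logged at `step`: either `step` is a
power of 2, or it is a multiple of 5% of `tot_steps`.
"""
function should_report(step::Integer, tot_steps::Integer)
    step > 0 || return false
    # Steps 1, 2, 4, 8, ... give frequent feedback early in the run.
    ispow2(step) && return true
    # Assumption: the 5% interval is rounded up and is at least one step.
    interval = max(1, cld(tot_steps, 20))
    return step % interval == 0
end
```

A callback could call `should_report(step, tot_steps)` after each coupling step and only compute and log the SYPD estimate when it returns `true`.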

Development

Successfully merging this pull request may close these issues.

WallTimeReport is calculated incorrectly