@ph-kev (Member) commented Nov 24, 2025

Among the checkpoints saved by the ClimaCoupler-AMIP Buildkite run, the largest files are the cache files written by JLD2; the ClimaAtmos cache alone is 54G.

JLD2 supports compression, which can significantly reduce file size. A shuffle filter is applied first: it regroups the bytes of each value so that bytes of the same significance are stored together. Because the data comes from a climate simulation, the higher-order bytes vary little between neighboring values, and the shuffle exposes that pattern to the compressor. See the example in HDF5.jl (the HDF5 and JLD2 shuffle filters work the same way). A ZstdFilter with level 1 is then applied, prioritizing speed over compression ratio.

➜  checkpoints du -sh *
1.3M    checkpoint_BucketSimulation_94867200.hdf5
67M     checkpoint_cache_1_BucketSimulation_94867200.jld2
54G     checkpoint_cache_1_ClimaAtmosSimulation_94867200.jld2
19M     checkpoint_cache_1_PrescribedIceSimulation_94867200.jld2
21M     checkpoint_cache_1_PrescribedOceanSimulation_94867200.jld2
42M     checkpoint_ClimaAtmosSimulation_94867200.hdf5
5.3M    checkpoint_coupler_fields_1_94867200.jld2
144K    checkpoint_PrescribedIceSimulation_94867200.hdf5
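
To make the shuffle step concrete, here is a minimal, self-contained sketch in Python using only the standard library. It is not the JLD2 or HDF5.jl implementation: `shuffle_bytes` and `unshuffle_bytes` are hypothetical helpers, the sine-wave field is made-up data standing in for simulation output, and zlib at level 1 is used as a stand-in for a fast Zstd setting.

```python
import math
import struct
import zlib


def shuffle_bytes(raw: bytes, itemsize: int) -> bytes:
    """Group byte 0 of every element, then byte 1 of every element, and so on."""
    if len(raw) % itemsize != 0:
        raise ValueError("buffer length must be a multiple of itemsize")
    return b"".join(raw[k::itemsize] for k in range(itemsize))


def unshuffle_bytes(shuffled: bytes, itemsize: int) -> bytes:
    """Invert shuffle_bytes, restoring the original element layout."""
    n = len(shuffled) // itemsize
    out = bytearray(len(shuffled))
    for k in range(itemsize):
        out[k::itemsize] = shuffled[k * n:(k + 1) * n]
    return bytes(out)


# Hypothetical smooth field standing in for simulation output (not real data).
values = [280.0 + 10.0 * math.sin(i / 5000.0) for i in range(50_000)]
raw = struct.pack(f"<{len(values)}d", *values)  # little-endian 64-bit floats

# zlib level 1 is an assumption: a stand-in for a fast Zstd level.
plain_size = len(zlib.compress(raw, 1))
shuffled_size = len(zlib.compress(shuffle_bytes(raw, 8), 1))

print(f"raw bytes:            {len(raw)}")
print(f"compressed:           {plain_size}")
print(f"shuffle + compressed: {shuffled_size}")
```

On smooth data like this, the sign, exponent, and leading mantissa bytes end up in long nearly constant runs after shuffling, which is why the shuffled buffer compresses better. The actual ratio on the checkpoint caches has to be measured on the real files.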

TODO

  • Make a small benchmark to compare performance between no compression and different compression filters (time to checkpoint and size of checkpoints)
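
One possible shape for such a benchmark, sketched in Python with only the standard library. Everything here is an assumption for illustration: `make_payload` and `benchmark` are hypothetical helpers, the payload is synthetic, and the zlib, bz2, and lzma codecs stand in for whichever JLD2 filters end up being compared. A real benchmark would time the actual checkpoint call and measure the files it writes.

```python
import bz2
import lzma
import math
import struct
import time
import zlib


def make_payload(n: int = 50_000) -> bytes:
    """Build a synthetic smooth field of n little-endian 64-bit floats."""
    values = [280.0 + 10.0 * math.sin(i / 5000.0) for i in range(n)]
    return struct.pack(f"<{n}d", *values)


# Candidate encoders; "none" is the uncompressed baseline.
CANDIDATES = {
    "none": lambda b: b,
    "zlib-1": lambda b: zlib.compress(b, 1),
    "zlib-9": lambda b: zlib.compress(b, 9),
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def benchmark(payload: bytes, candidates: dict, repeats: int = 3) -> dict:
    """Return {name: (best_seconds, encoded_size)} for each candidate."""
    results = {}
    for name, encode in candidates.items():
        best = float("inf")
        size = 0
        for _ in range(repeats):
            start = time.perf_counter()
            encoded = encode(payload)
            best = min(best, time.perf_counter() - start)
            size = len(encoded)
        results[name] = (best, size)
    return results


payload = make_payload()
results = benchmark(payload, CANDIDATES)
for name, (seconds, size) in results.items():
    ratio = size / len(payload)
    print(f"{name:8s} {seconds * 1e3:9.2f} ms {size:9d} bytes ({ratio:.2%} of raw)")
```

Taking the best of several repeats reduces noise from other activity on the machine. Reporting time and size side by side matches the two quantities the TODO asks for.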
