perf: parallelize HashedPostStateSorted::from_reverts hashing/sorting #20148
Conversation
Force-pushed fad60a2 to b2011f6
```rust
    cfg!(feature = "parallel-from-reverts") && storages.len() >= PARALLEL_THRESHOLD;

let hashed_storages = if use_parallel {
    #[cfg(feature = "parallel-from-reverts")]
```
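For readers skimming the diff, here is a minimal dependency-free sketch of this gate. The 2,500 threshold comes from the PR description; `should_parallelize` is a hypothetical helper for illustration, not the PR's actual API:

```rust
// Hypothetical helper mirroring the gating logic above: the parallel path is
// taken only when the feature is compiled in AND the workload is large enough.
const PARALLEL_THRESHOLD: usize = 2_500;

fn should_parallelize(num_storages: usize) -> bool {
    // cfg! evaluates at compile time; when the feature is not enabled the
    // whole expression short-circuits to false, so the size check never runs.
    cfg!(feature = "parallel-from-reverts") && num_storages >= PARALLEL_THRESHOLD
}
```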
In other places we feature-flag rayon usage behind std, imo we could do that here and not introduce a new feature flag.
reth/crates/trie/sparse/src/state.rs, lines 264 to 276 at 245cca7:

```rust
#[cfg(not(feature = "std"))]
// If nostd then serially reveal storage proof nodes for each storage trie
{
    for (account, storage_subtree) in storages {
        self.reveal_decoded_storage_multiproof(account, storage_subtree)?;
    }
    Ok(())
}
#[cfg(feature = "std")]
// If std then reveal storage proofs in parallel
{
```
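A self-contained sketch of that std-gated pattern, with `std::thread` standing in for rayon (assumption: the real code parallelizes with rayon; `reveal_all` and its `Vec<Vec<u32>>` payload are illustrative only):

```rust
// Exactly one branch is compiled depending on the `std` feature, so no new
// feature flag is needed — the same shape as the reth snippet above.
fn reveal_all(storages: Vec<Vec<u32>>) -> Vec<u32> {
    #[cfg(not(feature = "std"))]
    // no_std: process each storage serially
    {
        storages.into_iter().map(|s| s.into_iter().sum()).collect()
    }
    #[cfg(feature = "std")]
    // std: fan each storage out to its own scoped thread (rayon stand-in)
    {
        std::thread::scope(|scope| {
            let handles: Vec<_> = storages
                .into_iter()
                .map(|s| scope.spawn(move || s.into_iter().sum::<u32>()))
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        })
    }
}
```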
@yongkangc let's rbc this as well to confirm there's no regression 🙏
hey @Andrurachi, thanks for the PR. Could you rebase / merge main in so that we can bench this as well? The reason is that we recently introduced a breaking change to MerkleChangeSets that requires a merge of main for benching.
Parallelizes the hashing/sorting step using Rayon when account count exceeds a threshold (2500). This alleviates CPU bottlenecks during large state reverts or deep reorgs. Closes paradigmxyz#20049
Force-pushed b2011f6 to 1843db5
@yongkangc I've rebased on main to bring in the breaking changes to MerkleChangeSets. The PR is compiling and benchmarks are passing.

Regarding moving the feature flag to std: I tried replacing `parallel-from-reverts` with the standard `default = ["std"]` pattern, but I hit a wall of Zepter errors regarding feature propagation. To avoid pushing a broken config or making you wait longer than needed, I've kept `parallel-from-reverts` for this push. If required, could you please advise on the correct Cargo.toml setup to satisfy Zepter for this specific crate?
Hey @yongkangc, quick question before I push a ready-to-merge commit: following the suggestion, I moved the parallelized path behind `std`.

One point to confirm: do you want the benchmarks comparing sequential vs parallel included in the final commit, or should they be removed before merge? Also note that running the benches without `std` isn't possible.

Happy to push immediately once I have your preference.
Summary
Introduces a parallelized path for `HashedPostStateSorted::from_reverts` using `rayon`, gated behind a new `parallel-from-reverts` feature flag. This optimization targets CPU bottlenecks caused by deep reorgs or blocks with skewed storage distributions. It implements an account count threshold (2,500) to ensure no regressions on small or standard blocks.
Motivation
Closes #20049
Processing large state reverts involves two distinct phases: a sequential DB walk that collects the revert entries, and a CPU-bound hashing/sorting phase. While the DB walk must remain sequential, the sorting phase becomes a bottleneck when thousands of accounts (or accounts with massive storage slots) need processing. This PR parallelizes the CPU-bound sorting phase, reducing wall-time for heavy blocks.
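A dependency-free sketch of the parallelizable phase, with `std::thread` standing in for Rayon and plain integers standing in for hashed storage slots (all names here are illustrative, not the PR's API):

```rust
// Phase 1 (the DB walk) yields per-account slot lists in walk order; phase 2
// sorts each account's slots, which is independent work per account and
// therefore parallelizable. std::thread keeps this example dependency-free;
// the real code would use a rayon pool rather than one thread per account.
fn sort_per_account(mut accounts: Vec<Vec<u64>>) -> Vec<Vec<u64>> {
    std::thread::scope(|scope| {
        for slots in accounts.iter_mut() {
            // Each spawned task gets exclusive &mut access to one account's slots.
            scope.spawn(move || slots.sort_unstable());
        }
        // thread::scope joins all spawned tasks before returning.
    });
    accounts
}
```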
Implementation Details

- New opt-in feature flag: `parallel-from-reverts`.
- Parallelizing within a single account's slots (`par_sort_unstable`) introduced too much overhead for typical slot counts (< 1,000), so parallelism is applied across accounts instead.
- Account count threshold:
  - `< 2,500`: always sequential (avoids Rayon overhead).
  - `>= 2,500`: parallel (distributes load, handles skewed accounts).

Benchmarks
1. Micro-Benchmarks (`sorting_par_exp`)

Measured purely the sorting/hashing overhead (excluding DB reads), across four scenarios: `Acc_Low`, `Acc_Med`, `Acc_High`, and `Skewed_10k`.

*Skewed distribution: 95% of accounts have 4 slots, 5% have 2,000 slots.*
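A hypothetical reconstruction of that skewed workload, useful for reproducing the shape of the `Skewed_10k` scenario (the exact generation scheme in the benches may differ):

```rust
// Build a skewed slot-count distribution matching the benchmark note:
// every 20th account (5%) gets 2,000 slots, the rest (95%) get 4 slots.
fn skewed_slot_counts(total_accounts: usize) -> Vec<usize> {
    (0..total_accounts)
        .map(|i| if i % 20 == 19 { 2_000 } else { 4 })
        .collect()
}
```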
2. Integration Benchmarks (`integration_bench`)

Measured the full lifecycle (DB Read -> Hashing -> Sorting -> Allocation) across three scenarios: `Small`, `Large_Uniform`, and `Large_Skewed`.

Since the total runtime is dominated by DB I/O, this actually represents a solid optimization of the available CPU-bound work. The threshold ensures no regression for small blocks.
Checklist

- `rayon` dependency (optional).
- `parallel-from-reverts` feature flag.