-
Notifications
You must be signed in to change notification settings - Fork 945
Optimise fork choice attestation dequeueing #8378
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: unstable
Are you sure you want to change the base?
Conversation
| PersistedForkChoice { | ||
| proto_array: self.proto_array().as_ssz_container(), | ||
| queued_attestations: self.queued_attestations().to_vec(), | ||
| queued_attestations: self.queued_attestations().iter().cloned().collect(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is potentially slower than if we implemented Encode for VecDeque. The conversion from VecDeque to Vec is non-trivial in most cases where the VecDeque doesn't start at index 0:
I'll check fork choice persistence times to make sure this isn't having a substantial impact.
| fc_store, | ||
| proto_array, | ||
| queued_attestations: persisted.queued_attestations, | ||
| queued_attestations: persisted.queued_attestations.into(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This conversion Vec => VecDeque is trivial and cheap, but only happens once on startup anyway.
| use types::{Epoch, Hash256, Slot}; | ||
|
|
||
| fn all_benches(c: &mut Criterion) { | ||
| let num_attestations = 1_500_000_usize / 16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is ~2 slots worth of attestations on a 1.5M validator network
|
Some required checks have failed. Could you please take a look @michaelsproul? 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR optimizes the attestation queue management in fork choice by replacing Vec<QueuedAttestation> with VecDeque<QueuedAttestation> and improving the dequeue_attestations algorithm to preserve allocation capacity and avoid unnecessary reallocations.
Key Changes
- Replaced
VecwithVecDequefor the queued attestations data structure across fork choice components - Optimized
dequeue_attestationsfunction usingpartition_point,rotate_left, andsplit_offto maintain capacity while removing old attestations - Made
dequeue_attestationspublic and exported it for external use, including in benchmarks
Reviewed Changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| consensus/fork_choice/src/fork_choice.rs | Core implementation changes: converted queued_attestations to VecDeque, optimized dequeue logic with capacity preservation, made QueuedAttestation fields public, added Debug derive |
| consensus/fork_choice/src/lib.rs | Added public export of dequeue_attestations function |
| consensus/fork_choice/tests/tests.rs | Updated test helper function signature to use VecDeque reference |
| consensus/fork_choice/benches/benches.rs | Added comprehensive benchmark testing dequeue performance with realistic attestation volumes |
| consensus/fork_choice/Cargo.toml | Added criterion dev-dependency and benchmark harness configuration |
| Cargo.lock | Updated with criterion dependency |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
|
Some required checks have failed. Could you please take a look @michaelsproul? 🙏 |
|
This pull request has merge conflicts. Could you please resolve them @michaelsproul? 🙏 |
| let to_pop = queued_attestations.partition_point(|a| a.slot < current_slot); | ||
|
|
||
| // Rotate the entries to remove into the *end* of the vec deque. | ||
| queued_attestations.rotate_left(to_pop); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not change self.queued_attestations into a map of Slot -> Vec< QueuedAttestation> and when that slot arrives we just remove the entire vec? No need to rotate or split_off
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, I'll try it and see how the benchmark looks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will have the same problem of re-allocating memory for the Vecs though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's 4x slower 💀
dequeue_attestations/93750
time: [41.906 ms 41.928 ms 41.953 ms]
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
Impl here: michaelsproul@daf3617.
One of the problems is that I didn't try optimising out the concatenation of Vecs. dequeue_attestations could definitely return like a Vec<Vec<_>> or something, but I didn't bother refactoring any further, because I think the lack of memory-reuse just makes it worse than the VecDeque.
|
Putting this back to |
Proposed Changes
Optimise dequeuing of attestations in fork choice by avoiding reallocating the queue after every dequeue.
The included benchmarks shows that this takes ~35% off the runtime of
dequeue_attestationsover multiple slots:The baseline branch with just the benchmark added to
unstablecan be found here: https://github.com/michaelsproul/lighthouse/tree/dequeue-attestation-baseline-benchmarkAdditional Info
I played around with several
VecDeque-based implementations. The one withpartition_pointandrotate_left/split_offis substantially faster than usingpop_frontordrain.The risk of this PR is that the allocation for
queued_attestationsis never shrunk/contracted, so in unusual circumstances it could grow very large. We could consider adding a hard cap on the number of items in the queue, as we've also had this queue grow in an unbounded way during times of memory corruption: