Skip to content

Conversation

@sanity
Copy link
Collaborator

@sanity sanity commented Nov 7, 2025

Summary

  • add fallback target computation for UPDATE propagation
  • fetch contracts locally when subscribers missing
  • improve wasm executor logging and store error reporting

Testing

  • cargo test operations::update
  • cargo test wasm_runtime::tests::contract_metering
  • cargo test --test operations update_propagates_fallback

@sanity
Copy link
Collaborator Author

sanity commented Nov 7, 2025

@claude Please review PR

@claude
Copy link

claude bot commented Nov 7, 2025

Claude finished @sanity's task —— View job


PR Review: fix: add update fallback propagation

I've reviewed the changes in this PR. Here's my comprehensive analysis:


Summary of Changes

This PR implements three main improvements:

  1. Fallback target computation for UPDATE propagation - Ensures UPDATEs can propagate even when no subscribers exist
  2. Contract fetching during SeekNode fallback - Allows peers to fetch contracts locally when forwarding fails
  3. Improved logging - Better diagnostics for UPDATE operations and store errors

Detailed Review by File

crates/core/src/operations/update.rs (+218/-16)

Strengths:

  • Fallback propagation logic (lines 252-258, 429-435, 621-628): The new compute_update_fallback_targets method provides a good safety net when subscriber lists are empty. Uses k_closest_potentially_caching with k=3, which is reasonable.
  • Contract fetching fallback (lines 494-587): Excellent recovery mechanism when SeekNode can't find a forwarding target. Attempts to fetch the contract via GET operation and apply the update locally.
  • Enhanced error logging (lines 336-348): The detailed error logging with subscriber lists, candidates, and connection count will be very helpful for debugging network issues.
  • Self-subscription filtering (lines 787-816): The logic to allow the sender to remain in the broadcast list when appropriate (for local auto-subscribe) is well-considered.

Concerns & Questions:

  1. Potential infinite recursion (lines 503-514): When SeekNode fails to find a forwarding target, it triggers a GET operation which might itself fail with NoCachingPeers. Is there a risk of circular dependencies or retry storms here?

    let get_op = get::start_op(*key, true, false);
    if let Err(fetch_err) = get::request_get(op_manager, get_op, fetch_skip).await {
        // Falls through to error - but could GET also trigger an UPDATE?
  2. Fallback target selection (lines 852-884): The fallback uses k_closest_potentially_caching with k=3. Questions:

    • Why k=3 specifically? This seems arbitrary and isn't documented.
    • Should this be configurable or proportional to network size?
    • What happens in very small networks (< 3 peers)?
  3. Duplicat filtering logic (lines 787-816): The double filtering for allow_self seems complex and potentially error-prone:

    .filter(|pk| {
        if &pk.peer == sender {
            allow_self
        } else {
            true
        }
    })
    .filter(|pk| {
        if let Some(self_peer) = &self_peer {
            if &pk.peer == self_peer {
                allow_self
            } else {
                true
            }
        } else {
            true
        }
    })

    This could be simplified to a single filter with clearer logic.

  4. Missing validation (line 516): After fetching the contract in the SeekNode fallback path, there's a check if super::has_contract(op_manager, *key).await? but the GET operation already succeeded. Is this redundant? Or is there a race condition concern?

  5. Error handling inconsistency: The SeekNode fallback path changes an error to a warning (line 494: tracing::warn! instead of tracing::error!) and then attempts recovery. This is good, but should be consistent - why does the RequestUpdate path (line 356) still use tracing::error! without attempting the same recovery?

Recommendations:

  • Add comments explaining why k=3 was chosen for fallback targets
  • Simplify the double-filter logic for self-subscription
  • Consider unifying the recovery strategy across RequestUpdate and SeekNode message types
  • Add a circuit breaker or max-retry limit to prevent potential retry storms in the GET fallback path

crates/core/src/operations/put.rs (+16/-0)

Strengths:

  • Good addition of trace-level logging for PUT operation state tracking
  • Helps with debugging upstream forwarding behavior

Concerns:
None significant - these are pure logging additions.


crates/core/src/contract/executor/runtime.rs (+5/-0)

Strengths:

  • Useful info-level logging when contract state is updated

Minor suggestion:

  • Consider adding the previous state size to make the log more informative:
    tracing::info!(
        "Contract state updated for {key}, prev_size={}, new_size={}",
        current_state.as_ref().len(),
        new_state.as_ref().len()
    );

crates/core/src/wasm_runtime/store.rs (+10/-9)

Strengths:

  • Appropriate replacement of eprintln! with tracing::error! for better observability
  • Maintains error context with file/line information

Concerns:
None - this is a straightforward logging improvement.


Test files (contract_metering.rs, mod.rs)

Strengths:

  • Replacing println! with info! is correct for production code

Concerns:
None - these are test-only changes.


Testing Recommendations

Based on the PR description, the following tests are mentioned:

  • cargo test operations::update
  • cargo test wasm_runtime::tests::contract_metering
  • cargo test --test operations update_propagates_fallback

Additional test scenarios to consider:

  1. Test UPDATE propagation in a network with no subscribers but with peers (fallback scenario)
  2. Test the SeekNode contract fetch fallback path with various network configurations
  3. Test self-subscription behavior (sender should receive updates when appropriate)
  4. Test UPDATE behavior in very small networks (1-2 peers)
  5. Edge case: UPDATE arrives while contract is still initializing

Overall Assessment

Positive aspects:

  • Well-structured fallback logic that improves UPDATE reliability
  • Thoughtful consideration of edge cases (no subscribers, contract not found)
  • Excellent diagnostic logging improvements
  • The contract fetching fallback in SeekNode is particularly valuable

Areas for improvement:

  • Documentation of magic numbers (k=3)
  • Potential simplification of filtering logic
  • Clarification of retry/recursion behavior in fallback paths
  • Consider consistency in error recovery strategies across message types

Recommendation: This PR significantly improves UPDATE propagation reliability and observability. The changes are generally well-implemented. Address the concerns around potential infinite recursion, document the k=3 choice, and consider simplifying the filtering logic before merging.


@sanity sanity requested a review from iduartgomez November 8, 2025 01:40
@sanity
Copy link
Collaborator Author

sanity commented Nov 8, 2025

Addressed Claude's review points:

  1. Added UPDATE_FALLBACK_TARGET_LIMIT so the fallback fan-out is documented and easy to tune.
  2. When RequestUpdate runs out of forward targets we now attempt the same GET fallback as SeekNode, including skipping ourselves in the GET skip-list and reusing the broadcast logic once the contract is fetched.
  3. Simplified the allow-self filtering in get_broadcast_targets_update and documented why we double-check has_contract after a GET completes.

Tests: cargo test -p freenet operations::update

.ring
.closest_potentially_caching(key, skip_peers.as_slice())
.or_else(|| {
op_manager
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this make a difference? if we get no closest_potentially_caching why we would get anything calling call k_closest_potentially_caching?

@iduartgomez
Copy link
Collaborator

What is the end goal of this PR? If we are pushing an update to a peer which is not subscribed to it what will happen? Won't the update be ignored anyway? Are we caching preemptively on that peer receiving the update?

Is the goal so update op does not return an error so the client applications will be receiving a positive result for the update? If so, this seems a very complicated way of achieving that, just mark the op as completed and return a message so it will be picked by the client apps waiting for responses and be done with it.

Base automatically changed from stack/session-actor-race to main November 9, 2025 12:55
@sanity
Copy link
Collaborator Author

sanity commented Nov 9, 2025

[Codex] @iduartgomez you were right to be skeptical about the GET-based fallback. I reverted those changes so UPDATE now behaves like the rest of the routing ops again: if we cannot reach a peer that already has the contract, we surface RingError::NoCachingPeers instead of caching it locally. While touching the file I kept the richer diagnostics (subscriber/candidate logging and the allow-self filter) so we still gain better observability without changing semantics.

Could you take another look at this simplified version and confirm it matches the original intent? Tests: cargo test -p freenet operations::update.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why order was changed here? What is fixed this way?

@iduartgomez
Copy link
Collaborator

let's rebase on main so we only see this pr changes

@iduartgomez
Copy link
Collaborator

@claude try to rebase on main, prefer main changes

@claude
Copy link

claude bot commented Nov 9, 2025

Claude finished @iduartgomez's task —— View job


Rebasing PR onto main

I'm unable to complete this request due to permission limitations.

What I tried

  • Check git status
  • Attempt to rebase onto main

Issue encountered

Git rebase operations require additional approval that I don't currently have access to. Additionally, rebasing typically requires a force-push to update the remote branch, which is outside my capabilities for security reasons.

How to proceed

You'll need to perform the rebase manually. Here are the commands:

# Fetch the latest main
git fetch origin main

# Rebase onto main, preferring main's changes in conflicts
git rebase origin/main -X theirs

# If there are conflicts that can't be auto-resolved:
# - Resolve them manually
# - Use: git add <resolved-files>
# - Continue with: git rebase --continue

# Force push the rebased branch (required after rebase)
git push origin stack/update-fallback --force-with-lease

The --force-with-lease flag is safer than --force as it will prevent overwriting work if someone else has pushed to the branch.

Alternative approach

If you prefer not to rebase, you could merge main into this branch instead:

git fetch origin main
git merge origin/main -X theirs
git push origin stack/update-fallback

This wouldn't require a force-push and I could help with that if you'd like.


@iduartgomez iduartgomez force-pushed the stack/update-fallback branch from c3fdf36 to e161690 Compare November 9, 2025 18:36
@iduartgomez iduartgomez added this pull request to the merge queue Nov 9, 2025
Merged via the queue into main with commit a413514 Nov 9, 2025
11 checks passed
@iduartgomez iduartgomez deleted the stack/update-fallback branch November 9, 2025 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants