feat: switch to QUIC multipath #3381
Draft: dignifiedquire wants to merge 181 commits into main from feat-multipath
+5,902
−8,036
Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3381/docs/iroh/ Last updated: 2025-12-02T20:47:23Z
72cb071 to db712c0
4827e62 to 946f71c
## Description

For some odd reason `AuthenticationError` was given a branch containing a `quinn::ConnectionError`, even though a connection error logically has nothing to do with authentication. That should have been a red flag. `AuthenticationError` itself is almost always wrapped in `ConnectingError`, which correctly has a `quinn::ConnectionError` branch. In the few places where it was returned directly to the user, it arguably **should** have been wrapped in a `ConnectingError`.

As a result, before this fix you would get a very confusing authentication error if the remote client closed the connection at the same moment the handshake completed for it. (This is difficult to time correctly, and it only happens for the client, since the client completes the handshake one network hop before the server.) That was not an authentication error, it was simply a closed connection. The new error structure captures this correctly.

Similarly, `InternalConsistencyError` belongs on `ConnectingError`, though it should be impossible to produce since it is supposed to be an invariant.

## Breaking Changes

- `AuthenticationError` loses the `ConnectionError` and `InternalConsistencyError` branches. Both are on `ConnectingError` instead.
- `OutgoingZeroRttConnection::handshake_completed` and `IncomingZeroRttConnection` now return a `ConnectingError` instead of `AuthenticationError`.

## Notes & open questions

**Before** applying this fix I managed to trigger this error occasionally, using a program that races closing the connection against the completed handshake in a very tight loop. Since applying the fix I have completely failed to trigger it, so I have not been able to admire the new error reporting it should give. It's a bit confusing.

**edit**: it **is** confusing. But it is correct. My flaky failure always closes the connection *after* it completes the handshake, so ALPN and EndpointId are always available. Awaiting an `Incoming` then yields a valid `Connection`, it is just already closed.

This fix could also be made against main. I believe the same commit can be cherry-picked and will probably apply fairly cleanly. Do you think I should make it against main?

## Change checklist

- [x] Self-review.
- [x] All breaking changes documented.
- [x] List all breaking changes in the above "Breaking Changes" section.
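The error restructuring described in the `AuthenticationError` change above could look roughly like the following. This is a minimal sketch, not iroh's actual definitions: the `ConnectionError` type here is a simplified stand-in for `quinn::ConnectionError`, and the variant names and the `handshake_completed` helper are assumptions made for illustration.

```rust
use std::fmt;

/// Simplified stand-in for `quinn::ConnectionError` (hypothetical variants).
#[derive(Debug, Clone, PartialEq)]
pub enum ConnectionError {
    ApplicationClosed,
    TimedOut,
}

/// After the change: only failures that really concern authentication.
/// Variant names are illustrative assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum AuthenticationError {
    NoAlpn,
    RemoteIdMissing,
}

/// Everything that can go wrong while a connection is being established.
#[derive(Debug, Clone, PartialEq)]
pub enum ConnectingError {
    Connection(ConnectionError),
    Authentication(AuthenticationError),
    /// Should be impossible: signals a broken internal invariant.
    InternalConsistency,
}

impl From<ConnectionError> for ConnectingError {
    fn from(err: ConnectionError) -> Self {
        ConnectingError::Connection(err)
    }
}

impl From<AuthenticationError> for ConnectingError {
    fn from(err: AuthenticationError) -> Self {
        ConnectingError::Authentication(err)
    }
}

impl fmt::Display for ConnectingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConnectingError::Connection(e) => write!(f, "connection error: {e:?}"),
            ConnectingError::Authentication(e) => write!(f, "authentication error: {e:?}"),
            ConnectingError::InternalConsistency => write!(f, "internal consistency error"),
        }
    }
}

/// Hypothetical handshake completion: a closed connection is reported as a
/// connection error, and only a missing ALPN counts as an authentication error.
pub fn handshake_completed(
    closed_by_remote: bool,
    alpn: Option<&str>,
) -> Result<String, ConnectingError> {
    if closed_by_remote {
        return Err(ConnectionError::ApplicationClosed.into());
    }
    // `?` converts the `AuthenticationError` via the `From` impl above.
    let alpn = alpn.ok_or(AuthenticationError::NoAlpn)?;
    Ok(alpn.to_string())
}
```

With this shape, a caller matching on `ConnectingError::Connection(_)` sees a closed connection for what it is, instead of having to dig it out of an authentication error.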
There are issues for these already
…3664)

## Description

This means these tests also work when nextest is run locally.

## Notes & open questions

I'm not sure why this wasn't done when the ci profile override was chosen. What am I missing?

## Change checklist

- [x] Self-review.
## Description

Bumps netwatch and netdev to remove the duplicate dependency on both netdev@0.38 and netdev@0.39.
This is a next step into the world of configurable transports. We now allow disabling the IP based transports entirely. Internally this starts to prepare for a world where the user can configure multiple different transports, IP, relay and others in the future. Closes #2957
## Description

Remove the test-only `Endpoint::path_selection` API and instead use `Endpoint::clear_ip_transports` for `PathSelection::RelayOnly`, now that this public API was added in #3651.
…moteState (#3673)

## Description

Renames:

* `endpoint_map` -> `remote_map`
* `EndpointMap` -> `RemoteMap`
* `endpoint_state` -> `remote_state`
* `EndpointStateActor` -> `RemoteStateActor`

Moved:

* the `path_state` module now lives under `remote_state` (previously `endpoint_state`); its items are used only there and nowhere else

---------

Co-authored-by: Floris Bruynooghe <flub@n0.computer>
## Description

Merges main and adapts for the changes from #3619.

---------

Co-authored-by: Rüdiger Klaehn <rklaehn@protonmail.com>
Co-authored-by: Friedel Ziegelmayer <me@dignifiedquire.com>
## Description

Avoid potentially busy looping in a tokio task. I think this blocking leads to tokio not being able to close the runtime properly.
## Description

Fixes #3642

This moves discovery handling fully into the `EndpointStateActor`.

The pub(crate) interface to trigger discovery and get an `EndpointMappedAddr` is now `Magicsock::resolve_remote`, which sends the provided addresses to the `EndpointStateActor`. The actor starts discovery if it does not have a selected path and discovery is not already running. It then either returns immediately if there are any known paths, or waits for discovery to produce at least one result or an error. Once the actor returns, `resolve_remote` returns either an `EndpointMappedAddr` or the discovery error.

This keeps the current behavior: we only start `quinn::Endpoint::connect` once we have at least one transport address for the remote. If we have none, we return the discovery error immediately from `iroh::Endpoint::connect`.

This opens the door for us to easily tune when to run discovery in other situations, e.g. when all available paths to a remote are closed. However, for now this PR still only starts discovery when `Endpoint::connect` is called and no path is selected at the moment.
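The decision the actor makes when `resolve_remote` is called can be sketched as a small synchronous state machine. This is only an illustration of the rule described above: the `RemoteState` struct, its fields and the `ResolveStep` enum are hypothetical names, and the real actor is asynchronous and message-driven.

```rust
/// What the caller of the (hypothetical) `resolve` step should do next.
#[derive(Debug, PartialEq)]
pub enum ResolveStep {
    /// At least one path is known: the mapped address can be returned right away.
    ReturnImmediately,
    /// No path is known yet: wait for the first discovery result or error.
    WaitForDiscovery,
}

/// Minimal per-remote bookkeeping, assumed for this sketch only.
#[derive(Debug, Default)]
pub struct RemoteState {
    pub known_paths: usize,
    pub has_selected_path: bool,
    pub discovery_running: bool,
}

impl RemoteState {
    /// Starts discovery if needed, then reports whether the caller must wait.
    pub fn resolve(&mut self) -> ResolveStep {
        // Discovery is started only without a selected path, and never twice.
        if !self.has_selected_path && !self.discovery_running {
            self.discovery_running = true;
        }
        if self.known_paths > 0 {
            ResolveStep::ReturnImmediately
        } else {
            ResolveStep::WaitForDiscovery
        }
    }
}
```

Note that the two checks are independent: a remote with known but unselected paths both starts discovery and returns immediately.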
## Description

* fix idle timeout clear condition (previously it would hot loop)
* fix hot loop when local_addrs watchable becomes disconnected during shutdown
* when sending a datagram fails in the transports sender, include the dst address in the error message
* do not break the RemoteStateActor when sending a datagram fails

---------

Co-authored-by: Philipp Krüger <philipp.krueger1@gmail.com>
## Description

This reverts a change from this PR: #3384

I originally thought I could make this test more reliable by pausing the tokio time across the `tokio::time::timeout` calls, but it turns out that actually makes the test *more* flaky:

- When time is paused, the timeout fires immediately once the tokio runtime has no more CPU work to do.
- It's possible that there is no CPU work left while something else is still doing work, e.g. networking.
- Before the `ActiveRelayActor` finishes its `run_connected` loop, it calls `client_sink.close().await`, which does actual I/O. If tokio time is paused at that moment, the test's timeout triggers immediately.

## Notes & open questions

I couldn't reproduce this problem even across a couple thousand runs of the test locally. I'm not super confident that this fixes things, but I've analyzed the logs and this seems to be the most likely explanation for the failures I'm seeing.

Closes #3613

## Change checklist

- [x] Self-review.
## Description

This switches from the old DISCO to the so-new-it-doesnt-exist-yet QUIC NAT Traversal.

## Breaking Changes

Nothing visible? Maybe?

## Notes & open questions

The QUIC NAT Traversal API doesn't exist yet, so this won't even build on any machine that's not mine. I've locally patched in the dummy methods that I use.

---------

Co-authored-by: dignifiedquire <me@dignifiedquire.com>
Co-authored-by: Frando <frando@unbiskant.org>
## Description

This adds the concept of hooks to the iroh endpoint.

Hooks are structs implementing the `EndpointHooks` trait and are used to intercept the establishment of connections. Multiple hooks can be added to the endpoint, and they are invoked in the order in which they were added.

Currently there are two methods on the `EndpointHooks` trait:

* `before_connect` is invoked before an outgoing connection is started.
* `after_handshake` is invoked for incoming and outgoing connections once the TLS handshake has completed.

Both methods return an `Outcome`, which can be either `Reject` or `Accept`. If any hook returns `Reject`, the connection or connection attempt is rejected.

The PR also adds `ConnectionInfo`, a struct that holds information about a connection but does not keep the connection itself alive. It allows inspecting stats and paths, and it has a `closed` method that returns a future which completes once the connection closes (without keeping the connection alive).

The PR includes two examples:

* `auth-hook` implements authentication for iroh protocols through a middleware and a separate authentication protocol. Individual protocols don't need to be aware of authentication at all.
* `monitor-connections` monitors incoming and outgoing connections and prints connection stats once a connection closes.

---------

Co-authored-by: dignifiedquire <me@dignifiedquire.com>
Co-authored-by: ramfox <kasey@n0.computer>
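The ordering and rejection semantics of hooks described above can be illustrated with a simplified, synchronous sketch. The trait and method names follow the description, but the signatures are assumptions: the argument types (a string remote id and a raw ALPN), the `HookList` container and the `BlockList` and `AllowAll` hooks are all hypothetical, and the trait in the PR may well be async.

```rust
/// Result of a hook invocation, as described in the PR text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Accept,
    Reject,
}

/// Simplified, synchronous stand-in for the `EndpointHooks` trait.
/// Both methods accept by default, so a hook overrides only what it needs.
pub trait EndpointHooks {
    fn before_connect(&self, _remote: &str, _alpn: &[u8]) -> Outcome {
        Outcome::Accept
    }
    fn after_handshake(&self, _remote: &str, _alpn: &[u8]) -> Outcome {
        Outcome::Accept
    }
}

/// Hypothetical container that runs hooks in insertion order.
pub struct HookList {
    hooks: Vec<Box<dyn EndpointHooks>>,
}

impl HookList {
    pub fn new() -> Self {
        Self { hooks: Vec::new() }
    }

    pub fn add(mut self, hook: impl EndpointHooks + 'static) -> Self {
        self.hooks.push(Box::new(hook));
        self
    }

    /// The first `Reject` wins; later hooks are not consulted.
    pub fn before_connect(&self, remote: &str, alpn: &[u8]) -> Outcome {
        for hook in &self.hooks {
            if hook.before_connect(remote, alpn) == Outcome::Reject {
                return Outcome::Reject;
            }
        }
        Outcome::Accept
    }

    pub fn after_handshake(&self, remote: &str, alpn: &[u8]) -> Outcome {
        for hook in &self.hooks {
            if hook.after_handshake(remote, alpn) == Outcome::Reject {
                return Outcome::Reject;
            }
        }
        Outcome::Accept
    }
}

/// Example hook: refuses outgoing connections to listed remotes.
pub struct BlockList(pub Vec<String>);

impl EndpointHooks for BlockList {
    fn before_connect(&self, remote: &str, _alpn: &[u8]) -> Outcome {
        if self.0.iter().any(|blocked| blocked == remote) {
            Outcome::Reject
        } else {
            Outcome::Accept
        }
    }
}

/// Example hook that relies entirely on the accepting defaults.
pub struct AllowAll;

impl EndpointHooks for AllowAll {}
```

In this sketch a list built as `HookList::new().add(AllowAll).add(BlockList(...))` rejects a blocked remote in `before_connect` even though the first hook accepted it, which mirrors the "any hook can reject" rule.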
removes the need for two `send` impls
## Description

Since the server was actively closing the connection, it is possible that the client would not have read the response yet by the time the connection is closed.
## Description

Based on #3593

This was always just a placeholder, and can now be collected using `EndpointHooks`.

## Breaking Changes

- remove `Endpoint::latency`

---------

Co-authored-by: varun-doshi <doshivarun202@gmail.com>
## Description

Bumps quinn to latest `main-iroh` and netwatch/portmapper to n0-computer/net-tools#72 (the latter is needed because the quinn-udp version changed to 0.6 on `main-iroh`).
## Description
Whenever we insert a new path, trigger pruning paths.
We currently only prune IP paths, and pruning paths only occurs if we
have more than 30 IP paths.
We will prune any paths that did not successfully holepunch.
If there are still over 30 IP paths left, then we order the "inactive"
paths (paths that have been closed, but at one point holepunched), and
prune the paths that were closed earliest.
## Notes and Questions
- Added constants:
  - `MAX_IP_PATHS` = 30 - maximum IP paths per endpoint
  - `MAX_INACTIVE_IP_PATHS` = 10 - maximum inactive IP paths to keep
- New `PathState` field:
  - `status` - tracks the `PathStatus` of the path
- New `PathStatus` enum:
  - `PathStatus::Open` - is an open path
  - `PathStatus::Inactive(Instant)` - was opened once, but currently inactive
  - `PathStatus::Unusable` - we attempted to use it, but it never connected
  - `PathStatus::Unknown` - we don't know the status yet
- New methods on `RemotePathState`:
  - `abandoned_path` - marks a path as abandoned with timestamp, triggered when we get the `PathEvent::Abandoned` event
  - `prune_paths` - triggers path pruning, occurs whenever we insert a path to the `RemotePathState`
  - changed `insert` to `insert_open_path`
- New `prune_ip_paths` function with all the prune logic:
  - Only prunes if IP paths exceed `MAX_IP_PATHS`
  - Never prunes active paths or paths of unknown status
  - Always prunes failed holepunch attempts (`PathStatus::Unusable`)
  - Keeps 10 most recently inactive paths that were previously successful
  - Special case: if all paths failed, keeps `MAX_IP_PATHS` instead of pruning everything
- Added tests for edge cases and the typical case
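
The pruning rules listed above can be sketched as follows. This is one possible reading of the description, not the PR's implementation: the constant and enum names come from the notes, but the function signature, the `(u32, PathStatus)` representation with a hypothetical numeric path id, and the choice of which paths survive in the all-failed case are assumptions.

```rust
use std::time::Instant;

pub const MAX_IP_PATHS: usize = 30;
pub const MAX_INACTIVE_IP_PATHS: usize = 10;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PathStatus {
    /// An open path.
    Open,
    /// Was open once; the `Instant` records when it was closed.
    Inactive(Instant),
    /// We tried to use it, but it never connected.
    Unusable,
    /// Status not known yet.
    Unknown,
}

/// Prunes `paths` in place. Each entry is (hypothetical path id, status).
pub fn prune_ip_paths(paths: &mut Vec<(u32, PathStatus)>) {
    // Only prune once the limit is exceeded.
    if paths.len() <= MAX_IP_PATHS {
        return;
    }
    // Special case: if every path failed, keep MAX_IP_PATHS of them
    // (here simply the first ones) instead of pruning everything.
    if paths.iter().all(|(_, s)| *s == PathStatus::Unusable) {
        paths.truncate(MAX_IP_PATHS);
        return;
    }
    // Always drop failed holepunch attempts.
    paths.retain(|(_, s)| *s != PathStatus::Unusable);
    if paths.len() <= MAX_IP_PATHS {
        return;
    }
    // Still too many: collect close times of inactive paths, newest first.
    let mut closed: Vec<Instant> = paths
        .iter()
        .filter_map(|(_, s)| match s {
            PathStatus::Inactive(at) => Some(*at),
            _ => None,
        })
        .collect();
    if closed.len() <= MAX_INACTIVE_IP_PATHS {
        return;
    }
    closed.sort_unstable_by(|a, b| b.cmp(a));
    // Inactive paths closed before this instant are pruned. Paths sharing
    // the cutoff instant are all kept, so ties may leave more than the limit.
    let cutoff = closed[MAX_INACTIVE_IP_PATHS - 1];
    paths.retain(|(_, s)| match s {
        PathStatus::Inactive(at) => *at >= cutoff,
        // Open and Unknown paths are never pruned.
        _ => true,
    });
}
```

Because open and unknown paths are never removed, the list can still exceed `MAX_IP_PATHS` after pruning if enough of them exist; the sketch makes no attempt to enforce a hard cap.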
Work on integrating n0-computer/quinn#28 into the iroh magic
TODOs