-
Notifications
You must be signed in to change notification settings - Fork 32
new NAT thread management #1341
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
|
||
| let (announceAddrs, discoveryAddrs) = nattedAddress( | ||
| s.config.nat, s.codexNode.switch.peerInfo.addrs, s.config.discoveryPort | ||
| s.natManager, s.config.nat, s.codexNode.switch.peerInfo.addrs, s.config.discoveryPort |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nph] reported by reviewdog 🐶
| s.natManager, s.config.nat, s.codexNode.switch.peerInfo.addrs, s.config.discoveryPort | |
| s.natManager, s.config.nat, s.codexNode.switch.peerInfo.addrs, | |
| s.config.discoveryPort, |
|
|
||
| proc redirectPorts*( | ||
| strategy: NatStrategy, tcpPort, udpPort: Port, description: string | ||
| manager: NatManager, strategy: NatStrategy, tcpPort, udpPort: Port, description: string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nph] reported by reviewdog 🐶
| manager: NatManager, strategy: NatStrategy, tcpPort, udpPort: Port, description: string | |
| manager: NatManager, | |
| strategy: NatStrategy, | |
| tcpPort, udpPort: Port, | |
| description: string, |
|
|
||
| proc setupNat*( | ||
| natStrategy: NatStrategy, tcpPort, udpPort: Port, clientId: string | ||
| manager: NatManager, natStrategy: NatStrategy, tcpPort, udpPort: Port, clientId: string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nph] reported by reviewdog 🐶
| manager: NatManager, natStrategy: NatStrategy, tcpPort, udpPort: Port, clientId: string | |
| manager: NatManager, | |
| natStrategy: NatStrategy, | |
| tcpPort, udpPort: Port, | |
| clientId: string, |
| tcpPort, | ||
| udpPort: Port, | ||
| clientId: string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nph] reported by reviewdog 🐶
| tcpPort, | |
| udpPort: Port, | |
| clientId: string | |
| tcpPort, udpPort: Port, | |
| clientId: string, |
|
|
||
| # Test address remapping | ||
| let (libp2pAddrs, discoveryAddrs) = nattedAddress(natConfig, addrs, udpPort) | ||
| let (libp2pAddrs, discoveryAddrs) = nattedAddress(natManager, natConfig, addrs, udpPort) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nph] reported by reviewdog 🐶
| let (libp2pAddrs, discoveryAddrs) = nattedAddress(natManager, natConfig, addrs, udpPort) | |
| let (libp2pAddrs, discoveryAddrs) = | |
| nattedAddress(natManager, natConfig, addrs, udpPort) |
Today I decided to dive in into
libcodex. On my two machines (AMD RYZEN AI MAX+ 395 (32)andAMD Ryzen AI 9 HX 370 (8+16)), the library crashes with segmentation fault when trying to create a second codex node in the same process (even when the first node was already stopped and destroyed successfully). It worked on Arnaud's machine and works on CI, thus we somehow blamed my setup...Initially I hoped that maybe just building it nativaly will fix the problem.
So, of course, I could not compile it as it seems that
nim-secp256k1has problem with some modern AMD RYZEN machines. The fix that seem to work was changing invendor/nim-codex/vendor/nim-secp256k1/secp256k1/abi.nim:Regarding the segmentation fault, I had some iterations with gpt-5-codex and here is an edited summary:
Switch.start()completes invendor/nim-codex/codex/codex.nim(line238), before any of the block‑exchange or discovery logs appear. The next code that runs isnattedAddress(...), so the blow‑up is occurring while we compute NAT mappings, not inside libp2p.vendor/nim-codex/codex/nat.nimkeeps all runtime state in module‑level globals (strategy,extIp,natClosed,activeMappings,natThreads) plus a singleaddQuitProc(stopNatThreads)hook. That design only works when one node lives in the process. As soon as you create a second node, both instances mutate the same sequences and atomics:redirectPortsappends to the sharedactiveMappings/natThreadsand spawns a renewal thread for every caller. Those renewal threads and any later redirectPorts invocations then race over the same global state.stopNatThreads()sets the sharednatClosedatomic totrue, tears down all threads and deletes every mapping inactiveMappings, regardless of which node created them. The other live node continues executing with stale NAT state and dangling C pointers, which is consistent with theSIGSEGVyou’re observing when the second node starts up.natClosednever flips back tofalse,activeMappingsis never cleared safely), the second node inherits half torn‑down structures. That’s why the bug only shows up once you create another node in the same process.getExternalIP()simply returnsnone. In that caseredirectPortsand the renewal threads never run, so the global state never gets into a bad cross-node mix — the code just keeps the defaultnat=anywithout doing any mapping, and everything appears fine. On my AMD boxes the call likely succeeds (UPnP works), so the brittle shared state is exercised and I am hitting the segfault.To make multiple nodes safe, we need to turn NAT management into a per-node resource.
The code on this branch is mostly authored by "gpt-5-codex" and so, it is just an entry point to actual work. The initial fix compiles and appears to work, but we had to temporarily disable the NAT port‑mapping renewal thread so we no longer spawn a background thread with
repeatPortMapping. This cannot go to production like this as NAT will not be able to adjust to changing conditions, but, I think it is a great start.With this initial fix, I can at least start two nodes one after another with libcodex and even two of them at the same time, even with NAT set to
any. Also apparently it fixed some logging issues that I have observed - for some combination of log levels for the codex nodes in one process, the node that stops second could in some circumstances hang onstopanddestroyoperations. With this initial fix, it seems not to be a problem any more.To experiments with libcodex, and to experience the problem yourself, you may like to check the following two repos: