
@doorgan
Collaborator

@doorgan doorgan commented Nov 10, 2025

Alternative to #191 without introducing dependencies.

TODO:

  • Figure out why the engine node thinks Forge.EPMD does not exist.

@doorgan doorgan force-pushed the doorgan/epmdless-2 branch 2 times, most recently from fccf784 to 4bffe9b on November 11, 2025 02:29
## Enable deployment without epmd
## (requires changing both vm.args and remote.vm.args)
-epmd_module Elixir.XPForge.EPMD
-start_epmd false -dist_listen false
Member

I don't think we need the -dist_listen false anymore

Collaborator Author

Yeah, this is a leftover of me splitting hairs
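For context on what -epmd_module expects: the VM calls the named module in place of the epmd daemon, so that module has to implement the erl_epmd callback interface (start_link/0, register_node, port_please, address_please, names, and on OTP 23+ listen_port_please) and has to be loadable when kernel starts net_sup. The undef on start_link/0 in the error reported later in this thread is what an unloadable module looks like. Below is a minimal, hypothetical sketch: MyApp.EPMD is not the PR's Forge.EPMD, and encoding each node's distribution port in its name is an assumption made purely for illustration.

```elixir
defmodule MyApp.EPMD do
  # Hypothetical EPMD-less callback module for
  # `-start_epmd false -epmd_module Elixir.MyApp.EPMD`.
  # Assumption: every node name ends with its distribution port,
  # e.g. "engine-45678" listens on port 45678.

  # Distribution protocol version (the one used from OTP 23 onward).
  @dist_version 6

  # There is no epmd daemon to connect to, so no process is started.
  def start_link, do: :ignore

  # Called when distribution starts; {:ok, creation} lets it proceed.
  def register_node(name, port), do: register_node(name, port, :inet)
  def register_node(_name, _port, _driver), do: {:ok, :rand.uniform(3)}

  # The port this node should listen on, taken from its own name.
  def listen_port_please(name, _host), do: {:ok, port_from_name(name)}

  # Where a remote node listens, computed locally instead of asking epmd.
  def port_please(name, host), do: port_please(name, host, :infinity)

  def port_please(name, _host, _timeout),
    do: {:port, port_from_name(name), @dist_version}

  # Resolve only the host address; the port still comes from port_please/3.
  def address_please(_name, host, family), do: :inet.getaddr(host, family)

  # Without a daemon there is no registry of local node names to list.
  def names(_host), do: {:error, :address}

  # "engine-45678" -> 45678 (raises if the name has no numeric suffix).
  defp port_from_name(name) do
    name
    |> to_string()
    |> String.split("-")
    |> List.last()
    |> String.to_integer()
  end
end
```

The design choice here is simply that every node can compute every other node's port without a shared registry; a real implementation could instead pass ports through environment variables or arguments, as long as both sides agree.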

@mhanberg
Member

Figure out why the engine node thinks Forge.EPMD does not exist.

Are you seeing this in tests or when you actually put it up?

@doorgan
Collaborator Author

doorgan commented Nov 11, 2025

@mhanberg both in tests and when building a release, I'm getting this error:

Kernel pid terminated (application_controller) ("{application_start_failure,kernel,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,'Elixir.Forge.EPMD',{'EXIT',{undef,[{'Elixir.Forge.EPMD',start_link,[],[]},{supervisor,do_start_child_i,3,[{file,\"supervisor.erl\"},{line,959}]},{supervisor,do_start_child,3,[{file,\"supervisor.erl\"},{line,945}]},{supervisor,'-start_children/2-fun-0-',3,[{file,\"supervisor.erl\"},{line,929}]},{supervisor,children_map,4,[{file,\"supervisor.erl\"},{line,1820}]},{supervisor,init_children,2,[{file,\"supervisor.erl\"},{line,889}]},{gen_server,init_it,2,[{file,\"gen_server.erl\"},{line,2229}]},{gen_server,init_it,6,[{file,\"gen_server.erl\"},{line,2184}]}]}}}}}},{kernel,start,[normal,[]]}}}")

@mhanberg
Member

What stands out to me is that the module name is not namespaced.

require Logger

def init(%Project{} = project, document_store_entropy, app_configs) do
Application.put_env(:kernel, :epmd_module, Forge.EPMD, persistent: true)
Member

Do you know why this is necessary?

Collaborator Author

It's not, or it should not be; I'm trying out things from the Livebook codebase.
It seems like the VM tries to find the EPMD module before it loads the code paths provided via -pa.
Here in particular, I was trying not to specify the EPMD module with erl flags and instead change it after the engine node starts. This line wasn't reached because the RPC call to bootstrap didn't get through, but setting it in the -e flag doesn't help either.

@doorgan
Collaborator Author

doorgan commented Nov 11, 2025

What stands out to me is that the module name is not namespaced.

Ah, that log is from running tests, where namespacing doesn't run.
If I build a release, I get this in the LSP logs:

[ERROR][2025-11-10 19:03:47] ...p/_transport.lua:36     "rpc"   "/Users/dorgan/dev/expert/apps/expert/burrito_out/expert_darwin_arm64"  "stderr"        "plication_controller) (\"{application_start_failure,kernel,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,'Elixir.XPForge.EPMD',{'EXIT',{undef,[{'Elixir.XPForge.EPMD',start_link,[],[]},{supervisor,do_start_child_i,3,[{file,\\\"supervisor.erl\\\"},{line,420}]},{supervisor,do_start_child,2,[{file,\\\"supervisor.erl\\\"},{line,406}]},{supervis"

@doorgan
Collaborator Author

doorgan commented Nov 11, 2025

I can reproduce the same issues on a barebones mix project: https://github.com/doorgan/epmdless

It does partially work if I follow @josevalim's example and run elixirc on a file containing both modules (although I get the "requested disconnect from node in order to prevent overlapping partitions" warning, and children can't connect to each other), but it breaks inside a Mix project.

@mhanberg
Member

I haven't researched further to validate this and find a solution, but I believe the reason this happens in the engine but not in Expert is that Expert is started as a release, which I believe loads all modules before the app starts by default.

Normal mix run, mix phx.server, or iex -S mix invocations load modules lazily as they are needed, which is why the module isn't present when the VM tries to use the -epmd_module. I guess the VM doesn't load it on demand when it tries to invoke it.

Relevant section from the release docs at https://hexdocs.pm/elixir/releases.html#operating-system-scripts

# # Set the release to load code on demand (interactive) instead of preloading (embedded).
# export RELEASE_MODE=interactive

@josevalim
Member

I can take a look later, but in Livebook we don't start the node in the release; we explicitly call :net_kernel.start or similar to start it. It may be for the reasons above. I will investigate later.

@doorgan doorgan marked this pull request as ready for review November 14, 2025 15:20
@doorgan
Collaborator Author

doorgan commented Nov 14, 2025

@josevalim thanks for all the help! This PR is now ready for testing and review

Comment on lines +37 to +40
(cd "apps/$proj" && elixir --erl "-start_epmd false -epmd_module Elixir.Forge.EPMD" -S mix {{args}})
;;
engine)
(cd "apps/$proj" && elixir --erl "-start_epmd false -epmd_module Elixir.Forge.EPMD" -S mix {{args}})
Collaborator Author

I think it's still useful to keep them separate even if they're the same right now, but I don't feel strongly about it.

you must set the environment variable ELIXIR_ERL_OPTIONS="-epmd_module #{Forge.EPMD}"
""")
end
end
Member

Should we do this when expert boots rather than when we try to start a child node?
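One way to act on that suggestion is a check that runs when Expert boots and reads the flag back with :init.get_argument/1, so a misconfigured VM fails early instead of when a child node is spawned. This is only a sketch: MyApp.DistCheck and its functions are hypothetical names, and the error text merely mirrors the message in the diff above.

```elixir
defmodule MyApp.DistCheck do
  # Hypothetical boot-time check that the VM was started with
  # `-epmd_module <expected>`.

  # :init.get_argument/1 returns {:ok, [[value, ...], ...]} or :error;
  # the argument is injectable so the check can be exercised directly.
  def epmd_module_configured?(expected, argument \\ :init.get_argument(:epmd_module)) do
    # -epmd_module values are charlists such as ~c"Elixir.Forge.EPMD".
    expected_arg = Atom.to_charlist(expected)

    case argument do
      {:ok, values} ->
        Enum.any?(values, fn value -> match?([^expected_arg | _], value) end)

      :error ->
        false
    end
  end

  # Raise with setup instructions if the flag is missing or wrong.
  def ensure_epmd_module!(expected) do
    if not epmd_module_configured?(expected) do
      raise """
      the VM was not started with -epmd_module #{expected};
      you must set the environment variable ELIXIR_ERL_OPTIONS="-epmd_module #{expected}"
      """
    end

    :ok
  end
end
```

For example, calling MyApp.DistCheck.ensure_epmd_module!(Forge.EPMD) from the application's start/2 callback would raise during boot rather than when the engine node is first started.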


## Enable deployment without epmd
## (requires changing both vm.args and remote.vm.args)
-epmd_module Elixir.XPForge.EPMD
Member

Please double check this is indeed necessary. You may be fine using epmd for remote debugging (or maybe remote debugging is pointless).

Member

@josevalim josevalim left a comment

Looks good to me! I have added some comments but none of them are stoppers (and they might be wrong :D)

Member

@mhanberg mhanberg left a comment

Works on my corporate macbook and on my personal linux desktop

:shipit:

@doorgan doorgan merged commit 488d3a9 into main Nov 15, 2025
36 checks passed
@doorgan doorgan deleted the doorgan/epmdless-2 branch November 15, 2025 17:31
@doorgan doorgan mentioned this pull request Nov 15, 2025