@@ -8,11 +8,13 @@ subnets or associated infrastructure such as routers. The requirements are that:
884 . At least one network on each node provides outbound internet access (either
99directly, or via a proxy).
1010
11- Futhermore, it is recommended that the deploy host has an interface on the
12- access network. While it is possible to e.g. use a floating IP on a login node
13- as an SSH proxy to access the other nodes, this can create problems in recovering
14- the cluster if the login node is unavailable and can make Ansible problems harder
15- to debug.
11+ Addresses on the "access network" are used as the `ansible_host` IPs.
12+
13+ It is recommended that the deploy host either has a direct connection to the
14+ "access network" or jumps through a host on it which is not part of the appliance.
15+ Using e.g. a floating IP on a login node as a jumphost creates problems in
16+ recovering the cluster if the login node is unavailable and can make Ansible
17+ problems harder to debug.
1618
1719> [!WARNING]
1820> If home directories are on a shared filesystem with no authentication (such
@@ -29,8 +31,8 @@ the OpenTofu variables. These will normally be set in
2931need to be overridden for specific environments, this can be done via an OpenTofu
3032module as discussed [here](./production.md).
3133
32- Note that if an OpenStack subnet has a gateway IP defined then nodes with ports
33- attached to that subnet will get a default route set via that gateway.
34+ Note that if an OpenStack subnet has a gateway IP defined then by default nodes
35+ with ports attached to that subnet get a default route set via that gateway.
3436
3537## Single network
3638This is the simplest possible configuration. A single network and subnet is
@@ -77,8 +79,9 @@ vnic_types = {
7779## Additional networks on some nodes
7880
7981This example shows how to modify variables for specific node groups. In this
80- case a baremetal node group has a second network attached. As above, only a
81- single subnet can have a gateway IP.
82+ case a baremetal node group has a second network attached. Here "subnetA" must
83+ have a gateway IP defined and "subnetB" must not, to avoid routing problems on
84+ the multi-homed compute nodes.
8285
8386``` terraform
8487cluster_networks = [
@@ -109,3 +112,85 @@ compute = {
109112}
110113...
111114```
115+
116+ ## Multiple networks with non-default gateways
117+
118+ In some multiple network configurations it may be necessary to manage default
119+ routes explicitly rather than having them created automatically from a subnet
120+ gateway. This can be done using the OpenTofu variable `gateway_ip`, which can be
121+ set for the cluster and/or overridden on the compute and login groups. If this is set:
122+ - a default route via that address will be created on the appropriate interface
123+ during boot if it does not exist
124+ - any other default routes will be removed
125+
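The two rules above can be read as a small reconciliation step. The following Python sketch only illustrates that logic and is not the appliance's actual boot-time implementation: the `DefaultRoute` model, the function name, the interface name `eth0` and the address `10.0.0.1` are all hypothetical. It returns the `ip route` commands that would leave a single default route via `gateway_ip`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefaultRoute:
    """One default route as seen on a node (hypothetical model)."""
    via: str  # next-hop IP address
    dev: str  # interface name


def reconcile_default_routes(existing, gateway_ip, dev):
    """Return the `ip route` commands needed so that the only default
    route is via `gateway_ip` on interface `dev`."""
    wanted = DefaultRoute(via=gateway_ip, dev=dev)
    commands = []
    # Rule 1: create the wanted default route if it does not exist.
    if wanted not in existing:
        commands.append(f"ip route add default via {wanted.via} dev {wanted.dev}")
    # Rule 2: remove any other default routes.
    for route in existing:
        if route != wanted:
            commands.append(f"ip route del default via {route.via} dev {route.dev}")
    return commands


if __name__ == "__main__":
    # A node whose subnet has no gateway IP, so it boots with no default route.
    # "eth0" is an assumed interface name.
    for cmd in reconcile_default_routes([], "172.16.0.1", "eth0"):
        print(cmd)
```

Calling the function again with the wanted route already present returns an empty list, which matches the "if it does not exist" wording above.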
126+ For example the cluster configuration below has a "campus" network with a
127+ default gateway which provides inbound SSH / ondemand access and outbound
128+ internet attached only to the login nodes, and a "data" network attached to
129+ all nodes. The "data" network has no gateway IP set on its subnet to avoid dual
130+ default routes and routing conflicts on the multi-homed login nodes, but does
131+ have outbound connectivity via a router:
132+
133+ ``` terraform
134+ cluster_networks = [
135+ {
136+ network = "data" # access network, CIDR 172.16.0.0/23
137+ subnet = "data_subnet"
138+ }
139+ ]
140+
141+ login = {
142+ interactive = {
143+ nodes = ["login-0"]
144+ extra_networks = [
145+ {
146+ network = "campus"
147+ subnet = "campus_subnet"
148+ }
149+ ]
150+ }
151+ }
152+ compute = {
153+ general = {
154+ nodes = ["compute-0", "compute-1"]
155+ }
156+ gateway_ip = "172.16.0.1" # Router interface
157+ }
158+ ```
159+
160+ If there is no default route at all (either from a subnet gateway or from
161+ `gateway_ip`) then a dummy route is created via the access network interface to
162+ ensure [correct](https://docs.k3s.io/installation/airgap#default-network-route)
163+ `k3s` operation.
164+
165+ When using a subnet with no default gateway, OpenStack's nameserver for the
166+ subnet may refuse lookups. External nameservers can be defined using the
167+ [resolv_conf](../ansible/roles/resolv_conf/README.md) role.
168+
169+ ## Proxies
170+
171+ If some nodes have no outbound connectivity via any network, the cluster can
172+ be configured to deploy a [squid proxy](https://www.squid-cache.org/) on a node
173+ with outbound connectivity. Assuming the `compute` and `control` nodes have no
174+ outbound connectivity and the `login` node does, the minimal configuration for
175+ this is:
176+
177+ ``` ini
178+ # environments/$SITE/inventory/groups:
179+ [squid:children]
180+ login
181+ [proxy:children]
182+ control
183+ compute
184+ ```
185+
186+ ``` yaml
187+ # environments/$SITE/inventory/group_vars/all/squid.yml:
188+ # these are just examples
189+ squid_cache_disk: 1024 # MB
190+ squid_cache_mem: '12 GB'
191+ ```
192+
193+ Note that name resolution must still be possible and may require defining a
194+ nameserver which is directly reachable from the node using the
195+ [resolv_conf](../ansible/roles/resolv_conf/README.md)
196+ role.