# Draining Kubernetes nodes

{% hint 'danger' %}
If Kubernetes nodes with ArangoDB pods on them are drained carelessly,
data loss can occur! The proper procedure is described below.
{% endhint %}

For maintenance work in k8s it is sometimes necessary to drain a k8s node,
which means removing all pods from it. Kubernetes offers a standard API
for this and our operator supports this - to the best of its ability.

When a drain operation is triggered, k8s first checks whether the pods
on the node in question are managed by anything, either by k8s itself
(`ReplicationController`, `ReplicaSet`, `Job`, `DaemonSet` or
`StatefulSet`) or by an operator. If this is the case, the drain
operation will be refused, unless one uses the option `--force=true`.
Since the ArangoDB operator manages our pods, we do not have to use this
option for ArangoDB, but you might have to use it for other pods.

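To see who manages a particular pod, you can inspect its `ownerReferences`.
This is only an illustrative sketch - the pod name is the hypothetical
example used later on this page, and the owner kind you see depends on how
the pod is managed:

```bash
# Show which controller or operator owns this pod
# (hypothetical pod name - substitute one of your own ArangoDB pods):
kubectl get pod my-arangodb-cluster-prmr-wbsq47rz-5676ed \
  -o jsonpath='{range .metadata.ownerReferences[*]}{.kind}/{.name}{"\n"}{end}'
```
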
If all these checks have been overcome, k8s proceeds as follows: All
pods are notified about this event and are put into a `Terminating`
state. During this time, they have a chance to take action, or indeed
the operator managing them has. In particular, although the pods get
termination notices, they can keep running until the operator has
removed all _finalizers_. This gives the operator a chance to sort out
things, for example in our case to move data away from the pod.

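You can observe this mechanism directly: while a pod is in `Terminating`
state, the finalizers set by the operator are still visible on it. A minimal
check, again using the hypothetical pod name from the example further down:

```bash
# List the finalizers the ArangoDB operator has set on a pod
# (hypothetical pod name - substitute one of your own pods):
kubectl get pod my-arangodb-cluster-prmr-wbsq47rz-5676ed \
  -o jsonpath='{.metadata.finalizers}'
```
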
However, there is a limit to this tolerance by k8s, and that is the
grace period given to the drain operation.

## Things to check in ArangoDB before a node drain

Two things should be verified before a node drain: that all cluster nodes
are up, running and healthy, and that for all collections all shard
replicas are currently in sync.

{% hint 'danger' %}
If any shard replicas are not currently in sync, then there is a serious
risk that the cluster is currently not as resilient as expected.
{% endhint %}

One possibility to verify these two things is via the ArangoDB web interface.
Node health can be monitored in the _Overview_ tab under _NODES_:

![Cluster Health Screen](./HealthyCluster.png)

**Check that all nodes are green** and that there is **no node error** in the
top right corner.

As to the shards being in sync, see the _Shards_ tab under _NODES_:

![Shard Screen](./ShardsInSync.png)

**Check that all collections have a green check mark** on the right side.
If any collection does not have such a check mark, you can click on the
collection and see the details about shards. Please keep in mind that this
has to be done **for each database** separately!

Obviously, this might be tedious and calls for automation. Therefore, there
are APIs for this. The first one is [Cluster Health](../../../HTTP/Cluster/Health.html):

```
POST /_admin/cluster/health
```

… which returns a JSON document looking like this:

```JSON
{
  …
}
```

Check that each instance has a `Status` field with the value `"GOOD"`.
Here is a shell command which makes this check easy, using the
[`jq` JSON pretty printer](https://stedolan.github.io/jq/):

```bash
curl -k https://arangodb.9hoeffer.de:8529/_admin/cluster/health --user root: | jq . | grep ' "Status"' | grep -v ' "GOOD"'
```

For the shards being in sync there is the
[Cluster Inventory](../../../HTTP/Replications/ReplicationDump.html#return-cluster-inventory-of-collections-and-indexes)
API call:

```
POST /_db/_system/_api/replication/clusterInventory
```

… which returns a JSON body like this:

```JSON
{
  …
}
```

Check that for all collections the attribute `"allInSync"` has
the value `true`. Note that it is necessary to do this for all databases!

Here is a shell command which makes this check easy:

```bash
curl -k https://arangodb.9hoeffer.de:8529/_db/_system/_api/replication/clusterInventory --user root: | jq . | grep ' "allInSync"' | sort | uniq -c
```
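
Since this check has to be repeated for every database, a small loop over
the database list can save time. This is only a sketch, reusing the same
hypothetical endpoint and credentials as the examples above:

```bash
# Run the allInSync check for every database, not just _system:
for db in $(curl -sk https://arangodb.9hoeffer.de:8529/_api/database --user root: | jq -r '.result[]'); do
  echo "== $db =="
  curl -sk "https://arangodb.9hoeffer.de:8529/_db/$db/_api/replication/clusterInventory" --user root: \
    | jq . | grep ' "allInSync"' | sort | uniq -c
done
```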

If all these checks are performed and are okay, the cluster is ready to
run a risk-free drain operation.

{% hint 'danger' %}
If there are some collections with `replicationFactor` set to
1, the system is not resilient and cannot tolerate the failure of even a
single server! One can still perform a drain operation in this case, but
if anything goes wrong, in particular if the grace period is chosen too
short and a pod is killed the hard way, data loss can happen.
{% endhint %}
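
To spot such collections quickly, the Cluster Inventory response can be
filtered for a `replicationFactor` of 1. This is a sketch that assumes the
usual layout of that response, with each collection's properties under
`collections[].parameters`; remember to repeat it for every database:

```bash
# List collections whose replicationFactor is 1 (no redundancy):
curl -sk https://arangodb.9hoeffer.de:8529/_db/_system/_api/replication/clusterInventory --user root: \
  | jq -r '.collections[].parameters | select(.replicationFactor == 1) | .name'
```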

If all `replicationFactor`s of all collections are at least 2, then the
system can tolerate the failure of a single _DBserver_. If you have set
the `Environment` to `Production` in the specs of the ArangoDB
deployment, you will only ever have one _DBserver_ on each k8s node and
therefore the drain operation is relatively safe, even if the grace
period is chosen too small.

Furthermore, we recommend having one k8s node more than _DBservers_ in
your cluster, such that the deployment of a replacement _DBServer_ can
happen quickly, and not only after the maintenance work on the drained
node has been completed. However, with the necessary care described
below, the procedure should also work without this.

Finally, one should **not run a rolling upgrade or restart operation**
at the time of a node drain.

## Clean out a DBserver manually (optional)

In this step we clean out a _DBServer_ manually, before even issuing the
`kubectl drain` command. This step is optional, but can speed up things
considerably. Here is why:

If this step is not performed, we must choose
the grace period long enough to avoid any risk, as explained in the
previous section. However, this has a disadvantage which has nothing to
do with ArangoDB: we have observed that some k8s internal services can
suffer if a node stays in the draining state for a long time.

To clean out a _DBServer_ manually, we have to use this API:

```
POST /_admin/cluster/cleanOutServer
```

… and send as body a JSON document like this:

```JSON
{"server":"DBServer0006"}
```

The value of the `"server"` attribute should be the name of the _DBserver_
which is on the pod which shall be drained next. This uses the UI short
name; alternatively, one can use the internal name, which corresponds to
the pod name: In our example, the pod name is:

```
my-arangodb-cluster-prmr-wbsq47rz-5676ed
```

… where `my-arangodb-cluster` is the ArangoDB deployment name, therefore
the internal name of the _DBserver_ is `PRMR-wbsq47rz`. Note that `PRMR`
must be all capitals since pod names are always all lower case. So, we
could use the body:

```JSON
{"server":"PRMR-wbsq47rz"}
```
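
Putting the pieces together, a complete call could look like the following
sketch (the endpoint and credentials are the hypothetical ones used in the
earlier examples); the response contains the id of the agency job that
performs the clean out:

```bash
# Trigger the clean out of the DBserver PRMR-wbsq47rz:
curl -k https://arangodb.9hoeffer.de:8529/_admin/cluster/cleanOutServer \
  --user root: -H 'Content-Type: application/json' \
  -d '{"server":"PRMR-wbsq47rz"}'
```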

One can check the completion status of the clean out server job with this API:

```
GET /_admin/cluster/queryAgencyJob?id=38029195
```

… which will return a body like this:

```JSON
{
  …
}
```

It indicates that the job is still ongoing (`"Pending"`). As soon as
the job has completed, the answer will be:

```JSON
{
  …
}
```

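To poll this from the shell until the job has finished, something like the
following sketch can be used, again with the hypothetical endpoint,
credentials and job id from above:

```bash
# Re-query the agency job every 10 seconds until it is no longer pending:
watch -n 10 'curl -sk "https://arangodb.9hoeffer.de:8529/_admin/cluster/queryAgencyJob?id=38029195" --user root: | jq .'
```
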
From this moment on the _DBserver_ can no longer be used to move
shards to. At the same time, it will no longer hold any data of the
cluster.

Therefore, the node can now be drained completely risk-free, even with a
small grace period.

## Performing the drain

After all above [checks before a node drain](#things-to-check-in-arangodb-before-a-node-drain)
have been done successfully, it is safe to perform the drain
operation, similar to this command:

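The node name below is only a placeholder - substitute the name of the node
you actually want to drain. The flags are the standard `kubectl drain`
options, and the grace period of 300 seconds corresponds to the 5 minutes
discussed below:

```bash
# Drain a node, ignoring DaemonSet pods and deleting local data; give the
# pods (and thus the ArangoDB operator) up to 5 minutes to terminate cleanly:
kubectl drain gke-mycluster-default-pool-12345678-abcd \
  --delete-local-data --ignore-daemonsets --grace-period=300
```
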
The grace period is chosen such that the data stored on the _DBServer_
can be moved to a different server within 5 minutes. Note that this is
only an estimate: depending on how
much data is stored in the pod, your mileage may vary - moving a terabyte
of data can take considerably longer!

If the optional step of
[cleaning out a DBserver manually](#clean-out-a-dbserver-manually-optional)
has been performed beforehand, the grace period can easily be reduced to 60
seconds - at least from the perspective of ArangoDB, since the server is already
cleaned out, so it can be dropped readily and there is still no risk.

At the same time, this now guarantees that the drain is completed
approximately within a minute.

## Things to check after a node drain

After a node has been drained, there will usually be one of the
_DBservers_ gone from the cluster. As a replacement, another _DBServer_ has
been deployed on a different node, if there is a different node
available. If not, the replacement can only be deployed when the
maintenance work on the drained node has been completed and it is
uncordoned again. In this latter case, one should wait until the node is
back up and the replacement pod has been deployed there.

After that, one should perform the same checks as described in the section
on [things to check before a node drain](#things-to-check-in-arangodb-before-a-node-drain)
above.
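
If you prefer the command line, the two checks from above can be combined
into one quick verification, again reusing the same hypothetical endpoint
and credentials; the health check should print nothing, and the inventory
check should only count `true` values:

```bash
# Re-run both checks after the drain: node health and shard sync.
curl -sk https://arangodb.9hoeffer.de:8529/_admin/cluster/health --user root: \
  | jq . | grep ' "Status"' | grep -v ' "GOOD"'
for db in $(curl -sk https://arangodb.9hoeffer.de:8529/_api/database --user root: | jq -r '.result[]'); do
  echo "== $db =="
  curl -sk "https://arangodb.9hoeffer.de:8529/_db/$db/_api/replication/clusterInventory" --user root: \
    | jq . | grep ' "allInSync"' | sort | uniq -c
done
```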

Finally, it is likely that the shard distribution in the "new" cluster
is not balanced out. In particular, the new _DBserver_ is not automatically
used to store shards. We recommend
[re-balancing](../../Administration/Cluster/#movingrebalancing-shards) the shard distribution,
either manually by moving shards or by using the _Rebalance Shards_
button in the _Shards_ tab under _NODES_ in the UI. This redistribution can
take some time again and progress can be monitored in the UI.

After all this has been done, **another round of checks should be done**
before proceeding to drain the next node.