Commit dbe8641: Merge with Santo's review (parent: d5efdbc)

1 file changed: docs/Manual/Deployment/Kubernetes/Drain.md (+57 -52 lines)
@@ -1,5 +1,10 @@
 # Draining Kubernetes nodes
 
+{% hint 'danger' %}
+If Kubernetes nodes with ArangoDB pods on them are drained carelessly
+data loss can occur! The proper procedure is described below.
+{% endhint %}
+
 For maintenance work in k8s it is sometimes necessary to drain a k8s node,
 which means removing all pods from it. Kubernetes offers a standard API
 for this and our operator supports this - to the best of its ability.
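Before draining a node, it can help to see which ArangoDB pods are actually scheduled on it. A small illustrative sketch only; the node name is a placeholder, not taken from this page:

```bash
# Illustrative only: list all pods currently scheduled on the node to be drained.
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<node-name>
```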
@@ -43,15 +48,15 @@ anything, either by k8s itself (`ReplicationController`, `ReplicaSet`,
 `Job`, `DaemonSet` or `StatefulSet`) or by an operator. If this is the
 case, the drain operation will be refused, unless one uses the option
 `--force=true`. Since the ArangoDB operator manages our pods, we do not
-have to use this option for ArangoDB, you might have to use it for other
-pods.
+have to use this option for ArangoDB, but you might have to use it for
+other pods.
 
 If all these checks have been overcome, k8s proceeds as follows: All
 pods are notified about this event and are put into a `Terminating`
 state. During this time, they have a chance to take action, or indeed
 the operator managing them has. In particular, although the pods get
 termination notices, they can keep running until the operator has
-removed all "finalizers". This gives the operator a chance to sort out
+removed all _finalizers_. This gives the operator a chance to sort out
 things, for example in our case to move data away from the pod.
 
 However, there is a limit to this tolerance by k8s, and that is the
@@ -78,23 +83,22 @@ If any shard replicas are not currently in sync, then there is a serious
 risk that the cluster is currently not as resilient as expected.
 {% endhint %}
 
-One possibility to verify these two things is via the web UI. Node
-health can be monitored on this screen ("NODES/Overview" tab):
+One possibility to verify these two things is via the ArangoDB web interface.
+Node health can be monitored in the _Overview_ tab under _NODES_:
 
 ![Cluster Health Screen](./HealthyCluster.png)
 
-**One has to check that all nodes are green and there is no node error in the
-top right corner**.
+**Check that all nodes are green** and that there is **no node error** in the
+top right corner.
 
-As to the shards being in sync, one checks this on this screen
-("NODES/Shards" tab):
+As to the shards being in sync, see the _Shards_ tab under _NODES_:
 
 ![Shard Screen](./ShardsInSync.png)
 
-**One has to check that all collections have a green check mark on the
-right side**. If any collection does not have such a check mark, one can
-click on the collection and see the details about shards. Please keep in
-mind that this has to be done for each database separately!
+**Check that all collections have a green check mark** on the right side.
+If any collection does not have such a check mark, you can click on the
+collection and see the details about shards. Please keep in
+mind that this has to be done **for each database** separately!
 
 Obviously, this might be tedious and calls for automation. Therefore, there
 are APIs for this. The first one is [Cluster Health](../../../HTTP/Cluster/Health.html):
@@ -103,7 +107,7 @@ are APIs for this. The first one is [Cluster Health](../../../HTTP/Cluster/Health.html):
 POST /_admin/cluster/health
 ```
 
-which returns a JSON document looking like this: 
+which returns a JSON document looking like this:
 
 ```JSON
 {
@@ -154,21 +158,23 @@ which returns a JSON document looking like this:
 }
 ```
 
-One has to check that each instance has a `Status` field with the value
-`"GOOD"`. Here is a shell command which makes this check easy, using the
-`jq` JSON pretty printer:
+Check that each instance has a `Status` field with the value `"GOOD"`.
+Here is a shell command which makes this check easy, using the
+[`jq` JSON pretty printer](https://stedolan.github.io/jq/):
 
 ```bash
 curl -k https://arangodb.9hoeffer.de:8529/_admin/cluster/health --user root: | jq . | grep '"Status"' | grep -v '"GOOD"'
 ```
 
-For the shards being in sync there is the [Cluster Inventory](../../../HTTP/Replications/ReplicationDump.html#return-cluster-inventory-of-collections-and-indexes) API call:
+For the shards being in sync there is the
+[Cluster Inventory](../../../HTTP/Replications/ReplicationDump.html#return-cluster-inventory-of-collections-and-indexes)
+API call:
 
 ```
 POST /_db/_system/_api/replication/clusterInventory
 ```
 
-which returns a JSON body like this: 
+which returns a JSON body like this:
 
 ```JSON
 {
@@ -236,7 +242,7 @@ which returns a JSON body like this:
 }
 ```
 
-One has to check that for all collections the attribute `"allInSync"` has
+Check that for all collections the attribute `"allInSync"` has
 the value `true`. Note that it is necessary to do this for all databases!
 
 Here is a shell command which makes this check easy:
@@ -245,40 +251,40 @@ Here is a shell command which makes this check easy:
 curl -k https://arangodb.9hoeffer.de:8529/_db/_system/_api/replication/clusterInventory --user root: | jq . | grep '"allInSync"' | sort | uniq -c
 ```
 
-{% hint 'tip' %}
-If all these checks are performed and are OK, the cluster is ready to
+If all these checks are performed and are okay, the cluster is ready to
 run a risk-free drain operation.
-{% endhint %}
 
 {% hint 'danger' %}
-Note that if there are some collections with `replicationFactor` set to
+If there are some collections with `replicationFactor` set to
 1, the system is not resilient and cannot tolerate the failure of even a
 single server! One can still perform a drain operation in this case, but
 if anything goes wrong, in particular if the grace period is chosen too
 short and a pod is killed the hard way, data loss can happen.
 {% endhint %}
 
 If all `replicationFactor`s of all collections are at least 2, then the
-system can tolerate the failure of a single DBserver. If you have set
+system can tolerate the failure of a single _DBserver_. If you have set
 the `Environment` to `Production` in the specs of the ArangoDB
-deployment, you will only ever have one DBserver on each k8s node and
+deployment, you will only ever have one _DBserver_ on each k8s node and
 therefore the drain operation is relatively safe, even if the grace
 period is chosen too small.
 
-Furthermore, we recommend to have one k8s node more than DBservers in
+Furthermore, we recommend to have one k8s node more than _DBservers_ in
 your cluster, such that the deployment of a replacement _DBServer_ can
 happen quickly and not only after the maintenance work on the drained
 node has been completed. However, with the necessary care described
 below, the procedure should also work without this.
 
-Finally, **one should not run a rolling upgrade or restart operation at
-the time of a node drain**.
+Finally, one should **not run a rolling upgrade or restart operation**
+at the time of a node drain.
 
-## Optional: Clean out a DBserver manually
+## Clean out a DBserver manually (optional)
 
 In this step we clean out a _DBServer_ manually, before even issuing the
 `kubectl drain` command. This step is optional, but can speed up things
-considerably. Here is why: If this step is not performed, we must choose
+considerably. Here is why:
+
+If this step is not performed, we must choose
 the grace period long enough to avoid any risk, as explained in the
 previous section. However, this has a disadvantage which has nothing to
 do with ArangoDB: We have observed that some k8s internal services like
@@ -300,7 +306,7 @@ To clean out a _DBServer_ manually, we have to use this API:
 POST /_admin/cluster/cleanOutServer
 ```
 
-and send as body a JSON document like this: 
+and send as body a JSON document like this:
 
 ```JSON
 {"server":"DBServer0006"}
@@ -310,16 +316,16 @@ and send as body a JSON document like this:
 The value of the `"server"` attribute should be the name of the DBserver
 which is on the pod which shall be drained next. This uses the UI short
 name, alternatively one can use the internal name, which corresponds to
-the pod name: In the example described in this section, the pod name is
+the pod name: In our example, the pod name is:
 
 ```
 my-arangodb-cluster-prmr-wbsq47rz-5676ed
 ```
 
-where `my-arangodb-cluster` is the ArangoDB deployment name, therefore
-the internal name of the DBserver is `PRMR-wbsq47rz`, note that `PRMR`
-must be all capitals since pod names are always all lower case. So, I
-could use the body
+where `my-arangodb-cluster` is the ArangoDB deployment name, therefore
+the internal name of the _DBserver_ is `PRMR-wbsq47rz`. Note that `PRMR`
+must be all capitals since pod names are always all lower case. So, we
+could use the body:
 
 ```JSON
 {"server":"PRMR-wbsq47rz"}
@@ -338,7 +344,7 @@ completion status of the clean out server job with this API:
 GET /_admin/cluster/queryAgencyJob?id=38029195
 ```
 
-which will return a body like this: 
+which will return a body like this:
 
 ```JSON
 {
@@ -357,7 +363,7 @@ which will return a body like this:
 }
 ```
 
-which indicates that the job is still ongoing (`"Pending"`). As soon as
+It indicates that the job is still ongoing (`"Pending"`). As soon as
 the job has completed, the answer will be:
 
 ```JSON
@@ -378,7 +384,7 @@ the job has completed, the answer will be:
 }
 ```
 
-Note that from this moment on the _DBServer_ can no longer be used to move
+From this moment on the _DBserver_ can no longer be used to move
 shards to. At the same time, it will no longer hold any data of the
 cluster.
 
@@ -387,8 +393,7 @@ completely risk-free, even with a small grace period.
 
 ## Performing the drain
 
-After all checks in Section
-[Things to check in ArangoDB before a node drain](#things-to-check-in-arangodb-before-a-node-drain)
+After all above [checks before a node drain](#things-to-check-in-arangodb-before-a-node-drain)
 have been done successfully, it is safe to perform the drain
 operation, similar to this command:
 
@@ -404,9 +409,10 @@ can be moved to a different server within 5 minutes. Note that this is
 much data is stored in the pod, your mileage may vary, moving a terabyte
 of data can take considerably longer!
 
-If the optional step in the previous section has been performed
-beforehand, the grace period can easily be reduced to 60 seconds, say, at
-least from the perspective of ArangoDB, since the _DBServer_ is already
+If the optional step of
+[cleaning out a DBserver manually](#clean-out-a-dbserver-manually-optional)
+has been performed beforehand, the grace period can easily be reduced to 60
+seconds - at least from the perspective of ArangoDB, since the server is already
 cleaned out, so it can be dropped readily and there is still no risk.
 
 At the same time, this guarantees now that the drain is completed
@@ -415,25 +421,24 @@ approximately within a minute.
 ## Things to check after a node drain
 
 After a node has been drained, there will usually be one of the
-DBservers gone from the cluster. As a replacement, another _DBServer_ has
+_DBservers_ gone from the cluster. As a replacement, another _DBServer_ has
 been deployed on a different node, if there is a different node
 available. If not, the replacement can only be deployed when the
 maintenance work on the drained node has been completed and it is
 uncordoned again. In this latter case, one should wait until the node is
 back up and the replacement pod has been deployed there.
 
 After that, one should perform the same checks as described in Section
-[Things to check in ArangoDB before a node drain](#things-to-check-in-arangodb-before-a-node-drain)
+[things to check before a node drain](#things-to-check-in-arangodb-before-a-node-drain)
 above.
 
 Finally, it is likely that the shard distribution in the "new" cluster
 is not balanced out. In particular, the new _DBserver_ is not automatically
-used to store shards. We recommend to [re-balance](../../Administration/Cluster/#movingrebalancing-shards) the shard distribution,
+used to store shards. We recommend to
+[re-balance](../../Administration/Cluster/#movingrebalancing-shards) the shard distribution,
 either manually by moving shards or by using the "Rebalance Shards"
 button in the "NODES/Shards" tab in the UI. This redistribution can take
 some time again and progress can be monitored in the UI.
 
-After all this has been done, **another round of checks should be done
-before proceeding to drain the next node**.
-
-
+After all this has been done, **another round of checks should be done**
+before proceeding to drain the next node.
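Taken together, the pre- and post-drain checks from this page could be combined into a small helper. A sketch only, reusing the host and credentials from the examples above; the database list has to be extended to cover every database in the cluster:

```bash
# Illustrative sketch: run both cluster checks described in this document.
BASE=https://arangodb.9hoeffer.de:8529

echo "Instances whose Status is not GOOD (should print nothing):"
curl -k -s --user root: "$BASE/_admin/cluster/health" \
  | jq . | grep '"Status"' | grep -v '"GOOD"'

# Extend this list with every database in the cluster.
for DB in _system; do
  echo "allInSync summary for database $DB (only 'true' entries should appear):"
  curl -k -s --user root: "$BASE/_db/$DB/_api/replication/clusterInventory" \
    | jq . | grep '"allInSync"' | sort | uniq -c
done
```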
