# Troubleshooting

While Kubernetes and the ArangoDB Kubernetes operator will automatically
resolve a lot of issues, there are always cases where human attention
is needed.

This chapter gives you tips & tricks to help you troubleshoot deployments.

## Where to look

In Kubernetes, all resources can be inspected with `kubectl`, using either
the `get` or the `describe` command.

To get all details of a resource (both specification & status),
run the following command:

```bash
kubectl get <resource-type> <resource-name> -n <namespace> -o yaml
```

For example, to get the entire specification and status
of an `ArangoDeployment` resource named `my-arango` in the `default` namespace,
run:

```bash
kubectl get ArangoDeployment my-arango -n default -o yaml
# or shorter
kubectl get arango my-arango -o yaml
```

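If you are only interested in the current status, you can also extract a single
field with a jsonpath expression. A minimal sketch, assuming the resource reports
a `status.phase` field (the exact status layout may differ per operator version):

```bash
# Print only the current phase of the deployment (e.g. Running)
kubectl get arango my-arango -n default -o jsonpath='{.status.phase}'
```
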
Several types of resources (including all ArangoDB custom resources) support
events. These events show what happened to the resource over time.

To show the events (and most important resource data) of a resource,
run the following command:

```bash
kubectl describe <resource-type> <resource-name> -n <namespace>
```

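For example, to show the events of the deployment used above, run:

```bash
kubectl describe arango my-arango -n default
```
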
## Getting logs

Another invaluable source of information is the logs of the containers running
in Kubernetes.
These logs are accessible through the `Pods` that group these containers.

To fetch the logs of the default container running in a `Pod`, run:

```bash
kubectl logs <pod-name> -n <namespace>
# or with the follow option to keep inspecting the logs while they are written
kubectl logs <pod-name> -n <namespace> -f
```

To inspect the logs of a specific container in a `Pod`, add `-c <container-name>`.
You can find the names of the containers in the `Pod` using `kubectl describe pod ...`.

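If you prefer not to scan the full `describe` output, the container names can also
be extracted directly. A minimal sketch (the jsonpath expression is just one way to do this):

```bash
# List the names of all containers in the Pod
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].name}'

# Fetch the logs of one specific container
kubectl logs <pod-name> -n <namespace> -c <container-name>
```
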
{% hint 'info' %}
Note that the ArangoDB operators are themselves deployed as a Kubernetes `Deployment`
with 2 replicas. This means that you will have to fetch the logs of the 2 `Pods`
running those replicas.
{% endhint %}

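A convenient way to get the logs of both operator `Pods` at once is a label selector.
A sketch, assuming the default manifests label the operator `Pods` with
`app=arango-deployment-operator` (verify with `kubectl get pods --show-labels` if your
installation uses different labels):

```bash
# Fetch the logs of all operator Pods in one go, prefixed with the Pod name
kubectl logs -n <namespace> -l app=arango-deployment-operator --prefix
```
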
## What if

### The `Pods` of a deployment stay in `Pending` state

There are two common causes for this (a diagnostic sketch follows the list).

1) The `Pods` cannot be scheduled because there are not enough nodes available.
   This is usually only the case with a `spec.environment` setting that has a value of `Production`.

   Solution: Add more nodes.
1) There are no `PersistentVolumes` available to be bound to the `PersistentVolumeClaims`
   created by the operator.

   Solution: Use `kubectl get persistentvolumes` to inspect the available `PersistentVolumes`
   and if needed, use the [`ArangoLocalStorage` operator](./StorageResource.md) to provision `PersistentVolumes`.

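To find out which of the two causes applies, the scheduling events of the `Pod` are
usually the quickest pointer. A minimal sketch:

```bash
# The Events section at the bottom explains why the Pod cannot be scheduled
kubectl describe pod <pod-name> -n <namespace>

# Check whether the PersistentVolumeClaims of the deployment are still unbound
kubectl get persistentvolumeclaims -n <namespace>
kubectl get persistentvolumes
```
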
### When restarting a `Node`, the `Pods` scheduled on that node remain in `Terminating` state

When a `Node` no longer makes regular calls to the Kubernetes API server, it is
marked as not available. Depending on specific settings in your `Pods`, Kubernetes
will at some point decide to terminate the `Pod`. As long as the `Node` is not
completely removed from the Kubernetes API server, Kubernetes will try to use
the `Node` itself to terminate the `Pod`.

The `ArangoDeployment` operator recognizes this condition and will try to replace those
`Pods` with `Pods` on different nodes. The exact behavior differs per type of server.

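If such a `Pod` keeps lingering in `Terminating` state and you are certain the `Node`
will not come back, a force delete is a common last resort. This is standard Kubernetes
practice rather than something specific to the operator, so use it with care:

```bash
# Remove the Pod object without waiting for the (unreachable) Node to confirm
kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force
```
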
### What happens when a `Node` with local data is broken

When a `Node` with `PersistentVolumes` hosted on that `Node` is broken and
cannot be repaired, the data in those `PersistentVolumes` is lost.

If an `ArangoDeployment` of type `Single` was using one of those `PersistentVolumes`,
the database is lost and must be restored from a backup.

If an `ArangoDeployment` of type `ActiveFailover` or `Cluster` was using one of
those `PersistentVolumes`, it depends on the type of server that was using the volume.
A sketch for inspecting the affected volumes follows the list below.

- If an `Agent` was using the volume, it can be repaired as long as 2 other agents are still healthy.
- If a `DBServer` was using the volume, and the replication factor of all database
  collections is 2 or higher, and the remaining dbservers are still healthy,
  the cluster will duplicate the remaining replicas to
  bring the number of replicas back to the original number.
- If a `DBServer` was using the volume, and the replication factor of a database
  collection is 1 and happens to be stored on that dbserver, the data is lost.
- If a single server of an `ActiveFailover` deployment was using the volume, and the
  other single server is still healthy, the other single server will become leader.
  After replacing the failed single server, the new follower will synchronize with
  the leader.
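
To see which volumes and servers are affected, you can correlate the `PersistentVolumeClaims`
of the deployment with their `PersistentVolumes`. A minimal sketch, assuming the operator
applies its usual `arango_deployment` label to the claims (check with `--show-labels` if
your version labels them differently):

```bash
# List the PersistentVolumeClaims of the deployment and the volumes they are bound to
kubectl get persistentvolumeclaims -n <namespace> -l arango_deployment=<deployment-name>

# Inspect a bound PersistentVolume to see on which Node its local data lives
kubectl get persistentvolume <pv-name> -o yaml
```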