
Commit 4951eb9

Document operations (#63)
Signed-off-by: Fabian Reinartz <freinartz@google.com>
1 parent a5b4e53 commit 4951eb9

2 files changed: 115 additions & 1 deletion


docs/operations.md

Lines changed: 114 additions & 0 deletions
# Operations

## Prerequisites

The sidecar exposes a variety of metrics about its internal state that are
essential during troubleshooting. Ensure that its associated Prometheus server is
configured to scrape the sidecar's `/metrics` endpoint.
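
As a quick sanity check, you can fetch the endpoint directly through a port
forward. This is a minimal sketch: port `9091` is an assumption and depends on
the listen address your sidecar is configured with.

```
# Forward the sidecar's metrics port locally, then fetch the endpoint.
kubectl -n <your_namespace> port-forward <pod_name> 9091 &
curl -s http://localhost:9091/metrics | head
```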

## Verify that the sidecar is running

Verify that the sidecar is running alongside your Prometheus server:

```
kubectl -n <your_namespace> get pods
```

You should see a line like the following:

```
NAME                              READY   STATUS    RESTARTS   AGE
...
prometheus-k8s-85cf598f75-64fjk   2/2     Running   0          24m
...
```

If the pod has only one container (Ready: `1/1`), go back to the setup
instructions and verify that you've correctly configured the Prometheus
deployment/stateful set.
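
You can also list the pod's containers directly to confirm that the sidecar
container was added. This is a sketch using `kubectl`'s JSONPath output; the
container names are the ones used in this guide and may differ in your setup.

```
# Print the names of all containers in the pod.
kubectl -n <your_namespace> get pod <pod_name> \
  -o jsonpath='{.spec.containers[*].name}'
```

Both a Prometheus container and a sidecar container should be listed.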

If both containers are present but not ready, check the logs of the Prometheus
and sidecar containers for any error messages:

```
kubectl -n <your_namespace> logs <pod_name> prometheus
kubectl -n <your_namespace> logs <pod_name> sidecar
```
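
If a container is crash-looping, its current logs may be empty or cut off. The
logs of the previous instance usually contain the fatal error (a sketch):

```
# Fetch the logs of the previously terminated sidecar container.
kubectl -n <your_namespace> logs --previous <pod_name> sidecar
```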

## Verify that the sidecar operates correctly

### Does the sidecar process Prometheus's data?

The sidecar follows the write-ahead log of the Prometheus storage and converts
Prometheus data into Stackdriver time series.
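
For this to work, both containers have to mount the same Prometheus data volume.
You can inspect the mounts per container; in the sketch below the `/data` mount
path matches `kube/full/deploy.sh`, while your deployment may use a different path:

```
# Show each container's name followed by its volume mount paths.
kubectl -n <your_namespace> get pod <pod_name> \
  -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.volumeMounts[*].mountPath}{"\n"}{end}'
```

Both the Prometheus and sidecar containers should list the shared data volume's
mount path (e.g. `/data`).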

Go to the Prometheus UI and run the following query:

```
rate(prometheus_sidecar_samples_processed[5m])
```

It should produce a value greater than 0, which indicates how many Prometheus
samples the sidecar is continuously processing.

If it is zero, go to the `/targets` page in the UI and verify that Prometheus
itself is actually ingesting data. If no targets are visible, consult the
[Prometheus documentation][prom-getting-started] on how to configure Prometheus correctly.
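
If your Prometheus UI is not exposed externally, a port forward is a quick way
to reach it and its `/targets` page. This sketch assumes Prometheus listens on
its default port 9090:

```
# Make the Prometheus UI reachable at http://localhost:9090.
kubectl -n <your_namespace> port-forward <pod_name> 9090
```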

### Are samples being sent to Stackdriver?

Run the following query to verify that the sidecar produces Stackdriver data
from the Prometheus samples:

```
rate(prometheus_sidecar_samples_produced[5m])
```

The number is generally expected to be lower than the number of processed samples
since multiple Prometheus samples (e.g. histogram buckets) may be consolidated
into a single complex Stackdriver sample.

If it is zero, check the sidecar's logs for reported errors.
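
To narrow the output down to recent problems, you can restrict the log window
and filter for errors (a sketch; the exact log wording depends on the sidecar
version):

```
# Show errors logged by the sidecar in the last 15 minutes.
kubectl -n <your_namespace> logs --since=15m <pod_name> sidecar | grep -i error
```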

Verify that the produced samples are successfully being sent to Stackdriver:

```
rate(prometheus_remote_storage_succeeded_samples_total[5m])
```

The number should generally match the number of produced samples from the previous
metric. If it is notably lower, check the sidecar's logs for hints that Stackdriver
rejected some samples.
If no samples were sent successfully at all, the logs might indicate a broader
error such as invalid credentials.
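
If you suspect a credentials problem, check whether the sidecar container has
credentials configured, for example through the `GOOGLE_APPLICATION_CREDENTIALS`
environment variable. This is a sketch; your setup may instead rely on the
default service account of the node or workload, in which case no such variable
is set:

```
# Print the environment variables configured for the sidecar container.
kubectl -n <your_namespace> get pod <pod_name> \
  -o jsonpath='{.spec.containers[?(@.name=="sidecar")].env}'
```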

### Can the sidecar keep up with Prometheus?

The number of samples produced by Prometheus and processed by the sidecar should
be virtually identical. The following two queries should report nearly the same
number:

```
rate(prometheus_sidecar_samples_processed[5m])
rate(prometheus_tsdb_head_samples_appended_total[5m])
```

If the sidecar's processed samples are notably lower, Prometheus may be producing
more data than the sidecar can process and/or write to Stackdriver.
Check the sidecar's logs for indications of rate limiting by the Stackdriver API.
You can further verify backpressure with the following query:

```
prometheus_remote_storage_queue_length{queue="https://monitoring.googleapis.com:443/"} /
prometheus_remote_storage_queue_capacity{queue="https://monitoring.googleapis.com:443/"}
```

If the queue fullness has an upward trend or has already reached 1, consider
[filtering][filter-docs] the data that is forwarded to Stackdriver to exclude
particularly noisy or high-volume metrics.
Reducing Prometheus's overall scrape frequency is another option.

[prom-getting-started]: https://prometheus.io/docs/prometheus/latest/getting_started/
[filter-docs]: ../README.md#filters

kube/full/deploy.sh

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ kubectl apply -f _prometheus.yaml.tmp
 kubectl apply -f _node-exporter.yaml.tmp
 kubectl apply -f _kube-state-metrics.yaml.tmp --as=admin --as-group=system:masters
 
-DATA_DIR=/data DATA_VOLUME=data-volume ../patch.sh deploy prometheus-meta
+DATA_DIR=/data DATA_VOLUME=data-volume ../patch.sh deploy prometheus-k8s
 
 rm _*.tmp
 popd
