11 changes: 10 additions & 1 deletion charts/prometheus-rds-alerts/values.yaml
@@ -45,7 +45,7 @@ rules:
severity: warning
annotations:
summary: "Less than 20% free disk space on at least one instance"
description: 'One or more RDS instances has <20% free disk space'
description: "One or more RDS instances has <20% free disk space"

RDSDiskSpaceLimit:
expr: max by (aws_account_id, aws_region, dbidentifier) (rds_free_storage_bytes{} * 100 / rds_allocated_storage_bytes{}) < 10
@@ -204,3 +204,12 @@ rules:
annotations:
summary: "RDS instance(s) use(s) a certificate with an expiration date inferior to 15 days"
description: "{{ $value }} instance(s) of the AWS account ID={{ $labels.aws_account_id}} in region={{ $labels.aws_region }} use(s) a certificate with an expiration date inferior to 15 days"

RDSFullDiskSpace:
expr: max by (aws_account_id, aws_region, dbidentifier) (rds_instance_status{}) == -7
for: 5m
labels:
severity: critical
annotations:
summary: "Instance storage is full"
description: "{{ $labels.dbidentifier }} storage is full"
37 changes: 5 additions & 32 deletions content/runbooks/rds/RDSDiskSpaceLimit.md
@@ -61,50 +61,23 @@ Determine whether it's a long-term growth trend requiring storage increase or ab
## Mitigation

You must avoid running out of disk space.
Increase RDS disk space

- Fix the system that blocks PostgreSQL to recycle its WAL files
{{< hint danger >}}

- If long-running transactions/queries: Cancel or kill the transactions
- If non-running replication slot: Delete replication slot

- Increase RDS disk space

{{< hint danger >}}
{{< hint danger >}}
{{% aws-rds-storage-increase-limitations %}}
{{< /hint >}}

1. Set AWS_PROFILE
{{% aws-rds-storage-increase-commands %}}

```bash
export AWS_PROFILE=<AWS account>
```

2. Determine the minimum storage for the increase
💡 RDS requires a minimal storage increase of 10%

```bash
INSTANCE_IDENTIFIER=<replace with the RDS instance identifier>
```

```bash
aws rds describe-db-instances --db-instance-identifier ${INSTANCE_IDENTIFIER} \
| jq -r '{"Current IOPS": .DBInstances[0].Iops, "Current Storage Limit": .DBInstances[0].AllocatedStorage, "New minimum storage size": ((.DBInstances[0].AllocatedStorage|tonumber)+(.DBInstances[0].AllocatedStorage|tonumber*0.1|floor))}'
```

3. Increase storage:

```bash
NEW_ALLOCATED_STORAGE=<replace with new allocated storage in GB>
```

```bash
aws rds modify-db-instance --db-instance-identifier ${INSTANCE_IDENTIFIER} --allocated-storage ${NEW_ALLOCATED_STORAGE} --apply-immediately \
| jq .DBInstance.PendingModifiedValues
```

❗ If the RDS instance has replicas instances (replica or reporting), you must repeat the operation for all replicas to keep the same configuration between instances

4. Backport changes in Terraform
1. Backport changes in Terraform

## Additional resources

60 changes: 60 additions & 0 deletions content/runbooks/rds/RDSFullDiskSpace.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
---
title: Full disk space
---

# RDSFullDiskSpace

## Meaning

The alert is triggered when the RDS instance storage is full.

## Impact

PostgreSQL automatically stops when it detects there is no more disk space available.

**All database accesses are blocked**, causing application errors

## Diagnosis

You need to increase the RDS storage. Determine whether it's a long-term growth trend requiring storage increase or abnormal disk usage reflecting another problem.

{{< hint danger >}}
Since RDS disk space cannot be reduced and storage modifications are limited to once every 6 hours, you should carefully evaluate your storage requirements before making changes.
{{< /hint >}}

## Mitigation

The RDS instance is **no longer reachable**; you **must increase its allocated storage**.

{{< hint danger >}}
{{% aws-rds-storage-increase-limitations %}}
{{< /hint >}}

{{% aws-rds-storage-increase-commands %}}

1. Wait for the instance to enter the `storage-optimization` status

The instance becomes accessible after the `modifying` operation is complete.

{{<hint>}}
{{% aws-rds-status-storage-optimization %}}
{{< /hint >}}

See RDS instance status:

```bash
aws rds describe-db-instances \
--db-instance-identifier ${INSTANCE_IDENTIFIER} \
--query "DBInstances[0].[DBInstanceStatus]"
```

Additionally, you can follow the RDS events for this instance:

{{% aws-rds-list-events %}}

1. Backport changes in Terraform
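
The wait step above can be scripted as a small polling loop. This is a minimal sketch, not part of the runbook: the function name `wait_for_modification`, the 30-second interval, and the attempt limit are illustrative assumptions. It relies only on the `describe-db-instances` call already shown and returns as soon as the status is `storage-optimization` or `available`.

```bash
# Hypothetical helper (name and defaults are assumptions, not from the runbook).
# Polls the instance status until it leaves `modifying`.
# Usage: wait_for_modification <instance-identifier> [interval-seconds] [max-attempts]
wait_for_modification() {
  local instance="$1" interval="${2:-30}" attempts="${3:-120}"
  local status i=0
  while [ "$i" -lt "$attempts" ]; do
    # Same query as in the runbook, printed as plain text instead of JSON.
    status=$(aws rds describe-db-instances \
      --db-instance-identifier "${instance}" \
      --query 'DBInstances[0].DBInstanceStatus' \
      --output text) || return 1
    echo "$(date -u +%H:%M:%S) ${instance}: ${status}" >&2
    case "${status}" in
      # The instance is reachable again in either of these statuses.
      storage-optimization|available) return 0 ;;
    esac
    sleep "${interval}"
    i=$((i + 1))
  done
  # Gave up: the status never left `modifying` (or `storage-full`).
  return 1
}
```

For example, `wait_for_modification "${INSTANCE_IDENTIFIER}"` blocks until the instance is reachable again, or fails after the attempt limit is reached.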

## Additional resources

- [RDS Storage Modification](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIOPS.StorageTypes.html#USER_PIOPS.ModifyingExisting)
- [AWS RDS Storage Autoscaling](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIOPS.StorageTypes.html#USER_PIOPS.Autoscaling)
9 changes: 9 additions & 0 deletions layouts/shortcodes/aws-rds-list-events.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
<!-- markdownlint-disable-next-line MD041 // this content is included in section -->

```bash
aws rds describe-events \
--source-identifier ${INSTANCE_IDENTIFIER} \
--duration 720 \
--source-type db-instance \
| jq -r '.Events[] | "\(.Date) [\(.EventCategories[0])] \(.Message)"'
```
5 changes: 5 additions & 0 deletions layouts/shortcodes/aws-rds-status-storage-optimization.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
<!-- markdownlint-disable-next-line MD036 MD041 // this content is included in hint block -->

Storage optimization can take several hours depending on the instance storage type.

During this process, the instance performance may be impacted, and further storage modifications are deferred until optimization is complete.
35 changes: 35 additions & 0 deletions layouts/shortcodes/aws-rds-storage-increase-commands.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
<!-- markdownlint-disable-next-line MD036 MD041 // this content is included in hint block -->

1. Set AWS_PROFILE

```bash
export AWS_PROFILE=<AWS account>
```

2. Determine the minimum storage for the increase

💡 RDS requires a minimum storage increase of 10%

```bash
INSTANCE_IDENTIFIER=<replace with the RDS instance identifier>
```

```bash
aws rds describe-db-instances --db-instance-identifier ${INSTANCE_IDENTIFIER} \
| jq -r '{"Current IOPS": .DBInstances[0].Iops, "Current Storage Limit": .DBInstances[0].AllocatedStorage, "New minimum storage size": ((.DBInstances[0].AllocatedStorage|tonumber)+(.DBInstances[0].AllocatedStorage|tonumber*0.1|ceil))}'
```

3. Increase storage:

```bash
NEW_ALLOCATED_STORAGE=<replace with new allocated storage in GB>
```

```bash
aws rds modify-db-instance --db-instance-identifier ${INSTANCE_IDENTIFIER} --allocated-storage ${NEW_ALLOCATED_STORAGE} --apply-immediately \
| jq .DBInstance.PendingModifiedValues
```

The instance will quickly enter the `modifying` status, then `storage-optimization`.

❗ If the RDS instance has replica instances, you must repeat the operation for each replica to keep the same configuration across instances
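
The 10% rule from step 2 can also be computed in plain shell arithmetic. This is a rough sketch under stated assumptions: the helper name `min_new_storage` is hypothetical, and it rounds up so the computed increase is never below 10%, which is a deliberately conservative choice rather than documented RDS behavior.

```bash
# Hypothetical helper (not part of the runbook): smallest allocated storage,
# in GB, that is at least 10% larger than the current value.
# Assumption: round up, so the increase never falls below 10%.
min_new_storage() {
  local current="$1"
  # Integer ceiling of current / 10, without floating point.
  echo $(( current + (current + 9) / 10 ))
}

# Possible usage with the variables from the steps above (left commented out):
# CURRENT=$(aws rds describe-db-instances \
#   --db-instance-identifier "${INSTANCE_IDENTIFIER}" \
#   --query 'DBInstances[0].AllocatedStorage' --output text)
# NEW_ALLOCATED_STORAGE=$(min_new_storage "${CURRENT}")
```

Treat the result as a lower bound only: since storage cannot be reduced afterwards and modifications are rate limited, pick the final value based on the diagnosis rather than on the bare minimum.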