From 232fad03663e082da7b948e7d93b7a3350a4c225 Mon Sep 17 00:00:00 2001
From: Vincent Mercier <vincmer@amazon.com>
Date: Mon, 1 Sep 2025 13:46:30 +0200
Subject: [PATCH 1/4] refact(content): Move Storage increase commands to
 shortcode

---
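Note: the jq expression in step 2 of the shortcode rounds the 10% increment
down (`floor`), so for allocated sizes that are not a multiple of 10 the
suggested value can land slightly under a 10% increase. A minimal sketch of a
round-up variant, assuming sizes are whole GiB; `min_new_storage` is a
hypothetical helper name and is not part of this patch:

```bash
# Hypothetical helper (not part of the runbook): prints the smallest whole
# number of GiB that is at least 10% above the current allocated storage.
# Usage: min_new_storage <current allocated storage in GiB>
min_new_storage() {
  # Integer ceiling of current * 1.1, computed as (current * 11 + 9) / 10
  echo $(( ($1 * 11 + 9) / 10 ))
}

# Example (assumes INSTANCE_IDENTIFIER is set as in the runbook):
#   current=$(aws rds describe-db-instances \
#     --db-instance-identifier "${INSTANCE_IDENTIFIER}" \
#     --query "DBInstances[0].AllocatedStorage" --output text)
#   min_new_storage "$current"
```

Rounding up is the conservative choice here, since a request below the minimum
increase would be rejected.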
 content/runbooks/rds/RDSDiskSpaceLimit.md     | 44 ++-----------------
 .../aws-rds-storage-increase-commands.md      | 35 +++++++++++++++
 2 files changed, 39 insertions(+), 40 deletions(-)
 create mode 100644 layouts/shortcodes/aws-rds-storage-increase-commands.md
diff --git a/content/runbooks/rds/RDSDiskSpaceLimit.md b/content/runbooks/rds/RDSDiskSpaceLimit.md
index 282755e..1c1eba8 100644
--- a/content/runbooks/rds/RDSDiskSpaceLimit.md
+++ b/content/runbooks/rds/RDSDiskSpaceLimit.md
@@ -60,51 +60,15 @@ Determine whether it's a long-term growth trend requiring storage increase or ab
 
 ## Mitigation
 
-You must avoid reaching no disk space left situation.
+Increase RDS disk space
 
-- Fix the system that blocks PostgreSQL to recycle its WAL files
-
-  - If long-running transactions/queries: Cancel or kill the transactions
-  - If non-running replication slot: Delete replication slot
-
-- Increase RDS disk space
-
-    {{< hint danger >}}
+{{< hint danger >}}
 {{% aws-rds-storage-increase-limitations %}}
 {{< /hint >}}
 
-    1. Set AWS_PROFILE
-
-        ```bash
-        export AWS_PROFILE=<AWS account>
-        ```
-
-    2. Determine the minimum storage for the increase
-        💡 RDS requires a minimal storage increase of 10%
-
-        ```bash
-        INSTANCE_IDENTIFIER=<replace with the RDS instance identifier>
-        ```
-
-        ```bash
-        aws rds describe-db-instances --db-instance-identifier ${INSTANCE_IDENTIFIER} \
-        | jq -r '{"Current IOPS": .DBInstances[0].Iops, "Current Storage Limit": .DBInstances[0].AllocatedStorage, "New minimum storage size": ((.DBInstances[0].AllocatedStorage|tonumber)+(.DBInstances[0].AllocatedStorage|tonumber*0.1|floor))}'
-        ```
-
-    3. Increase storage:
-
-        ```bash
-        NEW_ALLOCATED_STORAGE=<replace with new allocated storage in GB>
-        ```
-
-        ```bash
-        aws rds modify-db-instance --db-instance-identifier ${INSTANCE_IDENTIFIER} --allocated-storage ${NEW_ALLOCATED_STORAGE} --apply-immediately \
-        | jq .DBInstance.PendingModifiedValues
-        ```
-
-        ❗ If the RDS instance has replicas instances (replica or reporting), you must repeat the operation for all replicas to keep the same configuration between instances
+{{% aws-rds-storage-increase-commands %}}
 
-    4. Backport changes in Terraform
+1. Backport changes in Terraform
 
 ## Additional resources
 
diff --git a/layouts/shortcodes/aws-rds-storage-increase-commands.md b/layouts/shortcodes/aws-rds-storage-increase-commands.md
new file mode 100644
index 0000000..b927a15
--- /dev/null
+++ b/layouts/shortcodes/aws-rds-storage-increase-commands.md
@@ -0,0 +1,35 @@
+<!-- markdownlint-disable-next-line MD036 MD041 // this content is included in hint block -->
+
+1. Set AWS_PROFILE
+
+    ```bash
+    export AWS_PROFILE=<AWS account>
+    ```
+
+2. Determine the minimum storage for the increase
+
+    💡 RDS requires a minimum storage increase of 10%
+
+    ```bash
+    INSTANCE_IDENTIFIER=<replace with the RDS instance identifier>
+    ```
+
+    ```bash
+    aws rds describe-db-instances --db-instance-identifier ${INSTANCE_IDENTIFIER} \
+    | jq -r '{"Current IOPS": .DBInstances[0].Iops, "Current Storage Limit": .DBInstances[0].AllocatedStorage, "New minimum storage size": ((.DBInstances[0].AllocatedStorage|tonumber)+(.DBInstances[0].AllocatedStorage|tonumber*0.1|floor))}'
+    ```
+
+3. Increase storage:
+
+    ```bash
+    NEW_ALLOCATED_STORAGE=<replace with new allocated storage in GB>
+    ```
+
+    ```bash
+    aws rds modify-db-instance --db-instance-identifier ${INSTANCE_IDENTIFIER} --allocated-storage ${NEW_ALLOCATED_STORAGE} --apply-immediately \
+    | jq .DBInstance.PendingModifiedValues
+    ```
+
+    The instance quickly moves to the `modifying` status, then to `storage-optimization`.
+
+    ❗ If the RDS instance has replicas, you must repeat the operation on each replica to keep the configuration consistent between instances

From c1c75a3df2c80ec7c34a1dd29efa71b69fcddbc0 Mon Sep 17 00:00:00 2001
From: Vincent Mercier <vincmer@amazon.com>
Date: Mon, 1 Sep 2025 13:47:05 +0200
Subject: [PATCH 2/4] chore(shortcode): Add command to list RDS events

---
 layouts/shortcodes/aws-rds-list-events.md | 9 +++++++++
 1 file changed, 9 insertions(+)
 create mode 100644 layouts/shortcodes/aws-rds-list-events.md

diff --git a/layouts/shortcodes/aws-rds-list-events.md b/layouts/shortcodes/aws-rds-list-events.md
new file mode 100644
index 0000000..c0a5bf6
--- /dev/null
+++ b/layouts/shortcodes/aws-rds-list-events.md
@@ -0,0 +1,9 @@
+<!-- markdownlint-disable-next-line MD041 // this content is included in section -->
+
+```bash
+aws rds describe-events \
+--source-identifier ${INSTANCE_IDENTIFIER} \
+--duration 720 \
+--source-type db-instance \
+| jq -r '.Events[] | "\(.Date) [\(.EventCategories[0])] \(.Message)"'
+```

From 64d00fb733babd96f5b8d34617401eb85f90b7bb Mon Sep 17 00:00:00 2001
From: Vincent Mercier <vincmer@amazon.com>
Date: Mon, 1 Sep 2025 13:47:29 +0200
Subject: [PATCH 3/4] chore(shortcode): Add storage optimization

---
 layouts/shortcodes/aws-rds-status-storage-optimization.md | 5 +++++
 1 file changed, 5 insertions(+)
 create mode 100644 layouts/shortcodes/aws-rds-status-storage-optimization.md

diff --git a/layouts/shortcodes/aws-rds-status-storage-optimization.md b/layouts/shortcodes/aws-rds-status-storage-optimization.md
new file mode 100644
index 0000000..2aacb10
--- /dev/null
+++ b/layouts/shortcodes/aws-rds-status-storage-optimization.md
@@ -0,0 +1,5 @@
+<!-- markdownlint-disable-next-line MD036 MD041 // this content is included in hint block -->
+
+Storage optimization can take several hours depending on the instance storage type.
+
+During this process, the instance performance may be impacted, and further storage modifications are deferred until optimization is complete.

From e1aeb89ca9e7eebc3736135d13719ceda4be9e64 Mon Sep 17 00:00:00 2001
From: Vincent Mercier <vincmer@amazon.com>
Date: Mon, 1 Sep 2025 14:01:05 +0200
Subject: [PATCH 4/4] feat(runbook): Add RDSFullDiskSpace alert

---
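Note: the new runbook asks the operator to wait until the instance leaves the
`modifying` status. A minimal polling sketch of that wait, assuming the AWS CLI
is configured as in the runbook; `rds_status` and `wait_while_status` are
hypothetical helper names and are not part of this patch:

```bash
# Hypothetical helper: prints the current status of an RDS instance.
rds_status() {
  aws rds describe-db-instances \
    --db-instance-identifier "$1" \
    --query "DBInstances[0].DBInstanceStatus" \
    --output text
}

# Hypothetical helper: polls until the instance is no longer in <status>,
# then prints the status it moved to.
# Usage: wait_while_status <instance> <status> [interval seconds] [max attempts]
# Returns 0 on a status change, 1 on timeout, 2 if the status lookup fails.
wait_while_status() {
  local instance="$1" status="$2" interval="${3:-30}" attempts="${4:-60}"
  local current i=0
  while [ "$i" -lt "$attempts" ]; do
    current="$(rds_status "$instance")" || return 2
    if [ "$current" != "$status" ]; then
      echo "$current"
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  return 1
}

# Example: wait_while_status "${INSTANCE_IDENTIFIER}" modifying
```

The interval and attempt defaults are arbitrary; storage optimization itself
can take hours, so this sketch only waits for the `modifying` phase.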
 charts/prometheus-rds-alerts/values.yaml  | 11 ++++-
 content/runbooks/rds/RDSDiskSpaceLimit.md |  9 ++++
 content/runbooks/rds/RDSFullDiskSpace.md  | 60 +++++++++++++++++++++++
 3 files changed, 79 insertions(+), 1 deletion(-)
 create mode 100644 content/runbooks/rds/RDSFullDiskSpace.md

diff --git a/charts/prometheus-rds-alerts/values.yaml b/charts/prometheus-rds-alerts/values.yaml
index ebdba60..e311f8f 100644
--- a/charts/prometheus-rds-alerts/values.yaml
+++ b/charts/prometheus-rds-alerts/values.yaml
@@ -45,7 +45,7 @@ rules:
       severity: warning
     annotations:
       summary: "Less than 20% free disk space on at least one instance"
-      description: 'One or more RDS instances has <20% free disk space'
+      description: "One or more RDS instances has <20% free disk space"
 
   RDSDiskSpaceLimit:
     expr: max by (aws_account_id, aws_region, dbidentifier) (rds_free_storage_bytes{} * 100 / rds_allocated_storage_bytes{}) < 10
@@ -204,3 +204,12 @@ rules:
     annotations:
       summary: "RDS instance(s) use(s) a certificate with an expiration date inferior to 15 days"
       description: "{{ $value }} instance(s) of the AWS account ID={{ $labels.aws_account_id}} in region={{ $labels.aws_region }} use(s) a certificate with an expiration date inferior to 15 days"
+
+  RDSFullDiskSpace:
+    expr: max by (aws_account_id, aws_region, dbidentifier) (rds_instance_status{}) == -7
+    for: 5m
+    labels:
+      severity: critical
+    annotations:
+      summary: "Instance storage is full"
+      description: "{{ $labels.dbidentifier }} storage is full"
diff --git a/content/runbooks/rds/RDSDiskSpaceLimit.md b/content/runbooks/rds/RDSDiskSpaceLimit.md
index 1c1eba8..738fe0b 100644
--- a/content/runbooks/rds/RDSDiskSpaceLimit.md
+++ b/content/runbooks/rds/RDSDiskSpaceLimit.md
@@ -60,8 +60,17 @@ Determine whether it's a long-term growth trend requiring storage increase or ab
 
 ## Mitigation
 
+You must avoid running out of disk space.
 Increase RDS disk space
 
+- Fix whatever prevents PostgreSQL from recycling its WAL files,
+  depending on the cause:
+
+  - If long-running transactions/queries: cancel or kill the transactions
+  - If inactive replication slot: drop the replication slot
+
+- Increase RDS disk space
+
 {{< hint danger >}}
 {{% aws-rds-storage-increase-limitations %}}
 {{< /hint >}}
diff --git a/content/runbooks/rds/RDSFullDiskSpace.md b/content/runbooks/rds/RDSFullDiskSpace.md
new file mode 100644
index 0000000..550e073
--- /dev/null
+++ b/content/runbooks/rds/RDSFullDiskSpace.md
@@ -0,0 +1,60 @@
+---
+title: Full disk space
+---
+
+# RDSFullDiskSpace
+
+## Meaning
+
+The alert is triggered when the RDS instance storage is full.
+
+## Impact
+
+PostgreSQL automatically stops when it detects there is no more disk space available.
+
+**All database access is blocked**, causing application errors.
+
+## Diagnosis
+
+You need to increase the RDS storage. Determine whether it's a long-term growth trend requiring storage increase or abnormal disk usage reflecting another problem.
+
+{{< hint danger >}}
+Since RDS disk space cannot be reduced and storage modifications are limited to once every 6 hours, you should carefully evaluate your storage requirements before making changes.
+{{< /hint >}}
+
+## Mitigation
+
+The RDS instance is **no longer reachable**; you **must increase the storage allocated to the RDS instance**.
+
+{{< hint danger >}}
+{{% aws-rds-storage-increase-limitations %}}
+{{< /hint >}}
+
+{{% aws-rds-storage-increase-commands %}}
+
+1. Wait for the instance to reach the `storage-optimization` status
+
+   The instance becomes accessible after the `modifying` operation is complete.
+
+   {{< hint >}}
+   {{% aws-rds-status-storage-optimization %}}
+   {{< /hint >}}
+
+   Check the RDS instance status:
+
+   ```bash
+   aws rds describe-db-instances \
+      --db-instance-identifier ${INSTANCE_IDENTIFIER} \
+      --query "DBInstances[0].[DBInstanceStatus]"
+   ```
+
+   Additionally, you can follow the RDS events for this instance:
+
+   {{% aws-rds-list-events %}}
+
+1. Backport changes in Terraform
+
+## Additional resources
+
+- [RDS Storage Modification](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIOPS.StorageTypes.html#USER_PIOPS.ModifyingExisting)
+- [AWS RDS Storage Autoscaling](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIOPS.StorageTypes.html#USER_PIOPS.Autoscaling)