Commit 6b7ce0d (parent 3275c6b)

Author: northjhuang
Commit message: add suspend in workflow disruption template

21 files changed: +636 −76 lines

playbook/README.md

Lines changed: 33 additions & 12 deletions
@@ -33,6 +33,19 @@ Supports enabling `etcd Overload Protection` and `APF Flow Control` [APF Rate Li
 | `inject-stress-list-qps` | `int` | "100" | QPS per stress test Pod |
 | `inject-stress-total-duration` | `string` | "30s" | Total test duration (e.g. 30s, 5m) |
 
+**Recommended Parameters for TKE Clusters**
+
+| Cluster Level | resource-create-object-size-bytes | resource-create-object-count | resource-create-qps | inject-stress-concurrency | inject-stress-list-qps |
+|---------------|-----------------------------------|------------------------------|---------------------|---------------------------|------------------------|
+| L5    | 10000  | 100   | 10  | 6  | 200 |
+| L50   | 10000  | 300   | 10  | 6  | 200 |
+| L100  | 50000  | 500   | 20  | 6  | 200 |
+| L200  | 100000 | 1000  | 50  | 9  | 200 |
+| L500  | 100000 | 1000  | 50  | 12 | 200 |
+| L1000 | 100000 | 3000  | 50  | 12 | 300 |
+| L3000 | 100000 | 6000  | 500 | 18 | 500 |
+| L5000 | 100000 | 10000 | 500 | 21 | 500 |
+
 **etcd Overload Protection & Enhanced APF**
 
 Tencent Cloud TKE team has developed these core protection features:
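The recommended parameters added above map a cluster level to a full set of stress-test settings. A minimal sketch encoding that table as a lookup (the `workflow_params` helper is hypothetical, not part of the playbook):

```python
# Hypothetical helper: encode the recommended TKE stress-test parameters
# from the table above as a lookup keyed by cluster level.
RECOMMENDED = {
    # level: (object-size-bytes, object-count, create-qps, concurrency, list-qps)
    "L5":    (10000, 100, 10, 6, 200),
    "L50":   (10000, 300, 10, 6, 200),
    "L100":  (50000, 500, 20, 6, 200),
    "L200":  (100000, 1000, 50, 9, 200),
    "L500":  (100000, 1000, 50, 12, 200),
    "L1000": (100000, 3000, 50, 12, 300),
    "L3000": (100000, 6000, 500, 18, 500),
    "L5000": (100000, 10000, 500, 21, 500),
}

def workflow_params(level: str) -> dict:
    """Return the workflow parameter set recommended for a cluster level."""
    size, count, qps, concurrency, list_qps = RECOMMENDED[level]
    return {
        "resource-create-object-size-bytes": size,
        "resource-create-object-count": count,
        "resource-create-qps": qps,
        "inject-stress-concurrency": concurrency,
        "inject-stress-list-qps": list_qps,
    }
```

Each key matches a parameter name from the table, so the dict can be turned directly into `-p name=value` arguments when submitting the workflow.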
@@ -56,31 +69,39 @@ Supported versions:
 **playbook**: `workflow/coredns-disruption-scenario.yaml`
 
 This scenario simulates coredns service disruption by:
-1. Scaling coredns Deployment replicas to 0
-2. Maintaining zero replicas for specified duration
-3. Restoring original replica count
+1. **Pre-check**: Verify that the `tke-chaos-test/tke-chaos-precheck-resource` ConfigMap exists in the target cluster to ensure the cluster is available for testing
+2. **Component Shutdown**: Log in to the Argo Web UI, click the `coredns-disruption-scenario` workflow, then click the `RESUME` button under the `SUMMARY` tab of the `suspend-1` node to scale the coredns Deployment down to 0 replicas
+3. **Service Validation**: While coredns is down, verify whether your services are affected by the disruption
+4. **Component Recovery**: Click the `RESUME` button under the `SUMMARY` tab of the `suspend-2` node to restore the original coredns Deployment replica count
 
 **Parameters**
 
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
-| `disruption-duration` | `string` | `30s` | Disruption duration (e.g. 30s, 5m) |
 | `kubeconfig-secret-name` | `string` | `dest-cluster-kubeconfig` | Target cluster kubeconfig secret name |
 
 ## kubernetes-proxy Disruption
 
 **playbook**: `workflow/kubernetes-proxy-disruption-scenario.yaml`
 
 This scenario simulates kubernetes-proxy service disruption by:
-1. Scaling kubernetes-proxy Deployment replicas to 0
-2. Maintaining zero replicas for specified duration
-3. Restoring original replica count
+1. **Pre-check**: Verify that the `tke-chaos-test/tke-chaos-precheck-resource` ConfigMap exists in the target cluster to ensure the cluster is available for testing
+2. **Component Shutdown**: Log in to the Argo Web UI, click the `kubernetes-proxy-disruption-scenario` workflow, then click the `RESUME` button under the `SUMMARY` tab of the `suspend-1` node to scale the kubernetes-proxy Deployment down to 0 replicas
+3. **Service Validation**: While kubernetes-proxy is down, verify whether your services are affected by the disruption
+4. **Component Recovery**: Click the `RESUME` button under the `SUMMARY` tab of the `suspend-2` node to restore the original kubernetes-proxy Deployment replica count
 
 **Parameters**
 
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
-| `disruption-duration` | `string` | `30s` | Disruption duration (e.g. 30s, 5m) |
 | `kubeconfig-secret-name` | `string` | `dest-cluster-kubeconfig` | Target cluster kubeconfig secret name |
 
 ## Namespace Deletion Protection
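The two `RESUME` steps in each scenario drive a scale-down/restore cycle on the target Deployment. A minimal sketch of that underlying pattern as kubectl invocations, assuming coredns runs in `kube-system` and originally had 2 replicas (both assumptions, chosen for illustration):

```python
# Illustrative only: the scale-down/restore cycle that the suspend-1 and
# suspend-2 RESUME steps drive, expressed as kubectl commands. The
# "kube-system" namespace and the restore count of 2 are assumptions.
def scale_cmd(deployment: str, namespace: str, replicas: int) -> str:
    """Build the kubectl command that sets a Deployment's replica count."""
    return (f"kubectl -n {namespace} "
            f"scale deployment/{deployment} --replicas={replicas}")

shutdown = scale_cmd("coredns", "kube-system", 0)  # what suspend-1 triggers
restore = scale_cmd("coredns", "kube-system", 2)   # what suspend-2 triggers
```

In practice the workflow records the original replica count before scaling down, so the recovery step restores whatever value was observed rather than a hard-coded number.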
@@ -140,10 +161,10 @@ kubectl create -f workflow/managed-cluster-master-component/restore-apiserver.ya
 
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
-| `region` | `string` | `<REGION>` | Tencent Cloud region, e.g. `ap-guangzhou` [Region List](https://www.tencentcloud.com/document/product/213/6091?lang=en&pg=) |
-| `secret-id` | `string` | `<SECRET_ID>` | Tencent Cloud API secret ID, obtain from [API Key Management](https://console.cloud.tencent.com/cam/capi) |
-| `secret-key` | `string` | `<SECRET_KEY>` | Tencent Cloud API secret key |
-| `cluster-id` | `string` | `<CLUSTER_ID>` | Target cluster ID |
+| `region` | `string` | "" | Tencent Cloud region, e.g. `ap-guangzhou` [Region List](https://www.tencentcloud.com/document/product/213/6091?lang=en&pg=) |
+| `secret-id` | `string` | "" | Tencent Cloud API secret ID, obtain from [API Key Management](https://console.cloud.tencent.com/cam/capi) |
+| `secret-key` | `string` | "" | Tencent Cloud API secret key |
+| `cluster-id` | `string` | "" | Target cluster ID |
 | `kubeconfig-secret-name` | `string` | `dest-cluster-kubeconfig` | Secret name containing target cluster kubeconfig |
 
 **Notes**
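Since this change makes the four credential parameters default to the empty string instead of `<PLACEHOLDER>` values, a submission should fail fast when any of them is left blank. A hypothetical pre-flight check (not part of the playbook):

```python
# Hypothetical pre-flight check (not part of the playbook): region,
# secret-id, secret-key and cluster-id now default to "" and must all
# be filled in before the master-component drill is submitted.
REQUIRED = ("region", "secret-id", "secret-key", "cluster-id")

def missing_params(params: dict) -> list:
    """Return the required parameters that are still empty or absent."""
    return [k for k in REQUIRED if not params.get(k, "")]
```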

playbook/README_zh.md

Lines changed: 27 additions & 12 deletions
@@ -33,6 +33,19 @@
 | `inject-stress-list-qps` | `int` | "100" | QPS per stress test `Pod` |
 | `inject-stress-total-duration` | `string` | "30s" | Total stress-test duration (e.g. 30s, 5m) |
 
+**Recommended Stress-Test Parameters for TKE Clusters**
+
+| Cluster Level | resource-create-object-size-bytes | resource-create-object-count | resource-create-qps | inject-stress-concurrency | inject-stress-list-qps |
+|---------------|-----------------------------------|------------------------------|---------------------|---------------------------|------------------------|
+| L5    | 10000  | 100   | 10  | 6  | 200 |
+| L50   | 10000  | 300   | 10  | 6  | 200 |
+| L100  | 50000  | 500   | 20  | 6  | 200 |
+| L200  | 100000 | 1000  | 50  | 9  | 200 |
+| L500  | 100000 | 1000  | 50  | 12 | 200 |
+| L1000 | 100000 | 3000  | 50  | 12 | 300 |
+| L3000 | 100000 | 6000  | 500 | 18 | 500 |
+| L5000 | 100000 | 10000 | 500 | 21 | 500 |
+
 **etcd Overload Protection & Enhanced APF Rate Limiting**
 
 On top of the community version, the Tencent Cloud TKE team has developed the following core protection features:
@@ -56,31 +69,33 @@
 **playbook**: `workflow/coredns-disruption-scenario.yaml`
 
 This scenario constructs a `coredns` service disruption as follows:
-1. Scale the `coredns` Deployment replicas down to `0`
-2. Hold the replica count at `0` for the specified duration
-3. Restore the original replica count
+1. **Pre-check**: Verify that the `tke-chaos-test/tke-chaos-precheck-resource` ConfigMap exists in the target cluster to ensure the cluster is available for the drill
+2. **Component Shutdown**: Log in to the Argo Web UI, click the `coredns-disruption-scenario` workflow, then click the `RESUME` button under the `SUMMARY` tab of the `suspend-1` node to scale the `coredns` Deployment down to `0` replicas
+3. **Service Validation**: While `coredns` is down, verify whether your services are affected by the `coredns` outage
+4. **Component Recovery**: Click the `RESUME` button under the `SUMMARY` tab of the `suspend-2` node to restore the `coredns` Deployment replicas
 
 **Parameters**
 
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
-| `disruption-duration` | `string` | `30s` | Disruption duration (e.g. 30s, 5m) |
 | `kubeconfig-secret-name` | `string` | `dest-cluster-kubeconfig` | Name of the target cluster kubeconfig secret; if empty, the drill runs against the current cluster |
 
 ## kubernetes-proxy Disruption
 
 **playbook**: `workflow/kubernetes-proxy-disruption-scenario.yaml`
 
 This scenario constructs a `kubernetes-proxy` service disruption as follows:
-1. Scale the `kubernetes-proxy` Deployment replicas down to `0`
-2. Hold the replica count at `0` for the specified duration
-3. Restore the original replica count
+1. **Pre-check**: Verify that the `tke-chaos-test/tke-chaos-precheck-resource` ConfigMap exists in the target cluster to ensure the cluster is available for the drill
+2. **Component Shutdown**: Log in to the Argo Web UI, click the `kubernetes-proxy-disruption-scenario` workflow, then click the `RESUME` button under the `SUMMARY` tab of the `suspend-1` node to scale the `kubernetes-proxy` Deployment down to `0` replicas
+3. **Service Validation**: While `kubernetes-proxy` is down, verify whether your services are affected by the `kubernetes-proxy` outage
+4. **Component Recovery**: Click the `RESUME` button under the `SUMMARY` tab of the `suspend-2` node to restore the `kubernetes-proxy` Deployment replicas
 
 **Parameters**
 
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
-| `disruption-duration` | `string` | `30s` | Disruption duration (e.g. 30s, 5m) |
 | `kubeconfig-secret-name` | `string` | `dest-cluster-kubeconfig` | Name of the target cluster kubeconfig secret; if empty, the drill runs against the current cluster |
 
 ## Namespace Deletion Protection
@@ -139,10 +154,10 @@ kubectl create -f workflow/managed-cluster-master-component/restore-apiserver.ya
 
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
-| `region` | `string` | `<REGION>` | Tencent Cloud region, e.g. `ap-guangzhou` [Region List](https://www.tencentcloud.com/zh/document/product/213/6091) |
-| `secret-id` | `string` | `<SECRET_ID>` | Tencent Cloud API secret ID; obtain it from [API Key Management](https://console.cloud.tencent.com/cam/capi) in the console |
-| `secret-key` | `string` | `<SECRET_KEY>` | Tencent Cloud API secret key |
-| `cluster-id` | `string` | `<CLUSTER_ID>` | Drill target cluster ID |
+| `region` | `string` | "" | Tencent Cloud region, e.g. `ap-guangzhou` [Region List](https://www.tencentcloud.com/zh/document/product/213/6091) |
+| `secret-id` | `string` | "" | Tencent Cloud API secret ID; obtain it from [API Key Management](https://console.cloud.tencent.com/cam/capi) in the console |
+| `secret-key` | `string` | "" | Tencent Cloud API secret key |
+| `cluster-id` | `string` | "" | Drill target cluster ID |
 | `kubeconfig-secret-name` | `string` | `dest-cluster-kubeconfig` | Name of the target cluster kubeconfig secret |
 
 **Notes**
