Commit 0b636a1

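refactor(prometheus_adapter): refactor the alert rule management API and logic
- Move the watch_time field from AlertRuleMeta to AlertRule
- Remove the full sync endpoint in favor of incremental updates
- Implement an API for batch-updating rule metadata
- Restructure the service-layer code to improve maintainability
- Update the documentation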
1 parent f6c1ad1 commit 0b636a1

5 files changed, with 182 additions and 241 deletions

docs/prometheus_adapter/README.md

Lines changed: 24 additions & 48 deletions
@@ -10,7 +10,7 @@
 - Architecture design
 - API reference
 - Metric queries
-- Alert rule sync
+- Alert rule management
 - Alertmanager integration
 - Supported services
 - Error codes
@@ -36,7 +36,7 @@ internal/prometheus_adapter/
 ├── api/                    # API layer; handles HTTP requests
 │   ├── api.go              # API base structure and initialization
 │   ├── metric_api.go       # Metric API handlers
-│   └── alert_api.go        # Alert rule sync API handler
+│   └── alert_api.go        # Alert rule management API handler
 ├── service/                # Business logic layer
 │   ├── metric_service.go   # Metric query service implementation
 │   └── alert_service.go    # Alert rule sync service implementation
@@ -137,42 +137,9 @@ internal/prometheus_adapter/
 }
 ```
 
-### Alert rule sync
+### Alert rule management
 
-#### 1. Full rule sync
-- Method and path: `POST /v1/alert-rules/sync`
-- Purpose: receives the complete rule list sent by the monitoring and alerting module, generates the Prometheus rule file, and triggers a reload (full sync)
-- Example request body:
-```json
-{
-  "rules": [
-    {
-      "name": "high_cpu_usage",
-      "description": "High CPU usage alert",
-      "expr": "system_cpu_usage_percent",
-      "op": ">",
-      "severity": "warning"
-    }
-  ],
-  "rule_metas": [
-    {
-      "alert_name": "high_cpu_usage", // must match the name field of the rule template
-      "labels": "{\"service\":\"storage-service\",\"version\":\"1.0.0\"}",
-      "threshold": 90,
-      "watch_time": 300
-    }
-  ]
-}
-```
-- Example response:
-```json
-{
-  "status": "success",
-  "message": "Rules synced to Prometheus"
-}
-```
-
-#### 2. Update a single rule template
+#### 1. Update a single rule template
 - Method and path: `PUT /v1/alert-rules/:rule_name`
 - Purpose: updates the specified alert rule template; the system automatically finds every metadata entry that uses the rule and regenerates the Prometheus rules
 - Path parameters:
@@ -183,7 +150,8 @@ internal/prometheus_adapter/
   "description": "Abnormal CPU usage alert (updated)",
   "expr": "avg(system_cpu_usage_percent)",
   "op": ">=",
-  "severity": "critical"
+  "severity": "critical",
+  "watch_time": 300
 }
 ```
 - Example response:
@@ -195,25 +163,33 @@ internal/prometheus_adapter/
 }
 ```
 
-#### 3. Update a single rule metadata entry
-- Method and path: `PUT /v1/alert-rules/meta`
-- Purpose: updates the metadata of the specified rule; the system regenerates the Prometheus rules from the corresponding rule template
+#### 2. Batch-update rule metadata
+- Method and path: `PUT /v1/alert-rules-meta/:rule_name`
+- Purpose: batch-updates the metadata of the specified rule; the system regenerates the Prometheus rules from the corresponding rule template
+- Path parameters:
+  - `rule_name`: rule name (e.g. `high_cpu_usage`)
 - Example request body:
 ```json
 {
-  "rule_name": "high_cpu_usage", // required; matches the name of the rule template
-  "labels": "{\"service\":\"storage-service\",\"version\":\"2.0.0\"}", // required; used as the unique identifier
-  "threshold": 85,
-  "watch_time": 600
+  "metas": [
+    {
+      "labels": "{\"service\":\"storage-service\",\"version\":\"1.0.0\"}", // required; used as the unique identifier
+      "threshold": 85
+    },
+    {
+      "labels": "{\"service\":\"storage-service\",\"version\":\"2.0.0\"}", // required; used as the unique identifier
+      "threshold": 90
+    }
+  ]
 }
 ```
 - Example response:
 ```json
 {
   "status": "success",
-  "message": "Rule meta updated and synced to Prometheus",
+  "message": "Rule metas updated and synced to Prometheus",
   "rule_name": "high_cpu_usage",
-  "labels": "{\"service\":\"storage-service\",\"version\":\"2.0.0\"}"
+  "updated_count": 2
 }
 ```
 
@@ -232,11 +208,11 @@ internal/prometheus_adapter/
 - `expr`: PromQL expression, e.g. `sum(apitime) by (service, version)`; may include a time range
 - `op`: comparison operator (`>`, `<`, `=`, `!=`)
 - `severity`: alert severity; usually written to the alert's labels.severity
+- `watch_time`: duration in seconds; maps to the Prometheus `for` field
 - **AlertRuleMeta (metadata)**
 - `alert_name`: name of the associated rule (matches alert_rules.name)
 - `labels`: JSON-formatted labels used to select a specific service (e.g. `{"service":"s3","version":"v1"}`)
 - `threshold`: alert threshold
-- `watch_time`: duration in seconds; maps to the Prometheus `for` field
 
 #### Incremental updates
 - **Incremental update**: the new endpoints support incremental updates, so only the fields being changed need to be sent

internal/prometheus_adapter/api/alert_api.go

Lines changed: 35 additions & 48 deletions
@@ -10,32 +10,8 @@ import (
 
 // setupAlertRouters registers the alert-related routes
 func (api *Api) setupAlertRouters(router *fox.Engine) {
-	router.POST("/v1/alert-rules/sync", api.SyncRules)
 	router.PUT("/v1/alert-rules/:rule_name", api.UpdateRule)
-	router.PUT("/v1/alert-rules/meta", api.UpdateRuleMeta)
-}
-
-// SyncRules syncs rules to Prometheus.
-// It receives the rule list sent by the monitoring and alerting module, generates the Prometheus rule file, and reloads the configuration.
-func (api *Api) SyncRules(c *fox.Context) {
-	var req model.SyncRulesRequest
-	if err := c.ShouldBindJSON(&req); err != nil {
-		SendErrorResponse(c, http.StatusBadRequest, model.ErrorCodeInvalidParameter,
-			"Invalid request body: "+err.Error(), nil)
-		return
-	}
-
-	err := api.alertService.SyncRulesToPrometheus(req.Rules, req.RuleMetas)
-	if err != nil {
-		SendErrorResponse(c, http.StatusInternalServerError, model.ErrorCodeInternalError,
-			"Failed to sync rules to Prometheus: "+err.Error(), nil)
-		return
-	}
-
-	c.JSON(http.StatusOK, map[string]string{
-		"status": "success",
-		"message": "Rules synced to Prometheus",
-	})
+	router.PUT("/v1/alert-rules-meta/:rule_name", api.UpdateRuleMetas)
 }
 
 // UpdateRule updates a single rule template
@@ -62,6 +38,7 @@ func (api *Api) UpdateRule(c *fox.Context) {
 		Expr: req.Expr,
 		Op: req.Op,
 		Severity: req.Severity,
+		WatchTime: req.WatchTime,
 	}
 
 	err := api.alertService.UpdateRule(rule)
@@ -81,42 +58,52 @@ func (api *Api) UpdateRule(c *fox.Context) {
 	})
 }
 
-// UpdateRuleMeta updates a single rule metadata entry.
-// An entry is uniquely identified by alert_name + labels.
-func (api *Api) UpdateRuleMeta(c *fox.Context) {
+// UpdateRuleMetas batch-updates rule metadata.
+// An entry is uniquely identified by rule_name + labels.
+func (api *Api) UpdateRuleMetas(c *fox.Context) {
+	ruleName := c.Param("rule_name")
+	if ruleName == "" {
+		SendErrorResponse(c, http.StatusBadRequest, model.ErrorCodeInvalidParameter,
+			"Rule name is required", nil)
+		return
+	}
+
 	var req model.UpdateAlertRuleMetaRequest
 	if err := c.ShouldBindJSON(&req); err != nil {
 		SendErrorResponse(c, http.StatusBadRequest, model.ErrorCodeInvalidParameter,
 			"Invalid request body: "+err.Error(), nil)
 		return
 	}
 
-	// alert_name and labels are required
-	if req.AlertName == "" || req.Labels == "" {
+	if len(req.Metas) == 0 {
 		SendErrorResponse(c, http.StatusBadRequest, model.ErrorCodeInvalidParameter,
-			"alert_name and labels are required", nil)
+			"At least one meta update is required", nil)
 		return
 	}
 
-	// Build the complete metadata object
-	meta := model.AlertRuleMeta{
-		AlertName: req.AlertName,
-		Labels: req.Labels,
-		Threshold: req.Threshold,
-		WatchTime: req.WatchTime,
-	}
-
-	err := api.alertService.UpdateRuleMeta(meta)
-	if err != nil {
-		SendErrorResponse(c, http.StatusInternalServerError, model.ErrorCodeInternalError,
-			"Failed to update rule meta: "+err.Error(), nil)
-		return
+	// Batch-update the metadata entries
+	updatedCount := 0
+	for _, metaUpdate := range req.Metas {
+		// Build the complete metadata object
+		meta := model.AlertRuleMeta{
+			AlertName: ruleName,
+			Labels: metaUpdate.Labels,
+			Threshold: metaUpdate.Threshold,
+		}
+
+		err := api.alertService.UpdateRuleMeta(meta)
+		if err != nil {
+			SendErrorResponse(c, http.StatusInternalServerError, model.ErrorCodeInternalError,
+				fmt.Sprintf("Failed to update rule meta: %v", err), nil)
+			return
+		}
+		updatedCount++
 	}
 
 	c.JSON(http.StatusOK, map[string]interface{}{
-		"status": "success",
-		"message": "Rule meta updated and synced to Prometheus",
-		"alert_name": req.AlertName,
-		"labels": req.Labels,
+		"status": "success",
+		"message": "Rule metas updated and synced to Prometheus",
+		"rule_name": ruleName,
+		"updated_count": updatedCount,
 	})
 }

internal/prometheus_adapter/model/alert.go

Lines changed: 1 addition & 1 deletion
@@ -7,6 +7,7 @@ type AlertRule struct {
 	Expr string `json:"expr" gorm:"type:text;not null"` // left-hand business metric expression, e.g. sum(apitime) by (service, version)
 	Op string `json:"op" gorm:"type:varchar(4);not null"` // threshold comparison operator (>, <, =, !=)
 	Severity string `json:"severity" gorm:"type:varchar(32);not null"` // alert severity; usually written to the alert's labels.severity
+	WatchTime int `json:"watch_time"` // duration in seconds; maps to the for field of the Prometheus rule
 }
 
 // AlertRuleMeta is the alert rule metadata table; it stores service-level alert configuration
@@ -15,5 +16,4 @@ type AlertRuleMeta struct {
 	AlertName string `json:"alert_name" gorm:"type:varchar(255);index"` // references alert_rules.name
 	Labels string `json:"labels" gorm:"type:jsonb"` // applicable labels, e.g. {"service":"s3","version":"v1"}; empty means global
 	Threshold float64 `json:"threshold"` // threshold (rendered as the threshold metric value of the specific rule)
-	WatchTime int `json:"watch_time"` // duration (maps to the for field of the Prometheus rule)
 }
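
Review note: with `watch_time` now on `AlertRule`, every metadata entry of a rule shares one `for` duration instead of carrying its own. The sketch below illustrates that consequence. The rendering code is not part of the hunks shown here, so `forDuration` and `forPerMeta` are assumptions about how seconds might be turned into a Prometheus duration string, not the adapter's actual implementation.

```go
package main

import (
	"fmt"
	"strconv"
)

// Trimmed-down stand-ins for model.AlertRule and model.AlertRuleMeta.
type alertRule struct {
	Name      string
	WatchTime int // seconds
}

type alertRuleMeta struct {
	AlertName string
	Labels    string
	Threshold float64
}

// forDuration renders a number of seconds as a Prometheus duration string
// such as "300s". Negative values are clamped to zero.
func forDuration(watchTime int) string {
	if watchTime < 0 {
		watchTime = 0
	}
	return strconv.Itoa(watchTime) + "s"
}

// forPerMeta returns the `for` value that each metadata entry of the given
// rule would receive. All entries inherit the rule-level watch time.
func forPerMeta(rule alertRule, metas []alertRuleMeta) map[string]string {
	out := make(map[string]string)
	for _, m := range metas {
		if m.AlertName != rule.Name {
			continue // entry belongs to a different rule
		}
		out[m.Labels] = forDuration(rule.WatchTime)
	}
	return out
}

func main() {
	rule := alertRule{Name: "high_cpu_usage", WatchTime: 300}
	metas := []alertRuleMeta{
		{AlertName: "high_cpu_usage", Labels: `{"version":"1.0.0"}`, Threshold: 85},
		{AlertName: "high_cpu_usage", Labels: `{"version":"2.0.0"}`, Threshold: 90},
	}
	for labels, d := range forPerMeta(rule, metas) {
		fmt.Println(labels, "for:", d)
	}
}
```

Callers that previously relied on different `watch_time` values per service (for example 300 and 600 in the old README examples) would need separate rule templates after this change.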

internal/prometheus_adapter/model/api.go

Lines changed: 7 additions & 10 deletions
@@ -46,6 +46,7 @@ type UpdateAlertRuleRequest struct {
 	Expr string `json:"expr,omitempty"`
 	Op string `json:"op,omitempty" binding:"omitempty,oneof=> < = !="`
 	Severity string `json:"severity,omitempty"`
+	WatchTime int `json:"watch_time,omitempty"` // duration in seconds
 }
 
 // CreateAlertRuleMetaRequest is the request for creating alert rule metadata
@@ -57,17 +58,13 @@ type CreateAlertRuleMetaRequest struct {
 	MatchTime string `json:"match_time,omitempty"`
 }
 
-// UpdateAlertRuleMetaRequest is the request for updating alert rule metadata
+// UpdateAlertRuleMetaRequest is the request for batch-updating alert rule metadata
 type UpdateAlertRuleMetaRequest struct {
-	AlertName string `json:"alert_name" binding:"required"`
-	Labels string `json:"labels" binding:"required"`
-	Threshold float64 `json:"threshold"`
-	WatchTime int `json:"watch_time"`
+	Metas []AlertRuleMetaUpdate `json:"metas" binding:"required"`
 }
 
-// SyncRulesRequest is the rule sync request.
-// It carries the complete rule list sent by the monitoring and alerting module.
-type SyncRulesRequest struct {
-	Rules []AlertRule `json:"rules"` // alert rule list
-	RuleMetas []AlertRuleMeta `json:"rule_metas"` // rule metadata list
+// AlertRuleMetaUpdate is a single rule metadata update item
+type AlertRuleMetaUpdate struct {
+	Labels string `json:"labels" binding:"required"` // required; used as the unique identifier
+	Threshold float64 `json:"threshold"`
 }
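
Review note: `binding:"required"` on `Labels` only guarantees a non-empty string. It does not check that the string holds a JSON object. A minimal sketch of an extra validation step follows, assuming label values are always strings as in the README examples. `parseLabels` is a hypothetical helper and does not exist in this commit.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// parseLabels decodes the JSON-encoded labels string carried by
// AlertRuleMetaUpdate.Labels into a map, rejecting empty or malformed input.
func parseLabels(raw string) (map[string]string, error) {
	if raw == "" {
		return nil, errors.New("labels is required")
	}
	var labels map[string]string
	if err := json.Unmarshal([]byte(raw), &labels); err != nil {
		return nil, fmt.Errorf("labels is not a JSON object of strings: %w", err)
	}
	return labels, nil
}

func main() {
	labels, err := parseLabels(`{"service":"s3","version":"v1"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(labels["service"], labels["version"])

	// Malformed input is reported instead of being stored as-is.
	if _, err := parseLabels("service=s3"); err != nil {
		fmt.Println("rejected:", err)
	}
}
```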
