
Conversation

@schmikei (Contributor) commented Oct 29, 2025

This modernizes the CouchDB Mixin to use newer libraries.

Overview: [dashboard screenshots]

Nodes: [dashboard screenshots]

Logs: [dashboard screenshot]

@schmikei schmikei marked this pull request as ready for review October 29, 2025 21:19
@schmikei schmikei requested a review from a team as a code owner October 29, 2025 21:19
@Dasomeone (Member) left a comment:

Going to head out in a second so I'll publish my comments so far! Going to pick it back up in the morning.
@aalhour would also appreciate a second pair of eyes on this one :)

* The `prometheusWithTotal` source is used for backwards compatibility, as some metrics carry a `_total` suffix in older versions but lose it in later versions.
* i.e. `couchdb_open_os_files_total` => `couchdb_open_os_files`
* This ensures that signals for the metrics previously suffixed with `_total` continue to work as expected.
* This was identified as a noticeable change from 3.3.0 to 3.5.0.
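For context, a minimal sketch of what such a dual-source signal could look like in the signals definition (the signal name, metric, and structure here are illustrative, not the mixin's actual code):

```jsonnet
// Illustrative only: one signal with two metrics sources, so dashboards
// keep working whether the deployed CouchDB exposes the old
// _total-suffixed metric name or the newer unsuffixed one.
openOSFiles: {
  name: 'Open OS files',
  type: 'gauge',
  sources: {
    // CouchDB 3.3.x and earlier: metric carries a _total suffix.
    prometheusWithTotal: {
      expr: 'couchdb_open_os_files_total{%(queriesSelector)s}',
    },
    // CouchDB 3.5.x and later: the suffix was dropped.
    prometheus: {
      expr: 'couchdb_open_os_files{%(queriesSelector)s}',
    },
  },
},
```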
Thank you! 🚀
Can you call this out in the README as well, please? E.g. which versions are supported, and what the different metricsSources are for.

Comment on lines +28 to +34
local dashboard = presto.grafana.dashboards[fname];
dashboard + util.patch_variables(dashboard, optional_labels)

for fname in std.objectFields(presto.grafana.dashboards)
},
prometheusAlerts+:: presto.prometheus.alerts,
prometheusRules+:: presto.prometheus.recordingRules,
Presto?

Comment on lines +5 to +7
local presto =
prestolib.new()
+ prestolib.withConfigMixin(
Presto?

name: 'Good response statuses',
nameShort: 'Good response statuses',
type: 'raw',
description: 'The total number of good response statuses aggregated across all nodes.',
Suggested change
description: 'The total number of good response statuses aggregated across all nodes.',
description: 'The total number of good response (HTTP 2xx-3xx) statuses aggregated across all nodes.',

name: 'Error response statuses',
nameShort: 'Error response statuses',
type: 'raw',
description: 'The total number of error response statuses aggregated across all nodes.',
Suggested change
description: 'The total number of error response statuses aggregated across all nodes.',
description: 'The total number of error response statuses (HTTP 4xx-5xx) aggregated across all nodes.',

errorResponseStatuses: {
name: 'Error response statuses',
nameShort: 'Error response statuses',
type: 'raw',
This should be requests per second, right? Currently it gets rendered as "rd/s", which I read as "reads per second"; that feels wrong in this context.
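If the intent is requests per second, the fix would presumably be an explicit unit on the signal. Grafana's standard unit id for requests/sec is `reqps`; a sketch only, the real signal definition may differ:

```jsonnet
// Sketch: set an explicit unit so Grafana renders "req/s"
// rather than the ambiguous "rd/s".
errorResponseStatuses: {
  name: 'Error response statuses',
  nameShort: 'Error responses',
  type: 'raw',
  unit: 'reqps',
},
```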

goodResponseStatuses: {
name: 'Good response statuses',
nameShort: 'Good response statuses',
type: 'raw',
This should be requests per second, right? Currently it gets rendered as "rd/s", which I read as "reads per second"; that feels wrong in this context.

Comment on lines +159 to +198
averageRequestLatencyp50: {
name: 'Average request latency p50',
nameShort: 'Average request latency p50',
type: 'raw',
description: 'The average request latency p50 aggregated across all nodes.',
unit: 's',
sources: {
prometheus: {
expr: 'avg by(' + groupLabelAggTerm + ', quantile) (couchdb_request_time_seconds{%(queriesSelector)s, quantile="0.5"})',
legendCustomTemplate: legendCustomTemplate + ' - p50',
},
},
},

averageRequestLatencyp75: {
name: 'Average request latency p75',
nameShort: 'Average request latency p75',
type: 'raw',
description: 'The average request latency p75 aggregated across all nodes.',
unit: 's',
sources: {
prometheus: {
expr: 'avg by(' + groupLabelAggTerm + ', quantile) (couchdb_request_time_seconds{%(queriesSelector)s, quantile="0.75"})',
legendCustomTemplate: legendCustomTemplate + ' - p75',
},
},
},

averageRequestLatencyp95: {
name: 'Average request latency p95',
nameShort: 'Average request latency p95',
type: 'raw',
description: 'The average request latency p95 aggregated across all nodes.',
unit: 's',
sources: {
prometheus: {
expr: 'avg by(' + groupLabelAggTerm + ', quantile) (couchdb_request_time_seconds{%(queriesSelector)s, quantile="0.95"})',
legendCustomTemplate: legendCustomTemplate + ' - p95',
},
},
Same as my recent requests on the other old mixins, I think we could and should use histograms more, either as a replacement for, or in addition to, these signals.
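One reason histograms are preferable here: averaging pre-computed summary quantiles across nodes is not statistically meaningful, whereas histogram buckets can be summed across nodes before computing the quantile. A sketch of what that could look like, assuming CouchDB exposed a `couchdb_request_time_seconds_bucket` histogram (that metric name is an assumption, not confirmed by this PR):

```jsonnet
// Sketch: histogram-based p95 instead of averaged summary quantiles.
averageRequestLatencyp95: {
  name: 'Request latency p95',
  type: 'raw',
  unit: 's',
  sources: {
    prometheus: {
      // Sum bucket rates across nodes first, then take the quantile,
      // which is aggregation-safe (unlike averaging summary quantiles).
      expr: 'histogram_quantile(0.95, sum by (' + groupLabelAggTerm + ', le) (rate(couchdb_request_time_seconds_bucket{%(queriesSelector)s}[$__rate_interval])))',
    },
  },
},
```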
