154370: sql: log when optimizer estimates for scans are inaccurate r=Uzair5162 a=Uzair5162
#### sql: estimate table statistic staleness in stats.Refresher
This commit adds an `EstimateStaleness()` method to the table statistic
`Refresher`, which estimates the current fraction of stale rows in a
given table with the formula:
`cur_fraction_stale = (time_since_last_refresh /
avg_time_between_refreshes) * target_fraction_stale_rows`
Although this isn’t used anywhere yet, it will be useful for logging
when scans are misestimated (see #153748).
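The formula above can be sketched as a stand-alone Go function. This is only an illustration: the function name, parameter names, and the zero-interval guard are assumptions, and it is not the actual `Refresher.EstimateStaleness()` implementation.

```go
package main

import "time"

// estimateStaleness is a hypothetical stand-alone version of the formula in
// the commit message:
//
//	cur_fraction_stale = (time_since_last_refresh / avg_time_between_refreshes)
//	                     * target_fraction_stale_rows
func estimateStaleness(
	sinceLastRefresh, avgBetweenRefreshes time.Duration, targetFractionStale float64,
) float64 {
	if avgBetweenRefreshes <= 0 {
		// Assumption: without a usable refresh interval there is nothing to
		// extrapolate from, so report no staleness instead of dividing by zero.
		return 0
	}
	ratio := float64(sinceLastRefresh) / float64(avgBetweenRefreshes)
	return ratio * targetFractionStale
}
```

For example, if stats were last refreshed 12 hours ago, refreshes happen every 24 hours on average, and the target fraction of stale rows is 0.2, the estimate is (12 / 24) * 0.2 = 0.1.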
Part of: #153748, #153873
Release note: None
#### sql: plumb StageID into table reader processors
This commit plumbs `StageID` from the general `ProcessorSpec` into
table reader processors. Specifically, `StageID` is plumbed into
`tableReader` for the row-based flow and `colBatchScanBase` (via
`ColBatchScan` and `ColBatchDirectScan`) for the vectorized flow.
Although `stageID` isn't used in these processors yet, it will
be useful for aggregating row counts from metrics across distributed
table readers for misestimate logging.
Part of: #153748
Release note: None
#### sql: log when optimizer estimates for scans are inaccurate
This commit logs a warning on the gateway node when the estimated row
count for a logical scan is inaccurate. The `DistSQLReceiver` on the
gateway node now maintains the row count estimate and metadata for each
logical scan stage. Table reader metrics are extended to include their
StageID, which we use to aggregate the emitted row counts from all table
reader processors corresponding to the same plan stage at the receiver.
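A minimal sketch of the per-stage aggregation described above. All type and method names here (`stageID`, `scanInfo`, `scanTracker`, `registerScan`, `addRows`) are hypothetical and do not come from the `DistSQLReceiver` code; the sketch only shows the idea of keying the estimate and the running row count by stage.

```go
package main

// stageID identifies a plan stage; hypothetical type for this sketch.
type stageID int32

// scanInfo holds the metadata and counters kept for one logical scan stage.
type scanInfo struct {
	table, index  string
	estimatedRows uint64 // optimizer estimate for the logical scan
	actualRows    uint64 // sum of rows reported by all table readers so far
}

// scanTracker maps each logical scan stage to its estimate and running total.
type scanTracker struct {
	stages map[stageID]*scanInfo
}

func newScanTracker() *scanTracker {
	return &scanTracker{stages: make(map[stageID]*scanInfo)}
}

// registerScan records the estimate for a stage before execution starts.
func (t *scanTracker) registerScan(id stageID, table, index string, estimatedRows uint64) {
	t.stages[id] = &scanInfo{table: table, index: index, estimatedRows: estimatedRows}
}

// addRows folds in the row count reported by one table reader. It returns
// false if the stage is not a tracked scan stage.
func (t *scanTracker) addRows(id stageID, rows uint64) bool {
	info, ok := t.stages[id]
	if !ok {
		return false
	}
	info.actualRows += rows
	return true
}
```

Because every table reader of a distributed scan reports the same stage identifier, summing by that key yields the actual row count of the logical scan regardless of how many nodes took part.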
An estimate is considered inaccurate if it is off by at least a factor
of 2 and a fixed offset of 100, matching the logic in the warning from
`EXPLAIN ANALYZE`. The log message includes the table and index being
scanned, the estimated and actual row counts, the time since the last
table stats collection, and the table's estimated staleness.
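The inaccuracy rule can be sketched as follows. This is one plausible reading of "a factor of 2 and a fixed offset of 100", namely that the larger count must be at least twice the smaller count plus 100. The exact comparison used by the commit and by `EXPLAIN ANALYZE` may differ, and the names below are hypothetical.

```go
package main

const (
	misestimateFactor = 2.0   // multiplicative slack, from the commit message
	misestimateOffset = 100.0 // additive slack, from the commit message
)

// isMisestimate reports whether the estimated and actual row counts differ by
// at least the factor and the offset combined (assumed interpretation).
func isMisestimate(estimated, actual float64) bool {
	smaller, larger := estimated, actual
	if smaller > larger {
		smaller, larger = larger, smaller
	}
	return larger >= misestimateFactor*smaller+misestimateOffset
}
```

Under this reading, an estimate of 100 rows against 300 actual rows counts as inaccurate, while an estimate of 10 against 50 does not: the additive offset keeps small scans from triggering the warning.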
This log is gated behind a new cluster setting,
`sql.log.scan_row_count_misestimate.enabled` (default off). Logging only
happens for user tables and is rate limited to log misestimates from at
most 1 query every 10 seconds.
Fixes: #153748
Fixes: #153873
Release note (sql change): Added a default-off cluster setting
(`sql.log.scan_row_count_misestimate.enabled`) that enables logging a
warning on the gateway node when optimizer estimates for scans are
inaccurate. The log message includes the table and index being scanned,
the estimated and actual row counts, the time since the last table stats
collection, and the table's estimated staleness.
Co-authored-by: Uzair Ahmad <uzair.ahmad@cockroachlabs.com>
docs/generated/settings/settings-for-tenants.txt (1 addition & 0 deletions)
@@ -325,6 +325,7 @@ sql.insights.execution_insights_capacity integer 1000 the size of the per-node s
 sql.insights.high_retry_count.threshold integer 10 the number of retries a slow statement must have undergone for its high retry count to be highlighted as a potential problem application
 sql.insights.latency_threshold duration 100ms amount of time after which an executing statement is considered slow. Use 0 to disable. application
 sql.log.redact_names.enabled boolean false if set, schema object identifers are redacted in SQL statements that appear in event logs application
+sql.log.scan_row_count_misestimate.enabled boolean false when set to true, log a warning when a scan's actual row count differs significantly from the optimizer's estimate application
 sql.log.slow_query.experimental_full_table_scans.enabled boolean false when set to true, statements that perform a full table/index scan will be logged to the slow query log even if they do not meet the latency threshold. Must have the slow query log enabled for this setting to have any effect. application
 sql.log.slow_query.internal_queries.enabled boolean false when set to true, internal queries which exceed the slow query log threshold are logged to a separate log. Must have the slow query log enabled for this setting to have any effect. application
 sql.log.slow_query.latency_threshold duration 0s when set to non-zero, log statements whose service latency exceeds the threshold to a secondary logger on each node application
docs/generated/settings/settings.html (1 addition & 0 deletions)
@@ -280,6 +280,7 @@
 <tr><td><div id="setting-sql-insights-high-retry-count-threshold" class="anchored"><code>sql.insights.high_retry_count.threshold</code></div></td><td>integer</td><td><code>10</code></td><td>the number of retries a slow statement must have undergone for its high retry count to be highlighted as a potential problem</td><td>Basic/Standard/Advanced/Self-Hosted</td></tr>
 <tr><td><div id="setting-sql-insights-latency-threshold" class="anchored"><code>sql.insights.latency_threshold</code></div></td><td>duration</td><td><code>100ms</code></td><td>amount of time after which an executing statement is considered slow. Use 0 to disable.</td><td>Basic/Standard/Advanced/Self-Hosted</td></tr>
 <tr><td><div id="setting-sql-log-redact-names-enabled" class="anchored"><code>sql.log.redact_names.enabled</code></div></td><td>boolean</td><td><code>false</code></td><td>if set, schema object identifers are redacted in SQL statements that appear in event logs</td><td>Basic/Standard/Advanced/Self-Hosted</td></tr>
+<tr><td><div id="setting-sql-log-scan-row-count-misestimate-enabled" class="anchored"><code>sql.log.scan_row_count_misestimate.enabled</code></div></td><td>boolean</td><td><code>false</code></td><td>when set to true, log a warning when a scan's actual row count differs significantly from the optimizer's estimate</td><td>Basic/Standard/Advanced/Self-Hosted</td></tr>
 <tr><td><div id="setting-sql-log-slow-query-experimental-full-table-scans-enabled" class="anchored"><code>sql.log.slow_query.experimental_full_table_scans.enabled</code></div></td><td>boolean</td><td><code>false</code></td><td>when set to true, statements that perform a full table/index scan will be logged to the slow query log even if they do not meet the latency threshold. Must have the slow query log enabled for this setting to have any effect.</td><td>Basic/Standard/Advanced/Self-Hosted</td></tr>
 <tr><td><div id="setting-sql-log-slow-query-internal-queries-enabled" class="anchored"><code>sql.log.slow_query.internal_queries.enabled</code></div></td><td>boolean</td><td><code>false</code></td><td>when set to true, internal queries which exceed the slow query log threshold are logged to a separate log. Must have the slow query log enabled for this setting to have any effect.</td><td>Basic/Standard/Advanced/Self-Hosted</td></tr>
 <tr><td><div id="setting-sql-log-slow-query-latency-threshold" class="anchored"><code>sql.log.slow_query.latency_threshold</code></div></td><td>duration</td><td><code>0s</code></td><td>when set to non-zero, log statements whose service latency exceeds the threshold to a secondary logger on each node</td><td>Basic/Standard/Advanced/Self-Hosted</td></tr>