fix: Add default time filter to ClickHouse queries to avoid full table scans #1041
base: main
@m0nikasingh is attempting to deploy a commit to the HyperDX Team on Vercel. A member of the Team first needs to authorize it.
  const where = metricName
    ? chSql`WHERE MetricName=${{ String: metricName }}`
-   : '';
+   : chSql`WHERE TimestampTime > (now() - toIntervalDay(7))`;
On line 274, we already use the max_rows_to_read setting to avoid full table scans, so I'm not sure that changing the time range here would help.
I see that max_rows_to_read helps limit the amount of data scanned, but the challenge I’m running into is that I can’t use the force_index_by_date setting with HyperDX, because this query doesn't filter by the date column. This setting is generally a good practice, as it helps prevent expensive full table scans on large ClickHouse clusters.
I agree that hardcoding a 7-day lookback isn’t the best approach. Ideally, the query should probably use the lookback period from the UI instead. Do you think that could work?
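To make the idea concrete, here is a rough sketch of a WHERE clause built from a caller-supplied lookback rather than a hardcoded 7 days. It is only an illustration: `buildMetadataWhere`, `WhereClause`, and `lookbackDays` are hypothetical names, it uses plain strings instead of HyperDX's `chSql` helper, and it assumes the metric name is bound through a ClickHouse query parameter (`{metricName:String}`). Unlike the diff above, it applies the time filter even when a metric name is given, which is a design choice rather than what the PR does.

```typescript
// Hypothetical helper, not HyperDX's actual code: builds the WHERE clause
// for the metadata query with a caller-supplied lookback window.
interface WhereClause {
  sql: string;
  params: Record<string, string>;
}

function buildMetadataWhere(
  metricName: string | undefined,
  lookbackDays: number = 7,
): WhereClause {
  // Only a validated positive integer is interpolated into the SQL text.
  if (!Number.isInteger(lookbackDays) || lookbackDays <= 0) {
    throw new Error(`lookbackDays must be a positive integer, got ${lookbackDays}`);
  }

  // The time filter is always present, so a date-based restriction such as
  // force_index_by_date has a time condition to work with.
  const conditions: string[] = [
    `TimestampTime > (now() - toIntervalDay(${lookbackDays}))`,
  ];
  const params: Record<string, string> = {};

  if (metricName) {
    // Assumption: the metric name is passed as a bound query parameter.
    conditions.push('MetricName = {metricName:String}');
    params.metricName = metricName;
  }

  return { sql: `WHERE ${conditions.join(' AND ')}`, params };
}
```

The caller would pass the lookback selected in the UI (converted to days) and fall back to the default when none is set.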
@m0nikasingh I think that, in general, HyperDX shouldn't require force_index_by_date to be set for the user. All meaningful queries submitted by the app will use time filters or, for queries like this one, have reasonable limits applied so they terminate early (otherwise it's a bug).
We can revisit how metadata is populated by time range in a more comprehensive PR, but I think for now I'd recommend disabling the restriction for the user that's being used by HyperDX.
fixes: #1036