Remove epsilon dataset usage for ml_benchmarks #197
Conversation
@razdoburdin Would it be a problem to remove this dataset?
It represents the XGBoost cases where the histogram is too large to fit in cache, but we can replace it with synthetic data or use preloaded data.
@ethanglaser Any blockers for merging this PR?
Last I checked there were still some issues with the job. Let's see how http://intel-ci.intel.com/f0caf58d-06da-f13b-8936-a4bf010d0e2d goes. Also, I think we may need some help from Aleksei: the filters parameter does not work as intended (from what I can tell, it always falls back to the default).
Why would the filters matter if this is removing it from the configs?
Because this is another error that frequently occurs in CI benchmark jobs. Even when there are no dataset download issues, the logs for the CPU jobs are littered with "cannot place on GPU device" or something like this. From what I am seeing, that part of the filters is not actually read by the infra because the field is misconfigured or mishandled.
Description
Avoid ChunkedEncodingError / IncompleteRead issues in CI by disabling usage of the epsilon dataset.
Checklist:
Completeness and readability
Testing