retry ephemeral runner even if pod creation fails #4272
base: master
Conversation
|
@nikola-jokic I see there is a 0.13.0 PR - are these changes still relevant? It looks like some of the changes in that PR may fix this (but not 100% sure) |
|
Hey @badstreff, I think so too, at least it would fix some of the issues I was able to reproduce. I'm pretty sure that this case is already covered. |
|
@nikola-jokic Added a test case that covers the scenario I'm attempting to resolve. This test fails on the master branch without these changes. |
|
@nikola-jokic If you have some time can you review? We ran into this again today |

Currently, if the pod fails to create (for example, because a resource quota or some other admission controller is blocking it), the controller will not retry.
There is code to handle a failed pod (see PR #4059), but it doesn't apply when the pod is never actually created.
This PR increments the failure count on the EphemeralRunner resource even when pod creation itself fails, so we back off and try again during reconciliation.
I'm also not sure whether it would be better to just emit an event that the pod creation failed, keep the EphemeralRunner in pending, and let the reconciler try again. That seems like a better pattern to me, but the existing code already implements this retry logic, so I think keeping it consistent is best.