@nikola-jokic nikola-jokic commented Nov 6, 2025

This pull request improves error handling and logging in the EphemeralRunnerReconciler for pod creation failures. It handles invalid and forbidden errors separately and adds logic to handle exceeded resource quotas more gracefully.

Error handling improvements:

  • Enhanced the handling of IsInvalid pod creation errors by logging the error and marking the EphemeralRunner as failed, instead of treating it together with IsForbidden errors.
  • Improved handling of IsForbidden errors by checking whether the error is caused by an exceeded resource quota. If the quota is exceeded and the runner is about to expire, the ephemeral runner is deleted and recreated; if the quota is exceeded but the runner is not about to expire, the reconcile is retried after 30 seconds. Other forbidden errors fall back to the default handling.

Fixes #3629

@nikola-jokic nikola-jokic added the gha-runner-scale-set (Related to the gha-runner-scale-set mode) label Nov 6, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Nov 6, 2025

Hello! Thank you for your contribution.

Please review our contribution guidelines to understand the project's testing and code conventions.

@nikola-jokic nikola-jokic marked this pull request as ready for review November 6, 2025 19:31
@nikola-jokic nikola-jokic requested a review from mumoshu as a code owner November 6, 2025 19:31
Copilot AI review requested due to automatic review settings November 6, 2025 19:31
@nikola-jokic nikola-jokic requested review from a team, rentziass and toast-gear as code owners November 6, 2025 19:31
Copilot AI left a comment

Pull Request Overview

This PR refactors error handling for pod creation failures in the EphemeralRunnerReconciler. It splits the handling of IsInvalid and IsForbidden errors and introduces special recovery logic for resource quota exceeded scenarios.

Key Changes:

  • Separated IsInvalid and IsForbidden error handling into distinct cases
  • Added resource quota exceeded detection with retry logic
  • Implemented automatic ephemeral runner recreation when quota is exceeded and runner is about to expire

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Labels

gha-runner-scale-set Related to the gha-runner-scale-set mode

Development

Successfully merging this pull request may close these issues.

ARC not working with ResourceQuotas. Fails to schedule pod instead of queuing.

2 participants