Skip to content

Conversation

@umihico
Copy link
Owner

@umihico umihico commented Nov 13, 2025

Summary

Fixed deployment failures caused by concurrent workflow executions racing to deploy to the same CloudFormation stack.

Root Cause Analysis

The Problem

When PRs were merged to main, both auto-update and check workflows triggered simultaneously, causing race conditions:

2025-11-13T01:24:08Z - auto-update (push to main) - FAILED ❌
2025-11-13T01:24:08Z - check (push to main) - SUCCEEDED ✅
2025-11-13T01:59:03Z - auto-update (schedule) - SUCCEEDED ✅

Why It Failed

  1. Concurrent Triggers: Both workflows triggered at exactly 2025-11-13T01:24:08Z on push to main
  2. Race Condition: Both tried to update the same CloudFormation stack docker-selenium-lambda-prod
  3. Deployment Conflict: One deployment failed with:
    CREATE_FAILED: DemoLambdaFunction (AWS::Lambda::Function)
    Resource handler returned message: "Lambda does not have permission to access the ECR image"
    
  4. Stack Rollback: CloudFormation rolled back to UPDATE_ROLLBACK_COMPLETE state

Why The Error Was Misleading

The error message suggested an ECR permission issue, but the real problem was the concurrent deployment race condition. The "permission denied" error was a side effect of CloudFormation being in an inconsistent state during the conflict.

Solution

Modified check.yml to exclude main branch:

on:
  push:
    branches-ignore:
      - main
  workflow_dispatch:

Rationale:

  • auto-update workflow already handles main branch deployments
  • check workflow is now for feature branch validation only
  • Added workflow_dispatch for manual testing when needed

Workflow Responsibilities

  • auto-update.yml: Handles main branch deployments, scheduled updates, version updates
  • check.yml: Validates feature branches before merge
  • demo-test.yml: Tests README instructions on schedule

Test Plan

  • Analyzed workflow execution logs
  • Confirmed concurrent execution timing
  • Verified subsequent scheduled run succeeded
  • Merge PR and verify only auto-update runs
  • Push to feature branch and verify only check runs

🤖 Generated with Claude Code

…ck workflow

Root cause: Both auto-update and check workflows were triggered simultaneously
on push to main branch, causing a race condition when deploying to the same
CloudFormation stack. This resulted in deployment failures with ECR permission
errors.

The error occurred because:
1. PR merge triggered both workflows at 2025-11-13T01:24:08Z
2. Both tried to deploy to docker-selenium-lambda-prod stack concurrently
3. CloudFormation CREATE_FAILED on DemoLambdaFunction due to conflict
4. Stack rolled back to UPDATE_ROLLBACK_COMPLETE state

Solution: Exclude main branch from check workflow since auto-update already
handles main branch deployments. Added workflow_dispatch for manual testing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@umihico umihico marked this pull request as ready for review November 13, 2025 02:19
@umihico umihico merged commit 04970ac into main Nov 13, 2025
1 check passed
@umihico umihico deleted the fix/prevent-concurrent-deployments branch November 13, 2025 02:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants