Commit bc4b42e

Merge pull request #9231 from Neon-White/warp-docs
Add Warp on IBM Cloud docs
2 parents 2edf186 + 0b03137 commit bc4b42e

1 file changed

+369
-0
lines changed

docs/CI & Tests/warp-on-ibm.md

Lines changed: 369 additions & 0 deletions
@@ -0,0 +1,369 @@
# Warp IBM Cloud Automation Infrastructure

1. [Introduction](#introduction)
2. [Prerequisites](#prerequisites)
3. [Architecture Overview](#architecture-overview)
4. [Execution Flow](#execution-flow)
5. [GitHub Workflows](#github-workflows)
6. [Cloud-Init Configuration](#cloud-init-configuration)
7. [Helper Scripts](#helper-scripts)
8. [Log Locations](#log-locations)
9. [Debugging](#debugging)

## Introduction

The Warp IBM Cloud automation infrastructure provides automated nightly performance testing for NooBaa using Warp benchmarks on dedicated IBM Cloud VMs. This system automatically provisions virtual machines, runs Warp tests, uploads results to IBM Cloud Object Storage, sends notifications to Slack, and cleans up resources.

## Prerequisites

### Required Secrets

Before using the IBM Cloud automation infrastructure, the following GitHub secrets must be configured in the repository settings:

| Secret Name | Description |
|-------------|-------------|
| `IBM_CLOUD_API_KEY` | IBM Cloud API key with VPC infrastructure permissions |
| `SLACK_NIGHTLY_RESULTS_URL` | Slack webhook URL for nightly test result notifications |
| `IBM_COS_WRITER_CREDENTIALS` | JSON configuration for IBM Cloud Object Storage access (see below) |
| `IBM_WARP_VM_CONFIG` | JSON configuration for VM provisioning (see below) |

### IBM_COS_WRITER_CREDENTIALS

The `IBM_COS_WRITER_CREDENTIALS` secret must be a **single line** JSON string containing the following keys (shown here on multiple lines for readability):

```json
{
    "AWS_ACCESS_KEY_ID": "...",
    "AWS_SECRET_ACCESS_KEY": "..."
}
```

### IBM_WARP_VM_CONFIG Structure

The `IBM_WARP_VM_CONFIG` secret must be a **single line** JSON string containing the following keys (shown here on multiple lines, with comments, for readability):

```json5
{
    "INSTANCE_NAME": "...",       // Base name for the VSI
    "RESOURCE_TAG": "...",        // Tag to identify related resources
    "VPC_NAME": "...",            // Name of VPC to use
    "REGION": "...",              // Desired VSI region
    "ZONE": "...",                // Desired VSI zone
    "INSTANCE_PROFILE": "...",    // Desired VSI profile
    "FLOATING_IP_NAME": "...",    // Base name for the floating IP
    "SUBNET_ID": "...",           // ID of subnet to use
    "IMAGE_ID": "...",            // ID of OS image to use
    "SECURITY_GROUP_ID": "...",   // ID of security group to use
    "WARP_LOGS_BUCKET": "...",    // Name of IBM COS bucket to store the Warp logs in
    "IBM_COS_ENDPOINT": "..."     // IBM COS endpoint to use in order to access the logs bucket
}
```

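Because a missing key or a stray newline only surfaces when the nightly run fails, it can help to check a candidate value locally before saving it as a secret. The following is a minimal sketch, not part of the repository: `check_vm_config` and `REQUIRED_VM_CONFIG_KEYS` are hypothetical names, and the check is a plain pattern match per key rather than a real JSON parse.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check a candidate IBM_WARP_VM_CONFIG value.
# Assumption: a per-key pattern match is enough for a first pass; this is not a JSON parser.

REQUIRED_VM_CONFIG_KEYS=(
  INSTANCE_NAME RESOURCE_TAG VPC_NAME REGION ZONE INSTANCE_PROFILE
  FLOATING_IP_NAME SUBNET_ID IMAGE_ID SECURITY_GROUP_ID
  WARP_LOGS_BUCKET IBM_COS_ENDPOINT
)

# Prints one problem per line; prints nothing when the value looks fine.
check_vm_config() {
  local config="$1" key pattern
  # The secret must be a single line.
  if [[ "$config" == *$'\n'* ]]; then
    echo "config spans multiple lines"
  fi
  for key in "${REQUIRED_VM_CONFIG_KEYS[@]}"; do
    # Match the quoted key followed by optional whitespace and a colon.
    pattern="\"${key}\"[[:space:]]*:"
    if [[ ! "$config" =~ $pattern ]]; then
      echo "missing key: $key"
    fi
  done
}
```

For example, `check_vm_config "$VM_CONFIG"` prints nothing when all twelve keys are present on a single line.
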
## Architecture Overview

The automation uses a dispatcher pattern with four main components working together:

1. **Provisioning Dispatcher** - Triggers the provisioning workflow at midnight UTC
2. **Provisioning Workflow** - Creates IBM Cloud VMs with proper networking and security
3. **VM Configuration** - Sets up the testing environment using cloud-init
4. **Cleanup Dispatcher & Workflow** - Automatically removes VMs and associated resources at 4 AM UTC

```text
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Provision    │    │                 │    │     Cleanup     │
│   Dispatcher    │    │   Test & Log    │    │   Dispatcher    │
│   (Midnight)    │───▶│   (3+ hours)    │───▶│     (4 AM)      │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
   Provision VM            Warp tests             Cleanup VM
    Floating IP            Log upload             Release IP
    Cloud-init         Slack notification      Tag-based search
```

## Execution Flow

This section walks through the complete automation flow, from the midnight UTC trigger to the 4 AM cleanup, including the files involved and their roles.

### Timeline Overview

All times and durations are approximate.

| Time (UTC) | Phase | Duration | Description |
|------------|-------|----------|-------------|
| 00:00 | **Provisioning** | ~1 minute | GitHub Actions provisions IBM Cloud VM |
| 00:01 | **VM Setup** | ~3 minutes | Cloud-init configures the VM environment |
| 00:05 | **Image Building** | ~20 minutes | The NooBaa tester image is built locally |
| 00:25 | **Testing** | ~3 hours | Warp performance tests execute |
| 03:30 | **Results** | ~2 minutes | Log upload and Slack notification |
| 04:00 | **Cleanup** | ~1 minute | GitHub Actions removes any resources that are still left |

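The margin between the end of the run and the cleanup can be checked by adding up the phase durations. This is only a worked sum of the approximate figures from the table; the variable names are illustrative. The result lands a few minutes before the table's rounded 03:30 entry.

```bash
#!/usr/bin/env bash
# Sketch: add up the approximate phase durations from the timeline (in minutes).
phases=(1 3 20 180 2)   # provisioning, VM setup, image build, Warp run, results
cleanup_at=$((4 * 60))  # 04:00 UTC cleanup, in minutes after midnight

total=0
for minutes in "${phases[@]}"; do
  total=$((total + minutes))
done

# Prints "expected finish: 03:26 UTC" and "margin before cleanup: 34 minutes".
printf 'expected finish: %02d:%02d UTC\n' $((total / 60)) $((total % 60))
printf 'margin before cleanup: %d minutes\n' $((cleanup_at - total))
```

If the image build or the tests are extended, the same sum shows how much of the roughly half-hour buffer remains before the 04:00 cleanup.
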
### Detailed Flow

#### 1. Midnight UTC Trigger

**Triggered by**: GitHub Actions cron schedule
**File**: `.github/workflows/ibm-nightly-provision-dispatcher.yaml`

The provisioning dispatcher workflow is triggered by the cron expression `'0 0 * * *'` and calls the reusable provisioning workflow (`.github/workflows/ibm-nightly-vm-provision.yaml`) which begins the following sequence:

1. **Environment Setup**
   - Installs IBM Cloud CLI and VPC plugin.
   - Extracts and masks sensitive configuration from `IBM_WARP_VM_CONFIG` secret (passed as `VM_CONFIG` to the reusable workflow).
   - Authenticates with IBM Cloud using `IBM_CLOUD_API_KEY`.

2. **Cloud-Init Preparation**
   - Uses `envsubst` to inject GitHub secrets into the cloud-init template.
   - Creates `/tmp/ibm-vm-runner-config-with-secrets.yaml` with populated environment variables.

3. **VM Provisioning**
   - Creates IBM Cloud VM instance with specified configuration.
   - Attaches the VM to the configured VPC, subnet, and security group.
   - Passes the prepared cloud-init configuration as user data.

4. **Network Configuration**
   - Reserves a floating IP in the same zone as the VM.
   - Binds the floating IP to the VM's primary network interface.
   - Enables external connectivity for log uploads and notifications.

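The cloud-init preparation step can be sketched as follows. This is an illustration under assumptions, not the workflow's actual code: `render_cloud_init` is a hypothetical helper, and the list of substituted variables is a guess at what the template references. Passing an explicit variable list to `envsubst` leaves any other `$...` text in the template untouched.

```bash
#!/usr/bin/env bash
# Sketch: render a cloud-init template with envsubst.
# Assumptions: the function name and the variable list are illustrative only.

render_cloud_init() {
  local template="$1" output="$2"
  # The optional SHELL-FORMAT argument restricts substitution to the named variables.
  envsubst '${SLACK_NIGHTLY_RESULTS_URL} ${WARP_LOGS_BUCKET} ${IBM_COS_ENDPOINT}' \
    < "$template" > "$output"
}
```

With the relevant variables exported, a call such as `render_cloud_init .github/ibm-warp-runner-config.yaml /tmp/ibm-vm-runner-config-with-secrets.yaml` would produce the file that is then passed as user data.
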
#### 2. VM Initialization

**Triggered by**: VM boot process
**File**: `.github/ibm-warp-runner-config.yaml`

Once the VM starts, cloud-init takes over using the configuration file:

1. **System Setup**
   - Updates package repositories.
   - Installs required packages.

2. **Environment Configuration**
   - Creates `/etc/warp.env` with environment variables from GitHub secrets.
   - Sets up Node.js path for Slack webhook functionality.

3. **Security & Services**
   - Schedules automatic VM shutdown after 4 hours (`shutdown -P '+240'`) as a safety measure.
   - Adds `ubuntu` user to `docker` group for container access.
   - Enables and starts Docker daemon.

4. **Repository Setup**
   - Clones the NooBaa core repository to `/home/ubuntu/tests/noobaa-core`.
   - Installs helper scripts to system path:
     - `run_containerized_warp_on_cloud_runner.sh` → `/usr/local/bin/`
     - `slack_notifier.js` → `/usr/local/bin/`

5. **Test Execution Launch**
   - Executes `/usr/local/bin/run_containerized_warp_on_cloud_runner.sh` as the final step.

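Expressed as plain shell commands, the "Security & Services" and "Repository Setup" steps look roughly like the sketch below. It is not copied from the cloud-init file: `DRY_RUN`, `run`, and `vm_init_steps` are hypothetical helpers, and `REPO_URL` is assumed to be the public NooBaa core repository. In dry-run mode (the default here) the commands are only printed.

```bash
#!/usr/bin/env bash
# Sketch of the VM initialization steps as shell commands.
# Assumptions: DRY_RUN, run() and vm_init_steps() are illustrative; REPO_URL may
# differ from what the cloud-init configuration actually clones.

DRY_RUN="${DRY_RUN:-1}"
REPO_URL="https://github.com/noobaa/noobaa-core.git"
REPO_DIR="/home/ubuntu/tests/noobaa-core"
HELPERS_DIR="$REPO_DIR/tools/ibm_runner_helpers"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

vm_init_steps() {
  run shutdown -P '+240'              # safety net: power off after 4 hours
  run usermod -aG docker ubuntu       # let the ubuntu user access Docker
  run systemctl enable --now docker   # enable and start the Docker daemon
  run git clone "$REPO_URL" "$REPO_DIR"
  run install -m 0755 "$HELPERS_DIR/run_containerized_warp_on_cloud_runner.sh" /usr/local/bin/
  run install -m 0755 "$HELPERS_DIR/slack_notifier.js" /usr/local/bin/
}
```

Calling `vm_init_steps` with `DRY_RUN=1` lists the six commands without touching the system, which is a convenient way to review the sequence before changing the cloud-init file.
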
#### 3. Warp Testing Phase

**File**: `tools/ibm_runner_helpers/run_containerized_warp_on_cloud_runner.sh`

The main orchestration script handles the entire testing workflow:

1. **Environment Loading**
   - Sources environment variables from `/etc/warp.env`.

2. **Container Preparation**
   - Builds the NooBaa tester Docker image locally on the VM.
   - Prepares the containerized testing environment.

3. **Warp Test Execution**
   - Runs Warp performance tests with parameters:
     - `--duration 3h` - 3-hour test duration
     - `--obj-size 10m` - 10MB object size
     - `--obj-randsize` - Random object sizes (up to 10MB)
   - Captures all test output and metrics.

#### 4. Results & Notification

**Files**:
- `tools/ibm_runner_helpers/run_containerized_warp_on_cloud_runner.sh`
- `tools/ibm_runner_helpers/slack_notifier.js`

1. **Log Upload**
   - Uses AWS CLI to upload test results to IBM Cloud Object Storage.

2. **Slack Notification**
   - Calls `slack_notifier.js` with test results.
   - Sends notification to configured Slack channel via webhook.

3. **VM Shutdown**
   - Initiates VM shutdown regardless of test outcome.
   - Ensures clean termination of the test environment.

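One common way to guarantee that the final steps run "regardless of test outcome" is an `EXIT` trap. The sketch below shows that pattern only; it is not the script's actual implementation. `run_nightly` is a hypothetical wrapper, and `run_tests`, `upload_logs`, `notify_slack`, and `power_off` are placeholders for the real steps.

```bash
#!/usr/bin/env bash
# Sketch: run the final steps whether the tests pass or fail.
# Assumptions: run_tests, upload_logs, notify_slack and power_off are placeholders
# that the caller defines; none of these names come from the real script.

run_nightly() {
  (
    # The EXIT trap fires when the subshell ends, on success and on failure alike.
    finish() {
      local rc=$?
      upload_logs || true          # best effort: a failed upload must not block shutdown
      notify_slack "$rc" || true   # report the test exit code
      power_off
    }
    trap finish EXIT
    run_tests
  )
}
```

Running the steps inside a subshell keeps the trap local to the nightly run, and the subshell still returns the exit code of `run_tests` to the caller.
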
#### 5. Resource Cleanup

**Triggered by**: GitHub Actions cron schedule
**File**: `.github/workflows/ibm-nightly-cleanup-dispatcher.yaml`

The cleanup dispatcher workflow runs at 4 AM UTC (`'0 4 * * *'`), providing a 4-hour window for testing. It calls the reusable cleanup workflow (`.github/workflows/ibm-nightly-vm-cleanup.yaml`) which performs:

1. **Environment Setup**
   - Installs IBM Cloud CLI and VPC plugin.
   - Extracts VM configuration from `IBM_WARP_VM_CONFIG` secret (passed as `VM_CONFIG` to the reusable workflow).
   - Authenticates with IBM Cloud.

2. **Floating IP Cleanup**
   - Searches for the floating IP by resource tag (`IBM_WARP_VM_CONFIG.RESOURCE_TAG`).
   - Releases the floating IP if found.

3. **VM Cleanup**
   - Searches for the VM instance by resource tag (`IBM_WARP_VM_CONFIG.RESOURCE_TAG`).
   - Forcibly deletes the VM instance if found.
   - Waits and verifies complete deletion.

4. **Verification**
   - Confirms successful resource cleanup.
   - Logs cleanup status for monitoring.
   - Ensures no orphaned resources remain.

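The "waits and verifies complete deletion" step amounts to polling a lookup until it stops succeeding. A minimal sketch of such a loop is shown below; `wait_until_gone` is a hypothetical helper, and it assumes that the lookup command passed to it exits 0 while the resource exists and non-zero once it is gone.

```bash
#!/usr/bin/env bash
# Sketch: poll until a resource lookup stops succeeding, or give up after N attempts.
# Assumption: the lookup command (passed as trailing arguments) exits 0 while the
# resource still exists and non-zero once it has been deleted.

wait_until_gone() {
  local max_attempts="$1" delay_seconds="$2"
  shift 2
  local attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    if ! "$@" >/dev/null 2>&1; then
      echo "gone after $attempt check(s)"
      return 0
    fi
    sleep "$delay_seconds"
  done
  echo "still present after $max_attempts check(s)" >&2
  return 1
}
```

A call might look like `wait_until_gone 30 10 ibmcloud is instance "$INSTANCE_ID"`, assuming that command fails once the instance no longer exists; `INSTANCE_ID` is a placeholder.
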
## GitHub Workflows

The automation uses a dispatcher pattern built from the following workflow files:

### Nightly VM Provisioning Dispatcher

**File**: [`.github/workflows/ibm-nightly-provision-dispatcher.yaml`](../../.github/workflows/ibm-nightly-provision-dispatcher.yaml)

**Schedule**: Daily at midnight UTC (00:00)
**Purpose**: Triggers the reusable provisioning workflow with the required configuration

#### VM_CLOUD_INIT_CONFIG_LOCATION Configuration

The `VM_CLOUD_INIT_CONFIG_LOCATION` parameter specifies the location of the desired cloud-config file within the NooBaa Core repository. This parameter is passed as an input to the reusable provisioning workflow and must point to a valid cloud-init configuration file.

**Important Notes**:
- The path must be relative to the repository root.
- The file must exist in the NooBaa Core repository.
- The cloud-init configuration file contains the VM setup instructions, including package installations, environment setup, and the test execution commands.
- If you need to modify the VM setup process, update the cloud-init configuration file at this location.

#### VM_CONFIG Secret Configuration

The `VM_CONFIG` parameter references the GitHub Actions secret that contains the machine configuration for IBM Cloud VM provisioning.

**Secret Structure**: The `VM_CONFIG` secret must contain a JSON configuration with all the IBM Cloud infrastructure details (see the [IBM_WARP_VM_CONFIG Structure](#ibm_warp_vm_config-structure) section above for an example).

**Important Notes**:
- This secret contains sensitive infrastructure configuration, including VPC details, subnet IDs, and security group configurations.
- The secret is passed to the reusable workflow, where it is parsed and its individual values are extracted as environment variables.
- All values from this JSON secret are automatically masked in GitHub Actions logs for security.

#### GitHub Actions Secrets Limitation

A reusable workflow receives secrets in one of two ways: the caller either inherits all of its secrets (`secrets: inherit`) or passes each required secret explicitly by name. The two approaches cannot be combined, so it is not possible to inherit the general secrets and additionally pass only `VM_CONFIG`: once any secret is passed by name, the inherited secrets are no longer forwarded. As a result, every secret required by the called job must currently be passed to it explicitly.

**Security Implications**:
- All secrets are available to the entire workflow run, even if specific jobs don't need them.
- This follows the principle of explicit secret management but can lead to broader secret exposure than strictly necessary.
- Secrets are still properly masked in logs and handled securely by GitHub Actions.

### Nightly VM Provisioning (Reusable)

**File**: [`.github/workflows/ibm-nightly-vm-provision.yaml`](../../.github/workflows/ibm-nightly-vm-provision.yaml)

**Trigger**: Called by the provisioning dispatcher
**Purpose**: Handles the actual VM provisioning, networking setup, and cloud-init configuration

### Nightly VM Cleanup Dispatcher

**File**: [`.github/workflows/ibm-nightly-cleanup-dispatcher.yaml`](../../.github/workflows/ibm-nightly-cleanup-dispatcher.yaml)

**Schedule**: Daily at 4 AM UTC (04:00) - 4 hours after provisioning
**Purpose**: Triggers the reusable cleanup workflow

### Nightly VM Cleanup (Reusable)

**File**: [`.github/workflows/ibm-nightly-vm-cleanup.yaml`](../../.github/workflows/ibm-nightly-vm-cleanup.yaml)

**Trigger**: Called by the cleanup dispatcher
**Purpose**: Handles the actual resource cleanup (VMs and floating IPs)

### Daily VM Cleanup

**File**: [`.github/workflows/ibm-daily-leftover-cleanup.yaml`](../../.github/workflows/ibm-daily-leftover-cleanup.yaml)

**Trigger**: Daily at 6 AM UTC (06:00)
**Purpose**: Cleans up machines and IPs based on name pattern instead of tags, to catch unexpected leftovers

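Name-based matching, as opposed to tag-based search, can be sketched with a simple prefix filter. This is an assumption-laden illustration rather than the workflow's code: `filter_leftovers` is a hypothetical helper, resource names are assumed to arrive one per line on standard input, and the idea that leftovers share a base-name prefix is a guess at the naming convention.

```bash
#!/usr/bin/env bash
# Sketch: select leftover resources by name prefix rather than by tag.
# Assumptions: names arrive one per line on stdin; the prefix convention is illustrative.

filter_leftovers() {
  local prefix="$1" name
  while IFS= read -r name; do
    case "$name" in
      "$prefix"*) echo "$name" ;;   # keep names that start with the prefix
    esac
  done
}
```

For example, feeding the names `warp-vm-1`, `other-vm`, and `warp-vm-2` through `filter_leftovers warp-vm` keeps only the two `warp-vm-` entries.
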
## Cloud-Init Configuration

**File**: [`.github/ibm-warp-runner-config.yaml`](../../.github/ibm-warp-runner-config.yaml)

This cloud-init configuration automatically sets up the VM environment for Warp testing.
The config includes:
- The required packages to install on the VM
- Additional setup commands
- An environment file (`warp.env`) that passes environment variables from GitHub to the runner (via `envsubst` variable templating)
- The 'entrypoint', that is, the last command run by cloud-init (in our case, the Warp runner script)

## Helper Scripts

### Containerized Warp Runner

**File**: [`tools/ibm_runner_helpers/run_containerized_warp_on_cloud_runner.sh`](../../tools/ibm_runner_helpers/run_containerized_warp_on_cloud_runner.sh)

The main orchestration script that:
- Builds the NooBaa tester image locally on the VM.
- Runs the Warp tests (currently with the parameters `--duration 3h --obj-size 10m --obj-randsize`).
- Handles success/failure scenarios with appropriate notifications.
- Uploads logs to IBM COS with timestamped directory structure.
- Ensures VM shutdown regardless of test outcome.

### Slack Notifier

**File**: [`tools/ibm_runner_helpers/slack_notifier.js`](../../tools/ibm_runner_helpers/slack_notifier.js)

A simple Node.js helper script for sending notifications, used instead of unwieldy cURL commands.

## Log Locations

- **GitHub Actions**: Workflow run logs in GitHub UI
- **VM logs**: `/var/log/cloud-init.log` and `/var/log/cloud-init-output.log`
- **Test results**: Uploaded to IBM COS bucket with timestamp-based paths

## Debugging

### Serial Console Access

As a hardening measure, the test runner VMs cannot normally be accessed directly: no SSH key, username, or password is authorized by default. However, `cloud-init` prints all of its output to the VM's serial console, which can be accessed from the IBM Cloud web UI without machine authentication.

**To access the serial console:**
1. Navigate to the IBM Cloud web console.
2. Go to **VPC Infrastructure** → **Virtual server instances**.
3. Find your Warp test VM instance.
4. Click the **kebab menu** (three dots) next to the instance.
5. Select **"Open serial console"**.

This provides real-time access to the VM's console output, including all cloud-init logs and script execution details.

### Debug VM with SSH Access

If further debugging is required beyond console logs, you can provision a debug VM with SSH access:

1. **Export the required environment variables** locally, using the values described in the [Required Secrets](#required-secrets) section:
   ```bash
   export IBM_CLOUD_API_KEY="your-api-key"
   export VM_CONFIG='{"INSTANCE_NAME": "example-name", ...}'
   # ... other required variables
   ```

2. **Modify the cloud-init configuration** (`ibm-warp-runner-config.yaml`) to add SSH access:
   ```yaml
   # Add SSH key or user authentication
   ssh_authorized_keys:
     - ssh-rsa AAAA... your-public-key
   ```

3. **Run the provisioning commands manually** from the workflow to create a debuggable VM:
   ```bash
   # Install IBM Cloud CLI
   curl -fsSL https://clis.cloud.ibm.com/install/linux | sh
   ibmcloud plugin install vpc-infrastructure -f

   # Authenticate and provision
   # REGION must be set beforehand, e.g. to the REGION value from VM_CONFIG
   ibmcloud login --apikey "$IBM_CLOUD_API_KEY" -r "$REGION"
   ibmcloud is instance-create ... --user-data @modified-config.yaml
   ```

4. **SSH into the debug VM**

**Important**: Remember to clean up debug VMs manually after troubleshooting.
