Commit d53c6d3

committed
Address Derek's comments
Signed-off-by: manuelbuil <mbuil@suse.com>
1 parent 545c38e commit d53c6d3

1 file changed

+17
-14
lines changed

blog/2025-11-03-strategies-for-large-images.md

Lines changed: 17 additions & 14 deletions
@@ -5,12 +5,16 @@ authors: manuelbuil
55
hide_table_of_contents: true
66
---
77

8-
Slow image pulls can be annoying and could increase Kubernetes startup times over a healthy threshold, particularly in resource-constrained or air-gapped environments. The situation gets worsened by new AI cloud native apps, which often rely on astronomically large images (several gigabytes). This post dives into K3s mechanisms, like pre-pulling images and embedded registry mirror, that can effectively improve the user's experience ensuring your cluster is ready the moment you need the images, especially in environments where network bandwidth might be constrained.
8+
Slow image pulls can be annoying and can increase Kubernetes startup times over a healthy threshold, particularly in resource-constrained or air-gapped environments. The situation gets worsened by new AI cloud native apps, which often rely on astronomically large images (several gigabytes). This post dives into K3s mechanisms, like pre-pulling images and the embedded registry mirror, that can effectively improve the user's experience when handling large images.
99

10+
<!-- truncate -->
1011

11-
## 📦 Online & Offline Strategies: The Power of Local Import ##
1212

13-
K3s provides two core mechanisms for ensuring large images are available quickly, whether you are connected to an external registry (online) or deploying in an isolated environment (offline). The goal is to shift the time spent waiting on a slow network pull into a fast local load during K3s startup.
13+
## Online & Offline Strategies: The Power of Local Import 📦 ##
14+
15+
K3s provides mechanisms for ensuring large images are available quickly, that address two common scenarios:
16+
- Online Clusters: To avoid slow image pulls from an external registry when a pod starts, K3s can `pre-pull` images from a manifest file.
17+
- Offline (Air-Gapped) Clusters: Where no external registry is available, K3s can `import` images directly from local tarball archives.
1418

1519
1. Pre-Pulling Images via a Manifest File (Online)
1620
In scenarios with internet connectivity, the goal is to initiate image pulls as early and efficiently as possible. K3s can be instructed to sequentially pull a set of images into the embedded containerd store during startup. This is ideal for ensuring base images are ready the moment the cluster starts.
@@ -19,7 +23,7 @@ Users can trigger a pull of images into the containerd image store by placing a
1923

2024
Imagine the file `example.txt` which contains:
2125

22-
```yaml
26+
```text
2327
docker.io/pytorch/pytorch:2.9.0-cuda12.6-cudnn9-runtime
2428
```
2529
Before starting the k3s service in the node, do the following:
@@ -32,7 +36,7 @@ mkdir -p /var/lib/rancher/k3s/agent/images
3236
cp example.txt /var/lib/rancher/k3s/agent/images
3337
```
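The manual `mkdir` and `cp` steps above could also be wrapped in a small helper. The following is only a sketch under stated assumptions: the function name `stage_image_list` and its clean-up step (trimming whitespace, dropping blank lines and duplicate entries) are hypothetical and not part of K3s; only the default directory comes from the post.

```bash
# Hypothetical helper: stage an image list so K3s can pre-pull it at startup.
# The default directory is the one used in the post; the function name and
# the clean-up behaviour are assumptions made for illustration.
stage_image_list() {
    src=$1
    dest_dir=${2:-/var/lib/rancher/k3s/agent/images}

    mkdir -p "$dest_dir" || return 1

    # Trim surrounding whitespace, skip empty lines, and keep only the
    # first occurrence of each image reference.
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$src" \
        | awk 'NF && !seen[$0]++' > "$dest_dir/$(basename "$src")"
}
```

For instance, `stage_image_list example.txt` would write a cleaned copy of the list to the default images directory; a second argument overrides the destination, which is handy for trying the helper outside a real node.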
3438
The K3s process will then pull these images via the CRI API. You should see the following two logs:
35-
```yaml
39+
```log
3640
# When the k3s controller detects the file
3741
level=info msg="Pulling images from /var/lib/rancher/k3s/agent/images/example.txt"
3842
level=info msg="Pulling image docker.io/pytorch/pytorch:2.9.0-cuda12.6-cudnn9-runtime"
@@ -57,7 +61,7 @@ cp microservices-demo.tar.gz /var/lib/rancher/k3s/agent/images/
5761
```
5862

5963
The K3s process will load those images and you should see the following two logs:
60-
```yaml
64+
```log
6165
level=info msg="Importing images from /var/lib/rancher/k3s/agent/images/microservices-demo.tar.gz"
6266
level=info msg="Imported images from /var/lib/rancher/k3s/agent/images/microservices-demo.tar.gz in 1m39.8610592s"
6367
```
@@ -77,9 +81,9 @@ The cache file will store archive metadata (size and modification time). Subsequ
7781
Note that the caching mechanism needs to be enabled carefully. If an image was removed or pruned since last startup, take manual action to reimport the image. Check our [docs](https://docs.k3s.io/installation/airgap?_highlight=.cache.json&airgap-load-images=Manually+Deploy+Images#enable-conditional-image-imports) for more information.
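To illustrate the idea of a conditional import based on archive size and modification time, here is a minimal shell sketch. It is not K3s's implementation: the function names (`archive_fingerprint`, `needs_import`, `record_import`) and the tab-separated cache format are hypothetical, and `stat -c` assumes GNU coreutils.

```bash
# Hypothetical sketch of a size + mtime cache check. K3s keeps its own
# cache file; the flat "path<TAB>size mtime" format below is an assumption
# used purely for illustration.

archive_fingerprint() {
    # Prints "<size-in-bytes> <mtime-in-epoch-seconds>" (GNU stat syntax).
    stat -c '%s %Y' "$1"
}

needs_import() {
    # $1 = archive path, $2 = cache file.
    # Succeeds (exit 0) when the archive is new or changed since last recorded.
    archive=$1
    cache=$2
    current=$(archive_fingerprint "$archive") || return 0
    recorded=$(awk -F'\t' -v p="$archive" '$1 == p { print $2; exit }' "$cache" 2>/dev/null)
    [ "$current" != "$recorded" ]
}

record_import() {
    # $1 = archive path, $2 = cache file.
    # Replaces any previous entry for the archive with its current fingerprint.
    archive=$1
    cache=$2
    tmp=$(mktemp) || return 1
    if [ -f "$cache" ]; then
        awk -F'\t' -v p="$archive" '$1 != p' "$cache" > "$tmp"
    fi
    printf '%s\t%s\n' "$archive" "$(archive_fingerprint "$archive")" >> "$tmp"
    mv "$tmp" "$cache"
}
```

A check like this only looks at the archive file itself. It cannot tell whether the image is still present in the image store, which is why a pruned image needs a manual reimport, as noted above.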
7882

7983

80-
## 🕸️ Embedded Registry Mirror ##
84+
## Embedded Registry Mirror 🕸️ ##
8185

82-
K3s offers an in-cluster container image registry mirror by embedding Spegel. Its primary use case is to accelerate image pulling and reduce external network dependency in Kubernetes clusters by ensuring images are pulled within the cluster network rather than repeatedly from a central registry. To enable this feature, server nodes must be started with the --embedded-registry flag, or with embedded-registry: true in the configuration file. When enabled, every node in your cluster instantly becomes a stateless, local image mirror listening on port 6443. Nodes share a constantly updated list of available images over a peer-to-peer network on port 5001.
86+
K3s offers an in-cluster container image registry mirror by embedding [Spegel](https://spegel.dev/). Its primary use case is to accelerate image pulling and reduce external network dependency in Kubernetes clusters by ensuring images are pulled from within the cluster network rather than repeatedly from a central registry. To enable this feature, server nodes must be started with the `--embedded-registry` flag, or with `embedded-registry: true` in the configuration file. When enabled, every node in your cluster instantly becomes a stateless, local image mirror listening on port 6443. Nodes share a constantly updated list of available images over a peer-to-peer network on port 5001.
8387

8488
```bash
8589
# Enable the embedded registry mirror
@@ -88,16 +92,16 @@ embedded-registry: true
8892
supervisor-metrics: true
8993
```
9094

91-
And then, in all nodes, you must add a `registries.yaml` where we specified what registry we allow a node to both pull images from other nodes, and share the registry's images with other nodes. If a registry is enabled for mirroring on some nodes, but not on others, only the nodes with the registry enabled will exchange images from that registrywhat registries are mirrored. For example:
95+
Then, on all nodes, you must add a `registries.yaml` that specifies which registries a node is allowed to both pull from and share with other nodes. If a registry is enabled for mirroring on some nodes, but not on others, only the nodes with the registry enabled will exchange images. For example:
9296

93-
```bash
97+
```yaml
9498
mirrors:
9599
  docker.io:
96100
  registry.k8s.io:
97101
```
98102
99103
If everything boots up correctly, you should see in the logs:
100-
```yaml
104+
```log
101105
level=info msg="Starting distributed registry mirror at https://10.11.0.11:6443/v2 for registries [docker.io registry.k8s.io]"
102106
level=info msg="Starting distributed registry P2P node at 10.11.0.11:5001"
103107
```
@@ -108,7 +112,6 @@ kubectl get --server https://10.11.0.11:6443 --raw /metrics | grep spegel
108112
```
109113

110114

111-
🏁 Conclusion
112-
115+
## Conclusion 🏁 ##
113116

114-
K3s provides robust, flexible tools to decisively tackle slow image pulls, a problem magnified by today's multi-gigabyte cloud-native and AI images. By leveraging pre-pulling manifest strategies, tarball loading or optimizing image distribution with the embedded Spegel registry mirror, you can shift network latency into quick, reliable local operations. These mechanisms ensure your resource-constrained and air-gapped clusters achieve rapid, predictable startup times, delivering a consistently better user experience.
117+
K3s provides robust, flexible tools to tackle slow image pulls, a problem magnified by today's multi-gigabyte cloud-native and AI images. By leveraging pre-pulling manifest strategies, tarball loading or optimizing image distribution with the embedded [Spegel](https://spegel.dev/) registry mirror, you can shift slow network operations into quick local operations. These mechanisms ensure your resource-constrained and air-gapped clusters achieve rapid, predictable startup times, delivering a consistently better user experience.
