---
title: K3s strategies for image consumption
description: Master online and offline image loading techniques in K3s for ultra-fast application startup, even with multi-gigabyte containers.
authors: manuelbuil
hide_table_of_contents: true
---

Slow image pulls are annoying and can push Kubernetes startup times past a healthy threshold, particularly in resource-constrained or air-gapped environments. The situation is made worse by new cloud-native AI applications, which often rely on very large images (several gigabytes). This post dives into K3s mechanisms, such as pre-pulling images and the embedded registry mirror, that improve the user experience by ensuring images are ready the moment you need them, especially in environments where network bandwidth is constrained.

## 📦 Online & Offline Strategies: The Power of Local Import ##

K3s provides two core mechanisms for ensuring large images are available quickly, whether you are connected to an external registry (online) or deploying in an isolated environment (offline). The goal is to shift the time spent waiting on a slow network pull into a fast local load during K3s startup.

1. Pre-Pulling Images via a Manifest File (Online)

In scenarios with internet connectivity, the goal is to initiate image pulls as early and efficiently as possible. K3s can be instructed to sequentially pull a set of images into the embedded containerd store during startup. This is ideal for ensuring base images are ready the moment the cluster starts.

Users can trigger a pull of images into the containerd image store by placing a simple text file containing the image names, one per line, in the `/var/lib/rancher/k3s/agent/images` directory. This can be done before K3s starts or while K3s is running.

Imagine the file `example.txt`, which contains:

```text
docker.io/pytorch/pytorch:2.9.0-cuda12.6-cudnn9-runtime
```

Before starting the k3s service on the node, do the following:

```bash
# 1. Create the images directory on the node
mkdir -p /var/lib/rancher/k3s/agent/images

# 2. Copy the manifest file (example.txt)
cp example.txt /var/lib/rancher/k3s/agent/images
```

The K3s process will then pull these images via the CRI API. You should see log messages like the following:

```log
# When the k3s controller detects the file
level=info msg="Pulling images from /var/lib/rancher/k3s/agent/images/example.txt"
level=info msg="Pulling image docker.io/pytorch/pytorch:2.9.0-cuda12.6-cudnn9-runtime"

# When the import is done; it also reports how long it took:
level=info msg="Imported docker.io/pytorch/pytorch:2.9.0-cuda12.6-cudnn9-runtime"
level=info msg="Imported images from /var/lib/rancher/k3s/agent/images/example.txt in 6m1.178972902s"
```
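
If K3s is running as a systemd service (the default when using the install script), a convenient way to watch these messages live is to follow the unit's journal; the unit name `k3s` below assumes a server node (agents use `k3s-agent`):

```bash
# Follow the K3s service logs and filter for the image pull/import messages
journalctl -u k3s -f | grep -iE "pulling|imported"
```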

2. Importing Images from Tarballs (Offline & Ultra-Fast)

For the absolute fastest startup, or when running in an air-gapped environment, the images should be available locally as tarballs. K3s will load these images directly into the containerd image store, bypassing network traffic entirely.

Place the image tarballs (created using `docker save` or `ctr images export`) in the same `/var/lib/rancher/k3s/agent/images` directory. K3s will decompress the tarball, extract the image layers, and load them.
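
As a rough sketch of how such an archive can be produced (using the PyTorch image from the manifest example above and assuming Docker is available on a machine with registry access), a compressed tarball could be created like this:

```bash
# Pull the image locally, then save it as a gzip-compressed tarball that
# K3s can import from the images directory at startup
docker pull docker.io/pytorch/pytorch:2.9.0-cuda12.6-cudnn9-runtime
docker save docker.io/pytorch/pytorch:2.9.0-cuda12.6-cudnn9-runtime | gzip > pytorch.tar.gz
```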

For example, I have created an image tarball with all the images required to deploy the popular [microservices-demo](https://github.com/GoogleCloudPlatform/microservices-demo) with the name `microservices-demo.tar.gz`.

```bash
# Example: Create the images directory and place the tarball in it
mkdir -p /var/lib/rancher/k3s/agent/images
cp microservices-demo.tar.gz /var/lib/rancher/k3s/agent/images/
```

The K3s process will load those images and you should see the following two logs:

```log
level=info msg="Importing images from /var/lib/rancher/k3s/agent/images/microservices-demo.tar.gz"
level=info msg="Imported images from /var/lib/rancher/k3s/agent/images/microservices-demo.tar.gz in 1m39.8610592s"
```

You can verify the successfully imported images at any time using the bundled client: `k3s ctr images list`
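
For example, to confirm that the PyTorch image from the earlier manifest example has been imported:

```bash
# List the containerd image store and filter for the expected image
k3s ctr images list | grep pytorch
```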

### Optimizing boot times with tarballs ###

By default, image archives are imported every time K3s starts to ensure consistency. However, this delay can be significant when dealing with many large archives; for example, `microservices-demo.tar.gz` took 1m39s to import. To alleviate this, K3s offers a feature to only import tarballs that have changed since they were last processed. To enable it, create an empty `.cache.json` file in the images directory:

```bash
touch /var/lib/rancher/k3s/agent/images/.cache.json
```

The cache file stores archive metadata (size and modification time). Subsequent restarts of K3s will check this file and skip the import process for any large tarballs that haven't changed, dramatically speeding up cluster boot time. To confirm that this is working, verify that `.cache.json` is no longer empty and that, after a restart, the two import log lines no longer appear.

Note that the caching mechanism must be used carefully: if an image was removed or pruned since the last startup, you will need to take manual action to reimport it. Check our [docs](https://docs.k3s.io/installation/airgap?_highlight=.cache.json&airgap-load-images=Manually+Deploy+Images#enable-conditional-image-imports) for more information.
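
Since the cache keys on each archive's size and modification time, one simple way to force a re-import on the next restart (a sketch based on that behavior, not an official procedure) is to bump the archive's modification time or remove the cache file:

```bash
# Option 1: update the tarball's modification time so it no longer matches its cache entry
touch /var/lib/rancher/k3s/agent/images/microservices-demo.tar.gz

# Option 2: remove the cache file so all archives are imported again on the next start
rm /var/lib/rancher/k3s/agent/images/.cache.json
```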

## 🕸️ Embedded Registry Mirror ##

K3s offers an in-cluster container image registry mirror by embedding Spegel. Its primary use case is to accelerate image pulling and reduce external network dependency by ensuring images are pulled from within the cluster network rather than repeatedly from a central registry. To enable this feature, server nodes must be started with the `--embedded-registry` flag, or with `embedded-registry: true` in the configuration file (`/etc/rancher/k3s/config.yaml`). When enabled, every node in your cluster becomes a stateless, local image mirror listening on port 6443, and nodes share a constantly updated list of available images over a peer-to-peer network on port 5001.

```yaml
# Enable the embedded registry mirror
embedded-registry: true
# Enable supervisor metrics, which include metrics for the embedded registry mirror
supervisor-metrics: true
```
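
The configuration file keys above correspond to CLI flags, so the mirror can also be enabled directly on the command line; a minimal sketch, omitting any other flags your server may need:

```bash
# Start a K3s server with the embedded registry mirror enabled
k3s server --embedded-registry
```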

Then, on all nodes, you must add a `registries.yaml` file (under `/etc/rancher/k3s/`) specifying the registries for which a node is allowed to both pull images from other nodes and share its own images with other nodes. If a registry is enabled for mirroring on some nodes but not on others, only the nodes with the registry enabled will exchange images from that registry. For example:

```yaml
mirrors:
  docker.io:
  registry.k8s.io:
```

If everything boots up correctly, you should see the following in the logs:

```log
level=info msg="Starting distributed registry mirror at https://10.11.0.11:6443/v2 for registries [docker.io registry.k8s.io]"
level=info msg="Starting distributed registry P2P node at 10.11.0.11:5001"
```

And you should be able to see Spegel's metrics by querying the supervisor metrics server:

```bash
kubectl get --server https://10.11.0.11:6443 --raw /metrics | grep spegel
```

## 🏁 Conclusion ##

K3s provides robust, flexible tools to tackle slow image pulls, a problem magnified by today's multi-gigabyte cloud-native and AI images. By leveraging pre-pull manifests, tarball loading, or optimized image distribution with the embedded Spegel registry mirror, you can shift network latency into quick, reliable local operations. These mechanisms ensure your resource-constrained and air-gapped clusters achieve rapid, predictable startup times, delivering a consistently better user experience.

0 commit comments

Comments
 (0)