
Inconsistent OOM kills in k8s #12282

@real-danm

Description

I am trying to get a better understanding of how gVisor handles memory management, and more specifically how it handles OOM kills, primarily for Python workloads.

Across both GKE and custom clusters I get very inconsistent OOM kills; see below for version numbers.
I have requests and limits set for the primary and sidecar containers. I have experimented with also setting requests/limits on the init container, but it doesn't seem to make a difference; by default I do not set these values for init containers.

My deployment YAML is templated, so the only differences between pods are the containers.
With this setup, some pods are correctly OOM-killed (exit 137) or evicted.
However, I often see pods where the memory reported by the k8s metrics API is 1.5x or more above the limit set in the deployment spec; this is confirmed by inspecting the process on the host (a rough sketch of what I check is shown after the kubectl output below).
e.g.:

kubectl get pod xxx -o jsonpath='{.spec.runtimeClassName}'
gvisor
kubectl get pod xxx -o jsonpath='{range .spec.containers[*]}{.name}{": limits="}{.resources.limits}{", requests="}{.resources.requests}{"\n"}{end}'
primary: limits={"cpu":"4","ephemeral-storage":"2Gi","memory":"8589934592"}, requests={"cpu":"2","ephemeral-storage":"2Gi","memory":"4294967296"}
sidecar: limits={"cpu":"100m","memory":"100Mi"}, requests={"cpu":"10m","memory":"50Mi"}

kubectl top pod xxx
NAME                                                       CPU(cores)   MEMORY(bytes)
xxx   93m          12560Mi
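
As a rough sketch of what I mean by "inspecting the process on the host" (the cgroup path below is illustrative, assuming a cgroups v2 node with the systemd driver and a Burstable pod; the real slice names depend on QoS class and pod UID):

# on the node: the gVisor sandbox typically shows up as a runsc-sandbox process
ps aux | grep runsc

# pod-level cgroup limit vs. current usage (substitute the real pod UID)
cat /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<POD_UID>.slice/memory.max
cat /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<POD_UID>.slice/memory.current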

I'm not sure whether this is a bug in gVisor or in my setup; I would appreciate any help debugging this.

Steps to reproduce

I set up a test pod that just continuously allocates memory (a rough sketch is below). With a single container it is OOM-killed as expected. With an init container added, I have seen a single instance where it was able to exceed the limit, but most of the time it is killed correctly, so I can't draw any conclusions from this.
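
For reference, a minimal sketch of the kind of test pod I mean (the name, image, sizes, and allocation script here are placeholders, not the exact manifest I run):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gvisor-oom-test
spec:
  runtimeClassName: gvisor
  restartPolicy: Never
  initContainers:
  - name: init
    image: busybox
    command: ["sh", "-c", "sleep 1"]
  containers:
  - name: allocator
    image: python:3.11-slim
    resources:
      requests:
        memory: "256Mi"
      limits:
        memory: "512Mi"
    # allocate ~10 MiB every 100 ms and hold the references;
    # this should be OOM-killed (exit 137) near the 512Mi limit
    command:
    - python3
    - -c
    - |
      import time
      chunks = []
      while True:
          chunks.append(bytearray(10 * 1024 * 1024))
          time.sleep(0.1)
EOF

Whether the container was actually OOM-killed (OOMKilled, exit 137) can then be checked with kubectl describe pod gvisor-oom-test.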

runsc version

I have two different cluster types using gvisor.
GKE (1.33.5-gke.1080000) clusters:

/home/containerd/usr/local/sbin/runsc --version
runsc version google-785595836
spec: 1.2.1

custom cluster setup (systemd cgroup driver, cgroups v2):

runsc --version
runsc version release-20250820.0
spec: 1.2.0

docker version (if using docker)

No response

uname

No response

kubectl (if using Kubernetes)

custom node setup:

kubelet --version
Kubernetes v1.33.1

kubectl version
...
Server Version: v1.33.1

kubectl get nodes
NAME           STATUS   ROLES   AGE   VERSION
xxx   Ready    node    29d   v1.33.1
...

repo state (if built from source)

No response

runsc debug logs (if available)

No response