01 May 2026

Kubernetes v1.36: Pod-Level Resource Managers (Alpha)

Kubernetes v1.36 introduces Pod-Level Resource Managers as an alpha feature, bringing a more flexible and powerful resource management model to performance-sensitive workloads. This enhancement extends the kubelet's Topology, CPU, and Memory Managers to support pod-level resource specifications (.spec.resources), evolving them from a strictly per-container allocation model to a pod-centric one.

Why do we need pod-level resource managers?

When running performance-critical workloads such as machine learning (ML) training, high-frequency trading applications, or low-latency databases, you often need exclusive, NUMA-aligned resources for your primary application containers to ensure predictable performance.

However, modern Kubernetes pods rarely consist of just one container. They frequently include sidecar containers for logging, monitoring, service meshes, or data ingestion.

Before this feature, this created a trade-off: to get NUMA-aligned, exclusive resources for your main application, you had to allocate exclusive, integer-based CPU resources to every container in the pod, which can be wasteful for lightweight sidecars. If you didn't, you forfeited the pod's Guaranteed Quality of Service (QoS) class entirely, losing the performance benefits.

Introducing pod-level resource managers

Enabling pod-level resource support for the resource managers (via the PodLevelResourceManagers and PodLevelResources feature gates) allows the kubelet to create hybrid resource allocation models. This brings flexibility and efficiency to high-performance workloads without sacrificing NUMA alignment.

Real-world use cases

Here are a few practical scenarios demonstrating how this feature can be applied, depending on the configured Topology Manager scope:

1. Tightly-coupled database (Topology manager's pod scope)

Consider a latency-sensitive database pod that includes a main database container, a local metrics exporter, and a backup agent sidecar.

When configured with the pod Topology Manager scope, the kubelet performs a single NUMA alignment based on the entire pod's budget. The database container gets its exclusive CPU and memory slices from that NUMA node. The remaining resources from the pod's budget form a new pod shared pool. The metrics exporter and backup agent run in this pod shared pool. They share resources with each other, but they are strictly isolated from the database's exclusive slices and the rest of the node.

This allows you to safely co-locate auxiliary containers on the same NUMA node as your primary workload without wasting dedicated cores on them.

apiVersion: v1
kind: Pod
metadata:
  name: tightly-coupled-database
spec:
  # Pod-level resources establish the overall budget and NUMA alignment size.
  resources:
    requests:
      cpu: "8"
      memory: "16Gi"
    limits:
      cpu: "8"
      memory: "16Gi"
  initContainers:
  - name: metrics-exporter
    image: metrics-exporter:v1
    restartPolicy: Always
  - name: backup-agent
    image: backup-agent:v1
    restartPolicy: Always
  containers:
  - name: database
    image: database:v1
    # This Guaranteed container gets an exclusive 6 CPU slice from the pod's budget.
    # The remaining 2 CPUs and 4Gi of memory form the pod shared pool for the sidecars.
    resources:
      requests:
        cpu: "6"
        memory: "12Gi"
      limits:
        cpu: "6"
        memory: "12Gi"

2. ML workload with infrastructure sidecars (Topology manager's container scope)

Imagine a pod running a GPU-accelerated ML training workload alongside a generic service mesh sidecar.

Under the container Topology Manager scope, the kubelet evaluates each container individually. You can grant the ML container exclusive, NUMA-aligned CPUs and memory for maximum performance. Meanwhile, the service mesh sidecar doesn't need to be NUMA-aligned; it can run in the general node-wide shared pool. The collective resource consumption is still safely bounded by the overall pod limits, but you only allocate NUMA-aligned, exclusive resources to the specific containers that actually require them.

apiVersion: v1
kind: Pod
metadata:
  name: ml-workload
spec:
  # Pod-level resources establish the overall budget constraint.
  resources:
    requests:
      cpu: "4"
      memory: "8Gi"
    limits:
      cpu: "4"
      memory: "8Gi"
  initContainers:
  - name: service-mesh-sidecar
    image: service-mesh:v1
    restartPolicy: Always
  containers:
  - name: ml-training
    image: ml-training:v1
    # Under the 'container' scope, this Guaranteed container receives exclusive,
    # NUMA-aligned resources, while the sidecar runs in the node's shared pool.
    resources:
      requests:
        cpu: "3"
        memory: "6Gi"
      limits:
        cpu: "3"
        memory: "6Gi"

CPU quotas (CFS) and isolation

When running these mixed workloads within a pod, isolation is enforced differently depending on whether a container receives exclusive CPUs from the pod's budget or runs in a shared pool.
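
As a rough way to observe the resulting isolation on a node, you can read a container's cgroup v2 interface files directly. The paths and values below are only illustrative (they assume containerd with the systemd cgroup driver and placeholder pod/container IDs): cpuset.cpus.effective shows which CPUs the container may run on, and cpu.max shows the CFS quota and period ('max' means no quota is applied).

# Illustrative: a container with an exclusive allocation pinned to CPUs 2-7
$ cat /sys/fs/cgroup/kubepods.slice/kubepods-pod<pod-uid>.slice/cri-containerd-<container-id>.scope/cpuset.cpus.effective
2-7
# cpu.max format is "<quota> <period>"; 600000/100000 corresponds to 6 CPUs
$ cat /sys/fs/cgroup/kubepods.slice/kubepods-pod<pod-uid>.slice/cri-containerd-<container-id>.scope/cpu.max
600000 100000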

How to enable Pod-Level Resource Managers

Using this feature requires Kubernetes v1.36 or newer. To enable it, you must configure the kubelet with the appropriate feature gates and policies (a combined KubeletConfiguration sketch follows these steps):

  1. Enable the PodLevelResources and PodLevelResourceManagers feature gates.
  2. Configure the Topology Manager with a policy other than none (i.e., best-effort, restricted, or single-numa-node).
  3. Set the Topology Manager scope to either pod or container using the topologyManagerScope field in the KubeletConfiguration.
  4. Configure the CPU Manager with the static policy.
  5. Configure the Memory Manager with the Static policy.
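
Putting those steps together, a minimal KubeletConfiguration might look like the sketch below. The policy, scope, and reservation values are examples rather than requirements; in particular, the static CPU Manager and Static Memory Manager policies need explicit reservations sized for your node.

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  PodLevelResources: true
  PodLevelResourceManagers: true
topologyManagerPolicy: single-numa-node # or best-effort / restricted
topologyManagerScope: pod # or container
cpuManagerPolicy: static
memoryManagerPolicy: Static
# Example reservations required by the static managers; adjust for your node.
reservedSystemCPUs: "0-1"
reservedMemory:
- numaNode: 0
  limits:
    memory: 1Gi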

Observability

To help cluster administrators monitor and debug these new allocation models, several new kubelet metrics are exposed when the feature gate is enabled.

Current limitations and caveats

While this feature opens up new possibilities, there are a few things to keep in mind during its alpha phase. Be sure to review the Limitations and caveats in the official documentation for full details on compatibility, requirements, and downgrade instructions.

Getting started and providing feedback

For a deep dive into the technical details and configuration of this feature, check out the official concept documentation on the Kubernetes website.

To learn more about the overall pod-level resources feature and how to assign resources to pods, see the pod-level resources documentation.

As this feature moves through Alpha, your feedback is invaluable. Please report any issues or share your experiences via the standard Kubernetes communication channels.

30 Apr 2026

Kubernetes v1.36: In-Place Vertical Scaling for Pod-Level Resources Graduates to Beta

Following the graduation of Pod-Level Resources to Beta in v1.34 and the General Availability (GA) of In-Place Pod Vertical Scaling in v1.35, the Kubernetes community is thrilled to announce that In-Place Pod-Level Resources Vertical Scaling has graduated to Beta in v1.36!

This feature is now enabled by default via the InPlacePodLevelResourcesVerticalScaling feature gate. It allows users to update the aggregate Pod resource budget (.spec.resources) for a running Pod, often without requiring a container restart.

Why Pod-level in-place resize?

The Pod-level resource model simplified management for complex Pods (such as those with sidecars) by allowing containers to share a collective pool of resources. In v1.36, you can now adjust this aggregate boundary on-the-fly.

This is particularly useful for Pods where containers do not have individual limits defined. These containers automatically scale their effective boundaries to fit the newly resized Pod-level dimensions, allowing you to expand the shared pool during peak demand without manual per-container recalculations.

Resource inheritance and the resizePolicy

When a Pod-level resize is initiated, the Kubelet treats the change as a resize event for every container that inherits its limits from the Pod-level budget. To determine whether a restart is required, the Kubelet consults the resizePolicy defined within individual containers: NotRequired allows the change to be applied in place, while RestartContainer forces a restart of that container.

Note: Currently, resizePolicy is not supported at the Pod level. The Kubelet always defers to individual container settings to decide if an update can be applied in-place or requires a restart.

Example: Scaling a shared resource pool

In this scenario, a Pod is defined with a 2 CPU pod-level limit. Because the individual containers do not have their own limits defined, they share this total pool.

1. Initial Pod specification

apiVersion: v1
kind: Pod
metadata:
  name: shared-pool-app
spec:
  resources: # Pod-level limits
    limits:
      cpu: "2"
      memory: "4Gi"
  containers:
  - name: main-app
    image: my-app:v1
    resizePolicy: [{resourceName: "cpu", restartPolicy: "NotRequired"}]
  - name: sidecar
    image: logger:v1
    resizePolicy: [{resourceName: "cpu", restartPolicy: "NotRequired"}]

2. The resize operation

To double the CPU capacity to 4 CPUs, apply a patch using the resize subresource:

kubectl patch pod shared-pool-app --subresource resize --patch \
 '{"spec":{"resources":{"limits":{"cpu":"4"}}}}'

Node-level reality: feasibility and safety

Applying a resize patch is only the first step. The Kubelet performs several checks and follows a specific sequence to ensure node stability:

1. The feasibility check

Before admitting a resize, the Kubelet verifies if the new aggregate request fits within the Node's allocatable capacity. If the Node is overcommitted, the resize is not ignored; instead, the PodResizePending condition will reflect a Deferred or Infeasible status, providing immediate feedback on why the "envelope" hasn't grown.
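
One way to read that feedback (a quick sketch, assuming the shared-pool-app Pod from the example above) is to query the condition's message directly:

kubectl get pod shared-pool-app \
 -o jsonpath='{.status.conditions[?(@.type=="PodResizePending")].message}{"\n"}'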

2. Update sequencing

To prevent resource "overshoot", the Kubelet coordinates the cgroup updates in a specific order: when the pod's budget grows, the pod-level cgroup is enlarged before any container cgroups; when it shrinks, container cgroups are reduced first so the pod-level limit is never exceeded by its children.

Observability: tracking resize status

With the move to Beta, Kubernetes uses Pod Conditions to track the lifecycle of a resize:

status:
  allocatedResources:
    cpu: "4"
  resources:
    limits:
      cpu: "4"
  conditions:
  - type: PodResizeInProgress
    status: "True"

Constraints and requirements

What's next?

As we move toward General Availability (GA), the community is focusing on Vertical Pod Autoscaler (VPA) Integration, enabling VPA to issue Pod-level resource recommendations and trigger in-place actuation automatically.

Getting started and providing feedback

We encourage you to test this feature and provide feedback via the standard Kubernetes communication channels.

29 Apr 2026

Kubernetes v1.36: Tiered Memory Protection with Memory QoS

On behalf of SIG Node, we are pleased to announce updates to the Memory QoS feature (alpha) in Kubernetes v1.36. Memory QoS uses the cgroup v2 memory controller to give the kernel better guidance on how to treat container memory. It was first introduced in v1.22 and updated in v1.27. In Kubernetes v1.36, we're introducing opt-in memory reservation, tiered protection by QoS class, observability metrics, and a kernel-version warning for memory.high.

What's new in v1.36

Opt-in memory reservation with memoryReservationPolicy

v1.36 separates throttling from reservation. Enabling the feature gate turns on memory.high throttling (the kubelet sets memory.high based on memoryThrottlingFactor, default 0.9), but memory reservation is now controlled by a separate kubelet configuration field, memoryReservationPolicy, which accepts None (the default) or TieredReservation. With TieredReservation:

Guaranteed Pods get hard protection via memory.min. For example, a Guaranteed Pod requesting 512 MiB of memory results in:

$ cat /sys/fs/cgroup/kubepods.slice/kubepods-pod6a4f2e3b_1c9d_4a5e_8f7b_2d3e4f5a6b7c.slice/memory.min
536870912

The kernel will not reclaim this memory under any circumstances. If it cannot honor the guarantee, it invokes the OOM killer on other processes to free pages.

Burstable Pods get soft protection via memory.low. For the same 512 MiB request on a Burstable Pod:

$ cat /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod8b3c7d2e_4f5a_6b7c_9d1e_3f4a5b6c7d8e.slice/memory.low
536870912

The kernel avoids reclaiming this memory under normal pressure, but may reclaim it if the alternative is a system-wide OOM.

BestEffort Pods get neither memory.min nor memory.low. Their memory remains fully reclaimable.

Comparison with v1.27 behavior

In earlier versions, enabling the MemoryQoS feature gate immediately set memory.min for every container with a memory request. memory.min is a hard reservation that the kernel will not reclaim, regardless of memory pressure.

Consider a node with 8 GiB of RAM where Burstable Pod requests total 7 GiB. In earlier versions, that 7 GiB would be locked as memory.min, leaving little headroom for the kernel, system daemons, or BestEffort workloads and increasing the risk of OOM kills.

With v1.36 tiered reservation, those Burstable requests map to memory.low instead of memory.min. Under normal pressure, the kernel still protects that memory, but under extreme pressure it can reclaim part of it to avoid system-wide OOM. Only Guaranteed Pods use memory.min, which keeps hard reservation lower.

With memoryReservationPolicy in v1.36, you can enable throttling first, observe workload behavior, and opt into reservation when your node has enough headroom.

Observability metrics

Two alpha-stability metrics are exposed on the kubelet /metrics endpoint:

kubelet_memory_qos_node_memory_min_bytes: total memory.min across Guaranteed Pods
kubelet_memory_qos_node_memory_low_bytes: total memory.low across Burstable Pods

These are useful for capacity planning. If kubelet_memory_qos_node_memory_min_bytes is creeping toward your node's physical memory, you know hard reservation is getting tight.

$ curl -sk https://localhost:10250/metrics | grep memory_qos
# HELP kubelet_memory_qos_node_memory_min_bytes [ALPHA] Total memory.min in bytes for Guaranteed pods
kubelet_memory_qos_node_memory_min_bytes 5.36870912e+08
# HELP kubelet_memory_qos_node_memory_low_bytes [ALPHA] Total memory.low in bytes for Burstable pods
kubelet_memory_qos_node_memory_low_bytes 2.147483648e+09

Kernel version check

On kernels older than 5.9, memory.high throttling can trigger the kernel livelock issue. The bug was fixed in kernel 5.9. In v1.36, when the feature gate is enabled, the kubelet checks the kernel version at startup and logs a warning if it is below 5.9. The feature continues to work - this is informational, not a hard block.
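
To check where a node stands before enabling the feature, you can query its kernel version directly; the output below is just an example (any kernel at 5.9 or newer includes the memory.high fix):

$ uname -r
5.15.0-130-generic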

How Kubernetes maps Memory QoS to cgroup v2

Memory QoS uses four cgroup v2 memory controller interfaces: memory.min, memory.low, memory.high, and memory.max.

The following mapping shows how Kubernetes container resources translate to cgroup v2 interfaces when memoryReservationPolicy: TieredReservation is configured. With the default memoryReservationPolicy: None, no memory.min or memory.low values are set.

Guaranteed: memory.min is set to requests.memory (hard protection); memory.low is not set; memory.high is not set (requests equal limits, so throttling is not useful); memory.max is set to limits.memory.

Burstable: memory.min is not set; memory.low is set to requests.memory (soft protection); memory.high is calculated from the formula with the throttling factor; memory.max is set to limits.memory (if specified).

BestEffort: memory.min and memory.low are not set; memory.high is calculated based on node allocatable memory; memory.max is not set.
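
For reference, the throttling formula as described in the earlier v1.27 Memory QoS update is roughly the following (details may still change while the feature is alpha):

memory.high = floor[(requests.memory + memoryThrottlingFactor * (limits.memory or node allocatable memory - requests.memory)) / pageSize] * pageSize

For example, a Burstable container requesting 512 MiB with a 1 GiB limit and the default factor of 0.9 gets memory.high of roughly 512Mi + 0.9 * (1024Mi - 512Mi) ≈ 972.8 MiB, rounded down to a page boundary.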

Cgroup hierarchy

cgroup v2 requires that a parent cgroup's memory protection is at least as large as the sum of its children's. The kubelet maintains this by setting memory.min on the kubepods root cgroup to the sum of all Guaranteed and Burstable Pod memory requests, and memory.low on the Burstable QoS cgroup to the sum of all Burstable Pod memory requests. This way the kernel can enforce the per-container and per-pod protection values correctly.
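
You can observe this aggregation on a node by reading the parent cgroup files; the paths follow the systemd cgroup driver layout used in the earlier examples, and the values shown are illustrative:

# Sum of memory requests for all Guaranteed and Burstable Pods on the node
$ cat /sys/fs/cgroup/kubepods.slice/memory.min
2684354560
# Sum of memory requests for all Burstable Pods on the node
$ cat /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/memory.low
2147483648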

The kubelet manages pod-level and QoS-class cgroups directly using the runc libcontainer library, while container-level cgroups are managed by the container runtime (containerd or CRI-O).

How do I use it?

Prerequisites

  1. Kubernetes v1.36 or later
  2. Linux with cgroup v2. Kernel 5.9 or higher is recommended - earlier kernels work but may experience the livelock issue. You can verify that cgroup v2 is active by running mount | grep cgroup2 (see the example after this list).
  3. A container runtime that supports cgroup v2 (containerd 1.6+, CRI-O 1.22+)
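
For example, on a node where cgroup v2 is mounted, the check looks like this (mount options vary by distribution):

$ mount | grep cgroup2
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)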

Configuration

To enable Memory QoS with tiered protection:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  MemoryQoS: true
memoryReservationPolicy: TieredReservation # Options: None (default), TieredReservation
memoryThrottlingFactor: 0.9 # Optional: default is 0.9

If you want memory.high throttling without memory protection, omit memoryReservationPolicy or set it to None:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  MemoryQoS: true
memoryReservationPolicy: None # This is the default

How can I learn more?

Getting involved

This feature is driven by SIG Node. If you are interested in contributing or have feedback, you can find us on Slack (#sig-node), the mailing list, or at the regular SIG Node meetings. Please file bugs at kubernetes/kubernetes and enhancement proposals at kubernetes/enhancements.
