30 Apr 2026

Kubernetes Blog

Kubernetes v1.36: In-Place Vertical Scaling for Pod-Level Resources Graduates to Beta

Following the graduation of Pod-Level Resources to Beta in v1.34 and the General Availability (GA) of In-Place Pod Vertical Scaling in v1.35, the Kubernetes community is thrilled to announce that In-Place Pod-Level Resources Vertical Scaling has graduated to Beta in v1.36!

This feature is now enabled by default via the InPlacePodLevelResourcesVerticalScaling feature gate. It allows users to update the aggregate Pod resource budget (.spec.resources) for a running Pod, often without requiring a container restart.

Why Pod-level in-place resize?

The Pod-level resource model simplified management for complex Pods (such as those with sidecars) by allowing containers to share a collective pool of resources. In v1.36, you can now adjust this aggregate boundary on-the-fly.

This is particularly useful for Pods where containers do not have individual limits defined. These containers automatically scale their effective boundaries to fit the newly resized Pod-level dimensions, allowing you to expand the shared pool during peak demand without manual per-container recalculations.

Resource inheritance and the resizePolicy

When a Pod-level resize is initiated, the Kubelet treats the change as a resize event for every container that inherits its limits from the Pod-level budget. To determine whether a restart is required, the Kubelet consults the resizePolicy defined within individual containers: a restartPolicy of NotRequired (the default) allows the resource to be updated in place, while RestartContainer restarts that container whenever the resource changes.

Note: Currently, resizePolicy is not supported at the Pod level. The Kubelet always defers to individual container settings to decide if an update can be applied in-place or requires a restart.

Example: scaling a shared resource pool

In this scenario, a Pod is defined with a 2 CPU pod-level limit. Because the individual containers do not have their own limits defined, they share this total pool.

1. Initial Pod specification

apiVersion: v1
kind: Pod
metadata:
  name: shared-pool-app
spec:
  resources: # Pod-level limits
    limits:
      cpu: "2"
      memory: "4Gi"
  containers:
  - name: main-app
    image: my-app:v1
    resizePolicy: [{resourceName: "cpu", restartPolicy: "NotRequired"}]
  - name: sidecar
    image: logger:v1
    resizePolicy: [{resourceName: "cpu", restartPolicy: "NotRequired"}]

2. The resize operation

To double the CPU capacity to 4 CPUs, apply a patch using the resize subresource:

kubectl patch pod shared-pool-app --subresource resize --patch \
 '{"spec":{"resources":{"limits":{"cpu":"4"}}}}'
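
The patch updates the Pod's desired pod-level limits immediately, while actuation on the node happens asynchronously. One quick way to confirm what was requested (using the example Pod above):

kubectl get pod shared-pool-app -o jsonpath='{.spec.resources.limits.cpu}{"\n"}'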

Node-level reality: feasibility and safety

Applying a resize patch is only the first step. The Kubelet performs several checks and follows a specific sequence to ensure node stability:

1. The feasibility check

Before admitting a resize, the Kubelet verifies if the new aggregate request fits within the Node's allocatable capacity. If the Node is overcommitted, the resize is not ignored; instead, the PodResizePending condition will reflect a Deferred or Infeasible status, providing immediate feedback on why the "envelope" hasn't grown.
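
You can read this condition directly from the Pod status. For example (the condition is only present while a resize cannot be satisfied immediately):

kubectl get pod shared-pool-app -o jsonpath='{.status.conditions[?(@.type=="PodResizePending")].reason}{"\n"}'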

2. Update sequencing

To prevent resource "overshoot," the Kubelet coordinates the cgroup updates in a specific order: when the Pod's resources are increased, the pod-level cgroup is enlarged before the per-container cgroups; when they are decreased, containers are shrunk before the pod-level cgroup, so that container usage never exceeds the pod-level boundary.

Observability: tracking resize status

With the move to Beta, Kubernetes uses Pod Conditions to track the lifecycle of a resize:

status:
  allocatedResources:
    cpu: "4"
  resources:
    limits:
      cpu: "4"
  conditions:
  - type: PodResizeInProgress
    status: "True"

Constraints and requirements

What's next?

As we move toward General Availability (GA), the community is focusing on Vertical Pod Autoscaler (VPA) Integration, enabling VPA to issue Pod-level resource recommendations and trigger in-place actuation automatically.

Getting started and providing feedback

We encourage you to test this feature and provide feedback via the standard Kubernetes communication channels:

30 Apr 2026 6:35pm GMT

29 Apr 2026

Kubernetes Blog

Kubernetes v1.36: Tiered Memory Protection with Memory QoS

On behalf of SIG Node, we are pleased to announce updates to the Memory QoS feature (alpha) in Kubernetes v1.36. Memory QoS uses the cgroup v2 memory controller to give the kernel better guidance on how to treat container memory. It was first introduced in v1.22 and updated in v1.27. In Kubernetes v1.36, we're introducing: opt-in memory reservation, tiered protection by QoS class, observability metrics, and kernel-version warning for memory.high.

What's new in v1.36

Opt-in memory reservation with memoryReservationPolicy

v1.36 separates throttling from reservation. Enabling the feature gate turns on memory.high throttling (the kubelet sets memory.high based on memoryThrottlingFactor, default 0.9), while memory reservation is now controlled by a separate kubelet configuration field, memoryReservationPolicy. With memoryReservationPolicy: TieredReservation, protection is applied per QoS class:

Guaranteed Pods get hard protection via memory.min. For example, a Guaranteed Pod requesting 512 MiB of memory results in:

$ cat /sys/fs/cgroup/kubepods.slice/kubepods-pod6a4f2e3b_1c9d_4a5e_8f7b_2d3e4f5a6b7c.slice/memory.min
536870912

The kernel will not reclaim this memory under any circumstances. If it cannot honor the guarantee, it invokes the OOM killer on other processes to free pages.

Burstable Pods get soft protection via memory.low. For the same 512 MiB request on a Burstable Pod:

$ cat /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod8b3c7d2e_4f5a_6b7c_9d1e_3f4a5b6c7d8e.slice/memory.low
536870912

The kernel avoids reclaiming this memory under normal pressure, but may reclaim it if the alternative is a system-wide OOM.

BestEffort Pods get neither memory.min nor memory.low. Their memory remains fully reclaimable.

Comparison with v1.27 behavior

In earlier versions, enabling the MemoryQoS feature gate immediately set memory.min for every container with a memory request. memory.min is a hard reservation that the kernel will not reclaim, regardless of memory pressure.

Consider a node with 8 GiB of RAM where Burstable Pod requests total 7 GiB. In earlier versions, that 7 GiB would be locked as memory.min, leaving little headroom for the kernel, system daemons, or BestEffort workloads and increasing the risk of OOM kills.

With v1.36 tiered reservation, those Burstable requests map to memory.low instead of memory.min. Under normal pressure, the kernel still protects that memory, but under extreme pressure it can reclaim part of it to avoid system-wide OOM. Only Guaranteed Pods use memory.min, which keeps hard reservation lower.

With memoryReservationPolicy in v1.36, you can enable throttling first, observe workload behavior, and opt into reservation when your node has enough headroom.

Observability metrics

Two alpha-stability metrics are exposed on the kubelet /metrics endpoint:

Metric | Description
kubelet_memory_qos_node_memory_min_bytes | Total memory.min across Guaranteed Pods
kubelet_memory_qos_node_memory_low_bytes | Total memory.low across Burstable Pods

These are useful for capacity planning. If kubelet_memory_qos_node_memory_min_bytes is creeping toward your node's physical memory, you know hard reservation is getting tight.

$ curl -sk https://localhost:10250/metrics | grep memory_qos
# HELP kubelet_memory_qos_node_memory_min_bytes [ALPHA] Total memory.min in bytes for Guaranteed pods
kubelet_memory_qos_node_memory_min_bytes 5.36870912e+08
# HELP kubelet_memory_qos_node_memory_low_bytes [ALPHA] Total memory.low in bytes for Burstable pods
kubelet_memory_qos_node_memory_low_bytes 2.147483648e+09

Kernel version check

On kernels older than 5.9, memory.high throttling can trigger the kernel livelock issue. The bug was fixed in kernel 5.9. In v1.36, when the feature gate is enabled, the kubelet checks the kernel version at startup and logs a warning if it is below 5.9. The feature continues to work - this is informational, not a hard block.

How Kubernetes maps Memory QoS to cgroup v2

Memory QoS uses four cgroup v2 memory controller interfaces: memory.min, memory.low, memory.high, and memory.max.

The following table shows how Kubernetes container resources map to cgroup v2 interfaces when memoryReservationPolicy: TieredReservation is configured. With the default memoryReservationPolicy: None, no memory.min or memory.low values are set.

QoS Class | memory.min | memory.low | memory.high | memory.max
Guaranteed | Set to requests.memory (hard protection) | Not set | Not set (requests == limits, so throttling is not useful) | Set to limits.memory
Burstable | Not set | Set to requests.memory (soft protection) | Calculated from the throttling-factor formula | Set to limits.memory (if specified)
BestEffort | Not set | Not set | Calculated based on node allocatable memory | Not set
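
As a rough illustration of the Burstable memory.high calculation (the numbers below assume a container with a 1 GiB request, a 2 GiB limit, and the default memoryThrottlingFactor of 0.9; the actual value is rounded down to a multiple of the kernel page size):

# memory.high ≈ requests.memory + memoryThrottlingFactor * (limits.memory - requests.memory)
#             ≈ 1Gi + 0.9 * (2Gi - 1Gi)
#             ≈ 1.9Gi  (throttling begins well before the 2Gi hard limit is reached)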

Cgroup hierarchy

cgroup v2 requires that a parent cgroup's memory protection is at least as large as the sum of its children's. The kubelet maintains this by setting memory.min on the kubepods root cgroup to the sum of all Guaranteed and Burstable Pod memory requests, and memory.low on the Burstable QoS cgroup to the sum of all Burstable Pod memory requests. This way the kernel can enforce the per-container and per-pod protection values correctly.

The kubelet manages pod-level and QoS-class cgroups directly using the runc libcontainer library, while container-level cgroups are managed by the container runtime (containerd or CRI-O).

How do I use it?

Prerequisites

  1. Kubernetes v1.36 or later
  2. Linux with cgroup v2. Kernel 5.9 or higher is recommended - earlier kernels work but may experience the livelock issue. You can verify cgroup v2 is active by running mount | grep cgroup2, or with the quick check shown after this list.
  3. A container runtime that supports cgroup v2 (containerd 1.6+, CRI-O 1.22+)
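
A quick way to confirm the kernel and cgroup prerequisites on a node (the second command should print cgroup2fs when cgroup v2 is in use):

# Kernel version: 5.9 or newer avoids the memory.high livelock issue
uname -r
# cgroup v2: the unified hierarchy reports cgroup2fs
stat -fc %T /sys/fs/cgroup/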

Configuration

To enable Memory QoS with tiered protection:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
 MemoryQoS: true
memoryReservationPolicy: TieredReservation # Options: None (default), TieredReservation
memoryThrottlingFactor: 0.9 # Optional: default is 0.9

If you want memory.high throttling without memory protection, omit memoryReservationPolicy or set it to None:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
 MemoryQoS: true
memoryReservationPolicy: None  # This is the default

How can I learn more?

Getting involved

This feature is driven by SIG Node. If you are interested in contributing or have feedback, you can find us on Slack (#sig-node), the mailing list, or at the regular SIG Node meetings. Please file bugs at kubernetes/kubernetes and enhancement proposals at kubernetes/enhancements.

29 Apr 2026 6:35pm GMT

28 Apr 2026

Kubernetes Blog

Kubernetes v1.36: Staleness Mitigation and Observability for Controllers

Staleness in Kubernetes controllers is a problem that affects many controllers, and it can influence controller behavior in subtle ways. It is usually not discovered until it is too late, when a controller in production has already taken incorrect action because of an assumption the controller author made about cache freshness. Issues caused by staleness include controllers taking incorrect actions, controllers not taking action when they should, and controllers taking too long to act. I am excited to announce that Kubernetes v1.36 includes new features that help mitigate staleness in controllers and provide better observability into controller behavior.

What is staleness?

Staleness in controllers comes from an outdated view of the world inside the controller's cache. To provide a fast user experience, controllers typically maintain a local cache of cluster state, populated by watching the Kubernetes API server for changes to the objects the controller cares about. When the controller needs to take action, it first consults this cache rather than reading directly from the API server, relying on the watch stream to keep the cache current. Acting on that cached state to drive the cluster toward the desired state is known as reconciliation.

However, there are cases where the controller's cache may be outdated. For example, when the controller restarts it must rebuild its cache from the API server; until that completes, its view of the cluster is incomplete and it cannot safely take action. Similarly, if the API server is unavailable, the cache stops receiving updates. These are just a few of the situations in which the controller's cache can fall out of date.

Improvements in 1.36

Kubernetes v1.36 includes improvements in both client-go as well as implementations of highly contended controllers in kube-controller-manager, using those client-go improvements.

client-go improvements

In client-go, the project added atomic FIFO processing (feature gate name AtomicFIFO), built on top of the existing FIFO queue implementation. The new approach allows the queue to atomically handle operations that are received in batches, such as the initial set of objects from a list operation that an informer uses to populate its cache. This ensures that the queue is always in a consistent state, even when events arrive out of order. Previously, events were added to the queue in the order they were received, which could leave the cache in an inconsistent state that did not accurately reflect the state of the cluster.

To take advantage of this, clients using client-go can now introspect the cache to determine the latest resource version that the controller cache has seen, via the newly added LastStoreSyncResourceVersion() function on the Store interface. This function is the basis for the staleness mitigation features in kube-controller-manager.

kube-controller-manager improvements

In kube-controller-manager, the v1.36 release adds this new capability to four controllers:

  1. DaemonSet controller
  2. StatefulSet controller
  3. ReplicaSet controller
  4. Job controller

These controllers all act on pods, which in most cases are under the highest amount of contention in a cluster. The changes are on by default for these controllers, and can be disabled by setting the feature gates StaleControllerConsistency<API type> to false for the specific controller you wish to disable it for. For example, to disable the feature for the DaemonSet controller, you would set the feature gate StaleControllerConsistencyDaemonSet to false.

When the relevant feature gate is enabled, the controller will first check the latest resource version of the cache before taking action. If the latest resource version of the cache is lower than what the controller has written to the API server for the object it is trying to reconcile, the controller will not take action. This is because the controller's cache is outdated, and it does not have the latest information about the state of the cluster.

Use for informer authors

Informer authors using client-go can also take advantage of these improvements immediately; the ReplicaSet controller's implementation is a good example of checking whether the informer's cache is stale before taking action. The client-go library provides a ConsistencyStore data structure that queries the store and compares the latest resource version of the cache with the resource version the controller last wrote for the object.

The ReplicaSet controller tracks both the ReplicaSet's resource version and the resource versions of the Pods that the ReplicaSet manages: for a specific ReplicaSet, it records the latest written resource version of the Pods it owns as well as any writes to the ReplicaSet itself. If the cache has not yet caught up to those written resource versions, the controller skips the sync rather than acting on stale data.

An informer author can use the ConsistencyStore to track the latest resource version of the objects that the informer cares about. It provides 3 main functions:

type ConsistencyStore interface {
    // WroteAt records that the given object was written at the given resource version.
    WroteAt(owningObj runtime.Object, uid types.UID, groupResource schema.GroupResource, resourceVersion string)

    // EnsureReady returns true if the cache is up to date for the given object.
    // It is used prior to reconciliation to decide whether to reconcile or not.
    EnsureReady(namespacedName types.NamespacedName) bool

    // Clear removes the given object from the consistency store.
    // It is used when an object is deleted.
    Clear(namespacedName types.NamespacedName, uid types.UID)
}

  1. WroteAt: This function is called by the controller when it writes to the API server for an object. It is used to record the latest resource version of the object that the controller has written to the API server. The owningObj is the object that the controller is reconciling, and the uid is the UID of that object. The resource version and GroupResource are the resource version and GroupResource of the object that the controller has written to the API server. The object is not explicitly tracked, since the controller only cares about waiting to catch up to the latest resource version of the written object.
  2. EnsureReady: This function is called by the controller to ensure that the cache is up to date for the object. It is used prior to reconciliation to decide whether to reconcile or not. It returns true if the cache is up to date for the object, and false otherwise. It will use the information provided by WroteAt to determine if the cache is up to date.
  3. Clear: This function is called by the controller when an object is deleted. It is used to remove the object from the consistency store. This is mostly used for cleanup when an object is deleted to prevent the consistency store from growing indefinitely.

The UID is used to distinguish between different objects that have the same name, such as when an object is deleted and then recreated. It is not needed for EnsureReady because the consistency store is only concerned with catching up to the latest resource version of the object, not the specific object. It is primarily used to ensure that the controller doesn't delete the entry for an object when it is recreated with a new UID.

With these 3 functions, an informer author can implement staleness mitigation in their controller.
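
As an illustration, here is a minimal sketch of where those calls might sit in a sync function. The ConsistencyStore interface is restated locally because its final import path in client-go is not shown above; reconcile, its parameters, and writeToAPIServer are hypothetical stand-ins for a real controller's sync logic and API writes:

package example

import (
    "context"
    "fmt"

    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/apimachinery/pkg/types"
)

// ConsistencyStore restates the client-go interface described above for this sketch.
type ConsistencyStore interface {
    WroteAt(owningObj runtime.Object, uid types.UID, groupResource schema.GroupResource, resourceVersion string)
    EnsureReady(namespacedName types.NamespacedName) bool
    Clear(namespacedName types.NamespacedName, uid types.UID)
}

// reconcile sketches a single sync pass. writeToAPIServer stands in for whatever
// create/update calls the controller performs; it returns the written resource version.
func reconcile(ctx context.Context, cs ConsistencyStore, obj runtime.Object, uid types.UID,
    key types.NamespacedName, writeToAPIServer func(context.Context) (string, error)) error {

    // 1. Skip the sync if the informer cache has not caught up with our own earlier writes.
    if !cs.EnsureReady(key) {
        return fmt.Errorf("cache for %s is stale relative to our last write; requeue", key)
    }

    // 2. Do the actual reconciliation write (for example, creating Pods for a ReplicaSet).
    rv, err := writeToAPIServer(ctx)
    if err != nil {
        return err
    }

    // 3. Record the write so the next sync waits until the cache has observed it.
    cs.WroteAt(obj, uid, schema.GroupResource{Resource: "pods"}, rv)
    return nil
}

On a skipped sync, a real controller would typically requeue the key with a short delay rather than surfacing an error.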

Observability

In addition to the staleness mitigation features, the Kubernetes project has also added related instrumentation to kube-controller-manager in 1.36. These metrics are also enabled by default, and are controlled using the same set of feature gates.

Metrics

The following alpha metrics have been added to kube-controller-manager in 1.36:

stale_sync_skips_total: The number of times the controller has skipped a sync due to stale cache. This metric is exposed for each controller that uses the staleness mitigation feature with the subsystem of the controller.

This metric is exposed by the kube-controller-manager metrics endpoint, and can be used to monitor the health of the controller.

Along with this metric, client-go also emits metrics that expose the latest resource version of every shared informer, using the informer's subsystem. This lets you see the latest resource version of each informer and use it to determine whether the controller's cache is stale, which is especially useful when comparing against the resource version reported by the API server.

This metric is named store_resource_version and has the Group, Version, and Resource as labels.
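
For example, assuming you can reach the kube-controller-manager's secure metrics endpoint (port 10257 by default) with a token that is authorized to read it, a quick check might look like:

curl -sk --header "Authorization: Bearer $TOKEN" https://localhost:10257/metrics \
  | grep -E 'stale_sync_skips_total|store_resource_version'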

What's next?

Kubernetes SIG API Machinery is excited to continue working on this feature and hope to bring it to more controllers in the future. We are also interested in hearing your feedback on this feature. Please let us know what you think in the comments below or by opening an issue on the Kubernetes GitHub repository.

We are also working with controller-runtime to enable this set of semantics for all controllers built with it. This will allow any such controller to gain the benefits of read-your-own-writes semantics without having to implement the logic itself.

28 Apr 2026 6:35pm GMT

27 Apr 2026

Kubernetes Blog

Kubernetes v1.36: Mutable Pod Resources for Suspended Jobs (beta)

Kubernetes v1.36 promotes the ability to modify container resource requests and limits in the pod template of a suspended Job to beta. First introduced as alpha in v1.35, this feature allows queue controllers and cluster administrators to adjust CPU, memory, GPU, and extended resource specifications on a Job while it is suspended, before it starts or resumes running.

Why mutable pod resources for suspended Jobs?

Batch and machine learning workloads often have resource requirements that are not precisely known at Job creation time. The optimal resource allocation depends on current cluster capacity, queue priorities, and the availability of specialized hardware like GPUs.

Before this feature, resource requirements in a Job's pod template were immutable once set. If a queue controller like Kueue determined that a suspended Job should run with different resources, the only option was to delete and recreate the Job, losing any associated metadata, status, or history. This feature also provides a way to let a specific Job instance for a CronJob progress slowly with reduced resources, rather than outright failing to run if the cluster is heavily loaded.

Consider a machine learning training Job initially requesting 4 GPUs:

apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
  labels:
    app.kubernetes.io/name: trainer
spec:
  suspend: true
  template:
    metadata:
      annotations:
        kubernetes.io/description: "ML training, ID abcd123"
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
          limits:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
      restartPolicy: Never

A queue controller managing cluster resources might determine that only 2 GPUs are available. With this feature, the controller can update the Job's resource requests before resuming it:

apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
  labels:
    app.kubernetes.io/name: trainer
spec:
  suspend: true
  template:
    metadata:
      annotations:
        kubernetes.io/description: "ML training, ID abcd123"
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
          limits:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
      restartPolicy: Never

Once the resources are updated, the controller resumes the Job by setting spec.suspend to false, and the new Pods are created with the adjusted resource specifications.

How it works

The Kubernetes API server relaxes the immutability constraint on pod template resource fields specifically for suspended Jobs. No new API types have been introduced; the existing Job and pod template structures accommodate the change through relaxed validation.

The mutable fields are:

Resource updates are permitted when the following conditions are met:

  1. The Job has spec.suspend set to true.
  2. For a Job that was previously running and then suspended, all active Pods must have terminated (status.active equals 0) before resource mutations are accepted.

Standard resource validation still applies. For example, resource limits must be greater than or equal to requests, and extended resources must be specified as whole numbers where required.

What's new in beta

With the promotion to beta in Kubernetes v1.36, the MutablePodResourcesForSuspendedJobs feature gate is enabled by default. This means clusters running v1.36 can use this feature without any additional configuration on the API server.

Try it out

If your cluster is running Kubernetes v1.36 or later, this feature is available by default. For v1.35 clusters, enable the MutablePodResourcesForSuspendedJobs feature gate on the kube-apiserver.

You can test it by creating a suspended Job, updating its container resources using kubectl edit or a controller, and then resuming the Job:

# Create a suspended Job
kubectl apply -f my-job.yaml --server-side

# Edit the resource requests
kubectl edit job training-job-example-abcd123

# Resume the Job
kubectl patch job training-job-example-abcd123 -p '{"spec":{"suspend":false}}'
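
If you prefer not to open an editor, the same resource change can be applied with a patch while the Job is suspended (the container is matched by name; the values here mirror the example above):

kubectl patch job training-job-example-abcd123 --type=strategic --patch \
  '{"spec":{"template":{"spec":{"containers":[{"name":"trainer","resources":{"requests":{"cpu":"4","memory":"16Gi"},"limits":{"cpu":"4","memory":"16Gi"}}}]}}}}'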

Considerations

Running Jobs that are suspended

If you suspend a Job that was already running, you must wait for all of that Job's active Pods to terminate before modifying resources. The API server rejects resource mutations while status.active is greater than zero. This prevents inconsistency between running Pods and the updated pod template.

Pod replacement policy

When using this feature with Jobs that may have failed Pods, consider setting podReplacementPolicy: Failed. This ensures that replacement Pods are only created after the previous Pods have fully terminated, preventing resource contention from overlapping Pods.

ResourceClaims

Dynamic Resource Allocation (DRA) resourceClaimTemplates remain immutable. If your workload uses DRA, you must recreate the claim templates separately to match any resource changes.

Getting involved

This feature was developed by SIG Apps with input from WG Batch. Both groups welcome feedback as the feature progresses toward stable.

You can reach out through:

27 Apr 2026 6:35pm GMT

24 Apr 2026

Kubernetes Blog

Kubernetes v1.36: Fine-Grained Kubelet API Authorization Graduates to GA

On behalf of Kubernetes SIG Auth and SIG Node, we are pleased to announce the graduation of fine-grained kubelet API authorization to General Availability (GA) in Kubernetes v1.36!

The KubeletFineGrainedAuthz feature gate was introduced as an opt-in alpha feature in Kubernetes v1.32, then graduated to beta (enabled by default) in v1.33. Now, the feature is generally available and the feature gate is locked to enabled. This feature enables more precise, least-privilege access control over the kubelet's HTTPS API, replacing the need to grant the overly broad nodes/proxy permission for common monitoring and observability use cases.

Motivation: the nodes/proxy problem

The kubelet exposes an HTTPS endpoint with several APIs that give access to data of varying sensitivity, including pod listings, node metrics, container logs, and, critically, the ability to execute commands inside running containers.

Prior to this feature, kubelet authorization used a coarse-grained model. When webhook authorization was enabled, almost all kubelet API paths were mapped to a single nodes/proxy subresource. This meant that any workload needing to read metrics or health status from the kubelet required nodes/proxy permission, the same permission that also grants the ability to execute arbitrary commands in any container running on the node.

What's wrong with that?

Granting nodes/proxy to monitoring agents, log collectors, or health-checking tools violates the principle of least privilege. If any of those workloads were compromised, an attacker would gain the ability to run commands in every container on the node. The nodes/proxy permission is effectively a node-level superuser capability, and granting it broadly dramatically increases the blast radius of a security incident.

This problem has been well understood in the community for years (see kubernetes/kubernetes#83465), and was the driving motivation behind this enhancement KEP-2862.

The nodes/proxy GET WebSocket RCE risk

The situation is more severe than it might appear at first glance. Security researchers demonstrated in early 2026 that nodes/proxy GET alone, which is the minimal read-only permission routinely granted to monitoring tools, can be abused to execute commands in any pod on reachable nodes.

The root cause is a mismatch between how WebSocket connections work and how the kubelet maps HTTP methods to RBAC verbs. The WebSocket protocol (RFC 6455) requires an HTTP GET request for the initial connection handshake. The kubelet maps this GET to the RBAC get verb and authorizes the request without performing a secondary check to confirm that CREATE permission is also present for the write operation that follows. Using a WebSocket client like websocat, an attacker can reach the kubelet's /exec endpoint directly on port 10250 and execute arbitrary commands:

websocat --insecure \
 --header "Authorization: Bearer $TOKEN" \
 --protocol v4.channel.k8s.io \
 "wss://$NODE_IP:10250/exec/default/nginx/nginx?output=1&error=1&command=id"

uid=0(root) gid=0(root) groups=0(root)

Fine-grained kubelet authorization: how it works

With KubeletFineGrainedAuthz, the kubelet now performs an additional, more specific authorization check before falling back to the nodes/proxy subresource. Several commonly used kubelet API paths are mapped to their own dedicated subresources:

kubelet API | Resource | Subresource
/stats/* | nodes | stats
/metrics/* | nodes | metrics
/logs/* | nodes | log
/pods | nodes | pods, proxy
/runningPods/ | nodes | pods, proxy
/healthz | nodes | healthz, proxy
/configz | nodes | configz, proxy
/spec/* | nodes | spec
/checkpoint/* | nodes | checkpoint
all others | nodes | proxy

For the endpoints that now have fine-grained subresources (/pods, /runningPods/, /healthz, /configz), the kubelet first sends a SubjectAccessReview for the specific subresource. If that check succeeds, the request is authorized. If it fails, the kubelet retries with the coarse-grained nodes/proxy subresource for backward compatibility.

This dual-check approach ensures a smooth migration path. Existing workloads with nodes/proxy permissions continue to work, while new deployments can adopt least-privilege access from day one.

What this means in practice

Consider a Prometheus node exporter or a monitoring DaemonSet that needs to scrape /metrics from the kubelet. Previously, you would need an RBAC ClusterRole like this:

# Old approach: overly broad
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
 name: monitoring-agent
rules:
- apiGroups: [""]
 resources: ["nodes/proxy"]
 verbs: ["get"]

This grants the monitoring agent far more access than it needs. With fine-grained authorization, you can now scope the permissions precisely:

# New approach: least privilege
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
 name: monitoring-agent
rules:
- apiGroups: [""]
 resources: ["nodes/metrics", "nodes/stats"]
 verbs: ["get"]

The monitoring agent can now read metrics and stats from the kubelet without ever being able to execute commands in containers.
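
You can verify the effect of such a role with kubectl auth can-i. For example, assuming the role above is bound to a monitoring-agent ServiceAccount in a monitoring namespace (the names here are illustrative):

kubectl auth can-i get nodes/metrics --as=system:serviceaccount:monitoring:monitoring-agent
kubectl auth can-i get nodes/proxy --as=system:serviceaccount:monitoring:monitoring-agent

The first command should return yes and the second no, confirming the agent can scrape metrics but cannot reach the proxy (and therefore exec) endpoints.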

Updated system:kubelet-api-admin ClusterRole

When RBAC authorization is enabled, the built-in system:kubelet-api-admin ClusterRole is automatically updated to include permissions for all the new fine-grained subresources. This ensures that cluster administrators who already use this role, including the API server's kubelet client, continue to have full access without any manual configuration changes.

The role now includes permissions for:

Upgrade considerations

Because the kubelet performs a dual authorization check (fine-grained first, then falling back to nodes/proxy), upgrading to v1.36 should be seamless for most clusters:

Verifying the feature is enabled

You can confirm that the feature is active on a given node by checking the kubelet metrics endpoint. Since the metrics endpoint on port 10250 requires authorization, you'll first need to create appropriate RBAC bindings for the pod or ServiceAccount making the request.

Step 1: Create a ServiceAccount and ClusterRole

apiVersion: v1
kind: ServiceAccount
metadata:
  name: kubelet-metrics-checker
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubelet-metrics-reader
rules:
- apiGroups: [""]
  resources: ["nodes/metrics"]
  verbs: ["get"]

Step 2: Bind the ClusterRole to the ServiceAccount

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubelet-metrics-checker
subjects:
- kind: ServiceAccount
  name: kubelet-metrics-checker
  namespace: default
roleRef:
  kind: ClusterRole
  name: kubelet-metrics-reader
  apiGroup: rbac.authorization.k8s.io

Apply both manifests:

kubectl apply -f serviceaccount.yaml
kubectl apply -f clusterrole.yaml
kubectl apply -f clusterrolebinding.yaml

Step 3: Run a pod with the ServiceAccount and check the feature flag

kubectl run kubelet-check \
 --image=curlimages/curl \
 --overrides='{"spec":{"serviceAccountName":"kubelet-metrics-checker"}}' \
 --restart=Never \
 --rm -it \
 -- sh

Then, from within the pod, query the kubelet metrics endpoint on the node, substituting the node's IP for $NODE_IP (see the note below):

# Get the token
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)

# Query the kubelet metrics and filter for the feature gate
curl -sk \
 --header "Authorization: Bearer $TOKEN" \
 https://$NODE_IP:10250/metrics \
 | grep kubernetes_feature_enabled \
 | grep KubeletFineGrainedAuthz

If the feature is enabled, you should see output like:

kubernetes_feature_enabled{name="KubeletFineGrainedAuthz",stage="GA"} 1

Note: Replace $NODE_IP with the IP address of the node you want to check. You can retrieve node IPs with kubectl get nodes -o wide.

The journey from alpha to GA

Release | Stage | Details
v1.32 | Alpha | Feature gate KubeletFineGrainedAuthz introduced, disabled by default
v1.33 | Beta | Enabled by default; fine-grained checks for /pods, /runningPods/, /healthz, /configz
v1.36 | GA | Feature gate locked to enabled; fine-grained kubelet authorization is always active

What's next?

With fine-grained kubelet authorization now GA, the Kubernetes community can begin recommending and eventually enforcing the use of specific subresources instead of nodes/proxy for monitoring and observability workloads. The urgency of this migration is underscored by research showing that nodes/proxy GET can be abused for unlogged remote code execution via the WebSocket protocol. This risk is present in the default RBAC configurations of dozens of widely deployed Helm charts. Over time, we expect:

Getting involved

This enhancement was driven by SIG Auth and SIG Node. If you are interested in contributing to the security and authorization features of Kubernetes, please join us:

We look forward to hearing your feedback and experiences with this feature!

24 Apr 2026 6:35pm GMT

23 Apr 2026

Kubernetes Blog

Kubernetes v1.36: User Namespaces in Kubernetes are finally GA

After several years of development, User Namespaces support in Kubernetes reached General Availability (GA) with the v1.36 release. This is a Linux-only feature.

For those of us working on low level container runtimes and rootless technologies, this has been a long awaited milestone. We finally reached the point where "rootless" security isolation can be used for Kubernetes workloads.

This feature also enables a critical pattern: running workloads with privileges and still being confined in the user namespace. When hostUsers: false is set, capabilities like CAP_NET_ADMIN become namespaced, meaning they grant administrative power over container local resources without affecting the host. This effectively enables new use cases that were not possible before without running a fully privileged container.

The Problem with UID 0

A process running as root inside a container is also seen by the kernel as root on the host. If an attacker manages to break out of the container, whether through a kernel vulnerability or a misconfigured mount, they are root on the host.

While there are many security measures in place for running containers, these measures don't change the underlying identity of the process: it still carries some "parts" of root.

The engine: ID-mapped mounts

The road to GA wasn't just about the Kubernetes API; it was about making the kernel work for us. In the early stages, one of the biggest blockers was volume ownership. If you mapped a container to a high UID range, the Kubelet had to recursively chown every file in the attached volume so the container could read/write them. For large volumes, this was so expensive that it destroyed startup performance.

The key enabler was ID-mapped mounts (introduced in Linux 5.12 and refined in later versions). Instead of rewriting file ownership on disk, the kernel remaps it at mount time.

When a volume is mounted into a Pod with User Namespaces enabled, the kernel performs a transparent translation of the UIDs (user ids) and GIDs (group ids). To the container, the files appear owned by UID 0. On disk, file ownership is unchanged - no chown is needed. This is an O(1) operation, instant and efficient.

Using it in Kubernetes v1.36

Using user namespaces is straightforward: all you need to do is set hostUsers: false in your Pod spec. No changes to your container images, no complex configuration. The interface remains the same one introduced during the Alpha phase. In the spec for a Pod (or PodTemplate), you explicitly opt-out of the host user namespace:

apiVersion: v1
kind: Pod
metadata:
  name: isolated-workload
spec:
  hostUsers: false
  containers:
  - name: app
    image: fedora:42
    securityContext:
      runAsUser: 0
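
Once the Pod is running, you can confirm the user namespace is in effect by checking the UID map from inside the container; a non-identity mapping (a host UID other than 0 in the second column) shows that in-container root is an unprivileged user on the host. The exact host UID varies per pod:

kubectl exec isolated-workload -- cat /proc/self/uid_map
# example output:          0  885512192      65536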

For more details on how user namespaces work in practice and demos of CVEs rated HIGH mitigated, see the previous blog posts: User Namespaces alpha, User Namespaces stateful pods in alpha, User Namespaces beta, and User Namespaces enabled by default.

Getting involved

If you're interested in user namespaces or want to contribute, here are some useful links:

Acknowledgments

This feature has been years in the making: the first KEP was opened 10 years ago by other contributors, and we have been actively working on it for the last 6 years. We'd like to thank everyone who contributed across SIG Node, the container runtimes, and the Linux kernel. Special thanks to the reviewers and early adopters who helped shape the design through multiple alpha and beta cycles.

23 Apr 2026 6:35pm GMT

22 Apr 2026

Kubernetes Blog

SELinux Volume Label Changes Go GA (and likely implications in v1.37)

If you run Kubernetes on Linux with SELinux in enforcing mode, plan ahead: a future release (anticipated to be v1.37) is expected to turn the SELinuxMount feature gate on by default. This makes volume setup faster for most workloads, but it can break applications that still depend on the older recursive relabeling model in subtle ways (for example, sharing one volume between privileged and unprivileged Pods on the same node). Kubernetes v1.36 is the right release to audit your cluster and fix or opt out of this change.

If your nodes do not use SELinux, nothing changes for you: the kubelet skips the whole SELinux logic when SELinux is unavailable or disabled in the Linux kernel. You can skip this article completely.

This blog builds on the earlier work described in the Kubernetes 1.27: Efficient SELinux Relabeling (Beta) post, which introduced the SELinuxMountReadWriteOncePod feature gate. The problem being addressed remains the same; however, this post extends the same approach to all volumes.

The problem

Linux systems with Security Enhanced Linux (SELinux) enabled use labels attached to objects (for example, files and network sockets) to make access control decisions. Historically, the container runtime applies SELinux labels to a Pod and all its volumes. Kubernetes only passes the SELinux label from a Pod's securityContext fields to the container runtime.

The container runtime then recursively changes the SELinux label on all files that are visible to the Pod's containers. This can be time-consuming if there are many files on the volume, especially when the volume is on a remote filesystem.

Caution:

If a container uses subPath of a volume, only that subPath of the whole volume is relabeled. This allows two Pods that have two different SELinux labels to use the same volume, as long as they use different subpaths of it.

If a Pod does not have any SELinux label assigned in the Kubernetes API, the container runtime assigns a unique random label, so a process that potentially escapes the container boundary cannot access data of any other container on the host. The container runtime still recursively relabels all Pod volumes with this random SELinux label.

What Kubernetes is improving

Where the stack supports it, the kubelet can mount the volume with -o context=<label> so the kernel applies the correct label for all inodes on that mount without a recursive inode traversal. That path is gated by feature flags and requires, among other things, that the Pod expose enough of an SELinux label (for example spec.securityContext.seLinuxOptions.level) and that the volume driver opts in (for CSI, CSIDriver field spec.seLinuxMount: true).

The project rolled this out in phases:

If a Pod and its volume meet all of the following conditions, Kubernetes will mount the volume directly with the right SELinux label. Such a mount will happen in a constant time and the container runtime will not need to recursively relabel any files on it. For such a mount to happen:

  1. The operating system must support SELinux. Without SELinux support detected, the kubelet and the container runtime do not do anything with regard to SELinux.

  2. The feature gate SELinuxMountReadWriteOncePod must be enabled. If you're running Kubernetes v1.36, the feature is enabled unconditionally.

  3. The Pod must use a PersistentVolumeClaim with applicable accessModes:

    • Either the volume has accessModes: ["ReadWriteOncePod"]
    • or the volume can use any other access mode(s), provided that the feature gates SELinuxChangePolicy and SELinuxMount are both enabled and the Pod has spec.securityContext.seLinuxChangePolicy set to nil (default) or as MountOption.

    The feature gate SELinuxMount is Beta and disabled by default in Kubernetes 1.36. All other SELinux-related feature gates are now General Availability (GA).

    With any of these feature gates disabled, SELinux labels will always be applied by the container runtime via recursively traversing through the volume (or its subPaths).

  4. The Pod must have at least seLinuxOptions.level assigned in its security context or all containers in that Pod must have it set in their container-level security contexts. Kubernetes will read the default user, role and type from the operating system defaults (typically system_u, system_r and container_t).

    Without Kubernetes knowing at least the SELinux level, the container runtime will assign a random level after the volumes are mounted. The container runtime will still relabel the volumes recursively in that case.

  5. The volume plugin or the CSI driver responsible for the volume supports mounting with SELinux mount options.

    These in-tree volume plugins support mounting with SELinux mount options: fc and iscsi.

    CSI drivers that support mounting with SELinux mount options must declare this capability in their CSIDriver instance by setting the seLinuxMount field.

    Volumes managed by other volume plugins or CSI drivers that do not set seLinuxMount: true will be recursively relabeled by the container runtime.

The breaking change

The SELinuxMount feature gate changes what volumes can be shared among multiple Pods in a subtle way.

Both of these cases work with recursive relabeling:

  1. Two Pods with different SELinux labels share the same volume, but each of them uses a different subPath to the volume.
  2. A privileged Pod and an unprivileged Pod share the same volume.

The above scenarios will not work with the new mount-based labeling behavior when SELinux is active. Instead, one of these Pods will be stuck in ContainerCreating until the other Pod is terminated.

The first case is very niche and hasn't been seen in practice. Although the second case is still quite rare, this setup has been observed in applications. Kubernetes v1.36 offers metrics and events to identify these Pods and allows cluster administrators to opt out of the mount option through the Pod field spec.securityContext.seLinuxChangePolicy.

seLinuxChangePolicy

The new Pod field spec.securityContext.seLinuxChangePolicy specifies how the SELinux label is applied to all Pod volumes. In Kubernetes v1.36, this field is part of the stable Pod API.

There are three choices available:

field not set (default)
In Kubernetes v1.36, the behavior depends on whether the SELinuxMount feature gate is enabled. By default that feature gate is not enabled, and the SELinux label is applied recursively. If you enable that feature gate in your cluster, and all other conditions are met, labelling will be applied using the mount option.
Recursive
the SELinux label is applied recursively. This opts out from using the mount option.
MountOption
the SELinux label is applied using the mount option, if all other conditions are met. This choice is available only when the SELinuxMount feature gate is enabled.
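
For example, a Pod that must keep sharing a volume with a privileged Pod can opt out explicitly (the name and image below are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: legacy-shared-volume-app
spec:
  securityContext:
    seLinuxChangePolicy: Recursive # keep the pre-SELinuxMount relabeling behavior for this Pod
  containers:
  - name: app
    image: example.com/app:v1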

SELinux warning controller (optional)

Kubernetes v1.36 provides a new controller within the control plane, selinux-warning-controller, which runs as part of the kube-controller-manager. To use it, pass --controllers=*,selinux-warning-controller on the kube-controller-manager command line; you also must not have explicitly disabled the SELinuxChangePolicy feature gate.

The controller watches all Pods in the cluster and emits an Event when it finds two Pods that share the same volume in a way that is not compatible with the SELinuxMount feature gate. All such conflicting Pods will receive an event, such as:

SELinuxLabel "system_u:system_r:container_t:s0:c98,c99" conflicts with pod my-other-pod that uses the same volume as this pod with SELinuxLabel "system_u:system_r:container_t:s0:c0,c1". If both pods land on the same node, only one of them may access the volume.

The actual Pod name may be censored when the conflicting Pods run in different namespaces to prevent leaking information across namespace boundaries.

The controller reports such an event even when these Pods don't run on the same node, to make sure all Pods work regardless of the Kubernetes scheduler decision. They could run on the same node next time.

In addition, the controller emits the metric selinux_warning_controller_selinux_volume_conflict that lists all current conflicts among Pods. The metric has labels that identify the conflicting Pods and their SELinux labels, such as:

selinux_warning_controller_selinux_volume_conflict{pod1_name="my-other-pod",pod1_namespace="default",pod1_value="system_u:object_r:container_file_t:s0:c0,c1",pod2_name="my-pod",pod2_namespace="default",pod2_value="system_u:object_r:container_file_t:s0:c0,c2",property="SELinuxLabel"} 1

There is a security consequence from enabling this opt-in controller: it may reveal namespace names, which are always present in the metric. The Kubernetes project assumes only cluster administrators can access kube-controller-manager metrics.

Suggested upgrade path

To ensure a smooth upgrade path from v1.36 to a release with SELinuxMount enabled (anticipated to be v1.37), we suggest you follow these steps:

  1. Enable selinux-warning-controller in the kube-controller-manager.
  2. Check the selinux_warning_controller_selinux_volume_conflict metric. It shows all potential conflicts between Pods. For each conflicting Pod (Deployment, StatefulSet, etc.), either apply the opt-out (set Pod's spec.securityContext.seLinuxChangePolicy: Recursive) or re-architect the application to remove such a conflict. For example, do your Pods really need to run as privileged?
  3. Check the volume_manager_selinux_volume_context_mismatch_warnings_total metric. This metric is emitted by the kubelet when it actually starts a Pod that runs when SELinuxMount is disabled, but such a Pod won't start when SELinuxMount is enabled. This metric lists the number of Pods that will experience a true conflict. Unfortunately, this metric does not expose the exact Pod name as a label. The full Pod name is available only in the selinux_warning_controller_selinux_volume_conflict metric.
  4. Once both metrics have been accounted for, upgrade to a Kubernetes version that has SELinuxMount enabled.

Consider using a MutatingAdmissionPolicy, a mutating webhook, or a policy engine like Kyverno or Gatekeeper to apply the opt-out to all Pods in a namespace or across the entire cluster.

When SELinuxMount is enabled, the kubelet will emit the metric volume_manager_selinux_volume_context_mismatch_errors_total with the number of Pods that could not start because their SELinux label conflicts with an existing Pod that uses the same volume. The exact Pod names should still be available in the selinux_warning_controller_selinux_volume_conflict metric, if the selinux-warning-controller is enabled.

Further reading

Acknowledgements

If you run into issues, have feedback, or want to contribute, find us on the Kubernetes Slack in #sig-node and #sig-storage or join the SIG Node or SIG Storage meetings.

22 Apr 2026 6:35pm GMT

Kubernetes v1.36: ハル (Haru)

Editors: Chad M. Crowell, Kirti Goyal, Sophia Ugochukwu, Swathi Rao, Utkarsh Umre

Similar to previous releases, the release of Kubernetes v1.36 introduces new stable, beta, and alpha features. The consistent delivery of high-quality releases underscores the strength of our development cycle and the vibrant support from our community.

This release consists of 70 enhancements. Of those enhancements, 18 have graduated to Stable, 25 are entering Beta, and 25 have entered Alpha.

There are also some deprecations and removals in this release; make sure to read about those.

Release theme and logo

We open 2026 with Kubernetes v1.36, a release that arrives as the season turns and the light shifts on the mountain. ハル (Haru) is a sound in Japanese that carries many meanings; among those we hold closest are 春 (spring), 晴れ (hare, clear skies), and 遥か (haruka, far-off, distant). A season, a sky, and a horizon. You will find all three in what follows.

The logo, created by avocadoneko / Natsuho Ide, draws inspiration from Katsushika Hokusai's Thirty-six Views of Mount Fuji (富嶽三十六景, Fugaku Sanjūrokkei), the same series that gave the world The Great Wave off Kanagawa. Our v1.36 logo reimagines one of the series' most celebrated prints, Fine Wind, Clear Morning (凱風快晴, Gaifū Kaisei), also known as Red Fuji (赤富士, Aka Fuji): the mountain lit red in a summer dawn, bare of snow after the long thaw. Thirty-six views felt like a fitting number to sit with at v1.36, and a reminder that even Hokusai didn't stop there.1 Keeping watch over the scene is the Kubernetes helm, set into the sky alongside the mountain.

At the foot of Fuji sit Stella (left) and Nacho (right), two cats with the Kubernetes helm on their collars, standing in for the role of komainu, the paired lion-dog guardians that watch over Japanese shrines. Paired, because nothing is guarded alone. Stella and Nacho stand in for a very much larger set of paws: the SIGs and working groups, the maintainers and reviewers, the people behind docs, blogs, and translations, the release team, first-time contributors taking their first steps, and lifelong contributors returning season after season. Kubernetes v1.36 is, as always, held up by many hands.

Brushed across Red Fuji in the logo is the calligraphy 晴れに翔け (hare ni kake), "soar into clear skies". It is the first half of a couplet that was too long to fit on the mountain:

晴れに翔け、未来よ明け
hare ni kake, asu yo ake
"Soar into clear skies; toward tomorrow's sunrise."2

That is the wish we carry for this release: to soar into clear skies, for the release itself, for the project, and for everyone who ships it together. The dawn breaking over Red Fuji is not an ending but a passage: this release carries us to the next, and that one to the one after, on toward horizons far beyond what any single view can hold.

1. The series was so popular that Hokusai added ten more prints, bringing the total to forty-six.
2. 未来 means "the future" in its widest sense, not just tomorrow but everything still to come. It is usually read mirai; here it takes the informal reading asu.

Spotlight on key updates

Kubernetes v1.36 is packed with new features and improvements. Here are a few select updates the Release Team would like to highlight!

Stable: Fine-grained API authorization

On behalf of Kubernetes SIG Auth and SIG Node, we are pleased to announce the graduation of fine-grained kubelet API authorization to General Availability (GA) in Kubernetes v1.36!

The KubeletFineGrainedAuthz feature gate was introduced as an opt-in alpha feature in Kubernetes v1.32, then graduated to beta (enabled by default) in v1.33. Now, the feature is generally available. This feature enables more precise, least-privilege access control over the kubelet's HTTPS API, replacing the need to grant the overly broad nodes/proxy permission for common monitoring and observability use cases.

This work was done as a part of KEP #2862 led by SIG Auth and SIG Node.

Beta: Resource health status

Before the v1.34 release, Kubernetes lacked a native way to report the health of allocated devices, making it difficult to diagnose Pod crashes caused by hardware failures. Building on the initial alpha release in v1.31, which focused on Device Plugins, Kubernetes v1.36 expands this feature by promoting the allocatedResourcesStatus field within each Pod's .status to beta. This field provides a unified health reporting mechanism for all specialized hardware.

Users can now run kubectl describe pod to determine if a container's crash loop is due to an Unhealthy or Unknown device status, regardless of whether the hardware was provisioned via traditional plugins or the newer DRA framework. This enhanced visibility allows administrators and automated controllers to quickly identify faulty hardware and streamline the recovery of high-performance workloads.
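
As a sketch of what this surfaces, the beta field reports per-device health in each container's status; the field names follow the KEP, while the container name, resource name, and device ID below are illustrative:

status:
  containerStatuses:
  - name: cuda-trainer
    allocatedResourcesStatus:
    - name: example.com/gpu          # DRA-provisioned devices appear as "claim:<claim-name>"
      resources:
      - resourceID: gpu-0
        health: Unhealthy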

This work was done as part of KEP #4680 led by SIG Node.

Alpha: Workload Aware Scheduling (WAS) features

Previously, the Kubernetes scheduler and job controllers managed pods as independent units, often leading to fragmented scheduling or resource waste for complex, distributed workloads. Kubernetes v1.36 introduces a comprehensive suite of Workload Aware Scheduling (WAS) features in Alpha, natively integrating the Job controller with a revised Workload API and a new decoupled PodGroup API, to treat related pods as a single logical entity.

Kubernetes v1.35 already supported gang scheduling by requiring a minimum number of pods to be schedulable before any were bound to nodes. v1.36 goes further with a new PodGroup scheduling cycle that evaluates the entire group atomically: either all pods in the group are bound together, or none are.

This work was done across several KEPs (including #4671, #5547, #5832, #5732, and #5710) led by SIG Scheduling and SIG Apps.

Features graduating to Stable

This is a selection of some of the improvements that are now stable following the v1.36 release.

Volume group snapshots

After several cycles in beta, VolumeGroupSnapshot support reaches General Availability (GA) in Kubernetes v1.36. This feature allows you to take crash-consistent snapshots across multiple PersistentVolumeClaims simultaneously. The support for volume group snapshots relies on a set of extension APIs for group snapshots. These APIs allow users to take crash consistent snapshots for a set of volumes. A key aim is to allow you to restore that set of snapshots to new volumes and recover your workload based on a crash consistent recovery point.
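
As a rough sketch, a group snapshot is requested by selecting the PersistentVolumeClaims to snapshot together via a label selector. The API group is groupsnapshot.storage.k8s.io; the exact apiVersion, class name, namespace, and labels below are illustrative and may differ in your cluster:

apiVersion: groupsnapshot.storage.k8s.io/v1beta1
kind: VolumeGroupSnapshot
metadata:
  name: my-group-snapshot
  namespace: demo
spec:
  volumeGroupSnapshotClassName: csi-group-snapshot-class
  source:
    selector:
      matchLabels:
        app: my-database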

This work was done as part of KEP #3476 led by SIG Storage.

Mutable volume attach limits

In Kubernetes v1.36, the mutable CSINode allocatable feature graduates to stable. This enhancement allows Container Storage Interface (CSI) drivers to dynamically update the reported maximum number of volumes that a node can handle.

With this update, the kubelet can dynamically update a node's volume limits and capacity information. The kubelet adjusts these limits based on periodic checks or in response to resource exhaustion errors from the CSI driver, without requiring a component restart. This ensures the Kubernetes scheduler maintains an accurate view of storage availability, preventing pod scheduling failures caused by outdated volume limits.

This work was done as part of KEP #4876 led by SIG Storage.

API for external signing of ServiceAccount tokens

In Kubernetes v1.36, the external ServiceAccount token signer feature for service accounts graduates to stable, making it possible to offload token signing to an external system while still integrating cleanly with the Kubernetes API. Clusters can now rely on an external JWT signer for issuing projected service account tokens that follow the standard service account token format, including support for extended expiration when needed. This is especially useful for clusters that already rely on external identity or key management systems, allowing Kubernetes to integrate without duplicating key management inside the control plane.

The kube-apiserver is wired to discover public keys from the external signer, cache them, and validate tokens it did not sign itself, so existing authentication and authorization flows continue to work as expected. Over the alpha and beta phases, the API and configuration for the external signer plugin, path validation, and OIDC discovery were hardened to handle real-world deployments and rotation patterns safely.

With GA in v1.36, external ServiceAccount token signing is now a fully supported option for platforms that centralize identity and signing, simplifying integration with external IAM systems and reducing the need to manage signing keys directly inside the control plane.

This work was done as part of KEP #740 led by SIG Auth.

DRA features graduating to Stable

Part of the Dynamic Resource Allocation (DRA) ecosystem reaches full production maturity in Kubernetes v1.36 as key governance and selection features graduate to Stable. The transition of DRA admin access to GA provides a permanent, secure framework for cluster administrators to access and manage hardware resources globally, while the stabilization of prioritized lists ensures that resource selection logic remains consistent and predictable across all cluster environments.

Now, organizations can confidently deploy mission-critical hardware automation with the guarantee of long-term API stability and backward compatibility. These features empower users to implement sophisticated resource-sharing policies and administrative overrides that are essential for large-scale GPU clusters and multi-tenant AI platforms, marking the completion of the core architectural foundation for next-generation resource management.

This work was done as part of KEPs #5018 and #4816 led by SIG Auth and SIG Scheduling.

Mutating admission policies

Declarative cluster management reaches a new level of sophistication in Kubernetes v1.36 with the graduation of MutatingAdmissionPolicies to Stable. This milestone provides a native, high-performance alternative to traditional webhooks by allowing administrators to define resource mutations directly in the API server using the Common Expression Language (CEL), fully replacing the need for external infrastructure for many common use cases.

Now, cluster operators can modify incoming requests without the latency and operational complexity associated with managing custom admission webhooks. By moving mutation logic into a declarative, versioned policy, organizations can achieve more predictable cluster behavior, reduced network overhead, and a hardened security model with the full guarantee of long-term API stability.
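
As a rough sketch of what such a policy can look like, the example below adds a label to new Deployments using an ApplyConfiguration-style CEL expression. The shape follows the admissionregistration API from the beta phase; verify the exact API version your cluster serves at GA, and note that a separate MutatingAdmissionPolicyBinding is still required to put the policy into effect. All names and values are illustrative.

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingAdmissionPolicy
metadata:
  name: add-environment-label
spec:
  matchConstraints:
    resourceRules:
    - apiGroups: ["apps"]
      apiVersions: ["v1"]
      operations: ["CREATE"]
      resources: ["deployments"]
  failurePolicy: Fail
  reinvocationPolicy: IfNeeded
  mutations:
  - patchType: ApplyConfiguration
    applyConfiguration:
      # Merge a label into the incoming object, server-side-apply style
      expression: >
        Object{
          metadata: Object.metadata{
            labels: {"environment": "dev"}
          }
        }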

This work was done as part of KEP #3962 led by SIG API Machinery.

Declarative validation of Kubernetes native types with validation-gen

The development of Kubernetes native API types reaches a new level of efficiency in Kubernetes v1.36 as declarative validation (with validation-gen) graduates to Stable. This milestone replaces manual, hand-written validation logic, which was often error-prone and easy to let drift from the API definitions, with validation rules declared directly alongside the types themselves.

Instead of writing custom validation functions, Kubernetes contributors can now define validation rules using IDL marker comments (such as +k8s:minimum or +k8s:enum) directly within the API type definitions (types.go). The validation-gen tool parses these comments to automatically generate robust API validation code at compile-time. This reduces maintenance overhead and ensures that API validation remains consistent and synchronized with the source code.

This work was done as part of KEP #5073 led by SIG API Machinery.

Removal of gogo protobuf dependency for Kubernetes API types

Security and long-term maintainability for the Kubernetes codebase take a major step forward in Kubernetes v1.36 with the completion of the gogoprotobuf removal. This initiative has eliminated a significant dependency on the unmaintained gogoprotobuf library, which had become a source of potential security vulnerabilities and a blocker for adopting modern Go language features.

Instead of migrating to standard Protobuf generation, which presented compatibility risks for Kubernetes API types, the project opted to fork and internalize the required generation logic within k8s.io/code-generator. This approach successfully eliminates the unmaintained runtime dependencies from the Kubernetes dependency graph while preserving existing API behavior and serialization compatibility. For consumers of Kubernetes API Go types, this change reduces technical debt and prevents accidental misuse with standard protobuf libraries.

This work was done as part of KEP #5589 led by SIG API Machinery.

Node log query

Previously, Kubernetes required cluster administrators to log into nodes via SSH or implement a client-side reader for debugging issues pertaining to control-plane or worker nodes. While certain issues still require direct node access, issues with the kube-proxy or kubelet can be diagnosed by inspecting their logs. Node log query gives cluster administrators a way to view these logs through the kubelet API (for example, via kubectl get --raw), simplifying troubleshooting without logging into nodes, similar to debugging issues related to a pod or container. This method is operating system agnostic and requires the services or nodes to log to /var/log.

As this feature reaches GA in Kubernetes 1.36 after thorough performance validation on production workloads, it is enabled by default on the kubelet through the NodeLogQuery feature gate. In addition, the enableSystemLogQuery kubelet configuration option must also be enabled.
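
For example, with both settings in place you can fetch kubelet service logs through the Kubernetes API; the node name below is illustrative:

kubectl get --raw "/api/v1/nodes/node-1.example/proxy/logs/?query=kubelet"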

This work was done as a part of KEP #2258 led by SIG Windows.

Support User Namespaces in pods

Container isolation and node security reach a major maturity milestone in Kubernetes v1.36 as support for User Namespaces graduates to Stable. This long-awaited feature provides a critical layer of defense-in-depth by allowing the mapping of a container's root user to a non-privileged user on the host, ensuring that even if a process escapes the container, it possesses no administrative power over the underlying node.

Now, cluster operators can confidently enable this hardened isolation for production workloads to mitigate the impact of container breakout vulnerabilities. By decoupling the container's internal identity from the host's identity, Kubernetes provides a robust, standardized mechanism to protect multi-tenant environments and sensitive infrastructure from unauthorized access, all with the full guarantee of long-term API stability.
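
Opting a Pod into a user namespace is a single field; the sketch below is illustrative (the image name is a placeholder):

apiVersion: v1
kind: Pod
metadata:
  name: userns-demo
spec:
  hostUsers: false       # run the Pod in its own user namespace
  containers:
  - name: app
    image: my-app:v1     # illustrative image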

This work was done as part of KEP #127 led by SIG Node.

Support PSI based on cgroupv2

Node resource management and observability become more precise in Kubernetes v1.36 as the export of Pressure Stall Information (PSI) metrics graduates to Stable. This feature provides the kubelet with the ability to report "pressure" metrics for CPU, memory, and I/O, offering a more granular view of resource contention than traditional utilization metrics.

Cluster operators and autoscalers can use these metrics to distinguish between a system that is simply busy and one that is actively stalling due to resource exhaustion. By leveraging these signals, users can more accurately tune pod resource requests, improve the reliability of vertical autoscaling, and detect noisy neighbor effects before they lead to application performance degradation or node instability.

This work was done as part of KEP #4205 led by SIG Node.

Volume source: OCI artifact and/or image

The distribution of container data becomes more flexible in Kubernetes v1.36 as OCI volume source support graduates to Stable. This feature moves beyond the traditional requirement of mounting volumes from external storage providers or config maps by allowing the kubelet to pull and mount content directly from any OCI-compliant registry, such as a container image or an artifact repository.

Now, developers and platform engineers can package application data, models, or static assets as OCI artifacts and deliver them to pods using the same registries and versioning workflows they already use for container images. This convergence of image and volume management simplifies CI/CD pipelines, reduces dependency on specialized storage backends for read-only content, and ensures that data remains portable and securely accessible across any environment.
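
A minimal sketch of mounting an OCI artifact as a read-only volume; the registry reference, image, and mount path are illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: oci-volume-demo
spec:
  containers:
  - name: app
    image: my-app:v1                    # illustrative image
    volumeMounts:
    - name: model-data
      mountPath: /data/model
  volumes:
  - name: model-data
    image:
      reference: registry.example/models/weights:v1   # illustrative OCI artifact
      pullPolicy: IfNotPresent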

This work was done as part of KEP #4639 led by SIG Node.

New features in Beta

This is a selection of some of the improvements that are now beta following the v1.36 release.

Staleness mitigation for controllers

Staleness in Kubernetes controllers is a widespread problem that can subtly distort controller behavior. It usually goes unnoticed until it is too late, when a controller in production has already taken an incorrect action because of an underlying assumption made by the controller author. During periods of cache staleness, this can lead to conflicting updates or even data corruption when the controller reconciles.

We are excited to announce that Kubernetes v1.36 includes new features that help mitigate controller staleness and provide better observability of controller behavior. These changes help prevent reconciliation based on an outdated view of cluster state, which can otherwise lead to harmful behavior.

This work was done as part of KEP #5647 led by SIG API Machinery.

IP/CIDR validation improvements

In Kubernetes v1.36, the StrictIPCIDRValidation feature for API IP and CIDR fields graduates to beta, tightening validation to catch malformed addresses and prefixes that previously slipped through. This helps prevent subtle configuration bugs where Services, Pods, NetworkPolicies, or other resources reference invalid IPs, which could otherwise lead to confusing runtime behavior or security surprises.

Controllers are updated to canonicalize IPs they write back into objects and to warn when they encounter bad values that were already stored, so clusters can gradually converge on clean, consistent data. With beta, StrictIPCIDRValidation is ready for wider use, giving operators more reliable guardrails around IP-related configuration as they evolve networks and policies over time.
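
As a hedged illustration of what stricter validation catches, an IPv4 value written with leading zeros is ambiguous and is rejected, while the canonical form is accepted; the Service below is illustrative:

apiVersion: v1
kind: Service
metadata:
  name: demo
spec:
  type: LoadBalancer
  selector:
    app: demo
  ports:
  - port: 80
  loadBalancerSourceRanges:
  # - 192.168.001.0/24   # leading zeros: rejected under strict validation
  - 192.168.1.0/24       # canonical form: accepted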

This work was done as a part of KEP #4858 led by SIG Network.

Separate kubectl user preferences from cluster configs

The .kuberc feature for customizing kubectl user preferences continues to be beta and is enabled by default. The ~/.kube/kuberc file allows users to store aliases, default flags, and other personal settings separately from kubeconfig files, which hold cluster endpoints and credentials. This separation prevents personal preferences from interfering with CI pipelines or shared kubeconfig files, while maintaining a consistent kubectl experience across different clusters and contexts.

In Kubernetes v1.36, .kuberc was expanded with the ability to define policies for credential plugins (allowlists or denylists) to enforce safer authentication practices. Users can disable this functionality if needed by setting the KUBECTL_KUBERC=false or KUBERC=off environment variables.
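
As a loose sketch, a ~/.kube/kuberc can define an alias and a per-command default flag. The apiVersion shown follows the beta kuberc format, and the alias and flag values are illustrative; check the kubectl documentation for the exact schema your client version expects:

apiVersion: kubectl.config.k8s.io/v1beta1
kind: Preference
defaults:
- command: apply
  options:
  - name: server-side    # make kubectl apply default to server-side apply
    default: "true"
aliases:
- name: getn             # kubectl getn expands to kubectl get namespaces
  command: get
  prependArgs:
  - namespaces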

This work was done as a part of KEP #3104 led by SIG CLI, with the help from SIG Auth.

Mutable container resources when Job is suspended

In Kubernetes v1.36, the MutablePodResourcesForSuspendedJobs feature graduates to beta and is enabled by default. This update relaxes Job validation to allow updates to container CPU, memory, GPU, and extended resource requests and limits while a Job is suspended.

This capability allows queue controllers and operators to adjust batch workload requirements based on real‑time cluster conditions. For example, a queueing system can suspend incoming Jobs, adjust their resource requirements to match available capacity or quota, and then unsuspend them. The feature strictly limits mutability to suspended Jobs (or Jobs whose pods have been terminated upon suspension) to prevent disruptive changes to actively running pods.
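
A sketch of the flow a queue controller or operator might follow using kubectl; the Job name and values are illustrative:

# Suspend the Job (skip this if it was created with suspend: true)
kubectl patch job training-job --type merge -p '{"spec":{"suspend":true}}'

# Adjust the pod template's CPU request while the Job is suspended
kubectl patch job training-job --type json -p \
  '[{"op":"replace","path":"/spec/template/spec/containers/0/resources/requests/cpu","value":"2"}]'

# Resume the Job with the new requirements
kubectl patch job training-job --type merge -p '{"spec":{"suspend":false}}'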

This work was done as a part of KEP #5440 led by SIG Apps.

Constrained impersonation

In Kubernetes v1.36, the ConstrainedImpersonation feature for user impersonation graduates to beta, tightening a historically all‑or‑nothing mechanism into something that can actually follow least‑privilege principles. When this feature is enabled, an impersonator must have two distinct sets of permissions: one to impersonate a given identity, and another to perform specific actions on that identity's behalf. This prevents support tools, controllers, or node agents from using impersonation to gain broader access than they themselves are allowed, even if their impersonation RBAC is misconfigured.

Existing impersonate rules keep working, but the API server prefers the new constrained checks first, making the transition incremental instead of a flag day. With beta in v1.36, ConstrainedImpersonation is tested, documented, and ready for wider adoption by platform teams that rely on impersonation for debugging, proxying, or node‑level controllers.

This work was done as a part of KEP #5284 led by SIG Auth.

DRA features in beta

The Dynamic Resource Allocation (DRA) framework reaches another maturity milestone in Kubernetes v1.36 as several core features graduate to beta and are enabled by default. This transition moves DRA beyond basic allocation by graduating partitionable devices and consumable capacity, allowing for more granular sharing of hardware like GPUs, while device taints and tolerations ensure that specialized resources are only utilized by the appropriate workloads.

Now, users benefit from a much more reliable and observable resource lifecycle through ResourceClaim device status and the ability to ensure device attachment before Pod scheduling. By integrating these features with extended resource support, Kubernetes provides a robust production-ready alternative to the legacy device plugin system, enabling complex AI and HPC workloads to manage hardware with unprecedented precision and operational safety.

This work was done across several KEPs (including #5004, #4817, #5055, #5075, #4815, and #5007) led by SIG Scheduling and SIG Node.

Statusz for Kubernetes components

In Kubernetes v1.36, the ComponentStatusz feature gate for core Kubernetes components graduates to beta, providing a /statusz endpoint (enabled by default) that surfaces real‑time build and version details for each component. This low‑overhead z-page exposes information like start time, uptime, Go version, binary version, emulation version, and minimum compatibility version, so operators and developers can quickly see exactly what is running without digging through logs or configs.

The endpoint offers a human‑readable text view by default, plus a versioned structured API (config.k8s.io/v1beta1) for programmatic access in JSON, YAML, or CBOR via explicit content negotiation. Access is granted to the system:monitoring group, keeping it aligned with existing protections on health and metrics endpoints and avoiding exposure of sensitive data.

With beta, ComponentStatusz is enabled by default across all core control‑plane components and node agents, backed by unit, integration, and end‑to‑end tests so it can be safely used in production for observability and debugging workflows.
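
For the kube-apiserver, you can hit the endpoint directly through kubectl; depending on the Accept header your client sends, you may get the default text view or, with beta, one of the structured formats:

kubectl get --raw /statusz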

This work was done as a part of KEP #4827 led by SIG Instrumentation.

Flagz for Kubernetes components

In Kubernetes v1.36, the ComponentFlagz feature gate for core Kubernetes components graduates to beta, standardizing a /flagz endpoint that exposes the effective command‑line flags each component was started with. This gives cluster operators and developers real‑time, in‑cluster visibility into component configuration, making it much easier to debug unexpected behavior or verify that a flag rollout actually took effect after a restart.

The endpoint supports both a human‑readable text view and a versioned structured API (initially config.k8s.io/v1beta1), so you can either curl it during an incident or wire it into automated tooling once you are ready. Access is granted to the system:monitoring group and sensitive values can be redacted, keeping configuration insight aligned with existing security practices around health and status endpoints.

With beta, ComponentFlagz is now enabled by default and implemented across all core control‑plane components and node agents, backed by unit, integration, and end‑to‑end tests to ensure the endpoint is reliable in production clusters.

This work was done as a part of KEP #4828 led by SIG Instrumentation.

Mixed version proxy (aka unknown version interoperability proxy)

In Kubernetes v1.36, the mixed version proxy feature graduates to beta, building on its alpha introduction in v1.28 to provide safer control-plane upgrades for mixed-version clusters. Each API request can now be routed to the apiserver instance that serves the requested group, version, and resource, reducing 404s and failures due to version skew.

The feature relies on peer-aggregated discovery, so apiservers share information about which resources and versions they expose, then use that data to transparently reroute requests when needed. New metrics on rerouted traffic and proxy behavior help operators understand how often requests are forwarded and to which peers. Together, these changes make it easier to run highly available, mixed-version API control planes in production while performing multi-step or partial control-plane upgrades.

This work was done as a part of KEP #4020 led by SIG API Machinery.

Memory QoS with cgroups v2

Kubernetes now enhances memory QoS on Linux cgroup v2 nodes with smarter, tiered memory protection that better aligns kernel controls with pod requests and limits, reducing interference and thrashing for workloads sharing the same node. This iteration also refines how kubelet programs memory.high and memory.min, adds metrics and safeguards to avoid livelocks, and introduces configuration options so cluster operators can tune memory protection behavior for their environments.
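
A hedged sketch of the kubelet side of this: memoryThrottlingFactor is an existing KubeletConfiguration field and the value shown is illustrative, while any newer tuning options added by this iteration should be checked against the kubelet configuration reference.

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  MemoryQoS: true
# Fraction of a container's memory limit used when the kubelet programs memory.high
memoryThrottlingFactor: 0.9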

This work was done as part of KEP #2570 led by SIG Node.

New features in Alpha

This is a selection of some of the improvements that are now alpha following the v1.36 release.

HPA scale to zero for custom metrics

Until now, the HorizontalPodAutoscaler (HPA) required at least one replica to remain active, as it could only calculate scaling needs based on metrics (like CPU or memory) from running pods. Kubernetes v1.36 continues the development of the HPA scale to zero feature (disabled by default) in Alpha, allowing workloads to scale down to zero replicas specifically when using Object or External metrics.

Now, users can experiment with significantly reducing infrastructure costs by completely idling heavy workloads when no work is pending. While the feature remains behind the HPAScaleToZero feature gate, it enables the HPA to stay active even with zero running pods, automatically scaling the deployment back up as soon as the external metric (e.g., queue length) indicates that new tasks have arrived.
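
With the HPAScaleToZero feature gate enabled, an HPA can set minReplicas to zero and drive scaling from an external metric. The metric name and target below are illustrative:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 0          # allowed only with the HPAScaleToZero feature gate
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: queue_messages_ready   # illustrative external metric
      target:
        type: AverageValue
        averageValue: "30"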

This work was done as part of KEP #2021 led by SIG Autoscaling.

DRA features in Alpha

Historically, the Dynamic Resource Allocation (DRA) framework lacked seamless integration with high-level controllers and provided limited visibility into device-specific metadata or availability. Kubernetes v1.36 introduces a wave of DRA enhancements in Alpha, including native ResourceClaim support for workloads and DRA-native resources that bring the flexibility of DRA to CPU management.

Now, users can leverage the downward API to expose complex resource attributes directly to containers and benefit from improved resource availability visibility for more predictable scheduling. These updates, combined with support for list types in device attributes, transform DRA from a low-level primitive into a robust system capable of handling the sophisticated networking and compute requirements of modern AI and high-performance computing (HPC) stacks.

This work was done across several KEPs (including #5729, #5304, #5517, #5677, and #5491) led by SIG Scheduling and SIG Node.

Native histogram support for Kubernetes metrics

High-resolution monitoring reaches a new milestone in Kubernetes v1.36 with the introduction of native histogram support in Alpha. While classical Prometheus histograms relied on static, pre-defined buckets that often forced a compromise between data accuracy and memory usage, this update allows the control plane to export sparse histograms that dynamically adjust their resolution based on real-time data.

Now, cluster operators can capture precise latency distributions for the kube-apiserver and other core components without the overhead of manual bucket management. This architectural shift ensures more reliable SLIs and SLOs, providing high-fidelity heatmaps that remain accurate even during the most unpredictable workload spikes.

This work was done as part of KEP #5808 led by SIG Instrumentation.

Manifest based admission control config

Managing admission controllers moves toward a more declarative and consistent model in Kubernetes v1.36 with the introduction of manifest-based admission control configuration in Alpha. This change addresses the long-standing challenge of configuring admission plugins through disparate command-line flags or separate, complex config files by allowing administrators to define the desired state of admission control directly through a structured manifest.

Now, cluster operators can manage admission plugin settings with the same versioned, declarative workflows used for other Kubernetes objects, significantly reducing the risk of configuration drift and manual errors during cluster upgrades. By centralizing these configurations into a unified manifest, the kube-apiserver becomes easier to audit and automate, paving the way for more secure and reproducible cluster deployments.

This work was done as part of KEP #5793 led by SIG API Machinery.

CRI list streaming

With the introduction of CRI list streaming in Alpha, Kubernetes v1.36 uses new internal streaming operations. This enhancement addresses the memory pressure and latency spikes often seen on large-scale nodes by replacing traditional, monolithic List requests between the kubelet and the container runtime with a more efficient server-side streaming RPC.

Now, instead of waiting for a single, massive response containing all container or image data, the kubelet can process results incrementally as they are streamed. This shift significantly reduces the peak memory footprint of the kubelet and improves responsiveness on high-density nodes, ensuring that cluster management remains fluid even as the number of containers per node continues to grow.

This work was done as part of KEP #5825 led by SIG Node.

Other notable changes

Ingress NGINX retirement

To prioritize the safety and security of the ecosystem, Kubernetes SIG Network and the Security Response Committee retired Ingress NGINX on March 24, 2026. Since that date, there have been no further releases, no bugfixes, and no updates to resolve any security vulnerabilities discovered. Existing deployments of Ingress NGINX will continue to function, and installation artifacts like Helm charts and container images will remain available.

For full details, see the official retirement announcement.

Faster SELinux labelling for volumes (GA)

Kubernetes v1.36 makes the SELinux volume mounting improvement generally available. This change replaced recursive file relabeling with the mount -o context=XYZ option, applying the correct SELinux label to the entire volume at mount time. It brings more consistent performance and reduces Pod startup delays on SELinux-enforcing systems.

This feature was introduced as beta in v1.28 for ReadWriteOncePod volumes. In v1.32, it gained metrics and an opt-out option (securityContext.seLinuxChangePolicy: Recursive) to help catch conflicts. Now in v1.36, it reaches Stable and defaults to all volumes, with Pods or CSIDrivers opting in via spec.seLinuxMount.
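
If a workload needs the old behavior, for example when privileged and unprivileged Pods share a volume, the per-Pod opt-out looks like this (the Pod name and image are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-app
spec:
  securityContext:
    seLinuxChangePolicy: Recursive   # opt this Pod out of mount-time relabeling
  containers:
  - name: app
    image: my-app:v1                 # illustrative image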

However, we expect this feature to create a risk of breaking changes in future Kubernetes releases, potentially due to sharing one volume between privileged and unprivileged Pods on the same node.

Developers are responsible for setting the seLinuxChangePolicy field and SELinux volume labels on Pods. Whether they are writing a Deployment, StatefulSet, DaemonSet, or even a custom resource that includes a Pod template, being careless with these settings can lead to a range of problems, such as Pods failing to start when they share a volume.

Kubernetes v1.36 is the ideal release to audit your clusters. To learn more, check out the SELinux Volume Label Changes goes GA (and likely implications in v1.37) blog post.

For more details on this enhancement, refer to KEP-1710: Speed up recursive SELinux label change.

Graduations, deprecations, and removals in v1.36

Graduations to stable

This lists all the features that graduated to stable (also known as general availability). For a full list of updates including new features and graduations from alpha to beta, see the release notes.

This release includes a total of 18 enhancements promoted to stable.

Deprecations, removals, and community updates

As Kubernetes develops and matures, features may be deprecated, removed, or replaced with better ones to improve the project's overall health. See the Kubernetes deprecation and removal policy for more details on this process. Kubernetes v1.36 includes a couple of deprecations.

Deprecation of Service .spec.externalIPs

With this release, the externalIPs field in Service spec is deprecated. The field still works for now, but it will be removed in a future version of Kubernetes, so you should plan to migrate if you currently rely on it. This field has been a known security headache for years, enabling man-in-the-middle attacks on your cluster traffic, as documented in CVE-2020-8554. From Kubernetes v1.36 onwards, you will see deprecation warnings when using it, with full removal planned for v1.43.
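
If you are unsure whether you are affected, this is the field to look for in your Service manifests; the address below is illustrative:

apiVersion: v1
kind: Service
metadata:
  name: legacy-service
spec:
  selector:
    app: legacy
  ports:
  - port: 80
  externalIPs:       # deprecated; plan to remove this field
  - 203.0.113.10     # illustrative address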

If your Services still lean on externalIPs, consider using LoadBalancer services for cloud-managed ingress, NodePort for simple port exposure, or Gateway API for a more flexible and secure way to handle external traffic.

For more details on this field and its deprecation, refer to External IPs or read KEP-5707: Deprecate service.spec.externalIPs.

Removal of the gitRepo volume driver

The gitRepo volume type has been deprecated since v1.11. For Kubernetes v1.36, the gitRepo volume plugin is permanently disabled and cannot be turned back on. This change protects clusters from a critical security issue where using gitRepo could let an attacker run code as root on the node.

Although gitRepo has been deprecated for years and better alternatives have been recommended, it was still technically possible to use it in previous releases. From v1.36 onward, that path is closed for good, so any existing workloads depending on gitRepo will need to migrate to supported approaches such as init containers or external git-sync style tools.
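
One common replacement, sketched here with illustrative image names and repository URL, is an init container that clones the repository into a shared emptyDir volume:

apiVersion: v1
kind: Pod
metadata:
  name: git-clone-demo
spec:
  initContainers:
  - name: clone-repo
    image: alpine/git                 # illustrative git-capable image
    args: ["clone", "--depth=1", "https://github.com/example/repo.git", "/repo"]
    volumeMounts:
    - name: repo
      mountPath: /repo
  containers:
  - name: app
    image: my-app:v1                  # illustrative application image
    volumeMounts:
    - name: repo
      mountPath: /repo
      readOnly: true
  volumes:
  - name: repo
    emptyDir: {}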

For more details on this removal, refer to KEP-5040: Remove gitRepo volume driver

Release notes

Check out the full details of the Kubernetes v1.36 release in our release notes.

Availability

Kubernetes v1.36 is available for download on GitHub or on the Kubernetes download page.

To get started with Kubernetes, check out these tutorials or run local Kubernetes clusters using minikube. You can also easily install v1.36 using kubeadm.

Release Team

Kubernetes is only possible with the support, commitment, and hard work of its community. Each release team is made up of dedicated community volunteers who work together to build the many pieces that make up the Kubernetes releases you rely on. This requires the specialized skills of people from all corners of our community, from the code itself to its documentation and project management.

We would like to thank the entire Release Team for the hours spent hard at work to deliver the Kubernetes v1.36 release to our community. The Release Team's membership ranges from first-time shadows to returning team leads with experience forged over several release cycles. A very special thanks goes out to our release lead, Ryota Sawada, for guiding us through a successful release cycle, for his hands-on approach to solving challenges, and for bringing the energy and care that drives our community forward.

Project Velocity

The CNCF K8s DevStats project aggregates a number of interesting data points related to the velocity of Kubernetes and various sub-projects. This includes everything from individual contributions to the number of companies that are contributing, and is an illustration of the depth and breadth of effort that goes into evolving this ecosystem.

During the v1.36 release cycle, which spanned 15 weeks from 12th January 2026 to 22nd April 2026, Kubernetes received contributions from as many as 106 different companies and 491 individuals. In the wider cloud native ecosystem, the figure goes up to 370 companies, counting 2235 total contributors.

Note that "contribution" counts when someone makes a commit, code review, comment, creates an issue or PR, reviews a PR (including blogs and documentation) or comments on issues and PRs. If you are interested in contributing, visit Getting Started on our contributor website.

Source for this data:

Events Update

Explore upcoming Kubernetes and cloud native events, including KubeCon + CloudNativeCon, KCD, and other notable conferences worldwide. Stay informed and get involved with the Kubernetes community!

April 2026

May 2026

June 2026

July 2026

September 2026

October 2026

November 2026

You can find the latest event details here.

Upcoming Release Webinar

Join members of the Kubernetes v1.36 Release Team on Wednesday, May 20th 2026 at 4:00 PM (UTC) to learn about the release highlights of this release. For more information and registration, visit the event page on the CNCF Online Programs site.

Get Involved

The simplest way to get involved with Kubernetes is by joining one of the many Special Interest Groups (SIGs) that align with your interests. Have something you'd like to broadcast to the Kubernetes community? Share your voice at our weekly community meeting, and through the channels below. Thank you for your continued feedback and support.

22 Apr 2026 12:00am GMT

21 Apr 2026

feedKubernetes Blog

Gateway API v1.5: Moving features to Stable

Gateway API logo

The Kubernetes SIG Network community presents the release of Gateway API (v1.5)! Released on February 27, 2026, version 1.5 is our biggest release yet, and concentrates on moving existing Experimental features to Standard (Stable).

The Gateway API v1.5.1 patch release is already available.

The Gateway API v1.5 brings six widely-requested feature promotions to the Standard channel (Gateway API's GA release channel):

Special thanks to the Gateway API contributors for their efforts on this release.

New release process

As of Gateway API v1.5, the project has moved to a release train model, where on a feature freeze date, any features that are ready are shipped in the release.

This applies to both Experimental and Standard, and also applies to documentation -- if the documentation isn't ready to ship, the feature isn't ready to ship.

We are aiming for this to produce a more reliable release cadence (since we are basing our work off the excellent work done by SIG Release on Kubernetes itself). As part of this change, we've also introduced Release Manager and Release Shadow roles to our release team. Many thanks to Flynn (Buoyant) and Beka Modebadze (Google) for all the great work coordinating and filing down the rough edges of our release process. They are both going to continue in this role for the next release as well.

New standard features

ListenerSet

Leads: Dave Protasowski, David Jumani

GEP-1713

Why ListenerSet?

Prior to ListenerSet, all listeners had to be specified directly on the Gateway object. While this worked well for simple use cases, it created challenges for more complex or multi-tenant environments:

ListenerSet addresses these limitations by allowing listeners to be defined independently and then merged onto a target Gateway.

ListenerSets also enable attaching more than 64 listeners to a single, shared Gateway. This is critical for large scale deployments and scenarios with multiple hostnames per listener.

Even though the ListenerSet feature significantly enhances scalability, the listeners field in the Gateway remains mandatory, and the Gateway must have at least one valid listener.

How it works

A ListenerSet attaches to a Gateway and contributes one or more listeners. The Gateway controller is responsible for merging listeners from the Gateway resource itself and any attached ListenerSet resources.

In this example, a central infrastructure team defines a Gateway with a default HTTP listener, while two different application teams define their own ListenerSet resources in separate namespaces. Both ListenerSets attach to the same Gateway and contribute additional HTTPS listeners.

---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-gateway
  namespace: infra
spec:
  gatewayClassName: example-gateway-class
  allowedListeners:
    namespaces:
      from: All # A selector lets you fine tune this
  listeners:
  - name: http
    protocol: HTTP
    port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: ListenerSet
metadata:
  name: team-a-listeners
  namespace: team-a
spec:
  parentRef:
    name: example-gateway
    namespace: infra
  listeners:
  - name: https-a
    protocol: HTTPS
    port: 443
    hostname: a.example.com
    tls:
      certificateRefs:
      - name: a-cert
---
apiVersion: gateway.networking.k8s.io/v1
kind: ListenerSet
metadata:
  name: team-b-listeners
  namespace: team-b
spec:
  parentRef:
    name: example-gateway
    namespace: infra
  listeners:
  - name: https-b
    protocol: HTTPS
    port: 443
    hostname: b.example.com
    tls:
      certificateRefs:
      - name: b-cert

TLSRoute

Leads: Rostislav Bobrovsky, Ricardo Pchevuzinske Katz

GEP-2643

The TLSRoute resource allows you to route requests by matching the Server Name Indication (SNI) presented by the client during the TLS handshake and directing the stream to the appropriate Kubernetes backends.

When working with TLSRoute, a Gateway's TLS listener can be configured in one of two modes: Passthrough or Terminate.

If you install Gateway API v1.5 Standard over v1.4 or earlier Experimental, your existing Experimental TLSRoutes will not be usable. This is because they will be stored in the v1alpha2 or v1alpha3 version, which is not included in the v1.5 Standard YAMLs. If this applies to you, either continue using Experimental for v1.5.1 and onward, or you'll need to download and migrate your TLSRoutes to v1, which is present in the Standard YAMLs.

Passthrough mode

The Passthrough mode is designed for strict security requirements. It is ideal for scenarios where traffic must remain encrypted end-to-end until it reaches the destination backend, when the external client and backend need to authenticate directly with each other, or when you can't store certificates on the Gateway. This configuration is also applicable when an encrypted TCP stream is required instead of standard HTTP traffic.

In this mode, the encrypted byte stream is proxied directly to the destination backend. The Gateway has zero access to private keys or unencrypted data.

The following TLSRoute is attached to a listener that is configured in Passthrough mode. It will match only TLS handshakes with the foo.example.com SNI hostname and apply its routing rules to pass the encrypted TCP stream to the configured backend:

---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-gateway
spec:
  gatewayClassName: example-gateway-class
  listeners:
  - name: tls-passthrough
    protocol: TLS
    port: 8443
    tls:
      mode: Passthrough
---
apiVersion: gateway.networking.k8s.io/v1
kind: TLSRoute
metadata:
  name: foo-route
spec:
  parentRefs:
  - name: example-gateway
    sectionName: tls-passthrough
  hostnames:
  - "foo.example.com"
  rules:
  - backendRefs:
    - name: foo-svc
      port: 8443

Terminate mode

The Terminate mode provides the convenience of centralized TLS certificate management directly at the Gateway.

In this mode, the TLS session is fully terminated at the Gateway, which then routes the decrypted payload to the destination backend as a plain text TCP stream.

The following TLSRoute is attached to a listener that is configured in Terminate mode. It will match only TLS handshakes with the bar.example.com SNI hostname and apply its routing rules to pass the decrypted TCP stream to the configured backend:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: example-gateway
spec:
  gatewayClassName: example-gateway-class
  listeners:
  - name: tls-terminate
    protocol: TLS
    port: 443
    tls:
      mode: Terminate
      certificateRefs:
      - name: tls-terminate-certificate
---
apiVersion: gateway.networking.k8s.io/v1
kind: TLSRoute
metadata:
  name: bar-route
spec:
  parentRefs:
  - name: example-gateway
    sectionName: tls-terminate
  hostnames:
  - "bar.example.com"
  rules:
  - backendRefs:
    - name: bar-svc
      port: 8080

HTTPRoute CORS filter

Leads: Damian Sawicki, Ricardo Pchevuzinske Katz, Norwin Schnyder, Huabing (Robin) Zhao, LiangLliu

GEP-1767

Cross-origin resource sharing (CORS) is an HTTP-header based security mechanism that allows (or denies) a web page to access resources from a server on an origin different from the domain that served the web page. See our documentation page for more information. The HTTPRoute resource can be used to configure Cross-Origin Resource Sharing (CORS). The following HTTPRoute allows requests from https://app.example:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: cors
spec:
  parentRefs:
  - name: same-namespace
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /cors-behavior-creds-false
    backendRefs:
    - name: infra-backend-v1
      port: 8080
    filters:
    - cors:
        allowOrigins:
        - https://app.example
      type: CORS

Instead of specifying a list of specific origins, you can specify a single wildcard ("*"), which allows any origin. You can also include partially wildcarded origins in the list, where the wildcard appears after the scheme and at the beginning of the hostname, e.g. https://*.bar.com:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: cors
spec:
  parentRefs:
  - name: same-namespace
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /cors-behavior-creds-false
    backendRefs:
    - name: infra-backend-v1
      port: 8080
    filters:
    - cors:
        allowOrigins:
        - https://www.baz.com
        - https://*.bar.com
        - https://*.foo.com
      type: CORS

HTTPRoute filters allow for the configuration of CORS settings. See a list of supported options below:

allowCredentials
Specifies whether the browser is allowed to include credentials (such as cookies and HTTP authentication) in the CORS request.
allowMethods
The HTTP methods that are allowed for CORS requests.
allowHeaders
The HTTP headers that are allowed for CORS requests.
exposeHeaders
The HTTP headers that are exposed to the client.
maxAge
The maximum time in seconds that the browser should cache the preflight response.

A comprehensive example:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: cors-allow-credentials
spec:
  parentRefs:
  - name: same-namespace
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /cors-behavior-creds-true
    backendRefs:
    - name: infra-backend-v1
      port: 8080
    filters:
    - cors:
        allowOrigins:
        - "https://www.foo.example.com"
        - "https://*.bar.example.com"
        allowMethods:
        - GET
        - OPTIONS
        allowHeaders:
        - "*"
        exposeHeaders:
        - "x-header-3"
        - "x-header-4"
        allowCredentials: true
        maxAge: 3600
      type: CORS

Gateway client certificate validation

Leads: Arko Dasgupta, Katarzyna Łach, Norwin Schnyder

GEP-91

Client certificate validation, also known as mutual TLS (mTLS), is a security mechanism where the client provides a certificate to the server to prove its identity. This is in contrast to standard TLS, where only the server presents a certificate to the client. In the context of the Gateway API, frontend mTLS means that the Gateway validates the client's certificate before allowing the connection to proceed to a backend service. This validation is done by checking the client certificate against a set of trusted Certificate Authorities (CAs) configured on the Gateway. The API was shaped this way to address a critical security vulnerability related to connection reuse and still provide some level of flexibility.

Configuration overview

Client validation is defined using the frontendValidation struct, which specifies how the Gateway should verify the client's identity.

Validation can be applied globally to the Gateway or overridden for specific ports:

  1. Default Configuration: This configuration applies to all HTTPS listeners on the Gateway, unless a per-port override is defined.
  2. Per-Port Configuration: This allows for fine-grained control, overriding the default configuration for all listeners handling traffic on a specific port.

Example:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: client-validation-basic
spec:
  gatewayClassName: acme-lb
  tls:
    frontend:
      default:
        validation:
          caCertificateRefs:
          - kind: ConfigMap
            group: ""
            name: foo-example-com-ca-cert
      perPort:
      - port: 8443
        tls:
          validation:
            caCertificateRefs:
            - kind: ConfigMap
              group: ""
              name: foo-example-com-ca-cert
            mode: "AllowInsecureFallback"
  listeners:
  - name: foo-https
    protocol: HTTPS
    port: 443
    hostname: foo.example.com
    tls:
      certificateRefs:
      - kind: Secret
        group: ""
        name: foo-example-com-cert
  - name: bar-https
    protocol: HTTPS
    port: 8443
    hostname: bar.example.com
    tls:
      certificateRefs:
      - kind: Secret
        group: ""
        name: bar-example-com-cert

Certificate selection for Gateway TLS origination

Leads: Marcin Kosieradzki, Rob Scott, Norwin Schnyder, Lior Lieberman, Katarzyna Lach

GEP-3155

Mutual TLS (mTLS) for upstream connections requires the Gateway to present a client certificate to the backend, in addition to verifying the backend's certificate. This ensures that the backend only accepts connections from authorized Gateways.

Gateway's client certificate configuration

To configure the client certificate that the Gateway uses when connecting to backends, use the tls.backend.clientCertificateRef field in the Gateway resource. This configuration applies to the Gateway as a client for all upstream connections managed by that Gateway.

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: backend-tls
spec:
  gatewayClassName: acme-lb
  tls:
    backend:
      clientCertificateRef:
        kind: Secret
        group: "" # empty string means core API group
        name: foo-example-cert
  listeners:
  - name: foo-http
    protocol: HTTP
    port: 80
    hostname: foo.example.com

ReferenceGrant promoted to v1

The ReferenceGrant resource has not changed in more than a year, and we do not expect it to change further, so its version has been bumped to v1. It is now officially in the Standard channel and abides by the GA API contract (that is, no breaking changes).
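
As a quick reminder of the shape of the resource, a ReferenceGrant lets routes in one namespace reference Services in another; the namespaces and names below are illustrative:

apiVersion: gateway.networking.k8s.io/v1
kind: ReferenceGrant
metadata:
  name: allow-routes-from-web
  namespace: backend                  # the namespace that owns the referenced Services
spec:
  from:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    namespace: web                    # HTTPRoutes in this namespace may reference...
  to:
  - group: ""
    kind: Service                     # ...Services in the backend namespace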

Try it out

Unlike other Kubernetes APIs, you don't need to upgrade to the latest version of Kubernetes to get the latest version of Gateway API. As long as you're running Kubernetes 1.30 or later, you'll be able to get up and running with this version of Gateway API.

To try out the API, follow the Getting Started Guide.

As of this writing, seven implementations are already fully conformant with Gateway API v1.5. In alphabetical order:

Get involved

Wondering when a feature will be added? There are lots of opportunities to get involved and help define the future of Kubernetes routing APIs for both ingress and service mesh.

The maintainers would like to thank everyone who's contributed to Gateway API, whether in the form of commits to the repo, discussion, ideas, or general support. We could never have made this kind of progress without the support of this dedicated and active community.

This article was edited in April 2026 to correct the release date for Gateway API 1.5.0.

21 Apr 2026 4:30pm GMT

30 Mar 2026

feedKubernetes Blog

Kubernetes v1.36 Sneak Peek

Kubernetes v1.36 is coming at the end of April 2026. This release will include removals and deprecations, and it is packed with an impressive number of enhancements. Here are some of the features we are most excited about in this cycle!

Please note that this information reflects the current state of v1.36 development and may change before release.

The Kubernetes API removal and deprecation process

The Kubernetes project has a well-documented deprecation policy for features. This policy states that stable APIs may only be deprecated when a newer, stable version of that same API is available and that APIs have a minimum lifetime for each stability level. A deprecated API has been marked for removal in a future Kubernetes release. It will continue to function until removal (at least one year from the deprecation), but usage will result in a warning being displayed. Once an API is removed, it is no longer available in that version of Kubernetes, at which point you must migrate to using the replacement.

Whether an API is removed as a result of a feature graduating from beta to stable, or because that API simply did not succeed, all removals comply with this deprecation policy. Whenever an API is removed, migration options are communicated in the deprecation guide.

A recent example of this principle in action is the retirement of the ingress-nginx project, announced by SIG-Security on March 24, 2026. As stewardship shifts away from the project, the community has been encouraged to evaluate alternative ingress controllers that align with current security and maintenance best practices. This transition reflects the same lifecycle discipline that underpins Kubernetes itself, ensuring continued evolution without abrupt disruption.

Ingress NGINX retirement

To prioritize the safety and security of the ecosystem, Kubernetes SIG Network and the Security Response Committee retired Ingress NGINX on March 24, 2026. Since that date, there have been no further releases, no bugfixes, and no updates to resolve any security vulnerabilities discovered. Existing deployments of Ingress NGINX will continue to function, and installation artifacts like Helm charts and container images will remain available.

For full details, see the official retirement announcement.

Deprecations and removals for Kubernetes v1.36

Deprecation of .spec.externalIPs in Service

The externalIPs field in Service spec is being deprecated, which means you'll soon lose a quick way to route arbitrary external IPs to your Services. This field has been a known security headache for years, enabling man-in-the-middle attacks on your cluster traffic, as documented in CVE-2020-8554. From Kubernetes v1.36 onwards, you will see deprecation warnings when using it, with full removal planned for v1.43.

If your Services still lean on externalIPs, consider using LoadBalancer services for cloud-managed ingress, NodePort for simple port exposure, or Gateway API for a more flexible and secure way to handle external traffic.

For more details on this enhancement, refer to KEP-5707: Deprecate service.spec.externalIPs

Removal of gitRepo volume driver

The gitRepo volume type has been deprecated since v1.11. Starting Kubernetes v1.36, the gitRepo volume plugin is permanently disabled and cannot be turned back on. This change protects clusters from a critical security issue where using gitRepo could let an attacker run code as root on the node.

Although gitRepo has been deprecated for years and better alternatives have been recommended, it was still technically possible to use it in previous releases. From v1.36 onward, that path is closed for good, so any existing workloads depending on gitRepo will need to migrate to supported approaches such as init containers or external git-sync style tools.

For more details on this enhancement, refer to KEP-5040: Remove gitRepo volume driver

Featured enhancements of Kubernetes v1.36

The following list of enhancements is likely to be included in the upcoming v1.36 release. This is not a commitment and the release content is subject to change.

Faster SELinux labelling for volumes (GA)

Kubernetes v1.36 makes the SELinux volume mounting improvement generally available. This change replaced recursive file relabeling with the mount -o context=XYZ option, applying the correct SELinux label to the entire volume at mount time. It brings more consistent performance and reduces Pod startup delays on SELinux-enforcing systems.

This feature was introduced as beta in v1.28 for ReadWriteOncePod volumes. In v1.32, it gained metrics and an opt-out option (securityContext.seLinuxChangePolicy: Recursive) to help catch conflicts. Now in v1.36, it reaches stable and defaults to all volumes, with Pods or CSIDrivers opting in via spec.seLinuxMount.

However, we expect this feature to create a risk of breaking changes in future Kubernetes releases, due to the potential for mixing of privileged and unprivileged pods. Setting the seLinuxChangePolicy field and SELinux volume labels on Pods correctly is the responsibility of the Pod author. Developers have that responsibility whether they are writing a Deployment, StatefulSet, DaemonSet, or even a custom resource that includes a Pod template. Being careless with these settings can lead to a range of problems when Pods share volumes.

For more details on this enhancement, refer to KEP-1710: Speed up recursive SELinux label change

External signing of ServiceAccount tokens

As a beta feature, Kubernetes already supports external signing of ServiceAccount tokens. This allows clusters to integrate with external key management systems or signing services instead of relying only on internally managed keys.

With this enhancement, the kube-apiserver can delegate token signing to external systems such as cloud key management services or hardware security modules. This improves security and simplifies key management for clusters that rely on centralized signing infrastructure. We expect that this will graduate to stable (GA) in Kubernetes v1.36.

For more details on this enhancement, refer to KEP-740: Support external signing of service account tokens

DRA Driver support for Device taints and tolerations

Kubernetes v1.33 introduced support for taints and tolerations for physical devices managed through Dynamic Resource Allocation (DRA). Normally, any device can be used for scheduling. However, this enhancement allows DRA drivers to mark devices as tainted, which ensures that they will not be used for scheduling purposes. Alternatively, cluster administrators can create a DeviceTaintRule to mark devices that match certain selection criteria (such as all devices of a certain driver) as tainted. This improves scheduling control and helps ensure that specialized hardware resources are only used by workloads that explicitly request them.

In Kubernetes v1.36, this feature graduates to beta with more comprehensive testing complete, making it accessible by default without the need for a feature flag and open to user feedback.

To learn about taints and tolerations, see taints and tolerations.
For more details on this enhancement, refer to KEP-5055: DRA: device taints and tolerations.

DRA support for partitionable devices

Kubernetes v1.36 expands Dynamic Resource Allocation (DRA) by introducing support for partitionable devices, allowing a single hardware accelerator to be split into multiple logical units that can be shared across workloads. This is especially useful for high-cost resources like GPUs, where dedicating an entire device to a single workload can lead to underutilization.

With this enhancement, platform teams can improve overall cluster efficiency by allocating only the required portion of a device to each workload, rather than reserving it entirely. This makes it easier to run multiple workloads on the same hardware while maintaining isolation and control, helping organizations get more value out of their infrastructure.

To learn more about this enhancement, refer to KEP-4815: DRA Partitionable Devices

Want to know more?

New features and deprecations are also announced in the Kubernetes release notes. We will formally announce what's new in Kubernetes v1.36 as part of the CHANGELOG for that release.

Kubernetes v1.36 release is planned for Wednesday, April 22, 2026. Stay tuned for updates!

You can also see the announcements of changes in the release notes for:

Get involved

The simplest way to get involved with Kubernetes is by joining one of the many Special Interest Groups (SIGs) that align with your interests. Have something you'd like to broadcast to the Kubernetes community? Share your voice at our weekly community meeting, and through the channels below. Thank you for your continued feedback and support.

30 Mar 2026 12:00am GMT

20 Mar 2026

feedKubernetes Blog

Announcing Ingress2Gateway 1.0: Your Path to Gateway API

With the Ingress-NGINX retirement scheduled for March 2026, the Kubernetes networking landscape is at a turning point. For most organizations, the question isn't whether to migrate to Gateway API, but how to do so safely.

Migrating from Ingress to Gateway API is a fundamental shift in API design. Gateway API provides a modular, extensible API with strong support for Kubernetes-native RBAC. Conversely, the Ingress API is simple, and implementations such as Ingress-NGINX extend the API through esoteric annotations, ConfigMaps, and CRDs. Migrating away from Ingress controllers such as Ingress-NGINX presents the daunting task of capturing all the nuances of the Ingress controller, and mapping that behavior to Gateway API.

Ingress2Gateway is an assistant that helps teams confidently move from Ingress to Gateway API. It translates Ingress resources/manifests along with implementation-specific annotations to Gateway API while warning you about untranslatable configuration and offering suggestions.

Today, SIG Network is proud to announce the 1.0 release of Ingress2Gateway. This milestone represents a stable, tested migration assistant for teams ready to modernize their networking stack.

Ingress2Gateway 1.0

Ingress-NGINX annotation support

The main improvement for the 1.0 release is more comprehensive Ingress-NGINX support. Before the 1.0 release, Ingress2Gateway only supported three Ingress-NGINX annotations. For the 1.0 release, Ingress2Gateway supports over 30 common annotations (CORS, backend TLS, regex matching, path rewrite, etc.).

Comprehensive integration testing

Each supported Ingress-NGINX annotation, and representative combinations of common annotations, is backed by controller-level integration tests that verify the behavioral equivalence of the Ingress-NGINX configuration and the generated Gateway API. These tests exercise real controllers in live clusters and compare runtime behavior (routing, redirects, rewrites, etc.), not just YAML structure.

The tests:

A comprehensive test suite not only catches bugs in development, but also ensures the correctness of the translation, especially given surprising edge cases and unexpected defaults, so that you don't find out about them in production.

Notification & error handling

Migration is not a "one-click" affair. Surfacing subtleties and untranslatable behavior is as important as translating supported configuration. The 1.0 release cleans up the formatting and content of notifications, so it is clear what is missing and how you can fix it.

Using Ingress2Gateway

Ingress2Gateway is a migration assistant, not a one-shot replacement. Its goal is to translate the configuration it can and to clearly flag anything that needs manual review.

The rest of this section shows how to safely migrate the following Ingress-NGINX configuration:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "1G"
    nginx.ingress.kubernetes.io/use-regex: "true"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "1"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "1"
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kubernetes.io/configuration-snippet: |
      more_set_headers "Request-Id: $req_id";
  name: my-ingress
  namespace: my-ns
spec:
  ingressClassName: nginx
  rules:
  - host: my-host.example.com
    http:
      paths:
      - backend:
          service:
            name: website-service
            port:
              number: 80
        path: /users/(\d+)
        pathType: ImplementationSpecific
  tls:
  - hosts:
    - my-host.example.com
    secretName: my-secret

1. Install Ingress2Gateway

If you have a Go environment set up, you can install Ingress2Gateway with

go install github.com/kubernetes-sigs/ingress2gateway@v1.0.0

Otherwise, you can install it with Homebrew:

brew install ingress2gateway

You can also download the binary from GitHub or build from source.

2. Run Ingress2Gateway

You can pass Ingress manifests to Ingress2Gateway, or have the tool read directly from your cluster.

# Pass it files
ingress2gateway print --input-file my-manifest.yaml,my-other-manifest.yaml --providers=ingress-nginx > gwapi.yaml
# Use a namespace in your cluster
ingress2gateway print --namespace my-api --providers=ingress-nginx > gwapi.yaml
# Or your whole cluster
ingress2gateway print --providers=ingress-nginx --all-namespaces > gwapi.yaml

Note:

You can also pass --emitter <agentgateway|envoy-gateway|kgateway> to output implementation-specific extensions.
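
For example, to generate Envoy Gateway specific extensions for annotations that have no standard Gateway API equivalent, you could run something like:

# Emit implementation-specific extensions alongside the standard Gateway API output
ingress2gateway print --namespace my-api --providers=ingress-nginx \
  --emitter envoy-gateway > gwapi.yaml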

3. Review the output

This is the most critical step. The commands from the previous section output a Gateway API manifest to gwapi.yaml, and they also emit warnings that explain what did not translate exactly and what to review manually.

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  annotations:
    gateway.networking.k8s.io/generator: ingress2gateway-dev
  name: nginx
  namespace: my-ns
spec:
  gatewayClassName: nginx
  listeners:
  - hostname: my-host.example.com
    name: my-host-example-com-http
    port: 80
    protocol: HTTP
  - hostname: my-host.example.com
    name: my-host-example-com-https
    port: 443
    protocol: HTTPS
    tls:
      certificateRefs:
      - group: ""
        kind: Secret
        name: my-secret
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  annotations:
    gateway.networking.k8s.io/generator: ingress2gateway-dev
  name: my-ingress-my-host-example-com
  namespace: my-ns
spec:
  hostnames:
  - my-host.example.com
  parentRefs:
  - name: nginx
    port: 443
  rules:
  - backendRefs:
    - name: website-service
      port: 80
    filters:
    - cors:
        allowCredentials: true
        allowHeaders:
        - DNT
        - Keep-Alive
        - User-Agent
        - X-Requested-With
        - If-Modified-Since
        - Cache-Control
        - Content-Type
        - Range
        - Authorization
        allowMethods:
        - GET
        - PUT
        - POST
        - DELETE
        - PATCH
        - OPTIONS
        allowOrigins:
        - '*'
        maxAge: 1728000
      type: CORS
    matches:
    - path:
        type: RegularExpression
        value: (?i)/users/(\d+).*
    name: rule-0
    timeouts:
      request: 10s
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  annotations:
    gateway.networking.k8s.io/generator: ingress2gateway-dev
  name: my-ingress-my-host-example-com-ssl-redirect
  namespace: my-ns
spec:
  hostnames:
  - my-host.example.com
  parentRefs:
  - name: nginx
    port: 80
  rules:
  - filters:
    - requestRedirect:
        scheme: https
        statusCode: 308
      type: RequestRedirect

Ingress2Gateway successfully translated some annotations into their Gateway API equivalents. For example, the nginx.ingress.kubernetes.io/enable-cors annotation was translated into a CORS filter. But upon closer inspection, the nginx.ingress.kubernetes.io/proxy-{read,send}-timeout and nginx.ingress.kubernetes.io/proxy-body-size annotations do not map perfectly. The logs show the reasons for these gaps as well as the reasoning behind the translation.

┌─ WARN ────────────────────────────────────────
│ Unsupported annotation nginx.ingress.kubernetes.io/configuration-snippet
│ source: INGRESS-NGINX
│ object: Ingress: my-ns/my-ingress
└─
┌─ INFO ────────────────────────────────────────
│ Using case-insensitive regex path matches. You may want to change this.
│ source: INGRESS-NGINX
│ object: HTTPRoute: my-ns/my-ingress-my-host-example-com
└─
┌─ WARN ────────────────────────────────────────
│ ingress-nginx only supports TCP-level timeouts; i2gw has made a best-effort translation to Gateway API timeouts.request. Please verify that this meets your needs. See documentation: https://gateway-api.sigs.k8s.io/guides/http-timeouts/
│ source: INGRESS-NGINX
│ object: HTTPRoute: my-ns/my-ingress-my-host-example-com
└─
┌─ WARN ────────────────────────────────────────
│ Failed to apply my-ns.my-ingress.metadata.annotations."nginx.ingress.kubernetes.io/proxy-body-size" from my-ns/my-ingress: Most Gateway API implementations have reasonable body size and buffering defaults
│ source: STANDARD_EMITTER
│ object: HTTPRoute: my-ns/my-ingress-my-host-example-com
└─
┌─ WARN ────────────────────────────────────────
│ Gateway API does not support configuring URL normalization (RFC 3986, Section 6). Please check if this matters for your use case and consult implementation-specific details.
│ source: STANDARD_EMITTER
└─

There is a warning that Ingress2Gateway does not support the nginx.ingress.kubernetes.io/configuration-snippet annotation. You will have to check your Gateway API implementation documentation to see if there is a way to achieve equivalent behavior.

The tool also notified us that Ingress-NGINX regex matches are case-insensitive prefix matches, which is why there is a match pattern of (?i)/users/(\d+).*. Most organizations will want to change this behavior to be an exact case-sensitive match by removing the leading (?i) and the trailing .* from the path pattern.

Ingress2Gateway made a best-effort translation from the nginx.ingress.kubernetes.io/proxy-{send,read}-timeout annotations to a 10 second request timeout in our HTTP route. If requests for this service should be much shorter, say 3 seconds, you can make the corresponding changes to your Gateway API manifests.

Also, nginx.ingress.kubernetes.io/proxy-body-size does not have a Gateway API equivalent, and was thus not translated. However, most Gateway API implementations have reasonable defaults for maximum body size and buffering, so this might not be a problem in practice. Further, some emitters might offer support for this annotation through implementation-specific extensions. For example, adding the --emitter agentgateway, --emitter envoy-gateway, or --emitter kgateway flag to the previous ingress2gateway print command would have resulted in additional implementation-specific configuration in the generated Gateway API manifests that attempted to capture the body size configuration.

We also see a warning about URL normalization. Gateway API implementations such as Agentgateway, Envoy Gateway, Kgateway, and Istio have some level of URL normalization, but the behavior varies across implementations and is not configurable through standard Gateway API. You should check and test the URL normalization behavior of your Gateway API implementation to ensure it is compatible with your use case.

To match Ingress-NGINX default behavior, Ingress2Gateway also added a listener on port 80 and an HTTP request redirect filter to redirect HTTP traffic to HTTPS. If you do not want to serve HTTP traffic at all, remove the listener on port 80 and the corresponding HTTPRoute.

Caution:

Always thoroughly review the generated output and logs.

After manually applying these changes, the Gateway API manifests might look as follows.

---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  annotations:
    gateway.networking.k8s.io/generator: ingress2gateway-dev
  name: nginx
  namespace: my-ns
spec:
  gatewayClassName: nginx
  listeners:
  - hostname: my-host.example.com
    name: my-host-example-com-https
    port: 443
    protocol: HTTPS
    tls:
      certificateRefs:
      - group: ""
        kind: Secret
        name: my-secret
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  annotations:
    gateway.networking.k8s.io/generator: ingress2gateway-dev
  name: my-ingress-my-host-example-com
  namespace: my-ns
spec:
  hostnames:
  - my-host.example.com
  parentRefs:
  - name: nginx
    port: 443
  rules:
  - backendRefs:
    - name: website-service
      port: 80
    filters:
    - cors:
        allowCredentials: true
        allowHeaders:
        - DNT
        ...
        allowMethods:
        - GET
        ...
        allowOrigins:
        - '*'
        maxAge: 1728000
      type: CORS
    matches:
    - path:
        type: RegularExpression
        value: /users/(\d+)
    name: rule-0
    timeouts:
      request: 3s

4. Verify

Now that you have Gateway API manifests, you should thoroughly test them in a development cluster. In this case, you should at least double-check that your Gateway API implementation's maximum body size defaults are appropriate for you and verify that a three-second timeout is enough.
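
One lightweight way to do that is to send a few representative requests directly to the new Gateway. The address and expected responses below are placeholders for your own environment:

# Point the hostname at the new Gateway without changing DNS (placeholder address),
# then confirm the regex route behaves as intended after your manual edits.
GATEWAY_IP=203.0.113.10

curl -sSik --resolve my-host.example.com:443:${GATEWAY_IP} \
  https://my-host.example.com/users/123        # should reach website-service
curl -sSik --resolve my-host.example.com:443:${GATEWAY_IP} \
  https://my-host.example.com/USERS/123/extra  # should now return 404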

After validating behavior in a development cluster, deploy your Gateway API configuration alongside your existing Ingress. We strongly suggest that you then gradually shift traffic using weighted DNS, your cloud load balancer, or traffic-splitting features of your platform. This way, you can quickly recover from any misconfiguration that made it through your tests.

Finally, when you have shifted all your traffic to your Gateway API controller, delete your Ingress resources and uninstall your Ingress controller.

Conclusion

The Ingress2Gateway 1.0 release is just the beginning, and we hope that you use Ingress2Gateway to safely migrate to Gateway API. As we approach the March 2026 Ingress-NGINX retirement, we invite the community to help us increase our configuration coverage, expand testing, and improve UX.

Resources about Gateway API

The scope of Gateway API can be daunting, but the official Gateway API documentation and guides can help you work with it.

20 Mar 2026 7:00pm GMT

Running Agents on Kubernetes with Agent Sandbox

The landscape of artificial intelligence is undergoing a massive architectural shift. In the early days of generative AI, interacting with a model was often treated as a transient, stateless function call: a request that spun up, executed for perhaps 50 milliseconds, and terminated.

Today, the world is witnessing AI v2 eating AI v1. The ecosystem is moving from short-lived, isolated tasks to deploying multiple, coordinated AI agents that run constantly. These autonomous agents need to maintain context, use external tools, write and execute code, and communicate with one another over extended periods.

As platform engineering teams look for the right infrastructure to host these new AI workloads, one platform stands out as the natural choice: Kubernetes. However, mapping these unique agentic workloads to traditional Kubernetes primitives requires a new abstraction.

This is where the new Agent Sandbox project (currently in development under SIG Apps) comes into play.

The Kubernetes advantage (and the abstraction gap)

Kubernetes is the de facto standard for orchestrating cloud-native applications precisely because it solves the challenges of extensibility, robust networking, and ecosystem maturity. However, as AI evolves from short-lived inference requests to long-running, autonomous agents, we are seeing the emergence of a new operational pattern.

Unlike the stateless, replicated services that Kubernetes typically runs, AI agents are isolated, stateful, singleton workloads. They act as a digital workspace or execution environment for an LLM. An agent needs a persistent identity and a secure scratchpad for writing and executing (often untrusted) code. Crucially, because these long-lived agents are expected to be mostly idle except for brief bursts of activity, they require a lifecycle that supports mechanisms like suspension and rapid resumption.

While you could theoretically approximate this by stringing together a StatefulSet of size 1, a headless Service, and a PersistentVolumeClaim for every single agent, managing this at scale becomes an operational nightmare.

Because of these unique properties, traditional Kubernetes primitives don't perfectly align.

Introducing Kubernetes Agent Sandbox

To bridge this gap, SIG Apps is developing agent-sandbox. The project introduces a declarative, standardized API specifically tailored for singleton, stateful workloads like AI agent runtimes.

At its core, the project introduces the Sandbox CRD. It acts as a lightweight, single-container environment built entirely on Kubernetes primitives.

Scaling agents with extensions

Because the AI space is moving incredibly quickly, we built an Extensions API layer that enables even faster iteration and development.

Starting a new pod adds about a second of overhead. That's perfectly fine when deploying a new version of a microservice, but when an agent is invoked after being idle, a one-second cold start breaks the continuity of the interaction. It forces the user or the orchestrating service to wait for the environment to provision before the model can even begin to think or act. SandboxWarmPool solves this by maintaining a pool of pre-provisioned Sandbox pods, effectively eliminating cold starts. Users or orchestration services can simply issue a SandboxClaim against a SandboxTemplate, and the controller immediately hands over a pre-warmed, fully isolated environment to the agent.

Quick start

Ready to try it yourself? You can install the Agent Sandbox core components and extensions directly into your learning or sandbox cluster, using your chosen release.

We recommend you use the latest release as the project is moving fast.

# Replace "vX.Y.Z" with a specific version tag (e.g., "v0.1.0") from
# https://github.com/kubernetes-sigs/agent-sandbox/releases
export VERSION="vX.Y.Z"

# Install the core components:
kubectl apply -f https://github.com/kubernetes-sigs/agent-sandbox/releases/download/${VERSION}/manifest.yaml

# Install the extensions components (optional):
kubectl apply -f https://github.com/kubernetes-sigs/agent-sandbox/releases/download/${VERSION}/extensions.yaml

# Install the Python SDK (optional):
# Create a virtual Python environment
python3 -m venv .venv
source .venv/bin/activate
# Install from PyPI
pip install k8s-agent-sandbox

Once installed, you can try out the Python SDK for AI agents or deploy one of the ready-to-use examples to see how easy it is to spin up an isolated agent environment.

The future of agents is cloud native

Whether it's a 50-millisecond stateless task, or a multi-week, mostly-idle collaborative process, extending Kubernetes with primitives designed specifically for isolated stateful singletons allows us to leverage all the robust benefits of the cloud-native ecosystem.

The Agent Sandbox project is open source and community-driven. If you are building AI platforms, developing agentic frameworks, or are interested in Kubernetes extensibility, we invite you to get involved.

20 Mar 2026 6:00pm GMT

18 Mar 2026

feedKubernetes Blog

Securing Production Debugging in Kubernetes

During production debugging, the fastest route is often broad access such as cluster-admin (a ClusterRole that grants administrator-level access), shared bastions/jump boxes, or long-lived SSH keys. It works in the moment, but it comes with two common problems: auditing becomes difficult, and temporary exceptions have a way of becoming routine.

This post offers my recommendations for good practices applicable to existing Kubernetes environments with minimal tooling changes.

A good architecture for securing production debugging workflows is to use a just-in-time secure shell gateway (often deployed as an on-demand pod in the cluster). It acts as an SSH-style "front door" that makes temporary access actually temporary. Engineers authenticate with short-lived, identity-bound credentials and establish a session to the gateway, and the gateway uses the Kubernetes API and RBAC to control what they can do, such as pods/log, pods/exec, and pods/portforward. Sessions expire automatically, and both the gateway logs and Kubernetes audit logs capture who accessed what and when, without shared bastion accounts or long-lived keys.

1) Using an access broker on top of Kubernetes RBAC

RBAC controls who can do what in Kubernetes. Many Kubernetes environments rely primarily on RBAC for authorization, although Kubernetes also supports other authorization modes such as Webhook authorization. You can enforce access directly with Kubernetes RBAC, or put an access broker in front of the cluster that still relies on Kubernetes permissions under the hood. In either model, Kubernetes RBAC remains the source of truth for what the Kubernetes API allows and at what scope.

An access broker adds controls that RBAC does not cover well. For example, it can decide whether a request is auto-approved or requires manual approval, whether a user can run a command, and which commands are allowed in a session. It can also manage group membership so that you grant permissions to groups instead of individual users. Kubernetes RBAC can allow actions such as pods/exec, but it cannot restrict which commands run inside an exec session.

With that model, Kubernetes RBAC defines the allowed actions for a user or group (for example, an on-call team in a single namespace). I recommend you only define access rules that grant rights to groups or to ServiceAccounts - never to individual users. The broker or identity provider then adds or removes users from that group as needed.

The broker can also enforce extra policy on top, like which commands are permitted in an interactive session and which requests can be auto-approved versus require manual approval. That policy can live in a JSON or XML file and be maintained through code review, so updates go through a formal pull request and are reviewed like any other production change.
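
As an illustration only (the schema below is hypothetical and depends entirely on the broker you use), such a policy file might look like this:

{
  "description": "Hypothetical broker policy for illustration; real schema depends on your broker",
  "group": "oncall-payments",
  "namespaces": ["payments"],
  "autoApprove": ["pods/log", "pods/portforward"],
  "requireApproval": ["pods/exec"],
  "allowedExecCommands": ["ps", "ls", "cat"],
  "maxSessionDuration": "30m"
}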

Example: a namespaced on-call debug Role

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: oncall-debug
  namespace: <namespace>
rules:
  # Discover what's running
  - apiGroups: [""]
    resources: ["pods", "events"]
    verbs: ["get", "list", "watch"]

  # Read logs
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get"]

  # Interactive debugging actions
  - apiGroups: [""]
    resources: ["pods/exec", "pods/portforward"]
    verbs: ["create"]

  # Understand rollout/controller state
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list", "watch"]

  # Optional: allow kubectl debug ephemeral containers
  - apiGroups: [""]
    resources: ["pods/ephemeralcontainers"]
    verbs: ["update"]

Bind the Role to a group (rather than individual users) so membership can be managed through your identity provider:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: oncall-debug
  namespace: <namespace>
subjects:
  - kind: Group
    name: oncall-<team-name>
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: oncall-debug
  apiGroup: rbac.authorization.k8s.io
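
Once the binding is in place, you can sanity-check the effective permissions with impersonation; the namespace payments, the user jane, and the group oncall-payments below are placeholders:

# Check that a member of the on-call group gets exactly the intended verbs
kubectl auth can-i create pods --subresource=exec -n payments \
  --as jane --as-group oncall-payments        # expect: yes
kubectl auth can-i get pods --subresource=log -n payments \
  --as jane --as-group oncall-payments        # expect: yes
kubectl auth can-i delete deployments -n payments \
  --as jane --as-group oncall-payments        # expect: no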

2) Short-lived, identity-bound credentials

The goal is to use short-lived, identity-bound credentials that clearly tie a session to a real person and expire quickly. These credentials can include the user's identity and the scope of what they're allowed to do. They're typically signed using a private key that stays with the engineer, such as a hardware-backed key (for example, a YubiKey), so they cannot be forged without access to that key.

You can implement this with Kubernetes-native authentication (for example, client certificates or an OIDC-based flow), or have the access broker from the previous section issue short-lived credentials on the user's behalf. In many setups, Kubernetes still uses RBAC to enforce permissions based on the authenticated identity and groups/claims. If you use an access broker, it can also encode additional scope constraints in the credential and enforce them during the session, such as which cluster or namespace the session applies to and which actions (or approved commands) are allowed against pods or nodes. In either case, the credentials should be signed by a certificate authority (CA), and that CA should be rotated on a regular schedule (for example, quarterly) to limit long-term risk.

Option A: short-lived OIDC tokens

A lot of managed Kubernetes clusters already give you short-lived tokens. The main thing is to make sure your kubeconfig refreshes them automatically instead of copying a long-lived token into the file.

For example:

users:
- name: oncall
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1
      command: cred-helper
      args: ["--cluster=prod", "--ttl=30m"]

Option B: Short-lived client certificates (X.509)

If your API server (or your access broker from the previous section) is set up to trust a client CA, you can use short-lived client certificates for debugging access: generate a key and a certificate signing request (CSR), have it signed with a short expiration, and authenticate with the resulting certificate until it expires.

This is straightforward to operationalize with the Kubernetes CertificateSigningRequest API.

Generate a key and CSR locally:

# Generate a private key.
# This could instead be generated within a hardware token;
# OpenSSL and several similar tools include support for that.
openssl genpkey -algorithm Ed25519 -out oncall.key

openssl req -new -key oncall.key -out oncall.csr \
 -subj "/CN=user/O=oncall-payments"

Create a CertificateSigningRequest with a short expiration:

apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: oncall-<user>-20260218
spec:
  request: <base64-encoded oncall.csr>
  signerName: kubernetes.io/kube-apiserver-client
  expirationSeconds: 1800 # 30 minutes
  usages:
  - client auth

After the CSR is approved and signed, you extract the issued certificate and use it together with the private key to authenticate, for example via kubectl.
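
For example, assuming the CSR above was created for a user named jane, the approval and extraction steps might look like this:

# Approve the request (requires CSR approval permissions)
kubectl certificate approve oncall-jane-20260218

# Extract the signed, short-lived certificate from the CSR status
kubectl get csr oncall-jane-20260218 -o jsonpath='{.status.certificate}' | base64 -d > oncall.crt

# Reference the certificate and the locally held key from a kubeconfig user entry
kubectl config set-credentials oncall-jane \
  --client-certificate=oncall.crt --client-key=oncall.key --embed-certs=true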

3) Use a just-in-time access gateway to run debugging commands

Once you have short-lived credentials, you can use them to open a secure shell session to a just-in-time access gateway, often exposed over SSH and created on demand. If the gateway is exposed over SSH, a common pattern is to issue the engineer a short-lived OpenSSH user certificate for the session. The gateway trusts your SSH user CA, authenticates the engineer at connection time, and then applies the approved session policy before making Kubernetes API calls on the user's behalf. OpenSSH certificates are separate from Kubernetes X.509 client certificates, so these are usually treated as distinct layers.

The resulting session should also be scoped so it cannot be reused outside of what was approved. For example, the gateway or broker can limit it to a specific cluster and namespace, and optionally to a narrower target such as a pod or node. That way, even if someone tries to reuse the access, it will not work outside the intended scope. After the session is established, the gateway executes only the allowed actions and records what happened for auditing.

Example: Namespace-scoped role bindings

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: jit-debug
  namespace: <namespace>
  annotations:
    kubernetes.io/description: >
      Colleagues performing semi-privileged debugging, with access provided
      just in time and on demand.
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jit-debug
  namespace: <namespace>
subjects:
  - kind: Group
    name: jit:oncall:<namespace>  # mapped from the short-lived credential (cert/OIDC)
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: jit-debug
  apiGroup: rbac.authorization.k8s.io

These RBAC objects, and the rules they define, allow debugging only within the specified namespace; attempts to access other namespaces are not allowed.

Example: Cluster-scoped role binding

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: jit-cluster-read
rules:
  - apiGroups: [""]
    resources: ["nodes", "namespaces"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: jit-cluster-read
subjects:
  - kind: Group
    name: jit:oncall:cluster
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: jit-cluster-read
  apiGroup: rbac.authorization.k8s.io

These RBAC rules grant cluster-wide read access (for example, to nodes and namespaces) and should be used only for workflows that truly require cluster-scoped resources.

Finer-grained restrictions like "only this pod/node" or "only these commands" are typically enforced by the access gateway/broker during the session, but Kubernetes also offers other options, such as ValidatingAdmissionPolicy for restricting writes and webhook authorization for custom authorization across verbs.
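
For example, a minimal ValidatingAdmissionPolicy sketch (the policy name and the privileged-container rule are illustrative assumptions) could reject privileged ephemeral containers added during debugging; a binding is required to put it into effect:

# Sketch only: reject privileged ephemeral containers added via kubectl debug
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: deny-privileged-debug-containers  # assumed name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["UPDATE"]
        resources: ["pods/ephemeralcontainers"]
  validations:
    - expression: >-
        !has(object.spec.ephemeralContainers) ||
        object.spec.ephemeralContainers.all(c,
          !has(c.securityContext) || !has(c.securityContext.privileged) ||
          c.securityContext.privileged == false)
      message: "Privileged ephemeral containers are not allowed in debugging sessions."
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: deny-privileged-debug-containers  # assumed name
spec:
  policyName: deny-privileged-debug-containers
  validationActions: ["Deny"]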

In environments with stricter access controls, you can add an extra, short-lived session mediation layer to separate session establishment from privileged actions. Both layers are ephemeral, use identity-bound expiring credentials, and produce independent audit trails. The mediation layer handles session setup/forwarding, while the execution layer performs only RBAC-authorized Kubernetes actions. This separation can reduce exposure by narrowing responsibilities, scoping credentials per step, and enforcing end-to-end session expiry.

Disclaimer: The views expressed in this post are solely those of the author and do not reflect the views of the author's employer or any other organization.

18 Mar 2026 6:00pm GMT

17 Mar 2026

feedKubernetes Blog

The Invisible Rewrite: Modernizing the Kubernetes Image Promoter

Every container image you pull from registry.k8s.io got there through kpromo, the Kubernetes image promoter. It copies images from staging registries to production, signs them with cosign, replicates signatures across more than 20 regional mirrors, and generates SLSA provenance attestations. If this tool breaks, no Kubernetes release ships. Over the past few weeks, we rewrote its core from scratch, deleted 20% of the codebase, made it dramatically faster, and nobody noticed. That was the whole point.

A bit of history

The image promoter started in late 2018 as an internal Google project by Linus Arver. The goal was simple: replace the manual, Googler-gated process of copying container images into k8s.gcr.io with a community-owned, GitOps-based workflow. Push to a staging registry, open a PR with a YAML manifest, get it reviewed and merged, and automation handles the rest. KEP-1734 formalized this proposal.

In early 2019, the code moved to kubernetes-sigs/k8s-container-image-promoter and grew quickly. Over the next few years, Stephen Augustus consolidated multiple tools (cip, gh2gcs, krel promote-images, promobot-files) into a single CLI called kpromo. The repository was renamed to promo-tools. Adolfo Garcia Veytia (Puerco) added cosign signing and SBOM support. Tyler Ferrara built vulnerability scanning. Carlos Panato kept the project in a healthy and releasable state. 42 contributors made about 3,500 commits across more than 60 releases.

It worked. But by 2025 the codebase carried the weight of seven years of incremental additions from multiple SIGs and subprojects. The README said it plainly: you will see duplicated code, multiple techniques for accomplishing the same thing, and several TODOs.

The problems we needed to solve

Production promotion jobs for Kubernetes core images regularly took over 30 minutes and frequently failed with rate limit errors. The core promotion logic had grown into a monolith that was hard to extend and difficult to test, making new features like provenance or vulnerability scanning painful to add.

On the SIG Release roadmap, two work items had been sitting for a while: "Rewrite artifact promoter" and "Make artifact validation more robust". We had discussed these at SIG Release meetings and KubeCons, and the open research spikes on project board #171 captured eight questions that needed answers before we could move forward.

One issue to answer them all

In February 2026, we opened issue #1701 ("Rewrite artifact promoter pipeline") and answered all eight spikes in a single tracking issue. The rewrite was deliberately phased so that each step could be reviewed, merged, and validated independently. Here is what we did:

Phase 1: Rate Limiting (#1702). Rewrote rate limiting to properly throttle all registry operations with adaptive backoff.

Phase 2: Interfaces (#1704). Put registry and auth operations behind clean interfaces so they can be swapped out and tested independently.

Phase 3: Pipeline Engine (#1705). Built a pipeline engine that runs promotion as a sequence of distinct phases instead of one large function.

Phase 4: Provenance (#1706). Added SLSA provenance verification for staging images.

Phase 5: Scanner and SBOMs (#1709). Added vulnerability scanning and SBOM support. Flipped the default to the new pipeline engine. At this point we cut v4.2.0 and let it soak in production before continuing.

Phase 6: Split Signing from Replication (#1713). Separated image signing from signature replication into their own pipeline phases, eliminating the rate limit contention that caused most production failures.

Phase 7: Remove Legacy Pipeline (#1712). Deleted the old code path entirely.

Phase 8: Remove Legacy Dependencies (#1716). Deleted the audit subsystem, deprecated tools, and e2e test infrastructure.

Phase 9: Delete the Monolith (#1718). Removed the old monolithic core and its supporting packages. Thousands of lines deleted across phases 7 through 9.

Each phase shipped independently. v4.3.0 followed the next day with the legacy code fully removed.

With the new architecture in place, a series of follow-up improvements landed: parallelized registry reads (#1736), retry logic for all network operations (#1742), per-request timeouts to prevent pipeline hangs (#1763), HTTP connection reuse (#1759), local registry integration tests (#1746), the removal of deprecated credential file support (#1758), a rework of attestation handling to use cosign's OCI APIs and the removal of deprecated SBOM support (#1764), and a dedicated promotion record predicate type registered with the in-toto attestation framework (#1767). These would have been much harder to land without the clean separation the rewrite provided. v4.4.0 shipped all of these improvements and enabled provenance generation and verification by default.

The new pipeline

The promotion pipeline now has seven clearly separated phases:

Setup -> Plan -> Provenance -> Validate -> Promote -> Sign -> Attest

Setup: Validate options, prewarm TUF cache.
Plan: Parse manifests, read registries, compute which images need promotion.
Provenance: Verify SLSA attestations on staging images.
Validate: Check cosign signatures, exit here for dry runs.
Promote: Copy images server-side, preserving digests.
Sign: Sign promoted images with keyless cosign.
Attest: Generate promotion provenance attestations using a dedicated in-toto predicate type.

Phases run sequentially, so each one gets exclusive access to the full rate limit budget. No more contention. Signature replication to mirror registries is no longer part of this pipeline and runs as a dedicated periodic Prow job instead.

Making it fast

With the architecture in place, we turned to performance.

Parallel registry reads (#1736): The plan phase reads 1,350 registries. We parallelized this and the plan phase dropped from about 20 minutes to about 2 minutes.

Two-phase tag listing (#1761): Instead of checking all 46,000 image groups across more than 20 mirrors, we first check only the source repositories. About 57% of images have no signatures at all because they were promoted before signing was enabled. We skip those entirely, cutting API calls roughly in half.

Source check before replication (#1727): Before iterating all mirrors for a given image, we check if the signature exists on the primary registry first. In steady state where most signatures are already replicated, this reduced the work from about 17 hours to about 15 minutes.

Per-request timeouts (#1763): We observed intermittent hangs where a stalled connection blocked the pipeline for over 9 hours. Every network operation now has its own timeout and transient failures are retried automatically.

Connection reuse (#1759): We started reusing HTTP connections and auth state across operations, eliminating redundant token negotiations. This closed a long-standing request from 2023.

By the numbers

Here is what the rewrite looks like in aggregate.

The codebase shrank by a fifth while gaining provenance attestations, a pipeline engine, vulnerability scanning integration, parallelized operations, retry logic, integration tests against local registries, and a standalone signature replication mode.

No user-facing changes

This was a hard requirement. The kpromo cip command accepts the same flags and reads the same YAML manifests. The post-k8sio-image-promo Prow job continued working throughout. The promotion manifests in kubernetes/k8s.io did not change. Nobody had to update their workflows or configuration.

We caught two regressions early in production. One (#1731) caused a registry key mismatch that made every image appear as "lost" so that nothing was promoted. Another (#1733) set the default thread count to zero, blocking all goroutines. Both were fixed within hours. The phased release strategy (v4.2.0 with the new engine, v4.3.0 with legacy code removed) gave us a clear rollback path that we fortunately never needed.

What comes next

Signature replication across all mirror registries remains the most expensive part of the promotion cycle. Issue #1762 proposes eliminating it entirely by having archeio (the registry.k8s.io redirect service) route signature tag requests to a single canonical upstream instead of per-region backends. Another option would be to move signing closer to the registry infrastructure itself. Both approaches need further discussion with the SIG Release and infrastructure teams, but either one would remove thousands of API calls per promotion cycle and simplify the codebase even further.

Thank you

This project has been a community effort spanning seven years. Thank you to Linus, Stephen, Adolfo, Carlos, Ben, Marko, Lauri, Tyler, Arnaud, and many others who contributed code, reviews, and planning over the years. The SIG Release and Release Engineering communities provided the context, the discussions, and the patience for a rewrite of infrastructure that every Kubernetes release depends on.

If you want to get involved, join us in #release-management on the Kubernetes Slack or check out the repository.

17 Mar 2026 12:00am GMT

09 Mar 2026

feedKubernetes Blog

Announcing the AI Gateway Working Group

The community around Kubernetes includes a number of Special Interest Groups (SIGs) and Working Groups (WGs) facilitating discussions on important topics between interested contributors. Today, we're excited to announce the formation of the AI Gateway Working Group, a new initiative focused on developing standards and best practices for networking infrastructure that supports AI workloads in Kubernetes environments.

What is an AI Gateway?

In a Kubernetes context, an AI Gateway refers to network gateway infrastructure (including proxy servers, load-balancers, etc.) that generally implements the Gateway API specification with enhanced capabilities for AI workloads. Rather than defining a distinct product category, AI Gateways describe infrastructure designed to enforce policy on AI traffic.

Working group charter and mission

The AI Gateway Working Group operates under a clear charter with the mission to develop proposals for Kubernetes Special Interest Groups (SIGs) and their sub-projects.

Active proposals

WG AI Gateway currently has several active proposals that address key challenges in AI workload networking:

Payload Processing

The payload processing proposal addresses the critical need for AI workloads to inspect and transform full HTTP request and response payloads. This enables use cases such as AI inference security and AI inference optimization.

The proposal defines standards for declarative payload processor configuration, ordered processing pipelines, and configurable failure modes - all essential for production AI workload deployments.

Egress gateways

Modern AI applications increasingly depend on external inference services, whether for specialized models, failover scenarios, or cost optimization. The egress gateways proposal aims to define standards for securely routing traffic outside the cluster. Key features include external AI service integration and advanced traffic management.

User Stories We're Addressing

Upcoming events

KubeCon + CloudNativeCon Europe 2026, Amsterdam

AI Gateway working group members will be presenting at KubeCon + CloudNativeCon Europe in Amsterdam, discussing the problems at the intersection of AI and networking, including the working group's active proposals, as well as the intersection of AI gateways with Model Context Protocol (MCP) and agent networking patterns.
This session will showcase how AI Gateway working group proposals enable the infrastructure needed for next-generation AI deployments and communication patterns.
The session will also include the initial designs, early prototypes, and emerging directions shaping the WG's roadmap.
For more details, see our session in the KubeCon + CloudNativeCon Europe schedule.

Get involved

The AI Gateway Working Group represents the Kubernetes community's commitment to standardizing AI workload networking. As AI becomes increasingly integral to modern applications, we need robust, standardized infrastructure that can support the unique requirements of inference workloads while maintaining the security, observability, and reliability standards that Kubernetes users expect.
Our proposals are currently in active development, with implementations beginning across various gateway projects. We're working closely with SIG Network on Gateway API enhancements and collaborating with the broader cloud-native community to ensure our standards meet real-world production needs.

Whether you're a gateway implementer, platform operator, AI application developer, or simply interested in the intersection of Kubernetes and AI, we'd love your input. The working group follows an open contribution model - you can review our proposals, join our weekly meetings, or start discussions on our GitHub repository.

The future of AI infrastructure in Kubernetes is being built today. Join us to learn how you can contribute and help shape AI-aware gateway capabilities in Kubernetes.

09 Mar 2026 6:00pm GMT

27 Feb 2026

feedKubernetes Blog

Before You Migrate: Five Surprising Ingress-NGINX Behaviors You Need to Know

As announced in November 2025, Kubernetes will retire Ingress-NGINX in March 2026. Despite its widespread usage, Ingress-NGINX is full of surprising defaults and side effects that are probably present in your cluster today. This blog highlights these behaviors so that you can migrate away safely and make a conscious decision about which behaviors to keep. This post also compares Ingress-NGINX with Gateway API and shows you how to preserve Ingress-NGINX behavior in Gateway API. The recurring risk pattern in every section is the same: a seemingly correct translation can still cause outages if it does not consider Ingress-NGINX's quirks.

I'm going to assume that you, the reader, have some familiarity with Ingress-NGINX and the Ingress API. Most examples use httpbin as the backend.

Also, note that Ingress-NGINX and NGINX Ingress are two separate Ingress controllers. Ingress-NGINX is an Ingress controller maintained and governed by the Kubernetes community that is retiring March 2026. NGINX Ingress is an Ingress controller by F5. Both use NGINX as the dataplane, but are otherwise unrelated. From now on, this blog post only discusses Ingress-NGINX.

1. Regex matches are prefix-based and case insensitive

Suppose that you wanted to route all requests with a path consisting of only three uppercase letters to the httpbin service. You might create the following Ingress with the nginx.ingress.kubernetes.io/use-regex: "true" annotation and the regex pattern of /[A-Z]{3}.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: regex-match-ingress
  annotations:
    nginx.ingress.kubernetes.io/use-regex: "true"
spec:
  ingressClassName: nginx
  rules:
  - host: regex-match.example.com
    http:
      paths:
      - path: "/[A-Z]{3}"
        pathType: ImplementationSpecific
        backend:
          service:
            name: httpbin
            port:
              number: 8000

However, because regex matches are prefix-based and case-insensitive, Ingress-NGINX routes any request with a path that starts with any three letters to httpbin:

curl -sS -H "Host: regex-match.example.com" http://<your-ingress-ip>/uuid

The output is similar to:

{
 "uuid": "e55ef929-25a0-49e9-9175-1b6e87f40af7"
}

Note: The /uuid endpoint of httpbin returns a random UUID. A UUID in the response body means that the request was successfully routed to httpbin.

With Gateway API, you can use an HTTP path match with a type of RegularExpression for regular expression path matching. RegularExpression matches are implementation specific, so check with your Gateway API implementation to verify the semantics of RegularExpression matching. Popular Envoy-based Gateway API implementations such as Istio1, Envoy Gateway, and Kgateway do a full case-sensitive match.

Thus, if you are unaware that Ingress-NGINX patterns are prefix and case-insensitive, and, unbeknownst to you, clients or applications send traffic to /uuid (or /uuid/some/other/path), you might create the following HTTP route.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: regex-match-route
spec:
  hostnames:
  - regex-match.example.com
  parentRefs:
  - name: <your gateway>  # Change this depending on your use case
  rules:
  - matches:
    - path:
        type: RegularExpression
        value: "/[A-Z]{3}"
    backendRefs:
    - name: httpbin
      port: 8000

However, if your Gateway API implementation does full case-sensitive matches, the above HTTP route would not match a request with a path of /uuid. The above HTTP route would thus cause an outage because requests that Ingress-NGINX routed to httpbin would fail with a 404 Not Found at the gateway.

To preserve the case-insensitive regex matching, you can use the following HTTP route.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: regex-match-route
spec:
  hostnames:
  - regex-match.example.com
  parentRefs:
  - name: <your gateway>  # Change this depending on your use case
  rules:
  - matches:
    - path:
        type: RegularExpression
        value: "/[a-zA-Z]{3}.*"
    backendRefs:
    - name: httpbin
      port: 8000

Alternatively, the aforementioned proxies support the (?i) flag to indicate case insensitive matches. Using the flag, the pattern could be (?i)/[a-z]{3}.*.
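
For example, with the flag the match from the route above could equivalently be written as:

  - matches:
    - path:
        type: RegularExpression
        value: "(?i)/[a-z]{3}.*"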

2. The nginx.ingress.kubernetes.io/use-regex annotation applies to all paths of a host across all (Ingress-NGINX) Ingresses

Now, suppose that you have an Ingress with the nginx.ingress.kubernetes.io/use-regex: "true" annotation, but you want to route requests with a path of exactly /headers to httpbin. Unfortunately, you made a typo and set the path to /Header instead of /headers.

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: regex-match-ingress
  annotations:
    nginx.ingress.kubernetes.io/use-regex: "true"
spec:
  ingressClassName: nginx
  rules:
  - host: regex-match.example.com
    http:
      paths:
      - path: "<some regex pattern>"
        pathType: ImplementationSpecific
        backend:
          service:
            name: <your backend>
            port:
              number: 8000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: regex-match-ingress-other
spec:
  ingressClassName: nginx
  rules:
  - host: regex-match.example.com
    http:
      paths:
      - path: "/Header" # typo here, should be /headers
        pathType: Exact
        backend:
          service:
            name: httpbin
            port:
              number: 8000

Most would expect a request to /headers to respond with a 404 Not Found, since /headers does not match the Exact path of /Header. However, because the regex-match-ingress Ingress has the nginx.ingress.kubernetes.io/use-regex: "true" annotation and the regex-match.example.com host, all paths with the regex-match.example.com host are treated as regular expressions across all (Ingress-NGINX) Ingresses. Since regex patterns are case-insensitive prefix matches, /headers matches the /Header pattern and Ingress-NGINX routes such requests to httpbin. Running the command

curl -sS -H "Host: regex-match.example.com" http://<your-ingress-ip>/headers

the output looks like:

{
 "headers": {
 ...
 }
}

Note: The /headers endpoint of httpbin returns the request headers. The fact that the response contains the request headers in the body means that the request was successfully routed to httpbin.

Gateway API does not silently convert or interpret Exact and Prefix matches as regex patterns. So if you converted the above Ingresses into the following HTTP route and preserved the typo and match types, requests to /headers will respond with a 404 Not Found instead of a 200 OK.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: regex-match-route
spec:
  hostnames:
  - regex-match.example.com
  rules:
  ...
  - matches:
    - path:
        type: Exact
        value: "/Header"
    backendRefs:
    - name: httpbin
      port: 8000

To keep the case-insensitive prefix matching, you can change

  - matches:
    - path:
        type: Exact
        value: "/Header"

to

  - matches:
    - path:
        type: RegularExpression
        value: "(?i)/Header"

Or even better, you could fix the typo and change the match to

  - matches:
    - path:
        type: Exact
        value: "/headers"

3. Rewrite target implies regex

In this case, suppose you want to rewrite the path of requests with a path of /ip to /uuid before routing them to httpbin, and as in Section 2, you want to route requests with the path of exactly /headers to httpbin. However, you accidentally make a typo and set the path to /IP instead of /ip and /Header instead of /headers.

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rewrite-target-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: "/uuid"
spec:
  ingressClassName: nginx
  rules:
  - host: rewrite-target.example.com
    http:
      paths:
      - path: "/IP"
        pathType: Exact
        backend:
          service:
            name: httpbin
            port:
              number: 8000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rewrite-target-ingress-other
spec:
  ingressClassName: nginx
  rules:
  - host: rewrite-target.example.com
    http:
      paths:
      - path: "/Header"
        pathType: Exact
        backend:
          service:
            name: httpbin
            port:
              number: 8000

The nginx.ingress.kubernetes.io/rewrite-target: "/uuid" annotation causes requests that match paths in the rewrite-target-ingress Ingress to have their paths rewritten to /uuid before being routed to the backend.

Even though no Ingress has the nginx.ingress.kubernetes.io/use-regex: "true" annotation, the presence of the nginx.ingress.kubernetes.io/rewrite-target annotation in the rewrite-target-ingress Ingress causes all paths with the rewrite-target.example.com host to be treated as regex patterns. In other words, the nginx.ingress.kubernetes.io/rewrite-target annotation silently adds the nginx.ingress.kubernetes.io/use-regex: "true" annotation, along with all the side effects discussed above.

For example, a request to /ip has its path rewritten to /uuid because /ip matches the case-insensitive prefix pattern of /IP in the rewrite-target-ingress Ingress. After running the command

curl -sS -H "Host: rewrite-target.example.com" http://<your-ingress-ip>/ip

the output is similar to:

{
 "uuid": "12a0def9-1adg-2943-adcd-1234aadfgc67"
}

Like in the nginx.ingress.kubernetes.io/use-regex example, Ingress-NGINX treats paths of other ingresses with the rewrite-target.example.com host as case-insensitive prefix patterns. Running the command

curl -sS -H "Host: rewrite-target.example.com" http://<your-ingress-ip>/headers

gives an output that looks like

{
 "headers": {
 ...
 }
}

You can configure path rewrites in Gateway API with the HTTP URL rewrite filter which does not silently convert your Exact and Prefix matches into regex patterns. However, if you are unaware of the side effects of the nginx.ingress.kubernetes.io/rewrite-target annotation and do not realize that /Header and /IP are both typos, you might create the following HTTP route.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: rewrite-target-route
spec:
  hostnames:
  - rewrite-target.example.com
  parentRefs:
  - name: <your-gateway>
  rules:
  - matches:
    - path:
        type: Exact
        value: "/IP"
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplaceFullPath
          replaceFullPath: /uuid
    backendRefs:
    - name: httpbin
      port: 8000
  - matches:
    - path:
        # This is an exact match, irrespective of other rules
        type: Exact
        value: "/Header"
    backendRefs:
    - name: httpbin
      port: 8000

As with Section 2, because /IP is now an Exact match type in your HTTP route, requests to /ip will respond with a 404 Not Found instead of a 200 OK. Similarly, requests to /headers will also respond with a 404 Not Found instead of a 200 OK. Thus, this HTTP route will break applications and clients that rely on the /ip and /headers routes.

To fix this, you can change the matches in the HTTP route to be regex matches, and change the path patterns to be case-insensitive prefix matches, as follows.

  - matches:
    - path:
        type: RegularExpression
        value: "(?i)/IP.*"
...
  - matches:
    - path:
        type: RegularExpression
        value: "(?i)/Header.*"

Or, you can keep the Exact match type and fix the typos.

4. Requests missing a trailing slash are redirected to the same path with a trailing slash

Consider the following Ingress:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: trailing-slash-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: trailing-slash.example.com
    http:
      paths:
      - path: "/my-path/"
        pathType: Exact
        backend:
          service:
            name: <your-backend>
            port:
              number: 8000

You might expect Ingress-NGINX to respond to /my-path with a 404 Not Found, since /my-path does not exactly match the Exact path of /my-path/. However, Ingress-NGINX redirects the request to /my-path/ with a 301 Moved Permanently because the only difference between /my-path and /my-path/ is a trailing slash.

curl -isS -H "Host: trailing-slash.example.com" http://<your-ingress-ip>/my-path

The output looks like:

HTTP/1.1 301 Moved Permanently
...
Location: http://trailing-slash.example.com/my-path/
...

The same applies if you change the pathType to Prefix. However, the redirect does not happen if the path is a regex pattern.

Conformant Gateway API implementations do not silently configure any kind of redirects. If clients or downstream services depend on this redirect, a migration to Gateway API that does not explicitly configure request redirects will cause an outage because requests to /my-path will now respond with a 404 Not Found instead of a 301 Moved Permanently. You can explicitly configure redirects using the HTTP request redirect filter as follows:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: trailing-slash-route
spec:
  hostnames:
  - trailing-slash.example.com
  parentRefs:
  - name: <your-gateway>
  rules:
  - matches:
    - path:
        type: Exact
        value: "/my-path"
    filters:
    - type: RequestRedirect
      requestRedirect:
        statusCode: 301
        path:
          type: ReplaceFullPath
          replaceFullPath: /my-path/
  - matches:
    - path:
        type: Exact # or Prefix
        value: "/my-path/"
    backendRefs:
    - name: <your-backend>
      port: 8000

5. Ingress-NGINX normalizes URLs

URL normalization is the process of converting a URL into a canonical form before matching it against Ingress rules and routing it. The specifics of URL normalization are defined in RFC 3986 Section 6.2, but two examples are resolving . and .. path segments and collapsing repeated slashes.

Ingress-NGINX normalizes URLs before matching them against Ingress rules. For example, consider the following Ingress:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: path-normalization-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: path-normalization.example.com
    http:
      paths:
      - path: "/uuid"
        pathType: Exact
        backend:
          service:
            name: httpbin
            port:
              number: 8000

Ingress-NGINX normalizes the path of the following requests to /uuid. Now that the request matches the Exact path of /uuid, Ingress-NGINX responds with either a 200 OK response or a 301 Moved Permanently to /uuid.

For the following commands

curl -sS -H "Host: path-normalization.example.com" http://<your-ingress-ip>/uuid
curl -sS -H "Host: path-normalization.example.com" http://<your-ingress-ip>/ip/abc/../../uuid
curl -sSi -H "Host: path-normalization.example.com" http://<your-ingress-ip>////uuid

the outputs are similar to

{
 "uuid": "29c77dfe-73ec-4449-b70a-ef328ea9dbce"
}
{
 "uuid": "d20d92e8-af57-4014-80ba-cf21c0c4ffae"
}
HTTP/1.1 301 Moved Permanently
...
Location: /uuid
...

Your backends might rely on the Ingress/Gateway API implementation to normalize URLs. That said, most Gateway API implementations will have some path normalization enabled by default. For example, Istio, Envoy Gateway, and Kgateway all normalize . and .. segments out of the box. For more details, check the documentation for each Gateway API implementation that you use.

Conclusion

As we all race to respond to the Ingress-NGINX retirement, I hope this blog post instills some confidence that you can migrate safely and effectively despite all the intricacies of Ingress-NGINX.

SIG Network has also been working on supporting the most common Ingress-NGINX annotations (and some of these unexpected behaviors) in Ingress2Gateway to help you translate Ingress-NGINX configuration into Gateway API, and offer alternatives to unsupported behavior.

SIG Network released Gateway API 1.5 earlier today (27th February 2026), which graduates features such as ListenerSet (which allows app developers to better manage TLS certificates) and the HTTPRoute CORS filter that enables CORS configuration.


  1. You can use Istio purely as Gateway API controller with no other service mesh features. ↩︎

27 Feb 2026 3:30pm GMT