24 Apr 2026
Kubernetes Blog
Kubernetes v1.36: Fine-Grained Kubelet API Authorization Graduates to GA
On behalf of Kubernetes SIG Auth and SIG Node, we are pleased to announce the graduation of fine-grained kubelet API authorization to General Availability (GA) in Kubernetes v1.36!
The KubeletFineGrainedAuthz feature gate was introduced as an opt-in alpha feature in Kubernetes v1.32, then graduated to beta (enabled by default) in v1.33. Now, the feature is generally available and the feature gate is locked to enabled. This feature enables more precise, least-privilege access control over the kubelet's HTTPS API, replacing the need to grant the overly broad nodes/proxy permission for common monitoring and observability use cases.
Motivation: the nodes/proxy problem
The kubelet exposes an HTTPS endpoint with several APIs that give access to data of varying sensitivity, including pod listings, node metrics, container logs, and, critically, the ability to execute commands inside running containers.
Prior to this feature, kubelet authorization used a coarse-grained model. When webhook authorization was enabled, almost all kubelet API paths were mapped to a single nodes/proxy subresource. This meant that any workload needing to read metrics or health status from the kubelet required nodes/proxy permission, the same permission that also grants the ability to execute arbitrary commands in any container running on the node.
What's wrong with that?
Granting nodes/proxy to monitoring agents, log collectors, or health-checking tools violates the principle of least privilege. If any of those workloads were compromised, an attacker would gain the ability to run commands in every container on the node. The nodes/proxy permission is effectively a node-level superuser capability, and granting it broadly dramatically increases the blast radius of a security incident.
This problem has been well understood in the community for years (see kubernetes/kubernetes#83465), and was the driving motivation behind this enhancement, KEP-2862.
The nodes/proxy GET WebSocket RCE risk
The situation is more severe than it might appear at first glance. Security researchers demonstrated in early 2026 that nodes/proxy GET alone, which is the minimal read-only permission routinely granted to monitoring tools, can be abused to execute commands in any pod on reachable nodes.
The root cause is a mismatch between how WebSocket connections work and how the kubelet maps HTTP methods to RBAC verbs. The WebSocket protocol (RFC 6455) requires an HTTP GET request for the initial connection handshake. The kubelet maps this GET to the RBAC get verb and authorizes the request without performing a secondary check to confirm that CREATE permission is also present for the write operation that follows. Using a WebSocket client like websocat, an attacker can reach the kubelet's /exec endpoint directly on port 10250 and execute arbitrary commands:
```shell
websocat --insecure \
  --header "Authorization: Bearer $TOKEN" \
  --protocol v4.channel.k8s.io \
  "wss://$NODE_IP:10250/exec/default/nginx/nginx?output=1&error=1&command=id"
```

```
uid=0(root) gid=0(root) groups=0(root)
```
Fine-grained kubelet authorization: how it works
With KubeletFineGrainedAuthz, the kubelet now performs an additional, more specific authorization check before falling back to the nodes/proxy subresource. Several commonly used kubelet API paths are mapped to their own dedicated subresources:
| kubelet API | Resource | Subresource |
|---|---|---|
| /stats/* | nodes | stats |
| /metrics/* | nodes | metrics |
| /logs/* | nodes | log |
| /pods | nodes | pods, proxy |
| /runningPods/ | nodes | pods, proxy |
| /healthz | nodes | healthz, proxy |
| /configz | nodes | configz, proxy |
| /spec/* | nodes | spec |
| /checkpoint/* | nodes | checkpoint |
| all others | nodes | proxy |
For the endpoints that now have fine-grained subresources (/pods, /runningPods/, /healthz, /configz), the kubelet first sends a SubjectAccessReview for the specific subresource. If that check succeeds, the request is authorized. If it fails, the kubelet retries with the coarse-grained nodes/proxy subresource for backward compatibility.
This dual-check approach ensures a smooth migration path. Existing workloads with nodes/proxy permissions continue to work, while new deployments can adopt least-privilege access from day one.
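The fallback logic can be sketched as follows. This is a simplified illustration under stated assumptions, not the kubelet's actual source: `granted` stands in for the answers a SubjectAccessReview webhook would return.

```python
# Simplified sketch of the dual-check authorization flow: try the
# fine-grained subresource first, then fall back to nodes/proxy.

FINE_GRAINED = {            # kubelet path -> dedicated nodes/ subresource
    "/pods": "pods",
    "/runningPods/": "pods",
    "/healthz": "healthz",
    "/configz": "configz",
}

def authorize(path: str, granted: set[str]) -> bool:
    """Return True if a caller holding `granted` subresources may access `path`."""
    fine = FINE_GRAINED.get(path)
    if fine is not None and fine in granted:
        return True            # fine-grained check succeeded
    return "proxy" in granted  # coarse-grained fallback for compatibility

# A caller granted only nodes/pods can read /pods but not /configz:
assert authorize("/pods", {"pods"})
assert not authorize("/configz", {"pods"})
# Legacy callers holding nodes/proxy keep working on every endpoint:
assert authorize("/configz", {"proxy"})
```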
What this means in practice
Consider a Prometheus node exporter or a monitoring DaemonSet that needs to scrape /metrics from the kubelet. Previously, you would need an RBAC ClusterRole like this:
```yaml
# Old approach: overly broad
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: monitoring-agent
rules:
- apiGroups: [""]
  resources: ["nodes/proxy"]
  verbs: ["get"]
```
This grants the monitoring agent far more access than it needs. With fine-grained authorization, you can now scope the permissions precisely:
```yaml
# New approach: least privilege
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: monitoring-agent
rules:
- apiGroups: [""]
  resources: ["nodes/metrics", "nodes/stats"]
  verbs: ["get"]
```
The monitoring agent can now read metrics and stats from the kubelet without ever being able to execute commands in containers.
Updated system:kubelet-api-admin ClusterRole
When RBAC authorization is enabled, the built-in system:kubelet-api-admin ClusterRole is automatically updated to include permissions for all the new fine-grained subresources. This ensures that cluster administrators who already use this role, including the API server's kubelet client, continue to have full access without any manual configuration changes.
The role now includes permissions for:
- `nodes/proxy`
- `nodes/stats`
- `nodes/metrics`
- `nodes/log`
- `nodes/spec`
- `nodes/checkpoint`
- `nodes/configz`
- `nodes/healthz`
- `nodes/pods`
Upgrade considerations
Because the kubelet performs a dual authorization check (fine-grained first, then falling back to nodes/proxy), upgrading to v1.36 should be seamless for most clusters:
- Existing workloads with `nodes/proxy` permissions continue to work without changes. The fallback to `nodes/proxy` ensures backward compatibility.
- The API server always has `nodes/proxy` permissions via `system:kubelet-api-admin`, so kube-apiserver-to-kubelet communication is unaffected regardless of feature gate state.
- Mixed-version clusters are handled gracefully. If a kubelet supports fine-grained authorization but the API server does not (or vice versa), `nodes/proxy` permissions serve as the fallback.
Verifying the feature is enabled
You can confirm that the feature is active on a given node by checking the kubelet metrics endpoint. Since the metrics endpoint on port 10250 requires authorization, you'll first need to create appropriate RBAC bindings for the pod or ServiceAccount making the request.
Step 1: Create a ServiceAccount and ClusterRole
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kubelet-metrics-checker
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubelet-metrics-reader
rules:
- apiGroups: [""]
  resources: ["nodes/metrics"]
  verbs: ["get"]
```
Step 2: Bind the ClusterRole to the ServiceAccount
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubelet-metrics-checker
subjects:
- kind: ServiceAccount
  name: kubelet-metrics-checker
  namespace: default
roleRef:
  kind: ClusterRole
  name: kubelet-metrics-reader
  apiGroup: rbac.authorization.k8s.io
```
Apply the manifests:

```shell
kubectl apply -f serviceaccount.yaml
kubectl apply -f clusterrole.yaml
kubectl apply -f clusterrolebinding.yaml
```
Step 3: Run a pod with the ServiceAccount and check the feature flag
```shell
kubectl run kubelet-check \
  --image=curlimages/curl \
  --overrides='{"spec": {"serviceAccountName": "kubelet-metrics-checker"}}' \
  --restart=Never \
  --rm -it \
  -- sh
```
Then, from within the pod, query the metrics endpoint and filter for the feature gate:

```shell
# Get the ServiceAccount token
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)

# Query the kubelet metrics and filter for the feature gate
curl -sk \
  --header "Authorization: Bearer $TOKEN" \
  "https://$NODE_IP:10250/metrics" \
  | grep kubernetes_feature_enabled \
  | grep KubeletFineGrainedAuthz
```
If the feature is enabled, you should see output like:

```
kubernetes_feature_enabled{name="KubeletFineGrainedAuthz",stage="GA"} 1
```
Note: Replace `$NODE_IP` with the IP address of the node you want to check. You can retrieve node IPs with `kubectl get nodes -o wide`.
The journey from alpha to GA
| Release | Stage | Details |
|---|---|---|
| v1.32 | Alpha | Feature gate KubeletFineGrainedAuthz introduced, disabled by default |
| v1.33 | Beta | Enabled by default; fine-grained checks for /pods, /runningPods/, /healthz, /configz |
| v1.36 | GA | Feature gate locked to enabled; fine-grained kubelet authorization is always active |
What's next?
With fine-grained kubelet authorization now GA, the Kubernetes community can begin recommending and eventually enforcing the use of specific subresources instead of nodes/proxy for monitoring and observability workloads. The urgency of this migration is underscored by research showing that nodes/proxy GET can be abused for unlogged remote code execution via the WebSocket protocol. This risk is present in the default RBAC configurations of dozens of widely deployed Helm charts. Over time, we expect:
- Ecosystem adoption: Monitoring tools like Prometheus, Datadog agents, and other DaemonSets can update their default RBAC configurations to use `nodes/metrics`, `nodes/stats`, and `nodes/pods` instead of `nodes/proxy`. This directly eliminates the WebSocket RCE attack surface for those workloads.
- Policy enforcement: Admission controllers and policy engines can flag or reject RBAC bindings that grant `nodes/proxy` when fine-grained alternatives exist, helping organizations adopt least-privilege access at scale.
- Deprecation path: As adoption grows, `nodes/proxy` may eventually be deprecated for monitoring use cases, further reducing the attack surface of Kubernetes clusters.
Getting involved
This enhancement was driven by SIG Auth and SIG Node. If you are interested in contributing to the security and authorization features of Kubernetes, please join us:
- SIG Auth
- SIG Node
- Slack: `#sig-auth` and `#sig-node`
- KEP-2862: Fine-Grained Kubelet API Authorization
We look forward to hearing your feedback and experiences with this feature!
24 Apr 2026 6:35pm GMT
23 Apr 2026
Kubernetes v1.36: User Namespaces in Kubernetes are finally GA
After several years of development, User Namespaces support in Kubernetes reached General Availability (GA) with the v1.36 release. This is a Linux-only feature.
For those of us working on low-level container runtimes and rootless technologies, this has been a long-awaited milestone. We have finally reached the point where "rootless" security isolation can be used for Kubernetes workloads.
This feature also enables a critical pattern: running workloads with privileges while still being confined by the user namespace. When hostUsers: false is set, capabilities like CAP_NET_ADMIN become namespaced, meaning they grant administrative power over container-local resources without affecting the host. This enables use cases that previously required a fully privileged container.
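As an illustration of this pattern, a Pod can request CAP_NET_ADMIN while opting out of the host user namespace (the Pod name and image here are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: netadmin-confined    # hypothetical name
spec:
  hostUsers: false           # capabilities below are namespaced, not host-wide
  containers:
  - name: app
    image: fedora:42
    securityContext:
      capabilities:
        add: ["NET_ADMIN"]   # administers the Pod's own network, not the host's
```

Inside this Pod, NET_ADMIN lets the workload manage its own network namespace, but the process holds no capabilities in the host's namespaces.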
The Problem with UID 0
A process running as root inside a container is also seen by the kernel as root on the host. If an attacker manages to break out of the container, whether through a kernel vulnerability or a misconfigured mount, they are root on the host.
While there are many security measures in place for running containers, these measures don't change the underlying identity of the process: it still carries some "parts" of root.
The engine: ID-mapped mounts
The road to GA wasn't just about the Kubernetes API; it was about making the kernel work for us. In the early stages, one of the biggest blockers was volume ownership. If you mapped a container to a high UID range, the kubelet had to recursively chown every file in the attached volume so the container could read and write them. For large volumes, this was so expensive that it destroyed startup performance.
The key enabler was ID-mapped mounts (introduced in Linux 5.12 and refined in later versions). Instead of rewriting file ownership on disk, the kernel remaps it at mount time.
When a volume is mounted into a Pod with User Namespaces enabled, the kernel performs a transparent translation of the UIDs (user IDs) and GIDs (group IDs). To the container, the files appear owned by UID 0. On disk, file ownership is unchanged - no chown is needed. This is an O(1) operation: instant and efficient.
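The translation can be sketched as follows. The specific mapping (container UIDs 0-65535 onto host UIDs starting at 100000) is an illustrative assumption; the kubelet allocates the actual ranges:

```python
# Illustrative sketch of the UID translation an ID-mapped mount performs.
# Assumed example mapping: container UIDs 0-65535 <-> host UIDs 100000-165535.

HOST_BASE = 100000
RANGE = 65536

def host_to_container_uid(host_uid: int) -> int:
    """How a file's on-disk owner appears inside the user namespace."""
    if HOST_BASE <= host_uid < HOST_BASE + RANGE:
        return host_uid - HOST_BASE
    return 65534  # unmapped IDs surface as the overflow UID ("nobody")

def container_to_host_uid(container_uid: int) -> int:
    """What UID a container process actually has on the host."""
    return HOST_BASE + container_uid

assert host_to_container_uid(100000) == 0    # appears as root in the Pod
assert container_to_host_uid(0) == 100000    # but is unprivileged on the host
assert host_to_container_uid(1234) == 65534  # unmapped owner -> nobody
```

No file on disk is modified: the mapping is applied at mount time, per lookup, which is why no recursive chown is needed.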
Using it in Kubernetes v1.36
Using user namespaces is straightforward: all you need to do is set hostUsers: false in your Pod spec. No changes to your container images, no complex configuration. The interface remains the same one introduced during the Alpha phase. In the spec for a Pod (or PodTemplate), you explicitly opt out of the host user namespace:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: isolated-workload
spec:
  hostUsers: false
  containers:
  - name: app
    image: fedora:42
    securityContext:
      runAsUser: 0
```
For more details on how user namespaces work in practice, and demonstrations of mitigating CVEs rated HIGH, see the previous blog posts: User Namespaces alpha, User Namespaces stateful pods in alpha, User Namespaces beta, and User Namespaces enabled by default.
Getting involved
If you're interested in user namespaces or want to contribute, here are some useful links:
Acknowledgments
This feature has been years in the making: the first KEP was opened 10 years ago by other contributors, and we have been actively working on it for the last 6 years. We'd like to thank everyone who contributed across SIG Node, the container runtimes, and the Linux kernel. Special thanks to the reviewers and early adopters who helped shape the design through multiple alpha and beta cycles.
23 Apr 2026 6:35pm GMT
22 Apr 2026
SELinux Volume Label Changes Go GA (and likely implications in v1.37)
If you run Kubernetes on Linux with SELinux in enforcing mode, plan ahead: a future release (anticipated to be v1.37) is expected to turn the SELinuxMount feature gate on by default. This makes volume setup faster for most workloads, but it can break applications that still depend on the older recursive relabeling model in subtle ways (for example, sharing one volume between privileged and unprivileged Pods on the same node). Kubernetes v1.36 is the right release to audit your cluster and fix or opt out of this change.
If your nodes do not use SELinux, nothing changes for you: the kubelet skips the whole SELinux logic when SELinux is unavailable or disabled in the Linux kernel. You can skip this article completely.
This blog builds on the earlier work described in the Kubernetes 1.27: Efficient SELinux Relabeling (Beta) post, which introduced the SELinuxMountReadWriteOncePod feature gate. The problem being addressed remains the same; this post extends the same approach to all volumes.
The problem
Linux systems with Security Enhanced Linux (SELinux) enabled use labels attached to objects (for example, files and network sockets) to make access control decisions. Historically, the container runtime applies SELinux labels to a Pod and all its volumes. Kubernetes only passes the SELinux label from a Pod's securityContext fields to the container runtime.
The container runtime then recursively changes the SELinux label on all files that are visible to the Pod's containers. This can be time-consuming if there are many files on the volume, especially when the volume is on a remote filesystem.
Caution: If a container uses a `subPath` of a volume, only that subPath of the whole volume is relabeled. This allows two Pods that have two different SELinux labels to use the same volume, as long as they use different subPaths of it.

If a Pod does not have any SELinux label assigned in the Kubernetes API, the container runtime assigns a unique random label, so a process that potentially escapes the container boundary cannot access data of any other container on the host. The container runtime still recursively relabels all Pod volumes with this random SELinux label.
What Kubernetes is improving
Where the stack supports it, the kubelet can mount the volume with -o context=<label> so the kernel applies the correct label for all inodes on that mount without a recursive inode traversal. That path is gated by feature flags and requires, among other things, that the Pod expose enough of an SELinux label (for example spec.securityContext.seLinuxOptions.level) and that the volume driver opts in (for CSI, CSIDriver field spec.seLinuxMount: true).
The project rolled this out in phases:
- ReadWriteOncePod volumes were handled under the `SELinuxMountReadWriteOncePod` feature gate, on by default since v1.28 and GA in v1.36.
- Broader coverage was handled under the `SELinuxMount` flag, paired with the `spec.securityContext.seLinuxChangePolicy` field on Pods.
If a Pod and its volume meet all of the following conditions, Kubernetes will mount the volume directly with the right SELinux label. Such a mount will happen in a constant time and the container runtime will not need to recursively relabel any files on it. For such a mount to happen:
- The operating system must support SELinux. Without SELinux support detected, the kubelet and the container runtime do not do anything with regard to SELinux.
- The feature gate `SELinuxMountReadWriteOncePod` must be enabled. If you're running Kubernetes v1.36, the feature is enabled unconditionally.
- The Pod must use a PersistentVolumeClaim with applicable `accessModes`:
  - Either the volume has `accessModes: ["ReadWriteOncePod"]`,
  - or the volume can use any other access mode(s), provided that the feature gates `SELinuxChangePolicy` and `SELinuxMount` are both enabled and the Pod has `spec.securityContext.seLinuxChangePolicy` set to nil (default) or as `MountOption`.

  The feature gate `SELinuxMount` is Beta and disabled by default in Kubernetes v1.36. All other SELinux-related feature gates are now General Availability (GA). With any of these feature gates disabled, SELinux labels will always be applied by the container runtime by recursively traversing through the volume (or its subPaths).
- The Pod must have at least `seLinuxOptions.level` assigned in its security context, or all containers in that Pod must have it set in their container-level security contexts. Kubernetes will read the default `user`, `role` and `type` from the operating system defaults (typically `system_u`, `system_r` and `container_t`).

  Without Kubernetes knowing at least the SELinux `level`, the container runtime will assign a random level after the volumes are mounted. The container runtime will still relabel the volumes recursively in that case.
- The volume plugin or the CSI driver responsible for the volume must support mounting with SELinux mount options.

  These in-tree volume plugins support mounting with SELinux mount options: `fc` and `iscsi`. CSI drivers that support mounting with SELinux mount options must declare this capability in their CSIDriver instance by setting the `seLinuxMount` field.

  Volumes managed by other volume plugins or CSI drivers that do not set `seLinuxMount: true` will be recursively relabeled by the container runtime.
The breaking change
The SELinuxMount feature gate changes what volumes can be shared among multiple Pods in a subtle way.
Both of these cases work with recursive relabeling:
- Two Pods with different SELinux labels share the same volume, but each of them uses a different `subPath` of the volume.
- A privileged Pod and an unprivileged Pod share the same volume.
Neither scenario works with the new mount-based behavior when SELinux is active. Instead, one of these Pods will be stuck in ContainerCreating until the other Pod is terminated.
The first case is very niche and hasn't been seen in practice. Although the second case is still quite rare, this setup has been observed in applications. Kubernetes v1.36 offers metrics and events to identify these Pods and allows cluster administrators to opt out of the mount option through the Pod field spec.securityContext.seLinuxChangePolicy.
seLinuxChangePolicy
The new Pod field spec.securityContext.seLinuxChangePolicy specifies how the SELinux label is applied to all Pod volumes. In Kubernetes v1.36, this field is part of the stable Pod API.
There are three choices available:
- field not set (default): In Kubernetes v1.36, the behavior depends on whether the `SELinuxMount` feature gate is enabled. By default that feature gate is not enabled, and the SELinux label is applied recursively. If you enable that feature gate in your cluster, and all other conditions are met, the label will be applied using the mount option.
- `Recursive`: the SELinux label is applied recursively. This opts out from using the mount option.
- `MountOption`: the SELinux label is applied using the mount option, if all other conditions are met. This choice is available only when the `SELinuxMount` feature gate is enabled.
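If you need the opt-out, set the policy explicitly in the Pod spec or template (the Pod, image, and PVC names below are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: legacy-shared-volume-app   # hypothetical name
spec:
  securityContext:
    seLinuxChangePolicy: Recursive  # keep the old recursive relabeling
  containers:
  - name: app
    image: registry.k8s.io/pause:3.10
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: shared-data        # hypothetical PVC name
```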
SELinux warning controller (optional)
Kubernetes v1.36 provides a new controller within the control plane, selinux-warning-controller, which runs as part of the kube-controller-manager. To use it, pass `--controllers=*,selinux-warning-controller` on the kube-controller-manager command line; you also must not have explicitly disabled the `SELinuxChangePolicy` feature gate.
The controller watches all Pods in the cluster and emits an Event when it finds two Pods that share the same volume in a way that is not compatible with the SELinuxMount feature gate. All such conflicting Pods will receive an event, such as:
SELinuxLabel "system_u:system_r:container_t:s0:c98,c99" conflicts with pod my-other-pod that uses the same volume as this pod with SELinuxLabel "system_u:system_r:container_t:s0:c0,c1". If both pods land on the same node, only one of them may access the volume.
The actual Pod name may be censored when the conflicting Pods run in different namespaces to prevent leaking information across namespace boundaries.
The controller reports such an event even when these Pods don't run on the same node, to make sure all Pods work regardless of the Kubernetes scheduler decision. They could run on the same node next time.
In addition, the controller emits the metric selinux_warning_controller_selinux_volume_conflict that lists all current conflicts among Pods. The metric has labels that identify the conflicting Pods and their SELinux labels, such as:
selinux_warning_controller_selinux_volume_conflict{pod1_name="my-other-pod",pod1_namespace="default",pod1_value="system_u:object_r:container_file_t:s0:c0,c1",pod2_name="my-pod",pod2_namespace="default",pod2_value="system_u:object_r:container_file_t:s0:c0,c2",property="SELinuxLabel"} 1
There is a security consequence from enabling this opt-in controller: it may reveal namespace names, which are always present in the metric. The Kubernetes project assumes only cluster administrators can access kube-controller-manager metrics.
Suggested upgrade path
To ensure a smooth upgrade path from v1.36 to a release with SELinuxMount enabled (anticipated to be v1.37), we suggest you follow these steps:
- Enable selinux-warning-controller in the kube-controller-manager.
- Check the `selinux_warning_controller_selinux_volume_conflict` metric. It shows all potential conflicts between Pods. For each conflicting Pod (Deployment, StatefulSet, etc.), either apply the opt-out (set the Pod's `spec.securityContext.seLinuxChangePolicy: Recursive`) or re-architect the application to remove the conflict. For example, do your Pods really need to run as privileged?
- Check the `volume_manager_selinux_volume_context_mismatch_warnings_total` metric. The kubelet emits it when it starts a Pod that runs while `SELinuxMount` is disabled, but that would not start with `SELinuxMount` enabled; in other words, it counts Pods that will experience a true conflict. Unfortunately, this metric does not expose the Pod name as a label. The full Pod name is available only in the `selinux_warning_controller_selinux_volume_conflict` metric.
- Once both metrics have been accounted for, upgrade to a Kubernetes version that has `SELinuxMount` enabled.
Consider using a MutatingAdmissionPolicy, a mutating webhook, or a policy engine like Kyverno or Gatekeeper to apply the opt-out to all Pods in a namespace or across the entire cluster.
When SELinuxMount is enabled, the kubelet will emit the metric volume_manager_selinux_volume_context_mismatch_errors_total with the number of Pods that could not start because their SELinux label conflicts with an existing Pod that uses the same volume. The exact Pod names should still be available in the selinux_warning_controller_selinux_volume_conflict metric, if the selinux-warning-controller is enabled.
Further reading
- KEP: Speed up SELinux volume relabeling using mounts
- SELinux Volume Relabeling Feature Gates
- Story 3: cluster upgrade
- Configure a security context for a Pod
- Efficient SELinux volume relabeling and selinux-warning-controller
Acknowledgements
If you run into issues, have feedback, or want to contribute, find us on the Kubernetes Slack in #sig-node and #sig-storage, or join the SIG Node and SIG Storage meetings.
22 Apr 2026 6:35pm GMT