23 Dec 2025
Kubernetes v1.35: Fine-grained Supplemental Groups Control Graduates to GA
On behalf of Kubernetes SIG Node, we are pleased to announce the graduation of fine-grained supplemental groups control to General Availability (GA) in Kubernetes v1.35!
The new Pod field, supplementalGroupsPolicy, was introduced as an opt-in alpha feature in Kubernetes v1.31 and graduated to beta in v1.33. Now the feature is generally available. It gives you more precise control over supplemental groups in Linux containers, which can strengthen your security posture, particularly when accessing volumes. It also makes the UID/GID details of containers more transparent, offering improved security oversight.
If you are planning to upgrade your cluster from v1.32 or an earlier version, please be aware that some breaking behavioral changes were introduced at beta (v1.33). For more details, see the behavioral changes introduced in beta and the upgrade considerations sections of the previous blog post for the graduation to beta.
Motivation: Implicit group memberships defined in /etc/group in the container image
Although many Kubernetes cluster admins and users may not be aware of it, by default Kubernetes merges group information from the Pod with information defined in /etc/group in the container image.
Here's an example: a Pod manifest that specifies spec.securityContext.runAsUser: 1000, spec.securityContext.runAsGroup: 3000, and spec.securityContext.supplementalGroups: [4000] as part of the Pod's security context.
apiVersion: v1
kind: Pod
metadata:
  name: implicit-groups-example
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    supplementalGroups: [4000]
  containers:
  - name: example-container
    image: registry.k8s.io/e2e-test-images/agnhost:2.45
    command: [ "sh", "-c", "sleep 1h" ]
    securityContext:
      allowPrivilegeEscalation: false
What is the result of the id command in the example-container container? The output should be similar to this:
uid=1000 gid=3000 groups=3000,4000,50000
Where does the group ID 50000 in the supplementary groups (the groups field) come from, even though 50000 is not defined in the Pod's manifest at all? The answer is the /etc/group file in the container image.
Checking the contents of /etc/group in the container image shows something like the following:
user-defined-in-image:x:1000:
group-defined-in-image:x:50000:user-defined-in-image
The last entry shows that the container's primary user 1000 belongs to the group 50000.
Thus, the group membership defined in /etc/group in the container image for the container's primary user is implicitly merged to the information from the Pod. Please note that this was a design decision the current CRI implementations inherited from Docker, and the community never really reconsidered it until now.
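You can confirm this yourself. Assuming the implicit-groups-example Pod from the manifest above is running in your current namespace, a quick check might look like this:
# Show the initial process identity of the container's main process
kubectl exec implicit-groups-example -- id
# Show the group database baked into the container image
kubectl exec implicit-groups-example -- cat /etc/group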
What's wrong with it?
The group information implicitly merged from /etc/group in the container image poses a security risk. These implicit GIDs can't be detected or validated by policy engines because there's no record of them in the Pod manifest. This can lead to unexpected access control issues, particularly when accessing volumes (see kubernetes/kubernetes#112879 for details), because file permissions in Linux are controlled by UIDs/GIDs.
Fine-grained supplemental groups control in a Pod: supplementalGroupsPolicy
To tackle this problem, a Pod's .spec.securityContext now includes a supplementalGroupsPolicy field.
This field lets you control how Kubernetes calculates the supplementary groups for container processes within a Pod. The available policies are:
- Merge: The group membership defined in /etc/group for the container's primary user will be merged. If the field is not specified, this policy is applied (i.e. the as-is behavior, for backward compatibility).
- Strict: Only the group IDs specified in fsGroup, supplementalGroups, or runAsGroup are attached as supplementary groups to the container processes. Group memberships defined in /etc/group for the container's primary user are ignored.
Here is how the Strict policy works. The following Pod manifest specifies supplementalGroupsPolicy: Strict:
apiVersion: v1
kind: Pod
metadata:
  name: strict-supplementalgroups-policy-example
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    supplementalGroups: [4000]
    supplementalGroupsPolicy: Strict
  containers:
  - name: example-container
    image: registry.k8s.io/e2e-test-images/agnhost:2.45
    command: [ "sh", "-c", "sleep 1h" ]
    securityContext:
      allowPrivilegeEscalation: false
The result of the id command in the example-container container should be similar to this:
uid=1000 gid=3000 groups=3000,4000
You can see that the Strict policy excludes group 50000 from groups! Thus, ensuring supplementalGroupsPolicy: Strict (enforced by some policy mechanism) helps prevent implicit supplementary groups from being attached in a Pod.
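As a quick sanity check, assuming both example Pods from this post are running in the current namespace, you can compare the supplementary groups that the two policies produce:
# Merge policy (default): includes the implicit group 50000 from the image's /etc/group
kubectl exec implicit-groups-example -- id -G
# Strict policy: only the groups declared in the Pod manifest
kubectl exec strict-supplementalgroups-policy-example -- id -G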
Note:
A container with sufficient privileges can change its process identity. The supplementalGroupsPolicy only affects the initial process identity.
Read on for more details.
Attached process identity in Pod status
This feature also exposes the process identity attached to the first container process of each container via the .status.containerStatuses[].user.linux field. This makes it easy to see whether implicit group IDs were attached.
...
status:
  containerStatuses:
  - name: ctr
    user:
      linux:
        gid: 3000
        supplementalGroups:
        - 3000
        - 4000
        uid: 1000
...
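For example, with the Strict example Pod above running, one way to read this field back (a sketch; the Pod name comes from the earlier manifest) is:
# Print the initially attached process identity reported in the Pod status
kubectl get pod strict-supplementalgroups-policy-example \
  -o jsonpath='{.status.containerStatuses[0].user.linux}'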
Note:
Please note that the value in the status.containerStatuses[].user.linux field is the process identity initially attached to the first container process in the container. If the container has sufficient privilege to call process-identity-related system calls (e.g. setuid(2), setgid(2), setgroups(2), etc.), the container process can change its identity. Thus, the actual process identity at runtime can differ.
There are several ways to restrict these permissions in containers. We suggest the following as simple solutions:
- setting privileged: false and allowPrivilegeEscalation: false in your container's securityContext, or
- conforming your Pod to the Restricted policy in the Pod Security Standards.
Also, the kubelet has no visibility into NRI plugins or the container runtime's internal workings. A cluster administrator configuring nodes, or a highly privileged workload running with local administrator permissions, may change the supplemental groups for any Pod. However, this is outside the scope of Kubernetes control and should not be a concern for security-hardened nodes.
Strict policy requires up-to-date container runtimes
The high-level container runtime (e.g. containerd, CRI-O) plays a key role in calculating the supplementary group IDs that will be attached to containers. Thus, supplementalGroupsPolicy: Strict requires a CRI runtime that supports this feature. The old behavior (supplementalGroupsPolicy: Merge) works with CRI runtimes that do not support this feature, because this policy is fully backward compatible.
Here are some CRI runtimes that support this feature, and the versions you need to be running:
- containerd: v2.0 or later
- CRI-O: v1.31 or later
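A simple way to check which runtime and version each of your nodes is running is a wide node listing; no special setup is assumed here:
# The CONTAINER-RUNTIME column shows each node's runtime and its version, e.g. containerd://2.0.0
kubectl get nodes -o wide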
You can also see whether the feature is supported via the Node's .status.features.supplementalGroupsPolicy field. Please note that this field is different from the status.declaredFeatures field introduced in KEP-5328: Node Declared Features (formerly Node Capabilities).
apiVersion: v1
kind: Node
...
status:
  features:
    supplementalGroupsPolicy: true
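For example, you can query this field directly with kubectl (replace the node name placeholder):
# Prints "true" when the node's runtime supports supplementalGroupsPolicy
kubectl get node <node-name> -o jsonpath='{.status.features.supplementalGroupsPolicy}'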
As container runtime support for this feature becomes universal, various security policies may start enforcing the more secure Strict behavior. It is best practice to make sure your Pods are ready for this enforcement and that all supplemental groups are transparently declared in the Pod spec, rather than in images.
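If you want to gauge how ready your workloads are, one rough audit sketch (assuming kubectl and jq are available) is to list the Pods that do not explicitly request the Strict policy:
# List namespace/name of Pods whose effective policy is still Merge
kubectl get pods -A -o json \
  | jq -r '.items[]
      | select((.spec.securityContext.supplementalGroupsPolicy // "Merge") != "Strict")
      | "\(.metadata.namespace)/\(.metadata.name)"'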
Getting involved
This enhancement was driven by the SIG Node community. Please join us to connect with the community and share your ideas and feedback around the above feature and beyond. We look forward to hearing from you!
How can I learn more?
- Configure a Security Context for a Pod or Container for further details of supplementalGroupsPolicy
- KEP-3619: Fine-grained SupplementalGroups control
22 Dec 2025
Kubernetes v1.35: Kubelet Configuration Drop-in Directory Graduates to GA
With the recent v1.35 release of Kubernetes, support for a kubelet configuration drop-in directory is generally available. The newly stable feature simplifies the management of kubelet configuration across large, heterogeneous clusters.
With v1.35, the kubelet command line argument --config-dir is production-ready and fully supported, allowing you to specify a directory containing kubelet configuration drop-in files. All files in that directory will be automatically merged with your main kubelet configuration. This allows cluster administrators to maintain a cohesive base configuration for kubelets while enabling targeted customizations for different node groups or use cases, and without complex tooling or manual configuration management.
The problem: managing kubelet configuration at scale
As Kubernetes clusters grow larger and more complex, they often include heterogeneous node pools with different hardware capabilities, workload requirements, and operational constraints. This diversity necessitates different kubelet configurations across node groups, yet managing these varied configurations at scale becomes increasingly challenging. Several pain points emerge:
- Configuration drift: Different nodes may have slightly different configurations, leading to inconsistent behavior
- Node group customization: GPU nodes, edge nodes, and standard compute nodes often require different kubelet settings
- Operational overhead: Maintaining separate, complete configuration files for each node type is error-prone and difficult to audit
- Change management: Rolling out configuration changes across heterogeneous node pools requires careful coordination
Before this support was added to Kubernetes, cluster administrators had to choose between using a single monolithic configuration file for all nodes, manually maintaining multiple complete configuration files, or relying on separate tooling. Each approach had its own drawbacks. This graduation to stable gives cluster administrators a fully supported fourth way to solve that challenge.
Example use cases
Managing heterogeneous node pools
Consider a cluster with multiple node types: standard compute nodes, high-capacity nodes (such as those with GPUs or large amounts of memory), and edge nodes with specialized requirements.
Base configuration
File: 00-base.conf
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
- "10.96.0.10"
clusterDomain: cluster.local
High-capacity node override
File: 50-high-capacity-nodes.conf
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 50
systemReserved:
  memory: "4Gi"
  cpu: "1000m"
Edge node override
File: 50-edge-nodes.conf (edge compute typically has lower capacity)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "500Mi"
  nodefs.available: "5%"
With this structure, high-capacity nodes apply both the base configuration and the capacity-specific overrides, while edge nodes apply the base configuration with edge-specific settings.
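On a node, this might look like the following; the directory path and file layout shown here are illustrative, not a required convention:
# Hypothetical drop-in directory on a high-capacity node
ls /etc/kubernetes/kubelet.conf.d/
# 00-base.conf  50-high-capacity-nodes.conf

# The kubelet is then started with the drop-in directory; files in --config-dir
# are merged over the main configuration (remaining kubelet flags omitted)
kubelet --config=/etc/kubernetes/kubelet-config.yaml \
  --config-dir=/etc/kubernetes/kubelet.conf.d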
Gradual configuration rollouts
When rolling out configuration changes, you can:
- Add a new drop-in file with a high numeric prefix (e.g., 99-new-feature.conf)
- Test the changes on a subset of nodes
- Gradually roll out to more nodes
- Once stable, merge changes into the base configuration
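In practice, that rollout could be as simple as staging the new drop-in on a canary node first; the host name, path, and restart mechanism below are illustrative:
# Stage the new drop-in on a single canary node and restart its kubelet
scp 99-new-feature.conf canary-node-1:/etc/kubernetes/kubelet.conf.d/
ssh canary-node-1 'sudo systemctl restart kubelet'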
Viewing the merged configuration
Since configuration is now spread across multiple files, you can inspect the final merged configuration using the kubelet's /configz endpoint:
# Start kubectl proxy
kubectl proxy
# In another terminal, fetch the merged configuration
# Change the '<node-name>' placeholder before running the curl command
curl -X GET http://127.0.0.1:8001/api/v1/nodes/<node-name>/proxy/configz | jq .
This shows the actual configuration the kubelet is using after all merging has been applied. The merged configuration also includes any configuration settings that were specified via kubelet command-line arguments.
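For instance, to confirm that an override from a drop-in file took effect, you can filter for a single field; this sketch assumes jq is installed, the kubectl proxy from above is still running, and the kubelet wraps its response under a kubeletconfig key:
# Check that the maxPods override from 50-high-capacity-nodes.conf was applied
curl -s http://127.0.0.1:8001/api/v1/nodes/<node-name>/proxy/configz \
  | jq '.kubeletconfig.maxPods'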
For detailed setup instructions, configuration examples, and merging behavior, see the official documentation.
Good practices
When using the kubelet configuration drop-in directory:
- Test configurations incrementally: Always test new drop-in configurations on a subset of nodes before rolling out cluster-wide to minimize risk
- Version control your drop-ins: Store your drop-in configuration files in version control (or the configuration source from which these are generated) alongside your infrastructure as code to track changes and enable easy rollbacks
- Use numeric prefixes for predictable ordering: Name files with numeric prefixes (e.g., 00-, 50-, 90-) to explicitly control merge order and make the configuration layering obvious to other administrators
- Be mindful of temporary files: Some text editors automatically create backup files (such as .bak, .swp, or files with a ~ suffix) in the same directory when editing. Ensure these temporary or backup files are not left in the configuration directory, as they may be processed by the kubelet
Acknowledgments
This feature was developed through the collaborative efforts of SIG Node. Special thanks to all contributors who helped design, implement, test, and document this feature across its journey from alpha in v1.28, through beta in v1.30, to GA in v1.35.
To provide feedback on this feature, join the Kubernetes Node Special Interest Group, participate in discussions on the public Slack channel (#sig-node), or file an issue on GitHub.
Get involved
If you have feedback or questions about kubelet configuration management, or want to share your experience using this feature, join the discussion:
- SIG Node community page
- Kubernetes Slack in the #sig-node channel
- SIG Node mailing list
SIG Node would love to hear about your experiences using this feature in production!
21 Dec 2025
Avoiding Zombie Cluster Members When Upgrading to etcd v3.6
This article is a mirror of an original that was recently published to the official etcd blog. The key takeaway? Always upgrade to etcd v3.5.26 or later before moving to v3.6. This ensures your cluster is automatically repaired, and avoids zombie members.
Issue summary
Recently, the etcd community addressed an issue that may appear when users upgrade from v3.5 to v3.6. This bug can cause the cluster to report "zombie members": etcd nodes that were removed from the cluster some time ago but re-appear and rejoin the cluster's consensus. The etcd cluster is then inoperable until these zombie members are removed.
In etcd v3.5 and earlier, the v2store was the source of truth for membership data, even though the v3store was also present. As a part of our v2store deprecation plan, in v3.6 the v3store is the source of truth for cluster membership. Through a bug report we found out that, in some older clusters, v2store and v3store could become inconsistent. This inconsistency manifests after upgrading as seeing old, removed "zombie" cluster members re-appearing in the cluster.
The fix and upgrade path
We've added a mechanism in etcd v3.5.26 to automatically sync v3store from v2store, ensuring that affected clusters are repaired before upgrading to 3.6.x.
To support the many users currently upgrading to 3.6, we have provided the following safe upgrade path:
- Upgrade your cluster to v3.5.26 or later.
- Wait and confirm that all members are healthy post-update.
- Upgrade to v3.6.
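A minimal health check before step 3, assuming etcdctl v3.5+ is configured with your endpoints and any required TLS flags, might look like this:
# All members should be started and healthy before continuing the upgrade
etcdctl member list -w table
etcdctl endpoint health --cluster
etcdctl endpoint status --cluster -w table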
We are unable to provide a safe workaround path for users who have some obstacle preventing updating to v3.5.26. As such, if v3.5.26 is not available from your packaging source or vendor, you should delay upgrading to v3.6 until it is.
Additional technical detail
Information below is offered for reference only. Users can follow the safe upgrade path without knowledge of the following details.
This issue is encountered with clusters that have been running in production on etcd v3.5.25 or earlier. It is a side effect of adding and removing members from the cluster, or recovering the cluster from failure. This means that the issue is more likely the older the etcd cluster is, but it cannot be ruled out for any user regardless of the age of the cluster.
etcd maintainers, working with issue reporters, have found three possible triggers for the issue based on symptoms and an analysis of etcd code and logs:
- Bug in etcdctl snapshot restore (v3.4 and older versions): When restoring a snapshot using etcdctl snapshot restore, etcdctl was supposed to remove existing members before adding the new ones. In v3.4, due to a bug, old members were not removed, resulting in zombie members. Refer to the comment on etcdctl.
- --force-new-cluster in v3.5 and earlier versions: In rare cases, forcibly creating a new single-member cluster did not fully remove old members, leaving zombies. The issue was resolved in v3.5.22. Please refer to this PR in the Raft project for detailed technical information.
- --unsafe-no-sync enabled: If --unsafe-no-sync is enabled, in rare cases etcd might persist a membership change to the v3store but crash before writing it to the WAL, causing inconsistency between the v2store and v3store. This is a problem for single-member clusters. For multi-member clusters, forcibly creating a new single-member cluster from the crashed node's data may lead to zombie members.
Note:
--unsafe-no-sync is generally not recommended, as it may break the guarantees given by the consensus protocol.

Importantly, there may be other triggers for v2store and v3store membership data becoming inconsistent that we have not yet found. This means that you cannot assume that you are safe just because you have not performed any of the three actions above. Once users are upgraded to etcd v3.6, v3store becomes the source of membership data, and further inconsistency is not possible.
Advanced users who want to verify the consistency between v2store and v3store can follow the steps described in this comment. This check is not required to fix the issue, nor does SIG etcd recommend bypassing the v3.5.26 update regardless of the results of the check.
Key takeaway
Always upgrade to v3.5.26 or later before moving to v3.6. This ensures your cluster is automatically repaired and avoids zombie members.
Acknowledgements
We would like to thank Christian Baumann for reporting this long-standing upgrade issue. His report and follow-up work helped bring the issue to our attention so that we could investigate and resolve it upstream.