05 May 2026
Kubernetes Blog
Kubernetes v1.36: Declarative Validation Graduates to GA
In Kubernetes v1.36, Declarative Validation for Kubernetes native types has reached General Availability (GA).
For users, this means more reliable, predictable, and better-documented APIs. By moving to a declarative model, the project also unlocks the future ability to publish validation rules via OpenAPI and integrate with ecosystem tools like Kubebuilder. For contributors and ecosystem developers, this replaces thousands of lines of handwritten validation code with a unified, maintainable framework.
This post covers why this migration was necessary, how the declarative validation framework works, and what new capabilities come with this GA release.
The Motivation: Escaping the "Handwritten" Technical Debt
For years, the validation of Kubernetes native APIs relied almost entirely on handwritten Go code. If a field needed to be bounded by a minimum value, or if two fields needed to be mutually exclusive, developers had to write explicit Go functions to enforce those constraints.
As the Kubernetes API surface expanded, this approach led to several systemic issues:
- Technical Debt: The project accumulated roughly 18,000 lines of boilerplate validation code. This code was difficult to maintain, error-prone, and required intense scrutiny during code reviews.
- Inconsistency: Without a centralized framework, validation rules were sometimes applied inconsistently across different resources.
- Opaque APIs: Handwritten validation logic was difficult to discover or analyze programmatically. This meant clients and tooling couldn't predictably know validation rules without consulting the source code or encountering errors at runtime.
The solution proposed by SIG API Machinery was Declarative Validation: using Interface Definition Language (IDL) tags (specifically +k8s: marker tags) directly within types.go files to define validation rules.
Enter validation-gen
At the core of the declarative validation feature is a new code generator called validation-gen. Just as Kubernetes uses generators for deep copies, conversions, and defaulting, validation-gen parses +k8s: tags and automatically generates the corresponding Go validation functions.
These generated functions are then registered seamlessly with the API scheme. The generator is designed as an extensible framework, allowing developers to plug in new "Validators" by describing the tags they parse and the Go logic they should produce.
A Comprehensive Suite of +k8s: Tags
The declarative validation framework introduces a comprehensive suite of marker tags that provide rich validation capabilities highly optimized for Go types. For a full list of supported tags, check out the official documentation. Here is a catalog of some of the most common tags you will now see in the Kubernetes codebase:
- Presence: +k8s:optional, +k8s:required
- Basic Constraints: +k8s:minimum=0, +k8s:maximum=100, +k8s:maxLength=16, +k8s:format=k8s-short-name
- Collections: +k8s:listType=map, +k8s:listMapKey=type
- Unions: +k8s:unionMember, +k8s:unionDiscriminator
- Immutability: +k8s:immutable, +k8s:update=[NoSet, NoModify, NoClear]
Example Usage:
type ReplicationControllerSpec struct {
    // +k8s:optional
    // +k8s:minimum=0
    Replicas *int32 `json:"replicas,omitempty"`
}
By placing these tags directly above the field definitions, the constraints are self-documenting and immediately visible to anyone reading the type definitions.
Advanced Capabilities: "Ambient Ratcheting"
One of the most substantial outcomes of this work is that validation ratcheting is now a standard, ambient part of the API. In the past, if we needed to tighten validation, we had to first add handwritten ratcheting code, wait a release, and then tighten the validation to avoid breaking existing objects.
With declarative validation, this safety mechanism is built-in. If a user updates an existing object, the validation framework compares the incoming object with the oldObject. If a specific field's value is semantically equivalent to its prior state (i.e., the user didn't change it), the new validation rule is bypassed. This "ambient ratcheting" means we can loosen or tighten validation immediately and in the least disruptive way possible.
Scaling API Reviews with kube-api-linter
Reaching GA required absolute confidence in the generated code, but our vision extends beyond just validation. Declarative validation is a key part of a comprehensive approach to making API review easier, more consistent, and highly scalable.
By moving validation rules out of opaque Go functions and into structured markers, we are empowering tools like kube-api-linter. This linter can now statically analyze API types and enforce API conventions automatically, significantly reducing the manual burden on SIG API Machinery reviewers and providing immediate feedback to contributors.
What's next?
With the release of Kubernetes v1.36, Declarative Validation graduates to General Availability (GA). As a stable feature, the associated DeclarativeValidation feature gate is now enabled by default. It has become the primary mechanism for adding new validation rules to Kubernetes native types.
Looking forward, the project is committed to adopting declarative validation even more extensively. This includes migrating the remaining legacy handwritten validation code for established APIs and requiring its use for all new APIs and new fields. This ongoing transition will continue to shrink the codebase's complexity while enhancing the consistency and reliability of the entire Kubernetes API surface.
Beyond the core migration, declarative validation also unlocks an exciting future for the broader ecosystem. Because validation rules are now defined as structured markers rather than opaque Go code, they can be parsed and reflected in the OpenAPI schemas published by the Kubernetes API server. This paves the way for tools like kubectl, client libraries, and IDEs to perform rich client-side validation before a request is ever sent to the cluster. The same declarative framework can also be consumed by ecosystem tools like Kubebuilder, enabling a more consistent developer experience for authors of Custom Resource Definitions (CRDs).
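As a purely illustrative example (not current kube-apiserver output), the +k8s:minimum=0 tag from the ReplicationControllerSpec snippet above might eventually surface in a published OpenAPI v3 schema along these lines:

# Hypothetical sketch: how a +k8s:minimum=0 tag on the replicas field
# could be reflected once validation rules are published via OpenAPI.
replicas:
  type: integer
  format: int32
  minimum: 0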
Getting involved
The migration to declarative validation is an ongoing effort. While the framework itself is GA, there is still work to be done migrating older APIs to the new declarative format.
If you are interested in contributing to the core of Kubernetes API Machinery, this is a fantastic place to start. Check out the validation-gen documentation, look for issues tagged with sig/api-machinery, and join the conversation in the #sig-api-machinery and #sig-api-machinery-dev-tools channels on Kubernetes Slack (for an invitation, visit https://slack.k8s.io/). You can also attend the SIG API Machinery meetings to get involved directly.
Acknowledgments
A huge thank you to everyone who helped bring this feature to GA:
- Tim Hockin
- Joe Betz
- Aaron Prindle
- Lalit Chauhan
- David Eads
- Darshan Murthy
- Jordan Liggitt
- Patrick Ohly
- Maciej Szulik
- Wojciech Tyczynski
- Joel Speed
- Bryce Palmer
And the many others across the Kubernetes community who contributed along the way.
Welcome to the declarative future of Kubernetes validation!
04 May 2026
Kubernetes Blog
Kubernetes v1.36: Admission Policies That Can't Be Deleted
If you've ever tried to enforce a security policy across a fleet of Kubernetes clusters, you've probably run into a frustrating chicken-and-egg problem. Your admission policies are API objects, which means they don't exist until someone creates them, and they can be deleted by anyone with the right permissions. There's always a window during cluster bootstrap where your policies aren't active yet, and there's no way to prevent a privileged user from removing them.
Kubernetes v1.36 introduces an alpha feature that addresses this: manifest-based admission control. It lets you define admission webhooks and CEL-based policies as files on disk, loaded by the API server at startup, before it serves any requests.
The gap we're closing
Most Kubernetes policy enforcement today works through the API. You create a ValidatingAdmissionPolicy or a webhook configuration as an API object, and the admission controller picks it up. This works well in steady state, but it has some fundamental limitations.
During cluster bootstrap, there's a gap between when the API server starts serving requests and when your policies are created and active. If you're restoring from a backup or recovering from an etcd failure, that gap can be significant.
There's also a self-protection problem. Admission webhooks and policies can't intercept operations on their own configuration resources. Kubernetes skips invoking webhooks on types like ValidatingWebhookConfiguration to avoid circular dependencies. That means a sufficiently privileged user can delete your critical admission policies, and there's nothing in the admission chain to stop them.
We - Kubernetes SIG API Machinery - wanted a way to say "these policies are always on, full stop."
How it works
You add a staticManifestsDir field to the AdmissionConfiguration file that you already pass to the API server via --admission-control-config-file. Point it at a directory, drop your policy YAML files in there, and the API server loads them before it starts serving.
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
  - name: ValidatingAdmissionPolicy
    configuration:
      apiVersion: apiserver.config.k8s.io/v1
      kind: ValidatingAdmissionPolicyConfiguration
      staticManifestsDir: "/etc/kubernetes/admission/validating-policies/"
The manifest files are standard Kubernetes resource definitions. The only requirement is that all the objects that these manifests define must have names ending in .static.k8s.io. This reserved suffix prevents collisions with API-based configurations and makes it easy to tell where an admission decision came from when you're looking at metrics or audit logs.
Here's a complete example that denies privileged containers outside kube-system:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "deny-privileged.static.k8s.io"
  annotations:
    kubernetes.io/description: "Deny launching privileged pods, anywhere this policy is applied"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
  variables:
    - name: allContainers
      expression: >-
        object.spec.containers +
        (has(object.spec.initContainers) ? object.spec.initContainers : []) +
        (has(object.spec.ephemeralContainers) ? object.spec.ephemeralContainers : [])
  validations:
    - expression: >-
        !variables.allContainers.exists(c,
          has(c.securityContext) && has(c.securityContext.privileged) &&
          c.securityContext.privileged == true)
      message: "Privileged containers are not allowed"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "deny-privileged-binding.static.k8s.io"
  annotations:
    kubernetes.io/description: "Bind deny-privileged policy to all namespaces except kube-system"
spec:
  policyName: "deny-privileged.static.k8s.io"
  validationActions:
    - Deny
  matchResources:
    namespaceSelector:
      matchExpressions:
        - key: "kubernetes.io/metadata.name"
          operator: NotIn
          values: ["kube-system"]
Protecting what couldn't be protected before
The part we're most excited about is the ability to intercept operations on admission configuration resources themselves.
With API-based admission, webhooks and policies are never invoked on types like ValidatingAdmissionPolicy or ValidatingWebhookConfiguration. That restriction exists for good reason: if a webhook could reject changes to its own configuration, you could end up locked out with no way to fix it through the API.
Manifest-based policies don't have that problem. If a bad policy is blocking something it shouldn't, you fix the file on disk and the API server picks up the change. There's no circular dependency because the recovery path doesn't go through the API.
This means you can write a manifest-based policy that prevents deletion of your critical API-based admission policies. For platform teams managing shared clusters, this is a significant improvement. You can now guarantee that your baseline security policies can't be removed by a cluster admin, accidentally or otherwise.
Here's what that looks like in practice. This policy prevents any modification or deletion of admission resources that carry the platform.example.com/protected: "true" label:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "protect-policies.static.k8s.io"
  annotations:
    kubernetes.io/description: "Prevent modification or deletion of protected admission resources"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["admissionregistration.k8s.io"]
        apiVersions: ["*"]
        operations: ["DELETE", "UPDATE"]
        resources:
          - "validatingadmissionpolicies"
          - "validatingadmissionpolicybindings"
          - "validatingwebhookconfigurations"
          - "mutatingwebhookconfigurations"
  validations:
    - expression: >-
        !has(oldObject.metadata.labels) ||
        !('platform.example.com/protected' in oldObject.metadata.labels) ||
        oldObject.metadata.labels['platform.example.com/protected'] != 'true'
      message: "Protected admission resources cannot be modified or deleted"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "protect-policies-binding.static.k8s.io"
  annotations:
    kubernetes.io/description: "Bind protect-policies policy to all admission resources"
spec:
  policyName: "protect-policies.static.k8s.io"
  validationActions:
    - Deny
With this in place, any API-based admission policy or webhook configuration labeled platform.example.com/protected: "true" is shielded from tampering. The protection itself lives on disk and can't be removed through the API.
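As a concrete illustration, here is what opting an ordinary, API-managed policy into that protection could look like. This is a hedged sketch: the policy name and its validation rule are hypothetical; the only detail that matters for protection is the platform.example.com/protected label.

# Hypothetical API-managed policy; the label opts it into the manifest-based protection above.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "baseline-security-policy"          # illustrative name
  labels:
    platform.example.com/protected: "true"  # shields this object from UPDATE and DELETE
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
  validations:
    - expression: "has(object.metadata.labels) && 'team' in object.metadata.labels"
      message: "Pods must carry a team label"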
A few things to know
Manifest-based configurations are intentionally self-contained. They can't reference API resources, which means no paramKind for policies, no Service references for admission webhooks (instead they are URL-only), and bindings may only reference policies in the same manifest set. These restrictions exist because the configurations need to work without any cluster state, including at startup before etcd is available.
If you run multiple API server instances, each one loads its own manifest files independently. There's no cross-server synchronization built in. This is the same model as other file-based API server configurations like encryption at rest. When this feature is enabled, Kubernetes exposes a configuration hash as a label on relevant metrics, so you can detect drift.
Files are watched for changes at runtime, so you don't need to restart the API server to update policies. If you update a manifest file, the API server validates the new configuration and swaps it in atomically. If validation fails, it keeps the previous good configuration and logs the error. This means you can roll out policy changes across your fleet using standard configuration management tools (Ansible, Puppet, or even mounted ConfigMaps) without any API server downtime.
The initial load at startup is stricter: if any manifest is invalid, the API server won't start. This is intentional. At startup, failing fast is safer than running without your expected policies.
Try it out
To try this in Kubernetes v1.36:
- Enable the ManifestBasedAdmissionControlConfig feature gate for each kube-apiserver.
- Create a directory with your static manifest files. If you need to mount that into the Pod where the API server runs, do that too; read-only is fine (see the sketch after this list).
- Configure staticManifestsDir in your AdmissionConfiguration with the directory path.
- Start the API server with --admission-control-config-file pointing to your AdmissionConfiguration file.
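For the second step, if your control plane runs the kube-apiserver as a static Pod, the mount could look roughly like the following excerpt. The paths here are assumptions; adapt them to your own layout.

# Hypothetical excerpt from a kube-apiserver static Pod manifest; paths are examples.
spec:
  containers:
    - name: kube-apiserver
      # existing command and flags, including:
      #   --admission-control-config-file=/etc/kubernetes/admission/admission-config.yaml
      volumeMounts:
        - name: admission-manifests
          mountPath: /etc/kubernetes/admission
          readOnly: true
  volumes:
    - name: admission-manifests
      hostPath:
        path: /etc/kubernetes/admission
        type: Directory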
The full documentation is at Manifest-Based Admission Control, and you can follow KEP-5793 for ongoing progress.
We'd love to hear your feedback. Reach out on the #sig-api-machinery channel on Kubernetes Slack (for an invitation, visit https://slack.k8s.io/).
How to get involved
If you're interested in contributing to this feature or other SIG API Machinery projects, join us on #sig-api-machinery on Kubernetes Slack. You're also welcome to attend the SIG API Machinery meetings, held every other Wednesday.
01 May 2026
Kubernetes Blog
Kubernetes v1.36: Pod-Level Resource Managers (Alpha)
Kubernetes v1.36 introduces Pod-Level Resource Managers as an alpha feature, bringing a more flexible and powerful resource management model to performance-sensitive workloads. This enhancement extends the kubelet's Topology, CPU, and Memory Managers to support pod-level resource specifications (.spec.resources), evolving them from a strictly per-container allocation model to a pod-centric one.
Why do we need pod-level resource managers?
When running performance-critical workloads such as machine learning (ML) training, high-frequency trading applications, or low-latency databases, you often need exclusive, NUMA-aligned resources for your primary application containers to ensure predictable performance.
However, modern Kubernetes pods rarely consist of just one container. They frequently include sidecar containers for logging, monitoring, service meshes, or data ingestion.
Before this feature, this created a trade-off: to get NUMA-aligned, exclusive resources for your main application, you had to allocate exclusive, integer-based CPU resources to every container in the pod, which can be wasteful for lightweight sidecars. If you didn't, you forfeited the pod's Guaranteed Quality of Service (QoS) class entirely, losing the performance benefits.
Introducing pod-level resource managers
Enabling pod-level resources support for the resource managers (via the PodLevelResourceManagers and PodLevelResources feature gates) allows the kubelet to create hybrid resource allocation models. This brings flexibility and efficiency to high-performance workloads without sacrificing NUMA alignment.
Real-world use cases
Here are a few practical scenarios demonstrating how this feature can be applied, depending on the configured Topology Manager scope:
1. Tightly-coupled database (Topology manager's pod scope)
Consider a latency-sensitive database pod that includes a main database container, a local metrics exporter, and a backup agent sidecar.
When configured with the pod Topology Manager scope, the kubelet performs a single NUMA alignment based on the entire pod's budget. The database container gets its exclusive CPU and memory slices from that NUMA node. The remaining resources from the pod's budget form a new pod shared pool. The metrics exporter and backup agent run in this pod shared pool. They share resources with each other, but they are strictly isolated from the database's exclusive slices and the rest of the node.
This allows you to safely co-locate auxiliary containers on the same NUMA node as your primary workload without wasting dedicated cores on them.
apiVersion: v1
kind: Pod
metadata:
  name: tightly-coupled-database
spec:
  # Pod-level resources establish the overall budget and NUMA alignment size.
  resources:
    requests:
      cpu: "8"
      memory: "16Gi"
    limits:
      cpu: "8"
      memory: "16Gi"
  initContainers:
    - name: metrics-exporter
      image: metrics-exporter:v1
      restartPolicy: Always
    - name: backup-agent
      image: backup-agent:v1
      restartPolicy: Always
  containers:
    - name: database
      image: database:v1
      # This Guaranteed container gets an exclusive 6 CPU slice from the pod's budget.
      # The remaining 2 CPUs and 4Gi memory form the pod shared pool for the sidecars.
      resources:
        requests:
          cpu: "6"
          memory: "12Gi"
        limits:
          cpu: "6"
          memory: "12Gi"
2. ML workload with infrastructure sidecars (Topology manager's container scope)
Imagine a pod running a GPU-accelerated ML training workload alongside a generic service mesh sidecar.
Under the container Topology Manager scope, the kubelet evaluates each container individually. You can grant the ML container exclusive, NUMA-aligned CPUs and Memory for maximum performance. Meanwhile, the service mesh sidecar doesn't need to be NUMA-aligned; it can run in the general node-wide shared pool. The collective resource consumption is still safely bounded by the overall pod limits, but you only allocate NUMA-aligned, exclusive resources to the specific containers that actually require them.
apiVersion: v1
kind: Pod
metadata:
  name: ml-workload
spec:
  # Pod-level resources establish the overall budget constraint.
  resources:
    requests:
      cpu: "4"
      memory: "8Gi"
    limits:
      cpu: "4"
      memory: "8Gi"
  initContainers:
    - name: service-mesh-sidecar
      image: service-mesh:v1
      restartPolicy: Always
  containers:
    - name: ml-training
      image: ml-training:v1
      # Under the 'container' scope, this Guaranteed container receives exclusive,
      # NUMA-aligned resources, while the sidecar runs in the node's shared pool.
      resources:
        requests:
          cpu: "3"
          memory: "6Gi"
        limits:
          cpu: "3"
          memory: "6Gi"
CPU quotas (CFS) and isolation
When running these mixed workloads within a pod, isolation is enforced differently depending on the allocation:
- Exclusive containers: Containers granted exclusive CPU slices have their CPU CFS quota enforcement disabled at the container level, allowing them to run without being throttled by the Linux scheduler.
- Pod shared pool containers: Containers falling into the pod shared pool have CPU CFS quotas enforced at the pod level, ensuring they do not consume more than the leftover pod budget.
How to enable Pod-Level Resource Managers
Using this feature requires Kubernetes v1.36 or newer. To enable it, you must configure the kubelet with the appropriate feature gates and policies:
- Enable the PodLevelResources and PodLevelResourceManagers feature gates.
- Configure the Topology Manager with a policy other than none (i.e., best-effort, restricted, or single-numa-node).
- Set the Topology Manager scope to either pod or container using the topologyManagerScope field in the KubeletConfiguration.
- Configure the CPU Manager with the static policy.
- Configure the Memory Manager with the Static policy.
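Putting those settings together, a minimal KubeletConfiguration could look something like the sketch below. It is an illustration rather than a drop-in file: the CPU reservation value is a placeholder, and the Static memory manager additionally requires reservedMemory (and corresponding system/kube reserved memory) to be configured for your nodes.

# Minimal sketch of a KubeletConfiguration enabling Pod-Level Resource Managers (alpha).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  PodLevelResources: true
  PodLevelResourceManagers: true
cpuManagerPolicy: static                   # required: static CPU manager policy
memoryManagerPolicy: Static                # required: Static memory manager policy
topologyManagerPolicy: single-numa-node    # any policy other than none
topologyManagerScope: pod                  # or "container", depending on the use case
reservedSystemCPUs: "0"                    # static CPU manager needs an explicit CPU reservation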
Observability
To help cluster administrators monitor and debug these new allocation models, we have introduced several new kubelet metrics when the feature gate is enabled:
- resource_manager_allocations_total: Counts the total number of exclusive resource allocations performed by a manager. The source label ("pod" or "node") distinguishes between allocations drawn from the node-level pool versus a pre-allocated pod-level pool.
- resource_manager_allocation_errors_total: Counts errors encountered during exclusive resource allocation, distinguished by the intended allocation source ("pod" or "node").
- resource_manager_container_assignments: Tracks the cumulative number of containers running with specific assignment types. The assignment_type label ("node_exclusive", "pod_exclusive", "pod_shared") provides visibility into how workloads are distributed.
Current limitations and caveats
While this feature opens up new possibilities, there are a few things to keep in mind during its alpha phase. Be sure to review the Limitations and caveats in the official documentation for full details on compatibility, requirements, and downgrade instructions.
Getting started and providing feedback
For a deep dive into the technical details and configuration of this feature, check out the official concept documentation on Pod-Level Resource Managers.
To learn more about the overall pod-level resources feature and how to assign resources to pods, see the pod-level resources documentation.
As this feature moves through alpha, your feedback is invaluable. Please report any issues or share your experiences via the standard Kubernetes communication channels.