03 Feb 2026
Kubernetes Blog
Introducing Node Readiness Controller
In the standard Kubernetes model, a node's suitability for workloads hinges on a single binary "Ready" condition. However, in modern Kubernetes environments, nodes require complex infrastructure dependencies, such as network agents, storage drivers, GPU firmware, or custom health checks, to be fully operational before they can reliably host pods.
Today, on behalf of the Kubernetes project, I am announcing the Node Readiness Controller. This project introduces a declarative system for managing node taints, extending the readiness guardrails during node bootstrapping beyond the standard conditions. By dynamically managing taints based on custom health signals, the controller ensures that workloads are only placed on nodes that meet all infrastructure-specific requirements.
Why the Node Readiness Controller?
Core Kubernetes Node "Ready" status is often insufficient for clusters with sophisticated bootstrapping requirements. Operators frequently struggle to ensure that specific DaemonSets or local services are healthy before a node enters the scheduling pool.
The Node Readiness Controller fills this gap by allowing operators to define custom scheduling gates tailored to specific node groups. This enables you to enforce distinct readiness requirements across heterogeneous clusters, ensuring, for example, that GPU-equipped nodes only accept pods once specialized drivers are verified, while general-purpose nodes follow a standard path.
It provides three primary advantages:
- Custom Readiness Definitions: Define what "ready" means for your specific platform.
- Automated Taint Management: The controller automatically applies or removes node taints based on condition status, preventing pods from landing on unready infrastructure.
- Declarative Node Bootstrapping: Manage multi-step node initialization reliably, with clear observability into the bootstrapping process.
Core concepts and features
The controller centers around the NodeReadinessRule (NRR) API, which allows you to define declarative gates for your nodes.
Flexible enforcement modes
The controller supports two distinct operational modes (see the sketch after this list):
- Continuous enforcement: Actively maintains the readiness guarantee throughout the node's entire lifecycle. If a critical dependency (like a device driver) fails later, the node is immediately tainted to prevent new scheduling.
- Bootstrap-only enforcement: Intended for one-time initialization steps, such as pre-pulling heavy images or hardware provisioning. Once conditions are met, the controller marks the bootstrap as complete and stops monitoring that specific rule for the node.
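The mode is chosen per rule via the enforcementMode field. A minimal sketch of the two values: the "bootstrap-only" literal is taken from the full example later in this post, while "continuous" is assumed here as its counterpart and should be verified against the project documentation.
# keep enforcing for the node's whole lifecycle (assumed literal)
enforcementMode: "continuous"
# stop monitoring once conditions are first met (from the example below)
enforcementMode: "bootstrap-only"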
Condition reporting
The controller reacts to Node Conditions rather than performing health checks itself. This decoupled design allows it to integrate seamlessly with existing ecosystem tools as well as custom solutions (an example condition follows this list):
- Node Problem Detector (NPD): Use existing NPD setups and custom scripts to report node health.
- Readiness Condition Reporter: A lightweight agent provided by the project that can be deployed to periodically check local HTTP endpoints and patch node conditions accordingly.
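Whichever reporter is used, the controller only consumes the resulting condition on the Node object's status. As a sketch, reusing the condition type from the CNI example below (the reason, message, and timestamps are illustrative):
status:
  conditions:
  - type: "cniplugin.example.net/NetworkReady"
    status: "True"
    reason: "AgentHealthy"
    message: "CNI agent health endpoint returned 200"
    lastHeartbeatTime: "2026-02-03T00:00:00Z"
    lastTransitionTime: "2026-02-03T00:00:00Z"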
Operational safety with dry run
Deploying new readiness rules across a fleet carries inherent risk. To mitigate this, a dry run mode allows operators to simulate a rule's impact on the cluster first. In this mode, the controller logs its intended actions and updates the rule's status to show the affected nodes without applying actual taints, enabling safe validation before enforcement.
Example: CNI bootstrapping
The following NodeReadinessRule ensures a node remains unschedulable until its CNI agent is functional. The controller monitors a custom cniplugin.example.net/NetworkReady condition and only removes the readiness.k8s.io/acme.com/network-unavailable taint once the status is True.
apiVersion: readiness.node.x-k8s.io/v1alpha1
kind: NodeReadinessRule
metadata:
  name: network-readiness-rule
spec:
  conditions:
  - type: "cniplugin.example.net/NetworkReady"
    requiredStatus: "True"
  taint:
    key: "readiness.k8s.io/acme.com/network-unavailable"
    effect: "NoSchedule"
    value: "pending"
  enforcementMode: "bootstrap-only"
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""
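Until the condition reports True, the controller keeps matching worker nodes tainted. On a Node object, the taint from this rule would appear with the standard taint fields:
spec:
  taints:
  - key: "readiness.k8s.io/acme.com/network-unavailable"
    value: "pending"
    effect: "NoSchedule"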
Getting involved
The Node Readiness Controller is just getting started: our initial releases are out, and we are seeking community feedback to refine the roadmap. Following productive Unconference discussions at KubeCon NA 2025, we are excited to continue the conversation in person.
Join us at KubeCon + CloudNativeCon Europe 2026 for our maintainer track session: Addressing Non-Deterministic Scheduling: Introducing the Node Readiness Controller.
In the meantime, you can contribute or track our progress here:
- GitHub: https://sigs.k8s.io/node-readiness-controller
- Slack: Join the conversation in #sig-node-readiness-controller
- Documentation: Getting Started
30 Jan 2026
Kubernetes Blog
New Conversion from cgroup v1 CPU Shares to v2 CPU Weight
I'm excited to announce the implementation of an improved conversion formula from cgroup v1 CPU shares to cgroup v2 CPU weight. This enhancement addresses critical issues with CPU priority allocation for Kubernetes workloads when running on systems with cgroup v2.
Background
Kubernetes was originally designed with cgroup v1 in mind, where a container's CPU shares were derived directly from its CPU request expressed in millicpu form.
For example, a container requesting 1 CPU would get cpu.shares = 1024.
Over time, cgroup v1 began to be replaced by its successor, cgroup v2. In cgroup v2, the concept of CPU shares (which range from 2 to 262144, or 2¹ to 2¹⁸) was replaced with CPU weight (which ranges over [1, 10000], or 10⁰ to 10⁴).
With the transition to cgroup v2, KEP-2254 introduced a conversion formula to map cgroup v1 CPU shares to cgroup v2 CPU weight. The conversion formula was defined as: cpu.weight = (1 + ((cpu.shares - 2) * 9999) / 262142)
This formula linearly maps values from [2¹, 2¹⁸] to [10⁰, 10⁴].
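Expressed in code, the linear conversion can be sketched as follows (a minimal Go illustration of the formula above, not the exact runtime implementation):
// sharesToWeightLinear maps cgroup v1 cpu.shares in [2, 262144]
// onto cgroup v2 cpu.weight in [1, 10000] using the linear formula.
func sharesToWeightLinear(shares uint64) uint64 {
    if shares == 0 {
        return 0 // no shares configured; leave weight unset
    }
    if shares < 2 {
        shares = 2 // clamp to the bottom of the cgroup v1 range
    }
    return 1 + ((shares-2)*9999)/262142
}
For example, sharesToWeightLinear(1024) returns 39, the low default-priority value discussed below.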

While this approach is simple, the linear mapping introduces a few significant problems, impacting both performance and configuration granularity.
Problems with previous conversion formula
The previous conversion formula creates two major issues:
1. Reduced priority against non-Kubernetes workloads
In cgroup v1, the default value for CPU shares is 1024, meaning a container requesting 1 CPU has equal priority with system processes that live outside of Kubernetes' scope. In cgroup v2, however, the default CPU weight is 100, and the previous formula converts 1 CPU to a weight of only about 39, less than 40% of the default.
Example:
- Container requesting 1 CPU
- cgroup v1: cpu.shares = 1024 (equal to the default)
- cgroup v2 (previous formula): cpu.weight = 39 (much lower than the default of 100)
This means that after moving to cgroup v2, Kubernetes (or OCI) workloads de facto reduce their CPU priority relative to non-Kubernetes processes. The problem can be severe for setups with many system daemons that run outside of Kubernetes' scope and expect Kubernetes workloads to have priority, especially in situations of resource starvation.
2. Unmanageable granularity
The previous formula produces very low values for small CPU requests, limiting the ability to create sub-cgroups within containers for fine-grained resource distribution (something that may become much easier going forward; see KEP #5474 for more info).
Example:
- Container requesting 100m CPU
- cgroup v1: cpu.shares = 102
- cgroup v2 (previous formula): cpu.weight = 4 (too low for sub-cgroup configuration)
With cgroup v1, the 102 CPU shares resulting from a 100m request were manageable, in the sense that sub-cgroups could be created inside the main container, assigning fine-grained CPU priorities to different groups of processes. With cgroup v2, however, a weight of 4 is very hard to distribute between sub-cgroups; it is simply not granular enough.
With plans to allow writable cgroups for unprivileged containers, this becomes even more relevant.
New conversion formula
Description
The new formula is more complicated, but does a much better job of mapping between cgroup v1 CPU shares and cgroup v2 CPU weight:
cpu.weight = ceil(10^((l² + 125·l - 126) / 612)), where l = log₂(cpu.shares)
The exponent is a quadratic function of l, with its constants chosen so that the mapping crosses the following values:
- (2, 1): The minimum values for both ranges.
- (1024, 100): The default values for both ranges.
- (262144, 10000): The maximum values for both ranges.
[Figure: graph of the new conversion function over the full shares range, with a zoomed-in view around the default values (1024 shares → 100 weight).]
The new formula is "close to linear", yet it is carefully designed to map the ranges in a clever way so the three important points above would cross.
How it solves the problems
- Better priority alignment: A container requesting 1 CPU will now get cpu.weight = 102, close to cgroup v2's default of 100. This restores the intended priority relationship between Kubernetes workloads and system processes.
- Improved granularity: A container requesting 100m CPU will now get cpu.weight = 17, enabling finer-grained resource distribution within containers.
Adoption and integration
This change was implemented at the OCI runtime layer. In other words, it is not implemented in Kubernetes itself; adoption of the new conversion formula therefore depends solely on the OCI runtime version in use.
For example:
- runc: The new formula is enabled from version 1.3.2.
- crun: The new formula is enabled from version 1.23.
Impact on existing deployments
Important: Some consumers may be affected if they assume the older linear conversion formula. Applications or monitoring tools that directly calculate expected CPU weight values based on the previous formula may need updates to account for the new quadratic conversion. This is particularly relevant for:
- Custom resource management tools that predict CPU weight values.
- Monitoring systems that validate or expect specific weight values.
- Applications that programmatically set or verify CPU weight values.
The Kubernetes project recommends testing the new conversion formula in non-production environments before upgrading OCI runtimes to ensure compatibility with existing tooling.
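One practical check is to compare a container's CPU request with the weight actually written to its cgroup. A sketch, assuming a node with the systemd cgroup driver (the exact slice and scope names vary by pod, QoS class, and runtime):
# on the node, read the CPU weight the runtime wrote for a container
cat /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/<pod-slice>/<container-scope>/cpu.weight
A container requesting 100m CPU should show a value of about 4 with the old formula and about 17 with the new one.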
Where can I learn more?
For those interested in this enhancement:
- Kubernetes GitHub Issue #131216 - Detailed technical analysis and examples, including discussions and reasoning for choosing the above formula.
- KEP-2254: cgroup v2 - Original cgroup v2 implementation in Kubernetes.
- Kubernetes cgroup documentation - Current resource management guidance.
How do I get involved?
For those interested in getting involved with Kubernetes node-level features, join the Kubernetes Node Special Interest Group. We always welcome new contributors and diverse perspectives on resource management challenges.
29 Jan 2026
Kubernetes Blog
Ingress NGINX: Statement from the Kubernetes Steering and Security Response Committees
In March 2026, Kubernetes will retire Ingress NGINX, a piece of critical infrastructure for about half of cloud native environments. The retirement comes after years of public warnings that the project was in dire need of contributors and maintainers. Once the project is retired, there will be no more releases for bug fixes, security patches, or updates of any kind. This cannot be ignored, brushed off, or left until the last minute to address. We cannot overstate the severity of this situation, or the importance of immediately beginning migration to alternatives like Gateway API or one of the many third-party Ingress controllers.
To be abundantly clear: choosing to remain with Ingress NGINX after its retirement leaves you and your users vulnerable to attack. None of the available alternatives are direct drop-in replacements. This will require planning and engineering time. Half of you will be affected. You have two months left to prepare.
Existing deployments will continue to work, so unless you proactively check, you may not know you are affected until you are compromised. In most cases, you can check whether you rely on Ingress NGINX by running kubectl get pods --all-namespaces --selector app.kubernetes.io/name=ingress-nginx with cluster administrator permissions.
Despite its broad appeal, widespread use by companies of all sizes, and repeated calls for help from the maintainers, the Ingress NGINX project never received the contributors it so desperately needed. According to internal Datadog research, about 50% of cloud native environments currently rely on this tool, and yet for the last several years it has been maintained solely by one or two people working in their free time. Without sufficient staffing to maintain the tool to a standard that both we and our users would consider secure, the responsible choice is to wind it down and refocus efforts on modern alternatives like Gateway API.
We did not make this decision lightly; as inconvenient as it is now, it is necessary for the safety of all users and the ecosystem as a whole. Unfortunately, the flexibility Ingress NGINX was designed with, once a boon, has become a burden that cannot be resolved. Given the technical debt that has piled up and the fundamental design decisions that exacerbate security flaws, it would no longer be reasonable, or even possible, to continue maintaining the tool even if resources did materialize.
We issue this statement together to reinforce the scale of this change and the potential for serious risk to a significant percentage of Kubernetes users if this issue is ignored. It is imperative that you check your clusters now. If you are reliant on Ingress NGINX, you must begin planning for migration.
Thank you,
Kubernetes Steering Committee
Kubernetes Security Response Committee