30 Mar 2026
Kubernetes Blog
Kubernetes v1.36 Sneak Peek
Kubernetes v1.36 is coming at the end of April 2026. This release will include removals and deprecations, and it is packed with an impressive number of enhancements. Here are some of the features we are most excited about in this cycle!
Please note that this information reflects the current state of v1.36 development and may change before release.
The Kubernetes API removal and deprecation process
The Kubernetes project has a well-documented deprecation policy for features. This policy states that stable APIs may only be deprecated when a newer, stable version of that same API is available and that APIs have a minimum lifetime for each stability level. A deprecated API has been marked for removal in a future Kubernetes release. It will continue to function until removal (at least one year from the deprecation), but usage will result in a warning being displayed. Removed APIs are no longer available in the current version, at which point you must migrate to using the replacement.
- Generally available (GA) or stable API versions may be marked as deprecated but must not be removed within a major version of Kubernetes.
- Beta or pre-release API versions must be supported for 3 releases after the deprecation.
- Alpha or experimental API versions may be removed in any release without prior deprecation notice; this process can become a withdrawal in cases where a different implementation for the same feature is already in place.
Whether an API is removed as a result of a feature graduating from beta to stable, or because that API simply did not succeed, all removals comply with this deprecation policy. Whenever an API is removed, migration options are communicated in the deprecation guide.
A recent example of this principle in action is the retirement of the ingress-nginx project, announced by SIG-Security on March 24, 2026. As stewardship shifts away from the project, the community has been encouraged to evaluate alternative ingress controllers that align with current security and maintenance best practices. This transition reflects the same lifecycle discipline that underpins Kubernetes itself, ensuring continued evolution without abrupt disruption.
Ingress NGINX retirement
To prioritize the safety and security of the ecosystem, Kubernetes SIG Network and the Security Response Committee retired Ingress NGINX on March 24, 2026. Since that date, there have been no further releases, no bugfixes, and no updates to resolve any security vulnerabilities discovered. Existing deployments of Ingress NGINX will continue to function, and installation artifacts like Helm charts and container images will remain available.
For full details, see the official retirement announcement.
Deprecations and removals for Kubernetes v1.36
Deprecation of .spec.externalIPs in Service
The externalIPs field in the Service spec is being deprecated, which means you'll soon lose a quick way to route arbitrary external IPs to your Services. This field has been a known security headache for years, enabling man-in-the-middle attacks on your cluster traffic, as documented in CVE-2020-8554. From Kubernetes v1.36 onward, you will see deprecation warnings when using it, with full removal planned for v1.43.
If your Services still lean on externalIPs, consider using LoadBalancer services for cloud-managed ingress, NodePort for simple port exposure, or Gateway API for a more flexible and secure way to handle external traffic.
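Before planning a migration, it helps to know which Services actually set the field. The following is a minimal sketch, not an official tool: it assumes you have loaded a Service list shaped like the output of kubectl get services --all-namespaces -o json into a Python dict. The function name find_external_ips and the sample data are hypothetical.

```python
def find_external_ips(service_list):
    """Return (namespace, name, externalIPs) for each Service that sets spec.externalIPs."""
    findings = []
    for svc in service_list.get("items", []):
        ips = svc.get("spec", {}).get("externalIPs") or []
        if ips:
            meta = svc.get("metadata", {})
            findings.append(
                (meta.get("namespace", "default"), meta.get("name", ""), list(ips))
            )
    return findings


# Hypothetical sample, shaped like a Kubernetes Service list.
sample = {
    "items": [
        {
            "kind": "Service",
            "metadata": {"name": "legacy-api", "namespace": "shop"},
            "spec": {"type": "ClusterIP", "externalIPs": ["203.0.113.10"]},
        },
        {
            "kind": "Service",
            "metadata": {"name": "web", "namespace": "shop"},
            "spec": {"type": "LoadBalancer"},
        },
    ]
}

for namespace, name, ips in find_external_ips(sample):
    print(f"{namespace}/{name} uses externalIPs: {', '.join(ips)}")
```

Each Service the sketch reports is a candidate for one of the alternatives listed above.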
For more details on this enhancement, refer to KEP-5707: Deprecate service.spec.externalIPs
Removal of gitRepo volume driver
The gitRepo volume type has been deprecated since v1.11. Starting with Kubernetes v1.36, the gitRepo volume plugin is permanently disabled and cannot be turned back on. This change protects clusters from a critical security issue where using gitRepo could let an attacker run code as root on the node.
Although gitRepo has been deprecated for years and better alternatives have been recommended, it was still technically possible to use it in previous releases. From v1.36 onward, that path is closed for good, so any existing workloads depending on gitRepo will need to migrate to supported approaches such as init containers or external git-sync style tools.
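To find workloads that still depend on gitRepo, you can scan your manifests for volumes of that type. The sketch below is illustrative only: it assumes each object is already parsed into a Python dict, and the helper names pod_spec_of and git_repo_volumes are hypothetical. It covers Pods, CronJobs, and workloads that embed a Pod template under spec.template.

```python
def pod_spec_of(obj):
    """Return the Pod spec of a Pod, or of a workload that embeds a Pod template."""
    spec = obj.get("spec", {})
    if obj.get("kind") == "Pod":
        return spec
    if obj.get("kind") == "CronJob":
        job_spec = spec.get("jobTemplate", {}).get("spec", {})
        return job_spec.get("template", {}).get("spec", {})
    # Deployment, StatefulSet, DaemonSet, Job and similar workloads.
    return spec.get("template", {}).get("spec", {})


def git_repo_volumes(obj):
    """Return the names of volumes in the object that use the gitRepo volume type."""
    volumes = pod_spec_of(obj).get("volumes") or []
    return [v.get("name", "") for v in volumes if "gitRepo" in v]


# Hypothetical Deployment with one gitRepo volume and one emptyDir volume.
deployment = {
    "kind": "Deployment",
    "metadata": {"name": "docs-site"},
    "spec": {
        "template": {
            "spec": {
                "volumes": [
                    {"name": "source", "gitRepo": {"repository": "https://example.com/docs.git"}},
                    {"name": "cache", "emptyDir": {}},
                ]
            }
        }
    },
}

print(git_repo_volumes(deployment))
```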
For more details on this enhancement, refer to KEP-5040: Remove gitRepo volume driver
Featured enhancements of Kubernetes v1.36
The following list of enhancements is likely to be included in the upcoming v1.36 release. This is not a commitment and the release content is subject to change.
Faster SELinux labelling for volumes (GA)
Kubernetes v1.36 makes the SELinux volume mounting improvement generally available. This change replaces recursive file relabeling with the mount -o context=XYZ option, applying the correct SELinux label to the entire volume at mount time. It brings more consistent performance and reduces Pod startup delays on SELinux-enforcing systems.
This feature was introduced as beta in v1.28 for ReadWriteOncePod volumes. In v1.32, it gained metrics and an opt-out option (securityContext.seLinuxChangePolicy: Recursive) to help catch conflicts. Now in v1.36, it reaches stable and defaults to all volumes, with Pods or CSIDrivers opting in via spec.SELinuxMount.
However, we expect this feature to create the risk of breaking changes in future Kubernetes releases, due to the potential for mixing of privileged and unprivileged pods. Setting the seLinuxChangePolicy field and SELinux volume labels on Pods correctly is the responsibility of the Pod author. Developers have that responsibility whether they are writing a Deployment, StatefulSet, DaemonSet or even a custom resource that includes a Pod template. Being careless with these settings can lead to a range of problems when Pods share volumes.
For more details on this enhancement, refer to KEP-1710: Speed up recursive SELinux label change
External signing of ServiceAccount tokens
As a beta feature, Kubernetes already supports external signing of ServiceAccount tokens. This allows clusters to integrate with external key management systems or signing services instead of relying only on internally managed keys.
With this enhancement, the kube-apiserver can delegate token signing to external systems such as cloud key management services or hardware security modules. This improves security and simplifies key management for clusters that rely on centralized signing infrastructure. We expect that this will graduate to stable (GA) in Kubernetes v1.36.
For more details on this enhancement, refer to KEP-740: Support external signing of service account tokens
DRA Driver support for Device taints and tolerations
Kubernetes v1.33 introduced support for taints and tolerations for physical devices managed through Dynamic Resource Allocation (DRA). Normally, any device can be used for scheduling. However, this enhancement allows DRA drivers to mark devices as tainted, which ensures that they will not be used for scheduling purposes. Alternatively, cluster administrators can create a DeviceTaintRule to mark devices that match certain selection criteria (such as all devices of a certain driver) as tainted. This improves scheduling control and helps ensure that specialized hardware resources are only used by workloads that explicitly request them.
In Kubernetes v1.36, this feature graduates to beta with more comprehensive testing complete, making it accessible by default without the need for a feature flag and open to user feedback.
To learn about taints and tolerations, see taints and tolerations.
For more details on this enhancement, refer to KEP-5055: DRA: device taints and tolerations.
DRA support for partitionable devices
Kubernetes v1.36 expands Dynamic Resource Allocation (DRA) by introducing support for partitionable devices, allowing a single hardware accelerator to be split into multiple logical units that can be shared across workloads. This is especially useful for high-cost resources like GPUs, where dedicating an entire device to a single workload can lead to underutilization.
With this enhancement, platform teams can improve overall cluster efficiency by allocating only the required portion of a device to each workload, rather than reserving it entirely. This makes it easier to run multiple workloads on the same hardware while maintaining isolation and control, helping organizations get more value out of their infrastructure.
To learn more about this enhancement, refer to KEP-4815: DRA Partitionable Devices
Want to know more?
New features and deprecations are also announced in the Kubernetes release notes. We will formally announce what's new in Kubernetes v1.36 as part of the CHANGELOG for that release.
The Kubernetes v1.36 release is planned for Wednesday, April 22, 2026. Stay tuned for updates!
You can also see the announcements of changes in the release notes for:
- Kubernetes v1.35
- Kubernetes v1.34
- Kubernetes v1.33
- Kubernetes v1.32
- Kubernetes v1.31
- Kubernetes v1.30
Get involved
The simplest way to get involved with Kubernetes is by joining one of the many Special Interest Groups (SIGs) that align with your interests. Have something you'd like to broadcast to the Kubernetes community? Share your voice at our weekly community meeting, and through the channels below. Thank you for your continued feedback and support.
- Follow us on Bluesky @kubernetes.io for the latest updates
- Join the community discussion on Discuss
- Join the community on Slack
- Post questions (or answer questions) on Server Fault or Stack Overflow
- Share your Kubernetes story
- Read more about what's happening with Kubernetes on the blog
- Learn more about the Kubernetes Release Team
30 Mar 2026 12:00am GMT
20 Mar 2026
Kubernetes Blog
Announcing Ingress2Gateway 1.0: Your Path to Gateway API
With the Ingress-NGINX retirement scheduled for March 2026, the Kubernetes networking landscape is at a turning point. For most organizations, the question isn't whether to migrate to Gateway API, but how to do so safely.
Migrating from Ingress to Gateway API is a fundamental shift in API design. Gateway API provides a modular, extensible API with strong support for Kubernetes-native RBAC. Conversely, the Ingress API is simple, and implementations such as Ingress-NGINX extend the API through esoteric annotations, ConfigMaps, and CRDs. Migrating away from Ingress controllers such as Ingress-NGINX presents the daunting task of capturing all the nuances of the Ingress controller, and mapping that behavior to Gateway API.
Ingress2Gateway is an assistant that helps teams confidently move from Ingress to Gateway API. It translates Ingress resources/manifests along with implementation-specific annotations to Gateway API while warning you about untranslatable configuration and offering suggestions.
Today, SIG Network is proud to announce the 1.0 release of Ingress2Gateway. This milestone represents a stable, tested migration assistant for teams ready to modernize their networking stack.
Ingress2Gateway 1.0
Ingress-NGINX annotation support
The main improvement for the 1.0 release is more comprehensive Ingress-NGINX support. Before the 1.0 release, Ingress2Gateway only supported three Ingress-NGINX annotations. For the 1.0 release, Ingress2Gateway supports over 30 common annotations (CORS, backend TLS, regex matching, path rewrite, etc.).
Comprehensive integration testing
Each supported Ingress-NGINX annotation, and representative combinations of common annotations, is backed by controller-level integration tests that verify the behavioral equivalence of the Ingress-NGINX configuration and the generated Gateway API. These tests exercise real controllers in live clusters and compare runtime behavior (routing, redirects, rewrites, etc.), not just YAML structure.
The tests:
- spin up an Ingress-NGINX controller
- spin up multiple Gateway API controllers
- apply Ingress resources that have implementation-specific configuration
- translate Ingress resources to Gateway API with ingress2gateway and apply generated manifests
- verify that the Gateway API controllers and the Ingress controller exhibit equivalent behavior.
A comprehensive test suite not only catches bugs in development, but also ensures the correctness of the translation, especially given surprising edge cases and unexpected defaults, so that you don't find out about them in production.
Notification & error handling
Migration is not a "one-click" affair. Surfacing subtleties and untranslatable behavior is as important as translating supported configuration. The 1.0 release cleans up the formatting and content of notifications, so it is clear what is missing and how you can fix it.
Using Ingress2Gateway
Ingress2Gateway is a migration assistant, not a one-shot replacement. Its goal is to:
- migrate supported Ingress configuration and behavior
- identify unsupported configuration and suggest alternatives
- reevaluate and potentially discard undesirable configuration
The rest of this section shows you how to safely migrate the following Ingress-NGINX configuration:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "1G"
    nginx.ingress.kubernetes.io/use-regex: "true"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "1"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "1"
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kubernetes.io/configuration-snippet: |
      more_set_headers "Request-Id: $req_id";
  name: my-ingress
  namespace: my-ns
spec:
  ingressClassName: nginx
  rules:
  - host: my-host.example.com
    http:
      paths:
      - backend:
          service:
            name: website-service
            port:
              number: 80
        path: /users/(\d+)
        pathType: ImplementationSpecific
  tls:
  - hosts:
    - my-host.example.com
    secretName: my-secret
1. Install Ingress2Gateway
If you have a Go environment set up, you can install Ingress2Gateway with
go install github.com/kubernetes-sigs/ingress2gateway@v1.0.0
Otherwise,
brew install ingress2gateway
You can also download the binary from GitHub or build from source.
2. Run Ingress2Gateway
You can pass Ingress2Gateway Ingress manifests, or have the tool read directly from your cluster.
# Pass it files
ingress2gateway print --input-file my-manifest.yaml,my-other-manifest.yaml --providers=ingress-nginx > gwapi.yaml
# Use a namespace in your cluster
ingress2gateway print --namespace my-api --providers=ingress-nginx > gwapi.yaml
# Or your whole cluster
ingress2gateway print --providers=ingress-nginx --all-namespaces > gwapi.yaml
Note:
You can also pass --emitter <agentgateway|envoy-gateway|kgateway> to output implementation-specific extensions.
3. Review the output
This is the most critical step. The commands from the previous section output a Gateway API manifest to gwapi.yaml, and they also emit warnings that explain what did not translate exactly and what to review manually.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  annotations:
    gateway.networking.k8s.io/generator: ingress2gateway-dev
  name: nginx
  namespace: my-ns
spec:
  gatewayClassName: nginx
  listeners:
  - hostname: my-host.example.com
    name: my-host-example-com-http
    port: 80
    protocol: HTTP
  - hostname: my-host.example.com
    name: my-host-example-com-https
    port: 443
    protocol: HTTPS
    tls:
      certificateRefs:
      - group: ""
        kind: Secret
        name: my-secret
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  annotations:
    gateway.networking.k8s.io/generator: ingress2gateway-dev
  name: my-ingress-my-host-example-com
  namespace: my-ns
spec:
  hostnames:
  - my-host.example.com
  parentRefs:
  - name: nginx
    port: 443
  rules:
  - backendRefs:
    - name: website-service
      port: 80
    filters:
    - cors:
        allowCredentials: true
        allowHeaders:
        - DNT
        - Keep-Alive
        - User-Agent
        - X-Requested-With
        - If-Modified-Since
        - Cache-Control
        - Content-Type
        - Range
        - Authorization
        allowMethods:
        - GET
        - PUT
        - POST
        - DELETE
        - PATCH
        - OPTIONS
        allowOrigins:
        - '*'
        maxAge: 1728000
      type: CORS
    matches:
    - path:
        type: RegularExpression
        value: (?i)/users/(\d+).*
    name: rule-0
    timeouts:
      request: 10s
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  annotations:
    gateway.networking.k8s.io/generator: ingress2gateway-dev
  name: my-ingress-my-host-example-com-ssl-redirect
  namespace: my-ns
spec:
  hostnames:
  - my-host.example.com
  parentRefs:
  - name: nginx
    port: 80
  rules:
  - filters:
    - requestRedirect:
        scheme: https
        statusCode: 308
      type: RequestRedirect
Ingress2Gateway successfully translated some annotations into their Gateway API equivalents. For example, the nginx.ingress.kubernetes.io/enable-cors annotation was translated into a CORS filter. But upon closer inspection, the nginx.ingress.kubernetes.io/proxy-{read,send}-timeout and nginx.ingress.kubernetes.io/proxy-body-size annotations do not map perfectly. The logs show the reason for these omissions as well as reasoning behind the translation.
┌─ WARN ────────────────────────────────────────
│ Unsupported annotation nginx.ingress.kubernetes.io/configuration-snippet
│ source: INGRESS-NGINX
│ object: Ingress: my-ns/my-ingress
└─
┌─ INFO ────────────────────────────────────────
│ Using case-insensitive regex path matches. You may want to change this.
│ source: INGRESS-NGINX
│ object: HTTPRoute: my-ns/my-ingress-my-host-example-com
└─
┌─ WARN ────────────────────────────────────────
│ ingress-nginx only supports TCP-level timeouts; i2gw has made a best-effort translation to Gateway API timeouts.request. Please verify that this meets your needs. See documentation: https://gateway-api.sigs.k8s.io/guides/http-timeouts/
│ source: INGRESS-NGINX
│ object: HTTPRoute: my-ns/my-ingress-my-host-example-com
└─
┌─ WARN ────────────────────────────────────────
│ Failed to apply my-ns.my-ingress.metadata.annotations."nginx.ingress.kubernetes.io/proxy-body-size" from my-ns/my-ingress: Most Gateway API implementations have reasonable body size and buffering defaults
│ source: STANDARD_EMITTER
│ object: HTTPRoute: my-ns/my-ingress-my-host-example-com
└─
┌─ WARN ────────────────────────────────────────
│ Gateway API does not support configuring URL normalization (RFC 3986, Section 6). Please check if this matters for your use case and consult implementation-specific details.
│ source: STANDARD_EMITTER
└─
There is a warning that Ingress2Gateway does not support the nginx.ingress.kubernetes.io/configuration-snippet annotation. You will have to check your Gateway API implementation documentation to see if there is a way to achieve equivalent behavior.
The tool also notified us that Ingress-NGINX regex matches are case-insensitive prefix matches, which is why there is a match pattern of (?i)/users/(\d+).*. Most organizations will want to change this behavior to be an exact case-sensitive match by removing the leading (?i) and the trailing .* from the path pattern.
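To see why the two patterns behave differently, you can compare them against a few sample paths. This is only an illustration: it uses Python's re module with full-string matching as a stand-in, whereas the regex dialect and anchoring rules of a real Gateway API implementation are implementation-specific and should be checked in its documentation. The helper name matches is hypothetical.

```python
import re

# Pattern emitted by Ingress2Gateway in the example above.
generated = r"(?i)/users/(\d+).*"
# Pattern after removing the leading (?i) and the trailing .*
tightened = r"/users/(\d+)"


def matches(pattern, path):
    """Return True when the whole path matches the pattern (assumed anchoring)."""
    return re.fullmatch(pattern, path) is not None


for path in ["/users/42", "/USERS/42", "/users/42/settings"]:
    print(f"{path}: generated={matches(generated, path)} tightened={matches(tightened, path)}")
```

Under these assumptions, the generated pattern accepts all three paths, while the tightened pattern accepts only /users/42.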
Ingress2Gateway made a best-effort translation from the nginx.ingress.kubernetes.io/proxy-{send,read}-timeout annotations to a 10-second request timeout in our HTTPRoute. If requests for this service should be much shorter, say 3 seconds, you can make the corresponding changes to your Gateway API manifests.
Also, nginx.ingress.kubernetes.io/proxy-body-size does not have a Gateway API equivalent, and was thus not translated. However, most Gateway API implementations have reasonable defaults for maximum body size and buffering, so this might not be a problem in practice. Further, some emitters might offer support for this annotation through implementation-specific extensions. For example, adding the --emitter agentgateway, --emitter envoy-gateway, or --emitter kgateway flag to the previous ingress2gateway print command would have resulted in additional implementation-specific configuration in the generated Gateway API manifests that attempted to capture the body size configuration.
We also see a warning about URL normalization. Gateway API implementations such as Agentgateway, Envoy Gateway, Kgateway, and Istio have some level of URL normalization, but the behavior varies across implementations and is not configurable through standard Gateway API. You should check and test the URL normalization behavior of your Gateway API implementation to ensure it is compatible with your use case.
To match Ingress-NGINX default behavior, Ingress2Gateway also added a listener on port 80 and an HTTP request redirect filter to redirect HTTP traffic to HTTPS. If you do not want to serve HTTP traffic at all, you can remove the listener on port 80 and the corresponding HTTPRoute.
Caution:
Always thoroughly review the generated output and logs.
After manually applying these changes, the Gateway API manifests might look as follows.
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  annotations:
    gateway.networking.k8s.io/generator: ingress2gateway-dev
  name: nginx
  namespace: my-ns
spec:
  gatewayClassName: nginx
  listeners:
  - hostname: my-host.example.com
    name: my-host-example-com-https
    port: 443
    protocol: HTTPS
    tls:
      certificateRefs:
      - group: ""
        kind: Secret
        name: my-secret
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  annotations:
    gateway.networking.k8s.io/generator: ingress2gateway-dev
  name: my-ingress-my-host-example-com
  namespace: my-ns
spec:
  hostnames:
  - my-host.example.com
  parentRefs:
  - name: nginx
    port: 443
  rules:
  - backendRefs:
    - name: website-service
      port: 80
    filters:
    - cors:
        allowCredentials: true
        allowHeaders:
        - DNT
        ...
        allowMethods:
        - GET
        ...
        allowOrigins:
        - '*'
        maxAge: 1728000
      type: CORS
    matches:
    - path:
        type: RegularExpression
        value: /users/(\d+)
    name: rule-0
    timeouts:
      request: 3s
4. Verify
Now that you have Gateway API manifests, you should thoroughly test them in a development cluster. In this case, you should at least double-check that your Gateway API implementation's maximum body size defaults are appropriate for you and verify that a three-second timeout is enough.
After validating behavior in a development cluster, deploy your Gateway API configuration alongside your existing Ingress. We strongly suggest that you then gradually shift traffic using weighted DNS, your cloud load balancer, or traffic-splitting features of your platform. This way, you can quickly recover from any misconfiguration that made it through your tests.
Finally, when you have shifted all your traffic to your Gateway API controller, delete your Ingress resources and uninstall your Ingress controller.
Conclusion
The Ingress2Gateway 1.0 release is just the beginning, and we hope that you use Ingress2Gateway to safely migrate to Gateway API. As we approach the March 2026 Ingress-NGINX retirement, we invite the community to help us increase our configuration coverage, expand testing, and improve UX.
Resources about Gateway API
The scope of Gateway API can be daunting. Here are some resources to help you work with Gateway API:
- Listener sets allow application developers to manage gateway listeners.
- gwctl gives you a comprehensive view of your Gateway resources, such as attachments and linter errors.
- Gateway API Slack: #sig-network-gateway-api on Kubernetes Slack
- Ingress2Gateway Slack: #sig-network-ingress2gateway on Kubernetes Slack
- GitHub: kubernetes-sigs/ingress2gateway
20 Mar 2026 7:00pm GMT
Running Agents on Kubernetes with Agent Sandbox
The landscape of artificial intelligence is undergoing a massive architectural shift. In the early days of generative AI, interacting with a model was often treated as a transient, stateless function call: a request that spun up, executed for perhaps 50 milliseconds, and terminated.
Today, the world is witnessing AI v2 eating AI v1. The ecosystem is moving from short-lived, isolated tasks to deploying multiple, coordinated AI agents that run constantly. These autonomous agents need to maintain context, use external tools, write and execute code, and communicate with one another over extended periods.
As platform engineering teams look for the right infrastructure to host these new AI workloads, one platform stands out as the natural choice: Kubernetes. However, mapping these unique agentic workloads to traditional Kubernetes primitives requires a new abstraction.
This is where the new Agent Sandbox project (currently in development under SIG Apps) comes into play.
The Kubernetes advantage (and the abstraction gap)
Kubernetes is the de facto standard for orchestrating cloud-native applications precisely because it solves the challenges of extensibility, robust networking, and ecosystem maturity. However, as AI evolves from short-lived inference requests to long-running, autonomous agents, we are seeing the emergence of a new operational pattern.
AI agents, by contrast, are typically isolated, stateful, singleton workloads. They act as a digital workspace or execution environment for an LLM. An agent needs a persistent identity and a secure scratchpad for writing and executing (often untrusted) code. Crucially, because these long-lived agents are expected to be mostly idle except for brief bursts of activity, they require a lifecycle that supports mechanisms like suspension and rapid resumption.
While you could theoretically approximate this by stringing together a StatefulSet of size 1, a headless Service, and a PersistentVolumeClaim for every single agent, managing this at scale becomes an operational nightmare.
Because of these unique properties, traditional Kubernetes primitives don't perfectly align.
Introducing Kubernetes Agent Sandbox
To bridge this gap, SIG Apps is developing agent-sandbox. The project introduces a declarative, standardized API specifically tailored for singleton, stateful workloads like AI agent runtimes.
At its core, the project introduces the Sandbox CRD. It acts as a lightweight, single-container environment built entirely on Kubernetes primitives, offering:
- Strong isolation for untrusted code: When an AI agent generates and executes code autonomously, security is paramount. The Sandbox custom resource natively supports different runtimes, like gVisor or Kata Containers. This provides the necessary kernel and network isolation required for multi-tenant, untrusted execution.
- Lifecycle management: Unlike traditional web servers optimized for steady, stateless traffic, an AI agent operates as a stateful workspace that may be idle for hours between tasks. Agent Sandbox supports scaling these idle environments to zero to save resources, while ensuring they can resume exactly where they left off.
- Stable identity: Coordinated multi-agent systems require stable networking. Every Sandbox is given a stable hostname and network identity, allowing distinct agents to discover and communicate with each other seamlessly.
Scaling agents with extensions
Because the AI space is moving incredibly quickly, we built an Extensions API layer that enables even faster iteration and development.
Starting a new pod adds about a second of overhead. That's perfectly fine when deploying a new version of a microservice, but when an agent is invoked after being idle, a one-second cold start breaks the continuity of the interaction. It forces the user or the orchestrating service to wait for the environment to provision before the model can even begin to think or act. SandboxWarmPool solves this by maintaining a pool of pre-provisioned Sandbox pods, effectively eliminating cold starts. Users or orchestration services can simply issue a SandboxClaim against a SandboxTemplate, and the controller immediately hands over a pre-warmed, fully isolated environment to the agent.
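The warm-pool idea itself is simple enough to model in a few lines. The toy below is a conceptual sketch only and does not use or mirror the agent-sandbox API: the class ToyWarmPool and its methods are hypothetical, and provisioning is reduced to generating a name. It shows the core trade: pay the provisioning cost ahead of time, hand over a ready environment on claim, then refill the pool.

```python
from collections import deque
from itertools import count


class ToyWarmPool:
    """Toy model of a warm pool: keep `size` environments ready before anyone asks."""

    def __init__(self, size):
        self._ids = count(1)
        self.size = size
        self.ready = deque(self._provision() for _ in range(size))

    def _provision(self):
        # Stand-in for the slow step (starting a pod); here it only names a sandbox.
        return f"sandbox-{next(self._ids)}"

    def claim(self):
        # Hand over a pre-provisioned sandbox if one is ready, else fall back to a cold start.
        sandbox = self.ready.popleft() if self.ready else self._provision()
        self.replenish()
        return sandbox

    def replenish(self):
        # Top the pool back up so the next claim is also served from warm capacity.
        while len(self.ready) < self.size:
            self.ready.append(self._provision())


pool = ToyWarmPool(size=2)
print(pool.claim(), list(pool.ready))
```

In a real controller, replenishment would happen asynchronously so that a claim never waits on provisioning; the synchronous refill here just keeps the sketch short.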
Quick start
Ready to try it yourself? You can install the Agent Sandbox core components and extensions directly into your learning or sandbox cluster, using your chosen release.
We recommend you use the latest release as the project is moving fast.
# Replace "vX.Y.Z" with a specific version tag (e.g., "v0.1.0") from
# https://github.com/kubernetes-sigs/agent-sandbox/releases
export VERSION="vX.Y.Z"
# Install the core components:
kubectl apply -f https://github.com/kubernetes-sigs/agent-sandbox/releases/download/${VERSION}/manifest.yaml
# Install the extensions components (optional):
kubectl apply -f https://github.com/kubernetes-sigs/agent-sandbox/releases/download/${VERSION}/extensions.yaml
# Install the Python SDK (optional):
# Create a virtual Python environment
python3 -m venv .venv
source .venv/bin/activate
# Install from PyPI
pip install k8s-agent-sandbox
Once installed, you can try out the Python SDK for AI agents or deploy one of the ready-to-use examples to see how easy it is to spin up an isolated agent environment.
The future of agents is cloud native
Whether it's a 50-millisecond stateless task, or a multi-week, mostly-idle collaborative process, extending Kubernetes with primitives designed specifically for isolated stateful singletons allows us to leverage all the robust benefits of the cloud-native ecosystem.
The Agent Sandbox project is open source and community-driven. If you are building AI platforms, developing agentic frameworks, or are interested in Kubernetes extensibility, we invite you to get involved:
- Check out the project on GitHub: kubernetes-sigs/agent-sandbox
- Join the discussion in the #sig-apps and #agent-sandbox channels on the Kubernetes Slack.
20 Mar 2026 6:00pm GMT
18 Mar 2026
Kubernetes Blog
Securing Production Debugging in Kubernetes
During production debugging, the fastest route is often broad access such as cluster-admin (a ClusterRole that grants administrator-level access), shared bastions/jump boxes, or long-lived SSH keys. It works in the moment, but it comes with two common problems: auditing becomes difficult, and temporary exceptions have a way of becoming routine.
This post offers my recommendations for good practices applicable to existing Kubernetes environments with minimal tooling changes:
- Least privilege with RBAC
- Short-lived, identity-bound credentials
- An SSH-style handshake model for cloud native debugging
A good architecture for securing production debugging workflows is to use a just-in-time secure shell gateway (often deployed as an on-demand pod in the cluster). It acts as an SSH-style "front door" that makes temporary access actually temporary. Engineers authenticate with short-lived, identity-bound credentials and establish a session to the gateway, and the gateway uses the Kubernetes API and RBAC to control what they can do, such as pods/log, pods/exec, and pods/portforward. Sessions expire automatically, and both the gateway logs and Kubernetes audit logs capture who accessed what and when, without shared bastion accounts or long-lived keys.
1) Using an access broker on top of Kubernetes RBAC
RBAC controls who can do what in Kubernetes. Many Kubernetes environments rely primarily on RBAC for authorization, although Kubernetes also supports other authorization modes such as Webhook authorization. You can enforce access directly with Kubernetes RBAC, or put an access broker in front of the cluster that still relies on Kubernetes permissions under the hood. In either model, Kubernetes RBAC remains the source of truth for what the Kubernetes API allows and at what scope.
An access broker adds controls that RBAC does not cover well. For example, it can decide whether a request is auto-approved or requires manual approval, whether a user can run a command, and which commands are allowed in a session. It can also manage group membership so that you grant permissions to groups instead of individual users. Kubernetes RBAC can allow actions such as pods/exec, but it cannot restrict which commands run inside an exec session.
With that model, Kubernetes RBAC defines the allowed actions for a user or group (for example, an on-call team in a single namespace). I recommend you only define access rules that grant rights to groups or to ServiceAccounts - never to individual users. The broker or identity provider then adds or removes users from that group as needed.
The broker can also enforce extra policy on top, like which commands are permitted in an interactive session and which requests can be auto-approved versus require manual approval. That policy can live in a JSON or XML file and be maintained through code review, so updates go through a formal pull request and are reviewed like any other production change.
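As a rough illustration of what such a policy file and its enforcement could look like, here is a minimal sketch. The JSON schema, the group name, and the functions load_policy, command_allowed, and approval_mode are all hypothetical; they are not taken from any particular access broker.

```python
import json
import shlex

# Hypothetical policy file content; in practice this would live in version control.
POLICY_JSON = """
{
  "groups": {
    "oncall-payments": {
      "allowed_commands": ["ls", "cat", "ps", "env"],
      "auto_approve": ["pods/log"],
      "manual_approval": ["pods/exec", "pods/portforward"]
    }
  }
}
"""


def load_policy(text):
    """Parse the policy document and return the per-group rules."""
    return json.loads(text)["groups"]


def command_allowed(policy, group, command_line):
    """Allow a command only if its executable is on the group's allowlist."""
    rules = policy.get(group)
    if rules is None:
        return False
    argv = shlex.split(command_line)
    return bool(argv) and argv[0] in rules["allowed_commands"]


def approval_mode(policy, group, resource):
    """Return 'auto', 'manual', or 'deny' for a requested resource."""
    rules = policy.get(group, {})
    if resource in rules.get("auto_approve", []):
        return "auto"
    if resource in rules.get("manual_approval", []):
        return "manual"
    return "deny"


policy = load_policy(POLICY_JSON)
print(command_allowed(policy, "oncall-payments", "cat /etc/hosts"))
print(approval_mode(policy, "oncall-payments", "pods/exec"))
```

Checking only the executable name is deliberately naive. A real broker would also need to handle shell metacharacters, arguments, and commands that can spawn other commands.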
Example: a namespaced on-call debug Role
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: oncall-debug
  namespace: <namespace>
rules:
# Discover what's running
- apiGroups: [""]
  resources: ["pods", "events"]
  verbs: ["get", "list", "watch"]
# Read logs
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get"]
# Interactive debugging actions
- apiGroups: [""]
  resources: ["pods/exec", "pods/portforward"]
  verbs: ["create"]
# Understand rollout/controller state
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets"]
  verbs: ["get", "list", "watch"]
# Optional: allow kubectl debug ephemeral containers
- apiGroups: [""]
  resources: ["pods/ephemeralcontainers"]
  verbs: ["update"]
Bind the Role to a group (rather than individual users) so membership can be managed through your identity provider:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: oncall-debug
  namespace: <namespace>
subjects:
- kind: Group
  name: oncall-<team-name>
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: oncall-debug
  apiGroup: rbac.authorization.k8s.io
2) Short-lived, identity-bound credentials
The goal is to use short-lived, identity-bound credentials that clearly tie a session to a real person and expire quickly. These credentials can include the user's identity and the scope of what they're allowed to do. They're typically signed using a private key that stays with the engineer, such as a hardware-backed key (for example, a YubiKey), so they cannot be forged without access to that key.
You can implement this with Kubernetes-native authentication (for example, client certificates or an OIDC-based flow), or have the access broker from the previous section issue short-lived credentials on the user's behalf. In many setups, Kubernetes still uses RBAC to enforce permissions based on the authenticated identity and groups/claims. If you use an access broker, it can also encode additional scope constraints in the credential and enforce them during the session, such as which cluster or namespace the session applies to and which actions (or approved commands) are allowed against pods or nodes. In either case, the credentials should be signed by a certificate authority (CA), and that CA should be rotated on a regular schedule (for example, quarterly) to limit long-term risk.
Option A: short-lived OIDC tokens
Many managed Kubernetes clusters already issue short-lived tokens. The main thing is to make sure your kubeconfig refreshes them automatically, rather than you copying a long-lived token into the file.
For example:
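users:
- name: oncall
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1
      command: cred-helper
      args: ["--cluster=prod", "--ttl=30m"]
      # interactiveMode is required by the v1 exec API; "Never" assumes
      # the helper does not need to read from standard input.
      interactiveMode: Never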
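To make the refresh mechanism concrete, here is a minimal sketch of what a helper such as `cred-helper` could print to standard output. The `ExecCredential` shape follows the `client.authentication.k8s.io/v1` API; the function name, the placeholder token, and the idea that the helper is written in Python are assumptions for illustration. A real helper would obtain the token from your identity provider.

```python
import json
from datetime import datetime, timedelta, timezone


def exec_credential(token: str, ttl_minutes: int, now: datetime) -> dict:
    """Build the ExecCredential object a credential plugin prints to stdout."""
    expires = now + timedelta(minutes=ttl_minutes)
    return {
        "apiVersion": "client.authentication.k8s.io/v1",
        "kind": "ExecCredential",
        "status": {
            "token": token,
            # RFC 3339 timestamp; the client runs the plugin again once
            # the cached credential has passed this time.
            "expirationTimestamp": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
    }


# Placeholder token for illustration only.
credential = exec_credential("<short-lived-token>", 30, datetime.now(timezone.utc))
print(json.dumps(credential))
```

Because the expiry travels with the credential, no long-lived token ever needs to be written into the kubeconfig file.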
Option B: short-lived client certificates (X.509)
If your API server (or your access broker from the previous section) is set up to trust a client CA, you can use short-lived client certificates for debugging access. The idea is:
- The private key is created and kept under the engineer's control (ideally hardware-backed, like a non-exportable key in a YubiKey/PIV token).
- A short-lived certificate is issued (often via the CertificateSigningRequest API, or your access broker from the previous section, with a TTL).
- RBAC maps the authenticated identity to a minimal Role.
This is straightforward to operationalize with the Kubernetes CertificateSigningRequest API.
Generate a key and CSR locally:
# Generate a private key.
# This could instead be generated within a hardware token;
# OpenSSL and several similar tools include support for that.
openssl genpkey -algorithm Ed25519 -out oncall.key
openssl req -new -key oncall.key -out oncall.csr \
-subj "/CN=user/O=oncall-payments"
Create a CertificateSigningRequest with a short expiration:
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: oncall-<user>-20260218
spec:
  request: <base64-encoded oncall.csr>
  signerName: kubernetes.io/kube-apiserver-client
  expirationSeconds: 1800 # 30 minutes
  usages:
  - client auth
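The `request` field holds the PEM-encoded CSR, base64-encoded once more. The sketch below shows one way to assemble the manifest programmatically; the function name and the stand-in CSR bytes are illustrative assumptions, and in practice you would read the bytes from `oncall.csr`.

```python
import base64
import json


def build_csr_manifest(name: str, csr_pem: bytes, ttl_seconds: int) -> dict:
    """Return a CertificateSigningRequest manifest for a PEM-encoded CSR."""
    return {
        "apiVersion": "certificates.k8s.io/v1",
        "kind": "CertificateSigningRequest",
        "metadata": {"name": name},
        "spec": {
            # base64 of the PEM file contents, as the API expects
            "request": base64.b64encode(csr_pem).decode("ascii"),
            "signerName": "kubernetes.io/kube-apiserver-client",
            "expirationSeconds": ttl_seconds,
            "usages": ["client auth"],
        },
    }


# Stand-in bytes for illustration; replace with the contents of oncall.csr.
sample_csr = b"-----BEGIN CERTIFICATE REQUEST-----\n...\n-----END CERTIFICATE REQUEST-----\n"
manifest = build_csr_manifest("oncall-example-20260218", sample_csr, 1800)
print(json.dumps(manifest, indent=2))
```

The output is JSON rather than YAML, which kubectl accepts as well.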
After the CSR is approved and signed, you extract the issued certificate and use it together with the private key to authenticate, for example via kubectl.
3) Use a just-in-time access gateway to run debugging commands
Once you have short-lived credentials, you can use them to open a secure shell session to a just-in-time access gateway, often exposed over SSH and created on demand. If the gateway is exposed over SSH, a common pattern is to issue the engineer a short-lived OpenSSH user certificate for the session. The gateway trusts your SSH user CA, authenticates the engineer at connection time, and then applies the approved session policy before making Kubernetes API calls on the user's behalf. OpenSSH certificates are separate from Kubernetes X.509 client certificates, so these are usually treated as distinct layers.
The resulting session should also be scoped so it cannot be reused outside of what was approved. For example, the gateway or broker can limit it to a specific cluster and namespace, and optionally to a narrower target such as a pod or node. That way, even if someone tries to reuse the access, it will not work outside the intended scope. After the session is established, the gateway executes only the allowed actions and records what happened for auditing.
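A minimal sketch of the scope check described above, assuming the gateway keeps a small record per approved session. The `Session` fields and the `in_scope` function are hypothetical names, not part of any Kubernetes API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Session:
    user: str
    cluster: str
    namespace: str
    pod: Optional[str]  # None means any pod in the namespace
    expires_at: datetime


def in_scope(s: Session, cluster: str, namespace: str, pod: str, now: datetime) -> bool:
    """Allow a request only inside the approved scope and before expiry."""
    if now >= s.expires_at:
        return False
    if (cluster, namespace) != (s.cluster, s.namespace):
        return False
    return s.pod is None or s.pod == pod


session = Session(
    user="alice",
    cluster="prod",
    namespace="payments",
    pod="api-0",
    expires_at=datetime(2026, 3, 18, 12, 30, tzinfo=timezone.utc),
)
during = datetime(2026, 3, 18, 12, 10, tzinfo=timezone.utc)
after = datetime(2026, 3, 18, 12, 31, tzinfo=timezone.utc)

print(in_scope(session, "prod", "payments", "api-0", during))  # True
print(in_scope(session, "prod", "billing", "api-0", during))   # False
print(in_scope(session, "prod", "payments", "api-0", after))   # False
```

The gateway would run a check like this before each Kubernetes API call it makes on the user's behalf, in addition to whatever RBAC enforces.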
Example: Namespace-scoped role bindings
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: jit-debug
  namespace: <namespace>
  annotations:
    kubernetes.io/description: >
      Colleagues performing semi-privileged debugging, with access provided
      just in time and on demand.
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["pods/exec"]
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jit-debug
  namespace: <namespace>
subjects:
- kind: Group
  name: jit:oncall:<namespace> # mapped from the short-lived credential (cert/OIDC)
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: jit-debug
  apiGroup: rbac.authorization.k8s.io
These RBAC objects, and the rules they define, allow debugging only within the specified namespace; attempts to access other namespaces are not allowed.
Example: Cluster-scoped role binding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: jit-cluster-read
rules:
- apiGroups: [""]
  resources: ["nodes", "namespaces"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: jit-cluster-read
subjects:
- kind: Group
  name: jit:oncall:cluster
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: jit-cluster-read
  apiGroup: rbac.authorization.k8s.io
These RBAC rules grant cluster-wide read access (for example, to nodes and namespaces) and should be used only for workflows that truly require cluster-scoped resources.
Finer-grained restrictions like "only this pod/node" or "only these commands" are typically enforced by the access gateway/broker during the session, but Kubernetes also offers other options, such as ValidatingAdmissionPolicy for restricting writes and webhook authorization for custom authorization across verbs.
In environments with stricter access controls, you can add an extra, short-lived session mediation layer to separate session establishment from privileged actions. Both layers are ephemeral, use identity-bound expiring credentials, and produce independent audit trails. The mediation layer handles session setup/forwarding, while the execution layer performs only RBAC-authorized Kubernetes actions. This separation can reduce exposure by narrowing responsibilities, scoping credentials per step, and enforcing end-to-end session expiry.
References
- Authorization
- Using RBAC Authorization
- Authenticating
- Certificates and Certificate Signing Requests
- Issue a Certificate for a Kubernetes API Client Using a CertificateSigningRequest
- Role Based Access Control Good Practices
Disclaimer: The views expressed in this post are solely those of the author and do not reflect the views of the author's employer or any other organization.
18 Mar 2026 6:00pm GMT
17 Mar 2026
Kubernetes Blog
The Invisible Rewrite: Modernizing the Kubernetes Image Promoter
Every container image you pull from registry.k8s.io got there through kpromo, the Kubernetes image promoter. It copies images from staging registries to production, signs them with cosign, replicates signatures across more than 20 regional mirrors, and generates SLSA provenance attestations. If this tool breaks, no Kubernetes release ships. Over the past few weeks, we rewrote its core from scratch, deleted 20% of the codebase, made it dramatically faster, and nobody noticed. That was the whole point.
A bit of history
The image promoter started in late 2018 as an internal Google project by Linus Arver. The goal was simple: replace the manual, Googler-gated process of copying container images into k8s.gcr.io with a community-owned, GitOps-based workflow. Push to a staging registry, open a PR with a YAML manifest, get it reviewed and merged, and automation handles the rest. KEP-1734 formalized this proposal.
In early 2019, the code moved to kubernetes-sigs/k8s-container-image-promoter and grew quickly. Over the next few years, Stephen Augustus consolidated multiple tools (cip, gh2gcs, krel promote-images, promobot-files) into a single CLI called kpromo. The repository was renamed to promo-tools. Adolfo Garcia Veytia (Puerco) added cosign signing and SBOM support. Tyler Ferrara built vulnerability scanning. Carlos Panato kept the project in a healthy and releasable state. 42 contributors made about 3,500 commits across more than 60 releases.
It worked. But by 2025 the codebase carried the weight of seven years of incremental additions from multiple SIGs and subprojects. The README said it plainly: you will see duplicated code, multiple techniques for accomplishing the same thing, and several TODOs.
The problems we needed to solve
Production promotion jobs for Kubernetes core images regularly took over 30 minutes and frequently failed with rate limit errors. The core promotion logic had grown into a monolith that was hard to extend and difficult to test, making new features like provenance or vulnerability scanning painful to add.
On the SIG Release roadmap, two work items had been sitting for a while: "Rewrite artifact promoter" and "Make artifact validation more robust". We had discussed these at SIG Release meetings and KubeCons, and the open research spikes on project board #171 captured eight questions that needed answers before we could move forward.
One issue to answer them all
In February 2026, we opened issue #1701 ("Rewrite artifact promoter pipeline") and answered all eight spikes in a single tracking issue. The rewrite was deliberately phased so that each step could be reviewed, merged, and validated independently. Here is what we did:
Phase 1: Rate Limiting (#1702). Rewrote rate limiting to properly throttle all registry operations with adaptive backoff.
Phase 2: Interfaces (#1704). Put registry and auth operations behind clean interfaces so they can be swapped out and tested independently.
Phase 3: Pipeline Engine (#1705). Built a pipeline engine that runs promotion as a sequence of distinct phases instead of one large function.
Phase 4: Provenance (#1706). Added SLSA provenance verification for staging images.
Phase 5: Scanner and SBOMs (#1709). Added vulnerability scanning and SBOM support. Flipped the default to the new pipeline engine. At this point we cut v4.2.0 and let it soak in production before continuing.
Phase 6: Split Signing from Replication (#1713). Separated image signing from signature replication into their own pipeline phases, eliminating the rate limit contention that caused most production failures.
Phase 7: Remove Legacy Pipeline (#1712). Deleted the old code path entirely.
Phase 8: Remove Legacy Dependencies (#1716). Deleted the audit subsystem, deprecated tools, and e2e test infrastructure.
Phase 9: Delete the Monolith (#1718). Removed the old monolithic core and its supporting packages. Thousands of lines deleted across phases 7 through 9.
Each phase shipped independently. v4.3.0 followed the next day with the legacy code fully removed.
With the new architecture in place, a series of follow-up improvements landed: parallelized registry reads (#1736), retry logic for all network operations (#1742), per-request timeouts to prevent pipeline hangs (#1763), HTTP connection reuse (#1759), local registry integration tests (#1746), the removal of deprecated credential file support (#1758), a rework of attestation handling to use cosign's OCI APIs and the removal of deprecated SBOM support (#1764), and a dedicated promotion record predicate type registered with the in-toto attestation framework (#1767). These would have been much harder to land without the clean separation the rewrite provided. v4.4.0 shipped all of these improvements and enabled provenance generation and verification by default.
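For readers unfamiliar with the retry pattern mentioned above, here is a generic sketch of retrying a transient failure with exponential backoff and jitter. It is written in Python for brevity and is not the promoter's code; `TransientError`, `with_retries`, and the default delays are hypothetical.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for a retryable failure such as a rate limit or a timeout."""


def with_retries(op, attempts=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call op(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return op()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads retries out so clients do not retry in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))


# Demo: an operation that fails twice, then succeeds.
state = {"calls": 0}


def flaky_read():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TransientError("rate limited")
    return "ok"


# A no-op sleep keeps the demo instant; omit it to wait for real.
print(with_retries(flaky_read, sleep=lambda seconds: None))  # ok
```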
The new pipeline
The promotion pipeline now has seven clearly separated phases:
| Phase | What it does |
|---|---|
| Setup | Validate options, prewarm TUF cache. |
| Plan | Parse manifests, read registries, compute which images need promotion. |
| Provenance | Verify SLSA attestations on staging images. |
| Validate | Check cosign signatures, exit here for dry runs. |
| Promote | Copy images server-side, preserving digests. |
| Sign | Sign promoted images with keyless cosign. |
| Attest | Generate promotion provenance attestations using a dedicated in-toto predicate type. |
Phases run sequentially, so each one gets exclusive access to the full rate limit budget. No more contention. Signature replication to mirror registries is no longer part of this pipeline and runs as a dedicated periodic Prow job instead.
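The shape of such an engine can be sketched in a few lines. This is an illustrative Python model, not the promoter's implementation: the `STOP` sentinel, the `run_pipeline` function, and the no-op phase bodies are assumptions. Only the phase names and the dry-run exit at Validate come from the table above.

```python
from typing import Callable, List, Tuple

STOP = "stop"  # a phase returns this to end the run early


def run_pipeline(phases: List[Tuple[str, Callable[[dict], object]]], ctx: dict) -> List[str]:
    """Run phases strictly one after another; return the names that ran."""
    executed = []
    for name, phase in phases:
        executed.append(name)
        if phase(ctx) == STOP:
            break
    return executed


def noop(ctx: dict) -> None:
    """Placeholder for real phase logic."""
    return None


def validate(ctx: dict):
    # Dry runs end after validation, before anything is copied or signed.
    return STOP if ctx.get("dry_run") else None


PHASES = [
    ("setup", noop),
    ("plan", noop),
    ("provenance", noop),
    ("validate", validate),
    ("promote", noop),
    ("sign", noop),
    ("attest", noop),
]

print(run_pipeline(PHASES, {"dry_run": True}))
# ['setup', 'plan', 'provenance', 'validate']
```

Because only one phase is active at a time, each phase can be tested in isolation and none competes with another for registry quota.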
Making it fast
With the architecture in place, we turned to performance.
Parallel registry reads (#1736): The plan phase reads 1,350 registries. We parallelized this and the plan phase dropped from about 20 minutes to about 2 minutes.
Two-phase tag listing (#1761): Instead of checking all 46,000 image groups across more than 20 mirrors, we first check only the source repositories. About 57% of images have no signatures at all because they were promoted before signing was enabled. We skip those entirely, cutting API calls roughly in half.
Source check before replication (#1727): Before iterating all mirrors for a given image, we check if the signature exists on the primary registry first. In steady state where most signatures are already replicated, this reduced the work from about 17 hours to about 15 minutes.
Per-request timeouts (#1763): We observed intermittent hangs where a stalled connection blocked the pipeline for over 9 hours. Every network operation now has its own timeout and transient failures are retried automatically.
Connection reuse (#1759): We started reusing HTTP connections and auth state across operations, eliminating redundant token negotiations. This closed a long-standing request from 2023.
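Two of the ideas above, checking the source before touching any mirror and reading registries in parallel, combine naturally. The sketch below is one simplified reading of them in Python, not the promoter's code; `mirrors_missing_signature`, the `has_signature` callback, and the in-memory registry contents are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def mirrors_missing_signature(image, primary, mirrors, has_signature, workers=8):
    """Return the mirrors that still need the signature for `image`."""
    # Cheap check first: if the primary has no signature, there is
    # nothing to replicate, so no mirror is queried at all.
    if not has_signature(primary, image):
        return []
    # Otherwise query the mirrors concurrently instead of one by one.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        present = list(pool.map(lambda m: has_signature(m, image), mirrors))
    return [m for m, ok in zip(mirrors, present) if not ok]


# In-memory stand-in for registries; real code would make network calls.
REGISTRIES = {
    "primary": {"img-a"},
    "mirror-1": {"img-a"},
    "mirror-2": set(),
    "mirror-3": set(),
}


def has_signature(registry, image):
    return image in REGISTRIES[registry]


mirrors = ["mirror-1", "mirror-2", "mirror-3"]
print(mirrors_missing_signature("img-a", "primary", mirrors, has_signature))
# ['mirror-2', 'mirror-3']
print(mirrors_missing_signature("img-b", "primary", mirrors, has_signature))
# []
```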
By the numbers
Here is what the rewrite looks like in aggregate.
- Over 40 PRs merged, 3 releases shipped (v4.2.0, v4.3.0, v4.4.0)
- Over 10,000 lines added and over 16,000 lines deleted, a net reduction of about 5,000 lines (20% smaller codebase)
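- Performance improved substantially: the plan phase dropped from about 20 minutes to about 2 minutes, and steady-state signature replication from about 17 hours to about 15 minutes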
- Robustness improved with retry logic, per-request timeouts, and adaptive rate limiting
- 19 long-standing issues closed
The codebase shrank by a fifth while gaining provenance attestations, a pipeline engine, vulnerability scanning integration, parallelized operations, retry logic, integration tests against local registries, and a standalone signature replication mode.
No user-facing changes
This was a hard requirement. The kpromo cip command accepts the same flags and reads the same YAML manifests. The post-k8sio-image-promo Prow job continued working throughout. The promotion manifests in kubernetes/k8s.io did not change. Nobody had to update their workflows or configuration.
We caught two regressions early in production. One (#1731) caused a registry key mismatch that made every image appear as "lost" so that nothing was promoted. Another (#1733) set the default thread count to zero, blocking all goroutines. Both were fixed within hours. The phased release strategy (v4.2.0 with the new engine, v4.3.0 with legacy code removed) gave us a clear rollback path that we fortunately never needed.
What comes next
Signature replication across all mirror registries remains the most expensive part of the promotion cycle. Issue #1762 proposes eliminating it entirely by having archeio (the registry.k8s.io redirect service) route signature tag requests to a single canonical upstream instead of per-region backends. Another option would be to move signing closer to the registry infrastructure itself. Both approaches need further discussion with the SIG Release and infrastructure teams, but either one would remove thousands of API calls per promotion cycle and simplify the codebase even further.
Thank you
This project has been a community effort spanning seven years. Thank you to Linus, Stephen, Adolfo, Carlos, Ben, Marko, Lauri, Tyler, Arnaud, and many others who contributed code, reviews, and planning over the years. The SIG Release and Release Engineering communities provided the context, the discussions, and the patience for a rewrite of infrastructure that every Kubernetes release depends on.
If you want to get involved, join us in #release-management on the Kubernetes Slack or check out the repository.
17 Mar 2026 12:00am GMT
09 Mar 2026
Kubernetes Blog
Announcing the AI Gateway Working Group
The community around Kubernetes includes a number of Special Interest Groups (SIGs) and Working Groups (WGs) facilitating discussions on important topics between interested contributors. Today, we're excited to announce the formation of the AI Gateway Working Group, a new initiative focused on developing standards and best practices for networking infrastructure that supports AI workloads in Kubernetes environments.
What is an AI Gateway?
In a Kubernetes context, an AI Gateway refers to network gateway infrastructure (including proxy servers, load-balancers, etc.) that generally implements the Gateway API specification with enhanced capabilities for AI workloads. Rather than defining a distinct product category, AI Gateways describe infrastructure designed to enforce policy on AI traffic, including:
- Token-based rate limiting for AI APIs.
- Fine-grained access controls for inference APIs.
- Payload inspection enabling intelligent routing, caching, and guardrails.
- Support for AI-specific protocols and routing patterns.
Working group charter and mission
The AI Gateway Working Group operates under a clear charter with the mission to develop proposals for Kubernetes Special Interest Groups (SIGs) and their sub-projects. Its primary goals include:
- Standards Development: Create declarative APIs, standards, and guidance for AI workload networking in Kubernetes.
- Community Collaboration: Foster discussions and build consensus around best practices for AI infrastructure.
- Extensible Architecture: Ensure composability, pluggability, and ordered processing for AI-specific gateway extensions.
- Standards-Based Approach: Build on established networking foundations, layering AI-specific capabilities on top of proven standards.
Active proposals
WG AI Gateway currently has several active proposals that address key challenges in AI workload networking:
Payload Processing
The payload processing proposal addresses the critical need for AI workloads to inspect and transform full HTTP request and response payloads. This enables:
AI Inference Security
- Guard against malicious prompts and prompt injection attacks.
- Content filtering for AI responses.
- Signature-based detection and anomaly detection for AI traffic.
AI Inference Optimization
- Semantic routing based on request content.
- Intelligent caching to reduce inference costs and improve response times.
- RAG (Retrieval-Augmented Generation) system integration for context enhancement.
The proposal defines standards for declarative payload processor configuration, ordered processing pipelines, and configurable failure modes - all essential for production AI workload deployments.
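To illustrate what "ordered processing" and "configurable failure modes" can mean in practice, here is a purely conceptual sketch. It does not reflect the working group's proposed API; the `Processor` type, the `fail_open` flag, and the example processors are hypothetical.

```python
from typing import Callable, List, NamedTuple


class Processor(NamedTuple):
    name: str
    handle: Callable[[str], str]
    fail_open: bool  # True: skip this processor on error; False: reject


def process(payload: str, processors: List[Processor]) -> str:
    """Apply processors in their declared order, honoring each failure mode."""
    for p in processors:
        try:
            payload = p.handle(payload)
        except Exception:
            if not p.fail_open:
                raise  # fail closed: the request is rejected
            # fail open: continue with the payload unchanged
    return payload


def redact(text: str) -> str:
    return text.replace("secret", "***")


def broken(text: str) -> str:
    raise RuntimeError("processor unavailable")


chain = [
    Processor("redact", redact, fail_open=False),
    Processor("enrich", broken, fail_open=True),
    Processor("normalize", str.upper, fail_open=False),
]
print(process("hello secret", chain))  # HELLO ***
```

Order matters here: redaction runs before normalization, so later processors never see the unredacted text.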
Egress gateways
Modern AI applications increasingly depend on external inference services, whether for specialized models, failover scenarios, or cost optimization. The egress gateways proposal aims to define standards for securely routing traffic outside the cluster. Key features include:
External AI Service Integration
- Secure access to cloud-based AI services (OpenAI, Vertex AI, Bedrock, etc.).
- Managed authentication and token injection for third-party AI APIs.
- Regional compliance and failover capabilities.
Advanced Traffic Management
- Backend resource definitions for external FQDNs and services.
- TLS policy management and certificate authority control.
- Cross-cluster routing for centralized AI infrastructure.
User Stories We're Addressing
- Platform operators providing managed access to external AI services.
- Developers requiring inference failover across multiple cloud providers.
- Compliance engineers enforcing regional restrictions on AI traffic.
- Organizations centralizing AI workloads on dedicated clusters.
Upcoming events
KubeCon + CloudNativeCon Europe 2026, Amsterdam
AI Gateway working group members will be presenting at KubeCon + CloudNativeCon Europe in Amsterdam, discussing the problems at the intersection of AI and networking, including the working group's active proposals, as well as the intersection of AI gateways with Model Context Protocol (MCP) and agent networking patterns.
This session will showcase how AI Gateway working group proposals enable the infrastructure needed for next-generation AI deployments and communication patterns.
The session will also include the initial designs, early prototypes, and emerging directions shaping the WG's roadmap.
For more details see our session here:
Get involved
The AI Gateway Working Group represents the Kubernetes community's commitment to standardizing AI workload networking. As AI becomes increasingly integral to modern applications, we need robust, standardized infrastructure that can support the unique requirements of inference workloads while maintaining the security, observability, and reliability standards that Kubernetes users expect.
Our proposals are currently in active development, with implementations beginning across various gateway projects. We're working closely with SIG Network on Gateway API enhancements and collaborating with the broader cloud-native community to ensure our standards meet real-world production needs.
Whether you're a gateway implementer, platform operator, AI application developer, or simply interested in the intersection of Kubernetes and AI, we'd love your input. The working group follows an open contribution model - you can review our proposals, join our weekly meetings, or start discussions on our GitHub repository. To learn more:
- Visit the working group's umbrella GitHub repository.
- Read the working group's charter.
- Join the weekly meeting on Thursdays at 2PM EST.
- Connect with the working group on Slack (#wg-ai-gateway) (visit https://slack.k8s.io/ for an invitation).
- Join the AI Gateway mailing list.
The future of AI infrastructure in Kubernetes is being built today. Join us and learn how you can contribute and help shape AI-aware gateway capabilities in Kubernetes.
09 Mar 2026 6:00pm GMT
27 Feb 2026
Kubernetes Blog
Before You Migrate: Five Surprising Ingress-NGINX Behaviors You Need to Know
As announced November 2025, Kubernetes will retire Ingress-NGINX in March 2026. Despite its widespread usage, Ingress-NGINX is full of surprising defaults and side effects that are probably present in your cluster today. This blog highlights these behaviors so that you can migrate away safely and make a conscious decision about which behaviors to keep. This post also compares Ingress-NGINX with Gateway API and shows you how to preserve Ingress-NGINX behavior in Gateway API. The recurring risk pattern in every section is the same: a seemingly correct translation can still cause outages if it does not consider Ingress-NGINX's quirks.
I'm going to assume that you, the reader, have some familiarity with Ingress-NGINX and the Ingress API. Most examples use httpbin as the backend.
Also, note that Ingress-NGINX and NGINX Ingress are two separate Ingress controllers. Ingress-NGINX is an Ingress controller maintained and governed by the Kubernetes community that is retiring March 2026. NGINX Ingress is an Ingress controller by F5. Both use NGINX as the dataplane, but are otherwise unrelated. From now on, this blog post only discusses Ingress-NGINX.
1. Regex matches are prefix-based and case insensitive
Suppose that you wanted to route all requests with a path consisting of only three uppercase letters to the httpbin service. You might create the following Ingress with the nginx.ingress.kubernetes.io/use-regex: "true" annotation and the regex pattern of /[A-Z]{3}.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: regex-match-ingress
  annotations:
    nginx.ingress.kubernetes.io/use-regex: "true"
spec:
  ingressClassName: nginx
  rules:
  - host: regex-match.example.com
    http:
      paths:
      - path: "/[A-Z]{3}"
        pathType: ImplementationSpecific
        backend:
          service:
            name: httpbin
            port:
              number: 8000
However, because regex matches are prefix-based and case-insensitive, Ingress-NGINX routes any request whose path starts with any three letters to httpbin:
curl -sS -H "Host: regex-match.example.com" http://<your-ingress-ip>/uuid
The output is similar to:
{
"uuid": "e55ef929-25a0-49e9-9175-1b6e87f40af7"
}
Note: The /uuid endpoint of httpbin returns a random UUID. A UUID in the response body means that the request was successfully routed to httpbin.
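You can reproduce the difference between the two interpretations of the same pattern locally. The sketch below uses Python's `re` module as an approximation. NGINX uses a different regex engine, but for a pattern this simple the results agree. The function names are made up for illustration.

```python
import re

PATTERN = "/[A-Z]{3}"


def prefix_case_insensitive(path: str) -> bool:
    """Anchored at the start only, ignoring case (the behavior described above)."""
    return re.match(PATTERN, path, re.IGNORECASE) is not None


def full_case_sensitive(path: str) -> bool:
    """The whole path must match, respecting case."""
    return re.fullmatch(PATTERN, path) is not None


for path in ["/ABC", "/uuid", "/uuid/some/other/path", "/ab"]:
    print(path, prefix_case_insensitive(path), full_case_sensitive(path))
# /ABC True True
# /uuid True False
# /uuid/some/other/path True False
# /ab False False
```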
With Gateway API, you can use an HTTP path match with a type of RegularExpression for regular expression path matching. RegularExpression matches are implementation specific, so check with your Gateway API implementation to verify the semantics of RegularExpression matching. Popular Envoy-based Gateway API implementations such as Istio1, Envoy Gateway, and Kgateway do a full case-sensitive match.
Thus, if you are unaware that Ingress-NGINX patterns are prefix and case-insensitive, and, unbeknownst to you, clients or applications send traffic to /uuid (or /uuid/some/other/path), you might create the following HTTP route.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: regex-match-route
spec:
  hostnames:
  - regex-match.example.com
  parentRefs:
  - name: <your gateway> # Change this depending on your use case
  rules:
  - matches:
    - path:
        type: RegularExpression
        value: "/[A-Z]{3}"
    backendRefs:
    - name: httpbin
      port: 8000
However, if your Gateway API implementation does full case-sensitive matches, the above HTTP route would not match a request with a path of /uuid. The above HTTP route would thus cause an outage because requests that Ingress-NGINX routed to httpbin would fail with a 404 Not Found at the gateway.
To preserve the case-insensitive prefix matching, you can use the following HTTP route.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: regex-match-route
spec:
  hostnames:
  - regex-match.example.com
  parentRefs:
  - name: <your gateway> # Change this depending on your use case
  rules:
  - matches:
    - path:
        type: RegularExpression
        value: "/[a-zA-Z]{3}.*"
    backendRefs:
    - name: httpbin
      port: 8000
Alternatively, the aforementioned proxies support the (?i) flag to indicate case insensitive matches. Using the flag, the pattern could be (?i)/[a-z]{3}.*.
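As a quick sanity check, the two patterns can be compared under full-match semantics on a handful of sample paths. This sketch uses Python's `re` module rather than a gateway's regex engine, so treat it as an approximation. `matches` is a helper defined here for illustration.

```python
import re

EXPLICIT = "/[a-zA-Z]{3}.*"   # both cases spelled out
FLAGGED = "(?i)/[a-z]{3}.*"   # case-insensitive flag


def matches(pattern: str, path: str) -> bool:
    """Full match: the pattern must cover the entire path."""
    return re.fullmatch(pattern, path) is not None


for path in ["/ABC", "/uuid", "/uuid/some/other/path", "/ab", "/123"]:
    print(path, matches(EXPLICIT, path), matches(FLAGGED, path))
# /ABC True True
# /uuid True True
# /uuid/some/other/path True True
# /ab False False
# /123 False False
```

The trailing `.*` is what turns a full match into a prefix match: without it, `/uuid` would not match, because the pattern would have to cover the whole path.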
2. The nginx.ingress.kubernetes.io/use-regex annotation applies to all paths of a host across all (Ingress-NGINX) Ingresses
Now, suppose that you have an Ingress with the nginx.ingress.kubernetes.io/use-regex: "true" annotation, but you want to route requests with a path of exactly /headers to httpbin. Unfortunately, you made a typo and set the path to /Header instead of /headers.
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: regex-match-ingress
  annotations:
    nginx.ingress.kubernetes.io/use-regex: "true"
spec:
  ingressClassName: nginx
  rules:
  - host: regex-match.example.com
    http:
      paths:
      - path: "<some regex pattern>"
        pathType: ImplementationSpecific
        backend:
          service:
            name: <your backend>
            port:
              number: 8000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: regex-match-ingress-other
spec:
  ingressClassName: nginx
  rules:
  - host: regex-match.example.com
    http:
      paths:
      - path: "/Header" # typo here, should be /headers
        pathType: Exact
        backend:
          service:
            name: httpbin
            port:
              number: 8000
Most would expect a request to /headers to respond with a 404 Not Found, since /headers does not match the Exact path of /Header. However, because the regex-match-ingress Ingress has the nginx.ingress.kubernetes.io/use-regex: "true" annotation and the regex-match.example.com host, all paths with the regex-match.example.com host are treated as regular expressions across all (Ingress-NGINX) Ingresses. Since regex patterns are case-insensitive prefix matches, /headers matches the /Header pattern and Ingress-NGINX routes such requests to httpbin. Running the command
curl -sS -H "Host: regex-match.example.com" http://<your-ingress-ip>/headers
the output looks like:
{
"headers": {
...
}
}
Note: The /headers endpoint of httpbin returns the request headers. The fact that the response contains the request headers in the body means that the request was successfully routed to httpbin.
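The host-wide effect can be modeled in a few lines. This is a deliberately simplified sketch of the behavior described above, not Ingress-NGINX's actual logic: it ignores Prefix paths and NGINX's location ordering, and the first Ingress's pattern (`/api/v[0-9]+`) and backend name are hypothetical stand-ins for the placeholders in the manifests.

```python
import re

# Simplified view of two Ingresses that share one host.
INGRESSES = [
    {"host": "regex-match.example.com", "use_regex": True,
     "paths": [("/api/v[0-9]+", "ImplementationSpecific", "other-backend")]},
    {"host": "regex-match.example.com", "use_regex": False,
     "paths": [("/Header", "Exact", "httpbin")]},
]


def route(ingresses, host, path):
    """Return the backend name for a request, or None if nothing matches."""
    same_host = [i for i in ingresses if i["host"] == host]
    # One annotated Ingress switches every path on the host to regex mode.
    regex_mode = any(i["use_regex"] for i in same_host)
    for ingress in same_host:
        for pattern, path_type, backend in ingress["paths"]:
            if regex_mode:
                # Case-insensitive and anchored at the start only.
                if re.match(pattern, path, re.IGNORECASE):
                    return backend
            elif path_type == "Exact" and path == pattern:
                return backend
    return None


print(route(INGRESSES, "regex-match.example.com", "/headers"))  # httpbin
```

Setting `use_regex` to `False` on the first Ingress makes the same request return `None`, which is the 404 most people would expect.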
Gateway API does not silently convert or interpret Exact and Prefix matches as regex patterns. So if you converted the above Ingresses into the following HTTP route and preserved the typo and match types, requests to /headers will respond with a 404 Not Found instead of a 200 OK.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: regex-match-route
spec:
  hostnames:
  - regex-match.example.com
  rules:
  ...
  - matches:
    - path:
        type: Exact
        value: "/Header"
    backendRefs:
    - name: httpbin
      port: 8000
To keep the case-insensitive prefix matching, you can change
- matches:
  - path:
      type: Exact
      value: "/Header"
to
- matches:
  - path:
      type: RegularExpression
      # The trailing .* keeps prefix semantics on implementations that do full matches
      value: "(?i)/Header.*"
Or even better, you could fix the typo and change the match to
- matches:
  - path:
      type: Exact
      value: "/headers"
3. Rewrite target implies regex
In this case, suppose you want to rewrite the path of requests with a path of /ip to /uuid before routing them to httpbin, and as in Section 2, you want to route requests with the path of exactly /headers to httpbin. However, you accidentally make a typo and set the path to /IP instead of /ip and /Header instead of /headers.
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rewrite-target-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: "/uuid"
spec:
  ingressClassName: nginx
  rules:
  - host: rewrite-target.example.com
    http:
      paths:
      - path: "/IP"
        pathType: Exact
        backend:
          service:
            name: httpbin
            port:
              number: 8000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rewrite-target-ingress-other
spec:
  ingressClassName: nginx
  rules:
  - host: rewrite-target.example.com
    http:
      paths:
      - path: "/Header"
        pathType: Exact
        backend:
          service:
            name: httpbin
            port:
              number: 8000
The nginx.ingress.kubernetes.io/rewrite-target: "/uuid" annotation causes requests that match paths in the rewrite-target-ingress Ingress to have their paths rewritten to /uuid before being routed to the backend.
Even though no Ingress has the nginx.ingress.kubernetes.io/use-regex: "true" annotation, the presence of the nginx.ingress.kubernetes.io/rewrite-target annotation in the rewrite-target-ingress Ingress causes all paths with the rewrite-target.example.com host to be treated as regex patterns. In other words, the nginx.ingress.kubernetes.io/rewrite-target silently adds the nginx.ingress.kubernetes.io/use-regex: "true" annotation, along with all the side effects discussed above.
For example, a request to /ip has its path rewritten to /uuid because /ip matches the case-insensitive prefix pattern of /IP in the rewrite-target-ingress Ingress. After running the command
curl -sS -H "Host: rewrite-target.example.com" http://<your-ingress-ip>/ip
the output is similar to:
{
"uuid": "12a0def9-1adg-2943-adcd-1234aadfgc67"
}
Like in the nginx.ingress.kubernetes.io/use-regex example, Ingress-NGINX treats paths of other ingresses with the rewrite-target.example.com host as case-insensitive prefix patterns. Running the command
curl -sS -H "Host: rewrite-target.example.com" http://<your-ingress-ip>/headers
gives an output that looks like
{
"headers": {
...
}
}
You can configure path rewrites in Gateway API with the HTTP URL rewrite filter, which does not silently convert your Exact and Prefix matches into regex patterns. However, if you are unaware of the side effects of the nginx.ingress.kubernetes.io/rewrite-target annotation and do not realize that /Header and /IP are both typos, you might create the following HTTP route.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: rewrite-target-route
spec:
hostnames:
- rewrite-target.example.com
parentRefs:
- name: <your-gateway>
rules:
- matches:
- path:
type: Exact
value: "/IP"
filters:
- type: URLRewrite
urlRewrite:
path:
type: ReplaceFullPath
replaceFullPath: /uuid
backendRefs:
- name: httpbin
port: 8000
- matches:
- path:
# This is an exact match, irrespective of other rules
type: Exact
value: "/Header"
backendRefs:
- name: httpbin
port: 8000
As with Section 2, because /IP is now an Exact match type in your HTTP route, requests to /ip will respond with a 404 Not Found instead of a 200 OK. Similarly, requests to /headers will also respond with a 404 Not Found instead of a 200 OK. Thus, this HTTP route will break applications and clients that rely on the /ip and /headers routes.
To fix this, you can change the matches in the HTTP route to be regex matches, and change the path patterns to be case-insensitive prefix matches, as follows.
- matches:
- path:
type: RegularExpression
value: "(?i)/IP.*"
...
- matches:
- path:
type: RegularExpression
value: "(?i)/Header.*"
Or, you can keep the Exact match type and fix the typos.
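The difference between these matching behaviors can be sketched in Python. This is a minimal model, not Ingress-NGINX or Gateway API code. `nginx_regex_match` assumes the case-insensitive prefix behavior described above, and `gateway_regex_match` assumes an implementation that matches the regex against the whole path and accepts the `(?i)` flag, which varies between Gateway API implementations. All function names are hypothetical.

```python
import re


def nginx_regex_match(ingress_path: str, request_path: str) -> bool:
    """Model of the behavior described above: once regex mode is on,
    the Ingress path acts as a case-insensitive prefix pattern."""
    return re.match(ingress_path, request_path, re.IGNORECASE) is not None


def gateway_exact_match(route_path: str, request_path: str) -> bool:
    """Exact match: case-sensitive comparison of the whole path."""
    return route_path == request_path


def gateway_regex_match(pattern: str, request_path: str) -> bool:
    """Assumption: the regex must match the entire request path."""
    return re.fullmatch(pattern, request_path) is not None


# Compare the three behaviors for the two typo'd paths from this section.
for typo, path in (("/IP", "/ip"), ("/Header", "/headers")):
    print(
        path,
        "nginx:", nginx_regex_match(typo, path),
        "exact:", gateway_exact_match(typo, path),
        "regex:", gateway_regex_match(f"(?i){typo}.*", path),
    )
```

Under these assumptions, the typo'd paths only keep working when the route uses the case-insensitive regex form; the Exact form rejects both requests.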
4. Requests missing a trailing slash are redirected to the same path with a trailing slash
Consider the following Ingress:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: trailing-slash-ingress
spec:
ingressClassName: nginx
rules:
- host: trailing-slash.example.com
http:
paths:
- path: "/my-path/"
pathType: Exact
backend:
service:
name: <your-backend>
port:
number: 8000
You might expect Ingress-NGINX to respond to /my-path with a 404 Not Found since /my-path does not exactly match the Exact path of /my-path/. However, Ingress-NGINX redirects the request to /my-path/ with a 301 Moved Permanently because the only difference between /my-path and /my-path/ is a trailing slash.
curl -isS -H "Host: trailing-slash.example.com" http://<your-ingress-ip>/my-path
The output looks like:
HTTP/1.1 301 Moved Permanently
...
Location: http://trailing-slash.example.com/my-path/
...
The same applies if you change the pathType to Prefix. However, the redirect does not happen if the path is a regex pattern.
Conformant Gateway API implementations do not silently configure any kind of redirects. If clients or downstream services depend on this redirect, a migration to Gateway API that does not explicitly configure request redirects will cause an outage because requests to /my-path will now respond with a 404 Not Found instead of a 301 Moved Permanently. You can explicitly configure redirects using the HTTP request redirect filter as follows:
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: trailing-slash-route
spec:
hostnames:
- trailing-slash.example.com
parentRefs:
- name: <your-gateway>
rules:
- matches:
- path:
type: Exact
value: "/my-path"
filters:
- type: RequestRedirect
requestRedirect:
statusCode: 301
path:
type: ReplaceFullPath
replaceFullPath: /my-path/
- matches:
- path:
type: Exact # or Prefix
value: "/my-path/"
backendRefs:
- name: <your-backend>
port: 8000
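The redirect behavior in this section can be modeled with a short Python sketch. It is a simplified illustration under stated assumptions, not code from either project: the function names are hypothetical, the backend is assumed to answer 200 OK, and the Ingress-NGINX model covers only the Exact and Prefix case described above.

```python
from typing import Dict, Optional, Set, Tuple

Response = Tuple[int, Optional[str]]  # (status code, Location header or None)


def ingress_nginx_exact(configured: str, request_path: str) -> Response:
    """Model of the implicit trailing-slash redirect described above."""
    if request_path == configured:
        return 200, None
    # Only a missing trailing slash separates the two paths: redirect.
    if configured.endswith("/") and request_path + "/" == configured:
        return 301, configured
    return 404, None


def gateway_route(redirects: Dict[str, str], backends: Set[str],
                  request_path: str) -> Response:
    """Model of an HTTPRoute: only explicitly configured rules apply."""
    if request_path in redirects:
        return 301, redirects[request_path]
    if request_path in backends:
        return 200, None
    return 404, None


print(ingress_nginx_exact("/my-path/", "/my-path"))
print(gateway_route({}, {"/my-path/"}, "/my-path"))
print(gateway_route({"/my-path": "/my-path/"}, {"/my-path/"}, "/my-path"))
```

The second call shows the migration hazard: without an explicit redirect rule, the modeled route answers 404 where Ingress-NGINX answered 301.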
5. Ingress-NGINX normalizes URLs
URL normalization is the process of converting a URL into a canonical form before matching it against Ingress rules and routing it. The specifics of URL normalization are defined in RFC 3986 Section 6.2, but some examples are
- removing path segments that are just a .: my/./path -> my/path
- having a .. path segment remove the previous segment: my/../path -> /path
- deduplicating consecutive slashes in a path: my//path -> my/path
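The three normalization rules above can be sketched as a small Python function. This is a simplified illustration, not the algorithm NGINX uses: `normalize_path` is a hypothetical helper, it does not decode percent-encoded characters, and it drops a trailing `.` segment without keeping the trailing slash.

```python
import re


def normalize_path(path: str) -> str:
    """Apply the three example rules: collapse slashes, drop '.', resolve '..'."""
    # Deduplicate consecutive slashes first.
    path = re.sub(r"/{2,}", "/", path)
    kept = []
    for segment in path.split("/"):
        if segment == ".":
            continue  # "." refers to the current segment, so skip it
        if segment == "..":
            # Remove the previous segment, but never climb above the root.
            if kept and kept[-1] != "":
                kept.pop()
            continue
        kept.append(segment)
    return "/".join(kept)


for raw in ("/my/./path", "/my/../path", "/my//path", "/ip/abc/../../uuid", "////uuid"):
    print(raw, "->", normalize_path(raw))
```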
Ingress-NGINX normalizes URLs before matching them against Ingress rules. For example, consider the following Ingress:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: path-normalization-ingress
spec:
ingressClassName: nginx
rules:
- host: path-normalization.example.com
http:
paths:
- path: "/uuid"
pathType: Exact
backend:
service:
name: httpbin
port:
number: 8000
Ingress-NGINX normalizes the path of each of the following requests to /uuid. Because the normalized path matches the Exact path of /uuid, Ingress-NGINX responds with either a 200 OK response or a 301 Moved Permanently to /uuid.
For the following commands
curl -sS -H "Host: path-normalization.example.com" http://<your-ingress-ip>/uuid
curl -sS -H "Host: path-normalization.example.com" http://<your-ingress-ip>/ip/abc/../../uuid
curl -sSi -H "Host: path-normalization.example.com" http://<your-ingress-ip>////uuid
the outputs are similar to
{
"uuid": "29c77dfe-73ec-4449-b70a-ef328ea9dbce"
}
{
"uuid": "d20d92e8-af57-4014-80ba-cf21c0c4ffae"
}
HTTP/1.1 301 Moved Permanently
...
Location: /uuid
...
Your backends might rely on the Ingress/Gateway API implementation to normalize URLs. That said, most Gateway API implementations will have some path normalization enabled by default. For example, Istio, Envoy Gateway, and Kgateway all normalize . and .. segments out of the box. For more details, check the documentation for each Gateway API implementation that you use.
Conclusion
As we all race to respond to the Ingress-NGINX retirement, I hope this blog post instills some confidence that you can migrate safely and effectively despite all the intricacies of Ingress-NGINX.
SIG Network has also been working on supporting the most common Ingress-NGINX annotations (and some of these unexpected behaviors) in Ingress2Gateway to help you translate Ingress-NGINX configuration into Gateway API, and offer alternatives to unsupported behavior.
SIG Network released Gateway API 1.5 earlier today (27th February 2026), which graduates features such as ListenerSet (which allows app developers to better manage TLS certificates) and the HTTPRoute CORS filter, which allows CORS configuration.
- You can use Istio purely as a Gateway API controller with no other service mesh features.
27 Feb 2026 3:30pm GMT
12 Feb 2026
Kubernetes Blog
Spotlight on SIG Architecture: API Governance
This is the fifth interview of a SIG Architecture Spotlight series that covers the different subprojects, and we will be covering SIG Architecture: API Governance.
In this SIG Architecture spotlight we talked with Jordan Liggitt, lead of the API Governance sub-project.
Introduction
FM: Hello Jordan, thank you for your availability. Tell us a bit about yourself, your role and how you got involved in Kubernetes.
JL: My name is Jordan Liggitt. I'm a Christian, husband, father of four, software engineer at Google by day, and amateur musician by stealth. I was born in Texas (and still like to claim it as my point of origin), but I've lived in North Carolina for most of my life.
I've been working on Kubernetes since 2014. At that time, I was working on authentication and authorization at Red Hat, and my very first pull request to Kubernetes attempted to add an OAuth server to the Kubernetes API server. It never exited work-in-progress status. I ended up going with a different approach that layered on top of the core Kubernetes API server in a different project (spoiler alert: this is foreshadowing), and I closed it without merging six months later.
Undeterred by that start, I stayed involved, helped build Kubernetes authentication and authorization capabilities, and got involved in the definition and evolution of the core Kubernetes APIs from early beta APIs, like v1beta3, to v1. I got tagged as an API reviewer in 2016 based on those contributions, and was added as an API approver in 2017.
Today, I help lead the API Governance and code organization subprojects for SIG Architecture, and I am a tech lead for SIG Auth.
FM: And when did you get specifically involved in the API Governance project?
JL: Around 2019.
Goals and scope of API Governance
FM: How would you describe the main goals and areas of intervention of the subproject?
JL: The surface area includes all the various APIs Kubernetes has, and there are APIs that people do not always realize are APIs: command-line flags, configuration files, how binaries are run, how they talk to back-end components like the container runtime, and how they persist data. People often think of "the API" as only the REST API... that is the biggest and most obvious one, and the one with the largest audience, but all of these other surfaces are also APIs. Their audiences are narrower, so there is more flexibility there, but they still require consideration.
The goals are to be stable while still enabling innovation. Stability is easy if you never change anything, but that contradicts the goal of evolution and growth. So we balance "be stable" with "allow change".
FM: Speaking of changes, in terms of ensuring consistency and quality (which is clearly one of the reasons this project exists), what are the specific quality gates in the lifecycle of a Kubernetes change? Does API Governance get involved during the release cycle, prior to it through guidelines, or somewhere in between? At what points do you ensure the intended role is fulfilled?
JL: We have guidelines and conventions, both for APIs in general and for how to change an API. These are living documents that we update as we encounter new scenarios. They are long and dense, so we also support them with involvement at either the design stage or the implementation stage.
Sometimes, due to bandwidth constraints, teams move ahead with design work without feedback from API Review. That's fine, but it means that when implementation begins, the API review will happen then, and there may be substantial feedback. So we get involved when a new API is created or an existing API is changed, either at design or implementation.
FM: Is this during the Kubernetes Enhancement Proposal (KEP) process? Since KEPs are mandatory for enhancements, I assume part of the work intersects with API Governance?
JL: It can. KEPs vary in how detailed they are. Some include literal API definitions. When they do, we can perform an API review at the design stage. Then implementation becomes a matter of checking fidelity to the design.
Getting involved early is ideal. But some KEPs are conceptual and leave details to the implementation. That's not wrong; it just means the implementation will be more exploratory. Then API Review gets involved later, possibly recommending structural changes.
There's a trade-off regardless: detailed design upfront versus iterative discovery during implementation. People and teams work differently, and we're flexible and happy to consult early or at implementation time.
FM: This reminds me of what Fred Brooks wrote in "The Mythical Man-Month" about conceptual integrity being central to product quality... No matter how you structure the process, there must be a point where someone looks at what is coming and ensures conceptual integrity. Kubernetes uses APIs everywhere -- externally and internally -- so API Governance is critical to maintaining that integrity. How is this captured?
JL: Yes, the conventions document captures patterns we've learned over time: what to do in various situations. We also have automated linters and checks to ensure correctness around patterns like spec/status semantics. These automated tools help catch issues even when humans miss them.
As new scenarios arise -- and they do constantly -- we think through how to approach them and fold the results back into our documentation and tools. Sometimes it takes a few attempts before we settle on an approach that works well.
FM: Exactly. Each new interaction improves the guidelines.
JL: Right. And sometimes the first approach turns out to be wrong. It may take two or three iterations before we land on something robust.
The impact of Custom Resource Definitions
FM: Is there any particular change, episode, or domain that stands out as especially noteworthy, complex, or interesting in your experience?
JL: The watershed moment was Custom Resources. Prior to that, every API was handcrafted by us and fully reviewed. There were inconsistencies, but we understood and controlled every type and field.
When Custom Resources arrived, anyone could define anything. The first version did not even require a schema. That made it extremely powerful -- it enabled change immediately -- but it left us playing catch-up on stability and consistency.
When Custom Resources graduated to General Availability (GA), schemas became required, but escape hatches still existed for backward compatibility. Since then, we've been working on giving CRD authors validation capabilities comparable to built-ins. Built-in validation rules for CRDs have only just reached GA in the last few releases.
So CRDs opened the "anything is possible" era. Built-in validation rules are the second major milestone: bringing consistency back.
The three major themes have been defining schemas, validating data, and handling pre-existing invalid data. With ratcheting validation (allowing data to improve without breaking existing objects), we can now guide CRD authors toward conventions without breaking the world.
API Governance in context
FM: How does API Governance relate to SIG Architecture and API Machinery?
JL: API Machinery provides the actual code and tools that people build APIs on. They don't review APIs for storage, networking, scheduling, etc.
SIG Architecture sets the overall system direction and works with API Machinery to ensure the system supports that direction. API Governance works with other SIGs building on that foundation to define conventions and patterns, ensuring consistent use of what API Machinery provides.
FM: Thank you. That clarifies the flow. Going back to release cycles: do release phases -- enhancements freeze, code freeze -- change your workload? Or is API Governance mostly continuous?
JL: We get involved in two places: design and implementation. Design involvement increases before enhancements freeze; implementation involvement increases before code freeze. However, many efforts span multiple releases, so there is always some design and implementation happening, even for work targeting future releases. Between those intense periods, we often have time to work on long-term design work.
An anti-pattern we see is teams thinking about a large feature for months and then presenting it three weeks before enhancements freeze, saying, "Here is the design, please review." For big changes with API impact, it's much better to involve API Governance early.
And there are good times in the cycle for this -- between freezes -- when people have bandwidth. That's when long-term review work fits best.
Getting involved
FM: Clearly. Now, regarding team dynamics and new contributors: how can someone get involved in API Governance? What should they focus on?
JL: It's usually best to follow a specific change rather than trying to learn everything at once. Pick a small API change, perhaps one someone else is making or one you want to make, and observe the full process: design, implementation, review.
High-bandwidth review -- live discussion over video -- is often very effective. If you're making or following a change, ask whether there's a time to go over the design or PR together. Observing those discussions is extremely instructive.
Start with a small change. Then move to a bigger one. Then maybe a new API. That builds understanding of conventions as they are applied in practice.
FM: Excellent. Any final comments, or anything we missed?
JL: Yes... the reason we care so much about compatibility and stability is for our users. It's easy for contributors to see those requirements as painful obstacles preventing cleanup or requiring tedious work... but users integrated with our system, and we made a promise to them: we want them to trust that we won't break that contract. So even when it requires more work, moves slower, or involves duplication, we choose stability.
We are not trying to be obstructive; we are trying to make life good for users.
A lot of our questions focus on the future: you want to do something now... how will you evolve it later without breaking it? We assume we will know more in the future, and we want the design to leave room for that.
We also assume we will make mistakes. The question then is: how do we leave ourselves avenues to improve while keeping compatibility promises?
FM: Exactly. Jordan, thank you, I think we've covered everything. This has been an insightful view into the API Governance project and its role in the wider Kubernetes project.
JL: Thank you.
12 Feb 2026 12:00am GMT
03 Feb 2026
Kubernetes Blog
Introducing Node Readiness Controller
In the standard Kubernetes model, a node's suitability for workloads hinges on a single binary "Ready" condition. However, in modern Kubernetes environments, nodes require complex infrastructure dependencies (such as network agents, storage drivers, GPU firmware, or custom health checks) to be fully operational before they can reliably host pods.
Today, on behalf of the Kubernetes project, I am announcing the Node Readiness Controller. This project introduces a declarative system for managing node taints, extending the readiness guardrails during node bootstrapping beyond standard conditions. By dynamically managing taints based on custom health signals, the controller ensures that workloads are only placed on nodes that meet all infrastructure-specific requirements.
Why the Node Readiness Controller?
Core Kubernetes Node "Ready" status is often insufficient for clusters with sophisticated bootstrapping requirements. Operators frequently struggle to ensure that specific DaemonSets or local services are healthy before a node enters the scheduling pool.
The Node Readiness Controller fills this gap by allowing operators to define custom scheduling gates tailored to specific node groups. This enables you to enforce distinct readiness requirements across heterogeneous clusters, ensuring, for example, that GPU-equipped nodes only accept pods once specialized drivers are verified, while general-purpose nodes follow a standard path.
It provides three primary advantages:
- Custom Readiness Definitions: Define what ready means for your specific platform.
- Automated Taint Management: The controller automatically applies or removes node taints based on condition status, preventing pods from landing on unready infrastructure.
- Declarative Node Bootstrapping: Manage multi-step node initialization reliably, with clear observability into the bootstrapping process.
Core concepts and features
The controller centers around the NodeReadinessRule (NRR) API, which allows you to define declarative gates for your nodes.
Flexible enforcement modes
The controller supports two distinct operational modes:
- Continuous enforcement: Actively maintains the readiness guarantee throughout the node's entire lifecycle. If a critical dependency (like a device driver) fails later, the node is immediately tainted to prevent new scheduling.
- Bootstrap-only enforcement: Specifically for one-time initialization steps, such as pre-pulling heavy images or hardware provisioning. Once conditions are met, the controller marks the bootstrap as complete and stops monitoring that specific rule for the node.
Condition reporting
The controller reacts to Node Conditions rather than performing health checks itself. This decoupled design allows it to integrate seamlessly with other tools existing in the ecosystem as well as custom solutions:
- Node Problem Detector (NPD): Use existing NPD setups and custom scripts to report node health.
- Readiness Condition Reporter: A lightweight agent provided by the project that can be deployed to periodically check local HTTP endpoints and patch node conditions accordingly.
Operational safety with dry run
Deploying new readiness rules across a fleet carries inherent risk. To mitigate this, dry run mode allows operators to first simulate impact on the cluster. In this mode, the controller logs intended actions and updates the rule's status to show affected nodes without applying actual taints, enabling safe validation before enforcement.
Example: CNI bootstrapping
The following NodeReadinessRule ensures a node remains unschedulable until its CNI agent is functional. The controller monitors a custom cniplugin.example.net/NetworkReady condition and only removes the readiness.k8s.io/acme.com/network-unavailable taint once the status is True.
apiVersion: readiness.node.x-k8s.io/v1alpha1
kind: NodeReadinessRule
metadata:
name: network-readiness-rule
spec:
conditions:
- type: "cniplugin.example.net/NetworkReady"
requiredStatus: "True"
taint:
key: "readiness.k8s.io/acme.com/network-unavailable"
effect: "NoSchedule"
value: "pending"
enforcementMode: "bootstrap-only"
nodeSelector:
matchLabels:
node-role.kubernetes.io/worker: ""
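The decision the controller has to make for a rule like this can be sketched in Python. This is a hypothetical model for illustration only, not the controller's code: the `Rule` class, the function names, the `"continuous"` mode string, and the `dry_run` field are assumptions, and node selection is left out.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Rule:
    conditions: Dict[str, str]  # condition type -> required status
    taint_key: str
    enforcement_mode: str       # "bootstrap-only" or "continuous" (second name assumed)
    dry_run: bool = False       # assumed flag standing in for dry run mode


def taint_wanted(rule: Rule, node_conditions: Dict[str, str], bootstrapped: bool) -> bool:
    """Return True when the rule's taint should be present on the node."""
    # A bootstrap-only rule stops being enforced once bootstrap has completed.
    if rule.enforcement_mode == "bootstrap-only" and bootstrapped:
        return False
    # A missing condition counts as unmet.
    return any(node_conditions.get(ctype) != status
               for ctype, status in rule.conditions.items())


def reconcile(rule: Rule, node_conditions: Dict[str, str],
              has_taint: bool, bootstrapped: bool = False) -> str:
    """Describe the action to take; in dry run the action is only reported."""
    wanted = taint_wanted(rule, node_conditions, bootstrapped)
    if wanted == has_taint:
        return "noop"
    action = "add-taint" if wanted else "remove-taint"
    return f"dry-run: {action}" if rule.dry_run else action


rule = Rule({"cniplugin.example.net/NetworkReady": "True"},
            "readiness.k8s.io/acme.com/network-unavailable", "bootstrap-only")
print(reconcile(rule, {}, has_taint=False))
print(reconcile(rule, {"cniplugin.example.net/NetworkReady": "True"}, has_taint=True))
```

The model reads node conditions and never performs health checks itself, mirroring the decoupled design described under condition reporting.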
Getting involved
The Node Readiness Controller is just getting started, with our initial releases out, and we are seeking community feedback to refine the roadmap. Following our productive Unconference discussions at KubeCon NA 2025, we are excited to continue the conversation in person.
Join us at KubeCon + CloudNativeCon Europe 2026 for our maintainer track session: Addressing Non-Deterministic Scheduling: Introducing the Node Readiness Controller.
In the meantime, you can contribute or track our progress here:
- GitHub: https://sigs.k8s.io/node-readiness-controller
- Slack: Join the conversation in #sig-node-readiness-controller
- Documentation: Getting Started
03 Feb 2026 2:00am GMT
30 Jan 2026
Kubernetes Blog
New Conversion from cgroup v1 CPU Shares to v2 CPU Weight
I'm excited to announce the implementation of an improved conversion formula from cgroup v1 CPU shares to cgroup v2 CPU weight. This enhancement addresses critical issues with CPU priority allocation for Kubernetes workloads when running on systems with cgroup v2.
Background
Kubernetes was originally designed with cgroup v1 in mind, where CPU shares were defined simply by assigning the container's CPU requests in millicpu form.
For example, a container requesting 1 CPU (1024m) would get (cpu.shares = 1024).
After a while, cgroup v1 started being replaced by its successor, cgroup v2. In cgroup v2, the concept of CPU shares (which range from 2 to 262144, or from 2¹ to 2¹⁸) was replaced with CPU weight (which ranges from 1 to 10000, or from 10⁰ to 10⁴).
With the transition to cgroup v2, KEP-2254 introduced a conversion formula to map cgroup v1 CPU shares to cgroup v2 CPU weight. The conversion formula was defined as: cpu.weight = (1 + ((cpu.shares - 2) * 9999) / 262142)
This formula linearly maps values from [2¹, 2¹⁸] to [10⁰, 10⁴].
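As a quick illustration, the linear formula can be written in a few lines of Python. The function name is hypothetical, and the use of integer division is an assumption of this sketch.

```python
def linear_weight(shares: int) -> int:
    """Linear cgroup v1 shares -> cgroup v2 weight mapping quoted above.

    Integer (floor) division is an assumption of this sketch.
    """
    return 1 + ((shares - 2) * 9999) // 262142


# Endpoints of the range, plus two values in the lower part of it.
for shares in (2, 102, 1024, 262144):
    print(shares, "->", linear_weight(shares))
```

The endpoints map exactly (2 to 1 and 262144 to 10000), while values in the lower part of the range, where typical CPU requests fall, are compressed into very small weights.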

While this approach is simple, the linear mapping introduces significant problems that affect both performance and configuration granularity.
Problems with previous conversion formula
The current conversion formula creates two major issues:
1. Reduced priority against non-Kubernetes workloads
In cgroup v1, the default value for CPU shares is 1024, meaning a container requesting 1 CPU has equal priority with system processes that live outside of Kubernetes' scope. However, in cgroup v2, the default CPU weight is 100, but the current formula converts 1 CPU (1024m) to a weight of only ≈39, less than 40% of the default.
Example:
- Container requesting 1 CPU (1024m)
- cgroup v1: cpu.shares = 1024 (equal to default)
- cgroup v2 (current): cpu.weight = 39 (much lower than default 100)
This means that after moving to cgroup v2, Kubernetes (or OCI) workloads would de-facto reduce their CPU priority against non-Kubernetes processes. The problem can be severe for setups with many system daemons that run outside of Kubernetes' scope and expect Kubernetes workloads to have priority, especially in situations of resource starvation.
2. Unmanageable granularity
The current formula produces very low values for small CPU requests, limiting the ability to create sub-cgroups within containers for fine-grained resource distribution (which will possibly be much easier moving forward, see KEP #5474 for more info).
Example:
- Container requesting 100m CPU
- cgroup v1: cpu.shares = 102
- cgroup v2 (current): cpu.weight = 4 (too low for sub-cgroup configuration)
With cgroup v1, a request of 100m CPU led to 102 CPU shares, which was manageable in the sense that sub-cgroups could be created inside the main container, assigning fine-grained CPU priorities to different groups of processes. With cgroup v2, however, a weight of 4 is very hard to distribute between sub-cgroups since it is not granular enough.
With plans to allow writable cgroups for unprivileged containers, this becomes even more relevant.
New conversion formula
Description
The new formula is more complicated, but does a much better job mapping between cgroup v1 CPU shares and cgroup v2 CPU weight:
The idea is to use a quadratic function that passes through the following points:
- (2, 1): The minimum values for both ranges.
- (1024, 100): The default values for both ranges.
- (262144, 10000): The maximum values for both ranges.
Visually, the new function looks as follows:

And if you zoom in to the important part:

The new formula is "close to linear", yet it is carefully designed so that the curve passes through the three important points above.
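One way to obtain such a function is to fit the quadratic in log space, mapping L = log2(cpu.shares) to log10(cpu.weight). That is an assumption of this sketch. Under it, the three points become (1, 0), (10, 2), and (18, 4), which determine the coefficients uniquely as 1/612, 125/612, and -7/34. The function name, the clamping at both ends, and the rounding up are also assumptions; consult the runc or crun sources for the authoritative implementation.

```python
import math


def quadratic_weight(shares: int) -> int:
    """Sketch of a log-space quadratic from cpu.shares to cpu.weight.

    Assumption: log10(weight) = (L^2 + 125 * L) / 612 - 7 / 34,
    where L = log2(shares). The coefficients are derived from the
    three points listed above.
    """
    # Clamp to the valid ranges (an assumption of this sketch).
    if shares <= 2:
        return 1
    if shares >= 262144:
        return 10000
    l = math.log2(shares)
    exponent = (l * l + 125 * l) / 612 - 7 / 34
    # Round up so that small share values never map to a weight of zero.
    return math.ceil(10 ** exponent)


for shares in (2, 102, 1024, 262144):
    print(shares, "->", quadratic_weight(shares))
```

In exact arithmetic 1024 shares map to a weight of 100; with floating-point rounding followed by rounding up, the printed value may be 100 or 101.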
How it solves the problems
- Better priority alignment: A container requesting 1 CPU (1024m) will now get cpu.weight = 102. This value is close to cgroup v2's default of 100, which restores the intended priority relationship between Kubernetes workloads and system processes.
- Improved granularity: A container requesting 100m CPU will get cpu.weight = 17 (see here), which enables better fine-grained resource distribution within containers.
Adoption and integration
This change was implemented at the OCI layer. In other words, this is not implemented in Kubernetes itself; therefore the adoption of the new conversion formula depends solely on the OCI runtime adoption.
For example:
- runc: The new formula is enabled from version 1.3.2.
- crun: The new formula is enabled from version 1.23.
Impact on existing deployments
Important: Some consumers may be affected if they assume the older linear conversion formula. Applications or monitoring tools that directly calculate expected CPU weight values based on the previous formula may need updates to account for the new quadratic conversion. This is particularly relevant for:
- Custom resource management tools that predict CPU weight values.
- Monitoring systems that validate or expect specific weight values.
- Applications that programmatically set or verify CPU weight values.
The Kubernetes project recommends testing the new conversion formula in non-production environments before upgrading OCI runtimes to ensure compatibility with existing tooling.
Where can I learn more?
For those interested in this enhancement:
- Kubernetes GitHub Issue #131216 - Detailed technical analysis and examples, including discussions and reasoning for choosing the above formula.
- KEP-2254: cgroup v2 - Original cgroup v2 implementation in Kubernetes.
- Kubernetes cgroup documentation - Current resource management guidance.
How do I get involved?
For those interested in getting involved with Kubernetes node-level features, join the Kubernetes Node Special Interest Group. We always welcome new contributors and diverse perspectives on resource management challenges.
30 Jan 2026 4:00pm GMT
29 Jan 2026
Kubernetes Blog
Ingress NGINX: Statement from the Kubernetes Steering and Security Response Committees
In March 2026, Kubernetes will retire Ingress NGINX, a piece of critical infrastructure for about half of cloud native environments. The retirement of Ingress NGINX was announced for March 2026, after years of public warnings that the project was in dire need of contributors and maintainers. There will be no more releases for bug fixes, security patches, or any updates of any kind after the project is retired. This cannot be ignored, brushed off, or left until the last minute to address. We cannot overstate the severity of this situation or the importance of beginning migration to alternatives like Gateway API or one of the many third-party Ingress controllers immediately.
To be abundantly clear: choosing to remain with Ingress NGINX after its retirement leaves you and your users vulnerable to attack. None of the available alternatives are direct drop-in replacements. This will require planning and engineering time. Half of you will be affected. You have two months left to prepare.
Existing deployments will continue to work, so unless you proactively check, you may not know you are affected until you are compromised. In most cases, you can check to find out whether or not you rely on Ingress NGINX by running kubectl get pods --all-namespaces --selector app.kubernetes.io/name=ingress-nginx with cluster administrator permissions.
Despite its broad appeal and widespread use by companies of all sizes, and repeated calls for help from the maintainers, the Ingress NGINX project never received the contributors it so desperately needed. According to internal Datadog research, about 50% of cloud native environments currently rely on this tool, and yet for the last several years, it has been maintained solely by one or two people working in their free time. Without sufficient staffing to maintain the tool to a standard both ourselves and our users would consider secure, the responsible choice is to wind it down and refocus efforts on modern alternatives like Gateway API.
We did not make this decision lightly; as inconvenient as it is now, doing so is necessary for the safety of all users and the ecosystem as a whole. Unfortunately, the flexibility Ingress NGINX was designed with, that was once a boon, has become a burden that cannot be resolved. With the technical debt that has piled up, and fundamental design decisions that exacerbate security flaws, it is no longer reasonable or even possible to continue maintaining the tool even if resources did materialize.
We issue this statement together to reinforce the scale of this change and the potential for serious risk to a significant percentage of Kubernetes users if this issue is ignored. It is imperative that you check your clusters now. If you are reliant on Ingress NGINX, you must begin planning for migration.
Thank you,
Kubernetes Steering Committee
Kubernetes Security Response Committee
29 Jan 2026 12:00am GMT
28 Jan 2026
Kubernetes Blog
Experimenting with Gateway API using kind
This document will guide you through setting up a local experimental environment with Gateway API on kind. This setup is designed for learning and testing. It helps you understand Gateway API concepts without production complexity.
Caution:
This is an experimental learning setup and should not be used for production. The components used in this document are not suited for production usage. Once you're ready to deploy Gateway API in a production environment, select an implementation that suits your needs.
Overview
In this guide, you will:
- Set up a local Kubernetes cluster using kind (Kubernetes in Docker)
- Deploy cloud-provider-kind, which provides both LoadBalancer Services and a Gateway API controller
- Create a Gateway and HTTPRoute to route traffic to a demo application
- Test your Gateway API configuration locally
This setup is ideal for learning, development, and experimentation with Gateway API concepts.
Prerequisites
Before you begin, ensure you have the following installed on your local machine:
- Docker - Required to run kind and cloud-provider-kind
- kubectl - The Kubernetes command-line tool
- kind - Kubernetes in Docker
- curl - Required to test the routes
Create a kind cluster
Create a new kind cluster by running:
kind create cluster
This will create a single-node Kubernetes cluster running in a Docker container.
Install cloud-provider-kind
Next, you need cloud-provider-kind, which provides two key components for this setup:
- A LoadBalancer controller that assigns addresses to LoadBalancer-type Services
- A Gateway API controller that implements the Gateway API specification
It also automatically installs the Gateway API Custom Resource Definitions (CRDs) in your cluster.
Run cloud-provider-kind as a Docker container on the same host where you created the kind cluster:
VERSION="$(basename $(curl -s -L -o /dev/null -w '%{url_effective}' https://github.com/kubernetes-sigs/cloud-provider-kind/releases/latest))"
docker run -d --name cloud-provider-kind --rm --network host -v /var/run/docker.sock:/var/run/docker.sock registry.k8s.io/cloud-provider-kind/cloud-controller-manager:${VERSION}
Note: On some systems, you may need elevated privileges to access the Docker socket.
Verify that cloud-provider-kind is running:
docker ps --filter name=cloud-provider-kind
You should see the container listed and in a running state. You can also check the logs:
docker logs cloud-provider-kind
Experimenting with Gateway API
Now that your cluster is set up, you can start experimenting with Gateway API resources.
cloud-provider-kind automatically provisions a GatewayClass called cloud-provider-kind. You'll use this class to create your Gateway.
Note that although kind is not a cloud provider, the project is named cloud-provider-kind because it provides features that simulate a cloud-enabled environment.
Deploy a Gateway
The following manifest will:
- Create a new namespace called gateway-infra
- Deploy a Gateway that listens on port 80
- Accept HTTPRoutes with hostnames matching the *.exampledomain.example pattern
- Allow routes from any namespace to attach to the Gateway. Note: In real clusters, prefer Same or Selector values on the allowedRoutes namespace selector field to limit attachments.
Apply the following manifest:
---
apiVersion: v1
kind: Namespace
metadata:
  name: gateway-infra
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gateway
  namespace: gateway-infra
spec:
  gatewayClassName: cloud-provider-kind
  listeners:
  - name: default
    hostname: "*.exampledomain.example"
    port: 80
    protocol: HTTP
    allowedRoutes:
      namespaces:
        from: All
Then verify that your Gateway is properly programmed and has an address assigned:
kubectl get gateway -n gateway-infra gateway
Expected output:
NAME      CLASS                 ADDRESS      PROGRAMMED   AGE
gateway   cloud-provider-kind   172.18.0.3   True         5m6s
The PROGRAMMED column should show True, and the ADDRESS field should contain an IP address.
Deploy a demo application
Next, deploy a simple echo application that will help you test your Gateway configuration. This application:
- Listens on port 3000
- Echoes back request details including path, headers, and environment variables
- Runs in a namespace called demo
Apply the following manifest:
apiVersion: v1
kind: Namespace
metadata:
  name: demo
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: echo
  name: echo
  namespace: demo
spec:
  ports:
  - name: http
    port: 3000
    protocol: TCP
    targetPort: 3000
  selector:
    app.kubernetes.io/name: echo
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: echo
  name: echo
  namespace: demo
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: echo
  template:
    metadata:
      labels:
        app.kubernetes.io/name: echo
    spec:
      containers:
      - env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: registry.k8s.io/gateway-api/echo-basic:v20251204-v1.4.1
        name: echo-basic
Create an HTTPRoute
Now create an HTTPRoute to route traffic from your Gateway to the echo application. This HTTPRoute will:
- Respond to requests for the hostname some.exampledomain.example
- Route traffic to the echo application
- Attach to the Gateway in the gateway-infra namespace
Apply the following manifest:
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: echo
  namespace: demo
spec:
  parentRefs:
  - name: gateway
    namespace: gateway-infra
  hostnames: ["some.exampledomain.example"]
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: echo
      port: 3000
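To see why the route hostname some.exampledomain.example can attach to a listener declared as *.exampledomain.example, here is a minimal Go sketch of wildcard hostname matching. It is a simplified illustration, not the controller's code: the function name hostnameMatches is hypothetical, and it only handles a concrete request hostname against a listener hostname.

```go
package main

import (
	"fmt"
	"strings"
)

// hostnameMatches reports whether a concrete hostname is covered by a
// listener hostname. A listener hostname starting with "*." is treated as a
// wildcard that matches any hostname ending in the remaining suffix, but not
// the bare parent domain itself. (Hypothetical helper for illustration.)
func hostnameMatches(listener, host string) bool {
	listener = strings.ToLower(listener)
	host = strings.ToLower(host)
	if strings.HasPrefix(listener, "*.") {
		suffix := listener[1:] // keep the leading dot, e.g. ".exampledomain.example"
		return len(host) > len(suffix) && strings.HasSuffix(host, suffix)
	}
	return listener == host
}

func main() {
	listener := "*.exampledomain.example"
	hosts := []string{"some.exampledomain.example", "exampledomain.example", "other.example"}
	for _, host := range hosts {
		fmt.Printf("%-28s matches %s: %v\n", host, listener, hostnameMatches(listener, host))
	}
}
```

Only the first hostname in the example is covered by the wildcard; the bare parent domain and the unrelated domain are not.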
Test your route
The final step is to test your route using curl. You'll make a request to the Gateway's IP address with the hostname some.exampledomain.example. The command below is for POSIX shell only, and may need to be adjusted for your environment:
GW_ADDR=$(kubectl get gateway -n gateway-infra gateway -o jsonpath='{.status.addresses[0].value}')
curl --resolve some.exampledomain.example:80:${GW_ADDR} http://some.exampledomain.example
You should receive a JSON response similar to this:
{
  "path": "/",
  "host": "some.exampledomain.example",
  "method": "GET",
  "proto": "HTTP/1.1",
  "headers": {
    "Accept": [
      "*/*"
    ],
    "User-Agent": [
      "curl/8.15.0"
    ]
  },
  "namespace": "demo",
  "ingress": "",
  "service": "",
  "pod": "echo-dc48d7cf8-vs2df"
}
If you see this response, congratulations! Your Gateway API setup is working correctly.
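If you would rather script this check than run curl by hand, the following Go sketch does roughly the same thing: it connects to the Gateway address while presenting some.exampledomain.example as the Host header, then decodes a few fields of the echo response. The helper names newEchoRequest and parseEcho and the echoReply type are assumptions made for this example, and the Gateway address is expected as the first command line argument (an IPv6 address would need brackets).

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// echoReply holds the subset of the echo response fields used here.
type echoReply struct {
	Path      string `json:"path"`
	Host      string `json:"host"`
	Namespace string `json:"namespace"`
	Pod       string `json:"pod"`
}

// newEchoRequest builds a GET request that connects to gatewayAddr but
// presents hostname in the Host header, similar in effect to curl --resolve.
func newEchoRequest(gatewayAddr, hostname string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, "http://"+gatewayAddr+"/", nil)
	if err != nil {
		return nil, err
	}
	req.Host = hostname
	return req, nil
}

// parseEcho decodes the JSON body returned by the echo application.
func parseEcho(body []byte) (echoReply, error) {
	var reply echoReply
	err := json.Unmarshal(body, &reply)
	return reply, err
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: pass the Gateway address as the first argument")
		return
	}
	req, err := newEchoRequest(os.Args[1], "some.exampledomain.example")
	if err != nil {
		fmt.Println("building request:", err)
		return
	}
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("reading response:", err)
		return
	}
	reply, err := parseEcho(body)
	if err != nil {
		fmt.Println("decoding response:", err)
		return
	}
	fmt.Printf("host=%s namespace=%s pod=%s\n", reply.Host, reply.Namespace, reply.Pod)
}
```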
Troubleshooting
If something isn't working as expected, you can troubleshoot by checking the status of your resources.
Check the Gateway status
First, inspect your Gateway resource:
kubectl get gateway -n gateway-infra gateway -o yaml
Look at the status section for conditions. Your Gateway should have:
- Accepted: True - The Gateway was accepted by the controller
- Programmed: True - The Gateway was successfully configured
- .status.addresses populated with an IP address
Check the HTTPRoute status
Next, inspect your HTTPRoute:
kubectl get httproute -n demo echo -o yaml
Check the status.parents section for conditions. Common issues include:
- ResolvedRefs set to False with reason BackendNotFound; this means that the backend Service doesn't exist or has the wrong name
- Accepted set to False; this means that the route couldn't attach to the Gateway (check namespace permissions or hostname matching)
Example error when a backend is not found:
status:
  parents:
  - conditions:
    - lastTransitionTime: "2026-01-19T17:13:35Z"
      message: backend not found
      observedGeneration: 2
      reason: BackendNotFound
      status: "False"
      type: ResolvedRefs
    controllerName: kind.sigs.k8s.io/gateway-controller
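Reading conditions by eye works for one resource, but the same check is easy to express in code. The sketch below assumes the conditions have already been loaded into a small local struct (the condition type and the findProblems helper are hypothetical, not part of any Gateway API library) and reports every expected condition that is missing or not True.

```go
package main

import "fmt"

// condition mirrors the fields of a status condition that matter here.
type condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// findProblems returns one line for each expected condition type that is
// either absent from conds or has a status other than "True".
func findProblems(conds []condition, expectedTrue ...string) []string {
	byType := map[string]condition{}
	for _, c := range conds {
		byType[c.Type] = c
	}
	var problems []string
	for _, t := range expectedTrue {
		c, ok := byType[t]
		switch {
		case !ok:
			problems = append(problems, fmt.Sprintf("%s: condition not reported yet", t))
		case c.Status != "True":
			problems = append(problems, fmt.Sprintf("%s=%s (%s): %s", t, c.Status, c.Reason, c.Message))
		}
	}
	return problems
}

func main() {
	// Conditions as they might appear under status.parents for the route.
	conds := []condition{
		{Type: "Accepted", Status: "True", Reason: "Accepted"},
		{Type: "ResolvedRefs", Status: "False", Reason: "BackendNotFound", Message: "backend not found"},
	}
	for _, p := range findProblems(conds, "Accepted", "ResolvedRefs") {
		fmt.Println(p)
	}
}
```

For the example error above, this prints a single line naming ResolvedRefs, the BackendNotFound reason, and the controller's message.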
Check controller logs
If the resource statuses don't reveal the issue, check the cloud-provider-kind logs:
docker logs -f cloud-provider-kind
This will show detailed logs from both the LoadBalancer and Gateway API controllers.
Cleanup
When you're finished with your experiments, you can clean up the resources:
Remove Kubernetes resources
Delete the namespaces (this will remove all resources within them):
kubectl delete namespace gateway-infra
kubectl delete namespace demo
Stop cloud-provider-kind
Stop and remove the cloud-provider-kind container:
docker stop cloud-provider-kind
Because the container was started with the --rm flag, it will be automatically removed when stopped.
Delete the kind cluster
Finally, delete the kind cluster:
kind delete cluster
Next steps
Now that you've experimented with Gateway API locally, you're ready to explore production-ready implementations:
- Production Deployments: Review the Gateway API implementations to find a controller that matches your production requirements
- Learn More: Explore the Gateway API documentation to learn about advanced features like TLS, traffic splitting, and header manipulation
- Advanced Routing: Experiment with path-based routing, header matching, request mirroring and other features following Gateway API user guides
A final word of caution
This kind setup is for development and learning only. Always use a production-grade Gateway API implementation for real workloads.
28 Jan 2026 12:00am GMT
27 Jan 2026
Kubernetes Blog
Cluster API v1.12: Introducing In-place Updates and Chained Upgrades
Cluster API brings declarative management to Kubernetes cluster lifecycle, allowing users and platform teams to define the desired state of clusters and rely on controllers to continuously reconcile toward it.
Similar to how you can use StatefulSets or Deployments in Kubernetes to manage a group of Pods, in Cluster API you can use KubeadmControlPlane to manage a set of control plane Machines, or you can use MachineDeployments to manage a group of worker Nodes.
The Cluster API v1.12.0 release expands what is possible in Cluster API, reducing friction in common lifecycle operations by introducing in-place updates and chained upgrades.
Emphasis on simplicity and usability
With v1.12.0, the Cluster API project demonstrates once again that this community is capable of delivering a great amount of innovation, while at the same time minimizing impact for Cluster API users.
What does this mean in practice?
Users simply have to change the Cluster or the Machine spec (just as with previous Cluster API releases), and Cluster API will automatically trigger in-place updates or chained upgrades when possible and advisable.
In-place Updates
Just as Kubernetes does for Pods in Deployments, Cluster API performs a rollout when the Machine spec changes, creating a new Machine and deleting the old one.
This approach, inspired by the principle of immutable infrastructure, has a set of considerable advantages:
- It is simple to explain, predictable, consistent and easy to reason about with users and engineers.
- It is simple to implement, because it relies only on two core primitives, create and delete.
- Implementation does not depend on Machine-specific choices, like OS, bootstrap mechanism etc.
As a result, Machine rollouts drastically reduce the number of variables to be considered when managing the lifecycle of a host server that is hosting Nodes.
However, while the advantages of immutability are not in question, both Kubernetes and Cluster API are on a similar journey, introducing changes that allow users to minimize workload disruption whenever possible.
Over time, Cluster API has also introduced several improvements to immutable rollouts, including:
- Support for in-place propagation of changes affecting Kubernetes resources only, thus avoiding unnecessary rollouts.
- A way to taint outdated nodes with PreferNoSchedule, thus reducing Pod churn by optimizing how Pods are rescheduled during rollouts.
- Support for the delete first rollout strategy, thus making it easier to do immutable rollouts on bare metal and other environments with constrained resources.
The new in-place update feature in Cluster API is the next step in this journey.
With the v1.12.0 release, Cluster API introduces support for update extensions, which allow users to change existing Machines in place without deleting and re-creating them.
Both KubeadmControlPlane and MachineDeployments support in-place updates based on the new update extension, which significantly expands what is possible in Cluster API.
How do in-place updates work?
The simplest explanation is that once the user triggers an update by changing the desired state of Machines, Cluster API chooses the best tool to achieve that state.
What is new is that Cluster API can now choose between immutable rollouts and in-place update extensions to perform the required changes.
Importantly, this is not immutable rollouts vs in-place updates; Cluster API considers both valid options and selects the most appropriate mechanism for a given change.
From the perspective of the Cluster API maintainers, in-place updates are most useful for making changes that don't otherwise require a node drain or pod restart, for example changing user credentials for the Machine. On the other hand, when the workload will be disrupted anyway, a rollout is the better choice.
Nevertheless, Cluster API remains true to its extensible nature, and everyone can create their own update extension and decide when and how to use in-place updates by trading in some of the benefits of immutable rollouts.
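The rule of thumb described above can be written down as a tiny decision function. This is only a sketch of the maintainers' stated heuristic, not Cluster API's actual selection logic: the change type, its fields, and chooseMechanism are all hypothetical names invented for this illustration.

```go
package main

import "fmt"

// change describes a requested Machine change (hypothetical model).
type change struct {
	Description       string
	DisruptsWorkload  bool // the change needs a node drain or Pod restart anyway
	ExtensionCanApply bool // some update extension says it can apply the change in place
}

// chooseMechanism applies the heuristic from the text: prefer an in-place
// update only when an extension can handle the change and the workload would
// not be disrupted anyway; otherwise fall back to an immutable rollout.
func chooseMechanism(c change) string {
	if c.ExtensionCanApply && !c.DisruptsWorkload {
		return "in-place update"
	}
	return "rollout"
}

func main() {
	changes := []change{
		{Description: "rotate user credentials", ExtensionCanApply: true},
		{Description: "change requiring a node drain", DisruptsWorkload: true, ExtensionCanApply: true},
		{Description: "change no extension supports"},
	}
	for _, c := range changes {
		fmt.Printf("%s -> %s\n", c.Description, chooseMechanism(c))
	}
}
```

An update extension author is free to draw the line differently; the point is simply that both mechanisms remain available and the choice is made per change.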
For a deep dive into this feature, make sure to attend the session In-place Updates with Cluster API: The Sweet Spot Between Immutable and Mutable Infrastructure at KubeCon EU in Amsterdam!
Chained Upgrades
ClusterClass and managed topologies in Cluster API jointly provide a powerful and effective framework that acts as a building block for many platforms offering Kubernetes-as-a-Service.
With v1.12.0, this feature takes another important step forward by allowing users to upgrade by more than one Kubernetes minor version in a single operation, commonly referred to as a chained upgrade.
This allows users to declare a target Kubernetes version and let Cluster API safely orchestrate the required intermediate steps, rather than manually managing each minor upgrade.
The simplest way to explain how chained upgrades work is that once the user triggers an update by changing the desired version for a Cluster, Cluster API computes an upgrade plan and then starts executing it. Rather than (for example) updating the Cluster to v1.33.0, then v1.34.0, and then v1.35.0, checking on progress at each step, a chained upgrade lets you go directly to v1.35.0.
Executing an upgrade plan means upgrading control plane and worker machines in a strictly controlled order, repeating this process as many times as needed to reach the desired state. Cluster API is now capable of managing this for you.
Cluster API takes care of optimizing and minimizing the upgrade steps for worker machines, and in fact worker machines will skip upgrades to intermediate Kubernetes minor releases whenever allowed by the Kubernetes version skew policies.
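To make the idea of an upgrade plan concrete, here is a simplified Go model of it. It is a sketch under stated assumptions, not Cluster API's algorithm: upgradePlan is a hypothetical function, versions are reduced to their minor number, and maxSkew is an assumed parameter standing in for how many minor versions workers may lag behind the control plane under the version skew policy.

```go
package main

import "fmt"

// upgradePlan returns the minor versions that the control plane and the
// workers would step through when moving from current to target.
func upgradePlan(current, target, maxSkew int) (controlPlane, workers []int) {
	workerMinor := current
	for cp := current + 1; cp <= target; cp++ {
		// The control plane never skips a minor version.
		controlPlane = append(controlPlane, cp)
		// Workers follow only at the final step, or when the next control
		// plane step would otherwise leave them too far behind.
		if cp == target || (cp+1)-workerMinor > maxSkew {
			workers = append(workers, cp)
			workerMinor = cp
		}
	}
	return controlPlane, workers
}

func main() {
	cp, w := upgradePlan(32, 35, 3)
	fmt.Println("control plane minors:", cp) // [33 34 35]
	fmt.Println("worker minors:", w)         // [35]
}
```

In this model, going from v1.32 to v1.35 with a skew of three takes three control plane steps but a single worker upgrade, which is the kind of saving the text describes.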
Extensibility is at the core of this feature as well: upgrade plan runtime extensions can be used to influence how the upgrade plan is computed; similarly, lifecycle hooks can be used to automate other tasks that must be performed during an upgrade, for example upgrading an addon after the control plane update has completed.
From our perspective, chained upgrades are most useful for users who struggle to keep up with Kubernetes minor releases and who, for example, want to upgrade only once per year, moving up by three versions (n-3 → n). But be warned: the fact that you can now easily upgrade by more than one minor version is not an excuse to skip frequent patching of your cluster!
Release team
I would like to thank all the contributors, the maintainers, and all the engineers that volunteered for the release team.
The reliability and predictability of Cluster API releases, one of the qualities our users appreciate most, is only possible with the support, commitment, and hard work of its community.
Kudos to the entire Cluster API community for the v1.12.0 release and all the great releases delivered in 2025! If you are interested in getting involved, learn about Cluster API contributing guidelines.
What's next?
If you read the Cluster API manifesto, you can see how the Cluster API subproject claims the right to remain unfinished, recognizing the need to continuously evolve, improve, and adapt to the changing needs of Cluster API's users and the broader Cloud Native ecosystem.
As Kubernetes itself continues to evolve, the Cluster API subproject will keep advancing alongside it, focusing on safer upgrades, reduced disruption, and stronger building blocks for platforms managing Kubernetes at scale.
Innovation remains at the heart of Cluster API. Stay tuned for an exciting 2026!
27 Jan 2026 4:00pm GMT
22 Jan 2026
Kubernetes Blog
Headlamp in 2025: Project Highlights
This announcement is a recap from a post originally published on the Headlamp blog.
Headlamp has come a long way in 2025. The project has continued to grow - reaching more teams across platforms, powering new workflows and integrations through plugins, and seeing increased collaboration from the broader community.
We wanted to take a moment to share a few updates and highlight how Headlamp has evolved over the past year.
Updates
Joining Kubernetes SIG UI
This year marked a big milestone for the project: Headlamp is now officially part of Kubernetes SIG UI. This move brings roadmap and design discussions even closer to the core Kubernetes community and reinforces Headlamp's role as a modern, extensible UI for the project.
As part of that, we've also been sharing more about making Kubernetes approachable for a wider audience, including an appearance on Enlightening with Whitney Lee and a talk at KCD New York 2025.
Linux Foundation mentorship
This year, we were excited to work with several students through the Linux Foundation's Mentorship program, and our mentees have already left a visible mark on Headlamp:
- Adwait Godbole built the KEDA plugin, adding a UI in Headlamp to view and manage KEDA resources like ScaledObjects and ScaledJobs.
- Dhairya Majmudar set up an OpenTelemetry-based observability stack for Headlamp, wiring up metrics, logs, and traces so the project is easier to monitor and debug.
- Aishwarya Ghatole led a UX audit of Headlamp plugins, identifying usability issues and proposing design improvements and personas for plugin users.
- Anirban Singha developed the Karpenter plugin, giving Headlamp a focused view into Karpenter autoscaling resources and decisions.
- Aditya Chaudhary improved Gateway API support, so you can see networking relationships on the resource map, as well as improved support for many of the new Gateway API resources.
- Faakhir Zahid completed a way to easily manage plugin installation with Headlamp deployed in clusters.
- Saurav Upadhyay worked on backend caching for Kubernetes API calls, reducing load on the API server and improving performance in Headlamp.
New changes
Multi-cluster view
Managing multiple clusters is challenging: teams often switch between tools and lose context when trying to see what runs where. Headlamp solves this by giving you a single view to compare clusters side-by-side. This makes it easier to understand workloads across environments and reduces the time spent hunting for resources.
View of multi-cluster workloads
Projects
Kubernetes apps often span multiple namespaces and resource types, which makes troubleshooting feel like piecing together a puzzle. We've added Projects to give you an application-centric view that groups related resources across multiple namespaces - and even clusters. This allows you to reduce sprawl, troubleshoot faster, and collaborate without digging through YAML or cluster-wide lists.
View of the new Projects feature
Changes:
- New "Projects" feature for grouping namespaces into app- or team-centric projects
- Extensible Projects details view that plugins can customize with their own tabs and actions
Navigation and Activities
Day-to-day ops in Kubernetes often means juggling logs, terminals, YAML, and dashboards across clusters. We redesigned Headlamp's navigation to treat these as first-class "activities" you can keep open and come back to, instead of one-off views you lose as soon as you click away.
View of the new task bar
Changes:
- A new task bar/activities model lets you pin logs, exec sessions, and details as ongoing activities
- An activity overview with a "Close all" action and cluster information
- Multi-select and global filters in tables
Thanks to Jan Jansen and Aditya Chaudhary.
Search and map
When something breaks in production, the first two questions are usually "where is it?" and "what is it connected to?" We've upgraded both search and the map view so you can get from a high-level symptom to the right set of objects much faster.
View of the new Advanced Search feature
Changes:
- An Advanced search view that supports rich, expression-based queries over Kubernetes objects
- Improved global search that understands labels and multiple search items, and can even update your current namespace based on what you find
- EndpointSlice support in the Network section
- A richer map view that now includes Custom Resources and Gateway API objects
Thanks to Fabian, Alexander North, and Victor Marcolino from Swisscom, and also to Aditya Chaudhary.
OIDC and authentication
We've put real work into making OIDC setup clearer and more resilient, especially for in-cluster deployments.
View of user information for OIDC clusters
Changes:
- User information displayed in the top bar for OIDC-authenticated users
- PKCE support for more secure authentication flows, as well as hardened token refresh handling
- Documentation for using the access token with -oidc-use-access-token=true
- Improved support for public OIDC clients like AKS and EKS
- New guide for setting up Headlamp on AKS with Azure Entra-ID using OAuth2Proxy
Thanks to David Dobmeier and Harsh Srivastava.
App Catalog and Helm
We've broadened how you deploy and source apps via Headlamp, specifically supporting vanilla Helm repos.
Changes:
- A more capable Helm chart with optional backend TLS termination, PodDisruptionBudgets, custom pod labels, and more
- Improved formatting and added missing access token arg in the Helm chart
- New in-cluster Helm support with an --enable-helm flag and a service proxy
Thanks to Vrushali Shah and Murali Annamneni from Oracle, and also to Pat Riehecky, Joshua Akers, Rostislav Stříbrný, Rick L, and Victor.
Performance, accessibility, and UX
Finally, we've spent a lot of time on the things you notice every day but don't always make headlines: startup time, list views, log viewers, accessibility, and small network UX details. A continuous accessibility self-audit has also helped us identify key issues and make Headlamp easier for everyone to use.
View of the Learn section in docs
Changes:
- Significant desktop improvements, with up to 60% faster app loads and much quicker dev-mode reloads for contributors
- Numerous table and log viewer refinements: persistent sort order, consistent row actions, copy-name buttons, better tooltips, and more forgiving log inputs
- Accessibility and localization improvements, including fixes for zoom-related layout issues, better color contrast, improved screen reader support, and expanded language coverage
- More control over resources, with live pod CPU/memory metrics, richer pod details, and inline editing for secrets and CRD fields
- A refreshed documentation and plugin onboarding experience, including a "Learn" section and plugin showcase
- A more complete NetworkPolicy UI and network-related polish
- Nightly builds available for early testing
Thanks to Jaehan Byun and Jan Jansen.
Plugins and extensibility
Discovering plugins is simpler now - no more hopping between Artifact Hub and assorted GitHub repos. Browse our dedicated Plugins page for a curated catalog of Headlamp-endorsed plugins, along with a showcase of featured plugins.
View of the Plugins showcase
Headlamp AI Assistant
Managing Kubernetes often means memorizing commands and juggling tools. Headlamp's new AI Assistant changes this by adding a natural-language interface built into the UI. Now, instead of typing kubectl or digging through YAML you can ask, "Is my app healthy?" or "Show logs for this deployment," and get answers in context, speeding up troubleshooting and smoothing onboarding for new users. Learn more about it here.
New plugin additions
Alongside the new AI Assistant, we've been growing Headlamp's plugin ecosystem so you can bring more of your workflows into a single UI, with integrations like Minikube, Karpenter, and more.
Highlights from the latest plugin releases:
- Minikube plugin, providing a locally stored single node Minikube cluster
- Karpenter plugin, with support for Azure Node Auto-Provisioning (NAP)
- KEDA plugin, which you can learn more about here
- Community-maintained plugins for Gatekeeper and KAITO
Thanks to Vrushali Shah and Murali Annamneni from Oracle, and also to Anirban Singha, Adwait Godbole, Sertaç Özercan, Ernest Wong, and Chloe Lim.
Other plugin updates
Alongside new additions, we've also spent time refining plugins that many of you already use, focusing on smoother workflows and better integration with the core UI.
View of the Backstage plugin
Changes:
- Flux plugin: Updated for Flux v2.7, with support for newer CRDs and navigation fixes so it works smoothly on recent clusters
- App Catalog: Now supports Helm repos in addition to Artifact Hub, can run in-cluster via /serviceproxy, and shows both current and latest app versions
- Plugin Catalog: Improved card layout and accessibility, plus dependency and Storybook test updates
- Backstage plugin: Dependency and build updates, more info here
Plugin development
We've focused on making it faster and clearer to build, test, and ship Headlamp plugins, backed by improved documentation and lighter tooling.
View of the Plugin Development guide
Changes:
- New and expanded guides for plugin architecture and development, including how to publish and ship plugins
- Added i18n support documentation so plugins can be translated and localized
- Added example plugins: ui-panels, resource-charts, custom-theme, and projects
- Improved type checking for Headlamp APIs, restored Storybook support for component testing, and reduced dependencies for faster installs and fewer updates
- Documented plugin install locations, UI signifiers in Plugin Settings, and labels that differentiate shipped, UI-installed, and dev-mode plugins
Security upgrades
We've also been investing in keeping Headlamp secure - both by tightening how authentication works and by staying on top of upstream vulnerabilities and tooling.
Updates:
- We've been keeping up with security updates, regularly updating dependencies and addressing upstream security issues.
- We tightened the Helm chart's default security context and fixed a regression that broke the plugin manager.
- We've improved OIDC security with PKCE support, helping unblock more secure and standards-compliant OIDC setups when deploying Headlamp in-cluster.
Conclusion
Thank you to everyone who has contributed to Headlamp this year - whether through pull requests, plugins, or simply sharing how you're using the project. Seeing the different ways teams are adopting and extending the project is a big part of what keeps us moving forward. If your organization uses Headlamp, consider adding it to our adopters list.
If you haven't tried Headlamp recently, all these updates are available today. Check out the latest Headlamp release, explore the new views, plugins, and docs, and share your feedback with us on Slack or GitHub - your feedback helps shape where Headlamp goes next.
22 Jan 2026 2:00am GMT
21 Jan 2026
Kubernetes Blog
Announcing the Checkpoint/Restore Working Group
The community around Kubernetes includes a number of Special Interest Groups (SIGs) and Working Groups (WGs) facilitating discussions on important topics between interested contributors. Today we would like to announce the new Kubernetes Checkpoint Restore WG focusing on the integration of Checkpoint/Restore functionality into Kubernetes.
Motivation and use cases
There are several high-level scenarios discussed in the working group:
- Optimizing resource utilization for interactive workloads, such as Jupyter notebooks and AI chatbots
- Accelerating startup of applications with long initialization times, including Java applications and LLM inference services
- Using periodic checkpointing to enable fault-tolerance for long-running workloads, such as distributed model training
- Providing interruption-aware scheduling with transparent checkpoint/restore, allowing lower-priority Pods to be preempted while preserving the runtime state of applications
- Facilitating Pod migration across nodes for load balancing and maintenance, without disrupting workloads
- Enabling forensic checkpointing to investigate and analyze security incidents such as cyberattacks, data breaches, and unauthorized access
Across these scenarios, the goal is to help facilitate discussions of ideas between the Kubernetes community and the growing Checkpoint/Restore in Userspace (CRIU) ecosystem. The CRIU community includes several projects that support these use cases, including:
- CRIU - A tool for checkpointing and restoring running applications and containers
- checkpointctl - A tool for in-depth analysis of container checkpoints
- criu-coordinator - A tool for coordinated checkpoint/restore of distributed applications with CRIU
- checkpoint-restore-operator - A Kubernetes operator for managing checkpoints
More information about the checkpoint/restore integration with Kubernetes is also available here.
Related events
Following our presentation about transparent checkpointing at KubeCon EU 2025, we are excited to welcome you to our panel discussion and AI + ML session at KubeCon + CloudNativeCon Europe 2026.
Connect with us
If you are interested in contributing to Kubernetes or CRIU, there are several ways to participate:
- Join our meeting every second Thursday at 17:00 UTC via the Zoom link in our meeting notes; recordings of our prior meetings are available here.
- Chat with us on the Kubernetes Slack: #wg-checkpoint-restore
- Email us at the wg-checkpoint-restore mailing list
21 Jan 2026 6:00pm GMT
19 Jan 2026
Kubernetes Blog
Uniform API server access using clientcmd
If you've ever wanted to develop a command line client for a Kubernetes API, especially if you've considered making your client usable as a kubectl plugin, you might have wondered how to make your client feel familiar to users of kubectl. A quick glance at the output of kubectl options might put a damper on that: "Am I really supposed to implement all those options?"
Fear not, others have done a lot of the work involved for you. In fact, the Kubernetes project provides two libraries to help you handle kubectl-style command line arguments in Go programs: clientcmd and cli-runtime (which uses clientcmd). This article will show how to use the former.
General philosophy
As might be expected since it's part of client-go, clientcmd's ultimate purpose is to provide an instance of restclient.Config that can issue requests to an API server.
It follows kubectl semantics:
- defaults are taken from ~/.kube or equivalent;
- files can be specified using the KUBECONFIG environment variable;
- all of the above settings can be further overridden using command line arguments.
It doesn't set up a --kubeconfig command line argument, which you might want to do to align with kubectl; you'll see how to do this in the "Bind the flags" section.
Available features
clientcmd allows programs to handle
- kubeconfig selection (using KUBECONFIG);
- context selection;
- namespace selection;
- client certificates and private keys;
- user impersonation;
- HTTP Basic authentication support (username/password).
Configuration merging
In various scenarios, clientcmd supports merging configuration settings: KUBECONFIG can specify multiple files whose contents are combined. This can be confusing, because settings are merged in different directions depending on how they are implemented. If a setting is defined in a map, the first definition wins and subsequent definitions are ignored. If a setting is not defined in a map, the last definition wins.
When settings are retrieved using KUBECONFIG, missing files result in warnings only. If the user explicitly specifies a path (in --kubeconfig style), there must be a corresponding file.
If KUBECONFIG isn't defined, the default configuration file, ~/.kube/config, is used instead, if present.
Overall process
The general usage pattern is succinctly expressed in the clientcmd package documentation:
loadingRules := clientcmd.NewDefaultClientConfigLoadingRules()
// if you want to change the loading rules (which files in which order), you can do so here
configOverrides := &clientcmd.ConfigOverrides{}
// if you want to change override values or bind them to flags, there are methods to help you
kubeConfig := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(loadingRules, configOverrides)
config, err := kubeConfig.ClientConfig()
if err != nil {
	// Do something
}
client, err := metav1.New(config)
// ...
In the context of this article, there are six steps:
- Configure the loading rules.
- Configure the overrides.
- Build a set of flags.
- Bind the flags.
- Build the merged configuration.
- Obtain an API client.
Configure the loading rules
clientcmd.NewDefaultClientConfigLoadingRules() builds loading rules which will use either the contents of the KUBECONFIG environment variable, or the default configuration file name (~/.kube/config). In addition, if the default configuration file is used, it is able to migrate settings from the (very) old default configuration file (~/.kube/.kubeconfig).
You can build your own ClientConfigLoadingRules, but in most cases the defaults are fine.
Configure the overrides
clientcmd.ConfigOverrides is a struct storing overrides which will be applied over the settings loaded from the configuration derived using the loading rules. In the context of this article, its primary purpose is to store values obtained from command line arguments. These are handled using the pflag library, which is a drop-in replacement for Go's flag package, adding support for double-hyphen arguments with long names.
In most cases there's nothing to set in the overrides; I will only bind them to flags.
Build a set of flags
In this context, a flag is a representation of a command line argument, specifying its long name (such as --namespace), its short name if any (such as -n), its default value, and a description shown in the usage information. Flags are stored in instances of the FlagInfo struct.
Three sets of flags are available, representing the following command line arguments:
- authentication arguments (certificates, tokens, impersonations, username/password);
- cluster arguments (API server, certificate authority, TLS configuration, proxy, compression);
- context arguments (cluster name, kubeconfig user name, namespace).
The recommended selection includes all three with a named context selection argument and a timeout argument.
These are all available using the Recommended…Flags functions. The functions take a prefix, which is prepended to all the argument long names.
So calling clientcmd.RecommendedConfigOverrideFlags("") results in command line arguments such as --context, --namespace, and so on. The --timeout argument is given a default value of 0, and the --namespace argument has a corresponding short variant, -n. Adding a prefix, such as "from-", results in command line arguments such as --from-context, --from-namespace, etc. This might not seem particularly useful on commands involving a single API server, but they come in handy when multiple API servers are involved, such as in multi-cluster scenarios.
There's a potential gotcha here: prefixes don't modify the short name, so --namespace needs some care if multiple prefixes are used: only one of the prefixes can be associated with the -n short name. You'll have to clear the short names associated with the other prefixes' --namespace, or perhaps all prefixes if there's no sensible -n association. Short names can be cleared as follows:
kflags := clientcmd.RecommendedConfigOverrideFlags(prefix)
kflags.ContextOverrideFlags.Namespace.ShortName = ""
In a similar fashion, flags can be disabled entirely by clearing their long name:
kflags.ContextOverrideFlags.Namespace.LongName = ""
Bind the flags
Once a set of flags has been defined, it can be used to bind command line arguments to overrides using clientcmd.BindOverrideFlags. This requires a pflag FlagSet rather than one from Go's flag package.
If you also want to bind --kubeconfig, you should do so now, by binding ExplicitPath in the loading rules:
flags.StringVarP(&loadingRules.ExplicitPath, "kubeconfig", "", "", "absolute path(s) to the kubeconfig file(s)")
Build the merged configuration
Two functions are available to build a merged configuration:
- clientcmd.NewInteractiveDeferredLoadingClientConfig
- clientcmd.NewNonInteractiveDeferredLoadingClientConfig
As the names suggest, the difference between the two is that the first can ask for authentication information interactively, using a provided reader, whereas the second only operates on the information given to it by the caller.
The "deferred" mention in these function names refers to the fact that the final configuration will be determined as late as possible. This means that these functions can be called before the command line arguments are parsed, and the resulting configuration will use whatever values have been parsed by the time it's actually constructed.
Obtain an API client
The merged configuration is returned as a ClientConfig instance. Calling its ClientConfig() method returns a restclient.Config, from which an API client can be constructed (for example with kubernetes.NewForConfig()).
If no configuration is given (KUBECONFIG is empty or points to non-existent files, ~/.kube/config doesn't exist, and no configuration is given using command line arguments), the default setup will return an obscure error referring to KUBERNETES_MASTER. This is legacy behaviour; several attempts have been made to get rid of it, but it is preserved for the --local and --dry-run command line arguments in kubectl. You should check for "empty configuration" errors by calling clientcmd.IsEmptyConfig() and provide a more explicit error message.
The Namespace() method is also useful: it returns the namespace that should be used. It also indicates whether the namespace was overridden by the user (using --namespace).
Full example
Here's a complete example.
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/spf13/pflag"
	v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Loading rules, no configuration
	loadingRules := clientcmd.NewDefaultClientConfigLoadingRules()

	// Overrides and flag (command line argument) setup
	configOverrides := &clientcmd.ConfigOverrides{}
	flags := pflag.NewFlagSet("clientcmddemo", pflag.ExitOnError)
	clientcmd.BindOverrideFlags(configOverrides, flags,
		clientcmd.RecommendedConfigOverrideFlags(""))
	flags.StringVarP(&loadingRules.ExplicitPath, "kubeconfig", "", "", "absolute path(s) to the kubeconfig file(s)")
	// Skip the program name; with ExitOnError, parse errors end the program
	flags.Parse(os.Args[1:])

	// Client construction
	kubeConfig := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(loadingRules, configOverrides)
	config, err := kubeConfig.ClientConfig()
	if err != nil {
		if clientcmd.IsEmptyConfig(err) {
			panic("Please provide a configuration pointing to the Kubernetes API server")
		}
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// How to find out what namespace to use
	namespace, overridden, err := kubeConfig.Namespace()
	if err != nil {
		panic(err)
	}
	fmt.Printf("Chosen namespace: %s; overridden: %t\n", namespace, overridden)

	// Let's use the client
	nodeList, err := client.CoreV1().Nodes().List(context.TODO(), v1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, node := range nodeList.Items {
		fmt.Println(node.Name)
	}
}
Happy coding, and thank you for your interest in implementing tools with familiar usage patterns!
19 Jan 2026 6:00pm GMT