02 Jan 2026
Kubernetes Blog
Kubernetes v1.35: New level of efficiency with in-place Pod restart
The release of Kubernetes 1.35 introduces a powerful new feature that provides a much-requested capability: the ability to trigger a full, in-place restart of a Pod. This feature, Restart All Containers (alpha in 1.35), offers an efficient way to reset a Pod's state compared to the resource-intensive approach of deleting and recreating the entire Pod. It is especially useful for AI/ML workloads, allowing application developers to concentrate on their core training logic while offloading complex failure-handling and recovery mechanisms to sidecars and declarative Kubernetes configuration. With RestartAllContainers and other planned enhancements, Kubernetes continues to add building blocks for creating flexible, robust, and efficient platforms for AI/ML workloads.
This new functionality is available by enabling the RestartAllContainersOnContainerExits feature gate. This alpha feature extends the Container Restart Rules feature, which graduated to beta in Kubernetes 1.35.
The problem: when a single container restart isn't enough and recreating pods is too costly
Kubernetes has long supported restart policies at the Pod level (restartPolicy) and, more recently, at the individual container level. These policies are great for handling crashes in a single, isolated process. However, many modern applications have more complex inter-container dependencies. For instance:
- An init container prepares the environment by mounting a volume or generating a configuration file. If the main application container corrupts this environment, simply restarting that one container is not enough. The entire initialization process needs to run again.
- A watcher sidecar monitors system health. If it detects an error state that the application cannot recover from on its own but that a clean restart would clear, it must trigger a restart of the main application container from a clean slate.
- A sidecar that manages a remote resource fails. Even if the sidecar restarts on its own, the main container may be stuck trying to access an outdated or broken connection.
In all these cases, the desired action is not to restart a single container, but all of them. Previously, the only way to achieve this was to delete the Pod and have a controller (like a Job or ReplicaSet) create a new one. This process is slow and expensive, involving the scheduler, node resource allocation and re-initialization of networking and storage.
This inefficiency becomes even worse when handling large-scale AI/ML workloads (>= 1,000 Nodes with one Pod per Node). A common requirement for these synchronous workloads is that when a failure occurs (such as a Node crash), all Pods in the fleet must be recreated to reset the state before training can resume, even if the other Pods were not directly affected by the failure. Deleting, creating, and scheduling thousands of Pods simultaneously creates a massive bottleneck; the overhead of handling such failures is estimated to cost $100,000 per month in wasted resources.
Handling these failures for AI/ML training jobs requires a complex integration touching both the training framework and Kubernetes, which is often fragile and toilsome. This feature introduces a Kubernetes-native solution, improving system robustness and allowing application developers to concentrate on their core training logic.
Another major benefit of restarting Pods in place is that keeping Pods on their assigned Nodes allows for further optimizations. For example, one can implement node-level caching tied to a specific Pod identity, something that is impossible when Pods are unnecessarily being recreated on different Nodes.
Introducing the RestartAllContainers action
To address this, Kubernetes v1.35 adds a new action to the container restart rules: RestartAllContainers. When a container exits in a way that matches a rule with this action, the kubelet initiates a fast, in-place restart of the Pod.
This in-place restart is highly efficient because it preserves the Pod's most important resources:
- The Pod's UID, IP address and network namespace.
- The Pod's sandbox and any attached devices.
- All volumes, including emptyDir and mounted volumes from PVCs.
After terminating all running containers, the Pod's startup sequence is re-executed from the very beginning. This means all init containers are run again in order, followed by the sidecar and regular containers, ensuring a completely fresh start in a known-good environment. With the exception of ephemeral containers (which are terminated), all other containers, including those that previously succeeded or failed, will be restarted, regardless of their individual restart policies.
Use cases
1. Efficient restarts for ML/Batch jobs
For ML training jobs, rescheduling a worker Pod on failure is a costly operation: on a 1,000-node training cluster, rescheduling overhead can waste over $100,000 worth of compute resources per month.
The RestartAllContainers action enables a much faster, hybrid recovery strategy: recreate only the "bad" Pods (e.g., those on unhealthy Nodes) while triggering RestartAllContainers for the remaining healthy Pods. Benchmarks show this reduces the recovery overhead from minutes to a few seconds.
With in-place restarts, a watcher sidecar can monitor the main training process. If it encounters a specific, retriable error, the watcher can exit with a designated code to trigger a fast reset of the worker Pod, allowing it to restart from the last checkpoint without involving the Job controller. This capability is now natively supported by Kubernetes.
Read more details about future development and JobSet features in KEP-467: JobSet in-place restart.
apiVersion: v1
kind: Pod
metadata:
  name: ml-worker-pod
spec:
  restartPolicy: Never
  initContainers:
  # This init container will re-run on every in-place restart
  - name: setup-environment
    image: my-repo/setup-worker:1.0
  - name: watcher-sidecar
    image: my-repo/watcher:1.0
    restartPolicy: Always
    restartPolicyRules:
    - action: RestartAllContainers
      onExit:
        exitCodes:
          operator: In
          # A specific exit code from the watcher triggers a full pod restart
          values: [88]
  containers:
  - name: main-application
    image: my-repo/training-app:1.0
2. Re-running init containers for a clean state
Imagine a scenario where an init container is responsible for fetching credentials or setting up a shared volume. If the main application fails in a way that corrupts this shared state, you need the init container to rerun.
By configuring the main application to exit with a specific code upon detecting such a corruption, you can trigger the RestartAllContainers action, guaranteeing that the init container provides a clean setup before the application restarts.
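The manifest below is a minimal sketch of this pattern; the image names, shared volume, and exit code 42 are illustrative, and the rule structure mirrors the ml-worker example above. Here the main container carries the rule itself:
apiVersion: v1
kind: Pod
metadata:
  name: app-with-init-reset
spec:
  restartPolicy: Never
  initContainers:
  # Re-runs on every in-place restart, rebuilding the shared state
  - name: setup-credentials
    image: example.com/setup-credentials:1.0   # illustrative image
    volumeMounts:
    - name: shared-state
      mountPath: /state
  containers:
  - name: main-application
    image: example.com/app:1.0                 # illustrative image
    volumeMounts:
    - name: shared-state
      mountPath: /state
    # Do not restart this container in place on its own; the rule below
    # handles the corruption signal instead
    restartPolicy: Never
    restartPolicyRules:
    # Exit code 42 (illustrative) signals corrupted shared state and
    # triggers a full in-place restart, re-running the init container
    - action: RestartAllContainers
      onExit:
        exitCodes:
          operator: In
          values: [42]
  volumes:
  - name: shared-state
    emptyDir: {}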
3. Handling a high rate of similar task executions
There are cases where each task is best represented as a Pod execution and requires a clean environment: the task may be a game session backend or the processing of a queue item. If the task rate is high, running the full cycle of Pod creation, scheduling, and initialization for every task is simply too expensive, especially when tasks are short. The ability to restart all containers from scratch enables a Kubernetes-native way to handle this scenario without custom solutions or frameworks.
How to use it
To try this feature, you must enable the RestartAllContainersOnContainerExits feature gate on your Kubernetes cluster components (API server and kubelet) running Kubernetes v1.35+. This alpha feature extends the ContainerRestartRules feature, which graduated to beta in v1.35 and is enabled by default.
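As a minimal sketch, the gate can be turned on through the kubelet configuration file, with only the relevant fields shown here:
# KubeletConfiguration fragment: enable the gates on each node
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  ContainerRestartRules: true                    # beta, enabled by default in v1.35
  RestartAllContainersOnContainerExits: true     # alpha, disabled by default
On the kube-apiserver, the equivalent is the command-line flag --feature-gates=RestartAllContainersOnContainerExits=true.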
Once enabled, you can add restartPolicyRules to any container (init, sidecar, or regular) and use the RestartAllContainers action.
The feature is designed to be easily usable on existing apps. However, if an application does not follow some best practices, it may cause issues for the application or for observability tooling. When enabling the feature, make sure that all containers are reentrant and that external tooling is prepared for init containers to re-run. Also, when restarting all containers, the kubelet does not run preStop hooks. This means containers must be designed to handle abrupt termination without relying on preStop hooks for graceful shutdown.
Observing the restart
To make this process observable, a new Pod condition, AllContainersRestarting, is added to the Pod's status. When a restart is triggered, this condition becomes True and it reverts to False once all containers have terminated and the Pod is ready to start its lifecycle anew. This provides a clear signal to users and other cluster components about the Pod's state.
All containers restarted by this action will have their restart count incremented in the container status.
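For example, with kubectl you can inspect the condition and the per-container restart counts (using the ml-worker-pod from the earlier manifest):
# Show the AllContainersRestarting condition, if present
kubectl get pod ml-worker-pod \
  -o jsonpath='{.status.conditions[?(@.type=="AllContainersRestarting")].status}'

# Show per-container restart counts, which increase after each in-place restart
kubectl get pod ml-worker-pod \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\n"}{end}'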
Learn more
- Read the official documentation on Pod Lifecycle.
- Read the detailed proposal in the KEP-5532: Restart All Containers on Container Exits.
- Read the proposal for JobSet in-place restart in JobSet issue #467.
We want your feedback!
As an alpha feature, RestartAllContainers is ready for you to experiment with and any use cases and feedback are welcome. This feature is driven by the SIG Node community. If you are interested in getting involved, sharing your thoughts, or contributing, please join us!
You can reach SIG Node through:
- Slack: #sig-node
- Mailing list
31 Dec 2025
Kubernetes Blog
Kubernetes 1.35: Enhanced Debugging with Versioned z-pages APIs
Debugging Kubernetes control plane components can be challenging, especially when you need to quickly understand the runtime state of a component or verify its configuration. With Kubernetes 1.35, we're enhancing the z-pages debugging endpoints with structured, machine-parseable responses that make it easier to build tooling and automate troubleshooting workflows.
What are z-pages?
z-pages are special debugging endpoints exposed by Kubernetes control plane components. Introduced as an alpha feature in Kubernetes 1.32, these endpoints provide runtime diagnostics for components like kube-apiserver, kube-controller-manager, kube-scheduler, kubelet and kube-proxy. The name "z-pages" comes from the convention of using /*z paths for debugging endpoints.
Currently, Kubernetes supports two primary z-page endpoints:
- /statusz - Displays high-level component information including version information, start time, uptime, and available debug paths
- /flagz - Shows all command-line arguments and their values used to start the component (with confidential values redacted for security)
These endpoints are valuable for human operators who need to quickly inspect component state, but until now, they only returned plain text output that was difficult to parse programmatically.
What's new in Kubernetes 1.35?
Kubernetes 1.35 introduces structured, versioned responses for both /statusz and /flagz endpoints. This enhancement maintains backward compatibility with the existing plain text format while adding support for machine-readable JSON responses.
Backward compatible design
The new structured responses are opt-in. Without specifying an Accept header, the endpoints continue to return the familiar plain text format:
$ curl --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt \
--key /etc/kubernetes/pki/apiserver-kubelet-client.key \
--cacert /etc/kubernetes/pki/ca.crt \
https://localhost:6443/statusz
kube-apiserver statusz
Warning: This endpoint is not meant to be machine parseable, has no formatting compatibility guarantees and is for debugging purposes only.
Started: Wed Oct 16 21:03:43 UTC 2024
Up: 0 hr 00 min 16 sec
Go version: go1.23.2
Binary version: 1.35.0-alpha.0.1595
Emulation version: 1.35
Paths: /healthz /livez /metrics /readyz /statusz /version
Structured JSON responses
To receive a structured response, include the appropriate Accept header:
Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Statusz
This returns a versioned JSON response:
{
"kind": "Statusz",
"apiVersion": "config.k8s.io/v1alpha1",
"metadata": {
"name": "kube-apiserver"
},
"startTime": "2025-10-29T00:30:01Z",
"uptimeSeconds": 856,
"goVersion": "go1.23.2",
"binaryVersion": "1.35.0",
"emulationVersion": "1.35",
"paths": [
"/healthz",
"/livez",
"/metrics",
"/readyz",
"/statusz",
"/version"
]
}
Similarly, /flagz supports structured responses with the header:
Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Flagz
Example response:
{
"kind": "Flagz",
"apiVersion": "config.k8s.io/v1alpha1",
"metadata": {
"name": "kube-apiserver"
},
"flags": {
"advertise-address": "192.168.8.4",
"allow-privileged": "true",
"authorization-mode": "[Node,RBAC]",
"enable-priority-and-fairness": "true",
"profiling": "true"
}
}
Why structured responses matter
The addition of structured responses opens up several new possibilities:
1. Automated health checks and monitoring
Instead of parsing plain text, monitoring tools can now easily extract specific fields. For example, you can programmatically check if a component has been running with an unexpected emulated version or verify that critical flags are set correctly.
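For example, a small shell check might extract the emulation version and compare it against an expected value (the certificate paths follow the earlier examples; the expected version is illustrative):
# Alert if the kube-apiserver reports an unexpected emulation version
expected="1.35"
actual=$(curl -s \
  --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt \
  --key /etc/kubernetes/pki/apiserver-kubelet-client.key \
  --cacert /etc/kubernetes/pki/ca.crt \
  -H "Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Statusz" \
  https://localhost:6443/statusz | jq -r '.emulationVersion')
if [ "$actual" != "$expected" ]; then
  echo "unexpected emulation version: $actual (expected $expected)" >&2
  exit 1
fi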
2. Better debugging tools
Developers can build sophisticated debugging tools that compare configurations across multiple components or track configuration drift over time. The structured format makes it trivial to diff configurations or validate that components are running with expected settings.
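As a sketch, assuming you have saved the /flagz JSON from two control plane nodes into local files, a sorted diff of the flag maps highlights any drift:
# Normalize key order with jq, then compare the flag sets
diff \
  <(jq -S '.flags' flagz-node-a.json) \
  <(jq -S '.flags' flagz-node-b.json)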
3. API versioning and stability
By introducing versioned APIs (starting with v1alpha1), we provide a clear path to stability. As the feature matures, we'll introduce v1beta1 and eventually v1, giving you confidence that your tooling won't break with future Kubernetes releases.
How to use structured z-pages
Prerequisites
Both endpoints require feature gates to be enabled:
- /statusz: Enable the ComponentStatusz feature gate
- /flagz: Enable the ComponentFlagz feature gate
Example: Getting structured responses
Here's an example using curl to retrieve structured JSON responses from the kube-apiserver:
# Get structured statusz response
curl \
--cert /etc/kubernetes/pki/apiserver-kubelet-client.crt \
--key /etc/kubernetes/pki/apiserver-kubelet-client.key \
--cacert /etc/kubernetes/pki/ca.crt \
-H "Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Statusz" \
https://localhost:6443/statusz | jq .
# Get structured flagz response
curl \
--cert /etc/kubernetes/pki/apiserver-kubelet-client.crt \
--key /etc/kubernetes/pki/apiserver-kubelet-client.key \
--cacert /etc/kubernetes/pki/ca.crt \
-H "Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Flagz" \
https://localhost:6443/flagz | jq .
Note:
The examples above use client certificate authentication and verify the server's certificate using --cacert. If you need to bypass certificate verification in a test environment, you can use --insecure (or -k), but this should never be done in production as it makes you vulnerable to man-in-the-middle attacks.
Important considerations
Alpha feature status
The structured z-page responses are an alpha feature in Kubernetes 1.35. This means:
- The API format may change in future releases
- These endpoints are intended for debugging, not production automation
- You should avoid relying on them for critical monitoring workflows until they reach beta or stable status
Security and access control
z-pages expose internal component information and require proper access controls. Here are the key security considerations:
Authorization: Access to z-page endpoints is restricted to members of the system:monitoring group, which follows the same authorization model as other debugging endpoints like /healthz, /livez, and /readyz. This ensures that only authorized users and service accounts can access debugging information. If your cluster uses RBAC, you can manage access by granting appropriate permissions to this group.
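For example, a minimal RBAC sketch (the role name is illustrative, and your cluster's default bindings for system:monitoring may already cover other debugging paths) grants that group read access to the z-page paths:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: zpages-reader        # illustrative name
rules:
- nonResourceURLs: ["/statusz", "/flagz"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: zpages-reader        # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: zpages-reader
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:monitoring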
Authentication: The authentication requirements for these endpoints depend on your cluster's configuration. Unless anonymous authentication is enabled for your cluster, you typically need to use authentication mechanisms (such as client certificates) to access these endpoints.
Information disclosure: These endpoints reveal configuration details about your cluster components, including:
- Component versions and build information
- All command-line arguments and their values (with confidential values redacted)
- Available debug endpoints
Only grant access to trusted operators and debugging tools. Avoid exposing these endpoints to unauthorized users or automated systems that don't require this level of access.
Future evolution
As the feature matures, we (Kubernetes SIG Instrumentation) expect to:
- Introduce v1beta1 and eventually v1 versions of the API
- Gather community feedback on the response schema
- Potentially add additional z-page endpoints based on user needs
Try it out
We encourage you to experiment with structured z-pages in a test environment:
- Enable the ComponentStatusz and ComponentFlagz feature gates on your control plane components
- Try querying the endpoints with both plain text and structured formats
- Build a simple tool or script that uses the structured data
- Share your feedback with the community
Learn more
- z-pages documentation
- KEP-4827: Component Statusz
- KEP-4828: Component Flagz
- Join the discussion in the #sig-instrumentation channel on Kubernetes Slack
Get involved
We'd love to hear your feedback! The structured z-pages feature is designed to make Kubernetes easier to debug and monitor. Whether you're building internal tooling, contributing to open source projects, or just exploring the feature, your input helps shape the future of Kubernetes observability.
If you have questions, suggestions, or run into issues, please reach out to SIG Instrumentation. You can find us on Slack or at our regular community meetings.
Happy debugging!
30 Dec 2025
Kubernetes Blog
Kubernetes v1.35: Watch Based Route Reconciliation in the Cloud Controller Manager
Up to and including Kubernetes v1.34, the route controller in Cloud Controller Manager (CCM) implementations built using the k8s.io/cloud-provider library reconciles routes at a fixed interval. This causes unnecessary API requests to the cloud provider when there are no changes to routes. Other controllers implemented through the same library already use watch-based mechanisms, leveraging informers to avoid unnecessary API calls. A new feature gate is being introduced in v1.35 to allow changing the behavior of the route controller to use watch-based informers.
What's new?
The feature gate CloudControllerManagerWatchBasedRoutesReconciliation has been introduced to k8s.io/cloud-provider in alpha stage by SIG Cloud Provider. To enable this feature, pass --feature-gates=CloudControllerManagerWatchBasedRoutesReconciliation=true to the CCM implementation you are using.
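As a sketch, assuming your CCM runs as a Deployment or DaemonSet, the flag is added to the container's arguments (the image and the other flags shown here are illustrative):
# Fragment of a cloud-controller-manager Pod template (sketch)
containers:
- name: cloud-controller-manager
  image: example.com/my-cloud-controller-manager:v1.35.0   # illustrative image
  args:
  - --cloud-provider=my-cloud                               # illustrative provider name
  - --configure-cloud-routes=true
  - --feature-gates=CloudControllerManagerWatchBasedRoutesReconciliation=true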
About the feature gate
When this feature gate is enabled, the route reconciliation loop is triggered whenever a Node is added or deleted, or whenever its .spec.podCIDRs or .status.addresses fields are updated.
An additional reconciliation is performed at a random interval between 12 and 24 hours, chosen when the controller starts.
This feature gate does not modify the logic within the reconciliation loop. Therefore, users of a CCM implementation should not experience significant changes to their existing route configurations.
How can I learn more?
For more details, refer to KEP-5237.