28 Nov 2023

Kubernetes – Production-Grade Container Orchestration

Blog: New Experimental Features in Gateway API v1.0

Authors: Candace Holman (Red Hat), Dave Protasowski (VMware), Gaurav K Ghildiyal (Google), John Howard (Google), Simone Rodigari (IBM)

Recently, the Gateway API announced its v1.0 GA release, marking a huge milestone for the project.

Along with stabilizing some of the core functionality in the API, a number of exciting new experimental features have been added.

Backend TLS Policy

BackendTLSPolicy is a new Gateway API type used for specifying the TLS configuration of the connection from the Gateway to backend Pods via the Service API object. It is specified as a Direct PolicyAttachment without defaults or overrides, applied to a Service that accesses a backend, where the BackendTLSPolicy resides in the same namespace as the Service to which it is applied. All Gateway API Routes that point to a referenced Service should respect a configured BackendTLSPolicy.

Existing mechanisms already covered TLS configuration for edge and passthrough termination; this new API object specifically addresses how TLS is configured to carry HTTPS from the Gateway dataplane to the backend. This is referred to as "backend TLS termination" and lets the Gateway know how to connect to a backend Pod that has its own certificate.

Termination Types

The specification of a BackendTLSPolicy consists of:

Examples

Using System Certificates

In this example, the BackendTLSPolicy is configured to use system certificates to connect with a TLS-encrypted upstream connection where Pods backing the dev Service are expected to serve a valid certificate for dev.example.com.

apiVersion: gateway.networking.k8s.io/v1alpha2
kind: BackendTLSPolicy
metadata:
  name: tls-upstream-dev
spec:
  targetRef:
    kind: Service
    name: dev-service
    group: ""
  tls:
    wellKnownCACerts: "System"
    hostname: dev.example.com

Using Explicit CA Certificates

In this example, the BackendTLSPolicy is configured to use certificates defined in the configuration map auth-cert to connect with a TLS-encrypted upstream connection where Pods backing the auth Service are expected to serve a valid certificate for auth.example.com.

apiVersion: gateway.networking.k8s.io/v1alpha2
kind: BackendTLSPolicy
metadata:
  name: tls-upstream-auth
spec:
  targetRef:
    kind: Service
    name: auth-service
    group: ""
  tls:
    caCertRefs:
    - kind: ConfigMapReference
      name: auth-cert
      group: ""
    hostname: auth.example.com

The following illustrates a BackendTLSPolicy that configures TLS for a Service serving a backend:

[Diagram: a client sends an HTTP request to the Gateway; the Gateway forwards to the HTTPRoute; the HTTPRoute reaches the Service under the BackendTLSPolicy; the Service fans out to two Pods.]

For more information, refer to the documentation for TLS.

HTTPRoute Timeouts

A key enhancement in Gateway API's latest release (v1.0) is the introduction of the timeouts field within HTTPRoute Rules. This feature offers a dynamic way to manage timeouts for incoming HTTP requests, adding precision and reliability to your gateway setups.

With Timeouts, developers can fine-tune their Gateway API's behavior in two fundamental ways:

  1. Request Timeout:

    The request timeout is the duration within which the Gateway API implementation must send a response to a client's HTTP request. It allows flexibility in specifying when this timeout starts, either before or after the entire client request stream is received, making it implementation-specific. This timeout efficiently covers the entire request-response transaction, enhancing the responsiveness of your services.

  2. Backend Request Timeout:

    The backendRequest timeout applies to a single request sent from the Gateway to a backend service. It spans from the initiation of the request to the reception of the full response from the backend. This is particularly helpful in scenarios where the Gateway needs to retry connections to a backend, because each attempt is bounded individually.

Notably, the request timeout encompasses the backendRequest timeout. Hence, the value of backendRequest should never exceed the value of the request timeout.

The ability to configure these timeouts adds a new layer of reliability to your Kubernetes services. Whether it's ensuring client requests are processed within a specified timeframe or managing backend service communications, Gateway API's Timeouts offer the control and predictability you need.

To get started, you can define timeouts in your HTTPRoute Rules using the timeouts field, whose values are of type Duration. A zero-valued timeout (0s) disables the timeout, while a valid non-zero-valued timeout should be at least 1ms.

Here's an example of setting request and backendRequest timeouts in an HTTPRoute:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: timeout-example
spec:
  parentRefs:
  - name: example-gateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /timeout
    timeouts:
      request: 10s
      backendRequest: 2s
    backendRefs:
    - name: timeout-svc
      port: 8080

In this example, a request timeout of 10 seconds is defined, ensuring that client requests are processed within that timeframe. Additionally, a 2-second backendRequest timeout is set for individual requests from the Gateway to a backend service called timeout-svc.
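The relationship between the two timeouts can be checked before a route is applied. The following is a minimal Python sketch, not part of Gateway API or any implementation: the function names are hypothetical, and the duration grammar is a simplified assumption (one or more integer components with the units h, m, s, or ms). It treats 0s as "disabled" and flags a backendRequest value that exceeds the request value.

```python
import re

# Assumed, simplified duration grammar: <integer><unit> repeated,
# with units h, m, s, ms (for example "10s", "500ms", "1m30s").
_COMPONENT = re.compile(r"(\d+)(ms|h|m|s)")
_UNIT_MS = {"h": 3_600_000, "m": 60_000, "s": 1_000, "ms": 1}


def parse_duration_ms(text: str) -> int:
    """Convert a duration string to whole milliseconds."""
    if not text:
        raise ValueError("empty duration")
    pos, total = 0, 0
    while pos < len(text):
        match = _COMPONENT.match(text, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * _UNIT_MS[match.group(2)]
        pos = match.end()
    return total


def check_timeouts(request: str, backend_request: str) -> list[str]:
    """Return a list of problems; an empty list means the pair is consistent."""
    req = parse_duration_ms(request)
    backend = parse_duration_ms(backend_request)
    problems = []
    # A zero value disables that timeout, so compare only when both are active.
    if req > 0 and backend > 0 and backend > req:
        problems.append("backendRequest exceeds request")
    return problems


if __name__ == "__main__":
    print(check_timeouts("10s", "2s"))
    print(check_timeouts("1s", "2s"))
```

Because the smallest unit accepted here is ms and components are integers, any non-zero value this parser accepts is already at least 1ms.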

These new HTTPRoute Timeouts provide Kubernetes users with more control and flexibility in managing network communications, helping ensure a smoother and more predictable experience for both clients and backends. For additional details and examples, refer to the official timeouts API documentation.

Gateway Infrastructure Labels

While Gateway API provides a common API for different implementations, each implementation will have different resources created under the hood to apply users' intent. This could be configuring cloud load balancers, creating in-cluster Pods and Services, or more.

While the API has always provided an extension point -- parametersRef in GatewayClass -- to customize implementation-specific behavior, there was no common core way to express common infrastructure customizations.

Gateway API v1.0 paves the way for this with a new infrastructure field on the Gateway object, allowing customization of the underlying infrastructure. For now, this starts small with two critical fields: labels and annotations. When these are set, any generated infrastructure will have the provided labels and annotations set on them.

For example, I may want to group all my resources for one application together:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: hello-world
spec:
  infrastructure:
    labels:
      app.kubernetes.io/name: hello-world

In the future, we are looking into more common infrastructure configurations, such as resource sizing.

For more information, refer to the documentation for this feature.

Support for Websockets, HTTP/2 and more!

Not all implementations of Gateway API support automatic protocol selection. In some cases protocols are disabled without an explicit opt-in.

When a Route's backend references a Kubernetes Service, application developers can specify the protocol using the ServicePort appProtocol field.

For example, the following store Kubernetes Service indicates that port 8080 supports HTTP/2 Prior Knowledge.

apiVersion: v1
kind: Service
metadata:
  name: store
spec:
  selector:
    app: store
  ports:
  - protocol: TCP
    appProtocol: kubernetes.io/h2c
    port: 8080
    targetPort: 8080

Currently, Gateway API has conformance testing for:

For more information, refer to the documentation for Backend Protocol Selection.

gwctl, our new Gateway API command line tool

gwctl is a command line tool that aims to be a kubectl replacement for viewing Gateway API resources.

The initial release of gwctl that comes bundled with the Gateway API v1.0 release includes helpful features for managing Gateway API Policies. Gateway API Policies serve as powerful extension mechanisms for modifying the behavior of Gateway resources. One challenge with using policies is that it may be hard to discover which policies are affecting which Gateway resources. gwctl helps bridge this gap by answering questions like:

gwctl is still in the very early phases of development and hence may be a bit rough around the edges. Follow the instructions in the repository to install and try out gwctl.

Examples

Here are some examples of how gwctl can be used:

# List all policies in the cluster. This will also give the resource they bind
# to.
gwctl get policies -A
# List all available policy types.
gwctl get policycrds
# Describe all HTTPRoutes in namespace ns2. (Output includes effective policies)
gwctl describe httproutes -n ns2
# Describe a single HTTPRoute in the default namespace. (Output includes
# effective policies)
gwctl describe httproutes my-httproute-1
# Describe all Gateways across all namespaces. (Output includes effective
# policies)
gwctl describe gateways -A
# Describe a single GatewayClass. (Output includes effective policies)
gwctl describe gatewayclasses foo-com-external-gateway-class

Get involved

These projects, and many more, continue to be improved in Gateway API. There are lots of opportunities to get involved and help define the future of Kubernetes routing APIs for both Ingress and Mesh.

If this is interesting to you, please join us in the community and help us build the future of Gateway API together!

28 Nov 2023 6:00pm GMT

24 Nov 2023

Blog: Spotlight on SIG Testing

Author: Sandipan Panda

Welcome to another edition of the SIG spotlight blog series, where we highlight the incredible work being done by various Special Interest Groups (SIGs) within the Kubernetes project. In this edition, we turn our attention to SIG Testing, a group interested in effective testing of Kubernetes and automating away project toil. SIG Testing focuses on creating and running tools and infrastructure that make it easier for the community to write and run tests, and to contribute, analyze and act upon test results.

To gain some insights into SIG Testing, Sandipan Panda spoke with Michelle Shepardson, a senior software engineer at Google and a chair of SIG Testing, and Patrick Ohly, a software engineer and architect at Intel and a SIG Testing Tech Lead.

Meet the contributors

Sandipan: Could you tell us a bit about yourself, your role, and how you got involved in the Kubernetes project and SIG Testing?

Michelle: Hi! I'm Michelle, a senior software engineer at Google. I first got involved in Kubernetes through working on tooling for SIG Testing, like the external instance of TestGrid. I'm part of oncall for TestGrid and Prow, and am now a chair for the SIG.

Patrick: Hello! I work as a software engineer and architect in a team at Intel which focuses on open source Cloud Native projects. When I ramped up on Kubernetes to develop a storage driver, my very first question was "how do I test it in a cluster and how do I log information?" That interest led to various enhancement proposals until I had (re)written enough code that I also took over official roles as SIG Testing Tech Lead (for the E2E framework) and structured logging WG lead.

Testing practices and tools

Sandipan: Testing is a field in which multiple approaches and tools exist; how did you arrive at the existing practices?

Patrick: I can't speak about the early days because I wasn't around yet 😆, but looking back at some of the commit history it's pretty obvious that developers just took what was available and started using it. For E2E testing, that was Ginkgo+Gomega. Some hacks were necessary, for example around cleanup after a test run and for categorising tests. Eventually this led to Ginkgo v2 and revised best practices for E2E testing. Regarding unit testing, opinions are pretty diverse: some maintainers prefer to use just the Go standard library with hand-written checks. Others use helper packages like stretchr/testify. That diversity is okay because unit tests are self-contained - contributors just have to be flexible when working on many different areas. Integration testing falls somewhere in the middle. It's based on Go unit tests, but needs complex helper packages to bring up an apiserver and other components, then runs tests that are more like E2E tests.

Subprojects owned by SIG Testing

Sandipan: SIG Testing is pretty diverse. Can you give a brief overview of the various subprojects owned by SIG Testing?

Michelle: Broadly, we have subprojects related to testing frameworks, and infrastructure, though they definitely overlap. So for the former, there's e2e-framework (used externally), test/e2e/framework (used for Kubernetes itself) and kubetest2 for end-to-end testing, as well as boskos (resource rental for e2e tests), KIND (Kubernetes-in-Docker, for local testing and development), and the cloud provider for KIND. For the latter, there's Prow (K8s-based CI/CD and chatops), and a litany of other tools and utilities for triage, analysis, coverage, Prow/TestGrid config generation, and more in the test-infra repo.

If you are willing to learn more and get involved with any of the SIG Testing subprojects, check out the SIG Testing README.

Key challenges and accomplishments

Sandipan: What are some of the key challenges you face?

Michelle: Kubernetes is a gigantic project in every aspect, from contributors to code to users and more. Testing and infrastructure have to meet that scale, keeping up with every change from every repo under Kubernetes while facilitating developing, improving, and releasing the project as much as possible, though of course, we're not the only SIG involved in that. I think another challenge is staffing subprojects. SIG Testing has a number of subprojects that have existed for years, but many of the original maintainers for them have moved on to other areas or no longer have the time to maintain them. We need to grow long-term expertise and owners in those subprojects.

Patrick: As Michelle said, the sheer size can be a challenge. It's not just the infrastructure, also our processes must scale with the number of contributors. It's good to document best practices, but not good enough: we have many new contributors, which is good, but having reviewers explain best practices doesn't scale - assuming that the reviewers even know about them! It also doesn't help that existing code cannot get updated immediately because there is so much of it, in particular for E2E testing. The initiative to apply stricter linting to new or modified code while accepting that existing code doesn't pass those same linter checks helps a bit.

Sandipan: Any SIG accomplishments that you are proud of and would like to highlight?

Patrick: I am biased because I have been driving this, but I think that the E2E framework and linting are now in a much better shape than they used to be. We may soon be able to run integration tests with race detection enabled, which is important because we currently only have that for unit tests and those tend to be less complex.

Sandipan: Testing is always important, but is there anything specific to your work in terms of the Kubernetes release process?

Patrick: test flakes… if we have too many of those, development velocity goes down because PRs cannot be merged without clean test runs and those become less likely. Developers also lose trust in testing and just "retest" until they have a clean run, without checking whether failures might indeed be related to a regression in their current change.

The people and the scope

Sandipan: What are some of your favourite things about this SIG?

Michelle: The people, of course 🙂. Aside from that, I like the broad scope SIG Testing has. I feel like even small changes can make a big difference for fellow contributors, and even if my interests change over time, I'll never run out of projects to work on.

Patrick: I can work on things that make my life and the life of my fellow developers better, like the tooling that we have to use every day while working on some new feature elsewhere.

Sandipan: Are there any funny / cool / TIL anecdotes that you could tell us?

Patrick: I started working on E2E framework enhancements five years ago, then was less active there for a while. When I came back and wanted to test some new enhancement, I asked about how to write unit tests for the new code and was pointed to some existing tests which looked vaguely familiar, as if I had seen them before. I looked at the commit history and found that I had written them! I'll let you decide whether that says something about my failing long-term memory or simply is normal… Anyway, folks, remember to write good commit messages and comments; someone will need them at some point - it might even be yourself!

Looking ahead

Sandipan: What areas and/or subprojects does your SIG need help with?

Michelle: Some subprojects aren't staffed at the moment and could use folks willing to learn more about them. boskos and kubetest2 especially stand out to me, since both are important for testing but lack dedicated owners.

Sandipan: Are there any useful skills that new contributors to SIG Testing can bring to the table? What are some things that people can do to help this SIG if they come from a background that isn't directly linked to programming?

Michelle: I think user empathy, writing clear feedback, and recognizing patterns are really useful. Someone who uses the test framework or tooling and can outline pain points with clear examples, or who can recognize a wider issue in the project and pull data to inform solutions for it.

Sandipan: What's next for SIG Testing?

Patrick: Stricter linting will soon become mandatory for new code. There are several E2E framework sub-packages that could be modernised, if someone wants to take on that work. I also see an opportunity to unify some of our helper code for E2E and integration testing, but that needs more thought and discussion.

Michelle: I'm looking forward to making some usability improvements for some of our tools and infra, and to supporting more long-term contributions and growth of contributors into long-term roles within the SIG. If you're interested, hit us up!

Looking ahead, SIG Testing has exciting plans in store. You can get in touch with the folks at SIG Testing in their Slack channel or attend one of their regular bi-weekly meetings on Tuesdays. If you are interested in making it easier for the community to run tests and contribute test results, to ensure Kubernetes is stable across a variety of cluster configurations and cloud providers, join the SIG Testing community today!

24 Nov 2023 12:00am GMT

16 Nov 2023

Blog: The Case for Kubernetes Resource Limits: Predictability vs. Efficiency

Author: Milan Plžík (Grafana Labs)

There have been quite a lot of posts suggesting that not using Kubernetes resource limits might be a fairly useful thing (for example, For the Love of God, Stop Using CPU Limits on Kubernetes or Kubernetes: Make your services faster by removing CPU limits). The points made there are totally valid - it doesn't make much sense to pay for compute power that will not be used due to limits, nor to artificially increase latency. This post strives to argue that limits have their legitimate use as well.

As a Site Reliability Engineer on the Grafana Labs platform team, which maintains and improves internal infrastructure and tooling used by the product teams, I primarily try to make Kubernetes upgrades as smooth as possible. But I also spend a lot of time going down the rabbit hole of various interesting Kubernetes issues. This article reflects my personal opinion, and others in the community may disagree.

Let's flip the problem upside down. Every pod in a Kubernetes cluster has inherent resource limits - the actual CPU, memory, and other resources of the machine it's running on. If those physical limits are reached by a pod, it will experience throttling similar to what is caused by reaching Kubernetes limits.

The problem

Pods without (or with generous) limits can easily consume the extra resources on the node. This, however, has a hidden cost - the amount of extra resources available often heavily depends on pods scheduled on the particular node and their actual load. These extra resources make each pod a special snowflake when it comes to real resource allocation. Even worse, it's fairly hard to figure out the resources that the pod had at its disposal at any given moment - certainly not without unwieldy data mining of pods running on a particular node, their resource consumption, and similar. And finally, even if we pass this obstacle, we can only have data sampled up to a certain rate and get profiles only for a certain fraction of our calls. This can be scaled up, but the amount of observability data generated might easily reach diminishing returns. Thus, there's no easy way to tell if a pod had a quick spike and for a short period of time used twice as much memory as usual to handle a request burst.

Now, with Black Friday and Cyber Monday approaching, businesses expect a surge in traffic. Good data and benchmarks of past performance allow businesses to plan for some extra capacity. But is data about pods without limits reliable? With memory or CPU instant spikes handled by the extra resources, everything might look good according to past data. But once the pod bin-packing changes and the extra resources get more scarce, everything might start looking different - ranging from request latencies rising negligibly to requests slowly snowballing and causing pod OOM kills. While almost no one actually cares about the former, the latter is a serious issue that requires instant capacity increase.

Configuring the limits

Not using limits is a tradeoff - it opportunistically improves performance if there are extra resources available, but lowers the predictability of that performance, which might strike back in the future. There are a few approaches that can be used to increase the predictability again. Let's pick two of them to analyze: fixed-fraction headroom limits, and requests = limits.

Some other cases might also be considered, but these are probably the two simplest ones to discuss.
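To make the two configurations analyzed in this post concrete (fixed-fraction headroom limits and requests = limits), here is a small Python sketch. It assumes that "fixed-fraction headroom" means each limit sits a fixed percentage above the corresponding request; the function name, the resource keys, and the 20% figure are all hypothetical and chosen only for illustration.

```python
def limits_for(requests: dict[str, int], headroom_percent: int = 0) -> dict[str, int]:
    """Derive limits from requests.

    headroom_percent=0 yields requests = limits. A positive value places
    every limit that percentage above its request (assumed meaning of
    fixed-fraction headroom), rounded down to a whole unit.
    """
    if headroom_percent < 0:
        raise ValueError("headroom_percent must not be negative")
    return {
        name: value * (100 + headroom_percent) // 100
        for name, value in requests.items()
    }


if __name__ == "__main__":
    # Hypothetical workload: CPU in millicores, memory in MiB.
    requests = {"cpu_millicores": 500, "memory_mib": 256}
    print("requests = limits:", limits_for(requests))
    print("20% headroom:     ", limits_for(requests, headroom_percent=20))
```

Integer arithmetic is used on purpose so that the derived values map directly onto whole millicores and MiB.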

Cluster resource economy

Note that in both cases discussed above, we're effectively preventing workloads from using some of the cluster resources available to them in exchange for more predictability - which might sound like a steep price to pay for slightly more stable performance. Let's try to quantify the impact there.

Bin-packing and cluster resource allocation

Firstly, let's discuss bin-packing and cluster resource allocation. There's some inherent cluster inefficiency that comes into play - it's hard to achieve 100% resource allocation in a Kubernetes cluster. Thus, some percentage will be left unallocated.

When configuring fixed-fraction headroom limits, a proportional amount of this will be available to the pods. If the percentage of unallocated resources in the cluster is lower than the constant we use for setting fixed-fraction headroom limits (see the figure, line 2), all the pods together are able to theoretically use up all the node's resources; otherwise there are some resources that will inevitably be wasted (see the figure, line 1). In order to eliminate the inevitable resource waste, the percentage for fixed-fraction headroom limits should be configured so that it's at least equal to the expected percentage of unallocated resources.

[Figure: chart displaying various requests/limits configurations]

For requests = limits (see the figure, line 3), this does not hold: unless we're able to allocate all of a node's resources, some resources will inevitably be wasted. Without any knobs to turn on the requests/limits side, the only suitable approach here is to ensure efficient bin-packing on the nodes by configuring correct machine profiles. This can be done either manually or by using a variety of cloud service provider tooling - for example Karpenter for EKS or GKE Node auto provisioning.
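The bin-packing argument can be illustrated with a little arithmetic. The Python sketch below is an illustration under stated assumptions, not a model of the scheduler: the function name and the node and pod sizes are hypothetical, limits are assumed to be derived from requests by a fixed percentage, and only a single resource (CPU millicores) on a single node is considered. It computes how much of the node no pod can ever reach because the sum of all limits falls short of the node's capacity.

```python
def unreachable_capacity(
    node_capacity: int, pod_requests: list[int], headroom_percent: int
) -> int:
    """Capacity that stays unused even if every pod runs at its limit."""
    total_limits = sum(
        request * (100 + headroom_percent) // 100 for request in pod_requests
    )
    return max(0, node_capacity - total_limits)


if __name__ == "__main__":
    # Hypothetical node with 4000 millicores; requests add up to 3500,
    # so 12.5% of the node is left unallocated.
    node = 4000
    pods = [1000, 1000, 1500]
    for headroom in (0, 10, 20):
        wasted = unreachable_capacity(node, pods, headroom)
        print(f"headroom {headroom:>2}%: {wasted} millicores unreachable")
```

With these numbers, requests = limits (0% headroom) leaves the whole unallocated remainder unreachable, 10% headroom (below the unallocated share) still leaves some, and 20% headroom (above it) leaves none.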

Optimizing actual resource utilization

Free resources also come in the form of unused resources of other pods (reserved vs. actual CPU utilization, etc.), and their availability can't be predicted in any reasonable way. Configuring limits makes it next to impossible to utilize these. Looking at this from a different perspective, if a workload wastes a significant amount of the resources it has requested, revisiting its own resource requests might be a fair thing to do. Looking at past data and picking better-fitting resource requests might help to make the packing tighter (although at the price of worsening its performance - for example increasing long tail latencies).

Conclusion

Optimizing resource requests and limits is hard. Although it's much easier to break things when setting limits, those breakages might help prevent a catastrophe later by giving more insights into how the workload behaves in boundary conditions. There are cases where setting limits makes less sense: batch workloads (which are not latency-sensitive - for example non-live video encoding), best-effort services (don't need that level of availability and can be preempted), clusters that have a lot of spare resources by design (various cases of specialty workloads - for example services that handle spikes by design).

On the other hand, setting limits shouldn't be avoided at all costs - even though figuring out the "right" value for limits is harder and configuring a wrong value yields less forgiving situations. Configuring limits helps you learn about a workload's behavior in corner cases, and there are simple strategies that can help when reasoning about the right value. It's a tradeoff between efficient resource usage and performance predictability and should be considered as such.

There's also an economic aspect of workloads with spiky resource usage. Having "freebie" resources always at hand does not serve as an incentive to improve performance for the product team. Big enough spikes might easily trigger efficiency issues or even problems when trying to defend a product's SLA - and thus, might be a good candidate to mention when assessing any risks.

16 Nov 2023 12:00am GMT

Blog: Kubernetes Removals, Deprecations, and Major Changes in Kubernetes 1.29

Authors: Carol Valencia, Kristin Martin, Abigail McCarthy, James Quigley, Hosam Kamel

As with every release, Kubernetes v1.29 will introduce feature deprecations and removals. Our continued ability to produce high-quality releases is a testament to our robust development cycle and healthy community. The following are some of the deprecations and removals coming in the Kubernetes 1.29 release.

The Kubernetes API removal and deprecation process

The Kubernetes project has a well-documented deprecation policy for features. This policy states that stable APIs may only be deprecated when a newer, stable version of that same API is available and that APIs have a minimum lifetime for each stability level. A deprecated API is one that has been marked for removal in a future Kubernetes release; it will continue to function until removal (at least one year from the deprecation), but usage will result in a warning being displayed. Removed APIs are no longer available in the current version, at which point you must migrate to using the replacement.

Whether an API is removed as a result of a feature graduating from beta to stable or because that API simply did not succeed, all removals comply with this deprecation policy. Whenever an API is removed, migration options are communicated in the documentation.

A note about the k8s.gcr.io redirect to registry.k8s.io

To host its container images, the Kubernetes project uses a community-owned image registry called registry.k8s.io. Starting in March 2023, traffic to the old k8s.gcr.io registry has been redirected to registry.k8s.io. The deprecated k8s.gcr.io registry will eventually be phased out. For more details on this change or to see if you are impacted, please read k8s.gcr.io Redirect to registry.k8s.io - What You Need to Know.

A note about the Kubernetes community-owned package repositories

Earlier in 2023, the Kubernetes project introduced pkgs.k8s.io, community-owned software repositories for Debian and RPM packages. The community-owned repositories replaced the legacy Google-owned repositories (apt.kubernetes.io and yum.kubernetes.io). On September 13, 2023, those legacy repositories were formally deprecated and their contents frozen.

For more information on this change or to see if you are impacted, please read the deprecation announcement.

Deprecations and removals for Kubernetes v1.29

See the official list of API removals for a full list of planned deprecations for Kubernetes v1.29.

Removal of in-tree integrations with cloud providers (KEP-2395)

The feature gates DisableCloudProviders and DisableKubeletCloudCredentialProviders will both be set to true by default for Kubernetes v1.29. This change will require that users who are currently using in-tree cloud provider integrations (Azure, GCE, or vSphere) enable external cloud controller managers, or opt in to the legacy integration by setting the associated feature gates to false.

Enabling external cloud controller managers means you must run a suitable cloud controller manager within your cluster's control plane; it also requires setting the command line argument --cloud-provider=external for the kubelet (on every relevant node), and across the control plane (kube-apiserver and kube-controller-manager).

For more information about how to enable and run external cloud controller managers, read Cloud Controller Manager Administration and Migrate Replicated Control Plane To Use Cloud Controller Manager.

For general information about cloud controller managers, please see Cloud Controller Manager in the Kubernetes documentation.

Removal of the v1beta2 flow control API group

The flowcontrol.apiserver.k8s.io/v1beta2 API version of FlowSchema and PriorityLevelConfiguration will no longer be served in Kubernetes v1.29.

To prepare for this, you can edit your existing manifests and rewrite client software to use the flowcontrol.apiserver.k8s.io/v1beta3 API version, available since v1.26. All existing persisted objects are accessible via the new API. Notable changes in flowcontrol.apiserver.k8s.io/v1beta3 include that the PriorityLevelConfiguration spec.limited.assuredConcurrencyShares field was renamed to spec.limited.nominalConcurrencyShares.
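For manifests kept in source control, the edit described above can be scripted. The following Python sketch is only an illustration: the function name is hypothetical, it operates on a manifest that has already been loaded into a dict, and it handles only the API version bump and the single field rename named in this section. Any other differences between the two versions are not covered.

```python
import copy

OLD_VERSION = "flowcontrol.apiserver.k8s.io/v1beta2"
NEW_VERSION = "flowcontrol.apiserver.k8s.io/v1beta3"


def migrate_manifest(manifest: dict) -> dict:
    """Return a v1beta3 copy of a v1beta2 flow control manifest.

    Manifests with any other apiVersion are returned unchanged.
    """
    if manifest.get("apiVersion") != OLD_VERSION:
        return manifest
    migrated = copy.deepcopy(manifest)
    migrated["apiVersion"] = NEW_VERSION
    if migrated.get("kind") == "PriorityLevelConfiguration":
        limited = migrated.get("spec", {}).get("limited")
        # Rename the field only where it is actually present.
        if limited and "assuredConcurrencyShares" in limited:
            limited["nominalConcurrencyShares"] = limited.pop(
                "assuredConcurrencyShares"
            )
    return migrated


if __name__ == "__main__":
    # Hypothetical manifest used purely as sample input.
    old = {
        "apiVersion": OLD_VERSION,
        "kind": "PriorityLevelConfiguration",
        "metadata": {"name": "example"},
        "spec": {"type": "Limited", "limited": {"assuredConcurrencyShares": 30}},
    }
    print(migrate_manifest(old))
```

The input is deep-copied so the original manifest stays untouched, which makes it easy to diff the old and new versions before committing.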

Deprecation of the status.nodeInfo.kubeProxyVersion field for Node

The .status.nodeInfo.kubeProxyVersion field for Node objects will be marked as deprecated in v1.29 in preparation for its removal in a future release. This field is not accurate: it is set by the kubelet, which does not actually know the kube-proxy version, or even whether kube-proxy is running.

Want to know more?

Deprecations are announced in the Kubernetes release notes. You can see the announcements of pending deprecations in the release notes for:

We will formally announce the deprecations that come with Kubernetes v1.29 as part of the CHANGELOG for that release.

For information on the deprecation and removal process, refer to the official Kubernetes deprecation policy document.

16 Nov 2023 12:00am GMT

07 Nov 2023

Blog: Introducing SIG etcd

Authors: Han Kang (Google), Marek Siarkowicz (Google), Frederico Muñoz (SAS Institute)

Special Interest Groups (SIGs) are a fundamental part of the Kubernetes project, with a substantial share of the community activity happening within them. When the need arises, new SIGs can be created, and that was precisely what happened recently.

SIG etcd is the most recent addition to the list of Kubernetes SIGs. In this article we will get to know it a bit better and understand its origins, scope, and plans.

The critical role of etcd

If we look inside the control plane of a Kubernetes cluster, we will find etcd, a consistent and highly available key-value store used as Kubernetes' backing store for all cluster data. This description alone highlights the critical role that etcd plays, and its importance within the Kubernetes ecosystem.

This critical role makes the health of the etcd project and community an important consideration, and concerns about the state of the project in early 2022 did not go unnoticed. The changes in the maintainer team, amongst other factors, contributed to a situation that needed to be addressed.

Why a special interest group

With the critical role of etcd in mind, it was proposed that the way forward would be to create a new special interest group. If etcd was already at the heart of Kubernetes, creating a dedicated SIG would not only recognise that role, it would also make etcd a first-class citizen of the Kubernetes community.

Establishing SIG etcd creates a dedicated space to make explicit the contract between etcd and Kubernetes API machinery and to prevent, at the etcd level, changes which violate this contract. Additionally, etcd will be able to adopt the processes that Kubernetes offers its SIGs (KEPs, PRR, phased feature gates, amongst others) in order to improve the consistency and reliability of the codebase. Being able to use these processes will be a substantial benefit to the etcd community.

As a SIG, etcd will also be able to draw contributor support from Kubernetes proper: active contributions to etcd from Kubernetes maintainers would decrease the likelihood of changes that break Kubernetes, through the increased number of potential reviewers and the integration with the existing testing framework. This will benefit not only Kubernetes, which will be better able to participate in and shape the direction of etcd in terms of the critical role it plays, but also etcd as a whole.

About SIG etcd

The recently created SIG is already working towards its goals, defined in its Charter and Vision. The purpose is clear: to ensure etcd is a reliable, simple, and scalable production-ready store for building cloud-native distributed systems and managing cloud-native infrastructure via orchestrators like Kubernetes.

The scope of SIG etcd is not exclusively about etcd as a Kubernetes component; it also covers etcd as a standard solution. Our goal is to make etcd the most reliable key-value storage to be used anywhere, unconstrained by any Kubernetes-specific limits and scaling to meet the requirements of many diverse use-cases.

We are confident that the creation of SIG etcd constitutes an important milestone in the lifecycle of the project, simultaneously improving etcd itself, and also the integration of etcd with Kubernetes. We invite everyone interested in etcd to visit our page, join us at our Slack channel, and get involved in this new stage of etcd's life.

07 Nov 2023 12:00am GMT

03 Nov 2023

Blog: Kubernetes Contributor Summit: Behind-the-scenes

Author: Frederico Muñoz (SAS Institute)

Every year, just before the official start of KubeCon+CloudNativeCon, there's a special event that has a very special place in the hearts of those organizing and participating in it: the Kubernetes Contributor Summit. To find out why, and to provide a behind-the-scenes perspective, we interview Noah Abrahams, who, amongst other roles, was the co-lead for the Kubernetes Contributor Summit in 2023.

Frederico Muñoz (FM): Hello Noah, and welcome. Could you start by introducing yourself and telling us how you got involved in Kubernetes?

Noah Abrahams (NA): I've been in this space for quite a while. I got started in IT in the mid-90s, and I've been working in the "Cloud" space for about 15 years. It was, frankly, through a combination of sheer luck (being in the right place at the right time) and having good mentors to pull me into those places (thanks, Tim!), that I ended up at a startup called Apprenda in 2016. While I was there, they pivoted into Kubernetes, and it was the best thing that could have happened to my career. It was around v1.2 and someone asked me if I could give a presentation on Kubernetes concepts at "my local meetup" in Las Vegas. The meetup didn't exist yet, so I created it, and got involved in the wider community. One thing led to another, and soon I was involved in ContribEx, joined the release team, was doing booth duty for the CNCF, became an ambassador, and here we are today.

The Contributor Summit

KCSEU 2023 group photo

FM: Before leading the organisation of the KCSEU 2023, how many other Contributor Summits were you a part of?

NA: I was involved in four or five before taking the lead. If I'm recalling correctly, I attended the summit in Copenhagen, then sometime in 2018 I joined the wrong meeting, because the summit staff meeting was listed on the ContribEx calendar. Instead of dropping out of the call, I listened a bit, then volunteered to take on some work that didn't look like it had anybody yet dedicated to it. I ended up running Ops in Seattle and helping run the New Contributor Workshop in Shanghai that year. Since then, I've been involved in all but two, since I missed both Barcelona and Valencia.

FM: Have you noticed any major changes in terms of how the conference is organized throughout the years? Namely in terms of number of participants, venues, speakers, themes...

NA: The summit changes over the years with the ebb and flow of the desires of the contributors that attend. While we can typically expect about the same number of attendees, depending on the region that the event is held in, we adapt the style and content greatly based on the feedback that we receive at the end of each event. Some years, contributors ask for more free-style or unconference type sessions, and we plan on having more of those, but some years, people ask for more planned sessions or workshops, so that's what we facilitate. We also have to continually adapt to the venue that we have, the number of rooms we're allotted, how we're going to share the space with other events and so forth. That all goes into the planning ahead of time, from how many talk tracks we'll have, to what types of tables and how many microphones we want in a room.

There has been one very significant change over the years, though, and that is that we no longer run the New Contributor Workshop. While the content was valuable, running the session during the summit never led to any people who weren't already contributing to the project becoming dedicated contributors to the project, so we removed it from the schedule. We'll deliver that content another way, while we'll keep the summit focused on existing contributors.

What makes it special

FM: Going back to the introduction I made, I've heard several participants saying that KubeCon is great, but that the Contributor Summit is for them the main event. In your opinion, why do you think that makes it so?

NA: I think part of it ties into what I mentioned a moment ago, the flexibility in our content types. For many contributors, I think the summit is basically "How KubeCon used to be", back when it was primarily a gathering of the contributors to talk about the health of the project and the work that needed to be done. So, in that context, if the contributors want to discuss, say, a new Working Group, then they have dedicated space to do so in the summit. They also have the space to sit down and hack on a tough problem, discuss architectural philosophy, bring potential problems to more people's attention, refine our methods, and so forth. Plus, the unconference aspect allows for some malleability on the day-of, for whatever is most important right then and there. Whatever folks want to get out of this environment is what we'll provide, and having a space and time specifically to address your particular needs is always going to be well received.

Let's not forget the social aspect, too. Despite the fact that we're a global community and work together remotely and asynchronously, it's still easier to work together when you have a personal connection, and can put a face to a GitHub handle. Zoom meetings are a good start, but even a single instance of in-person time makes a big difference in how people work together. So, getting folks together a couple times a year makes the project run more smoothly.

Organizing the Summit

FM: In terms of the organization team itself, could you share with us a general overview of the staffing process? Who are the people that make it happen? How many different teams are involved?

NA: There's a bit of the "usual suspects" involved in making this happen, many of whom you'll find in the ContribEx meetings, but really it comes down to whoever is going to step up and do the work. We start with a general call out for volunteers from the org. There's a GitHub issue where we'll track the staffing and that will get shouted out to all the usual comms channels: Slack, k-dev, etc.

From there, there's a handful of different teams, overseeing content/program committee, registration, communications, day-of operations, the awards the SIGs present to their members, the after-summit social event, and so on. The leads for each team/role are generally picked from folks who have stepped up and worked the event before, either as a shadow, or a previous lead, so we know we can rely on them, which is a recurring theme. The leads pick their shadows from whoever pipes up on the issue, and the teams move forward, operating according to their role books, which we try to update at the end of each summit, with what we've learned over the past few months. It's expected that a shadow will be in line to lead that role at some point in a future summit, so we always have a good bench of folks available to make this event happen. A couple of the roles also have some non-shadow volunteers where people can step in to help a bit, like as an on-site room monitor, and get a feel for how things are put together without having to give a serious up-front commitment, but most of the folks working the event are dedicated to both making the summit successful, and coming back to do so in the future. Of course, the roster can change over time, or even suddenly, as people gain or lose travel budget, get new jobs, only attend Europe or North America or Asia, etc. It's a constant dance, relying 100% on the people who want to make this project successful.

Last, but not least, is the Summit lead. They have to keep the entire process moving forward, be willing to step in to keep bike-shedding from derailing our deadlines, make sure the right people are talking to one another, lead all our meetings to make sure everyone gets a voice, etc. In some cases, the lead has to even be willing to take over an entirely separate role, in case someone gets sick or has any other extenuating circumstances, to make sure absolutely nothing falls through the cracks. The lead is only allowed to volunteer after they've been through this a few times and know what the event entails. Event planning is not for the faint of heart.

FM: The participation of volunteers is essential, but there's also the topic of CNCF support: how does this dynamic play out in practice?

NA: This event would not happen in its current form without our CNCF liaison. They provide us with space, make sure we are fed and caffeinated and cared for, bring us outside spaces to evaluate, so we have somewhere to hold the social gathering, get us the budget so we have t-shirts and patches and the like, and generally make it possible for us to put this event together. They're even responsible for the signage and arrows, so the attendees know where to go. They're the ones sitting at the front desk, keeping an eye on everything and answering people's questions. At the same time, they're along to facilitate, and try to avoid influencing our planning.

There's a ton of work that goes into making the summit happen that is easy to overlook, as an attendee, because people tend to expect things to just work. It is not exaggerating to say this event would not have happened like it has over the years, without the help from our liaisons, like Brienne and Deb. They are an integral part of the team.

A look ahead

FM: Currently, we're preparing the NA 2023 summit, how is it going? Any changes in format compared with previous ones?

NA: I would say it's going great, though I'm sort of emeritus lead for this event, mostly picking up the things that I see need to be done and don't have someone assigned to them. We're always learning from our past experiences and making small changes to continually be better, from how many people need to be on a particular rotation to how far in advance we open and close the CFP. There are no major changes right now, just continually providing the content that the contributors want.

FM: For our readers that might be interested in joining in the Kubernetes Contributor Summit, is there anything they should know?

NA: First of all, the summit is an event by and for Org members. If you're not already an org member, you should be getting involved before trying to attend the summit, as the content is curated specifically towards the contributors and maintainers of the project. That applies to the staff, as well, as all the decisions should be made with the interests and health of Kubernetes contributors being the end goal. We get a lot of people who show interest in helping out, but then aren't ready to make any sort of commitment, and that just makes more work for us. If you're not already a proven and committed member of this community, it's difficult for us to place you in a position that requires reliability. We have made some rare exceptions when we need someone local to help us out, but those are few and far between.

If you are, however, already a member, we'd love to have you. The more people that are involved, the better the event becomes. That applies to both dedicated staff, and those in attendance bringing CFPs, unconference topics, and just contributing to the discussions. If you're part of this community and you're going to be at KubeCon, I would highly urge you to attend, and if you're not yet an org member, let's make that happen!

FM: Indeed! Any final comments you would like to share?

NA: Just that the Contributor Summit is, for me, the ultimate manifestation of the Hallway Track. By being here, you're part of the conversations that move this project forward. It's good for you, and it's good for Kubernetes. I hope to see you all in Chicago!

03 Nov 2023 12:00am GMT

02 Nov 2023

Blog: Spotlight on SIG Architecture: Production Readiness

Author: Frederico Muñoz (SAS Institute)

This is the second interview of a SIG Architecture Spotlight series that will cover the different subprojects. In this blog, we will cover the SIG Architecture: Production Readiness subproject.

In this SIG Architecture spotlight, we talked with Wojciech Tyczynski (Google), lead of the Production Readiness subproject.

About SIG Architecture and the Production Readiness subproject

Frederico (FSM): Hello Wojciech, could you tell us a bit about yourself, your role and how you got involved in Kubernetes?

Wojciech Tyczynski (WT): I started contributing to Kubernetes in January 2015. At that time, Google (where I was and still am working) decided to start a Kubernetes team in the Warsaw office (in addition to already existing teams in California and Seattle). I was lucky enough to be one of the seeding engineers for that team.

After two months of onboarding and helping with different tasks across the project towards 1.0 launch, I took ownership of the scalability area and I was leading Kubernetes to support clusters with 5000 nodes. I'm still involved in SIG Scalability as its Technical Lead. That was the start of a journey since scalability is such a cross-cutting topic, and I started contributing to many other areas including, over time, to SIG Architecture.

FSM: In SIG Architecture, why specifically the Production Readiness subproject? Was it something you had in mind from the start, or was it an unexpected consequence of your initial involvement in scalability?

WT: After reaching that milestone of Kubernetes supporting 5000-node clusters, one of the goals was to ensure that Kubernetes would not degrade its scalability properties over time. While non-scalable implementation is always fixable, designing non-scalable APIs or contracts is problematic. I was looking for a way to ensure that people are thinking about scalability when they create new features and capabilities without introducing too much overhead.

This is when I joined forces with John Belamaric and David Eads and created a Production Readiness subproject within SIG Architecture. While setting the bar for scalability was only one of a few motivations for it, it ended up fitting quite well. At the same time, I was already involved in the overall reliability of the system internally, so other goals of Production Readiness were also close to my heart.

FSM: To anyone new to how SIG Architecture works, how would you describe the main goals and areas of intervention of the Production Readiness subproject?

WT: The goal of the Production Readiness subproject is to ensure that any feature that is added to Kubernetes can be reliably used in production clusters. This primarily means that those features are observable, scalable, and supportable, and that they can always be safely enabled and, in case of production issues, also disabled.

Production readiness and the Kubernetes project

FSM: Architectural consistency being one of the goals of the SIG, is this made more challenging by the distributed and open nature of Kubernetes? Do you feel this impacts the approach that Production Readiness has to take?

WT: The distributed nature of Kubernetes certainly impacts Production Readiness, because it makes thinking about aspects like enablement/disablement or scalability more challenging. To be more precise, when enabling or disabling features that span multiple components you need to think about version skew between them and design for it. For scalability, changes in one component may actually result in problems for a completely different one, so it requires a good understanding of the whole system, not just individual components. But it's also what makes this project so interesting.

FSM: Those running Kubernetes in production will have their own perspective on things, how do you capture this feedback?

WT: Fortunately, we aren't talking about "them" here, we're talking about "us": all of us are working for companies that are managing large fleets of Kubernetes clusters and we're involved in that too, so we suffer from those problems ourselves.

So while we're trying to get feedback (our annual PRR survey is very important for us), it rarely reveals completely new problems - it rather shows the scale of them. And we try to react to it - changes like "Beta APIs off by default" happen in reaction to the data that we observe.

FSM: On the topic of reaction, that made me think of how the Kubernetes Enhancement Proposal (KEP) template has a Production Readiness Review (PRR) section, which is tied to the graduation process. Was this something born out of identified insufficiencies? How would you describe the results?

WT: As mentioned above, the overall goal of the Production Readiness subproject is to ensure that every newly added feature can be reliably used in production. It's not possible to enforce that by a central team - we need to make it everyone's problem.

To achieve it, we wanted to ensure that everyone designing their new feature is thinking about safe enablement, scalability, observability, supportability, etc. from the very beginning. Which means not when the implementation starts, but rather during the design. Given that KEPs are effectively Kubernetes design docs, making it part of the KEP template was the way to achieve the goal.

FSM: So, in a way making sure that feature owners have thought about the implications of their proposal.

WT: Exactly. We already observed that just by forcing feature owners to think through the PRR aspects (via forcing them to fill in the PRR questionnaire) many of the original issues are going away. Sure - as PRR approvers we're still catching gaps, but even the initial versions of KEPs are better now than they used to be a couple of years ago in what concerns thinking about productionisation aspects, which is exactly what we wanted to achieve - spreading the culture of thinking about reliability in its widest possible meaning.

FSM: We've been talking about the PRR process, could you describe it for our readers?

WT: The PRR process is fairly simple - we just want to ensure that you think through the productionisation aspects of your feature early enough. If you do your job, it's just a matter of answering some questions in the KEP template and getting approval from a PRR approver (in addition to regular SIG approval). If you didn't think about those aspects earlier, it may require spending more time and potentially revising some decisions, but that's exactly what we need to make the Kubernetes project reliable.

Helping with Production Readiness

FSM: Production Readiness seems to be one area where a good deal of prior exposure is required in order to be an effective contributor. Are there also ways for someone newer to the project to contribute?

WT: PRR approvers have to have a deep understanding of the whole Kubernetes project to catch potential issues. Kubernetes is such a large project now with so many nuances that people who are new to the project can simply miss the context, no matter how senior they are.

That said, there are many ways that you may implicitly help. Increasing the reliability of particular areas of the project by improving its observability and debuggability, increasing test coverage, and building new kinds of tests (upgrade, downgrade, chaos, etc.) will help us a lot. Note that the PRR subproject is focused on keeping the bar at the design level, but we should also care equally about the implementation. For that, we're relying on individual SIGs and code approvers, so having people there who are aware of productionisation aspects, and who deeply care about it, will help the project a lot.

FSM: Thank you! Any final comments you would like to share with our readers?

WT: I would like to highlight and thank all contributors for their cooperation. While the PRR adds some additional work for them, we see that people care about it, and what's even more encouraging is that with every release the quality of the answers improves, and questions "do I really need a metric reflecting if my feature works" or "is downgrade really that important" don't really appear anymore.

02 Nov 2023 12:00am GMT

31 Oct 2023

Blog: Gateway API v1.0: GA Release

Authors: Shane Utt (Kong), Nick Young (Isovalent), Rob Scott (Google)

On behalf of Kubernetes SIG Network, we are pleased to announce the v1.0 release of Gateway API! This release marks a huge milestone for this project. Several key APIs are graduating to GA (generally available), while other significant features have been added to the Experimental channel.

What's new

Graduation to v1

This release includes the graduation of Gateway, GatewayClass, and HTTPRoute to v1, which means they are now generally available (GA). This API version denotes a high level of confidence in the API surface and provides guarantees of backwards compatibility. Note that although the versions of these APIs included in the Standard channel are now considered stable, that does not mean that they are complete. These APIs will continue to receive new features via the Experimental channel as they meet graduation criteria. For more information on how all of this works, refer to the Gateway API Versioning Policy.

Logo

Gateway API now has a logo! This logo was designed through a collaborative process, and is intended to represent the idea that this is a set of Kubernetes APIs for routing traffic both north-south and east-west:

Gateway API Logo

CEL Validation

Historically, Gateway API has bundled a validating webhook as part of installing the API. Starting in v1.0, webhook installation is optional and only recommended for Kubernetes 1.24. Gateway API now includes CEL validation rules as part of the CRDs. This new form of validation is supported in Kubernetes 1.25+, and thus the validating webhook is no longer required in most installations.
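To give a feel for what a CEL rule embedded in a CRD looks like, here is a minimal sketch. The Widget resource, its fields, and the rule itself are hypothetical and are not taken from the Gateway API CRDs; the example only shows the general x-kubernetes-validations mechanism that makes a separate validating webhook unnecessary for this kind of check.

```yaml
# Hypothetical CRD used only to illustrate CEL validation; not part of Gateway API.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            required: ["minReplicas", "maxReplicas"]
            properties:
              minReplicas:
                type: integer
              maxReplicas:
                type: integer
            x-kubernetes-validations:
            # CEL expression evaluated by the API server when the object is written.
            - rule: "self.minReplicas <= self.maxReplicas"
              message: "minReplicas must not exceed maxReplicas"
```

Because the rule lives in the schema, it is applied wherever the CRD is installed, with no extra component to deploy or keep available.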

Standard channel

This release was primarily focused on ensuring that the existing beta APIs were well defined and sufficiently stable to graduate to GA. That led to a variety of spec clarifications, as well as some improvements to status to improve the overall UX when interacting with Gateway API.

Experimental channel

Most of the changes included in this release were limited to the experimental channel. These include HTTPRoute timeouts, TLS config from Gateways to backends, WebSocket support, Gateway infrastructure labels, and more. Stay tuned for a follow up blog post that will cover each of these new features in detail.

Everything else

For a full list of the changes included in this release, please refer to the v1.0.0 release notes.

How we got here

The idea of Gateway API was initially proposed 4 years ago at KubeCon San Diego as the next generation of Ingress API. Since then, an incredible community has formed to develop what has likely become the most collaborative API in Kubernetes history. Over 170 people have contributed to this API so far, and that number continues to grow.

A special thank you to the 20+ community members who agreed to take on an official role in the project, providing some time for reviews and sharing the load of maintaining the project!

We especially want to highlight the emeritus maintainers that played a pivotal role in the early development of this project:

Try it out

Unlike other Kubernetes APIs, you don't need to upgrade to the latest version of Kubernetes to get the latest version of Gateway API. As long as you're running one of the 5 most recent minor versions of Kubernetes (1.24+), you'll be able to get up and running with the latest version of Gateway API.

To try out the API, follow our Getting Started guide.

What's next

This release is just the beginning of a much larger journey for Gateway API, and there are still plenty of new features and new ideas in flight for future releases of the API.

One of our key goals going forward is to work to stabilize and graduate other experimental features of the API. These include support for service mesh, additional route types (GRPCRoute, TCPRoute, TLSRoute, UDPRoute), and a variety of experimental features.

We've also been working towards moving ReferenceGrant into a built-in Kubernetes API that can be used for more than just Gateway API. Within Gateway API, we've used this resource to safely enable cross-namespace references, and that concept is now being adopted by other SIGs. The new version of this API will be owned by SIG Auth and will likely include at least some modifications as it migrates to a built-in Kubernetes API.
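As a rough sketch of how ReferenceGrant enables a cross-namespace reference today, the manifest below lets HTTPRoutes in one namespace point at Services in another. The namespace and object names are invented for this example, and the API version shown is the Gateway API one, not the future built-in API mentioned above.

```yaml
# Lives in the namespace that owns the referenced Services.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-routes-from-apps   # illustrative name
  namespace: backend             # illustrative namespace (target side)
spec:
  from:
  # Who is allowed to make the reference.
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    namespace: apps              # illustrative namespace (source side)
  to:
  # What they are allowed to reference ("" is the core API group).
  - group: ""
    kind: Service
```

The grant is created by the owner of the target namespace, so a reference across namespaces only works when the receiving side has explicitly agreed to it.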

Gateway API at KubeCon + CloudNativeCon

At KubeCon North America (Chicago) and the adjacent Contributor Summit there are several talks related to Gateway API that will go into more detail on these topics. If you're attending either of these events this year, consider adding these to your schedule.

Contributor Summit:

KubeCon Main Event:

KubeCon Office Hours:

Gateway API maintainers will be holding office hours sessions at KubeCon if you'd like to discuss or brainstorm any related topics. To get the latest updates on these sessions, join the #sig-network-gateway-api channel on Kubernetes Slack.

Get involved

We've only barely scratched the surface of what's in flight with Gateway API. There are lots of opportunities to get involved and help define the future of Kubernetes routing APIs for both Ingress and Mesh.

If this is interesting to you, please join us in the community and help us build the future of Gateway API together!

31 Oct 2023 6:00pm GMT

25 Oct 2023

Blog: Introducing ingress2gateway; Simplifying Upgrades to Gateway API

Authors: Lior Lieberman (Google), Kobi Levi (independent)

Today we are releasing ingress2gateway, a tool that can help you migrate from Ingress to Gateway API. Gateway API is just weeks away from graduating to GA. If you haven't upgraded yet, now's the time to think about it!

Background

In the ever-evolving world of Kubernetes, networking plays a pivotal role. As more applications are deployed in Kubernetes clusters, effective exposure of these services to clients becomes a critical concern. If you've been working with Kubernetes, you're likely familiar with the Ingress API, which has been the go-to solution for managing external access to services.

The Ingress API provides a way to route external traffic to your applications within the cluster, making it an indispensable tool for many Kubernetes users. Ingress has its limitations however, and as applications become more complex and the demands on your Kubernetes clusters increase, these limitations can become bottlenecks.

Some of the limitations are:

Gateway API

To overcome these limitations, Gateway API is designed to provide a more flexible, extensible, and powerful way to manage traffic to your services.

Gateway API is just weeks away from a GA (General Availability) release. It provides a standard Kubernetes API for ingress traffic control. It offers extended functionality, improved customization, and greater flexibility. By focusing on modular and expressive API resources, Gateway API makes it possible to describe a wider array of routing configurations and models.

The transition from Ingress API to Gateway API in Kubernetes is driven by advantages and advanced functionalities that Gateway API offers, with its foundation built on four core principles: a role-oriented approach, portability, expressiveness and extensibility.

A role-oriented approach

Gateway API employs a role-oriented approach that aligns with the conventional roles within organizations involved in configuring Kubernetes service networking. This approach enables infrastructure engineers, cluster operators, and application developers to collectively address different aspects of Gateway API.

For instance, infrastructure engineers play a pivotal role in deploying GatewayClasses, cluster-scoped resources that act as templates to explicitly define behavior for Gateways derived from them, laying the groundwork for robust service networking.

Subsequently, cluster operators utilize these GatewayClasses to deploy gateways. A Gateway in Kubernetes' Gateway API defines how external traffic can be directed to Services within the cluster, essentially bridging non-Kubernetes sources to Kubernetes-aware destinations. It represents a request for a load balancer configuration aligned with a GatewayClass' specification. The Gateway spec may not be exhaustive as some details can be supplied by the GatewayClass controller, ensuring portability. Additionally, a Gateway can be linked to multiple Route references to channel specific traffic subsets to designated services.

Lastly, application developers configure route resources (such as HTTPRoutes) to manage configuration (e.g. timeouts, request matching/filter) and Service composition (e.g. path routing to backends). Route resources define protocol-specific rules for mapping requests from a Gateway to Kubernetes Services. HTTPRoute is for multiplexing HTTP or terminated HTTPS connections. It's intended for use in cases where you want to inspect the HTTP stream and use HTTP request data for either routing or modification, for example using HTTP Headers for routing, or modifying them in-flight.

Diagram showing the key resources that make up Gateway API and how they relate to each other. The resources shown are GatewayClass, Gateway, and HTTPRoute; the Service API is also shown

Portability

With more than 20 API implementations, Gateway API is designed to be more portable across different implementations, clusters and environments. It helps reduce Ingress' reliance on non-portable, provider-specific annotations, making your configurations more consistent and easier to manage across multiple clusters.

Gateway API commits to supporting the 5 latest Kubernetes minor versions. That means that Gateway API currently supports Kubernetes 1.24+.

Expressiveness

Gateway API provides standard, Kubernetes-backed support for a wide range of features, such as header-based matching, traffic splitting, weight-based routing, request mirroring and more. With Ingress, these features need custom provider-specific annotations.
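As a rough illustration of the features listed above, the sketch below builds an HTTPRoute that combines header-based matching with weight-based traffic splitting and prints it as JSON. The route, Gateway, Service and header names are hypothetical, and the `gateway.networking.k8s.io/v1` API version assumes the v1.0 CRDs are installed (older installations would use `v1beta1` instead).

```python
import json


def weighted_route(name, gateway, header, value, backends, port=80):
    """Build an HTTPRoute that matches one header and splits traffic by weight."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name},
        "spec": {
            # The Gateway this route attaches to.
            "parentRefs": [{"name": gateway}],
            "rules": [
                {
                    # Only requests carrying this exact header value match.
                    "matches": [
                        {"headers": [{"type": "Exact", "name": header, "value": value}]}
                    ],
                    # Matching traffic is split in proportion to the weights.
                    "backendRefs": [
                        {"name": service, "port": port, "weight": weight}
                        for service, weight in backends
                    ],
                }
            ],
        },
    }


# Hypothetical names: send 10% of requests with "x-canary: true" to checkout-v2.
route = weighted_route(
    "checkout-canary",
    "example-gateway",
    "x-canary",
    "true",
    [("checkout-v1", 90), ("checkout-v2", 10)],
)
print(json.dumps(route, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be piped to `kubectl apply -f -`. With Ingress, the same behavior would typically depend on provider-specific annotations.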

Extensibility

Gateway API is designed with extensibility as a core feature. Rather than enforcing a one-size-fits-all model, it offers the flexibility to link custom resources at multiple layers within the API's framework. This layered approach to customization ensures that users can tailor configurations to their specific needs without overwhelming the main structure. By doing so, Gateway API facilitates more granular and context-sensitive adjustments, allowing for a fine-tuned balance between standardization and adaptability. This becomes particularly valuable in complex cloud-native environments where specific use cases require nuanced configurations. A critical difference is that Gateway API has a much broader base set of features and a standard pattern for extensions that can be more expressive than annotations were on Ingress.

Upgrading to Gateway

Migrating from Ingress to Gateway API may seem intimidating, but luckily Kubernetes just released a tool to simplify the process. ingress2gateway assists in the migration by converting your existing Ingress resources into Gateway API resources. Here is how you can get started with Gateway API using ingress2gateway:

  1. Install a Gateway controller OR install the Gateway API CRDs manually.

  2. Install ingress2gateway.

    If you have a Go development environment locally, you can install ingress2gateway with:

    go install github.com/kubernetes-sigs/ingress2gateway@v0.1.0
    

    This installs ingress2gateway to $(go env GOPATH)/bin/ingress2gateway.

    Alternatively, follow the installation guide here.

  3. Once the tool is installed, you can use it to convert the ingress resources in your cluster to Gateway API resources.

    ingress2gateway print
    

    The above command will:

    1. Load your current Kubernetes client config including the active context, namespace and authentication details.
    2. Search for ingresses and provider-specific resources in that namespace.
    3. Convert them to Gateway API resources (Currently only Gateways and HTTPRoutes). For other options you can run the tool with -h, or refer to https://github.com/kubernetes-sigs/ingress2gateway#options.
  4. Review the converted Gateway API resources, validate them, and then apply them to your cluster.

  5. Send test requests to your Gateway to check that it is working. You could get your gateway address using kubectl get gateway <gateway-name> -n <namespace> -o jsonpath='{.status.addresses}{"\n"}'.

  6. Update your DNS to point to the new Gateway.

  7. Once you've confirmed that no more traffic is going through your Ingress configuration, you can safely delete it.

Wrapping up

Achieving reliable, scalable and extensible networking has always been a challenging objective. The Gateway API is designed to improve on current Kubernetes networking standards like Ingress and reduce the need for implementation-specific annotations and CRDs.

It is a Kubernetes standard API, consistent across different platforms and implementations, and most importantly it is future-proof. Gateway API is the next generation of the Ingress API, but has a larger scope than that, expanding to tackle mesh and layer 4 routing as well. Gateway API and ingress2gateway are supported by a dedicated team under SIG Network that actively works on them and manages the ecosystem. They are also likely to receive more updates and community support.

The Road Ahead

ingress2gateway is just getting started. We're planning to onboard more providers, introduce support for more types of Gateway API routes, and make sure everything syncs up smoothly with the ongoing development of Gateway API.

Excitingly, Gateway API is also making significant strides. While v1.0 is about to launch, there's still a lot of work ahead. This release incorporates many new experimental features, with additional functionalities currently in the early stages of planning and development.

If you're interested in helping to contribute, we would love to have you! Please check out the community page which includes links to the Slack channel and community meetings. We look forward to seeing you!!

Useful Links

25 Oct 2023 6:00pm GMT

24 Oct 2023

feedKubernetes – Production-Grade Container Orchestration

Blog: Plants, process and parties: the Kubernetes 1.28 release interview

Author: Craig Box

Since 2018, one of my favourite contributions to the Kubernetes community has been to share the story of each release. Many of these stories were told on behalf of a past employer; by popular demand, I've brought them back, now under my own name. If you were a fan of the old show, I would be delighted if you would subscribe.

Back in August, we welcomed the release of Kubernetes 1.28. That release was led by Grace Nguyen, a CS student at the University of Waterloo. Grace joined me for the traditional release interview, and while you can read her story below, I encourage you to listen to it if you can.

This transcript has been lightly edited and condensed for clarity.


You're a student at the University of Waterloo, so I want to spend the first two minutes of this interview talking about the Greater Kitchener-Waterloo region. It's August, so this is one of the four months of the year when there's no snow visible on the ground?
Well, it's not that bad. I think the East Coast has it kind of good. I grew up in Calgary, but I do love summer here in Waterloo. We have a petting zoo close to our university campus, so I go and see the llamas sometimes.

Is that a new thing?
I'm not sure, it seems like it's been around five-ish years, the Waterloo Park?

I lived there in 2007, for a couple of years, just to set the scene for why we're talking about this. I think they were building a lot of the park then. I do remember, of course, that Kitchener holds the second largest Oktoberfest in the world. Is that something you've had a chance to check out?
I have not. I actually didn't know that was a fact.

The local civic organization is going to have to do a bit more work, I feel. Do you like ribs?
I have mixed feelings about ribs. It's kind of a hit or miss situation for me so far.

Again, that might be something that's changed over the last few years. The Ribfests used to have a lot of trophies with little pigs on top of them, but I feel that the shifting dining habits of the world might mean they have to offer some vegan or vegetarian options, to please the modern palate.
[LAUGHS] For sure. Do you recommend the Oktoberfest here? Have you been?

I went a couple of times. It was a lot of fun.
Okay.

It's basically just drinking. I would have recommended it back then; I'm not sure it would be quite what I'd be doing today.
All right, good to know.

The Ribfest, however, I would go back just for that.
Oh, ok.

And the great thing about Ribfests as a concept is that they have one in every little town. The Kitchener Ribfest, I looked it up, it's in July; you've just missed that. But, you could go to the Waterloo Ribfest in September.
Oh, it is in September? They have their own Ribfest?

They do. I think Guelph has one, and Cambridge has one. That's the advantage of the region - there are lots of little cities. Kitchener and Waterloo are two cities that grew into each other - they do call them the Twin Cities. I hear that they finally built the light rail link between the two of them?
It is fantastic, and makes the city so much more walkable.

Yes, you can go from one mall to the other. That's Canada for you.
Well, Uptown is really nice. I quite like it. It's quite cozy.

Do you ever cross the border over into Kitchener? Or only when you've lost a bet?
Yeah, not a lot. Only for farmer's market, I say.

It's worthwhile. There's a lot of good food there, I remember.
Yeah. Quite lovely.

Now we've got all that out of the way, let's travel back in time a little bit. You mentioned there that you went to high school in Calgary?
I did. I had not been to Ontario before I went to university. Calgary was frankly too cold and not walkable enough for me.

I basically say the same thing about Waterloo and that's why I moved to England.
Fascinating. Gets better.

How did you get into tech?
I took a computer science class in high school. I was one of maybe only three women in the class, and I kind of stuck with it since.

Was the gender distribution part of your thought process at the time?
Yeah, I think I was drawn to it partially because I didn't see a lot of people who looked like me in the class.

You followed it through to university. What is it that you're studying?
I am studying computer engineering, so a lot of hardware stuff.

You're involved in the UW Cybersecurity Club. What can you tell me about that without having to kill me?
Oh, we are very nice and friendly people! I told myself I'm going to have a nice and chill summer and then I got chosen to lead the release and also ended up running the Waterloo Cybersecurity Club. The club kind of died out during the pandemic, because we weren't on campus, but we have so many smart and amazing people who are in cybersecurity, so it's great to get them together and I learned so many things.

Is that like the modern equivalent of the LAN party? You're all getting into a dark room and trying to hack the Gibson?
[LAUGHS] Well, you'll have to explain to me again what a LAN party is. Do you bring your own PC?

You used to. Back in the day it was incomprehensible that you could communicate with a different person in a different place at a fast enough speed, so you had to physically sit next to somebody and plug a cable in between you.
Okay, well kind of the same, I guess. We bring our own laptop and we go to CTF competitions together.

They didn't have laptops back in the days of LAN parties. You'd bring a giant 19-inch square monitor, and everything. It was a badge of honor what you could carry.
Okay. Can't relate, but good to know. [LAUGHS]

One of the more unique aspects of UW is its co-op system. Tell us a little bit about that?
As part of my degree, I am required to do minimum five and maximum six co-ops. I've done all six of them. Two of them were in Kubernetes and that's how I got started.

A co-op is a placement, as opposed to something you do on campus?
Right, so co-op is basically an internship. My first one was at the Canada Revenue Agency. We didn't have wifi and I had my own cubicle, which is interesting. They don't do that anymore, they have open office space. But my second was at Ericsson, where I learned about Kubernetes. It was during the pandemic. KubeCon offered virtual attendance for students and I signed up and I poked around and I have been around since.

What was it like going through university during the COVID years? What did that mean in terms of the fact you would previously have traveled to these internships? Did you do them all from home?
I'm not totally sure what I missed out on. For sure, a lot of relationship building, but also that we do have to move a lot as part of the co-op experience. Last fall I was in San Francisco, I was in Palo Alto earlier this year. A lot of that dynamic has already been the case.

Definitely different weather systems, Palo Alto versus Waterloo.
Oh, for sure. Yes, yes. Really glad I was there over the winter.

The first snow would fall in Ontario about the end of October and it would pile up over the next few months. There were still piles that hadn't melted by June. That's why I say, there were only four months of the year, July through September, where there was no snow on the ground.
That's true. Didn't catch any snow in Palo Alto, and honestly, that's great. [CHUCKLES]

Thank you, global warming, I guess.
Oh no! [LAUGHS]

Tell me about the co-op term that you did working with Kubernetes at Ericsson?
This was such a long time ago, but we were trying to build some sort of pipeline to deploy testing. It was running inside a cluster, and I learned Helm charts and all that good stuff. And then, for the co-op after that, I worked at a Canadian startup in FinTech. It was 24/7 Kubernetes, building their secret injection system, using ArgoCD to automatically pull secrets from 1Password.

How did that lead you on to involvement with the release team?
It was over the pandemic, so I didn't have a lot to do, I went to the conference, saw so many cool talks. One that really stuck out to me was a Kubernetes hacking talk by Tabitha Sable and V Korbes. I thought it was the most amazing thing and it was so cool. One of my friends was on the release team at the time, and she showed me what she does. I applied and thankfully got in. I didn't have any open source experience. It was fully like one of those things where someone took a chance on me.

How would you characterize the experience that you've had to date? You have had involvement with pretty much every release since then.
Yeah, I think it was a really formative experience, and the community has been such a big part of it.

You started as an enhancement shadow with Kubernetes 1.22, eventually moving up to enhancements lead, then you moved on to be the release lead shadow. Obviously, you are the lead for 1.28, but for 1.27 you did something a bit different. What was that, and why did you do it?
For 1.25 and 1.26, I was release lead shadow, so I had an understanding of what that role was like. I wanted to shadow another team, and at that time I thought CI Signal was a big black box to me. I joined the team, but I also had capacity for other things, I joined as a branch manager associate as well.

What is the difference between that role and the traditional release team roles we think about?
Yeah, that's a great question. So the branch management role is a more constant role. They don't necessarily get swapped out every release. You shadow as an associate, so you do things like cut releases, distribute them, update distros, things like that. It's a really important role, and the folks that are in there are more technical. So if you have been on the release team for a long time and are looking for a more permanent role, I recommend looking into that.

Congratulations again on the release of 1.28 today.
Yeah, thank you.

What is the best new feature in Kubernetes 1.28, and why is it sidecar container support?
Great question. I am as excited as you. In 1.28, we have a new feature in alpha, which is sidecar container support. We introduced a new field called restartPolicy for init containers, that allows the containers to live throughout the life cycle of the pod and not block the pod from terminating. Craig, you know a lot about this, but there are so many use cases for this. It is a very common pattern. You use it for logging, monitoring, metrics; also configs and secrets as well.

And the service mesh!
And the service mesh.

Very popular. I will say that the Sidecar pattern was called out very early on, in a blog post Brendan Burns wrote, talking about how you can achieve some of the things you just mentioned. Support for it in Kubernetes has been- it's been a while, shall we say. I've been doing these interviews since 2018, and September 2019 was when I first had a conversation with a release manager who felt they had to apologize for Sidecar containers not shipping in that release.
Well, here we are!

Thank you for not letting the side down.
[LAUGHS]

There are a bunch of other features that are going to GA in 1.28. Tell me about what's new with kubectl events?
It got a new CLI and now it is separate from kubectl get. I think that changes in the CLI are always a little bit more apparent because they are user-facing.

Are there a lot of other user-facing changes, or are most of the things in the release very much behind the scenes?
I would say it's a good mix of both; it depends on what you're interested in.

I am interested, of course, in non-graceful node shutdown support. What can you tell us about that?
Right, so for situations where you have a hardware failure or a broken OS, we have added additional support for a better graceful shutdown.

If someone trips over the power cord at your LAN party and your cluster goes offline as a result?
Right, exactly. More availability! That's always good.

And if it's not someone tripping over your power cord, it's probably DNS that broke your cluster. What's changed in terms of DNS configuration?
Oh, we introduced a new feature gate to allow more DNS search path.

Is that all there is to it?
That's pretty much it. [LAUGHING] Yeah, you can have more and longer DNS search path.

It can never be long enough. Just search everything! If .com doesn't work, try .net and try .io after that.
Surely.

Those are a few of the big features that are moving to stable. Obviously, over the course of the last few releases, features come in, moving from Alpha to Beta and so on. New features coming in today might not be available to people for a while. As you mentioned, there are feature gates that you can enable to allow people to have access to these. What are some of the newest features that have been introduced that are in Alpha, that are particularly interesting to you personally?
I have two. The first one is kubectl delete --interactive. I'm always nervous when I delete something, you know, it's going to be a typo or it's going to be on the wrong tab. So we have an --interactive flag for that now.

So you can get feedback on what you're about to delete before you do it?
Right; confirmation is good!

You mentioned two there, what was the second one?
Right; this one is close to my heart. It is a SIG Release KEP, publishing on community infrastructure. I'm not sure if you know, but as part of my branch management associate role in 1.27, I had the opportunity to cut a few releases. It takes up to 12 hours sometimes. And now, we are hoping that the process only includes release managers, so we don't have to call up the folks at Google and, you know, lengthen that process anymore.

Is 12 hours the expected length for software of this size, or is there work in place to try and bring that down?
There's so much work in place to bring that down. I think 12 hours is on the shorter end of it. Unfortunately, we have had a situation where we have to, you know, switch the release manager because it's just so late at night for them.

They've fallen asleep halfway through?
Exactly, yeah. 6 to 12 hours, I think, is our status quo.

The theme for this release is "Planternetes". That's going to need some explanation, I feel.
Okay. I had full creative control over this. It is summer in the northern hemisphere, and I am a big house plant fanatic. It's always a little sad when I have to move cities for co-op and can't take my plants with me.

Is that a border control thing? They don't let you take them over the border?
It's not even that; they're just so clunky and fragile. It's usually not worth the effort. But I think our community is very much like a garden. We have very critical roles in the ecosystem and we all have to work together.

Will you be posting seeds out to contributors and growing something together all around the world?
That would be so cool if we had merch, like a little card with seeds embedded in it. I don't think we have the budget for that though. [LAUGHS]

You say that. There are people who are inspired in many different areas. I love talking to the release managers and hearing the things that they're interested in. You should think about taking some seeds off one of your plants, and just spreading them around the world. People can take pictures, and tag you in them on Instagram.
That's cool. You know how we have a SIG Beard? We can have a SIG Plant.

You worked for a long time with the release lead for 1.27, Xander Grzywinski. One of the benefits of having done my interview with him in writing and not as a podcast is I didn't have to try and butcher pronouncing his surname. Can you help me out here?
I unfortunately cannot. I don't want to butcher it either!

Anyway, Xander told me that he suspected that in this release you would have to deal with some very last-minute PRs, as is tradition. Was that the case?
I vividly remember the last minute PRs from last release because I was trying to cut the releases, as part of the branch management team. Thankfully, that was not the case this release. We have had other challenges, of course.

Can you tell me some of those challenges?
I think improvement on documentation is always a big part. The KEP process can be very daunting to new contributors. How do you get people to review your KEPs? How do you opt in? All that stuff. We're improving documentations for that.

As someone who has been through a lot of releases, I've been feeling, like you've said, that the last minute nature has slowed down a little. The process is perhaps improving. Do you see that, or do you think there's still a long way to go for the leads to improve it?
I think we've come very far. When I started in 1.22, we were using spreadsheets to track a hundred enhancements. It was a monster; I was terrified to touch it. Now, we're on GitHub boards. As a result of that, we are actually merging the bug triage and CI Signal team in 1.29.

What's the impact of that?
The bug triage team is now using the GitHub board to track issues, which is much more efficient. We are able to merge the two teams together.

I have heard a rumor that GitHub boards are powered by spreadsheets underneath.
Honestly, even if that's true, the fact that it's on the same platform and it has better version control is just magical.

At this time, the next release lead has not yet been announced, but tradition dictates that you write down your feelings, best wishes and instructions to them in an envelope, which you'll leave in their desk drawer. What are you going to put inside that envelope?
Our 1.28 release lead is fantastic and they're so capable of handling the release-

That's you, isn't it?
1.29? [LAUGHS] No, I'm too tired. I need to catch up on my sleep. My advice for them? It's going to be okay. It's all going to be okay. I was going to echo Leo's and Cici's words, to overcommunicate, but I think that has been said enough times already.

You've communicated enough. Stop! No more communication!
Yeah, no more communication. [LAUGHS] It's going to be okay. And honestly, shout out to my emeritus advisor, Leo, for reminding me that. Sometimes there are a lot of fires and it can be overwhelming, but it will be okay.

As we've alluded to a little bit throughout our conversation, there are a lot of people in the Kubernetes community who, for want of a better term, have had "a lot of experience" at running these systems. Then there are, of course, a lot of people who are just at the beginning of their careers; like yourself, at university. How do you see the difference between how those groups interact? Is there one team throughout, or what do you think that each can learn from the other?
I think the diversity of the team is one of its strengths and I really enjoy it. I learn so much from folks who have been doing this for 20 years or folks who are new to the industry like I am.

I know the CNCF goes to a lot of effort to enable new people to take part. Is there anything that you can say about how people might get involved?
Firstly, I think SIG Release has started a wonderful tradition, or system, of helping new folks join the release team as a shadow, and helping them grow into bigger positions, like leads. I think other SIGs are also following that template as well. But a big part of me joining and sticking with the community has been the ability to go to conferences. As I said, my first conference was KubeCon, when I was not involved in the community at all. And so a big shout-out to the CNCF and the companies that sponsor the Dan Kohn and the speaker scholarships. They have been the sole reason that I was able to attend KubeCon, meet people, and feel the power of the community.

Last year's KubeCon in North America was in Detroit?
Detroit, I was there, yeah.

That's quite a long drive?
I was in SF, so I flew over.

You live right next door! If only you'd been in Waterloo.
Yeah, but who knows? Maybe I'll do a road trip from Waterloo to Chicago this year.


Grace Nguyen is a student at the University of Waterloo, and was the release team lead for Kubernetes 1.28. Subscribe to Let's Get To The News, or search for it wherever you get your podcasts.

24 Oct 2023 12:00am GMT

23 Oct 2023

feedKubernetes – Production-Grade Container Orchestration

Blog: PersistentVolume Last Phase Transition Time in Kubernetes

Author: Roman Bednář (Red Hat)

In the recent Kubernetes v1.28 release, we (SIG Storage) introduced a new alpha feature that aims to improve PersistentVolume (PV) storage management and help cluster administrators gain better insights into the lifecycle of PVs. With the addition of the lastPhaseTransitionTime field into the status of a PV, cluster administrators are now able to track the last time a PV transitioned to a different phase, allowing for more efficient and informed resource management.

Why do we need a new PV field?

PersistentVolumes in Kubernetes play a crucial role in providing storage resources to workloads running in the cluster. However, managing these PVs effectively can be challenging, especially when it comes to determining the last time a PV transitioned between different phases, such as Pending, Bound or Released. Administrators often need to know when a PV was last used or transitioned to certain phases; for instance, to implement retention policies, perform cleanup, or monitor storage health.

In the past, Kubernetes users have faced data loss issues when using the Delete reclaim policy and had to resort to the safer Retain policy. When we planned the work to introduce the new lastPhaseTransitionTime field, we wanted to provide a more generic solution that can be used for various use cases, including manual cleanup based on the time a volume was last used or producing alerts based on phase transition times.

How lastPhaseTransitionTime helps

Provided you've enabled the feature gate (see How to use it), the new .status.lastPhaseTransitionTime field of a PersistentVolume (PV) is updated every time that PV transitions from one phase to another. Whether it's transitioning from Pending to Bound, Bound to Released, or any other phase transition, the lastPhaseTransitionTime will be recorded. For newly created PVs the phase will be set to Pending and the lastPhaseTransitionTime will be recorded as well.

This feature allows cluster administrators to:

  1. Implement Retention Policies

    With the lastPhaseTransitionTime, administrators can now track when a PV was last used or transitioned to the Released phase. This information can be crucial for implementing retention policies to clean up resources that have been in the Released phase for a specific duration. For example, it is now trivial to write a script or a policy that deletes all PVs that have been in the Released phase for a week.

  2. Monitor Storage Health

    By analyzing the phase transition times of PVs, administrators can monitor storage health more effectively. For example, they can identify PVs that have been in the Pending phase for an unusually long time, which may indicate underlying issues with the storage provisioner.
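Both use cases above come down to the same check: find PVs that have stayed in a given phase for longer than some threshold. The sketch below is one minimal way to express that check, not an official tool. It assumes input shaped like the `items` array of `kubectl get pv -o json`, with timestamps in the UTC format `2023-10-01T00:00:00Z`; the function names and sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_k8s_time(value):
    """Parse a UTC timestamp such as 2023-10-01T00:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def stale_pvs(items, phase, max_age, now):
    """Return names of PVs that have been in `phase` for longer than `max_age`."""
    names = []
    for pv in items:
        status = pv.get("status", {})
        transitioned = status.get("lastPhaseTransitionTime")
        # The field is missing when the feature gate is off or the PV has not
        # changed phase since the gate was enabled, so such PVs are skipped.
        if status.get("phase") != phase or not transitioned:
            continue
        if now - parse_k8s_time(transitioned) > max_age:
            names.append(pv["metadata"]["name"])
    return names


# Hypothetical sample data shaped like the "items" of `kubectl get pv -o json`.
sample_items = [
    {"metadata": {"name": "pv-old"},
     "status": {"phase": "Released", "lastPhaseTransitionTime": "2023-10-01T00:00:00Z"}},
    {"metadata": {"name": "pv-recent"},
     "status": {"phase": "Released", "lastPhaseTransitionTime": "2023-10-20T00:00:00Z"}},
    {"metadata": {"name": "pv-bound"},
     "status": {"phase": "Bound", "lastPhaseTransitionTime": "2023-09-01T00:00:00Z"}},
    {"metadata": {"name": "pv-no-field"}, "status": {"phase": "Released"}},
]
now = datetime(2023, 10, 23, tzinfo=timezone.utc)

# PVs released more than a week ago are candidates for cleanup.
print(stale_pvs(sample_items, "Released", timedelta(days=7), now))
```

The function only reports candidates. Deleting them (or raising an alert for long-Pending PVs by passing `"Pending"` as the phase) is left to the caller, which keeps the destructive step explicit and reviewable.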

How to use it

The lastPhaseTransitionTime field is alpha starting from Kubernetes v1.28, so it requires the PersistentVolumeLastPhaseTransitionTime feature gate to be enabled.

If you want to test the feature whilst it's alpha, you need to enable this feature gate on the kube-controller-manager and the kube-apiserver.

Use the --feature-gates command line argument:

--feature-gates="...,PersistentVolumeLastPhaseTransitionTime=true"

Keep in mind that the feature enablement does not have immediate effect; the new field will be populated whenever a PV is updated and transitions between phases. Administrators can then access the new field through the PV status, which can be retrieved using standard Kubernetes API calls or through Kubernetes client libraries.

Here is an example of how to retrieve the lastPhaseTransitionTime for a specific PV using the kubectl command-line tool:

kubectl get pv <pv-name> -o jsonpath='{.status.lastPhaseTransitionTime}'

Going forward

This feature was initially introduced as an alpha feature, behind a feature gate that is disabled by default. During the alpha phase, we (Kubernetes SIG Storage) will collect feedback from the end user community and address any issues or improvements identified.

Once sufficient feedback has been received, or no complaints are received, the feature can move to beta. The beta phase will allow us to further validate the implementation and ensure its stability.

At least two Kubernetes releases will happen between the release where this field graduates to beta and the release that graduates the field to general availability (GA). That means that the earliest release where this field could be generally available is Kubernetes 1.32, likely to be scheduled for early 2025.

Getting involved

We always welcome new contributors, so if you would like to get involved you can join our Kubernetes Storage Special-Interest-Group (SIG).

If you would like to share feedback, you can do so on our public Slack channel. If you're not already part of that Slack workspace, you can visit https://slack.k8s.io/ for an invitation.

Special thanks to all the contributors that provided great reviews, shared valuable insight and helped implement this feature (alphabetical order):

23 Oct 2023 12:00am GMT

20 Oct 2023

feedKubernetes – Production-Grade Container Orchestration

Blog: A Quick Recap of 2023 China Kubernetes Contributor Summit

Author: Paco Xu and Michael Yao (DaoCloud)

On September 26, 2023, the first day of KubeCon + CloudNativeCon + Open Source Summit China 2023, nearly 50 contributors gathered in Shanghai for the Kubernetes Contributor Summit.

All participants in the 2023 Kubernetes Contributor Summit

This marked the first in-person gathering held in China after three years of the pandemic.

A joyful meetup

The event began with welcome speeches from Kevin Wang from Huawei Cloud, one of the co-chairs of KubeCon, and Puja from Giant Swarm.

Following the opening remarks, the contributors introduced themselves briefly. Most attendees were from China, while some contributors had made the journey from Europe and the United States specifically for the conference. Technical experts from companies such as Microsoft, Intel and Huawei, as well as emerging forces like DaoCloud, were present. Laughter and cheerful voices filled the room, whether English was spoken with European or American accents or conversations were carried out in Chinese. This created an atmosphere of comfort, joy, respect, and anticipation. Past contributions brought everyone closer, and mutual recognition and accomplishments made this in-person gathering possible.

Face to face meeting in Shanghai

The attending contributors were no longer just GitHub IDs; they transformed into vivid faces. From sitting together and capturing group photos to attempting to identify "Who is who," a loosely connected collective emerged. This team structure, although loosely knit and free-spirited, was established to pursue shared dreams.

As the saying goes, "You reap what you sow." Each effort has been diligently documented within the Kubernetes community contributions. Regardless of the passage of time, the community will not erase those shining traces. Brilliance can be found in your PRs, issues, or comments. It can also be seen in the smiling faces captured in meetup photos or heard through stories passed down among contributors.

Technical sharing and discussions

Next, there were three technical sharing sessions:

A technical session about sig-multi-cluster

Following the sessions, a video featuring a call for contributors by Sergey Kanzhelev, the SIG-Node Chair, was played. The purpose was to encourage more contributors to join the Kubernetes community, with a special emphasis on the popular SIG-Node.

Lastly, Kevin hosted an Unconference collective discussion session covering topics such as multi-cluster management, scheduling, elasticity, AI, and more. For detailed minutes of the Unconference meeting, please refer to https://docs.qq.com/doc/DY3pLWklzQkhjWHNT.

China's contributor statistics

The contributor summit took place in Shanghai, with 90% of the attendees being Chinese. Within the Cloud Native Computing Foundation (CNCF) ecosystem, contributions from China have been steadily increasing. Currently:

The Kubernetes Contributor Summit is an inclusive meetup that welcomes all community contributors, including:

Acknowledgments

We would like to express our gratitude to the organizers of this event:

We extend our appreciation to all the contributors who attended the China Kubernetes Contributor Summit in Shanghai. Your dedication and commitment to the Kubernetes community are invaluable. Together, we continue to push the boundaries of cloud native technology and shape the future of this ecosystem.

20 Oct 2023 12:00am GMT

12 Oct 2023

Blog: Bootstrap an Air Gapped Cluster With Kubeadm

Author: Rob Mengert (Defense Unicorns)

Ever wonder how software gets deployed onto a system that is deliberately disconnected from the Internet and other networks? These systems are typically disconnected due to their sensitive nature. Sensitive as in utilities (power/water), banking, healthcare, weapons systems, other government use cases, etc. Sometimes it's technically a water gap, if you're running Kubernetes on an underwater vessel. Still, these environments need software to operate. This concept of deployment in a disconnected state is what it means to deploy to the other side of an air gap.

Again, despite this posture, software still needs to run in these environments. Traditionally, software artifacts are physically carried across the air gap on hard drives, USB sticks, CDs, or floppy disks (for ancient systems, it still happens). Kubernetes lends itself particularly well to running software behind an air gap for several reasons, largely due to its declarative nature.

In this blog article, I will walk through the process of bootstrapping a Kubernetes cluster in an air-gapped lab environment using Fedora Linux and kubeadm.

The Air Gap VM Setup

A real air-gapped network can take some effort to set up, so for this post, I will use an example VM on a laptop and do some network modifications. Below is the topology:

Topology on the host/laptop which shows that connectivity to the internet from the air gap VM is not possible. However, connectivity between the host/laptop and the VM is possible

Local topology

This VM will have its network connectivity disabled but in a way that doesn't shut down the VM's virtual NIC. Instead, its network will be downed by injecting a default route to a dummy interface, making anything internet-hosted unreachable. However, the VM still has a connected route to the bridge interface on the host, which means that network connectivity to the host is still working. This posture means that data can be transferred from the host/laptop to the VM via scp, even with the default route on the VM black-holing all traffic that isn't destined for the local bridge subnet. This type of transfer is analogous to carrying data across the air gap and will be used throughout this post.
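The exact commands used to black-hole the VM are not shown in this post. One possible way to do it is sketched below, assuming iproute2 is installed on the VM and using a hypothetical interface name, dummy0. The function only prints the commands so they can be reviewed before anything is changed.

```shell
# Print the iproute2 commands that would black-hole all non-local traffic.
# "dummy0" is a hypothetical interface name, not one taken from this post.
airgap_route_cmds() {
  ifname="${1:-dummy0}"
  # A dummy interface silently drops every packet routed to it
  echo "ip link add ${ifname} type dummy"
  echo "ip link set ${ifname} up"
  # Point the default route at it; the connected bridge subnet route is untouched
  echo "ip route replace default dev ${ifname}"
}

airgap_route_cmds
```

To apply the commands, pipe the output to a root shell on the VM (for example, airgap_route_cmds | sudo sh) and then inspect ip route. If another default route with a lower metric is present, it would still take precedence, so check the result rather than assuming it worked.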

Other details about the lab setup:

VM OS: Fedora 37
Kubernetes Version: v1.27.3
CNI Plugins Version: v1.3.0
CNI Provider and Version: Flannel v0.22.0

While this single VM lab is a simplified example, the diagram below more closely approximates what a real air-gapped environment could look like:

Example production topology which shows 3 control plane Kubernetes nodes and 'n' worker nodes along with a Docker registry in an air-gapped environment. Additionally shows two workstations, one on each side of the air gap and an IT admin which physically carries the artifacts across.

Note, there is still intentional isolation between the environment and the internet. There are also some things that are not shown in order to keep the diagram simple, for example malware scanning on the secure side of the air gap.

Back to the single VM lab environment.

Identifying the required software artifacts

I have gone through the trouble of identifying all of the required software components that need to be carried across the air gap in order for this cluster to be stood up:

The way I identified these was by trying to do the installation and working through all of the errors that are thrown around an additional dependency being required. In a real air-gapped scenario, each transport of artifacts across the air gap could represent anywhere from 20 minutes to several weeks of time spent by the installer. That is to say that the target system could be located in a data center on the same floor as your desk, at a satellite downlink facility in the middle of nowhere, or on a submarine that's out to sea. Knowing what is on that system at any given time is important so you know what you have to bring.
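Since every missed artifact can cost another trip across the air gap, it can help to check the staging directory against a written list before leaving. The helper below is a minimal sketch of that idea; the function name and the example file list are illustrative and are not part of the original procedure.

```shell
# Report which of the expected artifact files are absent from a directory.
# Usage: missing_artifacts DIR FILE...
# Prints one missing file name per line; returns non-zero if any are missing.
missing_artifacts() {
  dir="$1"
  shift
  rc=0
  for f in "$@"; do
    if [ ! -f "${dir}/${f}" ]; then
      echo "$f"
      rc=1
    fi
  done
  return "$rc"
}

# Example (file names are illustrative, not a complete list):
# missing_artifacts ./download kubeadm kubelet kubectl kube-flannel.yml
```

Keeping the list in a text file under version control gives a record of what was carried across on each trip.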

Prepare the Node for K8s

Before downloading and moving the artifacts to the VM, let's first prep that VM to run Kubernetes.

VM preparation

Run these steps as a normal user

Make destination directory for software artifacts

mkdir ~/tmp

Run the following steps as the superuser (root)

Write to /etc/sysctl.d/99-k8s-cri.conf:

cat > /etc/sysctl.d/99-k8s-cri.conf << EOF
net.bridge.bridge-nf-call-iptables=1
net.ipv4.ip_forward=1
net.bridge.bridge-nf-call-ip6tables=1
EOF

Write to /etc/modules-load.d/k8s.conf (enable overlay and br_netfilter):

echo -e overlay\\nbr_netfilter > /etc/modules-load.d/k8s.conf

Install iptables:

dnf -y install iptables-legacy

Set iptables to use legacy mode (not nft emulating iptables):

update-alternatives --set iptables /usr/sbin/iptables-legacy

Turn off swap:

touch /etc/systemd/zram-generator.conf
systemctl mask systemd-zram-setup@.service
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
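To see what the sed expression above does before letting it edit /etc/fstab in place, the same expression can be run without -i against a scratch file. This is a small sketch; the sample fstab entries are made up for illustration.

```shell
# Show what the swap-disabling sed expression would do, without modifying the file.
# $1: path to an fstab-style file
preview_swap_comment() {
  sed '/ swap / s/^\(.*\)$/#\1/' "$1"
}

sample="$(mktemp)"
cat > "$sample" << 'EOF'
UUID=1111 / ext4 defaults 0 1
UUID=2222 none swap defaults 0 0
EOF

# Only the swap line gains a leading "#"
preview_swap_comment "$sample"
rm -f "$sample"
```

Note that the pattern matches the word swap surrounded by literal spaces, so an fstab that separates its fields with tabs would not be changed by it.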

Disable firewalld (this is OK in a demo context):

systemctl disable --now firewalld

Disable systemd-resolved:

systemctl disable --now systemd-resolved

Configure DNS defaults for NetworkManager:

sed -i '/\[main\]/a dns=default' /etc/NetworkManager/NetworkManager.conf

Blank the system-level DNS resolver configuration:

unlink /etc/resolv.conf || true
touch /etc/resolv.conf

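Put SELinux into permissive mode (just for a demo - check before doing this in production!). Note that setenforce 0 only lasts until the next reboot, so rerun it after the reboot below or set SELINUX=permissive in /etc/selinux/config: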

setenforce 0

Make sure all changes survive a reboot

reboot

Download all the artifacts

On the laptop/host machine, download all of the artifacts enumerated in the previous section. Since the air gapped VM is running Fedora 37, all of the dependencies shown in this part are for Fedora 37. Note that this procedure only supports the AArch64 and AMD64 CPU architectures, as they are the most popular and widely available. You can execute this procedure anywhere you have write permissions; your home directory is a perfectly suitable choice.

Note, operating system packages for the Kubernetes artifacts that need to be carried across can now be found at pkgs.k8s.io. This blog post will use a combination of Fedora repositories and GitHub in order to download all of the required artifacts. When you're doing this on your own cluster, you should decide whether to use the official Kubernetes packages, or the official packages from your operating system distribution - both are valid choices.

# Set architecture variables
UARCH=$(uname -m)

if [[ "$UARCH" == "arm64" || "$UARCH" == "aarch64" ]]; then

 ARCH="aarch64"
 K8s_ARCH="arm64"

else

 ARCH="x86_64"
 K8s_ARCH="amd64"

fi
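The block above treats every machine that is not ARM as x86_64. A slightly stricter variant, sketched here as an alternative rather than as the author's method, maps the output of uname -m explicitly and refuses anything else. The function name map_arch is hypothetical.

```shell
# Map `uname -m` output to the two naming schemes used in this post.
# Prints "<rpm arch> <kubernetes arch>"; fails on anything else.
map_arch() {
  case "$1" in
    arm64|aarch64) echo "aarch64 arm64" ;;
    x86_64|amd64)  echo "x86_64 amd64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

if arch_pair="$(map_arch "$(uname -m)")"; then
  ARCH="${arch_pair% *}"      # first word, used in RPM file names
  K8s_ARCH="${arch_pair#* }"  # second word, used in Kubernetes download URLs
  echo "ARCH=${ARCH} K8s_ARCH=${K8s_ARCH}"
fi
```

Failing early on an unknown architecture avoids downloading a full set of artifacts that cannot run on the target.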

Set environment variables for software versions to use:

CNI_PLUGINS_VERSION="v1.3.0"
CRICTL_VERSION="v1.27.0"
KUBE_RELEASE="v1.27.3"
RELEASE_VERSION="v0.15.1"
K9S_VERSION="v0.27.4"

Create a download directory, change into it, and download all of the RPMs and configuration files

mkdir download && cd download

curl -O https://download.docker.com/linux/fedora/37/${ARCH}/stable/Packages/docker-ce-cli-23.0.2-1.fc37.${ARCH}.rpm

curl -O https://download.docker.com/linux/fedora/37/${ARCH}/stable/Packages/containerd.io-1.6.19-3.1.fc37.${ARCH}.rpm

curl -O https://download.docker.com/linux/fedora/37/${ARCH}/stable/Packages/docker-compose-plugin-2.17.2-1.fc37.${ARCH}.rpm

curl -O https://download.docker.com/linux/fedora/37/${ARCH}/stable/Packages/docker-ce-rootless-extras-23.0.2-1.fc37.${ARCH}.rpm

curl -O https://download.docker.com/linux/fedora/37/${ARCH}/stable/Packages/docker-ce-23.0.2-1.fc37.${ARCH}.rpm

curl -O https://download-ib01.fedoraproject.org/pub/fedora/linux/releases/37/Everything/${ARCH}/os/Packages/l/libcgroup-3.0-1.fc37.${ARCH}.rpm

echo -e "\nDownload Kubernetes Binaries"

curl -L -O "https://github.com/containernetworking/plugins/releases/download/${CNI_PLUGINS_VERSION}/cni-plugins-linux-${K8s_ARCH}-${CNI_PLUGINS_VERSION}.tgz"

curl -L -O "https://github.com/kubernetes-sigs/cri-tools/releases/download/${CRICTL_VERSION}/crictl-${CRICTL_VERSION}-linux-${K8s_ARCH}.tar.gz"

curl -L --remote-name-all https://dl.k8s.io/release/${KUBE_RELEASE}/bin/linux/${K8s_ARCH}/{kubeadm,kubelet}

curl -L -O "https://raw.githubusercontent.com/kubernetes/release/${RELEASE_VERSION}/cmd/kubepkg/templates/latest/deb/kubelet/lib/systemd/system/kubelet.service"

curl -L -O "https://raw.githubusercontent.com/kubernetes/release/${RELEASE_VERSION}/cmd/kubepkg/templates/latest/deb/kubeadm/10-kubeadm.conf"

curl -L -O "https://dl.k8s.io/release/${KUBE_RELEASE}/bin/linux/${K8s_ARCH}/kubectl"

echo -e "\nDownload dependencies"

curl -O "https://dl.fedoraproject.org/pub/fedora/linux/releases/37/Everything/${ARCH}/os/Packages/s/socat-1.7.4.2-3.fc37.${ARCH}.rpm"

curl -O "https://dl.fedoraproject.org/pub/fedora/linux/releases/37/Everything/${ARCH}/os/Packages/l/libcgroup-3.0-1.fc37.${ARCH}.rpm"

curl -O "https://dl.fedoraproject.org/pub/fedora/linux/releases/37/Everything/${ARCH}/os/Packages/c/conntrack-tools-1.4.6-4.fc37.${ARCH}.rpm"

curl -LO "https://github.com/derailed/k9s/releases/download/${K9S_VERSION}/k9s_Linux_${K8s_ARCH}.tar.gz"

curl -LO "https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml"

Download all of the necessary container images:

images=(
 "registry.k8s.io/kube-apiserver:${KUBE_RELEASE}"
 "registry.k8s.io/kube-controller-manager:${KUBE_RELEASE}"
 "registry.k8s.io/kube-scheduler:${KUBE_RELEASE}"
 "registry.k8s.io/kube-proxy:${KUBE_RELEASE}"
 "registry.k8s.io/pause:3.9"
 "registry.k8s.io/etcd:3.5.7-0"
 "registry.k8s.io/coredns/coredns:v1.10.1"
 "registry:2.8.2"
 "flannel/flannel:v0.22.0"
 "flannel/flannel-cni-plugin:v1.1.2"
)

for image in "${images[@]}"; do
 # Pull the image from the registry
 docker pull "$image"

 # Save the image to a tar file on the local disk
 image_name=$(echo "$image" | sed 's|/|_|g' | sed 's/:/_/g')
 docker save -o "${image_name}.tar" "$image"

done

The above commands will take a look at the CPU architecture for the current host/laptop, create and change into a directory called download, and finally download all of the dependencies. Each of these files must then be transported over the air gap via scp. The exact syntax of the command will vary depending on the user on the VM, if you created an SSH key, and the IP of your air gap VM. The rough syntax is:

scp -i <<SSH_KEY>> <<FILE>> <<AIRGAP_VM_USER>>@<<AIRGAP_VM_IP>>:~/tmp/

Once all of the files have been transported to the air gapped VM, the rest of the blog post will take place from the VM. Open a terminal session to that system.
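Before continuing, it may be worth confirming that the files arrived intact. The following is a minimal sketch using sha256sum from GNU coreutils; the manifest name SHA256SUMS and both function names are assumptions made for this example and do not appear in the original procedure.

```shell
# Write a SHA-256 manifest for every regular file in a directory.
# Run this on the host/laptop before the transfer, then copy the manifest too.
write_manifest() {
  ( cd "$1" && find . -maxdepth 1 -type f ! -name SHA256SUMS \
      -exec sha256sum {} + > SHA256SUMS )
}

# Check the files against the manifest. Run this on the air-gapped VM.
# Prints only the entries that fail and returns non-zero on any mismatch.
verify_manifest() {
  ( cd "$1" && sha256sum --quiet -c SHA256SUMS )
}

# Example:
# write_manifest ./download      # on the host/laptop
# verify_manifest ~/tmp          # on the air gap VM
```

A checksum only shows that a file was not corrupted or swapped in transit; it is not a substitute for scanning the artifacts themselves.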

Put the artifacts in place

Everything that is needed in order to bootstrap a Kubernetes cluster now exists on the air-gapped VM. This section is a lot more complicated since various types of artifacts are now on disk on the air-gapped VM. Get a root shell on the air gap VM as the rest of this section will be executed from there. Let's start by setting the same architecture and environment variables as were set on the host/laptop and then install all of the RPM packages:

# Set architecture variables
UARCH=$(uname -m)

if [[ "$UARCH" == "arm64" || "$UARCH" == "aarch64" ]]; then

 ARCH="aarch64"
 K8s_ARCH="arm64"

else

 ARCH="x86_64"
 K8s_ARCH="amd64"

fi

# Set environment variables
CNI_PLUGINS_VERSION="v1.3.0"
CRICTL_VERSION="v1.27.0"
KUBE_RELEASE="v1.27.3"
RELEASE_VERSION="v0.15.1"
K9S_VERSION="v0.27.4"

cd ~/tmp/

dnf -y install ./*.rpm

Next, install the CNI plugins and crictl:

mkdir -p /opt/cni/bin
tar -C /opt/cni/bin -xz -f "cni-plugins-linux-${K8s_ARCH}-v1.3.0.tgz"
tar -C /usr/local/bin -xz -f "crictl-v1.27.0-linux-${K8s_ARCH}.tar.gz"

Make kubeadm, kubelet and kubectl executable and move them from the ~/tmp directory to /usr/local/bin:

chmod +x kubeadm kubelet kubectl
mv kubeadm kubelet kubectl /usr/local/bin

Define an override for the systemd kubelet service file, and move it to the proper location:

mkdir -p /etc/systemd/system/kubelet.service.d

sed "s:/usr/bin:/usr/local/bin:g" 10-kubeadm.conf > /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

The CRI plugin for containerd is disabled by default; enable it:

sed -i 's/^disabled_plugins = \["cri"\]/#&/' /etc/containerd/config.toml

Put a custom /etc/docker/daemon.json file in place:

echo '{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "insecure-registries": ["localhost:5000"],
  "allow-nondistributable-artifacts": ["localhost:5000"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "group": "rnd",
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}' > /etc/docker/daemon.json

There are two important items to highlight in the Docker daemon.json configuration file. The insecure-registries line means that the registry in brackets does not support TLS. Even inside an air gapped environment, this isn't a good practice but is fine for the purposes of this lab. The allow-nondistributable-artifacts line tells Docker to permit pushing nondistributable artifacts to this registry. Docker by default does not push these layers to avoid potential issues around licensing or distribution rights. A good example of this is the Windows base container image. This line will allow layers that Docker marks as "foreign" to be pushed to the registry. While not a big deal for this article, that line could be required for some air gapped environments. All layers have to exist locally since nothing inside the air gapped environment can reach out to a public container image registry to get what it needs.

(Re)start Docker and enable it so it starts at system boot:

systemctl restart docker
systemctl enable docker

Start, and enable, containerd and the kubelet:

systemctl enable --now containerd
systemctl enable --now kubelet

The container image registry that runs in Docker is only required for any CNI related containers and subsequent workload containers. This registry is not used to house the Kubernetes component containers. Note, nerdctl would have also worked here as an alternative to Docker and would have allowed for direct interaction with containerd. Docker was chosen for its familiarity.

Start a container image registry inside Docker:

docker load -i registry_2.8.2.tar
docker run -d -p 5000:5000 --restart=always --name registry registry:2.8.2

Load Flannel containers into the Docker registry

Note: Flannel was chosen for this lab due to familiarity. Choose whatever CNI works best in your environment.

docker load -i flannel_flannel_v0.22.0.tar
docker load -i flannel_flannel-cni-plugin_v1.1.2.tar
docker tag flannel/flannel:v0.22.0 localhost:5000/flannel/flannel:v0.22.0
docker tag flannel/flannel-cni-plugin:v1.1.2 localhost:5000/flannel/flannel-cni-plugin:v1.1.2
docker push localhost:5000/flannel/flannel:v0.22.0
docker push localhost:5000/flannel/flannel-cni-plugin:v1.1.2

Load container images for Kubernetes components, via ctr:

images=(
 "registry.k8s.io/kube-apiserver:${KUBE_RELEASE}"
 "registry.k8s.io/kube-controller-manager:${KUBE_RELEASE}"
 "registry.k8s.io/kube-scheduler:${KUBE_RELEASE}"
 "registry.k8s.io/kube-proxy:${KUBE_RELEASE}"
 "registry.k8s.io/pause:3.9"
 "registry.k8s.io/etcd:3.5.7-0"
 "registry.k8s.io/coredns/coredns:v1.10.1"
)

for image in "${images[@]}"; do
 # Derive the tar file name the same way the download step did
 image_file="$(echo "$image" | sed 's|/|_|g' | sed 's/:/_/g').tar"

 if [[ -f "$image_file" ]]; then
  # The below line loads the images where they need to be on the VM
  ctr -n k8s.io images import "$image_file"
 else
  echo "File ${image_file} not found!" 1>&2
 fi
done
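The tar files written by the download step are named after the image reference, with every / and : replaced by an underscore. A small helper makes that mapping explicit, which is useful when checking that a file exists before importing it. The function name image_to_tar is hypothetical.

```shell
# Convert an image reference to the tar file name produced by the download loop.
image_to_tar() {
  echo "$(echo "$1" | sed 's|/|_|g' | sed 's/:/_/g').tar"
}

image_to_tar "registry.k8s.io/coredns/coredns:v1.10.1"
# prints: registry.k8s.io_coredns_coredns_v1.10.1.tar
```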

A totally reasonable question here could be "Why not use the Docker registry that was just stood up to house the K8s component images?" This simply didn't work even with the proper modification to the configuration file that gets passed to kubeadm.

Spin up the Kubernetes cluster

Check if a cluster is already running and tear it down if it is:

if systemctl is-active --quiet kubelet; then

 # Reset the Kubernetes cluster

 echo "A Kubernetes cluster is already running. Resetting the cluster..."

 kubeadm reset -f

fi

Log into the Docker registry from inside the air-gapped VM:

# OK for a demo; use secure credentials in production!

DOCKER_USER=user
DOCKER_PASS=pass
echo ${DOCKER_PASS} | docker login --username=${DOCKER_USER} --password-stdin localhost:5000

Create a cluster configuration file and initialize the cluster:

echo "---

apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
clusterName: kubernetes
kubernetesVersion: v1.27.3
networking:
 dnsDomain: cluster.local
 podSubnet: 10.244.0.0/16 # --pod-network-cidr
 serviceSubnet: 10.96.0.0/12
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
 advertiseAddress: 10.10.10.10 # Update to the IP address of the air gap VM
 bindPort: 6443
nodeRegistration:
 criSocket: unix:///run/containerd/containerd.sock # or rely on autodetection
 name: airgap # this must match the hostname of the air gap VM
# Since this is a single node cluster, this taint has to be commented out,
# otherwise the coredns pods will not come up.
# taints:
# - effect: NoSchedule
# key: node-role.kubernetes.io/master" > kubeadm_cluster.yaml

kubeadm init --config kubeadm_cluster.yaml

Set $KUBECONFIG and use kubectl to wait until the API server is healthy:

export KUBECONFIG=/etc/kubernetes/admin.conf

until kubectl get nodes; do
 echo -e "\nWaiting for API server to respond..." 1>&2
 sleep 5

done
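The loop above retries forever if the API server never comes up. A bounded variant is sketched below as one option; wait_for is a hypothetical helper name, and the elapsed time is approximate because it counts only the sleep intervals, not the time the command itself takes.

```shell
# Retry a command until it succeeds or a deadline passes.
# Usage: wait_for TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
wait_for() {
  timeout="$1"
  interval="$2"
  shift 2
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "Timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Example: give the API server up to five minutes
# wait_for 300 5 kubectl get nodes
```

A non-zero return lets a wrapper script stop and report the failure instead of hanging.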

Set up networking

Update Flannel image locations in the Flannel manifest, and apply it:

sed -i 's/image: docker\.io/image: localhost:5000/g' kube-flannel.yml
kubectl apply -f kube-flannel.yml
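The effect of the sed substitution on the Flannel manifest can be previewed on a couple of sample lines before it is applied in place. This is a sketch; the two image lines are examples of what such a manifest typically contains, not a copy of the actual file.

```shell
# Rewrite docker.io image references to point at the local registry.
# Reads a manifest on stdin and writes the result to stdout.
rewrite_images() {
  sed 's/image: docker\.io/image: localhost:5000/g'
}

printf '%s\n' \
  '        image: docker.io/flannel/flannel-cni-plugin:v1.1.2' \
  '        image: docker.io/flannel/flannel:v0.22.0' | rewrite_images
```

Because the manifest was fetched from the master branch, it is worth confirming that the image tags it references match the images pushed to the local registry earlier; otherwise the pods will fail to pull them.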

Run kubectl get pods -A --watch until all pods are up and running.

Run an example Pod

With a cluster operational, the next step is a workload. For this simple demonstration, the Podinfo application will be deployed.

Install Helm

This first part of the procedure must be executed from the host/laptop. If not already present, install Helm by following the Installing Helm guide.

Next, download the helm binary for Linux:

# Reset the architecture variables if needed
UARCH=$(uname -m)
if [[ "$UARCH" == "arm64" || "$UARCH" == "aarch64" ]]; then

 ARCH="aarch64"
 K8s_ARCH="arm64"

else

 ARCH="x86_64"
 K8s_ARCH="amd64"

fi

curl -LO https://get.helm.sh/helm-v3.12.2-linux-${K8s_ARCH}.tar.gz

Add the Podinfo helm repository, download the Podinfo helm chart, download the Podinfo container image, and then finally save it to the local disk:

helm repo add podinfo https://stefanprodan.github.io/podinfo
helm fetch podinfo/podinfo --version 6.4.0
docker pull ghcr.io/stefanprodan/podinfo:6.4.0

Save the podinfo image to a tar file on the local disk

docker save -o podinfo_podinfo-6.4.0.tar ghcr.io/stefanprodan/podinfo

Transfer the image across the air gap

Reuse the ~/tmp directory created on the air gapped VM to transport these artifacts across the air gap:

scp -i <<SSH_KEY>> <<FILE>> <<AIRGAP_VM_USER>>@<<AIRGAP_VM_IP>>:~/tmp/

Continue on the isolated side

Now pivot over to the air gap VM for the rest of the installation procedure.

Switch into ~/tmp:

cd ~/tmp

Extract and move the helm binary:

tar -zxvf helm-v3.12.2-linux-${K8s_ARCH}.tar.gz
mv linux-${K8s_ARCH}/helm /usr/local/bin/helm

Load the Podinfo container image into the local Docker registry:

docker load -i podinfo_podinfo-6.4.0.tar
docker tag ghcr.io/stefanprodan/podinfo:6.4.0 localhost:5000/podinfo/podinfo:6.4.0
docker push localhost:5000/podinfo/podinfo:6.4.0

Ensure $KUBECONFIG is set correctly, then install the Podinfo Helm chart:

# Outside of a demo or lab environment, use lower (or even least) privilege
# credentials to manage your workloads.
export KUBECONFIG=/etc/kubernetes/admin.conf
helm install podinfo ./podinfo-6.4.0.tgz --set image.repository=localhost:5000/podinfo/podinfo

Verify that the Podinfo application comes up:

kubectl get pods -n default

Or run k9s (a terminal user interface for Kubernetes):

k9s

Zarf

Zarf is an open-source tool that takes a declarative approach to software packaging and delivery, including air gap. This same podinfo application will be installed onto the air gap VM using Zarf in this section. The first step is to install Zarf on the host/laptop.

Alternatively, a prebuilt binary can be downloaded onto the host/laptop from GitHub for various OS/CPU architectures.

A binary is also needed across the air gap on the VM:

# Set the architecture variables if needed
UARCH=$(uname -m)
if [[ "$UARCH" == "arm64" || "$UARCH" == "aarch64" ]]; then

 ARCH="aarch64"
 K8s_ARCH="arm64"

else

 ARCH="x86_64"
 K8s_ARCH="amd64"

fi

export ZARF_VERSION=v0.28.3

curl -LO "https://github.com/defenseunicorns/zarf/releases/download/${ZARF_VERSION}/zarf_${ZARF_VERSION}_Linux_${K8s_ARCH}"

Zarf needs to bootstrap itself into a Kubernetes cluster through the use of an init package. That also needs to be transported across the air gap so let's download it onto the host/laptop:

curl -LO "https://github.com/defenseunicorns/zarf/releases/download/${ZARF_VERSION}/zarf-init-${K8s_ARCH}-${ZARF_VERSION}.tar.zst"

Zarf is declarative through the use of a zarf.yaml file. Here is the zarf.yaml file that will be used for this Podinfo installation. Write it to whatever directory you have write access to on your host/laptop; your home directory is fine:

echo 'kind: ZarfPackageConfig
metadata:
  name: podinfo
  description: "Deploy helm chart for the podinfo application in K8s via zarf"
components:
  - name: podinfo
    required: true
    charts:
      - name: podinfo
        version: 6.4.0
        namespace: podinfo-helm-namespace
        releaseName: podinfo
        url: https://stefanprodan.github.io/podinfo
    images:
      - ghcr.io/stefanprodan/podinfo:6.4.0' > zarf.yaml

The next step is to build the Podinfo package. This must be done from the same directory location where the zarf.yaml file is located.

zarf package create --confirm

This command will download the defined helm chart and image and put them into a single file written to disk. This single file is all that needs to be carried across the air gap:

ls zarf-package-*

Sample output:

zarf-package-podinfo-arm64.tar.zst

Transport the linux zarf binary, zarf init package and Podinfo package over to the air gapped VM:

scp -i <<SSH_KEY>> <<FILE>> <<AIRGAP_VM_USER>>@<<AIRGAP_VM_IP>>:~/tmp/

From the air gapped VM, switch into the ~/tmp directory where all of the artifacts were placed:

cd ~/tmp

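Set $KUBECONFIG to a file with credentials for the local cluster:

export KUBECONFIG=/etc/kubernetes/admin.conf

Make the zarf binary executable and (as root) move it to /usr/bin. The downloaded file is named zarf_<version>_Linux_<arch>, so rename it to zarf as part of the move:

chmod +x zarf_v*_Linux_* && sudo mv zarf_v*_Linux_* /usr/bin/zarf

With the binary on the PATH, set the Zarf version:

export ZARF_VERSION=$(zarf version)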

Likewise, move the Zarf init package to /usr/bin:

mv zarf-init-${K8s_ARCH}-${ZARF_VERSION}.tar.zst /usr/bin

Initialize Zarf into the cluster:

zarf init --confirm --components=git-server

When this command is done, a Zarf package is ready to be deployed.

zarf package deploy

This command will search the current directory for a Zarf package. Select the podinfo package (zarf-package-podinfo-${K8s_ARCH}.tar.zst) and continue. Once the package deployment is complete, run zarf tools monitor in order to bring up k9s to view the cluster.

Conclusion

This is one method that can be used to spin up an air-gapped cluster and two methods to deploy a mission application. Your mileage may vary on different operating systems regarding the exact software artifacts that need to be carried across the air gap, but conceptually this procedure is still valid.

This demo also created an artificial air-gapped environment. In the real world, every missed dependency could represent hours, if not days or weeks, of lost time to get running software in the air-gapped environment. This artificial air gap also obscured some common methods of air gap software delivery, such as using a data diode. Depending on the environment, the diode can be very expensive to use. Also, none of the artifacts were scanned before being carried across the air gap. The presence of the air gap in general means that the workload running there is more sensitive, and nothing should be carried across unless it's known to be safe.

12 Oct 2023 12:00am GMT

10 Oct 2023

Blog: CRI-O is moving towards pkgs.k8s.io

Author: Sascha Grunert

The Kubernetes community recently announced that their legacy package repositories are frozen and that they have moved to community-owned package repositories powered by the OpenBuildService (OBS). CRI-O has a long history of utilizing OBS for their package builds, but all of the packaging efforts have been done manually so far.

The CRI-O community absolutely loves Kubernetes, which means that they're delighted to announce that:

All future CRI-O packages will be shipped as part of the officially supported Kubernetes infrastructure hosted on pkgs.k8s.io!

There will be a deprecation phase for the existing packages, which is currently being discussed in the CRI-O community. The new infrastructure will only support releases of CRI-O >= v1.28.2 as well as release branches newer than release-1.28.

How to use the new packages

In the same way as the Kubernetes community, CRI-O provides deb and rpm packages as part of a dedicated subproject in OBS, called isv:kubernetes:addons:cri-o. This project acts as an umbrella and provides stable (for CRI-O tags) as well as prerelease (for CRI-O release-1.y and main branches) package builds.

Stable Releases:

Prereleases:

There are no stable releases available in the v1.29 repository yet, because v1.29.0 will be released in December. The CRI-O community will also not support release branches older than release-1.28, because CI requirements merged into main could only be backported to release-1.28 with reasonable effort.

For example, if an end-user would like to install the latest available version of the CRI-O main branch, then they can add the repository in the same way as they do for Kubernetes.

rpm Based Distributions

For rpm based distributions, you can run the following commands as a root user to install CRI-O together with Kubernetes:

Add the Kubernetes repo

cat <<EOF | tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.28/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.28/rpm/repodata/repomd.xml.key
EOF

Add the CRI-O repo

cat <<EOF | tee /etc/yum.repos.d/cri-o.repo
[cri-o]
name=CRI-O
baseurl=https://pkgs.k8s.io/addons:/cri-o:/prerelease:/main/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/addons:/cri-o:/prerelease:/main/rpm/repodata/repomd.xml.key
EOF

Install official package dependencies

dnf install -y \
 conntrack \
 container-selinux \
 ebtables \
 ethtool \
 iptables \
 socat

Install the packages from the added repos

dnf install -y --repo cri-o --repo kubernetes \
 cri-o \
 kubeadm \
 kubectl \
 kubelet

deb Based Distributions

For deb based distributions, you can run the following commands as a root user:

Install dependencies for adding the repositories

apt-get update
apt-get install -y software-properties-common curl

Add the Kubernetes repository

curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.28/deb/Release.key |
 gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /" |
 tee /etc/apt/sources.list.d/kubernetes.list

Add the CRI-O repository

curl -fsSL https://pkgs.k8s.io/addons:/cri-o:/prerelease:/main/deb/Release.key |
 gpg --dearmor -o /etc/apt/keyrings/cri-o-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/cri-o-apt-keyring.gpg] https://pkgs.k8s.io/addons:/cri-o:/prerelease:/main/deb/ /" |
 tee /etc/apt/sources.list.d/cri-o.list

Install the packages

apt-get update
apt-get install -y cri-o kubelet kubeadm kubectl

Start CRI-O

systemctl start crio.service

The prerelease:/main prefix in the CRI-O package path can be replaced with stable:/v1.28, stable:/v1.29, prerelease:/v1.28 or prerelease:/v1.29 if another package stream is used.
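The repository URLs used above all follow one pattern, so switching streams only changes two path segments. A small sketch of that pattern follows; crio_repo_url is a hypothetical helper name, and the function does not check that the requested stream actually exists.

```shell
# Build a CRI-O package repository URL from its parts.
# Usage: crio_repo_url STREAM VERSION FORMAT
#   STREAM:  stable | prerelease
#   VERSION: v1.28 | v1.29 | main (main exists only as a prerelease)
#   FORMAT:  rpm | deb
crio_repo_url() {
  echo "https://pkgs.k8s.io/addons:/cri-o:/${1}:/${2}/${3}/"
}

crio_repo_url prerelease main rpm
crio_repo_url stable v1.28 deb
```

The first call reproduces the baseurl used in the rpm example; the second gives the deb repository for the stable v1.28 stream.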

Bootstrapping a cluster using kubeadm can be done by running the kubeadm init command, which automatically detects that CRI-O is running in the background. There are also Vagrantfile examples available for Fedora 38 as well as Ubuntu 22.04 for testing the packages together with kubeadm.

How it works under the hood

Everything related to these packages lives in the new CRI-O packaging repository. It contains a daily reconciliation GitHub action workflow, for all supported release branches as well as tags of CRI-O. A test pipeline in the OBS workflow ensures that the packages can be correctly installed and used before being published. All of the staging and publishing of the packages is done with the help of the Kubernetes Release Toolbox (krel), which is also used for the official Kubernetes deb and rpm packages.

The package build inputs will undergo daily reconciliation and will be supplied by CRI-O's static binary bundles. These bundles are built and signed for each commit in the CRI-O CI, and contain everything CRI-O requires to run on a certain architecture. The static builds are reproducible, powered by nixpkgs and available only for the x86_64, aarch64 and ppc64le architectures.

The CRI-O maintainers will be happy to listen to any feedback or suggestions on the new packaging efforts! Thank you for reading this blog post, feel free to reach out to the maintainers via the Kubernetes Slack channel #crio or create an issue in the packaging repository.

10 Oct 2023 12:00am GMT

05 Oct 2023

Blog: Spotlight on SIG Architecture: Conformance

Author: Frederico Muñoz (SAS Institute)

This is the first interview of a SIG Architecture Spotlight series that will cover the different subprojects. We start with the SIG Architecture: Conformance subproject.

In this SIG Architecture spotlight, we talked with Riaan Kleinhans (ii-Team), Lead for the Conformance sub-project.

About SIG Architecture and the Conformance subproject

Frederico (FSM): Hello Riaan, and welcome! For starters, tell us a bit about yourself, your role and how you got involved in Kubernetes.

Riaan Kleinhans (RK): Hi! My name is Riaan Kleinhans and I live in South Africa. I am the project manager for the ii-Team in New Zealand. When I joined ii, the plan was to move to New Zealand in April 2020, and then Covid happened. Fortunately, being a flexible and dynamic team, we were able to make it work remotely and in very different time zones.

The ii team have been tasked with managing the Kubernetes Conformance testing technical debt and writing tests to clear the technical debt. I stepped into the role of project manager to be the link between monitoring, test writing and the community. Through that work I had the privilege of meeting Dan Kohn in those first months; his enthusiasm about the work we were doing was a great inspiration.

FSM: Thank you - so, your involvement in SIG Architecture started because of the conformance work?

RK: SIG Architecture is the home for the Kubernetes Conformance subproject. Initially, most of my interactions were directly with SIG Architecture through the Conformance sub-project. However, as we began organizing the work by SIG, we started engaging directly with each individual SIG. These engagements with the SIGs that own the untested APIs have helped us accelerate our work.

FSM: How would you describe the main goals and areas of intervention of the Conformance sub-project?

RK: The Kubernetes Conformance sub-project focuses on guaranteeing compatibility and adherence to the Kubernetes specification by developing and maintaining a comprehensive conformance test suite. Its main goals include assuring compatibility across different Kubernetes implementations, verifying adherence to the API specification, supporting the ecosystem by encouraging conformance certification, and fostering collaboration within the Kubernetes community. By providing standardised tests and promoting consistent behaviour and functionality, the Conformance subproject ensures a reliable and compatible Kubernetes ecosystem for developers and users alike.

More on the Conformance Test Suite

FSM: A part of providing those standardised tests is, I believe, the Conformance Test Suite. Could you explain what it is and its importance?

RK: The Kubernetes Conformance Test Suite checks if Kubernetes distributions meet the project's specifications, ensuring compatibility across different implementations. It covers various features like APIs, networking, storage, scheduling, and security. Passing the tests confirms proper implementation and promotes a consistent and portable container orchestration platform.

FSM: Right, the tests are important in the way they define the minimum features that any Kubernetes cluster must support. Could you describe the process around determining which features are considered for inclusion? Is there any tension between a more minimal approach, and proposals from the other SIGs?

RK: The requirements for each endpoint that undergoes conformance testing are clearly defined by SIG Architecture. Only API endpoints that are generally available and non-optional features are eligible for conformance. Over the years, there have been several discussions regarding conformance profiles, exploring the possibility of including optional endpoints like RBAC, which are widely used by most end users, in specific profiles. However, this aspect is still a work in progress.

Endpoints that do not meet the conformance criteria are listed in ineligible_endpoints.yaml, which is publicly accessible in the Kubernetes repo. This file can be updated to add or remove endpoints as their status or requirements change. These ineligible endpoints are also visible on APISnoop.

Ensuring transparency and incorporating community input regarding the eligibility or ineligibility of endpoints is of utmost importance to SIG Architecture.

FSM: Writing tests for new features is something that generally requires some kind of enforcement. How do you see the evolution of this in Kubernetes? Was there a specific effort to improve the process in a way that required tests would be a first-class citizen, or was that never an issue?

RK: When discussions surrounding the Kubernetes conformance programme began in 2018, only approximately 11% of endpoints were covered by tests. At that time, the CNCF's governing board requested that if funding were to be provided for the work to cover missing conformance tests, the Kubernetes Community should adopt a policy of not allowing new features to be added unless they include conformance tests for their stable APIs.

SIG Architecture is responsible for stewarding this requirement, and APISnoop has proven to be an invaluable tool in this regard. Through automation, APISnoop generates a pull request every weekend to highlight any discrepancies in Conformance coverage. If any endpoints are promoted to General Availability without a conformance test, they are promptly identified. This approach helps prevent the accumulation of new technical debt.
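The kind of check described here can be illustrated with simple set arithmetic. The following is a minimal sketch, not a description of how APISnoop is actually implemented: the names `CoverageReport` and `conformance_gaps` and the endpoint identifiers in the usage example are hypothetical. It assumes three inputs: the stable (generally available) endpoints, the endpoints already hit by conformance tests, and the endpoints listed as ineligible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageReport:
    eligible: frozenset  # stable endpoints that are not marked ineligible
    untested: frozenset  # eligible endpoints with no conformance test

    @property
    def coverage(self) -> float:
        """Fraction of eligible endpoints that have a conformance test."""
        if not self.eligible:
            return 1.0
        return 1 - len(self.untested) / len(self.eligible)


def conformance_gaps(stable, tested, ineligible) -> CoverageReport:
    """Find eligible stable endpoints that no conformance test covers."""
    eligible = frozenset(stable) - frozenset(ineligible)
    untested = eligible - frozenset(tested)
    return CoverageReport(eligible=eligible, untested=untested)


if __name__ == "__main__":
    # Hypothetical endpoint names, for illustration only.
    report = conformance_gaps(
        stable={"readPod", "listDeployment", "createPodBinding", "readNodeStatus"},
        tested={"readPod", "listDeployment"},
        ineligible={"createPodBinding"},
    )
    print(sorted(report.untested), f"{report.coverage:.0%}")
```

Excluding ineligible endpoints before computing the gap mirrors the rule that only generally available, non-optional endpoints count toward conformance.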

Additionally, there are plans in the near future to create a release informing job, which will add an additional layer to prevent any new technical debt.

FSM: I see, tooling and automation play an important role there. What are, in your opinion, the areas that, conformance-wise, still require some work to be done? In other words, what are the current priority areas marked for improvement?

RK: We have reached the "100% Conformance Tested" milestone in release 1.27!

At that point, the community took another look at all the endpoints that were listed as ineligible for conformance. The list was populated through community input over several years. Several endpoints that were previously deemed ineligible for conformance have been identified and relocated to a new dedicated list, which is currently receiving focused attention for conformance test development. Again, that list can also be checked on apisnoop.cncf.io.

To ensure the avoidance of new technical debt in the conformance project, there are upcoming plans to establish a release informing job as an additional preventive measure.

While APISnoop is currently hosted on CNCF infrastructure, the project has been generously donated to the Kubernetes community. Consequently, it will be transferred to community-owned infrastructure before the end of 2023.

FSM: That's great news! For anyone wanting to help, what are the venues for collaboration that you would highlight? Do all of them require solid knowledge of Kubernetes as a whole, or are there ways someone newer to the project can contribute?

RK: Contributing to conformance testing is akin to the task of "washing the dishes" - it may not be highly visible, but it remains incredibly important. It necessitates a strong understanding of Kubernetes, particularly in the areas where the endpoints need to be tested. This is why working with each SIG that owns the API endpoint being tested is so important.

As part of our commitment to making test writing accessible to everyone, the ii team is currently engaged in the development of a "click and deploy" solution. This solution aims to enable anyone to swiftly create a working environment on real hardware within minutes. We will share updates regarding this development as soon as we are ready.

FSM: That's very helpful, thank you. Any final comments you would like to share with our readers?

RK: Conformance testing is a collaborative community endeavour that involves extensive cooperation among SIGs. SIG Architecture has spearheaded the initiative and provided guidance. However, the progress of the work relies heavily on the support of all SIGs in reviewing, enhancing, and endorsing the tests.

I would like to extend my sincere appreciation to the ii team for their unwavering commitment to resolving technical debt over the years. In particular, Hippie Hacker's guidance and stewardship of the vision has been invaluable. Additionally, I want to give special recognition to Stephen Heywood for shouldering the majority of the test writing workload in recent releases, as well as to Zach Mandeville for his contributions to APISnoop.

FSM: Many thanks for your availability and insightful comments. I've personally learned quite a bit from them, and I'm sure our readers will as well.

05 Oct 2023 12:00am GMT

02 Oct 2023

Blog: Announcing the 2023 Steering Committee Election Results

Author: Kaslin Fields

The 2023 Steering Committee Election is now complete. The Kubernetes Steering Committee consists of 7 seats, 4 of which were up for election in 2023. Incoming committee members serve a term of 2 years, and all members are elected by the Kubernetes Community.

This community body is significant since it oversees the governance of the entire Kubernetes project. With that great power comes great responsibility. You can learn more about the steering committee's role in their charter.

Thank you to everyone who voted in the election; your participation helps support the community's continued health and success.

Results

Congratulations to the elected committee members, whose two-year terms begin immediately (listed in alphabetical order by GitHub handle):

They join continuing members:

Stephen Augustus is a returning Steering Committee Member.

Big Thanks!

Thank you and congratulations on a successful election to this round's election officers:

Thanks to the Emeritus Steering Committee Members. Your service is appreciated by the community:

And thank you to all the candidates who came forward to run for election.

Get Involved with the Steering Committee

This governing body, like all of Kubernetes, is open to all. You can follow along with Steering Committee backlog items and weigh in by filing an issue or creating a PR against their repo. They have an open meeting at 9:30am PT on the first Monday of every month. They can also be contacted at their public mailing list steering@kubernetes.io.

You can see what the Steering Committee meetings are all about by watching past meetings on the YouTube Playlist.

If you want to meet some of the newly elected Steering Committee members, join us for the Steering AMA at the Kubernetes Contributor Summit in Chicago.


This post was written by the Contributor Comms Subproject. If you want to write stories about the Kubernetes community, learn more about us.

02 Oct 2023 12:00am GMT