31 Jul 2025

Feed: Hacker News

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

Comments

31 Jul 2025 10:57am GMT

Feed: Linuxiac

DuckStation PS1 Emulator Dev May Drop Linux Support After AUR Frustrations

After repeated complaints from Arch users, the DuckStation PS1 emulator dev removed the PKGBUILD and is considering dropping Linux support altogether.

31 Jul 2025 10:46am GMT

Feed: Hacker News

GOP’s Josh Hawley and Democrats vote to advance congressional stock trading ban

Comments

31 Jul 2025 10:07am GMT

Show HN: AgentGuard – Auto-kill AI agents before they burn through your budget

Comments

31 Jul 2025 5:54am GMT

30 Jul 2025

Feed: Linuxiac

Alma-Based HeliumOS 10 Is Out — Here’s What I Think

Alma-based HeliumOS 10 is out now with Linux kernel 6.12, Zsh as the default shell, Btrfs with optional encryption, and Docker preinstalled. Here's my take.

30 Jul 2025 8:21pm GMT

Feed: OMG! Ubuntu

Ubuntu 25.10 Offers Improved Disk Encryption Using TPM

Ubuntu 25.10 improves experimental TPM-backed full-disk encryption, which ties security to hardware integrity. New options and checks will be in place.

You're reading Ubuntu 25.10 Offers Improved Disk Encryption Using TPM, a blog post from OMG! Ubuntu. Do not reproduce elsewhere without permission.

30 Jul 2025 7:00pm GMT

Feed: Ubuntu blog

How to enable Real-time Ubuntu on your machine 

If you're here, you likely already know about preemption, determinism, and real-time capable operating systems. If that's the case, and you want to learn how to get up and running with Real-time Ubuntu, skip ahead now to find out how to enable the kernel on your workstation. If you'd like a short refresher, we have […]

30 Jul 2025 11:12am GMT

Feed: OMG! Ubuntu

GNOME Shell Gets a Proper Desktop Photo Widget (Finally)

A customisable photo widget for your GNOME desktop that shows images from any folder you like. Resizable and moveable, it adds personalised flourish.

30 Jul 2025 11:04am GMT

Feed: Linuxiac

Archinstall 3.0.9 Rolls Out with U2F and Bluetooth Support

Archinstall 3.0.9, a guided installer for Arch Linux, adds U2F authentication, LUKS iteration tweaks, and Bluetooth support.

30 Jul 2025 9:13am GMT

29 Jul 2025

Feed: OMG! Ubuntu

Fish is Like Bash With a Brain — Here’s How to Try it on Ubuntu

Fish might be the Bash alternative you didn't know you needed, thanks to features like syntax highlighting and smarter command suggestions. Learn how to install it on Ubuntu.

29 Jul 2025 6:04pm GMT

Feed: Ubuntu blog

Canonical MAAS awarded as best quality software by TIOBE

Canonical's MAAS User Interface has been ranked as the top-quality software project in its category in the quarterly TIOBE Software Quality Assurance Award.

29 Jul 2025 1:30pm GMT

28 Jul 2025

Feed: Kubernetes Blog

Kubernetes v1.34 Sneak Peek

Kubernetes v1.34 is coming at the end of August 2025. This release will not include any removal or deprecation, but it is packed with an impressive number of enhancements. Here are some of the features we are most excited about in this cycle!

Please note that this information reflects the current state of v1.34 development and may change before release.

Featured enhancements of Kubernetes v1.34

The following list highlights some of the notable enhancements likely to be included in the v1.34 release, but is not an exhaustive list of all planned changes. This is not a commitment and the release content is subject to change.

The core of DRA targets stable

Dynamic Resource Allocation (DRA) provides a flexible way to categorize, request, and use devices like GPUs or custom hardware in your Kubernetes cluster.

Since the v1.30 release, DRA has been based around claiming devices using structured parameters that are opaque to the core of Kubernetes. The relevant enhancement proposal, KEP-4381, took inspiration from dynamic provisioning for storage volumes. DRA with structured parameters relies on a set of supporting API kinds (ResourceClaim, DeviceClass, ResourceClaimTemplate, and ResourceSlice) under resource.k8s.io, while extending the Pod .spec with a new resourceClaims field. The core of DRA is targeting graduation to stable in Kubernetes v1.34.

With DRA, device drivers and cluster admins define device classes that are available for use. Workloads can claim devices from a device class within device requests. Kubernetes allocates matching devices to specific claims and places the corresponding Pods on nodes that can access the allocated devices. This framework provides flexible device filtering using CEL, centralized device categorization, and simplified Pod requests, among other benefits.

Once this feature has graduated, the resource.k8s.io/v1 APIs will be available by default.
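To make the claim flow concrete, here is a minimal sketch of a Pod using the resourceClaims field described above. All names (the template, image, and claim names) are illustrative assumptions, not from the source, and the exact schema can vary between API versions:

```yaml
# Hypothetical sketch: a Pod referencing a ResourceClaimTemplate through the
# spec.resourceClaims field; names are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: gpu-template   # assumed template created separately
  containers:
  - name: app
    image: registry.example/app:latest        # illustrative image
    resources:
      claims:
      - name: gpu   # the container uses the device allocated for this claim
```

The scheduler then allocates a matching device from the referenced device class and places the Pod on a node that can access it.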

ServiceAccount tokens for image pull authentication

The ServiceAccount token integration for kubelet credential providers is likely to reach beta and be enabled by default in Kubernetes v1.34. This allows the kubelet to use these tokens when pulling container images from registries that require authentication.

That support already exists as alpha, and is tracked as part of KEP-4412.

The existing alpha integration allows the kubelet to use short-lived, automatically rotated ServiceAccount tokens (that follow OIDC-compliant semantics) to authenticate to a container image registry. Each token is scoped to one associated Pod; the overall mechanism replaces the need for long-lived image pull Secrets.

Adopting this new approach reduces security risks, supports workload-level identity, and helps cut operational overhead. It brings image pull authentication closer to modern, identity-aware good practice.
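As a rough sketch of how this is wired up, the kubelet's credential provider configuration gains token-related attributes under KEP-4412. The field names below follow the alpha API and may change before beta; the provider name, image pattern, and audience are assumptions for illustration:

```yaml
# Sketch of the alpha kubelet CredentialProviderConfig from KEP-4412.
# Alpha field names; subject to change before beta.
apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
- name: example-registry-provider          # hypothetical provider binary
  apiVersion: credentialprovider.kubelet.k8s.io/v1
  matchImages:
  - "*.registry.example"
  defaultCacheDuration: "0s"
  tokenAttributes:
    serviceAccountTokenAudience: registry.example  # audience stamped into the short-lived token
    requireServiceAccount: true                    # only pull using a Pod-bound ServiceAccount token
```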

Pod replacement policy for Deployments

After a change to a Deployment, terminating pods may stay up for a considerable amount of time and may consume additional resources. As part of KEP-3973, the .spec.podReplacementPolicy field will be introduced (as alpha) for Deployments.

If your cluster has the feature enabled, you'll be able to select one of two policies:

- TerminationStarted: creates new pods as soon as old ones start terminating, resulting in faster rollouts at the cost of potentially higher resource consumption.
- TerminationComplete: waits until old pods fully terminate before creating new ones, resulting in slower rollouts but ensuring controlled resource consumption.

This feature makes Deployment behavior more predictable by letting you choose when new pods should be created during updates or scaling. It's beneficial when working in clusters with tight resource constraints or with workloads with long termination periods.

It's expected to be available as an alpha feature and can be enabled using the DeploymentPodReplacementPolicy and DeploymentReplicaSetTerminatingReplicas feature gates in the API server and kube-controller-manager.
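A hedged sketch of what the alpha field might look like on a Deployment (field placement per KEP-3973; as an alpha feature this is subject to change, and the workload names are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  podReplacementPolicy: TerminationComplete  # alpha; requires the feature gates above
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: registry.example/web:latest   # illustrative image
```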

Production-ready tracing for kubelet and API Server

To address the longstanding challenge of debugging node-level issues by correlating disconnected logs, KEP-2831 provides deep, contextual insights into the kubelet.

This feature instruments critical kubelet operations, particularly its gRPC calls to the Container Runtime Interface (CRI), using the vendor-agnostic OpenTelemetry standard. It allows operators to visualize the entire lifecycle of events (for example: a Pod startup) to pinpoint sources of latency and errors. Its most powerful aspect is the propagation of trace context; the kubelet passes a trace ID with its requests to the container runtime, enabling runtimes to link their own spans.

This effort is complemented by a parallel enhancement, KEP-647, which brings the same tracing capabilities to the Kubernetes API server. Together, these enhancements provide a more unified, end-to-end view of events, simplifying the process of pinpointing latency and errors from the control plane down to the node. These features have matured through the official Kubernetes release process. KEP-2831 was introduced as an alpha feature in v1.25, while KEP-647 debuted as alpha in v1.22. Both enhancements were promoted to beta together in the v1.27 release. Looking forward, Kubelet Tracing (KEP-2831) and API Server Tracing (KEP-647) are now targeting graduation to stable in the upcoming v1.34 release.
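Kubelet tracing is configured through the KubeletConfiguration file using the same TracingConfiguration shape as the API server. A minimal sketch (the collector address is an assumption; point it at your own OTLP gRPC endpoint):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
tracing:
  endpoint: localhost:4317         # assumed OTLP gRPC collector address
  samplingRatePerMillion: 1000000  # sample every span; lower this in production
```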

PreferSameZone and PreferSameNode traffic distribution for Services

The spec.trafficDistribution field within a Kubernetes Service allows users to express preferences for how traffic should be routed to Service endpoints.

KEP-3015 deprecates PreferClose and introduces two additional values: PreferSameZone and PreferSameNode. PreferSameZone is equivalent to the current PreferClose. PreferSameNode prioritizes sending traffic to endpoints on the same node as the client.

This feature was introduced in v1.33 behind the PreferSameTrafficDistribution feature gate. It is targeting graduation to beta in v1.34 with its feature gate enabled by default.
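Since trafficDistribution is a plain string field on the Service spec, opting in is a one-line change. A minimal sketch (Service name and selector are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  selector:
    app: backend
  ports:
  - port: 80
  trafficDistribution: PreferSameNode  # before v1.34: requires the PreferSameTrafficDistribution gate
```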

Support for KYAML: a Kubernetes dialect of YAML

KYAML aims to be a safer and less ambiguous YAML subset, and was designed specifically for Kubernetes. Whatever version of Kubernetes you use, you'll be able to use KYAML for writing manifests and/or Helm charts. You can write KYAML and pass it as an input to any version of kubectl, because all KYAML files are also valid as YAML. With kubectl v1.34, we expect you'll also be able to request KYAML output from kubectl (as in kubectl get -o kyaml …). If you prefer, you can still request the output in JSON or YAML format.

KYAML addresses specific challenges with both YAML and JSON. YAML's significant whitespace requires careful attention to indentation and nesting, while its optional string-quoting can lead to unexpected type coercion (for example: "The Norway Bug"). Meanwhile, JSON lacks comment support and has strict requirements for trailing commas and quoted keys.

KEP-5295 introduces KYAML, which tries to address the most significant problems by:

- always double-quoting value strings
- leaving keys unquoted when they look "safe"
- always using {} for mappings (associative arrays)
- always using [] for lists (sequences)

This might sound a lot like JSON, because it is! But unlike JSON, KYAML supports comments, allows trailing commas, and doesn't require quoted keys.

We're hoping to see KYAML introduced as a new output format for kubectl v1.34. As with all these features, none of these changes are 100% confirmed; watch this space!

As a format, KYAML is and will remain a strict subset of YAML, ensuring that any compliant YAML parser can parse KYAML documents. Kubernetes does not require you to provide input specifically formatted as KYAML, and we have no plans to change that.
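As a sketch of the flavor (the exact output of kubectl -o kyaml may differ; the object below is an assumed example), a small manifest in KYAML style looks like this, and any YAML 1.2 parser accepts it as-is:

```yaml
# KYAML-style rendering: unquoted keys, double-quoted string values,
# explicit braces/brackets, trailing commas, and comments.
{
  apiVersion: "v1",
  kind: "ConfigMap",
  metadata: {
    name: "example",
  },
  data: {
    port: "8080",   # explicit quoting avoids accidental type coercion
  },
}
```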

Fine-grained autoscaling control with HPA configurable tolerance

KEP-4951 introduces a new feature that allows users to configure autoscaling tolerance on a per-HPA basis, overriding the default cluster-wide 10% tolerance setting that often proves too coarse-grained for diverse workloads. The enhancement adds an optional tolerance field to the HPA's spec.behavior.scaleUp and spec.behavior.scaleDown sections, enabling different tolerance values for scale-up and scale-down operations, which is particularly valuable since scale-up responsiveness is typically more critical than scale-down speed for handling traffic surges.

Released as alpha in Kubernetes v1.33 behind the HPAConfigurableTolerance feature gate, this feature is expected to graduate to beta in v1.34. This improvement helps to address scaling challenges with large deployments, where for scaling in, a 10% tolerance might mean leaving hundreds of unnecessary Pods running. Using the new, more flexible approach would enable workload-specific optimization for both responsive and conservative scaling behaviors.
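Under KEP-4951, the per-HPA setting might look like the following sketch (the target workload and the specific tolerance values are illustrative assumptions; the field is alpha and its shape may change):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 50
  behavior:
    scaleUp:
      tolerance: 0.05   # react to a 5% metric deviation when scaling up
    scaleDown:
      tolerance: 0.15   # require a 15% deviation before scaling down
```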

Want to know more?

New features and deprecations are also announced in the Kubernetes release notes. We will formally announce what's new in Kubernetes v1.34 as part of the CHANGELOG for that release.

The Kubernetes v1.34 release is planned for Wednesday 27th August 2025. Stay tuned for updates!

Get involved

The simplest way to get involved with Kubernetes is to join one of the many Special Interest Groups (SIGs) that align with your interests. Have something you'd like to broadcast to the Kubernetes community? Share your voice at our weekly community meeting, and through the channels below. Thank you for your continued feedback and support.

28 Jul 2025 12:00am GMT

25 Jul 2025

Feed: JavaScript Weekly

Will WebAssembly ever get DOM support?

#746 - July 25, 2025

Read on the Web

JavaScript Weekly

es-toolkit: A Modern JavaScript Utility Library - Boasts being both faster and '97% smaller' than the ubiquitous Lodash, for which it is a direct 'seamless' replacement (and now boasting 100% Lodash compatibility). The reference guide shows off all it can do, and it's widely adopted - being used by Storybook, CKEditor, and recommended by Nuxt. GitHub repo.

Viva Republica, Inc

Avoid Common Mistakes in React and Next.js - Avoid redundant useState and useEffect, deeply nested data, unscalable forms, and hidden shared state bugs. David Khourshid teaches practical patterns to refactor complex apps and scale with confidence!

Frontend Masters sponsor

When is WebAssembly Going to Get DOM Support? - Working with the DOM from JavaScript is straightforward, but WebAssembly requires glue code to do it. Is this going to change? Daniel of the TC39 committee digs into the issue here and says that modern build toolchains and WASM's evolution are making things easier all the time.

Daniel Ehrenberg

IN BRIEF:

RELEASES:

📖 Articles and Videos

A JS 'Numbers Station' in 1 Kilobyte - We've recently promoted the js1024 JavaScript code golfing contest - it's now over, but Terence breaks down his interesting entry which recreates the vibe of real-life numbers stations.

Terence Eden

💡 You can also look through all the other js1024 submissions.

Revisiting My 2010 JavaScript Library - A developer looks back at code he wrote 15 years ago, the 'clever solutions' he used, and why most of it is redundant in 2025.

Ibrahim Diallo

Build an MCP Server in Your Next.js Application with Clerk - Add a spec-compliant MCP endpoint in minutes, allowing LLMs to access user data with user consent.

Clerk sponsor

Web Serial: The Only Reason I'll Admit JavaScript Isn't All Bad - The author isn't a fan of JavaScript but likes the power the Web Serial API provides for working with external devices.

Steven Hicks

📄 'It's Time for Modern CSS to Kill the SPA' - "Use modern server rendering. Use actual pages. Animate with CSS. Preload with intent" Jono Alderson

📄 We Migrated Our Next.js Site to Eleventy and Increased Performance by 24% - Eleventy (11ty) is a popular Node-based static site generator. Dan Webb

📄 Handling JavaScript Event Listeners with Parameters Amejimaobari Ollornwi

📄 Build Your Own Font Search Engine - Using vision language models to index and search the fonts. Lúí Smyth

📄 Interactive Text Destruction with Three.js, WebGPU, and Three Shader Language Lolo Armdz

📄 React Router and React Server Components: The Path Forward Ebey and Dalgleish

🛠 Code & Tools

Transformers.js 3.7: Machine Learning and Models for the Web - Brings the ability to run powerful pretrained models in the browser, thanks to the ONNX runtime. v3.7 adds Voxtral (speech transcription and audio understanding), LFM2 and ModernBERT support.

Hugging Face

npq: Safely Install Packages by Auditing Them Pre-Install - npq performs several extra steps compared to npm. It consults Snyk's database of vulnerabilities, looks at the package's age, download count, and docs, and tries to paint a better picture of what you're really installing.

Liran Tal

Measure Web Performance Based on Real User Impact - With Embrace, get full session timelines, Core Web Vitals and JS exceptions in context, and user journey analysis.

Embrace sponsor

Untitled UI React: A Fresh UI Component Library - A giant collection of open-source (MIT) components built around Tailwind CSS and React Aria - there's a full introduction here. The core is open source, with a 'PRO' offering adding more components, examples, and Figma integration.

Untitled UI

ts-regexp: A Statically Typed Alternative to JavaScript's RegExp - A new approach for bringing strict typing to regular expressions in TypeScript.

Danilo Furrer

📰 Classifieds

Meticulous automatically creates and maintains an E2E UI test suite with zero developer effort. Relied on by Dropbox, Wiz, Lattice, Bilt Rewards, etc.

🔍 Discover KeyLines - a scalable JavaScript graph visualization toolkit that turns complex data into actionable insights.

🎁 Some Bonus Items

25 Jul 2025 12:00am GMT

24 Jul 2025

Feed: Ubuntu blog

The Linux Foundation and OpenStack – a new chapter for cloud-native infrastructure

Effective July 23rd, 2025 the Open Infrastructure Foundation (OIF) has officially joined one of the world's largest and most influential open source communities: the Linux Foundation. This strategic move reflects the accelerating trend toward open source standardization and democratization - a movement Canonical has proudly supported since its inception. As a long-standing and active member […]

24 Jul 2025 4:59pm GMT

18 Jul 2025

Feed: Kubernetes Blog

Post-Quantum Cryptography in Kubernetes

The world of cryptography is on the cusp of a major shift with the advent of quantum computing. While powerful quantum computers are still largely theoretical for many applications, their potential to break current cryptographic standards is a serious concern, especially for long-lived systems. This is where Post-Quantum Cryptography (PQC) comes in. In this article, I'll dive into what PQC means for TLS and, more specifically, for the Kubernetes ecosystem. I'll explain what the (surprising) state of PQC in Kubernetes is and what the implications are for current and future clusters.

What is Post-Quantum Cryptography

Post-Quantum Cryptography refers to cryptographic algorithms that are thought to be secure against attacks by both classical and quantum computers. The primary concern is that quantum computers, using algorithms like Shor's Algorithm, could efficiently break widely used public-key cryptosystems such as RSA and Elliptic Curve Cryptography (ECC), which underpin much of today's secure communication, including TLS. The industry is actively working on standardizing and adopting PQC algorithms. One of the first to be standardized by NIST is the Module-Lattice Key Encapsulation Mechanism (ML-KEM), formerly known as Kyber, and now standardized as FIPS-203 (PDF download).

It is difficult to predict when quantum computers will be able to break classical algorithms. However, it is clear that we need to start migrating to PQC algorithms now, as the next section shows. To get a feeling for the predicted timeline we can look at a NIST report covering the transition to post-quantum cryptography standards. It declares that systems using classical cryptography should be deprecated after 2030 and disallowed after 2035.

Key exchange vs. digital signatures: different needs, different timelines

In TLS, there are two main cryptographic operations we need to secure:

Key Exchange: This is how the client and server agree on a shared secret to encrypt their communication. If an attacker records encrypted traffic today, they could decrypt it in the future, if they gain access to a quantum computer capable of breaking the key exchange. This makes migrating KEMs to PQC an immediate priority.

Digital Signatures: These are primarily used to authenticate the server (and sometimes the client) via certificates. The authenticity of a server is verified at the time of connection. While important, the risk of an attack today is much lower, because the decision of trusting a server cannot be abused after the fact. Additionally, current PQC signature schemes often come with significant computational overhead and larger key/signature sizes compared to their classical counterparts.

Another significant hurdle in the migration to PQ certificates is the upgrade of root certificates. These certificates have long validity periods and are installed in many devices and operating systems as trust anchors.

Given these differences, the focus for immediate PQC adoption in TLS has been on hybrid key exchange mechanisms. These combine a classical algorithm (such as Elliptic Curve Diffie-Hellman Ephemeral (ECDHE)) with a PQC algorithm (such as ML-KEM). The resulting shared secret is secure as long as at least one of the component algorithms remains unbroken. The X25519MLKEM768 hybrid scheme is the most widely supported one.

State of PQC key exchange mechanisms (KEMs) today

Support for PQC KEMs is rapidly improving across the ecosystem.

Go: The Go standard library's crypto/tls package introduced support for X25519MLKEM768 in version 1.24 (released February 2025). Crucially, it's enabled by default when there is no explicit configuration, i.e., Config.CurvePreferences is nil.

Browsers & OpenSSL: Major browsers like Chrome (version 131, November 2024) and Firefox (version 135, February 2025), as well as OpenSSL (version 3.5.0, April 2025), have also added support for the ML-KEM based hybrid scheme.

Apple is also rolling out support for X25519MLKEM768 in version 26 of their operating systems. Given the proliferation of Apple devices, this will have a significant impact on the global PQC adoption.

For a more detailed overview of the state of PQC in the wider industry, see this blog post by Cloudflare.

Post-quantum KEMs in Kubernetes: an unexpected arrival

So, what does this mean for Kubernetes? Kubernetes components, including the API server and kubelet, are built with Go.

As of Kubernetes v1.33, released in April 2025, the project uses Go 1.24. A quick check of the Kubernetes codebase reveals that Config.CurvePreferences is not explicitly set. This leads to a fascinating conclusion: Kubernetes v1.33, by virtue of using Go 1.24, supports hybrid post-quantum X25519MLKEM768 for TLS connections by default!

You can test this yourself. If you set up a Minikube cluster running Kubernetes v1.33.0, you can connect to the API server using a recent OpenSSL client:

$ minikube start --kubernetes-version=v1.33.0
$ kubectl cluster-info
Kubernetes control plane is running at https://127.0.0.1:<PORT>
$ kubectl config view --minify --raw -o jsonpath='{.clusters[0].cluster.certificate-authority-data}' | base64 -d > ca.crt
$ openssl version
OpenSSL 3.5.0 8 Apr 2025 (Library: OpenSSL 3.5.0 8 Apr 2025)
$ echo -n "Q" | openssl s_client -connect 127.0.0.1:<PORT> -CAfile ca.crt
[...]
Negotiated TLS1.3 group: X25519MLKEM768
[...]
DONE

Lo and behold, the negotiated group is X25519MLKEM768! This is a significant step towards making Kubernetes quantum-safe, seemingly without a major announcement or dedicated KEP (Kubernetes Enhancement Proposal).

The Go version mismatch pitfall

An interesting wrinkle emerged with Go versions 1.23 and 1.24. Go 1.23 included experimental support for a draft version of ML-KEM, identified as X25519Kyber768Draft00. This was also enabled by default if Config.CurvePreferences was nil. Kubernetes v1.32 used Go 1.23. However, Go 1.24 removed the draft support and replaced it with the standardized version X25519MLKEM768.

What happens if a client and server are using mismatched Go versions (one on 1.23, the other on 1.24)? They won't have a common PQC KEM to negotiate, and the handshake will fall back to classical ECC curves (e.g., X25519). How could this happen in practice?

Consider a scenario:

A Kubernetes cluster is running v1.32 (using Go 1.23 and thus X25519Kyber768Draft00). A developer upgrades their kubectl to v1.33, compiled with Go 1.24 and supporting only X25519MLKEM768. Now, when kubectl communicates with the v1.32 API server, the two no longer share a common PQC algorithm. The connection silently downgrades to classical cryptography, losing the PQC protection that was previously in place. This highlights the importance of understanding the implications of Go version upgrades, and the details of the TLS stack.

Limitations: packet size

One practical consideration with ML-KEM is the size of its public keys, with encoded key sizes of around 1.2 kilobytes for ML-KEM-768. This can cause the initial TLS ClientHello message not to fit inside a single TCP/IP packet, given typical networking constraints (most commonly, the standard Ethernet frame size limit of 1500 bytes). Some TLS libraries or network appliances might not handle this gracefully, assuming the ClientHello always fits in one packet. This issue has been observed in some Kubernetes-related projects and networking components, potentially leading to connection failures when PQC KEMs are used. More details can be found at tldr.fail.

State of Post-Quantum Signatures

While KEMs are seeing broader adoption, PQC digital signatures are further behind in terms of widespread integration into standard toolchains. NIST has published standards for PQC signatures, such as ML-DSA (FIPS-204) and SLH-DSA (FIPS-205). However, implementing these in a way that's broadly usable (e.g., for PQC Certificate Authorities) presents challenges:

Larger Keys and Signatures: PQC signature schemes often have significantly larger public keys and signature sizes compared to classical algorithms like Ed25519 or RSA. For instance, Dilithium2 keys can be 30 times larger than Ed25519 keys, and certificates can be 12 times larger.

Performance: Signing and verification operations can be substantially slower. While some algorithms are on par with classical algorithms, others may have a much higher overhead, sometimes on the order of 10x to 1000x worse performance. To improve this situation, NIST is running a second round of standardization for PQC signatures.

Toolchain Support: Mainstream TLS libraries and CA software do not yet have mature, built-in support for these new signature algorithms. The Go team, for example, has indicated that ML-DSA support is a high priority, but the soonest it might appear in the standard library is Go 1.26 (as of May 2025).

Cloudflare's CIRCL (Cloudflare Interoperable Reusable Cryptographic Library) library implements some PQC signature schemes like variants of Dilithium, and they maintain a fork of Go (cfgo) that integrates CIRCL. Using cfgo, it's possible to experiment with generating certificates signed with PQC algorithms like Ed25519-Dilithium2. However, this requires using a custom Go toolchain and is not yet part of the mainstream Kubernetes or Go distributions.

Conclusion

The journey to a post-quantum secure Kubernetes is underway, and perhaps further along than many realize, thanks to the proactive adoption of ML-KEM in Go. With Kubernetes v1.33, users are already benefiting from hybrid post-quantum key exchange in many TLS connections by default.

However, awareness of potential pitfalls, such as Go version mismatches leading to downgrades and issues with ClientHello packet sizes, is crucial. While PQC for KEMs is becoming a reality, PQC for digital signatures and certificate hierarchies is still in earlier stages of development and adoption for mainstream use. As Kubernetes maintainers and contributors, staying informed about these developments will be key to ensuring the long-term security of the platform.

18 Jul 2025 12:00am GMT

Feed: JavaScript Weekly

A tricky, educational quiz: it's about time..

#745 - July 18, 2025

Read on the Web

JavaScript Weekly

The JavaScript Date Quiz - Prepare to get irritated? JavaScript's native date parsing features are notoriously arcane and prone to cause surprises if you step off the beaten track. So while we await the broad availability of the Temporal API, why not put your assumptions and knowledge to the test with an educational quiz?

Sam Rose

Next.js 15.4 Released (and What's Coming in Next.js 16) - A relatively small release for Next, but with updates to performance, stability, and Turbopack compatibility, and a good summary of what's coming next in Next.js 16.

Jimmy Lai and Zack Tanner

Add SSO & SCIM with Just a Few Lines of Code - WorkOS offers clean, well-documented APIs for SSO, SCIM, RBAC, and more, so you can focus on building features your users care about. Trusted by engineering teams at Cursor, Replit, Vercel, and Temporal.

WorkOS sponsor

WebAssembly: Yes, But for What? - Writing for ACM Queue, one of the contributors to multiple JavaScript and WebAssembly (WASM) implementations shares a good roundup of where WebAssembly is being used, both in the browser and server-side, and how it's gradually finding its way into seemingly everything.

Andy Wingo / ACM

IN BRIEF:

RELEASES:

📖 Articles and Videos

How to Create an NPM Package in 2025 - One of JavaScript's most essential tasks, but one with numerous steps involved if you want to follow best practices, integrate useful tools, and get things just right. Matt Pocock rounds up the overall process here.

Matt Pocock

The History of React Through Code - An epic article charting React's evolution from its origins at Facebook through to now. It sheds light on React's core philosophies and the motivations behind major decisions. This is a great way to round out your thinking about, and knowledge of, React's overall story.

Corbin Crutchley

How to Build an AI Coding Rules App with Lovable - Guide AI to generate a secure, full-stack app with minimal prompts-learn how to turn ideas into working software fast.

Clerk sponsor

The Untold Story of JavaScript - Two months ago, the Deno team shared A Brief History of JavaScript, a thorough timeline-based tour of JavaScript each year from 1994 till now. This video covers the same ground in just 8 minutes.

Deno

A Better Promise.all() - Utility types and functions that make deep promise handling more ergonomic and type-safe.

Nick Keuning

📄 Make Your Website Talk with the Web Speech API - A simple, straightforward approach. Andrew Magill

📄 How I Found a Bypass in Google's Big Anti-Adblock Update - A neat bit of JavaScript hackery (which is now fixed in Chrome). Derin Eryilmaz

📄 Building a 3D Product Configurator with Babylon.js - How to take configurable 3D models to the Web. Josh Sanderson

📄 Modern Async Iteration with Array.fromAsync() Matt Smith

🛠 Code & Tools

Tiptap v3: The Headless Rich Text Editor Framework - Tiptap provides a fantastic base for putting together powerful rich text editing experiences, and v3 includes a lot of DX improvements like being able to unmount and remount editors (ideal for dynamic UIs), 'Markviews' for creating custom views for text segments (marks) using your own components, an SSR mode, and more. GitHub repo.

Tiptap GmbH

✉️ Upyo: A Simple Cross-Runtime Email Sending Library - A cross-runtime email library that provides a unified, type-safe API for sending emails both on SMTP and HTTP-based (e.g. SendGrid or Amazon SES) providers. TIL that 'upyo' (우표) means 'postage stamp' in Korean.

Hong Minhee

No Breakpoints, No console.log - Just AI & Time Travel - 15x faster TypeScript and JavaScript debugging than with breakpoints and console.log, upgrading your AI agent into an expert debugger with real-time context.

Wallaby Team sponsor

Hyper Fetch: A 'Turbocharged' Fetch Library for Working with Remote APIs - A framework-agnostic, Axios and TanStack Query-inspired type-safe data-fetching framework for browser and server environments, with request lifecycle management, real-time communication, progress tracking, and codegen for Swagger/OpenAPI. GitHub repo.

Maciej Pyrc et al.

GrowField: Small, Dependency-Free Module for Making Textarea Elements Grow - Very simple. For when you've got a textarea input and you want it to grow as more content is added to it.

Five Fifteen

📰 Classifieds

Meticulous automatically creates and maintains an E2E UI test suite with zero developer effort. Relied on by Dropbox, Wiz, Lattice, Bilt Rewards, etc.

If you're a Node.js developer, don't forget to check out Node Weekly, our sister newsletter where we cover Node more deeply.

🎁 Tiny Bonus Items

18 Jul 2025 12:00am GMT

11 Jul 2025

feedJavaScript Weekly

The details of TC39's last meeting

#​744 - July 11, 2025

Read on the Web

JavaScript Weekly

Vercel Acquires NuxtLabs - Vercel has acquired the company that caretakes the Nuxt project and employs some of its core team - a move Vue creator Evan You is quite optimistic about. Vercel now manages, or at least supports, several key projects like Next.js, Turborepo, Svelte, and shadcn/ui. Nuxt itself remains open source and has a promising future. Vercel's Guillermo Rauch shares a little more about the move here.

NuxtLabs / Vercel

💡 Daniel Roe, leader of the Nuxt team, answered lots of questions about the acquisition on Reddit.

FlexGrid by Wijmo: The Industry-Leading JavaScript Datagrid - A fast and flexible DataGrid for building modern web apps. Key features and virtualized rendering are included in the core grid module. Pick & choose special features to keep your app small. Built for JavaScript, extended to Angular, React, and Vue.

Wijmo From MESCIUS sponsor

A Detailed Summary of the Latest TC39 Plenary - A thorough roundup of May's major ECMAScript committee meeting with far more detail about each proposal's development and the decisions made than we usually get to hear about. Topics include Array.fromAsync, explicit resource management, the Temporal API, and some brainstorming around AsyncContext.

Igalia Compilers Team

IN BRIEF:

RELEASES:

📖 Articles and Videos

What's the Difference Between Ordinary Functions and Arrow Functions? - This sounds like basic stuff, but James always does a good job of digging in and explaining things in a way that gives you a more nuanced way to think about a concept, even if it's just "Which function declaration syntax should I use?"

James Sinclair

💡 His guide to how to compose JS functions that take multiple parameters is also worth revisiting.

Embrace Web RUM Provides User-Focused Observability - Get session timelines, Core Web Vitals and JS exceptions in context, and user journey analysis with issue correlation.

Embrace sponsor

JavaScript Scope Hoisting is Broken - The creator of Parcel argues that scope hoisting (when bundlers inline modules into a shared scope) conflicts with modern JS patterns like code splitting and dynamic imports, causing subtle bugs and offering little benefit, so he's considering removing it in Parcel v3.

Devon Govett

Codepoint-Safe Truncation: Fixing Emoji Slicing - An app's CSV importer kept breaking on emoji-filled rows, triggering errors. James demonstrates how swapping slice for a code-point-aware spread fixes it.

James Mulholland

📄 Parsing 1 Billion Rows in Bun in Under 10 Seconds Tae Kim

📄 Loosely Synchronize Your JS Stores in Multiple Tauri Processes - Tauri is a bit like a Rust-flavored Electron for building cross-platform native apps. Costa Alexoglou

📄 Managing the State of Your Promises - On the potential of Promise.all and Promise.allSettled. Lydia Cho

📄 When Can I Use Temporal? - "If Brendan Eich can invent .. JavaScript in 10 days, why has it taken eight years to replace the Date API?" John Dalziel

📄 Is It Still Worth Using jQuery in 2025? Suren Enfiajyan

🛠 Code & Tools

Driver.js: Tours, Highlights, Contextual Help, and More - A vanilla JS library for making on-page tours and contextual help systems. It's been around for several years, but is still maintained, and there are lots of examples to check out - it's really smooth.

Kamran Ahmed

jsonrepair: Repair Invalid JSON Documents - This has lots of possible use cases, including dealing with weird JSON coming back from LLMs or non-compliant JSON spat out by poorly built software. You can use it from Node, as a CLI tool, or try a basic version online.

Jos de Jong

🤡 In barely related news, someone has turned JSON into its own programming language. Oh, the horror!

Server-Side Support for MCP in Next.js - Server-side MCP just got easier in Next.js. One route, no extra infra-Scorecard cut 1,000 lines to just 70.

Clerk sponsor

line-numbers: A Web Component to Add Line Numbers Next to Various HTML Elements - Useful for custom apps that show source code or other snippets that require line numbering. See examples here, which demonstrate the flexible customization options for the line numbering.

Zach Leatherman

cRonstrue 3.0: Convert Cron Expressions into Natural Language - Not just English either - it supports about thirty locales. There's also an online demo.

Brady Holt

📰 Classifieds

Meticulous automatically creates and maintains an E2E UI test suite with zero developer effort. Relied on by Dropbox, Wiz, Lattice, Bilt Rewards, etc.

📌 Try out PinMe: free CLI tool helps deploy your static site in seconds-and keeps it online without any ongoing cost or maintenance.

👀 Elsewhere...

Here's a selection of things from the broader ecosystem this week:

11 Jul 2025 12:00am GMT

03 Jul 2025

feedKubernetes Blog

Navigating Failures in Pods With Devices

Kubernetes is the de facto standard for container orchestration, but when it comes to handling specialized hardware like GPUs and other accelerators, things get a bit complicated. This blog post dives into the challenges of managing failure modes when operating pods with devices in Kubernetes, based on insights from Sergey Kanzhelev and Mrunal Patel's talk at KubeCon NA 2024. You can follow the links to slides and recording.

The AI/ML boom and its impact on Kubernetes

The rise of AI/ML workloads has brought new challenges to Kubernetes. These workloads often rely heavily on specialized hardware, and any device failure can significantly impact performance and lead to frustrating interruptions. As highlighted in the 2024 Llama paper, hardware issues, particularly GPU failures, are a major cause of disruption in AI/ML training. You can also learn how much effort NVIDIA spends on handling device failures and maintenance in the KubeCon talk by Ryan Hallisey and Piotr Prokop, All-Your-GPUs-Are-Belong-to-Us: An Inside Look at NVIDIA's Self-Healing GeForce NOW Infrastructure (recording), as they see 19 remediation requests per 1000 nodes a day! We also see data centers offering spot consumption models and overcommitting on power, making device failures commonplace and a part of the business model.

However, Kubernetes's view on resources is still very static. The resource is either there or not. And if it is there, the assumption is that it will stay there fully functional - Kubernetes lacks good support for handling full or partial hardware failures. These long-existing assumptions combined with the overall complexity of a setup lead to a variety of failure modes, which we discuss here.

Understanding AI/ML workloads

Generally, all AI/ML workloads require specialized hardware, have challenging scheduling requirements, and are expensive when idle. AI/ML workloads typically fall into two categories - training and inference. Here is an oversimplified view of those categories' characteristics, which are different from traditional workloads like web services:

Training
These workloads are resource-intensive, often consuming entire machines and running as gangs of pods. Training jobs are usually "run to completion" - but that could be days, weeks or even months. Any failure in a single pod can necessitate restarting the entire step across all the pods.
Inference
These workloads are usually long-running or run indefinitely, and can be small enough to consume a subset of a Node's devices or large enough to span multiple nodes. They often require downloading huge files with the model weights.

These workload types specifically break many past assumptions:

Workload assumptions, before and now:

Before: Can get a better CPU and the app will work faster.
Now: Require a specific device (or class of devices) to run.

Before: When something doesn't work, just recreate it.
Now: Allocation or reallocation is expensive.

Before: Any node will work; no need to coordinate between Pods.
Now: Scheduled in a special way - devices are often connected in a cross-node topology.

Before: Each Pod can be plug-and-play replaced if it fails.
Now: Pods are part of a larger task; the lifecycle of the entire task depends on each Pod.

Before: Container images are slim and easily available.
Now: Container images may be so big that they require special handling.

Before: Long initialization can be offset by slow rollout.
Now: Initialization may be long and should be optimized, sometimes across many Pods together.

Before: Compute nodes are commoditized and relatively inexpensive, so some idle time is acceptable.
Now: Nodes with specialized hardware can be an order of magnitude more expensive than those without, so idle time is very wasteful.

The existing failure model relies on those old assumptions. It may still work for the new workload types, but it has limited knowledge of devices and is very expensive for them, in some cases prohibitively so. You will see more examples later in this article.

Why Kubernetes still reigns supreme

This article does not go deeper into the question of why not to start fresh for AI/ML workloads, given how different they are from traditional Kubernetes workloads. Despite many challenges, Kubernetes remains the platform of choice for AI/ML workloads. Its maturity, security, and rich ecosystem of tools make it a compelling option. While alternatives exist, they often lack the years of development and refinement that Kubernetes offers. And the Kubernetes developers are actively addressing the gaps identified in this article and beyond.

The current state of device failure handling

This section outlines the different failure modes and the best practices and DIY (Do-It-Yourself) solutions used today. The next section describes a roadmap for improving things for those failure modes.

Failure modes: K8s infrastructure

In order to understand failures related to the Kubernetes infrastructure, you need to understand how many moving parts are involved in scheduling a Pod on a node. The sequence of events when a Pod is scheduled on a Node is as follows:

  1. Device plugin is scheduled on the Node
  2. Device plugin is registered with the kubelet via local gRPC
  3. Kubelet uses device plugin to watch for devices and updates capacity of the node
  4. Scheduler places a user Pod on a Node based on the updated capacity
  5. Kubelet asks Device plugin to Allocate devices for a User Pod
  6. Kubelet creates a User Pod with the allocated devices attached to it

This diagram shows some of those actors involved:

The diagram shows the relationships between the kubelet, the device plugin, and a user Pod: the kubelet connects to the device plugin named my-device, the kubelet reports the node status with my-device availability, and the user Pod requests 2 of my-device.

As there are so many actors interconnected, every one of them and every connection may experience interruptions. This leads to many exceptional situations that are often considered failures, and may cause serious workload interruptions:

The same diagram as the one above, but with orange "bang" drawings overlaid on individual components, each with text indicating what can break in that component. Over the kubelet, the text reads: 'kubelet restart: loses all device info before re-Watch'. Over the device plugin: 'device plugin update, eviction, restart: kubelet cannot Allocate devices or loses all device state'. Over the user Pod: 'slow pod termination: devices are unavailable'.

The goal for Kubernetes is to make the interaction between these components as reliable as possible. The kubelet already implements retries, grace periods, and other techniques to improve reliability. The roadmap section goes into detail on other edge cases that the Kubernetes project tracks. However, all these improvements only work when these best practices are followed:

Another class of Kubernetes infra-related issues is driver-related. With traditional resources like CPU and memory, no compatibility checks between the application and hardware were needed. With special devices like hardware accelerators, there are new failure modes. Device drivers installed on the node:

Best practices for handling driver versions:

Following the best practices in this section and using device plugins and device driver installers from trusted and reliable sources generally eliminate this class of failures. Kubernetes is tracking work to make this space even better.

Failure modes: device failed

There is very little handling of device failure in Kubernetes today. Device plugins report the device failure only by changing the count of allocatable devices. And Kubernetes relies on standard mechanisms like liveness probes or container failures to allow Pods to communicate the failure condition to the kubelet. However, Kubernetes does not correlate device failures with container crashes and does not offer any mitigation beyond restarting the container while being attached to the same device.

This is why many plugins and DIY solutions exist to handle device failures based on various signals.

Health controller

In many cases a failed device will result in unrecoverable and very expensive nodes doing nothing. A simple DIY solution is a node health controller. The controller compares the device allocatable count with the capacity and, if the capacity is greater, starts a timer. Once the timer reaches a threshold, the health controller kills and recreates the node.

There are problems with the health controller approach:

There are variations of the health controller solving some of the problems above. The overall theme here though is that to best handle failed devices, you need customized handling for the specific workload. Kubernetes doesn't yet offer enough abstraction to express how critical the device is for a node, for the cluster, and for the Pod it is assigned to.

Pod failure policy

Another DIY approach for device failure handling is a per-pod reaction on a failed device. This approach is applicable for training workloads that are implemented as Jobs.

A Pod can define special exit codes for device failures: whenever unexpected device behavior is encountered, the Pod exits with a special exit code, and the Pod failure policy can then handle the device failure in a special way. Read more in Handling retriable and non-retriable pod failures with Pod failure policy.
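As a sketch of what this convention could look like, here is a hypothetical Job manifest. The image name, the device resource name, and the choice of exit code 42 as the "device failed" signal are all assumptions for illustration; only the podFailurePolicy field itself is the real Kubernetes API:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job
spec:
  backoffLimit: 6
  podFailurePolicy:
    rules:
    # Exit code 42 is an app-defined convention meaning "device failed":
    # don't count it against backoffLimit, just replace the pod.
    - action: Ignore
      onExitCodes:
        containerName: trainer
        operator: In
        values: [42]
  template:
    spec:
      restartPolicy: Never   # required for podFailurePolicy to apply
      containers:
      - name: trainer
        image: registry.example.com/trainer:latest   # hypothetical image
        resources:
          limits:
            example.com/gpu: 1                       # hypothetical device resource
```

With a rule like this, a crash caused by a failed device does not burn through the Job's retry budget the way an ordinary application bug would.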

There are some problems with the Pod failure policy approach for Jobs:

So, this solution has limited applicability.

Custom pod watcher

A little more generic approach is to implement the Pod watcher as a DIY solution or use some third party tools offering this functionality. The pod watcher is most often used to handle device failures for inference workloads.

Since Kubernetes keeps a pod assigned to a device even if the device is reportedly unhealthy, the idea is to detect this situation with the pod watcher and apply some remediation. This often involves obtaining the device health status and its mapping to the Pod using the Pod Resources API on the node. If a device fails, the watcher can then delete the attached Pod as a remediation, and the ReplicaSet will recreate the Pod on a healthy device.
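The core decision such a watcher makes can be sketched as a pure function. This is an illustration only: the function name and the input maps are invented for the example, and in a real watcher the device-to-pod mapping would come from the node's Pod Resources API and the deletions would go through the Kubernetes API.

```go
package main

import "fmt"

// podsToEvict returns the pods that should be deleted because they are
// attached to an unhealthy device, so their ReplicaSet can recreate them
// on healthy hardware. Pods on healthy devices are left alone.
func podsToEvict(unhealthy map[string]bool, deviceToPods map[string][]string) []string {
	seen := map[string]bool{}
	var victims []string
	for dev, pods := range deviceToPods {
		if !unhealthy[dev] {
			continue
		}
		for _, p := range pods {
			if !seen[p] { // a pod may hold several failed devices; evict it once
				seen[p] = true
				victims = append(victims, p)
			}
		}
	}
	return victims
}

func main() {
	victims := podsToEvict(
		map[string]bool{"gpu-3": true},
		map[string][]string{
			"gpu-3": {"inference-7d9f-abc12"},
			"gpu-4": {"inference-7d9f-def34"},
		},
	)
	fmt.Println(victims) // only the pod attached to the failed gpu-3
}
```

Keeping the decision logic separate from the API plumbing like this makes the watcher easy to unit test, which matters given how disruptive a wrong eviction can be.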

The other reasons to implement this watcher:

Problems with the custom pod watcher:

There are more variations of DIY solutions for handling device failures or upcoming maintenance. Overall, Kubernetes has enough extension points to implement these solutions. However, some extension points require higher privilege than users may be comfortable with or are too disruptive. The roadmap section goes into more details on specific improvements in handling the device failures.

Failure modes: container code failed

When the container code fails or something bad happens to it, like an out-of-memory condition, Kubernetes knows how to handle those cases: either the container is restarted or, if the Pod has restartPolicy: Never, the Pod fails and is scheduled on another node. Kubernetes has limited expressiveness about what counts as a failure (for example, a non-zero exit code or a liveness probe failure) and how to react to such a failure (mostly either always restart or immediately fail the Pod).

This level of expressiveness is often not enough for the complicated AI/ML workloads. AI/ML pods are better rescheduled locally or even in-place as that would save on image pulling time and device allocation. AI/ML pods are often interconnected and need to be restarted together. This adds another level of complexity and optimizing it often brings major savings in running AI/ML workloads.

There are various DIY solutions for orchestrating Pod failure handling. The most typical one is to wrap the main executable in the container with some orchestrator, which can then restart the main executable whenever the job needs to be restarted because some other pod has failed.
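A minimal sketch of that wrapper idea, assuming a coordination layer that signals "a peer failed, redo the step" (the sentinel error, function names, and restart budget here are all invented for illustration, not part of any real orchestrator):

```go
package main

import (
	"errors"
	"fmt"
)

// errRestart is the hypothetical signal from the coordination layer telling
// the wrapper that the whole gang must redo the current step.
var errRestart = errors.New("peer pod failed: restart step")

// supervise keeps invoking the main workload inside the same container,
// so no pod recreation, image re-pull, or device re-allocation is needed.
// It stops when the workload finishes cleanly, hits a non-retriable error,
// or exhausts its restart budget.
func supervise(run func(attempt int) error, maxRestarts int) error {
	for attempt := 0; ; attempt++ {
		err := run(attempt)
		if err == nil {
			return nil
		}
		if !errors.Is(err, errRestart) || attempt >= maxRestarts {
			return err // real failure: surface it so Kubernetes-level handling kicks in
		}
		fmt.Printf("attempt %d interrupted by peer failure; restarting in place\n", attempt)
	}
}

func main() {
	// Simulated workload: the first attempt is interrupted by a peer failure,
	// the second attempt succeeds.
	err := supervise(func(attempt int) error {
		if attempt == 0 {
			return errRestart
		}
		return nil
	}, 3)
	fmt.Println("final error:", err)
}
```

In a real wrapper, `run` would exec the training binary and the restart signal would arrive from the other pods in the gang; the point of the sketch is only that restarting in place is a loop around the workload, not a trip through the scheduler.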

Solutions like this are very fragile and elaborate. They are often worth the money saved compared to a regular JobSet delete/recreate cycle when used in large training jobs. Making these solutions less fragile and more streamlined, by developing new hooks and extension points in Kubernetes, will make them easy to apply to smaller jobs as well, benefiting everybody.

Failure modes: device degradation

Not all device failures are terminal for the overall workload or batch job. As the hardware stack gets more and more complex, misconfiguration on one of the hardware stack layers, or driver failures, may result in devices that are functional, but lagging on performance. One device that is lagging behind can slow down the whole training job.

We see reports of such cases more and more often. Kubernetes has no way to express this type of failure today, and since it is the newest failure mode, there is little best practice offered by hardware vendors for detection and little third-party tooling for remediation of these situations.

Typically, these failures are detected based on observed workload characteristics, for example the expected speed of AI/ML training steps on particular hardware. Remediation for these issues depends highly on the workload's needs.

Roadmap

As outlined in the section above, Kubernetes offers a lot of extension points, which are used to implement various DIY solutions. The AI/ML space is developing very fast, with changing requirements and usage patterns. SIG Node is taking a measured approach of enabling more extension points for implementing workload-specific scenarios, rather than introducing new semantics to support specific scenarios. This means prioritizing making information about failures readily available over implementing automatic remediations that might only be suitable for a subset of workloads.

This approach ensures there are no drastic changes for workload handling which may break existing, well-oiled DIY solutions or experiences with the existing more traditional workloads.

Many error handling techniques used today work for AI/ML, but are very expensive. SIG Node will invest in extension points to make those cheaper, with the understanding that the price cutting for AI/ML is critical.

The following is the set of specific investments we envision for various failure modes.

Roadmap for failure modes: K8s infrastructure

The area of Kubernetes infrastructure is the easiest to understand and very important to make right for the upcoming transition from Device Plugins to DRA. SIG Node is tracking many work items in this area, most notably the following:

Basically, every interaction of Kubernetes components must be reliable via either the kubelet improvements or the best practices in plugins development and deployment.

Roadmap for failure modes: device failed

For device failures, some patterns are already emerging in common scenarios that Kubernetes can support. However, the very first step is to make information about failed devices more easily available; that is the work in KEP 4680 (Add Resource Health Status to the Pod Status for Device Plugin and DRA).

Longer-term ideas to be tested include:

Roadmap for failure modes: container code failed

The main improvements for handling container code failures in AI/ML workloads all target cheaper error handling and recovery. The savings mostly come from reusing pre-allocated resources as much as possible: from reusing Pods by restarting containers in-place, to node-local restart of containers instead of rescheduling whenever possible, to snapshotting support, and to rescheduling that prioritizes the same node to save on image pulls.

Consider this scenario: a big training job needs 512 Pods to run, and one of the pods fails. This means all Pods need to be interrupted and synced up to restart the failed step. The most efficient way to achieve this is generally to reuse as many Pods as possible by restarting them in-place, while replacing the failed pod to clear the error from it, as demonstrated in this picture:

The picture shows 512 pods; most of them are green with a recycle sign next to them, indicating that they can be reused, while one Pod is drawn in red with a new green replacement Pod next to it, indicating that it needs to be replaced.

It is possible to implement this scenario, but all solutions implementing it are fragile due to lack of certain extension points in Kubernetes. Adding these extension points to implement this scenario is on the Kubernetes roadmap.

Roadmap for failure modes: device degradation

There is very little done in this area - there is no clear detection signal, very limited troubleshooting tooling, and no built-in semantics to express the "degraded" device on Kubernetes. There has been discussion of adding data on device performance or degradation in the ResourceSlice used by DRA to represent devices, but it is not yet clearly defined. There are also projects like node-healthcheck-operator that can be used for some scenarios.

We expect developments in this area from hardware vendors and cloud providers, and we expect mostly DIY solutions in the near future. As more users are exposed to AI/ML workloads, this is a space that needs feedback on the patterns in use.

Join the conversation

The Kubernetes community encourages feedback and participation in shaping the future of device failure handling. Join SIG Node and contribute to the ongoing discussions!

This blog post provides a high-level overview of the challenges and future directions for device failure management in Kubernetes. By addressing these issues, Kubernetes can solidify its position as the leading platform for AI/ML workloads, ensuring resilience and reliability for applications that depend on specialized hardware.

03 Jul 2025 12:00am GMT