13 Oct 2025

Feed: LXer Linux News

CLUDA Posted For Mesa: Gallium3D API Implemented Atop NVIDIA CUDA Driver API

Well, here is a weekend surprise... Red Hat engineer and Rusticl lead developer Karol Herbst has opened a Mesa merge request for "CLUDA" as a compute-only driver that implements the Gallium3D API atop the NVIDIA CUDA driver API. Wow...

13 Oct 2025 7:09am GMT

KDE Frameworks 6.19 Released with Various Improvements and Bug Fixes

The monthly KDE Frameworks updates continue with KDE Frameworks 6.19, released today by the KDE Project as a companion to the latest KDE Plasma 6.4.5 desktop environment and KDE Gear 25.08.2 software suite.

13 Oct 2025 5:38am GMT

Linux 6.18-rc1 Released With New Tyr & Rocket Drivers, Haptic Touchpads & DM-PCACHE

Linux 6.18-rc1 is now available for testing with the Linux 6.18 merge window closed. Linux 6.18 will be out in December and is anticipated to become this year's Linux LTS kernel version...

13 Oct 2025 4:06am GMT

9to5Linux Weekly Roundup: October 12th, 2025

The 261st installment of the 9to5Linux Weekly Roundup is here for the week ending on October 12th, 2025, keeping you updated with the most important things happening in the Linux world.

13 Oct 2025 2:35am GMT

Linux Kernel 6.16 Reaches End of Life, It’s Time to Upgrade to Linux Kernel 6.17

This is your friendly reminder that, as of today, the Linux 6.16 kernel series has reached the end of its supported life, which means that it's time to start upgrading your installations to Linux kernel 6.17.

13 Oct 2025 1:03am GMT

12 Oct 2025

Feed: LXer Linux News

Linus Torvalds Announces First Linux Kernel 6.18 Release Candidate

Linus Torvalds announced today the general availability for public testing of the first Release Candidate (RC) development milestone of the upcoming Linux 6.18 kernel series.

12 Oct 2025 11:32pm GMT

Feed: Linuxiac

Linuxiac Weekly Wrap-Up: Week 41 (Oct 6 – 12, 2025)

Catch up on the latest Linux news: Ubuntu 25.10, Gnoppix KDE 25.10, GIMP 3.0.6, KDE Gear 25.08.2, OpenSSH 10.2, Python 3.14, ClamAV 1.5, Solus begins a new epoch, Meta unveils OpenZL, and more.

12 Oct 2025 10:28pm GMT

Feed: LXer Linux News

The Only Thing Stopping You from Switching to Linux Is Your Mindset

Switching to Linux is less about technical hurdles and more about adjusting your mindset. Here's why.

12 Oct 2025 10:00pm GMT

Feed: Linuxiac

The Only Thing Stopping You from Switching to Linux Is Your Mindset

Switching to Linux is less about technical hurdles and more about adjusting your mindset. Here's why.

12 Oct 2025 8:50pm GMT

Feed: LXer Linux News

Compact M5Stack Unit C6L Integrates RISC-V ESP32-C6 and SX1262 for LoRa Meshtastic Use

The M5Stack Unit C6L is a compact LoRa module featuring the ESP32-C6 SoC and SX1262 transceiver. It supports 868 to 923 MHz operation for private LoRa networks with Meshtastic compatibility and options for custom development. The ESP32-C6 integrates a dual RISC-V architecture consisting of a high-performance 32-bit core running at 160 MHz and a low-power […]

12 Oct 2025 6:30pm GMT

How to Easily Add a Live Wallpaper on KDE Plasma 6

In this tutorial, I'll show you how to add a live wallpaper on KDE Plasma 6 using a video from the internet. We'll turn a regular video into a dynamic desktop wallpaper, making your Plasma desktop look more lively and interesting.

12 Oct 2025 4:58pm GMT

Protect Yourself Online: A Hands-On Guide to the New Tails 7.0

Heading into risky online territory? Consider running Tails 7.0, which now offers faster startup and a smoother setup for safer browsing.

12 Oct 2025 3:27pm GMT

Imagination PowerVR Mesa Vulkan Driver Enables Unofficial Support For More GPUs

Merged today for the Mesa 25.3 graphics driver is support for more Imagination PowerVR GPUs within the "PVR" Vulkan driver, albeit neither officially supported nor in active development. Your mileage may vary, but for some users with certain GPUs it may work out well enough...

12 Oct 2025 1:55pm GMT

How To Fix Broken Flatpak Issue In Ubuntu 25.10 Questing Quokka

Flatpak is not working in Ubuntu 25.10. Here's a temporary fix for the broken Flatpak issue in Ubuntu 25.10.

12 Oct 2025 12:24pm GMT

KDE Frameworks 6.19 Is Out, Here’s What’s New

KDE Frameworks 6.19 fixes 7z crashes, refines Breeze Icons, and delivers better performance across core libraries.

12 Oct 2025 10:52am GMT

LMDE (Linux Mint Debian Edition) 7 Is Now Available for Download

The long-anticipated LMDE (Linux Mint Debian Edition) 7 release is now available for download based on the Debian GNU/Linux 13 "Trixie" operating system series.

12 Oct 2025 9:21am GMT

11 Oct 2025

Feed: Linuxiac

KDE Frameworks 6.19 Is Out, Here’s What’s New

KDE Frameworks 6.19 fixes 7z crashes, refines Breeze Icons, and delivers better performance across core libraries.

11 Oct 2025 9:18pm GMT

Solus Begins a New Epoch with Polaris Repository and Python 2 Removal

Solus Linux enters the Polaris era, introducing a new stable repository and removing legacy Python 2 components.

11 Oct 2025 3:02pm GMT

TT-RSS Shuts Down, but the Project Lives On Under a New Fork

Tiny Tiny RSS is shutting down, but a new fork will keep the popular open-source RSS reader and news aggregator alive and maintained.

11 Oct 2025 1:06pm GMT

Feed: LXer Linux News

OpenSSH 10.2 Released with Key Bugfix for ControlPersist Issue

OpenSSH 10.2 addresses bugs and prepares to deprecate SHA1 SSHFP records, pushing for stronger, SHA256-based security.

11 Oct 2025 12:38pm GMT

GCC Patches Posted For C++26 SIMD Support

One of the exciting additions on the way for the C++26 programming language is a standardized library around Single Instruction Multiple Data (SIMD) operations. This portable SIMD implementation makes it easier to leverage SIMD and data parallelism in C++ for better performance and to work across SIMD architectures like AVX-512...

11 Oct 2025 11:06am GMT

10 Oct 2025

Feed: Linuxiac

Kdenlive 25.08.2 Released with Stability Fixes and Polished Effects

Kdenlive 25.08.2 open-source video editor brings major stability fixes, improved clip handling, and polished frei0r effects.

10 Oct 2025 7:10pm GMT

Pacsea Is a New TUI That Makes Arch Package Browsing Easier

Pacsea is a new Rust-written terminal TUI for Arch Linux that unifies official and AUR package searches into a single interface.

10 Oct 2025 3:17pm GMT

OpenSSH 10.2 Released with Key Bugfix for ControlPersist Issue

OpenSSH 10.2 addresses bugs and prepares to deprecate SHA1 SSHFP records, pushing for stronger, SHA256-based security.

10 Oct 2025 9:06am GMT

Feed: Linux Today

DuckDB 1.4 LTS Released with Database Encryption, MERGE, and Iceberg Writes

Discover the latest DuckDB 1.4 LTS release featuring enhanced database encryption, MERGE functionality, and Iceberg writes for improved data management.

The post DuckDB 1.4 LTS Released with Database Encryption, MERGE, and Iceberg Writes appeared first on Linux Today.

10 Oct 2025 8:39am GMT

Euphonica – MPD Client with Delusions of Grandeur

Discover Euphonica, the MPD client with delusions of grandeur, and explore its unique feature set.

10 Oct 2025 8:37am GMT

Archinstall 3.0.11 Released with Systemd Service Handling Fixes

Discover the latest Archinstall 3.0.11 release, featuring crucial Systemd service handling fixes. Enhance your installation experience today!

10 Oct 2025 8:34am GMT

Krita 5.2.13 Bugfix Update Brings 16K Page Size Support

Discover the Krita 5.2.13 bugfix update, featuring enhanced 16K page size support. Elevate your digital art experience with improved performance and stability.

10 Oct 2025 8:31am GMT

Microsoft Exchange Support Coming to Thunderbird in October 2025

Discover how Microsoft Exchange support will enhance Thunderbird in October 2025, offering seamless email management and improved productivity for users.

10 Oct 2025 8:29am GMT

From Vienna, with Open Source: XDC 2025

Discover the future of open-source innovation at XDC 2025 in Vienna. Join industry leaders and explore groundbreaking technologies shaping our world.

10 Oct 2025 8:23am GMT

How to Install Python on AlmaLinux 10

Learn how to install Python on AlmaLinux 10 with our step-by-step guide. Simplify your setup and start coding in no time!

10 Oct 2025 8:20am GMT

Minisforum AI X1 Pro: Gerbil – Run Large Language Models Locally

Discover the Minisforum AI X1 Pro: Gerbil, a powerful solution for running large language models locally. Enhance your AI capabilities today!

10 Oct 2025 8:17am GMT

OBS Studio 32.0 Adds PipeWire Video Capture Improvements, Basic Plugin Manager

Discover the latest updates in OBS Studio 32.0, featuring enhanced PipeWire video capture and a new basic plugin manager for improved streaming experiences.

10 Oct 2025 8:11am GMT

Kali Linux 2025.3 Penetration Testing Distro Introduces 10 New Hacking Tools

Discover Kali Linux 2025.3, the latest penetration testing distro featuring 10 new hacking tools. Elevate your cybersecurity skills today!

10 Oct 2025 8:07am GMT

09 Oct 2025

Feed: Linuxiac

Meta Unveils OpenZL: A New Open Source Data Compression Framework

OpenZL is Meta's new open-source compression framework that delivers faster, smarter, and lossless data handling.

09 Oct 2025 6:01pm GMT

KDE Gear 25.08.2 Apps Collection Rolls Out, Here’s What’s New

KDE Gear 25.08.2 bugfix update improves stability across NeoChat, Tokodon, Itinerary, and other core KDE apps.

09 Oct 2025 3:47pm GMT

01 Oct 2025

Feed: Kernel Planet

Greg Kroah-Hartman: The only benchmark that matters is...

…the one that emulates your real workload. And for me (and probably many of you reading this), that would be "build a kernel as fast as possible." And for that, I recommend the simple kcbench.

I mentioned kcbench a few years ago, when writing about a new workstation that Level One Techs set up for me, and I've been using that as my primary workstation ever since (just over 5 years!).

01 Oct 2025 12:00am GMT

24 Sep 2025

Feed: Kernel Planet

Matthew Garrett: Investigating a forged PDF

I had to rent a house for a couple of months recently, which is long enough in California that it pushes you into proper tenant protection law. As landlords tend to do, they failed to return my security deposit within the 21 days required by law, having already failed to provide the required notification that I was entitled to an inspection before moving out. Cue some tedious argumentation with the letting agency, and eventually me threatening to take them to small claims court.

This post is not about that.

Now, under Californian law, the onus is on the landlord to hold and return the security deposit - the agency has no role in this. The only reason I was talking to them is that my lease didn't mention the name or address of the landlord (another legal violation, but the outcome is just that you get to serve the landlord via the agency). So it was a bit surprising when I received an email from the owner of the agency informing me that they did not hold the deposit and so were not liable - I already knew this.

The odd bit about this, though, is that they sent me another copy of the contract, asserting that it made it clear that the landlord held the deposit. I read it, and instead found a clause reading "SECURITY: The security deposit will secure the performance of Tenant's obligations. IER may, but will not be obligated to, apply all portions of said deposit on account of Tenant's obligations. Any balance remaining upon termination will be returned to Tenant. Tenant will not have the right to apply the security deposit in payment of the last month's rent. Security deposit held at IER Trust Account.", where IER is International Executive Rentals, the agency in question. Why send me a contract that says you hold the money while you're telling me you don't? And then I read further down and found this:
[Image: text reading "ENTIRE AGREEMENT: The foregoing constitutes the entire agreement between the parties and may be modified only in writing signed by all parties. This agreement and any modifications, including any photocopy or facsimile, may be signed in one or more counterparts, each of which will be deemed an original and all of which taken together will constitute one and the same instrument. The following exhibits, if checked, have been made a part of this Agreement before the parties' execution: ☒ Exhibit 1: Lead-Based Paint Disclosure (Required by Law for Rental Property Built Prior to 1978) ☒ Addendum 1: The security deposit will be held by (name removed) and applied, refunded, or forfeited in accordance with the terms of this lease agreement."]
Ok, fair enough, there's an addendum that says the landlord has it (I've removed the landlord's name, it's present in the original).

Except. I had no recollection of that addendum. I went back to the copy of the contract I had and discovered:
[Image: the same text as the previous picture, but Addendum 1 is empty]
Huh! But obviously I could just have edited that to remove it (there's no obvious reason for me to, but whatever), and then it'd be my word against theirs. However, I'd been sent the document via RightSignature, an online document signing platform, and they'd added a certification page that looked like this:
[Image: a Signature Certificate, containing a bunch of data about the document including a checksum of the original]
Interestingly, the certificate page was identical in both documents, including the checksums, despite the content being different. So, how do I show which one is legitimate? You'd think given this certificate page this would be trivial, but RightSignature provides no documented mechanism whatsoever for anyone to verify any of the fields in the certificate, which is annoying but let's see what we can do anyway.

First up, let's look at the PDF metadata. pdftk has a dump_data command that dumps the metadata in the document, including the creation date and the modification date. My file had both set to identical timestamps in June, both listed in UTC, corresponding to the time I'd signed the document. The file containing the addendum? The same creation time, but a modification time of this Monday, shortly before it was sent to me. This time, the modification timestamp was in Pacific Daylight Time, the timezone currently observed in California. In addition, the data included two ID fields, ID0 and ID1. In my document both were identical, in the one with the addendum ID0 matched mine but ID1 was different.

These ID tags are intended to be some form of representation (such as a hash) of the document. ID0 is set when the document is created and should not be modified afterwards; ID1 is initially identical to ID0, but changes when the document is modified. This is intended to allow tooling to identify whether two documents are modified versions of the same document. The identical ID0 indicated that the document with the addendum was originally identical to mine, and the different ID1 showed that it had subsequently been modified.
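
The ID pair is easy to inspect yourself. Here's a rough sketch (assuming the /ID array appears uncompressed as hex strings in the trailer, which is common but not guaranteed, and using fabricated trailers rather than real documents):

```python
import re

def pdf_ids(raw: bytes):
    """Pull the /ID pair out of a PDF trailer.

    Assumes the array appears uncompressed as two hex strings,
    which holds for many PDFs but is not guaranteed.
    """
    m = re.search(rb'/ID\s*\[\s*<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*\]', raw)
    return (m.group(1).decode(), m.group(2).decode()) if m else None

# Fabricated trailers for illustration, not real documents:
original = b'trailer << /Root 1 0 R /ID [<AABB01> <AABB01>] >>'
modified = b'trailer << /Root 1 0 R /ID [<AABB01> <CCDD02>] >>'

id0_a, id1_a = pdf_ids(original)
id0_b, id1_b = pdf_ids(modified)
assert id0_a == id0_b  # same original document...
assert id1_a != id1_b  # ...but the second copy has been modified since
```

In practice you'd read `raw` from the tail of the file, since the trailer lives at the end.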

Well, ok, that seems like a pretty strong demonstration. I had the "I have a very particular set of skills" conversation with the agency and pointed these facts out, that they were an extremely strong indication that my copy was authentic and their one wasn't, and they responded that the document was "re-sealed" every time it was downloaded from RightSignature and that would explain the modifications. This doesn't seem plausible, but it's an argument. Let's go further.

My next move was pdfalyzer, which allows you to pull a PDF apart into its component pieces. This revealed that the documents were identical, other than page 3, the one with the addendum. This page included tags entitled "touchUp_TextEdit", evidence that the page had been modified using Acrobat. But in itself, that doesn't prove anything - obviously it had been edited at some point to insert the landlord's name, it doesn't prove whether it happened before or after the signing.

But in the process of editing, Acrobat appeared to have renamed all the font references on that page into a different format. Every other page had a consistent naming scheme for the fonts, and they matched the scheme in the page 3 I had. Again, that doesn't tell us whether the renaming happened before or after the signing. Or does it?

You see, when I completed my signing, RightSignature inserted my name into the document, and did so using a font that wasn't otherwise present in the document (Courier, in this case). That font was named identically throughout the document, except on page 3, where it was named in the same manner as every other font that Acrobat had renamed. Given the font wasn't present in the document until after I'd signed it, this is proof that the page was edited after signing.

But eh this is all very convoluted. Surely there's an easier way? Thankfully yes, although I hate it. RightSignature had sent me a link to view my signed copy of the document. When I went there it presented it to me as the original PDF with my signature overlaid on top. Hitting F12 gave me the network tab, and I could see a reference to a base.pdf. Downloading that gave me the original PDF, pre-signature. Running sha256sum on it gave me an identical hash to the "Original checksum" field. Needless to say, it did not contain the addendum.
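
Checking a downloaded base.pdf against the certificate's "Original checksum" field is just a hash comparison; a minimal sketch (the field name aside, nothing here is RightSignature-specific):

```python
import hashlib

def matches_certificate(pdf_bytes: bytes, certificate_checksum: str) -> bool:
    """Equivalent of running sha256sum on base.pdf and comparing the
    result with the checksum printed on the signature certificate page."""
    return hashlib.sha256(pdf_bytes).hexdigest() == certificate_checksum.lower()

data = b'%PDF-1.7 original bytes'
cert = hashlib.sha256(data).hexdigest()
assert matches_certificate(data, cert)
assert not matches_certificate(data + b' plus an addendum', cert)
```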

Why do this? The only explanation I can come up with (and I am obviously guessing here, I may be incorrect!) is that International Executive Rentals realised that they'd sent me a contract which could mean that they were liable for the return of my deposit, even though they'd already given it to my landlord, and after realising this added the addendum, sent it to me, and assumed that I just wouldn't notice (or that, if I did, I wouldn't be able to prove anything). In the process they went from an extremely unlikely possibility of having civil liability for a few thousand dollars (even if they were holding the deposit it's still the landlord's legal duty to return it, as far as I can tell) to doing something that looks extremely like forgery.

There's a hilarious followup. After this happened, the agency offered to do a screenshare with me showing them logging into RightSignature and showing the signed file with the addendum, and then proceeded to do so. One minor problem - the "Send for signature" button was still there, just below a field saying "Uploaded: 09/22/25". I asked them to search for my name, and it popped up two hits - one marked draft, one marked completed. The one marked completed? Didn't contain the addendum.

24 Sep 2025 10:46pm GMT

19 Sep 2025

Feed: Kernel Planet

Linux Plumbers Conference: In Person Registration is sold out

Apparently there was quite a bit more demand than we anticipated. We are running a waitlist which you can get on by filling in this form:

https://forms.gle/tYjjbyn66q5SQMLPA

The venue is smaller this year but we do have a block of reserved passes for MC content so we'll allocate places to the waitlist after it's decided how many of them get used. Note that in order to be fair to everyone, if you sign up for the waitlist you'll have 7 days to register otherwise your pass will go to the next person.

19 Sep 2025 1:22pm GMT

15 Sep 2025

Feed: Kernel Planet

Linux Plumbers Conference: Registration for LPC 2025 is now open!

We're happy to announce that registration for LPC 2025 is now open. To register please go to our attend page.

To try to prevent the instant sellout, we are keeping our cancellation policy of no refunds, only transfers of registrations. You will find more details during the registration process. LPC 2025 follows the Linux Foundation's health & safety policy.

As usual we expect to sell out rather quickly, so don't delay your registration for too long!

15 Sep 2025 8:16pm GMT

Dave Airlie (blogspot): radv takes over from AMDVLK


AMD have announced the end of the AMDVLK open driver in favour of focusing on radv for Linux use cases.

When Bas and I started radv in 2016, AMD were promising their own Linux Vulkan driver, which arrived in December 2017. By that point radv was already shipping in most Linux distros. AMD's strategy for AMDVLK, over-the-wall open source releases from internal closed development, was always going to leave it a second-place option.

When Valve came on board and brought dedicated developer power to radv, and the aco compiler matured, there really was no point putting effort into AMDVLK, which was hard to package and impossible for external developers to contribute to meaningfully.

radv is probably my proudest contribution to the Linux ecosystem, finally disproving years of idiots saying an open source driver could never compete with a vendor provided driver, now it is the vendor provided driver.

I think we will miss the open source PAL repo as a reference source, and I hope AMD engineers can bridge that gap, but it's often hard to ask about workarounds you don't know exist. I'm also hoping AMD will add more staffing beyond the current levels, especially around hardware enablement and workarounds.

Now onwards to NVK victory :-)

[1] https://github.com/GPUOpen-Drivers/AMDVLK/discussions/416

15 Sep 2025 7:08pm GMT

08 Sep 2025

Feed: Kernel Planet

Linux Plumbers Conference: The Call for Proposals is nearing its end!

The CfPs for the Linux Plumbers events are coming to an end. If you still want to submit, please get your submission in by the deadline. The deadlines are:

Each of the Microconferences has their own last day to submit. Those are listed in the Accepted Microconferences tab on the website.

All submissions may be added in the Call for Proposals tab. Click the Submit new abstract button at the bottom of that page, and make sure you select the proper Track.

08 Sep 2025 4:14pm GMT

06 Sep 2025

Feed: Kernel Planet

Matthew Garrett: Locally hosting an internet-connected server

I'm lucky enough to have a weird niche ISP available to me, so I'm paying $35 a month for around 600MBit symmetric data. Unfortunately they don't offer static IP addresses to residential customers, and nor do they allow multiple IP addresses per connection, and I'm the sort of person who'd like to run a bunch of stuff myself, so I've been looking for ways to manage this.

What I've ended up doing is renting a cheap VPS from a vendor that lets me add multiple IP addresses for minimal extra cost. The precise nature of the VPS isn't relevant - you just want a machine (it doesn't need much CPU, RAM, or storage) that has multiple world routeable IPv4 addresses associated with it and has no port blocks on incoming traffic. Ideally it's geographically local and peers with your ISP in order to reduce additional latency, but that's a nice to have rather than a requirement.

By setting that up you now have multiple real-world IP addresses that people can get to. How do we get them to the machine in your house you want to be accessible? First we need a connection between that machine and your VPS, and the easiest approach here is Wireguard. We only need a point-to-point link, nothing routable, and none of the IP addresses involved need to have anything to do with any of the rest of your network. So, on your local machine you want something like:

[Interface]
PrivateKey = privkeyhere
ListenPort = 51820
Address = localaddr/32

[Peer]
Endpoint = VPS:51820
PublicKey = pubkeyhere
AllowedIPs = VPS/32


And on your VPS, something like:

[Interface]
Address = vpswgaddr/32
SaveConfig = true
ListenPort = 51820
PrivateKey = privkeyhere

[Peer]
PublicKey = pubkeyhere
AllowedIPs = localaddr/32


The addresses here are (other than the VPS address) arbitrary - but they do need to be consistent, otherwise Wireguard is going to be unhappy and your packets will not have a fun time. Bring that interface up with wg-quick and make sure the devices can ping each other. Hurrah! That's the easy bit.

Now you want packets from the outside world to get to your internal machine. Let's say the external IP address you're going to use for that machine is 321.985.520.309 and the wireguard address of your local system is 867.420.696.005. On the VPS, you're going to want to do:

iptables -t nat -A PREROUTING -p tcp -d 321.985.520.309 -j DNAT --to-destination 867.420.696.005
iptables -t nat -A PREROUTING -p udp -d 321.985.520.309 -j DNAT --to-destination 867.420.696.005

Now, all incoming packets for 321.985.520.309 will be rewritten to head towards 867.420.696.005 instead (make sure you've set net.ipv4.ip_forward to 1 via sysctl!). Victory! Or is it? Well, no.

What we're doing here is rewriting the destination address of the packets so instead of heading to an address associated with the VPS, they're now going to head to your internal system over the Wireguard link. Which is then going to ignore them, because the AllowedIPs statement in the config only allows packets coming from your VPS, and these packets still have their original source IP. We could rewrite the source IP to match the VPS IP, but then you'd have no idea where any of these packets were coming from, and that sucks. Let's do something better. On the local machine, in the peer, let's update AllowedIPs to 0.0.0.0/0 to permit packets from any source to appear over our Wireguard link. But if we bring the interface up now, it'll try to route all traffic over the Wireguard link, which isn't what we want. So we'll add table = off to the interface stanza of the config to disable that, and now we can bring the interface up without breaking everything but still allowing packets to reach us. However, we do still need to tell the kernel how to reach the remote VPN endpoint, which we can do with ip route add vpswgaddr dev wg0. Add this to the interface stanza as:

PostUp = ip route add vpswgaddr dev wg0
PreDown = ip route del vpswgaddr dev wg0


That's half the battle. The problem is that they're going to show up there with the source address still set to the original source IP, and your internal system is (because Linux) going to notice it has the ability to just send replies to the outside world via your ISP rather than via Wireguard and nothing is going to work. Thanks, Linux. Thinux.

But there's a way to solve this - policy routing. Linux allows you to have multiple separate routing tables, and define policy that controls which routing table will be used for a given packet. First, let's define a new table reference. On the local machine, edit /etc/iproute2/rt_tables and add a new entry that's something like:

1 wireguard


where "1" is just a standin for a number not otherwise used there. Now edit your wireguard config and replace table=off with table=wireguard - Wireguard will now update the wireguard routing table rather than the global one. Now all we need to do is to tell the kernel to push packets into the appropriate routing table - we can do that with ip rule add from localaddr lookup wireguard, which tells the kernel to take any packet coming from our Wireguard address and push it via the Wireguard routing table. Add that to your Wireguard interface config as:

PostUp = ip rule add from localaddr lookup wireguard
PreDown = ip rule del from localaddr lookup wireguard

and now your local system is effectively on the internet.

You can do this for multiple systems - just configure additional Wireguard interfaces on the VPS and make sure they're all listening on different ports. If your local IP changes then your local machines will end up reconnecting to the VPS, but to the outside world their accessible IP address will remain the same. It's like having a real IP without the pain of convincing your ISP to give it to you.

06 Sep 2025 3:20pm GMT

05 Aug 2025

Feed: Kernel Planet

Matthew Garrett: Cordoomceps - replacing an Amiga's brain with Doom

There's a lovely device called a pistorm, an adapter board that glues a Raspberry Pi GPIO bus to a Motorola 68000 bus. The intended use case is that you plug it into a 68000 device and then run an emulator that reads instructions from hardware (ROM or RAM) and emulates them. You're still limited by the ~7MHz bus that the hardware is running at, but you can run the instructions as fast as you want.

These days you're supposed to run a custom built OS on the Pi that just does 68000 emulation, but initially it ran Linux on the Pi and a userland 68000 emulator process. And, well, that got me thinking. The emulator takes 68000 instructions, emulates them, and then talks to the hardware to implement the effects of those instructions. What if we, well, just don't? What if we just run all of our code in Linux on an ARM core and then talk to the Amiga hardware?

We're going to ignore x86 here, because it's weird - but most hardware that wants software to be able to communicate with it maps itself into the same address space that RAM is in. You can write to a byte of RAM, or you can write to a piece of hardware that's effectively pretending to be RAM[1]. The Amiga wasn't unusual in this respect in the 80s, and to talk to the graphics hardware you speak to a special address range that gets sent to that hardware instead of to RAM. The CPU knows nothing about this. It just indicates it wants to write to an address, and then sends the data.

So, if we are the CPU, we can just indicate that we want to write to an address, and provide the data. And those addresses can correspond to the hardware. So, we can write to the RAM that belongs to the Amiga, and we can write to the hardware that isn't RAM but pretends to be. And that means we can run whatever we want on the Pi and then access Amiga hardware.
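
The scheme above can be modelled in a few lines. This is a toy illustration of memory-mapped I/O, not pistorm code; the register window mimics the Amiga custom-chip range at 0xDFF000 (0xDFF180 really is COLOR00 there), but everything else is made up:

```python
class ToyBus:
    """Toy model of a memory-mapped bus: the 'CPU' just writes to
    addresses, and the bus decides whether that's RAM or hardware."""
    HW_BASE, HW_TOP = 0xDFF000, 0xDFF200  # custom-chip style register window

    def __init__(self):
        self.ram = {}
        self.hw_writes = []  # stand-in for hardware side effects

    def write(self, addr, value):
        if self.HW_BASE <= addr < self.HW_TOP:
            self.hw_writes.append((addr, value))  # hardware pretending to be RAM
        else:
            self.ram[addr] = value

bus = ToyBus()
bus.write(0x1000, 0x42)      # ordinary RAM write
bus.write(0xDFF180, 0x0F00)  # write red to a COLOR00-style register
assert bus.ram[0x1000] == 0x42
assert bus.hw_writes == [(0xDFF180, 0x0F00)]
```

The point is that the writer never knows which side it hit; the address decode does all the work, which is exactly why an ARM core can stand in for the 68000.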

And, obviously, the thing we want to run is Doom, because that's what everyone runs in fucked up hardware situations.

Doom was Amiga kryptonite. Its entire graphical model was based on memory directly representing the contents of your display, and being able to modify that by just moving pixels around. This worked because at the time VGA displays supported having a memory layout where each pixel on your screen was represented by a byte in memory containing an 8 bit value that corresponded to a lookup table containing the RGB value for that pixel.

The Amiga was, well, not good at this. Back in the 80s, when the Amiga hardware was developed, memory was expensive. Dedicating that much RAM to the video hardware was unthinkable - the Amiga 1000 initially shipped with only 256K of RAM, and you could fill all of that with a sufficiently colourful picture. So instead of having the idea of each pixel being associated with a specific area of memory, the Amiga used bitmaps. A bitmap is an area of memory that represents the screen, but only represents one bit of the colour depth. If you have a black and white display, you only need one bitmap. If you want to display four colours, you need two. More colours, more bitmaps. And each bitmap is stored in an independent area of RAM. You never use more memory than you need to display the number of colours you want to.

But that means that each bitplane contains packed information - every byte of data in a bitplane contains the bit value for 8 different pixels, because each bitplane contains one bit of information per pixel. To update one pixel on screen, you need to read from every bitplane, update one bit, and write it back, and that's a lot of additional memory accesses. Doom on the Amiga was slow not just because the CPU was slow, but because there was a lot of manipulation of data to turn it into the format the Amiga wanted, and then pushing that over a fairly slow memory bus to have it displayed.

The CDTV was an aesthetically pleasing piece of hardware that absolutely sucked. It was an Amiga 500 in a hi-fi box with a caddy-loading CD drive, and it ran software that was just awful. There's no path to remediation here. No compelling apps were ever released. It's a terrible device. I love it. I bought one in 1996 because a local computer store had one and I pointed out that the company selling it had gone bankrupt some years earlier and literally nobody in my farming town was ever going to have any interest in buying a CD player that made a whirring noise when you turned it on because it had a fan and eventually they just sold it to me for not much money, and ever since then I wanted to have a CD player that ran Linux and well spoiler 30 years later I'm nearly there. That CDTV is going to be our test subject. We're going to try to get Doom running on it without executing any 68000 instructions.

We're facing two main problems here. The first is that all Amigas have a firmware ROM called Kickstart that runs at powerup. No matter how little you care about using any OS functionality, you can't start running your code until Kickstart has run. This means even documentation describing bare metal Amiga programming assumes that the hardware is already in the state that Kickstart left it in. This will become important later. The second is that we're going to need to actually write the code to use the Amiga hardware.

First, let's talk about Amiga graphics. We've already covered bitplanes, but for anyone used to modern hardware that's not the weirdest thing about what we're dealing with here. The CDTV's chipset supports a maximum of 64 colours in a mode called "Extra Half-Brite", or EHB, where you have 32 colours arbitrarily chosen from a palette and then 32 more colours that are identical but with half the intensity. For 64 colours we need 6 bitplanes, each of which can be located arbitrarily in the region of RAM accessible to the chipset ("chip RAM", distinguished from "fast RAM" that's only accessible to the CPU). We tell the chipset where our bitplanes are and it displays them. Or, well, it does for a frame - after that the registers that pointed at our bitplanes no longer do, because when the hardware was DMAing through the bitplanes to display them it was incrementing those registers to point at the next address to DMA from. Which means that every frame we need to set those registers back.

Making sure you have code that's called every frame just to make your graphics work sounds intensely irritating, so Commodore gave us a way to avoid doing that. The chipset includes a coprocessor called "copper". Copper doesn't have a large set of features - in fact, it only has three. The first is that it can program chipset registers. The second is that it can wait for a specific point in screen scanout. The third (which we don't care about here) is that it can optionally skip an instruction if a certain point in screen scanout has already been reached. We can write a program (a "copper list") for the copper that tells it to program the chipset registers with the locations of our bitplanes and then wait until the end of the frame, at which point it will repeat the process. Now our bitplane pointers are always valid at the start of a frame.
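I don't know what the copper list in this project actually looks like, but the instruction encoding is documented chipset behaviour: a MOVE is a pair of 16-bit words (a chipset register offset with bit 0 clear, then the value), and a WAIT for the impossible beam position 0xFFFF is the conventional end-of-list marker that parks the copper until vertical blank restarts it. A rough sketch of building such a list for six bitplane pointers (the chip RAM addresses are invented for illustration):

```python
# Sketch of a copper list that reloads six bitplane pointers each frame.
# BPL1PTH is chipset register offset 0x0E0, BPL1PTL is 0x0E2, and each
# subsequent plane's pair is 4 bytes further on.

def copper_move(reg_offset, value):
    """A copper MOVE: first word is the register offset (bit 0 must be
    clear), second word is the 16-bit value to write into it."""
    assert reg_offset & 1 == 0
    return [reg_offset & 0x1FE, value & 0xFFFF]

def copper_wait_end():
    """WAIT for an impossible beam position - the conventional end-of-list
    marker; the copper sits here until vertical blank restarts it."""
    return [0xFFFF, 0xFFFE]

def build_copper_list(bitplane_addrs):
    words = []
    for i, addr in enumerate(bitplane_addrs):
        words += copper_move(0x0E0 + 4 * i, addr >> 16)     # BPLxPTH
        words += copper_move(0x0E2 + 4 * i, addr & 0xFFFF)  # BPLxPTL
    words += copper_wait_end()
    return words

# Six bitplanes for EHB, at arbitrary (made-up) chip RAM addresses.
planes = [0x10000 + i * 0x2800 for i in range(6)]
clist = build_copper_list(planes)
```

Each frame the copper walks this list, rewrites the six pointer pairs, then waits; at vertical blank it starts from the top again, so the pointers are always valid when scanout begins.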

Ok! We know how to display stuff. Now we just need to deal with not having 256 colours, and the whole "Doom expects pixels" thing. For the first of these, I stole code from ADoom, the only Amiga Doom port I could easily find source for. This looks at the 256 colour palette loaded by Doom and calculates the closest approximation it can within the constraints of EHB. ADoom also includes a bunch of CPU-specific assembly optimisation for converting the "chunky" Doom graphics buffer into the "planar" Amiga bitplanes, none of which I used because (a) it's all for 68000 series CPUs and we're running on ARM, and (b) I have a quad core CPU running at 1.4GHz and I'm going to be pushing all the graphics over a 7.14MHz bus - the graphics mode conversion is not going to be the bottleneck here. Instead I just wrote a series of nested for loops that iterate through each pixel and update each bitplane and called it a day. The set of bitplanes I'm operating on here is allocated on the Linux side so I can read and write to them without being restricted by the speed of the Amiga bus (remember, each byte in each bitplane is going to be updated 8 times per frame, because it holds bits associated with 8 pixels), and then copied over to the Amiga's RAM once the frame is complete.
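The nested-loop conversion is easy to sketch. This isn't the project's actual code - the resolution, buffer layout, and names are invented for illustration - but it shows the shape of the work: each Doom pixel is a 6-bit palette index (EHB gives 64 colours), and bit n of that index has to land in bitplane n:

```python
# Chunky-to-planar conversion: one byte-per-pixel buffer in, six
# one-bit-per-pixel buffers out.

WIDTH, HEIGHT, PLANES = 320, 200, 6

def chunky_to_planar(chunky):
    """chunky: list of WIDTH*HEIGHT palette indices (0-63).
    Returns PLANES bytearrays, each holding one bit per pixel."""
    planes = [bytearray(WIDTH * HEIGHT // 8) for _ in range(PLANES)]
    for i, pixel in enumerate(chunky):
        byte, bit = divmod(i, 8)
        mask = 0x80 >> bit          # leftmost pixel is the high bit
        for p in range(PLANES):
            if pixel & (1 << p):
                planes[p][byte] |= mask
    return planes

# Example: set pixel 0 to colour 0b101001 (41). Bits 0, 3 and 5 are set,
# so bit 7 of byte 0 should end up set in planes 0, 3 and 5 only.
planes = chunky_to_planar([41] + [0] * (WIDTH * HEIGHT - 1))
```

This is exactly why every output byte gets touched eight times per frame, and why doing it in fast Linux-side memory before one bulk copy to chip RAM makes sense.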

And, kind of astonishingly, this works! Once I'd figured out where I was going wrong with RGB ordering and which order the bitplanes go in, I had a recognisable copy of Doom running. Unfortunately there were weird graphical glitches - sometimes blocks would be entirely the wrong colour. It took me a while to figure out what was going on and then I felt stupid. Recording the screen and watching in slow motion revealed that the glitches often showed parts of two frames displaying at once. The Amiga hardware is taking responsibility for scanning out the frames, and the code on the Linux side isn't synchronised with it at all. That means I could update the bitplanes while the Amiga was scanning them out, resulting in a mashup of planes from two different Doom frames being used as one Amiga frame. One approach to avoid this would be to tie the Doom event loop to the Amiga, blocking my writes until the end of scanout. The other is to use double-buffering - have two sets of bitplanes, one being displayed and the other being written to. This consumes more RAM but since I'm not using the Amiga RAM for anything else that's not a problem. With this approach I have two copper lists, one for each set of bitplanes, and switch between them on each frame. This improved things a lot but not entirely, and there are still glitches when the palette is being updated (because there's only one set of colour registers), something Doom does rather a lot, so I'm going to need to implement proper synchronisation.

Except. This was only working if I ran a 68K emulator first in order to run Kickstart. If I tried accessing the hardware without doing that, things were in a weird state. I could update the colour registers, but accessing RAM didn't work - I could read stuff out, but anything I wrote vanished. Some more digging cleared that up. When you turn on a CPU it needs to start executing code from somewhere. On modern x86 systems it starts from a hardcoded address of 0xFFFFFFF0, which was traditionally a long way from any RAM. The 68000 family instead reads its start address from address 0x00000004, which overlaps with where the Amiga chip RAM is. We can't write anything to RAM until we're executing code, and we can't execute code until we tell the CPU where the code is, which seems like a problem. This is solved on the Amiga by powering up in a state where the Kickstart ROM is "overlayed" onto address 0. The CPU reads the start address from the ROM, which causes it to jump into the ROM and start executing code there. Early on, the code tells the hardware to stop overlaying the ROM onto the low addresses, and now the RAM is available. This is poorly documented because it's not something you need to care about if you execute Kickstart, which every actual Amiga does - I'm only in this position because I've made poor life choices - but ok, that explained things. To turn off the overlay you write to a register in one of the Complex Interface Adaptor (CIA) chips, and things start working like you'd expect.

Except, they don't. Writing to that register did nothing for me. I assumed that there was some other register I needed to write to first, and went to the extent of tracing every register access that occurred when running the emulator and replaying those in my code. Nope, still broken. What I finally discovered is that you need to pulse the reset line on the board before some of the hardware starts working - powering it up doesn't put you in a well defined state, but resetting it does.

So, I now have a slightly graphically glitchy copy of Doom running without any sound, displaying on an Amiga whose brain has been replaced with a parasitic Linux. Further updates will likely make things even worse. Code is, of course, available.

[1] This is why we had trouble with late era 32 bit systems and 4GB of RAM - a bunch of your hardware wanted to be in the same address space and so you couldn't put RAM there so you ended up with less than 4GB of RAM


05 Aug 2025 3:43am GMT

03 Aug 2025

feedKernel Planet

Brendan Gregg: When to Hire a Computer Performance Engineering Team (2025) part 1 of 2

As a leader in computer performance I've been asked by companies about how (and why) to form a performance engineering team, and as this is broadly useful I'll share my advice here.

Large tech companies in the US hire performance engineers (under that or other titles) to ensure that infrastructure costs and service latency don't grow too high, and that their service is reliable under peak load. A new performance team can likely find enough optimizations to halve infrastructure spend in their first couple of years, even for companies that have been using commercial performance or observability tools. Performance engineers do much more than those tools, working with development teams and vendors to build, test, debug, tune, and adopt new performance solutions, and to find deep optimizations that those tools can miss.

I previously worked on the performance engineering team for Netflix, a large tech consumer running on hundreds of thousands of AWS instances. I'm now doing similar work at Intel (a large tech vendor) for Intel and their customers. As a leader in this space I've also interacted with other performance teams and staff doing performance work at many companies. In this post I'll explain what these teams do and when you should consider forming one. In part 2 I'll provide sample job descriptions, specialties, advice, pitfalls, comments on AI, and what to do if you can't hire a performance team.

It's easy for hardware vendors like Intel to justify hiring performance engineers, as the number one factor in sales is beating a competitor's performance. However, my focus in this post is on non-vendor tech-heavy companies who hire these staff to drive down costs and latency outliers (e.g., banks, telecoms, defense, AI, tech-based companies, and anyone else who is spending more than $1M/year on back-end compute and AI).

What is the ROI of performance engineering?

The main ROIs are infrastructure cost savings, latency reductions, improved scalability and reliability, and faster engineering. The cost savings alone can justify a performance team and can help calculate its size, and I'll explore that in depth, but the other ROIs are worth considering and may be more important to a company depending on its stage of growth.

Infrastructure Cost Savings and Margin Improvements

An appropriately-sized performance team should be targeting 5-10% cost savings per year through tuning and product adoptions. (I'll explain appropriate sizes in the When To Hire section.) For many large companies a 5% result would be considered "good" and a 10% would be "great." Achieving this in practice can mean finding large wins (15-80%) on parts of the infrastructure, which become 5-10% overall. Wins are cumulative, so a team hitting 5% savings each year will multiply to become 28% after their 5th year (like compound interest). Even a modest 2% per year will become significant over time. While these compounded numbers can become large, a team needs to continue finding new cost savings each year to justify long-term retention, and should always be focused on the next 5-10%.
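To make the compounding explicit (my arithmetic, matching the figures above): treating each year's saving like compound interest gives (1 + r/100)^n - 1 after n years.

```python
# Yearly wins treated like compound interest: n years of r% yearly
# savings multiply out to (1 + r/100)**n - 1.

def compounded_saving(yearly_pct, years):
    return ((1 + yearly_pct / 100) ** years - 1) * 100

five = compounded_saving(5, 5)    # ~27.6%, i.e. roughly 28%
ten = compounded_saving(10, 5)    # ~61.1%
two = compounded_saving(2, 5)     # ~10.4% - even a modest 2% adds up
```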

Companies may invest in this work for more than just the cost savings: It can be about developing a competitive advantage in their area by providing a better cost/performance ratio, especially for companies with similar tech-based services that pass costs on to customers.

For sites that haven't employed performance engineers before, there can be enough low-hanging fruit that the team can halve infrastructure costs in their first couple of years (50%). It all depends on the number of staff, their level of expertise, how much perf work other staff are already doing (senior developers, SREs), how much custom code is running, and how complex and volatile the stack is.

It would be great if we could publicly share specific results, which would look something like this:

"This year we helped reduce our company's infrastructure spend by 5%, from $60M/year to $57M/year, however, since our user base also grew by 3%, we actually reduced cost-per-user by 8%, saving $5M/year in this and all future years."

However, these numbers are usually considered financially sensitive as they can reveal company growth, financial health, confidential infrastructure discounts, etc. As a performance engineer I can talk publicly about percent wins on a back-end service, but I usually can't map it to dollar signs. That doesn't help other companies to understand the value of performance engineering. It's not so much of a problem in Silicon Valley, since staff change companies all the time and word spreads about the latest practices in tech. But in faraway countries performance engineering doesn't really exist yet, even though there are companies with sufficiently large infrastructure spend.

Continuing the above example, a typical 8% win could be composed of a mix of direct tuning wins, developer/SRE enablement, and vendor adoptions.

With developer/SRE enablement and vendor adoptions, the performance team isn't finding the wins directly but is enabling other teams and vendors to do so. For example, when I worked at Netflix we built and maintained the flame graph "self-service" application, which developers used daily to find wins, and we worked on multiple product adoptions every year. This all needs to be considered as part of the performance team's ROI.

Latency Reductions

Reducing the response time or latency of a service is a large part of performance engineering. This involves analyzing average latency, 99th percentile latency, and outlier latency; ensuring latency SLA/SLOs are met; and ensuring acceptable latency during perturbations or peak usage.

Many of the cost optimizations described earlier will also reduce average latency, but latency variance or outliers can remain. For example, a once-every-5-minute system task may have negligible cost and CPU footprint, but it may briefly perturb the application and cause latency outliers. These are debugged differently, often using monitoring, logs, distributed tracing, system-level tracing, packet logs, and custom ad-hoc tools. Sometimes the high latency is caused by the request type itself (the system is fine, but the end-user has requested a slow thing) or is an expected consequence of load (queueing theory, tail latency). Other times it can be from complex interactions across multiple layers of the software stack or from interactions across multiple network endpoints.

As a related aside: One performance anti-pattern is when a company, to debug one performance problem, installs a monitoring tool that periodically does work and causes application latency outliers. Now the company has two problems. Tip: try turning off all monitoring agents and see if the problem goes away.

While latency is the main consideration to improve end user experience, others include throughput and parallelism.

Improved Scalability and Reliability

Systems under load can respond with exponential latency or a cascading failure, causing disruptions or a service outage. Performance engineers can test resource scalability with custom load generators and benchmarks, and use analysis tools to study all parts of the system to find and solve bottlenecks. A performance engineer will not just measure scalability limits, but should also explain what the limiting factors are and how to address them to scale further.

A stable and performant service will also earn trust in your company, and can help you grow customers more quickly. It may be a requirement for satisfying enterprise SLA/SLOs.

I'll share a scalability story from my time at Sun Microsystems (a vendor). My goal was to achieve the number one throughput in the industry for a storage system, which would require exceeding 1M IOPS. The expected bottleneck was the rotational disks. I developed my own load generators and analysis tools and concluded that the real bottleneck was, surprisingly, the CPU interconnect. The interconnect was AMD HyperTransport 1, so AMD sent me a new systemboard with HT3 and faster CPUs. I installed it and…performance was identical. I was upset with myself for getting it wrong, until I discovered that AMD had sent me a HT1 board by mistake. They then sent me a real HT3 board and the performance increased by up to 75%! The CPU interconnect (when present) is just one of many components that companies typically don't check, and commercial observability tools don't check either.

Faster Engineering

Performance engineers can take care of components outside of a developer's code base so the developers can stay focused, and also provide them with a greater performance budget so that they can adopt expensive features earlier. For some early stage companies this ROI may be the most important (it's sometimes called engineering velocity).

What do performance engineers do?

For non-vendor tech companies, in summary:

A. Test, debug, and tune new software and hardware products to find performance improvements, and drive company-wide adoption.

Examples: New cloud instance types, language runtimes, JVM versions, JVM subsystems (new GC algorithms or compilers: Graal vs c2), system libraries (glibc vs tcmalloc etc.), kernels (Linux vs BSD) and versions, compilers (gcc, llvm, icc), processor features (AVX, QAT, etc.), hardware accelerators, and so on. It can take months to debug, fix, and patch everything so the latest thing delivers its performance claim.

B. Develop in-house performance solutions, such as custom analysis tools, that other teams use to find performance wins.

Examples: Custom monitoring using Prometheus and Grafana, one-click flame graphs, and analysis tools using eBPF: All of this is open-source based, but someone has to get it working locally, integrate them with existing local tools, teach other teams how to use them, and maintain them.

C. Do deep-dive analysis to identify and reduce workload bottleneck(s) and latency outliers.

Examples: Using code profilers (CPU flame graphs), distributed tracers (OpenTelemetry and products), application logs, system counters (Linux: sysstat), system tracers (Linux: eBPF, Ftrace, perf), static and dynamic instrumentation (Linux: kprobes, uprobes), debuggers (gdb, etc.), hardware counters (Linux: perf), and on rare occasions hardware instruction tracing. A lot of hands-on live debugging over an SSH session, following methodologies to efficiently find the root-cause(s), which can require the development of custom tools (mini load generators, observability tools, etc.).

D. Optimize software and hardware via tunable parameters and configuration choices.

Examples: System tunables (Linux: sysctls), network tunables (socket options, qdiscs), device tunables, runtime tunables (Java -XX:*), library settings, environment variables, etc. As with (C), the team needs SSH access to do this and likely superuser privileges.

E. Work with development teams (internal and external) to catch non-scalable solutions early in development, and to suggest or test later performance improvements.

Examples: Identifying that a communication layer will flood network links when horizontally scaled; a developer has a good optimization idea but can't get it to work and needs some help; there's a performance-related pull request on some software the company uses, but the request is two years old and needs someone to fix code conflicts, test it, and advocate for merging it.

F. Develop proof-of-concept demonstrations of new performance technologies.

Examples: Linux eBPF and io_uring can provide significant performance improvements when developed into hot-path kernel-based accelerators, but someone needs to at least build a POC to show it would work for the company. These are typically too esoteric for developers to try on their own.

G. Develop performance improvements directly for internal and external code.

Examples: Performance engineers get a lot done by asking the right people, but sometimes no one has the time to code that Linux/runtime/database performance fix the company needs, so a perf engineer takes it on. We aren't as quick as full-time developers since we are hopping between different languages all the time, and as a new code base committer will typically come under extra (and time-consuming) scrutiny.

H. Capacity planning activities: purchase guidance, choosing metrics to monitor, and bottleneck forecasting.

Examples: Modeling and performance characterization for hardware purchases, resource utilization monitoring to forecast capacity issues (nowadays often done by developers and SREs using monitoring tools); propose the best metrics to be watched in those monitoring tools for alert generation and auto-scaling rules; work with business side of the company to help define practical SLA/SLOs.

I. Perform knowledge sharing to uplift engineering.

Examples: Performance education to help developers produce more efficient software; act as a conduit to share performance learnings between teams (that may otherwise be siloed) to avoid rework and rediscovery.

J. Provide in-house expertise to guide purchasing performance solutions.

Examples: Providing in-house expertise for performance topics like observability, telemetry, and eBPF can help the company choose better commercial products by evaluating their capabilities and overhead costs, and can recognize which are just Prometheus and Grafana, or my open source eBPF tools, in a suit. Without expertise you're vulnerable to being ripped off, or may adopt a tool that increases infrastructure costs more than the gains it provides (I've seen some that have overhead exceeding 10%).

To elaborate on (A), the testing of new products: Other staff will try a technology by configuring it based on the README, run a load test, and then share the result with management. Some companies hire dedicated staff for this called "performance testers." Performance engineers get more out of the same technology by running analyzers during the test to understand its limiter ("active benchmarking"), and will tune the technology to get an extra 5%, 50%, or more performance. They may also discover that the limiter is an unintended target (e.g., accidentally testing a caching layer instead). Any performance test should be accompanied by an explanation of the limiting factor, since the absence of one suggests the test wasn't analyzed and the result may be bogus. You can simply ask "why is the result not double?".

As an aside: "CPU bound" isn't an explanation. Do you mean (a) clockspeed, (b) thread pool size, (c) core count, (d) memory bus (which kernels misleadingly include in %CPU counters), or something else (like power, thermal, CPU subsystem bottleneck)? Each of those leads to a different actionable item for the company (E.g.: (a) faster processors; (b) more threads; (c) more cores; (d) faster memory, bigger caches, less NUMA, or software techniques like zero copy). That's just the generic stuff. The code behind any CPU bound workload will also be analyzed to look for inefficiencies, and sometimes their instructions as well.

Day to day, a performance engineer can spend a lot of time fixing broken builds and configuring workloads, because you're the first person testing new patches and bleeding-edge software versions.

What I've described here is for companies that consume tech. For vendors that sell it, performance engineering includes design modeling, analysis of prototype and in-development software and hardware, competitive benchmarking, non-regression testing of new product releases, and pre- and post-sales performance analysis support. (I explain this in more detail in Systems Performance 2nd edition, chapter 1.)

When to Hire a Performance Team and How Many

Most companies I've encountered are already doing some kind of performance work scattered across projects and individuals, but they don't yet have a central performance engineering team looking deeply at everything. This leaves their attention spotty, ok in some areas, poor to absent in others. A central performance team looks at everything and prioritizes work based on the potential ROI.

Here are a few rough rules to determine when you should start forming a company-wide performance engineering team and how to size it (see caveats at the end):

(A) One engineer at $1M/year infrastructure spend, then one per $10M to $20M/year

That first engineer finds some of the low-hanging fruit, and should be cost effective as your company grows past $1M/year. I'd then consider another performance engineer for every $10M to $20M, and maintain a 3:1 junior:senior ratio. The values you use depend on your performance engineer's skill and the complexity of your environment, and how aggressively you wish to improve performance. At a $20M spend, 5% yearly wins means $1M in savings per staff member (minus their cost); whereas for a $10M spend you'd need to hit 10% wins yearly for $1M in savings.
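The break-even arithmetic behind rule (A) is straightforward; here it is made explicit (my illustration, using the figures from the text, and before subtracting the engineer's own fully-loaded cost):

```python
# Yearly savings per engineer for a given infrastructure spend and win
# rate: $20M at 5% and $10M at 10% both work out to $1M/year.

def savings_per_engineer(spend_usd, yearly_win_pct, engineers=1):
    return spend_usd * yearly_win_pct / 100 / engineers

a = savings_per_engineer(20_000_000, 5)    # $1,000,000
b = savings_per_engineer(10_000_000, 10)   # $1,000,000
```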

Consider that as your spend keeps growing you will keep adding more staff, which makes their job harder as there is less low-hanging fruit to find. However, your site will also be growing in scale and complexity, and developing new performance issues for the growing team to solve. Also, smaller percent wins become more impactful at large scale, so I expect such a growing perf team to remain cost effective. (To a point: the largest teams I've seen stop at around 150 staff.)

(B) Staff spend should equal or exceed observability monitoring spend

If you're spending $1M/year on an observability product, you can spend $1M/year on a performance engineering team: e.g., 3 to 4 good staff. If you're only spending $50k/year on an observability product, you can't hire a performance engineer at that price, but you can bring in a consultant or pay for performance training and conference attendance. As I'd expect staff to halve infrastructure costs over time, just the savings on monitoring alone (which typically scale with instance/server count) will pay for the new staff. Because these new engineers are actively reducing infrastructure spend, the total savings are much greater.

(C) When latency or reliability is prohibitive to growth

I've heard some small companies and startups say they spend more money on coffee than they do back-end compute, and don't want to waste limited developer time on negligible cost reductions. However, when a wave of new customers arrive they may hit scalability issues and start losing customers because latency is too high or reliability is too inconsistent. That's usually a good time for small companies to start investing in performance engineering.

Caveats for A-C

Companies and Global Staff

Here are some example articles about performance engineering work at non-vendor companies:

I would like to add Bank of America, Wells Fargo, JPMorgan Chase, and Citigroup to this list since they have many staff with the title "performance engineer" (as you can find on LinkedIn) but it's hard to find public articles about their work. I'd also like a canonical list of central performance engineering teams, but such org chart data can also be hard to find online, and staff don't always call themselves "performance engineers." Other keywords to look out for are: insights, monitoring, and observability; some are just called "support engineers".

Note that there is also a lot of performance engineering done at hardware, software, and cloud vendors (Intel, AMD, NVIDIA, Apple, Microsoft, Google, Amazon, etc.) not listed here, as well as at performance solution companies. In this post I just wanted to focus on non-vendor companies.

Global Staff

I've never seen concrete data on how many people are employed worldwide in performance engineering. Here are my guesses:

It's possible LinkedIn can provide better estimates if you have enterprise access.

Conclusion

There are many reasons to hire a performance engineering team, such as infrastructure cost savings, latency reductions, improved scalability and reliability, and faster engineering. Cost savings alone can justify hiring a team, because a team should be targeting a cost reduction of 5-10% every year, which over the years adds up to become significantly larger: 28%-61% savings after 5 years.

In this post I explained what performance engineers do and provided some suggested rules on hiring:

A) One engineer at >$1M infrastructure spend, then another for every $10-20M.
B) Performance staff spend should equal or exceed observability monitoring spend.

Note that you likely already have some senior developers or SREs who are focusing on perf work, reducing the number of new performance engineers you need.

I've met people who would like to work as performance engineers but their employer has no such roles (other than performance testing: not the same thing) despite spending millions per year on infrastructure. I hope this blog post helps companies understand the value of performance engineering and understand when and how many staff to hire.

Hiring good performance engineers isn't easy as it's a specialized area with a limited talent pool. In part 2 I'll discuss how to hire or train a performance engineering team and provide sample job descriptions and tips, and what to do if you can't hire a performance team.

Thanks

Thanks for the feedback and suggestions: Vadim Filanovsky (OpenAI), Jason Koch (Netflix), Ambud Sharma (Pinterest), Harshad Sane (Netflix), Ed Hunter, Deirdre Straughan.

03 Aug 2025 2:00pm GMT

31 Jul 2025

feedKernel Planet

Matthew Garrett: Secure boot certificate rollover is real but probably won't hurt you

LWN wrote an article which opens with the assertion "Linux users who have Secure Boot enabled on their systems knowingly or unknowingly rely on a key from Microsoft that is set to expire in September". This is, depending on interpretation, either misleading or just plain wrong, but also there's not a good source of truth here, so.

First, how does secure boot signing work? Every system that supports UEFI secure boot ships with a set of trusted certificates in a database called "db". Any binary signed with a chain of certificates that chains to a root in db is trusted, unless either the binary (via hash) or an intermediate certificate is added to "dbx", a separate database of things whose trust has been revoked[1]. But, in general, the firmware doesn't care about the intermediate or the number of intermediates or whatever - as long as there's a valid chain back to a certificate that's in db, it's going to be happy.
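As a toy model of that trust decision (mine, not anything from the UEFI spec - it ignores actual signature verification, timestamps, and plenty of other detail), the logic is roughly:

```python
# Toy model of the db/dbx decision described above: a binary is trusted
# if its signing chain reaches a certificate in db, unless the binary's
# hash or any certificate in the chain has been revoked via dbx.
# Real firmware verifies signatures; here certs are just opaque names.

def is_trusted(binary_hash, chain, db, dbx):
    if binary_hash in dbx:
        return False                        # binary revoked by hash
    if any(cert in dbx for cert in chain):
        return False                        # cert in the chain revoked
    return any(cert in db for cert in chain)

db = {"Microsoft Corporation UEFI CA 2011"}
chain = ["Microsoft Windows UEFI Driver Publisher",
         "Microsoft Corporation UEFI CA 2011"]
trusted = is_trusted("sha256:shim", chain, db, dbx=set())
```

Note that nothing in this check cares how many intermediates sit between the binary and the db root, which is exactly the behaviour described above.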

That's the conceptual version. What about the real world one? Most x86 systems that implement UEFI secure boot have at least two root certificates in db - one called "Microsoft Windows Production PCA 2011", and one called "Microsoft Corporation UEFI CA 2011". The former is the root of a chain used to sign the Windows bootloader, and the latter is the root used to sign, well, everything else.

What is "everything else"? For people in the Linux ecosystem, the most obvious thing is the Shim bootloader that's used to bridge between the Microsoft root of trust and a given Linux distribution's root of trust[2]. But that's not the only third party code executed in the UEFI environment. Graphics cards, network cards, RAID and iSCSI cards and so on all tend to have their own unique initialisation process, and need board-specific drivers. Even if you added support for everything on the market to your system firmware, a system built last year wouldn't know how to drive a graphics card released this year. Cards need to provide their own drivers, and these drivers are stored in flash on the card so they can be updated. But since UEFI doesn't have any sandboxing environment, those drivers could do pretty much anything they wanted to. Someone could compromise the UEFI secure boot chain by just plugging in a card with a malicious driver on it, and have that hotpatch the bootloader and introduce a backdoor into your kernel.

This is avoided by enforcing secure boot for these drivers as well. Every plug-in card that carries its own driver has it signed by Microsoft, and up until now that's been a certificate chain going back to the same "Microsoft Corporation UEFI CA 2011" certificate used in signing Shim. This is important for reasons we'll get to.

The "Microsoft Windows Production PCA 2011" certificate expires in October 2026, and the "Microsoft Corporation UEFI CA 2011" one in June 2026. These dates are not that far in the future! Most of you have probably at some point tried to visit a website and got an error message telling you that the site's certificate had expired and that it's no longer trusted, and so it's natural to assume that the outcome of time's arrow marching past those expiry dates would be that systems will stop booting. Thankfully, that's not what's going to happen.

First up: if you grab a copy of the Shim currently shipped in Fedora and extract the certificates from it, you'll learn it's not directly signed with the "Microsoft Corporation UEFI CA 2011" certificate. Instead, it's signed with a "Microsoft Windows UEFI Driver Publisher" certificate that chains to the "Microsoft Corporation UEFI CA 2011" certificate. That's not unusual; intermediates are commonly used and rotated. But if we look more closely at that certificate, we learn that it was issued in 2023 and expired in 2024. Older versions of Shim were signed with older intermediates. A very large number of Linux systems are already booting binaries signed with certificates that have expired, and yet things keep working. Why?

Let's talk about time. In the ways we care about in this discussion, time is a social construct rather than a meaningful reality. There's no way for a computer to observe the state of the universe and know what time it is - it needs to be told. It has no idea whether that time is accurate or an elaborate fiction, and so it can't with any degree of certainty declare that a certificate is valid from an external frame of reference. The failure modes of getting this wrong are also extremely bad! If a system has a GPU that relies on an option ROM, and if you stop trusting the option ROM because either its certificate has genuinely expired or because your clock is wrong, you can't display any graphical output[3] and the user can't fix the clock and, well, crap.

The upshot is that nobody actually enforces these expiry dates - here's the reference code that disables it. In a year's time we'll have gone past the expiration date for "Microsoft Corporation UEFI CA 2011" and everything will still be working, and a few months later "Microsoft Windows Production PCA 2011" will also expire and systems will keep booting Windows despite being signed with a now-expired certificate. This isn't a Y2K scenario where everything keeps working because people have done a huge amount of work - it's a situation where everything keeps working even if nobody does any work.
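The difference between what a TLS client does and what firmware does can be shown as a tiny sketch. The certificate record and its expiry date here are invented for illustration; the point is only that the firmware-style check never consults the validity window.

```python
# Why expired signing certificates keep working under secure boot:
# the firmware's trust check simply skips the expiry comparison.
from datetime import date

intermediate = {
    "name": "Microsoft Windows UEFI Driver Publisher",
    "not_after": date(2024, 10, 15),  # hypothetical expiry date
}

def browser_style_check(cert, today):
    # A TLS client rejects a certificate past its notAfter date.
    return today <= cert["not_after"]

def firmware_style_check(cert, today):
    # UEFI secure boot implementations skip the expiry check entirely:
    # the platform clock can't be trusted, and the failure mode (no
    # display, no way for the user to fix the clock) is unacceptable.
    return True

today = date(2025, 7, 31)
print(browser_style_check(intermediate, today))   # False: expired
print(firmware_style_check(intermediate, today))  # True: still boots
```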

So, uh, what's the story here? Why is there any engineering effort going on at all? What's all this talk of new certificates? Why are there sensationalist pieces about how Linux is going to stop working on old computers or new computers or maybe all computers?

Microsoft will shortly start signing things with a new certificate that chains to a new root, and most systems don't trust that new root. System vendors are supplying updates[4] to their systems to add the new root to the set of trusted keys, and Microsoft has supplied a fallback that can be applied to all systems even without vendor support[5]. If something is signed purely with the new certificate then it won't boot on something that only trusts the old certificate (which shouldn't be a realistic scenario due to the above), but if something is signed purely with the old certificate then it won't boot on something that only trusts the new certificate.

How meaningful a risk is this? We don't have an explicit statement from Microsoft as yet as to what's going to happen here, but we expect that there'll be at least a period of time where Microsoft signs binaries with both the old and the new certificate, and in that case those objects should work just fine on both old and new computers. The problem arises if Microsoft stops signing things with the old certificate, at which point new releases will stop booting on systems that don't trust the new key (which, again, shouldn't happen). But even if that does turn out to be a problem, nothing is going to force Linux distributions to stop using existing Shims signed with the old certificate, and having a Shim signed with an old certificate does nothing to stop distributions signing new versions of grub and kernels. In an ideal world we have no reason to ever update Shim[6] and so we just keep on shipping one signed with two certs.

If there's a point in the future where Microsoft only signs with the new key, and if we were to somehow end up in a world where systems only trust the old key and not the new key[7], then those systems wouldn't boot with new graphics cards, wouldn't be able to run new versions of Windows, wouldn't be able to run any Linux distros that ship with a Shim signed only with the new certificate. That would be bad, but we have a mechanism to avoid it. On the other hand, systems that only trust the new certificate and not the old one would refuse to boot older Linux, wouldn't support old graphics cards, and also wouldn't boot old versions of Windows. Nobody wants that, and for the foreseeable future we're going to see new systems continue trusting the old certificate and old systems have updates that add the new certificate, and everything will just continue working exactly as it does now.

Conclusion: Outside some corner cases, the worst case is you might need to boot an old Linux to update your trusted keys to be able to install a new Linux, and no computer currently running Linux will break in any way whatsoever.

[1] (there's also a separate revocation mechanism called SBAT which I wrote about here, but it's not relevant in this scenario)

[2] Microsoft won't sign GPLed code for reasons I think are unreasonable, so having them sign grub was a non-starter, but also the point of Shim was to allow distributions to have something that doesn't change often and be able to sign their own bootloaders and kernels and so on without having to have Microsoft involved, which means grub and the kernel can be updated without having to ask Microsoft to sign anything and updates can be pushed without any additional delays

[3] It's been a long time since graphics cards booted directly into a state that provided any well-defined programming interface. Even back in the 90s, cards didn't present VGA-compatible registers until card-specific code had been executed (hence DEC Alphas having an x86 emulator in their firmware to run the driver on the card). No driver? No video output.

[4] There's a UEFI-defined mechanism for updating the keys that doesn't require a full firmware update, and it'll work on all devices that use the same keys rather than being per-device

[5] Using the generic update without a vendor-specific update means it wouldn't be possible to issue further updates for the next key rollover, or any additional revocation updates, but I'm hoping to be retired by then and I hope all these computers will also be retired by then

[6] I said this in 2012 and it turned out to be wrong then so it's probably wrong now sorry, but at least SBAT means we can revoke vulnerable grubs without having to revoke Shim

[7] Which shouldn't happen! There's an update to add the new key that should work on all PCs, but there's always the chance of firmware bugs


31 Jul 2025 4:53pm GMT

25 Jul 2025


Linux Plumbers Conference: All Microconferences have been Accepted!

Good news! All Microconferences have been accepted and are now accepting submissions. The accepted Microconferences are:

You can start submitting topics to these Microconferences. Remember to read the Blog on what makes the ideal Microconference topic before submitting.

After that, submit your topic and make sure that you select the appropriate track that you are submitting for (they are all listed under LPC Microconference Proposals and end with MC).

25 Jul 2025 8:12pm GMT

24 Jul 2025


Dave Airlie (blogspot): ramalama/mesa : benchmarks on my hardware and open source vs proprietary

One of my pet peeves around running local LLMs and inferencing is the sheer mountain of shit^W^W^W complexity of compute stacks needed to run any of this stuff in a mostly optimal way on a piece of hardware.

CUDA, ROCm, and Intel oneAPI all, to my mind, scream over-engineering on a massive scale, at least for a single task like inferencing. The combination of closed source, over-the-wall open source, and open source that is insurmountable for anyone outside the vendor to support or fix screams that there has to be a simpler way. Combine that with the pytorch ecosystem and the insanity of deploying python, and I get a bit unstuck.

What can be done about it?

llama.cpp to me seems like the best answer to the problem at present, (a rust version would be a personal preference, but can't have everything). I like how ramalama wraps llama.cpp to provide a sane container interface, but I'd like to eventually get to the point where container complexity for a GPU compute stack isn't really needed except for exceptional cases.

On the compute stack side, Vulkan exposes most features of GPU hardware in a possibly suboptimal way, but with extensions all can be forgiven. Jeff Bolz from NVIDIA's talk at Vulkanised 2025 started to give me hope that maybe the dream was possible.

The main issue I have is Jeff is writing driver code for the NVIDIA proprietary vulkan driver which reduces complexity but doesn't solve my open source problem.

Enter NVK, the open source driver for NVIDIA GPUs. Karol Herbst and myself are taking a look at closing the feature gap with the proprietary one. For mesa 25.2, initial support for VK_KHR_cooperative_matrix landed, along with some optimisations, but there is a bunch of work to get VK_NV_cooperative_matrix2 and a truckload of compiler optimisations to catch up with NVIDIA.

But since mesa 25.2 was coming soon I wanted to try and get some baseline figures out.

I benchmarked on two systems (because my AMD 7900XT wouldn't fit in the case), both with Ryzen CPUs. In the first system I installed an RTX5080, then an RTX6000 Ada, and then the Intel A770. The second I used for the RX7900XT. The Intel SYCL stack unfortunately failed to launch inside ramalama, so I hacked llama.cpp to use the A770 MMA accelerators.

ramalama bench hf://unsloth/Qwen3-8B-GGUF:UD-Q4_K_XL

I picked this model at random, and I've no idea if it was a good idea.


Some analysis:

The token generation workload is a lot less matmul heavy than prompt processing, and it also does a lot more synchronising. Jeff has stated that CUDA wins here mostly due to CUDA graphs, and most of the work needed is operation fusion on the llama.cpp side. Prompt processing is a lot more matmul heavy; extensions like NV_coopmat2 will help with that (the NVIDIA proprietary Vulkan driver already uses it in the above), but there may be further work needed to close the CUDA gap. On AMD, radv (the open source Vulkan driver) is already better at token generation than ROCm, but behind in prompt processing. Again, coopmat2-like extensions should help close the gap there.

NVK is starting from a fair way behind, we just pushed support for the most basic coopmat extension and we know there is a long way to go, but I think most of it is achievable as we move forward and I hope to update with new scores on a semi regular basis. We also know we can definitely close the gap on the NVIDIA proprietary Vulkan driver if we apply enough elbow grease and register allocation :-)

I think it might also be worth putting some effort into radv coopmat2 support; if radv could overtake ROCm on both of these, it would remove a large piece of complexity from the basic user's stack.

As for Intel I've no real idea, I hope to get their SYCL implementation up and running, and maybe I should try and get my hands on a B580 card as a better baseline. When I had SYCL running once before I kinda remember it being 2-4x the vulkan driver, but there's been development on both sides.

(The graphs were generated by Gemini.)

24 Jul 2025 10:19pm GMT

22 Jul 2025


Pete Zaitcev: Floating Point

I'm unemployed right now and I go to job interviews once in a while. One time, the company was doing another AI thing, having to do with verifying that training computations were doing something useful, and not just "dumping a stream of floating point numbers".

Until now I didn't think of it, but apparently AI is all in FP. And it reminded me how I worked at a CPU design place, where they had a group focused on FP. Those guys had been doing FP since the days of the transistor. They migrated their designs, generation by generation, through TTL, ECL, Bi-CMOS, CMOS. When I heard from them last, they were tinkering with "deep sub-micron".

One remarkable part about their thing was that, because they started out in transistors, their FPU didn't have any microcode. It was all in hardware. Even divisions! Just a bunch of counters that sequenced whatever was necessary.

For a long time during the reign of x86, the group was somewhat de-prioritized, because many microprocessors at the time treated FP performance as an afterthought. A number of desktop CPUs shipped with no hardware FP at all. But look how the tables have turned. I honestly hope that it was not too late and AI has become a boon for the successors of my past colleagues.

22 Jul 2025 5:28pm GMT

12 Jul 2025


Pete Zaitcev: AI writing

On the topic of AI writing code, I've read a Sci-Fi story some 30 years ago, probably from the 1950s or 1960s.

On a future Earth, fiction writers write using machines. The quality of the writing is associated with the sophistication of the writer's machine. Publishers reject stories written on a lower-end machine. The hero of the story is a struggling writer who has to make do with a cheap unit. As his machine writes poorly, he's paid little, so he cannot save up for an upgrade. He hatches a plan to sneak into the house of a successful writer and use the better machine to write a break-out story. The oddly prescient punch-line is that he discovers the successful writer's machine was non-functional: his better wrote his stories manually, in secret.

I wonder if such a scenario might even be possible, programming-wise, if you work in a company that does not micro-manage velocity, or work as a consultant, so that your sausage factory remains behind a curtain.

12 Jul 2025 3:23am GMT

08 Jul 2025


Harald Welte: Security Issues regarding GSMA eSIMs / eUICCs + Javacard

The independent security researcher Adam Gowdiak has published an extensive report on flaws he found in some eUICCs (the chips used to store eSIM profiles within the GSMA eSIM architecture). While the specific demonstrable exploit was in a product of one specific CardOS Vendor (Kigen, formerly part of ARM), the fundamental underlying issue is actually an architectural one.

The Oracle Javacard [memory] safety architecture relies on a so-called bytecode verifier, a program that you run after compiling an application but before executing the code on the Javacard. The specifications allow for both on-card and off-card verification. However, the computational complexity of this verifier is generally assumed to exceed the resources available inside many microcontrollers used to implement Javacards. Such microcontrollers are often ARM SC000 (Cortex-M0) or SC300 (Cortex-M3) based, with only tens of kilobytes of RAM and hundreds of kilobytes of flash.
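To give a flavour of what such a verifier does, here is a toy sketch: statically checking stack discipline before any code runs. Real Javacard verification also tracks operand types, branch targets, and object initialisation; this model uses a made-up three-instruction bytecode and only tracks stack depth.

```python
# Toy bytecode verifier: reject programs that would underflow or
# overflow the operand stack, without ever executing them.

STACK_EFFECT = {"push": +1, "add": -1, "pop": -1}   # net change in depth
STACK_NEEDS  = {"push": 0,  "add": 2,  "pop": 1}    # operands required

def verify(program, max_stack):
    depth = 0
    for op in program:
        if depth < STACK_NEEDS[op]:
            return False          # would underflow, e.g. "add" with one operand
        depth += STACK_EFFECT[op]
        if depth > max_stack:
            return False          # would overflow the declared stack size
    return True

print(verify(["push", "push", "add", "pop"], max_stack=2))  # True
print(verify(["push", "add"], max_stack=2))                 # False: underflow
```

Even this trivial check requires walking every instruction and, in real bytecode with branches, tracking the stack state at every possible join point, which is where the memory cost on a tens-of-kilobytes microcontroller comes from.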

Javacard was originally developed for use cases within the banking/payment industry. In that industry, the card-issuing bank is the sole entity that has the keys to load java applets onto a card. That entity is of course interested in the security of the card, and will hence always run an off-card bytecode verifier. In a world of physical SIM/USIM cards issued by a single mobile operator, the situation is the same: The card-issuing MNO/MVNO is the only entity with key materials to install additional java applets on the card.

This fundamental problem became already apparent by earlier findings by Adam Gowdiak in 2019, but at least in terms of public responses by Oracle and Gemalto back then, they mostly did hand-waving and/or made lame excuses.

However, when the industry represented in GSMA standardized the eSIM architecture, this changed. Suddenly we have various eSIM profiles of various different operators, each holding key material to install Java applets on the shared card. In such an environment, it is no longer safe to assume that every MNO/MVNO can be trusted to be non-adversarial and hence trusted to run that off-card bytecode verifier before loading applets onto the card.

If the Javacard runtime on the existing card/chip itself cannot autonomously perform those verification tasks, I don't see how the problem can ever be solved short of completely removing/disabling Javacard support in such eUICCs. Luckily it is an optional feature and not a mandatory requirement for an eUICC to be approved/accredited. Sadly many MNOs/MVNOs however will mandate Javacard support in their eSIM profiles and hence refuse to install into an eUICC without it :(

In my opinion, the solution to the problem can only be to either make the GSMA require full on-card bytecode verification on all eUICCs, or to remove Javacard support from the eUICC.

We have to keep in mind that there are hundreds if not thousands of MVNOs around the planet, and all of them are subject to whatever local jurisdiction they operate in, and also subject to whatever government pressure (e.g from intelligence agencies).

In hindsight, anyone familiar with the 2019 work by Gowdiak and an understanding of the fundamental change to multiple stakeholders in an eUICC (compared to classic SIM/USIM) should have arrived at the conclusion that there is a serious problem that needs addressing. I think the 2019 work had not been publicized and covered significantly enough to make sure that everyone in the industry was made aware of the problems. And that in turn is mostly a result of Oracle + Gemalto downplaying the 2019 findings back in the day, rather than raising awareness within all relevant groups and bodies of the industry.

Mitigation via TS.48 key diversification

The specific attack presented used a GSMA TS.48 test profile to install the malicious Java bytecode; those TS.48 profiles are standardized profiles used by the industry for cellular testing, and until the attack they contained well-known static OTA key material. The mitigation to randomize/diversify those keys in TS.48v7 closes that particular vector, but the attack itself is not dependent on test profiles. Any MNO/MVNO (or rather, anyone with access to a commercial service of an SM-DP+ accredited by GSMA) obviously has the ability to load Java applets into the eSIM profile that they create, using keys that they themselves specify.
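The key-diversification fix can be sketched like this. The key names (KIC/KID) follow the usual OTA terminology, but the values and the structure are purely illustrative, not the actual TS.48 profile format.

```python
# Static test-profile keys vs. per-profile diversified keys.
import secrets

# Before TS.48v7: every test profile shipped the same well-known OTA
# keys, so anyone could send authenticated OTA commands (including
# applet loads) to any device running such a profile.
STATIC_TEST_KEYS = {"KIC": "40" * 16, "KID": "40" * 16}  # illustrative values

def diversified_keys():
    # After: each issued profile gets fresh random 128-bit keys,
    # so knowledge of one profile's keys is useless against another.
    return {"KIC": secrets.token_hex(16), "KID": secrets.token_hex(16)}

a, b = diversified_keys(), diversified_keys()
print(a["KIC"] != b["KIC"])  # True: keys differ per profile
print(len(a["KIC"]))         # 32 hex chars = 128-bit key
```

As the text notes, this only closes the test-profile vector; an operator issuing its own profiles still chooses its own keys and can load whatever applets it likes.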

What IMHO ought to be done

  • Oracle should get off their "we only provide a reference implementation and vendors should invent their own proprietary verification mechanisms" horse. This is just covering their own ass and not helping any of their downstream users/implementers. The reference implementation should show how proper verification can be done in the most resource-constrained environment of cards (it's JavaCard, after all!), and any advances of the verifier should happen once at Oracle, and then be used by all the implementers (CardOS vendors). Anyone who really cares about the security of a standardized platform (like Javacard) should never leave key aspects of it up to each and every implementer, but rather should solve the problem once, publicly, with validation and testing tools and independent 3rd-party penetration testing, and then ensure that every implementer uses that proven implementation.

  • GSMA should have security requirements (and mandatory penetration tests) specifically regarding the JVM/JRE of each card that gets SAS-UP accredited.

  • GSMA should require that Javacard support should be disabled on all existing eUICCs that cannot legitimately claim/demonstrate that they are performing full bytecode verification entirely on-card.

  • GSMA should refuse any future SAS-UP accreditation to any product that requires off-card bytecode verification

  • The entire industry should find a way to think beyond Javacard, or in fact any technology whose security requires verification of the executable program that is too complex to perform on-card on the targeted microcontrollers.

08 Jul 2025 10:00pm GMT

04 Jul 2025


James Morris: Where Else to Find Me

I'm not blogging much these days, and more likely posting on these accounts:

If you'd like to follow updates for the Linux Security Summit (LSS), see here:

For topics which are specifically $work related, see my LinkedIn:

04 Jul 2025 7:59pm GMT