17 Apr 2025

feedLXer Linux News

GNOME 47.6 Fixes Black Screen Issue on Multi-Monitor Setups with NVIDIA Driver

While GNOME 47.5 was skipped, the GNOME Project released GNOME 47.6 today as a new maintenance update to the GNOME 47 "Denver" desktop environment series to address various issues and fix more bugs.

17 Apr 2025 11:16pm GMT

Open Source Initiative (OSI) Privacy Fiasco in Detail: In Conclusion and Enforcement Action Proceeds Against OSI at the California Privacy Protection Agency (CPPA)

No matter what it used to be (or could be or was supposed to be), the general public must understand what it is right now. Information is essential here; transparency is imperative.

17 Apr 2025 10:00pm GMT

Ubuntu 25.04 (Plucky Puffin) released

Ubuntu 25.04, codenamed "Plucky Puffin", is here.

17 Apr 2025 8:45pm GMT

GNOME 48.1 Desktop Is Out to Improve HDR Support and Fix Various Issues

The GNOME project released today GNOME 48.1 as the first maintenance update to the latest GNOME 48 "Bengaluru" desktop environment series to fix bugs and improve existing functionality.

17 Apr 2025 8:28pm GMT

KDE Gear 25.04 Apps Collection Rolls Out, Here’s What’s New

The KDE Gear 25.04 app collection is now available for download, introducing big changes to Dolphin, Kdenlive, Itinerary, and more.

17 Apr 2025 7:36pm GMT

feedLinuxiac

Arch Says Goodbye to Redis, Adopts Valkey

Arch Says Goodbye to Redis, Adopts Valkey

Redis, a popular in-memory data store, is being deprecated in Arch Linux's repo; Valkey steps in as a high-performance BSD-licensed replacement.

17 Apr 2025 7:21pm GMT

feedLXer Linux News

KDE Gear 25.04 Applications Suite Officially Released, Here’s What’s New

The KDE Project released today KDE Gear 25.04 as the latest version to this collection of apps for the KDE Plasma desktop environment and the Linux ecosystem, adding new features and enhancements to your favorite KDE applications.

17 Apr 2025 6:44pm GMT

CVE Foundation Emerges From Stealth to Rescue CVE Program

The rapid defunding and refunding of the CVE Project is just another sign of the destabilization our government is currently experiencing.

17 Apr 2025 5:51pm GMT

HTTP 307 Temporary Redirect Status Code: What is it and When Should You Use it?

What is the HTTP 307 Temporary Redirect Status Code and when should you use it? Find out everything you need to know in our latest blog.

17 Apr 2025 4:59pm GMT

How To Monitor Network Latency In Linux Using Ping, Mtr and Smokeping

This guide explains how to measure and monitor network latency on Linux using simple tools like ping, mtr, and smokeping.

17 Apr 2025 4:07pm GMT

feedLinux Today

Cheat: Create a Cheatsheet for Your Favorite Command in Linux

Discover how to create a personalized cheatsheet for your favorite Linux commands. Simplify your workflow and enhance your productivity with our expert guide.

The post Cheat: Create a Cheatsheet for Your Favorite Command in Linux appeared first on Linux Today.

17 Apr 2025 3:48pm GMT

EU OS: The Hatchling Distro That Wants to Unify the Eu’s Public Sector

Explore EU OS, the Hatchling Distro designed to unify the EU's public sector. Learn how it enhances collaboration and efficiency across government services.

The post EU OS: The Hatchling Distro That Wants to Unify the Eu's Public Sector appeared first on Linux Today.

17 Apr 2025 3:44pm GMT

Fedora 43 Ushers in RPM 6, Introduces New Project Leader

Discover how Fedora 43 ushers in RPM 6 and introduces a new project leader, enhancing the future of this innovative Linux distribution.

The post Fedora 43 Ushers in RPM 6, Introduces New Project Leader appeared first on Linux Today.

17 Apr 2025 3:42pm GMT

What’s Your Chatbot’s Carbon Footprint?

Discover the environmental impact of your chatbot. Learn how to measure and reduce its carbon footprint for a more sustainable digital presence.

The post What's Your Chatbot's Carbon Footprint? appeared first on Linux Today.

17 Apr 2025 3:39pm GMT

Best Free and Open Source Alternatives to Progress Chef

Discover the best free and open source alternatives to Progress Chef. Explore powerful tools that enhance your DevOps workflow without the cost.

The post Best Free and Open Source Alternatives to Progress Chef appeared first on Linux Today.

17 Apr 2025 3:37pm GMT

Rust 1.86 Introduces Major Language Features

Uncover the significant updates in Rust 1.86. Learn about the new language features that enhance coding efficiency and boost application performance.

The post Rust 1.86 Introduces Major Language Features appeared first on Linux Today.

17 Apr 2025 3:33pm GMT

WattWise: Monitor Your Computer’s Power Usage in Real-Time

Track your computer's power usage effortlessly with WattWise. Gain insights in real-time and make informed decisions to enhance energy efficiency.

The post WattWise: Monitor Your Computer's Power Usage in Real-Time appeared first on Linux Today.

17 Apr 2025 3:30pm GMT

Thunderbird Launches Open Source Services to Rival Gmail and Office365

earn about Thunderbird's latest open-source services that challenge Gmail and Office365, delivering unique solutions for users seeking efficient email management.

The post Thunderbird Launches Open Source Services to Rival Gmail and Office365 appeared first on Linux Today.

17 Apr 2025 3:26pm GMT

8 Best Free and Open Source Emacs-Like Text Editors

Explore our top 8 free and open source text editors inspired by Emacs. Find the perfect tool to boost your productivity and streamline your workflow.

The post 8 Best Free and Open Source Emacs-Like Text Editors appeared first on Linux Today.

17 Apr 2025 3:24pm GMT

Open-Source Malware Doubles, Data Exfiltration Attacks Dominate

Explore the alarming increase in open-source malware and the prevalence of data exfiltration attacks. Learn how to protect your digital assets effectively.

The post Open-Source Malware Doubles, Data Exfiltration Attacks Dominate appeared first on Linux Today.

17 Apr 2025 3:18pm GMT

feedLXer Linux News

KDE Gear 25.04 Delivers Many Improvements To KDE's Applications

Following the recent Plasma 6.3 desktop release, KDE Gear 25.04 is now available for shipping all of the latest and greatest KDE applications...

17 Apr 2025 3:15pm GMT

Linux.com as Spamfarm of the Linux Foundation, Partner of the Gates Foundation

Linux Foundation fired all the staff (including editors) of Linux.com

17 Apr 2025 2:23pm GMT

feedLinuxiac

Ubuntu 25.04 (Plucky Puffin) Now Available, This Is What’s New

Ubuntu 25.04 (Plucky Puffin) Now Available, This Is What’s New

Ubuntu 25.04 is now available for download, featuring GNOME 48, Linux kernel 6.14, systemd 257, updated toolchains, and enhanced gaming and GPU support.

17 Apr 2025 2:02pm GMT

feedLXer Linux News

Ubuntu 25.04 “Plucky Puffin” Is Now Available for Download, This Is What’s New

Canonical published today the Ubuntu 25.04 (Plucky Puffin) release, the latest stable version of their popular GNU/Linux distribution, featuring up-to-date components and new features.

17 Apr 2025 1:31pm GMT

feedLinuxiac

LXQt 2.2 Released with Enhanced Wayland Support

LXQt 2.2 Released with Enhanced Wayland Support

LXQt 2.2 desktop environment introduces enhanced Wayland support, power profile switching, PCManFM-Qt improvements, and better terminal handling.

17 Apr 2025 11:08am GMT

KDE Gear 25.04 Apps Collection Rolls Out, Here’s What’s New

KDE Gear 25.04 Apps Collection Rolls Out, Here’s What’s New

The KDE Gear 25.04 app collection is now available for download, introducing big changes to Dolphin, Kdenlive, Itinerary, and more.

17 Apr 2025 9:40am GMT

feedLXer Linux News

Intels Newest Linux Driver Being Worked On For The Kernel: iXD

Intel open-source software engineers last week posted a set of patches for a new driver: iXD. The three letter acronym party continues and this time even more difficult to decipher than some of their other obscure driver names...

17 Apr 2025 7:22am GMT

Linux turns 25, with corporate contributors now key to its future

What's the weather in Hell today? Linus Torvalds has visited Microsoft at LinuxCon. On August 25, 1991, an unknown Finnish developer posted the following to the comp.os.minix newsgroup:…

17 Apr 2025 4:19am GMT

Google's brand new OS could replace Android

webOS, BeOS and Android heritageThe source code of Google's latest operating system has emerged, and it looks like all new code from the ground up.…

17 Apr 2025 2:48am GMT

Systemd adds filesystem mount tool

Linux mount gets a systemd wrapper. The developers behind Systemd, the alternative to sysvinit, have added a mount tool to their user space bootstrapper.…

17 Apr 2025 1:16am GMT

16 Apr 2025

feedLinuxiac

Tor Browser 14.5 Introduces Android Connection Assist

Tor Browser 14.5 Introduces Android Connection Assist

The latest Tor Browser adds Connection Assist to Android, making it easier for users in censored regions to connect securely with one tap.

16 Apr 2025 7:55pm GMT

NVIDIA Releases Linux Display Driver v575 Beta

NVIDIA Releases Linux Display Driver v575 Beta

NVIDIA releases Linux Display Driver 575.51.02 (BETA), enhancing Vulkan, GLX, and Xwayland support while fixing crashes in Minecraft and Marvel Rivals.

16 Apr 2025 2:43pm GMT

Deepin 23.1 Launches with Smarter AI, Enhanced Hardware Support

Deepin 23.1 Launches with Smarter AI, Enhanced Hardware Support

Debian-based Deepin 23.1 is out now with Linux kernels 6.6/6.12, smarter updates, improved hardware support, AI upgrades, and over 100 bug fixes.

16 Apr 2025 1:59pm GMT

TrueNAS 25.04 (Fangtooth) Open-Source NAS Released

TrueNAS 25.04 “Fangtooth” Open-Source NAS Released

TrueNAS 25.04 open-source NAS launches with ZFS fast deduplication, new API features, experimental Incus-based virtualization, and more.

16 Apr 2025 12:34pm GMT

15 Apr 2025

feedLinuxiac

WordPress 6.8 Released as a Performance-Focused Update

WordPress 6.8 Released as a Performance-Focused Update

WordPress 6.8 CMS, codenamed "Cecil" is out, featuring faster page loads, stronger password security, and a redesigned Style Book.

15 Apr 2025 7:06pm GMT

Fedora-based Ultramarine Linux 41 “Cyberia” Released

Fedora-based Ultramarine Linux 41 “Cyberia” Released

Ultramarine Linux 41 "Cyberia" lands with new installer images, a custom CLI tool, WSL support, and major updates across all editions.

15 Apr 2025 4:35pm GMT

18 Mar 2025

feedKernel Planet

Matthew Garrett: Failing upwards: the Twitter encrypted DM failure

Almost two years ago, Twitter launched encrypted direct messages. I wrote about their technical implementation at the time, and to the best of my knowledge nothing has changed. The short story is that the actual encryption primitives used are entirely normal and fine - messages are encrypted using AES, and the AES keys are exchanged via NIST P-256 elliptic curve asymmetric keys. The asymmetric keys are each associated with a specific device or browser owned by a user, so when you send a message to someone you encrypt the AES key with all of their asymmetric keys and then each device or browser can decrypt the message again. As long as the keys are managed appropriately, this is infeasible to break.

But how do you know what a user's keys are? I also wrote about this last year - key distribution is a hard problem. In the Twitter DM case, you ask Twitter's server, and if Twitter wants to intercept your messages they replace your key. The documentation for the feature basically admits this - if people with guns showed up there, they could very much compromise the protection in such a way that all future messages you sent were readable. It's also impossible to prove that they're not already doing this without every user verifying that the public keys Twitter hands out to other users correspond to the private keys they hold, something that Twitter provides no mechanism to do.

This isn't the only weakness in the implementation. Twitter may not be able read the messages, but every encrypted DM is sent through exactly the same infrastructure as the unencrypted ones, so Twitter can see the time a message was sent, who it was sent to, and roughly how big it was. And because pictures and other attachments in Twitter DMs aren't sent in-line but are instead replaced with links, the implementation would encrypt the links but not the attachments - this is "solved" by simply blocking attachments in encrypted DMs. There's no forward secrecy - if a key is compromised it allows access to not only all new messages created with that key, but also all previous messages. If you log out of Twitter the keys are still stored by the browser, so if you can potentially be extracted and used to decrypt your communications. And there's no group chat support at all, which is more a functional restriction than a conceptual one.

To be fair, these are hard problems to solve! Signal solves all of them, but Signal is the product of a large number of highly skilled experts in cryptography, and even so it's taken years to achieve all of this. When Elon announced the launch of encrypted DMs he indicated that new features would be developed quickly - he's since publicly mentioned the feature a grand total of once, in which he mentioned further feature development that just didn't happen. None of the limitations mentioned in the documentation have been addressed in the 22 months since the feature was launched.

Why? Well, it turns out that the feature was developed by a total of two engineers, neither of whom is still employed at Twitter. The tech lead for the feature was Christopher Stanley, who was actually a SpaceX employee at the time. Since then he's ended up at DOGE, where he apparently set off alarms when attempting to install Starlink, and who today is apparently being appointed to the board of Fannie Mae, a government-backed mortgage company.

Anyway. Use Signal.

comment count unavailable comments

18 Mar 2025 11:58pm GMT

Andi Kleen: Quitting an Intel x86 hypervisor

This is an esoteric topic that might be of interest to people implementing Intel hypervisors. It assumes you know the basics of the Intel virtualization architecture, see Hypervisor from scratch for a tutorial. The actual full VT architecture is described in Volume 3 of the Intel SDM

Let's say we write an x86 hypervisor that starts in the UEFI environment and virtualizes the initialization phase of an OS. But the hypervisor wants to eventually quit itself to not cause extra overhead during OS run time.

The way the hypervisor works is that it runs in its own memory and with its own page tables which are switched atomically on every VM exit by the VT-x implementation. This way it is isolated from the main OS.

At some vm exit with the hypervisor running in its own context it decides that it is not needed anymore and wants to quit. To disable VT support the VMXOFF instruction can be used. But what we really need is an atomic VMXOFF + switch to the original OS page tables plus a jump, and all that without using any registers which need to be already restored to the original state of the OS.

One trick is to use the MOV to CR3 instruction that reloads the page table as a jump. As soon as the page table is reloaded the CPU will fetch the next instruction with the translations from the freshly loaded page table, so we can transfer execution to the guest context. However to do that the MOV CR3 needs to be located just before the page offset of the target instruction. This can be done by copying a trampoline to the right page offset (potentially overlapping into the previous page). The trampoline is located in a special transfer page table mapping that places writable code pages overlapping the target mapping.

But there are some complications. The hypervisor also needs to load the segmentation state (like GDT/LDT/IDT) of the guest. In theory they could just be loaded by mapping these guest pages into the transfer mapping and loading them before the transfer. But what happens if the GDT/LDT/IDT is on the same page as the target address? This is common in real OS' assembler startup code which is implemented in a small assembler file without any page separation between code and data. One option would be to copy them to the transfer page too and load it there, or the hypervisor first copies them to a temporary buffer and loads it from there. In the second option the base addresses of these structures will be incorrect, but in practice you can often rely on them getting reloaded eventually anyways.

Another problem is the register state of the target. MOV to CR3 needs a register as the source of the reload, and it needs to be the last instruction of the trampoline. So it is impossible to restore the register it uses. But remember the hypervisor is doing this as the result of a VM exit. If we chose an exit for a condition that already clobbers a register we can use the same register for the reload and the next instruction executed in the original target (and which caused the exit originally) will just overwrite it again.

A very convenient instruction for this is CPUID. It is executed multiple times in OS startup and overwrites multiple registers. In fact VMX always intercepts CPUID so it has to handle these exits in any case. So the trick to quit an hypervisor is to wait for the next CPUID exit and then use one of the registers clobbered by CPUID for the final CR3 reload. This will have inconsistent register state for one instruction in the target, but unless the original OS is currently running a debugger it will never notice. In principle any exit as a result of an instruction that clobbers a register can be used for this.

There is another potential complication if the target address of the OS conflicts with where the hypervisor is running before entering the transfer mapping. The transfer mapping needs to map the original code so that it can be jumped to. This could be solved with a third auxiliary mapping that is used before jumping to the transfer trampoline. In practice it doesn't seem to be a problem because x86 OS typically run in a 1:1 mapping for startup, and that cannot conflict with the 1:1 mapping used by UEFI programs as our hypervisor.

Happy hypervisor hacking!

18 Mar 2025 9:34pm GMT

03 Mar 2025

feedKernel Planet

Lucas De Marchi: Lazy libkmod compression library loading

One new feature in kmod 34 is related to lazily loading the decompression libraries. In other words, dlopen them when/if they are needed. Although it's desired for a tool like modinfo to be able to inspect a .ko.xz, .ko.zst or .ko.gz module, other daemons linking to libkmod would benefit from never loading them. This is the case for systemd-udevd that will load the kernel modules during boot when they are requested by the kernel.

Testing on Archlinux, it goes from this:

$ cat /proc/$(pidof systemd-udevd)/maps | grep -e libz -e liblzma
7f27a9ae8000-7f27a9aeb000 r--p 00000000 00:19 15656                      /usr/lib/liblzma.so.5.6.4
7f27a9aeb000-7f27a9b0e000 r-xp 00003000 00:19 15656                      /usr/lib/liblzma.so.5.6.4
7f27a9b0e000-7f27a9b19000 r--p 00026000 00:19 15656                      /usr/lib/liblzma.so.5.6.4
7f27a9b19000-7f27a9b1a000 r--p 00031000 00:19 15656                      /usr/lib/liblzma.so.5.6.4
7f27a9b1a000-7f27a9b1b000 rw-p 00032000 00:19 15656                      /usr/lib/liblzma.so.5.6.4
7f27a9b1b000-7f27a9b27000 r--p 00000000 00:19 15985                      /usr/lib/libzstd.so.1.5.7
7f27a9b27000-7f27a9bed000 r-xp 0000c000 00:19 15985                      /usr/lib/libzstd.so.1.5.7
7f27a9bed000-7f27a9bfe000 r--p 000d2000 00:19 15985                      /usr/lib/libzstd.so.1.5.7
7f27a9bfe000-7f27a9bff000 r--p 000e3000 00:19 15985                      /usr/lib/libzstd.so.1.5.7
7f27a9bff000-7f27a9c00000 rw-p 000e4000 00:19 15985                      /usr/lib/libzstd.so.1.5.7
7f27aa892000-7f27aa895000 r--p 00000000 00:19 7852                       /usr/lib/libz.so.1.3.1
7f27aa895000-7f27aa8a3000 r-xp 00003000 00:19 7852                       /usr/lib/libz.so.1.3.1
7f27aa8a3000-7f27aa8a9000 r--p 00011000 00:19 7852                       /usr/lib/libz.so.1.3.1
7f27aa8a9000-7f27aa8aa000 r--p 00017000 00:19 7852                       /usr/lib/libz.so.1.3.1
7f27aa8aa000-7f27aa8ab000 rw-p 00018000 00:19 7852                       /usr/lib/libz.so.1.3.1
$

to this:

$ cat /proc/$(pidof systemd-udevd)/maps | grep -e libz -e liblzma
$

… even if all modules in Archlinux are zstd-compressed.

systemd itself started doing this a few releases ago and published https://systemd.io/ELF_PACKAGE_METADATA/. That spec is also used for this new support in kmod to annotate what libraries are possibly dlopen'ed. However although it prevented libkmod from being loaded in other binaries, it didn't prevent all these decompression libraries from being mapped in systemd-udevd since it uses libkmod.

One might wonder why not even libzstd.so is mapped. That's because when loading the modules and the kernel supports decompressing that format, libkmod just skips any decompression: it opens the file and passes the descriptor to the Linux kernel via finit_module. /sys/module/compression shows what algorithm the Linux kernel can use for module decompression.

In kmod 34 all that is needed is to setup the build with -Ddlopen=all. More fine-grained options are also supported, allowing to specify individual libraries to dlopen. It may go away in a future release if distros just choose an all-or-nothing support.

03 Mar 2025 6:00pm GMT

11 Jan 2025

feedKernel Planet

Pete Zaitcev: Looking for a BSSID

I'm looking for a name for a new WiFi area.

The current one is called "Tokyo-Jupiter". It turns out hard to top, it meets all the requirements. It's a geographic area. It's weeb, but from old enough times: not Naruto Shippuuden, Attack On Titan, or Kimetsu no Yaiba. Classy and unique enough.

"Konoha" is too new, too washed-up, and too short.

"Kodena" and "Yokosuka" add a patriotic American tint nicely, but also too short.

"Minas-Tirith" is a place and outstanding in its reference, but not weeb.

"Big-Sight" is an opposite of the above: too much. I'm a weeb, not otaku.

Any ideas are appreciated.

UPDATE 2025-01-11: The provisional candidate is "Nishi-Teppelin". Don't google it, it's not canon. I remain open to better ideas.

UPDATE 2025-02-20: Ended with "Ostrov-Krym" after all.

11 Jan 2025 1:42am GMT

02 Jan 2025

feedKernel Planet

Matthew Garrett: The GPU, not the TPM, is the root of hardware DRM

As part of their "Defective by Design" anti-DRM campaign, the FSF recently made the following claim:
Today, most of the major streaming media platforms utilize the TPM to decrypt media streams, forcefully placing the decryption out of the user's control (from here).
This is part of an overall argument that Microsoft's insistence that only hardware with a TPM can run Windows 11 is with the goal of aiding streaming companies in their attempt to ensure media can only be played in tightly constrained environments.

I'm going to be honest here and say that I don't know what Microsoft's actual motivation for requiring a TPM in Windows 11 is. I've been talking about TPM stuff for a long time. My job involves writing a lot of TPM code. I think having a TPM enables a number of worthwhile security features. Given the choice, I'd certainly pick a computer with a TPM. But in terms of whether it's of sufficient value to lock out Windows 11 on hardware with no TPM that would otherwise be able to run it? I'm not sure that's a worthwhile tradeoff.

What I can say is that the FSF's claim is just 100% wrong, and since this seems to be the sole basis of their overall claim about Microsoft's strategy here, the argument is pretty significantly undermined. I'm not aware of any streaming media platforms making use of TPMs in any way whatsoever. There is hardware DRM that the media companies use to restrict users, but it's not in the TPM - it's in the GPU.

Let's back up for a moment. There's multiple different DRM implementations, but the big three are Widevine (owned by Google, used on Android, Chromebooks, and some other embedded devices), Fairplay (Apple implementation, used for Mac and iOS), and Playready (Microsoft's implementation, used in Windows and some other hardware streaming devices and TVs). These generally implement several levels of functionality, depending on the capabilities of the device they're running on - this will range from all the DRM functionality being implemented in software up to the hardware path that will be discussed shortly. Streaming providers can choose what level of functionality and quality to provide based on the level implemented on the client device, and it's common for 4K and HDR content to be tied to hardware DRM. In any scenario, they stream encrypted content to the client and the DRM stack decrypts it before the compressed data can be decoded and played.

The "problem" with software DRM implementations is that the decrypted material is going to exist somewhere the OS can get at it at some point, making it possible for users to simply grab the decrypted stream, somewhat defeating the entire point. Vendors try to make this difficult by obfuscating their code as much as possible (and in some cases putting some of it in-kernel), but pretty much all software DRM is at least somewhat broken and copies of any new streaming media end up being available via Bittorrent pretty quickly after release. This is why higher quality media tends to be restricted to clients that implement hardware-based DRM.

The implementation of hardware-based DRM varies. On devices in the ARM world this is usually handled by performing the cryptography in a Trusted Execution Environment, or TEE. A TEE is an area where code can be executed without the OS having any insight into it at all, with ARM's TrustZone being an example of this. By putting the DRM code in TrustZone, the cryptography can be performed in RAM that the OS has no access to, making the scraping described earlier impossible. x86 has no well-specified TEE (Intel's SGX is an example, but is no longer implemented in consumer parts), so instead this tends to be handed off to the GPU. The exact details of this implementation are somewhat opaque - of the previously mentioned DRM implementations, only Playready does hardware DRM on x86, and I haven't found any public documentation of what drivers need to expose for this to work.

In any case, as part of the DRM handshake between the client and the streaming platform, encryption keys are negotiated with the key material being stored in the GPU or the TEE, inaccessible from the OS. Once decrypted, the material is decoded (again either on the GPU or in the TEE - even in implementations that use the TEE for the cryptography, the actual media decoding may happen on the GPU) and displayed. One key point is that the decoded video material is still stored in RAM that the OS has no access to, and the GPU composites it onto the outbound video stream (which is why if you take a screenshot of a browser playing a stream using hardware-based DRM you'll just see a black window - as far as the OS can see, there is only a black window there).

Now, TPMs are sometimes referred to as a TEE, and in a way they are. However, they're fixed function - you can't run arbitrary code on the TPM, you only have whatever functionality it provides. But TPMs do have the ability to decrypt data using keys that are tied to the TPM, so isn't this sufficient? Well, no. First, the TPM can't communicate with the GPU. The OS could push encrypted material to it, and it would get plaintext material back. But the entire point of this exercise was to avoid the decrypted version of the stream from ever being visible to the OS, so this would be pointless. And rather more fundamentally, TPMs are slow. I don't think there's a TPM on the market that could decrypt a 1080p stream in realtime, let alone a 4K one.

The FSF's focus on TPMs here is not only technically wrong, it's indicative of a failure to understand what's actually happening in the industry. While the FSF has been focusing on TPMs, GPU vendors have quietly deployed all of this technology without the FSF complaining at all. Microsoft has enthusiastically participated in making hardware DRM on Windows possible, and user freedoms have suffered as a result, but Playready hardware-based DRM works just fine on hardware that doesn't have a TPM and will continue to do so.

comment count unavailable comments

02 Jan 2025 1:14am GMT

12 Dec 2024

feedKernel Planet

Matthew Garrett: When should we require that firmware be free?

The distinction between hardware and software has historically been relatively easy to understand - hardware is the physical object that software runs on. This is made more complicated by the existence of programmable logic like FPGAs, but by and large things tend to fall into fairly neat categories if we're drawing that distinction.

Conversations usually become more complicated when we introduce firmware, but should they? According to Wikipedia, Firmware is software that provides low-level control of computing device hardware, and basically anything that's generally described as firmware certainly fits into the "software" side of the above hardware/software binary. From a software freedom perspective, this seems like something where the obvious answer to "Should this be free" is "yes", but it's worth thinking about why the answer is yes - the goal of free software isn't freedom for freedom's sake, but because the freedoms embodied in the Free Software Definition (and by proxy the DFSG) are grounded in real world practicalities.

How do these line up for firmware? Firmware can fit into two main classes - it can be something that's responsible for initialisation of the hardware (such as, historically, BIOS, which is involved in initialisation and boot and then largely irrelevant for runtime[1]) or it can be something that makes the hardware work at runtime (wifi card firmware being an obvious example). The role of free software in the latter case feels fairly intuitive, since the interface and functionality the hardware offers to the operating system is frequently largely defined by the firmware running on it. Your wifi chipset is, these days, largely a software defined radio, and what you can do with it is determined by what the firmware it's running allows you to do. Sometimes those restrictions may be required by law, but other times they're simply because the people writing the firmware aren't interested in supporting a feature - they may see no reason to allow raw radio packets to be provided to the OS, for instance. We also shouldn't ignore the fact that sufficiently complicated firmware exposed to untrusted input (as is the case in most wifi scenarios) may contain exploitable vulnerabilities allowing attackers to gain arbitrary code execution on the wifi chipset - and potentially use that as a way to gain control of the host OS (see this writeup for an example). Vendors being in a unique position to update that firmware means users may never receive security updates, leaving them with a choice between discarding hardware that otherwise works perfectly or leaving themselves vulnerable to known security issues.

But even the cases where firmware does nothing other than initialise the hardware cause problems. A lot of hardware has functionality controlled by registers that can be locked during the boot process. Vendor firmware may choose to disable (or, rather, never to enable) functionality that may be beneficial to a user, and then lock out the ability to reconfigure the hardware later. Without any ability to modify that firmware, the user lacks the freedom to choose what functionality their hardware makes available to them. Again, the ability to inspect this firmware and modify it has a distinct benefit to the user.

So, from a practical perspective, I think there's a strong argument that users would benefit from most (if not all) firmware being free software, and I don't think that's an especially controversial argument. So I think this is less of a philosophical discussion, and more of a strategic one - is spending time focused on ensuring firmware is free worthwhile, and if so what's an appropriate way of achieving this?

I think there's two consistent ways to view this. One is to view free firmware as desirable but not necessary. This approach basically argues that code that's running on hardware that isn't the main CPU would benefit from being free, in the same way that code running on a remote network service would benefit from being free, but that this is much less important than ensuring that all the code running in the context of the OS on the primary CPU is free. The maximalist position is not to compromise at all - all software on a system, whether it's running at boot or during runtime, and whether it's running on the primary CPU or any other component on the board, should be free.

Personally, I lean towards the former and think there's a reasonably coherent argument here. I think users would benefit from the ability to modify the code running on hardware that their OS talks to, in the same way that I think users would benefit from the ability to modify the code running on hardware the other side of a network link that their browser talks to. I also think that there's enough that remains to be done in terms of what's running on the host CPU that it's not worth having that fight yet. But I think the latter is absolutely intellectually consistent, and while I don't agree with it from a pragmatic perspective I think things would undeniably be better if we lived in that world.

This feels like a thing you'd expect the Free Software Foundation to have opinions on, and it does! There are two primarily relevant things - the Respects your Freedoms campaign focused on ensuring that certified hardware meets certain requirements (including around firmware), and the Free System Distribution Guidelines, which define a baseline for an OS to be considered free by the FSF (including requirements around firmware).

RYF requires that all software on a piece of hardware be free other than under one specific set of circumstances. If software runs on (a) a secondary processor and (b) within which software installation is not intended after the user obtains the product, then the software does not need to be free. (b) effectively means that the firmware has to be in ROM, since any runtime interface that allows the firmware to be loaded or updated is intended to allow software installation after the user obtains the product.

The Free System Distribution Guidelines require that all non-free firmware be removed from the OS before it can be considered free. The recommended mechanism to achieve this is via linux-libre, a project that produces tooling to remove anything that looks plausibly like a non-free firmware blob from the Linux source code, along with any incitement to the user to load firmware - including even removing suggestions to update CPU microcode in order to mitigate CPU vulnerabilities.

For hardware that requires non-free firmware to be loaded at runtime in order to work, linux-libre doesn't do anything to work around this - the hardware will simply not work. In this respect, linux-libre reduces the amount of non-free firmware running on a system in the same way that removing the hardware would. This presumably encourages users to purchase RYF compliant hardware.

But does that actually improve things? RYF doesn't require that a piece of hardware have no non-free firmware, it simply requires that any non-free firmware be hidden from the user. CPU microcode is an instructive example here. At the time of writing, every laptop listed here has an Intel CPU. Every Intel CPU has microcode in ROM, typically an early revision that is known to have many bugs. The expectation is that this microcode is updated in the field by either the firmware or the OS at boot time - the updated version is loaded into RAM on the CPU, and vanishes if power is cut. The combination of RYF and linux-libre doesn't reduce the amount of non-free code running inside the CPU, it just means that the user (a) is more likely to hit since-fixed bugs (including security ones!), and (b) has less guidance on how to avoid them.

As long as RYF permits hardware that makes use of non-free firmware I think it hurts more than it helps. In many cases users aren't guided away from non-free firmware - instead it's hidden away from them, leaving them less aware that their freedom is constrained. Linux-libre goes further, refusing to even inform the user that the non-free firmware that their hardware depends on can be upgraded to improve their security.

Out of sight shouldn't mean out of mind. If non-free firmware is a threat to user freedom then allowing it to exist in ROM doesn't do anything to solve that problem. And if it isn't a threat to user freedom, then what's the point of requiring linux-libre for a Linux distribution to be considered free by the FSF? We seem to have ended up in the worst case scenario, where nothing is being done to actually replace any of the non-free firmware running on people's systems and where users may even end up with a reduced awareness that the non-free firmware even exists.

[1] Yes yes SMM

comment count unavailable comments

12 Dec 2024 3:57pm GMT

Matthew Garrett: Android privacy improvements break key attestation

Sometimes you want to restrict access to something to a specific set of devices - for instance, you might want your corporate VPN to only be reachable from devices owned by your company. You can't really trust a device that self attests to its identity, for instance by reporting its MAC address or serial number, for a couple of reasons:

If we want a high degree of confidence that the device we're talking to really is the device it claims to be, we need something that's much harder to spoof. For devices with a TPM this is the TPM itself. Every TPM has an Endorsement Key (EK) that's associated with a certificate that chains back to the TPM manufacturer. By verifying that certificate path and having the TPM prove that it's in posession of the private half of the EK, we know that we're communicating with a genuine TPM[1].

Android has a broadly equivalent thing called ID Attestation. Android devices can generate a signed attestation that they have certain characteristics and identifiers, and this can be chained back to the manufacturer. Obviously providing signed proof of the device identifier is kind of problematic from a privacy perspective, so the short version[2] is that only apps installed using a corporate account rather than a normal user account are able to do this.

But that's still not ideal - the device identifiers involved included the IMEI and serial number of the device, and those could potentially be used to correlate devices across privacy boundaries since they're static[3] identifiers that are the same both inside a corporate work profile and in the normal user profile, and also remains static if you move between different employers and use the same phone[4]. So, since Android 12, ID Attestation includes an "Enterprise Specific ID" or ESID. The ESID is based on a hash of device-specific data plus the enterprise that the corporate work profile is associated with. If a device is enrolled with the same enterprise then this ID will remain static, if it's enrolled with a different enterprise it'll change, and it just doesn't exist outside the work profile at all. The other device identifiers are no longer exposed.

But device ID verification isn't enough to solve the underlying problem here. When we receive a device ID attestation we know that someone at the far end has posession of a device with that ID, but we don't know that that device is where the packets are originating. If our VPN simply has an API that asks for an attestation from a trusted device before routing packets, we could pass that on to said trusted device and then simply forward the attestation to the VPN server[5]. We need some way to prove that the the device trying to authenticate is actually that device.

The answer to this is key provenance attestation. If we can prove that an encryption key was generated on a trusted device, and that the private half of that key is stored in hardware and can't be exported, then using that key to establish a connection proves that we're actually communicating with a trusted device. TPMs are able to do this using the attestation keys generated in the Credential Activation process, giving us proof that a specific keypair was generated on a TPM that we've previously established is trusted.

Android again has an equivalent called Key Attestation. This doesn't quite work the same way as the TPM process - rather than being tied back to the same unique cryptographic identity, Android key attestation chains back through a separate cryptographic certificate chain but contains a statement about the device identity - including the IMEI and serial number. By comparing those to the values in the device ID attestation we know that the key is associated with a trusted device and we can now establish trust in that key.

"But Matthew", those of you who've been paying close attention may be saying, "Didn't Android 12 remove the IMEI and serial number from the device ID attestation?" And, well, congratulations, you were apparently paying more attention than Google. The key attestation no longer contains enough information to tie back to the device ID attestation, making it impossible to prove that a hardware-backed key is associated with a specific device ID attestation and its enterprise enrollment.

I don't think this was any sort of deliberate breakage, and it's probably more an example of shipping the org chart - my understanding is that device ID attestation and key attestation are implemented by different parts of the Android organisation and the impact of the ESID change (something that appears to be a legitimate improvement in privacy!) on key attestation was probably just not realised. But it's still a pain.

[1] Those of you paying attention may realise that what we're doing here is proving the identity of the TPM, not the identity of device it's associated with. Typically the TPM identity won't vary over the lifetime of the device, so having a one-time binding of those two identities (such as when a device is initially being provisioned) is sufficient. There's actually a spec for distributing Platform Certificates that allows device manufacturers to bind these together during manufacturing, but I last worked on those a few years back and don't know what the current state of the art there is

[2] Android has a bewildering array of different profile mechanisms, some of which are apparently deprecated, and I can never remember how any of this works, so you're not getting the long version

[3] Nominally, anyway. Cough.

[4] I wholeheartedly encourage people not to put work accounts on their personal phones, but I am a filthy hypocrite here

[5] Obviously if we have the ability to ask for attestation from a trusted device, we have access to a trusted device. Why not simply use the trusted device? The answer there may be that we've compromised one and want to do as little as possible on it in order to reduce the probability of triggering any sort of endpoint detection agent, or it may be because we want to run on a device with different security properties than those enforced on the trusted device.

comment count unavailable comments

12 Dec 2024 12:16pm GMT

30 Oct 2024

feedKernel Planet

Pete Zaitcev: virtio_pci: do not wait forvever at a reset

We all know how it's possible for a guest VM to access various host functions by accessing a PCI device, right? When KVM traps an access to this fake PCI, QEMU emulates the device, which allows packets sent, console updated, or whatever. This is called "virtio".

NVIDIA took it a step further: they have a real PCI device that emuilates QEMU. No joke. And, they have a firmware bug! The following patch works around it:

diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index 9193c30d640a..6bbb34f9b088 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -438,6 +438,7 @@ static void vp_reset(struct virtio_device *vdev)
 {
        struct virtio_pci_device *vp_dev = to_vp_device(vdev);
        struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
+       int i;
 
        /* 0 status means a reset. */
        vp_modern_set_status(mdev, 0);
@@ -446,8 +447,16 @@ static void vp_reset(struct virtio_device *vdev)
         * This will flush out the status write, and flush in device writes,
         * including MSI-X interrupts, if any.
         */
-       while (vp_modern_get_status(mdev))
+       i = 0;
+       while (vp_modern_get_status(mdev)) {
+               if (++i >= 10000) {
+                       printk(KERN_INFO
+                              "virtio reset ignoring status 0x%02x\n",
+                              vp_modern_get_status(mdev));
+                       break;
+               }
                msleep(1);
+       }
 
        vp_modern_avq_cleanup(vdev);
 

I'm not dumping on NVIDIA here at all, I think it's awesome for this devious hardware to exist. And bugs are just a way of life.

30 Oct 2024 5:58pm GMT

Pete Zaitcev: LinkedIn Asked You To Train Their AI

They pushed the "You're one of a few experts invited to answer" notifications for a long time - maybe a year, I don't remember. When I had enough and started to capture them with the intent of mockery, they stopped. So sad. Here's what I got:

"You're facing pushback from vendors on cloud integration. How can you convince them to collaborate?"

"You're focused on cutting costs in cloud computing. How do you ensure security protocols aren't compromised?"

"You're overseeing a code review process. How do you ensure feedback boosts developer morale?"

What a dystopia. LinkedIn is owned by Microsoft, so I'm not suprised someone in a giant corporation thought this sort of nonsense was a good idea. But still, the future is stupid, and all that.

P.S. The notification inserts were non-persistent - inserted on the fly. That was just fraud w.r.t. the idea of notification ticker.

P.P.S. Does anyone else think that this sort of thing would cause self-selection? They made their AI trained by the most vain and also least bright members of their user population. I'm not an expert in any of these fields.

UPDATE 2024-10-31: Spoke too soon! They hit me with the notificantion insert: "Here's how you can craft a personalized learning plan for advancing in Cloud Computing." That is not even a formed question. Getting lazy, are we?

UPDATE 2024-11-02: "You're facing budget disputes over cloud solutions. How can you align IT and non-technical teams effectively?" They are not stopping.

Meanwhile, how about another perspective: I saw an update that Hubbert Smith contributed an answer to: "You're facing a ransomware attack crisis. How do you convey the severity to a non-technical executive?" Instead of answering what LinkedIn AI asked, he answered a question of how to deal with ransomware ("Ransomware is fixable with snapshots of sensitive data."). Unless he is an AI himself, he may be thinking that he's dealing with a LinkedIn equivalent of Quora.

I'm trying to ask him what happened.

30 Oct 2024 5:17pm GMT

28 Oct 2024

feedKernel Planet

Brendan Gregg: AI Flame Graphs

Imagine halving the resource costs of AI and what that could mean for the planet and the industry -- based on extreme estimates such savings could reduce the total US power usage by over 10% by 20301. At Intel we've been creating a new analyzer tool to help reduce AI costs called AI Flame Graphs: a visualization that shows an AI accelerator or GPU hardware profile along with the full software stack, based on my CPU flame graphs. Our first version is available to customers in the Intel Tiber AI Cloud as a preview for the Intel Data Center GPU Max Series (previously called Ponte Vecchio). Here is an example:


Simple example: SYCL matrix multiply microbenchmark

(Click for interactive SVG.) The green frames are the actual instructions running on the AI or GPU accelerator, aqua shows the source code for these functions, and red (C), yellow (C++), and orange (kernel) show the CPU code paths that initiated these AI/GPU programs. The gray "-" frames just help highlight the boundary between CPU and AI/GPU code. The x-axis is proportional to cost, so you look for the widest things and find ways to reduce them.


Layers

This flame graph shows a simple program for SYCL (a high-level C++ language for accelerators) that tests three implementations of matrix multiply, running them with the same input workload. The flame graph is dominated by the slowest implementation, multiply_basic(), which doesn't use any optimizations and consumes at 72% of stall samples and is shown as the widest tower. On the right are two thin towers for multiply_local_access() at 21% which replaces the accessor with a local variable, and multiply_local_access_and_tiling() at 6% which also adds matrix tiling. The towers are getting smaller as optimizations are added.

This flame graph profiler is a prototype based on Intel EU stall profiling for hardware profiling and eBPF for software instrumentation. It's designed to be easy and low-overhead, just like a CPU profiler. You should be able to generate a flame graph of an existing AI workload whenever you want, without having to restart anything or launch additional code via an interposer.

Instruction-offset Profiling

This is not the first project to build an AI profiler or even something called an AI Flame Graph, however, others I've seen focus on tracing CPU stacks and timing accelerator execution, but don't profile the instruction offsets running on the accelerator; or do profile them but via expensive binary instrumentation. I wanted to build AI flame graphs that work like CPU flame graphs: Easy to use, negligible cost, production safe, and shows everything. A daily tool for developers, with most of the visualization in the language of the developer: source code functions.

This has been an internal AI project at Intel for the past year. Intel was already investing in this space, building the EU stall profiler capability for the Intel Data Center GPU Max Series that provides an approximation of HW instruction sampling. I was lucky to have Dr. Matthew (Ben) Olson, an Intel AI engineer who has also worked on eBPF performance tooling (processwatch) as well as memory management research, join my team and do most of the development work. His background has helped us power through difficulties that seemed insurmountable. We've also recently been joined by Dr. Brandon Kammerdiener (coincidentally another graduate of the University of Tennessee, like Ben), who also has eBPF and memory internals experience, and has been helping us take on harder and harder workloads. And Gabriel Muñoz just joined today to help with releases. Now that our small team has shown that this is possible, we'll be joined by other teams at Intel to develop this further.

We could have built a harder-to-use and higher-overhead version months ago using Intel GTPin but for widespread adoption it needs minimal overhead and ease of use so that developers don't hesitate to use this daily and to add it to deployment pipelines.

What's a Flame Graph?

A flame graph is a visualization I invented in 2011 for showing sampled code stack traces. It has become the standard for CPU profiling and analysis, helping developers quickly find performance improvements and eliminate regressions. A CPU flame graph shows the "big picture" of running software, with x-axis proportional to CPU cost. The example picture on the right summarizes how easy it can be to go from compute costs to responsible code paths. Prior to flame graphs, it could take hours to understand a complex profile by reading through hundreds of pages of output. Now it takes seconds: all you have to do is look for the widest rectangles.

Flame graphs have had worldwide adoption. They have been the basis for five startups so far, have been adopted in over thirty performance analysis products, and have had over eighty implementations.

My first implementation of flame graphs took a few hours on a Wednesday night after work. The real effort has been in the decade since, where I worked with different profilers, runtimes, libraries, kernels, compilers, and hypervisors to get flame graphs working properly in different environments, including fixing stack walking and symbolization. Earlier this year I posted about the final missing piece: Helping distros enable frame pointers so that profiling works across standard system libraries.

Similar work is necessary for AI workloads: fixing stacks and symbols and getting profiling to work for different hardware, kernel drivers, user-mode drivers, frameworks, runtimes, languages, and models. A lot more work, too, as AI analysis has less maturity than CPU analysis.

Searching Samples

If you are new to flame graphs, it's worth highlighting the built-in search capability. In the earlier example, most of the stall samples are caused by sbid: software scoreboard dependency. As that may be a unique search term, you can run search (Ctrl-F, or click "Search") on "sbid" and it will highlight it in magenta:

Search also shows the total number of stack samples that contained sbid in the bottom right: 78.4%. You can search for any term in the flame graph: accelerator instructions, source paths, function names, etc., to quickly calculate the percentage of stacks where it is present (excluding vertical overlap) helping you prioritise performance work.

Note that the samples are EU stall-based, which means theoretical performance wins can take the percentages down to zero. This is different to timer-based samples as are typically used in CPU profiling. Stalls mean you better focus on the pain, the parts of the code that aren't making forward progress, but you aren't seeing resource usage by unstalled instructions. I'd like to supuport timer-based samples in the future as well, so we can have both views.

Who will use this?

At a recent golang conference, I asked the audience of 200+ to raise their hands if they were using CPU flame graphs. Almost every hand went up. I know of companies where flame graphs are a daily tool that developers use to understand and tune their code, reducing compute costs. This will become a daily tool for AI developers.

My employer will use this as well for evaluation analysis, to find areas to tune to beat competitors, as well as to better understand workload performance to aid design.

Why is AI profiling hard?

Consider CPU instruction profiling: This is easy when the program and symbol table are both in the file system and in a standardized file format (such as ELF) as is the case with native compiled code (C). CPU profiling gets hard for JIT-complied code, like Java, as instructions and symbols are dynamically generated and placed in main memory (the process heap) without following a universal standard. For such JITted code we use runtime-specific methods and agents to retrieve snapshots of the heap information, which is different for each runtime.

AI workloads also have different runtimes (and frameworks, languages, user-mode drivers, compilers, etc.) any of which can require special tinkering to get their CPU stacks and symbols to work. These CPU stacks are shown as the red, orange, and yellow frames in the AI Flame Graph. Some AI workloads are easy to get these frames working, some (like PyTorch) are a lot more work.

But the real challenge is instruction profiling of actual GPU and AI accelerator programs -- shown as the aqua and green frames -- and correctly associating them with the CPU stacks beneath them. Not only may these GPU and AI programs not exist in the file system, but they may not even exist in main memory! Even for running programs. Once execution begins, they may be deallocated from main memory and only exist in special accelerator memory, beyond the direct reach of OS profilers and debuggers. Or within reach, but only through a prohibitively high-overhead HW-specific debugger interface.

There's also no /proc representation for these programs either (I've been proposing building an equivalent) so there's no direct way to even tell what is running and what isn't, and all the other /proc details. Forget instruction profiling, even ps(1) and all the other process tools do not work.

It's been a mind-bending experience, revealing what gets taken for granted because it has existed in CPU land for decades: A process table. Process tools. Standard file formats. Programs that exist in the file system. Programs running from main memory. Debuggers. Profiliers. Core dumping. Disassembling. Single stepping. Static and dynamic instrumentation. Etc. For GPUs and AI, this is all far less mature. It can make the work exciting at times, when you think something is impossible and then find or devise a way.

Fortunately we have a head start as some things do exist. Depending on the runtime and kernel driver, there are debug interfaces where you can list running accelerator programs and other statistics, as used by tools like intel_gpu_top(1). You can kill -9 a GPU workload using intel_gpu_abrt(1). Some interfaces can even generate basic ELF files for the running accelerator programs that you can try to load in a debugger like gdb(1). And there is support for GPU/AI program disassembly, if you can get your hands on the binary. It feels to me like GPU/AI debugging, OS style, is about two years old. Better than zero, but still early on, and lots more ahead of us. A decade, at least.

What do AI developers think of this?

We've shown AI Flame Graphs to other AI developers at Intel and a common reaction is to be a bit puzzled, wondering what to do with it. AI developers think about their bit of code, but with AI Flame Graphs they can now see the entire stack for the first time, including the HW, and many layers they don't usually think about or don't know about. It basically looks like a pile of gibberish with their code only a small part of the flame graph.


CPU Flame Graph Implementations

This reaction is similar to people's first experiences with CPU flame graphs, which show parts of the system that developers and engineers typically don't work on, such as runtime internals, system libraries, and kernel internals. Flame graphs are great at highlighting the dozen or so functions that matter the most, so it becomes a problem of learning what those functions do across a few different code bases, which are typically open source. Understanding a dozen such functions can take a few hours or even a few days -- but if this leads to a 10% or 2x cost win, it is time well spent. And the next time the user looks at a flame graph, they start saying "I've seen that function before" and so on. You can get to the point where understanding the bulk of a CPU flame graph takes less than a minute: look for the widest tower, click to zoom, read the frames, done.

I'm encouraged by the success of CPU flame graphs, with over 80 implementations and countless real world case studies. Sometimes I'm browsing a performance issue I care about on github and hit page down and there's a CPU flame graph. They are everywhere.

I expect AI developers will also be able to understand AI Flame Graphs in less than a minute, but to start with people will be spending a day or more browsing code bases they didn't know were involved. Publishing case studies of found wins will also help people learn how to interpret them, and also help explain the value.

What about PyTorch?

Another common reaction we've had is that AI developers are using PyTorch, and initially we didn't support it as it meant walking Python stacks, which isn't trivial. But prior work has been done there (to support CPU profiling) and after a lot of tinkering we now have the first PyTorch AI Flame Graph:


PyTorch frames in pink

(Click for interactive SVG.) The PyTorch functions are at the bottom and are colored pink. This example runs oneDNN kernels that are JIT-generated, and don't have a source path so that layer just reads "jit". Getting all other the layers included was a real pain to get going, but an important milestone. We think if we can do PyTorch we can do anything.

In this flame graph, we show PyTorch running the Llama 2 7B model using the Intel Extensions for PyTorch (IPEX). This flame graph shows the origin of the GPU kernel execution all the way back to the Python source code shown in pink. Most samples are from a stack leading up to a gemm_kernel (matrix multiply) shown in aqua, which like the previous example has many stalls due to software scoreboarding.

There are two instructions here (0xa30 and 0xa90) that combined are 27% of the entire profile. I expect someone will ask: Can't we just click on instructions and have it bring up a dissassembly view with full source? Yes, that should be possible, but I can't answer how we're going to provide this yet. Another expected question I can't yet answer: Since there are now multiple products providing AI auto-tuning of CPU workloads using CPU flame graphs (including Intel Granulate) can't we have AI auto-tuning of AI workloads using AI Flame Graphs?

First Release: Sometimes hard and with moderate overhead

Getting AI Flame Graphs to work with some workloads is easy, but others are currently hard and cost moderate overhead. It's similar to CPU profiling, where some workloads and languages are easy to profile, whereas others need various things fixed. Some AI workloads use many software dependencies that need various tweaks and recompilation (e.g., enabling frame pointers so that stack walking works) making setup time consuming. PyTorch is especially difficult and can take over a week of OS work to be ready for AI Flame Graphs. We will work on getting these tweaks changed upstream in their respective repositories, something involving teams inside and outside of Intel, and is a process I'd expect to take at least a year. During that time AI workloads will gradually become easier to flame graph, and with lower-overhead as well.

I'm reminded of eBPF in the early days: You had to patch and recompile the kernel and LLVM and Clang, which could take multiple days if you hit errors. Since then all the eBPF dependency patches have been merged, and default settings changed, so that eBPF "just works." We'll get there with AI Flame Graphs too, but right now it's still those early days.

The changes necessary for AI Flame Graphs are really about improving debugging in general, and are a requirement for Fast by Friday: A vision where we can root-cause analyze anything in five days or less.

Availability

AI Flame Graphs will first become available on the Intel Tiber AI Cloud as a preview feature for the Intel Data Center GPU Max Series. If you are currently deployed there you can ask through the Intel service channel for early access. As for if or when it will support other hardware types, be in other Intel products, be officially launched, be open source, etc., these involve various other teams at Intel and they need to make their own announcements before I can discuss them here.

Conclusions

Finding performance improvements for AI data centers of just fractions of a percent can add up to planetary savings in electricity, water, and money. If AI flame graphs have the success that CPU flame graphs have had, I'd expect finding improvements of over 10% will be common, and 50% and higher will eventually be found*. But it won't be easy in these early days as there are still many software components to tweak and recompile, and software layers to learn about that are revealed in the AI flame graph.

In the years ahead I imagine others will build their own AI flame graphs that look the same as this one, and there may even be startups selling them, but if they use more difficult-to-use and higher-overhead technologies I fear they could turn companies off the idea of AI flame graphs altogether and prevent them from finding sorely needed wins. This is too important to do badly. AI flame graphs should be easy to use, cost negligible overhead, be production safe, and show everything. Intel has proven it's possible.

Disclaimer

* This is a personal blog post that makes personal predictions but not guarantees of possible performance improvements. Feel free to take any claim with a grain of salt, and feel free to wait for an official publication and public launch by Intel on this technology.

1 Based on halving the Arm CEO Rene Haas' estimate of 20-25% quoted in Taking a closer look at AI's supposed energy apocalypse by Kyle Orland of ArsTechnica.

Thanks

Thanks to everyone at Intel who have helped us make this happen. Markus Flierl has driven this project and made it a top priority, and Greg Lavender has expressed his support. Special thanks to Michael Cole, Matthew Roper, Luis Strano, Rodrigo Vivi, Joonas Lahtinen, Stanley Gambarin, Timothy Bauer, Brandon Yates, Maria Kraynyuk, Denis Samoylov, Krzysztof Raszknowski, Sanchit Jain, Po-Yu Chen, Felix Degrood, Piotr Rozenfeld, Andi Kleen, and all of the other coworkers that helped clear things up for us, and thanks in advance for everyone else who will be helping us in the months ahead.

My final thanks is to the companies and developers who do the actual hands-on work with flame graphs, collecting them, examining them, finding performance wins, and applying them.
You are helping save the planet.

28 Oct 2024 1:00pm GMT

23 Oct 2024

feedKernel Planet

Harald Welte: On Linux MAINTAINERS file removal of Russian developers

I sincerely regret to see Linux kernel patches like this one removing Russian developers from the MAINTAINERS file. To me, it is a sign or maybe even a symbol of how far the Linux kernel developer community I remember from ~ 20 years ago has changed, and how much it has alienated itself from what I remember back in the day.

In my opinion this commit is wrong at so many different levels:

A later post in the thread has clarified that it's about an U.S. embargo list against certain Russian individuals / companies. It is news to me that the MAINTAINERS file was usually containing Companies or that the Linux kernel development is Companies engaging with each other. I was under the naive assumption that it's individual developers who work together, and their employers do not really matter. Contributions are judged by their merit, and not by the author or their employer / affiliation. In the super unlikely case that indeed those individual developers removed from the MAINTAINERS file would be personally listed in the embargo list: Then yes, of course, I agree, they'd have to be removed. But then the commit log should of course point to [the version] of that list and explicitly mention that they were personally listed there.

And no, I am of course not a friend of the Russian government at all. They are committing war crimes, no doubt about it. But since when has the collaboration of individual developers in an open source project been something related to actions completely unrelated to those individuals? Should I as a German developer be excluded due to the track record of Germany having started two world wars killing millions? Should Americans be excluded due to a very extensive track record of violating international law? Should we exclude Palestinians? Israelis? Syrians? Iranians? [In case it's not obvious: Those are rhetorical questions, my position is of course no to all of them].

I just think there's nothing more wrong than discriminating against people just because of their passport, their employer or their place of residence. Maybe it's my German upbringing/socialization, but we've had multiple times in our history where the concept of **Sippenhaft** (kin liability) existed. In those dark ages of history you could be prosecuted for crimes committed by other family members.

Now of course removal from the MAINTAINERS file or any other exclusion from the Linux kernel development process is of course not in any way comparable to prosecution like imprisonment or execution. However, the principle seems the same: An individual is punished for mere association with some others who happen to be committing crimes.

Now if there really was a compelling legal argument for this (I doubt it, but let's assume for a second there is): In that case I'd expect a broad discussion against it; a reluctance to comply with it; a search for a way to circumvent said legal requirement; a petition or political movement against that requirement.

Even if there was absolutely no way around performing such a "removal of names": At the very least I'd expect some civil disobedience by at least then introducing a statement into the file that one would have hoped to still be listing those individuals as co-maintainers but one was forced by [regulation, court order, ...] to remove them.

But the least I would expect is for senior Kernel developers to simply do apply the patch with a one-sentence commit log message and thereby disrespect the work of said [presumed] Russian developers. All that does is to alienate individuals of the developer community. Not just those who are subject to said treatment today, but any others who see this sad example how Linux developers treat each other and feel discouraged from becoming or remaining active in a community with such behaviour.

It literally hurts me personally to see this happening. It's like a kick in the gut. I used to be proud about having had an involvement with the Linux kernel community in a previous life. This doesn't feel like the community I remember being part of.

23 Oct 2024 4:00pm GMT

22 Oct 2024

feedKernel Planet

Harald Welte: Oral history transcripts: Pioneers of Taiwans Chip + PC industry

During the preparation of my current brief visit to Taiwan, I've more or less by coincidence stumbled on several transcripts of oral history interviews with pioneers of the Taiwanese Chip and PC industry (click on the individual transcripts in the Related Records section at the bottom). They have been recorded, transcribed and translated in 2011 by the Computer History Museum under funding from the National Science Council, Taiwan, R.O.C..

As some of you know, I've been spending a lot of time in recent years researching (and practically exploring + re-implementing) historical telecommunications with my retronetworking project.

Retrocomputing itself is not my main focus. I usually feel there's more than enough people operating, repairing, documenting at least many older computers, as well as keeping archives of related software and continuing to spread knowledge on how they operated. Nevertheless, it is a very interesting topic - I just decided that with my limited spare time I want to focus on retro-communications which is under-explored and under-represented.

What's equally important than keeping the old technology alive, is keeping the knowledge around its creation alive. How did it happen that certain technologies were created and became successful or not? How where they key people behind it? etc.

Given my personal history with Taiwan during the last 18 years, it's actually surprising I haven't yet given thought on how or where the history of the Taiwanese IT industry is documented or kept alive. So far I didn't know of any computer museums that would focus especially on the Taiwanese developments. It didn't even occur to me to even check if there are any.

During my work in Taiwan I've had the chance to briefly meet a few senior people at FIC (large mainboard maker that made many PC mainboards I personally used) and both at VIA (chipset + CPU maker). But I didn't ever have a chance to talk about the history.

In any case, I now found those transcripts of interviews. And what a trove of interesting first-hand information they are! If you have an interest in computer history, and want to understand how it came about that Taiwan became such a major player in either the PC industry or in the semiconductor design + manufacturing, then I believe those transcripts are a "must read".

Now they've made me interested to learn more. I have little hope of many books being published on that subject, particularly in a Language I can read (i.e. English, not mandarin Chinese). But I shall research that subject. I'd also be interested to hear about any other information, like collections of historical artifacts, archives, libraries, etc. So in the unlikely case anybody reading this has some pointers on information about the history of the Taiwanese Chip and Computer history, please by all means do reach out and share!.

Once I have sufficiently prepared myself in reading whatever I can find in terms of written materials, I might be tempted to try to reach out and see if I can find some first-hand witnesses who'd want to share their stories on a future trip to Taiwan...

22 Oct 2024 4:00pm GMT

Harald Welte: Back to Taiwan the first time after 5 years

Some of the readers of this blog know that I have a very special relationship with Taiwan. As a teenager, it was the magical far-away country that built most of the PC components in all my PCs since my first 286-16 I got in 1989. Around 2006-2008 I had the very unexpected opportunity to work in Taiwan for some time (mainly for Openmoko, later some consulting for VIA). During that time I have always felt most welcome in and fascinated by the small island nation who managed to turn themselves into a high-tech development and manufacturing site for ever more complex electronics. And who managed to evolve from decades of military dictatorship and turn into a true democracy - all the while being discriminated by pretty much all of the countries around the world, as everybody wanted to benefit from cheap manufacturing in mainland China and hence expel democratic Taiwan from the united nations in favour of communist mainland Chine.

I have the deepest admiration for Taiwan to manage all of their economic success and progress in terms of democracy and freedom despite the political situation across the Taiwan strait, and despite everything that comes along with it. May they continue to have the chance of continuing their path.

Setting economy, society and politics behind: On a more personal level I've enjoyed their culinary marvels from excellent dumplings around every street corner to niu rou mien (beef noodle soup) to ma la huo guo (spicy hot pot). Plus then the natural beauty, particularly of the rural mountainous regions once you leave the densely populated areas around the coast line and the plains of the north west.

While working in Taiwan in 2006/2007 I decided to buy a motorbike. Using that bike I've first made humble day trips and later (once I was no longer busy with stressful work at Openmoko) multiple week-long road trips around the island, riding on virtually any passable road you can find. My typical routing algorithm is "take the smallest possible road from A to B".

So even after concluding my work in Taiwan, I returned again and again for holidays, each one with more road trips. For some time, Taiwan had literally become my second home. I had my favorite restaurants, shops, as well as some places around the rural parts of the Island I cam back to several times. I even managed to take up some mandarin classes, something I never had the time for while doing [more than] full time work. To my big regret, it's still very humble beginner level; I guess had I not co-started a company (sysmocom) in Berlin in 2011, I'd have spent more time for a more serious story.

In any case, I have nothing but the fondest memory of Taiwan. My frequent visits cam to a forcible halt with the COVID-19 pandemic, Taiwan was in full isolation in 2020/21, and even irrespective of government regulations, I've been very cautious about travel and contact. Plus of course, there's always the bad conscience of frequent intercontinental air travel.

Originally I was planning to finally go on an extended Taiwan holiday in Summer 2024, but then the island was hit by a relatively serious earthquake in April, affecting particularly many of the remote mountain regions that are of main interest to me. There are some roads that I'd have wanted to ride ever since 2008, but which had been closed every successive year when I went there, due to years of reconstructions after [mostly landslides following] earthquakes and typhoons. So I decided to postpone it for another year to 2025.

However, in an unexpected change of faith, the opportunity arose to give the opening Keyonte at the 2024 Open Compliance Summit in Japan, and along with that the opportunity to do a stop-over in Taiwan. It will just be a few days of Taipei this time (no motorbike trips), but I'm very much looking forward to being back in the city I probably know second or third-best on the planet (after Berlin, my home for 23 years, as well as Nuernberg, my place of birth). Let's see what is still the same and what has changed during the past 5 years!

22 Oct 2024 4:00pm GMT

10 Oct 2024

feedKernel Planet

Paul E. Mc Kenney: Parallel Programming: Cooperation

First, let me paraphrase something from my LiveJournal profile: These posts are my own, and in particular do not necessarily reflect my employer's positions, strategies, or opinions.

With that said, some say that the current geopolitical outlook is grim. And far be it from me to minimize the present-day geopolitical problems, nor am I at all interested in comparing them to their counterparts in the "good old days". But neither do I wish to obsess on these problems. I will instead call attention to a few instances of global cooperation, current and past.

Last month, NASA's oldest active astronaut traveled to Kazakhstan's Baikonur Cosmodrome, entered a Soyuz capsule atop a Roscosmos rocket and flew to the International Space Station. For me, this is especially inspiring: If he can do that at age 69, I should certainly be able to continue doing my much less demanding job for many years to come.

Some decades ago, during the Cold War, I purchased an English translation of Gradshteyn's and Ryzhik's classic "Table of Integrals, Series, and Products". Although computer-algebra systems have largely replaced this book, I have used it within the past few years and I used it heavily in the 1980s and early 1990s. Thus, along with many others, I am indebted to the longstanding Russian tradition of excellence in mathematics.

So just this past month, I was happy to receive hard copies of "Параллельное программирование - так ли это сложно?", which is a Russian translation of "Is Parallel Programming Hard, And, If So, What Can You Do About It?" I would like to think that this might be a down payment on my aforementioned debt.

Many other countries have also made many excellent contributions to mathematics, science, and technology. For example, the smartphone that I used hails from South Korea. And earlier this year, SeongJae (SJ) Park completed a Korean translation of the Second Edition of "Is Parallel Programming Hard, And, If So, What Can You Do About It?"

Returning to rocketry, China started working with rockets in the 1200s, if not earlier, and has made a great deal of more recent progress in a wide variety of fields. And rumor has it that a Chinese translation of the Second Edition will be appearing shortly.

So if you tried reading this book, but the English got in the way, you now have two other options and hopefully soon a third! But what if you want a fourth option? Then you, too, can do a translation! Just send me a translated chapter and I will add it to the list in the book's FAQ.txt file.


10 Oct 2024 5:16pm GMT

06 Oct 2024

feedKernel Planet

Pete Zaitcev: Adventures in proprietary software, Solidworks edition

Because FreeCAD was such a disaster for me, I started looking at crazy solutions, like exporting STEP from OpenSCAD. I even stooped to looking at proprietary alternatives. First on the runway was SolidWorks. If it's good for Mark Serbu, surely it's good for me, right?

The first thing I found, you cannot tap your card and download. You have to contact a partner representative - never a good sign. The representative quoted me for untold thousands. I'm not going to post the amount, I'm sure they vary it every time, like small shop owners who vary prices according to the race of the shopper.

In addition, they spam like you would not believe. First you have to unsubscribe from the partner, next from community.3ds.com, next from draftsight.3ds.com, and so on. Eventually, you'll get absolutely random spam, you try to unsubscribe, and they just continue and spam. Fortunately, I used a one-time address, and I killed it. Phew.

06 Oct 2024 4:39pm GMT

04 Oct 2024

feedKernel Planet

Dave Airlie (blogspot): zinking the video

A few years ago Mike and I discussed adding video support to zink, so that we could provide vaapi on top of vulkan video implementations.

This of course got onto a long TODO list and we nerdsniped each other into moving it along, this past couple of weeks we finally dragged it over the line.

This MR adds initial support for zink video decode on top of Vulkan Video. It provides vaapi support. Currently it only support H264 decode, but I've implemented AV1 decode and I've played around a bit with H264 encode. I think adding H265 decode shouldn't be too horrible.

I've tested this mainly on radv, and a bit on anv (but there are some problems I should dig into).


04 Oct 2024 1:00am GMT