17 Nov 2025

feedplanet.freedesktop.org

Lennart Poettering: Mastodon Stories for systemd v258

Already on Sep 17 we released systemd v258 into the wild.

In the weeks leading up to that release I have posted a series of serieses of posts to Mastodon about key new features in this release, under the #systemd258 hash tag. It was my intention to post a link list here on this blog right after completing that series, but I simply forgot! Hence, in case you aren't using Mastodon, but would like to read up, here's a list of all 37 posts:

I intend to do a similar series of serieses of posts for the next systemd release (v259), hence if you haven't left tech Twitter for Mastodon yet, now is the opportunity.

We intend to shorten the release cycle a bit for the future, and in fact managed to tag v259-rc1 already yesterday, just 2 months after v258. Hence, my series for v259 will begin soon, under the #systemd259 hash tag.

In case you are interested, here is the corresponding blog story for systemd v257, and here for v256.

17 Nov 2025 11:00pm GMT

15 Nov 2025

feedplanet.freedesktop.org

Simon Ser: Status update, November 2025

Hi!

This month a lot of new features have added to the Goguma mobile IRC client. Hubert Hirtz has implemented drafts so that unsent text gets saved and network disconnections don't disrupt users typing a message. He also enabled replying to one's own messages, changed the appearance of short messages containing only emoji, upgraded our emoji library to Unicode version 16, fixed some linkifier bugs and added unit tests.

Markus Cisler has added a new option in the message menu to show a user's profile. I've added an on-disk cache for images (with our own implementation, because the widely used cached_network_image package is heavyweight). I've been working on displaying network icons and blocking users, but that work is not finished yet. I've also contributed some maintenance fixes for our webcrypto.dart dependency (toolkit upgrades and CI fixes).

The soju IRC bouncer has also got some love this month. delthas has contributed support for labeled-response for soju clients, allowing more reliable matching of server replies with client commands. I've introduced a new icon directive to configure an image representing the bouncer. soju v0.10.0 has been released, followed by soju v0.10.1 including bug fixes from Karel Balej and Taavi Väänänen.

In Wayland news, wlroots v0.19.2 and v0.18.3 have been released thanks to Simon Zeni. I've added support for the color-representation protocol for the Vulkan renderer, allowing clients to configure the color encoding and range for YCbCr content. Félix Poisot has been hard at work with more color management patches: screen default color primaries are now extracted from the EDID and exposed to compositors, the cursor is now correctly converted to the output's primaries and transfer function, and some work-in-progress patches switch the renderer API from a descriptive model to a prescriptive model.

go-webdav v0.7.0 has been released with a patch from prasad83 to play well with Thunderbird. I've updated clients to make multi-status errors non-fatal, returning partial data alongside the error.

I've released drm_info v2.9.0 with improvements mentioned in the previous status update plus support for the TILE connector property.

See you next month!

15 Nov 2025 10:00pm GMT

10 Nov 2025

feedplanet.freedesktop.org

Dave Airlie (blogspot): a tale of vulkan/nouveau/nvk/zink/mutter + deadlocks

I had a bug appear in my email recently which led me down a rabbit hole, and I'm going to share it for future people wondering why we can't have nice things.

Bug:

1. Get an intel/nvidia (newer than Turing) laptop.

2. Log in to GNOME on Fedora 42/43

3. Hotplug a HDMI port that is connected to the NVIDIA GPU.

4. Desktop stops working.

My initial reproduction got me a hung mutter process with a nice backtrace which pointed at the Vulkan Mesa device selection layer, trying to talk to the wayland compositor to ask it what the default device is. The problem was the process was the wayland compositor, and how was this ever supposed to work. The Vulkan device selection was called because zink called EnumeratePhysicalDevices, and zink was being loaded because we recently switched to it as the OpenGL driver for newer NVIDIA GPUs.

I looked into zink and the device select layer code, and low and behold someone has hacked around this badly already, and probably wrongly and I've no idea what the code does, because I think there is at least one logic bug in it. Nice things can't be had because hacks were done instead of just solving the problem.

The hacks in place ensured under certain circumstances involving zink/xwayland that the device select code to probe the window system was disabled, due to deadlocks seen. I'd no idea if more hacks were going to help, so I decided to step back and try and work out better.

The first question I had is why WAYLAND_DISPLAY is set inside the compositor process, it is, and if it wasn't I would never hit this. It's pretty likely on the initial compositor start this env var isn't set, so the problem only becomes apparent when the compositor gets a hotplugged GPU output, and goes to load the OpenGL driver, zink, which enumerates and hits device select with env var set and deadlocks.

I wasn't going to figure out a way around WAYLAND_DISPLAY being set at this point, so I leave the above question as an exercise for mutter devs.

How do I fix it?

Attempt 1:

At the point where zink is loading in mesa for this case, we have the file descriptor of the GPU device that we want to load a driver for. We don't actually need to enumerate all the physical devices, we could just find the ones for that fd. There is no API for this in Vulkan. I wrote an initial proof of concept instance extensions call VK_MESA_enumerate_devices_fd. I wrote initial loader code to play with it, and wrote zink code to use it. Because this is a new instance API, device-select will also ignore it. However this ran into a big problem in the Vulkan loader. The loader is designed around some internals that PhysicalDevices will enumerate in similiar ways, and it has to trampoline PhysicalDevice handles to underlying driver pointers so that if an app enumerates once, and enumerates again later, the PhysicalDevice handles remain consistent for the first user. There is a lot of code, and I've no idea how hotplug GPUs might fail in such situations. I couldn't find a decent path forward without knowing a lot more about the Vulkan loader. I believe this is the proper solution, as we know the fd, we should be able to get things without doing a full enumeration then picking the answer using the fd info. I've asked Vulkan WG to take a look at this, but I still need to fix the bug.

Attempt 2:

Maybe I can just turn off device selection, like the current hacks do, but in a better manner. Enter VK_EXT_layer_settings. This extensions allows layers to expose a layer setting in the instance creation. I can have the device select layer expose a setting which says don't touch this instance. Then in the zink code where we have a file descriptor being passed in and create an instance, we set the layer setting to avoid device selection. This seems to work but it has some caveats, I need to consider, but I think should be fine.

zink uses a single VkInstance for it's device screen. This is shared between all pipe_screens. Now I think this is fine inside a compositor, since we shouldn't ever be loading zink via the non-fd path, and I hope for most use cases it will work fine, better than the current hacks and better than some other ideas we threw around. The code for this is in [1].

What else might be affected:

If you have a vulkan compositor, it might be worth setting the layer setting if the mesa device select layer is loaded, esp if you set the DISPLAY/WAYLAND_DISPLAY and do any sort of hotplug later. You might be safe if you EnumeratePhysicalDevices early enough, the reason it's a big problem in mutter is it doesn't use Vulkan, it uses OpenGL and we only enumerate Vulkan physical devices at runtime through zink, never at startup.

AMD and NVIDIA I think have proprietary device selection layers, these might also deadlock in similiar ways, I think we've seen some wierd deadlocks in NVIDIA driver enumerations as well that might be a similiar problem.

[1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38252

10 Nov 2025 3:16am GMT

04 Nov 2025

feedplanet.freedesktop.org

Sebastian Wick: Flatpak Happenings

Yesterday I released Flatpak 1.17.0. It is the first version of the unstable 1.17 series and the first release in 6 months. There are a few things which didn't make it for this release, which is why I'm planning to do another unstable release rather soon, and then a stable release still this year.

Back at LAS this year I talked about the Future of Flatpak and I started with the grim situation the project found itself in: Flatpak was stagnant, the maintainers left the project and PRs didn't get reviewed.

Some good news: things are a bit better now. I have taken over maintenance, Alex Larsson and Owen Taylor managed to set aside enough time to make this happen and Boudhayan Bhattcharya (bbhtt) and Adrian Vovk also got more involved. The backlog has been reduced considerably and new PRs get reviewed in a reasonable time frame.

I also listed a number of improvements that we had planned, and we made progress on most of them:

Besides the changes directly in Flatpak, there are a lot of other things happening around the wider ecosystem:

What I have also talked about at my LAS talk is the idea of a Flatpak-Next project. People got excited about this, but I feel like I have to make something very clear:

If we redid Flatpak now, it would not be significantly better than the current Flatpak! You could still not do nested sandboxing, you would still need a D-Bus proxy, you would still have a complex permission system, and so on.

Those problems require work outside of Flatpak, but have to integrate with Flatpak and Flatpak-Next in the future. Some of the things we will be doing include:

So if you're excited about Flatpak-Next, help us to improve the Flatpak ecosystem and make Flatpak-Next more feasible!

04 Nov 2025 8:28pm GMT

03 Nov 2025

feedplanet.freedesktop.org

Melissa Wen: Kworkflow at Kernel Recipes 2025

Franks drawing of Melissa Wen with Kernel Recipes mascots around

This was the first year I attended Kernel Recipes and I have nothing but say how much I enjoyed it and how grateful I'm for the opportunity to talk more about kworkflow to very experienced kernel developers. What I mostly like about Kernel Recipes is its intimate format, with only one track and many moments to get closer to experts and people that you commonly talk online during your whole year.

In the beginning of this year, I gave the talk Don't let your motivation go, save time with kworkflow at FOSDEM, introducing kworkflow to a more diversified audience, with different levels of involvement in the Linux kernel development.

At this year's Kernel Recipes I presented the second talk of the first day: Kworkflow - mix & match kernel recipes end-to-end.

The Kernel Recipes audience is a bit different from FOSDEM, with mostly long-term kernel developers, so I decided to just go directly to the point. I showed kworkflow being part of the daily life of a typical kernel developer from the local setup to install a custom kernel in different target machines to the point of sending and applying patches to/from the mailing list. In short, I showed how to mix and match kernel workflow recipes end-to-end.

As I was a bit fast when showing some features during my presentation, in this blog post I explain each slide from my speaker notes. You can see a summary of this presentation in the Kernel Recipe Live Blog Day 1: morning.


Introduction

First slide: Kworkflow by Melissa Wen

Hi, I'm Melissa Wen from Igalia. As we already started sharing kernel recipes and even more is coming in the next three days, in this presentation I'll talk about kworkflow: a cookbook to mix & match kernel recipes end-to-end.

Second slide: About Melissa Wen, the speaker of this talk

This is my first time attending Kernel Recipes, so lemme introduce myself briefly.

Slide 3: and what's this cookbook called Kwokflow? - with kworkflow logo and KR penguin

And what's this cookbook called kworkflow?

Kworkflow (kw)

Slide 4: text below

Kworkflow is a tool created by Rodrigo Siqueira, my colleague at Igalia. It's a single platform that combines software and tools to:

Slide 5: kworkflow is mostly a voluntary work

It's mostly done by volunteers, kernel developers using their spare time. Its features cover real use cases according to kernel developer needs.

Slide 6: Mix & Match the daily life of a kernel developer

Basically it's mixing and matching the daily life of a typical kernel developer with kernel workflow recipes with some secret sauces.

First recipe: A good GPU driver for my AMD laptop

Slide 7: Let's prepare our first recipe

So, it's time to start the first recipe: A good GPU driver for my AMD laptop.

Slide 8: Ingredients and Tools

Before starting any recipe we need to check the necessary ingredients and tools. So, let's check what you have at home.

With kworkflow, you can use:

Slide 9: kw device and kw remote

Slide 11: kw config

Slide 13: kw kernel-config-manager

Slide 15: Preparation

Now, with all ingredients and tools selected and well portioned, follow the right steps to prepare your custom kernel!

First step: Mix ingredients with kw build or just kw b

Slide 16: kw build

Second step: Bake it with kw deploy or just kw d

Slide 18: kw deploy

After compiling the custom kernel, we want to install it in the target machine. Check the name of the custom kernel built: 6.17.0-rc6 and with kw s SSH access the target machine and see it's running the kernel from the Debian distribution 6.16.7+deb14-amd64.

As with building settings, you can also pre-configure some deployment settings, such as compression type, path to device tree binaries, target machine (remote, local, vm), if you want to reboot the target machine just after deploying your custom kernel, and if you want to boot in the custom kernel when restarting the system after deployment.

If you didn't pre-configured some options, you can still customize as a command option, for example: kw d --reboot will reboot the system after deployment, even if I didn't set this in my preference.

With just running kw d --reboot I have installed the kernel in a given target machine and rebooted it. So when accessing the system again I can see it was booted in my custom kernel.

Third step: Time to taste with kw debug

Slide 20: kw debug

Cooking Problems

Slide 22: kw patch-hub

Oh no! That custom kernel isn't tasting good. Don't worry, as in many recipes preparations, we can search on the internet to find suggestions on how to make it tasteful, alternative ingredients and other flavours according to your taste.

With kw patch-hub you can search on the lore kernel mailing list for possible patches that can fix your kernel issue. You can navigate in the mailing lists, check series, bookmark it if you find it relevant and apply it in your local kernel tree, creating a different branch for tasting… oops, for testing. In this example, I'm opening the amd-gfx mailing list where I can find contributions related to the AMD GPU driver, bookmark and/or just apply the series to my work tree and with kw bd I can compile & install the custom kernel with this possible bug fix in one shot.

As I changed my kw config to reboot after deployment, I just need to wait for the system to boot to try again unloading the amdgpu driver with kw debug --dmesg --cm=modprobe -r amdgpu. From the dmesg output retrieved by kw for this command, the driver was unloaded, the problem is fixed by this series and the kernel tastes good now.

If I'm satisfied with the solution, I can even use kw patch-hub to access the bookmarked series and marking the checkbox that will reply the patch thread with a Reviewed-by tag for me.

Second Recipe: Raspberry Pi 4 with Upstream Kernel

Slide 25: Second Recipe RPi 4 with upstream kernel

As in all recipes, we need ingredients and tools, but with kworkflow you can get everything set as when changing scenarios in a TV show. We can use kw env to change to a different environment with all kw and kernel configuration set and also with the latest compiled kernel cached.

I was preparing the first recipe for a x86 AMD laptop and with kw env --use RPI_64 I use the same worktree but moved to a different kernel workflow, now for Raspberry Pi 4 64 bits. The previous compiled kernel 6.17.0-rc6-mainline+ is there with 1266 modules, not the 6.17.0-rc6 kernel with 285 modules that I just built&deployed. kw build settings are also different, now I'm targeting a arm64 architecture with a cross-compiled kernel using aarch64-linu-gnu- cross-compilation tool and my kernel image calls kernel8 now.

Slide 27: kw env

If you didn't plan for this recipe in advance, don't worry. You can create a new environment with kw env --create RPI_64_V2 and run kw init --template to start preparing your kernel recipe with the mirepoix ready.

I mean, with the basic ingredients already cut…

I mean, with the kw configuration set from a template.

And you can use kw remote to set the IP address of your target machine and kw kernel-config-manager to fetch/retrieve the .config file from your target machine. So just run kw bd to compile and install a upstream kernel for Raspberry Pi 4.

Third Recipe: The Mainline Kernel Ringing on my Steam Deck (Live Demo)

Slide 30: Third Recipe - The Mainline Kernel Ringing on my Steam Deck

Let's show you how easy is to build, install and test a custom kernel for Steam Deck with Kworkflow. It's a live demo, but I also recorded it because I know the risks I'm exposed to and something can go very wrong just because of reasons :)

Report: how was the live demo

For this live demo, I took my OLED Steam Deck to the stage. I explained that, if I boot mainline kernel on this device, there is no audio. So I turned it on and booted the mainline kernel I've installed beforehand. It was clear that there was no typical Steam Deck startup audio when the system was loaded.

Franks drawing of Melissa Wen doing a demo of kworkflow with the Steam Deck

As I started the demo in the kw environment for Raspberry Pi 4, I first moved to another environment previously used for Steam Deck. In this STEAMDECK environment, the mainline kernel was already compiled and cached, and all settings for accessing the target machine, compiling and installing a custom kernel were retrieved automatically.

My live demo followed these steps:

  1. With kw env --use STEAMDECK, switch to a kworkflow environment for Steam Deck kernel development.

  2. With kw b -i, shows that kw will compile and install a kernel with 285 modules named 6.17.0-rc6-mainline-for-deck.

  3. Run kw config to show that, in this environment, kw configuration changes to x86 architecture and without cross-compilation.

  4. Run kw device to display information about the Steam Deck device, i.e. the target machine. It also proves that the remote access - user and IP - for this Steam Deck was already configured when using the STEAMDECK environment, as expected.

  5. Using git am, as usual, apply a hot fix on top of the mainline kernel. This hot fix makes the audio play again on Steam Deck.

  6. With kw b, build the kernel with the audio change. It will be fast because we are only compiling the affected files since everything was previously done and cached. Compiled kernel, kw configuration and kernel configuration is retrieved by just moving to the "STEAMDECK" environment.

  7. Run kw d --force --reboot to deploy the new custom kernel to the target machine. The --force option enables us to install the mainline kernel even if mkinitcpio complains about missing support for downstream packages when generating initramfs. The --reboot option makes the device reboot the Steam Deck automatically, just after the deployment completion.

  8. After finishing deployment, the Steam Deck will reboot on the new custom kernel version and made a clear resonant or vibrating sound. [Hopefully]

Finally, I showed to the audience that, if I wanted to send this patch upstream, I just needed to run kw send-patch and kw would automatically add subsystem maintainers, reviewers and mailing lists for the affected files as recipients, and send the patch to the upstream community assessment. As I didn't want to create unnecessary noise, I just did a dry-run with kw send-patch -s --simulate to explain how it looks.

What else can kworkflow already mix & match?

In this presentation, I showed that kworkflow supported different kernel development workflows, i.e., multiple distributions, different bootloaders and architectures, different target machines, different debugging tools and automatize your kernel development routines best practices, from development environment setup and verifying a custom kernel in bare-metal to sending contributions upstream following the contributions-by-e-mail structure. I exemplified it with three different target machines: my ordinary x86 AMD laptop with Debian, Raspberry Pi 4 with arm64 Raspbian (cross-compilation) and the Steam Deck with SteamOS (x86 Arch-based OS). Besides those distributions, Kworkflow also supports Ubuntu, Fedora and PopOS.

Now it's your turn: Do you have any secret recipes to share? Please share with us via kworkflow.


Useful links

03 Nov 2025 9:30pm GMT

31 Oct 2025

feedplanet.freedesktop.org

Mike Blumenkrantz: Hibernate On

Take A Break

We've reached Q4 of another year, and after the mad scramble that has been crunch-time over the past few weeks, it's time for SGC to once again retire into a deep, restful sleep.

2025 saw a lot of ground covered:

It's been a real roller coaster ride of a year as always, but I can say authoritatively that fans of the blog, you need to take care of yourselves. You need to use this break time wisely. Rest. Recover. Train your bodies. Travel and broaden your horizons. Invest in night classes to expand your minds.

You are not prepared for the insanity that will be this blog in 2026.

31 Oct 2025 12:00am GMT

27 Oct 2025

feedplanet.freedesktop.org

Mike Blumenkrantz: Apitrace Goes Vroom

First Time

Today marks the first post of a type that I've wanted to have for a long while: a guest post. There are lots of graphics developers who work on cool stuff and don't want to waste time setting up blogs, but with enough cajoling they will write a single blog post. If you're out there thinking you just did some awesome work and you want the world to know the grimy, gory details, let me know.

The first victimrecipient of this honor is an individual famous for small and extremely sane endeavors such as descriptor buffers in Lavapipe, ray tracing in Lavapipe, and sparse support in Lavapipe. Also wrangling ray tracing for RADV.

Below is the debut blog post by none other than Konstantin Seurer.

What is apitrace?

Apitrace is a powerful tool for capturing and replaying traces of GL and DX applications. The problem is that it is not really suitable for performance testing. This blog post is about implementing a faster method for replaying traces.

About six weeks ago, Mike asked me if I wanted to work on this.

[6:58:58 pm] <zmike> on the topic of traces
[6:59:08 pm] <zmike> I have a longer-term project that could use your expertise
[6:59:19 pm] <zmike> it's low work but high complexity
[7:00:12 pm] <zmike> specifically I would like apitrace to be able to competently output C code from traces and to have this functionality merged upstream

low work

Sure. clueless.png

The state of glretrace

This first obvious step was measuring how glretrace currently performs. Mike kindly provided a couple of traces from his personal collection, and I immediately timed a trace of the only relevant OpenGL game:

$ time ./glretrace -b minecraft-perf.trace
/Users/Cortex/Downloads/graalvm-jdk-23.0.1+11.1/bin/java
Rendered 1261 frames in 10.4269 secs, average of 120.937 fps

real    0m10.554s
user    0m12.938s
sys     0m2.712s

This looks fine, but I have no idea how fast the application is supposed to run. Running the same trace with perf reveals that there is room for improvement.

trace_parse_time.png

2/3 of frametime is spent parsing the trace.

Implementation

An apitrace trace stores API call information in an object-oriented style. This makes basic codegen really easy because the objects map directly to the generated C/C++ code. However, not all API calls are made equal, and the countless special cases that I needed to handle are what made this project take so long.

glretrace has custom implementations for WSI API calls, and it would be a shame not to use them. The easiest way of doing that is generating a shared library instead of an executable and having glretrace load it. The shared library can then provide a bunch of callbacks for the call sequences we can do codegen for and Call objects for everything else.

Besides WSI, there are also arguments and return values that need special treatment. OpenGL allows the application to create all kinds of objects that are represented using IDs. Those IDs are assigned by the driver, and they can be different during replay. glretrace remaps them using std::maps which have non-trivial overhead. I initially did that as well for the codegen to get things up and running, but it is actually possible to emit global variables and have most of the remapping run logic during codegen.

Data streaming

With the main replay overhead being taken care of, a major amount of replay time is now spent loading texture and buffer data. In large traces, there can also be >10GiB of data, so loading everything upfront is not an option. I decided to create one thread for reading the data file and nproc decompression threads. The read thread will wait if enough data has been loaded to limit memory usage. Decompression threads are needed because decompression is slower than reading the compressed data.

codegen in action

The results speak for themselves:

$ ./glretrace --generate-c minecraft-perf minecraft-perf.trace
/Users/Cortex/Downloads/graalvm-jdk-23.0.1+11.1/bin/java
Rendered 0 frames in 79.4072 secs, average of 0 fps
$ cd minecraft-perf
$ echo "Invoke the superior build tool"
$ meson build --buildtype release
$ ninja -Cbuild
$ time ../glretrace build/minecraft-perf.so
info: Opening 'minecraft-perf.so'... (0.00668795 secs)
warning: Waited 0.0461142 secs for data (sequence = 19)
Rendered 1261 frames in 5.19587 secs, average of 242.693 fps

real    0m5.415s
user    0m5.429s
sys     0m4.983s

Nice.

Looking at perf most CPU time is now spent in driver code or streaming binary data for stuff like textures on a separate thread.

result_perf.png

If you are interested in trying this out yourself, feel free to build the upstream PR and report on bugs unintended features. It would also be nice to have DX support in the future, but that will be something for the dxvk developers unless I need something to procrastinate from doing RT work.

- Konstantin

27 Oct 2025 12:00am GMT

15 Oct 2025

feedplanet.freedesktop.org

Simon Ser: Status update, October 2025

Hi!

I skipped last month's status update because I hadn't collected a lot of interesting updates and I've dedicated my time to writing an announcement for the first vali release.

Earlier this month, I've taken the train to Vienna to attend XDC 2025. The conference was great, I really enjoyed discussing face-to-face with open-source graphics folks I usually only interact with online, and meeting new awesome people! Since I'm part of the X.Org Foundation board, it was also nice to see the fruit of our efforts. Many thanks to all organizers!

XDC 2025 main room

We've discussed many interesting topics: a new API for 2D acceleration hardware, adapting the Wayland linux-dmabuf protocol to better support multiple GPUs, some ways to address current Wayback limitations, ideas to improve libliftoff, Vulkan use in Wayland compositors, and a lot more.

On the wlroots side, I've worked on a patch to fallback to the renderer to apply gamma LUTs when the KMS driver doesn't support them (this also paves the way for applying color transforms in KMS). Félix Poisot has updated wlroots to support the gamma 2.2 transfer function and use it by default. llyyr has added support for the BT.1886 transfer function and fixed direct scanout for client using the gamma 2.2 transfer function.

I've sent a patch to add support for DisplayID v2 CTA-861 data blocks, required for handling some HDR screens. I've reviewed and merged a bunch of gamescope patches to avoid protocol errors with the color management protocol, fix nested mode under a Vulkan compositor, fix a crash on VT switch and modernize dependencies.

I've worked a bit on drm_info too. I've added a JSON schema to describe the shape of the JSON objects, made it so EDIDs are included in the JSON output as base64-encoded strings, and added the EDID make/model/serial + bus info to the pretty-printed output.

delthas has added soju support for user metadata, introduced a new work-in-progress metadata key to block users, and made it so soju cancels Web Push notifications if a client marks a message as read (to avoid opening notifications for a very short time when actively chatting with another user). Markus Cisler has revamped Goguma's message bubbles: they look much better now!

See you next month!

15 Oct 2025 10:00pm GMT

10 Oct 2025

feedplanet.freedesktop.org

Sebastian Wick: SO_PEERPIDFD Gets More Useful

A while ago I wrote about the limited usefulness of SO_PEERPIDFD. for authenticating sandboxed applications. The core problem was simple: while pidfds gave us a race-free way to identify a process, we still had no standardized way to figure out what that process actually was - which sandbox it ran in, what application it represented, or what permissions it should have.

The situation has improved considerably since then.

cgroup xattrs

Cgroups now support user extended attributes. This feature allows arbitrary metadata to be attached to cgroup inodes using standard xattr calls.

We can change flatpak (or snap, or any other container engine) to create a cgroup for application instances it launches, and attach metadata to it using xattrs. This metadata can include the sandboxing engine, application ID, instance ID, and any other information the compositor or D-Bus service might need.

Every process belongs to a cgroup, and you can query which cgroup a process belongs to through its pidfd - completely race-free.

Standardized Authentication

Remember the complexity from the original post? Services had to implement different lookup mechanisms for different sandbox technologies:

All of this goes away. Now there's a single path:

  1. Accept a connection on a socket
  2. Use SO_PEERPIDFD to get a pidfd for the client
  3. Query the client's cgroup using the pidfd
  4. Read the cgroup's user xattrs to get the sandbox metadata

This works the same way regardless of which sandbox engine launched the application.

A Kernel Feature, Not a systemd One

It's worth emphasizing: cgroups are a Linux kernel feature. They have no dependency on systemd or any other userspace component. Any process can manage cgroups and attach xattrs to them. The process only needs appropriate permissions and is restricted to a subtree determined by the cgroup namespace it is in. This makes the approach universally applicable across different init systems and distributions.

To support non-Linux systems, we might even be able to abstract away the cgroup details, by providing a varlink service to register and query running applications. On Linux, this service would use cgroups and xattrs internally.

Replacing Socket-Per-App

The old approach - creating dedicated wayland, D-Bus, etc. sockets for each app instance and attaching metadata to the service which gets mapped to connections on that socket - can now be retired. The pidfd + cgroup xattr approach is simpler: one standardized lookup path instead of mounting special sockets. It works everywhere: any service can authenticate any client without special socket setup. And it's more flexible: metadata can be updated after process creation if needed.

For compositor and D-Bus service developers, this means you can finally implement proper sandboxed client authentication without needing to understand the internals of every container engine. For sandbox developers, it means you have a standardized way to communicate application identity without implementing custom socket mounting schemes.

10 Oct 2025 3:04pm GMT

09 Oct 2025

feedplanet.freedesktop.org

Mike Blumenkrantz: Mesh Shaders In The Current Year

It Happened.

Just a quick post to confirm that the OpenGL/ES Working Group has signed off on the release of GL_EXT_mesh_shader.

Credits

This is a monumental release, the largest extension shipped for GL this decade, and the culmination of many, many months of work by AMD. In particular we all need to thank Qiang Yu (AMD), who spearheaded this initiative and did the vast majority of the work both in writing the specification and doing the core mesa implementation. Shihao Wang (AMD) took on the difficult task of writing actual CTS cases (not mandatory for EXT extensions in GL, so this is a huge benefit to the ecosystem).

Big thanks to both of you, and everyone else behind the scenes at AMD, for making this happen.

Also we have to thank the nvidium project and its author, Cortex, for single-handedly pushing the industry forward through the power of Minecraft modding. Stay sane out there.

Support

Minecraft mod support is already underway, so expect that to happen "soon".

The bones of this extension have already been merged into mesa over the past couple months. I opened a MR to enable zink support this morning since I have already merged the implementation.

Currently, I'm planning to wait until either just before the branch point next week or until RadeonSI merges its support to merge the zink MR. This is out of respect: Qiang Yu did a huge lift for everyone here, and ideally AMD's driver should be the first to be able to advertise that extension to reflect that. But the branchpoint is coming up in a week, and SGC will be going into hibernation at the end of the month until 2026, so this offer does have an expiration date.

In any case, we're done here.

09 Oct 2025 12:00am GMT

03 Oct 2025

feedplanet.freedesktop.org

Simon Ser: Announcing vali, a C library for Varlink

In the past months I've been working on vali, a C library for Varlink. Today I'm publishing the first vali release! I'd like to explain how to use it for readers who aren't especially familiar with Varlink, and describe some interesting API design decisions.

What is Varlink anyways?

Varlink is a very simple Remote Procedure Call (RPC) protocol. Clients can call methods exposed by services (ie, servers). To call a method, a client sends a JSON object with its name and parameters over a Unix socket. To reply to a call, a service sends a JSON object with response parameters. That's it.

Here's an example request with a bar parameter containing an integer:

{
      "method": "org.example.service.Foo",
      "parameters": {
              "bar": 42
      }
}

And here's an example response with a baz parameter containing a list of strings:

{
      "parameters": {
              "baz": ["hello", "world"]
      }
}

Varlink also supports calls with no reply or with multiple replies, but let's leave this out of the picture for simplicity's sake.

Varlink services can describe the methods they implement with an interface definition file.

method Foo(bar: int) -> (baz: []string)

Coming from the Wayland world, I love generating code from specification files. This removes all of the manual encoding/decoding boilerplate and is more type-safe. Unfortunately the official libvarlink library doesn't support code generation (and is not actively maintained anymore), so I've decided to write my own. vali is the result!

vali without code generation

To better understand the benefits of code generation and vali design decisions, let's take a minute to have a look at what usage without code generation looks like.

A client first needs to connect via vali_client_connect_unix(), then call vali_client_call() with a JSON object containing input parameters. It'll get back a JSON object containing output parameters, which needs to be parsed.

struct vali_client *client = vali_client_connect_unix("/run/org.example.service");
if (client == NULL) {
      fprintf(stderr, "Failed to connect to service\n");
      exit(1);
}

struct json_object *in = json_object_new_object();
json_object_object_add(in, "bar", json_object_new_int(42));
struct json_object *out = NULL;
if (!vali_client_call(client, "org.example.service.Foo", in, &out, NULL)) {
      fprintf(stderr, "Foo request failed\n");
      exit(1);
}

struct json_object *baz = json_object_object_get(out, "baz");
for (size_t i = 0; i < json_object_array_length(baz); i++) {
      struct json_object *item = json_object_array_get_idx(baz, i);
      printf("%s\n", json_object_get_string(item));
}

This is a fair amount of boilerplate. In case of a type mismatch, the client will silently print nothing, which isn't ideal.

The last parameter of vali_client_call() is an optional struct vali_error *: if set to a non-NULL pointer and the service replies with an error, the struct is populated, otherwise it's zero'ed out:

struct vali_error err;
if (!vali_client_call(client, "org.example.service.Foo", in, &out, &err)) {
      if (err.name != NULL) {
              fprintf(stderr, "Foo request failed: %s\n", err.name);
      } else {
              fprintf(stderr, "Foo request failed: internal error\n");
      }
      vali_error_finish(&err);
      exit(1);
}

How does the service side look like? A service first calls vali_service_create() to initialize a fresh service, defines a callback to be invoked when a Varlink call is performed by a client via vali_service_set_call_handler(), and sets up a Unix socket via vali_service_listen_unix(). Let's demonstrate how a service accesses a shared state by printing the number of calls done so far when the callback is invoked. The callback needs to end the call with vali_service_call_close_with_reply().

void handle_call(struct vali_service_call *call, void *user_data) {
      int *call_count_ptr = user_data;
      (*call_count_ptr)++;
      printf("Received %d-th client call\n", *call_count_ptr);

      struct json_object *baz = json_object_new_array();
      json_object_array_add(baz, json_object_new_string("hello"));
      json_object_array_add(baz, json_object_new_string("world"));
      struct json_object *params = json_object_new_object();
      json_object_object_add(params, "baz", baz);
      vali_service_call_close_with_reply(call, params);
}

int main(int argc, void *argv[]) {
      int call_count = 0;
      struct vali_service_call_handler handler = {
              .func = handle_call,
              .user_data = &call_count,
      };

      struct vali_service *service = vali_service_create();
      vali_service_set_call_handler(service, &handler);
      vali_service_listen_unix(service, "/run/org.example.ftl");
      while (vali_service_dispatch(service));
      return 0;
}

In a prior iteration of the API, the callback would return the reply JSON object. This got changed to vali_service_call_close_with_reply() so that services can handle a call asynchronously. If a service needs some time to reply (e.g. because it needs to send data over a network, or perform a long computation), it can give back control to its event loop so that other clients are not blocked, and later call vali_service_call_close_with_reply() from another callback.

Why bundle the callback and the user data pointer together in a struct, rather than pass them as two separate parameters to vali_service_set_call_handler()? The answer is two-fold:

This design makes wrapping a handler much easier (to create middlewares and routers, more on that below). This all might sound familiar to folks who've written an HTTP server: indeed, struct vali_service_call_handler is inspired from Go's net/http.Handler.

Client side with code generation

Given the method definition from the article introduction, vali generates the following client function:

struct example_Foo_in {
      int bar;
};

struct example_Foo_out {
      char **baz;
      size_t baz_len;
};

bool example_Foo(struct vali_client *client,
      const struct example_Foo_in *in, struct example_Foo *out,
      struct vali_error *err);

It can be used this way to send the JSON request we've seen earlier:

struct vali_client *client = vali_client_connect_unix("/run/org.example.service");
if (client == NULL) {
      fprintf(stderr, "Failed to connect to service\n");
      exit(1);
}

const struct example_Foo_in in = {
      .bar = 42,
};
struct example_Foo_out out;
if (!example_Foo(client, &in, &out, NULL)) {
      fprintf(stderr, "Foo request failed\n");
      exit(1);
}

for (size_t i = 0; i < out.baz_len; i++) {
      printf("%s\n", out.baz[i]);
}

example_Foo_out_finish(&out);

Why does vali generates these twin structs, one for input parameters and the other for output parameters, instead of passing all parameters as function arguments? This does make calls slightly more verbose, but this has a few upsides:

Service side with code generation

The service side is more complicated because it needs to handle multiple connections concurrently and needs to be asynchronous. Being asynchronous is important to not block other clients when processing a call.

The generator for the service code spits out one struct per method and a function to send a reply (and destroy the call):

struct example_Foo_service_call {
      struct vali_service_call *base;
};

void example_Foo_close_with_reply(struct example_Foo_service_call call,
      const struct example_Foo_out *params);

The per-call struct wrapping the struct vali_service_call * makes functions sending replies strongly tied to a particular call, and provides type safety: a Foo reply cannot be sent to a Bar call.

Additionally, the generator also provides a handler struct with one callback per method, and a function to obtain a generic handler from an interface handler:

struct example_handler {
      void (*Foo)(struct example_Foo_service_call call, const struct example_in *in);
};

struct vali_service_call_handler example_get_call_handler(const struct example_handler *handler);

To use all of these toys, a service implementation can define a handler for the Foo method, then feed the result of example_get_call_handler() to vali_service_set_call_handler():

static void handle_foo(struct example_Foo_service_call call, const struct example_Foo_in *in) {
      printf("Foo called with bar=%d\n", in->bar);

      example_Foo_close_with_reply(call, &(const struct example_Foo_out){
              .baz = (char *[]){ "hello", "world" },
              .baz_len = 2,
      });
}

static const struct example_handler example_handler = {
      .Foo = handle_foo,
};

int main(int argc, void *argv[]) {
      struct vali_service *service = vali_service_create();
      vali_service_set_call_handler(service, example_get_call_handler(&example_handler));
      vali_service_listen_unix(service, "/run/org.example.ftl");
      while (vali_service_dispatch(service));
}

Service registry

Some more elaborated services might want to implement more than a single interface. Additionally, services might want to add support for the org.varlink.service interface, which provides introspection: a client can query metadata about the service (e.g. service name, version) and the definition of each interface.

vali makes this easy thanks to struct vali_registry. A service can initialize a new registry via vali_registry_create(), then register each interface by passing its definition and handler to vali_registry_add(). The generated code exposes the interface definition as an example_interface constant. Finally, the registry can be wired up to the struct vali_service by feeding the result of vali_registry_get_call_handler() to vali_service_set_call_handler().

const struct vali_registry_options registry_options = {
      .vendor = "emersion",
      .product = "example",
      .version = "1.0.0",
      .url = "https://example.org",
};
struct vali_registry *registry = vali_registry_create(&registry_options);
vali_registry_add(registry, &example_interface,
      example_get_call_handler(&example_handler));
vali_registry_add(registry, &another_example_interface,
      another_example_get_call_handler(&another_example_handler));

struct vali_service *service = vali_service_create();
vali_service_set_call_handler(service, vali_registry_get_call_handler(registry));
vali_service_listen_unix(service, "/run/org.example.ftl");

while (vali_service_dispatch(service));

This is where the struct vali_service_call_handler fat pointer really shines: the wire-level struct vali_service and the higher-level registry can stay entirely separate. struct vali_service invokes the registry's handler, then the registry is responsible for routing the call to the correct interface-specific handler. The registry's internal state remains hidden away in the handler's opaque user data pointer.

A complete client and service example is available in vali's example/ directory.

What's next?

I plan to leverage vali in the next version of the kanshi Wayland output management daemon.

We've discussed about async on the service side above, but we haven't discussed async on the client side. That can be useful too, especially when a client needs to juggle with multiple sockets, and is still a TODO.

Something I'm still unhappy about is the lack of const fields generated structs. Let's have a look at the struct for output parameters given above:

struct example_Foo_out {
      char **baz;
      size_t baz_len;
};

If a service has a bunch of const char * variables it wants to send as part of the reply, it needs to cast them to char * or strdup() them. None of these options are great.

static const char hiya[] = "hiya";

static void handle_foo(struct example_Foo_service_call call, const struct example_Foo_in *in) {
      example_Foo_close_with_reply(call, &(const struct example_Foo_out){
              // Type error: implicit cast from "const char *" to "char *"
          .baz = (char *[]){ hiya },
              .baz_len = 1,
      });
}

On the other hand, making all struct fields const would be cumbersome when dynamically constructing nested structs in replies, and would be a bit of a lie when passing a reply to example_Foo_out_finish() (that function frees all fields).

Generating two structs (one const, one not) is not an option since types are shared between client and service, and some types can be referenced from both a call input and another call's output. Ideally, C would provide a way to propagate const-ness to fields, but that's not a thing. Oh well, that's life.

If you need an IPC mechanism for your tool, please consider giving vali a shot! Feel free to reach out to report any bugs, questions or suggestions.

03 Oct 2025 10:00pm GMT

Iago Toral: XDC 2025

It has been a while since my last post, I know. Today I just want to thank Igalia for continuing to give me and many other Igalians the opportunity to attend XDC. I had a great time in Vienna where I was able to catch up with other Mesa developers (including Igalians!) I rarely have the opportunity to see face to face. It is amazing to see how Mesa continues to gain traction and interest year after year, seeing more actors and vendors getting involved in one way or another… the push for open source drivers in the industry is real and it is fantastic to see it happening.

I'd also like to thank the organization, I know all the work that goes into making these things happen, so big thanks to everyone who was involved, and to the speakers, the XDC program is getting better every year.

Looking forward to next year already 🙂

03 Oct 2025 6:58am GMT

30 Sep 2025

feedplanet.freedesktop.org

Hans de Goede: Fedora 43 will ship with FOSS Meteor, Lunar and Arrow Lake MIPI camera support

Good news the just released 6.17 kernel has support for the IPU7 CSI2 receiver and the missing USBIO drivers have recently landed in linux-next. I have backported the USBIO drivers + a few other camera fixes to the Fedora 6.17 kernel.

I've also prepared an updated libcamera-0.5.2 Fedora package with support for IPU7 (Lunar Lake) CSI2 receivers as well as backporting a set of upstream SwStats and AGC fixes, fixing various crashes as well as the bad flicker MIPI camera users have been hitting with libcamera 0.5.2.

Together these 2 updates should make Fedora 43's FOSS MIPI camera support work on most Meteor Lake, Lunar Lake and Arrow Lake laptops!

If you want to give this a try, install / upgrade to Fedora 43 beta and install all updates. If you've installed rpmfusion's binary IPU6 stack please run:

sudo dnf remove akmod-intel-ipu6 'kmod-intel-ipu6*'

to remove it as it may interfere with the FOSS stack and finally reboot. Please first try with qcam:

sudo dnf install libcamera-qcam
qcam

which only tests libcamera and after that give apps which use the camera through pipewire a try like gnome's "Camera" app (snapshot) or video-conferencing in Firefox.

Note snapshot on Lunar Lake triggers a bug in the LNL Vulkan code, to avoid this start snapshot from a terminal with:

GSK_RENDERER=gl snapshot

If you have a MIPI camera which still does not work please file a bug following these instructions and drop me an email with the bugzilla link at hansg@kernel.org.

comment count unavailable comments

30 Sep 2025 6:55pm GMT

24 Sep 2025

feedplanet.freedesktop.org

Sebastian Wick: XDG Intents Updates

Andy Holmes wrote an excellent overview of XDG Intents in his "Best Intentions" blog post, covering the foundational concepts and early proposals. Unfortunately, due to GNOME Foundation issues, this work never fully materialized. As I have been running into more and more cases where this would provide a useful primitive for other features, I tried to continue the work.

The specifications have evolved as I worked on implementing them in glib, desktop-file-utils and ptyxis. Here's what's changed:

Intent-Apps Specification

Andy showed this hypothetical syntax for scoped preferences:

[Default Applications]
org.freedesktop.Thumbnailer=org.gimp.GIMP
org.freedesktop.Thumbnailer[image/svg+xml]=org.gnome.Loupe;org.gimp.GIMP

We now use separate groups instead:

[Default Applications]
org.freedesktop.Thumbnailer=org.gimp.GIMP

[org.freedesktop.Thumbnailer]
image/svg+xml=org.gnome.Loupe;org.gimp.GIMP

This approach creates a dedicated group for each intent, with keys representing the scopes. This way, we do not have to abuse the square brackets which were meant for translatable keys and allow only very limited values.

The updated specification also adds support for intent.cache files to improve performance, containing up-to-date lists of applications supporting particular intents and scopes. This is very similar to the already existing cache for MIME types. The update-desktop-database tool is responsible for keeping the cache up-to-date.

This is implemented in glib!4797, desktop-file-utils!27, and the updated specification is in xdg-specs!106.

Terminal Intent Specification

While Andy mentioned the terminal intent as a use case, Zander Brown tried to upstream the intent in xdg-specs!46 multiple years ago. However, because it depended on the intent-apps specification, it unfortunately never went anywhere. With the fleshed-out version of the intent-apps specification, and an implementation in glib, I was able to implement the terminal-intent specification in glib as well. With some help from Christian, we also added support for the intent in the ptyxis terminal.

This revealed some shortcomings in the proposed D-Bus interface. In particular, when a desktop file gets activated with multiple URIs, and the Exec line in the desktop entry only indicates support for a limited number of URIs, multiple commands need to be launched. To support opening those commands in a single window but in multiple tabs in the terminal emulator, for example, those multiple commands must be part of a single D-Bus method call. The resulting D-Bus interface looks like this:

<interface name="org.freedesktop.Terminal1">
  <method name="LaunchCommand">
    <arg type='aa{sv}' name='commands' direction='in' />
    <arg type='ay' name='desktop_entry' direction='in' />
    <arg type='a{sv}' name='options' direction='in' />
    <arg type='a{sv}' name='platform_data' direction='in' />
  </method>
</interface>

This is implemented in glib!4797, ptyxis!119 and the updated specification is in xdg-specs!107.

Deeplink Intent

Andy's post discussed a generic "org.freedesktop.UriHandler" with this example:

[org.freedesktop.UriHandler]
Supports=wise.com;
Patterns=https://*.wise.com/link?urn=urn%3Awise%3Atransfers;

The updated specification introduces a specific org.freedesktop.handler.Deeplink1 intent where the scheme is implicitly http or https and the host comes from the scope (i.e., the Supports part). The pattern matching is done on the path alone:

[org.freedesktop.handler.Deeplink1]
Supports=example.org;extensions.gnome.org
example.org=/login;/test/a?a
extensions.gnome.org=/extension/*/*/install;/extension/*/*/uninstall

This allows us to focus on deeplinking alone and allows the user to set the order of handlers for specific hosts.

In this example, the app would handle the URIs http://example.org/login, http://example.org/test/aba, http://extensions.gnome.org/extension/123456/BestExtension/install and so on.

There is a draft implementation in glib!4833 and the specification is in xdg-specs!109.

Deeplinking Issues and Other Handlers

I am still unsure about the Deeplink1 intent. Does it make sense to allow schemes other than http and https? If yes, how should the priority of applications be determined when opening a URI? How complex does the pattern matching need to be?

Similarly, should we add an org.freedesktop.handler.Scheme1 intent? We currently abuse MIME handlers for this, so it seems like a good idea, but then we need to take backwards compatibility into account. Maybe we can modify update-desktop-database to add entries from org.freedesktop.handler.Scheme1 to mimeapps.list for that?

If we go down that route, is there a reason not to also do the same for MIME handlers and add an org.freedesktop.handler.Mime1 intent for that purpose with the same backwards compatibility mechanism?

Deeplinking to App Locations

While working on this, I noticed that we are not great at allowing linking to locations in our apps. For example, most email clients do not have a way to link to a specific email. Most calendars do not allow referencing a specific event. Some apps do support this. For example, Zotero allows linking to items in the app with URIs of the form zotero://select/items/0_USN95MJC.

Maybe we can improve on this? If all our apps used a consistent scheme and queries (for example xdg-app-org.example.appid:/some/path/in/the/app?name=Example), we could render those links differently and finally have a nice way to link to an email in our calendar.

This definitely needs more thought, but I do like the idea.

Security Considerations

Allowing apps to describe more thoroughly which URIs they can handle is great, but we also live in a world where security has to be taken into account. If an app wants to handle the URI https://bank.example.org, we better be sure that this app actually is the correct banking app. This unfortunately is not a trivial issue, so I will leave it for the next time.

24 Sep 2025 4:57pm GMT

18 Sep 2025

feedplanet.freedesktop.org

Sebastian Wick: Integrating libdex with GDBus

Writing asynchronous code in C has always been a challenge. Traditional callback-based approaches, including GLib's async/finish pattern, often lead to the so-called callback hell that's difficult to read and maintain. The libdex library offers a solution to this problem, and I recently worked on expanding the integration with GLib's GDBus subsystem.

The Problem with the Sync and Async Patterns

Writing C code involving tasks which can take non-trivial amount of time has traditionally required choosing between two approaches:

  1. Synchronous calls - Simple to write but block the current thread
  2. Asynchronous callbacks - Non-blocking but result in callback hell and complex error handling

Often the synchronous variant is chosen to keep the code simple, but in a lot of cases, blocking for potentially multiple seconds is not acceptable. Threads can be used to prevent the other threads from blocking, but it creates parallelism and with it the need for locking. It also can potentially create a huge amount of threads which mostly sit idle.

The asynchronous variant has none of those problems, but consider a typical async D-Bus operation in traditional GLib code:

static void
on_ping_ready (GObject      *source_object,
               GAsyncResult *res,
               gpointer      data)
{
  g_autofree char *pong = NULL;

  if (!dex_dbus_ping_pong_call_ping_finish (DEX_BUS_PING_PONG (source_object),
                                            &pong,
                                            res, NULL))
    return; // handle error

  g_print ("client: %s\n", pong);
}

static void
on_ping_pong_proxy_ready (GObject      *source_object,
                          GAsyncResult *res,
                          gpointer      data)
{
  DexDbusPingPong *pp dex_dbus_ping_pong_proxy_new_finish (res, NULL);
  if (!pp)
    return; // Handle error

  dex_dbus_ping_pong_call_ping (pp, "ping", NULL,
                                on_ping_ready, NULL);
}

This pattern becomes unwieldy quickly, especially with multiple operations, error handling, shared data and cleanup across multiple callbacks.

What is libdex?

Dex provides Future-based programming for GLib. It provides features for application and library authors who want to structure concurrent code in an easy to manage way. Dex also provides Fibers which allow writing synchronous looking code in C while maintaining the benefits of asynchronous execution.

At its core, libdex introduces two key concepts:

Futures alone already simplify dealing with asynchronous code by specefying a call chain (dex_future_then(), dex_future_catch(), and dex_future_finally()), or even more elaborate flows (dex_future_all(), dex_future_all_race(), dex_future_any(), and dex_future_first()) at one place, without the typical callback hell. It still requires splitting things into a bunch of functions and potentially moving data through them.

static DexFuture *
lookup_user_data_cb (DexFuture *future,
                     gpointer   user_data)
{
  g_autoptr(MyUser) user = NULL;
  g_autoptr(GError) error = NULL;

  // the future in this cb is already resolved, so this just gets the value
  // no fibers involved 
  user = dex_await_object (future, &error);
  if (!user)
    return dex_future_new_for_error (g_steal_pointer (&error));

  return dex_future_first (dex_timeout_new_seconds (60),
                           dex_future_any (query_db_server (user),
                                           query_cache_server (user),
                                           NULL),
                           NULL);
}

static void
print_user_data (void)
{
  g_autoptr(DexFuture) future = NULL;

  future = dex_future_then (find_user (), lookup_user_data_cb, NULL, NULL);
  future = dex_future_then (future, print_user_data_cb, NULL, NULL);
  future = dex_future_finally (future, quit_cb, NULL, NULL);

  g_main_loop_run (main_loop);
}

The real magic of libdex however lies in fibers and the dex_await() function, which allows you to write code that looks synchronous but executes asynchronously. When you await a future, the current fiber yields control, allowing other work to proceed while waiting for the result.

g_autoptr(MyUser) user = NULL;
g_autoptr(MyUserData) data = NULL;
g_autoptr(GError) error = NULL;

user = dex_await_object (find_user (), &error);
if (!user)
  return dex_future_new_for_error (g_steal_pointer (&error));

data = dex_await_boxed (dex_future_first (dex_timeout_new_seconds (60),
                                          dex_future_any (query_db_server (user),
                                                          query_cache_server (user),
                                                          NULL),
                                          NULL), &error);
if (!data)
  return dex_future_new_for_error (g_steal_pointer (&error));

g_print ("%s", data->name);

Christian Hergert wrote pretty decent documentation, so check it out!

Bridging libdex and GDBus

With the new integration, you can write D-Bus client code that looks like this:

g_autoptr(DexDbusPingPong) *pp = NULL;
g_autoptr(DexDbusPingPongPingResult) result = NULL;

pp = dex_await_object (dex_dbus_ping_pong_proxy_new_future (connection,
                                                            G_DBUS_PROXY_FLAGS_NONE,
                                                            "org.example.PingPong",
                                                            "/org/example/pingpong"),
                       &error);
if (!pp)
  return dex_future_new_for_error (g_steal_pointer (&error));

res = dex_await_boxed (dex_dbus_ping_pong_call_ping_future (pp, "ping"), &error);
if (!res)
  return dex_future_new_for_error (g_steal_pointer (&error));

g_print ("client: %s\n", res->pong);

This code is executing asynchronously, but reads like synchronous code. Error handling is straightforward, and there are no callbacks involved.

On the service side, if enabled, method handlers will run in a fiber and can use dex_await() directly, enabling complex asynchronous operations within service implementations:

static gboolean
handle_ping (DexDbusPingPong       *object,
             GDBusMethodInvocation *invocation,
             const char            *ping)
{
  g_print ("service: %s\n", ping);

  dex_await (dex_timeout_new_seconds (1), NULL);
  dex_dbus_ping_pong_complete_ping (object, invocation, "pong");

  return G_DBUS_METHOD_INVOCATION_HANDLED;
}

static void
dex_dbus_ping_pong_iface_init (DexDbusPingPongIface *iface)
{
  iface->handle_ping = handle_ping;
}
pp = g_object_new (DEX_TYPE_PING_PONG, NULL);
dex_dbus_interface_skeleton_set_flags (DEX_DBUS_INTERFACE_SKELETON (pp),
                                       DEX_DBUS_INTERFACE_SKELETON_FLAGS_HANDLE_METHOD_INVOCATIONS_IN_FIBER);

This method handler includes a 1-second delay, but instead of blocking the entire service, it yields control to other fibers during the timeout.

The merge request contains a complete example of a client and service communicating with each other.

Implementation Details

The integration required extending GDBus's code generation system. Rather than modifying it directly, the current solution introduces a very simple extension system to GDBus' code generation.

The generated code includes:

Besides the GDBus code generation extension system, there are a few more changes required in GLib to make this work. This is not merged at the time of writing, but I'm confident that we can move this forward.

Future Directions

I hope that this work convinces more people to use libdex! We have a whole bunch of existing code bases which will have to stick with C in the foreseeable future, and libdex provides tools to make incremental improvements. Personally, I want to start using in in the xdg-desktop-portal project.

18 Sep 2025 6:58pm GMT

16 Sep 2025

feedplanet.freedesktop.org

Mike Blumenkrantz: Now We CAD

Perf Must Increase.

After my last post, I'm sure everyone was speculating about the forthcoming zink takeover of the CAD industry. Or maybe just wondering why I'm bothering with this at all. Well, the answer is simple: CAD performance is all performance. If I can improve FPS in viewperf, I'm decreasing CPU utilization in all apps, which is generally useful.

As in the previous post, the catia section of viewperf was improved to a whopping 34fps against the reference driver (radeonsi) by eliminating a few hundred thousand atomic operations per frame. An interesting observation here is that while eliminating atomic operations in radeonsi does improve FPS there by ~5% (105fps), there is no bottlenecking, so this does not "unlock" further optimizations in the same way that it does for zink. I speculate this is because zink has radv underneath, which affects memory access across ccx in ways that do not affect radeonsi.

In short: a rising tide lifts all ships in the harbor, but since zink was effectively a sunken ship, it is rising much more than the others.

Even More Improvements

Since that previous post, I and others have been working quietly in the background on other improvements, all of which have landed in mesa main already:

catia-quietly.png

A nice 35% improvement, largely from three MRs:

That's right. In my quest to maximize perf, I have roped in veteran radv developer and part-time vacation enthusiast, Samuel Pitoiset. Because radv is slow. vkoverhead exists to target noticeably slow cases, and by harnessing the forbidden power of rewriting the whole driver, it was possible for a lone Frenchman to significantly reduce bottlenecking during draw emission.

This Isn't Even My Final Form

Obviously. I'm not about to say that I'll only stop when I reach performance parity, but the FPS can still go up.

At this point, however, it's becoming less useful (in zink) to look at flamegraphs. There's only so much optimization that can be done once the code has been simplified to a certain extent, and eventually those optimizations will lead to obfuscated code which is harder to maintain.

Thus, it's time to step back and look architecturally. What is the app doing? How does that reach the driver? Can it be improved?

GALLIUM_TRACE is a great tool for this, as it logs the API stream as it reaches the backend driver, and there are parser tools to convert the output XML to something readable. Let's take a look at a small cross-section of the trace:

pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10043], [is_user_buffer = 0, buffer_offset = 7440, buffer.resource = resource_10043]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10044], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10045], [is_user_buffer = 0, buffer_offset = 7632, buffer.resource = resource_10045]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10046], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10047], [is_user_buffer = 0, buffer_offset = 7680, buffer.resource = resource_10047]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10048], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10049], [is_user_buffer = 0, buffer_offset = 7656, buffer.resource = resource_10049]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10050], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10051], [is_user_buffer = 0, buffer_offset = 7752, buffer.resource = resource_10051]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10052], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10053], [is_user_buffer = 0, buffer_offset = 7800, buffer.resource = resource_10053]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10054], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10055], [is_user_buffer = 0, buffer_offset = 7968, buffer.resource = resource_10055]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10056], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10057], [is_user_buffer = 0, buffer_offset = 7968, buffer.resource = resource_10057]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10058], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10059], [is_user_buffer = 0, buffer_offset = 8136, buffer.resource = resource_10059]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10060], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10061], [is_user_buffer = 0, buffer_offset = 8280, buffer.resource = resource_10061]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10062], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10063], [is_user_buffer = 0, buffer_offset = 8040, buffer.resource = resource_10063]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10064], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10065], [is_user_buffer = 0, buffer_offset = 7608, buffer.resource = resource_10065]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10066], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)

As expected, a huge chunk of the runtime is just set_vertex_buffers -> draw_vbo. Architecturally, this leads to a lot of unavoidably wasted cycles in drivers:

But in the scenario where the driver can know ahead of time exactly what states will be updated, couldn't that yield an improvement? For example, bundling these two calls into a single draw call would eliminate:

In theory, it seems like this should be pretty good. And now that vertex buffer lifetimes have been reworked to use explicit ownership rather than garbage collection, it's actually possible to do this. The optimal site for the optimization would be in threaded-context, where similar types of draw merging are already occurring.

The result looks something like this in a comparable trace:

pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1141, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 163536, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 191032, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771328, buffer.resource = resource_29602]], draws = [[start = 1141, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1146, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 218528, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 246144, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771360, buffer.resource = resource_29602]], draws = [[start = 1146, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1151, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 273760, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 301496, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771392, buffer.resource = resource_29602]], draws = [[start = 1151, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1156, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 329232, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 357088, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771424, buffer.resource = resource_29602]], draws = [[start = 1156, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1161, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 384944, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 412920, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771456, buffer.resource = resource_29602]], draws = [[start = 1161, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1166, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 440896, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 468992, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771488, buffer.resource = resource_29602]], draws = [[start = 1166, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1171, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 497088, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 525304, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771520, buffer.resource = resource_29602]], draws = [[start = 1171, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1176, max_index = 11, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 553520, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 582000, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771552, buffer.resource = resource_29602]], draws = [[start = 1176, count = 11]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1187, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 610480, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 639080, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771584, buffer.resource = resource_29602]], draws = [[start = 1187, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1192, max_index = 6, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 667680, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 696424, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771616, buffer.resource = resource_29602]], draws = [[start = 1192, count = 6]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1198, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 725168, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 754032, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771648, buffer.resource = resource_29602]], draws = [[start = 1198, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1203, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 782896, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 811880, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771680, buffer.resource = resource_29602]], draws = [[start = 1203, count = 5]], num_draws = 1)

It's more compact, which is nice, but how does the perf look?

catia-vroom.png

About another 40% improvement, now over 60fps: nearly double the endpoint of the last post. Huge.

And this is driving ecosystem improvements which will affect other apps and games which don't even use zink.

Stay winning, Open Source graphics.

16 Sep 2025 12:00am GMT