31 Oct 2025
planet.freedesktop.org
Mike Blumenkrantz: Hibernate On
Take A Break
We've reached Q4 of another year, and after the mad scramble that has been crunch-time over the past few weeks, it's time for SGC to once again retire into a deep, restful sleep.
2025 saw a lot of ground covered:
- NVK-Zink synergy
- Continued Rusticl improvements
- Viewperf perf and general CPU overhead reduction
- Tiler GPU perf
- Mesh shaders
- apitrace perf
- More GL extensions released than any other year this decade
It's been a real roller coaster ride of a year as always, but I can say authoritatively that fans of the blog, you need to take care of yourselves. You need to use this break time wisely. Rest. Recover. Train your bodies. Travel and broaden your horizons. Invest in night classes to expand your minds.
You are not prepared for the insanity that will be this blog in 2026.
31 Oct 2025 12:00am GMT
27 Oct 2025
planet.freedesktop.org
Mike Blumenkrantz: Apitrace Goes Vroom
First Time
Today marks the first post of a type that I've wanted to have for a long while: a guest post. There are lots of graphics developers who work on cool stuff and don't want to waste time setting up blogs, but with enough cajoling they will write a single blog post. If you're out there thinking you just did some awesome work and you want the world to know the grimy, gory details, let me know.
The first victimrecipient of this honor is an individual famous for small and extremely sane endeavors such as descriptor buffers in Lavapipe, ray tracing in Lavapipe, and sparse support in Lavapipe. Also wrangling ray tracing for RADV.
Below is the debut blog post by none other than Konstantin Seurer.
What is apitrace?
Apitrace is a powerful tool for capturing and replaying traces of GL and DX applications. The problem is that it is not really suitable for performance testing. This blog post is about implementing a faster method for replaying traces.
About six weeks ago, Mike asked me if I wanted to work on this.
[6:58:58 pm] <zmike> on the topic of traces
[6:59:08 pm] <zmike> I have a longer-term project that could use your expertise
[6:59:19 pm] <zmike> it's low work but high complexity
[7:00:12 pm] <zmike> specifically I would like apitrace to be able to competently output C code from traces and to have this functionality merged upstream
low work
The state of glretrace
This first obvious step was measuring how glretrace currently performs. Mike kindly provided a couple of traces from his personal collection, and I immediately timed a trace of the only relevant OpenGL game:
$ time ./glretrace -b minecraft-perf.trace
/Users/Cortex/Downloads/graalvm-jdk-23.0.1+11.1/bin/java
Rendered 1261 frames in 10.4269 secs, average of 120.937 fps
real 0m10.554s
user 0m12.938s
sys 0m2.712s
This looks fine, but I have no idea how fast the application is supposed to run. Running the same trace with perf reveals that there is room for improvement.
2/3 of frametime is spent parsing the trace.
Implementation
An apitrace trace stores API call information in an object-oriented style. This makes basic codegen really easy because the objects map directly to the generated C/C++ code. However, not all API calls are made equal, and the countless special cases that I needed to handle are what made this project take so long.
glretrace has custom implementations for WSI API calls, and it would be a shame not to use them. The easiest way of doing that is generating a shared library instead of an executable and having glretrace load it. The shared library can then provide a bunch of callbacks for the call sequences we can do codegen for and Call objects for everything else.
Besides WSI, there are also arguments and return values that need special treatment. OpenGL allows the application to create all kinds of objects that are represented using IDs. Those IDs are assigned by the driver, and they can be different during replay. glretrace remaps them using std::maps which have non-trivial overhead. I initially did that as well for the codegen to get things up and running, but it is actually possible to emit global variables and have most of the remapping run logic during codegen.
Data streaming
With the main replay overhead being taken care of, a major amount of replay time is now spent loading texture and buffer data. In large traces, there can also be >10GiB of data, so loading everything upfront is not an option. I decided to create one thread for reading the data file and nproc decompression threads. The read thread will wait if enough data has been loaded to limit memory usage. Decompression threads are needed because decompression is slower than reading the compressed data.
codegen in action
The results speak for themselves:
$ ./glretrace --generate-c minecraft-perf minecraft-perf.trace
/Users/Cortex/Downloads/graalvm-jdk-23.0.1+11.1/bin/java
Rendered 0 frames in 79.4072 secs, average of 0 fps
$ cd minecraft-perf
$ echo "Invoke the superior build tool"
$ meson build --buildtype release
$ ninja -Cbuild
$ time ../glretrace build/minecraft-perf.so
info: Opening 'minecraft-perf.so'... (0.00668795 secs)
warning: Waited 0.0461142 secs for data (sequence = 19)
Rendered 1261 frames in 5.19587 secs, average of 242.693 fps
real 0m5.415s
user 0m5.429s
sys 0m4.983s
Nice.
Looking at perf most CPU time is now spent in driver code or streaming binary data for stuff like textures on a separate thread.
If you are interested in trying this out yourself, feel free to build the upstream PR and report on bugs unintended features. It would also be nice to have DX support in the future, but that will be something for the dxvk developers unless I need something to procrastinate from doing RT work.
- Konstantin
27 Oct 2025 12:00am GMT
15 Oct 2025
planet.freedesktop.org
Simon Ser: Status update, October 2025
Hi!
I skipped last month's status update because I hadn't collected a lot of interesting updates and I've dedicated my time to writing an announcement for the first vali release.
Earlier this month, I've taken the train to Vienna to attend XDC 2025. The conference was great, I really enjoyed discussing face-to-face with open-source graphics folks I usually only interact with online, and meeting new awesome people! Since I'm part of the X.Org Foundation board, it was also nice to see the fruit of our efforts. Many thanks to all organizers!

We've discussed many interesting topics: a new API for 2D acceleration hardware, adapting the Wayland linux-dmabuf protocol to better support multiple GPUs, some ways to address current Wayback limitations, ideas to improve libliftoff, Vulkan use in Wayland compositors, and a lot more.
On the wlroots side, I've worked on a patch to fallback to the renderer to apply gamma LUTs when the KMS driver doesn't support them (this also paves the way for applying color transforms in KMS). Félix Poisot has updated wlroots to support the gamma 2.2 transfer function and use it by default. llyyr has added support for the BT.1886 transfer function and fixed direct scanout for client using the gamma 2.2 transfer function.
I've sent a patch to add support for DisplayID v2 CTA-861 data blocks, required for handling some HDR screens. I've reviewed and merged a bunch of gamescope patches to avoid protocol errors with the color management protocol, fix nested mode under a Vulkan compositor, fix a crash on VT switch and modernize dependencies.
I've worked a bit on drm_info too. I've added a JSON schema to describe the shape of the JSON objects, made it so EDIDs are included in the JSON output as base64-encoded strings, and added the EDID make/model/serial + bus info to the pretty-printed output.
delthas has added soju support for user metadata, introduced a new work-in-progress metadata key to block users, and made it so soju cancels Web Push notifications if a client marks a message as read (to avoid opening notifications for a very short time when actively chatting with another user). Markus Cisler has revamped Goguma's message bubbles: they look much better now!
See you next month!
15 Oct 2025 10:00pm GMT
10 Oct 2025
planet.freedesktop.org
Sebastian Wick: SO_PEERPIDFD Gets More Useful
A while ago I wrote about the limited usefulness of SO_PEERPIDFD. for authenticating sandboxed applications. The core problem was simple: while pidfds gave us a race-free way to identify a process, we still had no standardized way to figure out what that process actually was - which sandbox it ran in, what application it represented, or what permissions it should have.
The situation has improved considerably since then.
cgroup xattrs
Cgroups now support user extended attributes. This feature allows arbitrary metadata to be attached to cgroup inodes using standard xattr calls.
We can change flatpak (or snap, or any other container engine) to create a cgroup for application instances it launches, and attach metadata to it using xattrs. This metadata can include the sandboxing engine, application ID, instance ID, and any other information the compositor or D-Bus service might need.
Every process belongs to a cgroup, and you can query which cgroup a process belongs to through its pidfd - completely race-free.
Standardized Authentication
Remember the complexity from the original post? Services had to implement different lookup mechanisms for different sandbox technologies:
- For flatpak: look in
/proc/$PID/root/.flatpak-info - For snap: shell out to
snap routine portal-info - For firejail: no solution
- …
All of this goes away. Now there's a single path:
- Accept a connection on a socket
- Use
SO_PEERPIDFDto get a pidfd for the client - Query the client's cgroup using the pidfd
- Read the cgroup's user xattrs to get the sandbox metadata
This works the same way regardless of which sandbox engine launched the application.
A Kernel Feature, Not a systemd One
It's worth emphasizing: cgroups are a Linux kernel feature. They have no dependency on systemd or any other userspace component. Any process can manage cgroups and attach xattrs to them. The process only needs appropriate permissions and is restricted to a subtree determined by the cgroup namespace it is in. This makes the approach universally applicable across different init systems and distributions.
To support non-Linux systems, we might even be able to abstract away the cgroup details, by providing a varlink service to register and query running applications. On Linux, this service would use cgroups and xattrs internally.
Replacing Socket-Per-App
The old approach - creating dedicated wayland, D-Bus, etc. sockets for each app instance and attaching metadata to the service which gets mapped to connections on that socket - can now be retired. The pidfd + cgroup xattr approach is simpler: one standardized lookup path instead of mounting special sockets. It works everywhere: any service can authenticate any client without special socket setup. And it's more flexible: metadata can be updated after process creation if needed.
For compositor and D-Bus service developers, this means you can finally implement proper sandboxed client authentication without needing to understand the internals of every container engine. For sandbox developers, it means you have a standardized way to communicate application identity without implementing custom socket mounting schemes.
10 Oct 2025 3:04pm GMT
09 Oct 2025
planet.freedesktop.org
Mike Blumenkrantz: Mesh Shaders In The Current Year
It Happened.
Just a quick post to confirm that the OpenGL/ES Working Group has signed off on the release of GL_EXT_mesh_shader.
Credits
This is a monumental release, the largest extension shipped for GL this decade, and the culmination of many, many months of work by AMD. In particular we all need to thank Qiang Yu (AMD), who spearheaded this initiative and did the vast majority of the work both in writing the specification and doing the core mesa implementation. Shihao Wang (AMD) took on the difficult task of writing actual CTS cases (not mandatory for EXT extensions in GL, so this is a huge benefit to the ecosystem).
Big thanks to both of you, and everyone else behind the scenes at AMD, for making this happen.
Also we have to thank the nvidium project and its author, Cortex, for single-handedly pushing the industry forward through the power of Minecraft modding. Stay sane out there.
Support
Minecraft mod support is already underway, so expect that to happen "soon".
The bones of this extension have already been merged into mesa over the past couple months. I opened a MR to enable zink support this morning since I have already merged the implementation.
Currently, I'm planning to wait until either just before the branch point next week or until RadeonSI merges its support to merge the zink MR. This is out of respect: Qiang Yu did a huge lift for everyone here, and ideally AMD's driver should be the first to be able to advertise that extension to reflect that. But the branchpoint is coming up in a week, and SGC will be going into hibernation at the end of the month until 2026, so this offer does have an expiration date.
In any case, we're done here.
09 Oct 2025 12:00am GMT
03 Oct 2025
planet.freedesktop.org
Simon Ser: Announcing vali, a C library for Varlink
In the past months I've been working on vali, a C library for Varlink. Today I'm publishing the first vali release! I'd like to explain how to use it for readers who aren't especially familiar with Varlink, and describe some interesting API design decisions.
What is Varlink anyways?
Varlink is a very simple Remote Procedure Call (RPC) protocol. Clients can call methods exposed by services (ie, servers). To call a method, a client sends a JSON object with its name and parameters over a Unix socket. To reply to a call, a service sends a JSON object with response parameters. That's it.
Here's an example request with a bar parameter containing an integer:
{
"method": "org.example.service.Foo",
"parameters": {
"bar": 42
}
}
And here's an example response with a baz parameter containing a list of strings:
{
"parameters": {
"baz": ["hello", "world"]
}
}
Varlink also supports calls with no reply or with multiple replies, but let's leave this out of the picture for simplicity's sake.
Varlink services can describe the methods they implement with an interface definition file.
method Foo(bar: int) -> (baz: []string)
Coming from the Wayland world, I love generating code from specification files. This removes all of the manual encoding/decoding boilerplate and is more type-safe. Unfortunately the official libvarlink library doesn't support code generation (and is not actively maintained anymore), so I've decided to write my own. vali is the result!
vali without code generation
To better understand the benefits of code generation and vali design decisions, let's take a minute to have a look at what usage without code generation looks like.
A client first needs to connect via vali_client_connect_unix(), then call vali_client_call() with a JSON object containing input parameters. It'll get back a JSON object containing output parameters, which needs to be parsed.
struct vali_client *client = vali_client_connect_unix("/run/org.example.service");
if (client == NULL) {
fprintf(stderr, "Failed to connect to service\n");
exit(1);
}
struct json_object *in = json_object_new_object();
json_object_object_add(in, "bar", json_object_new_int(42));
struct json_object *out = NULL;
if (!vali_client_call(client, "org.example.service.Foo", in, &out, NULL)) {
fprintf(stderr, "Foo request failed\n");
exit(1);
}
struct json_object *baz = json_object_object_get(out, "baz");
for (size_t i = 0; i < json_object_array_length(baz); i++) {
struct json_object *item = json_object_array_get_idx(baz, i);
printf("%s\n", json_object_get_string(item));
}
This is a fair amount of boilerplate. In case of a type mismatch, the client will silently print nothing, which isn't ideal.
The last parameter of vali_client_call() is an optional struct vali_error *: if set to a non-NULL pointer and the service replies with an error, the struct is populated, otherwise it's zero'ed out:
struct vali_error err;
if (!vali_client_call(client, "org.example.service.Foo", in, &out, &err)) {
if (err.name != NULL) {
fprintf(stderr, "Foo request failed: %s\n", err.name);
} else {
fprintf(stderr, "Foo request failed: internal error\n");
}
vali_error_finish(&err);
exit(1);
}
How does the service side look like? A service first calls vali_service_create() to initialize a fresh service, defines a callback to be invoked when a Varlink call is performed by a client via vali_service_set_call_handler(), and sets up a Unix socket via vali_service_listen_unix(). Let's demonstrate how a service accesses a shared state by printing the number of calls done so far when the callback is invoked. The callback needs to end the call with vali_service_call_close_with_reply().
void handle_call(struct vali_service_call *call, void *user_data) {
int *call_count_ptr = user_data;
(*call_count_ptr)++;
printf("Received %d-th client call\n", *call_count_ptr);
struct json_object *baz = json_object_new_array();
json_object_array_add(baz, json_object_new_string("hello"));
json_object_array_add(baz, json_object_new_string("world"));
struct json_object *params = json_object_new_object();
json_object_object_add(params, "baz", baz);
vali_service_call_close_with_reply(call, params);
}
int main(int argc, void *argv[]) {
int call_count = 0;
struct vali_service_call_handler handler = {
.func = handle_call,
.user_data = &call_count,
};
struct vali_service *service = vali_service_create();
vali_service_set_call_handler(service, &handler);
vali_service_listen_unix(service, "/run/org.example.ftl");
while (vali_service_dispatch(service));
return 0;
}
In a prior iteration of the API, the callback would return the reply JSON object. This got changed to vali_service_call_close_with_reply() so that services can handle a call asynchronously. If a service needs some time to reply (e.g. because it needs to send data over a network, or perform a long computation), it can give back control to its event loop so that other clients are not blocked, and later call vali_service_call_close_with_reply() from another callback.
Why bundle the callback and the user data pointer together in a struct, rather than pass them as two separate parameters to vali_service_set_call_handler()? The answer is two-fold:
- Conceptually, the user data pointer is tied to the callback. Other programming languages with support for lambdas just capture variables. Standard C doesn't have lambdas, and the user data pointer is just a way to pass state to the callback.
- Bundling the callback and the user data pointer together as a single fat pointer unlocks more ergonomic and safer APIs: a function can return a single
struct vali_service_call_handlerwithout making the caller manipulate two separate variables to pass it down tovali_service_set_call_handler()(and risk mixing them up in case there are multiple).
This design makes wrapping a handler much easier (to create middlewares and routers, more on that below). This all might sound familiar to folks who've written an HTTP server: indeed, struct vali_service_call_handler is inspired from Go's net/http.Handler.
Client side with code generation
Given the method definition from the article introduction, vali generates the following client function:
struct example_Foo_in {
int bar;
};
struct example_Foo_out {
char **baz;
size_t baz_len;
};
bool example_Foo(struct vali_client *client,
const struct example_Foo_in *in, struct example_Foo *out,
struct vali_error *err);
It can be used this way to send the JSON request we've seen earlier:
struct vali_client *client = vali_client_connect_unix("/run/org.example.service");
if (client == NULL) {
fprintf(stderr, "Failed to connect to service\n");
exit(1);
}
const struct example_Foo_in in = {
.bar = 42,
};
struct example_Foo_out out;
if (!example_Foo(client, &in, &out, NULL)) {
fprintf(stderr, "Foo request failed\n");
exit(1);
}
for (size_t i = 0; i < out.baz_len; i++) {
printf("%s\n", out.baz[i]);
}
example_Foo_out_finish(&out);
Why does vali generates these twin structs, one for input parameters and the other for output parameters, instead of passing all parameters as function arguments? This does make calls slightly more verbose, but this has a few upsides:
- There is a clear split between input and output parameters, instead of having a variable number of function arguments for each. No need for the API user to remember where input parameters end and when output parameters begin, especially when there are a lot of these.
- On the wire and in the interface definition file, input and output parameters are objects. vali always generates structs for all objects. This is more consistent.
- If a new backwards-compatible version of the interface is published, the newly generated code is also backwards-compatible: old callers will still compile and work fine against the newly generated code. For instance, if a new optional field is added to the input parameters, it will naturally left as NULL by the caller when re-generating the code (because omitted fields are zero-initialized in C).
Service side with code generation
The service side is more complicated because it needs to handle multiple connections concurrently and needs to be asynchronous. Being asynchronous is important to not block other clients when processing a call.
The generator for the service code spits out one struct per method and a function to send a reply (and destroy the call):
struct example_Foo_service_call {
struct vali_service_call *base;
};
void example_Foo_close_with_reply(struct example_Foo_service_call call,
const struct example_Foo_out *params);
The per-call struct wrapping the struct vali_service_call * makes functions sending replies strongly tied to a particular call, and provides type safety: a Foo reply cannot be sent to a Bar call.
Additionally, the generator also provides a handler struct with one callback per method, and a function to obtain a generic handler from an interface handler:
struct example_handler {
void (*Foo)(struct example_Foo_service_call call, const struct example_in *in);
};
struct vali_service_call_handler example_get_call_handler(const struct example_handler *handler);
To use all of these toys, a service implementation can define a handler for the Foo method, then feed the result of example_get_call_handler() to vali_service_set_call_handler():
static void handle_foo(struct example_Foo_service_call call, const struct example_Foo_in *in) {
printf("Foo called with bar=%d\n", in->bar);
example_Foo_close_with_reply(call, &(const struct example_Foo_out){
.baz = (char *[]){ "hello", "world" },
.baz_len = 2,
});
}
static const struct example_handler example_handler = {
.Foo = handle_foo,
};
int main(int argc, void *argv[]) {
struct vali_service *service = vali_service_create();
vali_service_set_call_handler(service, example_get_call_handler(&example_handler));
vali_service_listen_unix(service, "/run/org.example.ftl");
while (vali_service_dispatch(service));
}
Service registry
Some more elaborated services might want to implement more than a single interface. Additionally, services might want to add support for the org.varlink.service interface, which provides introspection: a client can query metadata about the service (e.g. service name, version) and the definition of each interface.
vali makes this easy thanks to struct vali_registry. A service can initialize a new registry via vali_registry_create(), then register each interface by passing its definition and handler to vali_registry_add(). The generated code exposes the interface definition as an example_interface constant. Finally, the registry can be wired up to the struct vali_service by feeding the result of vali_registry_get_call_handler() to vali_service_set_call_handler().
const struct vali_registry_options registry_options = {
.vendor = "emersion",
.product = "example",
.version = "1.0.0",
.url = "https://example.org",
};
struct vali_registry *registry = vali_registry_create(®istry_options);
vali_registry_add(registry, &example_interface,
example_get_call_handler(&example_handler));
vali_registry_add(registry, &another_example_interface,
another_example_get_call_handler(&another_example_handler));
struct vali_service *service = vali_service_create();
vali_service_set_call_handler(service, vali_registry_get_call_handler(registry));
vali_service_listen_unix(service, "/run/org.example.ftl");
while (vali_service_dispatch(service));
This is where the struct vali_service_call_handler fat pointer really shines: the wire-level struct vali_service and the higher-level registry can stay entirely separate. struct vali_service invokes the registry's handler, then the registry is responsible for routing the call to the correct interface-specific handler. The registry's internal state remains hidden away in the handler's opaque user data pointer.
A complete client and service example is available in vali's example/ directory.
What's next?
I plan to leverage vali in the next version of the kanshi Wayland output management daemon.
We've discussed about async on the service side above, but we haven't discussed async on the client side. That can be useful too, especially when a client needs to juggle with multiple sockets, and is still a TODO.
Something I'm still unhappy about is the lack of const fields generated structs. Let's have a look at the struct for output parameters given above:
struct example_Foo_out {
char **baz;
size_t baz_len;
};
If a service has a bunch of const char * variables it wants to send as part of the reply, it needs to cast them to char * or strdup() them. None of these options are great.
static const char hiya[] = "hiya";
static void handle_foo(struct example_Foo_service_call call, const struct example_Foo_in *in) {
example_Foo_close_with_reply(call, &(const struct example_Foo_out){
// Type error: implicit cast from "const char *" to "char *"
.baz = (char *[]){ hiya },
.baz_len = 1,
});
}
On the other hand, making all struct fields const would be cumbersome when dynamically constructing nested structs in replies, and would be a bit of a lie when passing a reply to example_Foo_out_finish() (that function frees all fields).
Generating two structs (one const, one not) is not an option since types are shared between client and service, and some types can be referenced from both a call input and another call's output. Ideally, C would provide a way to propagate const-ness to fields, but that's not a thing. Oh well, that's life.
If you need an IPC mechanism for your tool, please consider giving vali a shot! Feel free to reach out to report any bugs, questions or suggestions.
03 Oct 2025 10:00pm GMT
Iago Toral: XDC 2025
It has been a while since my last post, I know. Today I just want to thank Igalia for continuing to give me and many other Igalians the opportunity to attend XDC. I had a great time in Vienna where I was able to catch up with other Mesa developers (including Igalians!) I rarely have the opportunity to see face to face. It is amazing to see how Mesa continues to gain traction and interest year after year, seeing more actors and vendors getting involved in one way or another… the push for open source drivers in the industry is real and it is fantastic to see it happening.
I'd also like to thank the organization, I know all the work that goes into making these things happen, so big thanks to everyone who was involved, and to the speakers, the XDC program is getting better every year.
Looking forward to next year already 
03 Oct 2025 6:58am GMT
30 Sep 2025
planet.freedesktop.org
Hans de Goede: Fedora 43 will ship with FOSS Meteor, Lunar and Arrow Lake MIPI camera support
Good news the just released 6.17 kernel has support for the IPU7 CSI2 receiver and the missing USBIO drivers have recently landed in linux-next. I have backported the USBIO drivers + a few other camera fixes to the Fedora 6.17 kernel.
I've also prepared an updated libcamera-0.5.2 Fedora package with support for IPU7 (Lunar Lake) CSI2 receivers as well as backporting a set of upstream SwStats and AGC fixes, fixing various crashes as well as the bad flicker MIPI camera users have been hitting with libcamera 0.5.2.
Together these 2 updates should make Fedora 43's FOSS MIPI camera support work on most Meteor Lake, Lunar Lake and Arrow Lake laptops!
If you want to give this a try, install / upgrade to Fedora 43 beta and install all updates. If you've installed rpmfusion's binary IPU6 stack please run:
sudo dnf remove akmod-intel-ipu6 'kmod-intel-ipu6*'
to remove it as it may interfere with the FOSS stack and finally reboot. Please first try with qcam:
sudo dnf install libcamera-qcam
qcam
which only tests libcamera and after that give apps which use the camera through pipewire a try like gnome's "Camera" app (snapshot) or video-conferencing in Firefox.
Note snapshot on Lunar Lake triggers a bug in the LNL Vulkan code, to avoid this start snapshot from a terminal with:
GSK_RENDERER=gl snapshot
If you have a MIPI camera which still does not work please file a bug following these instructions and drop me an email with the bugzilla link at hansg@kernel.org.
comments
30 Sep 2025 6:55pm GMT
24 Sep 2025
planet.freedesktop.org
Sebastian Wick: XDG Intents Updates
Andy Holmes wrote an excellent overview of XDG Intents in his "Best Intentions" blog post, covering the foundational concepts and early proposals. Unfortunately, due to GNOME Foundation issues, this work never fully materialized. As I have been running into more and more cases where this would provide a useful primitive for other features, I tried to continue the work.
The specifications have evolved as I worked on implementing them in glib, desktop-file-utils and ptyxis. Here's what's changed:
Intent-Apps Specification
Andy showed this hypothetical syntax for scoped preferences:
[Default Applications]
org.freedesktop.Thumbnailer=org.gimp.GIMP
org.freedesktop.Thumbnailer[image/svg+xml]=org.gnome.Loupe;org.gimp.GIMP
We now use separate groups instead:
[Default Applications]
org.freedesktop.Thumbnailer=org.gimp.GIMP
[org.freedesktop.Thumbnailer]
image/svg+xml=org.gnome.Loupe;org.gimp.GIMP
This approach creates a dedicated group for each intent, with keys representing the scopes. This way, we do not have to abuse the square brackets which were meant for translatable keys and allow only very limited values.
The updated specification also adds support for intent.cache files to improve performance, containing up-to-date lists of applications supporting particular intents and scopes. This is very similar to the already existing cache for MIME types. The update-desktop-database tool is responsible for keeping the cache up-to-date.
This is implemented in glib!4797, desktop-file-utils!27, and the updated specification is in xdg-specs!106.
Terminal Intent Specification
While Andy mentioned the terminal intent as a use case, Zander Brown tried to upstream the intent in xdg-specs!46 multiple years ago. However, because it depended on the intent-apps specification, it unfortunately never went anywhere. With the fleshed-out version of the intent-apps specification, and an implementation in glib, I was able to implement the terminal-intent specification in glib as well. With some help from Christian, we also added support for the intent in the ptyxis terminal.
This revealed some shortcomings in the proposed D-Bus interface. In particular, when a desktop file gets activated with multiple URIs, and the Exec line in the desktop entry only indicates support for a limited number of URIs, multiple commands need to be launched. To support opening those commands in a single window but in multiple tabs in the terminal emulator, for example, those multiple commands must be part of a single D-Bus method call. The resulting D-Bus interface looks like this:
<interface name="org.freedesktop.Terminal1">
<method name="LaunchCommand">
<arg type='aa{sv}' name='commands' direction='in' />
<arg type='ay' name='desktop_entry' direction='in' />
<arg type='a{sv}' name='options' direction='in' />
<arg type='a{sv}' name='platform_data' direction='in' />
</method>
</interface>
This is implemented in glib!4797, ptyxis!119 and the updated specification is in xdg-specs!107.
Deeplink Intent
Andy's post discussed a generic "org.freedesktop.UriHandler" with this example:
[org.freedesktop.UriHandler]
Supports=wise.com;
Patterns=https://*.wise.com/link?urn=urn%3Awise%3Atransfers;
The updated specification introduces a specific org.freedesktop.handler.Deeplink1 intent where the scheme is implicitly http or https and the host comes from the scope (i.e., the Supports part). The pattern matching is done on the path alone:
[org.freedesktop.handler.Deeplink1]
Supports=example.org;extensions.gnome.org
example.org=/login;/test/a?a
extensions.gnome.org=/extension/*/*/install;/extension/*/*/uninstall
This allows us to focus on deeplinking alone and allows the user to set the order of handlers for specific hosts.
In this example, the app would handle the URIs http://example.org/login, http://example.org/test/aba, http://extensions.gnome.org/extension/123456/BestExtension/install and so on.
There is a draft implementation in glib!4833 and the specification is in xdg-specs!109.
Deeplinking Issues and Other Handlers
I am still unsure about the Deeplink1 intent. Does it make sense to allow schemes other than http and https? If yes, how should the priority of applications be determined when opening a URI? How complex does the pattern matching need to be?
Similarly, should we add an org.freedesktop.handler.Scheme1 intent? We currently abuse MIME handlers for this, so it seems like a good idea, but then we need to take backwards compatibility into account. Maybe we can modify update-desktop-database to add entries from org.freedesktop.handler.Scheme1 to mimeapps.list for that?
If we go down that route, is there a reason not to also do the same for MIME handlers and add an org.freedesktop.handler.Mime1 intent for that purpose with the same backwards compatibility mechanism?
Deeplinking to App Locations
While working on this, I noticed that we are not great at allowing linking to locations in our apps. For example, most email clients do not have a way to link to a specific email. Most calendars do not allow referencing a specific event. Some apps do support this. For example, Zotero allows linking to items in the app with URIs of the form zotero://select/items/0_USN95MJC.
Maybe we can improve on this? If all our apps used a consistent scheme and queries (for example xdg-app-org.example.appid:/some/path/in/the/app?name=Example), we could render those links differently and finally have a nice way to link to an email in our calendar.
This definitely needs more thought, but I do like the idea.
Security Considerations
Allowing apps to describe more thoroughly which URIs they can handle is great, but we also live in a world where security has to be taken into account. If an app wants to handle the URI https://bank.example.org, we better be sure that this app actually is the correct banking app. This unfortunately is not a trivial issue, so I will leave it for the next time.
24 Sep 2025 4:57pm GMT
18 Sep 2025
planet.freedesktop.org
Sebastian Wick: Integrating libdex with GDBus
Writing asynchronous code in C has always been a challenge. Traditional callback-based approaches, including GLib's async/finish pattern, often lead to the so-called callback hell that's difficult to read and maintain. The libdex library offers a solution to this problem, and I recently worked on expanding the integration with GLib's GDBus subsystem.
The Problem with the Sync and Async Patterns
Writing C code involving tasks which can take non-trivial amount of time has traditionally required choosing between two approaches:
- Synchronous calls - Simple to write but block the current thread
- Asynchronous callbacks - Non-blocking but result in callback hell and complex error handling
Often the synchronous variant is chosen to keep the code simple, but in a lot of cases, blocking for potentially multiple seconds is not acceptable. Threads can be used to prevent the other threads from blocking, but it creates parallelism and with it the need for locking. It also can potentially create a huge amount of threads which mostly sit idle.
The asynchronous variant has none of those problems, but consider a typical async D-Bus operation in traditional GLib code:
static void
on_ping_ready (GObject *source_object,
GAsyncResult *res,
gpointer data)
{
g_autofree char *pong = NULL;
if (!dex_dbus_ping_pong_call_ping_finish (DEX_BUS_PING_PONG (source_object),
&pong,
res, NULL))
return; // handle error
g_print ("client: %s\n", pong);
}
static void
on_ping_pong_proxy_ready (GObject *source_object,
GAsyncResult *res,
gpointer data)
{
DexDbusPingPong *pp dex_dbus_ping_pong_proxy_new_finish (res, NULL);
if (!pp)
return; // Handle error
dex_dbus_ping_pong_call_ping (pp, "ping", NULL,
on_ping_ready, NULL);
}
This pattern becomes unwieldy quickly, especially with multiple operations, error handling, shared data and cleanup across multiple callbacks.
What is libdex?
Dex provides Future-based programming for GLib. It provides features for application and library authors who want to structure concurrent code in an easy to manage way. Dex also provides Fibers which allow writing synchronous looking code in C while maintaining the benefits of asynchronous execution.
At its core, libdex introduces two key concepts:
- Futures: Represent values that will be available at some point in the future
- Fibers: Lightweight cooperative threads that allow writing synchronous-looking code that yields control when waiting for asynchronous operations
Futures alone already simplify dealing with asynchronous code by specefying a call chain (dex_future_then(), dex_future_catch(), and dex_future_finally()), or even more elaborate flows (dex_future_all(), dex_future_all_race(), dex_future_any(), and dex_future_first()) at one place, without the typical callback hell. It still requires splitting things into a bunch of functions and potentially moving data through them.
static DexFuture *
lookup_user_data_cb (DexFuture *future,
gpointer user_data)
{
g_autoptr(MyUser) user = NULL;
g_autoptr(GError) error = NULL;
// the future in this cb is already resolved, so this just gets the value
// no fibers involved
user = dex_await_object (future, &error);
if (!user)
return dex_future_new_for_error (g_steal_pointer (&error));
return dex_future_first (dex_timeout_new_seconds (60),
dex_future_any (query_db_server (user),
query_cache_server (user),
NULL),
NULL);
}
static void
print_user_data (void)
{
g_autoptr(DexFuture) future = NULL;
future = dex_future_then (find_user (), lookup_user_data_cb, NULL, NULL);
future = dex_future_then (future, print_user_data_cb, NULL, NULL);
future = dex_future_finally (future, quit_cb, NULL, NULL);
g_main_loop_run (main_loop);
}
The real magic of libdex however lies in fibers and the dex_await() function, which allows you to write code that looks synchronous but executes asynchronously. When you await a future, the current fiber yields control, allowing other work to proceed while waiting for the result.
g_autoptr(MyUser) user = NULL;
g_autoptr(MyUserData) data = NULL;
g_autoptr(GError) error = NULL;
user = dex_await_object (find_user (), &error);
if (!user)
return dex_future_new_for_error (g_steal_pointer (&error));
data = dex_await_boxed (dex_future_first (dex_timeout_new_seconds (60),
dex_future_any (query_db_server (user),
query_cache_server (user),
NULL),
NULL), &error);
if (!data)
return dex_future_new_for_error (g_steal_pointer (&error));
g_print ("%s", data->name);
Christian Hergert wrote pretty decent documentation, so check it out!
Bridging libdex and GDBus
With the new integration, you can write D-Bus client code that looks like this:
g_autoptr(DexDbusPingPong) *pp = NULL;
g_autoptr(DexDbusPingPongPingResult) result = NULL;
pp = dex_await_object (dex_dbus_ping_pong_proxy_new_future (connection,
G_DBUS_PROXY_FLAGS_NONE,
"org.example.PingPong",
"/org/example/pingpong"),
&error);
if (!pp)
return dex_future_new_for_error (g_steal_pointer (&error));
res = dex_await_boxed (dex_dbus_ping_pong_call_ping_future (pp, "ping"), &error);
if (!res)
return dex_future_new_for_error (g_steal_pointer (&error));
g_print ("client: %s\n", res->pong);
This code is executing asynchronously, but reads like synchronous code. Error handling is straightforward, and there are no callbacks involved.
On the service side, if enabled, method handlers will run in a fiber and can use dex_await() directly, enabling complex asynchronous operations within service implementations:
static gboolean
handle_ping (DexDbusPingPong *object,
GDBusMethodInvocation *invocation,
const char *ping)
{
g_print ("service: %s\n", ping);
dex_await (dex_timeout_new_seconds (1), NULL);
dex_dbus_ping_pong_complete_ping (object, invocation, "pong");
return G_DBUS_METHOD_INVOCATION_HANDLED;
}
static void
dex_dbus_ping_pong_iface_init (DexDbusPingPongIface *iface)
{
iface->handle_ping = handle_ping;
}
pp = g_object_new (DEX_TYPE_PING_PONG, NULL);
dex_dbus_interface_skeleton_set_flags (DEX_DBUS_INTERFACE_SKELETON (pp),
DEX_DBUS_INTERFACE_SKELETON_FLAGS_HANDLE_METHOD_INVOCATIONS_IN_FIBER);
This method handler includes a 1-second delay, but instead of blocking the entire service, it yields control to other fibers during the timeout.
The merge request contains a complete example of a client and service communicating with each other.
Implementation Details
The integration required extending GDBus's code generation system. Rather than modifying it directly, the current solution introduces a very simple extension system to GDBus' code generation.
The generated code includes:
- Future-returning functions: For every
_proxy_new()and_call_$method()function, corresponding_future()variants are generated - Result types: Method calls return boxed types containing all output parameters
- Custom skeleton base class: Generated skeleton classes inherit from
DexDBusInterfaceSkeletoninstead ofGDBusInterfaceSkeleton, which implements dispatching method handlers in fibers
Besides the GDBus code generation extension system, there are a few more changes required in GLib to make this work. This is not merged at the time of writing, but I'm confident that we can move this forward.
Future Directions
I hope that this work convinces more people to use libdex! We have a whole bunch of existing code bases which will have to stick with C in the foreseeable future, and libdex provides tools to make incremental improvements. Personally, I want to start using in in the xdg-desktop-portal project.
18 Sep 2025 6:58pm GMT
16 Sep 2025
planet.freedesktop.org
Mike Blumenkrantz: Now We CAD
Perf Must Increase.
After my last post, I'm sure everyone was speculating about the forthcoming zink takeover of the CAD industry. Or maybe just wondering why I'm bothering with this at all. Well, the answer is simple: CAD performance is all performance. If I can improve FPS in viewperf, I'm decreasing CPU utilization in all apps, which is generally useful.
As in the previous post, the catia section of viewperf was improved to a whopping 34fps against the reference driver (radeonsi) by eliminating a few hundred thousand atomic operations per frame. An interesting observation here is that while eliminating atomic operations in radeonsi does improve FPS there by ~5% (105fps), there is no bottlenecking, so this does not "unlock" further optimizations in the same way that it does for zink. I speculate this is because zink has radv underneath, which affects memory access across ccx in ways that do not affect radeonsi.
In short: a rising tide lifts all ships in the harbor, but since zink was effectively a sunken ship, it is rising much more than the others.
Even More Improvements
Since that previous post, I and others have been working quietly in the background on other improvements, all of which have landed in mesa main already:
A nice 35% improvement, largely from three MRs:
That's right. In my quest to maximize perf, I have roped in veteran radv developer and part-time vacation enthusiast, Samuel Pitoiset. Because radv is slow. vkoverhead exists to target noticeably slow cases, and by harnessing the forbidden power of rewriting the whole driver, it was possible for a lone Frenchman to significantly reduce bottlenecking during draw emission.
This Isn't Even My Final Form
Obviously. I'm not about to say that I'll only stop when I reach performance parity, but the FPS can still go up.
At this point, however, it's becoming less useful (in zink) to look at flamegraphs. There's only so much optimization that can be done once the code has been simplified to a certain extent, and eventually those optimizations will lead to obfuscated code which is harder to maintain.
Thus, it's time to step back and look architecturally. What is the app doing? How does that reach the driver? Can it be improved?
GALLIUM_TRACE is a great tool for this, as it logs the API stream as it reaches the backend driver, and there are parser tools to convert the output XML to something readable. Let's take a look at a small cross-section of the trace:
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10043], [is_user_buffer = 0, buffer_offset = 7440, buffer.resource = resource_10043]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10044], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10045], [is_user_buffer = 0, buffer_offset = 7632, buffer.resource = resource_10045]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10046], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10047], [is_user_buffer = 0, buffer_offset = 7680, buffer.resource = resource_10047]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10048], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10049], [is_user_buffer = 0, buffer_offset = 7656, buffer.resource = resource_10049]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10050], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10051], [is_user_buffer = 0, buffer_offset = 7752, buffer.resource = resource_10051]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10052], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10053], [is_user_buffer = 0, buffer_offset = 7800, buffer.resource = resource_10053]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10054], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10055], [is_user_buffer = 0, buffer_offset = 7968, buffer.resource = resource_10055]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10056], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10057], [is_user_buffer = 0, buffer_offset = 7968, buffer.resource = resource_10057]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10058], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10059], [is_user_buffer = 0, buffer_offset = 8136, buffer.resource = resource_10059]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10060], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10061], [is_user_buffer = 0, buffer_offset = 8280, buffer.resource = resource_10061]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10062], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10063], [is_user_buffer = 0, buffer_offset = 8040, buffer.resource = resource_10063]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10064], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10065], [is_user_buffer = 0, buffer_offset = 7608, buffer.resource = resource_10065]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10066], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
As expected, a huge chunk of the runtime is just set_vertex_buffers -> draw_vbo. Architecturally, this leads to a lot of unavoidably wasted cycles in drivers:
set_vertex_buffers"binds" vertex buffers to the context and flags state updatesdraw_vbochecks all of the driver's update-able states, updates the flagged ones, and then emits draws
But in the scenario where the driver can know ahead of time exactly what states will be updated, couldn't that yield an improvement? For example, bundling these two calls into a single draw call would eliminate:
- "binding" of vertex buffers
- vbo state update flagging
- draw-time validation
- calling multiple driver entrypoints
In theory, it seems like this should be pretty good. And now that vertex buffer lifetimes have been reworked to use explicit ownership rather than garbage collection, it's actually possible to do this. The optimal site for the optimization would be in threaded-context, where similar types of draw merging are already occurring.
The result looks something like this in a comparable trace:
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1141, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 163536, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 191032, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771328, buffer.resource = resource_29602]], draws = [[start = 1141, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1146, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 218528, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 246144, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771360, buffer.resource = resource_29602]], draws = [[start = 1146, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1151, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 273760, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 301496, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771392, buffer.resource = resource_29602]], draws = [[start = 1151, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1156, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 329232, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 357088, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771424, buffer.resource = resource_29602]], draws = [[start = 1156, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1161, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 384944, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 412920, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771456, buffer.resource = resource_29602]], draws = [[start = 1161, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1166, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 440896, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 468992, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771488, buffer.resource = resource_29602]], draws = [[start = 1166, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1171, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 497088, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 525304, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771520, buffer.resource = resource_29602]], draws = [[start = 1171, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1176, max_index = 11, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 553520, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 582000, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771552, buffer.resource = resource_29602]], draws = [[start = 1176, count = 11]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1187, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 610480, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 639080, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771584, buffer.resource = resource_29602]], draws = [[start = 1187, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1192, max_index = 6, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 667680, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 696424, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771616, buffer.resource = resource_29602]], draws = [[start = 1192, count = 6]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1198, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 725168, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 754032, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771648, buffer.resource = resource_29602]], draws = [[start = 1198, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1203, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 782896, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 811880, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771680, buffer.resource = resource_29602]], draws = [[start = 1203, count = 5]], num_draws = 1)
It's more compact, which is nice, but how does the perf look?
About another 40% improvement, now over 60fps: nearly double the endpoint of the last post. Huge.
And this is driving ecosystem improvements which will affect other apps and games which don't even use zink.
Stay winning, Open Source graphics.
16 Sep 2025 12:00am GMT
15 Sep 2025
planet.freedesktop.org
Dave Airlie (blogspot): radv takes over from AMDVLK
AMD have announced the end of the AMDVLK open driver in favour of focusing on radv for Linux use cases.
When Bas and I started radv in 2016, AMD were promising their own Linux vulkan driver, which arrived in Dec 2017. At this point radv was already shipping in most Linux distros. AMD strategy of having AMDVLK was developed via over the wall open source releases from internal closed development was always going to be a second place option at that point.
When Valve came on board and brought dedicated developer power to radv, and the aco compiler matured, there really was no point putting effort into using AMDVLK which was hard to package and impossible to contribute to meaningfully for external developers.
radv is probably my proudest contribution to the Linux ecosystem, finally disproving years of idiots saying an open source driver could never compete with a vendor provided driver, now it is the vendor provided driver.
I think we will miss the open source PAL repo as a reference source and I hope AMD engineers can bridge that gap, but it's often hard to find workarounds you don't know exist to ask about them. I'm also hoping AMD will add more staffing beyond the current levels especially around hardware enablement and workarounds.
Now onwards to NVK victory :-)
[1] https://github.com/GPUOpen-Drivers/AMDVLK/discussions/416
15 Sep 2025 7:08pm GMT
09 Sep 2025
planet.freedesktop.org
Christian Schaller: More adventures in the land of AI and Open Source
I been doing a lot of work with AI recently, both as part of a couple of projects I am part of at work, but I have also taken a personal interest in understanding the current state and what is possible. My favourite AI tool currently is Claude.ai. Anyway I have a Prusa Core One 3D printer now that I also love playing with and one thing I been wanting to do is to print some multicolor prints with it. So the Prusa Core One is a single extruder printer, which means it only has 1 filament loaded at any given time. Other printers on the market, like the PrusaXL has 5 extruders, so it can have 5 filaments or colors loaded at the same time.
Prusa Single Extruder Multimaterial setting
The thing is that the Prusa Slicer (the slicer is the software that takes a 3d model and prepares the instructions for the printer based on that 3d model) got this feature called Single Extruder Multi Material. And while it is a process that wastes a lot of filament and takes a lot of manual intervention during the print, it does basically work.
What I quickly discovered was that using this feature is non-trivial. First of all I had to manually add some G Code to the model to actually get it to ask me to switch filament for each color in my print, but the bigger issue is that the printer will ask you to change the color or filament, but you have no way of knowing which one to switch to, so for my model I had 15 filament changes and no simple way of knowing which order to switch in. So people where solving this among other things through looking through the print layer by layer and writing down the color changes, but I thought that this must be possible to automate with an application. So I opened Claude and started working on this thing I ended up calling Prusa Color Mate..
So the idea for the application was simple enough, have it analyze the project file, extract information about the order of color changes and display them for the user in a way that allows them to manually check of each color as its inserted. So I started off with doing a simple python script that would just print to the console. So it quickly turned out that the hard part of this project was to parse the input files and it was made worse by my ignorance. So what I learned the hard way is that if you store a project in Prusa Slicer it will use this format called 3mf. So my thought was, lets just analyze the 3mf file and extract the information I need. It took my quite a bit of back and forth with Claude, feeding claude source code from Prusa's implementation and pdf files with specifications, but eventually the application did spit out a list of 15 toolchanges and the colors associated with them. So I happily tried to use it to print my model. I quickly discovered that the color ordering was all wrong. And after even more back and forth with Claude and reading online I realized that the 3mf file is a format for storing 3d models, but that is not what is being fed your 3d printer, instead for the printer the file provided is a bgcode file. And while the 3mf file did contain the information that you had to change filament 15 times, the information on in which order is simply not stored in the 3mf file as that is something chosen as part of composing your print. That print composition file is using a file format called bgcode. So I now had to extract the information from the bgcode file which took me basically a full day to figure out with the help of Claude. I could probably have gotten over the finish line sooner by making some better choices underway, but the extreme optimism of the AI probably lead me to believe it was going to be easier than it was to for instance just do everything in Python.
At first I tried using this libbgcode library written in C++, but I had a lot of issues getting Claude to incorporate it properly into my project, with Meson and CMAKE interaction issues (in retrospect I should have just made a quick RPM of libbgcode and used that). After a lot of struggles with this Claude thought that parsing the bgcode file in python natively would be easier than trying to use the C++ library, so I went down that route. I started by feeding Claude a description of the format that I found online and asked it to write me a parser for it. It didn't work very well and I ended up having a lot of back and forth, testing and debugging, finding more documentation, including a blog post about this meatpack format used inside the file, but it still didn't really work very well. In the end what probably helped the most was asking it to use the relevant files from libbgcode and Prusa Slicer as documentation, because even if that too took a lot of back and forth, eventually I had a working application that was able to extract the tool change data and associated colors from the file. I ended up using one external dependency which was the heatshrink2 library that I PIP installed, but while that worked correctly, it took a look time for me and Claude to figure out exactly what parameters to feed it to work with the Prusa generated file.
Screenshot of Prusa Color Mate
So know I had the working application going and was able to verify it with my first print. I even polished it up a little, by also adding detection of the manual filament change code, so that people who try to use the application will be made aware they need to add that through Prusa Slicer. Maybe I could bake that into the tool, but atm I got only bgcode decoders, not encoders, in my project.
Warning showed for missing G Code
Dialog that gives detailed instructions for how to add G Code
So to conclude, it probably took me 2.5 days to write this application using Claude, it is a fairly niche tool, so I don't expect a lot of users, but I made it to solve a problem for myself. If I had to write this pre-AI myself it would have taken me weeks, like figuring out the different formats and how library APIs worked etc. would have taken me a long time. So I am not an especially proficient coder, so a better coder than me could probably put this together quicker than I would, but I think this is part of what I think will change with AI, that even with limited time and technical skills you can put together simple applications like this to solve your own problems.
If you are a Prusa Core One user and would like to play with multicolor prints you can find Prusa Color Mate on Gitlab. I have not tested it on any other system or printer than my own, so I don't even know if it will work with other non-Core One Prusa printers. There are rpms for Fedora you can download in the packaging directory of the gitlab repo, which also includes a RPM for the heatshrink2 library.
As for future plans for this application I don't really have any. It solves my issue the way it is today, but if there turns out to be an interested user community out there maybe I will try to clean it up and create a proper flatpak for it.
09 Sep 2025 2:39pm GMT
Mike Blumenkrantz: Big Lifts
New Record
For months now I've been writing increasingly unhinged patchsets. Sometimes it might even seem like there is no real point to what I'm doing. Or that I'm just churning code to have something to do.
But I'm here today to tell you that finally, the long journey is over.
We have reached the promised land of perf.
Huge.
Many months ago, I began examining viewperf, AKA the final frontier of driver performance. What makes this the final frontier? some of you might be asking.
Imagine an application which does 10,000 individual draws per frame, each with their own vertex buffer bindings. That's a lot of draws.
Now imagine an application which does ten times that many draws per frame. This is viewperf, which represents common use cases of CAD-adjacent technologies. Where other applications might hammer on the GPU, viewperf tests the CPU utilization. It's what separates the real developers from average, sane people.
So all those months ago, I ran viewperf on zink, and I ended up here:
18fps. This is on threadripper 5975WX with RADV; not the most modern or powerful CPU, but it's still pretty quick.
Then I loaded up radeonsi and got 100fps. Brutal.
Plumbing The Abyss
Examining this was where I entered into into realms of insanity not known to mere mortals. perf started to fail and give confusing results, other profilers just drew a circle around the driver and pointed to the whole thing as the problem area, and some tools just gave up entirely. No changes affected the performance in any way. This is when the savvy hacker begins profiling by elimination: delete as much code as possible and try to force changes.
Thus, I deleted a lot of code to see what would pop out, and eventually I discovered the horrifying truth: I was being bottlenecked by the sheer number of atomic operations occurring.
Like I said before, viewperf does upwards of 100,000 draw calls per frame. This means 100,000 draw calls, 100,000 vertex buffer binds (times two because there are two vertex buffers), 100,000 index buffer binds, and a few shader changes sprinkled in. The way that mesa/gallium work means that every single vertex buffer and index buffer which get sent to the driver incur multiple atomic operations (each) along the way for refcounting: because gallium uses refcounting rather than an ownership model since it is much easier to manage. That means we're talking about upwards of 300,000 atomic operations per frame.
Unfortunately, hackily deleting all the refcounting made the FPS go brrrrr, and it was a long road to legitimately get there. A very, very long road. Six months, in fact. But all the unhinged MRs above landed, reducing the surface area of the refcounting to just buffers, which put me in a position to do this pro gamer move where I also am removing all the refcounting from the buffers.
This works, roughly speaking, by enforcing ownership on the buffers and then releasing them when they are no longer used. Sounds simple, but plumbing it through all the gallium drivers without breaking everything was less so. Let's see where moving to that model gets the numbers:
One more frame. Tremendous.
But wait, there's more. The other part of that MR further deletes all the refcounting in zink for buffers, fully removing the atomics. And…
Blammo, that doubles the perf and manages to eliminate the bottleneck, which sets the stage for further improvements. The gap is still large, but it's about to close real fast.
Shout out to Marek for heroically undertaking the review of this leviathan.
09 Sep 2025 12:00am GMT
05 Sep 2025
planet.freedesktop.org
Mike Blumenkrantz: Mesh Shader Progress
VKCTS Tests: 27,890 | GLCTS Tests: 227 | Percentage of Vulkan Drivers With Mesh Bugs: 100%
05 Sep 2025 12:00am GMT
03 Sep 2025
planet.freedesktop.org
Hans de Goede: Leaving Red Hat
After 17 years I feel that it is time to change things up a bit and for a new challenge. I'm leaving Red Hat and my last day at Red Hat will be October 31st.
I would like to thank Red Hat for the opportunity to work on many interesting open-source projects during my time at Red Hat and for all the things I've learned while at Red Hat.
I want to use this opportunity to thank everyone I've worked with, both my great Red Hat colleagues, as well as everyone from the community for all the good times during the last 17 years.
I've a pretty good idea of what will come next, but this is not set in stone yet. I definitely will continue to work on open-source and on Linux hw-enablement.
comments
03 Sep 2025 6:46pm GMT








