18 Sep 2025

Sebastian Wick: Integrating libdex with GDBus

Writing asynchronous code in C has always been a challenge. Traditional callback-based approaches, including GLib's async/finish pattern, often lead to the so-called callback hell: code that is difficult to read and maintain. The libdex library offers a solution to this problem, and I recently worked on expanding its integration with GLib's GDBus subsystem.

The Problem with the Sync and Async Patterns

Writing C code involving tasks which can take a non-trivial amount of time has traditionally required choosing between two approaches:

  1. Synchronous calls - Simple to write but block the current thread
  2. Asynchronous callbacks - Non-blocking but result in callback hell and complex error handling

Often the synchronous variant is chosen to keep the code simple, but in a lot of cases, blocking for potentially multiple seconds is not acceptable. Threads can be used to keep other threads from blocking, but that introduces parallelism and with it the need for locking. It can also result in a huge number of threads which mostly sit idle.

The asynchronous variant has none of those problems, but consider a typical async D-Bus operation in traditional GLib code:

static void
on_ping_ready (GObject      *source_object,
               GAsyncResult *res,
               gpointer      data)
{
  g_autofree char *pong = NULL;

  if (!dex_dbus_ping_pong_call_ping_finish (DEX_DBUS_PING_PONG (source_object),
                                            &pong,
                                            res, NULL))
    return; // handle error

  g_print ("client: %s\n", pong);
}

static void
on_ping_pong_proxy_ready (GObject      *source_object,
                          GAsyncResult *res,
                          gpointer      data)
{
  DexDbusPingPong *pp = dex_dbus_ping_pong_proxy_new_finish (res, NULL);
  if (!pp)
    return; // Handle error

  dex_dbus_ping_pong_call_ping (pp, "ping", NULL,
                                on_ping_ready, NULL);
}

This pattern becomes unwieldy quickly, especially with multiple operations, error handling, shared data and cleanup across multiple callbacks.

What is libdex?

Dex provides Future-based programming for GLib, aimed at application and library authors who want to structure concurrent code in an easy-to-manage way. Dex also provides Fibers, which allow writing synchronous-looking code in C while maintaining the benefits of asynchronous execution.

At its core, libdex introduces two key concepts:

  1. Futures - placeholders for a result that will resolve or reject at some point
  2. Fibers - lightweight execution contexts which can await futures while the code reads as if it were synchronous

Futures alone already simplify dealing with asynchronous code by specifying a call chain (dex_future_then(), dex_future_catch(), and dex_future_finally()), or even more elaborate flows (dex_future_all(), dex_future_all_race(), dex_future_any(), and dex_future_first()) in one place, without the typical callback hell. It still requires splitting things into a bunch of functions and potentially moving data through them.

static DexFuture *
lookup_user_data_cb (DexFuture *future,
                     gpointer   user_data)
{
  g_autoptr(MyUser) user = NULL;
  g_autoptr(GError) error = NULL;

  // the future in this cb is already resolved, so this just gets the value
  // no fibers involved 
  user = dex_await_object (future, &error);
  if (!user)
    return dex_future_new_for_error (g_steal_pointer (&error));

  return dex_future_first (dex_timeout_new_seconds (60),
                           dex_future_any (query_db_server (user),
                                           query_cache_server (user),
                                           NULL),
                           NULL);
}

static void
print_user_data (void)
{
  g_autoptr(DexFuture) future = NULL;

  future = dex_future_then (find_user (), lookup_user_data_cb, NULL, NULL);
  future = dex_future_then (future, print_user_data_cb, NULL, NULL);
  future = dex_future_finally (future, quit_cb, NULL, NULL);

  g_main_loop_run (main_loop);
}

The real magic of libdex however lies in fibers and the dex_await() function, which allows you to write code that looks synchronous but executes asynchronously. When you await a future, the current fiber yields control, allowing other work to proceed while waiting for the result.

g_autoptr(MyUser) user = NULL;
g_autoptr(MyUserData) data = NULL;
g_autoptr(GError) error = NULL;

user = dex_await_object (find_user (), &error);
if (!user)
  return dex_future_new_for_error (g_steal_pointer (&error));

data = dex_await_boxed (dex_future_first (dex_timeout_new_seconds (60),
                                          dex_future_any (query_db_server (user),
                                                          query_cache_server (user),
                                                          NULL),
                                          NULL), &error);
if (!data)
  return dex_future_new_for_error (g_steal_pointer (&error));

g_print ("%s", data->name);

Christian Hergert wrote pretty decent documentation, so check it out!
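
As a side note, code that uses dex_await() has to run inside a fiber. Here is a minimal sketch of how the snippet above could be wrapped into one; the function name print_user_data_fiber is made up, and I'm assuming the usual libdex entry points dex_init(), dex_scheduler_get_default() and dex_scheduler_spawn():

#include <libdex.h>

// a fiber: the awaiting code from the snippet above would live here
static DexFuture *
print_user_data_fiber (gpointer user_data)
{
  // ... dex_await_object () / dex_await_boxed () calls go here ...
  return dex_future_new_true ();
}

int
main (int argc, char *argv[])
{
  g_autoptr(GMainLoop) main_loop = NULL;
  g_autoptr(DexFuture) future = NULL;

  dex_init ();
  main_loop = g_main_loop_new (NULL, FALSE);

  // spawn the fiber on the default scheduler; 0 asks for the default stack size
  future = dex_scheduler_spawn (dex_scheduler_get_default (), 0,
                                print_user_data_fiber, NULL, NULL);

  g_main_loop_run (main_loop);

  return 0;
}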

Bridging libdex and GDBus

With the new integration, you can write D-Bus client code that looks like this:

g_autoptr(DexDbusPingPong) pp = NULL;
g_autoptr(DexDbusPingPongPingResult) res = NULL;
g_autoptr(GError) error = NULL;

pp = dex_await_object (dex_dbus_ping_pong_proxy_new_future (connection,
                                                            G_DBUS_PROXY_FLAGS_NONE,
                                                            "org.example.PingPong",
                                                            "/org/example/pingpong"),
                       &error);
if (!pp)
  return dex_future_new_for_error (g_steal_pointer (&error));

res = dex_await_boxed (dex_dbus_ping_pong_call_ping_future (pp, "ping"), &error);
if (!res)
  return dex_future_new_for_error (g_steal_pointer (&error));

g_print ("client: %s\n", res->pong);

This code is executing asynchronously, but reads like synchronous code. Error handling is straightforward, and there are no callbacks involved.

On the service side, if enabled, method handlers will run in a fiber and can use dex_await() directly, enabling complex asynchronous operations within service implementations:

static gboolean
handle_ping (DexDbusPingPong       *object,
             GDBusMethodInvocation *invocation,
             const char            *ping)
{
  g_print ("service: %s\n", ping);

  dex_await (dex_timeout_new_seconds (1), NULL);
  dex_dbus_ping_pong_complete_ping (object, invocation, "pong");

  return G_DBUS_METHOD_INVOCATION_HANDLED;
}

static void
dex_dbus_ping_pong_iface_init (DexDbusPingPongIface *iface)
{
  iface->handle_ping = handle_ping;
}

pp = g_object_new (DEX_TYPE_PING_PONG, NULL);
dex_dbus_interface_skeleton_set_flags (DEX_DBUS_INTERFACE_SKELETON (pp),
                                       DEX_DBUS_INTERFACE_SKELETON_FLAGS_HANDLE_METHOD_INVOCATIONS_IN_FIBER);

This method handler includes a 1-second delay, but instead of blocking the entire service, it yields control to other fibers during the timeout.
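
For completeness, the skeleton still has to be exported on a connection like any other GDBus skeleton. Continuing the snippet above, and assuming a connection obtained elsewhere plus the object path from the client example, a minimal sketch could be:

g_autoptr(GError) error = NULL;

// export the skeleton so clients can reach it; the object path mirrors the client example
if (!g_dbus_interface_skeleton_export (G_DBUS_INTERFACE_SKELETON (pp),
                                       connection,
                                       "/org/example/pingpong",
                                       &error))
  g_warning ("failed to export skeleton: %s", error->message);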

The merge request contains a complete example of a client and service communicating with each other.

Implementation Details

The integration required extending GDBus's code generation system. Rather than modifying it directly, the current solution introduces a very simple extension system to GDBus' code generation.

The generated code includes future-returning variants of the proxy constructors and method calls, boxed result types for the method return values, and interface skeleton support for running method handlers in fibers, as seen in the examples above.

Besides the GDBus code generation extension system, there are a few more changes required in GLib to make this work. This is not merged at the time of writing, but I'm confident that we can move this forward.

Future Directions

I hope that this work convinces more people to use libdex! We have a whole bunch of existing code bases which will have to stick with C for the foreseeable future, and libdex provides tools to make incremental improvements. Personally, I want to start using it in the xdg-desktop-portal project.

18 Sep 2025 6:58pm GMT

16 Sep 2025

Mike Blumenkrantz: Now We CAD

Perf Must Increase.

After my last post, I'm sure everyone was speculating about the forthcoming zink takeover of the CAD industry. Or maybe just wondering why I'm bothering with this at all. Well, the answer is simple: CAD performance is all performance. If I can improve FPS in viewperf, I'm decreasing CPU utilization in all apps, which is generally useful.

As in the previous post, the catia section of viewperf was improved to a whopping 34fps against the reference driver (radeonsi) by eliminating a few hundred thousand atomic operations per frame. An interesting observation here is that while eliminating atomic operations in radeonsi does improve FPS there by ~5% (105fps), there is no bottlenecking, so this does not "unlock" further optimizations in the same way that it does for zink. I speculate this is because zink has radv underneath, which affects memory access across ccx in ways that do not affect radeonsi.

In short: a rising tide lifts all ships in the harbor, but since zink was effectively a sunken ship, it is rising much more than the others.

Even More Improvements

Since that previous post, I and others have been working quietly in the background on other improvements, all of which have landed in mesa main already:

catia-quietly.png

A nice 35% improvement, largely from three MRs:

That's right. In my quest to maximize perf, I have roped in veteran radv developer and part-time vacation enthusiast, Samuel Pitoiset. Because radv is slow. vkoverhead exists to target noticeably slow cases, and by harnessing the forbidden power of rewriting the whole driver, it was possible for a lone Frenchman to significantly reduce bottlenecking during draw emission.

This Isn't Even My Final Form

Obviously. I'm not about to say that I'll only stop when I reach performance parity, but the FPS can still go up.

At this point, however, it's becoming less useful (in zink) to look at flamegraphs. There's only so much optimization that can be done once the code has been simplified to a certain extent, and eventually those optimizations will lead to obfuscated code which is harder to maintain.

Thus, it's time to step back and look architecturally. What is the app doing? How does that reach the driver? Can it be improved?

GALLIUM_TRACE is a great tool for this, as it logs the API stream as it reaches the backend driver, and there are parser tools to convert the output XML to something readable. Let's take a look at a small cross-section of the trace:

pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10043], [is_user_buffer = 0, buffer_offset = 7440, buffer.resource = resource_10043]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10044], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10045], [is_user_buffer = 0, buffer_offset = 7632, buffer.resource = resource_10045]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10046], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10047], [is_user_buffer = 0, buffer_offset = 7680, buffer.resource = resource_10047]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10048], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10049], [is_user_buffer = 0, buffer_offset = 7656, buffer.resource = resource_10049]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10050], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10051], [is_user_buffer = 0, buffer_offset = 7752, buffer.resource = resource_10051]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10052], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10053], [is_user_buffer = 0, buffer_offset = 7800, buffer.resource = resource_10053]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10054], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10055], [is_user_buffer = 0, buffer_offset = 7968, buffer.resource = resource_10055]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10056], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10057], [is_user_buffer = 0, buffer_offset = 7968, buffer.resource = resource_10057]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10058], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10059], [is_user_buffer = 0, buffer_offset = 8136, buffer.resource = resource_10059]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10060], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10061], [is_user_buffer = 0, buffer_offset = 8280, buffer.resource = resource_10061]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10062], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10063], [is_user_buffer = 0, buffer_offset = 8040, buffer.resource = resource_10063]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10064], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)
pipe_context::set_vertex_buffers(pipe = context_2, num_buffers = 2, buffers = [[is_user_buffer = 0, buffer_offset = 0, buffer.resource = resource_10065], [is_user_buffer = 0, buffer_offset = 7608, buffer.resource = resource_10065]])
pipe_context::draw_vbo(pipe = context_2, info = [index_size = 2, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 0, max_index = 1257, primitive_restart = 0, restart_index = 0, index.resource = resource_10066], drawid_offset = 0, indirect = NULL, draws = [[start = 0, count = 1257, index_bias = 0]], num_draws = 1)

As expected, a huge chunk of the runtime is just set_vertex_buffers -> draw_vbo. Architecturally, this leads to a lot of unavoidably wasted cycles in drivers:

But in the scenario where the driver can know ahead of time exactly what states will be updated, couldn't that yield an improvement? For example, bundling these two calls into a single draw call would eliminate:

In theory, it seems like this should be pretty good. And now that vertex buffer lifetimes have been reworked to use explicit ownership rather than garbage collection, it's actually possible to do this. The optimal site for the optimization would be in threaded-context, where similar types of draw merging are already occurring.

The result looks something like this in a comparable trace:

pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1141, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 163536, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 191032, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771328, buffer.resource = resource_29602]], draws = [[start = 1141, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1146, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 218528, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 246144, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771360, buffer.resource = resource_29602]], draws = [[start = 1146, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1151, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 273760, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 301496, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771392, buffer.resource = resource_29602]], draws = [[start = 1151, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1156, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 329232, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 357088, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771424, buffer.resource = resource_29602]], draws = [[start = 1156, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1161, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 384944, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 412920, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771456, buffer.resource = resource_29602]], draws = [[start = 1161, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1166, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 440896, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 468992, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771488, buffer.resource = resource_29602]], draws = [[start = 1166, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1171, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 497088, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 525304, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771520, buffer.resource = resource_29602]], draws = [[start = 1171, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1176, max_index = 11, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 553520, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 582000, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771552, buffer.resource = resource_29602]], draws = [[start = 1176, count = 11]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1187, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 610480, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 639080, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771584, buffer.resource = resource_29602]], draws = [[start = 1187, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1192, max_index = 6, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 667680, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 696424, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771616, buffer.resource = resource_29602]], draws = [[start = 1192, count = 6]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1198, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 725168, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 754032, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771648, buffer.resource = resource_29602]], draws = [[start = 1198, count = 5]], num_draws = 1)
pipe_context::draw_vbo_buffers(pipe = pipe_2, info = [index_size = 0, has_user_indices = 0, mode = 5, start_instance = 0, instance_count = 1, min_index = 1203, max_index = 5, primitive_restart = 0, restart_index = 0, index.resource = NULL], buffer_count = 3, buffers = [[is_user_buffer = 0, buffer_offset = 782896, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 811880, buffer.resource = resource_30210], [is_user_buffer = 0, buffer_offset = 771680, buffer.resource = resource_29602]], draws = [[start = 1203, count = 5]], num_draws = 1)

It's more compact, which is nice, but how does the perf look?

catia-vroom.png

About another 40% improvement, now over 60fps: nearly double the endpoint of the last post. Huge.

And this is driving ecosystem improvements which will affect other apps and games which don't even use zink.

Stay winning, Open Source graphics.

16 Sep 2025 12:00am GMT

15 Sep 2025

Dave Airlie (blogspot): radv takes over from AMDVLK


AMD have announced the end of the AMDVLK open driver in favour of focusing on radv for Linux use cases.

When Bas and I started radv in 2016, AMD were promising their own Linux Vulkan driver, which arrived in Dec 2017. At that point radv was already shipping in most Linux distros. AMD's strategy of developing AMDVLK via over-the-wall open source releases from internal closed development meant it was always going to be a second-place option.

When Valve came on board and brought dedicated developer power to radv, and the aco compiler matured, there really was no point putting effort into using AMDVLK, which was hard to package and practically impossible for external developers to contribute to meaningfully.

radv is probably my proudest contribution to the Linux ecosystem, finally disproving years of idiots saying an open source driver could never compete with a vendor-provided driver: now it is the vendor-provided driver.

I think we will miss the open source PAL repo as a reference source and I hope AMD engineers can bridge that gap, but it's often hard to ask about workarounds you don't know exist. I'm also hoping AMD will add more staffing beyond the current levels, especially around hardware enablement and workarounds.

Now onwards to NVK victory :-)

[1] https://github.com/GPUOpen-Drivers/AMDVLK/discussions/416

15 Sep 2025 7:08pm GMT

09 Sep 2025

Christian Schaller: More adventures in the land of AI and Open Source

I've been doing a lot of work with AI recently, both as part of a couple of projects at work and because I have taken a personal interest in understanding the current state and what is possible. My favourite AI tool currently is Claude.ai. Anyway, I now have a Prusa Core One 3D printer that I also love playing with, and one thing I have been wanting to do is print some multicolor prints with it. The Prusa Core One is a single extruder printer, which means it only has one filament loaded at any given time. Other printers on the market, like the PrusaXL with its 5 extruders, can have 5 filaments or colors loaded at the same time.

Prusa Single Extruder Multimaterial setting


The thing is that the Prusa Slicer (the slicer is the software that takes a 3D model and prepares the instructions for the printer based on it) has a feature called Single Extruder Multi Material. And while the process wastes a lot of filament and takes a lot of manual intervention during the print, it does basically work.

What I quickly discovered was that using this feature is non-trivial. First of all, I had to manually add some G Code to the model to actually get it to ask me to switch filament for each color in my print. The bigger issue is that while the printer will ask you to change the filament, you have no way of knowing which color to switch to; for my model I had 15 filament changes and no simple way of knowing which order to switch in. People were solving this by, among other things, going through the print layer by layer and writing down the color changes, but I thought this must be possible to automate with an application. So I opened Claude and started working on the thing I ended up calling Prusa Color Mate.

The idea for the application was simple enough: have it analyze the project file, extract information about the order of color changes, and display them for the user in a way that lets them manually check off each color as it is inserted. I started off with a simple Python script that would just print to the console. It quickly turned out that the hard part of this project was parsing the input files, and it was made worse by my ignorance. What I learned the hard way is that if you store a project in Prusa Slicer, it uses a format called 3mf. So my thought was: let's just analyze the 3mf file and extract the information I need. It took me quite a bit of back and forth with Claude, feeding it source code from Prusa's implementation and PDF files with specifications, but eventually the application did spit out a list of 15 toolchanges and the colors associated with them. So I happily tried to use it to print my model, and quickly discovered that the color ordering was all wrong. After even more back and forth with Claude and reading online, I realized that 3mf is a format for storing 3D models, but that is not what is being fed to your 3D printer; the printer instead gets a bgcode file. And while the 3mf file did contain the information that you had to change filament 15 times, the order of the changes is simply not stored in the 3mf file, as that is something chosen as part of composing your print. That print composition file uses a format called bgcode. So I now had to extract the information from the bgcode file, which took me basically a full day to figure out with the help of Claude. I could probably have gotten over the finish line sooner by making some better choices along the way, but the extreme optimism of the AI probably led me to believe that, for instance, just doing everything in Python would be easier than it turned out to be.
At first I tried using the libbgcode library written in C++, but I had a lot of issues getting Claude to incorporate it properly into my project, with Meson and CMake interaction issues (in retrospect I should have just made a quick RPM of libbgcode and used that). After a lot of struggles with this, Claude thought that natively parsing the bgcode file in Python would be easier than trying to use the C++ library, so I went down that route. I started by feeding Claude a description of the format that I found online and asked it to write me a parser for it. It didn't work very well and I ended up having a lot of back and forth, testing and debugging, and finding more documentation, including a blog post about the meatpack format used inside the file, but it still didn't really work very well. In the end what probably helped the most was asking it to use the relevant files from libbgcode and Prusa Slicer as documentation; even if that too took a lot of back and forth, eventually I had a working application that was able to extract the tool change data and associated colors from the file. I ended up using one external dependency, the heatshrink2 library that I pip installed, but while that worked correctly, it took a long time for me and Claude to figure out exactly what parameters to feed it to work with the Prusa generated file.

Screenshot of Prusa Color Mate

So now I had the working application going and was able to verify it with my first print. I even polished it up a little by adding detection of the manual filament change code, so that people who try to use the application are made aware they need to add it through Prusa Slicer. Maybe I could bake that into the tool, but at the moment I only have bgcode decoders, not encoders, in my project.

Missing G Code warning

Warning shown for missing G Code; a dialog gives detailed instructions for how to add it

To conclude, it probably took me 2.5 days to write this application using Claude. It is a fairly niche tool, so I don't expect a lot of users, but I made it to solve a problem for myself. If I had to write this myself pre-AI it would have taken me weeks; figuring out the different formats and how the library APIs worked would have taken a long time on its own. I am not an especially proficient coder, so a better coder could probably put this together quicker than I did, but I think this is part of what will change with AI: even with limited time and technical skills you can put together simple applications like this to solve your own problems.

If you are a Prusa Core One user and would like to play with multicolor prints, you can find Prusa Color Mate on GitLab. I have not tested it on any other system or printer than my own, so I don't even know if it will work with other non-Core One Prusa printers. There are RPMs for Fedora you can download in the packaging directory of the GitLab repo, which also includes an RPM for the heatshrink2 library.

As for future plans for this application I don't really have any. It solves my issue the way it is today, but if there turns out to be an interested user community out there maybe I will try to clean it up and create a proper flatpak for it.

09 Sep 2025 2:39pm GMT

Mike Blumenkrantz: Big Lifts

New Record

For months now I've been writing increasingly unhinged patchsets. Sometimes it might even seem like there is no real point to what I'm doing. Or that I'm just churning code to have something to do.

But I'm here today to tell you that finally, the long journey is over.

We have reached the promised land of perf.

Huge.

Many months ago, I began examining viewperf, AKA the final frontier of driver performance. What makes this the final frontier? some of you might be asking.

Imagine an application which does 10,000 individual draws per frame, each with their own vertex buffer bindings. That's a lot of draws.

Now imagine an application which does ten times that many draws per frame. This is viewperf, which represents common use cases of CAD-adjacent technologies. Where other applications might hammer on the GPU, viewperf tests the CPU utilization. It's what separates the real developers from average, sane people.

So all those months ago, I ran viewperf on zink, and I ended up here:

catia-before.png

18fps. This is on a Threadripper 5975WX with RADV; not the most modern or powerful CPU, but it's still pretty quick.

Then I loaded up radeonsi and got 100fps. Brutal.

Plumbing The Abyss

Examining this was where I entered into realms of insanity not known to mere mortals. perf started to fail and give confusing results, other profilers just drew a circle around the driver and pointed to the whole thing as the problem area, and some tools just gave up entirely. No changes affected the performance in any way. This is when the savvy hacker begins profiling by elimination: delete as much code as possible and try to force changes.

Thus, I deleted a lot of code to see what would pop out, and eventually I discovered the horrifying truth: I was being bottlenecked by the sheer number of atomic operations occurring.

Like I said before, viewperf does upwards of 100,000 draw calls per frame. This means 100,000 draw calls, 100,000 vertex buffer binds (times two because there are two vertex buffers), 100,000 index buffer binds, and a few shader changes sprinkled in. The way that mesa/gallium works means that every single vertex buffer and index buffer sent to the driver incurs multiple atomic operations (each) along the way for refcounting: gallium uses refcounting rather than an ownership model because it is much easier to manage. That means we're talking about upwards of 300,000 atomic operations per frame.
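
To make that cost concrete, here is a generic sketch (made-up names, not the actual gallium code) of why a refcounted bind pays for atomics on every call while an ownership hand-off does not:

#include <stdatomic.h>
#include <stdlib.h>

struct buffer {
  atomic_int refcount;
  // ... buffer storage ...
};

// refcounted model: every bind takes a new reference and drops the old one,
// costing atomic operations each time
static void
bind_refcounted (struct buffer **slot, struct buffer *buf)
{
  atomic_fetch_add (&buf->refcount, 1);
  if (*slot && atomic_fetch_sub (&(*slot)->refcount, 1) == 1)
    free (*slot);
  *slot = buf;
}

// ownership model: the caller hands its reference over, so the bind itself
// needs no atomics; the previously bound buffer is released on replacement
static void
bind_owned (struct buffer **slot, struct buffer *buf)
{
  free (*slot);
  *slot = buf;
}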

Unfortunately, hackily deleting all the refcounting made the FPS go brrrrr, and it was a long road to legitimately get there. A very, very long road. Six months, in fact. But all the unhinged MRs above landed, reducing the surface area of the refcounting to just buffers, which put me in a position to do this pro gamer move where I also am removing all the refcounting from the buffers.

This works, roughly speaking, by enforcing ownership on the buffers and then releasing them when they are no longer used. Sounds simple, but plumbing it through all the gallium drivers without breaking everything was less so. Let's see where moving to that model gets the numbers:

catia-during.png

One more frame. Tremendous.

But wait, there's more. The other part of that MR further deletes all the refcounting in zink for buffers, fully removing the atomics. And…

catia-after.png

Blammo, that doubles the perf and manages to eliminate the bottleneck, which sets the stage for further improvements. The gap is still large, but it's about to close real fast.

Shout out to Marek for heroically undertaking the review of this leviathan.

09 Sep 2025 12:00am GMT

05 Sep 2025

Mike Blumenkrantz: Mesh Shader Progress

VKCTS Tests: 27,890 | GLCTS Tests: 227 | Percentage of Vulkan Drivers With Mesh Bugs: 100%

meshcts.png

05 Sep 2025 12:00am GMT

03 Sep 2025

Hans de Goede: Leaving Red Hat

After 17 years I feel that it is time to change things up a bit and for a new challenge. I'm leaving Red Hat and my last day at Red Hat will be October 31st.

I would like to thank Red Hat for the opportunity to work on many interesting open-source projects during my time at Red Hat and for all the things I've learned while at Red Hat.

I want to use this opportunity to thank everyone I've worked with, both my great Red Hat colleagues, as well as everyone from the community for all the good times during the last 17 years.

I've a pretty good idea of what will come next, but this is not set in stone yet. I definitely will continue to work on open-source and on Linux hw-enablement.

03 Sep 2025 6:46pm GMT

29 Aug 2025

Mike Blumenkrantz: Tiler Improvements

Super Late Code

Meant to blog about this last quarter, but somehow another two months went by and here we are.

A while back, I did some work to improve zink performance on tiling GPUs. Namely this entailed adding renderpass tracking into threaded-context, and also implementing command stream reordering, and inlining swapchain resolves, and framebuffer discards, and actually maybe it's more than just "some" work. All of this amounted to improved performance by reducing memory bandwidth.

How much improved performance? All of it.

And then, around two months ago, a colleague told me he was no longer going to use zink on his tiling GPU.

Devastated

Some of you noticed that the blog has gone quiet in recent times. I'm going to take this opportunity to foist all the blame onto that colleague: to preserve his identity, let's just call him Gabe.

Gabe came to me a few months ago and told me zink was too slow. Vulkan was better. Faster. More "reliable".

I said there's no way that could be true; I've put way more bugs into Vulkan than I have into zink.

Unblinking, he stared at me across the digital divide. I task-switched to important whitespace cleanups.

Time passed, and I pulled myself together. I compiled some app traces. Analyzed them. Did some deep thinking. There was one place where zink indeed could be less performant than this "Vulkan" thing. The final frontier of driver performance. Some call it graphics heaven.

I call it hell.

Web Browsers

Chrome is the web browser, and, statistically, everyone uses it. It ships on desktops and phones, embeds in apps, and even allows you to read this blog. Haters will say No I uSe FiReFoX, but they may as well be Netscape users in the year 2000.

In the past, Chrome defaulted to using GL, which made testing easy. Now, however, --disable-features=Vulkan is needed to return to the comfort of an API so reliable it no longer receives versioned updates. Looking at an apitrace of Chrome, I saw a disturbing rendering pattern that went something like this:

In this case, zink would correctly inline the FBO3/swapchain resolve at the end, but the intermediate multisampled rendering on FBO1 would pay the full performance penalty of storing the multisampled image data and then loading it again for the separate resolve operation.

I'd like to say it was simple to inline this intermediate resolve. That I just slapped a single MR into mesa and it magically worked. Unfortunately, nothing is ever that simple. There were minor fixups all over the place. And this brought me to the real insanity.

Chrome has bugs too.

Literal Hell

Let's take a concrete example: launch Chrome with --disable-features=Vulkan and check out this tiny SVG: chromebug.html

This is most likely what you see:

chromebug-good.png

The reason you see this is because you are on a big, strong desktop GPU which doesn't give a shit about load/store ops or uninitialized GPU memory. You're driving a giant industrial bulldozer on your morning commute: traffic no longer exists and stop signals are fully optional. On a wimpy tiling GPU, however, things are different.

Using a recent version of zink, even on a desktop GPU, you can run the same Chrome browser using ZINK_DEBUG=rp,rploads to enable the same codepaths used by tilers and also clear all uninitialized memory to red. Now load the same SVG, and you'll see this:

chromebug-bad.png

It took nearly a week of pair debugging and a new zink debug mode to prune down test cases and figure out what was happening. All around the composited SVG texture, memory is uninitialized.

But this only shows up on tiling GPUs. And only if the driver is doing near-lethal amounts of very legal renderpass optimizations.

This fast is too fast.

29 Aug 2025 12:00am GMT

26 Aug 2025

Alyssa Rosenzweig: Dissecting the Apple M1 GPU, the end

In 2020, Apple released the M1 with a custom GPU. We got to work reverse-engineering the hardware and porting Linux. Today, you can run Linux on a range of M1 and M2 Macs, with almost all hardware working: wireless, audio, and full graphics acceleration.

Our story begins in December 2020, when Hector Martin kicked off Asahi Linux. I was working at Collabora on Panfrost, the open source Mesa3D driver for Arm Mali GPUs. Hector put out a public call for guidance from upstream open source maintainers, and I bit. I just intended to give some quick pointers. Instead, I bought myself a Christmas present and got to work. In between my university coursework and Collabora work, I poked at the shader instruction set.

One thing led to another. Within a few weeks, I drew a triangle.

In 3D graphics, once you can draw a triangle, you can do anything.

Pretty soon, I started work on a shader compiler. After my final exams that semester, I took a few days off from Collabora to bring up an OpenGL driver capable of spinning gears with my new compiler.

Over the next year, I kept reverse-engineering and improving the driver until it could run 3D games on macOS.

Meanwhile, Asahi Lina wrote a kernel driver for the Apple GPU. My userspace OpenGL driver ran on macOS, leaving her kernel driver as the missing piece for an open source graphics stack. In December 2022, we shipped graphics acceleration in Asahi Linux.

In January 2023, I started my final semester in my Computer Science program at the University of Toronto. For years I juggled my courses with my part-time job and my hobby driver. I faced the same question as my peers: what will I do after graduation?

Maybe Panfrost? I started reverse-engineering the Mali Midgard GPU back in 2017, when I was still in high school. That led to an internship at Collabora in 2019 once I graduated, which turned into my job throughout four years of university. During that time, Panfrost grew from a kid's pet project based on blackbox reverse-engineering to a professional driver engineered by a team with Arm's backing and hardware documentation. I did what I set out to do, and the project succeeded beyond my dreams. It was time to move on.

What did I want to do next?

Panfrost was my challenge until we "won". My next challenge? Gaming on Linux on M1.

Once I finished my coursework, I started full-time on gaming on Linux. Within a month, we shipped OpenGL 3.1 on Asahi Linux. A few weeks later, we passed official conformance for OpenGL ES 3.1. That put us at feature parity with Panfrost. I wanted to go further.

OpenGL (ES) 3.2 requires geometry shaders, a legacy feature not supported by either Arm or Apple hardware. The proprietary OpenGL drivers emulate geometry shaders with compute, but there was no open source prior art to borrow. Even though multiple Mesa drivers need geometry/tessellation emulation, nobody did the work to get there.

My early progress on OpenGL was fast thanks to the mature common code in Mesa. It was time to pay it forward. Over the rest of the year, I implemented geometry/tessellation shader emulation. And also the rest of the owl. In January 2024, I passed conformance for the full OpenGL 4.6 specification, finishing up OpenGL.

Vulkan wasn't too bad, either. I polished the OpenGL driver for a few months, but once I started typing a Vulkan driver, I passed 1.3 conformance in a few weeks.

What remained was wiring up the geometry/tessellation emulation to my shiny new Vulkan driver, since those are required for Direct3D. Et voilà, Proton games.

Along the way, Karol Herbst passed OpenCL 3.0 conformance on the M1, running my compiler atop his "rusticl" frontend.

Meanwhile, when the Vulkan 1.4 specification was published, we were ready and shipped a conformant implementation on the same day.

After that, I implemented sparse texture support, unlocking Direct3D 12 via Proton.

…Now what?

That's a wrap.

We've succeeded beyond my dreams. The challenges I chased, I have tackled. The drivers are fully upstream in Mesa. Performance isn't too bad. With the Vulkan on Apple myth busted, conformant Vulkan is now coming to macOS via LunarG's KosmicKrisp project building on my work.

Satisfied, I am now stepping away from the Apple ecosystem. My friends in the Asahi Linux orbit will carry the torch from here. As for me?

Onto the next challenge!

26 Aug 2025 5:00am GMT

21 Aug 2025

Sebastian Wick: Testing with Portals

At the Linux App Summit (LAS) in Albania three months ago, I gave a talk about testing in the xdg-desktop-portal project. There is a recording of the presentation, and the slides are available as well.

To give a quick summary of the work I did:

The hope I had is that this will result in:

While it's hard to get definite data on those points, at least some of it seems to have become reality. I have seen an increase in activity (there are other factors to this for sure), and a lot of PRs already come with tests without me even having to ask for it. Canonical is involved again, taking care of the Snap side of things. So far it seems like we didn't introduce any new regressions, but this usually shows after a new release. The experience of refactoring portals also became a lot better because there is a baseline level of confidence when the tests pass, as well as the possibility to easily bisect issues. Overall I'm already quite happy with the results.

Two weeks ago, Georges merged the last piece of what I talked about in the LAS presentation, so we're finally testing the code paths that are specific to host, Flatpak and Snap applications! I also continued a bit with improving the tests, and now they can be run with Valgrind, which is super slow and that's why we're not doing it in the CI, but it tends to find memory leaks which ASAN does not. With the existing tests, it found 9 small memory leaks.

If you want to improve the Flatpak story, come and contribute to xdg-desktop-portal. It's now easier than ever!

21 Aug 2025 9:00pm GMT

20 Aug 2025

Peter Hutterer: Why is my device a touchpad and a mouse and a keyboard?

If you have spent any time around HID devices under Linux (for example if you are an avid mouse, touchpad or keyboard user) then you may have noticed that your single physical device actually shows up as multiple device nodes (for free! and nothing happens for free these days!). If you haven't noticed this, run libinput record and you may be part of the lucky roughly 50% who get free extra event nodes.

The pattern is always the same. Assuming you have a device named FooBar ExceptionalDog 2000 AI[1] what you will see are multiple devices

/dev/input/event0: FooBar ExceptionalDog 2000 AI Mouse
/dev/input/event1: FooBar ExceptionalDog 2000 AI Keyboard
/dev/input/event2: FooBar ExceptionalDog 2000 AI Consumer Control 

The Mouse/Keyboard/Consumer Control/... suffixes are a quirk of the kernel's HID implementation which splits out a device based on the Application Collection. [2]

A HID report descriptor may use collections to group things together. A "Physical Collection" indicates "these things are (on) the same physical thingy". A "Logical Collection" indicates "these things belong together". And you can of course nest these things near-indefinitely so e.g. a logical collection inside a physical collection is a common thing.

An "Application Collection" is a high-level abstractions to group something together so it can be detected by software. The "something" is defined by the HID usage for this collection. For example, you'll never guess what this device might be based on the hid-recorder output:

# 0x05, 0x01,                    // Usage Page (Generic Desktop)              0
# 0x09, 0x06,                    // Usage (Keyboard)                          2
# 0xa1, 0x01,                    // Collection (Application)                  4
...
# 0xc0,                          // End Collection                            74

Yep, it's a keyboard. Pop the champagne[3] and hooray, you deserve it.

The kernel, ever eager to help, takes top-level application collections (i.e. those not inside another collection) and applies a usage-specific suffix to the device. For the above Generic Desktop/Keyboard usage you get "Keyboard"; the other ones currently supported are "Keypad" and "Mouse", as well as the slightly more niche "System Control", "Consumer Control", "Wireless Radio Control" and "System Multi Axis". In the Digitizer usage page we have "Stylus", "Pen", "Touchscreen" and "Touchpad". Any other Application Collection is currently unsuffixed (though see [2] again, e.g. the hid-uclogic driver uses "Touch Strip" and other suffixes).

This suffix is necessary because the kernel also splits out the data sent within each collection as a separate evdev event node. Since HID is (mostly) hidden from userspace, this makes it much easier for userspace to identify different devices because you can look at an event node and say "well, it has buttons and x/y, so it must be a mouse" (this is exactly what udev does when applying the various ID_INPUT properties, with varying levels of success).

The side effect of this however is that your device may show up as multiple devices and most of those extra devices will never send events. Sometimes that is due to the device supporting multiple modes (e.g. a touchpad may by default emulate a mouse for backwards compatibility but once the kernel toggles it to touchpad mode the mouse feature is mute). Sometimes it's just laziness when vendors re-use the same firmware and leave unused bits in place.

It's largely a cosmetic problem only, e.g. libinput treats every event node as individual device and if there is a device that never sends events it won't affect the other event nodes. It can cause user confusion though: "why does my laptop say there's a mouse?" and in some cases it can cause functional degradation - the two I can immediately recall are udev detecting the mouse node of a touchpad as pointing stick (because i2c mice aren't a thing), hence the pointing stick configuration may show up in unexpected places. And fake mouse devices prevent features like "disable touchpad if a mouse is plugged in" from working correctly. At the moment we don't have a good solution for detecting these fake devices - short of shipping giant databases with product-specific entries we cannot easily detect which device is fake. After all, a Keyboard node on a gaming mouse may only send events if the user configured the firmware to send keyboard events, and the same is true for a Mouse node on a gaming keyboard.

So for now, the only solution to those is a per-user udev rule to ignore a device. If we ever figure out a better fix, expect to find a gloating blog post in this very space.
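
As a rough illustration, such a rule could look like the following; the filename and device name here are examples, and LIBINPUT_IGNORE_DEVICE is the udev property libinput honors to skip a device:

# /etc/udev/rules.d/99-ignore-fake-mouse.rules
ACTION=="add|change", KERNEL=="event*", \
  ATTRS{name}=="FooBar ExceptionalDog 2000 AI Mouse", \
  ENV{LIBINPUT_IGNORE_DEVICE}="1"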

[1] input device naming is typically bonkers, so I'm just sticking with precedence here
[2] if there's a custom kernel driver this may not apply and there are quirks to change this so this isn't true for all devices
[3] or sparkling wine, let's not be regionist here

20 Aug 2025 1:12am GMT

16 Aug 2025

Simon Ser: Status update, August 2025

Hi!

This month I've spent quite some time working on vali, a C library and code generator for the Varlink IPC protocol. It was formerly named "varlinkgen", but the new name is shorter and more accurate (the library can be used without the code generator). I've fixed a bunch of bugs, updated the service implementation to use non-blocking IO, added some tests and support for continued calls (which are Varlink's way to emit events from a service). I've also written a patch to port the kanshi output configuration tool to vali.

Speaking of kanshi, I've released version 1.8. A new kanshictl status command shows the current mode, and wildcard patterns are now supported to match outputs. I want to finish up relative output positioning for the next release, but some tricky usability issues need to be sorted out first.

Support for toplevel capture in xdg-desktop-portal-wlr has been merged. This was the last missing piece to be able to share an individual window from Web browsers. libdisplay-info v0.3 has been released with support for many new CTA data blocks and groundwork for DisplayID v2. José Expósito has sent libdrm and drm_info patches to add user-space support for the special "faux" bus used in cases where a device isn't backed by hardware (up until now, the platform bus was abused).

The Goguma mobile IRC client now displays a small bubble when someone else mentions you, making these easier to spot at a glance:

Goguma highlight bubble

Jean THOMAS has added a new option to choose between the in-app Web view and an external browser when opening links. Hubert Hirtz has tweaked the login forms to play better with the autofill feature some password managers provide.

I've released go-imap v2 beta 6, with support for SPECIAL-USE and CHILDREN thanks to Dejan Štrbac and legacy RECENT thanks to fox.cpp. I'd like to eventually ship v2, but there are still some rough edges that I'd like to smooth out. I now realize it's been more than 2 years since the first v2 alpha release, maybe I should listen a bit more to my bio teacher who used to repeat "perfect is the enemy of good".

Anyways, that's all for now, see you next month!

16 Aug 2025 10:00pm GMT

15 Aug 2025

Sebastian Wick: Display Next Hackfest 2025

A few weeks ago, a bunch of display driver and compositor developers met once again for the third iteration of the Display Next Hackfest. The tradition was started by Red Hat, followed by Igalia (thanks Melissa), and now AMD (thanks Harry). We met in the AMD offices in Markham, Ontario, Canada; and online, to discuss issues, present things we worked on, figure out future steps on a bunch of topics related to displays, GPUs, and compositors.

A bunch of people with laptops sitting at desks facing each other. A screen in the background with remote attendees.

The Display Next Hackfest in the AMD Markham offices

It was really nice meeting everyone again, and also seeing some new faces! Notably, Charles Poynton, who "decided that HD should have 1080 image rows, and square pixels", and Keith Lee, who works for AMD and designed their color pipeline, joined us this year. This turned out to be invaluable. It was also great to see AMD not only organizing the event, but also showing genuine interest and support for what we are trying to achieve.

This year's edition is likely going to be the last dedicated Display Next Hackfest, but we're already plotting to fuse it with XDC in some way next year.

If you're looking for a more detailed technical rundown of what we were doing there, you can read Xaver's or Louis' blog posts, or our notes.

With all that being said, here is an incomplete list of things I found exciting:

As always, lots of work ahead of us, but it's great to actually see the progress this year with the entire ecosystem having HDR support now.

Me sitting at a desk with 4 small glasses of beer in front of me

Sampling local craft beers

See you all at XDC this year (or at least the one next year)!

15 Aug 2025 3:41pm GMT

11 Aug 2025

feedplanet.freedesktop.org

Peter Hutterer: xkeyboard-config 2.45 has a new install location

This is a heads-up that if you install xkeyboard-config 2.45 (the package that provides the XKB data files), some manual interaction may be needed. Version 2.45 has changed the install location after over 20 years to be a) more correct and b) more flexible.

When you select a keyboard layout like "fr" or "de" (or any other ones really), what typically happens in the background is that an XKB parser (xkbcomp if you're on X, libxkbcommon if you're on Wayland) goes off and parses the data files provided by xkeyboard-config to populate the layouts. For historical reasons these data files have resided in /usr/share/X11/xkb and that directory is hardcoded in more places than it should be (i.e. more than zero). As of xkeyboard-config 2.45 however, the data files are now installed in the much more sensible directory /usr/share/xkeyboard-config-2 with a matching xkeyboard-config-2.pc for anyone who relies on the data files. The old location is symlinked to the new location so everything keeps working, people are happy, no hatemail needs to be written, etc. Good times.
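For anything that still hardcodes the path instead of asking pkg-config, a fallback like the following keeps old and new installations working. This is only an illustrative sketch using the two paths above, not code from xkbcomp or libxkbcommon:

#include <glib.h>

/* Prefer the new versioned data directory, fall back to the legacy one.
 * Real consumers should use the xkeyboard-config-2.pc file at build time
 * rather than hardcoding either path. */
static const char *
xkb_data_dir (void)
{
  if (g_file_test ("/usr/share/xkeyboard-config-2", G_FILE_TEST_IS_DIR))
    return "/usr/share/xkeyboard-config-2";
  return "/usr/share/X11/xkb";
}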

The reason for this change is two-fold: moving it to a package-specific directory opens up the (admittedly mostly theoretical) use-case of some other package providing XKB data files. But even more so, it finally allows us to start versioning the data files and introduce new formats that may be backwards-incompatible for current parsers. This is not yet the case, however; the current format in the new location is guaranteed to be the same as the format we've always had. It's really just a location change in preparation for future changes.

Now, from an upstream perspective this is not just hunky, it's also dory. Distributions however struggle a bit more with this change because of packaging format restrictions. RPM for example is quite unhappy with a directory being replaced by a symlink, which means that Fedora and OpenSuSE have to resort to the .rpmmoved hack. If you have ever used the custom layout and/or added other files to the XKB data files, you will need to manually move those files from /usr/share/X11/xkb.rpmmoved/ to the new equivalent location. If you have never used the custom layout and/or modified local files, you can just delete /usr/share/X11/xkb.rpmmoved. Of course, if you're on Wayland you shouldn't need to modify system directories anyway since you can do it in your $HOME.

There are corresponding issues on what to do for Arch and Gentoo. I'm not immediately aware of other distributions' issues, but if you search for them in your bugtracker you'll find them.

11 Aug 2025 1:44am GMT

08 Aug 2025

feedplanet.freedesktop.org

Sebastian Wick: GNOME 49 Backlight Changes

One of the things I'm working on at Red Hat is HDR support. HDR is inherently linked to luminance (brightness, but ignoring human perception) which makes it an important parameter for us that we would like to be in control of.

One reason is rather stupid. Most external HDR displays refuse to let the user control the luminance in their on-screen-display (OSD) if the display is in HDR mode. Why? Good question. Read my previous blog post.

The other reason is that the amount of HDR headroom we have available is the result of the maximum luminance we can achieve versus the luminance we assign to a sheet of white paper (the reference white level). For power consumption reasons, we want to be able to dynamically change the available headroom, depending on how much headroom the content can make use of. If there is no HDR content on the screen, there is no need to crank up the backlight to give us more headroom, because the headroom will be unused.

To work around the first issue, mutter can change the signal it sends to the display, so that white is not a signal value of 1.0 but somewhere else between 0 and 1. This essentially emulates a backlight in software. The drawback is that we're no longer using a bunch of bits in the signal, so issues like banding might become more noticeable, but since we're only using this with 10- or 12-bit HDR signals, it isn't an issue in practice.
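As a rough illustration (this is not mutter's actual code), the software backlight boils down to scaling linear light so that reference white ends up below the maximum code value before the signal is encoded; the nit values below are made-up examples:

#include <stddef.h>

/* Illustrative sketch: scale linear RGB (normalized so 1.0 is the display
 * maximum) so that SDR reference white lands at reference_white/max_luminance
 * instead of 1.0. Everything above that ratio is the HDR headroom. */
static void
apply_software_backlight (float  *rgb_linear,
                          size_t  n_values,
                          float   reference_white, /* e.g. 203 nits */
                          float   max_luminance)   /* e.g. 1000 nits */
{
  float scale = reference_white / max_luminance; /* 0.203 in this example */

  for (size_t i = 0; i < n_values; i++)
    rgb_linear[i] *= scale;
}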

This has been implemented in GNOME 48 already, but it was an API that mutter exposed and Settings showed as "HDR Brightness".

GNOME Settings Display panel showing the HDR Brightness Setting

"HDR Brightness" in GNOME Settings

The second issue requires us to be able to map the backlight value to luminance, and to change the backlight atomically with an update to the screen. We could work towards adding those things to the existing sysfs backlight API, but it turns out that there are a number of problems with it. Mapping the sysfs entry to a connected display is really hard (GNOME pretends that there is only one single internal display that can ever be controlled), and writing a value to the backlight requires root privileges or calling a logind D-Bus API. An internal panel can expose multiple backlights, and a backlight value of 0 can mean either that the display turns off or that it just gets really dim.
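For reference, the unprivileged path goes through logind's org.freedesktop.login1.Session.SetBrightness() method. A rough sketch with GDBus could look like this; the "intel_backlight" device name is just an example and has to match an entry under /sys/class/backlight:

#include <gio/gio.h>

/* Sketch only: ask logind to set a sysfs backlight for the current session,
 * which works without root privileges. */
static gboolean
set_backlight_brightness (GDBusConnection *bus,
                          guint32          brightness,
                          GError         **error)
{
  g_autoptr(GVariant) ret =
    g_dbus_connection_call_sync (bus,
                                 "org.freedesktop.login1",
                                 "/org/freedesktop/login1/session/self",
                                 "org.freedesktop.login1.Session",
                                 "SetBrightness",
                                 g_variant_new ("(ssu)",
                                                "backlight",
                                                "intel_backlight", /* example device */
                                                brightness),
                                 NULL,
                                 G_DBUS_CALL_FLAGS_NONE,
                                 -1, NULL, error);

  return ret != NULL;
}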

So a decision was made to create a new API that will be part of KMS, the API that we use to control the displays.

So far, the sysfs backlight has been controlled by gnome-settings-daemon, with GNOME Shell calling a D-Bus API whenever the screen brightness slider was moved.

To recap:

Overall, this is quite messy, so I decided to clean this up.

Over the last year, I moved the sysfs backlight handling from gnome-settings-daemon into mutter, added logic to decide which backlight to use (sysfs or the "software backlight"), and made it generic so that any screen can have a backlight. This means mutter is now the single source of truth for the backlight itself. Its value, in turn, comes from a number of sources: the user can configure the screen brightness in the quick settings menu and via keyboard shortcuts, power saving features can kick in and dim the screen, and lastly, an Ambient Light Sensor (ALS) can take control over the screen brightness. To make things more interesting, a single "logical monitor" can have multiple hardware monitors, each of which can have a backlight. All of that logic is now neatly sitting in GNOME Shell, which takes signals from gnome-settings-daemon about the ALS and dimming. I also changed the Quick Settings UI to make it possible to control the brightness on multiple screens, and removed the old "HDR Brightness" setting from Settings.

All of this means that we can now handle screen brightness on multiple monitors and when the new KMS backlight API makes it upstream, we can just plug it in, and start to dynamically create HDR headroom.

08 Aug 2025 3:26pm GMT

07 Aug 2025

feedplanet.freedesktop.org

Peter Hutterer: libinput and Lua plugins (Part 2)

Part 2 is, perhaps surprisingly, a follow-up to libinput and lua-plugins (Part 1).

The moon has circled us a few times since that last post and some update is in order. First of all: all the internal work required for plugins was released as libinput 1.29 but that version does not have any user-configurable plugins yet. But cry you not my little jedi and/or sith lord in training, because support for plugins has now been merged and, barring any significant issues, will be in libinput 1.30, due somewhen around October or November. This year. 2025 that is.

Which means now is the best time to jump in and figure out if your favourite bug can be solved with a plugin. And if so, let us know and if not, then definitely let us know so we can figure out if the API needs changes. The API Documentation for Lua plugins is now online too and will auto-update as changes to it get merged. There have been a few minor changes to the API since the last post so please refer to the documentation for details. Notably, the version negotiation was re-done so both libinput and plugins can support select versions of the plugin API. This will allow us to iterate the API over time while designating some APIs as effectively LTS versions, minimising plugin breakages. Or so we hope.

What warrants a new post is that we merged a new feature for plugins, or rather, ahaha, a non-feature. Plugins now have access to an API that allows them to disable certain internal features that are not publicly exposed, e.g. palm detection. The reason why libinput doesn't have a lot of configuration options has been explained previously (though we actually have quite a few options), but let me recap for this particular use-case: libinput doesn't have a config option for e.g. palm detection because we have several different palm detection heuristics and they depend on device capabilities. Very few people want no palm detection at all[1], so disabling it means you get a broken touchpad, and we now get to add configuration options for every palm detection mechanism. And keep those supported forever because, well, workflows.

But plugins are different, they are designed to take over some functionality. So the Lua API has an EvdevDevice:disable_feature("touchpad-palm-detection") function that takes a string with the feature's name (easier to make backwards/forwards compatible this way). This example will disable all palm detection within libinput, and the plugin can implement said palm detection itself. At the time of writing, the following self-explanatory features can be disabled: "button-debouncing", "touchpad-hysteresis", "touchpad-jump-detection", "touchpad-palm-detection", "wheel-debouncing". This list is mostly based on "probably good enough", so, as above, if there's something else then we can expose that too.

So hooray for fewer features and happy implementing!

[1] Something easily figured out by disabling palm detection or using a laptop where palm detection doesn't work thanks to device issues

07 Aug 2025 10:00am GMT