16 Apr 2025
Simon Ser: Status update, April 2025
Hi!
Last week wlroots 0.19.0-rc1 was released! It includes the new color management protocol; however, it doesn't include HDR10 support because the renderer and backend bits haven't been merged yet. Also worth noting are full explicit synchronization support and the new screen capture protocols. I plan to release new release candidates weekly until we're happy with the stability. Please test!
Sway is also getting close to its first release candidate. I plan to publish version 1.11.0-rc1 this weekend. Thanks to Ferdinand Bachmann, Sway no longer aborts on shutdown due to dangling signal listeners. I've also updated my HDR10 patch to add an output hdr command (but it's Sway 1.12 material).
I've spent a bit of time on libicc, my C library to manipulate ICC profiles. I've introduced an encoder to make it easy to write new ICC profiles, and used that to write a small program to create an ICC profile which inverts colors. The encoder doesn't support as many ICC elements as the decoder yet (patches welcome!), but does support many interesting bits for display profiles: basic matrices and curves, lut16Type elements and more advanced lutAToBType elements. New APIs have been introduced to apply ICC profile transforms to a color value. I've also added tests which compare the results given by libicc and by LittleCMS. For some reason lut16Type and lutAToBType results are multiplied by 2 by LittleCMS; I haven't yet understood why that is, even after reading the spec in depth and staring at LittleCMS source code for a few hours (if you have a guess please ping me). In the future I'd like to add a small tool to convert ICC profiles to and from JSON files to make it easy to create new files or adjust existing ones.
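For context, here's roughly what the LittleCMS side of such a comparison looks like: a minimal sketch (assuming a display profile saved as profile.icc, a placeholder; the libicc half is omitted since its API isn't shown here) that applies the profile's transform to a single RGB value.

```c
/* Sketch: apply a display profile's transform to one RGB value with
 * LittleCMS, mirroring the cross-check described above. "profile.icc"
 * is a placeholder path; error handling is omitted. */
#include <lcms2.h>
#include <stdio.h>

int main(void)
{
	cmsHPROFILE srgb = cmsCreate_sRGBProfile();
	cmsHPROFILE display = cmsOpenProfileFromFile("profile.icc", "r");
	cmsHTRANSFORM xform = cmsCreateTransform(srgb, TYPE_RGB_DBL,
						 display, TYPE_RGB_DBL,
						 INTENT_PERCEPTUAL, 0);

	double in[3] = { 0.25, 0.50, 0.75 }, out[3];
	cmsDoTransform(xform, in, out, 1); /* transform a single pixel */
	printf("%f %f %f\n", out[0], out[1], out[2]);

	cmsDeleteTransform(xform);
	cmsCloseProfile(display);
	cmsCloseProfile(srgb);
	return 0;
}
```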
Version 0.9.0 of the soju IRC bouncer has been released. Among the most notable changes, the database is used by default to store messages, pinned/muted channels and buffers can be synchronized across devices, and database queries have been optimized. I've continued working on the Goguma mobile IRC client, fixing a few bugs such as dangling Firebase push subscriptions and message notifications being dismissed too eagerly.
Max Ehrlich has contributed a mako patch to introduce a Notifications property to the mako-specific D-Bus API, so that external programs can monitor active notifications (e.g. display a count in a status bar, or display a list on a lockscreen).
That's all I have in store, see you next month!
16 Apr 2025 10:00pm GMT
Mike Blumenkrantz: Another Milestone
16 Apr 2025 12:00am GMT
15 Apr 2025
Christian Schaller: Fedora Workstation 42 is upon us!
We are excited about Fedora Workstation 42, released today. We have worked on some great features for it.
Fedora Workstation 42 HDR edition
I would say that the main feature that landed was HDR or High Dynamic Range. It is a feature we spent years on with many team members involved and a lot of collaboration with various members of the wider community.

GNOME Settings menu showing HDR settings
The fact that we got this over the finish line was especially due to all the work Sebastian Wick put into it in collaboration with Pekka Paalanen around HDR Wayland specification and implementations.
Another important aspect was tooling like libdisplay-info, which was co-created with Simon Ser, with others providing more feedback and assistance in the final stretch of the effort.

HDR setup in Ori and Will of the Wisps
That said, a lot of other people at Red Hat and in the community deserve shout-outs for this too. Like Xaver Hugl, whose work on HDR in KWin was a very valuable effort that helped us move the GNOME support forward too. Matthias Clasen and Benjamin Otte for their work on HDR support in GTK+, Martin Stransky for his work on HDR support in Firefox, Jonas Aadahl and Olivier Fourdan for their protocol and patch reviews. Jose Exposito for packaging up the Mesa Vulkan support for Fedora 42.
One area that should benefit from HDR support is games. In the screenshot above you see the game Ori and the Will of the Wisps, which is known for great HDR support. Before this just works, though, Valve will need to update Proton to a Wine version that supports Wayland natively; at the moment you can get it working using gamescope, but hopefully soon it will just work under both Mutter and KWin.
Also a special shoutout to the MPV community for quickly jumping on this and releasing an HDR-capable video player recently.

MPV video player playing HDR content
Of course, getting Fedora Workstation 42 out the door with these features is just the beginning: with the baseline support in place, now is really the time when application maintainers have a real chance of starting to make use of these features, so I would expect various content creation applications, for instance, to start adding support over the next year.
For the desktop itself there are also open questions we need to decide on like:
- Format to use for HDR screenshots
- Better backlight and brightness handling
- Better offloading
- HDR screen recording video format
- How to handle HDR webcams (seems a lot of them are not really capable of producing HDR output).
- Getting a version of the binary NVIDIA driver released that supports the VK_EXT_hdr_metadata and VK_COLOR_SPACE_HDR10_ST2084_EXT Vulkan extensions on Linux
- A million smaller issues we will need to iron out
Accessibility
Our accessibility team, with Lukas Tyrychtr and Bohdan Milar, has been working hard together with others to ensure that Fedora Workstation 42 has the best accessibility support you can get on Linux. One major effort that landed was the new keyboard monitoring interface, which is critical for making Orca work well under Wayland. This was a collaboration between Lukas Tyrychtr, Matthias Clasen and Carlos Garnacho on our team. If you are interested in accessibility, as a user or a developer or both, make sure to join in by reaching out to the Accessibility Working Group.
PipeWire
PipeWire also keeps going strong with continuous improvements and bugfixes. Thanks to the great work by Jan Grulich, the support for PipeWire in Firefox and Chrome is now working great, including for camera handling. It is an area where we want to do an even better job though, so Wim Taymans is currently looking at improving video handling to ensure we are using the best possible video stream the camera can provide and handle conversion between formats transparently. He is currently testing it out using an FFmpeg software backend, but the end goal is to have it all hardware accelerated through directly using Vulkan.
Another feature Wim Taymans added recently is MIDI2 support. This is the next generation of MIDI with only a limited set of hardware currently supporting it, but on the other hand it feels good that we are now able to be ahead of the curve instead of years behind, thanks to the solid foundation we built with PipeWire.
Wayland
For a long time the team has been focused on making sure Wayland has all the critical pieces and was, functionality-wise, on the same level as X11. For instance, we spent a lot of time and effort on ensuring proper remote desktop support. That work all landed in the previous Fedora release, which means that over the last 6 months the team has had more time to look at things like various proposed Wayland protocols and get them supported in GNOME. Thanks to that, we helped ensure the Cursor Shape protocol and Toplevel Drag protocol landed in time for this release. We are already looking at what to help land for the next release, so expect a continued acceleration in Wayland protocol adoption going forward.
First steps into AI
So, an effort we have been plugging away at recently is starting to bring AI tooling to open source desktop applications. Our first effort in this regard is Granite.code. Granite.code is an extension for Visual Studio Code that sets up a local AI engine on your system to help with various tasks, including code generation and chat, inside Visual Studio Code. What is special about this effort is that it relies on downloading and running a copy of the open source Granite LLM on your system, instead of relying on it being run in a cloud instance somewhere. That means you can use Granite.code without having to share your data and work with someone else. Granite.code is still very early stage and it requires an NVIDIA or AMD GPU with over 8GB of video RAM to use under Linux (it also runs under Windows and macOS). It is still in a pre-release stage; we are waiting for the Granite 3.3 model update to enable some major features for us before we make the first formal release, but for those willing to help us test, you can search for Granite in the Visual Studio Code extension marketplace and install it.
We are hoping, though, that this will be just the starting point, where our work can get picked up and used by other IDEs out there too, and we are also thinking about how we can offer AI features in other parts of the desktop.

Granite.code running on Linux
15 Apr 2025 2:38pm GMT
28 Mar 2025
André Almeida: Linux 6.14, an almost forgotten release
Linux 6.14 is the second release of 2025, and as usual Igalia took part in it. It's a very normal release, except that it was released on a Monday, instead of the usual Sunday release that has been going on for years now. The reason behind this? Well, quoting Linus himself:
I'd like to say that some important last-minute thing came up and delayed things.
But no. It's just pure incompetence.
But we did not forget about it, so here's our Linux 6.14 blog post!
A part of the development cycle for this release happened during late December, when a lot of maintainers and developers were taking their deserved breaks. As a result, this release contains fewer changes than usual, described by LWN as having the "lowest level of merge-window activity seen in years". Nevertheless, some cool features made it into this release:
- NT synchronization primitives: Elizabeth Figura, from CodeWeavers, is known for her work on improving Wine's sync functions, like mutexes and semaphores. She was one of the main collaborators behind the futex_waitv() work, and has now developed a virtual driver that is more compliant with the precise semantics that the NT kernel exposes. This allows Wine to behave closer to Windows without the need to create new syscalls, since this driver uses ioctl() as the front-end uAPI (a rough usage sketch follows this list).
- RWF_UNCACHED: Linux has two ways of dealing with storage I/O: buffered I/O (usually the preferred one), which stores data in a temporary buffer and regularly syncs the cached data with the device; and direct I/O, which doesn't use a cache and always writes/reads synchronously with the storage device. Now a new mixed approach is available: uncached buffered I/O. This method is aimed at providing a fast way to write or read data that will not be needed again in the short term. For reading, the device writes data into the buffer and, as soon as the user has finished reading it, it's cleared from the cache. For writing, as soon as userspace fills the cache, the device reads it and removes it from the cache. This way we still have the advantage of using a fast cache while reducing cache pressure.
- amdgpu panic support: AMD developers added kernel panic support for the amdgpu driver, "which displays a pretty user friendly message on the screen when a Linux kernel panic occurs" instead of just a black screen or a partial dmesg log.
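Here's the rough usage sketch promised above: creating and releasing an NT-style semaphore through /dev/ntsync. The ioctl names and struct layout below follow the driver's earlier documented uAPI, so double-check them against the linux/ntsync.h header in your kernel.

```c
/* Rough sketch of the ntsync character-device uAPI. Names and struct
 * layout follow an earlier revision of the driver's documentation;
 * check <linux/ntsync.h> for the exact spelling in your kernel. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/ntsync.h>

int main(void)
{
	int dev = open("/dev/ntsync", O_RDWR | O_CLOEXEC);
	if (dev < 0) {
		perror("open /dev/ntsync");
		return 1;
	}

	/* Create an NT-style semaphore: initial count 0, maximum count 2.
	 * The ioctl hands back a file descriptor representing the object. */
	struct ntsync_sem_args sem_args = { .count = 0, .max = 2 };
	if (ioctl(dev, NTSYNC_IOC_CREATE_SEM, &sem_args) < 0) {
		perror("NTSYNC_IOC_CREATE_SEM");
		return 1;
	}

	/* Release the semaphore once, as ReleaseSemaphore() would;
	 * the previous count is written back into 'count'. */
	__u32 count = 1;
	ioctl(sem_args.sem, NTSYNC_IOC_SEM_POST, &count);
	return 0;
}
```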
As usual, Kernel Newbies provides a very good summary; you should check it for more details: Linux 6.14 changelog. Now let's look at the contributions merged by Igalia for this release!
DRM
For the DRM common infrastructure, we helped to land a standardization for DRM client memory usage reporting. Additionally, we contributed to improve and fix bugs found in drivers of AMD, Intel, Broadcom, and Vivante.
AMDGPU
For the AMD driver, we fixed bugs experienced by users of the Cosmic Desktop Environment on several AMD hardware versions. One was uncovered by the introduction of overlay cursor mode, where a definition mismatch across the display driver caused a page fault when using multiple overlay planes. Another bug was a division by zero in plane scaling. Also, we fixed regressions in VRR and MST caused by the series of changes migrating the AMD display driver from open-coded EDID handling to the drm_edid struct.
Intel
For the Intel drivers, we fixed a bug in the xe GPU driver which prevented certain types of workarounds from being applied, helped with the maintainership of the i915 driver, handled external code contributions, maintained the development branch and sent several pull requests.
Raspberry Pi (V3D)
We fixed GPU resets on the Raspberry Pi 4, which we found to be broken via a user bug report.
Also in the V3D driver, the active performance monitor is now properly stopped before being destroyed, addressing a potential use-after-free issue. Additionally, support for a global performance monitor has been added via a new DRM_IOCTL_V3D_PERFMON_SET_GLOBAL ioctl. This allows all jobs to share a single, globally configured perfmon, enabling more consistent performance tracking and paving the way for integration with user-space tools such as perfetto.
A small video demo of perfetto integration with V3D
etnaviv
On the etnaviv side, fdinfo support has been implemented to expose memory usage statistics per file descriptor, enhancing observability and debugging capabilities for memory-related behavior.
sched_ext
Many BPF schedulers (e.g., scx_lavd) frequently call bpf_ktime_get_ns() for tracking tasks' runtime properties. bpf_ktime_get_ns() eventually reads a hardware timestamp counter (TSC). However, reading a hardware TSC is not performant on some hardware platforms, degrading instructions per cycle (IPC).
We addressed the performance problem of reading the hardware TSC by leveraging the rq clock in the scheduler core, introducing a scx_bpf_now() function for BPF schedulers. Whenever the rq clock is fresh and valid, scx_bpf_now() provides the rq clock, which is already updated by the scheduler core, so it can avoid reading the hardware TSC. Using scx_bpf_now() reduces the number of hardware TSC reads by 50-80% (e.g., 76% for scx_lavd).
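To illustrate the pattern (a sketch, not the actual patches), a scheduler that previously called bpf_ktime_get_ns() in its on-CPU/off-CPU callbacks can switch to scx_bpf_now() like this; the task-context lookup helper is hypothetical, and the usual sched_ext BPF skeleton headers are assumed.

```c
/* Illustrative BPF fragment: track per-task runtime with scx_bpf_now()
 * instead of bpf_ktime_get_ns(). Assumes the sched_ext scheduler
 * skeleton (vmlinux.h + scx common headers); lookup_task_ctx() is a
 * hypothetical per-task storage helper. */
struct task_ctx {
	u64 running_at; /* timestamp when the task went on-CPU */
	u64 runtime;    /* accumulated runtime */
};

void BPF_STRUCT_OPS(sketch_running, struct task_struct *p)
{
	struct task_ctx *tctx = lookup_task_ctx(p);

	if (tctx)
		/* rq clock when fresh and valid, hardware TSC otherwise */
		tctx->running_at = scx_bpf_now();
}

void BPF_STRUCT_OPS(sketch_stopping, struct task_struct *p, bool runnable)
{
	struct task_ctx *tctx = lookup_task_ctx(p);

	if (tctx)
		tctx->runtime += scx_bpf_now() - tctx->running_at;
}
```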
Assorted kernel fixes
Continuing our efforts on cleaning up kernel bugs, we provided a few fixes that address issues reported by syzbot with the goal of increasing stability and security, leveraging the fuzzing capabilities of syzkaller to bring to the surface certain bugs that are hard to notice otherwise. We're addressing bug reports from different kernel areas, including drivers and core subsystems such as the memory manager. As part of this effort, several fixes were done for the probe path of the rtlwifi driver.
Check the complete list of Igalia's contributions for the 6.14 release
Authored (38)
Changwoo Min
- sched_ext: Relocate scx_enabled() related code
- sched_ext: Implement scx_bpf_now()
- sched_ext: Add scx_bpf_now() for BPF scheduler
- sched_ext: Add time helpers for BPF schedulers
- sched_ext: Replace bpf_ktime_get_ns() to scx_bpf_now()
- sched_ext: Use time helpers in BPF schedulers
- sched_ext: Fix incorrect time delta calculation in time_delta()
Christian Gmeiner
- drm/v3d: Stop active perfmon if it is being destroyed
- drm/etnaviv: Add fdinfo support for memory stats
- drm/v3d: Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL
Luis Henriques
Maíra Canal
- drm/v3d: Fix performance counter source settings on V3D 7.x
- drm/v3d: Fix miscellaneous documentation errors
- drm/v3d: Assign job pointer to NULL before signaling the fence
- drm/v3d: Don't run jobs that have errors flagged in its fence
- drm/v3d: Set job pointer to NULL when the job's fence has an error
Melissa Wen
- drm/amd/display: fix page fault due to max surface definition mismatch
- drm/amd/display: increase MAX_SURFACES to the value supported by hw
- drm/amd/display: fix divide error in DM plane scale calcs
- drm/amd/display: restore invalid MSA timing check for freesync
- drm/amd/display: restore edid reading from a given i2c adapter
Ricardo Cañuelo Navarro
- mm,madvise,hugetlb: check for 0-length range after end address adjustment
- mm: shmem: remove unnecessary warning in shmem_writepage()
Rodrigo Siqueira
Thadeu Lima de Souza Cascardo
- wifi: rtlwifi: do not complete firmware loading needlessly
- wifi: rtlwifi: rtl8192se: rise completion of firmware loading as last step
- wifi: rtlwifi: wait for firmware loading before releasing memory
- wifi: rtlwifi: fix init_sw_vars leak when probe fails
- wifi: rtlwifi: usb: fix workqueue leak when probe fails
- wifi: rtlwifi: remove unused check_buddy_priv
- wifi: rtlwifi: destroy workqueue at rtl_deinit_core
- wifi: rtlwifi: fix memory leaks and invalid access at probe error path
- wifi: rtlwifi: pci: wait for firmware loading before releasing memory
- Revert "media: uvcvideo: Require entities to have a non-zero unique ID"
- char: misc: deallocate static minor in error path
Tvrtko Ursulin
- drm/amdgpu: Use DRM scheduler API in amdgpu_xcp_release_sched
- drm/xe: Fix GT "for each engine" workarounds
Reviewed (36)
André Almeida
- ASoC: cs35l41: Fallback to using HID for system_name if no SUB is available
- ASoC: cs35l41: Fix acpi_device_hid() not found
Christian Gmeiner
- drm/v3d: Fix performance counter source settings on V3D 7.x
- drm/etnaviv: Convert timeouts to secs_to_jiffies()
Iago Toral Quiroga
- drm/v3d: Fix performance counter source settings on V3D 7.x
- drm/v3d: Assign job pointer to NULL before signaling the fence
- drm/v3d: Don't run jobs that have errors flagged in its fence
- drm/v3d: Set job pointer to NULL when the job's fence has an error
Jose Maria Casanova Crespo
Luis Henriques
- fuse: rename to fuse_dev_end_requests and make non-static
- fuse: Move fuse_get_dev to header file
- fuse: Move request bits
- fuse: Add fuse-io-uring design documentation
- fuse: make args->in_args[0] to be always the header
- fuse: {io-uring} Handle SQEs - register commands
- fuse: Make fuse_copy non static
- fuse: Add fuse-io-uring handling into fuse_copy
- fuse: {io-uring} Make hash-list req unique finding functions non-static
- fuse: Add io-uring sqe commit and fetch support
- fuse: {io-uring} Handle teardown of ring entries
- fuse: {io-uring} Make fuse_dev_queue_{interrupt,forget} non-static
- fuse: Allow to queue fg requests through io-uring
- fuse: Allow to queue bg requests through io-uring
- fuse: {io-uring} Prevent mount point hang on fuse-server termination
- fuse: block request allocation until io-uring init is complete
- fuse: enable fuse-over-io-uring
- fuse: prevent disabling io-uring on active connections
Maíra Canal
- drm/vkms: Remove index parameter from init_vkms_output
- drm/vkms: Code formatting
- drm/vkms: Use drm_frame directly
- drm/vkms: Use const for input pointers in pixel_read and pixel_write functions
- drm/v3d: Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL
Tvrtko Ursulin
- drm/etnaviv: Add fdinfo support for memory stats
- drm: make drm-active- stats optional
- Documentation/gpu: Clarify drm memory stats definition
- drm/sched: Fix preprocessor guard
Tested (2)
André Almeida
Christian Gmeiner
Acked (1)
Iago Toral Quiroga
Maintainer SoB (6)
Maíra Canal
- drm/v3d: Stop active perfmon if it is being destroyed
- drm/v3d: Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL
- drm/vc4: plane: Remove WARN on state being set in plane_reset
Tvrtko Ursulin
- drm/i915: Remove deadcode
- drm/i915: Remove unused intel_huc_suspend
- drm/i915: Remove unused intel_ring_cacheline_align
28 Mar 2025 12:00am GMT
15 Mar 2025
Simon Ser: Status update, March 2025
Hi all!
This month I've finally finished my initial work on HDR10 support for wlroots! My branch supports playing both SDR and HDR content on either an SDR or HDR output. It's a pretty basic version: wlroots only performs very basic gamut mapping, and has a simple luminance multiplier instead of proper tone mapping. Additionally, the source content luminance and mastering display metadata aren't taken into account. Thus the result isn't as good as it could be, but that can be improved once the initial work is merged!
I've also been talking with dnkl about blending optical color values rather than electrical values in foot ("gamma-correct blending"). Thanks to the color-management protocol, foot can specify that its buffers contain linearly encoded values (as opposed to the default, sRGB) and can implement this blending method without sacrificing performance. See the foot pull request for more details.
We've been working on fixing the few last known blockers remaining for the next wlroots release, in particular related to scene-graph clipping, custom modes, and explicit synchronization. I hope we'll be able to start the release candidate dance soon.
The NPotM is Bakah, a small utility to build Docker Bake configuration files with Buildah (the library powering Podman). I've written more about the motivation and design of this tool in a separate article.
I've released tlstunnel 0.4 with better support for certificate files and some bugfixes. The sogogi WebDAV file server got support for graceful shutdown and Unix socket listeners thanks to Krystian Chachuła. Last, mako 1.10 adds a bunch of useful features such as include directives, more customization for border sizes and icon border radius, and a --no-history flag for makoctl dismiss.
See you next month!
15 Mar 2025 10:00pm GMT
13 Mar 2025
Pekka Paalanen: Wayland color-management, SDR vs. HDR, and marketing
This time I have three topics.
First, I want to promote the blog post I wrote to celebrate the landing of the Wayland color-management extension into wayland-protocols staging area. It's a brief historique of the journey.
Second, I want to discuss SDR and HDR video modes on monitors and TVs. I have seen people expect that the same sRGB content displayed in the SDR video mode and the HDR (BT.2100/PQ) video mode on the same monitor will look the same, and that they can arbitrarily switch between the modes at any time. I have argued that this is a false expectation. Why?
Monitors tend to have a slew of settings. I tend to call them monitor "knobs". There are brightness, contrast, color temperature, picture mode, dynamic contrast, sharpness, gamma, and whatever. Many people have noticed that when the video source puts the monitor into BT.2100/PQ video mode, the monitor locks out some settings, often brightness and/or contrast included. So, SDR and HDR video modes do not play by the same rules. Hence, one cannot generally expect a match even if the video source does everything correctly.
Third, there is marketing. Have a look at the first third of this video. They discuss video streaming services, TV selling, and HDR from the picture quality point of view. My take is that (some? most?) monitors and TVs come with a screaming broken picture out-of-the-box because marketing has to sell them. If all displays showed a given content as intended, they would all look the same, major technology differences notwithstanding, but marketing wants to make each individual model stand out.
Have you heard of TV calibration services? If I buy a new TV from a local electronics department store, they offer a calibration service, for a considerable additional fee. Why would anyone need a calibration service? The factory settings should be good, right?
13 Mar 2025 9:40am GMT
11 Mar 2025
Ricardo Garcia: Device-Generated Commands at Vulkanised 2025
A month ago I attended Vulkanised 2025 in Cambridge, UK, to present a talk about Device-Generated Commands in Vulkan. The event was organized by Khronos and took place in the Arm Cambridge office. The talk I presented was similar to the one from XDC 2024, but instead of a 5-minute lightning talk, I had 25-30 minutes to present and I could expand the contents to include proper explanations of almost all major DGC concepts that appear in the spec.
I attended the event together with my Igalia colleagues Lucas Fryzek and Stéphane Cerveau, who presented about lavapipe and Vulkan Video, respectively. We had a fun time in Cambridge and I can sincerely recommend attending the event to any Vulkan enthusiasts out there. It allows you to meet Khronos members and people working on both the specification and drivers, as well as many other Vulkan users from a wide variety of backgrounds.
The recordings for all sessions are now publicly available, and the one for my talk can be found embedded below. For those of you preferring slides and text, I'm also providing a transcription of my presentation together with slide screenshots further down.
In addition, at the end of the video there's a small Q&A section but I've always found it challenging to answer questions properly on the fly and with limited time. For this reason, instead of transcribing the Q&A section literally, I've taken the liberty of writing down the questions and providing better answers in written form, and I've also included an extra question that I got in the hallways as bonus content. You can find the Q&A section right after the embedded video.
Vulkanised 2025 recording
Questions and answers with longer explanations
Question: can you give an example of when it's beneficial to use Device-Generated Commands?
There are two main use cases where DGC would improve performance: on the one hand, many times game engines use compute pre-passes to analyze the scene they want to draw and prepare some data for that scene. This includes maybe deciding LOD levels, discarding content, etc. After that compute pre-pass, results would need to be analyzed from the CPU in some way. This implies a stall: the output from that compute pre-pass needs to be transferred to the CPU so the CPU can use it to record the right drawing commands, or maybe you do this compute pre-pass during the previous frame and it contains data that is slightly out of date. With DGC, this compute dispatch (or set of compute dispatches) could generate the drawing commands directly, so you don't stall or you can use more precise data. You also save some memory bandwidth because you don't need to copy the compute results to host-visible memory.
On the other hand, sometimes scenes contain so much detail and geometry that recording all the draw calls from the CPU takes a nontrivial amount of time, even if you distribute this draw call recording among different threads. With DGC, the GPU itself can generate these draw calls, so potentially it saves you a lot of CPU time.
Question: as the extension makes heavy use of buffer device addresses, what are the challenges for tools like GFXReconstruct when used to record and replay traces that use DGC?
The extension makes use of buffer device addresses for two separate things. First, it uses them to pass some buffer information to different API functions, instead of passing buffer handles, offsets and sizes. This is not different from other APIs that existed before. The VK_KHR_buffer_device_address extension contains APIs like vkGetBufferOpaqueCaptureAddressKHR and vkGetDeviceMemoryOpaqueCaptureAddressKHR that are designed to take care of those cases and make it possible to record and replay those traces. Contrary to VK_KHR_ray_tracing_pipeline, which has a feature to indicate if you can capture and replay shader group handles (fundamental for capture and replay when using ray tracing), DGC does not have any specific feature for capture-replay. DGC does not add any new problem from that point of view.
Second, the data for some commands that is stored in the DGC buffer sometimes includes device addresses. This is the case for the index buffer bind command, the vertex buffer bind command, indirect draws with count (double indirection here) and the ray tracing command. But, again, the addresses in those commands are buffer device addresses. That does not add new challenges for capture and replay compared to what we already had.
Question: what is the deal with the last token being the one that dispatches work?
One minor detail from DGC, that's important to remember, is that, by default, DGC respects the order in which sequences appear in the DGC buffer and the state used for those sequences. If you have a DGC buffer that dispatches multiple draws, you know the state that is used precisely for each draw: it's the state that was recorded before the execute-generated-commands call, plus the small changes that a particular sequence modifies like push constant values or vertex and index buffer binds, for example. In addition, you know precisely the order of those draws: executing the DGC buffer is equivalent, by default, to recording those commands in a regular command buffer from the CPU, in the same order they appear in the DGC buffer.
However, when you create an indirect commands layout you can indicate that the sequences in the buffer may run in an undefined order (this is VK_INDIRECT_COMMANDS_LAYOUT_USAGE_UNORDERED_SEQUENCES_BIT_EXT). If the sequences could dispatch work and then change state, we would have a logical problem: what do those state changes affect? The sequence that is executed right after the current one? Which one is that? We would not know the state used for each draw. Forcing the work-dispatching command to be the last one is much easier to reason about and is also logically tight.
Naturally, if you have a series of draws on the CPU where, for some of them, you change some small bits of state (e.g. like disabling the depth or stencil tests) you cannot do that in a single DGC sequence. For those cases, you need to batch your sequences in groups with the same state (and use multiple DGC buffers) or you could use regular draws for parts of the scene and DGC for the rest.
Question from the hallway: do you know what drivers do exactly at preprocessing time that is so important for performance?
Most GPU drivers these days have a kernel side and a userspace side. The kernel driver does a lot of things like talking to the hardware, managing different types of memory and buffers, talking to the display controller, etc. The kernel driver normally also has facilities to receive a command list from userspace and send it to the GPU.
These command lists are particular for each GPU vendor and model. The packets that form it control different aspects of the GPU. For example (this is completely made-up), maybe one GPU has a particular packet to modify depth buffer and test parameters, and another packet for the stencil test and its parameters, while another GPU from another vendor has a single packet that controls both. There may be another packet that dispatches draw work of all kinds and is flexible to accommodate the different draw commands that are available on Vulkan.
The Vulkan userspace driver translates Vulkan command buffer contents to these GPU-specific command lists. In many drivers, the preprocessing step in DGC takes the command buffer state, combines it with the DGC buffer contents and generates a final command list for the GPU, storing that final command list in the preprocess buffer. Once the preprocess buffer is ready, executing the DGC commands is only a matter of sending that command list to the GPU.
Talk slides and transcription
Hello, everyone! I'm Ricardo from Igalia and I'm going to talk about device-generated commands in Vulkan.
First, some bits about me. I have been part of the graphics team at Igalia since 2019. For those that don't know us, Igalia is a small consultancy company specialized in open source, and my colleagues in the graphics team work on things such as Mesa drivers, Linux kernel drivers, compositors… that kind of thing. In my particular case the focus of my work is contributing to the Vulkan Conformance Test Suite and I do that as part of a collaboration between Igalia and Valve that has been going on for a number of years now. Just to highlight a couple of things, I'm the main author of the tests for the mesh shading extension and device-generated commands that we are talking about today.
So what are device-generated commands? So basically it's a new extension, a new functionality, that allows a driver to read command sequences from a regular buffer: something like, for example, a storage buffer, instead of the usual regular command buffers that you use. The contents of the DGC buffer could be filled from the GPU itself. This is what saves you the round trip to the CPU and, that way, you can improve the GPU-driven rendering process in your application. It's like one step ahead of indirect draws and dispatches, and one step behind work graphs. And it's also interesting because device-generated commands provide a better foundation for translating DX12. If you have a translation layer that implements DX12 on top of Vulkan like, for example, Proton, and you want to implement ExecuteIndirect, you can do that much more easily with device generated commands. This is important for Proton, which Valve uses to run games on the Steam Deck, i.e. Windows games on top of Linux.
If we set aside Vulkan for a moment, and we stop thinking about GPUs and such, and you want to come up with a naive CPU-based way of running commands from a storage buffer, how do you do that? Well, one immediate solution we can think of is: first of all, I'm going to assign a token, an identifier, to each of the commands I want to run, and I'm going to store that token in the buffer first. Then, depending on what the command is, I want to store more information.
For example, if we have a sequence like we see here in the slide where we have a push constant command followed by dispatch, I'm going to store the token for the push constants command first, then I'm going to store some information that I need for the push constants command, like the pipeline layout, the stage flags, the offset and the size. Then, after that, depending on the size that I said I need, I am going to store the data for the command, which is the push constant values themselves. And then, after that, I'm done with it, and I store the token for the dispatch, and then the dispatch size, and that's it.
But this doesn't really work: this is not how GPUs work. A GPU would have a hard time running commands from a buffer if we store them this way. And this is not how Vulkan works because in Vulkan you want to provide as much information as possible in advance and you want to make things run in parallel as much as possible, and take advantage of the GPU.
So what do we do in Vulkan? In Vulkan, and in the Vulkan VK_EXT_device_generated_commands extension, we have this central concept, which is called the Indirect Commands Layout. This is the main thing, and if you want to remember just one thing about device generated commands, you can remember this one.
The indirect commands layout is basically like a template for a short sequence of commands. The way you build this template is using the tokens and the command information that we saw colored red and green in the previous slide, and you build that in advance and pass that in advance so that, in the end, in the command buffer itself, in the buffer that you're filling with commands, you don't need to store that information. You just store the data for each command. That's how you make it work.
And the result of this is that with the commands layout, that I said is a template for a short sequence of commands (and by short I mean a handful of them like just three, four or five commands, maybe 10), the DGC buffer can be pretty large, but it does not contain a random sequence of commands where you don't know what comes next. You can think about it as divided into small chunks that the specification calls sequences, and you get a large number of sequences stored in the buffer but all of them follow this template, this commands layout. In the example we had, push constant followed by dispatch, the contents of the buffer would be push constant values, dispatch size, push constant values, dispatch size, many times repeated.
The second thing that Vulkan does to be able to make this work is that we limit a lot what you can do with device-generated commands. There are a lot of things you cannot do. In fact, the only things you can do are the ones that are present in this slide.
You have some things like, for example, update push constants, you can bind index buffers, vertex buffers, and you can draw in different ways, using mesh shading maybe, you can dispatch compute work and you can dispatch raytracing work, and that's it. You also need to check which features the driver supports, because maybe the driver only supports device-generated commands for compute or ray tracing or graphics. But you notice you cannot do things like start render passes or insert barriers or bind descriptor sets or that kind of thing. No, you cannot do that. You can only do these things.
This indirect commands layout, which is the backbone of the extension, specifies, as I said, the layout for each sequence in the buffer and it has additional restrictions. The first one is that it must specify exactly one token that dispatches some kind of work and it must be the last token in the sequence. You cannot have a sequence that dispatches graphics work twice, or that dispatches compute work twice, or that dispatches compute first and then draws, or something like that. No, you can only do one thing with each DGC buffer and each commands layout and it has to be the last one in the sequence.
And one interesting thing that also Vulkan allows you to do, that DX12 doesn't let you do, is that it allows you (on some drivers, you need to check the properties for this) to choose which shaders you want to use for each sequence. This is a restricted version of the bind pipeline command in Vulkan. You cannot choose arbitrary pipelines and you cannot change arbitrary states but you can switch shaders. For example, if you want to use a different fragment shader for each of the draws in the sequence, you can do that. This is pretty powerful.
How do you create one of those indirect commands layout? Well, with one of those typical Vulkan calls, to create an object that you pass these CreateInfo structures that are always present in Vulkan.
And, as you can see, you have to pass these shader stages that will be used, will be active, while you draw or you execute those indirect commands. You have to pass the pipeline layout, and you have to pass in an indirect stride. The stride is the amount of bytes for each sequence, from the start of a sequence to the next one. And the most important information of course, is the list of tokens: an array of tokens that you pass as the token count and then the pointer to the first element.
Now, each of those tokens contains a bit of information and the most important one is the type, of course. Then you can also pass an offset that tells you how many bytes into the sequence for the start of the data for that command. Together with the stride, it tells us that you don't need to pack the data for those commands together. If you want to include some padding, because it's convenient or something, you can do that.
And then there's also the token data which allows you to pass the information that I was painting in green in other slides like information to be able to run the command with some extra parameters. Only a few tokens, a few commands, need that. Depending on the command it is, you have to fill one of the pointers in the union but for most commands they don't need this kind of information. Knowing which command it is you just know you are going to find some fixed data in the buffer and you just read that and process that.
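As a rough sketch of what this looks like in code, here's the push-constant-plus-dispatch layout from the running example. The handles (device, pipelineLayout) are assumed to exist, the 16-byte push constant range is illustrative, and the extension spec remains the authority on exact structure contents.

```c
/* Sketch: an indirect commands layout for "push constants, then
 * dispatch", per VK_EXT_device_generated_commands. Assumes 'device'
 * and 'pipelineLayout' were created elsewhere. */
VkPushConstantRange range = {
	.stageFlags = VK_SHADER_STAGE_COMPUTE_BIT,
	.offset = 0,
	.size = 16, /* illustrative */
};
VkIndirectCommandsPushConstantTokenEXT pcToken = { .updateRange = range };

VkIndirectCommandsLayoutTokenEXT tokens[2] = {
	{
		.sType = VK_STRUCTURE_TYPE_INDIRECT_COMMANDS_LAYOUT_TOKEN_EXT,
		.type = VK_INDIRECT_COMMANDS_TOKEN_TYPE_PUSH_CONSTANT_EXT,
		.data.pPushConstant = &pcToken,
		.offset = 0, /* push constant values start each sequence */
	},
	{
		.sType = VK_STRUCTURE_TYPE_INDIRECT_COMMANDS_LAYOUT_TOKEN_EXT,
		.type = VK_INDIRECT_COMMANDS_TOKEN_TYPE_DISPATCH_EXT,
		.offset = 16, /* a VkDispatchIndirectCommand follows them */
	},
};

VkIndirectCommandsLayoutCreateInfoEXT info = {
	.sType = VK_STRUCTURE_TYPE_INDIRECT_COMMANDS_LAYOUT_CREATE_INFO_EXT,
	.shaderStages = VK_SHADER_STAGE_COMPUTE_BIT,
	.indirectStride = 16 + sizeof(VkDispatchIndirectCommand),
	.pipelineLayout = pipelineLayout,
	.tokenCount = 2,
	.pTokens = tokens,
};
VkIndirectCommandsLayoutEXT layout;
vkCreateIndirectCommandsLayoutEXT(device, &info, NULL, &layout);
```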
One thing that is interesting, like I said, is the ability to switch shaders and to choose which shaders are going to be used for each of those individual sequences. Some form of pipeline switching, or restricted pipeline switching. To do that you have to create something that is called Indirect Execution Sets.
Each of these execution sets is like a group or an array, if you want to think about it like that, of pipelines: similar pipelines or shader objects. They have to share something in common, which is that all of the state in the pipeline has to be identical, basically. Only the shaders can change.
When you create these execution sets and you start adding pipelines or shaders to them, you assign an index to each pipeline in the set. Then, you pass this execution set beforehand, before executing the commands, so that the driver knows which set of pipelines you are going to use. And then, in the DGC buffer, when you have this pipeline token, you only have to store the index of the pipeline that you want to use. You create the execution set with 20 pipelines and you pass an index for the pipeline that you want to use for each draw, for each dispatch, or whatever.
The way to create the execution sets is the one you see here, where we have, again, one of those CreateInfo structures. There, we have to indicate the type, which is pipelines or shader objects. Depending on that, you have to fill one of the pointers from the union on the top right here.
If we focus on pipelines because it's easier on the bottom left, you have to pass the maximum pipeline count that you're going to store in the set and an initial pipeline. The initial pipeline is what is going to set the template that all pipelines in the set are going to conform to. They all have to share essentially the same state as the initial pipeline and then you can change the shaders. With shader objects, it's basically the same, but you have to pass more information for the shader objects, like the descriptor set layouts used by each stage, push-constant information… but it's essentially the same.
Once you have that execution set created, you can use those two functions (vkUpdateIndirectExecutionSetPipelineEXT and vkUpdateIndirectExecutionSetShaderEXT) to update and add pipelines to that execution set. You need to take into account that you have to pass a couple of special creation flags to the pipelines, or the shader objects, to tell the driver that you may use those inside an execution set because the driver may need to do something special for them. And one additional restriction that we have is that if you use an execution set token in your sequences, it must appear only once and it must be the first one in the sequence.
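In code, creating and filling an execution set might look like the following sketch; the 20-pipeline capacity mirrors the talk's example, and initialPipeline/otherPipeline are assumed handles.

```c
/* Sketch: an execution set of pipelines sharing the initial pipeline's
 * state (the "template"); only the shaders may differ between entries. */
VkIndirectExecutionSetPipelineInfoEXT pipelines = {
	.sType = VK_STRUCTURE_TYPE_INDIRECT_EXECUTION_SET_PIPELINE_INFO_EXT,
	.initialPipeline = initialPipeline, /* sets the state template */
	.maxPipelineCount = 20,
};
VkIndirectExecutionSetCreateInfoEXT setInfo = {
	.sType = VK_STRUCTURE_TYPE_INDIRECT_EXECUTION_SET_CREATE_INFO_EXT,
	.type = VK_INDIRECT_EXECUTION_SET_INFO_TYPE_PIPELINES_EXT,
	.info.pPipelineInfo = &pipelines,
};
VkIndirectExecutionSetEXT execSet;
vkCreateIndirectExecutionSetEXT(device, &setInfo, NULL, &execSet);

/* Register another pipeline at index 1; an execution set token in the
 * DGC buffer then just stores this index. */
VkWriteIndirectExecutionSetPipelineEXT write = {
	.sType = VK_STRUCTURE_TYPE_WRITE_INDIRECT_EXECUTION_SET_PIPELINE_EXT,
	.index = 1,
	.pipeline = otherPipeline,
};
vkUpdateIndirectExecutionSetPipelineEXT(device, execSet, 1, &write);
```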
The recap, so far, is that the DGC buffer is divided into small chunks that we call sequences. Each sequence follows a template that we call the Indirect Commands Layout. Each sequence must dispatch work exactly once and you may be able to switch the set of shaders we used with with each sequence with an Indirect Execution Set.
How do we go about actually telling Vulkan to execute the contents of a specific buffer? Well, before executing the contents of the DGC buffer, the application needs to have bound all the needed state to run those commands. That includes descriptor sets, initial push constant values, initial shader state, initial pipeline state. Even if you are going to use an Execution Set to switch shaders later, you have to specify some kind of initial shader state.
Once you have that, you can call vkCmdExecuteGeneratedCommandsEXT. You bind all the state into your regular command buffer and then you record this command to tell the driver: at this point, execute the contents of this buffer. As you can see, you typically pass a regular command buffer as the first argument. Then there's some kind of boolean value called isPreprocessed, which is kind of confusing because it's the first time it appears and you don't know what it is about, but we will talk about it in a minute. And then you pass a relatively larger structure containing information about what to execute.
In that GeneratedCommandsInfo structure, you need to pass again the shader stages that will be used. You have to pass the handle for the Execution Set, if you're going to use one (if not you can use the null handle). Of course, the indirect commands layout, which is the central piece here. And then you pass the information about the buffer that you want to execute, which is the indirect address and the indirect address size as the buffer size. We are using buffer device address to pass information.
And then we have something again mentioning some kind of preprocessing thing, which is really weird: preprocess address and preprocess size which looks like a buffer of some kind (we will talk about it later). You have to pass the maximum number of sequences that you are going to execute. Optionally, you can also pass a buffer address for an actual counter of sequences. And the last thing that you need is the max draw count, but you can forget about that if you are not dispatching work using draw-with-count tokens as it only applies there. If not, you leave it as zero and it should work.
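A hedged sketch of recording this call, reusing the layout from before; the handles and addresses (cmdBuf, dgcBufferAddress, preprocessAddress, and so on) are assumed to come from elsewhere, and the preprocess fields are the ones explained right below.

```c
/* Sketch: recording the execute call. All state (pipeline, descriptor
 * sets, initial push constants) must already be bound on cmdBuf. */
VkGeneratedCommandsInfoEXT genInfo = {
	.sType = VK_STRUCTURE_TYPE_GENERATED_COMMANDS_INFO_EXT,
	.shaderStages = VK_SHADER_STAGE_COMPUTE_BIT,
	.indirectExecutionSet = VK_NULL_HANDLE, /* not switching shaders here */
	.indirectCommandsLayout = layout,
	.indirectAddress = dgcBufferAddress,    /* from vkGetBufferDeviceAddress */
	.indirectAddressSize = dgcBufferSize,
	.preprocessAddress = preprocessAddress, /* scratch space, see below */
	.preprocessSize = preprocessSize,
	.maxSequenceCount = maxSequences,
	.sequenceCountAddress = 0,              /* optional GPU-side counter */
	.maxDrawCount = 0,                      /* only for draw-with-count */
};
vkCmdExecuteGeneratedCommandsEXT(cmdBuf, VK_FALSE /* isPreprocessed */,
				 &genInfo);
```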
We have a couple of things here that we haven't talked about yet, which are the preprocessing things. Starting from the bottom, that preprocess address and size give us a hint that there may be a pre-processing step going on. Some kind of thing that the driver may need to do before actually executing the commands, and we need to pass information about the buffer there.
The boolean value that we pass to the command ExecuteGeneratedCommands tells us that the pre-processing step may have happened before so it may be possible to explicitly do that pre-processing instead of letting the driver do that at execution time. Let's take a look at that in more detail.
First of all, what is the pre-process buffer? The pre-process buffer is auxiliary space, a scratch buffer, because some drivers need to take a look at how the command sequence looks like before actually starting to execute things. They need to go over the sequence first and they need to write a few things down just to be able to properly do the job later to execute those commands.
Once you have the commands layout and you have the maximum number of sequences that you are going to execute, you can call vkGetGeneratedCommandsMemoryRequirementsEXT and the driver is going to tell you how much space it needs. Then, you can create a buffer, you can allocate the space for that, you need to pass a special new buffer usage flag (VK_BUFFER_USAGE_2_PREPROCESS_BUFFER_BIT_EXT) and, once you have that buffer, you pass the address and you pass a size in the previous structure.
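In code, that sizing dance might look like the following sketch; the VkBufferUsageFlags2CreateInfoKHR chaining comes from VK_KHR_maintenance5 (needed because the usage-2 flag doesn't fit the legacy usage field), and error handling is omitted.

```c
/* Sketch: ask the driver how big the preprocess buffer must be, then
 * create a buffer with the dedicated usage flag. */
VkGeneratedCommandsMemoryRequirementsInfoEXT reqInfo = {
	.sType = VK_STRUCTURE_TYPE_GENERATED_COMMANDS_MEMORY_REQUIREMENTS_INFO_EXT,
	.indirectExecutionSet = VK_NULL_HANDLE,
	.indirectCommandsLayout = layout,
	.maxSequenceCount = maxSequences,
};
VkMemoryRequirements2 memReqs = {
	.sType = VK_STRUCTURE_TYPE_MEMORY_REQUIREMENTS_2,
};
vkGetGeneratedCommandsMemoryRequirementsEXT(device, &reqInfo, &memReqs);

VkBufferUsageFlags2CreateInfoKHR usage2 = {
	.sType = VK_STRUCTURE_TYPE_BUFFER_USAGE_FLAGS_2_CREATE_INFO_KHR,
	.usage = VK_BUFFER_USAGE_2_PREPROCESS_BUFFER_BIT_EXT |
		 VK_BUFFER_USAGE_2_SHADER_DEVICE_ADDRESS_BIT,
};
VkBufferCreateInfo bufInfo = {
	.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO,
	.pNext = &usage2,
	.size = memReqs.memoryRequirements.size,
};
/* ... vkCreateBuffer, bind memory, vkGetBufferDeviceAddress ... */
```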
Now the second thing is that we have the possibility of doing this preprocessing step explicitly. Explicit pre-processing is something that's optional, but you probably want to do that if you care about performance because it's the key to performance with some drivers.
When you use explicit pre-processing you don't want to (1) record the state, (2) call vkCmdPreprocessGeneratedCommandsEXT and (3) call vkCmdExecuteGeneratedCommandsEXT. That is what implicit pre-processing does, so this doesn't give you anything if you do it this way.
This is designed so that, if you want to do explicit pre-processing, you're probably going to want to use a separate command buffer for pre-processing. You want to batch pre-processing calls together and submit them all together to keep the GPU busy and to give you the performance that you want. While you submit the pre-processing steps you may still be preparing the rest of the command buffers to enqueue the next batch of work. That's the key to doing pre-processing optimally.
You need to decide beforehand if you are going to use explicit pre-processing or not because, if you're going to use explicit preprocessing, you need to pass a flag when you create the commands layout, and then you have to call the function to preprocess generated commands. If you don't pass that flag, you cannot call the preprocessing function, so it's an all or nothing. You have to decide, and you do what you want.
One thing that is important to note is that preprocessing has to see the same state and the same contents of the input buffers as the actual execution, so it can run properly.
The video contains a cut here because the presentation laptop ran out of battery.
If the pre-processing step needs to have the same state as the execution, you need to have bound the same pipeline state, the same shaders, the same descriptor sets, the same contents. I said that explicit pre-processing is normally used using a separate command buffer that we submit before actual execution. You have a small problem to solve, which is that you would need to record state twice: once on the pre-process command buffer, so that the pre-process step knows everything, and once on the execution, the regular command buffer, when you call execute. That would be annoying.
Instead of that, the pre-process generated commands function takes an argument that is a state command buffer and the specification tells you: this is a command buffer that needs to be in the recording state, and the pre-process step is going to read the state from it. This is the first time, and I think the only time in the specification, that something like this is done. You may be puzzled about what this is exactly: how do you use this and how do we pass this?
I just wanted to get this slide out to tell you: if you're going to use explicit pre-processing, the ergonomic way of using it and how we thought about using the processing step is like you see in this slide. You take your main command buffer and you record all the state first and, just before calling execute-generated-commands, the regular command buffer contains all the state that you want and that preprocess needs. You stop there for a moment and then you prepare your separate preprocessing command buffer passing the main one as an argument to the preprocess call, and then you continue recording commands in your regular command buffer. That's the ergonomic way of using it.
You do need some synchronization at some steps. The main one is that, if you generate the contents of the DGC buffer from the GPU itself, you're going to need some synchronization: writes to that buffer need to be synchronized with something else that comes later which is executing or reading those commands from from the buffer.
Depending on whether you use explicit preprocessing, you either use the new command-preprocess pipeline stage with preprocess-read access, or you synchronize against the regular device-generated-commands execution, which is considered part of the regular draw-indirect stage using indirect-command-read access.
If you use explicit pre-processing you need to make sure that writes to the pre-process buffer happen before you start reading from that. So you use these just here (VK_PIPELINE_STAGE_COMMAND_PREPROCESS_BIT_EXT, VK_ACCESS_COMMAND_PREPROCESS_WRITE_BIT_EXT) to synchronize processing with execution (VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT, VK_ACCESS_INDIRECT_COMMAND_READ_BIT) if you use explicit preprocessing.
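Putting those flags together, a sketch of the two barriers (sync1 style, matching the flag names above) could look like this; cmdBuf is an assumed command buffer handle, and the first barrier assumes the DGC buffer is filled from a compute shader.

```c
/* DGC buffer written by a compute shader -> read by preprocessing */
VkMemoryBarrier write_to_preprocess = {
	.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
	.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT,
	.dstAccessMask = VK_ACCESS_COMMAND_PREPROCESS_READ_BIT_EXT,
};
vkCmdPipelineBarrier(cmdBuf,
		     VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
		     VK_PIPELINE_STAGE_COMMAND_PREPROCESS_BIT_EXT,
		     0, 1, &write_to_preprocess, 0, NULL, 0, NULL);

/* Preprocess buffer writes -> reads during execution */
VkMemoryBarrier preprocess_to_execute = {
	.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
	.srcAccessMask = VK_ACCESS_COMMAND_PREPROCESS_WRITE_BIT_EXT,
	.dstAccessMask = VK_ACCESS_INDIRECT_COMMAND_READ_BIT,
};
vkCmdPipelineBarrier(cmdBuf,
		     VK_PIPELINE_STAGE_COMMAND_PREPROCESS_BIT_EXT,
		     VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT,
		     0, 1, &preprocess_to_execute, 0, NULL, 0, NULL);
```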
The quick how-to: I just wanted to get this slide out for those wanting a reference that says exactly what you need to do. All the steps that I mentioned here about creating the commands layout, the execution set, allocating the preprocess buffer, etc. This is the basic how-to.
And that's it. Thanks for watching! Questions?
11 Mar 2025 4:30pm GMT
Mike Blumenkrantz: Znvk
27 Feb 2025
Mike Blumenkrantz: Slow Down
Once Again We Return Home
It's been a while, but for the first time this year I have to do it. Some of you are shaking your heads, saying you knew it, and you were right. Here we are again.
It's time to vkoverhead.
The Numbers Must Go Up
I realized while working on some P E R F that there was a lot of perf to be gained in places I wasn't testing. That makes sense, right? If there's no coverage, the perf can't go up.
So I added a new case for the path I was using, and boy howdy did I start to see some weird stuff.
Normally this is where I'd post up some gorgeous flamegraphs, and we would sit back in our expensive leather armchairs debating the finer points of optimization. But you know what? We can't do that anymore.
Why, you're asking. The reason is simple: perf is totally fucking broken and has been for a while. But only on certain machines. Specifically, mine. So no more flamegraphs for you, and none for me.
Despite this massive roadblock, the perf gains must continue. Through the power of guesswork and frustration, I've managed some sizable gains:
| # | Draw Tests | 1000op/s (before) | % relative to 'draw' (before) | 1000op/s (after) | % relative to 'draw' (after) |
|---|------------|-------------------|-------------------------------|------------------|------------------------------|
| 0 | draw | 46298 | 100.0% | 46426 | 100.0% |
| 16 | vbo change | 17741 | 38.3% | 22413 | 48.3% |
| 17 | vbo change dynamic (new!) | 4544 | 9.8% | 8686 | 18.7% |
| 18 | 1vattrib change | 3021 | 6.5% | 3316 | 7.1% |
| 20 | 16vattrib 16vbo change | 5266 | 11.4% | 6398 | 13.8% |
| 21 | 16vattrib change | 2352 | 5.1% | 2512 | 5.4% |
| 22 | 16vattrib change dynamic | 3976 | 8.6% | 5003 | 10.8% |
Though I was mainly targeting the case of using dynamic vertex input and binding new vertex buffers for every draw (and managed a nearly 100% improvement there), I ended up seeing noteworthy gains across the board for binding vertex buffers, even when using fully static state. This should provide some minor gains to general RADV perf.
Future Improvements
Given the still-massive perf gap between using static and dynamic vertex state when only vertex buffers change, it seems likely there's still some opportunities to reclaim more perf. Only time will tell what can be achieved here, but for now this is what I've got.
27 Feb 2025 12:00am GMT
26 Feb 2025
Mike Blumenkrantz: CLthulhu
Insanity Has A Name
Karol Herbst. At SGC, we know this man. We fear him. His photo is on the wall over a break-in-case-of-emergency glass panel which shields a button activating a subterranean escape route set to implode as soon as I sprint through.
Despite this, and despite all past evidence leading me to be wary of any idea he pitched, the madman got me again.
cl_khr_image2d_from_buffer. On the surface, an innocuous little extension used to access a buffer like a 2D image. Vulkan already has this support for 1D images in the form of VkBufferView, so why would adding a stride to that be any harder (aside from the fact that the API doesn't support it)?
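For reference, on the API side the extension boils down to something like this sketch; ctx, buf, and the dimensions are assumed handles and values, and the field spelling for the backing object varies between OpenCL header revisions (buffer vs. mem_object).

```c
/* Sketch: a 2D image whose backing store is an existing buffer, with a
 * caller-supplied row pitch (the stride Vulkan has no way to express).
 * Assumes cl_context ctx, cl_mem buf, and width/height/stride exist. */
cl_image_format fmt = { CL_RGBA, CL_UNORM_INT8 };
cl_image_desc desc = {
	.image_type = CL_MEM_OBJECT_IMAGE2D,
	.image_width = width,
	.image_height = height,
	.image_row_pitch = stride, /* bytes per row within the buffer */
	.mem_object = buf,         /* the existing cl_mem buffer */
};
cl_int err;
cl_mem image = clCreateImage(ctx, CL_MEM_READ_WRITE, &fmt, &desc,
			     NULL, &err);
```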
I was deep into otherworldly optimizations at this point, far beyond the point where I was able to differentiate between improvement and neutral, let alone sane or twisted. His words seemed so reasonable: why couldn't I just throw a buffer to the GPU as a 2D image? I'd have to be an idiot not to be able to do something as simple as that. Wouldn't I?
Dammit, Karol.
How to 2D a Buffer
You can't. I mean, I can, but you? Vulkan won't let you do it. There's (currently) no extension that enables a 2D bufferview. Rumor has it some madman on a typewriter is preparing to fax over an extension specification to add this, but only time will tell whether Khronos accepts submissions in this format.
Here at SGC, we're all smart HUMANS though, so there's an obvious solution to this.
It's not memory aliasing. Sure, rebinding buffer memory onto an image might work. But in reading the spec, the synchronization guarantees for buffer-image aliasing didn't seem that strong. And also it'd be a whole bunch of code to track it, and maybe do weird layout stuff, and add some kind of synchronization on the buffer too, and pray the driver isn't buggy, and doesn't this sound a lot like the we-have-this-at-home version of another, better mechanism that zink already has incredible support for?
Yeah. What about these things? How do they wORK?
DMA Buffers: Totally Normal
A DMAbuf is basically a pipe. On one end you have memory. And if you yell TRIANGLE into the other end really loud, something unimaginable and ineffable that lurks deep within the void will slither and crawl its way up the pipe until it GAZES UPON YOU IN YOUR FLESHY MORTAL SHELL ATTEMPTING TO USURP THE POWERS OF THE OLD ONES. It's a fun little experiment with absolutely no unwanted consequences. Try it at home!
The nice thing about dmabufs is I know they work. And I know they work in zink. That's because in order to run an x̸̧̠͓̣̣͎͚̰͎̍̾s̶̡̢͙̞̙͍̬̝̠̩̱̞̮̩̣̑͂͊̎͆̒̓͐͛͊̒͆̄̋ȩ̶̡̨̳̭̲̹̲͎̪̜͒̓̈́̏r̶̩̗͖͙͖̬̟̞̜̠͙̠̎͑̉̌̎̍̑́̏̓̏̒̍͜͝v̶̞̠̰̘̞͖̙̯̩̯̝̂̃̕͜e̴̢̡͎̮͔̤͖̤͙̟̳̹͛̓͌̈̆̈́̽͘̕ŕ̶̫̾͐͘ or a Wayland compositor (e.g., Ŵ̶̢͍̜̙̺͈͉̼̩̯̺̗̰̰͕͍̱͊͊̓̈̀͛̾̒̂̚̕͝ͅḙ̵̛̬̜͔̲͕͖̜̱̻͊̌̾͊͘s̶̢̗̜͈̘͎̠̘̺͉͕̣̯̘̦͓͈̹̻͙̬̘̿͆̏̃̐̍̂̕ͅt̷̨͈̠͕͔̬̙̣͈̪͕̱͕̙̦͕̼̩͙̲͖͉̪̹̼͛̌͋̃̂̂̓̏̂́̔͠͝ͅơ̸̢̛̛̲̟͙͚̰͇̞̖̭̲͍͇̫̘̦̤̩̖͍̄̓́͑̉̿̅̀̉͒͋͒̂́̆̋̚͝ͅͅn̶̢̡̝̥̤̣͔̣͉͖̖̻̬̝̥̦͇͕̘͋͂͛̌̃͠ͅͅ, the reference compositor), dmabufs have to work. Zink can run both of those just fine, so I know there's absolutely zero bugs. There can't be any bugs. No. Not bugs again. NO MORE BUGS
Even better, I know that I can do imports and exports of dmabufs in any dimensionality thanks to that crazy CL-GL sharing extension Karol already suckered me into supporting at the expense of every Vulkan driver's bug tracker. That KAROL HERBST guy, hah, he's such a kidder!
So obviously - it's just common sense at this point - obviously I should just be able to hook up the pipes here. Export a buffer and then import a 2D image with whatever random CAUSALITY IS A LIE passes for stride. Right? Basically a day at the beach for me.
And of course it works perfectly with no problems whatsoever, giving Davinci Resolve a nice performance boost.
Stay sane, readers.
26 Feb 2025 12:00am GMT
24 Feb 2025
planet.freedesktop.org
Hans de Goede: ThinkPad X1 Carbon Gen 12 camera support and other IPU6 camera work
I have been working on getting the camera on the ThinkPad X1 Carbon Gen 12 to work under Fedora.
This requires 3 things:
- Some ov08x40 sensor patches; these are available as downstream cherry-picks in Fedora kernels >= 6.12.13
- A small pipewire fix to avoid WirePlumber listing a bunch of bogus extra "ipu6" Video Sources; these fixes are available in Fedora's pipewire packages >= 1.2.7-4
- I2C and GPIO drivers for the new Lattice USB IO-expander; these drivers are not available in the upstream / mainline kernel yet
I have also rebased the out of tree IPU6 ISP and proprietary userspace stack in rpmfusion and I have integrated the USBIO drivers into the intel-ipu6-kmod package. So for now getting the cameras to work on the X1 Carbon Gen 12 requires installing the out of tree drivers through rpmfusion. Follow these instructions to enable rpmfusion; you need both the free and nonfree repos.
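If you haven't set up rpmfusion yet, the release packages documented by the rpmfusion project should do the trick (commands as given in their configuration docs; adjust if your setup differs):
sudo dnf install https://mirrors.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm https://mirrors.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm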
Then make sure you have a new enough kernel installed and install the rpmfusion akmod for the USBIO drivers:
sudo dnf update 'kernel*'
sudo dnf install akmod-intel-ipu6
The latest version of the out of tree IPU6 ISP driver can co-exist with the mainline / upstream IPU6 CSI receiver kernel driver. So both the libcamera software ISP FOSS stack and Intel's proprietary stack can co-exist now. If you do not want to use the proprietary stack you can disable it by running 'sudo ipu6-driver-select foss'.
After installing the kmod package, reboot and then in Firefox go to Mozilla's webrtc test page and click on the "Camera" button. You should now get a camera permission dialog with 2 cameras: "Built in Front Camera" and "Intel MIPI Camera (V4L2)". The "Built in Front Camera" is the FOSS stack and the "Intel MIPI Camera (V4L2)" is the proprietary stack. Note that the FOSS stack will show a strongly zoomed in (cropped) image; this is caused by the GUM test-page, and in e.g. google-meet this will not be the case.
I have also been making progress with some of the other open IPU6 issues:
- Cameras failing on Dell XPS laptops due to iVSC errors (rhbz#2316918, rhbz#2324683): after a long debugging session this is finally fixed, and the fix will be available in Fedora kernels >= 6.13.4, which should show up in updates-testing today
- Cameras not working on the Microsoft Surface Book with the ov7251 sensor: the fix for this has landed upstream
24 Feb 2025 2:44pm GMT
Peter Hutterer: libinput and 3-finger dragging
Ready in time for libinput 1.28 [1], and after a number of attempts over the years, we now finally have 3-finger dragging in libinput. This is a long-requested feature that allows users to drag by using a 3-finger swipe on the touchpad: instead of the normal swipe gesture you simply get a button down, pointer motion, button up sequence, without having to tap or physically click and hold a button. You might be able to see the appeal right there.
Now, as with any interaction that relies on the mere handful of fingers that are on our average user's hand, we are starting to have usage overlaps. Since the only difference between a swipe gesture and a 3-finger drag is in the intention of the user (and we can't detect that yet, stay tuned), 3-finger swipes are disabled when 3-finger dragging is enabled. Otherwise it does fit in quite nicely with the rest of the features we have though.
There really isn't much more to say about the new feature except: It's configurable to work on 4-finger drag too so if you mentally substitute all the threes with fours in this article before re-reading it that would save me having to write another blog post. Thanks.
[1] "soonish" at the time of writing
24 Feb 2025 5:38am GMT
Peter Hutterer: GNOME 48 and a changed tap-and-drag drag lock behaviour
This is a heads up as mutter PR!4292 got merged in time for GNOME 48. It (subtly) changes the behaviour of drag lock on touchpads, but (IMO) very much so for the better. Note that this feature is currently not exposed in GNOME Settings so users will have to set it via e.g. the gsettings commandline tool. I don't expect this change to affect many users.
This is a feature of a feature of a feature, so let's start at the top.
"Tapping" on touchpads refers to the ability to emulate button presses via short touches ("taps") on the touchpad. When enabled, a single-finger tap corresponds emulates a left mouse button click, a two-finger tap a right button click, etc. Taps are short interactions and to be recognised the finger must be set down and released again within a certain time and not move more than a certain distance. Clicking is useful but it's not everything we do with touchpads.
"Tap-and-drag" refers to the ability to keep the pointer down so it's possible to drag something while the mouse button is logically down. The sequence required to do this is a tap immediately followed by the finger down (and held down). This will press the left mouse button so that any finger movement results in a drag. Releasing the finger releases the button. This is convenient but especially on large monitors or for users with different-than-whatever-we-guessed-is-average dexterity this can make it hard to drag something to it's final position - a user may run out of touchpad space before the pointer reaches the destination. For those, the tap-and-drag "drag lock" is useful.
"Drag lock" refers to the ability of keeping the mouse button pressed until "unlocked", even if the finger moves off the touchpads. It's the same sequence as before: tap followed by the finger down and held down. But releasing the finger will not release the mouse button, instead another tap is required to unlock and release the mouse button. The whole sequence thus becomes tap, down, move.... tap with any number of finger releases in between. Sounds (and is) complicated to explain, is quite easy to try and once you're used to it it will feel quite natural.
The above behaviour is the new behaviour, which non-coincidentally also matches the macOS behaviour (if you can find the toggle in the settings - good practice for easter eggs!). The previous behaviour used a timeout instead, so the mouse button was released automatically if the finger stayed up past a certain timeout. This was less predictable and caused issues for users who weren't fast enough. The new "sticky" behaviour resolves this issue and is (alanis morissette-style ironically) faster to release (a tap can be performed before the previous timeout would've expired).
Anyway, TLDR, a feature that very few people use has changed defaults subtly. Bring out the pitchforks!
As said above, this is currently only accessible via gsettings and the drag-lock behaviour change only takes effect if tapping, tap-and-drag and drag lock are enabled:
$ gsettings set org.gnome.desktop.peripherals.touchpad tap-to-click true
$ gsettings set org.gnome.desktop.peripherals.touchpad tap-and-drag true
$ gsettings set org.gnome.desktop.peripherals.touchpad tap-and-drag-lock true
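To double-check that the keys took effect, gsettings can read them back (standard gsettings usage, nothing beyond the keys above assumed):
$ gsettings get org.gnome.desktop.peripherals.touchpad tap-and-drag-lock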
All features above are actually handled by libinput; this is just about a default change in GNOME.
24 Feb 2025 4:17am GMT
22 Feb 2025
planet.freedesktop.org
Simon Ser: Using Podman, Compose and BuildKit
For my day job, I need to build and run a Docker Compose project. However, because Docker doesn't play well with nftables and I prefer a rootless + daemonless approach, I'm using Podman.
Podman supports Docker Compose projects with two possible solutions: either by connecting the official Docker Compose CLI to a Podman socket, or by using their own drop-in replacement. They ship a small wrapper to select one of these options. (The wrapper has the same name as the replacement, which makes things confusing.)
Unfortunately, both options have downsides. When using the official Docker Compose CLI, the classic builder is used instead of the newer BuildKit builder. As a result, some features such as additional contexts are not supported. When using the podman-compose replacement, some other features are missing, such as !reset, configs and referencing another service in additional contexts. It would be possible to add these features to podman-compose, but that's an endless stream of work (Docker Compose regularly adds new features) and I don't really see the value in re-implementing all of this (the fact that it's Python doesn't help me get motivated).
I've started looking for a way to convince the Docker Compose CLI to run under Podman with BuildKit enabled. I tried a few months ago and never got it to work, but it seems like this recently became easier! The podman-compose wrapper force-disables BuildKit, so we need to use the Docker Compose CLI directly, without the wrapper. On Arch Linux, this can be achieved by enabling the Podman socket and creating a new Docker context (same as setting DOCKER_HOST, but more permanent):
pacman -S docker-compose docker-buildx
systemctl --user start podman.socket
docker context create podman --docker host=unix://$XDG_RUNTIME_DIR/podman/podman.sock
docker context use podman
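As an aside, for a one-off run without a persistent context, pointing DOCKER_HOST directly at the Podman socket should behave the same (a sketch of the equivalent invocation, not a command from the original post):
DOCKER_HOST=unix://$XDG_RUNTIME_DIR/podman/podman.sock docker compose up --build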
With that, docker compose just works! It turns out it automagically creates a buildx_buildkit_default container under-the-hood to run the BuildKit daemon. Since I don't like automagical things, I immediately tried to run the BuildKit daemon myself:
pacman -S buildkit
systemctl --user start buildkit.service
docker buildx create --name local unix://$XDG_RUNTIME_DIR/buildkit/rootless
docker buildx use local
Now docker compose uses our systemd-managed BuildKit service. But we're not done yet! One of the reasons I like Podman is because it's daemonless, and we've got a daemon running in the background. This isn't the end of the world, but it'd be nicer to be able to run the build without BuildKit.
Fortunately, there's a way around this: any Compose project can be turned into a JSON description of the build commands called Bake. docker buildx bake --print will print that JSON file (and the Docker Compose CLI will use Bake files if COMPOSE_BAKE=true is set, since v2.33). Note, Bake supports way more features (e.g. HCL files) but we don't really need these for our purposes (and the command above can lower fancy Bake files into dumb JSON ones).
The JSON file is pretty similar to the podman build CLI arguments. It's not that hard to do the translation, so I've written Bakah, a small tool which does exactly this. It uses Buildah instead of shelling out to Podman (Buildah is the library used by Podman under-the-hood to build images). A few details required a bit more attention, for instance dependency resolution and parallel builds, but it's quite simple. It can be used like so:
docker buildx bake --print >bake.json
bakah --file bake.json
Bakah is still missing the fancier Bake features (HCL files, inheritance, merging/overriding files, variables, and so on), but it's enough to build complex Compose projects. I plan to use it for soju-containers in the future, to better split my Dockerfiles (one for the backend, one for the frontend) and remove the CI shell script (which contains a bunch of Podman CLI invocations). I hope it can be useful to you as well!
22 Feb 2025 10:00pm GMT
20 Feb 2025
planet.freedesktop.org
Mike Blumenkrantz: Againicl
Busy.
I didn't forget to blog. I know you don't believe me, but I've been accumulating items to blog about for the past month. Powering up. Preparing. And now, finally, it's time to begin opening the valves.
Insanity Returns
When I got back from hibernation, I was immediately accosted by a developer I'd forgotten. One with whom I spent an amount of time consuming adult beverages at XDC again. One who walks with a perpetual glint of madness in his eyes, ready at the drop of a hat to tackle the nearest driver developer and begin raving about the benefits of supporting OpenCL.
Obviously I'm talking about Karol "HOW IS THE PUB ALREADY CLOSED IT'S ONLY 10:30???" Herbst.
I was minding my own business, fixing bugs and addressing perf issues when he assaulted me with a vicious nerdsnipe late one night in January. "Hey, why can't I run DaVinci Resolve on Zink?" he casually asked me, knowing full well the ramifications of such a question.
I tried to put him off, but he persisted. "You know, RadeonSI supports all those features," he said next, and my entire week was ruined. As everyone knows, Zink can only ever be compared to one driver, and the comparisons can't be too uneven.
So it was that I started looking at the CL CTS for the first time this year to implement cl_khr_gl_sharing. This extension is basically EXT_external_objects for CL. It should "just work". Right?
Right…
The thing is, this mechanism (on Linux) uses dmabufs. You know, that thing we all love because they make display servers go vroom. dmabufs allow sharing memory regions between processes through file descriptors. Or just within the same process. Anywhere, really. One side exports the memory object to the FD, and the other side imports it.
But that's how normal people use dmabufs. 2D image import/export for display server usage. Or, occasionally, some crazy multi-process browser engine thing. But still 2D.
You know who uses dmabufs with all-the-Ds? OpenCL.
You know who doesn't implement all-the-Ds? Any Vulkan drivers. Probably. Case in point, I had to hack it in for RADV before I could get CTS to pass and VVL to stop screaming at me.
From there, it turned out zink mostly supported everything already. A minor bugfix and some conditionals to enable raw buffer import/export, and it just works.
Brace yourselves, because this is the foundation for getting Cthulhu-level insane next time.
20 Feb 2025 12:00am GMT
17 Feb 2025
planet.freedesktop.org
Simon Ser: Status update, February 2025
Hi!
This month has been pretty hectic, with FOSDEM and all. I've really enjoyed meeting face-to-face all of these folks I work online with the rest of the year! My talk about modern IRC has been published on the FOSDEM website (unfortunately the audio quality isn't great).
In Wayland news, the color management protocol has finally been merged! I haven't done much apart from cheering from the sidelines: huge thanks to everyone involved for carrying this over the finish line, especially Pekka Paalanen, Sebastian Wick and Xaver Hugl! I've started a wlroots implementation, which with some hacks was enough to get MPV to display an HDR video on Sway. I've also posted a patch to convert to BT2020 and encode to PQ, but I still need to figure out why red shows up as pink (or rebrand it as lipstick-filter in the Sway config file).
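For reference, a plausible way to try such an HDR playback test with MPV (my guess at a reasonable invocation, not a command from the post; both options exist in current MPV):
mpv --vo=gpu-next --target-colorspace-hint=yes some-hdr-video.mkv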
I've released sway 1.10.1 with a bunch of bugfixes, as well as wlr-randr 0.5.0 which adds relative positioning options (e.g. --left-of) and a man page. I've rewritten makoctl in C (the shell script approach has been showing its limitations for a while), and merged support for icon border radius, per-corner radius settings, and a new signal in the mako-specific D-Bus API to notify when the current modes are changed.
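As an illustration of the new relative positioning options, something like this should place one output to the left of another (the output names here are hypothetical; run wlr-randr with no arguments to list yours):
wlr-randr --output HDMI-A-1 --left-of eDP-1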
delthas has contributed support for showing redacted messages as such in gamja. goguma's compact mode now displays an unread and date delimiter, just like the default mode (thanks Eigil Skjæveland!). I've added a basic UI to my WebDAV server, sogogi, to display directory listings and easily upload files from the browser.
That's all, see you next month!
17 Feb 2025 10:00pm GMT