07 Dec 2016
xinput is a tool to query and modify X input device properties (amongst other things). Every so often someone complains about its non-intuitive interface, but this is where users are mistaken: xinput is not a configuration UI. It is a DUI (a developer user interface) intended to test things without having to write a custom (more user-friendly) UI for each new property. It is nothing but a tool to access what is effectively a key-value store. To use it you need to know not only the key name(s) but also the allowed formats, some of which are only documented in header files. It is intended to be run under user supervision; anything it does won't survive device hotplugging. Relying on xinput for configuration is the same as relying on 'echo' to toggle parameters in /sys for kernel configuration. It kinda possibly maybe works most of the time but it's not pretty. And it's not intended to be, so please don't complain to me about the arcane user interface.
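To make the "key-value store" point concrete, here is a minimal Python sketch that turns the output of `xinput list-props <device>` into a plain dictionary. It assumes the usual one-property-per-line format (`<name> (<atom>):<TAB><value>`); the device name and atom numbers in the sample are made up. Note what it cannot tell you: the values come back as raw strings, and nothing in the output says which formats a property accepts.

```python
import re

# One property per line, e.g. "\tDevice Enabled (142):\t1".
PROP_LINE = re.compile(r"^\s+(?P<name>.+?) \((?P<atom>\d+)\):\s*(?P<value>.*)$")


def parse_list_props(output: str) -> dict:
    """Map property names to their raw value strings from `xinput list-props` output."""
    props = {}
    for line in output.splitlines():
        match = PROP_LINE.match(line)
        if match:  # the "Device '...':" header line has no leading whitespace
            props[match.group("name")] = match.group("value").strip()
    return props


# Illustrative output; real atom numbers differ from server to server.
sample = (
    "Device 'Example Touchpad':\n"
    "\tDevice Enabled (142):\t1\n"
    "\tlibinput Tapping Enabled (292):\t0\n"
)
props = parse_list_props(sample)
print(props)  # {'Device Enabled': '1', 'libinput Tapping Enabled': '0'}
```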
Don't do it; things will be a bit confusing, you may not do the right thing, you can easily do damage, etc. A lot of similarities... ;)
07 Dec 2016 2:58am GMT
06 Dec 2016
This post mostly affects developers of desktop environments/Wayland compositors. A systemd pull request was merged to add two new properties to some keyboards: XKB_FIXED_LAYOUT and XKB_FIXED_VARIANT. If set, the device must not be switched to a user-configured layout but rather the one set in the properties. This is required to make fake keyboard devices work correctly out-of-the-box. For example, Yubikeys emulate a keyboard and send the configured passwords as key codes matching a US keyboard layout. If a different layout is applied, then the password may get mangled by the client.
Since udev and libinput are sitting below the keyboard layout there isn't much we can do in this layer. This is a job for those parts that handle keyboard layouts and layout configurations, i.e. GNOME, KDE, etc. I've filed a bug for GNOME here; please do so for your desktop environment.
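For illustration, here is how the layout decision might look on the desktop-environment side. This is a minimal Python sketch, not code from GNOME or KDE: the device's udev properties are passed in as a plain dict (a real compositor would query them through libudev), and `effective_layout` is a hypothetical helper name. When only `XKB_FIXED_LAYOUT` is set, it deliberately drops the user's variant, since a variant only makes sense for the layout it belongs to.

```python
from typing import Mapping, Optional, Tuple


def effective_layout(
    udev_props: Mapping[str, str],
    user_layout: str,
    user_variant: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return the (layout, variant) pair to apply to one keyboard device."""
    fixed_layout = udev_props.get("XKB_FIXED_LAYOUT")
    if fixed_layout:
        # The device needs this layout, whatever the user configured.
        return fixed_layout, udev_props.get("XKB_FIXED_VARIANT")
    return user_layout, user_variant


# A Yubikey-like device tagged with a fixed US layout (hypothetical values).
token = {"ID_INPUT_KEYBOARD": "1", "XKB_FIXED_LAYOUT": "us"}
laptop_keyboard = {"ID_INPUT_KEYBOARD": "1"}
print(effective_layout(token, "de", "nodeadkeys"))            # ('us', None)
print(effective_layout(laptop_keyboard, "de", "nodeadkeys"))  # ('de', 'nodeadkeys')
```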
If you have a device that falls into this category, please submit a systemd patch/file a bug and cc me on it (@whot).
06 Dec 2016 2:44am GMT
05 Dec 2016
This post applies to most tools that interface with the X server and change settings in the server, including xinput, xmodmap, setxkbmap, xkbcomp, xrandr, xsetwacom and other tools that start with x. The one word to sum up the future for these tools under Wayland is: "non-functional".
An X window manager is little more than an innocent bystander when it comes to anything input-related. Short of handling global shortcuts and intercepting some mouse button presses (to bring the clicked window to the front) there is very little a window manager can do. It's a separate process from the X server; it does not receive most input events and it cannot affect which events are generated. When it comes to input device configuration, any X client can tell the server to change it - that's why general debugging tools like xinput work.
A Wayland compositor is much more: it is a window manager and the display server merged into one process. This gives the compositor a lot more power and responsibility. It handles all input events as they come out of libinput and also manages device configuration. Oh, and instead of the X protocol it speaks the Wayland protocol.
The difference becomes more obvious when you consider what happens when you toggle a setting in the GNOME control center. In both Wayland and X, the control center toggles a gsettings key and waits for some other process to pick it up. In both cases, mutter gets notified about the change but what happens then is quite different. In GNOME(X), mutter tells the X server to change a device property, the server passes that on to the xf86-input-libinput driver and from there the setting is toggled in libinput. In GNOME(Wayland), mutter toggles the setting directly in libinput.
Since there is no X server in the stack, the various tools can't talk to it. So to get the tools to work they would have to talk to the compositor instead. But they only know how to speak X protocol, and no Wayland protocol extension exists for input device configuration. Such a Wayland protocol extension would most likely have to be a private one since the various compositors expose device configuration in different ways. Whether this extension will be written and added to compositors is uncertain; I'm not aware of any plans or even intentions to do so (it's a very messy problem). But either way, until it exists, the tools will merely shout into the void, without even an echo to keep them entertained. Non-functional is thus a good summary.
05 Dec 2016 8:42pm GMT
The Raspberry Pi Foundation recently started contracting with Free Electrons to give me some support on the display side of the stack. Last week I got to review and release their first big piece of work: Boris Brezillon's code for SDTV support. I had suggested that we use this as the first project because it should have been small and self contained. It ended up that we had some clock bugs Boris had to fix, and a bug in my core VC4 CRTC code, but he got a working patch series together shockingly quickly. He did one respin for a couple more fixes once I had tested it, and it's now out on the list waiting for devicetree maintainer review. If nothing goes wrong, we should have composite out support in 4.11 (we're probably a week late for 4.10).
On my side, I spent some more time on HDMI audio and the DSI panel. On the audio side, I'm now emitting the GCP packet for audio mute appropriately (I think), and with some more clocking fixes it's now accepting the audio data at the expected rate. On the DSI front, I fixed a bit of sequencing and added debugfs for the registers like we have in our other encoders. There's still no actual audio coming out of HDMI, and only white coming out of the panel.
The DSI situation is making me wish for someone else's panel that I could attach to the connector, so I could see if my bugs are in the Atmel bridge programming or in the DSI driver.
I did some more analysis of 3DMMES's shaders, and improved our code generation, for wins of 0.4%, 1.9%, 1.2%, 2.6%, and 1.2%. I also experimented with changing the UBO (indirect addressed uniform array) upload path, which showed no effect. 3DMMES's uniform arrays are tiny, though, so it may be a win in some other app later.
I also got a couple of new patches from Jonas Pfeil. I went through his thread switch delay slots patch, which is pretty close to ready. He has a similar patch for branching delay slots, though apparently that one isn't showing wins yet in things he's tested. Perhaps most exciting, though, is that he went and implemented an idea I had dropped on github: replacing our shadow copies of raster textures with a bunch of math in the shader and using general memory loads. This could potentially fix X performance without a compositor, which we otherwise really don't have a workaround for other than "use a compositor." It could also improve GL-in-a-window performance: right now all of our DRI surfaces are raster format, because we want to be able to get full screen pageflipping, but that means we do the shadow copy if they aren't fullscreen. Hopefully this week I'll get a chance to test and review it.
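The "bunch of math" boils down to computing where a texel lives in a linear (raster) buffer. The following Python sketch shows that address calculation for nearest filtering with clamp-to-edge; it is not Jonas's shader code, and the parameter names (`stride`, and `cpp` for bytes per texel) are assumptions for illustration.

```python
import math


def raster_texel_offset(u: float, v: float, width: int, height: int,
                        stride: int, cpp: int) -> int:
    """Byte offset of the nearest texel for normalized coordinates (u, v).

    Assumes a raster layout: `height` rows of `stride` bytes each, `cpp`
    bytes per texel, and clamp-to-edge behaviour at the borders.
    """
    x = min(max(math.floor(u * width), 0), width - 1)
    y = min(max(math.floor(v * height), 0), height - 1)
    return y * stride + x * cpp


# Centre texel of a 1920x1080 XRGB8888 surface (4 bytes per texel, no row padding).
print(raster_texel_offset(0.5, 0.5, 1920, 1080, 1920 * 4, 4))  # 4151040
```

In a shader, the resulting offset would be added to the buffer's base address and used for a general memory load, instead of sampling a shadow copy through the texture unit.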
05 Dec 2016 8:17pm GMT
I pushed the patch to require resolution today; expect this to hit the general public with libinput 1.6. If your graphics tablet does not provide axis resolution, we will need to add a hwdb entry. Please file a bug in systemd and CC me on it (@whot).
How do you know if your device has resolution? Run sudo evemu-describe against the device node and look for the ABS_X/ABS_Y entries:
# Event code 0 (ABS_X)
# Value 2550
# Min 0
# Max 3968
# Fuzz 0
# Flat 0
# Resolution 13
# Event code 1 (ABS_Y)
# Value 1323
# Min 0
# Max 2240
# Fuzz 0
# Flat 0
# Resolution 13
If the Resolution value is 0, you'll need a hwdb entry or your tablet will stop working in libinput 1.6. You can file the bug now and we can get it fixed; that way it'll be in place once 1.6 comes out.
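If you have several devices to check, the same test can be scripted. The Python sketch below reads `evemu-describe` output, finds the ABS_X/ABS_Y entries and reports either the physical length of each axis in millimetres (the kernel reports absolute-axis resolution in units per mm) or `None` where the resolution is 0 and a hwdb entry is needed. It assumes the `# Event code N (NAME)` and `# Field Value` line shapes shown in the listing above, and the sample reuses that listing's values.

```python
import re

EVENT_CODE = re.compile(r"^#\s+Event code \d+ \((\w+)\)")
FIELD = re.compile(r"^#\s+(Min|Max|Resolution)\s+(-?\d+)")


def abs_axes(text: str) -> dict:
    """Collect Min/Max/Resolution for every ABS_* axis in evemu-describe output."""
    axes = {}
    current = None
    for line in text.splitlines():
        code = EVENT_CODE.match(line)
        if code:
            name = code.group(1)
            current = axes.setdefault(name, {}) if name.startswith("ABS_") else None
            continue
        field = FIELD.match(line)
        if field and current is not None:
            current[field.group(1)] = int(field.group(2))
    return axes


def tablet_size_mm(text: str) -> dict:
    """Axis length in mm for ABS_X/ABS_Y, or None if the resolution is missing or 0."""
    axes = abs_axes(text)
    sizes = {}
    for name in ("ABS_X", "ABS_Y"):
        axis = axes.get(name, {})
        resolution = axis.get("Resolution", 0)
        if resolution == 0:
            sizes[name] = None  # needs a hwdb entry
        else:
            sizes[name] = (axis["Max"] - axis["Min"]) / resolution
    return sizes


sample = """\
# Event code 0 (ABS_X)
# Value 2550
# Min 0
# Max 3968
# Fuzz 0
# Flat 0
# Resolution 13
# Event code 1 (ABS_Y)
# Value 1323
# Min 0
# Max 2240
# Fuzz 0
# Flat 0
# Resolution 13
"""
print(tablet_size_mm(sample))  # ABS_X ~305.2 mm, ABS_Y ~172.3 mm
```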
05 Dec 2016 1:52am GMT
pastebins are useful for dumping large data sets whenever the medium of conversation doesn't make this easy or useful. IRC is one example, or audio/video conferencing. But pastebins only work when the other side looks at the pastebin before it expires, and the default expiry date for a pastebin may only be a few days.
This makes them effectively useless for bugs where it may take a while for the bug to be triaged and the assignee to respond. It may take even longer to figure out the source of the bug, and if there's a regression it can take months to figure it out. Once the content disappears we have to re-request the data from the reporter. And there is a vicious dependency too: usually, logs are more important for difficult bugs. Difficult bugs take longer to fix. Thus, with pastebins, the more difficult the bug, the more likely the logs become unavailable.
All useful bug tracking systems have an attachment facility. Use that instead: it's archived with the bug, and if a year later we notice a regression, we still have access to the data.
If you got here because I pasted the link to this blog post, please do the following: download the pastebin content as raw text, then add it as attachment to the bug (don't paste it as comment). Once that's done, we can have a look at your bug again.
05 Dec 2016 1:51am GMT
28 Nov 2016
I missed last week's update, but with the holiday it ended up being a short week anyway.
The multithreaded fragment shaders are now in drm-next and Mesa master. I think this was the last big win for raw GL performance and we're now down to the level of making 1-2% improvements in our commits. That is, unless we're supposed to be using double-buffer non-MS mode and the closed driver was just missing that feature. With the glmark2 comparisons I've done, I'm comfortable with this state, though. I'm now working on performance comparisons for 3DMMES Taiji, which the HW team often uses as a benchmark. I spent a day or so trying to get it ported to the closed driver and failed, but I've got it working on the open stack and have written a couple of little performance fixes with it.
The first was just a regression fix from the multithreading patch series, but it was impressive that multithreading hid a 2.9% instruction count penalty and still showed gains.
One of the new fixes I've been working on is folding ALU operations into texture coordinate writes. This came out of frustration from the instruction selection research I had done the last couple of weeks, where all algorithms seemed like they would still need significant peephole optimization after selection. I finally said "well, how hard would it be to just finish the big selection problems I know are easily doable with peepholes?" and it wasn't all that bad. The win came out to about 1% of instructions, with a similar benefit to overall 3DMMES performance (it's shocking how ALU-bound 3DMMES is).
I also started on a branch to jump to the end of the program when all 16 pixels in a thread have been discarded. This had given me a 7.9% win on GLB2.7 on Intel, so I hoped for similar wins here. 3DMMES seemed like a good candidate for testing, too, with a lot of discards that are followed by reams of code that could be skipped, including texturing. Initial results didn't seem to show a win, but I haven't actually done any stats on it yet. I also haven't done the usual "draw red where we were applying the optimization" hack to verify that my code is really working, either.
While I've been working on this, Jonas Pfeil (who originally wrote the multithreading support) has been working on a couple of other projects. He's been trying to get instructions scheduled into the delay slots of thread switches and branching, which should help reduce any regressions those two features might have caused. More exciting, he's just posted a branch for doing nearest-filtered raster textures (the primary operation in X11 compositing) using direct memory lookups instead of our shadow-copy fallback. Hopefully I get a chance to review, test, and merge it in the next week or two.
On the kernel side, my branches for 4.10 have been pulled. We've got ETC1 and multithread FS for 4.10, and a performance win in power management. I've also been helping out and reviewing Boris Brezillon's work for SDTV output in vc4. Those patches should be hitting the list this week.
28 Nov 2016 7:05pm GMT
20 Nov 2016
The Fedora Change to retire the synaptics driver was approved by FESCO. This will apply to Fedora 26 and is part of a cleanup to, ironically, make the synaptics driver easier to install.
Since Fedora 22, xorg-x11-drv-libinput is the preferred input driver. For historical reasons, almost all users have the xorg-x11-drv-synaptics package installed. But to actually use the synaptics driver over xorg-x11-drv-libinput requires a manually dropped xorg.conf.d snippet. And that's just not ideal. Unfortunately, in DNF/RPM we cannot just say "replace the xorg-x11-drv-synaptics package with xorg-x11-drv-libinput on update but still allow users to install xorg-x11-drv-synaptics after that".
So the path taken is a package rename. Starting with Fedora 26, xorg-x11-drv-libinput's RPM will Provide/Obsolete xorg-x11-drv-synaptics and thus remove the old package on update. Users that need the synaptics driver then need to install xorg-x11-drv-synaptics-legacy. This driver will then install itself correctly without extra user intervention and will take precedence over the libinput driver. Removing xorg-x11-drv-synaptics-legacy will remove the driver assignment and thus fall back to libinput for touchpads. So aside from the name change, everything else works more smoothly now. Both packages are now updated in Rawhide and should be available from your local mirror soon.
What does this mean for you as a user? If you are a synaptics user, after an update/install, you need to now manually install xorg-x11-drv-synaptics-legacy. You can remove any xorg.conf.d snippets assigning the synaptics driver unless they also include other custom configuration.
 "Provide" in RPM-speak means the package provides functionality otherwise provided by some other package even though it may not necessarily provide the code from that package. "Obsolete" means that installing this package replaces the obsoleted package.
20 Nov 2016 3:57am GMT
16 Nov 2016
The one case where xf86-video-freedreno is still useful is bringing up a new generation of adreno, since it can do dri2 with pure-sw fallbacks for all the EXA ops. But if that is what you are doing, I guess you know how to git clone and build.
The possible alternative is to push a patch that makes xf86-video-freedreno still build, but only probe (with latest xserver ABI) if some "ForceLoad" type option is given in xorg.conf, otherwise fallback to modesetting/glamor. I can't think of a good reason to do this at the moment. But as always, questions/comments/suggestions welcome.
16 Nov 2016 3:59pm GMT
15 Nov 2016
As usual, it can be turned on and off at build time and there is some configuration available as well to control how the effect works. Here are some screenshots:
[Screenshots: Motion Blur Off; Motion Blur On at intensity 12.5%, 25% and 50%]
Motion blur is a technique used to make movement feel smoother than it actually is and is targeted at hiding the fact that things don't move in a continuous fashion, but rather at fixed intervals dictated by the frame rate. For example, a fast moving object in a scene can "leap" many pixels between consecutive frames even if we intend for it to have a smooth animation at a fixed speed. Quick camera movement produces the same effect on pretty much every object in the scene. Our eyes can notice these gaps in the animation, which is not great. Motion blur applies a slight blur to objects across the direction in which they move, which aims at filling the animation gaps produced by our discrete frames, tricking the brain into perceiving a smoother animation as result.
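As a rough illustration of the idea (not the implementation used in the demo, which the author plans to cover separately), the Python sketch below blurs a single row of pixels along a constant horizontal screen-space velocity. The `intensity` parameter is our assumption of what the percentages in the screenshots mean: the fraction of the per-frame displacement that gets smeared. A real post-processing pass would typically derive a per-pixel velocity from the current and previous frames' transforms instead.

```python
import math


def motion_blur_row(row, velocity_px, intensity, samples=8):
    """Blur a row of pixel values along a horizontal velocity.

    velocity_px: how far the image moved since the previous frame, in pixels.
    intensity:   fraction of that distance to smear (e.g. 0.25 for 25%).
    Each output pixel averages `samples` taps spread back along the motion
    path; taps falling outside the row are clamped to the nearest edge pixel.
    """
    blur_len = velocity_px * intensity
    width = len(row)
    out = []
    for x in range(width):
        total = 0.0
        for i in range(samples):
            t = i / (samples - 1) if samples > 1 else 0.0
            sx = math.floor(x - t * blur_len + 0.5)  # nearest tap position
            total += row[min(max(sx, 0), width - 1)]
        out.append(total / samples)
    return out


# A single bright pixel, image moving 8 px per frame, blurred at 50% intensity.
row = [0.0] * 10
row[2] = 1.0
print(motion_blur_row(row, velocity_px=8, intensity=0.5, samples=5))
# [0.0, 0.0, 0.2, 0.2, 0.2, 0.2, 0.2, 0.0, 0.0, 0.0]
```

The bright pixel is spread evenly over five pixels along the motion direction; at intensity 0 the row comes back unchanged.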
In my demo there are no moving objects other than the sky box or the shadows, which are relatively slow objects anyway. However, camera movement can make objects change screen-space positions quite fast (especially when we rotate the viewpoint) and the motion-blur effect helps us perceive a smoother animation in this case.
I will try to cover the actual implementation in some other post, but for now I'll keep it short and leave it to the images above to showcase what the filter actually does at different configuration settings. Notice that the smoothing effect is something that can only be perceived during motion, so still images are not the best way to showcase the result of the filter from the perspective of the viewer; however, still images are a good way to freeze the animation and see exactly how the filter modifies the original image to achieve the smoothing effect.
15 Nov 2016 12:19pm GMT
Last Friday, both a GNOME bug day and a bank holiday, a few of us got together to squash some bugs, and discuss GNOME and GNOME technologies.
Guillaume, a newcomer to our group, tested the captive portal support for NetworkManager and GNOME in Gentoo, and added instructions on how to enable it to their Wiki. He also tested a gateway-related configuration problem, the patch for which I merged after a code review. Near the end of the session, he also rebuilt WebKitGTK+ to test why Google Docs was no longer working for him in Web. And nobody believed that he could build it that quickly. Looks like opinions based on past experiences are quite hard to change.
Mathieu worked on removing jhbuild's .desktop file, as nobody seems to use it and it was creating the Sundry category in gnome-shell for him. He also spent time looking into the tracker blocker that is Mozilla's Focus, based on disconnectme's block lists. It's not as effective as uBlock when it comes to blocking adverts, but the memory and performance improvements, and the slow churn rate, could make it a good default blocker to have in Web.
Haïkel looked into using Emeus, potentially the new GTK+ 4.0 layout manager, to implement the series properties page for Videos.
15 Nov 2016 9:48am GMT
I dropped HDMI audio last week because Jonas Pfeil showed up with a pair of branches to do multithreaded fragment shaders.
Some context for multithreaded fragment shaders: Texture lookups are really slow. I think we eyeballed them as having a latency of around 20 QPU instructions. We can hide latency sometimes if there's some unrelated math to be done after the texture coordinate calculation but before using the texture sample. However, in most cases, you need the texture sample results right away for the next bit of work.
To allow programs to hide that latency, there's a cooperative hyperthreading mode that a fragment shader can opt into. The shader stays in the bottom half of register space, and before it collects the results of a texture fetch, it issues a thread switch signal, which the hardware will use to run a second fragment shader thread until that one issues its own thread switch. For the second thread, the top bit of the register addresses gets flipped, so the two threads' states don't overlap (except for the accumulators and the flags, which the shaders have to save and restore).
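The register split described above can be modelled in a few lines. This Python sketch assumes a 32-entry register file, so "the bottom half" is registers 0-15 and the flipped top bit is bit 4; the constants and the function name are illustrative, not taken from the Mesa code. Accumulators and flags are not modelled, which is exactly why the shaders have to save and restore them around a switch.

```python
REGFILE_SIZE = 32               # assumption: entries per physical register file
THREAD_BIT = REGFILE_SIZE // 2  # top address bit, 16 for a 32-entry file


def physical_register(reg: int, thread: int) -> int:
    """Physical register slot used when `thread` (0 or 1) names register `reg`."""
    if not 0 <= reg < THREAD_BIT:
        raise ValueError(f"threaded shaders may only use registers 0-{THREAD_BIT - 1}")
    if thread not in (0, 1):
        raise ValueError("two fragment shader threads share the register file")
    # The hardware flips the top address bit for the second thread.
    return reg ^ THREAD_BIT if thread else reg


print(physical_register(5, 0), physical_register(5, 1))  # 5 21
```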
I had delayed working on this because the full solution was going to be really tricky: queue up as many lookups as we can, then thread switch, then collect all those results before switching again, all while respecting the FIFO sizes. However, Jonas's huge contribution here was in figuring out that you don't need to be perfect: you can get huge gains by thread switching between each texture lookup and reading its results.
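To show the difference between the two approaches, here is a Python sketch that emits a schematic instruction sequence for a list of texture lookups. The op strings and the `fifo_depth` parameter are hypothetical, and the sketch ignores data dependencies (in a real shader, one lookup's coordinates may depend on an earlier result), which is part of why the batched version is hard to get right.

```python
def schedule_per_lookup(lookups):
    """Thread switch between every coordinate write and its result read."""
    ops = []
    for tex in lookups:
        ops += [f"write coords {tex}", "thread switch", f"read result {tex}"]
    return ops


def schedule_batched(lookups, fifo_depth):
    """Queue up to fifo_depth lookups, switch once, then collect them all."""
    ops = []
    for start in range(0, len(lookups), fifo_depth):
        batch = lookups[start:start + fifo_depth]
        ops += [f"write coords {tex}" for tex in batch]
        ops.append("thread switch")
        ops += [f"read result {tex}" for tex in batch]
    return ops


lookups = ["t0", "t1", "t2"]
print(schedule_per_lookup(lookups).count("thread switch"))             # 3
print(schedule_batched(lookups, fifo_depth=2).count("thread switch"))  # 2
```

The per-lookup version pays for a few more switches, but each switch still hides one texture fetch's latency behind the other thread's work, which is why the simple approach already gets much of the gain.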
The upshot was a 0-20% performance improvement on glmark2 and a performance hit to only one testcase. With this we're down to 3 subtests that we're slower on than the closed source driver. Jonas's kernel patch is out on the list, and I rewrote the Mesa series to expose threading to the register allocator and landed all but the enabling patch (which has to wait on the kernel side getting merged). Hopefully I'll finish merging it in a week.
In the process of writing multithreading, Jonas noticed that we were scheduling our TLB_Z writes early in the program, which can cut into fragment shader parallelism between the QPUs (of which we have 12) because a TLB_Z write locks the scoreboard for that 2x2 pixel quad. I merged a patch to Mesa that pushes the TLB_Z down to the bottom, at the cost of a single extra QPU instruction. Some day we should flip QPU scheduling around so that we pair that TLB_Z up better, and fill our delay slots in thread switches and branching as well.
I also tracked down a major performance issue in Raspbian's desktop using the open driver. I asked them to use xcompmgr a while back, because readback from the front buffer using the GPU is really slow (The texture unit can't read from raster textures, so we have to reformat them to be readable), and made window dragging unbearable. However, xcompmgr doesn't unredirect fullscreen windows, so full screen GL apps emit copies (with reformatting!) instead of pageflipping.
It looks like the best choice for Raspbian is going to be using compton, an xcompmgr fork that does pageflipping (you get tear free screen updates!) and unredirection of full screen windows (you get pageflipping directly from GL apps). I've also opened a bug report on compton with a description of how they could improve their GL drawing for a tiled renderer like the Pi, which could improve its performance for windowed updates significantly.
Simon ran into some trouble with compton, so he hasn't flipped the default yet, but I would encourage anyone running a Raspberry Pi desktop to give it a shot -- the improvement should be massive.
Other things last week: More VCHIQ patch review, merging more to the -next branches for 4.10 (Martin Sperl's thermal driver, at last!), and a shadow texturing bugfix.
15 Nov 2016 12:00am GMT
14 Nov 2016
I've written more extensively about this here but here's an analogy that should get the point across a bit better: Wayland is just a protocol, just like HTTP. In both cases, you have two sides with very different roles and functionality. In the HTTP case, you have the server (e.g. Apache) and the client (a browser, e.g. Firefox). The communication protocol is HTTP but both sides make a lot of decisions unrelated to the protocol. The server decides what data is sent; the client decides how the data is presented to the user. Wayland is very similar. The server, called the "compositor", decides what data is sent (also: which of the clients even gets the data). The client renders the data and decides what to do with input like key strokes, etc.
Asking "Does $FEATURE work under Wayland?" is akin to asking "Does $FEATURE work under HTTP?". The only answer is: it depends on the compositor and on the client. It's the wrong question. You should ask questions related to the compositor and the client instead, e.g. "does $FEATURE work in GNOME?" or "does $FEATURE work in GTK applications?". That's a question that can be answered.
Of course, there are some cases where the fault is really the protocol itself. But often enough, it's not.
 albeit it does so by telling the compositor to display it. The analogy with HTTP only works to some extent... :)
14 Nov 2016 12:45am GMT
10 Nov 2016
So, in Fedora Workstation 24 we added H264 support through OpenH264. In Fedora Workstation 25 I am happy to tell you all that we are taking another step in improving our codec support by adding support for mp3 playback. I know this has been a big wishlist item for a long time for a lot of people so I am really happy that we are finally in a position to fulfil that wish. You should be able to download the mp3 plugin on day 1 through GNOME Software or through the missing codec installer in various GStreamer applications. For Fedora Workstation 26 I would not be surprised if we decide to ship it on the install media.
For the technically inclined out there, our initial enablement is through the mpg123 library and the corresponding GStreamer plugin. The main reason we chose this library over all the others available out there was a combination of using the same license as GStreamer (LGPL v2) and being a well-established library already used by a lot of different applications. There might be other mp3 decoders added in the future depending on interest in and effort by the community. So get ready to install Fedora Workstation 25 when it's released soon, and play some tunes.
P.S. To be 110% clear we will not be adding encoding support at this time.
10 Nov 2016 7:34pm GMT
08 Nov 2016
08 Nov 2016 2:26pm GMT
07 Nov 2016
I spent a little bit of time last week on actual 3D work. I finally figured out what was going wrong with glmark2's terrain demo. The RCP (reciprocal) instruction on vc4 is a very rough approximation, and in translating GLSL code doing floating point divides (which we have to convert from a / b to a * (1 / b)) we use a Newton-Raphson step to improve our approximation. However, we forgot to do this for the implicit divide we have to do at the end of your vertex shader, so the triangles in the distance were unstable and bounced around based on the error in the RCP output.
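The fix relies on the standard Newton-Raphson refinement for a reciprocal: given an estimate x0 of 1/b, x1 = x0 * (2 - b * x0) roughly squares the relative error. The Python sketch below shows this; `rough_rcp` is a hypothetical stand-in for the hardware RCP with an assumed relative error of 2^-10, not a model of vc4's actual approximation. The `divide` call mirrors the vertex shader case, where clip-space coordinates are divided by w.

```python
def rough_rcp(b: float) -> float:
    """Stand-in for a low-precision RCP instruction (assumed 2**-10 relative error)."""
    return (1.0 / b) * (1.0 + 2.0 ** -10)


def refined_rcp(b: float) -> float:
    """One Newton-Raphson step: x1 = x0 * (2 - b * x0)."""
    x0 = rough_rcp(b)
    return x0 * (2.0 - b * x0)


def divide(a: float, b: float) -> float:
    """a / b lowered to a * (1 / b), using the refined reciprocal."""
    return a * refined_rcp(b)


w = 3.0
print(abs(rough_rcp(w) * w - 1.0))    # about 1e-3
print(abs(refined_rcp(w) * w - 1.0))  # about 1e-6
print(divide(1.5, w))                 # perspective-divide style use: clip x / w
```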
I also finally got around to writing kernel-side validation for ETC1 texture compression. I should be able to merge support for this to v4.10, which is one of the feature checkboxes we were missing compared to the old driver. Unfortunately ETC1 doesn't end up being very useful for us in open source stacks, because S3TC has been around a lot longer and is supported on more hardware, so there's not much content using ETC that I know of outside of Android land.
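For a sense of what validating a compressed texture involves, the Python sketch below computes the minimum size of one ETC1 mip level: ETC1 encodes each 4x4 block of texels in 64 bits, so partial blocks at the edges still cost a whole block. It assumes a plain linear block layout and a single level; the vc4 kernel's actual checks (tiling, mip chains, alignment) are not reproduced here, and `bo_fits` is a hypothetical helper.

```python
ETC1_BLOCK_DIM = 4    # ETC1 compresses 4x4 texel blocks...
ETC1_BLOCK_BYTES = 8  # ...into 64 bits each


def etc1_level_size(width: int, height: int) -> int:
    """Bytes needed for one width x height ETC1 level, rounding up to whole blocks."""
    blocks_x = (width + ETC1_BLOCK_DIM - 1) // ETC1_BLOCK_DIM
    blocks_y = (height + ETC1_BLOCK_DIM - 1) // ETC1_BLOCK_DIM
    return blocks_x * blocks_y * ETC1_BLOCK_BYTES


def bo_fits(bo_size: int, offset: int, width: int, height: int) -> bool:
    """Would a level starting at `offset` stay inside a buffer of bo_size bytes?"""
    return offset + etc1_level_size(width, height) <= bo_size


print(etc1_level_size(256, 256))  # 32768
print(etc1_level_size(5, 3))      # 16: a 2x1 grid of blocks
```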
I also spent a while fixing up various regressions that had happened in Mesa while I'd been playing in the kernel. Some day I should work on CI infrastructure on real hardware, but unfortunately Mesa doesn't have any centralized CI infrastructure to plug into. Intel has a test farm, but not all proposed patches get submitted to it, and it (understandably) doesn't have non-Intel hardware.
In kernel land, I sent out a patch to reduce V3D's overhead due to power management. We were immediately shutting off the 3D engine when it went idle, even if userland might be expected to submit a new frame shortly thereafter. This was taking up 1% of the CPU on profiles I was looking at last week, and I've seen it up to 3.5%. Now we keep the GPU on for at least 40ms ("about two frames plus some slack"). If I'm reading right, the closed driver does immediate powerdown as well, so we may be using slightly more power now, but given that power state transitions themselves cost power, and the CPU overhead costs power, hopefully this works out fine anyway (even though we've never done power measurements for the platform).
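The arithmetic behind the 40ms figure, assuming a 60Hz display: one frame is about 16.7ms, so two frames are about 33.3ms, and 40ms leaves some slack on top. The Python sketch below simulates the policy on a made-up workload (one 5ms job per frame) and counts how often the GPU has to be powered back up. It is a toy model, not the vc4 code; in the kernel this kind of delayed power-down is commonly built on runtime PM autosuspend.

```python
def count_power_ups(jobs, idle_delay_ms):
    """Count GPU power-on transitions for (submit_ms, duration_ms) jobs.

    After each job the GPU stays on for idle_delay_ms of idle time; a job
    submitted after that window has expired must power it back up.
    Jobs are assumed sorted by submit time and non-overlapping.
    """
    power_ups = 0
    powered_until = None  # time at which the GPU would power down
    for submit, duration in jobs:
        if powered_until is None or submit > powered_until:
            power_ups += 1
        powered_until = submit + duration + idle_delay_ms
    return power_ups


frame_ms = 1000 / 60  # about 16.7 ms at an assumed 60 Hz
jobs = [(i * frame_ms, 5.0) for i in range(60)]  # one second of frames
print(count_power_ups(jobs, idle_delay_ms=0))   # 60: power cycle every frame
print(count_power_ups(jobs, idle_delay_ms=40))  # 1: stays on between frames
```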
I also reviewed more patches from Michael Zoran for vchiq. It should now (once all reviewed patches land) work on arm64. We still need to get the vchiq-using drivers like camera and vcsm into staging.
Finally, I made more progress on HDMI audio. One of the ALSA developers, Lars-Peter Clausen, pointed out that ALSA's design is that userspace would provide the IEC958 subframes by using alsa-lib's iec958 plugin and a bit of .asoundrc configuration. He also provided a ton of help debugging that pipeline. I rebased and cleaned up my work into a branch that uses simple-audio-card as my machine driver, so the whole stack is pretty trivial. Now aplay runs successfully, though no sound has come out. Next step is to go test that the closed stack actually plays audio on this monitor and capture some register state for debugging.
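For reference, an IEC958 (IEC 60958) subframe is a 32-bit word: a 4-bit preamble, the audio sample, then validity, user-data, channel-status and parity bits. The Python sketch below packs a 24-bit sample into that layout with even parity over bits 4-31. It only illustrates what userspace has to produce; it is not alsa-lib's iec958 plugin code, and it leaves the preamble bits at zero instead of reproducing ALSA's preamble encoding.

```python
def iec958_subframe(sample24: int, validity: int = 0, user: int = 0,
                    channel_status: int = 0) -> int:
    """Pack a 24-bit PCM sample into an IEC 60958-style 32-bit subframe.

    Bits 0-3: preamble (left as 0 here), bits 4-27: audio sample,
    bit 28: validity (0 = valid), bit 29: user data, bit 30: channel status,
    bit 31: parity, chosen so bits 4-31 contain an even number of ones.
    """
    frame = (sample24 & 0xFFFFFF) << 4  # also wraps negative samples to 24 bits
    frame |= (validity & 1) << 28
    frame |= (user & 1) << 29
    frame |= (channel_status & 1) << 30
    if bin(frame >> 4).count("1") % 2:
        frame |= 1 << 31
    return frame


print(hex(iec958_subframe(0x000001)))  # 0x80000010: one data bit, parity set
print(hex(iec958_subframe(0x000003)))  # 0x30: two data bits, parity clear
```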
07 Nov 2016 11:11pm GMT