12 Dec 2018

feedplanet.freedesktop.org

Peter Hutterer: Understanding HID report descriptors

This time we're digging into HID - Human Interface Devices and more specifically the protocol your mouse, touchpad, joystick, keyboard, etc. use to talk to your computer.

Remember the good old days when you had to install a custom driver for every input device? Remember when PS/2 (the protocol) had to be extended to accommodate mouse wheels, and then again for five-button mice? And you had to select the right protocol to make it work. Yeah, me neither, I tend to suppress those memories because the world is awful enough as it is.

As users we generally like devices to work out of the box. Hardware manufacturers generally like to add bits and bobs because otherwise who would buy that new device when last year's device looks identical. This difference in needs can only be solved by one superhero: Committee-man, with the superpower to survive endless meetings and get RFCs approved.

Many many moons ago, when USB itself was in its infancy, Committee-man and his sidekick Caffeine boy got the USB consortium to agree on a standard for input devices that is so self-descriptive that operating systems (Win95!) can write one driver that can handle this year's device, and next year's, and so on. No need to install extra drivers, your device will just work out of the box. And so HID was born. This may be only an approximate summary of history.

Originally HID was designed to work over USB. But just like Shrek, the technology world is obsessed with layers, so these days HID works over different transport layers. HID over USB is what your mouse uses, HID over i2c may be what your touchpad uses. HID also works over Bluetooth and its celebrity-diet version, BLE. Somewhere, someone out there is very slowly moving a mouse pointer by sending HID over carrier pigeons just to prove a point. Because there's always that one guy.

HID is incredibly simple in that the static description of the device can just be bytes burnt into the ROM like the Australian sun into unprepared English backpackers. And the event frames are often an identical series of bytes where every bit is filled in by the firmware according to the axis/buttons/etc.

HID is incredibly complicated because parsing it is a stack-based mental overload. Each individual protocol item is simple but getting it right and all into your head is tricky. Luckily, I'm here for you to make this simpler to understand or, failing that, at least more entertaining.

As said above, the purpose of HID is to make devices describe themselves in a generic manner so that you can have a single driver handle any input device. The idea is that the host parses that standard protocol and knows exactly how the device will behave. This has worked out great, we only have around 200 files dealing with vendor- and hardware-specific HID quirks as of v4.20.

HID messages are Reports. And to know what a Report means and how to interpret it, you need a Report Descriptor. That Report Descriptor is static and contains a series of bytes detailing "what" and "where", i.e. what a sequence of bits represents and where to find those bits in the Report. So let's try and parse one of those Report Descriptors, let's say for a fictional mouse with a few buttons. How exciting, we're at the forefront of innovation here.

The Report Descriptor consists of a bunch of Items. A parser reads the next Item, processes the information within and moves on. Items are small (1 byte header, 0-4 bytes payload) and generally carry exactly one tiny little bit of information. You need to accumulate several Items to build up enough information to actually know what's happening.
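
If you want to follow along in code, here is a minimal C sketch (function name invented, so an illustration rather than any real parser) of how that 1-byte header is peeled apart. The layout is the one the HID spec defines for so-called short items: payload size in bits 0-1, item type in bits 2-3, item tag in bits 4-7.

#include <stdint.h>
#include <stdio.h>

/* Decode the 1-byte header of a HID short item. A size field of 3
   means a 4-byte payload; there is no 3-byte payload. */
static void decode_item_header(uint8_t header)
{
    static const int payload_bytes[] = { 0, 1, 2, 4 };
    int size = payload_bytes[header & 0x3];
    int type = (header >> 2) & 0x3; /* 0: Main, 1: Global, 2: Local */
    int tag  = (header >> 4) & 0xf;

    printf("tag 0x%x, type %d, %d byte(s) of payload\n", tag, type, size);
}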

The "what" question of the Report Descriptor is answered with the so-called Usage. This could be something simple like X or Y (0x30 and 0x31) or something more esoteric like System Menu Exit (0x88). A Usage is 16 bits but all Usages are grouped into so-called Usage Pages. A Usage Page too is a 16 bit value and together they form the 32-bit value that tells us what the device can do. Examples:


0001 0031 # Generic Desktop, Y
0001 0088 # Generic Desktop, System Menu Exit
0003 0005 # VR Controls, Head Tracker
0003 0006 # VR Controls, Head Mounted Display
0007 0031 # Keyboard, Keyboard \ and |

Note how the Usage in the last item is the same as the first one; without the Usage Page you will mix things up. It helps if you always think of the Usage as a 32-bit number. For your kids' bed-time story time, here are the HID Usage Tables from 2004 and the approved HID Usage Table Review Requests of the last decade. Because nothing puts them to sleep quicker than droning on about hex numbers associated with remote control buttons.
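
To drive the point home, here's a tiny C sketch (function name invented) of how the two 16-bit values combine into the 32-bit number from the list above:

#include <stdint.h>

/* Combine a 16-bit Usage Page and a 16-bit Usage into the single
   32-bit value that identifies what a field means. */
static uint32_t hid_usage(uint16_t usage_page, uint16_t usage)
{
    return ((uint32_t)usage_page << 16) | usage;
}

/* hid_usage(0x0001, 0x0031) == 0x00010031, Generic Desktop / Y
   hid_usage(0x0007, 0x0031) == 0x00070031, Keyboard / Keyboard \ and | */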

To successfully interpret a Report from the device, you need to know which bits have which Usage associated with them. So let's go back to our innovative mouse. We would want a report descriptor with 6 items like this:


Usage Page (Generic Desktop)
Usage (X)
Report Size (16)
Usage Page (Generic Desktop)
Usage (Y)
Report Size (16)

This basically tells the host: X and Y both have 16 bits. So if we get a 4-byte Report from the device, we know two bytes are for X, two for Y.

HID was invented at a time when bits were more expensive than printer ink, so we can't afford to waste any bits (still the case because who would want to spend an extra penny on more ROM). HID makes use of so-called Global items; once those are set, their value applies to all following items until changed. Usage Page and Report Size are such Global items, so the above report descriptor is really implemented like this:


Usage Page (Generic Desktop)
Usage (X)
Usage (Y)
Report Count (2)
Report Size (16)
Input (Data,Var,Rel)

The Report Count just tells us that 2 fields of the current Report Size are coming up. We have two usages, two fields, and 16 bits each so we know what to do. The Input item is sort-of the marker for the end of the stack, it basically tells us "process what you've seen so far", together with a few flags. Rel in this case means that the Usages are relative. Oh, and Input means that this is data from device to host. Output would be data from host to device, e.g. to set LEDs on a keyboard. There's also Feature which indicates configurable items.

Buttons on a device are generally just numbered, so it'd be a monumental 16-bits-at-a-time waste to have HID send Usage (Button1), Usage (Button2), etc. for every button on the device. HID instead provides a Usage Minimum and Usage Maximum to sequentially order them. That looks like this:


Usage Page (Button)
Usage Minimum (1)
Usage Maximum (5)
Report Count (5)
Report Size (1)
Input (Data,Var,Abs)

So we have 5 buttons here and each button has one bit. Note how the buttons are Abs because a button state is not a relative value, it's either down or up. HID is quite intolerant to Schrödinger's thought experiments.

Let's put the two things together and we have an almost-correct Report descriptor:


Usage Page (Button)
Usage Minimum (1)
Usage Maximum (5)
Report Count (5)
Report Size (1)
Input (Data,Var,Abs)

Report Size (3)
Report Count (1)
Input (Cnst,Arr,Abs)

Usage Page (Generic Desktop)
Usage (X)
Usage (Y)
Report Count (2)
Report Size (16)
Input (Data,Var,Rel)

New here is Cnst. This signals that the bits have a constant value, thus don't need a Usage and basically don't matter (haha. yeah, right. in theory). Linux does indeed ignore those. Cnst is used for padding to align on byte boundaries - 5 bits for buttons plus 3 bits padding make 8 bits. Which makes one byte as everyone agrees except for granddad over there in the corner. I don't know how he got in.

Were we to get a 5-byte Report from the device (5 + 3 + 16 + 16 = 40 bits), we'd parse it approximately like this:


button_state = byte[0] & 0x1f
x = byte[1] | (byte[2] << 8)
y = byte[3] | (byte[4] << 8)

Hooray, we're almost ready. Except not. We may need more info to correctly interpret the data within those reports.

The Logical Minimum and Logical Maximum specify the value range of the actual data. We need this to tell us whether the data is signed and what the allowable range is. Together with the Physical Minimum and the Physical Maximum they specify what the values really mean. In the simple case:


Usage Page (Generic Desktop)
Usage (X)
Usage (Y)
Report Count (2)
Report Size (16)
Logical Minimum (-32767)
Logical Maximum (32767)
Input (Data,Var,Rel)

This just means our x/y data is signed. Easy. But consider this combination:


...
Logical Minimum (0)
Logical Maximum (1)
Physical Minimum (1)
Physical Maximum (12)

This means that if the bit is 0, the effective value is 1. If the bit is 1, the effective value is 12. In the general case, the logical range maps linearly onto the physical range.
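
A C sketch of that interpretation step, assuming the plain linear logical-to-physical mapping described above (function name invented):

/* Map a value from the logical range into the physical range.
   With lmin=0, lmax=1, pmin=1, pmax=12 a bit value of 0 yields a
   physical value of 1 and a bit value of 1 yields 12. */
static double logical_to_physical(int value, int lmin, int lmax,
                                  int pmin, int pmax)
{
    return pmin + (double)(value - lmin) * (pmax - pmin) / (lmax - lmin);
}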

Note that the above is one report only. Devices may have multiple Reports, indicated by the Report ID. So our Report Descriptor may look like this:


Report ID (01)
Usage Page (Button)
Usage Minimum (1)
Usage Maximum (5)
Report Count (5)
Report Size (1)
Input (Data,Var,Abs)
Report Size (3)
Report Count (1)
Input (Cnst,Arr,Abs)

Report ID (02)
Usage Page (Generic Desktop)
Usage (X)
Usage (Y)
Report Count (2)
Report Size (16)
Input (Data,Var,Rel)

If we were to get a Report now, we'd first need to check byte 0 for the Report ID so we know what we're dealing with, i.e. our single-use hard-coded parser would look like this:


if byte[0] == 0x01:
    button_state = byte[1] & 0x1f
else if byte[0] == 0x02:
    x = byte[2] | (byte[3] << 8)
    y = byte[4] | (byte[5] << 8)

A device may use multiple Reports if the hardware doesn't gather all data within the same hardware bits. Now, you may ask: if I get fifteen reports, how should I know what belongs together? Good question, and lucky for you the HID designers are miles ahead of you. Report IDs are grouped into Collections.

Collections can have multiple types. An Application Collection describes a set of inputs that make sense as a whole. Every Report Descriptor must define at least one Application Collection but you may have two or more. For example, a keyboard with integrated trackpoint should and/or would use two. This is how the kernel knows it needs to create two separate event nodes for the device. Application Collections have a few reserved Usages that indicate to the host what type of device this is, e.g. Mouse, Joystick, or Consumer Control. If you ever wondered why you have a device named something like "Logitech G500s Laser Gaming Mouse Consumer Control", this is the kernel simply appending the Application Collection's Usage to the device name.

A Physical Collection indicates that the data is collected at one physical point, though what a point is, is a bit blurry. Theoretical physicists will disagree but a point can be "a mouse". So it's quite common for all reports on a mouse to be wrapped in one Physical Collection. If you have a device with two sets of sensors, you'd have two collections to illustrate which ones go together. Physical Collections also have reserved Usages like Pointer or Head Tracker.

Finally, a Logical Collection just indicates that some bits of data belong together, whatever that means. The HID spec uses the example of buffer length field and buffer data but it's also common for all inputs from a mouse to be grouped together. A quick check of my mice here shows that Logitech doesn't wrap the data into a Logical Collection but Microsoft's firmware does. Because where would we be if we all did the same thing...

Anyway. Now that we know about collections, let's look at a whole report descriptor as seen in the wild:


Usage Page (Generic Desktop)
Usage (Mouse)
Collection (Application)
    Usage Page (Generic Desktop)
    Usage (Mouse)
    Collection (Logical)
        Report ID (26)
        Usage (Pointer)
        Collection (Physical)
            Usage Page (Button)
            Usage Minimum (1)
            Usage Maximum (5)
            Report Count (5)
            Report Size (1)
            Logical Minimum (0)
            Logical Maximum (1)
            Input (Data,Var,Abs)
            Report Size (3)
            Report Count (1)
            Input (Cnst,Arr,Abs)
            Usage Page (Generic Desktop)
            Usage (X)
            Usage (Y)
            Report Count (2)
            Report Size (16)
            Logical Minimum (-32767)
            Logical Maximum (32767)
            Input (Data,Var,Rel)
            Usage (Wheel)
            Physical Minimum (0)
            Physical Maximum (0)
            Report Count (1)
            Report Size (16)
            Logical Minimum (-32767)
            Logical Maximum (32767)
            Input (Data,Var,Rel)
        End Collection
    End Collection
End Collection

We have one Application Collection (Generic Desktop, Mouse) that contains one Logical Collection (Generic Desktop, Mouse). That contains one Physical Collection (Generic Desktop, Pointer). Our actual Report (and we have only one, but it has the decimal ID 26) has 5 buttons, two 16-bit axes (x and y) and finally another 16-bit axis for the Wheel. This device will thus send 8-byte reports and our parser will do:


if byte[0] != 0x1a: # the descriptor lists the ID in decimal (26), which is 0x1a
    error, should be 26
button_state = byte[1] & 0x1f
x = byte[2] | (byte[3] << 8)
y = byte[4] | (byte[5] << 8)
wheel = byte[6] | (byte[7] << 8)
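
If you prefer something that compiles over pseudocode, here is roughly the same hard-coded parser as a self-contained C sketch. The struct and function names are invented for illustration, and note the int16_t casts: the Logical Minimum/Maximum above told us x, y and wheel are signed.

#include <stdint.h>

struct mouse_state {
    uint8_t buttons;     /* bits 0-4: buttons 1-5 */
    int16_t x, y, wheel; /* signed, per Logical Minimum/Maximum */
};

/* Reassemble a little-endian signed 16-bit field from two bytes. */
static int16_t get_s16(const uint8_t *r, int offset)
{
    return (int16_t)(r[offset] | (r[offset + 1] << 8));
}

/* Parse one 8-byte report from our example device. Returns -1 if
   the Report ID isn't the one (decimal 26) we hard-coded for. */
static int parse_report(const uint8_t *report, struct mouse_state *s)
{
    if (report[0] != 26)
        return -1;

    s->buttons = report[1] & 0x1f;
    s->x       = get_s16(report, 2);
    s->y       = get_s16(report, 4);
    s->wheel   = get_s16(report, 6);
    return 0;
}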

That's it. Now, obviously, you can't write a parser for every HID descriptor out there so your actual parsing code needs to be generic. The Linux kernel does exactly that and so does everything else that needs to parse HID. There's a huge variety in devices out there, all with HID descriptors that may or may not be correct. As with so much in life, correct HID implementations are often defined by "whatever Windows accepts" so if you like playing catch, Linux development is for you.

Oh, in case you just got a bit too optimistic about the state of the world: HID allows for vendor-defined usages. Which does exactly what you'd think it does, it hides vendor-specific protocol inside what should be a generic protocol. There are devices with hidden report IDs that you can only unlock by sending the right magic sequence to the report and/or by defeating the boss on Level 4. Usually those devices present themselves as basic/normal devices over HID but if you know the magic sequence you get to use *gasp* all buttons. Or access the device-specific configuration features. Logitech's HID++ is just one example here but at least that's one where we have most of the specs available.

The above describes how to parse the HID report descriptor and interpret the reports. But what happens once you have a HID report correctly parsed? In the case of the Linux kernel, once the report descriptor is parsed evdev nodes are created (one per Application Collection, more or less). As the Reports come in, they are mapped to evdev codes and the data appears on the evdev node. That's where userspace like libinput can pick it up. That bit is actually quite simple (mostly anyway).

The above output was generated with the tools from the hid-tools repository. Go forth and hid-record.

12 Dec 2018 10:21am GMT

Peter Hutterer: High resolution wheel scrolling on Linux v4.21

Disclaimer: this is pending for v4.21 and thus not yet in any kernel release.

Most wheel mice have a physical feature to stop the wheel from spinning freely. That feature is called detents, notches, wheel clicks, stops, or something like that. On your average mouse that is 24 wheel clicks per full rotation, resulting in the wheel rotating by 15 degrees before its motion is arrested. On some other mice that angle is 18 degrees, so you get 20 clicks per full rotation.

Of course, the world wouldn't be complete without fancy hardware features. Over the last 10 or so years devices have added free-wheeling scroll wheels or scroll wheels without distinct stops. In many cases wheel behaviour can be configured on the device, e.g. with Logitech's HID++ protocol. A few weeks back, Harry Cutts from the chromium team sent patches to enable Logitech high-resolution wheel scrolling in the kernel. Succinctly, these patches added another axis next to the existing REL_WHEEL named REL_WHEEL_HI_RES. Where available, the latter axis would provide finer-grained scroll information than the click-by-click REL_WHEEL. At the same time I accidentally stumbled across the documentation for the HID Resolution Multiplier Feature. A few patch revisions later and we now have everything queued up for v4.21. Below is a summary of the new behaviour.

The kernel will continue to provide REL_WHEEL as the axis for "wheel clicks", just as before. This axis provides the logical wheel clicks, (almost) nothing changes here. In addition, a REL_WHEEL_HI_RES axis is available which allows for finer-grained resolution. On this axis, the magic value 120 represents one logical traditional wheel click but a device may send a fraction of 120 for a smaller motion. Userspace can either accumulate the values until it hits a full 120 for one wheel click or it can scroll by a few pixels on each event for a smoother experience. The same principle is applied to REL_HWHEEL and REL_HWHEEL_HI_RES for horizontal scroll wheels (which these days is just tilting the wheel). The REL_WHEEL axis is now emulated by the kernel and simply sent out whenever we have accumulated 120.
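
To make the accumulation idea concrete, here's a small C sketch of what such userspace code could do (the struct and function names are invented; the only real constant is the 120):

/* Accumulate REL_WHEEL_HI_RES deltas and emit one discrete wheel
   click per full 120 units, carrying the remainder forward. */
struct wheel_accumulator {
    int residual; /* leftover hi-res units, always in (-120, 120) */
};

static int wheel_clicks(struct wheel_accumulator *acc, int hi_res_delta)
{
    int clicks;

    acc->residual += hi_res_delta;
    clicks = acc->residual / 120; /* truncates toward zero */
    acc->residual -= clicks * 120;

    return clicks; /* 0 until a full click's worth has accumulated */
}

A real implementation would probably also want to reset the residual on direction changes or after a timeout, much like the kernel-side emulation described below.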

Important to note: REL_WHEEL and REL_HWHEEL are now legacy axes and should be ignored by code handling the respective high-resolution version.

The magic value of 120 is taken directly from Windows. That value was chosen because it has a good number of integer factors, so dividing 120 by whatever multiplier the mouse uses gives you an integer fraction of 120. And because HW manufacturers want it to work on Windows, we can rely on them doing it right, provided we use the same approach.

There are two implementations that matter. Harry's patches enable the high-resolution scrolling on Logitech mice which seem to mostly have a multiplier of 8 (i.e. REL_WHEEL_HI_RES will send eight events with a value of 15 before REL_WHEEL sends 1 click). There are some interesting side-effects with e.g. the MX Anywhere 2S. In high-resolution mode with a multiplier of 8, a single wheel movement does not always give us 8 events, the firmware does its own magic here. So we have some emulation code in place with the goal of making the REL_WHEEL event happen on the mid-point of a wheel click motion. The exact point can shift a bit when the device sends 7 events instead of 8 so we have a few extra bits in place to reset after timeouts and direction changes to make sure the wheel behaviour is as consistent as possible.

The second implementation is for the generic HID protocol. This was all added for Windows Vista, so we're only about a decade behind here. Microsoft got the Resolution Multiplier feature into the official HID documentation (possibly in the hope that other HW manufacturers implement it which afaict didn't happen). This feature effectively provides a fixed value multiplier that the device applies in hardware when enabled. It's basically the same as the Logitech one except it's set through a HID feature instead of a vendor-specific protocol. On the devices tested so far (all Microsoft mice because no-one else seems to implement this) the multipliers vary a bit, ranging from 4 to 12. And the exact behaviour varies too. One mouse behaves correctly (Microsoft Comfort Optical Mouse 3000) and sends more events than before. Other mice just send the multiplied value instead of the normal value, so nothing really changes. And at least one mouse (Microsoft Sculpt Ergonomic) sends the tilt-wheel values more frequently and with a higher value. So instead of one event with value 1 every X ms, we now get an event with value 3 every X/4 ms. The mice tested do not drop events like the Logitech mice do, so we don't need fancy emulation code here. Either way, we map this into the 120 range correctly now, so userspace gets to benefit.

As mentioned above, the Resolution Multiplier HID feature was introduced for Windows Vista which is... not the most recent release. I have a strong suspicion that Microsoft dumped this feature as well, the most recent set of mice I have access to don't provide the feature anymore (they have vendor-private protocols that we don't know about instead). So the takeaway for all this is: if you have a Logitech mouse, you'll get higher-resolution scrolling on v4.21. If you have a Microsoft mouse a few years old, you may get high-resolution wheel scrolling if the device supports it. Any other vendor or a new Microsoft mouse, you don't get it.

Coincidentally, if you know anyone at Microsoft who can provide me with the specs for their custom protocol, I'd appreciate it. We'd love to have support for it both in libratbag and in the kernel. Or any other vendor, come to think of it.

12 Dec 2018 4:27am GMT

10 Dec 2018

feedplanet.freedesktop.org

Eric Anholt: 2018-12-10

For V3D last week, I resurrected my old GLES 3.1 series with SSBO and shader image support, rebuilt it for V3D 4.1 (shader images no longer need manual tiling), and wrote indirect draw support and started on compute shaders. As of this weekend, dEQP-GLES31 is passing 1387/1567 of the tests with "compute" in the name on the simulator. I have a fix needed for barrier(), then it's time to build the kernel interface. In the process, I ended up fixing several job flushing bugs, plugging memory leaks, improving our shader disassembly debug dumps, and reducing memory consumption and CPU overhead.

The TFU series is now completely merged, and the kernel cache management series from last week is now also merged. I also fixed a bug in the core GPU scheduler that would cause use-after-frees when tracing.

For vc4, Boris completed and merged the H/V flipping work after fixing display of flipped/offset SAND images. More progress has been made on chamelium testing, so we're nearing having automated regression tests for parts of our display stack. My PM driver has the acks we needed, but my co-maintainer has tacked a new request on in v3 so it'll be delayed.

10 Dec 2018 12:30am GMT

05 Dec 2018

feedplanet.freedesktop.org

Samuel Iglesias: VK_KHR_shader_float_controls and Mesa support

Khronos Group has published two new extensions for Vulkan: VK_KHR_shader_float16_int8 and VK_KHR_shader_float_controls. In this post, I will talk about VK_KHR_shader_float_controls, which is the extension I have been implementing on the Anvil driver, the open-source Intel Vulkan driver, as part of my job at Igalia. For information about VK_KHR_shader_float16_int8 and its implementation in Mesa, you can read Iago's blogpost.

The Vulkan Working Group has defined a new extension, VK_KHR_shader_float_controls, which allows applications to query and override the implementation's default floating point behavior for rounding modes, denormals, signed zero and infinity. From the Vulkan application developer's perspective, VK_KHR_shader_float_controls defines a new structure called VkPhysicalDeviceFloatControlsPropertiesKHR where the drivers expose the supported capabilities, such as the rounding modes for each floating point data type, how denormals are expected to be handled by the hardware (either flushed to zero or preserved), and whether signed zero, infinity and NaN values will have their bits preserved.

typedef struct VkPhysicalDeviceFloatControlsPropertiesKHR {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           separateDenormSettings;
    VkBool32           separateRoundingModeSettings;
    VkBool32           shaderSignedZeroInfNanPreserveFloat16;
    VkBool32           shaderSignedZeroInfNanPreserveFloat32;
    VkBool32           shaderSignedZeroInfNanPreserveFloat64;
    VkBool32           shaderDenormPreserveFloat16;
    VkBool32           shaderDenormPreserveFloat32;
    VkBool32           shaderDenormPreserveFloat64;
    VkBool32           shaderDenormFlushToZeroFloat16;
    VkBool32           shaderDenormFlushToZeroFloat32;
    VkBool32           shaderDenormFlushToZeroFloat64;
    VkBool32           shaderRoundingModeRTEFloat16;
    VkBool32           shaderRoundingModeRTEFloat32;
    VkBool32           shaderRoundingModeRTEFloat64;
    VkBool32           shaderRoundingModeRTZFloat16;
    VkBool32           shaderRoundingModeRTZFloat32;
    VkBool32           shaderRoundingModeRTZFloat64;
} VkPhysicalDeviceFloatControlsPropertiesKHR;

This structure will be filled by the driver when calling vkGetPhysicalDeviceProperties2(), with a pointer to such a structure as one of the pNext pointers of the VkPhysicalDeviceProperties2 structure. With that, we know if the driver supports the SPIR-V capabilities we want to use in our shaders. If the separate*Settings fields are true, remember to check the value of the respective property for each floating-point bit size you plan to work with.
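
In code, the query is a standard pNext chain. A minimal sketch, assuming a valid VkPhysicalDevice and headers that include the extension:

#include <vulkan/vulkan.h>
#include <stdbool.h>

/* Chain the float-controls struct into VkPhysicalDeviceProperties2
   and let the driver fill it in. */
static bool supports_fp64_flush_to_zero(VkPhysicalDevice phys_dev)
{
    VkPhysicalDeviceFloatControlsPropertiesKHR float_controls = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FLOAT_CONTROLS_PROPERTIES_KHR,
    };
    VkPhysicalDeviceProperties2 props = {
        .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2,
        .pNext = &float_controls,
    };

    vkGetPhysicalDeviceProperties2(phys_dev, &props);

    return float_controls.shaderDenormFlushToZeroFloat64 == VK_TRUE;
}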

The required bits to enable such capabilities in a SPIR-V shader are the following:

  1. Enable the extension: OpExtension "SPV_KHR_float_controls"
  2. Enable the desired capability. For example: OpCapability DenormFlushToZero
  3. Specify where to apply it. For example, to flush to zero all fp64 denormals in the %main function of a shader: OpExecutionMode %main DenormFlushToZero 64. If we want to apply different modes, we would repeat that line with the needed ones.
  4. Profit!

I implemented support for this extension on Anvil's supported GPUs (Broadwell, Skylake, Kabylake and newer), although we don't support all the capabilities. For example, on Broadwell float16 denormals are not supported, and on the rest of the generations flushing float16 denormals to zero is not supported for all instructions.

If you are interested, the patches are now under review :-) As there is no real-world code using this feature yet, please file any bug you find about this in our bugzilla.

05 Dec 2018 3:57pm GMT

04 Dec 2018

feedplanet.freedesktop.org

Iago Toral: VK_KHR_shader_float16_int8 on Anvil

The last time I talked about my driver work was to announce the implementation of the shaderInt16 feature for the Anvil Vulkan driver back in May, and since then I have been working on VK_KHR_shader_float16_int8, a new Vulkan extension recently announced by the Khronos group, for which I have just posted initial patches in mesa-dev supporting Broadwell and later Intel platforms.

As you probably guessed by the name, this extension enables Vulkan to consume SPIR-V shaders that use Float16 and Int8 types in arithmetic operations, extending the functionality included with VK_KHR_16bit_storage and VK_KHR_8bit_storage, which was limited to load/store operations. In theory, applications that do not need the range and precision of regular 32-bit floating point and integers can use these new types to improve performance by increasing ALU throughput and reducing register pressure, which on some platforms can also lead to improved parallelism.

In the case of the Intel platforms, initial testing done by Intel suggests that better ALU throughput is expected when issuing half-float instructions. Lower register pressure is also expected, at least for SIMD16 fragment and compute shaders, where we can pack all 16 channels' worth of half-float data into a single GPU register, which could significantly improve performance for shaders that would otherwise need to spill registers to memory.

Another neat thing is that while VK_KHR_shader_float16_int8 is a Vulkan extension, its implementation is mostly API agnostic, so most of the work we did here should also help us have a proper mediump implementation for GLSL ES shaders in the future.

There are a few caveats to consider as well though: on some hardware platforms smaller bit-sizes have certain hardware restrictions that may lead to emitting worse shader code in some scenarios, and generally, Mesa's compiler infrastructure (and the Intel compiler backend in particular) have a long history of being 32-bit only, so there are parts of the compiler stack that still work better for 32-bit code.

Because VK_KHR_shader_float16_int8 is a brand new feature, we don't really have any real world use cases yet. This is on top of the fact that Mesa's compiler backends have been mostly (or exclusively) 32-bit aware until now (and more recently 64-bit too), so going forward I would expect a lot of focus on making our compiler be as robust (and optimal) for 16-bit code as it is for 32-bit code.

While we are already aware of a few areas where we can do better and I am currently working on addressing a few of these, one of the major limiting factors we have at the moment is the fact that the only source of 16-bit shaders available to us is the Khronos CTS, which, due to its particular motivation, is very different from real-world shader workloads and is not valid source material to drive compiler optimization work. Unfortunately, it might take some time until we start seeing applications using these new features, so in the meantime we will need to find other ways to drive further work in this area, and I think our best option here might be GLSL ES's mediump and lowp qualifiers.

GLSL ES mediump and lowp qualifiers have been around for a long time but they are only defined as hints to the shader compiler that lower precision is acceptable and we have never really used them to emit half-float code. Thankfully, Topi Pohjolainen from Intel has been working on this for a while, which would open up a much better scenario for improving our 16-bit compiler paths, so this is something I am really looking forward to.

Finally, as I say above, we could definitely use more testing and feedback from real world use cases, so if you decide to use this feature in your next project and you hit any bugs, please be sure to file them in Bugzilla so we can continue to improve our implementation.

04 Dec 2018 8:25am GMT

03 Dec 2018

feedplanet.freedesktop.org

Robert Foss: Running Docker privileged inside of LXC / LXD

The architecture is a bit of a container matryoshka, but what we're trying to achieve is running Docker privileged inside of an LXC container on a baremetal host.


Setup container on LXC Host

In order to give Docker in the guest privileges, the guest container itself has to be given privileges.

There is no simple switch for doing this in LXC unfortunately, but a few config options will do the trick.

lxc launch images:ubuntu/bionic container

lxc config set container security.nesting true
lxc config set container security.privileged true
cat <<EOT | lxc config set container raw.lxc -
lxc.cgroup.devices.allow = a
lxc.cap.drop =
EOT

lxc restart container

Setup docker on container

Just to verify that this works, start a privileged Docker container …

03 Dec 2018 6:00pm GMT

Peter Hutterer: ggkbdd is a generic gaming keyboard daemon

Last week while reviewing a patch I read that some gaming keyboards have two modes - keyboard mode and gaming mode. When in gaming mode, the keys send out pre-recorded macros when pressed. Presumably (I am not a gamer) this is to record keyboard shortcuts for quicker access to various functionality. The macros are stored in the hardware and are thus relatively independent of the host system, provided you have access to the custom protocol - which you probably don't when you're on Linux. But I digress.

I reckoned this could be done in software and work with any 5 dollar USB keyboard. A few hours later, I have this working now: ggkbdd. It sits directly above the kernel and waits for key events. Once the 'mode key' is hit, the keyboard will send pre-configured key sequences for the respective keys. Hitting the mode key again (or ESC) switches back to normal mode.

There's a lot of functionality that is missing, such as integration with the desktop (probably via DBus), better security (dropping privs, masking the fd to avoid accidental key logging), better system integration (requesting fds from logind, possibly through the compositor). And error handling, etc. I think the total time spent on this is somewhere between 3 and 4h, and that includes the time to write this blog post and debug the systemd unit autostartup. There are likely other projects that solve it the same way, or at least in a similar manner. I didn't check.

This was done as a proof-of-concept.

In the grand glorious future and provided this is indeed something generally useful, this would need compositor integration. Not sure we'll ever get to that point. Meanwhile, consider this a code drop for a proof-of-concept and expect that you'll have to fix any bugs yourself.

03 Dec 2018 5:43am GMT

Dave Airlie (blogspot): Open source compute stack talk from Linux Plumbers Conference 2018

I spoke at Linux Plumbers Conference 2018 in Vancouver a few weeks ago, about CUDA and the state of open source compute stacks.

The video is now available.

https://www.youtube.com/watch?v=d94N2Lu4x9s


03 Dec 2018 1:43am GMT

Eric Anholt: 2018-12-03

In V3D land, I built kernel support for the Texture Formatting Unit (TFU) and started using it for glGenerateMipmaps() support. This unit will also be important for reformatting linear dmabufs (such as from X11 with a linear-only scanout engine) or media decode output in SAND format into the UIF format that V3D can render from. The kernel side is in drm-misc-next, and I'll land the Mesa side once it hits drm-next.

I also rewrote the V3D cache invalidation in the kernel thanks to feedback from Dave Emett on the hardware team. In the process, this fixed a 3ms(!) CPU-side wait on every job submission, which improved throughput by 4-10x. Now I know why my fancy new hardware felt so slow! I also landed new tracepoints for anyone else debugging execution (aka I could write good sysprof visualization now).

Now that cache management is less broken, I've started on resurrecting my old SSBO and shader image branches so that I can write CSD (compute shader) support and get the driver to GLES 3.1.

In the process of writing new kernel code, I found multiple regressions in DRM core during CTS runs. I rebased my old V3D IGT support and got it merged, and built new tests for replicating one of the failures (GPU hang recovery was broken).

The tinydrm work from the previous month also continued. I've polished kmsro some more (fewer driver symlinks, more loader knowledge of kmsro), and will be resubmitting soon I hope.

Since I had kmsro nearly working, I gave it a shot for a V3D testing idea from last month. Right now to get X11 running to do conformance runs on it, I've got a stub KMS implementation in a branch. If I could do buffer sharing between VKMS and V3D, then kmsro could bind the two together so that we didn't need out-of-tree patches for V3D testing. It turned out that VKMS didn't have PRIME support, so I used Noralf's proposed shmem helpers to add it. I got X11 up and running and looking plausible, but tests were failing left and right. This may be due to the shmem helpers using cached CPU mappings, when we want WC for interaction with most GPUs. Because of issues like this, the shmem helpers for VKMS are stalled at the moment but I may land V3D support in kmsro anyway.

In vc4 land, the Bootlin folks have been knocking off more of the HVS TODO entries. Boris respun underscan support, and hopefully after the next revision we'll be able to land it. He's also implemented H/V flipping of planes, which will be useful in some cases for 180-degree rotated displays. (For 90 degree rotations, we would need panel support for it as the HVS can't go that way). Paul is taking over Boris's load tracker work and is building tests for it to determine how accurate it is in predicting underflows.

Finally, for vc4 I got permission to write a driver for the PM block that controls power and reset. So far the only thing Linux uses the block for directly is the watchdog timer (which also performs reboots by letting the watchdog timer fire quickly). For power domains, we've been using my raspberrypi-power driver to talk to the Raspberry Pi firmware. However, we should be able to do things more reliably, and implement GPU reset properly, if we expose power domains and a reset controller from the PM block. I wrote a series that moves the PM block to a new MFD driver, binds the watchdog driver to it, and adds a new soc driver for power domains and reset. So far, I've only tested this driver for V3D power domains and reset, but it looks good and gets more of the Raspberry Pi platform supported in fully open source code.

03 Dec 2018 12:30am GMT

22 Nov 2018

feedplanet.freedesktop.org

Ben Widawsky: FreeBSD for Thanksgiving

I've been working on FreeBSD for Intel for almost 6 months now. In the world of programmers, I am considered an old dog, and these 6 months have been all about learning new tricks. Luckily, I've found myself in a remarkably inclusive and receptive community whose patience seems plentiful. As I get ready to take […]

22 Nov 2018 1:57am GMT

19 Nov 2018

feedplanet.freedesktop.org

Rodrigo Siqueira: An attempt to create a local Kernel community

Since the day I had my first class of Operating Systems (OS) in my engineering course, I have been passionate about it; for me, OS represents one of the greatest achievements of mankind. As a result of my delight for OS, I always tried to gravitate towards this field, but my school environment did not provide me with many opportunities to get into the area. To summarize this long journey, I will jump directly to the main point: on November 15 of 2017, I joined a conference named Linuxdev-br [1] which brought together some of the best Brazilian Kernel developers. I took this opportunity to learn everything that I could by asking the developers lots of questions. Additionally, I was lucky enough to meet Gustavo Padovan. He helped me a lot during my first steps in the Linux Kernel.

From November 2017 until now, I did the best I could to become a Kernel developer, and I have to admit that the path was very complicated. I paid the price of working from 8 AM to 11 PM, Sunday to Sunday, to keep up my efforts on my master's and the Linux Kernel at the same time; unfortunately, I could not stay focused only on the Kernel. However, all of these efforts paid off over the year: I had many patches accepted into the Kernel, I joined the Google Summer of Code (GSoC), I traveled to conferences, I returned to Linuxdev-br 2018 as a speaker, I joined XDC2018 [2], and many other good things happened.

Now I am close to completing one year of Linux Kernel work, and one question still bugs me: why does it have to be so hard for someone in a similar condition to become part of this world? I realized that I had great support from many people (especially from my sweet and calm wife) and I also pushed myself very hard. Now, I feel that it is time to start giving something back to society; as a result, I began to promote some small events about free software at the university and in the city I live in. However, my main project related to this started around two months ago with six undergraduate students at the University of Sao Paulo, IME [3]. My plan is simple: train all of these six students to contribute to the Linux Kernel with the intention of helping them create a local group of Kernel developers. I am excited about this project! I noticed that within a few weeks of mentoring, the students had already learned lots of things, and in a few days they will send out their contributions to the Kernel. I want to write a new post about that in December 2018, reporting the results of this new tiny project and a summary of this one year of Linux Kernel work. See you soon :)

Reference

  1. linuxdev-br
  2. XDC 2018
  3. IME USP

19 Nov 2018 2:00am GMT

15 Nov 2018

feedplanet.freedesktop.org

Hans de Goede: New plymouth theme for flickerfree boot

As discussed in my previous blog post, one of my TODO list items for plymouth is creating a new plymouth theme.

Since the transition to plymouth is not entirely smooth, plymouth by default will wait 5 seconds (counted from starting the kernel) before showing itself, so that on systems which boot in under 5 seconds it never shows. As can be seen in this video, this leads to a very non-smooth experience when the boot takes, say, 7 seconds, as plymouth then only shows briefly, leading to a kind of "flash" effect.

Another problem with the 5 second wait is that, now that we do not show GRUB, the user is looking at the firmware's bootsplash for not only the often long firmware initialization time, but also for the 5 seconds plymouth waits on top, making it look as if nothing is happening.

To fix this I've been working on a new plymouth theme which draws a spinner over the firmware boot splash, eliminating the ugly transition from the firmware boot splash to plymouth. This also allows removing the show-delay, so that we provide feedback that something is happening as soon as plymouth starts.

Firmware being firmware, getting this done right was somewhat harder than I expected, but I've a first "draft" of a new theme doing this now. I've created some videos showing 2 different systems booting the new theme:

Note the videos with diskcrypt were paused when I entered my passphrase, so there is a bit of a jump in them because of this.

I've built a test version of plymouth for Fedora 29. To give this a try, download all the rpm files from here except the .src.rpm and -devel files and then, from a directory with all those files in it, run:

sudo rpm -Uvh plymouth*.rpm

Since plymouth is part of your initrd, you also need to regenerate your initrd:

sudo dracut -f /boot/initramfs-$(uname -r).img $(uname -r)

This regenerates the initrd for the kernel you are currently running, so if you've installed a kernel update and have not rebooted since then you may not get the new theme when rebooting. In this case rerun the dracut command after rebooting.

Note if you've previously followed my instructions to test flickerfree boot, then you need to remove "plymouth.splash_delay=20" from your kernel commandline, since we now no longer want to have a splash-delay.

Now reboot and you should get the new spinner on firmware-boot-splash theme, with Fedora branding.

If you give this a try and the new theme somehow does not look correct, please mail me at hdegoede@redhat.com. If you mail me about the theme not displaying correctly, please attach the /run/plymouth.log file which this test-build generates to the email; a video of how the theme misbehaves would be great too.

I still need to discuss the idea of using a new theme incorporating the firmware boot splash with the GNOME design team so this is all subject to change.

15 Nov 2018 4:21pm GMT

31 Oct 2018

feedplanet.freedesktop.org

Bastien Nocera: Pipewire Hackfest 2018

Good morning from Edinburgh, where the breakfast contains haggis, and the charity shops have some interesting finds.

My main goal in attending this hackfest was to discuss Pipewire integration in the desktop, and how it will eventually replace PulseAudio as the audio daemon.

The main problem GNOME has had over the years with PulseAudio relates mostly to how PulseAudio was a black box when it came to its routing policy. What happens when you plug an HDMI cable into your laptop? Or turn on your Bluetooth headset? I've heard the stories of folks with highly mobile workstations having to constantly visit the Sound settings panel.

PulseAudio has policy scattered in a number of places (do a "git grep routing" inside the sources to see that): some are in the device manager, then modules themselves can set priorities for their outputs and inputs. But there's nothing to take all the information in, and take a decision based on the hardware that's plugged in, and the applications currently in use.

For Pipewire, the policy decisions would be split off from the main daemon. Pipewire, as it gains PulseAudio compatibility layers, will grow a default/example policy engine that will try to replicate PulseAudio's behaviour. At the very least, that will mean that Pipewire won't regress compared to PulseAudio, and might even be able to take better decisions in the short term.

For GNOME, we still wanted to take control of that part of the experience, and make our own policy decisions. It's very possible that this engine will end up being featureful and generic enough that it will be used by more than just GNOME, or even become the default Pipewire one, but it's far too early to make that particular decision.

In the meanwhile, we wanted the GNOME policies to not be written in C, which is difficult to experiment with for power users and for edge use cases. We could have started writing a configuration language, but it would have been too specific, and there are plenty of embeddable languages around. It was also a good opportunity for me to finally write the helper library I've been meaning to write for years, based on my favourite embedded language, Lua.

So I'm introducing Anatole. The goal of the project is to make it trivial to write chunks of programs in Lua, while the core of your project is written in C (we might even be able to embed it in Python or Javascript, once introspection support is added).

It's still in the very early days, and unusable for anything as of yet, but progress should be pretty swift. The code is mostly based on Victor Toso's incredible "Lua factory" plugin in Grilo. (I'm hoping that, once finished, I won't have to remember on which end of the stack I need to push stuff for Lua to do something with it ;)

31 Oct 2018 11:44am GMT

Roman Gilg: Representing KDE at XDC 2018

Last month the X.Org Developer's Conference (XDC) was held in A Coruña, Spain. I took part as a Plasma/KWin developer. My main goal was to simply get into contact with developers from other projects and companies working on open source technology in order to show them that the KDE community aims at being a reliable partner to them now and in the future.

Instead of recounting chronologically what went down at the conference let us look at three key groups of attendees, who are relevant to KWin and Plasma: the graphics drivers and kernel developers, upstream userland and colleagues working on other compositor projects.

Graphics drivers and kernel

If you search on Youtube for videos of talks from previous XDC conferences or for the videos from this year's XDC you will notice that there are many talks by graphics drivers developers, often directly employed by hardware vendors.

The reason is that hardware vendors have enough money to employ open source developers and send them to conferences and that they benefit greatly from contributing directly to open source projects. Something which also Nvidia management will realize at some point.

On the other side I talked to the Nvidia engineers at the conference, who were very friendly and eager to converse about the technical solutions they are allowed to share with the community. Sadly their primary usage of proprietary technology in general hinders them from taking a more active role in the community, and there is apparently no progress on their proposed open standard Wayland buffer sharing API.

At least we arranged that they would send some hardware for testing purposes. I won't be the recipient, since my work focus will be on other topics in the immediate future, but I was able to point to another KWin contributor, who should receive some Nvidia hardware in the future so he can better troubleshoot problems our users on Nvidia experience.

The situation looks completely different for Intel and AMD. In particular Intel has a longstanding track record of open development of their own drivers and contributing to generic open source solutions also being supported by other vendors. And AMD decided not too long ago to open source their most commonly used graphics drivers on Linux. In both cases it is a bliss to target their latest hardware and it was as great as I imagined it to be talking to their developers at XDC, because they are not only interested in their own products but in boosting the whole ecosystem and finding suitable solutions for everyone. I want to explicitly mention Martin Peres from Intel and Harry Wentland from AMD, who I had long, interesting discussions with and who showed great interest in improving the collaboration of low-level engineers and us in userland.

Who I haven't mentioned yet is ARM. Although they are, just like Nvidia, Intel and AMD, an XDC "Gold Sponsor", their contribution in terms of content to the conference was minimal, most likely for the same reason of being mostly closed source as in the Nvidia case. And that is equally sad, since we do have some interest in making ARM a well supported target for Plasma. An example is Plasma on the Pinebook. But the driver situation for ARM Mali GPUs is just ugly; developing for them is torture. I know because I did some of the integration work for the Pinebook. All the more I respect the efforts by several extremely talented hackers to provide open-source drivers for ARM Mali GPUs. Some of them presented their work at XDC.

X.Org and freedesktop.org upstream

Linux graphics drivers are cool and all, but without the XServer, Wayland and other auxiliary cross-vendor user space libraries there would be not much to show off to the user. And after all it is the X.Org Developer's Conference, most notably being home to the XServer and, maybe in the future governance-wise, also to freedesktop.org. So after looking at low-level driver development, what role did these projects and their developers play at the conference?

First I have to say that the dichotomy established in the previous paragraph is of course not that distinct. Several graphics drivers are part of mesa, which is again part of freedesktop.org, and many graphics drivers developers also contribute to userland or are involved in organizational aspects of X.Org and freedesktop.org. A more prominent one of these organizational aspects is the hosting of projects. There was a presentation by Daniel Stone about the freedesktop.org transition to GitLab, which was a rather huge project this year and is still ongoing.

But regarding technical topics there were not many presentations about the XServer, Wayland and other high level components. After seeing some lightning talks on the first day of the conference I decided to hold a lightning talk myself about my Xwayland GSOC project in 2017. I got one of the last slots on Friday and you can watch a video of my presentation here. Also Drew DeVault presented a demo of wlroots' layer shell.

So there were not so many talks about the higher level user space graphics stack, but some of us plan to increase the ratio of such talks in the future. After talking about graphics drivers developers and upstream userland this brings me directly to the last group of people:

Compositor developers

We were somewhat of a special crowd at XDC. We came from distinct projects, some of us from wlroots, Guido from Purism and me from KWin, but we were united in, to my knowledge, all of us attending XDC for the first time.

If you look at past conferences the involvement of compositor developers was marginal. My proclaimed goal, and I believe also that of all the others, is to change this from now on. Because from embedded to desktop we will all benefit from working together where possible and exchanging information with each other, with upstream and with hardware vendors. I believe X.Org and freedesktop.org can be a perfect platform for that.

Final remarks on organisation

The organisation of the conference was simply great. Huge thanks to Igalia for hosting XDC in their beautiful home town.

What I really liked about the conference schedule was that there were always three long breaks every day and long pauses between the talks allowing the attendees to talk to each other.

What I didn't like about the conference was that all the attendees were spread over the city in different hotels. I do like the KDE Akademy approach better in this regard: everyone in one place so you can drink a last beer together at the hotel bar before going to bed. That said, there were events on multiple evenings throughout the week, but recommending a reasonably priced default hotel for everyone not part of a large group might still be an idea for the next XDC.

31 Oct 2018 11:00am GMT

30 Oct 2018

feedplanet.freedesktop.org

Christian Schaller: PipeWire Hackfest

So we kicked off the PipeWire hackfest in Edinburgh yesterday. We have 15 people attending, including Arun Raghavan, Tanu Kaskinen and Colin Guthrie from PulseAudio, PipeWire creator Wim Taymans, Bastien Nocera and Jan Grulich representing GNOME and KDE, Mark Brown from the ALSA kernel team, Olivier Crête, George Kiagiadakis and Nicolas Dufresne representing embedded usecases for PipeWire, and finally Thierry Bultel representing automotive.

The event kicked off with Wim Taymans presenting on the current state of PipeWire and outlining the remaining issues and current thoughts on how to resolve them. Most of the first day was spent on a roundtable discussion about what are and should be the goals of PipeWire and what potential tradeoffs there would be going forward. PipeWire is probably a bit closer to Jack than PulseAudio in design, so quite a bit of the discussion went into how that would affect the PulseAudio usecases and what is planned to ensure PipeWire works very well for consumer audio usecases.

Personally I ended up spending quite some time just testing and running various Jack apps to see what works already and what doesn't. In terms of handling audio output with Jack apps, I was positively surprised how many I was able to make work (aka output audio) using PipeWire instead of Jack, but of course we still have some gaps to cover before PipeWire is ready as a drop-in Jack replacement; for instance, the Jack session management protocol needs to be implemented first.

The second day we outlined the areas that need work before we are ready to replace PulseAudio and came up with the following list:

It is still a bit hard to have a clear timeline for when we will be ready to drop in PipeWire support to replace PulseAudio and then Jack, but we feel the Wayland migration was a good example to follow where we held off doing the switch until we felt comfortable the move would be transparent to most users. There will of course always be corner cases and bugs, but we hope that in general people agree that the Wayland transition was done in a responsible manner and thus could be a good example to follow for us here.

We would like to offer big thanks to the GNOME Foundation for sponsoring travel for some of the community attendees and to Collabora for sponsoring dinner for all attendees on the first night.

If you want to take a look at PipeWire, Wim updated the wiki page with PipeWire build instructions to be up-to-date. The hackfest attendees tested them out so we are sure they work; just be aware that you want the 'Work' branch and not the Master branch, as that is the one where all the audio work is happening. The Master branch is the video focused branch we use in Fedora for desktop remoting support in browsers and VNC under Wayland.

30 Oct 2018 2:15pm GMT

Eric Anholt: 2018-10-30

This last week, I worked on the oldest issue I had in my rpi kernel repo: Adding support for tinydrm displays (it actually predates tinydrm). I wrote a tinydrm driver for the panel I had on hand and got it upstreamed. I wrote a Mesa series enabling vc4 to talk to this new hx8357d tinydrm driver. Once the Mesa side is agreed on, then we can extend it to the other tinydrm drivers and have a solution for X with 3D with limited updates (as opposed to full screen refreshing like fbdev did) on these panels.

I also fixed a longstanding vc4 bug that slightly rotated SDL2 output. They were running all of their coordinates through a rotation matrix they built on each VS invocation, and in the common case of no rotation the cos(0)/sin(0) was inaccurate enough to rotate things by a pixel near the window edges. The sin()/cos() in Mesa is now more accurate so this problem shouldn't happen with old SDL2, we have a piglit test to make sure other drivers don't break with SDL2, and upstream SDL2 has stopped building a rotation matrix in the shader. (yay!)

Small stuff from the V3D driver since my last post:

Also in V3D, I've been experimenting with other ways of running the driver using the software simulator. Right now I have a frankenstein mode in vc4 and v3d where the driver sits on top of the i915 GEM driver, and does BO mapping and interactions with the window system on i915 and then copies those BOs into the simulator to run vc4/v3d ioctls on it. This is gross, and it's only really usable on my desktop with the i915 driver.

Talking with one of the HW engineers who's interested in running Mesa against the simulator, there are a few ways we could do simulation. One would be to have the actual kernel driver call back up into a userspace daemon linking the simulator, and provide mappings of BOs and proxied register writes to that userspace daemon. Another would be to implement an actual V3D device in QEMU by linking in the simulator, but we can't do that due to the GPL. Another would be to use my current simulation code, but let it work on top of vkms or vgem (so that a CI system wouldn't need i915). This one seems easy. However, I started on a project mimicking what Intel did for their software-only CI setup: intercept libc calls to simulate having a V3D driver that's not actually present in the kernel. I've got the code to the point of trying to submit CLs, at which point the simulator hangs for reasons I haven't debugged yet. I don't know if I'll want to do this long term, but it seems like potentially useful code for other driver developers who have a software simulator.

Finally, in the X world, this month I've reviewed and merged more patches than I normally would (thanks, gitlab!), and wrote an extensible testcase for the damage extension due to a bug report/fix that a user had submitted.

30 Oct 2018 12:30am GMT