23 Oct 2014

feedplanet.freedesktop.org

Eric Anholt: VC4 driver status update

I've just spent another week hanging out with my Broadcom and Raspberry Pi teammates, and it's unblocked a lot of my work.

Notably, I learned some unstated rules about how loading and storing from the tilebuffer work, which has significantly improved stability on the Pi (as opposed to simulation, which only asserted about following half of these rules).

I got an intro on the debug process for GPU hangs, which ultimately just looks like "run it through simpenrose (the simulator) directly. If that doesn't catch the problem, you capture a .CLIF file of all the buffers involved and feed it into RTL simulation, at which point you can confirm for yourself that yes, it's hanging, and then you hand it to somebody who understands the RTL and they tell you what the deal is." There's also the opportunity to use JTAG to look at the GPU's perspective of memory, which might be useful for some classes of problems. I've started on .CLIF generation (currently simulation-environment-only), but I've got some bugs in my generated files because I'm using packets that the .CLIF generator wasn't prepared for.

I got an overview of the cache hierarchy, which pointed out that I wasn't flushing the ARM dcache to get my writes out into system L2 (more like an L3) so that the GPU could see it. This should also improve stability, since before we were only getting lucky that the GPU would actually see our command stream.

Most importantly, I ended up fixing a mistake in my attempt at reset using the mailbox commands, and now I've got working reset. Testing cycles for GPU hangs have dropped from about 5 minutes to 2-30 seconds. Between working reset and improved stability from loads/stores, we're at the point that X is almost stable. I can now run piglit on actual hardware! (it takes hours, though)

On the X front, the modesetting driver is now merged to the X Server with glamor-based X rendering acceleration. It also happens to support DRI3 buffer passing, but not Present's pageflipping/vblank synchronization. I've submitted a patch series for DRI2 support with vblank synchronization (again, no pageflipping), which will get us more complete GLX extension support, including things like GLX_INTEL_swap_event that gnome-shell really wants.

In other news, I've been talking to a developer at Raspberry Pi who's building the KMS support. Combined with the discussions with keithp and ajax last week about compositing inside the X Server, I think we've got a pretty solid plan for what we want our display stack to look like, so that we can get GL swaps and video presentation into HVS planes, and avoid copies on our very bandwidth-limited hardware. Baby steps first, though -- he's still working on putting giant piles of clock management code into the kernel module so we can even turn on the GPU and displays on our own without using the firmware blob.

Testing status:
- 93.8% passrate on piglit on simulation
- 86.3% passrate on piglit gpu.py on Raspberry Pi

All those opcodes I mentioned in the previous post are now completed -- sadly, I didn't get people up to speed fast enough to contribute before those projects were the biggest things holding back the passrate. I've started a page at http://dri.freedesktop.org/wiki/VC4/ for documenting the setup process and status.

And now, next steps. Now that I've got GPU reset, a high priority is switching to interrupt-based render job tracking and putting an actual command queue in the kernel so we can have multiple GPU jobs queued up by userland at the same time (the VC4 sadly has no ringbuffer like other GPUs have). Then I need to clean up user <-> kernel ABI so that I can start pushing my linux code upstream, and probably work on building userspace BO caching.

23 Oct 2014 8:53am GMT

22 Oct 2014

feedplanet.freedesktop.org

Christian Schaller: GStreamer Conference 2014 talks online

For those of you who like me missed this years GStreamer Conference the recorded talks are now available online thanks to Ubicast. Ubicats has been a tremendous partner for GStreamer over the years making sure we have high quality talk recordings online shortly after the conference ends. So be sure to check out this years batch of great GStreamer talks.

Btw, I also done a minor release of Transmageddon today, which mostly includes a couple of bugfixes and a few less deprecated widgets :)

22 Oct 2014 6:24pm GMT

Ben Widawsky: OS backtrace with symbol names Goodbye Sabbatical

For those of you reading this that didn't know, I've had two months of paid vacation - one of the real perks of working for Intel. Today is the last day. It is as hard as I thought it would be.

Most of the vacation was spent vacationing. As I have access to none of the pictures at the moment, and I don't want to make you jealous, I'll skip over that. Toward the end though, I ended up at a coffee shop waiting for someone with nothing to do. I spent a little bit of time working on HOBos, and I thought it could be interesting to write about it.

WARNING: There is nothing novel here.

A brief history of HOBos

The HOBby operating system is an operating system project I started a while ago. I am a bit embarrassed that I started an OS. In my opinion, it's one of lamer tasks to take on because 1. everyone seems to do it; 2. there really isn't a need, there are many operating systems with permissive licenses already; and 3. sites like OSDev have made much of the work trivial (I like to think that when I started there wasn't quite as much info readily available, but that's a lie).

Larrabee Av in Portland (not what the project was named after)

Larrabee Av in Portland
(not what the project was named after)

HOBos began while I was working on the Larrabee project. The team spent a lot of time optimizing the memory management, and the scheduler for the embedded software. I really wanted to work on these things full time. Unfortunately for me, having had a background in device drivers, I was often required to do other things. As a means to scratch the itch, I started HOBos after not finding anything suitable for my needs. The stuff I found were all either too advanced, or too rudimentary. When I was hired to work on i915, I decided that it was better use of my free time. Since then, I've been tweaking things here or there, and I do try to make sure things still run with the latest QEMU and compilers at least once a year. The last actual feature I added was more than 1300 days ago:

   commit 1c9b5c78b22b97246989b00e807c9bf1fbc9e517
        Author: Ben Widawsky <ben@bwidawsk.net>
        Date: Sat Mar 19 21:19:57 2011 -0700

        basic backtrace

So back to the coffee shop. I tried to do something or other, got a hang, and didn't want to fire up GDB.

Backtracing

HOBos had implemented backtraces since the original import from SVN (let's pretend that means, since always). Obtaining a backtrace is actually pretty straightforward on x86.

The stack frame

The stack frame can be thought of as memory contents that are locally scoped to a function. Declaring a local variable will end up in the stack frame. A global variable will not. As functions call other functions, you end up with multiple frames. A stack is used because the last frames added are the first ones removed (this is not always true for things like exceptions, but nevermind that for now). The fact that a stack decrements is arbitrarily chosen, as far as I can tell. The following shows the stack when the function foo() calls the function bar().

Example Stackframe

Example Stackframe

The memory contents shown above are created as a result of two things. First is what the CPU implicitly does upon execution of the call instruction. The second is what the compiler generates. Since we're talking about x86 here, the call instruction always pushes at least the return address. The second I'll detail a bit more in a bit. Correlating this to the picture, the green (foo) and blue (bar) are creations of the compiler. The brown is automatically pushed on to the stack by the hardware, and is automatically popped from the stack on the ret instruction.

In the above there are two registers worth noting, RBP, and RSP. RBP which is the 64b extension to the original 8086 BP, Base Pointer, register is the beginning of the stack frame ie the Frame Pointer. RSP, the extension to the 8086 SP, Stack Pointer, points to the end of the stack frame. By convention the Base Pointer doesn't change throughout a function being executed and therefore it is often used as the reference to local variables stored on the stack. -100(%rbp) as an example above.

Digging further into that disassembly above, one notices a pattern. Every function begins with:

push   %rbp       // Push the old RBP, RSP now points to this
mov    %rsp,%rbp  // Store RSP in RBP

Assuming this is the convention, it implies that at any given point during the execution of a function we can obtain the previous RBP by reading the current RBP and doing some processing. Specifically, reading RBP gives us the old Stack Pointer, which is pointing to the last RBP. As mentioned above, the x86 CPU pushed the return address immediately before the push %rbp - which means as we work backwards through the Base Pointers, we can also obtain the caller for the current stack frame. People have done really nice pictures on this - use your favorite search engine.

Here is the HOBos code (ignore the part about symbols for now):

void bt_fp(void *fp)
{
        do {
                uint64_t prev_rbp = *((uint64_t *)fp);
                uint64_t prev_ip = *((uint64_t *)(fp + sizeof(prev_rbp)));
                struct sym_offset sym_offset = get_symbol((void *)prev_ip);
                printf("\t%s (+0x%x)\n", sym_offset.name, sym_offset.offset);
                fp = (void *)prev_rbp;
                /* Stop if rbp is not in the kernel
                 * TODO< need an upper bound too*/
                if (fp <= (void *)KVADDR(DMAP_PML,0,0,0))
                        break;
        } while(1);
}

As far as I know, all modern CPUs work in a similar fashion with differences sprinkled here and there. ARM for example has an LR register for the return address instead of using the stack.

ABI/Calling Conventions

The fact that we can work backwards this way is a byproduct of the calling convention. One example of an aspect of the calling convention is where to put the arguments to a function. Do they go on the stack, in registers, or somewhere else? In addition to these arguments, the way in which RBP, and RSP are used are strictly software constructs that are part of the convention. As a result, it might not always be possible to get a backtrace if:

  1. This convention is not adhered to (or -fomit-frame-pointer)
  2. The contents of RBP are destroyed
  3. The contents of the stack are corrupted.

How arguments are passed to function are also needed to make sure linkers and loaders (both static and dynamic) can operate to correctly form an executable, or dynamically call a function. Since this isn't really important to obtaining a backtrace, I will leave it there. Some architectures do provide a way to obtain useful backtrace information without caring about the calling convention: Intel's Processor Trace for example.

Symbol information

The previous section will get us a reverse list of addresses for all function calls working backward from a given point during execution. But having names makes things much more easier to quickly diagnose what is going wrong. There is a lot of information on the internet about this stuff. I'm simply providing all that's relevant to my specific problem.

ELF Symbols (linking)

The ELF format provides everything we need (assuming things aren't stripped). Glossing over the details (see this simple tool if you're curious), we end up with two "sections" that tell us everything we need. They are conventionally named, ".symtab", and ".strtab" and are conveniently of type, SHT_SYMTAB, and SHT_STRTAB. The symbol table defines the information about each symbol (functions, variables, whatever). Part of the information is a name, which is an index into the string table. In this simplest case, these are provisions for inter-object linking. If I had defined foo() in foo.c, and bar() in bar.c, the compiled object files can be linked together, but the linker needs the information about the symbol bar (in this case) in order to do its job.

readelf -S a.out
[Nr] Name Type Address Offset
[33] .symtab SYMTAB 0000000000000000 000015b8
[34] .strtab STRTAB 0000000000000000 00001c90
> readelf -S a.out | egrep "\.strtab|\.symtab" | wc -l
2
> strip a.out
> readelf -S a.out | egrep "\.strtab|\.symtab" | wc -l
0

Summing that up, if we have an entire ELF file, and the symbol and string tables haven't been stripped, we're good to go. However, ELF sections are not the unit in which an ELF loader decides what to load. The loader loads segments which are of type PT_LOAD. A segment is made up of 0 or more sections, plus padding. Since the Operating System is itself an ELF loaded by an ELF loader (the bootloader) we're not in a good position. :(

> readelf -l a.out | egrep "\.strtab|\.symtab" | wc -l
0
ELF Loader

ELF Loader

Debug Info

Note that what we want is not the same thing as debug information. If one wants to do source level debug, there needs to be some way of correlating a machine instruction to a line of source code. This is also a software abstraction, and there is usually a spec for it unless you are using some proprietary thing. It would technically be possible to include DWARF capabilities within the kernel, but I do not know of a way to get that info to the OS (see multiboot stuff for details).

From boot to symbols

The HOBos project implements a small Multiboot compliant bootloader called smallboot. When the machine starts up, boot firmware is loaded from a fixed location (this is currently done by SeaBIOS). The boot firmware then loads the smallboot bootloader. The bootloader will load the kernel (smallboot, and most modern bootloaders will do this through a text configuration file on the resident filesystem). In the HOBos case, the kernel is simply an ELF file. smallboot implements a basic ELF loader to load the kernel into memory and give execution over.

The multiboot specification is a standardized communication mechanism (for various things) from the bootloader to the Operating System (or any file really). One of these things is symbol information. Quoting the multiboot spec

If bit 5 in the 'flags' word is set, then the following fields in the Multiboot information structure starting at byte 28 are valid:

             +-------------------+
     28      | num               |
     32      | size              |
     36      | addr              |
     40      | shndx             |
             +-------------------+

These indicate where the section header table from an ELF kernel is, the size of each entry, number of entries, and the string table used as the index of names. They correspond to the 'shdr_*' entries ('shdr_num', etc.) in the Executable and Linkable Format (elf) specification in the program header. All sections are loaded, and the physical address fields of the elf section header then refer to where the sections are in memory (refer to the i386 elf documentation for details as to how to read the section header(s)). Note that 'shdr_num' may be 0, indicating no symbols, even if bit 5 in the 'flags' word is set.

Since the beginning I had implemented these fields in the bootloader:

multiboot_info.flags |= MULTIBOOT_INFO_ELF_SHDR;
multiboot_info.u.elf_sec = *table;

Because of the fact that the symbols weren't in the ELF segments though, I was stumped as to how to get at the data one the OS is loaded. As it turned out, I didn't actually read all 4 sentences and I had missed one very important part.

All sections are loaded, and the physical address fields of the elf section header then refer to where the sections are in memory

What the spec is dictating is that even though the sections are not in loadable segments, they shall exist within memory during the handover to the OS, and the section header information will be updated so that the OS knows where to find it. With this, the OS can copy out, or just make sure not to overwrite the info, and then get access to it.

    for (i = 0; i < shnum; i++) {
        __ElfN(Shdr) *sh = &shdr[i];
        if (sh->sh_size == 0)
            continue;

        if (sh->sh_addr) /* Already loaded */
            continue;

        ASSERT(sizeof(void *) == 4);
        *((volatile __ElfN(Addr) *)&sh->sh_addr) = sh->sh_offset + (uint32_t)addr;
    }

et cetera

The code for pulling out the symbols is quite a bit longer, but it can be found in kern/core/syms.c. With the given RBP unwinder near the top, we can easily get the IP for the caller. With that IP, we do a symbol lookup from the symbols we got via the multiboot info.

Screenshot with backtrace

Screenshot with backtrace

Inkscape Links

https://bwidawsk.net/blog/wp-content/uploads/2014/10/stackframe.svg
https://bwidawsk.net/blog/wp-content/uploads/2014/10/loader.svg

Download PDF

22 Oct 2014 1:20am GMT

21 Oct 2014

feedplanet.freedesktop.org

Bastien Nocera: A GNOME Kernel wishlist

GNOME has long had relationships with Linux kernel development, in that we would have some developers do our bidding, helping us solve hard problems. Features like inotify, memfd and kdbus were all originally driven by the desktop.

I've posted a wishlist of kernel features we'd like to see implemented on the GNOME Wiki, and referenced it on the kernel mailing-list.

I hope it sparks healthy discussions about alternative (and possibly existing) features, allowing us to make instant progress.

21 Oct 2014 9:06am GMT

13 Oct 2014

feedplanet.freedesktop.org

Rob Clark: Silly r/e tool nonsense hacks

In the process of reverse engineering work for freedreno, I've cobbled together some interesting tools. The earliest and most useful of which is cffdump. (Named after some command-stream dumping debug code in the old kgsl android kernel driver, upon which it was originally inspired.) The cffdump tool knows how to parse out the "toplevel" command-stream stored as an .rd (re-dump) file, finding packets that load state memory, write registers, IB (indirect branch), etc. The .rd file contains snapshots of gpu buffers, in order to chase gpu pointers at decode time. It links in librnn from the nouveau envytools project for the decoding of individual registers, and a few other things. It also calls out to the freedreno disassembler code to show inline disassembly of shaders, decodes vertex and constant (uniform) buffers, etc. And even generates pretty color output (thanks to librnn):


A few months back, I added some basic lua scripting support to cffdump, mostly to assist in r/e work for adreno a4xx. When invoked with the --script argument, cffdump would load the specific lua script, and call the 'draw' function it defines on each CP_DRAW_INDX opcode. The choice of lua was mostly because it seemed fairly easy to integrate with .c code.

Since then, I've had the thought in the back of my mind that adding script bindings to integrate rnn register decode to lua would be useful for much more. Such as writing a command-stream validator to check for inconsistent programming. There are a number of places where inconsistencies between various register settings and such will result in gpu lockup. The general adreno design philosophy appears to be to not ever dedicate transistors to making the driver writer's life easier... which for a SoC gpu is certainly the right choice, but it doesn't make things any easier for me. Over time, I've discovered many of these of these rules, but they are mostly all in my head at the moment. And from time to time, when adding new features to the gallium driver, I inadvertently break one or more of the rules and end up wasting time studying cmdstream dumps from the freedreno gallium driver to figure out what I did wrong.

So, on the way to XDC2014 I started hacking up support for register decoding from lua scripts. It turns out that time in airports and airplanes, where I can't exactly break out an ifc6410 and hdmi monitor to do some driver work, is a good time to catch up on these sort of projects. Now I can do nifty things like:


-- load rnn database file for a320:
r = rnn.init("a320")

function start_cmdstream(name)
io.write("START: " .. name .. "\n")
end

function draw(primtype, nindx)
-- simple full register access:
io.write("GRAS_CL_VPORT_XOFFSET: " .. r.GRAS_CL_VPORT_XOFFSET .. "\n")
-- access boolean bitfield Z_ENABLE in RB_DEPTH_CONTROL register:
io.write("RB_DEPTH_CONTROL.Z_ENABLE: " .. tostring(r.RB_DEPTH_CONTROL.Z_ENABLE) .. "\n")
-- access ROP_CONTROL bitfield inside CONTROL register inside RB_MRT[] array:
io.write("RB_MRT[0].CONTROL.ROP_CODE: " .. r.RB_MRT[0].CONTROL.ROP_CODE .. "\n")
end

function end_cmdstream()
io.write("END\n")
end

function finish()
io.write("FINISH\n")
end


which will generate output like:

[robclark@thunkpad:~/src/freedreno (master)]$ ./cffdump --script test.lua piglit.rd
Reading piglit.rd...
START: piglit.rd

GRAS_CL_VPORT_XOFFSET: 79.5
RB_DEPTH_CONTROL.Z_ENABLE: true
RB_MRT[0].CONTROL.ROP_CODE: 12


Currently it should handle all of the rnndb constructs that are used for adreno. Ie. simple registers, arrays of simple registers, arrays of groups of registers, etc. No support for "stripes" yet since those are not used for freedreno.

At the moment, all the script bindings are in freedreno.git/util/script.c but if there is some interest in this from nouveau or anyone else using librnn then it would be a good idea to try to refactor some of this into more generic code in librnn. It would still need a bit of glue from the tool linking librnn to get at the actual register values.

Still needed are a few more script hooks (such as CP_LOAD_STATE) to do everything I need for a validator script. Hopefully I find some time to work on that before the next conference ;-)

PS. I hope this post is at least a bit coherent.. I am still a bit jetlagged..



13 Oct 2014 3:01pm GMT

10 Oct 2014

feedplanet.freedesktop.org

Matthias Klumpp: Listaller + Glick: Some new ideas

As you might know, due to invasive changes in PackageKit, I am currently rewriting the 3rd-party application installer Listaller. Since I am not the only one looking at the 3rd-party app-installation issue (there is a larger effort going on at GNOME, based on Lennarts ideas), it makes sense to redesign some concepts of Listaller.

Currently, dependencies and applications are installed into directories in /opt, and Listaller contains some logic to make applications find dependencies, and to talk to the package manager to install missing things. This has some drawbacks, like the need to install an application before using it, the need for applications to be relocatable, and application-installations being non-atomic.

Glick2

There is/was another 3rd-party app installer approach on the GNOME side, by Alexander Larsson, called Glick2. Glick uses application bundles (do you remember Klik from back in the days?) mounted via FUSE. This allows some neat features, like atomic installations and software upgrades, no need for relocatable apps and no need to install the application.

However, it also has disadvantages. Quoting the introduction document for Glick2:

"Bundling isn't perfect, there are some well known disadvantages. Increased disk footprint is one, although current storage space size makes this not such a big issues. Another problem is with security (or bugfix) updates in bundled libraries. With bundled libraries its much harder to upgrade a single library, as you need to find and upgrade each app that uses it. Better tooling and upgrader support can lessen the impact of this, but not completely eliminate it."

This is what Listaller does better, since it was designed to do a large effort to avoid duplication of code.

Also, currently Glick doesn't have support for updates and software-repositories, which Listaller had.

Combining Listaller and Glick ideas

So, why not combine the ideas of Listaller and Glick? In order to have Glick share resources, the system needs to know which shared resources are available. This is not possible if there is one huge Glick bundle containing all of the application's dependencies. So I modularized Glick bundles to contain just one software component, which is e.g. GTK+ or Qt, GStreamer or could even be a larger framework (e.g. "GNOME 3.14 Platform"). These components are identified using AppStream XML metadata, which allows them to be installed from the distributor's software repositories as well, if that is wanted.

If you now want to deploy your application, you first create a Glick bundle for it. Then, in a second step, you bundle your application bundle with it's dependencies in one larger tarball, which can also be GPG signed and can contain additional metadata.

The resulting "metabundle" will look like this:

glick-libundle

This doesn't look like we share resources yet, right? The dependencies are still bundled with the application requiring them. The trick lies in the "installation" step: While the application above can be executed right away without installing it, there will also be an option to install it. For the user, this will mean that the application shows up in GNOME-Shell's overview or KDEs Plasma launcher, gets properly registered with mimetypes and is - if installed for all users - available system-wide.

Technically, this will mean that the application's main bundle is extracted and moved to a special location on the file system, so are the dependency-bundles. If bundles already exist, they will not be installed again, and the new application will simply use the existing software. Since the bundles contain information about their dependencies, the system is able to determine which software is needed and which can simply be deleted from the installation directories.

If the application is started now, the bundles are combined and mounted, so the application can see the libraries it depends on.

Additionally, this concept allows secure updates of applications and shared resources. The bundle metadata contains an URL which points to a bundle repository. If new versions are released, the system's auto-updater can automatically pick these up and install them - this means e.g. the Qt bundle will receive security updates, even if the developer who shipped it with his/her app didn't think of updating it.

Conclusion

So far, no productive code exists for this - I just have a proof-of-concept here. But I pretty much like the idea, and I am thinking about going further in that direction, since it allows deploying applications on the Linux desktop as well as deploying software on servers in a way which plays nice with the native package manager, and which does not duplicate much code (less risk of having not-updated libraries with security flaws around).

However, there might be issues I haven't thought about yet. Also, it makes sense to look at GNOME to see how the whole "3rd-party app deployment" issue develops. In case I go further with Listaller-NEXT, it is highly likely that it will make use of the ideas sketched above (comments and feedback are more than welcome!).

10 Oct 2014 12:59pm GMT

09 Oct 2014

feedplanet.freedesktop.org

Tiago Vignatti: Chromium Ozone-GBM explained

I've wrote an article about the new graphics platform for Chromium called Ozone-GBM. I particularly think that Ozone-GBM will play an important role next in Chromium and Linux graphics communities in general. I hope you enjoy the read :) Please share it.

https://01.org/chromium/blogs/tiagovignatti/2014/chromium-ozone-gbm-explained


09 Oct 2014 4:48pm GMT

08 Oct 2014

feedplanet.freedesktop.org

Ian Romanick: Stochastic Search-Based Testing for Uniform Block Layouts

I just finished my talk at XDC 2014. The short version: UBO support in OpenGL drivers is terrible, and I have the test cases to prove it.

There are slides, a white paper, and, eventually, a video.

UPDATE: Fixed a typo in the white paper reported by Jonas Kulla on Twitter.

UPDATE: Direct link to video.

08 Oct 2014 1:28pm GMT

04 Oct 2014

feedplanet.freedesktop.org

Rob Clark: Freedreno Update

A number of people have recently asked what is new with freedreno. It had been a while since posting an update.. and, well, not everyone watches mesa commit logs for fun, or watches #freedreno on freenode, so it seemed like time for another semi-irregular freedreno blog post.

The tl;dr version: recently it has been a lot of robustness, and bug fixes and smaller feature implementation for piglit, etc. No one big exciting feature this time.. but lots of little things adding up to make freedreno on a3xx more complete and mature.

And an obligatory screenshot, just because:


(Yeah, webgl should probably be faster in chrome/chromium.. but not packaged for fedora, and chrome build system was invented by someone who wants to make compiling their src as difficult as possible.)

Mesa..

On the mesa/gallium driver front, the big news is that earlier this week we finally achieved a 90% pass ratio for piglit. (In fact, 90.4%) To put this in perspective, a little over six months ago freedreno was at just 50% pass. Since June, we have added around 600 passing tests. In fact in the last week, an additional ~50 tests are passing, which bumps us up to 91% pass.

For those who are not familiar with it, piglit is an open source OpenGL test suite. Since the mesa developers are quite good about adding new test cases to piglit whenever adding a new feature/extension to mesa, it is a very comprehensive test suite. The down side, if you could call it that, is that it has a lot more OpenGL tests compared to OpenGLES (at least for GLES < 3.0). So getting the pass ratio up involved implementing (and in some cases emulating) a number of features that the blob ES-only driver does not support. Fortunately enough of the registers and bitfields are known at this point that trial and error with educated guesses (and then see which guesses make piglit tests pass) has worked out reasonably well for some features. Other features, like GL_CLAMP and two sided color, we need to emulate in the shader, which was implemented as a TGSI to TGSI pass in order to hopefully be useful for other gallium drivers for GLES class hardware. (And, in fact both of those are things that at least some of the desktop drivers need to emulate as well.)

And big thanks to Ilia Mirkin for a lot of advice and some patches for the failing piglits. Ilia has also started sending a lot of patches for the compiler to flesh out integer support, add new instructions (in particular texture sample instructions), and other things that will be needed for GL3/GLES3. In fact as a result of his work, we are already at ~85% pass for GL3 despite missing some bullet-point features!

DDX..

On the xf86-video-freedreno front, over the last few months we have gained server managed fd's and OutputClass support (so that a sufficiently new xserver can auto-pick the correct driver, like we have had for a long time on desktop/pci systems). And a hot-off-the-presses 1.3.0 release with a handful of robustness fixes. I strongly recommend to upgrade.

Kernel..

These last few kernel releases have seen a significant improvement in the state of apq8064/ifc6410 support upstream. As of the 3.17 kernel, the main things missing to work on a pure-upstream[1] kernel are the rpm/rpm-regulators iommu drivers. The linaro folks have been a big help there. In particular, their integration branch, which consists of latest upstream plus in-flight patches, is significantly easier than tracking all the relevant kernel mailing lists.

For drm/msm, the last few kernel releases have seen: some basic gpu perf and logging debugfs features, DT support for mdp4 (display controller version in apq8064), LVDS and multi-monitor support for mdp4, and mdp5 v1.3 support from qcom for upcoming devices. And of course bug fixes!

[1] Ie. Linus's tree... kernel-msm or AOSP is not upstream, for any android type's who were confused about that.


04 Oct 2014 2:08pm GMT

03 Oct 2014

feedplanet.freedesktop.org

Daniel Vetter: Neat drm/i915 stuff for 3.18

Since Dave Airlie moved the feature cut-off of the drm-next tree roughly one month ahead it is already time for our regular look at what's ahead. Even though the 3.17 features aren't even released yet.

On the modeset side of things we now have the final pieces for plane rotation support from Sonika Jindal and Ville. The DisplayPort code has also seen lots of improvements, with updated training values in preparation of the latest eDP standard (Sonika Jindal) and support for DP training pattern 3 (Ville). DSI panels now support burst mode (Shobhit) and hdmi conformance has been improved with some fixes from Clint Taylor.

For eDP panels we also have improved panel power sequencing code, mostly to fix issues on Cherryview, from Ville. Ville has also contributed fixes to the VDD handling code, which is used to temporarily enable panel power. And the backlight code learned to handle the bl_power setting so that the backlight can be turned off completely without upsetting the panel's power sequencing, contributed by Jani.

Chris Wilson has also been fairly busy on the modeset code: 3.18 includes his patches to cache EDIDs for a single probe call - unfortunately the full caching solution to keep the EDID around between multiple probe calls isn't merged yet. And pageflips have now improved error detection and recovery logic: In case something goes wrong we shouldn't end up stuck any longer waiting for a pageflip to complete that has been lost by either the hardware or the driver.

Moving on to platform specific work there's been lots of preparations for Skylake, most of it from Damien and Sonika. The actual intial platform enabling is delayed for 3.19 though. On the other end of the timeline Ville fixed up i830M modeset support on a rainy w/e in his vacation, and 3.18 now has all that code. And there has been a lot of Cherryview fixes all over.

Cherryview also gained support for power wells and hence runtime pm (Ville). And for platform agnostic feature a lot of the preparation for DRRS (dynamic refresh rate switching) is merged, hopefully the actual feature patches from Vandana Kannan will land in 3.19.

Moving on the render side of the driver there's been a lot of patches to beat the full ppgtt support into shape. The context code has been cleaned up, lifetime handling for ppgtt address spaces is fixed and bad interactions with secure batches are now also rectified. Enabling full ppgtt missed the feature cutoff by a hair though, but it's already enabling for the following release.

Basic support for execlists command submission from Ben Widawsky, Oscar Mateo and Thomas Daniel was also merged. This is the fancy new way to submit commands available on Gen8 and subsequent platforms. It's not yet enabled by default, but since it's a requirement for a lot of cool new features keep an eye on what's going on here. There is also a lot of work going on underneath to enable all this new code in GEM, like preparing to switch away from sequence numbers to tracking gpu progress more abstractly using the driver's request structures.

And this time around there is also some cool stuff going on in the drm core worth of a shout-out: The vblank handling code is massively revamped, hopefully plugging all the small races, inconsistencies and inefficiencies in that code. And thanks to David Herrmann it is finally possible to write a drm driver without the drm midlayer getting completely in the way of a proper driver load and unload sequence! Unfortunately i915 can't be converted right away since the legacy usermodesetting code crucial relies on this midlayer functionality. But that's now well deprecated and hopefully can be removed in one of the next releases.

03 Oct 2014 3:47pm GMT

02 Oct 2014

feedplanet.freedesktop.org

Christian Schaller: Fedora Workstation Progress Report (Wayland and more)

So I am writing this blog entry using the current development snapshot of the Fedora Workstation and using Wayland as my display server. It is an important milestone for the Fedora Workstation, for Wayland and for me personally. There are many things here I am very happy about, first of all this is a major milestone for what in some sense was the first and biggest engineering effort we kicked off under the Fedora Workstation banner, meaning it was an effort we decided to put our weight behind with the vision we have for the Fedora Workstation being the primary motivator for doing it. And it has been a big success in more ways than I expected, I think it is fair to say that the level of engagement and support from the wider community took me by surprise, and I want to state that if it wasn't for all the incredible effort from the wider community pushing Wayland forward we would not been able to provide something of this quality so soon.
The fact that Wayland now runs and works on non-Intel GPUs for this release, that XWayland is fully functional, that libinput is as far along as it is, are all thanks to the wider community. There are more people who have contributed than I can list, but I want to call out Adel Gadllah and Jonas Ådahl, who have contributed many crucial pieces to GNOME Shell, Wayland or libinput.

I would also like to specially say thank you to Jasper St.Pierre, because we would not be here today without his tireless effort on porting the GNOME Shell to Wayland. I think anyone who knows Jasper appreciates the amount of effort he puts in and the level of enthusiasm he brings to everything he does. So Jasper recently transferred from Red Hat to Endless Mobile and I am very happy that he will continue to contribute to both the GNOME Shell and Wayland as part of his job at Endless too, as he would be sorely missed both as a developer and as an individual otherwise.

Another person I want to call out at this point is of course Kristian Høgsberg, who created Wayland and got it to reach critical mass in terms of mindshare and functionality. Having been around linux for a long time I have seen efforts at replacing the X window system come and go so I know that achieving what Kristian has achieved here is not trivial at all. So a big thank you to Kristian for his incredible work and for his incredible level of persistence allowing Wayland become a reality where so many other projects have failed.

Wayland in Fedora Workstation 21 is also an important milestone as it exemplifies the new development philosophy we are embarking on. Fedora has for a long time been known to be a linux distribution where a lot of new pieces become available first. The problem here is that it has also given Fedora bit of a reputation for being not as dependable as some other distributions or operating systems, which has kept a lot of people away from Fedora that I think would be inclined to use it otherwise.

So we want to keep being a place where you do get access to new and exciting technologies first, but as you see with the Wayland effort we are now going to go the extra mile to make sure we offer this new technologies in a way that allows you to still use Fedora as your day to day working machine without worrying that these new features will hinder your work. So we will keep Wayland available as a separate non-default session until we feel very confident that our users are not going to be negatively impacted by the switch. Which means we want to fix and polish up the last remaining bits and pieces, make sure that performance is top notch, make sure all input hardware works flawlessly, work with NVidia and AMD to help them make their binary drivers available for Wayland before we make this the new default.

An crucial value for us at Red Hat and for the Fedora community is working closely with our upstreams. Which means we always aim at working with our upstream communities to get the features we need or bugfixes we want included in the upstream releases which we then integrate into Fedora (and Red Hat Enterprise Linux). Working closely with the upstream communities enables us to achieve a lot more than we would be able to do on our own. In preparation for Fedora Workstation 21 we have of course done a lot of work on improving the general Fedora desktop experience which has meant a lot of work has gone into GNOME 3.14. And while most of our upstream contributions here has been about code, not all of it is code. A major part of creating a modern and polished desktop experience is making sure that the applications you run conforms to a shared set of interface guidelines, to both bring a unique and polished look to the applications, but also to make using them easier as things like keybindings or work patterns you learn with one application will transfer over to the next. To help accelerate that process for the Fedora Workstation we had Allan Day work with the GNOME community to create am updated set of Human Interface Guidelines for GNOME 3 and thus implicitly for the Fedora Workstation.

Another crucial improvement that you will see in Fedora Workstation 21 is on software installation. There has been a range of things in Fedora in regards to software installation that has been suboptimal. On the command line and library level there has been a piece of Fedora that I know a lot of people have disliked, many to such a strong degree that they have kept away from Fedora, namely Yum. Yum for those who doesn't know it is the tool you used either directly or indirectly to install new software on a Fedora system. Yum used to be very slow and while it has gotten a lot better over the years it was still considered a bit of an eyesore for many. So Aleš Kozumplík and others have worked writing a new set of tools to do the low level software handling over the last few years and I am happy to say that for Fedora Workstation 21 we will be using those tools to greatly improve the software installation and update experience. There is a new commandline tool called dnf that will work with the same command line parameters you know from yum, but will complete its task much quicker than before.

The desktop Software installer side Richard Hughes has been working on making the installer use the new libraries developed for dnf, called hawkeye and libsolve, to provide you with a much smoother software installation experience in Fedora Workstation 21. So if you tried the preview we offered of the Software tool in Fedora 20, then I think you will find Software to be a lot more responsive in Fedora Workstation 21.
Of course a good software installer is not just about how nice the user interface looks or how quickly it can perform an installation, it is also very much a product of the quality of your installation metadata. Richard Hughes got a blog entry outlining the great progress is being made on providing more and improved metadata, like application descriptions and screenshots, for Fedora Workstation 21. Ryan Lerch has been working with Richard to improve our cover greatly which means the quality of the software listings in Fedora Workstation 21 should be greatly improved over what you saw in Fedora 20. For more details and screenshots Kalev Lember got a great writeup of the state of the Software installer in Fedora Workstation 21.

This also highlights one of the advantages of the new Fedora product model where we have one clear desktop product we are targeting, that we can define operating system standards for things like application metadata and apply them to the system as a whole. So for Fedora 22 we expect to make appdata metadata a mandatory part of the application packaging for Fedora, ensuring that any desktop application packaged for Fedora is easily discover able by our users. In the old 'bucket of parts' model these things would in practice not happen as there was no clear target that everyone was expected to aim for.

There has also been a lot of general user interface polish work happening, both on the toolkit level with a lot of work being done by our UI designers to improve the default desktop theme called Adwaita. And since we want people to run all kinds of applications in Fedora Workstation 21 we are not only doing this for GTK+, but we also have Martin Briza working on bringing Adwaita to Qt for Fedora Workstation. We hope to get the Qt theme packaged soon, but for those interested in taking a look the Adwaita Qt code can be found here. In Fedora Workstation 21 we hope to cover Qt4 applications using the standard Adwaita theme, with wider support planed for Fedora Workstation 22, to cover more Qt versions and also make sure we have full coverage for the Adwaita Dark variant and accessibility versions. There is a chance we will miss the Fedora 21 cutoff date with this theme, but hopefully we can then get it included during the Fedora Workstation 21 lifespan.

We also worked on improving the shell animations. Things like animations might seem like their unimportant, but they contribute greatly to the general feeling of polish in the system. The team worked hard on improving these for Fedora Workstation 21, so in GNOME 3.14 you will for instance see that the animations in the shell overview has been greatly improved.

Last but not least I want to say that while I am very excited about what we have put together for Fedora Workstation 21 it is just the beginning. Being the first release under the new 3 product strategy a lot of time and effort has gone into re-jigging the whole Fedora development process to cater for having 3 different products instead of one, changing the way the Fedora community organize itself, get contributors on board and re-aligned with the new products and so on and also refocus our internal development teams at Red Hat to start thinking about their development process and goals with contributing to these 3 new products in mind. So my expectation is that as we go towards Fedora Workstation 22 the pace of innovation and progress will only pick up. So great things are ahead and I hope that once Fedora Workstation 21 is released regardless of if you are a long time Fedora users, a lapsed former Fedora users or someone who has never tried Fedora before you will be willing to give it a try and hopefully become as excited about it as we are.

02 Oct 2014 10:51am GMT

01 Oct 2014

feedplanet.freedesktop.org

Alan Coopersmith: Moving Oracle Solaris to LP64 bit by bit

A few people have noticed a trend in the Oracle Solaris 11 update releases of delivering more and more Solaris commands as 64-bit binaries, so I figured it was time to write a detailed explanation to answer some of the questions and help prepare users & developers for further change, as it now becomes more critical to deliver shared objects in both 32-bit (ILP32) and 64-bit (LP64) versions.

I'd like to thank Ali, Gary, Margot, Rod, and Jeff for their feedback on this post, and most especially to Sharon for helping rework it to get to the most important bits first.

For the developers and administrators who are familiar with Solaris history, I'll start with info about what you should be doing to make sure you're ready for increased use of 64-bit software in Solaris. You can refer to the section that describes issues that arise when converting binaries - I cover examples from the X Window System packages for Oracle Solaris that I've done the conversion work for, and discuss LP64 conversion work by other Solaris engineers.

For those who need more background, see the sections about LP64 history in Solaris and the Application Binary Interface (ABI) differences between 32-bit and 64-bit binaries.

What do you need to do?

You need to do what you have always done - ensure you have both 32-bit and 64-bit versions of any shared objects. This requirement is being highlighted because the consequences of not providing both binary versions is becoming more disruptive in each subsequent Solaris release.

Development requirements

If you develop software for Solaris, the requirement is to provide both 32-bit and 64-bit versions of any shared objects you provide for other software to use, whether as libraries to link their programs against or as loadable objects for frameworks such as localized input methods, PAM service modules, custom crypt(3C) password hashes, or dozens of other shared object uses in Solaris.

Additionally, you should make sure all of your software is 64-bit clean, even if you still support it on 32-bit systems. While Solaris 10 still has more than 6 years left of active support, eventually you'll move on to only supporting Solaris 11 and later and be able to go 64-bit only as well. The Solaris 64-bit Developer's Guide and Solaris Studio Guide to Converting Applications for a 64-Bit Environment can help here. Oracle's ISV support team is also looking to provide more assistance in future versions of the Oracle Solaris Preflight Applications Checker tool to help find possible issues for you.

User requirements

Administrators and users should verify that the software that they install and use provides both 32-bit & 64-bit shared objects. If they are not provided, contact the developers to provide both binary versions.

You should also keep an eye on the End of Features (EOF) Planned for Future Releases of Oracle Solaris page to see if something you may still need is being removed because it was determined not to be useful any more when it was reviewed for LP64 conversion. The page is updated regularly as new items make their way through the internal obsolescence review processes. If something appears there that would cause you major grief, let Solaris development know through your sales or support channels, so we can supply a better transition or replacement plan where possible.

And of course, if you find bugs in the Solaris 64-bit converted software, or find that you need a 64-bit version of a particular Solaris library or shared object that's not already available, file bugs via Oracle Support or the Oracle Partner Network for Solaris.

Delivery of X commands as LP64 binaries

Because Solaris 11 no longer requires programs run on a 32-bit kernel, and the minimum supported system has 64 times as much RAM as the first Ultra 1, Oracle Solaris can now ship programs directly as 64-bit binaries, which better equips them to run on modern sized data sets, while utilizing the full capabilities of today's hardware, and have started doing so.

For instance, before Solaris 11 shipped, I switched the default build flags for the X Window System programs in Solaris to 64-bit (with a few exceptions). Solaris has long shipped all the non-obsolete public libraries for X as both 32-bit and 64-bit, and the upstream open source versions of this software were made 64-bit clean long ago, originally by DEC for their Alpha workstations, and maintained since then by the BSD & Linux platforms that delivered 64-bit only distributions instead of multilib models. For the most part, this was just an implementation detail for the X programs - their functionality didn't change, nor did their interfaces.

One case with a visible impact from becoming LP64-only was the Xorg server, which uses dlopen() to load shared object drivers for the specific hardware in use. Solaris 10 8/07 moved from Xorg 6.9 to 7.2 and started delivering Xorg as LP64 binaries for SPARC & x64, since video cards would soon have more VRAM than a 32-bit X server could access. This was also the first delivery of Xorg for SPARC, and did not need to support 32-bit SPARC platforms, so on SPARC Xorg was LP64 from the beginning. On x86, since Solaris was still supporting 32-bit platforms, the 64-bit Xorg was added alongside the existing 32-bit version. As of the 64-bit only release of Solaris 11, we could drop the 32-bit version of Xorg in Solaris, but because Solaris wasn't the only source of the loadable driver modules (for instance, nvidia & VirtualBox both provided drivers for their graphics), we ensured that 64-bit versions were available, before announcing the end of support for 32-bit driver modules.

One less visible impact was in xdm, the old style login GUI for X, which Solaris still ships, though most Solaris systems use the more modern GNOME display manager (gdm) instead. In order to authenticate users, xdm uses the PAM framework, which allows administrators to configure a variety of login methods, such as Kerberos or SmartCards. Administrators can also install additional PAM methods to work with authentication systems Solaris doesn't have built-in support for, or additional pluggable crypt modules to handle other password hashing methods. While PAM has supported 64-bit modules since Solaris 7, and the crypt framework has supported 64-bit modules since it was introduced in Solaris 9, most programs calling PAM and the crypt framework have been 32-bit. Installing only the 32-bit version of a crypt or PAM module thus worked most of the time. However, now that xdm is 64-bit, a 32-bit-only module will generate failure messages for users trying to login via xdm, because the system won't find a 64-bit module to load. While xdm and PAM show that a non-existent binary can prevent system login, other less prominent 64-bit shared objects are going to be required over time, so providers of shared objects need to ensure they are installing both 32-bit & 64-bit versions of their software going forward.

Some users of custom input methods for different languages may also notice that their input methods are not available in 64-bit programs, since input methods are similarly provided as shared objects that are loaded via dlopen() and thus also have to exist in the same ABI variants as the calling programs.

Conversion of Solaris commands to LP64 binaries

This effort isn't limited to X11 software - engineers in all areas of Solaris are evaluating the various programs and determining what needs to be done. They have two tasks - to ensure the program itself is 64-bit clean, and to ensure that the surrounding ecosystem (such as the PAM modules example above) is 64-bit ready. In some cases, they've found software Solaris doesn't need to ship any longer, like the gettable tool for maintaining Internet host tables in the days before DNS, and are publishing End of Support notices for them. But for most cases, they're working to deliver 64-bit versions into Solaris as time allows.

The results of this work are already visible - the number of LP64 programs in /usr/bin and /usr/sbin in the full Oracle Solaris package repositories has climbed each release:

Release 32-bit 64-bit total
Solaris 11.0 1707 (92%) 144 (8%) 1851
Solaris 11.1 1723 (92%) 150 (8%) 1873
Solaris 11.2 1652 (86%) 271 (14%) 1923

(These numbers only count ELF binaries, not python programs, shell scripts, etc.)

In Solaris 11.0, X11 programs provided the bulk of the LP64 programs, but there were some from other subsystems, such as gdb, emacs, and the NSS crypto commands. Solaris 11.1 added LP64 crypto commands, including digest, decrypt, elfsign, pktool, and tpmadm; as well as other commands like top & gzip. Solaris 11.2 added a number of LP64 GNU programs, including the GNU coreutils and groff. New tools added in 11.2 were 64-bit in their first delivery, such as mlocate, the Intel GPU tools, and the jsl JavaScript Lint package. The bzip2 compression program was made LP64 in 11.2 at the request of a customer who uses it on large files and wanted the extra performance on x64.

Solaris 11.2 also adds Java 8 packages, alongside Java 7, and the now deprecated Java 6 packages. Java 8 for Solaris is 64-bit-only, dropping the 32-bit binary option found in previous Java releases. As noted under the Removal of 32-bit Solaris item in the Features Removed From JDK8 list, the Java plugin for web browsers was also removed from Java 8 for Solaris, because a 64-bit Java plugin cannot be run in the 32-bit Firefox browser.

Solaris Data Model History

Solaris 11.2 represents the latest step in a 20 year journey.

The original Solaris 2.x ABI used a data model referred to as ILP32, in which the size of the C language types "int", "long", and pointers are all 32-bit numbers. This data model matched the SPARC & x86 CPUs available at the time.

In 1995, Sun introduced its first SPARC v9 CPU's, the UltraSPARC I, which offered 64-bit integers and addresses. Solaris 7 followed, bringing a second ABI to Solaris, using the LP64 model, in which "int" remained 32-bits wide, but "long" and pointer sizes doubled to 64-bits.

This affected both the kernel and user space code, and Solaris 7 delivered both 32-bit (ILP32) and 64-bit (LP64) kernels for UltraSPARC systems. The Solaris kernel implementation only allowed running 64-bit user space processes if a 64-bit kernel was loaded, but 32-bit user space software could be used with either a 32-bit or 64-bit kernel.

For user-space programs, the class (32 or 64-bit) of libraries must match the program that links to them. In order to support both 32 and 64-bit programs, it was necessary to provide both 32 and 64-bit versions of libraries. To preserve binary compatibility with existing 32-bit software, the libraries in directories such as /usr/lib were left as the 32-bit versions, and the 64-bit versions were added in a new sparcv9 subdirectory. On modern Linux systems, this approach is now called "multilib."

Because the UltraSPARC CPUs did not impose any significant performance penalty when running existing 32-bit code on a 64-bit CPU, Sun continued to ship 32-bit user-space programs in the ILP32 ABI so that the same binary could be used on both 32-bit-only and 64-bit-capable CPUs, reducing development, testing, and support costs. It also reduced by half the memory requirements for pointers and long ints (thus more easily fitting them in the CPU caches) in programs which wouldn't benefit from the larger sizes, an important consideration in systems with only 32MB of RAM. For the small number of programs that had to run 64-bit to be able to read 64-bit kernel structures or debug other 64-bit binaries, Solaris shipped both 32-bit & 64-bit versions, with isaexec used to execute the version matching the kernel in use.

On the x86 side, nearly a decade later AMD's first AMD64 CPUs provided similar hardware support, and Solaris 10 introduced a matching LP64 ABI for x86 platforms in 2005, with the 64-bit libraries delivered in amd64 subdirectories. Even though 32-bit binaries did not run as fast as fully 64-bit binaries, Solaris followed the same model of providing mostly ILP32 programs to get the release to market faster; to save on development, test, and support costs; and to be consistent with Solaris on SPARC platforms.

In these transitional phases, 32-bit & 64-bit software and hardware coexisted. For SPARC, support for 32-bit-only CPUs was phased out in Solaris 10, when support for the last pre-sun4u platforms was dropped. Support for the 32-bit kernel was dropped at the same time, because all remaining supported hardware could run the 64-bit kernel. For x86, support for 32-bit-only CPUs was dropped in Solaris 11.

Therefore, as of the shipping of Oracle Solaris 11 in November 2011, the supported set of platforms have 64-bit kernels that can run 32-bit or 64-bit user space binaries.

Other Differences Between 32-bit & 64-bit ABIs

While the size of the types is the defining difference between the two ABI models, the opportunity to introduce a fresh new ABI after learning of the mistakes and limitations of the old ABI was hard to resist, and other changes were made as well. Significant differences include:

Additionally, since not all 64 bits in pointers are needed to address current memory sizes, unused ones can be used for additional tasks, such as some of the new features coming in the SPARC M7 CPU.

Some other platforms followed a different strategy. For example, Linux introduced "x32", a new 32-bit ABI, and is considering a proposal for year-2038-safe ABIs. Engineers at Sun long ago debated a "large time" extension to the 32-bit ABI like the large file interfaces, but decided to concentrate efforts on LP64 instead. Because Solaris is not trying to maintain 32-bit kernel support for embedded devices, that is not a problem we have to solve as we move forward. The result should be a simpler system, which is always a benefit for developers and ISVs.

We don't know yet when we'll finish this journey, but hopefully we'll get there before the industry starts converting software to run on CPUs with 128-bit addressing.


Disclaimer: The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

01 Oct 2014 7:38am GMT

Keith Packard: chromium-dri3

Chromium (the browser) and DRI3

I got a note on IRC a week ago that Chromium was crashing with DRI3.

The Google team working on Chromium eventually sent me a link to the bug report. That's secret Google stuff, so you won't be able to follow the link, even though it's a bug in a free software application when running on free software drivers.

There's a bug report in the freedesktop bugzilla which looks the same to me.

In both cases, the recommended "fix" was to switch from DRI3 back to DRI2. That's not exactly a great plan, given that DRI3 offers better security between GPU-using applications, which seems like a pretty nice thing to have when you're running random GL applications from the web.

Chromium Sandboxing

I'm not entirely sure how it works, but Chromium creates a process separate from the main browser engine to talk to the GPU. That process has very limited access to the operating system via some fancy library adventures. Presumably, the hope is that security bugs in the GL driver would be harder to leverage into a remote system exploit.

Debugging in this environment is a bit tricky as you can't simply run chromium under gdb and expect to be able to set breakpoints in the GL driver. Instead, you have to run chromium with a magic flag which causes the GPU process to pause before loading the driver so you can connect to it with gdb and debug from there, along with a flag that lets you see crashes within the gpu process and the usual flag that causes chromium to ignore the GPU black list which seems to always include the Intel driver for one reason or another:

$ chromium --gpu-startup-dialog --disable-gpu-watchdog --ignore-gpu-blacklist

Once Chromium starts up, it will print out a message telling you to attach gdb to the GPU process and send that process a SIGUSR1 to continue it. Now you can happily debug and get a stack trace when the crash occurs.

Locating the Bug

The bug manifested with a segfault at the first access to a DRI3-allocated buffer within the application. We've seen this problem in the past; whenever buffer allocation fails for some reason, the driver ignores the problem and attempts to de-reference through the (NULL) buffer pointer, causing a segfault. In this case, Chromium called glClear, which tried (and failed) to allocate a back buffer causing the i965 driver to subsequently segfault.

We should probably go fix the i965 driver to not segfault when buffer allocation fails, but that wouldn't provide a lot of additional information. What I have done is add some error messages in the DRI3 buffer allocation path which at least tell you why the buffer allocation failed. That patch has been merged to Mesa master, and should also get merged to the Mesa stable branch for the next stable release.

Once I had added the error messages, it was pretty easy to see what happened:

$ chromium --ignore-gpu-blacklist
[10618:10643:0930/200525:ERROR:nss_util.cc(856)] After loading Root Certs, loaded==false: NSS error code: -8018
libGL: pci id for fd 12: 8086:0a16, driver i965
libGL: OpenDriver: trying /local-miki/src/mesa/mesa/lib/i965_dri.so
libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted.
libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted.
libGL error: DRI3 Fence object allocation failure Operation not permitted

The first two errors were just the sandbox preventing Mesa from using my GL configuration file. I'm not sure how that's a security problem, but it shouldn't harm the driver much.

The last error is where the problem lies. In Mesa, the DRI3 implementation uses a chunk of shared memory to hold a fence object that lets Mesa know when buffers are idle without using the X connection. That shared memory segment is allocated by creating a temporary file using the O_TMPFILE flag:

fd = open("/dev/shm", O_TMPFILE|O_RDWR|O_CLOEXEC|O_EXCL, 0666);

This call "cannot fail" as /dev/shm is used by glibc for shared memory objects, and must therefore be world writable on any glibc system. However, with the Chromium sandbox enabled, it returns EPERM.

Running Without a Sandbox

Now that the bug appears to be in the sandboxing code, we can re-test with the GPU sandbox disabled:

$ chromium --ignore-gpu-blacklist --disable-gpu-sandbox

And, indeed, without the sandbox getting in the way of allocating a shared memory segment, Chromium appears happy to use the Intel driver with DRI3.

Final Thoughts

I looked briefly at the Chromium sandbox code. It looks like it needs to know intimate details of the OpenGL implementation for every possible driver it runs on; it seems to contain a fixed list of all possible files and modes that the driver will pass to open(2). That seems incredibly fragile to me, especially when used in a general Linux desktop environment. Minor changes in how the GL driver operates can easily cause the browser to stop working.

01 Oct 2014 6:51am GMT

30 Sep 2014

feedplanet.freedesktop.org

Bastien Nocera: GTK+ widget templates now in Javascript

Let's get the features in early!

If you're working on a Javascript application for GNOME, you'll be interested to know that you can now write GTK+ widget templates in gjs.

Many thanks to Giovanni for writing the original patches. And now to a small example:

const MyComplexGtkSubclass = new Lang.Class({
Name: 'MyComplexGtkSubclass',
Extends: Gtk.Grid,
Template: 'resource:///org/gnome/myapp/widget.xml',
Children: ['label-child'],

_init: function(params) {
this.parent(params);

this._internalLabel = this.get_template_child(MyComplexGtkSubclass,
'label-child');
}
});


And now you just need to create your widget:

let content = new MyComplexGtkSubclass();
content._internalLabel.set_label("My updated label");


You'll need gjs from git master to use this feature. And if you see anything that breaks, don't hesitate to file a bug against gjs in the GNOME Bugzilla.

30 Sep 2014 12:56pm GMT

24 Sep 2014

feedplanet.freedesktop.org

Peter Hutterer: libinput - a common input stack for Wayland compositors and X.Org drivers

Last November, Jonas Ådahl sent an RFC to the wayland-devel list about a common library to handle input devices in Wayland compositors called libinput. Fast-forward and we are now at libinput 0.6, with a broad support of devices and features. In this post I'll give an overview on libinput and why it is necessary in the first place. Unsuprisingly I'll be focusing on Linux, for other systems please mentally add the required asterisks, footnotes and handwaving.

The input stack in X.org

The input stack as it currently works in X.org is a bit of a mess. I'm not even talking about the different protocol versions that we need to support and that are partially incompatible with each other (core, XI 1.x, XI2, XKB, etc.), I'm talking about the backend infrastructure. Let's have a look:

The graph above is a simplification of the input stack, focusing on the various high-level functionalities. The X server uses some device discovery mechanism (udev now, previously hal) and matches each device with an input driver (evdev, synaptics, wacom, ...) based on the configuration snippets (see your local /usr/share/X11/xorg.conf.d/ directory).

The idea of having multiple drivers for different hardware was great when hardware still mattered, but these days we only speak evdev and the drivers that are hardware-specific are racing the dodos to the finishing line.

The drivers can communicate with the server through the very limited xf86 DDX API, but there is no good mechanism to communicate between the drivers. It's possible, just not doable in a sane manner. Most drivers support multiple X server releases so any communication between drivers would need to take version number mixes into account. Only the server knows the drivers that are loaded, through a couple of structs. Knowledge of things like "which device is running synaptics" is obtainable, but not sensibly. Likewise, drivers can get to the drivers of other devices, but not reasonably so (the API is non-opaque, so you can get to anything if you find the matching header).

Some features are provided by the X server: pointer acceleration, disabling a device, mapping a device to monitor, etc. Most other features such as tapping, two-finger scrolling, etc. are provided by the driver. That leads to an interesting feature support matrix: synaptics and wacom both provide two-finger scrolling, but only synaptics has edge-scrolling. evdev and wacom have calibration support but they're incompatible configuration options, etc. The server in general has no idea what feature is being used, all it sees is button, motion and key events.

The general result of this separation is that of a big family gathering. It looks like a big happy family at first, but then you see that synaptics won't talk to evdev because of the tapping incident a couple of years back, mouse and keyboard are have no idea what forks and knives are for, wacom is the hippy GPL cousin that doesn't even live in the same state and no-one quite knows why elographics keeps getting invited. The X server tries to keep the peace by just generally getting in the way of everyone so no-one can argue for too long. You step back, shrug apologetically and say "well, that's just how these things are, right?"

To give you one example, and I really wish this was a joke: The server is responsible for button mappings, tapping is implemented in synaptics. In order to support left-handed touchpads, gnome-settings-daemon sets the button mappings on the device in the server. Then it has to swap the tapping actions from left/right/middle for 1/2/3-finger tap to right/left/middle. That way synaptics submits right button clicks for a one-finger tap, which is then swapped back by the server to a left click.

The X.org input drivers are almost impossible to test. synaptics has (quick guesstimate done with grep and wc) around 70 user-configurable options. Testing all combinations would be something around the order of 10101 combinations, not accounting for HW differences. Testing the driver on it's own is not doable, you need to fire up an X server and then run various tests against that (see XIT). But now you're not testing the driver, you're testing the whole stack. And you can only get to the driver through the X protocol, and that is 2 APIs away from the driver core. Plus, test results get hard to evaluate as different modules update separately.

So in summary, in the current stack features are distributed across modules that don't communicate with each other. The stack is impossible to test, partially thanks to the vast array of user-exposed options. These are largely technical issues, we control the xf86 DDX API and can break it when needed to, but at this point you're looking at something that resembles a rewrite anyway. And of course, don't you dare change my workflow!

The input stack in Wayland

From the input stack's POV, Wayland simply merges the X server and the input modules into one item. See the architecture diagram from the wayland docs:

evdev gets fed into the compositor and wayland comes out the other end. If life were so simple... Due to the X.org input modules being inseparable from the X server, Weston and other compositors started implementing their own input stack, separately. Let me introduce a game called feature bingo: guess which feature is currently not working in $COMPOSITOR. If you collect all five in a row, you get to shout "FFS!" and win the price of staying up all night fixing your touchpad. As much fun as that can be, maybe let's not do that.

libinput

libinput provides a full input stack to compositors. It does device discovery over udev and event processing and simply provides the compositor with the pre-processed events. If one of the devices is a touchpad, libinput will handle tapping, two-finger scrolling, etc. All the compositor needs to worry about is moving the visible cursor, selecting the client and converting the events into wayland protocol. The stack thus looks something like this:

Almost everything has moved into libinput, including device discovery and pointer acceleration. libinput has internal backends for pointers, touchpads, tablets, etc. but they are not exposed to the compositor. More importantly, libinput knows about all devices (within a seat), so cross-device communication is possible but invisible to the compositor. The compositor still does configuration parsing, but only for user-specific options such as whether to enable tapping or not. And it doesn't handle the actual feature, it simply tells libinput to enable or disable it.

The graph above also shows another important thing: libinput provides an API to the compositor. libinput is not "wayland-y", it doesn't care about the Wayland protocol, it's simply an input stack. Which means it can be used as base for an X.org input driver or even Canonical's MIR.

libinput is very much a black box, at least compared to X input drivers (remember those 70 options in synaptics?). The X mantra of "mechanism, not policy" allows for interesting use-cases, but it makes the default 90% use-case extremely painful from a maintainer's and integrator's point of view. libinput, much like wayland itself, is a lot more restrictive in what it allows, specifically in the requirement it places on the compositor. At the same time aims for a better out-of-the-box experience.

To give you an example, the X.org synaptics driver lets you arrange the software buttons more-or-less freely on the touchpad. The default placement is simply a config snippet. In libinput, the software buttons are always at the bottom of the touchpad and also at the top of the touchpad on some models (Lenovo *40 series, mainly). The buttons are of a fixed size (which we decided on after analysing usage data), and you only get a left and right button. The top software buttons have left/middle/right matching the markings on the touchpad. The whole configuration is decided based on the hardware. The compositor/user don't have to enable them, they are simply there when needed.

That may sound restrictive, but we have a number of features on top that we can enable where required. Pressing both left and right software buttons simultaneously gives you a middle button click; the middle button in the top software button row provides wheel emulation for the trackstick. When the touchpad is disabled, the top buttons continue to work and they even grow larger to make them easier to hit. The touchpad can be set to auto-disable whenever an external mouse is plugged in.

And the advantage of having libinput as a generic stack also results in us having tests. So we know what the interactions are between software buttons and tapping, we don't have to wait for a user to trip over a bug to tell us what's broken.

Summary

We need a new input stack for Wayland, simply because the design of compositors in a Wayland world is different. We can't use the current modules of X.org for a number of technical reasons, and the amount of work it would require to get those up to scratch and usable is equivalent to a rewrite.

libinput now provides that input stack. It's the default in Weston at the time of this writing and used in other compositors or in the process of being used. It abstracts most of input away and more importantly makes input consistent across all compositors.

24 Sep 2014 3:56am GMT

22 Sep 2014

feedplanet.freedesktop.org

Bastien Nocera: On fêtera la sortie de GNOME 3.14 mardi soir à Lyon

In French, for a change :)

Mardi soir, le 23 septembre, quelques-uns d'entre nous se retrouveront vers 18h30 au Smoking Dog pour quelques boissons, et poursuivront avec un dîner indien prés du métro St-Jean.

N'hésitez pas à vous inscrire sur le Wiki, que vous soyez utilisateurs de GNOME, développeurs ou simplement des amis du logiciel libre.

À mardi!

22 Sep 2014 7:00am GMT