22 Aug 2016

Eric Anholt: vc4 status update for 2016-08-22: camera, NIR, testing

Last week I finally plugged in the camera module I got a while ago to go take a look at what vc4 needs for displaying camera output.

The surprising answer was "nothing." vc4 could successfully import RGB dmabufs and display them as planes, even though I had been expecting to need fixes on that front.

However, the bcm2835 v4l camera driver needs a lot of work. First of all, it doesn't use the proper contiguous memory support in v4l (vb2-dma-contig), and instead asks the firmware to copy from the firmware's contiguous memory into vmalloced kernel memory. This wastes memory and wastes memory bandwidth, and doesn't give us dma-buf support.

Even more, MMAL (the v4l equivalent that the firmware exposes for driving the hardware) wants to output planar buffers with specific padding. However, instead of using the multi-plane format support in v4l to expose buffers with that padding, the bcm2835 driver asks the firmware to do another copy from the firmware's planar layout into the old no-padding V4L planar format.

As a user of the V4L api, you're also in trouble because none of these formats have any priority information that I can see: The camera driver says it's equally happy to give you RGB or planar, even though RGB costs an extra copy. I think properly done today, the camera driver would be exposing multi-plane planar YUV, and giving you a mem2mem adapter that could use MMAL calls to turn the planar YUV into RGB.

For now, I've updated the bug report with links to the demo code and instructions.

I also spent a little bit of time last week finishing off the series to use st/nir in vc4. I managed to get to no regressions, and landed it today. It doesn't eliminate TGSI, but it does mean TGSI is gone from the normal GLSL path.

Finally, I got inspired to do some work on testing. I've been doing some free time work on servo, Mozilla's Rust-based web browser, and their development environment has been a delight as a new developer. All patch submissions, from core developers or from newbies, go through github pull requests. When you generate a PR, Travis builds and runs the unit tests on the PR. Then a core developer reviews the code by adding a "r" comment in the PR or provides feedback. Once it's reviewed, a bot picks up the pull request, tries merging it to master, then runs the full integration test suite on it. If the test suite passes, the bot merges it to master, otherwise the bot writes a comment with a link to the build/test logs.

Compare this to Mesa's development process. You make a patch. You file it in the issue tracker and it gets utterly ignored. You complain, and someone tells you you got the process wrong, so you join the mailing list and send your patch (and then get a flood of email until you unsubscribe). It gets mangled by your email client, and you get told to use git-send-email, so you screw around with that for a while before you get an email that will actually show up in people's inboxes. Then someone reviews it (hopefully) before it scrolls off the end of their inbox, and then it doesn't get committed anyway because your name was familiar enough that the reviewer thought maybe you had commit access. Or they do land your patch, and it turns out you hadn't run the integration tests and then people complain at you for not testing.

So, as a first step toward making a process like Mozilla's possible, I put some time into fixing up Travis on Mesa, and building Travis support for the X Server. If I can get Travis to run piglit and ensure that expected-pass tests don't regress, that at least gives us a documentable path for new developers in these two projects to put their code up on github and get automated testing of the branches they're proposing on the mailing lists.

22 Aug 2016 8:17pm GMT

16 Aug 2016

Keith Packard: udevwrap

Wrapping libudev using LD_PRELOAD

Peter Hutterer and I were chasing down an X server bug which was exposed when running the libinput test suite against the X server with a separate thread for input. This was crashing deep inside libudev, which led us to suspect that libudev was getting run from multiple threads at the same time.

I figured I'd be able to tell by wrapping all of the libudev calls from the server and checking to make sure we weren't ever calling it from both threads at the same time. My first attempt was a simple set of cpp macros, but that failed when I discovered that libwacom was calling libgudev, which was calling libudev.

Instead of recompiling the world with my magic macros, I created a new library which exposes all of the (public) symbols in libudev. Each of these functions does a bit of checking and then simply calls down to the 'real' function.

Finding the real symbols

Here's the snippet which finds the real symbols:

#include <dlfcn.h>
#include <pthread.h>

static void *udev_symbol(const char *symbol)
{
    static void *libudev;
    static pthread_mutex_t  find_lock = PTHREAD_MUTEX_INITIALIZER;

    void *sym;
    pthread_mutex_lock(&find_lock);
    if (!libudev) {
        /* Load the real library exactly once, under the lock */
        libudev = dlopen("libudev.so.1.6.4", RTLD_LOCAL | RTLD_NOW);
    }
    sym = dlsym(libudev, symbol);
    pthread_mutex_unlock(&find_lock);
    return sym;
}

Yeah, the libudev version is hard-coded into the source; I didn't want to accidentally load the wrong one. This could probably be improved...

Checking for re-entrancy

As mentioned above, we suspected that the bug was caused when libudev got called from two threads at the same time. So, our checks are pretty simple; we just count the number of calls into any udev function (to handle udev calling itself). If there are other calls in progress, we make sure the thread ID for those is the same as the current thread.

static void udev_enter(const char *func) {
    pthread_mutex_lock(&check_lock);
    assert (udev_running == 0 || udev_thread == pthread_self());
    udev_thread = pthread_self();
    udev_func[udev_running] = func;
    udev_running++;
    pthread_mutex_unlock(&check_lock);
}

static void udev_exit(void) {
    pthread_mutex_lock(&check_lock);
    udev_running--;
    if (udev_running == 0)
        udev_thread = 0;
    udev_func[udev_running] = 0;
    pthread_mutex_unlock(&check_lock);
}

Wrapping functions

Now, the ugly part -- libudev exposes 93 different functions, with a wide variety of parameters and return types. I constructed a hacky macro, calls for which could be constructed pretty easily from the prototypes found in libudev.h, and which would construct our stub function:

#define make_func(type, name, formals, actuals)         \
    type name formals {                                 \
        type ret;                                       \
        static void *f;                                 \
        if (!f)                                         \
            f = udev_symbol(__func__);                  \
        udev_enter(__func__);                           \
        ret = ((typeof (&name)) f) actuals;             \
        udev_exit();                                    \
        return ret;                                     \
    }

There are 93 invocations of this macro (or a variant for void functions) which look much like:

make_func(struct udev *,
      udev_ref,
      (struct udev *udev),
      (udev))

Using udevwrap

To use udevwrap, simply stick the filename of the .so in LD_PRELOAD and run your program normally:

# LD_PRELOAD=/usr/local/lib/libudevwrap.so Xorg 

Source code

I stuck udevwrap in my git repository:

http://keithp.com/cgi-bin/gitweb.cgi?p=udevwrap;a=summary

You can clone it using

$ git clone git://keithp.com/git/udevwrap

16 Aug 2016 6:32am GMT

15 Aug 2016

Lennart Poettering: Preliminary systemd.conf 2016 Schedule

A Preliminary systemd.conf 2016 Schedule is Now Available!

We have just published a first, preliminary version of the systemd.conf 2016 schedule. There is a small number of white slots in the schedule still, because we're missing confirmation from a small number of presenters. The missing talks will be added in as soon as they are confirmed.

The schedule consists of 5 workshops by high-profile speakers during the workshop day, 22 exciting talks during the main conference days, followed by one full day of hackfests.

Please sign up for the conference soon! Only a limited number of tickets are available, hence make sure to secure yours quickly before they run out! (Last year we sold out.) Please sign up here for the conference!

15 Aug 2016 10:00pm GMT

Eric Anholt: vc4 status update for 2016-08-15: DSI panel, Raspbian updates, and docs

Last week I mostly worked on getting the upstream work I and others have done into downstream Raspbian (most of that time unfortunately in setting up another Raspbian development environment, after yet another SD card failed).

However, the most exciting thing for most users is that with the merge of the rpi-4.4.y-dsi-stub-squash branch, the DSI display should now come up by default with the open source driver. This is unfortunately not a full upstreamable DSI driver, because the closed-source firmware is getting in the way of Linux by stealing our interrupts and then talking to the hardware behind our backs. To work around the firmware, I never talk to the DSI hardware, and we just replace the HVS display plane configuration on the DSI's output pipe. This means your display backlight is always on and the DSI link is always running, but better that than no display.

I also transferred the wiki I had made for VC4 over to github. In doing so, I was pleasantly surprised at how much documentation I wanted to write once I got off of the awful wiki software at freedesktop. You can find more information on VC4 at my mesa and linux trees.

(Side note, wikis on github are interesting. When you make your fork, you inherit the wiki of whoever you fork from, and you can do PRs back to their wiki similarly to how you would for the main repo. So my linux tree has Raspberry Pi's wiki too, and I'm wondering if I want to move all of my wiki over to their tree. I'm not sure.)

Is there anything that people think should be documented for the vc4 project that isn't there?

15 Aug 2016 4:35pm GMT

12 Aug 2016

Christian Schaller: Want to make Linux run better on laptops?

So we have two job openings in the Red Hat desktop team. What we are looking for is people to help us ensure that Fedora and RHEL run great on various desktop hardware, with a focus on laptops. Since these jobs require continuous access to a lot of new and different hardware, we can not accept applications this time from remotees, but require you to work out of our office in Munich, Germany. We are looking for people who are not afraid to jump into a lot of different code and who like tinkering with new hardware. The hardware enablement here might include some kernel level work, but will more likely involve improving higher level stacks. So for example if we have a new laptop where bluetooth doesn't work, you would need to investigate and figure out if the problem is in the kernel, in the bluez stack or in our Bluetooth desktop parts.

This will be quite varied work and we expect you to be part of a team which will be looking at anything from driver bugs, battery life issues, implementing new stacks, biometric login and enabling existing features in the kernel or in low level libraries in the user interface.

You can read more about the jobs at jobs.redhat.com. That link lists a Senior Engineer position, but we also have a Principal Engineer position open with id 53653; that one is not on the website as I post this, but should hopefully be up very soon.

Also if you happen to be in the Karlsruhe area or at GUADEC this year I will be here until Sunday, so you could come over for a chat. Feel free to email me on christian.schaller@gmail.com if you are interested in meeting up.

12 Aug 2016 9:07am GMT

11 Aug 2016

Bastien Nocera: Flatpak cross-compilation support

A couple of weeks ago, I hinted at a presentation that I wanted to do during this year's GUADEC, as a Lightning talk.

Unfortunately, I didn't get a chance to finish the work that I set out to do, encountering a couple of bugs that set me back. Hopefully this will get resolved post-GUADEC, so you can expect some announcements later on in the year.

At least one of the tasks I set to do worked out, and was promptly obsoleted by a nicer solution. Let's dive in.

How to compile for a different architecture

There are four possible solutions to compile programs for a different architecture:

The final option is one that's used more and more, mixing the last 2 solutions: the QEmu user-space emulator.

Using the QEMU user-space emulator

If you want to run just the one command, you'd do something like:

qemu-arm-static myarmbinary


Easy enough, but hardly something you want to try when compiling a whole application, with library dependencies. This is where binfmt support in Linux comes into play. Register the ELF format for your target with that user-space emulator, and you can run myarmbinary without any commands before it.


One thing to note though, is that this won't work as easily if the qemu user-space emulator and the target executable are built as dynamic executables: QEmu will need to find the libraries for your architecture, usually x86-64, to launch itself, and the emulated binary will also need to find its libraries.

To solve that first problem, there are QEmu static binaries available in a number of distributions (Fedora support is coming). For the second one, the easiest would be if we didn't have to mix native and target libraries on the filesystem, in a chroot, or container for example. Hmm, container you say.


Running QEmu user-space emulator in a container

We have our statically compiled QEmu and a filesystem with our target binaries, and we've switched the root filesystem. But try to run anything and you get a bunch of errors. The problem is that there is a single binfmt configuration for the kernel, whether it's the normal OS, or inside a container or chroot.

The Flatpak hack

This commit for Flatpak works-around the problem. The binary for the emulator needs to have the right path, so it can be found within the chroot'ed environment, and it will need to be copied there so it is accessible too, which is what this patch will do for you.

Follow the instructions in the commit, and test it out with this Flatpak script for GNU Hello.

$ TARGET=arm ./build.sh
[...]
$ ls org.gnu.hello.arm.xdgapp
918k org.gnu.hello.arm.xdgapp

Ready to install on your device!

The proper way

The above solution was built before it looked like the "proper way" was going to find its way in the upstream kernel. This should hopefully land in the upcoming 4.8 kernel.

Instead of launching a separate binary for each non-native invocation, this patchset allows the kernel to keep the binary opened, so it doesn't need to be copied to the container.

In short

With the work being done on Fedora's static QEmu user-space emulators, and the kernel feature that will land, we should be able to have a nice tickbox in Builder to build for any of the targets supported by QEmu.

Get cross-compiling!

11 Aug 2016 2:00pm GMT

Julien Danjou: The definitive guide to Python exceptions

Three years after my definitive guide on Python classic, static, class and abstract methods, it seems to be time for a new one. Here, I would like to dissect and discuss Python exceptions.

Dissecting the base exceptions

In Python, the base exception class is named BaseException. Being rarely used in any program or library, it ought to be considered as an implementation detail. But to discover how it's implemented, you can go and read Objects/exceptions.c in the CPython source code. In that file, what is interesting is to see that the BaseException class defines all the basic methods and attributes of exceptions. The basic well-known Exception class is then simply defined as a subclass of BaseException, nothing more:

/*
* Exception extends BaseException
*/
SimpleExtendsException(PyExc_BaseException, Exception,
"Common base class for all non-exit exceptions.");


The only other exceptions that inherit directly from BaseException are GeneratorExit, SystemExit and KeyboardInterrupt. All the other builtin exceptions inherit from Exception. The whole hierarchy can be seen by running pydoc2 exceptions or pydoc3 builtins.

Here are the graphs representing the builtin exception inheritance in Python 2 and Python 3 (generated using this script).

[Figure: Python 2 builtin exceptions inheritance graph]
[Figure: Python 3 builtin exceptions inheritance graph]

The BaseException.__init__ signature is actually BaseException.__init__(*args). This initialization method stores any arguments that are passed in the args attribute of the exception. This can be seen in the exceptions.c source code - and is true for both Python 2 and Python 3:

static int
BaseException_init(PyBaseExceptionObject *self, PyObject *args, PyObject *kwds)
{
    if (!_PyArg_NoKeywords(Py_TYPE(self)->tp_name, kwds))
        return -1;

    Py_INCREF(args);
    Py_XSETREF(self->args, args);

    return 0;
}


The only place where this args attribute is used is in the BaseException.__str__ method. This method uses self.args to convert an exception to a string:

static PyObject *
BaseException_str(PyBaseExceptionObject *self)
{
    switch (PyTuple_GET_SIZE(self->args)) {
    case 0:
        return PyUnicode_FromString("");
    case 1:
        return PyObject_Str(PyTuple_GET_ITEM(self->args, 0));
    default:
        return PyObject_Str(self->args);
    }
}


This can be translated in Python to:

def __str__(self):
    if len(self.args) == 0:
        return ""
    if len(self.args) == 1:
        return str(self.args[0])
    return str(self.args)


Therefore, the message to display for an exception should be passed as the first and the only argument to the BaseException.__init__ method.
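
A quick interactive check of that behaviour (plain Python, nothing library specific):

>>> e = Exception("something bad happened")
>>> e.args
('something bad happened',)
>>> str(e)
'something bad happened'
>>> str(Exception("foo", "bar"))   # more than one argument: the whole tuple
"('foo', 'bar')"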

Defining your exceptions properly

As you may already know, in Python, exceptions can be raised in any part of the program. The basic exception is called Exception and can be used anywhere in your program. In real life, however, no program or library should ever raise Exception directly: it's not specific enough to be helpful.
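
For instance, a library would typically define and raise something more specific instead; a minimal sketch (the class and message are invented for illustration):

class MylibTimeoutError(Exception):
    """mylib gave up waiting for the backend to answer."""

try:
    raise MylibTimeoutError("backend did not answer within 30s")
except MylibTimeoutError as e:
    print("retrying:", e)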

Since all exceptions are expected to be derived from the base class Exception, this base class can easily be used as a catch-all:

try:
    do_something()
except Exception:
    # This will catch any exception!
    print("Something terrible happened")


To define your own exceptions correctly, there are a few rules and best practices that you need to follow:

Organization

There is no limitation on where and when you can define exceptions. As they are, after all, normal classes, they can be defined in any module, function or class - even as closures.

Most libraries package their exceptions into a specific exception module: SQLAlchemy has them in sqlalchemy.exc, requests has them in requests.exceptions, Werkzeug has them in werkzeug.exceptions, etc.

It makes sense for libraries to export exceptions that way, as it makes it very easy for consumers to import the exception module and know where the exceptions are defined when writing error-handling code.

This is not mandatory, and smaller Python modules might prefer to keep their exceptions in their sole module. Typically, if your module is small enough to be kept in one file, don't bother splitting your exceptions into a different file/module.

While this works well for libraries, applications tend to be different beasts. Usually, they are composed of different subsystems, where each one might have its own set of exceptions. This is why I generally discourage having only one exception module in an application, and instead suggest splitting the exceptions across the different parts of the program. There might be no need for a special myapp.exceptions module.

For example, if your application is composed of an HTTP REST API defined in the module myapp.http and of a TCP server contained in myapp.tcp, it's likely they can both define different exceptions tied to their own protocol errors and life cycle. Defining those exceptions in a myapp.exceptions module would just scatter the code for the sake of some useless consistency. If the exceptions are local to a file, just define them somewhere at the top of that file. It will simplify the maintenance of the code.

Wrapping exceptions

Wrapping exceptions is the practice by which one exception is encapsulated into another:

import requests

class MylibError(Exception):
    """Generic exception for mylib"""
    def __init__(self, msg, original_exception):
        super(MylibError, self).__init__(msg + (": %s" % original_exception))
        self.original_exception = original_exception

try:
    requests.get("http://example.com")
except requests.exceptions.ConnectionError as e:
    raise MylibError("Unable to connect", e)


This makes sense when writing a library which leverages other libraries. If a library uses requests and does not encapsulate requests exceptions into its own defined error classes, it is a layering violation. Any application using your library might receive a requests.exceptions.ConnectionError, which is a problem because:

  1. The application has no clue that the library was using requests and does not need/want to know about it.
  2. The application will have to import requests.exceptions itself and therefore will depend on requests - even if it does not use it directly.
  3. As soon as mylib changes from requests to e.g. httplib2, the application code catching requests exceptions will become irrelevant.

The Tooz library is a good example of wrapping, as it uses a driver-based approach and depends on a lot of different Python modules to talk to different backends (ZooKeeper, PostgreSQL, etcd…). Therefore, it wraps exceptions from other modules on every occasion into its own set of error classes. Python 3 introduced the raise from form to help with that, and that's what Tooz leverages to raise its own errors.

It's also possible to encapsulate the original exception into a custom defined exception, as done above. That makes the original exception available for inspection easily.
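
In Python 3 the two approaches combine nicely; a minimal sketch reusing the MylibError class defined above (assuming requests is installed):

import requests

try:
    requests.get("http://example.com")
except requests.exceptions.ConnectionError as e:
    # "raise ... from ..." records the original exception as __cause__,
    # on top of the original_exception attribute MylibError already stores.
    raise MylibError("Unable to connect", e) from e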

Catching and logging

When designing exceptions, it's important to remember that they should be targeted both at humans and computers. That's why they should include an explicit message, and embed as much information as possible. That will help to debug and write resilient programs that can pivot their behavior depending on the attributes of the exception, as seen above.
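
For example, an exception carrying a machine-readable attribute lets the caller react without parsing the message (a sketch; the class, the retry_after field and do_something are invented for illustration):

import time

class RetryableError(Exception):
    """Transient failure; the caller may retry after retry_after seconds."""
    def __init__(self, message, retry_after):
        super(RetryableError, self).__init__(message)
        self.retry_after = retry_after

try:
    do_something()
except RetryableError as e:
    # Pivot on the exception's attributes rather than on its message.
    time.sleep(e.retry_after)
    do_something()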

Also, silencing exceptions completely should be considered bad practice. You should not write code like this:

try:
    do_something()
except Exception:
    # Whatever
    pass


Not having any kind of information in a program where an exception occurs is a nightmare to debug.

If you use (and you should) the logging library, you can use the exc_info parameter to log a complete traceback when an exception occurs, which might help when debugging severe and unrecoverable failures:

try:
    do_something()
except Exception:
    logging.getLogger().error("Something bad happened", exc_info=True)


Further reading

If you understood everything so far, congratulations, you might be ready to handle exceptions in Python! If you want to have a broader scope on exceptions and what Python misses, I encourage you to read about condition systems and discover the generalization of exceptions - that I hope we'll see in Python one day!

I hope this will help you build better libraries and applications. Feel free to shoot any question in the comment section!

11 Aug 2016 9:52am GMT

09 Aug 2016

Bastien Nocera: Blog backlog, Post 4, Headset fixes for Dell machines

At the bottom of the release notes for GNOME 3.20, you might have seen the line:

If you plug in an audio device (such as a headset, headphones or microphone) and it cannot be identified, you will now be asked what kind of device it is. This addresses an issue that prevented headsets and microphones being used on many Dell computers.

Before I start explaining what this does, as a picture is worth a thousand words:

[Image: the audio device type selection dialogue]

This selection dialogue is one you will get on some laptops and desktop machines when the hardware is not able to detect whether the plugged in device is headphones, a microphone, or a combination of both, probably because it doesn't have an impedance detection circuit to figure that out.

This functionality was integrated into Unity's gnome-settings-daemon version a couple of years ago, written by David Henningsson.

The code that existed for this functionality was completely independent, not using any of the facilities available in the media-keys plugin for volume keys, and it could probably have been split out as an external binary with very little effort.

After a bit of to and fro, most of the sound backend functionality was merged into libgnome-volume-control, leaving just 2 entry points, one to signal that something was plugged into the jack, and another to select which type of device was plugged in, in response to the user selection. This means that the functionality should be easily implementable in other desktop environments that use libgnome-volume-control to interact with PulseAudio.

Many thanks to David Henningsson for the original code, and his help integrating the functionality into GNOME, Bednet for providing hardware to test and maintain this functionality, and Allan, Florian and Rui for working on the UI notification part of the functionality, and wiring it all up after I abandoned them to go on holidays ;)

09 Aug 2016 9:49am GMT

08 Aug 2016

Eric Anholt: vc4 status update for 2016-08-08: cutting memory usage

Last week's project for vc4 was to take a look at memory usage. Eben had expressed concern that the new driver stack would use more memory than the closed stack, and so I figured I would spend a little while working on that.

I first pulled out valgrind's massif tool on piglit's glsl-algebraic-add-add-1.shader_test. This works as a minimum "how much memory does it take to render *anything* with this driver?" test. We were consuming 1605k of heap at the peak, and there were some obvious fixes to be made.

First, the gallium state_tracker was allocating 659kb of space at context creation so that it could bytecode-interpret TGSI if needed for glRasterPos() and glRenderMode(GL_FEEDBACK). Given that nobody should ever use those features, and luckily they rarely do, I delayed the allocation of the somewhat misleadingly-named "draw" context until the fallbacks were needed.

Second, Mesa was allocating the memory for the GL 1.x matrix stacks up front at context creation. We advertise 32 matrices for modelview/projection, 10 per texture unit (32 of those), and 4 for programs. I instead implemented a typical doubling array reallocation scheme for storing the matrices, so that only the top matrix per stack is allocated at context creation. This saved 63kb of dirty memory per context.
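
The scheme itself is the textbook one; a rough Python sketch of the idea (not the actual Mesa code):

from copy import deepcopy

def identity_matrix():
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

class MatrixStack:
    """Only the top matrix exists at creation; storage doubles on demand."""
    def __init__(self):
        self.storage = [identity_matrix()]
        self.depth = 1

    def push(self):
        if self.depth == len(self.storage):
            # Doubling reallocation: amortized O(1) pushes, and a context
            # that never pushes never pays for a 32-deep stack up front.
            self.storage.extend([None] * len(self.storage))
        self.storage[self.depth] = deepcopy(self.storage[self.depth - 1])
        self.depth += 1

    def pop(self):
        if self.depth > 1:
            self.depth -= 1

    def top(self):
        return self.storage[self.depth - 1]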

722KB for these two fixes may not seem like a whole lot of memory to readers on fancy desktop hardware with 8GB of RAM, but the Raspberry Pi has only 1GB of RAM, and when you exhaust that you're swapping to an SD card. You should also expect a desktop to have several GL contexts created: the X Server uses one to do its rendering, you have a GL-based compositor with its own context, and your web browser and LibreOffice may each have one or more. Additionally, trying to complete our piglit testsuite on the Raspberry Pi is currently taking me 6.5 hours (when it even succeeds and doesn't see piglit's python runner get shot by the OOM killer), so I could use any help I can get in reducing context initialization time.

However, malloc()-based memory isn't all that's involved. The GPU buffer objects that get allocated don't get counted by massif in my analysis above. To try to approximately fix this, I added in valgrind macro calls to mark the mmap()ed space in a buffer object as being a malloc-like operation until the point that the BO is freed. This doesn't get at allocations for things like the window-system renderbuffers or the kernel's overflow BO (valgrind requires that you have a pointer involved to report it to massif), but it does help.

Once I had massif reporting more, I noticed that glmark2 -b terrain was allocating a *lot* of memory for shader BOs. Going through them, an obvious problem was that we were generating a lot of shaders for glGenerateMipmap(). A few weeks ago I improved performance on the benchmark by fixing glGenerateMipmap()'s fallback blits that we were doing because vc4 doesn't support the GL_TEXTURE_BASE_LEVEL that the gallium aux code uses. I had fixed the fallback by making the shader do an explicit-LOD lookup of the base level if the GL_TEXTURE_BASE_LEVEL==GL_TEXTURE_MAX_LEVEL. However, in the process I made the shader depend on that base level, so we would compile a new shader variant per level of the texture. The fix was to make the base level into a uniform value that's uploaded per draw call, and with that change I dropped 572 shader variants from my shader-db results.

Reducing extra shaders was fun, so I set off on another project I had thought of before. VC4's vertex shader to fragment shader IO system is a bit unusual in that it's just a FIFO of floats (effectively), with none of these silly "vec4"s that permeate GLSL. Since I can take my inputs in any order, and more flexibility in the FS means avoiding register allocation failures sometimes, I have the FS compiler tell the VS what order it would like its inputs in. However, the list of all the inputs in their arbitrary orders would be expensive to hash at draw time, so I had just been using the identity of the compiled fragment shader variant in the VS and CS's key to decide when to recompile it in case output order changed. The trick was that, while the set of all possible orders is huge, the number that any particular application will use is quite small. I take the FS's input order array, keep it in a set, and use the pointer to the data in the set as the key. This cut 712 shaders from shader-db.
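
The idea, roughly, in Python terms (a sketch only; the real code lives in the C compiler backend and these names are invented):

_interned_orders = {}

def intern_order(order):
    """Return one canonical tuple per distinct FS input order."""
    key = tuple(order)
    return _interned_orders.setdefault(key, key)

# The shader key then only needs the identity of the canonical order,
# not a hash over the whole array at draw time.
vs_key = ("...other state...", id(intern_order([3, 0, 1, 2])))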

Also, embarrassingly, when I mentioned tracking the FS in the CS's key above? Coordinate shaders don't output anything to the fragment shader. Like the name says, they just generate coordinates, which get consumed by the binner. So, by removing the FS from the CS key, I trivially cut 754 shaders from shader-db. Between the two, piglit's gl-1.0-blend-func test now passes instead of OOMing, so we get test coverage on blending.

Relatedly, while working on fixing a kernel oops recently, I had noticed that we were still reallocating the overflow BO on every draw call. This was old debug code from when I was first figuring out how overflow worked. Since each client can have up to 5 outstanding jobs (limited by Mesa) and each job was allocating a 256KB BO, we could be saving a MB or so per client assuming they weren't using much of their overflow (likely true for the X Server). The solution, now that I understand the overflow system better, was just to not reallocate and let the new job fill out the previous overflow area.

Other projects for the week that I won't expand on here: Debugging GPU hang in piglit glsl-routing (generated fixes for vc4-gpu-tools parser, tried writing a GFXH30 workaround patch, still not fixed) and working on supporting direct GLSL IR to NIR translation (lots of cleanups, a couple fixes, patches on the Mesa list).

08 Aug 2016 9:22pm GMT

05 Aug 2016

Peter Hutterer: libinput and disable-while-typing

A common issue with users typing on a laptop is that the user's palms will inadvertently get in contact with the touchpad at some point, causing the cursor to move and/or click. In the best case it's annoying, in the worst case you're now typing your password into the newly focused twitter application. While this provides some general entertainment and thus makes the world a better place for a short while, here at the libinput HQ [1] we strive to keep life as boring as possible and avoid those situations.

The best way to avoid accidental input is to detect palm touches and simply ignore them. That works ok-ish on some touchpads and fails badly on others. Lots of hardware is barely able to provide an accurate touch location, let alone enough information to decide whether a touch is a palm. libinput's palm detection largely works by using areas on the touchpad that are likely to be touched by the palms.

The second-best way to avoid accidental input is to disable the touchpad while a user is typing. The libinput marketing department [2] has decided to name this feature "disable-while-typing" (DWT) and it's been in libinput for quite a while. In this post I'll describe how exactly DWT works in libinput.

Back in the olden days of roughly two years ago we all used the synaptics X.Org driver and were happy with it [3]. Disable-while-typing was featured there through the use of a tool called syndaemon. This synaptics daemon [4] has two modes. One is to poll the keyboard state every few milliseconds and check whether a key is down. If so, syndaemon sends a command to the driver to tell it to disable itself. After a timeout when the keyboard state is neutral again syndaemon tells the driver to re-enable itself. This causes a lot of wakeups, especially during those 95% of the time when the user isn't actually typing. Or missed keys if the press + release occurs between two polls. Hence the second mode, using the RECORD extension, where syndaemon opens a second connection to the X server and checks for key events [5]. If it sees one float past, it tells the driver to disable itself, and so on and so forth. Either way, you had a separate process that did that job. syndaemon had a couple of extra options and features that I'm not going to discuss here, but we've replicated the useful ones in libinput.

libinput has no external process, DWT is integrated into the library with a couple of smart extra features. This is made easier by libinput controlling all the devices, so all keyboard events are passed around internally to the touchpad backend. That backend then decides whether it should stop sending events. And this is where the interesting bits come in.

First, we have different timeouts: if you only hit a single key, the touchpad will re-enable itself quicker than after a period of typing. So if you use the touchpad, hit a key to trigger some UI the pointer only stops moving for a very short time. But once you type, the touchpad disables itself longer. Since your hand is now in a position over the keyboard, moving back to the touchpad takes time anyway so a longer timeout doesn't hurt. And as typing is interrupted by pauses, a longer timeout bridges over those to avoid accidental movement of the cursor.
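
Very roughly, the behaviour looks like this (a Python sketch of the idea only; libinput is written in C and the actual timeout values differ):

KEY_TIMEOUT_SHORT = 0.2     # invented value: after an isolated key press
KEY_TIMEOUT_LONG = 0.5      # invented value: after a run of typing

class DwtState:
    def __init__(self):
        self.last_key_time = None
        self.typing = False

    def on_key(self, now):
        # Two key presses close together count as a run of typing.
        if self.last_key_time is not None and now - self.last_key_time < 0.25:
            self.typing = True
        self.last_key_time = now

    def touchpad_enabled(self, now):
        if self.last_key_time is None:
            return True
        timeout = KEY_TIMEOUT_LONG if self.typing else KEY_TIMEOUT_SHORT
        if now - self.last_key_time >= timeout:
            self.typing = False
            return True
        return False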

Second, any touch started while you were typing is permanently ignored, so it's safe to rest the palm on the touchpad while typing and leave it there. But we keep track of the start time of each touch so any touch started after the last key event will work normally once the DWT timeout expires. You may feel a short delay but it should be well in the acceptable range of a few tens of ms.

Third, libinput is smart enough to detect which keyboard to pair with. If you have an external touchpad like the Apple Magic Trackpad or a Logitech T650, DWT will never enable on those. Likewise, typing on an external keyboard won't disable the internal touchpad. And in the rare case of two internal touchpads [6], both of them will do the right thing. As of systemd v231 the information of whether a touchpad is internal or external is available in the ID_INPUT_TOUCHPAD_INTEGRATION udev tag and thus available to everyone, not just libinput.

Finally, modifier keys are ignored for DWT, so using the touchpad to do shift-clicks works unimpeded. This also goes for the F-Key row and the numpad if you have any. These keys are usually out of the range of the touchpad anyway so interference is not an issue here. As of today, modifier key combos work too. So hitting Ctrl+S to save a document won't disable the touchpad (or any other modifiers + key combination). But once you are typing DWT activates and if you now type Shift+S to type the letter 'S' the touchpad remains disabled.

So in summary: what we've gained from switching to libinput is one external process less that causes wakeups and the ability to be a lot smarter about when we disable the touchpad. Coincidentally, libinput has similar code to avoid touchpad interference when the trackpoint is in use.

[1] that would be me
[2] also me
[3] uphill, both ways, snow, etc.
[4] nope. this one wasn't my fault
[5] Yes, syndaemon is effectively a keylogger, except it doesn't do any of the "logging" bit a keylogger would be expected to do to live up to its name
[6] This currently happens on some Dell laptops using hid-i2c. We get two devices, one named "DLL0704:01 06CB:76AE Touchpad" or similar and one "SynPS/2 Synaptics TouchPad". The latter one will never send events unless hid-i2c is disabled in the kernel

05 Aug 2016 5:15am GMT

01 Aug 2016

Rob Clark: dirty tricks for moar fps!

This weekend I landed a patchset in mesa to add support for resource shadowing and batch re-ordering in freedreno. What this is, will take a bit of explaining, but the tl;dr: is a nice fps boost in many games/apps.

But first, a bit of background about tiling gpus: the basic idea of a tiler is to render N draw calls a tile at a time, with a tile's worth of the "framebuffer state" (ie. each of the MRT color bufs + depth/stencil) resident in an internal tile buffer. The idea is that most of your memory traffic is to/from your color and z/s buffers. So rather than rendering each of your draw calls in its entirety, you split the screen up into tiles and repeat each of the N draws for each tile, rendering to fast internal/on-chip memory. This avoids going back to main memory for each of the color and z/s buffer accesses, and enables a tiler to do more with less memory bandwidth. But it means there is never a single point in the sequence of draws.. ie. draw #1 for tile #2 could happen after draw #2 for tile #1. (Also, that is why GL_TIMESTAMP queries are bonkers for tilers.)

For purpose of discussion (and also how things are named in the code, if you look), I will define a tile-pass, ie. rendering of N draws for each tile in succession (or even if multiple tiles are rendered in parallel) as a "batch".

Unfortunately, many games/apps are not written with tilers in mind. There are a handful of common anti-patterns which force a driver for a tiling gpu to flush the current batch. Examples are unnecessary FBO switches, and texture or UBO uploads mid-batch.

For example, with a 1920x1080 r8g8b8a8 render target, with z24s8 depth/stencil buffer, an unnecessary batch flush costs you 16MB of write memory bandwidth, plus another 16MB of read when we later need to pull the data back into the tile buffer. That number can easily get much bigger with games using float16 or float32 (rather than 8 bits per component) intermediate buffers, and/or multiple render targets. Ie. two MRT's with float16 internal-format plus z24s8 z/s would be 40MB write + 40MB read per extra flush.
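
The arithmetic behind those numbers, for the curious (4 bytes per pixel for both r8g8b8a8 and z24s8):

width, height = 1920, 1080
color = width * height * 4                  # r8g8b8a8: ~7.9 MiB
zs = width * height * 4                     # z24s8:    ~7.9 MiB
print((color + zs) / (1024 * 1024))         # ~15.8 MiB written per flush,
                                            # plus the same again read back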

So, take the example of a UBO update, at a point where you are not otherwise needing to flush the batch (ie. swapbuffers or FBO switch). A straightforward gl driver for a tiler would need to flush the current batch, so each of the draws before the UBO update would see the old state, and each of the draws after the UBO update would see the new state.

Enter resource shadowing and batch reordering. Two reasonably big (ie. touches a lot of the code) changes in the driver which combine to avoid these extra batch flushes, as much as possible.

Resource shadowing is allocating a new backing GEM buffer object (BO) for the resource (texture/UBO/VBO/etc), and if necessary copying parts of the BO contents to the new buffer (back-blit).

So for the example of the UBO update, rather than taking the 16MB+16MB (or more) hit of a tile flush, why not just create two versions of the UBO. It might involve copying a few KB's of UBO (ie. whatever was not overwritten by the game), but that is a lot less than 32MB?
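
In pseudo-code the shadowing decision for that UBO update looks roughly like this (a Python sketch of the concept, not the freedreno code; the names are invented):

class Batch:
    def __init__(self):
        self.referenced_bos = set()   # id()s of buffer storage this batch uses

class Buffer:
    def __init__(self, size):
        self.storage = bytearray(size)

def write_ubo(ubo, offset, data, pending_batches):
    busy = any(id(ubo.storage) in batch.referenced_bos
               for batch in pending_batches)
    if busy:
        # Shadow: allocate a new backing buffer and "back-blit" the old
        # contents, so in-flight batches keep reading the old, untouched BO.
        ubo.storage = bytearray(ubo.storage)
    ubo.storage[offset:offset + len(data)] = data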

But of course, it is not that simple. Was the buffer or texture level mapped with GL_MAP_INVALIDATE_BUFFER_BIT or GL_MAP_INVALIDATE_RANGE_BIT? (Or GL API that implies the equivalent, although fortunately as a gallium driver we don't have to care so much about all the various different GL paths that amount to the same thing for the hw.) For a texture with mipmap levels, we unfortunately don't know at the time where we need to create the new shadow BO whether the next GL calls will glGenerateMipmap() or upload the remaining mipmap levels. So there is a bit of complexity in handling all the cases properly. There may be a few more cases we could handle without falling back to flushing the current batch, but for now we handle all the common cases.

The batch re-ordering component of this allows any potential back-blits from the shadow'd BO to the new BO (when resource shadowing kicks in), to be split out into a separate batch. The resource/dependency tracking between batches and resources (ie. if various batches need to read from a given resource, we need to know that so they can be executed before something writes to the resource) lets us know which order to flush various in-flight batches to achieve correct results. Note that this is partly because we use util_blitter, which turns any internally generated resource-shadowing back-blits into normal draw calls (since we don't have a dedicated blit pipe).. but this approach also handles the unnecessary FBO switch case for free.
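
The ordering rule itself is simple; a tiny sketch of the idea (again with invented names, and submit() standing in for the real submission path):

class Batch:
    def __init__(self, name, depends_on=()):
        self.name = name
        self.depends_on = list(depends_on)

def submit(batch):
    print("flushing", batch.name)       # stand-in for the real kernel submit

def flush_batch(batch, flushed):
    if batch in flushed:
        return
    for dep in batch.depends_on:        # batches writing resources this batch reads
        flush_batch(dep, flushed)
    submit(batch)
    flushed.add(batch)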

Unfortunately, the batch re-ordering required a bit of an overhaul about how cmdstream buffers are handled, which required changes in all layers of the stack (mesa + libdrm + kernel). The kernel changes are in drm-next for 4.8 and libdrm parts are in the latest libdrm release. And while things will continue to work with a new userspace and old kernel, all these new optimizations will be disabled.

(And, while there is a growing number of snapdragon/adreno SBC's and phones/tablets getting upstream attention, if you are stuck on a downstream 3.10 kernel, look here.)

And for now, even with a new enough kernel, for the time being reorder support is not enabled by default. There are a couple more piglit tests remaining to investigate, but I'll probably flip it to be enabled by default (if you have a new enough kernel) before the next mesa release branch. Until then, use FD_MESA_DEBUG=reorder (and once the default is switched, that would be FD_MESA_DEBUG=noreorder to disable).

I'll cover the implementation and tricks to keep the CPU overhead of all this extra bookkeeping small later (probably at XDC2016), since this post is already getting rather long. But the juicy bits: ~30% gain in supertuxkart (new render engine) and ~20% gain in manhattan are the big winners. In general at least a few percent gain in most things I looked at, generally in the 5-10% range.





01 Aug 2016 12:28pm GMT

27 Jul 2016

Lennart Poettering: FINAL REMINDER! systemd.conf 2016 CfP Ends on Monday!

Please note that the systemd.conf 2016 Call for Participation ends on Monday, on Aug. 1st! Please send in your talk proposal by then! We've already got a good number of excellent submissions, but we are very interested in yours, too!

We are looking for talks on all facets of systemd: deployment, maintenance, administration, development. Regardless of whether you use it in the cloud, on embedded, on IoT, on the desktop, on mobile, in a container or on the server: we are interested in your submissions!

In addition to proposals for talks for the main conference, we are looking for proposals for workshop sessions held during our Workshop Day (the first day of the conference). The workshop format consists of a day of 2-3h training sessions, that may cover any systemd-related topic you'd like. We are both interested in submissions from the developer community as well as submissions from organizations making use of systemd! Introductory workshop sessions are particularly welcome, as the Workshop Day is intended to open up our conference to newcomers and people who aren't systemd gurus yet, but would like to become more fluent.

For further details on the submissions we are looking for and the CfP process, please consult the CfP page and submit your proposal using the provided form!

ALSO: Please sign up for the conference soon! Only a limited number of tickets are available, hence make sure to secure yours quickly before they run out! (Last year we sold out.) Please sign up here for the conference!

AND OF COURSE: We are also looking for more sponsors for systemd.conf! If you are working on systemd-related projects, or make use of it in your company, please consider becoming a sponsor of systemd.conf 2016! Without our sponsors we couldn't organize systemd.conf 2016!

Thank you very much, and see you in Berlin!

27 Jul 2016 10:00pm GMT

20 Jul 2016

Peter Hutterer: libinput is done

Don't panic. Of course it isn't. Stop typing that angry letter to the editor and read on. I just picked that title because it's clickbait and these days that's all that matters, right?

With the release of libinput 1.4 and the newest feature to add tablet pad mode switching, we've now finished the TODO list we had when libinput was first conceived. Let's see what we have in libinput right now:

The side-effect of libinput is that we are also trying to fix the rest of the stack where appropriate. Mostly this meant pushing stuff into systemd/udev so far, with the odd kernel fix as well. Specifically the udev bits means we

And of course, the whole point of libinput is that it can be used from any Wayland compositor and take away most of the effort of implementing an input stack. GNOME, KDE and enlightenment already use libinput, and so does Canonical's Mir. And some distributions use libinput as the default driver in X through xf86-input-libinput (Fedora 22 was the first to do this). So overall libinput is already quite a success.

The hard work doesn't stop of course, there are still plenty of areas where we need to be better. And of course, new features come as HW manufacturers bring out new hardware. I already have touch arbitration on my todo list. But it's nice to wave at this big milestone as we pass it on the way to the glorious future of perfect, bug-free input. At this point, I'd like to extend my thanks to all our contributors: Andreas Pokorny, Benjamin Tissoires, Caibin Chen, Carlos Garnacho, Carlos Olmedo Escobar, David Herrmann, Derek Foreman, Eric Engestrom, Friedrich Schöller, Gilles Dartiguelongue, Hans de Goede, Jackie Huang, Jan Alexander Steffens (heftig), Jan Engelhardt, Jason Gerecke, Jasper St. Pierre, Jon A. Cruz, Jonas Ådahl, JoonCheol Park, Kristian Høgsberg, Krzysztof A. Sobiecki, Marek Chalupa, Olivier Blin, Olivier Fourdan, Peter Frühberger, Peter Hutterer, Peter Korsgaard, Stephen Chandler Paul, Thomas Hindoe Paaboel Andersen, Tomi Leppänen, U. Artie Eoff, Velimir Lisec.

Finally: libinput was started by Jonas Ådahl in late 2013, so it's already over 2.5 years old. And the git log shows we're approaching 2000 commits and a simple LOCC says over 60000 lines of code. I would also like to point out that the vast majority of commits were done by Red Hat employees, I've been working on it pretty much full-time since 2014 [3]. libinput is another example of Red Hat putting money, time and effort into the less press-worthy plumbing layers that keep our systems running. [4]

[1] Ironically, that's also the biggest cause of bugs because touchpads are terrible. synaptics still only does single-finger with a bit of icing and on bad touchpads that often papers over hardware issues. We now do that in libinput for affected hardware too.
[2] The synaptics driver uses absolute numbers, mostly based on the axis ranges for Synaptics touchpads making them unpredictable or at least different on other touchpads.
[3] Coincidentally, if you see someone suggesting that input is easy and you can "just do $foo", their assumptions may not match reality
[4] No, Red Hat did not require me to add this. I can pretty much write what I want in this blog and these opinions are my own anyway and don't necessarily reflect Red Hat yadi yadi ya. The fact that I felt I had to add this footnote to counteract whatever wild conspiracy comes up next is depressing enough.

20 Jul 2016 12:45am GMT

19 Jul 2016

Dave Airlie: radv: initial hacking on a vulkan driver for AMD VI GPUs

(email sent to mesa-devel list).

I was waiting for an open source driver to appear when I realised I should really just write one myself, some talking with Bas later, and we decided to see where we could get.

This is the point at which we were willing to show it to others. It's not really a vulkan driver yet; so far it's a vulkan triangle demos driver.

It renders the tri and cube demos from the vulkan loader,
and the triangle demo from Sascha Willems demos
and the Vulkan CTS smoke tests (all 4 of them, one of which draws a triangle).

There is a lot of work to do, and it's at the stage where we are seeing if anyone else wants to join in at the start, before we make too many serious design decisions or take a path we really don't want to.

So far it's only been run on Tonga and Fiji chips I think, we are hoping to support radeon kernel driver for SI/CIK at some point, but I think we need to get things a bit further on VI chips first.

The code is currently here:
https://github.com/airlied/mesa/tree/semi-interesting

There is a not-interesting branch which contains all the pre-history which might be useful for someone else bringing up a vulkan driver on other hardware.

The code is pretty much based on the Intel anv driver, with the winsys ported from the gallium driver,
and most of the state setup from there. Bas wrote the code to connect NIR<->LLVM IR so we could reuse it in the future for SPIR-V in GL if required. It also copies AMD addrlib over (this should be shared).

Also we don't do SPIR-V->LLVM directly. We use NIR as it has the best chance for inter-shader-stage optimisations (vertex/fragment combined), which neither SPIR-V nor LLVM handles for us (NIR doesn't do it yet, but it can).

If you want to submit bug reports, they will only be taken seriously if accompanied by working patches at this stage, and we've no plans to merge to master yet, but open to discussion on when we could do that and what would be required.

19 Jul 2016 7:59pm GMT

Bastien Nocera: GUADEC Flatpak contest

I will be presenting a lightning talk during this year's GUADEC, and running a contest related to what I will be presenting.

Contest

To enter the contest, you will need to create a Flatpak for a piece of software that hasn't been flatpak'ed up to now (application, runtime or extension), hosted in a public repository.

You will have to send me an email about the location of that repository.

I will choose a winner amongst the participants, on the eve of the lightning talks, depending on, but not limited to, the difficulty of packaging, the popularity of the software packaged and its redistributability potential.

You can find plenty of examples (and a list of already packaged applications and runtimes) on this Wiki page.

Prize

A piece of hardware that you can use to replicate my presentation (or to replicate my attempts at a presentation, depending ;). You will need to be present during my presentation at GUADEC to claim your prize.

Good luck to one and all!

19 Jul 2016 2:39pm GMT

Daniel Vetter: New Blog Engine!

I finally unlazied and moved my blog away from the Google mothership to something simple, fast and statically generated. It's built on Jekyll, hosted on github. It's not quite as fancy as the old one, but with some googling I figured out how to add pages for tags and an archive section, and that's about all that's really needed.

Comments are gone too, because I couldn't be bothered, and because everything seems to add Orwellian amounts of trackers. Ping me on IRC, by mail or on twitter instead. The share buttons are also just plain links now without tracking for Twitter (because I'm there) and G+ (because all the cool kernel hackers are there, but I'm not cool enough).

And in case you wonder why I blather for so long about this change: I need a new blog entry to double check that the generated feeds are still at the right spots for the various planets to pick them up …

19 Jul 2016 12:00am GMT