02 Apr 2015

feedplanet.freedesktop.org

Daniel Vetter: Community Code of Conduct for intel-gfx

[This is a cross-post from the mail I just sent out to intel-gfx.]

Code of conducts seem to be in the news a bit recently, and I realized that I've never really documented how we run things. It's different from the kernel's overall CodeOfConflict and also differs from the official Intel/OTC one in small details about handling issues. And for completeness there's also the Xorg Foundation event policy. Anyway, I think this is worth clarifying and here it goes.

It's simple: Be respectful, open and excellent to each another.

Which doesn't mean we want to sacrifice quality to be nice. Striving for technical excellence very much doesn't exclude being excellent to someone else, and in our experience it tends to go hand in hand.

Unfortunately things go south occasionally. So if you feel threatened, personally abused or otherwise uncomfortable, even and especially when you didn't participate in a discussion yourself, then please raise this in private with the drm/i915 maintainers (currently Daniel Vetter and Jani Nikula, see MAINTAINERS for contact information). And the "in private" part is important: Humans screw up, disciplining minor fumbles by tarnishing someones google-able track record forever is out of proportion.

Still there are some teeth to this code of conduct:

1. First time around minor issues will be raised in private.

2. On repeat cases a public reply in the discussion will enforce that respectful behavior is expected.

3. We'll ban people who don't get it.

And severe cases will be escalated much quicker.

This applies to all community communication channels (irc, mailing list and bugzilla). And as mentioned this really just is a public clarification of the rules already in place - you can't see that though since we never had to go further than step 1.

Let's keep it at that.

And in case you have a problem with an individual drm/i915 maintainer and don't want to raise it with the other one there's the Xorg BoD, linux foundation TAB and the drm upstream maintainer Dave Airlie.

02 Apr 2015 7:53am GMT

30 Mar 2015

feedplanet.freedesktop.org

Matthias Klumpp: Limba Project: Another progress report

And once again, it's time for another Limba blogpost :-)limba-small

Limba is a solution to install 3rd-party software on Linux, without interfering with the distribution's native package manager. It can be useful to try out different software versions, use newer software on a stable OS release or simply to obtain software which does not yet exist for your distribution.

Limba works distribution-independent, so software authors only need to publish their software once for all Linux distributions.

I recently released version 0.4, with which all most important features you would expect from a software manager are complete. This includes installing & removing packages, GPG-signing of packages, package repositories, package updates etc. Using Limba is still a bit rough, but most things work pretty well already.

So, it's time for another progress report. Since a FAQ-like list is easier to digest. compared to a long blogpost, I go with this again. So, let's address one important general question first:

How does Limba relate to the GNOME Sandboxing approach?

(If you don't know about GNOMEs sandboxes, take a look at the GNOME Wiki - Alexander Larsson also blogged about it recently)

First of all: There is no rivalry here and no NIH syndrome involved. Limba and GNOMEs Sandboxes (XdgApp) are different concepts, which both have their place.

The main difference between both projects is the handling of runtimes. A runtime is the shared libraries and other shared ressources applications use. This includes libraries like GTK+/Qt5/SDL/libpulse etc. XdgApp applications have one big runtime they can use, built with OSTree. This runtime is static and will not change, it will only receive critical security updates. A runtime in XdgApp is provided by a vendor like GNOME as a compilation of multiple single libraries.

Limba, on the other hand, generates runtimes on the target system on-the-fly out of several subcomponents with dependency-relations between them. Each component can be updated independently, as long as the dependencies are satisfied. The individual components are intended to be provided by the respective upstream projects.

Both projects have their individual up and downsides: While the static runtime of XdgApp projects makes testing simple, it is also harder to extend and more difficult to update. If something you need is not provided by the mega-runtime, you will have to provide it by yourself (e.g. we will have some applications ship smaller shared libraries with their binaries, as they are not part of the big runtime).

Limba does not have this issue, but instead, with its dynamic runtimes, relies on upstreams behaving nice and not breaking ABIs in security updates, so existing applications continue to be working even with newer software components.

Obviously, I like the Limba approach more, since it is incredibly flexible, and even allows to mimic the behaviour of GNOMEs XdgApp by using absolute dependencies on components.

Do you have an example of a Limba-distributed application?

Yes! I recently created a set of package for Neverball - Alexander Larsson also created a XdgApp bundle for it, and due to the low amount of stuff Neverball depends on, it was a perfect test subject.

One of the main things I want to achieve with Limba is to integrate it well with continuous integration systems, so you can automatically get a Limba package built for your application and have it tested with the current set of dependencies. Also, building packages should be very easy, and as failsafe as possible.

You can find the current Neverball test in the Limba-Neverball repository on Github. All you need (after installing Limba and the build dependencies of all components) is to run the make_all.sh script.

Later, I also want to provide helper tools to automatically build the software in a chroot environment, and to allow building against the exact version depended on in the Limba package.

Creating a Limba package is trivial, it boils down to creating a simple "control" file describing the dependencies of the package, and to write an AppStream metadata file. If you feel adventurous, you can also add automatic build instructions as a YAML file (which uses a subset of the Travis build config schema)

This is the Neverball Limba package, built on Tanglu 3, run on Fedora 21:

Limba-installed Neverball

Which kernel do I need to run Limba?

The Limba build tools run on any Linux version, but to run applications installed with Limba, you need at least Linux 3.18 (for Limba 0.4.2). I plan to bump the minimum version requirement to Linux 4.0+ very soon, since this release contains some improvements in OverlayFS and a few other kernel features I am thinking about making use of.

Linux 3.18 is included in most Linux distributions released in 2015 (and of course any rolling release distribution and Fedora have it).

Building all these little Limba packages and keeping them up-to-date is annoying…

Yes indeed. I expect that we will see some "bigger" Limba packages bundling a few dependencies, but in general this is a pretty annoying property of Limba currently, since there are so few packages available you can reuse. But I plan to address this. Behind the scenes, I am working on a webservice, which will allow developers to upload Limba packages.

This central ressource can then be used by other developers to obtain dependencies. We can also perform some QA on the received packages, map the available software with CVE databases to see if a component is vulnerable and publish that information, etc.

All of this is currently planned, and I can't say a lot more yet. Stay tuned! (As always: If you want to help, please contact me)

Are the Limba interfaces stable? Can I use it already?

The Limba package format should be stable by now - since Limba is still Alpha software, I will however, make breaking changes in case there is a huge flaw which makes it reasonable to break the IPK package format. I don't think that this will happen though, as the Limba packages are designed to be easily backward- and forward compatible.

For the Limba repository format, I might make some more changes though (less invasive, but you might need to rebuilt the repository).

tl;dr: Yes! Plase use Limba and report bugs, but keep in mind that Limba is still in an early stage of development, and we need bug reports!

Will there be integration into GNOME-Software and Muon?

From the GNOME-Software side, there were positive signals about that, but some technical obstancles need to be resolved first. I did not yet get in contact with the Muon crew - they are just implementing AppStream, which is a prerequisite for having any support for Limba[1].

Since PackageKit dropped the support for plugins, every software manager needs to implement support for Limba.


So, thanks for reading this (again too long) blogpost :) There are some more exciting things coming soon, especially regarding AppStream on Debian/Ubuntu!

[1]: And I should actually help with the AppStream support, but currently I can not allocate enough time to take that additional project as well - this might change in a few weeks. Also, Muon does pretty well already!

30 Mar 2015 7:46pm GMT

25 Mar 2015

feedplanet.freedesktop.org

Bastien Nocera: GNOME 3.16 is out!

Did you see?

It will obviously be in Fedora 22 Beta very shortly.


What happened since 3.14? Quite a bit, and a number of unfinished projects will hopefully come to fruition in the coming months.

Hardware support

After quite a bit of back and forth, automatic rotation for tablets will not be included directly in systemd/udev, but instead in a separate D-Bus daemon. The daemon has support for other sensor types, Ambient Light Sensors (ColorHug ALS amongst others) being the first ones. I hope we have compass support soon too.

Support for the Onda v975w's touchscreen and accelerometer are now upstream. Work is on-going for the Wi-Fi driver.

I've started some work on supporting the much hated Adaptive keyboard on the X1 Carbon 2nd generation.

Technical debt

In the last cycle, I've worked on triaging gnome-screensaver, gnome-shell and gdk-pixbuf bugs.

The first got merged into the second, the second got plenty of outdated bugs closed, and priorities re-evaluated as a result.

I wrangled old patches and cleaned up gdk-pixbuf. We still have architectural problems in the library for huge images, but at least we're up to a state where we know what the problems are, not being buried in Bugzilla.

Foundation building

A couple of projects got started that didn't reached maturation yet. I'm pretty happy that we're able to use gnome-books (part of gnome-documents) today to read Comic books. ePub support is coming!



Grilo saw plenty of activity. The oft requested "properties" page in Totem is closer than ever, so is series grouping.

In December, Allan and I met with the ABRT team, and we've landed some changes we discussed there, including a simple "Report bugs" toggle in the Privacy settings, with a link to the OS' privacy policy. The gnome-abrt application had a facelift, but we got somewhat stuck on technical problems, which should get solved in the next cycle. The notifications were also streamlined and simplified.



I'm a fan

Of the new overlay scrollbars, and the new gnome-shell notification handling. And I'm cheering on co-new app in 3.16, GNOME Calendar.

There's plenty more new and interesting stuff in the release, but I would just be duplicating much of the GNOME 3.16 release notes.

25 Mar 2015 3:23pm GMT

20 Mar 2015

feedplanet.freedesktop.org

Bastien Nocera: "GNOME à 15 ans" aux JdLL de Lyon



Le week-end prochain, je vais faire une petite présentation sur les quinze ans de GNOME aux JdLL.

Si les dieux de la livraison sont cléments, GNOME devrait aussi avoir une présence dans le village associatif.

20 Mar 2015 4:25pm GMT

Pekka Paalanen: Weston repaint scheduling

Now that Presentation feedback has finally landed in Weston (feedback, flags), people are starting to pay attention to the output timings as now you can better measure them. I have seen a couple of complaints already that Weston has an extra frame of latency, and this is true. I also have a patch series to fix it that I am going to propose.

To explain how the patch series affects Weston's repaint loop, I made some JSON-timeline recordings before and after, and produced some graphs with Wesgr. Here I will explain how the repaint loop works timing-wise.

Original post Feb 11, 2015.
Update Mar 20, 2015: the patches have landed in Weston.


The old algorithm

The old repaint scheduling algorithm in Weston repaints immediately on receiving the pageflip completion event. This maximizes the time available for the compositor itself to repaint, but it also means that clients can never hit the very next vblank / pageflip.

Figure 1. The old algorithm, the client paints as response to frame callbacks.


Frame callback events are sent at the "post repaint" step. This gives clients almost a full frame's time to draw and send their content before the compositor goes to "begin repaint" again. In Figure 1. you see, that if a client paints extremely fast, the latency to screen is almost two frame periods. The frame latency can never be less than one frame period, because the compositor samples the surface contents (the "repaint flush" point) immediately after the previous vblank.

Figure 2. The old algorithm, the client paints as response to Presentation feedback events.


While frame callback driven clients still get to the full frame rate, the situation is worse if the client painting is driven by presentation_feedback.presented events. The intent is to draw and show a new frame as soon as the old frame was shown. Because Weston starts repaint immediately on the pageflip completion, which is essentially the same time when Presentation feedback is sent, the client cannot hit the repaint of this frame and gets postponed to the next. This is the same two frame latency as with frame callbacks, but here the framerate is halved because the client waits for the frame to be actually shown before continuing, as is evident in Figure 2.

Figure 3. The old algorithm, client posts a frame while the compositor is idle.


Figure 3. shows a less relevant case, where the compositor is idle while a client posts a new frame ("damage commit"). When the compositor is idle graphics-wise (the gray background in the figure), it is not repainting continuously according to the output scanout cycle. To start painting again, Weston waits for an extra vblank first, then repaints, and then the new frame is shown on the next vblank. This is also a 1-2 frame period latency, but it is unrelated to the other two cases, and is not changed by the patches.

The modification to the algorithm

The modification is simple, yet perhaps counter-intuitive at first. We reduce the latency by adding a delay. The "delay before repaint" is in all the figures, and the old algorithm is essentially using a zero delay. The compositor's repaint is delayed so that clients have a chance to post a new frame before the compositor samples the surface contents.

A good amount of delay is a hard question. Too small delay and clients do not have time to act. Too long delay and the compositor itself will be in danger of missing the vblank deadline. I do not know what a good amount is or how to derive it, so I just made it configurable. You can set the repaint window length in milliseconds in weston.ini. The repaint window is the time from starting repaint to the deadline, so the delay is the frame period minus the repaint window. If the repaint window is too long for a frame period, the algorithm will reduce to the old behaviour.

The new algorithm

The following figures are made with a 60 Hz refresh and a 7 millisecond repaint window.

Figure 4. The new algorithm, the client paints as response to frame callback.


When a client paints as response to the frame callback (Figure 4), it still has a whole frame period of time to paint and post the frame. The total latency to screen is a little shorter now, by the length of the delay before compositor's repaint. It is a slight improvement.

Figure 5. The new algorithm, the client paints as response to Presentation feedback.


A significant improvement can be seen in Figure 5. A client that uses the Presentation extension to wait for a frame to be actually shown before painting again is now able to reach the full output frame rate. It just needs to paint and post a new frame during the delay before compositor's repaint. This mode of operation provides the shortest possible latency to screen as the client is able to target the very next vblank. The latency is below one frame period if the deadlines are met.

Discussion

This is a relatively simple change that should reduce display latency, but analyzing how exactly it affects things is not trivial. That is why Wesgr was born.

This change does not really allow clients to wait some additional time before painting to reduce the latency even more, because nothing tells clients when the compositor will repaint exactly. The risk of missing an unknown deadline grows the later a client paints. Would knowing the deadline have practical applications? I'm not sure.

These figures also show the difference between the frame callback and Presentation feedback. When a client's repaint loop is driven by frame callbacks, it maximizes the time available for repainting, which reduces the possibility to miss the deadline. If a client drives its repaint loop by Presentation feedback events, it minimizes the display latency at the cost of increased risk of missing the deadline.

All the above ignores a few things. First, we assume that the time of display is the point of vblank which starts to scan out the new frame. Scanning out a frame actually takes most of the frame period, it's not instantaneous. Going deeper, updating the framebuffer during scanout period instead of vblank could allow reducing latency even more, but the matter becomes complicated and even somewhat subjective. I hear some people prefer tearing to reduce the latency further. Second, we also ignore any fencing issues that might come up in practise. If a client submits a GPU job that takes a long while, there is a good chance it will cause everything to miss a deadline or more.

As usual, this work and most of the development of JSON-timeline and Wesgr were sponsored by Collabora.

PS. Latency and timing issues are nothing new. Owen Taylor has several excellent posts on related subjects in his blog.

20 Mar 2015 10:25am GMT

19 Mar 2015

feedplanet.freedesktop.org

Peter Hutterer: Lenovos X1 Carbon 3rd touchpad woes

[Update 19/03/15]: TLDR: kernel patches queued for 4.0 do not require any userspace changes.

Lenovo released a new set of laptops for 2015 with a new (old) feature: the trackpoint device has the physical buttons back. Last year's experiment apparently didn't work out so well.

What do we do in Linux with the last generation's touchpads? The kernel marks them with INPUT_PROP_TOPBUTTONPAD based on the PNPID [1]. In the X.Org synaptics driver and libinput we take that property and emulate top software buttons based on it. That took us a while to get sorted and feed into the myriad of Linux distributions out there but at some point last year the delicate balance of nature was restored and the touchpad-related rage dropped to the usual background noise.

Slow-forward to 2015 and Lenovo introduces the new series. In the absence of unnecessary creativity they are called the X1 Carbon 3rd, T450, T550, X250, W550, L450, etc. Lenovo did away with the un(der)-appreciated top software buttons and re-introduced physical buttons for the trackpoint. Now, that's bound to make everyone happy again. However, as we learned from Agent Smith, happiness is not the default state of humans so Lenovo made sure the harvest is safe.

What we expected to happen was that the trackpoint device has BTN_LEFT, BTN_MIDDLE, BTN_RIGHT and the touchpad has BTN_LEFT and is marked with INPUT_PROP_BUTTONPAD (i.e. it is a Clickpad). That is the case on the x220 generation and the T440 generation. Though the latter doesn't actually have trackpoint buttons and we emulated them in software.

On the X1 Carbon 3rd, the trackpoint has BTN_LEFT, BTN_MIDDLE, BTN_RIGHT but they never send events. The touchpad has BTN_LEFT and BTN_0, BTN_1 and BTN_2 [2]. Clicking the left button on the trackpoint generates BTN_0 on the touchpad device, clicking the right button generates BTN_1 on the touchpad device. So in short, Lenovo has decided to wire the newly re-introduced trackpoint buttons to the touchpad, not the trackpoint. [3] The middle button is currently dead, which is a kernel bug. Meanwhile we think of it as security feature - never accidentally paste your password into your IRC session again!

What does this mean for us? Neither synaptics nor evdev nor libinput currently support this so we've been busy apidae and writing patches like crazy. [Update 19/03/15]: The patches queued in Dmitry's for-linus branch re-route the trackstick buttons in the kernel through the trackstick device. To userspace, the trackstick thus looks the same as in the past and no userspace patches are needed. I've reverted the synaptics patch already, the libinput patch will follow. The patch goes into the kernel and udev.... The two patches needed go into the kernel and udev, and libinput. No, the three patches needed go into the kernel, udev and libinput, and synaptics. The four patches, no, wait. Amongst the projects needing patches are the kernel, udev, libinput and synaptics. I'll try again:

With those put together, things pretty much work as they're supposed to. libinput handles middle button scrolling as well this way but synaptics won't, much for the same reason it didn't work in the previous generation: synaptics can't talk to evdev and vice versa. And given that synaptics is on life support or in pallative care, depending how you look at it, I recommend not holding your breath for a fix. Otherwise you may join it quickly.

Note that all the patches are fresh off the presses and there may be a few bits changing before they are done. If you absolutely can't live without the trackpoint's buttons you can work around it by disabling the synaptics kernel driver until the patches have trickled down to your distribution.

The tracking bug for all this is Bug 88609. Feel free to CC yourself on it. Bring popcorn.

Final note: I haven't seen logs from the T450, T550, ... devices yet yet so this is so far only confirmed on the X1 Carbon so far. Given the hardware is essentially identical I expect it to be true for the rest of the series though.

[1] We also apply quirks for the 2013 generation because the firmware was buggy - a problem Synaptics Inc. has since fixed (but currently gives us slight headaches).
[2] It is also marked with INPUT_PROP_TOPBUTTONPAD which is a bug. It uses a new PNPID but one that was in the range we previously believed was for pads without trackpoint buttons. That's an an easy thing to fix.
[3] The reason for that seems to be HW design: this way they can keep the same case/keyboard and just swap the touchpad bits.
[4] synaptics is old enough to support dedicated scroll buttons. Buttons that used to send BTN_0 and BTN_1 and are thus interpreted as scroll up/down event.

19 Mar 2015 12:48am GMT

16 Mar 2015

feedplanet.freedesktop.org

Dave Airlie: virgil3d local rendering test harness

So I've still been working on the virgil3d project along with part time help from Marc-Andre and Gerd at Red Hat, and we've been making steady progress. This post is about a test harness I just finished developing for adding and debugging GL features.

So one of the more annoying issuess with working on virgil has been that while working on adding 3D renderer features or trying to track down a piglit failure, you generally have to run a full VM to do so. This adds a long round trip in your test/development cycle.

I'd always had the idea to do some sort of local system renderer, but there are some issues with calling GL from inside a GL driver. So my plan was to have a renderer process which loads the renderer library that qemu loads, and a mesa driver that hooks into the software rasterizer interfaces. So instead of running llvmpipe or softpipe I have a virpipe gallium wrapper, that wraps my virgl driver and the sw state tracker via a new vtest winsys layer for virgl.

So the virgl pipe driver sits on top of the new winsys layer, and the new winsys instead of using the Linux kernel DRM apis just passes the commands over a UNIX socket to a remote server process.

The remote server process then uses EGL and the renderer library, forks a new copy for each incoming connection and dies off when the rendering is done.

The final rendered result has to be read back over the socket, and then the sw winsys is used to putimage the rendering onto the screen.

So this system is probably going to be slower in raw speed terms, but for developing features or debugging fails it should provide an easier route without the overheads of the qemu process. I was pleasantly surprised it only took two days to pull most of this test harness together which was neat, I'd planned much longer for it!

The code lives in two halves.
http://cgit.freedesktop.org/~airlied/virglrenderer
http://cgit.freedesktop.org/~airlied/mesa virgl-mesa-driver

[updated: pushed into the main branches]

Also the virglrenderer repo is standalone now, it also has a bunch of unit tests in it that are run using valgrind also, in an attempt to lock down some more corners of the API and test for possible ways to escape the host.

16 Mar 2015 10:30pm GMT

06 Mar 2015

feedplanet.freedesktop.org

Peter Hutterer: Why libinput doesn't support edge scrolling

libinput supports edge scrolling since version 0.7.0. Whoops, how does the post title go with this statement? Well, libinput supports edge scrolling, but only on some devices and chances are your touchpad won't be one of them. Bug 89381 is the reference bug here.

First, what is edge scrolling? As the libinput documentation illustrates, it is scrolling triggered by finger movement within specific regions of the touchpad - the left and bottom edges for vertical and horizontal scrolling, respectively. This is in contrast to two-finger scrolling, triggered by a two-finger movement, anywhere on the touchpad. synaptics had edge scrolling since at least 2002, the earliest commit in the repo. Back then we didn't have multitouch-capable touchpads, these days they're the default and you'd be struggling to find one that doesn't support at least two fingers. But back then edge-scrolling was the default, and touchpads even had the markings for those scroll edges painted on.

libinput adds a whole bunch of features to the touchpad driver, but those features make it hard to support edge scrolling. First, libinput has quite smart software button support. Those buttons are usually on the lowest ~10mm of the touchpad. Depending on finger movement and position libinput will send a right button click, movement will be ignored, etc. You can leave one finger in the button area while using another finger on the touchpad to move the pointer. You can press both left and right areas for a middle click. And so on. On many touchpads the vertical travel/physical resistance is enough to trigger a movement every time you click the button, just by your finger's logical center moving.

libinput also has multi-direction scroll support. Traditionally we only sent one scroll event for vertical/horizontal at a time, even going as far as locking the scroll direction. libinput changes this and only requires a initial threshold to start scrolling, after that the caller will get both horizontal and vertical scroll information. The reason is simple: it's context-dependent when horizontal scrolling should be used, so a global toggle to disable doesn't make sense. And libinput's scroll coordinates are much more fine-grained too, which is particularly useful for natural scrolling where you'd expect the content to move with your fingers.

Finally, libinput has smart palm detection. The large majority of palm touches are along the left and right edges of the touchpad and they're usually indistinguishable from finger presses (same pressure values for example). Without palm detection some laptops are unusable (e.g. the T440 series).

These features interfere heavily with edge scrolling. Software button areas are in the same region as the horizontal scroll area, palm presses are in the same region as the vertical edge scroll area. The lower vertical edge scroll zone overlaps with software buttons - and that's where you would put your finger if you'd want to quickly scroll up in a document (or down, for natural scrolling). To support edge scrolling on those touchpads, we'd need heuristics and timeouts to guess when something is a palm, a software button click, a scroll movement, the start of a scroll movement, etc. The heuristics are unreliable, the timeouts reduce responsiveness in the UI. So our decision was to only provide edge scrolling on touchpads where it is required, i.e. those that cannot support two-finger scrolling, those with physical buttons. All other touchpads provide only two-finger scrolling. And we are focusing on making 2 finger scrolling good enough that you don't need/want to use edge scrolling (pls file bugs for anything broken)

Now, before you get too agitated: if edge scrolling is that important to you, invest the time you would otherwise spend sharpening pitchforks, lighting torches and painting picket signs into developing a model that allows us to do reliable edge scrolling in light of all the above, without breaking software buttons, maintaining palm detection. We'd be happy to consider it.

06 Mar 2015 1:23am GMT

Peter Hutterer: libinput scroll sources

This feature got merged for libinput 0.8 but I noticed I hadn't blogged about it. So belatedly, here is a short description of scroll sources in libinput.

Scrolling is a fairly simple concept. You move the mouse wheel and the content moves down. Beyond that the details get quite nitty, possibly even gritty. On touchpads, scrolling is emulated through a custom finger movement (e.g. two-finger scrolling). A mouse wheel moves in discrete steps of (usually) 15 degrees, a touchpad's finger movement is continuous (within the device physical resolution). Another scroll method is implemented for the pointing stick: holding the middle button down while moving the stick will generate scroll events. Like touchpad scroll events, these events are continuous. I'll ignore natural scrolling in this post because it just inverts the scroll direction. Kinetic scrolling ("fling scrolling") is a comparatively recent feature: when you lift the finger, the final finger speed determines how long the software will keep emulating scroll events. In synaptics, this is done in the driver and causes all sorts of issues - the driver may keep sending scroll events even while you start typing.

In libinput, there is no kinetic scrolling at all, what we have instead are scroll sources. Currently three sources are defined, wheel, finger and continuous. Wheel is obvious, it provides the physical value in degrees (see this post) and in discrete steps. The "finger" source is more interesting, it is the hint provided by libinput that the scroll event is caused by a finger movement on the device. This means that a) there are no discrete steps and b) libinput guarantees a terminating scroll event when the finger is lifted off the device. This enables the caller to implement kinetic scrolling: simply wait for the terminating event and then calculate the most recent speed. More importantly, because the kinetic scrolling implementation is pushed to the caller (who will push it to the client when the Wayland protocol for this is ready), kinetic scrolling can be implemented on a per-widget basis.

Finally, the third source is "continuous". The only big difference to "finger" is that we can't guarantee that the terminating event is sent, simply because we don't know if it will happen. It depends on the implementation. For the caller this means: if you see a terminating scroll event you can use it as kinetic scroll information, otherwise just treat it normally.

For both the finger and the continuous sources the scroll distance provided by libinput is equivalent to "pixels", i.e. the value that the relative motion of the device would otherwise send. This means the caller can interpret this depending on current context too. Long-term, this should make scrolling a much more precise and pleasant experience than the old X approach of "You've scrolled down by one click".

The API documentation for all this is here: http://wayland.freedesktop.org/libinput/doc/latest/group__event__pointer.html, search for anything with "pointer_axis" in it.

06 Mar 2015 12:56am GMT

28 Feb 2015

feedplanet.freedesktop.org

Corbin Simpson: Monte: Typhon Profiling

Work continues on Typhon. I've recently yearned for a way to study the Monte-level call stacks for profiling feedback. After a bit of work, I think that I've built some things that will help me.

My initial desire was to get the venerable perf to work with Typhon. perf's output is easy to understand, with a little practice, and describes performance problems pretty well.

I'm going to combine this with Brendan Gregg's very cool flame graph system for visually summarizing call stacks, in order to show off how the profiling information is being interpreted. I like flame graphs and they were definitely a goal of this effort.

Maybe perf doesn't need any help. I was able to use it to isolate some Typhon problems last week. I'll use my Monte webserver for all of these tests, putting it under some stress and then looking at the traces and flame graphs.

Now seems like a good time to mention that my dorky little webserver is not production-ready; it is literally just good enough to respond to Firefox, siege, and httperf with a 200 OK and a couple bytes of greeting. This is definitely a microbenchmark.

With that said, let's look at what perf and flame graphs say about webserver performance:

An unhelpful HTTP server profile

You can zoom in on this by clicking. Not that it'll help much. This flame graph has two big problems:

  1. Most of the time is spent in the mysterious "[unknown]" frames. I bet that those are just caused by the JIT's code generation, but perf doesn't know that they're meaningful or how to label them.
  2. The combination of JIT and builtin objects with builtin methods result in totally misleading call stacks, because most object calls don't result in new methods being added to the stack.

I decided to tackle the first problem first, because it seemed easier. Digging a bit, I found a way to generate information on JIT-created code objects and get that information to perf via a temporary file.

The technique is only documented via tribal knowledge and arcane blog entries. (I suppose that, in this regard, I am not helping.) It is described both in this kernel patch implementing the feature, and also in this V8 patch. The Typhon JIT hooks show off my implementation of it.

So, does it work? What does it look like?

I didn't upload a picture of this run, because it doesn't look different from the earlier graph! The big [unknown] frames aren't improved at all. Sufficient digging will reveal the specific newly-annotated frames being nearly never called. Clearly this was not a winning approach.

At this point, I decided to completely change my tack. I wrote a completely new call stack tracer inside Typhon. I wanted to do a sampling profiler, but sampling is hard in RPython. The vmprof project might fix that someday. For now, I'll have to do a precise profiler.

Unlabeled HTTP server profile with correct atoms

I omitted the coding montage. Every time a call is made from within SmallCaps, the profiler takes a time measurement before and after the call. This is pretty great! But can we get more useful names?

Names in Monte are different from names in, say, Python or Java. Python and Java both have class names. Since Monte does not have classes, Monte doesn't have a class name. A compromise which we accept here is to use the "display name" of the class, which will be the pattern used to bind a user-level object literal, and will be the name of the class for all of the runtime object classes. This is acceptable.

HTTP server profile with correct atoms and useful display names

Note how the graphs are differently shaped; all of the frames are being split out properly and the graph is more detailed as a result. The JIT is still active during this entire venture, and it'd be cool to see what the JIT is doing. We can use RPython's rpython.rlib.jit.we_are_jitted() function to mark methods as being JIT'd, and we can ask the flame graph generator to colorize them.

HTTP server profile with JIT COLORS HOLY FUCK I CANNOT BELIEVE IT WORKS

Oh man! This is looking pretty cool. Let's colorize the frames that are able to sit directly below JIT entry points. I do this with a heuristic (regular expression).

THE COLORS NEVER STOP, CAN'T STOP, WON'T STOP

This isn't even close to the kind of precision and detail from the amazing Java-on-Illumos profiles on Gregg's site, but it's more than enough to help my profiling efforts.

28 Feb 2015 4:13pm GMT

26 Feb 2015

feedplanet.freedesktop.org

Bastien Nocera: Another fake flash story

I recently purchased a 64GB mini SD card to slot in to my laptop and/or tablet, keeping media separate from my home directory pretty full of kernel sources.

This Samsung card looked fast enough, and at 25€ include shipping, seemed good enough value.


Hmm, no mention of the SD card size?


The packaging looked rather bare, and with no mention of the card's size. I opened up the packaging, and looked over the card.

Made in Taiwan?


What made it weirder is that it says "made in Taiwan", rather than "Made in Korea" or "Made in China/PRC". Samsung apparently makes some cards in Taiwan, I've learnt, but I didn't know that before getting suspicious.

After modifying gnome-multiwriter's fake flash checker, I tested the card, and sure enough, it's an 8GB card, with its firmware modified to show up as 67GB (67GB!). The device (identified through the serial number) is apparently well-known in swindler realms.

Buyer beware, do not buy from "carte sd" on Amazon.fr, and always check for fake flash memory using F3 or h2testw, until udisks gets support for this.

Amazon were prompt in reimbursing me, but the Comité national anti-contrefaçon and Samsung were completely uninterested in pursuing this further.

In short:

26 Feb 2015 10:57am GMT

23 Feb 2015

feedplanet.freedesktop.org

Christian Schaller: Reliable BIOS updates in Fedora

Some years ago I bought myself a new laptop, deleted the windows partition and installed Fedora on the system. Only to later realize that the system had a bug that required a BIOS update to fix and that the only tool for doing such updates was available for Windows only. And while some tools and methods have been available from a subset of vendors, BIOS updates on Linux has always been somewhat of hit and miss situation. Well luckily it seems that we will finally get a proper solution to this problem.
Peter Jones, who is Red Hat's representative to the UEFI working group and who is working on making sure we got everything needed to support this on Linux, approached me some time ago to let me know of the latest incoming update to the UEFI standard which provides a mechanism for doing BIOS updates. Which means that any system that supports UEFI 2.5 will in theory be one where we can initiate the BIOS update from Linux. So systems supporting this version of the UEFI specs is expected to become available through the course of this year and if you are lucky your hardware vendor might even provide a BIOS update bringing UEFI 2.5 support to your existing hardware, although you would of course need to do that one BIOS update in the old way.

So with Peter's help we got hold of some prototype hardware from our friends at Intel which already got UEFI 2.5 support. This hardware is currently in the hands of Richard Hughes. Richard will be working on incorporating the use of this functionality into GNOME Software, so that you can do any needed BIOS updates through GNOME Software along with all your other software update needs.

Peter and Richard will as part of this be working to define a specification/guideline for hardware vendors for how they can make their BIOS updates available in a manner we can consume and automatically check for updates. We will try to align ourselves with the requirements from Microsoft in this area to allow the vendors to either use the exact same package for both Windows and Linux or at least only need small changes to them. We can hopefully get this specification up on freedesktop.org for wider consumption once its done.

I am also already speaking with a couple of hardware vendors to see if we can pilot this functionality with them, to both encourage them to support UEFI 2.5 as quickly as possible and also work with them to figure out the finer details of how to make the updates available in a easily consumable fashion.

Our hope here is that you eventually can get almost any hardware and know that if you ever need a BIOS update you can just fire up Software and it will tell you what if any BIOS updates are available for your hardware, and then let you download and install them. For people running Fedora servers we have had some initial discussions about doing BIOS updates through Cockpit, in addition of course to the command line tools that Peter is writing for this.

I mentioned in an earlier blog post that one of our goals with the Fedora Workstation is to drain the swamp in terms of fixing the real issues making using a Linux desktop challenging, well this is another piece of that puzzle and I am really glad we had Peter working with the UEFI standards group to ensure the final specification was useful also for Linux users.

Anyway as soon as I got some data on concrete hardware that will support this I will make sure to let you know.

23 Feb 2015 4:17pm GMT

22 Feb 2015

feedplanet.freedesktop.org

Rob Clark: a4xx shaping up

So, I finally figured out the bug that was causing some incorrect rendering in xonotic (and, it turns out, to be the same bug plaguing a lot of other games/webgl/etc). The fix is pushed to upstream mesa master (and I guess I should probably push it to the 10.5 stable branch too). Now that xonotic renders correctly, I think I can finally call freedreno a4xx support usable:



Also, for fun, a little comparison between the ifc6540 board (snapdragon 805, aka apq8084), and my laptop (i5-4310U). Both have 1920x1080 resolution, both running gnome-shell and firefox (with identical settings). Laptop is fedora f21 while ifc6540 is rawhide), but it is quite close to an apples-to-apples comparision:



Obviously not a rigorous benchmark, so please don't read too much into the results. The intel is still faster overall (as it should be at it's size/price/power budget), but amazing that the gap is becoming so small between something that can be squeezed into a cell phone and dedicated laptop class chips.

22 Feb 2015 3:41pm GMT

21 Feb 2015

feedplanet.freedesktop.org

Samuel Pitoiset: Reverse engineering Windows or Linux PCI drivers with Intel VT-d and QEMU – Part 1

Today, I will describe a new way to reverse engineer PCI drivers by creating a PCI passthrough with a QEMU virtual machine. In this article, I will show you how to use the Intel VT-d technology in order to trace memory mapped input/output (MMIO) accesses of a QEMU VM. As a member of Nouveau community, this howto will only be focused on the NVIDIA's proprietary driver but it should be pretty similar for all PCI drivers.

Introduction

Reverse engineering the NVIDIA's proprietary driver is not an easy task, especially on Windows because we have no support for both mmiotrace, a toolbox for tracing memory mapped I/O access within the Linux kernel, and valgrind-mmt which allows tracing application accesses to mmaped memory.

When I started to reverse engineer NVIDIA Perfkit on Windows (for graphics performance counters) in-between the Google Summer of Code 2013 and 2014, I wrote some tools for dumping the configuration of these performance counters, but it was very painful to find multiplexers because I couldn't really trace MMIO accesses. I would have liked to use Intel VT-d but my old computer didn't support that recent technology, but recently I got a new computer and my life has changed. ;)

But what is VT-d and how to use it with QEMU ?

An input/output memory management unit (IOMMU) allows guest virtual machines to directly use peripheral devices, such as Ethernet, accelerated graphics cards, through DMA and interrupt remapping. This is called VT-d at Intel and AMD-Vi at AMD.

QEMU allows to use that technology through the VFIO driver which is an IOMMU/device agnostic framework for exposing direct device access to userspace, in a secure, IOMMU protected environment. In other words, this allows safe, non-privileged, userspace drivers. Initially developed by Cisco, VFIO is now maintened by Alex Williamson at Red Hat.

In this howto, I will use Fedora as guest OS but whatever you use it should work for both Linux and Windows OS. Let's get start.

Tested hardware

Motherboard: ASUS B85 PRO GAMER

CPU: Intel Core i5-4460 3.20GHz

GPU: NVIDIA GeForce 210 (host) and NVIDIA GeForce 9500 GT (guest)

OS: Arch Linux (host) and Fedora 21 (guest)

Prerequisites

Your CPU needs to support both virtualization and IOMMU (Intel VT-d technology, Core i5 at least). You will also need two NVIDIA GPUs and two monitors, or one with two different inputs (one plugged into your host GPU, one into your guest GPU). I would also recommend you to have a separate keyboard and mouse for the guest OS.

Step 1: Hardware setup

Check if your CPU supports virtualization.

egrep -i '^flags.*(svm|vmx)' /proc/cpuinfo

If so, enable CPU virtualization support and Intel VT-d from the BIOS.

Step 2: Kernel config

1) Modify kernel config
Device Drivers --->
    [*] IOMMU Hardware Support  --->
        [*]   Support for Intel IOMMU using DMA Remapping Devices
        [*]   Support for Interrupt Remapping
Device Drivers --->
    [*] VFIO Non-Privileged userspace driver framework  --->
        [*]   VFIO PCI support for VGA devices
Bus options (PCI etc.) --->
    [*] PCI Stub driver
2) Build kernel
3) Reboot, and check if your system has support for both IOMMU and DMA remapping
dmesg | grep -e IOMMU -e DMAR
[    0.000000] ACPI: DMAR 0x00000000BD9373C0 000080 (v01 INTEL  HSW      00000001 INTL 00000001)
[    0.019360] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap d2008c20660462 ecap f010da
[    0.019362] IOAPIC id 8 under DRHD base  0xfed90000 IOMMU 0
[    0.292166] DMAR: No ATSR found
[    0.292235] IOMMU: dmar0 using Queued invalidation
[    0.292237] IOMMU: Setting RMRR:
[    0.292246] IOMMU: Setting identity map for device 0000:00:14.0 [0xbd8a6000 - 0xbd8b2fff]
[    0.292269] IOMMU: Setting identity map for device 0000:00:1a.0 [0xbd8a6000 - 0xbd8b2fff]
[    0.292288] IOMMU: Setting identity map for device 0000:00:1d.0 [0xbd8a6000 - 0xbd8b2fff]
[    0.292301] IOMMU: Prepare 0-16MiB unity mapping for LPC
[    0.292307] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]

!!! If you have no output, you have to fix this before continuing !!!

Step 3: Build QEMU

git clone git://git.qemu-project.org/qemu.git --depth 1
cd qemu
./configure --python=/usr/bin/python2 # Python 3 is not yet supported
make && make install

You can also install QEMU from your favorite package manager, but I would recommend you to get the source code if you want to enable VFIO tracing support.

Step 4: Unbind the GPU with pci-stub

According to my hardware config, I have two NVIDIA GPUs, so blacklisting the Nouveau kernel module is not so good. Instead, I will use pci-stub in order to unbind the GPU which will be assigned to the guest OS.

NOTE: If pci-stub was built as a module, you'll need to modify /etc/mkinitcpio.conf, add pci-stub in the MODULES section, and update your initramfs.

lspci
01:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2)
05:00.0 VGA compatible controller: NVIDIA Corporation G96 [GeForce 9500 GT] (rev a1)
lspci -n
01:00.0 0300: 10de:0a65 (rev a2) # GT218
05:00.0 0300: 10de:0640 (rev a1) # G96

Now add the following kernel parameter to your bootloader.

pci-stub.ids=10de:0640

Reboot, and check.

dmesg | grep pci-stub
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-nouveau root=UUID=5f64607c-5c72-4f65-9960-d5c7a981059e rw quiet pci-stub.ids=10de:0640
[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-nouveau root=UUID=5f64607c-5c72-4f65-9960-d5c7a981059e rw quiet pci-stub.ids=10de:0640
[    0.295763] pci-stub: add 10DE:0640 sub=FFFFFFFF:FFFFFFFF cls=00000000/00000000
[    0.295768] pci-stub 0000:05:00.0: claimed by stub

Step 5: Bind the GPU with VFIO

Now, it's time to bind the GPU (the G96 card in this example) with VFIO in order to pass through it to the VM. You can use this script to make life easier:

#!/bin/bash

modprobe vfio-pci

for dev in "$@"; do
        vendor=$(cat /sys/bus/pci/devices/$dev/vendor)
        device=$(cat /sys/bus/pci/devices/$dev/device)
        if [ -e /sys/bus/pci/devices/$dev/driver ]; then
                echo $dev > /sys/bus/pci/devices/$dev/driver/unbind
        fi
        echo $vendor $device > /sys/bus/pci/drivers/vfio-pci/new_id
done

Bind the GPU:

./vfio-bind.sh 0000:05:00.0 # G96

Step 6: Testing KVM VGA-Passthrough

Let's test if it works, as root:

qemu-system-x86_64 \
    -enable-kvm \
    -M q35 \
    -m 2G \
    -cpu host, kvm=off \
    -device vfio-pci,host=05:00.0,multifunction=on,x-vga=on

If it works fine, you should see a black QEMU window with the message "Guest has not initialized the display (yet)". You will need to pass -vga none, otherwise it won't work. I'll show you all the options I use a bit later.

NOTE: kvm=off is required for some recent NVIDIA proprietary drivers because it won't be loaded if it detects KVM…

Step 7: Add USB support

At this step, we have assigned the GPU to the virtual machine, but it would be a good idea to be able to use that guest OS with a keyboard, for example. To do this, we need to add USB support to the VM. The preferred way is to pass through an entire USB controller like we already did for the GPU.

lspci | grep USB
00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 05)
00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 05)
00:1d.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 05)

Add the following line to QEMU, example for 00:14.0:

-device vfio-pci,host=00:14.0,bus=pcie.0

Before trying USB support inside the VM, you need to assign that USB controller to VFIO, but you will lose your keyboard and your mouse from the host in case they are connected to that controller.

./vfio-bind.sh 0000:00:14.0

In order to re-enable the USB support from the host, you will need to unbind the controller, and to bind it to xhci_hcd.

echo 0000:00:14.0 > /sys/bus/pci/drivers/vfio-pci/unbind
echo 0000:00:14.0 > /sys/bus/pci/drivers/xhci_hcd/bind

If you get an error with USB support, you might simply try a different controller, or try to assign USB devices by ID.

Step 8: Install guest OS

Now, it's time to install the guest OS. I installed Fedora 21 because it's just not possible to run Arch Linux inside QEMU due to a bug in syslinux… Whatever, install your favorite Linux OS and go ahead. I would also recommend to install envytools (a collection of tools developed by the members of the Nouveau community) in order to easily test the tracing support.

You can use the script below to launch a VM with VGA and USB passthrough, and all the stuff we need.

#!/bin/bash

modprobe vfio-pci

vfio_bind()
{
    dev="$1"
        vendor=$(cat /sys/bus/pci/devices/$dev/vendor)
        device=$(cat /sys/bus/pci/devices/$dev/device)
        if [ -e /sys/bus/pci/devices/$dev/driver ]; then
                echo $dev > /sys/bus/pci/devices/$dev/driver/unbind
        fi
        echo $vendor $device > /sys/bus/pci/drivers/vfio-pci/new_id
}

# Bind devices.
modprobe vfio-pci
vfio_bind 0000:05:00.0  # GPU (NVIDIA G96)
vfio_bind 0000:00:14.0  # USB controller

qemu-system-x86_64 \
    -enable-kvm \
    -M q35 \
    -m 2G \
    -hda fedora.img \
    -boot d \
    -cpu host,kvm=off \
    -vga none \
    -device vfio-pci,host=05:00.0,multifunction=on,x-vga=on \
    -device vfio-pci,host=00:14.0,bus=pcie.0


# Restore USB controller
echo 0000:00:14.0 > /sys/bus/pci/drivers/vfio-pci/unbind
echo 0000:00:14.0 > /sys/bus/pci/drivers/xhci_hcd/bind

Step 9: Enable VFIO tracing support for QEMU

1) Configure QEMU to enable tracing

Enable the stderr trace backend. Please refer to docs/tracing.txt if you want to change the backend.

./configure --python=/usr/bin/python2 --enable-trace-backends=stderr
2) Disable MMAP support

Disabling MMAP support uses the slower read/write accesses to MMIO space that will get traced. To do this, open the file include/hw/vfio/vfio-common.h, and change #define VFIO_ALLOW_MMAP from 1 to 0.

 /* Extra debugging, trap acceleration paths for more logging */
-#define VFIO_ALLOW_MMAP 1
+#define VFIO_ALLOW_MMAP 0

Re-build QEMU.

3) Add the trace points you want to observe

Create a events.txt file and add the vfio_region_write trace point which dumps MMIO read/write accesses of the GPU.

echo "vfio_region_write" > events.txt

VFIO tracing support is now enabled and configured, really easy, huh?

Thanks to Alex Williamson for these hints.

Step 10: Trace MMIO write accesses

Let's now test VFIO tracing support. Enable events tracing by adding the following line to the script which launchs the VM.

-trace events=events.txt

Launch the VM. You should see lot of traces from the standard error output, this is a good news.

Open a terminal in the VM, go to the directory where envytools has been built, and run (as root) the following command.

./nvahammer 0xa404 0xdeadbeef

This command writes a 32-bit value (0xdeadbeef) to the MMIO register at 0xa404 and repeats the write in an infinite loop. It needs to be manually aborted.

Go back to the host, and you should see the following traces if it works fine.

12347@1424299207.289770:vfio_region_write  (0000:05:00.0:region0+0xa404, 0xdeadbeef, 4)
12347@1424299207.289774:vfio_region_write  (0000:05:00.0:region0+0xa404, 0xdeadbeef, 4)
12347@1424299207.289778:vfio_region_write  (0000:05:00.0:region0+0xa404, 0xdeadbeef, 4)

In this example, we have only traced MMIO write accesses, but of course, if you want to trace read accesses, you just have to change vfio_region_write to vfio_region_read.

Congratulations!

In this article I showed you how to trace MMIO accesses using a PCI passthrough with QEMU, Intel VT-d and VFIO. However, all PCI accesses are currently traced including USB controller and this is not ideal unlike mmiotrace which only dumps accesses for one peripheral. It would be also a good idea to have the same format as mmiotrace in order to use the decoding tools we already have for it in envytools.

Future work

- do not trace all PCI accesses (device and subrange address filtering)

- VFIO traces to the mmiotrace format

- compare performance when tracing support is enabled or not

Related ressources

KVM VGA-Passthrough on ArchLinux

VGA-Passthrough on Debian

VFIO documentation

QEMU VFIO tracing documentation


21 Feb 2015 11:13am GMT

18 Feb 2015

feedplanet.freedesktop.org

Luc Verhaegen: FOSDEM and early Tamil driver work.

FOSDEM 2015

It was another great FOSDEM this year. Even with their 5-10.000 attendants, the formula of being absolutely free, with limited sponsorship, and while only making small changes each year is an absolute winner. There is just no conference which comes even close to FOSDEM.

For those on ICE14 on Friday, the highspeed train from Frankfurt to Brussels south at 14:00, who were so nice to participate in my ad-hoc visitor count: 66. I counted 66 people, but i might have skipped a few as people were sometimes too engrossed in their laptops to hear me over their headphones. On a ~400 seat train, that's a pretty high number, and i never see the same level of geekiness on the Frankfurt to Brussels trains as on the Friday before FOSDEM. If it didn't sound like an organizational nightmare, it might have been a good idea to talk to DB and get a whole carriage reserved especially for FOSDEM goers.

With the Graphics DevRoom we returned to the K building this year, and i absolutely love the cozy 80 people classrooms we have there. With good airflow, freely movable tables, and an easy way to put in my powersockets, i have an easy time as a devroom organizer. While some speakers probably prefer bigger rooms to have a higher number of attendants, there is nothing like the more direct interaction of the rooms in the K buildings. With us sitting on the top floor, we also only had people who were very actively interested in the topics of the graphics devroom.

Despite the fact that FOSDEM has no equal, and anyone who does anything with open source software in the European Union should attend, i always have a very difficult time recruiting a full set of speakers for the graphics DevRoom. Perhaps the biggest reason for this is the fact that it is a free conference, and it lacks the elitarian status of a paid-for conference. Everyone can attend your talk, even people who do not work on the kernel or on graphics drivers, and the potential speaker might feel as if he needs to waste his time on people who are below his own perceived station. Another reason may be that it is harder to convince the beancounters to sponsor a visit to a free conference. In that case, if you live in the European Union and when you are paid to do open source software, you should be able to afford going to the must-visit FOSDEM by yourself.

As for next year, i am not sure whether there will be a graphics devroom again. Speaker count really was too low and perhaps it is time for another hiatus. Perhaps it is once again time to show people that talking in a devroom at FOSDEM truly is a privilege, and not to be taken for granted.

Tamil "Driver" talk.

My talk this year, or rather, my incoherent mumble finished off with a demo, was about showing my work on the ARM Mali Midgard GPUs. For those who had to endure it, my apologies for my ill-preparedness; i poured all my efforts into the demo (which was finished on Wednesday), and spent too much time doing devroom stuff (which ate Thursday) and of course in drinking up, ahem, the event that is FOSDEM (which ate the rest of the weekend). I will try to make up for it now in this blog post.

Current Tamil Status.

As some might remember, in September and October 2013, i did some preliminary work on the Mali T-series. I spent about 3 to 3.5 weeks building the infrastructure to capture the command stream and replay it. At the same time I also dug deep into the Mali binary driver to expose the binary shader compiler. These two feats gave me all the prerequisites for doing the job of reverse engineering the command stream of the Mali Midgard.

During the Christmas holidays of 2014 and in January 2015, i spent my time building a command stream parser. This is a huge improvement over my work on lima, where i only ended doing so later on in the process (while bringing up Q3A). As i built up the capabilities of this parser, i threw ever more complex GLES2 programs at it. A week before FOSDEM, my parser was correctly handling multiple draws and frames, uniforms, attributes, varyings and textures. Instead of having raw blobs of memory, i had C structs and tables, allowing me to easily see the differences between streams.

I then took the parsed result of my most complex test and slowly turned that into actual C code, using the shader binary produced by the binary compiler, and adding a trivial sequential memory allocator. I then added rotation into the mix, and this is the demo as seen on FOSDEM (and now uploaded to youtube).

All the big things are known.

For textures. I only have simple texture support at this time, no mipmapping nor cubemapping yet, and only RGB565 and RGBA32 are supported at this time. I also still have not figured out how to disable swizzling, instead i re-use the texture swizzling code from lima, the only place where I was able to re-use code in tamil. This missing knowledge is just some busywork, and a bit of coding away.

As for programs, while both the Mali Utgard (M-series) and Midgard (T-series) binary compilers output in a format called MBS (Mali Binary Shader), the contents of each file is significantly different. I had no option but to rewrite the MBS parser for tamil.

Instead of rewriting the vertex shaders binaries like ARMs binary driver does, i reorder the entries in the attribute descriptor table to match the order as described by the shader compiler output. This avoids adding a whole lot of logic to handle this correctly, even though MBS now describes which bits to alter in the binary. I still lay uniforms, attributes and varyings out by hand though, i similarly have only limited knowledge of typing at this point. This mostly is a bit of busywork of writing up the actual logic, and trying out a few different things.

I know only very few things about most of the GL state. Again, mostly busywork with a bit of testing and coding up the results. And while many values left and right are still magic, nothing big is hiding any more.

Unlike lima, i am refraining from building up more infrastructure (than necessary to show the demo) outside of Mesa. The next step really is writing up a mesa driver. Since my lima driver for mesa was already pretty advanced, i should be able to re-use a lot of the knowledge gained there, and perhaps some code.

The demo

The demo was shown on a Samsung ARM Chromebook, with a kernel and linux installation from september 2013 (when i brought up cs capture and exposed the shader compiler). The exynos KMS driver on this 3.4.0 kernel is terrible. It only accepts a few fixed resolutions (as if I never existed and modesetting wasn't a thing), and then panics when you even look at it. Try to leave X while using HDMI: panic. Try to use a KMS plane to display the resulting render: panic.

In the end, i used fbdev and memcpy the rendered frame over to the console. On this kernel, i cannot even disable the console, so some of the visible flashing is the console either being overwritten by or overwriting the copied render.

The youtube video shows a capture of the Chromebooks built in LCD, at 1280x720 on a 1366x768 display. At FOSDEM, i was lucky that the projector accepted the 1280x720 mode the exynos hdmi driver produced. My dumb HDMI->VGA converter (which makes the image darker) was willing to pass this through directly. I have a more intelligent HDMI->VGA adapter which also does scaling and which keeps colours nice, but that one just refused the output of the exynos driver. The video that was captured in our devroom probably does not show the demo correctly, as that should've been at 1024x768.

The demo shows 3 cubes rotating in front of the milky way. It is 4 different draws, using 3 different textures, and 3 different programs. These cubes currently rotate at 47fps, with the memcpy. During the talk, the chromebook slowed down progressively down to 26fps and even 21fps at one point, but i have not seen that behaviour before or since. I do know of an issue that makes the demo fail at frame 79530, which is 100% reproducible. I still need to track this down, it probably is an issue with my job handling code.

linux-exynos.org

With Lima and Tamil i am in a very unique position. Unlike on adreno, tegra or Videocore, i have to deal with many different SoCs. Apart from the difference in kernel GPU drivers, i also have to deal with differences in display drivers and run into a whole new world of hurts every time i move over to a new target device. The information for doing a proper linux installation on an android or chrome device is usually dispersed, not up to date, and not too good, and i get to do a lot of the legwork for myself every time, knowing full well that a lot of others have done so already but couldn't be bothered to document things (hence my role in the linux-sunxi community).

The ARM chromebook and its broken kms driver is much of the same. Last FOSDEM i complained how badly supported and documented the Samsung ARM chromebook is, despite its popularity, and appealed for more linux-sunxi style, SoC specific communities, especially since I, as a graphics driver developer, cannot go and spend as much time in each and every of the SoC projects as i have done with sunxi.

During the questions round of this talk, one guy in the audience asked what needed to be done to fix the SoC pain. At first i completely missed the question, upon which he went and rephrased his question. My answer was: provide the infrastructure, make some honest noise and people will come. Usually, when some asks such a question, nothing ever comes from it. But Merlijn "Wizzup" Wajer and his buddy S.J.R. "Swabbles" van Schaik really came through.

Today there is the linux-exynos.org wiki, the linux-exynos mailinglist, and the #linux-exynos irc channel. While the community is not as large as linux-sunxi, it is steadily growing. So if you own exynos based hardware, or if your company is basing a product on the exynos chipset, head to linux-exynos.org and help these guys out. Linux-exynos deserves your support.

18 Feb 2015 9:05am GMT

16 Feb 2015

feedplanet.freedesktop.org

Julien Danjou: Hacking Python AST: checking methods declaration

A few months ago, I wrote the definitive guide about Python method declaration, which had quite a good success. I still fight every day in OpenStack to have the developers declare their methods correctly in the patches they submit.

Automation plan

The thing is, I really dislike doing the same things over and over again. Furthermore, I'm not perfect either, and I miss a lot of these kind of problems in the reviews I made. So I decided to replace me by a program - a more scalable and less error-prone version of my brain.

In OpenStack, we rely on flake8 to do static analysis of our Python code in order to spot common programming mistakes.

But we are really pedantic, so we wrote some extra hacking rules that we enforce on our code. To that end, we wrote a flake8 extension called hacking. I really like these rules, I even recommend to apply them in your own project. Though I might be biased or victim of Stockholm syndrome. Your call.

Anyway, it's pretty clear that I need to add a check for method declaration in hacking. Let's write a flake8 extension!

Typical error

The typical error I spot is the following:

class Foo(object):
# self is not used, the method does not need
# to be bound, it should be declared static
def bar(self, a, b, c):
return a + b - c


That would be the correct version:

class Foo(object):
@staticmethod
def bar(a, b, c):
return a + b - c


This kind of mistake is not a show-stopper. It's just not optimized. Why you have to manually declare static or class methods might be a language issue, but I don't want to debate about Python misfeatures or design flaws.

Strategy

We could probably use some big magical regular expression to catch this problem. flake8 is based on the pep8 tool, which can do a line by line analysis of the code. But this method would make it very hard and error prone to detect this pattern.

Though it's also possible to do an AST based analysis on on a per-file basis with pep8. So that's the method I pick as it's the most solid.

AST analysis

I won't dive deeply into Python AST and how it works. You can find plenty of sources on the Internet, and I even talk about it a bit in my book The Hacker's Guide to Python.

To check correctly if all the methods in a Python file are correctly declared, we need to do the following:

Flake8 plugin

In order to register a new plugin in flake8 via hacking, we just need to add an entry in setup.cfg:

[entry_points]
flake8.extension =
[…]
H904 = hacking.checks.other:StaticmethodChecker
H905 = hacking.checks.other:StaticmethodChecker


We register 2 hacking codes here. As you will notice later, we are actually going to add an extra check in our code for the same price. Stay tuned.

The next step is to write the actual plugin. Since we are using an AST based check, the plugin needs to be a class following a certain signature:

@core.flake8ext
class StaticmethodChecker(object):
def __init__(self, tree, filename):
self.tree = tree

def run(self):
pass


So far, so good and pretty easy. We store the tree locally, then we just need to use it in run() and yield the problem we discover following pep8 expected signature, which is a tuple of (lineno, col_offset, error_string, code).

This AST is made for walking ♪ ♬ ♩

The ast module provides the walk function, that allow to iterate easily on a tree. We'll use that to run through the AST. First, let's write a loop that ignores the statement that are not class definition.

@core.flake8ext
class StaticmethodChecker(object):
def __init__(self, tree, filename):
self.tree = tree

def run(self):
for stmt in ast.walk(self.tree):
# Ignore non-class
if not isinstance(stmt, ast.ClassDef):
continue


We still don't check for anything, but we know how to ignore statement that are not class definitions. The next step need to be to ignore what is not function definition. We just iterate over the attributes of the class definition.

for stmt in ast.walk(self.tree):
# Ignore non-class
if not isinstance(stmt, ast.ClassDef):
continue
# If it's a class, iterate over its body member to find methods
for body_item in stmt.body:
# Not a method, skip
if not isinstance(body_item, ast.FunctionDef):
continue


We're all set for checking the method, which is body_item. First, we need to check if it's already declared as static. If so, we don't have to do any further check and we can bail out.

for stmt in ast.walk(self.tree):
# Ignore non-class
if not isinstance(stmt, ast.ClassDef):
continue
# If it's a class, iterate over its body member to find methods
for body_item in stmt.body:
# Not a method, skip
if not isinstance(body_item, ast.FunctionDef):
continue
# Check that it has a decorator
for decorator in body_item.decorator_list:
if (isinstance(decorator, ast.Name)
and decorator.id == 'staticmethod'):
# It's a static function, it's OK
break
else:
# Function is not static, we do nothing for now
pass


Note that we use the special for/else form of Python, where the else is evaluated unless we used break to exit the for loop.

for stmt in ast.walk(self.tree):
# Ignore non-class
if not isinstance(stmt, ast.ClassDef):
continue
# If it's a class, iterate over its body member to find methods
for body_item in stmt.body:
# Not a method, skip
if not isinstance(body_item, ast.FunctionDef):
continue
# Check that it has a decorator
for decorator in body_item.decorator_list:
if (isinstance(decorator, ast.Name)
and decorator.id == 'staticmethod'):
# It's a static function, it's OK
break
else:
try:
first_arg = body_item.args.args[0]
except IndexError:
yield (
body_item.lineno,
body_item.col_offset,
"H905: method misses first argument",
"H905",
)
# Check next method
continue


We finally added some check! We grab the first argument from the method signature. Unless it fails, and in that case, we know there's a problem: you can't have a bound method without the self argument, therefore we raise the H905 code to signal a method that misses its first argument.

Now you know why we registered this second pep8 code along with H904 in setup.cfg. We have here a good opportunity to kill two birds with one stone.

The next step is to check if that first argument is used in the code of the method.

for stmt in ast.walk(self.tree):
# Ignore non-class
if not isinstance(stmt, ast.ClassDef):
continue
# If it's a class, iterate over its body member to find methods
for body_item in stmt.body:
# Not a method, skip
if not isinstance(body_item, ast.FunctionDef):
continue
# Check that it has a decorator
for decorator in body_item.decorator_list:
if (isinstance(decorator, ast.Name)
and decorator.id == 'staticmethod'):
# It's a static function, it's OK
break
else:
try:
first_arg = body_item.args.args[0]
except IndexError:
yield (
body_item.lineno,
body_item.col_offset,
"H905: method misses first argument",
"H905",
)
# Check next method
continue
for func_stmt in ast.walk(body_item):
if six.PY3:
if (isinstance(func_stmt, ast.Name)
and first_arg.arg == func_stmt.id):
# The first argument is used, it's OK
break
else:
if (func_stmt != first_arg
and isinstance(func_stmt, ast.Name)
and func_stmt.id == first_arg.id):
# The first argument is used, it's OK
break
else:
yield (
body_item.lineno,
body_item.col_offset,
"H904: method should be declared static",
"H904",
)


To that end, we iterate using ast.walk again and we look for the use of the same variable named (usually self, but if could be anything, like cls for @classmethod) in the body of the function. If not found, we finally yield the H904 error code. Otherwise, we're good.

Conclusion

I've submitted this patch to hacking, and, finger crossed, it might be merged one day. If it's not I'll create a new Python package with that check for flake8. The actual submitted code is a bit more complex to take into account the use of abc module and include some tests.

As you may have notice, the code walks over the module AST definition several times. There might be a couple of optimization to browse the AST in only one pass, but I'm not sure it's worth it considering the actual usage of the tool. I'll let that as an exercise for the reader interested in contributing to OpenStack. 😉

Happy hacking!

The Hacker's Guide to Python

A book I wrote talking about designing Python applications, state of the art, advice to apply when building your application, various Python tips, etc. Interested? Check it out.

16 Feb 2015 11:39am GMT