21 Nov 2014
So someone leaked 2011 era PowerVR SGX microcode and user space... And now everyone is pissing themselves like a bunch of overexcited puppies...
I've been fed links from several sides now, and i cannot believe how short-sighted and irresponsible people are, including a few people who should know better.
STOP TELLING PEOPLE TO LOOK AT PROPRIETARY CODE.
Having gotten that out of the way, I am writing this blog to put everyone straight and stop the nonsense, and to calmly explain why this leak is not a good thing.
Before i go any further, IANAL, but i clearly do seem to tread much more carefully on these issues than most. As always, feel free to debunk what i write here in the comments, especially you actual lawyers, especially those lawyers in the .EU.
LIBV and the PVR.
Let me just, once again, state my position towards the PowerVR.
I have worked on the Nokia N9, primarily on the SGX kernel side (which is of course GPLed), but i also touched both the microcode and userspace. So I have seen the code, worked with and i am very much burned on it. Unless IMG itself gives me permission to do so, i am not allowed to contribute to any open source driver for the PowerVR. I personally also include the RGX, and not just SGX, in that list, as i believe that some things do remain the same. The same is true for Rob Clark, who worked with PowerVR when at Texas Instruments.
This is, however, not why i try to keep people from REing the PowerVR.
The reason why i tell people to stay away is because of the design of the PowerVR and its driver stack: PVR is heavily microcode driven, and this microcode is loaded through the kernel from userspace. The microcode communicates directly with the kernel through some shared structs, which change depending on build options. There are sometimes extensive changes to both the microcode, kernel and userspace code depending on the revision of the SGX, customer project and build options, and sometimes the whole stack is affected, from microcode to userspace. This makes the powervr a very unstable platform: change one component, and the whole house of cards comes tumbling down. A nightmare for system integrators, but also bad news for people looking to provide a free driver for this platform. As if the murderous release cycle of mobile hardware wasn't bad enough of a moving target already.
The logic behind me attempting to keep people away from REing the PowerVR is, at one end, the attempt to focus the available decent developers on more rewarding GPUs and to keep people from burning out on something as shaky as the PowerVR. On the other hand, by getting everyone working on the other GPUs, we are slowly forcing the whole market open, singling out Imagination Technologies. At one point, IMG will be forced to either do this work itself, and/or to directly support open sourcing themselves, or to remain the black sheep forever.
None of the above means that I am against an open source driver for PVR, quite the opposite, I just find it more productive to work on the other GPUs amd wait this one out.
Given their bad reputation with system integrators, their shaky driver/microcode design, and the fact that they are in a cut throat competition with ARM, Imagination Technologies actually has the most to gain from an open source driver. It would at least take some of the pain out of that shaky microcode/kernel/userspace combination, and make a lot of peoples lives a lot easier.
This is not open source software.
Just because someone leaked this code, it has not magically become free software.
It is still just as proprietary as before. You cannot use this code in any open source project, or at all, the license on it applies just as strongly as before. If you download it, or distribute it, or whatever other actions forbidden in the license, you are just as accountable as the other parties in the chain.
So for all you kiddies who now think "Great, finally an open driver for PowerVR, let's go hack our way into celebrity", you couldn't be more wrong. At best, you just tainted yourself.
But the repercussion go further than that. The simple fact that this code has been leaked has cast a very dark shadow on any future open source project that might involve the powervr. So be glad that we have been pretty good at dissuading people from wasting their time on powervr, and that this leak didn't end up spoiling many man-years of work.
Why? Well, let's say that there was an advanced and active PowerVR reverse engineering project. Naturally, the contributors would not be able to look at the leaked code. But it goes further than that. Say that you are the project maintainer of such a reverse engineered driver, how do you deal with patches that come in from now on? Are you sure that they are not taken more or less directly from the leaked driver? How do you prove this?
Your fun project just turned from a relatively straightforward REing project to a project where patches absolutely need to be signed-off, and where you need to establish some severe trust into your contributors. That's going to slow you down massively.
But even if you can manage to keep your code clean, the stigma will remain. Even if lawyers do not get involved, you will spend a lot of time preparing yourself for such an eventuality. Not a fun position to be in.
The manpower issue.
I know that any clued and motivated individual can achieve anything. I also know that really clued people, who are dedicated and can work structuredly are extremely rare and that their time is unbelievably valuable.
With the exception of Rob, who is allowed to spend some of his redhat time on the freedreno driver, none of the people working on the open ARM GPU drivers have any support. Working on such a long haul project without support either limits the amount of time available for it, or severely reduces the living standard of the person doing so, or anywhere between those extremes. If you then factor in that there are only a handful of people working on a handful of drivers, you get individuals spending several man-years mostly on their own for themselves.
If you are wondering why ARM GPU drivers are not moving faster, then this is why. There are just a limited few clued individuals who are doing this, and they are on their own, and they have been at it for years by now. Think of that the next time you want to ask "Is it done yet?".
This is why I tried to keep people from REing the powerVR, what little talent and stamina there is can be better put to use on more straightforward GPUs. We have a hard enough time as it is already.
Less work? More work!
If you think that this leaked driver takes away much of the hard work of reverse engineering and makes writing an open source driver easy, you couldn't be more wrong.
This leak means that here is no other option left apart from doing a full clean room. And there need to be very visible and fully transparent processes in place in a case like this. Your one man memory dumper/bit-poker/driver writer just became at least two persons. One of them gets to spend his time ogling bad code (which proprietary code usually ends up being), trying to make sense of it, and then trying to write extensive documentation about it (without being able to test his findings much). The other gets to write code from that documentation, but also little more. Both sides are very much forbidden to go back and forth between those two positions.
As if we ARM GPU driver developers didn't have enough frustration to deal with, and the PVR stack isn't bad enough already, the whole situation just got much much worse.
So for all those who think that now the floodgates are open for PowerVR, don't hold your breath. And to those who now suddenly want to create an open source driver for the powervr, i ask: you and what army?
For all those who are rinsing out their shoes ask yourself how many unsupported man-years you will honestly be able to dedicate to this, and whether there will be enough individuals who can honestly claim the same. Then pick your boring task, and then stick to it. Forever. And hope that the others also stick to their side of this bargain.
What have we come to?
The leaked source code of a proprietary graphics driver is not something you should be spreading amongst your friends for "Lolz", especially not amongst your open source graphics driver developing friends.
I personally am not too bothered about the actual content of this one, the link names were clear about what it was, and I had seen it before. I was burned before, so i quickly delved in to verify that this was indeed SGX userspace. In some cases, with the links being posted publicly, i then quickly moved on to dissuade people from looking at it, for what limited success that could have had.
But what would i have done if this were Mali code, and the content was not clear from the link name? I got lucky here.
I am horrified about the lack of responsibility of a lot of people. These are not some cat pictures, or some nude celebrities. This is code that forbids people from writing graphics drivers.
But even if you haven't looked at this code yet, most of the damage has been done. A reverse engineered driver for powervr SGX will now probably never happen. Heck, i just got told that someone even went and posted the links to the powerVR REing mailinglist (which luckily has never seen much traffic). I wonder how that went:
Are you the guys doing the open source driver for PowerVR SGX?
I have some proprietary code here that could help you speed things along.
So for the person who put this up on github: thank you so much. I hope that you at least didn't use your real name. I cannot imagine that any employer would want to hire anyone who acts this irresponsibly. Your inability to read licenses means that you cannot be trusted with either proprietary code or open source code, as you seem unable to distinguish between them. Well done.
The real culprit is of course LG, for crazily sticking the GPL on this. But because one party "accidentally" sticks a GPL on that doesn't make it GPL, and that doesn't suddenly give you the right to repeat the mistake.
Last months ISA release.
And now for something slightly different...
Just over a month ago, there was the announcement about Imagination Technologies' new SDK. Supposedly, at least according to the phoronix article, Imagination Technologies made the ISA (instruction set architecture) of the RGX available in it.
This was not true.
What was released was the assembly language for the PowerVR shaders, which then needs to be assembled by the IMG RGX assembler to provide the actual shader binaries. This is definitely not the ISA, and I do not know whether it was Alexandru Voica (an Imagination marketing guy who suddenly became active on the phoronix forums, and who i believe to be the originator of this story) or the author of the article on Phoronix who made this error. I do not think that this was bad intent though, just that something got lost in translation.
The release of the assembly language is very nice though. It makes it relatively straightforward to match the assembly to the machine code, and takes away most of the pain of ISA REing.
Despite the botched message, this was a big step forwards for ARM GPU makers; Imagination delivered what its customers need (in this case, the ability to manually tune some shaders), and in the process it also made it easier for potential REers to create an open source driver.
Between the leak, the assembly release, and the market position Imagination Technologies is in, things are looking up though.
Whereas the leak made a credible open source reverse engineering project horribly impractical and very unlikely, it did remove some of the incentive for IMG to not support an open source project themselves. I doubt that IMG will now try to bullshit us with the inane patent excuse. The (not too credible) potential damage has been done here already now.
With the assembly language release, a lot of the inner workings and the optimization of the RGX shaders was also made public. So there too the barrier has disappeared.
Given the structure of the IMG graphics driver stack, system integrators have a limited level of satisfaction with IMG. I really doubt that this has improved too much since my Nokia days. Going open source now, by actively supporting some clued open source developers and by providing extensive NDA-free documentation, should not pose much of a legal or political challenge anymore, and could massively improve the perception of Imagination Technologies, and their hardware.
So go for it, IMG. No-one else is going to do this for you, and you can only gain from it!
21 Nov 2014 11:24pm GMT
A bit of history
A little more than year ago, with my colleague Yassine Lamgarchal and others at eNovance, we investigated on how to solve a problem often encountered inside OpenStack: synchronization of multiple distributed workers. And while many people in our ecosystem continue to drive development by adding new bells and whistles, we made a point of solving new problems with a generic solution able to address the technical debt at the same time.
Yassine wrote the first ideas of what should be the group membership service that was needed for OpenStack, identifying several projects that could make use of this. I've presented this concept during the OpenStack Summit in Hong-Kong during an Oslo session. It turned out that the idea was well-received, and the week following the summit we started the tooz project on StackForge.
Tooz is a Python library that provides a coordination API. Its primary goal is to handle groups and membership of these groups in distributed systems.
Tooz also provides another useful feature which is distributed locking. This allows distributed nodes to acquire and release locks in order to synchronize themselves (for example to access a shared resource).
If you are familiar with distributed systems, you might be thinking that there are a lot of solutions already available to solve these issues: ZooKeeper, the Raft consensus algorithm or even Redis for example.
You'll be thrilled to learn that Tooz is not the result of the NIH syndrome, but is an abstraction layer on top of all these solutions. It uses drivers to provide the real functionalities behind, and does not try to do anything fancy.
All the drivers do not have the same amount of functionality of robustness, but depending on your environment, any available driver might be suffice. Like most of OpenStack, we let the deployers/operators/developers chose whichever backend they want to use, informing them of the potential trade-offs they will make.
So far, Tooz provides drivers based on:
- Kazoo (ZooKeeper)
- SysV IPC (only for distributed locks for now)
- PostgreSQL (only for distributed locks for now)
- MySQL (only for distributed locks for now)
All drivers are distributed across processes. Some can be distributed across the network (ZooKeeper, memcached, redis…) and some are only available on the same host (IPC).
Also note that the Tooz API is completely asynchronous, allowing it to be more efficient, and potentially included in an event loop.
Tooz provides an API to manage group membership. The basic operations provided are: the creation of a group, the ability to join it, leave it and list its members. It's also possible to be notified as soon as a member joins or leaves a group.
Each group can have a leader elected. Each member can decide if it wants to run for the election. If the leader disappears, another one is elected from the list of current candidates. It's possible to be notified of the election result and to retrieve the leader of a group at any moment.
When trying to synchronize several workers in a distributed environment, you may need a way to lock access to some resources. That's what a distributed lock can help you with.
Adoption in OpenStack
Ceilometer is the first project in OpenStack to use Tooz. It has replaced part of the old alarm distribution system, where RPC was used to detect active alarm evaluator workers. The group membership feature of Tooz was leveraged by Ceilometer to coordinate between alarm evaluator workers.
Another new feature part of the Juno release of Ceilometer is the distribution of polling tasks of the central agent among multiple workers. There's again a group membership issue to know which nodes are online and available to receive polling tasks, so Tooz is also being used here.
This opens the door to push Tooz further in OpenStack. Our next candidate would be write a service group driver for Nova.
The complete documentation for Tooz is available online and has examples for the various features described here, go read it if you're curious and adventurous!
21 Nov 2014 12:10pm GMT
19 Nov 2014
Debian's latest round of angry mailing list threads have been about some combination of init systems, future direction and project governance. The details aren't particularly important here, and pretty much everything worthwhile in favour of or against each position has already been said several times, but I think this bit is important enough that it bears repeating: the reason I voted "we didn't need this General Resolution" ahead of the other options is that I hope we can continue to use our normal technical and decision-making processes to make Debian 8 the best possible OS distribution for everyone. That includes people who like systemd, people who dislike systemd, people who don't care either way and just want the OS to work, and everyone in between those extremes.
I think that works best when we do things, and least well when a lot of time and energy get diverted into talking about doing things. I've been trying to do my small part of the former by fixing some release-critical bugs so we can release Debian 8. Please join in, and remember to write good unblock requests so our hard-working release team can get through them in a finite time. I realise not everyone will agree with my idea of which bugs, which features and which combinations of packages are highest-priority; that's fine, there are plenty of bugs to go round!
Regarding init systems specifically, Debian 'jessie' currently works with at least systemd-sysv or sysvinit-core as pid 1 (probably also Upstart, but I haven't tried that) and I'm confident that Debian developers won't let either of those regress before it's released as Debian 8.
I expect the freeze for Debian 'stretch' (presumably Debian 9) to be a couple of years away, so it seems premature to say anything about what will or won't be supported there; that depends on what upstream developers do, and what Debian developers do, between now and then. What I can predict is that the components that get useful bug reports, active maintenance, thorough testing, careful review, and similar help from contributors will work better than the things that don't; so if you like a component and want it to be supported in Debian, you can help by, well, supporting it.
PS. If you want the Debian 8 installer to leave you running sysvinit as pid 1 after the first reboot, here's a suitable incantation to add to the kernel command-line in the installer's bootloader. This one certainly worked when KiBi asked for testing a few days ago:
preseed/late_command="in-target apt-get install -y sysvinit-core"
I think that corresponds to this line in a preseeding file, if you use those:
d-i preseed/late_command string in-target apt-get install -y sysvinit-core
A similar apt-get command, without the in-target prefix, should work on an installed system that already has systemd-sysv. Depending on other installed software, you might need to add systemd-shim to the command line too, but when I tried it, apt-get was able to work that out for itself.
If you use aptitude instead of apt-get, double-check what it will do before saying "yes" to this particular switchover: its heuristic for resolving conflicts seems to be rather more trigger-happy about removing packages than the one in apt-get.
19 Nov 2014 12:00am GMT
16 Nov 2014
Apparently, people care when you, as privileged person (white, male, long-time Debian Developer) throw in the towel because the amount of crap thrown your way just becomes too much. I guess that's good, both because it gives me a soap box for a short while, but also because if enough people talk about how poisonous the well that Debian is has become, we can fix it.
This morning, I resigned as a member of the systemd maintainer team. I then proceeded to leave the relevant IRC channels and announced this on twitter. The responses I've gotten have been almost all been heartwarming. People have generally been offering hugs, saying thanks for the work put into systemd in Debian and so on. I've greatly appreciated those (and I've been getting those before I resigned too, so this isn't just a response to that). I feel bad about leaving the rest of the team, they're a great bunch: competent, caring, funny, wonderful people. On the other hand, at some point I had to draw a line and say "no further".
Debian and its various maintainer teams are a bunch of tribes (with possibly Debian itself being a supertribe). Unlike many other situations, you can be part of multiple tribes. I'm still a member of the DSA tribe for instance. Leaving pkg-systemd means leaving one of my tribes. That hurts. It hurts even more because it feels like a forced exit rather than because I've lost interest or been distracted by other shiny things for long enough that you don't really feel like part of a tribe. That happened with me with debian-installer. It was my baby for a while (with a then quite small team), then a bunch of real life thing interfered and other people picked it up and ran with it and made it greater and more fantastic than before. I kinda lost touch, and while it's still dear to me, I no longer identify as part of the debian-boot tribe.
Now, how did I, standing stout and tall, get forced out of my tribe? I've been a DD for almost 14 years, I should be able to weather any storm, shouldn't I? It turns out that no, the mountain does get worn down by the rain. It's not a single hurtful comment here and there. There's a constant drum about this all being some sort of conspiracy and there are sometimes flares where people wish people involved in systemd would be run over by a bus or just accusations of incompetence.
Our code of conduct says, "assume good faith". If you ever find yourself not doing that, step back, breathe. See if there's a reasonable explanation for why somebody is saying something or behaving in a way that doesn't make sense to you. It might be as simple as your native tongue being English and their being something else.
If you do genuinely disagree with somebody (something which is entirely fine), try not to escalate, even if the stakes are high. Examples from the last year include talking about this as a war and talking about "increasingly bitter rear-guard battles". By using and accepting this terminology, we, as a project, poison ourselves. Sam Hartman puts this better than me:
I'm hoping that we can all take a few minutes to gain empathy for those who disagree with us. Then I'm hoping we can use that understanding to reassure them that they are valued and respected and their concerns considered even when we end up strongly disagreeing with them or valuing different things.
I'd be lying if I said I didn't ever feel the urge to demonise my opponents in discussions. That they're worse, as people, than I am. However, it is imperative to never give in to this, since doing that will diminish us as humans and make the entire project poorer. Civil disagreements with reasonable discussions lead to better technical outcomes, happier humans and a healthier projects.
16 Nov 2014 10:55pm GMT
15 Nov 2014
So in the last weeks, in between various other kernel work (atomic-helper conversion and few other misc things for 3.19) and RHEL stuff, I've managed to bang out initial gallium support for a4xx. There are still plenty of missing things, or stuff hard-coded, etc. But yesterday I managed to get textures working, and fix RGBA/BGRA confusion, so now enough works for 'gears and maybe about half of glmark2:
I've intentionally pushed it (just now) after the mesa 10.4 branch point, since it isn't quite ready to be enabled by default in distro mesa builds. When it gets to the point of at least being able to run a desktop environment (gnome-shell / compiz / etc), I may backport to 10.4. But there is still a lot of work to do. The good news is that so far it seems quite fast (and that is without hw binning or XA yet even!)
15 Nov 2014 3:49pm GMT
11 Nov 2014
Since a while containers have been one of the hot topics on Linux. Container managers such as libvirt-lxc, LXC or Docker are widely known and used these days. In this blog story I want to shed some light on systemd's integration points with container managers, to allow seamless management of services across container boundaries.
We'll focus on OS containers here, i.e. the case where an init system runs inside the container, and the container hence in most ways appears like an independent system of its own. Much of what I describe here is available on pretty much any container manager that implements the logic described here, including libvirt-lxc. However, to make things easy we'll focus on systemd-nspawn, the mini-container manager that is shipped with systemd itself. systemd-nspawn uses the same kernel interfaces as the other container managers, however is less flexible as it is designed to be a container manager that is as simple to use as possible and "just works", rather than trying to be a generic tool you can configure in every low-level detail. We use systemd-nspawn extensively when developing systemd.
Anyway, so let's get started with our run-through. Let's start by creating a Fedora container tree in a subdirectory:
# yum -y --releasever=20 --nogpg --installroot=/srv/mycontainer --disablerepo='*' --enablerepo=fedora install systemd passwd yum fedora-release vim-minimal
This downloads a minimal Fedora system and installs it in in
/srv/mycontainer. This command line is Fedora-specific, but most distributions provide similar functionality in one way or another. The examples section in the systemd-nspawn(1) man page contains a list of the various command lines for other distribution.
We now have the new container installed, let's set an initial root password:
# systemd-nspawn -D /srv/mycontainer Spawning container mycontainer on /srv/mycontainer Press ^] three times within 1s to kill container. -bash-4.2# passwd Changing password for user root. New password: Retype new password: passwd: all authentication tokens updated successfully. -bash-4.2# ^D Container mycontainer exited successfully. #
We use systemd-nspawn here to get a shell in the container, and then use passwd to set the root password. After that the initial setup is done, hence let's boot it up and log in as root with our new password:
$ systemd-nspawn -D /srv/mycontainer -b Spawning container mycontainer on /srv/mycontainer. Press ^] three times within 1s to kill container. systemd 208 running in system mode. (+PAM +LIBWRAP +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ) Detected virtualization 'systemd-nspawn'. Welcome to Fedora 20 (Heisenbug)! [ OK ] Reached target Remote File Systems. [ OK ] Created slice Root Slice. [ OK ] Created slice User and Session Slice. [ OK ] Created slice System Slice. [ OK ] Created slice system-getty.slice. [ OK ] Reached target Slices. [ OK ] Listening on Delayed Shutdown Socket. [ OK ] Listening on /dev/initctl Compatibility Named Pipe. [ OK ] Listening on Journal Socket. Starting Journal Service... [ OK ] Started Journal Service. [ OK ] Reached target Paths. Mounting Debug File System... Mounting Configuration File System... Mounting FUSE Control File System... Starting Create static device nodes in /dev... Mounting POSIX Message Queue File System... Mounting Huge Pages File System... [ OK ] Reached target Encrypted Volumes. [ OK ] Reached target Swap. Mounting Temporary Directory... Starting Load/Save Random Seed... [ OK ] Mounted Configuration File System. [ OK ] Mounted FUSE Control File System. [ OK ] Mounted Temporary Directory. [ OK ] Mounted POSIX Message Queue File System. [ OK ] Mounted Debug File System. [ OK ] Mounted Huge Pages File System. [ OK ] Started Load/Save Random Seed. [ OK ] Started Create static device nodes in /dev. [ OK ] Reached target Local File Systems (Pre). [ OK ] Reached target Local File Systems. Starting Trigger Flushing of Journal to Persistent Storage... Starting Recreate Volatile Files and Directories... [ OK ] Started Recreate Volatile Files and Directories. Starting Update UTMP about System Reboot/Shutdown... [ OK ] Started Trigger Flushing of Journal to Persistent Storage. [ OK ] Started Update UTMP about System Reboot/Shutdown. [ OK ] Reached target System Initialization. [ OK ] Reached target Timers. [ OK ] Listening on D-Bus System Message Bus Socket. [ OK ] Reached target Sockets. [ OK ] Reached target Basic System. Starting Login Service... Starting Permit User Sessions... Starting D-Bus System Message Bus... [ OK ] Started D-Bus System Message Bus. Starting Cleanup of Temporary Directories... [ OK ] Started Cleanup of Temporary Directories. [ OK ] Started Permit User Sessions. Starting Console Getty... [ OK ] Started Console Getty. [ OK ] Reached target Login Prompts. [ OK ] Started Login Service. [ OK ] Reached target Multi-User System. [ OK ] Reached target Graphical Interface. Fedora release 20 (Heisenbug) Kernel 3.18.0-0.rc4.git0.1.fc22.x86_64 on an x86_64 (console) mycontainer login: root Password: -bash-4.2#
Now we have everything ready to play around with the container integration of systemd. Let's have a look at the first tool, machinectl. When run without parameters it shows a list of all locally running containers:
$ machinectl MACHINE CONTAINER SERVICE mycontainer container nspawn 1 machines listed.
The "status" subcommand shows details about the container:
$ machinectl status mycontainer mycontainer: Since: Mi 2014-11-12 16:47:19 CET; 51s ago Leader: 5374 (systemd) Service: nspawn; class container Root: /srv/mycontainer Address: 192.168.178.38 10.36.6.162 fd00::523f:56ff:fe00:4994 fe80::523f:56ff:fe00:4994 OS: Fedora 20 (Heisenbug) Unit: machine-mycontainer.scope ├─5374 /usr/lib/systemd/systemd └─system.slice ├─dbus.service │ └─5414 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-act... ├─systemd-journald.service │ └─5383 /usr/lib/systemd/systemd-journald ├─systemd-logind.service │ └─5411 /usr/lib/systemd/systemd-logind └─console-getty.service └─5416 /sbin/agetty --noclear -s console 115200 38400 9600
With this we see some interesting information about the container, including its control group tree (with processes), IP addresses and root directory.
The "login" subcommand gets us a new login shell in the container:
# machinectl login mycontainer Connected to container mycontainer. Press ^] three times within 1s to exit session. Fedora release 20 (Heisenbug) Kernel 3.18.0-0.rc4.git0.1.fc22.x86_64 on an x86_64 (pts/0) mycontainer login:
The "reboot" subcommand reboots the container:
# machinectl reboot mycontainer
The "poweroff" subcommand powers the container off:
# machinectl poweroff mycontainer
So much about the machinectl tool. The tool knows a couple of more commands, please check the man page for details. Note again that even though we use systemd-nspawn as container manager here the concepts apply to any container manager that implements the logic described here, including libvirt-lxc for example.
machinectl is not the only tool that is useful in conjunction with containers. Many of systemd's own tools have been updated to explicitly support containers too! Let's try this (after starting the container up again first, repeating the systemd-nspawn command from above.):
# hostnamectl -M mycontainer set-hostname "wuff"
This uses hostnamectl(1) on the local container and sets its hostname.
Similar, many other tools have been updated for connecting to local containers. Here's systemctl(1)'s
-M switch in action:
# systemctl -M mycontainer UNIT LOAD ACTIVE SUB DESCRIPTION -.mount loaded active mounted / dev-hugepages.mount loaded active mounted Huge Pages File System dev-mqueue.mount loaded active mounted POSIX Message Queue File System proc-sys-kernel-random-boot_id.mount loaded active mounted /proc/sys/kernel/random/boot_id [...] time-sync.target loaded active active System Time Synchronized timers.target loaded active active Timers systemd-tmpfiles-clean.timer loaded active waiting Daily Cleanup of Temporary Directories LOAD = Reflects whether the unit definition was properly loaded. ACTIVE = The high-level unit activation state, i.e. generalization of SUB. SUB = The low-level unit activation state, values depend on unit type. 49 loaded units listed. Pass --all to see loaded but inactive units, too. To show all installed unit files use 'systemctl list-unit-files'.
As expected, this shows the list of active units on the specified container, not the host. (Output is shortened here, the blog story is already getting too long).
Let's use this to restart a service within our container:
# systemctl -M mycontainer restart systemd-resolved.service
systemctl has more container support though than just the
-M switch. With the
-r switch it shows the units running on the host, plus all units of all local, running containers:
# systemctl -r UNIT LOAD ACTIVE SUB DESCRIPTION boot.automount loaded active waiting EFI System Partition Automount proc-sys-fs-binfmt_misc.automount loaded active waiting Arbitrary Executable File Formats File Syst sys-devices-pci0000:00-0000:00:02.0-drm-card0-card0\x2dLVDS\x2d1-intel_backlight.device loaded active plugged /sys/devices/pci0000:00/0000:00:02.0/drm/ca [...] timers.target loaded active active Timers mandb.timer loaded active waiting Daily man-db cache update systemd-tmpfiles-clean.timer loaded active waiting Daily Cleanup of Temporary Directories mycontainer:-.mount loaded active mounted / mycontainer:dev-hugepages.mount loaded active mounted Huge Pages File System mycontainer:dev-mqueue.mount loaded active mounted POSIX Message Queue File System [...] mycontainer:time-sync.target loaded active active System Time Synchronized mycontainer:timers.target loaded active active Timers mycontainer:systemd-tmpfiles-clean.timer loaded active waiting Daily Cleanup of Temporary Directories LOAD = Reflects whether the unit definition was properly loaded. ACTIVE = The high-level unit activation state, i.e. generalization of SUB. SUB = The low-level unit activation state, values depend on unit type. 191 loaded units listed. Pass --all to see loaded but inactive units, too. To show all installed unit files use 'systemctl list-unit-files'.
We can see here first the units of the host, then followed by the units of the one container we have currently running. The units of the containers are prefixed with the container name, and a colon (":"). (The output is shortened again for brevity's sake.)
list-machines subcommand of systemctl shows a list of all running containers, inquiring the system managers within the containers about system state and health. More specifically it shows if containers are properly booted up, or if there are any failed services:
# systemctl list-machines NAME STATE FAILED JOBS delta (host) running 0 0 mycontainer running 0 0 miau degraded 1 0 waldi running 0 0 4 machines listed.
To make things more interesting we have started two more containers in parallel. One of them has a failed service, which results in the machine state to be
Let's have a look at journalctl(1)'s container support. It too supports
-M to show the logs of a specific container:
# journalctl -M mycontainer -n 8 Nov 12 16:51:13 wuff systemd: Starting Graphical Interface. Nov 12 16:51:13 wuff systemd: Reached target Graphical Interface. Nov 12 16:51:13 wuff systemd: Starting Update UTMP about System Runlevel Changes... Nov 12 16:51:13 wuff systemd: Started Stop Read-Ahead Data Collection 10s After Completed Startup. Nov 12 16:51:13 wuff systemd: Started Update UTMP about System Runlevel Changes. Nov 12 16:51:13 wuff systemd: Startup finished in 399ms. Nov 12 16:51:13 wuff sshd: Server listening on 0.0.0.0 port 24. Nov 12 16:51:13 wuff sshd: Server listening on :: port 24.
However, it also supports
-m to show the combined log stream of the host and all local containers:
# journalctl -m -e
(Let's skip the output here completely, I figure you can extrapolate how this looks.)
But it's not only systemd's own tools that understand container support these days, procps sports support for it, too:
# ps -eo pid,machine,args PID MACHINE COMMAND 1 - /usr/lib/systemd/systemd --switched-root --system --deserialize 20 [...] 2915 - emacs contents/projects/containers.md 3403 - [kworker/u16:7] 3415 - [kworker/u16:9] 4501 - /usr/libexec/nm-vpnc-service 4519 - /usr/sbin/vpnc --non-inter --no-detach --pid-file /var/run/NetworkManager/nm-vpnc-bfda8671-f025-4812-a66b-362eb12e7f13.pid - 4749 - /usr/libexec/dconf-service 4980 - /usr/lib/systemd/systemd-resolved 5006 - /usr/lib64/firefox/firefox 5168 - [kworker/u16:0] 5192 - [kworker/u16:4] 5193 - [kworker/u16:5] 5497 - [kworker/u16:1] 5591 - [kworker/u16:8] 5711 - sudo -s 5715 - /bin/bash 5749 - /home/lennart/projects/systemd/systemd-nspawn -D /srv/mycontainer -b 5750 mycontainer /usr/lib/systemd/systemd 5799 mycontainer /usr/lib/systemd/systemd-journald 5862 mycontainer /usr/lib/systemd/systemd-logind 5863 mycontainer /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation 5868 mycontainer /sbin/agetty --noclear --keep-baud console 115200 38400 9600 vt102 5871 mycontainer /usr/sbin/sshd -D 6527 mycontainer /usr/lib/systemd/systemd-resolved [...]
This shows a process list (shortened). The second column shows the container a process belongs to. All processes shown with "-" belong to the host itself.
But it doesn't stop there. The new "sd-bus" D-Bus client library we have been preparing in the systemd/kdbus context knows containers too. While you use
sd_bus_open_system() to connect to your local host's system bus
sd_bus_open_system_container() may be used to connect to the system bus of any local container, so that you can execute bus methods on it.
sd-login.h and machined's bus interface provide a number of APIs to add container support to other programs too. They support enumeration of containers as well as retrieving the machine name from a PID and similar.
systemd-networkd also has support for containers. When run inside a container it will by default run a DHCP client and IPv4LL on any veth network interface named
host0 (this interface is special under the logic described here). When run on the host networkd will by default provide a DHCP server and IPv4LL on veth network interface named
ve- followed by a container name.
Let's have a look at one last facet of systemd's container integration: the hook-up with the name service switch. Recent systemd versions contain a new NSS module nss-mymachines that make the names of all local containers resolvable via
getaddrinfo(). This only applies to containers that run within their own network namespace. With the systemd-nspawn command shown above the the container shares the network configuration with the host however; hence let's restart the container, this time with a virtual
veth network link between host and container:
# machinectl poweroff mycontainer # systemd-nspawn -D /srv/mycontainer --network-veth -b
Now, (assuming that networkd is used in the container and outside) we can already ping the container using its name, due to the simple magic of nss-mymachines:
# ping mycontainer PING mycontainer (10.0.0.2) 56(84) bytes of data. 64 bytes from mycontainer (10.0.0.2): icmp_seq=1 ttl=64 time=0.124 ms 64 bytes from mycontainer (10.0.0.2): icmp_seq=2 ttl=64 time=0.078 ms
Of course, name resolution not only works with
ping, it works with all other tools that use libc
getaddrinfo() too, among them venerable
And this is pretty much all I want to cover for now. We briefly touched a variety of integration points, and there's a lot more still if you look closely. We are working on even more container integration all the time, so expect more new features in this area with every systemd release.
Note that the whole machine concept is actually not limited to containers, but covers VMs too to a certain degree. However, the integration is not as close, as access to a VM's internals is not as easy as for containers, as it usually requires a network transport instead of allowing direct syscall access.
Anyway, I hope this is useful. For further details, please have a look at the linked man pages and other documentation.
11 Nov 2014 11:00pm GMT
Over the last couple of years, we've put some effort into better tooling for debugging input devices. Benjamin's hid-replay is an example for a low-level tool that's great for helping with kernel issues, evemu is great for userspace debugging of evdev devices. evemu has recently gained better Python bindings, today I'll explain here how those make it really easy to analyse event recordings.
Requirement: evemu 2.1.0 or later
The input needed to make use of the Python bindings is either a device directly or an evemu recordings file. I find the latter a lot more interesting, it enables me to record multiple users/devices first, and then run the analysis later. So let's go with that:
$ sudo evemu-record > mouse-events.evemu
/dev/input/event0: Lid Switch
/dev/input/event1: Sleep Button
/dev/input/event2: Power Button
/dev/input/event3: AT Translated Set 2 keyboard
/dev/input/event4: SynPS/2 Synaptics TouchPad
/dev/input/event5: Lenovo Optical USB Mouse
Select the device event number [0-5]: 5
That pipes any event from the mouse into the file, to be terminated by ctrl+c. It's just a text file, feel free to leave it running for hours.
Now for the actual analysis. The simplest approach is to read all events from a file and print them:
filename = sys.argv
# create an evemu instance from the recording,
# create=False means don't create a uinput device from it
d = evemu.Device(filename, create=False)
for e in d.events():
That prints out all events, so the output should look identical to the input file's event list. The output you should see is something like:
E: 7.817877 0000 0000 0000 # ------------ SYN_REPORT (0) ----------
E: 7.821887 0002 0000 -001 # EV_REL / REL_X -1
E: 7.821903 0000 0000 0000 # ------------ SYN_REPORT (0) ----------
E: 7.825872 0002 0000 -001 # EV_REL / REL_X -1
E: 7.825879 0002 0001 -001 # EV_REL / REL_Y -1
E: 7.825883 0000 0000 0000 # ------------ SYN_REPORT (0) ----------
The events are an evemu.InputEvent object, with the properties type, code, value and the timestamp as sec, usec accessible (i.e. the underlying C struct). The most useful method of the object is InputEvent.matches(type, code) which takes both integer values and strings:
print "this is a relative event of some kind"
elif e.matches("EV_ABS", "ABS_X"):
print "absolute X movement"
elif e.matches(0x03, 0x01):
printf "absolute Y movement"
A practical example: let's say we want to know the maximum delta value our mouse sends.
filename = sys.argv
# create an evemu instance from the recording,
# create=False means don't create a uinput device from it
d = evemu.Device(filename, create=False)
if not d.has_event("EV_REL", "REL_X") or \
not d.has_event("EV_REL", "REL_Y"):
print "%s isn't a mouse" % d.name
deltas = 
for e in d.events():
if e.matches("EV_REL", "REL_X") or \
max = max([abs(x) for x in deltas])
print "Maximum delta is %d" % (max)
And voila, with just a few lines of code we've analysed a set of events. The rest is up to your imagination. So far I've used scripts like this to help us implement palm detection, figure out ways how to deal with high-DPI mice, estimate the required size for top softwarebuttons on touchpads, etc.
Especially for printing event values, a couple of other functions come in handy here:
type = evemu.event_get_value("EV_REL")
code = evemu.event_get_value("EV_REL", "REL_X")
strtype = evemu.event_get_name(type)
strcode = evemu.event_get_name(type, code)
They do what you'd expect from them, and both functions take either strings and actual types/codes as numeric values. The same exists for input properties.
11 Nov 2014 9:17pm GMT
The following was debugged and discovered by Benjamin Tissoires, I'm merely playing the editor and publisher. All credit and complimentary beverages go to him please.
Wacom recently added two interesting products to its lineup: the Intuos Creative Stylus 2 and the Bamboo Stylus Fineline. Both are styli only, without the accompanying physical tablet and they are marketed towards the Apple iPad market. The basic idea here is that touch location is provided by the system, the pen augments that with buttons, pressure and whatever else. The tips of the styli are 2.9mm (Creative Stylus 2) and 1.9mm (Bamboo Fineline), so definitely smaller than your average finger, and smaller than most other touch pens. This could of course be useful for any touch-capable Linux laptop, it's a cheap way to get an artist's tablet. The official compatibility lists the iPads only, but then that hasn't stopped anyone in the past.
We enjoy a good relationship with the Linux engineers at Wacom, so naturally the first thing was to ask if they could help us out here. Unfortunately, the answer was no. Or more specifically (and heavily paraphrased): "those devices aren't really general purpose, so we wouldn't want to disclose the spec". That of course immediately prompted Benjamin to go and buy one.
From Wacom's POV not disclosing the specs makes sense and why will become more obvious below. The styli are designed for a specific use-case, if Wacom claims that they can work in any use-case they have a lot to lose - mainly from the crowd that blames the manufacturer if something doesn't work as they expect. Think of when netbooks were first introduced and people complained that they weren't full-blown laptops, despite the comparatively low price...
The first result: the stylus works on most touchscreens (and Benjamin has a few of those) but not on all of them. Specifically, the touchscreen on the Asus N550JK didn't react to it. So that's warning number 1: it may not work on your specific laptop and you probably won't know until you try.
Pairing works, provided you have a Bluetooth 4.0 chipset and your kernel supports it (tested on 3.18-rc3). Problem is: you can connect the device but you don't get anything out of it. Why? Bluetooth LE. Let's expand on that: Bluetooth LE uses the Generic Attribute Profile (GATT). The actual data is divided into Profiles, Services and Characteristics, which are clearly named by committee and stand for the general topic, subtopic/item and data point(s). So in the example here the Profile is Heart Rate Profile, the Service is Heart Rate Measurement and the Characteristic is the actual count of "lub-dub" on your ticker . All are predefined. Again, why does this matter? Because what we're hoping for is the Hid Service or the Hid over GATT Service service. In both cases we could then use the kernel's uhid module to get the stylus to work. Alas, the actual output of the device is:
[bluetooth]# info C5:37:E8:73:57:BE
UUID: Vendor specific (00001523-1212-efde-1523-785feabcd123)
UUID: Generic Access Profile (00001800-0000-1000-8000-00805f9b34fb)
UUID: Generic Attribute Profile (00001801-0000-1000-8000-00805f9b34fb)
UUID: Device Information (0000180a-0000-1000-8000-00805f9b34fb)
UUID: Battery Service (0000180f-0000-1000-8000-00805f9b34fb)
UUID: Vendor specific (6e400001-b5a3-f393-e0a9-e50e24dcca9e)
So we can see GAP and GATT, Device Information and Battery Service (both predefined) and 2 Vendor specific profiles (i.e. "magic fairy dust"). And this is where Benjamin got stuck - each of these may have a vendor-specific handshake, protocol, etc. And it's not even sure he'll be able to set the device up so it talks to him. So warning number 2: you can see and connect the device, but it'll talk gibberish (or nothing).
Now, it's probably possible to reverse engineer that if you have sufficient motivation. We don't. The Bluetooth spec is available though, once you work your way through that you can start working on the vendor specific protocol which we know nothing about.
Last but not least: the userspace component. The device itself is not ready-to-use, it provides pressure but you'd still have to associate it with the right touch point. That's not trivial, especially in the presence of other touch points (the outside of your hand while using the stylus for example). So we'd need to add support for this in the X drivers and libinput to make it work. Wacom and/or OS X presumably solved this for iPads, but even there it doesn't just work. The applications need to support it and "You do have to do some digging to figure out to connect the stylus to your favorite art apps -- it's a different procedure for each one, but that's common among these styluses." That's something we wouldn't do the same way on the Linux desktop. So warning number 3: if you can make the kernel work, it won't work as you expect in userspace, and getting it to work is a huge task.
Now all that pretty much boils down to: is it worthwhile? Our consensus so far was "no". I guess Wacom was right in holding back the spec after all. These devices won't work on any tablet and even if they would, we don't have anything in the userspace stack to actually support them properly. So in summary: don't buy the stylus if you plan to use it in Linux.
 lub-dub is good. ta-lub-dub is not. you don't want lub-dub-ta. wikipedia
11 Nov 2014 5:44am GMT
In my previous blog post, I was talking about a pathologically bad Linux desktop performance with FullHD monitors on Allwinner A10 hardware.
A lot of time has passed since then. Thanks to the availability of Rockchip sources and documentation, we have learned a lot of information about the DRAM controller in Allwinner A10/A13/A20 SoCs. Both Allwinner and Rockchip are apparently licensing the DRAM controller IP from the same third-party vendor. And their DRAM controller hardware registers are sharing a lot of similarities (though unfortunately this is not an exact match).
Having a much better knowledge about the hardware allowed us to revisit this problem, investigate it in more details and come up with a solution back in April 2014. The only missing part was providing an update in this blog. At least to make it clear that the problem has been resolved now. So here we go...
11 Nov 2014 12:00am GMT
10 Nov 2014
As some of you already know, since the larger restructuring in PackageKit for the 1.0 release, I am rethinking Listaller, the 3rd-party application installer for Linux systems, as well.
During the past weeks, I was playing around with a lot of different ideas and code, to make installations of 3rd-party software easily possible on Linux, but also working together with the distribution package manager. I now have come up with an experimental project, which might achieve this.
Many of you know Lennart's famous blogpost on how we put together Linux distributions. And he makes a lot of good and valid points there (in fact, I agree with his reasoning there). The proposed solution, however, is not something which I am very excited about, at least not for the use-case of installing a simple application. Leaving things like the exclusive dependency on technology like Btrfs aside, the solution outlined by Lennart basically bypasses the distribution itself, instead of working together with it. This results in a duplication of installed libraries, making it harder to overview which versions of which software component are actually running on the system. There is also a risk for security holes due to libraries not being updated. The security issues are worked around by a superior sandbox, which still needs to be implemented (but will definitively come soon, maybe next year).
I wanted to explore a different approach of managing 3rd-party applications on Linux systems, which allows sharing as much code as possible between applications.
In order to allow easy creation of software packages, as well as the ability to share software between different 3rd-party applications, I took heavy inspiration from Alexander Larssons Glick2 project, combining it with ideas from the application-directory based Listaller.
The result is Limba (named after Limba tree, not the voodoo spirit - I needed some name starting with "li" to keep the prefix used in Listaller, and for a tool like this the name didn't really matter ).
Limba uses OverlayFS to combine an application with its dependencies before running it, as well as mount namespaces and shared subtrees. Except for OverlayFS, which just landed in the kernel recently, all other kernel features needed by Limba are available for years now (and many distributions ship with OverlayFS on older kernels as well).
How does it work?
In order to to achieve separation of software, each software component is located in a separate container (= package). A software component can be an application, like Kate or GEdit, but also be a single shared library (openssl) or even a full runtime (KDE Frameworks 5 parts, GNOME 3).
Each of these software components can be identified via AppStream metadata, which is just a few bits of XML. A Limba package can declare a dependency on any other software component. In case that software is available in the distribution's repositories, the version found there can be used. Otherwise, another Limba package providing the software is required.
Limba packages can be provided from software repositories (e.g. provided by the distributor), or be nested in other packages. For example, imagine the software "Kate" requires a version of the Qt5 libraries, >= 5.2. The downloadable package for "Kate" can be bundled with that dependency, by including the "Qt5 5.2″ Limba package in the "Kate" package. In case another software is installed later, which also requires the same version of Qt, the already installed version will be used.
Since the software components are located in separate directories under /opt/software, an application will not automatically find its dependencies, or be able to locate its own files. Therefore, each application has to be run by a special tool, which merges the directory trees of the main application and it's dependencies together using OverlayFS. This has the nice sideeffect that the main application could override files from its dependencies, if necessary. The tool also sets up a new mount namespace, so if the application is compiled with a certain prefix, it does not need to be relocatable to find its data files.
At installation time, to achieve better system integration, certain files (like e.g. the .desktop file) are split out of the installed directory tree, so the newly installed application achieves almost full system integration.
Can I use Limba now?
Limba is an experiment. I like it very much, but it might happen that I find some issues with it and kill it off again. So, if you feel adventurous, you can compile the source code and use the example "Foobar" application to play around with Limba. Before it can be used in production (if at all), some more time is needed.
I will publish documentation on how to test the project soon.
Doesn't OverlayFS have a maximum stacking depth?
Oh yes it has! The "How does it work" explanation doesn't tell the whole truth in that regard (mainly to keep the section small). In fact, Limba will generate a "runtime" for the newly installed software, which is a directory with links to the actual individual software components the runtime consists of. The runtime is identified by an UUID. This runtime is then mounted together with the respective applications using OverlayFS. This works pretty great, and also results in no dependency-resolution to be done immediately before an application is started.
Than dependency stuff gives me a headache…
Admittedly, allowing dependencies adds a whole lot of complexity. Other approaches, like the one outlined by Lennart work around that (and there are good reasons for doing that as well).
In my opinion, the dependency-sharing and de-duplication of software components, as well as the ability to use the components which are packaged by your Linux distribution is worth the extra effort.
Can you give an overview of future plans for Limba?
Sure, so here is the stuff which currently works:
- Creating simple packages
- Installing packages
- Very basic dependency resolution (no relations (like >, <, =) are respected yet)
- Running applications
- Initial bits of system integration (.desktop files are registered)
These features are planned for the new future:
- Support for removing software
- Automatic software updates of 3rd-party software
- Atomic updates
- Better system integration
- Integration with the new sandboxing features
- GPG signing of packages
- More documentation / bugfixes
Remember that Limba is an experiment, still
Technically, I am replacing one solution with another one here, so the situation does not change at all ;-). But indeed, some duplicate work is done due to more people working in this area now on similar questions.
But I think this is a good thing, because the solutions worked on are fundamentally different approaches, and by exploring multiple ways of doing things, we will come up with something great in the end. (XKCD reference)
Doesn't the use of OverlayFS have an impact on the performance of software running with Limba?
I ran some synthetic benchmarks and didn't notice any problems - even the startup speed of Limba applications is only a few milliseconds slower than the startup of the "raw" native application. However, I will still have to run further tests to give a definitive answer on this.
How do you solve ABI compatibility issues?
This approach requires software to keep their ABI stable. But since software can have strict dependencies on a specific version of a software (although I'd discourage that), even people who are worried about this issue can be happy. We are getting much better at tracking unwanted ABI breaks, and larger projects offer stable API/ABI during a major release cycle. For smaller dependencies, there are, as explained above, stricter dependencies.
In summary, I don't think ABI incompatibilities will be a problem with this approach - at least not more than they have been in general. (The libuild facilities from Listaller to minimize dependencies will still be present im Limba, of course)
You are wrong because of $X!
Please leave a comment in this case! I'd love to discuss new ideas and find the limits of the Limba concept - that's why I am writing C code afterall, since what looks great on paper might not work in reality or have issues one hasn't thought about before. So any input is welcomed!
Last but not least I want to thank Alexander Larsson for writing Glick2, which Limba is heavily inspired from, and for his patient replies to my emails.
If Limba turns out to be a good idea, you can expect a few more blog posts about it soon.
* Answered questions nobody asked yet
: Don't get me wrong, I would like to have these ideas implemented - they offer great value. But I think for "simple" software deployment, the solution is an overkill.
10 Nov 2014 12:06pm GMT
07 Nov 2014
okay another braindump (still nothing working).
The git repo mentioned in previous post has all the code I've hacked up so far.
I finished writing the HDCP protocol stages, and sending all the msgs and getting replies from the device.
So I've successfully reached a point where I've negotiated a HDCP session key with the device, and we are both happy about it. Unfortunately I've no idea what I'm meant to be encrypting to send to the device. The next packet the USB traces contain is 384-bytes of encrypted data.
Now HDCP v2 had a vulnerabilty in its key neg, and I've written code to try and use this fact. So I've taken a trace I made from Windows, and extracted the necessary bits, and using that I've managed to derive the master key used in that trace, and subsequently managed to derived the session key for it. So I've replayed the first encrypted packet from the trace to the device and got an encrypted response the same as in the trace.
I've tried changing a bit in the session key, riv value and data I'm sending, and doing that causes the device not to reply with the answer. This to me implies that the device is using the HDCP cipher to encode the control channel. Now HDCP does say you should only do this for video streams, but maybe DisplayLink forgot to read that bit.
Now where does this leave me, in theory I should be able to replay the full trace (haven't had time yet) and I should see the same picture on screen as I did (though I can't remember what monitor/device I used, so I might have to retrace and restage my tests before then).
However I really need to decrypt the encrypted data in the trace, and from reading the HDCP spec the only values I need to feed the AES engine are ks ^ lc128, riv, streamctr, inputctr. I'm assuming streamctr and inputctr are 0 for the first packet (I could be wrong, maybe they use some wacky streamctr to avoid messing with hdcp), riv and ks I've captured. So lc128 is possibly the crux.
Now what is lc128? Its a secret 128-bit value in the HDCP world given only to HDCP adopters. Its normally something you'd store in hw on the GPU etc as an input to the hw cipher. But in displaylink there is no GPU encrypting the data. Now its possible that displaylink don't use the same lc128 as the HDCP people, unlikely but possible. Maybe they cipher their streams with their own lc128, and only use the offical hdcp lc128 for actual HDCP streams.
I don't think lc128 has leaked, I'm not sure what the consequences of it leaking would be, but hey its just a magic number, and if displaylink are using as an input to their AES code, it must be in RAM at some point, now I need to figure out ways to work that out. I'm not sure how long it would take to brute force as 128-bit key space, probably impossible.
At any point if someone from DisplayLink wants to talk, you know where to find me :-)
07 Nov 2014 4:21am GMT
I got back to working on enabling LLVM's machine scheduler for radeonsi targets in the R600 backend after seeing a really good tutorial about how it works at this year's LLVM Developer's conference.
Since I last worked on this, I've figured out how to enable register pressure tracking in the scheduler, so now the scheduler will switch to a register pressure reduction strategy once register usage approaches the threshold where using more registers reduces the number of threads that can run in parallel.
So far the results look pretty good, several of the Phoronix benchmarks are faster with the scheduler enabled. However, I am still trying to track down a bug which is causing the xonotic benchmark to lockup when using the 'ultra' settings.
If anyone wants to test it out, I've pushed the code to my personal repo.
07 Nov 2014 2:37am GMT
06 Nov 2014
… or "Why do I not see any update notifications on my brand-new Debian Jessie installation??"
This is a short explanation of the status quo, and also explains the "no update notifications" issue in a slightly more detailed way, since I am already getting bug reports for that.
As you might know, GNOME provides GNOME-Software for installation of applications via PackageKit. In order to work properly, GNOME-Software needs AppStream metadata, which is not yet available in Debian. There was a GSoC student working on the necessary code for that, but the code is not yet ready and doesn't produce good results yet. Therefore, I postponed AppStream integration to Jessie+1, with an option to include some metadata for GNOME and KDE to use via a normal .deb package.
Then, GNOME was updated to 3.14. GNOME 3.14 moved lots of stuff into GNOME-Software, including the support for update-notifications (which have been in g-s-d before). GNOME-Software is also the only thing which can edit the application groups in GNOME-Shell, at least currently.
So obviously, there was no a much stronger motivation to support GNOME-Software in Jessie. The appstream-glib library, which GNOME-Software uses exclusively to read AppStream metadata, didn't support the DEP-11 metadata format which Debian uses in place of the original AppSTream XML for a while, but does so in it's current development branch. So that component had to be packaged first. Later, GNOME-Software was uploaded to the archive as well, but still lacked the required metadata. That data was provided by me as a .deb package later, locally generated using the current code by my SoC student (the data isn't great, but better than nothing). So far with the good news.
But there are multiple issues at time. First of all, the appstream-data package didn't pass NEW so far, due to it's complex copyright situation (nothing we can't resolve, since app-install-data, which appstream-data would replace, is in Debian as well). Also, GNOME-Software is exclusively using offline-updates (more information also on  and ) at time. This isn't always working at the moment, since I haven't had the time to test it properly - and I didn't expect it to be used in Debian Jessie as well.
Furthermore, the offline-updates feature requires systemd (which isn't an issue in itself, I am quite fine with that, but people not using it will get unexpected results, unless someone does the work to implement offline-updates with sysvinit).
Since we are in freeze at time, and obviously this stuff is not ready yet, GNOME is currently without update notifications and without a way to change the shell application groups.
So, how can we fix this? One way would of course be to patch notification support back into g-s-d, if the new layout there allows doing that. But that would not give us the other features GNOME-Software provides, like application-group-editing.
Implementing that differently and patching it to make it work would be more or at least the same amount of work like making GNOME-Software run properly. I therefore prefer getting GNOME-Software to run, at least with basic functionality. That would likely mean hiding things like the offline-update functionality, and using online-updates with GNOME-PackageKit instead.
Obviously, this approach has it's own issues, like doing most of the work post-freeze, which kind of defeats the purpose of the freeze and would need some close coordination with the release-team.
So, this is the status quo at time. It is kind of unfortunate that GNOME moved crucial functionality into a new component which requires additional integration work by the distributors so quickly, but that's something which isn't worth to talk about. We need a way forward to bring update-notifications back, and there is currently work going on to do that. For all Debian users: Please be patient while we resolve the situation, and sorry for the inconvenience. For all developers: If you would like to help, please contact me or Laurent Bigonville, there are some tasks which could use some help.
As a small remark: If you are using KDE, you are lucky - Apper provides the notification support like it always did, and thanks to improvements in aptcc and PackageKit, it even is a bit faster now. For the Xfce and <other_desktop> people, you need to check if your desktop provides integration with PackageKit for update-checking. At least Xfce doesn't, but after GNOME-PackageKit removed support for it (which was moved to gnome-settings-daemon and now to GNOME-Software) nobody stepped up to implement it yet (so if you want to do it - it's not super-complicated, but knowledge of C and GTK+ is needed).
: It looks like dpkg tries to ask a debconf question for some reason, or an external tool like apt-listchanges is interfering with the process, which must run completely unsupervised. There is some debugging needed to resolve these Debian-specific issues.
06 Nov 2014 10:34am GMT
04 Nov 2014
So just a quick update. I pushed out the 1.5 release of Transmageddon today. No major new features just fixing a regression in terms of dealing with files where you only have a video track or where you want to drop the audio track as part of the transcoding process. I am also having some issues with Intel Hardware encoding atm, but I think those are somewhere lower in the stack, so I hope to file a bug against either GStreamer or the libva project for that issue, but for now I recommend not having the Intel VA plugins for GStreamer installed.
As always you find the latest release on linuxrising.org.
I also submitted a Transmageddon update to Fedora 21, so if you are a Fedora user please test the build there and give it some Karma
04 Nov 2014 6:18pm GMT
03 Nov 2014
So I've just reposted my atomic modeset helper series, and since the main goal of all that work was to ensure a smooth and simple transition for existing drivers to the promised atomic land it's time to elaborate a bit. The big problem is that the existing helper libraries and callbacks to driver backends don't really fit the new semantics, so some shuffling was required to avoid long-term pain. So if you are a driver writer and just interested in the details then read for what needs to be done to support atomic modeset updates using these new helper libraries.
Phase 1: Reworking the Driver Backend Functions for Planes
The first phase is reworking the driver backend callbacks to fit the new world. There are two big mismatches between the new atomic semantics and legacy ioctl interfaces:
- The primary plane is no longer tied to the CRTC. Instead it is possible to enable the CRTC without any planes (resulting in a black screen) or only overlay planes. And the primary plane can be enabled/disabled and moved without changing the mode (of course only if the hardware actually supports it). But the existing CRTC helper library used to implement modesets only provides the single
crtc->mode_setdriver callback which always implicitly enables the primary plane, too.
- Atomic updates of multiple planes isn't supported at all. And worse the code to check whether a plane update will work out is smashed into the same callback that does the actual plane update, defeating the check/commit distinction used in atomic interfaces.
Both issues are addressed by adding new driver backend callbacks. Furthermore a few transitional helper functions are provided to implement the legacy entry points in terms of these new callbacks. That way the driver backend can be reworked without the additional hassle of needing to deal with all the atomic state object handling and check/commit semantics.
The first step is to rework the
->disable/update_plane hooks using the transitional helper implementations
drm_plane_helper_update/disable. These need the following new driver callbacks:
->atomic_checkfor both CRTCs and planes. This isn't strictly required, but any state checks implemented in the current
->update_planehook must be moved into the plane's ->atomic_check callback. The CRTC's callback will likely be empty for now.
->atomic_flushCRTC callbacks. These wrap the actual plane update and should do per-CRTC work like preparing to send out the flip completion event. Or ensure that the plane updates are actually done atomically by e.g. setting/clearing GO bits or latching the update through some other means. Or if the hardware does not provide any support for synchronized updates, use vblank evasion to ensure all updates happen on the same frame.
->cleanup_fbhooks are also optional. These are used to setup the framebuffers, e.g. pin their backing storage into memory and set up any needed hardware resources. The important part is that besides the
->prepare_fbis the only new callback which is allowed to fail. This is important to make asynchronous commits of atomic state updates work. The helper library guarantees that for any successful call of
->prepare_fbit will call
->cleanup_fb- even when something else fails in the atomic update.
- Finally there's
->atomic_update. That's the function which does all the per-plane update, like setting up the new viewport or the new base address of the framebuffer for each plane.
With this it's also easy to implement universal plane support directly, instead of with the default implementation which doesn't allow the primary plane to be disabled. Universal planes are a requirement for atomic and need to be implemented in phase 1, but testing the primary plane support is also a good preparation for the next step:
crtc->mode_set_nofb callback must be implement, which just updates the CRTC timings and data in the hardware without touching the primary plane state at all. The provided helpers functions
drm_helper_crtc_mode_set_base then implement the callbacks required by the CRTC helpers in terms of the new
->mode_set_nofb callback and the above newly implemented plane helper callbacks.
Phase 2: Wire up the Atomic State Object Scaffolding
With the completion of phase 1 all the driver backend functions have been adapted to the new requirements of the atomic helper library. The goal of phase 2 is to get all the state object handling needed for atomic updates into place. There are three steps to that:
- The first is fairly simply and consists in just wiring up all the state reset, duplicate and destroy functions for planes, CRTCs and connectors. Except for really crazy cases the default implementations from the atomic helper library should be good enough, at least to get started. With this there will always be an atomic state object stored in each object's
- The second step is patching up the state objects in legacy code paths to make sure that we can partially transition to atomic updates. If your driver doesn't have any transition checks for plane updates (i.e. doesn't ever look at the old state to figure out whether an change is possible) then all you need to do is keep the framebuffer pointers and reference counts in balance with
drm_atomic_set_fb_for_plane. The transitional helpers from phase 1 already do this, so usually the only place this is needs to be manually added is in the
- Finally all
->mode_fixupcallbacks need to be audited to not depend upon any state which is only set in the various CRTC helper callbacks and not tracked by the atomic state objects. This isn't required for implementing the legacy interfaces using atomic updates, but this is important to correctly implement the check/commit semantics. Especially when the commit is done asynchronously. This really is a corner-case though, but any such code must be moved into
->atomic_checkhooks and rewritten in terms of the atomic state objects.
Phase 3: Rolling out Atomic Support
With the driver backend changes from phase 1 and the state handling changes from phase 2 everything is ready for the step-by-step rollout of atomic support. Presuming nothing was missed this just consists of wiring up the ->atomic_check and ->atomic_commit implementations from the atomic helper library. And then replacing all the legacy entry pointers with the corresponding functions from the atomic helper library to implement them in terms of atomic.
The recommended order is to start with planes first, then test the ->set_config functionality. Page flips and properties are best done later since they likely need some additional work:
- The atomic helper library doesn't provide any default asynchronous commit support, since driver and hardware requirements seem to be too diverse. At least until we have a few proper implementations and can use them to extract a good set of helper functions. Hence drivers must implement basic async commit support using the building blocks provided (and other drm and kernel infrastructre like flip queues, fence callbacks and work items - hopefully soonish also vblank callbacks).
- Property values need to be moved into the relevant state object first before the corresponding implementations can be wired up. As long as the driver doesn't yet support the full atomic ioctl this can be done at leisure, but must be completed before the ioctl can be enabled. To do so drivers need to subclass the relevant state structure and reimplement all the state handling functions rolled out in phase 2.
Besides these two complications (which might require a bit of work depending upon the driver) this is all that's needed for full atomic modeset and pageflip support.
Follow-up Driver Cleanups
But there's of course quite a bit of cleanup work possible afterards!
The are some big differences between the old CRTC helper modeset logic and the new one (using the same callbacks, but completely rewritten otherwise) in the atomic helper library:
- The encoder/bridge/CRTC enabling/disabling sequence for a given modeset configuration is now always the same. Which means unused CRTC won't be disabled any more only after everything else is set up, but together with all the other blocks before enabling anything from the new configuration. Also, when an output pipeline changes the helper library will always force a full modeset of the entire pipeline.
This reduces combinatorial complexity a lot and should especially help with shared resources (like PLLs) - no longer can a modeset spuriously fail just because the old CRTC hasn't released its PLL before the new one was enabled.
- Thanks to the atomic state tracking the helper code won't lose track of the software state of each object any more. Which means disabled functions won't be disabled more than once. So all code in the driver-backend which checks the current state and acts accordingly can be flattened and replaced by WARNings.
These are all lessons learned from the i915 modeset rewrite. The only thing missing in the atomic helpers compared to i915 is the state readout and cross-checking support - everything else is there. But even that can be easily implemented by adding hardware state readout callbacks and using them in the various state reset functions (to reconstruct matching software state) and also to cross-check state.
The other big cleanup task is to stop using all the legacy state variables and switch all the driver backend code to only look at the state object structures. The two big examples here are
crtc->mode and the
So What Now?
With all that converting drivers should be simple and can be done with a series of not-too-invasive refactorings. But my patch series doesn't yet contain the actual atomic modeset ioctl. So what's left to be done in the drm core?
- Per-plane locking is still missing - currently all plane-related changes just lock all CRTCs. Which is a bit too much, and in cases like the cursor plane or for page flips actually a regression compared to the legacy code paths.
- The atomic ioctl uses properties for everything, even for the standard properties inherited from the legacy ioctls. All the code for parsing properties exists already in Rob Clark's patch series, but needs to be rebased and adpated to the slightly different interfaces this latest iteration of the internal atomic interface has.
- The fbdev emulation needs to grow proper atomic check/commit support. This is both a good kernel-internal validation of the atomic interface and would finally allow us to get multi-pipe configuration for fbcon to work correctly. But before we can do this we need a driver with multiple CRTCs, shared resource constraints and proper atomic support to be able to even test this.
- There's still some room for more helpers, for example pretty much all drivers have some sort of vblank driver callback and work item infrastructure. That's better done as a helper in the core vblank handling code.
- And finally we need the actual ioctl code.
So still a few things to do, besides adding atomic support to all drivers.
Update: The explanation for how to implement state readout and cross checking was a bit confused, so I reworded that.
03 Nov 2014 10:23pm GMT
02 Nov 2014
Since Dave Airlie moved the feature cut-off of the drm-next tree roughly one month ahead it is already time for our regular look at what's ahead. Even though the 3.17 features aren't even released yet.
On the modeset side of things we now have the final pieces for plane rotation support from Sonika Jindal and Ville. The DisplayPort code has also seen lots of improvements, with updated training values in preparation of the latest eDP standard (Sonika Jindal) and support for DP training pattern 3 (Ville). DSI panels now support burst mode (Shobhit) and hdmi conformance has been improved with some fixes from Clint Taylor.
For eDP panels we also have improved panel power sequencing code, mostly to fix issues on Cherryview, from Ville. Ville has also contributed fixes to the VDD handling code, which is used to temporarily enable panel power. And the backlight code learned to handle the bl_power setting so that the backlight can be turned off completely without upsetting the panel's power sequencing, contributed by Jani.
Chris Wilson has also been fairly busy on the modeset code: 3.18 includes his patches to cache EDIDs for a single probe call - unfortunately the full caching solution to keep the EDID around between multiple probe calls isn't merged yet. And pageflips have now improved error detection and recovery logic: In case something goes wrong we shouldn't end up stuck any longer waiting for a pageflip to complete that has been lost by either the hardware or the driver.
Moving on to platform specific work there's been lots of preparations for Skylake, most of it from Damien and Sonika. The actual intial platform enabling is delayed for 3.19 though. On the other end of the timeline Ville fixed up i830M modeset support on a rainy w/e in his vacation, and 3.18 now has all that code. And there has been a lot of Cherryview fixes all over.
Cherryview also gained support for power wells and hence runtime pm (Ville). And for platform agnostic feature a lot of the preparation for DRRS (dynamic refresh rate switching) is merged, hopefully the actual feature patches from Vandana Kannan will land in 3.19.
Moving on the render side of the driver there's been a lot of patches to beat the full ppgtt support into shape. The context code has been cleaned up, lifetime handling for ppgtt address spaces is fixed and bad interactions with secure batches are now also rectified. Enabling full ppgtt missed the feature cutoff by a hair though, but it's already enabling for the following release.
Basic support for execlists command submission from Ben Widawsky, Oscar Mateo and Thomas Daniel was also merged. This is the fancy new way to submit commands available on Gen8 and subsequent platforms. It's not yet enabled by default, but since it's a requirement for a lot of cool new features keep an eye on what's going on here. There is also a lot of work going on underneath to enable all this new code in GEM, like preparing to switch away from sequence numbers to tracking gpu progress more abstractly using the driver's request structures.
And this time around there is also some cool stuff going on in the drm core worth of a shout-out: The vblank handling code is massively revamped, hopefully plugging all the small races, inconsistencies and inefficiencies in that code. And thanks to David Herrmann it is finally possible to write a drm driver without the drm midlayer getting completely in the way of a proper driver load and unload sequence! Unfortunately i915 can't be converted right away since the legacy usermodesetting code crucial relies on this midlayer functionality. But that's now well deprecated and hopefully can be removed in one of the next releases.
02 Nov 2014 2:31pm GMT