21 Feb 2018


Alan Coopersmith: Oracle Solaris 11.3 SRU 29 Released

We've just released Oracle Solaris 11.3 SRU 29. It contains some important security fixes and enhancements. SRU29 is now available from My Oracle Support Doc ID 2045311.1, or via 'pkg update' from the support repository at https://pkg.oracle.com/solaris/support .

Features included in this SRU include:

The SRU also updates the following components which have security fixes:

Full details of this SRU can be found in My Oracle Support Doc 2361795.1
For the list of Service Alerts affecting each Oracle Solaris 11.3 SRU, see Important Oracle Solaris 11.3 SRU Issues (Doc ID 2076753.1).

21 Feb 2018 5:51pm GMT

Alyssa Rosenzweig: My Name is Cafe Beverage

I have a secret, so secret that it cannot be whispered without multiple layers of encryption. So secret that it could only be told to the world under a pseudonym -- an alter ego -- except to the closest of friends on a "need to know" basis. So secret that this pseudonym only connected to the Internet through Tor, using the Tor Browser Bundle and Tor Messenger to maintain resilience against even the most subtle of identity leaks.

So secret that I'll write it on my blog under my real legal name, for all the world to see and criticise:

Over the summer I founded the chai project (proof). My goal was to write a free software driver for Midgard GPUs, better known as the Mali T series produced by ARM. Why? Libreboot supports the Rockchip RK3288 chipset, containing a Mali T760 GPU that is unusable without proprietary software. I purchased a laptop1 with this chipset, and thus my work began.

I was not the first to free the GPU. Luc Verhaegan, better known by his handle libv, precedes me. His project, Tamil, reverse engineered enough of the command stream2 to display a textured cube at FOSDEM.

But the project ceased there due to the pushback; he never released his work. To this day, the source code for Tamil remains only on the hard drive of libv, invisible to the rest of the free software community. Without that source code, Tamil could not enable free graphics acceleration on my own laptop.

I was aware of his ill-fated tale. That said, I was not discouraged. My work with the GPU would proceed, only carefully enough to avoid his own fate. For starters, to avoid the risk that a future me would withhold the source code, I published everything from the beginning, immediately as it was written. Thankfully, the source code remains online, downloaded by many others, spread out between several laptops and servers hosted in multiple countries. Retroactive attempts to censor me would inevitably fail thanks to the Streisand effect. The work is safe.

My other change was to adopt a pseudonym. By contrast, libv worked under his real name, so when there was alleged retaliation, they knew exactly who to target. His real person was exposed to damage as a result of Tamil. This could be a problem for me: depending on my life aspirations and my personal resilience, to work on this GPU under my real name could pose a threat to me as well. Thus, I, Alyssa Rosenzweig, student, writer, programmer, and artist, did not author the work.

Cafe Beverage did. If the project were to come into jeopardy, no harm would come to me. Only my alias cafe was at risk. And at least at first, I was surgically careful to ensure that an outsider could not correlate my two identities.

Yet here I am, six months later, proclaiming it for all to hear: I am Cafe Beverage.

What changed?

I was no longer able to handle the cognitive load of a second persona. Technically, I am well-versed in anonymity software, as an avid user of Tor, GPG, and OTR. But socially and psychologically? Nobody warned me of the mental toll it would take to have to separate myself from, well, myself.

To use anonymity technologies passively is easy. When simply browsing the web in peace, merely exercising the anonymous right to read, the only burden is the frustration of the network speed itself. But once you begin to contribute, changing from anonymity to pseudonymity, the complexity balloons. It becomes necessary to scrutinise every little thing you do, in both your pseudonymous and real lives, to ensure that there will be no crossing over whatsoever.

There is no golden rule to staying pseudonymous. No, there are dozens of rules, most of which are easy to ignore and difficult to follow. It is nearly impossible to balance it all for anyone not already well-versed in operational security ("opsec").

What types of precautions are needed? The Whonix project maintains a short list of twenty-two basic missteps which compromise your anonymity. They also maintain a specific guide on anonymous "Surfing, Posting, and Blogging" with another endless list of attacks to defend against. There are, of course, the precautions given by the Tor project itself, although really you should read through the entirety of the Tor FAQ -- it's only twenty-one thousand words.

Of course, the mind-boggling complexity is warranted. For an activist in a repressive regime, perhaps the archetypical blogger in Iran, these precautions are necessary and hardly enough. For someone in such a situation, the slightest identity slip is lethal. Maintaining full separation of identities is difficult, but when a user's own government is determined to kill dissidents, there are bigger concerns than the cognitive burden of pseudonymity.

But for me? For me, there is no risk of physical injury from my research. libv is still alive despite his "secret" being out for years. There are no black bags in my future from publicising who cafe really is.

Of course, there is a third option: disappear. Stop using the cafe identity, pretend it never existed, let my nick on IRC expire. The other contributors would not know me well enough to compromise me. I could cease the mental gymnastics. There would be virtually no risk. As far as an observer of my non-anonymous self is concerned, the project never existed.

For six months, I chose this option, or rather I succumbed to it. The quantity of my code contributions slowed to a halt; the frequency of my check-ins to IRC faded soon thereafter. At some point, I must have logged in as cafe for the last time, but I do not remember this final milestone. I merely faded into obscurity, until one day I wiped my hard drive during a routine operating system install and lost cafe's passwords. I don't mourn the missing files.

But the new year has come. I have switched computers; today, I am using my Libreboot laptop full-time. This post is being written from vim on that very laptop, running only free software. I need the free Midgard GPU drivers more than ever, and I am willing to put in the work to bring them into fruition.

So note the pronoun: I will be the one continuing on the project, alongside the awesome folks from the BiOpenly community. Not cafe nor a new pseudonym conjured from mid-air, but I. GPU driver development is difficult enough without the mental juggling associated with creating a person out of nothing.

To come out of the shadows brings legitimacy to the project. It will clear up any legal uncertainties surrounding copyright, the complexities of which are amplified when the author of a work is a pseudonym who disappeared. It will allow me to take ownership of my driver work, for instance on a C.V.

Most of all, it will allow me to be myself. It will grant me a different type of digital freedom, a familiar breath of fresh air. Coming out as the author of chai is, in many respects, the same as coming out as a queer person.

My name is Cafe Beverage.

It feels good to write it.

  1. An Asus C201 Chromebook made in 2015

  2. GPU parlance for the protocol used to interface with the hardware. It is also necessary to reverse engineer the shader binaries, a task which was completed successfully by Connor Abbott

21 Feb 2018 4:16pm GMT

Alyssa Rosenzweig: Hello, Triangle!

whistles -- Nothing to see here, move along kids.

Hello, Triangle!

Hello, Triangle!

21 Feb 2018 4:16pm GMT

Nicolai Hähnle: TableGen #2: Functional Programming

This is the second part of a series; see the first part for a table of contents.

When the basic pattern of having classes with variables that are filled in via template arguments or let-statements reaches the limits of its expressiveness, it can become useful to calculate values on the fly. TableGen provides string concatenation out of the box with the paste operator ('#'), and there are built-in functions which can be easily recognized since they start with an exclamation mark, such as !add, !srl, !eq, and !listconcat. There is even an !if-builtin and a somewhat broken and limited !foreach.

There is no way of defining new functions, but there is a pattern that can be used to make up for it: classes with ret-values:

class extractBit<int val, int bitnum> {
bit ret = !and(!srl(val, bitnum), 1);

class Foo<int val> {
bit bitFour = extractBit<val, 4>.ret;

def Foo1 : Foo<5>;
def Foo2 : Foo<17>;

This doesn't actually work in LLVM trunk right now because of the deficiencies around anonymous record instantiations that I mentioned in the first part of the series, but after a lot of refactoring and cleanups, I got it to work reliably. It turns out to be an extremely useful tool.

In case you're wondering, this does not support recursion and it's probably better that way. It's possible that TableGen is already accidentally Turing complete, but giving it that power on purpose seems unnecessary and might lead to abuse.

Without recursion, a number of builtin functions are required. There has been a !foreach for a long time, and it is a very odd duck:

def Defs {
int num;

class Example<list<int> nums> {
list<int> doubled = !foreach(Defs.num, nums, !add(Defs.num, Defs.num));

def MyNums : Example<[4, 1, 9, -3]>;

In many ways it does what you'd expect, except that having to define a dummy record with a dummy variable in this way is clearly odd and fragile. Until very recently it did not actually support everything you'd think even then, and even with the recent fixes there are plenty of bugs. Clearly, this is how !foreach should look instead:

class Example<list<int> nums> {
list<int> doubled =
!foreach(x, nums, !add(x, x));

def MyNums : Example<[4, 1, 9, -3]>;

... and that's what I've implemented.

This ends up being a breaking change (the only one in the whole series, hopefully), but !foreach isn't actually used in upstream LLVM proper anyway, and external projects can easily adapt.

A new feature that I have found very helpful is a fold-left operation:

class Enumeration<list<string> items> {
list<string> ret = !foldl([], items, lhs, item,
!listconcat(lhs, [!size(lhs) # ": " # item]));

def MyList : Enumeration<["foo", "bar", "baz"]>;

This produces the following record:

def MyList { // Enumeration
list<string> ret = ["0: foo", "1: bar", "2: baz"];
string NAME = ?;

Needless to say, it was necessary to refactor the TableGen tool very deeply to enable this kind of feature, but I am quite happy with how it ended up.

The title of this entry is "Functional Programming", and in a sense I lied. Functions are not first-class values in TableGen even with my changes, so one of the core features of functional programming is missing. But that's okay: most of what you'd expect to have and actually need is now available in a consistent manner, even if it's still clunkier than in a "real" programming language. And again: making functions first-class would immediately make TableGen Turing complete. Do we really want that?

21 Feb 2018 10:25am GMT

Nicolai Hähnle: TableGen #1: What has TableGen ever done for us?

This is the first entry in an on-going series. Here's a list of all entries:

  1. What has TableGen ever done for us?
  2. Functional Programming
  3. to be continued

Anybody who has ever done serious backend work in LLVM has probably developed a love-hate relationship with TableGen. At its best it can be an extremely useful tool that saves a lot of manual work. At its worst, it will drive you mad with bizarre crashes, indecipherable error messages, and generally inscrutable failures to understand what you want from it.

TableGen is an internal tool of the LLVM compiler framework. It implements a domain-specific language that is used to describe many different kinds of structures. These descriptions are translated to read-only data tables that are used by LLVM during compilation.

For example, all of LLVM's intrinsics are described in TableGen files. Additionally, each backend describes its target machine's instructions, register file(s), and more in TableGen files.

The unit of description is the record. At its core, a record is a dictionary of key-value pairs. Additionally, records are typed by their superclass(es), and each record can have a name. So for example, the target machine descriptions typically contain one record for each supported instruction. The name of this record is the name of the enum value which is used to refer to the instruction. A specialized backend in the TableGen tool collects all records that subclass the Instruction class and generates instruction information tables that is used by the C++ code in the backend and the shared codegen infrastructure.

The main point of the TableGen DSL is to provide an ostensibly convenient way to generate a large set of records in a structured fashion that exploits regularities in the target machine architecture. To get an idea of the scope, the X86 backend description contains ~47k records generated by ~62k lines of TableGen. The AMDGPU backend description contains ~39k records generated by ~24k lines of TableGen.

To get an idea of what TableGen looks like, consider this simple example:

def Plain {
int x = 5;

class Room<string name> {
string Name = name;
string WallColor = "white";

def lobby : Room<"Lobby">;

multiclass Floor<int num, string color> {
let WallColor = color in {
def _left : Room<num # "_left">;
def _right : Room<num # "_right">;

defm first_floor : Floor<1, "yellow">;
defm second_floor : Floor
<2, "gray">;

This example defines 6 records in total. If you have an LLVM build around, just run the above through llvm-tblgen to see them for yourself. The first one has name Plain and contains a single value named x of value 5. The other 5 records have Room as a superclass and contain different values for Name and WallColor.

The first of those is the record of name lobby, whose Name value is "Lobby" (note the difference in capitalization) and whose WallColor is "white".

Then there are four records with the names first_floor_left, first_floor_right, second_floor_left, and second_floor_right. Each of those has Room as a superclass, but not Floor. Floor is a multiclass, and multiclasses are not classes (go figure!). Instead, they are simply collections of record prototypes. In this case, Floor has two record prototypes, _left and _right. They are instantiated by each of the defm directives. Note how even though def and defm look quite similar, they are conceptually different: one instantiates the prototypes in a multiclass (or several multiclasses), the other creates a record that may or may not have one or more superclasses.

The Name value of first_floor_left is "1_left" and its WallColor is "yellow", overriding the default. This demonstrates the late-binding nature of TableGen, which is quite useful for modeling exceptions to an otherwise regular structure:

class Foo {
string salutation = "Hi";
string message = salutation#", world!";

def : Foo {
salutation = "Hello";

The message of the anonymous record defined by the def-statement is "Hello, world!".

There is much more to TableGen. For example, a particularly surprising but extremely useful feature are the bit sets that are used to describe instruction encodings. But that's for another time.

For now, let me leave you with just one of the many ridiculous inconsistencies in TableGen:

class Tag<int num> {
int Number = num;

class Test<int num> {
int Number1 = Tag<5>.Number;
int Number2 = Tag<num>.Number;
Tag Tag1 = Tag<5>;
Tag Tag2 = Tag<num>;

def : Test<5>;

What are the values in the anonymous record? It turns out that Number1 and Number2 are both 5, but Tag1 and Tag2 refer to different records. Tag1 refers to an anonymous record with superclass Tag and Number equal to 5, while Tag2 also refers to an anonymous record, but with the Number equal to an unresolved variable reference.

This clearly doesn't make sense at all and is the kind of thing that sometimes makes you want to just throw it all out of the window and build your own DSL with blackjack and Python hooks. The problem with that kind of approach is that even if the new thing looks nicer initially, it'd probably end up in a similarly messy state after another five years.

So when I ran into several problems like the above recently, I decided to take a deep dive into the internals of TableGen with the hope of just fixing a lot of the mess without reinventing the wheel. Over the next weeks, I plan to write a couple of focused entries on what I've learned and changed, starting with how a simple form of functional programming should be possible in TableGen.

21 Feb 2018 10:24am GMT

20 Feb 2018


Robert Foss: APA102 LED Current Usage

Alt text

What we're seeing here is a the LED being fully off (albeit with floating clock and data inputs), drawing somewhere between 0.7-1 mA.

I was quite surprised to see such a high quiescent current.

For the APA102 2020 which has a 20x20mm footprint this is somewhat disappointing, not because it is worse than the normal 50x50 APA102 variants, but rather because the small footprint begs for the IC to be used in wearables and other power consumption sensitive applications.


So this is the very simple setup I was using. It's nothing fancy; a multimeter set to the mA range, connected between the power supply and the APA102 breakout board I happened to have laying around.

Alt text

20 Feb 2018 9:26pm GMT

16 Feb 2018


Dave Airlie (blogspot): virgl caps - oops I messed.up

When I designed virgl I added a capability system to pass some info about the host GL to the guest driver along the lines of gallium caps. The design was at the virtio GPU level you have a number of capsets each of which has a max version and max size.

The virgl capset is capset 1 with max version 1 and size 308 bytes.

Until now we've happily been using version 1 at 308 bytes. Recently we decided we wanted to have a v2 at 380 bytes, and the world fell apart.

It turned out there is a bug in the guest kernel driver, it asks the host for a list of capsets and allows guest userspace to retrieve from it. The guest userspace has it's own copy of the struct.

The flow is:
Guest mesa driver gives kernel a caps struct to fill out for capset 1.
Kernel driver asks the host over virtio for latest capset 1 info, max size, version.
Host gives it the max_size, version for capset 1.
Kernel driver asks host to fill out malloced memory of the max_size with the
caps struct.
Kernel driver copies the returned caps struct to userspace, using the size of the returned host struct.

The bug is the last line, it uses the size of the returned host struct which ends up corrupting the guest in the scenario where the host has a capset 1 v2, size 380, but the host is still running old userspace which understands capset v1, size 308.

The 380 bytes gets memcpy over the 308 byte struct and boom.

Now we can fix the kernel to not do this, but we can't upgrade every kernel in an existing VM. So if we allow the virglrenderer process to expose a v2 all older sw will explode unless it is also upgraded which isn't really something you want in a VM world.

I came up with some virglrenderer workarounds, but due to another bug where qemu doesn't reset virglrenderer when it should, there was no way to make it reliable, and things like kexec old kernel from new kernel would blow up.

I decided in the end to bite the bullet and just make capset 2 be a repaired one. Unfortunately this needs patches in all 4 components before it can be used.

1) virglrenderer needs to expose capset 2 with the new version/size to qemu.
2) qemu needs to allow the virtio-gpu to transfer capset 2 as a virgl capset to the host.
3) The kernel on the host needs fixing to make sure we copy the minimum of the host caps and the guest caps into the guest userspace driver, then it needs to
provide a way that guest userspace knows the fixed version is in place.
4) The guest userspace needs to check if the guest kernel has the fix, and then query capset 2 first, and fallback to querying capset 1.

After talking to a few other devs in virgl land, they pointed out we could probably just never add a new version of capset 2, and grow the struct endlessly.

The guest driver would fill out the struct it wants to use with it's copy of default minimum values.
It would then call the kernel ioctl to copy over the host caps. The kernel ioctl would copy the minimum size of the host caps and the guest caps.

In this case if the host has a 400 byte capset 2, and the guest still only has 380 byte capset 2, the new fields from the host won't get copied into the guest struct
and it will be fine.

If the guest has the 400 byte capset 2, but the host only has the 380 byte capset 2, the guest would preinit the extra 20 bytes with it's default values (0 or whatever) and the kernel would only copy 380 bytes into the start of the 400 bytes and leave the extra bytes alone.

Now I just have to got write the patches and confirm it all.

Thanks to Stephane at google for creating the patch that showed how broken it was, and to others in the virgl community who noticed how badly it broke old guests! Now to go write the patches...

16 Feb 2018 12:11am GMT

14 Feb 2018


Hans de Goede: i915 driver Panel Self Refresh (PSR) status update

Hi All,

First of Thank you to everyone who has been sending me PSR test results, I've received well over a 100 reports!

Quite a few testers have reported various issues when enabling PSR, 3 often reported issues are:

The Intel graphics team has been working on a number of fixes which make PSR work better in various cases. Note we don't expect this to fix it everywhere, but it should get better and work on more devices in the near future.

This is good news, but the bad news is that this means that all the tests people have so very kindly done for me will need to be redone once the new improved PSR code is ready for testing. I will do a new blogpost (and email people who have send me test-reports), when the new
PSR code is ready for people to (re-)test (sorry).



14 Feb 2018 8:57am GMT

13 Feb 2018


Robert Foss: Fixing .zsh_history corruption

zsh: corrupt history file /home/$USER/.zsh_history

Most zsh user will have seen the above line at one time or another.
And it means that re-using your shell history is no longer possible.

Maybe some of it can be recovered, but more than likely some has been lost. And even if nothing important has been lost, you probably don't want to spend any time dealing with this.

Make zsh maintain a backup

Run this snippet in the terminal of your choice.

cat <<EOT>> ~/.zshrc

# Backup and restore ZSH history
strings ~/.zsh_history | sed ':a;N;$!ba;s/\\\\\n//g' | sort | uniq -u > ~/.zsh_history.backup
cat ~/.zsh_history ~/.zsh_history.backup | sed ':a;N;$!ba;s/\\\\\n//g'| sort | uniq > ~/.zsh_history


What does this actually do?

The snippet …

13 Feb 2018 7:20pm GMT

12 Feb 2018


Alan Coopersmith: posix_spawn() as an actual system call


As a developer, there are always those projects when it is hard to find a way to go forward. Drop the project for now and find another project, if only to rest your eyes and find yourself a new insight for the temporarily abandoned project. This is how I embarked on posix_spawn() as an actual system call you will find in Oracle Solaris 11.4. The original library implementation of posix_spawn() uses vfork(), but why care about the old address space if you are not going to use it? Or, worse, stop all the other threads in the process and don't start them until exec succeeded or when you call exit()?

As I had already written kernel modules for nefarious reason to run executables directly from the kernel, I decided to benchmark the simple "make process, execute /bin/true" against posix_spawn() from the library. Even with two threads, posix_spawn() scaled poorly: additional threads did not allow a large number of additional spawns per second.

Starting a new process

All ways to start a new process need to copy a number of process properties: file descriptors, credentials, priorities, resource controls, etc.

The original way to start a new process is fork(); you will need to mark all the pages as copy-on-write (O(n) in the size of the number of pages in the process) and so this gets more and more expensive when the process get larger and larger. In Solaris we also reserve all the needed swap; a large process calling fork() doubles its swap requirement.

In BSD vfork() was introduced; it borrows the address space and was cheap when it was invented. In much larger processes with hundreds of threads, it became more and more of bottleneck. Dynamic linking also throws a spanner in the works: what you can do between vfork() and the final exec() is extremely small.

In the standard universe, posix_spawn() was invented; it was aimed mostly at small embedded systems and a very number of specific actions can be performed before the new executable is run. As it was part of the standard, Solaris grew its own copy build on top of vfork(). It has, of course, the same problems as vfork() has; but because it is implemented in the library we can be sure we steer clear from all the other vfork() pitfalls.

Native spawn(2) call

The native spawn(2) system introduced in Oracle Solaris 11.4 shares a lot of code with the forkx(2) and execve(2). It mostly avoids doing those unneeded operations:

The exec() call copies from its own address space but when spawn(2) needs the argument, it is already in a new process. So early in the spawn(2) system call we copy the environment vector and the arguments and save them away. The data blob is given to the child and the parent waits until the client is about to return from the system call in the new process or when it decides that it can't actually exec and calls exit instead.

A process can spawn(2) in all its threads and the concurrently is only limited by locks that need to be held shortly when processes are created.

The performance win depends on the application; you won't win anything unless you use posix_spawn(); I was very happy to see that our standard shell is using posix_spawn() to start new processes as do popen(3c) as well as system(3c) so the call is well tested. The more threads you have, the bigger the win. Stopping a thread is expensive, especially if it hold up in a system call. The world used to stop but now it just continues.

Support in truss(1), mdb(1)

When developing a new system call special attention needs to be given to proc(5) and truss(1) interaction. The spawn(2) system call is an exception but only because it is much harder to get it right; support is also needed in debuggers or they won't see a new process starting. This includes mdb(1) but also truss(1). They also need to learn that when spawn(2) succeeds, that they are stopping in a completely different executable; we may also have crossed a privilege boundary, e.g., when spawning su(8) or ping(8).

12 Feb 2018 5:00pm GMT

Eric Anholt: 2018-02-12

I spent the end of January gearing up for LCA, where I gave a talk about what I've done in Broadcom graphics since my last LCA talk 3 years earlier. Video is here.

(Unfortunately, I failed to notice the time countdown, so I didn't make it to my fun VC5 demo, which had to be done in the hallway after)

I then spent the first week of February in Cambridge at the Raspberry Pi office working on vc4. The goal was to come up with a plan for switching to at least the "fkms" mode with no regressions, with a route to full KMS by default.

The first step was just fixing regressions for fkms in 4.14. The amusing one was mouse lag, caused by us accidentally syncing mouse updates to vblank, and an old patch to reduce HID device polling to ~60fps having been accidentally dropped in the 4.14 rebase. I think we should be at parity-or-better compared to 4.9 now.

For full KMS, the biggest thing we need to fix is getting media decode / camera capture feeding into both VC4 GL and VC4 KMS. I wrote some magic shader code to turn linear Y/U/V or Y/UV planes into tiled textures on the GPU, so that they can be sampled from using GL_OES_EGL_image_external. The kmscube demo works, and working with Dave Stevenson I got a demo mostly working of H.264 decode of Big Buck Bunny into a texture in GL on X11.

While I was there, Dave kept hammering away at the dma-buf sharing work he's been doing. Our demo worked by having a vc4 fd create the dma-bufs, and importing that into vcsm (to talk MMAL to) and into the vc4 fd used by Mesa (mmal needs the buffers to meet its own size restrictions, so VC4 GL can't do the allocations for it). The extra vc4 fd is a bit silly - we should be able to take vcsm buffers and export them to vc4.

Also, if VCSM could do CMA allocations for us, then we could potentially have VCSM take over the role of allocating heap for the firmware, meaning that you wouldn't need big permanent gpu_mem= memory carveouts in order for camera and video to work.

Finally, on the last day Dave got a bit distracted and put together VC4 HVS support for the SAND tiling modifier. He showed me a demo of BBB H.264 decode directly to KMS on the console, and sent me the patch. I'll do a little bit of polish, and send it out once I get back from vacation.

We also talked about plans for future work. I need to:

Dave plans to:

Someone needs to:

Finally, other little updates:

12 Feb 2018 12:30am GMT

11 Feb 2018


Rob Clark: Infrequent freedreno update

As is usually the case, I'm long overdue for an update. So this covers the last six(ish) months or so. The first part might be old news if you follow phoronix.

Older News

In the last update, I mentioned basic a5xx compute shader support. Late last year (and landing in the mesa 18.0 branch) I had a chance to revisit compute support for a5xx, and finished:
  • image support
  • shared variable support
  • barriers, which involved some improvements to the ir3 instruction scheduler so barriers could be scheduled in the correct order (ie. for various types of barriers, certain instructions can't be move before/after the related barrier
There were also some semi-related SSBO fixes, and additional r/e of instruction encodings, in particular for barriers (new cat7 group of instructions) and image vs SSBO (where different variation of the cat6 instruction encoding are used for images vs SSBOs).

Also I r/e'd and added support for indirect compute, indirect draw, texture-gather, stencil textures, and ARB_framebuffer_no_attachments on a5xx. Which brings us pretty close to gles31 support. And over the holiday break I r/e'd and implemented tiled texture support, because moar fps ;-)

Ilia Mirkin also implemented indirect draw, stencil texture, and ARB_framebuffer_no_attachments for a4xx. Ilia and Wladimir J. van der Laan also landed a handful of a2xx and a20x fixes. (But there are more a20x fixes hanging out on a branch which we still need to rebase and merge.) It is definitely nice seeing older hw, which blob driver has long since dropped support for, getting some attention.

Other News

Not exactly freedreno related, but probably of some interest to freedreno users.. in the 4.14 kernel, my qcom_iommu driver finally landed! This was the last piece to having the gpu working on a vanilla upstream kernel on the dragonboard 410c. In addition, the camera driver also landed in 4.14, and venus, the v4l2 mem-to-mem driver for hw video decode/encode landed in 4.13. (The venus driver also already has support for db820c.)

Fwiw, the v4l2 mem-to-mem driver interface is becoming the defacto standard for hw video decode/encode on SoC's. GStreamer has had support for a long time now. And more recently ffmpeg (v3.4) and kodi have gained support:

When I first started on freedreno, qcom support for upstream kernel was pretty dire (ie. I think serial console support might have worked on some ancient SoC). When I started, the only kernel that I could use to get the gpu running was old downstream msm android kernels (initially 2.6.35, and on later boards 3.4 and 3.10). The ifc6410 was the first board that I (eventually) could run an upstream kernel (after starting out with an msm-3.4 kernel), and the db410c was the first board I got where I never even used an downstream android kernel. Initially db410c was upstream kernel with a pile of patches, although the size of the patchset dropped over time. With db820c, that pattern is repeating again (ie. the patchset is already small enough that I managed to easily rebase it myself for after 4.14). Linaro and qcom have been working quietly in the background to upstream all the various drivers that something like drm/msm depend on to work (clk, genpd, gpio, i2c, and other lower level platform support). This is awesome to see, and the linaro/qcom developers behind this progress deserve all the thanks. Without much fanfare, snapdragon has gone from a hopeless case (from upstream perspective) to one of the better supported platforms!

Thanks to the upstream kernel support, and u-boot/UEFI support which I've mentioned before, Fedora 27 supports db410c out of the box (and the situation should be similar with other distro's that have new enough kernel (and gst/ffmpeg/kodi if you care about hw video decode). Note that the firmware for db410c (and db820c) has been merged in linux-firmware since that blog post.

More Recent News

More recently, I have been working on a batch of (mostly) compiler related enhancements to improve performance with things that have more complex shaders. In particular:
  • Switch over to NIR's support for lowering phi-web's to registers, instead of dealing with phi instructions in ir3. NIR has a much more sophisticated pass for coming out of SSA, which does a better job at avoiding the need to insert extra MOV instructions, although a bunch of RA (register allocation) related fixes were required. The end result is fewer instructions in resulting shader, and more importantly a reduction in register usage.
  • Using NIR's peephole_select pass to lower if/else, instead of our own pass. This was a pretty small change (although it took some work to arrive at a decent threshold). Previously the ir3_nir_lower_if_else pass would try to lower all if/else to select instructions, but in extreme cases this is counter-productive as it increases register pressure. (Background: in simple cases for a GPU, executing both sides of an if/else and using a select instruction to choose the results makes sense, since GPUs tend to be a SIMT arch, and if you aren't executing both sides, you are stalling threads in a warp that took the opposite direction in the if/else.. but in extreme cases this increases register usage which reduces the # of warps in flight.) End result was 4x speedup in alu2 benchmark, although in the real world it tends to matter less (ie. most shaders aren't that complex).
  • Better handling of sync flags across basic blocks
  • Better instruction scheduling across basic blocks
  • Better instruction scheduling for SFU instructions (ie. sqrt, rsqrt, sin, cos, etc) to avoid stalls on SFU.
  • R/e and add support for (sat)urate flag flag (to avoid extra sequence of min.f + max.f instructions to clamp a result)
  • And a few other tweaks.
The end results tend to depend on how complex the shaders that a game/benchmark uses. At the extreme high end, 4x improvement for alu2. On the other hand, probably doesn't make much difference for older games like xonotic. Supertuxkart and most of the other gfxbench benchmarks show something along the lines of 10-20% improvement. Supertuxkart, in particular, with advanced pipeline, the combination of compiler improvements with previous lrz and tiled texture (ie. FD_MESA_DEBUG=lrz,ttile) is a 30% improvement! Some of the more complex shaders I've been looking at, like shadertoy piano, show 25% improvement on the compiler changes alone. (Shadertoy isn't likely to benefit from lrz/ttile since it is basically just drawing a quad with all the rendering logic in the fragment shader.)

In other news, things are starting to get interesting for snapdragon 845 (sdm845). Initial patches for a6xx GPU support have been posted (although I still need to my hands on a6xx hw to start r/e for userspace, so those probably won't be merged soon). And qcom has drm/msm display support buried away in their msm-4.9 tree (expect to see first round of patches for upstream soon.. it's a lot of code, so expect some refactoring before it is merged, but good to get this process started now).

11 Feb 2018 10:46pm GMT

09 Feb 2018


Robert Foss: Virtualizing GPU Access

For the past few years a clear trend of containerization of applications and services has emerged. Having processes containerized is beneficial in a number of ways. It both improves portability and strengthens security, and if done properly the performance penalty can be low.

In order to further improve security containers are commonly run in virtualized environments. This provides some new challenges in terms of supporting the accelerated graphics usecase.

OpenGL ES implementation

Currently Collabora and Google are implementing OpenGL ES 2.0 support. OpenGL ES 2.0 is the lowest common denominator for many mobile platforms and as such is a requirement for Virgil3D to be viable on the those platforms.

That is is the motivation for making Virgil3D work on OpenGL ES hosts.

How …

09 Feb 2018 10:17am GMT

08 Feb 2018


Alan Coopersmith: Installing Packages — Oracle Solaris 11.4 Beta

We've been getting random questions about how to install (Oracle Solaris) packages onto their newly installed Oracle Solaris 11.4 Beta. And of course key is pointing to the appropriate IPS repository.

One of the options is to download the full repository and install it on it's own locally or add this to an existing local repository and then just point the publisher to this local repository. This is mostly used by folks who have a test system/LDom/Kernel Zone where they will probably have one or more local repositories already.

However experience shows that a large percentage of folks testing a beta version like this do this in a VirtualBox instance on their laptop or workstation. And because of this they want to use the Gnome Desktop rather than remotely logging through ssh. So one of the things we do is supply an Oracle VM Template for VirtualBox which already has the solaris-desktop group package installed (officially the group/system/solaris-desktop) so it shows more than the console when started and give you the ability to run desktop like tasks like Firefox and a Terminal. (Btw as per the Release Notes on Runtime Issues there's a glitch with gnome-terminal you might run into and you'd need to run a workaround to get it working.)

For this group of VirtualBox based testers the chances are high that they're not going to have a local repository nearby, especially on a laptop that's moving around. This is where using our central repository at pkg.oracle.com is very useful which is well described in the Oracle Solaris documentation.

However going through this there may be some minor obstacles to clear when using this method that aren't directly part of the process but get in the way when using the VirtualBox installed OVM Template.

First, when using the Firefox browser to request certificates and download certificates and later point to the repository you'll need to have DNS working and depending on the install the DNS client may not yet be enabled. Here's how you check it:

demo@solaris-vbox:~$ svcs dns/client STATE STIME FMRI disabled 5:45:26 svc:/network/dns/client:default

This is fairly simple to solve. First check that the Oracle Solaris instance has correctly picked up the DNS information from VirtualBox in the DHCP process buy looking in /etc/resolv.conf. If that looks good simply enable the dns/client service:

demo@solaris-vbox:~$ sudo svcadm enable dns/client

You'll be asked for your password and then it will be enabled. Note you can also use pfexec(1) instead of sudo(8). This will also check if your user has the appropriate privileges.

You can check if the service is running:

demo@solaris-vbox:~/Downloads$ svcs dns/client STATE STIME FMRI online 10:21:16 svc:/network/dns/client:default

Now DNS is running you should be able to ping pkg.oracle.com.

The second gotya is that on the pkg-register.oracle.com page the Oracle Solaris 11.4 Beta repository is at the very bottom of the list of available repositories and should not be confused with the Oracle Solaris 11 Support repository (to which you may already have requested access) listed at the top of the page.

The same certificate/key pair are used for any of the Oracle Solaris repositories, however in order permit the use of the any existing cert/key pair the license for the Oracle Solaris 11.4 Beta repository must be accepted. This means selecting the 'Request Access' button next to the Solaris 11.4 Beta repository entry.

Once you have the cert/key, or you have accepted the license, then you can configure the beta repository as:

pkg set-publisher -k <your-key> -c <your-cert> -g https://pkg.oracle.com/solaris/beta solaris

With the Virtual Box image the default repository setup includes the 'release' repository. It is best to remove that:

pkg set-publisher -G http://pkg.oracle.com/solaris/release solaris

This can be performed in one command:

pkg set-publisher -k <your-key> -c <your-cert> -G http://pkg.oracle.com/solaris/release\ -g https://pkg.oracle.com/solaris/beta solaris

Note that here too you'll need to either use pfexec(1) or sudo(8) again. This should kickoff the pkg(1) command and once it's done you can check it's status with:

demo@solaris-vbox:~/Downloads$ pkg publisher solaris Publisher: solaris Alias: Origin URI: https://pkg.oracle.com/solaris/beta/ Origin Status: Online SSL Key: /var/pkg/ssl/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx SSL Cert: /var/pkg/ssl/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx Cert. Effective Date: January 29, 2018 at 03:04:58 PM Cert. Expiration Date: February 6, 2020 at 03:04:58 PM Client UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx Catalog Updated: January 24, 2018 at 02:09:16 PM Enabled: Yes

And now you're up and running.

A final thought, if for example you've chosen to install the Text Install version of the Oracle Solaris 11.4 Beta because you want yo have a nice minimal install with no overhead of Gnome and things like that, you can also download the key and certificate to another system or the hosting OS (in the case you're using VirtualBox) and then rsync or rcp them across and then follow all the same steps.

08 Feb 2018 5:00pm GMT

05 Feb 2018


Alan Coopersmith: System maintenance — evacuate all Zones!

The number one use case for live migration today is for evacuation: when a Solaris Zones host needs some maintenance operation that involves a reboot, then the zones are live migrated to some other willing host. This avoids scheduling simultaneous maintenance windows for all the services provided by those zones.

Implementing this today on Solaris 11.3 involves manually migrating zones with individual zoneadm migrate commands, and especially, determining suitable destinations for each of the zones. To make this common scenario simpler and less error prone, Solaris 11.4 Beta comes with a new command sysadm(8) for system maintenance that also allows for zone evacuation.

The basic idea of how it is supposed to be used is like this:

# pkg update ... # sysadm maintain -s -m "updating to new build" # sysadm evacuate -v Evacuating 3 zones... Migrating myzone1 to rads://destination1/ ... Migrating myzone3 to rads://destination1/ ... Migrating myzone4 to rads://destination2/ ... Done in 3m30s. # reboot ... # sysadm maintain -e # sysadm evacuate -r ...

When in maintenance mode, an attempt to attach or boot any zone is refused: if the admin is trying to move zones off the host, it's not helpful to allow incoming zones. Note that this maintenance mode is recorded system-wide, not just in the zones framework; even though the only current impact is on zones, it seems likely other sub-systems may find it useful in the future.

To set up an evacuation target for a zone, an SMF property evacuation/target for a given zone service instance system/zones/zone:<zone-name> must be set to the target host. You can either use rads:// or ssh:// location identifier, e.g.: ssh://janp@mymachine.mydomain.com. Do not forget to refresh the service instance for your change to take effect.

You can evacuate running Kernel Zones and both installed native and Kernel Zones. The evacuation always means evacuating running zones, and with the option -a, installed zones are included as well. Only those zones with the set evacuation/target property in their service instance are scheduled for evacuation. However, if any of the running zone (and also installed if the evacuate -a is used) is not set with the property, the overall result of the evacuation will be reported as failed by sysadm which is logical as an evacuation by its definition means to evacuate everything.

As live zone migration does not support native zones, those can be only evacuated in the installed state. Also note that you can only evacuate zones installed on shared storage. For example, on iSCSI volumes. See the storage URI manual page, suri(7), for information on what other shared storage is supported. Note that you can install Kernel Zones to NFS files as well.

To setup live Kernel Zone migration, please check out Migrating an Oracle Solaris Kernel Zone section of the 11.4 online documentation.

Now, let's see a real example. We have a few zones on host nacaozumbi. All running and installed zones are on shared storage, including the native zone tzone1 and Kernel Zone evac1:

root:nacaozumbi:~# zonecfg -z tzone1 info rootzpool rootzpool: storage: iscsi://saison/luname.naa.600144f0dbf8af1900005582f1c90007 root:nacaozumbi::~$ zonecfg -z evac1 info device device: storage: iscsi://saison/luname.naa.600144f0dbf8af19000058ff48060017 id: 1 bootpri: 0 root:nacaozumbi:~# zoneadm list -cv ID NAME STATUS PATH BRAND IP 0 global running / solaris shared 82 evac3 running - solaris-kz excl 83 evac1 running - solaris-kz excl 84 evac2 running - solaris-kz excl - tzone1 installed /system/zones/tzone1 solaris excl - on-fixes configured - solaris-kz excl - evac4 installed - solaris-kz excl - zts configured - solaris-kz excl

Zones not set for evacution were detached - ie. on-fixes and zts. All running and installed zones are set to be evacuated to bjork, for example:

root:nacaozumbi:~# svccfg -s system/zones/zone:evac1 listprop evacuation/target evacuation/target astring ssh://root@bjork

Now, let's start the maintenance window:

root:nacaozumbi:~# sysadm maintain -s -m "updating to new build" root:nacaozumbi:~# sysadm maintain -l TYPE USER DATE MESSAGE admin root 2018-02-02 01:10 updating to new build

At this point we can no longer boot or attach zones on nacaozumbi:

root:nacaozumbi:~# zoneadm -z on-fixes attach zoneadm: zone 'on-fixes': attach prevented due to system maintenance: see sysadm(8)

And that also includes migrating zones to nacaozumbi:

root:bjork:~# zoneadm -z on-fixes migrate ssh://root@nacaozumbi zoneadm: zone 'on-fixes': Using existing zone configuration on destination. zoneadm: zone 'on-fixes': Attaching zone. zoneadm: zone 'on-fixes': attach failed: zoneadm: zone 'on-fixes': attach prevented due to system maintenance: see sysadm(8)

Now we start evacuating all the zones. In this example, all running and installed zones have their service instance property evacuation/target set. The option -a means all the zones, that is including those installed. The -v option provides verbose output.

root:nacaozumbi:~# sysadm evacuate -va sysadm: preparing 5 zone(s) for evacuation ... sysadm: initializing migration of evac1 to bjork ... sysadm: initializing migration of evac3 to bjork ... sysadm: initializing migration of evac4 to bjork ... sysadm: initializing migration of tzone1 to bjork ... sysadm: initializing migration of evac2 to bjork ... sysadm: evacuating 5 zone(s) ... sysadm: migrating tzone1 to bjork ... sysadm: migrating evac2 to bjork ... sysadm: migrating evac4 to bjork ... sysadm: migrating evac1 to bjork ... sysadm: migrating evac3 to bjork ... sysadm: evacuation completed successfully. sysadm: evac1: evacuated to ssh://root@bjork sysadm: evac2: evacuated to ssh://root@bjork sysadm: evac3: evacuated to ssh://root@bjork sysadm: evac4: evacuated to ssh://root@bjork sysadm: tzone1: evacuated to ssh://root@bjork

While being evacuated, you can check the state of evacuation like this:

root:nacaozumbi:~# sysadm evacuate -l sysadm: evacuation in progress

After the evacuation is done, you can also see the details like this (for example, in case you did not run it in verbose mode):

root:nacaozumbi:~# sysadm evacuate -l -o ZONENAME,STATE,DEST ZONENAME STATE DEST evac1 EVACUATED ssh://root@bjork evac2 EVACUATED ssh://root@bjork evac3 EVACUATED ssh://root@bjork evac4 EVACUATED ssh://root@bjork tzone1 EVACUATED ssh://root@bjork

And you can see all the evacuated zones are now in the configured state on the source host:

root:nacaozumbi:~# zoneadm list -cv ID NAME STATUS PATH BRAND IP 0 global running / solaris shared - tzone1 configured /system/zones/tzone1 solaris excl - evac1 configured - solaris-kz excl - on-fixes configured - solaris-kz excl - evac4 configured - solaris-kz excl - zts configured - solaris-kz excl - evac3 configured - solaris-kz excl - evac2 configured - solaris-kz excl

And the migrated zones are happily running or in the installed state on host bjork:

jpechane:bjork::~$ zoneadm list -cv ID NAME STATUS PATH BRAND IP 0 global running / solaris shared 57 evac3 running - solaris-kz excl 58 evac1 running - solaris-kz excl 59 evac2 running - solaris-kz excl - on-fixes installed - solaris-kz excl - tzone1 installed /system/zones/tzone1 solaris excl - zts installed - solaris-kz excl - evac4 installed - solaris-kz excl

The maintenance state is still held at this point:

root:nacaozumbi:~# sysadm maintain -l TYPE USER DATE MESSAGE admin root 2018-02-02 01:10 updating to new build

Upgrade the system with a new boot environment unless you did that before (which you should have to keep the time your zones are running on the other host to minimum):

root:nacaozumbi:~# pkg update --be-name=.... -C0 entire@... root:nacaozumbi:~# reboot

Now, finish the maintenance mode.

root:nacaozumbi:~# sysadm maintain -e

And as the final step, return all the evacuated zones now. As explained before, you would not be able to do it if still in the maintenace mode.

root:nacaozumbi:~# sysadm evacuate -ra sysadm: preparing zones for return ... 5/5 sysadm: returning zones ... 5/5 sysadm: return completed successfully.

Possible enhancements for the future we are considering include specifying multiple targets and a spread policy, with a resource utilisation comparison algorithm that would consider CPU arch, RAM and CPU resources.

05 Feb 2018 5:00pm GMT

Alan Coopersmith: What is this BUI thing anyway?

This is part two in my series of posts about Solaris Analytics in the Solaris 11.4 release. You may find part one here.

The Solaris Analytics WebUI (or "bui" for short) is what we use to tie together all our data gathering from the Stats Store. Comprised of two web apps (titled "Solaris Dashboard" and "Solaris Analytics"), enable the webui service via # svcadm enable webui/server

Once the service is online, point your browser at and log in. [Note that the self-signed certificate is that generated by your system, and adding an exception for it in your browser is fine]. Rather than roll our own toolkit, we make use of Oracle Jet, which means we can keep a consistent look and feel across Oracle web applications.

After logging in, you'll see yourself at the Oracle Solaris Web Dashboard, which shows an overview of several aspects of your system, along with Faults (FMA) and Solaris Audit activity if your user has sufficient privileges to read them.

Mousing over any of the visualizations on this page will give you a brief description of what the visualization provides, and clicking on it will take you to a more detailed page.

If you click on the hostname in the top bar (next to Applications), you'll see what we call the Host Drawer. This pulls information from svc:/system/sysstat.

Click the 'x' on the top right to close the drawer.

Selecting Applications / Solaris Analytics will take you to the main part of the bui:

I've select the NFS Client sheet, resulting in the dark shaded box on the right popping up with a description of what the sheet will show you.

Building blocks: faults, utilization and audit events
In the previous installment I mentioned that we wanted to provide a way for you to tie together the many sources of information we provide, so that you could answer questions about your system. This is a small example of how you can do so.

The host these screenshots were taken from is a single processor, four-core Intel-based workstation. In a terminal window I ran # psradm -f 3 Followed a few minutes later by # psradm -n 3
You can see those events marked on each of the visualizations with a blue triangle here:

Now if I mouseover the triangle marking the second offline/online pair, in the Thread Migrations viz, I can see that the system generated a Solaris Audit event:

This allows us to observe that the changes in system behaviour (primarily load average and thread migrations across cores) were correlated with the offlining of a cpu core.

Finally, let's have a look at the Audit sheet. To view the stats on this page, you need to login to the bui as a suitably-privileged user - either root, or a user with the solaris.sstore.read.sensitive privileges.

# usermod -A +solaris.sstore.read.sensitive $USER

For this screenshot I not only redid the psradm operations from earlier, I also tried making an ssh connection with an unknown user, and logged in on another of this system's virtual consoles. There are many other things you could observe with the audit subsystem; this is just a glimpse:

Tune in next time for a discussion of using the C and Python bindings to the Stats Store so you can add your own statistics.

05 Feb 2018 2:00pm GMT