19 Mar 2018


Alan Coopersmith: One SMF Service to Monitor the Rest!

Contributed by: Thejaswini Kodavur

Have you ever wondered if there was a single service that monitors all your other services and makes administration easier? If so, "SMF goal services", a new feature of Oracle Solaris 11.4, provides a single, unambiguous, well-defined point at which the system can be considered up and running. You can choose your customized, mission-critical services and link them together into a single SMF service in one step. This SMF service is called a goal service. It can be used to monitor the health of your system after boot. This makes administration much easier, as monitoring each of the services individually is no longer required!

There are two ways in which you can make your services part of a goal service.

1. Using the supplied Goal Service

By default, an Oracle Solaris 11.4 system provides a goal service called "svc:/milestone/goals:default". Initially, this goal service depends on the service "svc:/milestone/multi-user-server:default".

You can make your mission-critical service a dependency of the default goal service as follows:

# svcadm goals system/my-critical-service-1:default

Note: This is a set/clear interface. Therefore, the above command will clear the dependency on "svc:/milestone/multi-user-server:default".

In order to set the dependency on both the services use:

# svcadm goals svc:/milestone/multi-user-server:default \
    system/my-critical-service-1:default

2. Creating your own Goal Service

Oracle Solaris 11.4 allows you to create your own goal service and set your mission-critical services as its dependencies. Follow the steps below to create and use a goal service.

# svcbundle -o new-gs.xml -s service-name=milestone/new-gs -s start-method=":true"
# cp new-gs.xml /lib/svc/manifest/site/new-gs.xml
# svccfg validate /lib/svc/manifest/site/new-gs.xml
# svcadm restart svc:/system/manifest-import
# svcs new-gs
STATE          STIME    FMRI
online         6:03:36  svc:/milestone/new-gs:default

# svcadm disable svc:/milestone/new-gs:default
# svccfg -s svc:/milestone/new-gs:default setprop general/goal-service=true
# svcadm enable svc:/milestone/new-gs:default

# svcadm goals -g svc:/milestone/new-gs:default system/critical-service-1:default \
    system/critical-service-2:default

Note: If you omit the -g option and so do not specify a goal service, the dependency is set on the system-provided default goal service, i.e. svc:/milestone/goals:default.

# svcs -d milestone/new-gs
STATE          STIME    FMRI
disabled       5:54:31  svc:/system/critical-service-2:default
online         Feb_19   svc:/system/critical-service-1:default
# svcs milestone/new-gs
STATE          STIME    FMRI
maintenance    5:54:30  svc:/milestone/new-gs:default

Note: You can use the -d option of svcs(1) to check the dependencies of your goal service.

# svcs -d milestone/new-gs
STATE          STIME    FMRI
online         Feb_19   svc:/system/critical-service-1:default
online         5:56:39  svc:/system/critical-service-2:default
# svcs milestone/new-gs
STATE          STIME    FMRI
online         5:56:39  svc:/milestone/new-gs:default

Note: For more information, refer to "Goal Services" in smf(7) and the goals subcommand in svcadm(8).
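As a rough illustration of how the `svcs -d` output above can be checked by a script, here is a minimal Python sketch. The function names are hypothetical, the sample text simply mirrors the listing in this post, and treating every state other than "online" as an unmet goal is a simplifying assumption rather than a description of how SMF itself evaluates goals.

```python
def parse_svcs(output):
    """Parse svcs-style text into (state, fmri) pairs, skipping the header row."""
    rows = []
    for line in output.strip().splitlines():
        fields = line.split()
        # Expect exactly three columns: STATE, STIME, FMRI.
        if len(fields) != 3 or fields[0] == "STATE":
            continue
        state, _stime, fmri = fields
        rows.append((state, fmri))
    return rows


def unmet_goals(output):
    """Return the FMRIs of dependencies that are not online (assumed rule)."""
    return [fmri for state, fmri in parse_svcs(output) if state != "online"]


# Sample text copied from the `svcs -d milestone/new-gs` listing in this post.
SAMPLE = """\
STATE          STIME    FMRI
disabled       5:54:31  svc:/system/critical-service-2:default
online         Feb_19   svc:/system/critical-service-1:default
"""

if __name__ == "__main__":
    for fmri in unmet_goals(SAMPLE):
        print("goal not met:", fmri)
```

On a real system the text would come from running `svcs -d` on the goal service instead of from the hard-coded sample.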

The goal service "milestone/new-gs" is your new single SMF service with which you can monitor all of your other mission critical services!

Thus, a goal service acts as the headquarters that monitors the rest of your services.

19 Mar 2018 5:00pm GMT

18 Mar 2018


Alyssa Rosenzweig: Midgard Shaders with the Free NIR Compiler

In my last update on the Panfrost project, I showed an assembler and disassembler pair for Midgard, the shader architecture for Mali Txxx GPUs. Unfortunately, Midgard assembly is an arcane, unwieldy language, understood by Connor Abbott, myself, and that's about it besides engineers bound by nondisclosure agreements. You can read the low-level details of the ISA if you're interested.

In any case, what any driver really needs is not just an assembler but a compiler. Ideally, such a compiler would live in Mesa itself, capable of converting programs written in high level GLSL into an architecture-specific binary.

Such a mammoth task ought to be delayed until after we begin moving the driver into Mesa, through the Gallium3D infrastructure. In any event, back in January I had already begun such a compiler, ingesting NIR, an intermediate representation coincidentally designed by Connor himself. The past few weeks were spent improving and debugging this compiler until it produced correct, reasonably efficient code for both fragment and vertex shaders.

As of last night, I have reached this milestone for simple shaders!

As an example, an input fragment shader written in GLSL might look like:

uniform vec4 uni4;

void main() {
    gl_FragColor = clamp(
        vec4(1.3, 0.2, 0.8, 1.0) - vec4(uni4.z),
        0.0, 1.0);
}

Through the fully free compiler stack, passed through the free disassembler for legibility, this yields:

vadd.fadd.sat r0, r26, -r23.zzzz
br_cond.write +0
fconstants 1.3, 0.2, 0.8, 1

vmul.fmov r0, r24.xxxx, r0
br_cond.write -1

This is the optimal compilation for this particular shader; the majority of that shader is the standard fragment epilogue which writes the output colour to the framebuffer.

For some background on the assembly, Midgard is a Very Long Instruction Word (VLIW) architecture. That is, multiple instructions are grouped together in blocks. In the disassembly, this is represented by spacing. Each line is an instruction, and blank lines delimit blocks.
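To make the "blank lines delimit blocks" convention concrete, here is a small Python sketch that groups disassembler text into blocks. The function name `split_bundles` is made up for this illustration and is not part of the Panfrost tooling; the sample text is the fragment shader disassembly shown above.

```python
def split_bundles(disassembly):
    """Group disassembly lines into blocks, using blank lines as delimiters."""
    bundles, current = [], []
    for line in disassembly.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            # A blank line closes the block collected so far.
            bundles.append(current)
            current = []
    if current:
        bundles.append(current)
    return bundles


# The fragment shader disassembly from this post.
FRAGMENT = """\
vadd.fadd.sat r0, r26, -r23.zzzz
br_cond.write +0
fconstants 1.3, 0.2, 0.8, 1

vmul.fmov r0, r24.xxxx, r0
br_cond.write -1
"""

if __name__ == "__main__":
    for index, bundle in enumerate(split_bundles(FRAGMENT)):
        print("block", index, "->", bundle)
```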

The first instruction contains the entirety of the shader logic. Reading it off, it means "using the vector addition unit, perform the saturated floating point addition of the attached constants (register 26) and the negation of the z component of the uniform (register 23), storing the result into register 0". It's very compact, but comparing with the original GLSL, it should be clear where this is coming from. The constants are loaded at the end of the block with the fconstants meta instruction.
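For readers who prefer code to prose, the arithmetic performed by that first instruction can be mimicked on the CPU. This is only a sketch of the maths in plain Python, with hypothetical function names, and says nothing about how the hardware actually executes it.

```python
def saturate(x):
    """Clamp a value to the range [0, 1], like the .sat modifier."""
    return min(max(x, 0.0), 1.0)


def fragment_colour(uni4):
    """Model of: vadd.fadd.sat r0, r26, -r23.zzzz

    The tuple below plays the role of r26 (the attached constants) and
    uni4 plays the role of r23 (the uniform).
    """
    constants = (1.3, 0.2, 0.8, 1.0)
    z = uni4[2]  # the .zzzz swizzle broadcasts the z component
    return tuple(saturate(c - z) for c in constants)


if __name__ == "__main__":
    print(fragment_colour((0.0, 0.0, 0.5, 0.0)))
```

This matches the GLSL source: the subtraction of `vec4(uni4.z)` becomes an addition of the negated, swizzled uniform, and the `clamp(..., 0.0, 1.0)` becomes the saturate modifier.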

The other four instructions are the standard fragment epilogue. We're not entirely sure why it's so strange - framebuffer writes are fixed from the result of register 0, and are accomplished with a special loop using branching instructions. We're also not sure why the redundant move is necessary; Connor and I suspect there may be a hardware limitation or errata preventing a br_cond.write instruction from standing alone in a block. Thankfully, we do understand more or less what's going on, and the instructions appear to be fixed. The compiler is able to generate the epilogue just fine, including optimising the code to write into register 0.

As for vertex shaders, well, fragment shaders are simpler than vertex shaders. Whereas the former merely have the aforementioned weird instruction sequence, vertex epilogues need to handle perspective division and viewport scaling, operations which are not implemented in hardware on this embedded GPU. When this is fully implemented, it will add quite a bit of difficult-to-optimise code to the output, although even the vendor compiler does not seem to optimise it. (Perhaps in time our vertex shaders could be faster than the vendor's compiled shaders due to a smarter epilogue!)

Without further ado, an example vertex shader looks like:

attribute vec4 vin;
uniform vec4 u;

void main() {
    gl_Position = (vin + u.xxxx * vec4(0.01, -0.02, 0.0, 0.0)) * (1.0 / u.x);
}

Through the same stack and a stub vertex epilogue which assumes there is no perspective division needed (that the input is normalised device coordinates) and that the framebuffer happens to be the resolution 400x240, the compiler emits:

vmul.fmov r1, r24.xxxx, r26
fconstants 0, 0, 0, 0

ld_attr_32 r2, 0, 0x1E1E

vmul.fmul r4, r23.xxxx, r26
vadd.fadd r5, r2, r4
fconstants 0.01, -0.02, 0, 0

lut.frcp r6.x, r23.xxxx, #2.61731e-39
fconstants 0.01, -0.02, 0, 0

vmul.fmul r7, r5, r6.xxxx

vmul.fmul r9, r7, r26
fconstants 200, 120, 0.5, 0

vadd.fadd r27, r26, r9
fconstants 200, 120, 0.5, 1

st_vary_32 r1, 0, 0x1E9E

There is a lot of room for improvement here, but for now, the important part is that it does work! The transformed vertex (after scaling) must be written to the special register 27. Currently, a dummy varying store is emitted to work around what appears to be yet another hardware quirk. (Are you noticing a trend here? GPUs are funky.) The rest of the code should be more or less intelligible by looking at the ISA notes. In the future, we might improve the disassembler to hide some of the internal encoding peculiarities, such as the dummy r24.xxxx and #0 arguments for fmov and frcp instructions respectively.
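The arithmetic in the listing can be followed more easily with a CPU-side model. The Python sketch below mirrors the vertex shader and the stub epilogue as read off the constants in the disassembly (scale by 200, 120, 0.5, 0 and then add 200, 120, 0.5, 1, which corresponds to a 400x240 framebuffer). The function names and the width/height parameters are assumptions made for illustration, not code from the compiler.

```python
def vertex_shader(vin, u):
    """Model of the GLSL: (vin + u.xxxx * vec4(0.01, -0.02, 0, 0)) * (1.0 / u.x)."""
    offset = (0.01, -0.02, 0.0, 0.0)
    inverse = 1.0 / u[0]  # corresponds to the lut.frcp instruction
    return tuple((v + u[0] * o) * inverse for v, o in zip(vin, offset))


def stub_epilogue(position, width=400, height=240):
    """Model of the stub viewport scaling, assuming no perspective division.

    Maps normalised device coordinates to framebuffer coordinates:
    result = position * scale + bias, as in the last two ALU blocks.
    """
    scale = (width / 2, height / 2, 0.5, 0.0)
    bias = (width / 2, height / 2, 0.5, 1.0)
    return tuple(p * s + b for p, s, b in zip(position, scale, bias))


if __name__ == "__main__":
    position = vertex_shader((0.0, 0.0, 0.0, 1.0), (2.0, 0.0, 0.0, 0.0))
    print(position)
    print(stub_epilogue(position))
```

Under this model the corners of normalised device coordinates, (-1, -1) and (1, 1), land on (0, 0) and (400, 240) respectively, and the w component is forced to 1.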

All in all, the compiler is progressing nicely. It is currently using a simple SSA-based intermediate representation which maps one-to-one with the hardware, minus details about register allocation and VLIW. This architecture will enable us to optimise our code as needed in the future, once we write a register allocator and instruction scheduler. A number of arithmetic (ALU) operations are supported, and although there is much work left to do - including generating texture instructions, which were only decoded a few weeks ago - the design is sound, clocking in at a mere 1500 lines of code.

The best part, of course, is that this is no standalone compiler; it is already sitting in our fork of Mesa, using Mesa's infrastructure. When the driver is written, it'll be ready from day 1. Woohoo!

Source code is available; get it while it's hot!

Getting the shader compiler to this point was a bigger time sink than anticipated. Nevertheless, we did do a bit of code cleanup in the meantime. On the command stream side, I began passing memory-resident structures by name rather than by address, slowly rolling out a basic watermark allocator. This step is revealing potential issues in the understanding of the command stream, preparing us for proper, non-replay-based driver development. Textures still remain elusive, unfortunately. Aside from that, however, much, if not most, of the command stream is well-understood now. With the help of the shader compiler, basic 3D tests like test-triangle-smoothed are now almost entirely understood and for the most part devoid of magic.

Lyude Paul has been working on code clean-up specifically regarding the build systems. Her goal is to let new contributors play with GPUs, rather than fight with meson and CMake. We're hoping to attract some more people with low-level programming knowledge and some spare time to pitch in. (Psst! That might mean you! Join us on IRC!)

On a note of administrivia, the project name has been properly changed to Panfrost. For some history, over the summer two driver projects were formed: chai, by me, for Midgard; and BiOpenly, by Lyude et al, for Bifrost. Thanks to Rob Clark's matchmaking, we found each other and quickly realised that the two GPU architectures had identical command streams; it was only the shader cores that were totally redesigned and led to the rename. Thus, we merged to join efforts, but the new name was never officially decided.

We finally settled on the name "Panfrost", and our infrastructure is being changed to reflect this. The IRC channel, still on Freenode, now redirects to #panfrost. Additionally, Freedesktop.org rolled out their new GitLab CE instance, of which we are the first users; you can find our repositories at the Panfrost organisation on the fd.o GitLab.

On Monday, our project was discussed in Robert Foss's talk "Progress in the Embedded GPU Ecosystem". Foss predicted the drivers would not be ready for another three years.

Somehow, I have a feeling it'll be much sooner!

18 Mar 2018 7:00am GMT

13 Mar 2018


Alan Coopersmith: Oracle Solaris 11.4 beta progress on LP64 conversion

Back in 2014, I posted Moving Oracle Solaris to LP64 bit by bit describing work we were doing then. In 2015, I provided an update covering Oracle Solaris 11.3 progress on LP64 conversion.

Now that we've released the Oracle Solaris 11.4 Beta to the public, you can see that the ratio of ILP32 to LP64 programs in /usr/bin and /usr/sbin in the full Oracle Solaris package repositories has dramatically shifted in 11.4:

Release        32-bit        64-bit        total
Solaris 11.0   1707 (92%)     144 (8%)     1851
Solaris 11.1   1723 (92%)     150 (8%)     1873
Solaris 11.2   1652 (86%)     271 (14%)    1923
Solaris 11.3   1603 (80%)     379 (19%)    1982
Solaris 11.4    169 (9%)     1769 (91%)    1938

That's over 70% more of the commands shipped in the OS which can use ADI to stop buffer overflows on SPARC, take advantage of more registers on x86, have more address space available for ASLR to choose from, are ready for timestamps and dates past 2038, and receive the other benefits of 64-bit software as described in previous blogs.
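The shift can be recomputed from the counts in the table. The short Python sketch below does so; the dictionary layout and function name are just for illustration, and reading "over 70%" as the change in the 64-bit share between 11.3 and 11.4, measured in percentage points, is an interpretation rather than something the post states outright.

```python
# (32-bit count, 64-bit count) per release, copied from the table above.
RELEASES = {
    "11.0": (1707, 144),
    "11.1": (1723, 150),
    "11.2": (1652, 271),
    "11.3": (1603, 379),
    "11.4": (169, 1769),
}


def share_64bit(release):
    """Percentage of commands in the release that are 64-bit."""
    count_32, count_64 = RELEASES[release]
    return 100.0 * count_64 / (count_32 + count_64)


if __name__ == "__main__":
    for release in RELEASES:
        print(f"Solaris {release}: {share_64bit(release):.1f}% 64-bit")
    change = share_64bit("11.4") - share_64bit("11.3")
    print(f"Change from 11.3 to 11.4: {change:.1f} percentage points")
```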

And while we continue to provide more features for 64-bit programs, such as making ADI support available in the libc malloc, we aren't abandoning 32-bit programs either. A change that just missed our first beta release, but is coming in a later refresh of our public beta, will make it easier for 32-bit programs to use file descriptors > 255 with stdio calls, relaxing a long-held limitation of the 32-bit Solaris ABI.

This work was years in the making, and over 180 engineers contributed to it in the Solaris organization, plus even more who came before to make all the FOSS projects we ship and the libraries we provide be 64-bit ready so we could make this happen. We thank all of them for making it possible to bring this to you now.

13 Mar 2018 5:37am GMT