29 Oct 2020

feedplanet.freedesktop.org

Mike Blumenkrantz: Invalidation

Buffering

I've got a lot of exciting stuff in the pipe now, but for today I'm just going to talk a bit about resource invalidation: what it is, when it happens, and why it's important.

Let's get started.

What is invalidation?

Resource invalidation occurs when the backing buffer of a resource is wholly replaced. Consider the following scenario under zink:

On a sane/competent driver, the second glBufferData call will trigger invalidation, which means that A.buffer will be replaced entirely, while A is still the driver resource used by Gallium to represent target.
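
To make the idea concrete, here's a minimal sketch using a toy model rather than real GL or zink structures. Every name in it (toy_resource, toy_backing, toy_buffer_data) is made up for illustration; the only assumption carried over from the text is that uploading NULL data replaces the backing storage while the resource itself stays the same object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the backing storage (A.buffer in the text). */
struct toy_backing {
   size_t size;
   unsigned char *data;
};

/* Hypothetical stand-in for the driver resource (A in the text). */
struct toy_resource {
   struct toy_backing *buffer;
   unsigned replacements; /* how many times the backing storage was swapped */
};

static struct toy_backing *
toy_backing_create(size_t size)
{
   struct toy_backing *b = malloc(sizeof(*b));
   if (!b)
      return NULL;
   b->size = size;
   b->data = malloc(size); /* contents are undefined, like a fresh buffer */
   if (!b->data) {
      free(b);
      return NULL;
   }
   return b;
}

static void
toy_backing_destroy(struct toy_backing *b)
{
   if (b) {
      free(b->data);
      free(b);
   }
}

/* Toy model of a glBufferData-style upload: NULL data invalidates by
 * swapping in fresh storage instead of writing to the old storage.
 * A real driver would keep the old storage alive until the GPU is done
 * with it; this sketch frees it immediately. */
static bool
toy_buffer_data(struct toy_resource *A, size_t size, const void *data)
{
   assert(size > 0);
   if (!A->buffer || A->buffer->size != size || !data) {
      struct toy_backing *fresh = toy_backing_create(size);
      if (!fresh)
         return false;
      toy_backing_destroy(A->buffer);
      A->buffer = fresh;
      A->replacements++;
   }
   if (data)
      memcpy(A->buffer->data, data, size);
   return true;
}
```

Calling toy_buffer_data() once with data and once with NULL leaves the same toy_resource in place but bumps replacements twice, which is the "A stays, A.buffer changes" behavior described above.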

When does invalidation occur?

Resource invalidation can occur in a number of scenarios, but the most common is when unsetting a buffer's data, as in the above example. The other main case for it is replacing the data of a buffer that's in use for another operation. In such a case, the backing buffer can be replaced to avoid forcing a sync in the command stream, which would stall the application's processing. There are some other cases for this as well, like glInvalidateFramebuffer and glDiscardFramebufferEXT, but the primary usage that I'm interested in is buffers.

Why is invalidation important?

The main reason is performance. In the above scenario without invalidation, the second glBufferData call will write null to the whole buffer, which is going to be much more costly than just creating a new buffer.

That's it

Now comes the slightly more interesting part: how does invalidation work in zink?

Currently, as of today's mainline zink codebase, we have struct zink_resource to represent a resource for either a buffer or an image. One struct zink_resource represents exactly one VkBuffer or VkImage, and there's some passable lifetime tracking that I've written to guarantee that these Vulkan objects persist through the various command buffers that they're associated with.

Each struct zink_resource is, as is the way of Gallium drivers, also a struct pipe_resource, which is tracked by Gallium. Because of this, struct zink_resource objects themselves cannot be invalidated in order to avoid breaking Gallium, and instead only the inner Vulkan objects themselves can be replaced.

For this, I created struct zink_resource_object, which is an object that stores only the data that directly relates to the Vulkan objects, leaving struct zink_resource to track the states of these objects. Their lifetimes are separate, with struct zink_resource being bound to the Gallium tracker and struct zink_resource_object persisting for either the lifetime of struct zink_resource or its command buffer usage, whichever is longer.
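
The "whichever is longer" lifetime rule is essentially reference counting. Here's a minimal sketch of that idea, assuming a toy object with a plain integer refcount; the toy_ names are hypothetical and this is not zink's actual reference-tracking code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inner, Vulkan-backed object. */
struct toy_object {
   int refcount;
   bool destroyed;
};

/* Take a reference, e.g. when a command buffer starts using the object. */
static void
toy_object_ref(struct toy_object *obj)
{
   assert(!obj->destroyed);
   obj->refcount++;
}

/* Drop a reference; the object is destroyed only when the last holder
 * lets go. Returns true if this call destroyed the object. */
static bool
toy_object_unref(struct toy_object *obj)
{
   assert(obj->refcount > 0);
   if (--obj->refcount == 0)
      obj->destroyed = true;
   return obj->destroyed;
}
```

If the resource holds one reference and an in-flight command buffer holds another, dropping the resource's reference leaves the object alive until the command buffer drops its own.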

Code

The code for this mechanism isn't super interesting since it's basically just moving some parts around. Where it gets interesting is the exact mechanics of invalidation and how struct zink_resource_object can be injected into an in-use resource, so let's dig into that a bit.

Here's what the pipe_context::invalidate_resource hook looks like:

static void
zink_invalidate_resource(struct pipe_context *pctx, struct pipe_resource *pres)
{
   struct zink_context *ctx = zink_context(pctx);
   struct zink_resource *res = zink_resource(pres);
   struct zink_screen *screen = zink_screen(pctx->screen);

   if (pres->target != PIPE_BUFFER)
      return;

This only handles buffer resources, but extending it for images would likely be little to no extra work.

   if (res->valid_buffer_range.start > res->valid_buffer_range.end)
      return;

Zink tracks the valid data segments of its buffers. This conditional is used to check for an uninitialized buffer, i.e., one which contains no valid data. If a buffer has no data, it's already invalidated, so there's nothing to be done here.

   util_range_set_empty(&res->valid_buffer_range);

Invalidating means the buffer will no longer have any valid data, so the range tracking can be reset here.

   if (!get_all_resource_usage(res))
      return;

If this resource isn't currently in use, unsetting the valid range is enough to invalidate it, so it can just be returned right away with no extra work.

   struct zink_resource_object *old_obj = res->obj;
   struct zink_resource_object *new_obj = resource_object_create(screen, pres, NULL, NULL);
   if (!new_obj) {
      debug_printf("new backing resource alloc failed!");
      return;
   }

Here's the old internal buffer object as well as a new one, created using the existing buffer as a template so that it'll match.

   res->obj = new_obj;
   res->access_stage = 0;
   res->access = 0;

struct zink_resource is just a state tracker for the struct zink_resource_object object, so upon invalidate, the states are unset since this is effectively a brand new buffer.

   zink_resource_rebind(ctx, res);

This is the tricky part, and I'll go into more detail about it below.

   zink_descriptor_set_refs_clear(&old_obj->desc_set_refs, old_obj);

If this resource was used in any cached descriptor sets, the references to those sets need to be invalidated so that the sets won't be reused.

   zink_resource_object_reference(screen, &old_obj, NULL);
}

Finally, the old struct zink_resource_object is unrefed, which will ensure that it gets destroyed once its current command buffer has finished executing.
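
The valid-range checks in this function rely on an empty range being encoded as start > end. Here's a minimal sketch of that convention, assuming a simplified stand-in for the range helpers; the toy_ names are hypothetical and only loosely modeled on the util_range calls shown above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified range: [start, end) in bytes. */
struct toy_range {
   unsigned start;
   unsigned end;
};

/* An empty range is encoded as start > end. */
static void
toy_range_set_empty(struct toy_range *r)
{
   r->start = ~0u;
   r->end = 0;
}

static bool
toy_range_is_empty(const struct toy_range *r)
{
   return r->start > r->end;
}

/* Grow the range so that it also covers [start, end). */
static void
toy_range_add(struct toy_range *r, unsigned start, unsigned end)
{
   assert(start <= end);
   if (start < r->start)
      r->start = start;
   if (end > r->end)
      r->end = end;
}
```

With this encoding, resetting a buffer to "no valid data" is two stores, and the early-return check at the top of the hook is a single comparison.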

Simple enough, but what about that zink_resource_rebind() call? Like I said, that's where things get a little tricky, but because of how much time I spent on descriptor management, it ends up not being too bad.

This is what it looks like:

void
zink_resource_rebind(struct zink_context *ctx, struct zink_resource *res)
{
   assert(res->base.target == PIPE_BUFFER);

Again, this mechanism is only handling buffer resources for now, and there's only one place in the driver that calls it, but it never hurts to be careful.

   for (unsigned shader = 0; shader < PIPE_SHADER_TYPES; shader++) {
      if (!(res->bind_stages & BITFIELD64_BIT(shader)))
         continue;
      for (enum zink_descriptor_type type = 0; type < ZINK_DESCRIPTOR_TYPES; type++) {
         if (!(res->bind_history & BITFIELD64_BIT(type)))
            continue;

Something common to many Gallium drivers is this idea of "bind history", which is where a resource will have bitflags set when it's used for a certain type of binding. While other drivers have a lot more cases than zink does due to various factors, the only thing that needs to be checked for my purposes is the descriptor type (UBO, SSBO, sampler, shader image) across all the shader stages. If a given resource has the flags set here, this means it was at some point used as a descriptor of this type, so the current descriptor bindings need to be compared to see if there's a match.

         uint32_t usage = zink_program_get_descriptor_usage(ctx, shader, type);
         while (usage) {
            const int i = u_bit_scan(&usage);

This is a handy mechanism that returns the current descriptor usage of a shader as a bitfield. So for example, if a vertex shader uses UBOs in slots 0, 1, and 3, usage will be 11 (binary 1011), and the loop will process i as 0, 1, and 3.

            struct zink_resource *cres = get_resource_for_descriptor(ctx, type, shader, i);
            if (res != cres)
               continue;

Now the slot of the descriptor type can be compared against the resource that's being re-bound. If this resource is the one that's currently bound to the specified slot of the specified descriptor type, then steps can be taken to perform additional operations necessary to successfully replace the backing storage for the resource, mimicking the same steps taken when initially binding the resource to the descriptor slot.

            switch (type) {
            case ZINK_DESCRIPTOR_TYPE_SSBO: {
               struct pipe_shader_buffer *ssbo = &ctx->ssbos[shader][i];
               util_range_add(&res->base, &res->valid_buffer_range, ssbo->buffer_offset,
                              ssbo->buffer_offset + ssbo->buffer_size);
               break;
            }

For SSBO descriptors, the only change needed is to add the bound region to the buffer's valid range. This region is passed to the shader, so even if it's never written to, it might be, and so it can be considered a valid region.

            case ZINK_DESCRIPTOR_TYPE_SAMPLER_VIEW: {
               struct zink_sampler_view *sampler_view = zink_sampler_view(ctx->sampler_views[shader][i]);
               zink_descriptor_set_refs_clear(&sampler_view->desc_set_refs, sampler_view);
               zink_buffer_view_reference(ctx, &sampler_view->buffer_view, NULL);
               sampler_view->buffer_view = get_buffer_view(ctx, res, sampler_view->base.format,
                                                           sampler_view->base.u.buf.offset, sampler_view->base.u.buf.size);
               break;
            }

Sampler descriptors require a new VkBufferView be created since the previous one is no longer valid. Again, the references for the existing bufferview need to be invalidated now since that descriptor set can no longer be reused from the cache, and then the new VkBufferView is set after unrefing the old one.

            case ZINK_DESCRIPTOR_TYPE_IMAGE: {
               struct zink_image_view *image_view = &ctx->image_views[shader][i];
               zink_descriptor_set_refs_clear(&image_view->desc_set_refs, image_view);
               zink_buffer_view_reference(ctx, &image_view->buffer_view, NULL);
               image_view->buffer_view = get_buffer_view(ctx, res, image_view->base.format,
                                                         image_view->base.u.buf.offset, image_view->base.u.buf.size);
               util_range_add(&res->base, &res->valid_buffer_range, image_view->base.u.buf.offset,
                              image_view->base.u.buf.offset + image_view->base.u.buf.size);
               break;
            }

Images are nearly identical to the sampler case, the difference being that while samplers are read-only like UBOs (and therefore reach this point already having valid buffer ranges set), images are more like SSBOs and can be written to. Thus the valid range must be set here like in the SSBO case.

            default:
               break;

Eagle-eyed readers will note that I've omitted a UBO case, and this is because there's nothing extra to be done there. UBOs will already have their valid range set and don't need a VkBufferView.

            }

            invalidate_descriptor_state(ctx, shader, type);

Finally, the incremental descriptor state hash for this shader stage and descriptor type is invalidated. It'll be recalculated normally upon the next draw or compute operation, so this is a quick zero-setting operation.

         }
      }
   }
}
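
The usage-bitfield loop in zink_resource_rebind() can be illustrated in isolation. This is a minimal sketch, assuming a portable stand-in for the bit-scan helper; toy_bit_scan and toy_collect_slots are hypothetical names, not Mesa API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a bit-scan helper: returns the index of the
 * lowest set bit in *mask and clears that bit. *mask must be nonzero. */
static int
toy_bit_scan(uint32_t *mask)
{
   assert(*mask != 0);
   int i = 0;
   while (!(*mask & (UINT32_C(1) << i)))
      i++;
   *mask &= ~(UINT32_C(1) << i);
   return i;
}

/* Walks a usage bitfield the same way the rebind loop does, recording
 * each slot index in order. Returns the number of slots found. */
static int
toy_collect_slots(uint32_t usage, int slots[32])
{
   int count = 0;
   while (usage)
      slots[count++] = toy_bit_scan(&usage);
   return count;
}
```

Feeding it a usage value of 11 visits slots 0, 1, and 3, matching the UBO example given for the loop.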

That's everything there is to know about the current state of resource invalidation in zink!

29 Oct 2020 12:00am GMT

28 Oct 2020

Adam Jackson: on abandoning the X server

There's been some recent discussion about whether the X server is abandonware. As the person arguably most responsible for its care and feeding over the last 15 years or so, I feel like I have something to say about that.

The thing about being the maintainer of a public-facing project for nearly the whole of your professional career is it's difficult to separate your own story from the project. So I'm not going to try to be dispassionate, here. I started working on X precisely because free software had given me options and capabilities that really matter, and I feel privileged to be able to give that back. I can't talk about that without caring about it.

So here's the thing: X works extremely well for what it is, but what it is is deeply flawed. There's no shame in that, it's 33 years old and still relevant, I wish more software worked so well on that kind of timeframe. But using it to drive your display hardware and multiplex your input devices is choosing to make your life worse.

It is, however, uniquely well suited to a very long life as an application compatibility layer. Though the code happens to implement an unfortunate specification, the code itself is quite well structured, easy to hack on, and not far off from being easily embeddable.

The issue, then, is how to get there. And I don't have any real desire to get there while still pretending that the xfree86 hardware-backed server code is a real thing. Sorry, I guess, but I've worked on xfree86-derived servers for very nearly as long as XFree86-the-project existed, and I am completely burnt out on that on its own merits, let alone doing that and also being release manager and reviewer of last resort. You can only apply so much thrust to the pig before you question why you're trying to make it fly at all.

So, is Xorg abandoned? To the extent that that means using it to actually control the display, and not just keep X apps running, I'd say yes. But xserver is more than xfree86. Xwayland, Xwin, Xephyr, Xvnc, Xvfb: these are projects with real value that we should not give up. A better way to say it is that we can finally abandon xfree86.

And if that sounds like a world you'd like to see, please, come talk to us, let's make it happen. I'd be absolutely thrilled to see someone take this on, and I'm happy to be your guide through the server internals.

28 Oct 2020 3:01pm GMT

24 Oct 2020

Mike Blumenkrantz: Catching Up

Never Seen Before

A rare Saturday post because I spent so much time this week intending to blog and then somehow not getting around to it. Let's get to the status updates, and then I'm going to dive into the more interesting of the things I worked on over the past few days.

Zink has just hit another big milestone that I've just invented: as of now, my branch is passing 97% of piglit tests up through GL 4.6 and ES 3.2, and it's a huge improvement from earlier in the week when I was only at around 92%. That leaves about 1,200 failure cases out of ~41,000 tests. For perspective, a table.

               IRIS     zink-mainline   zink-wip
Passed Tests   43508    21225           40190
Total Tests    43785    22296           41395
Pass Rate      99.4%    95.2%           97.1%

As always, I happen to be running on Intel hardware, so IRIS and ANV are my reference points.

It's important to note here that I'm running piglit tests, and this is very different from CTS. Put another way, I may be passing over 97% of the test cases I'm running, but that doesn't mean that zink is conformant for any version of GL or ES, which may not actually be possible at present (without huge amounts of awkward hacks) given the persistent issues zink has with provoking vertex handling. I expect this situation to change in the future through the addition of more Vulkan extensions, but for now I'm just accepting that there are some areas where zink is going to misrender stuff.

What Changed?

The biggest change that boosted the zink-wip pass rate was my fixing 64bit vertex attributes, which in total had been accounting for ~2000 test failures.

Vertex attributes, as we all know since we're all experts in the graphics field, are the inputs for vertex shaders, and the data types for these inputs can vary just like C data types. In particular, with GL 4.1, ARB_vertex_attrib_64bit became a thing, which allows 64bit values to be passed as inputs here.

Once again, this is a problem for zink.

It comes down to the difference between GL's implicit handling methodology and Vulkan's explicit handling methodology. Consider the case of a dvec4 data type. Conceptually, this is a data type which is 4x64bit values, requiring 32bytes of storage. A vec4 uses 16bytes of storage, and this equates to a single "slot" or "location" within the shader inputs, as everything there is vec4-aligned. This means that, by simple arithmetic, a dvec4 requires two slots for its storage, one for the first two members and another for the second two, each consuming a single 16byte slot.

When loading a dvec4 in GL(SL), a single variable with the first location slot is used, and the driver will automatically use the second slot when loading the second half of the value.

When loading a dvec4 in (SPIR)Vulkan, two variables with consecutive, explicit location slots must be used, and the driver will load exactly the input location specified.

This difference requires that for any dvec3 or dvec4 vertex input in zink, the value and also the load have to be split along the vec4 boundary for things to work.
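
The slot arithmetic can be checked with a few lines of C. This is a minimal sketch under the assumptions stated in the text (one location is 16 bytes, a 64bit component is 8 bytes); the toy_ helpers are hypothetical names used only for illustration.

```c
#include <assert.h>

enum { TOY_SLOT_BYTES = 16 }; /* one vec4-sized input location */

/* Number of locations a vector occupies, given its component count and
 * the size of each component in bytes (4 for float, 8 for double). */
static unsigned
toy_slots_for_vector(unsigned components, unsigned component_bytes)
{
   assert(components >= 1 && components <= 4);
   unsigned bytes = components * component_bytes;
   return (bytes + TOY_SLOT_BYTES - 1) / TOY_SLOT_BYTES;
}

/* Splits a 64bit vector along the vec4 boundary: the first variable keeps
 * at most two components, the second gets whatever is left. */
static void
toy_split_dvec(unsigned components, unsigned *a_components,
               unsigned *b_components)
{
   assert(components >= 1 && components <= 4);
   *a_components = components > 2 ? 2 : components;
   *b_components = components - *a_components;
}
```

A dvec4 (4 components of 8 bytes) comes out as two slots split 2 + 2, a dvec3 as two slots split 2 + 1, and a dvec2 or a float vec4 fits in one slot with nothing left over.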

Gallium already performs this split on the API side, allowing zink to already be correctly setting things up in the VkPipeline creation, so I wrote a NIR pass to fix things on the shader side.

Shader Rewriting

Yes, it's been at least a week since I last wrote about a NIR pass, so it's past time that I got back into that.

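Going into this, the idea is to perform the following operations within the vertex shader: for each dvec3 or dvec4 input A, create a second variable B in the next location slot, clamp A to a dvec2, load both, and reassemble the two loads into a single value that replaces the original load in the rest of the shader.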

Simple, right?

Here we go.

static bool
lower_64bit_vertex_attribs_instr(nir_builder *b, nir_instr *instr, void *data)
{
   if (instr->type != nir_instr_type_deref)
      return false;
   nir_deref_instr *A_deref = nir_instr_as_deref(instr);
   if (A_deref->deref_type != nir_deref_type_var)
      return false;
   nir_variable *A = nir_deref_instr_get_variable(A_deref);
   if (A->data.mode != nir_var_shader_in)
      return false;
   if (!glsl_type_is_64bit(A->type) || !glsl_type_is_vector(A->type) || glsl_get_vector_elements(A->type) < 3)
      return false;

First, it's necessary to filter out all the instructions that aren't what should be rewritten. As above, only dvec3 and dvec4 types are targeted here (dmat* types are reduced to dvec types prior to this point), so anything other than an A_deref of variables with those types is ignored.

   /* create second variable for the split */
   nir_variable *B = nir_variable_clone(A, b->shader);
   /* split new variable into second slot */
   B->data.driver_location++;
   nir_shader_add_variable(b->shader, B);

B matches A except in its type and slot location, which will always be one greater than the slot location of A, so A can be cloned here to simplify the process of creating B.

   unsigned total_num_components = glsl_get_vector_elements(A->type);
   /* new variable is the second half of the dvec */
   B->type = glsl_vector_type(glsl_get_base_type(A->type), glsl_get_vector_elements(A->type) - 2);
   /* clamp original variable to a dvec2 */
   A_deref->type = A->type = glsl_vector_type(glsl_get_base_type(A->type), 2);

A and B need their types modified to not cross the vec4/slot boundary. A is always a dvec2, which has 2 components, and B will always be the remaining components.

   /* create A_deref instr for new variable */
   b->cursor = nir_after_instr(instr);
   nir_deref_instr *B_deref = nir_build_deref_var(b, B);

Now B_deref has been added thanks to the nir_builder helper function which massively simplifies the process of setting up all the instruction parameters.

   nir_foreach_use_safe(A_deref_use, &A_deref->dest.ssa) {

NIR is SSA-based, and all uses of an SSA value are tracked for the purposes of ensuring that SSA values are truly assigned only once as well as ease of rewriting them in the case where a value needs to be modified, just as this pass is doing. This use-tracking comes along with a simple API for iterating over the uses.

      nir_instr *A_load_instr = A_deref_use->parent_instr;
      assert(A_load_instr->type == nir_instr_type_intrinsic &&
             nir_instr_as_intrinsic(A_load_instr)->intrinsic == nir_intrinsic_load_deref);

The only use of A_deref should be A_load, so really iterating over the A_deref uses is just a quick, easy way to get from there to the A_load instruction.

      /* this is a load instruction for the A_deref, and we need to split it into two instructions that we can
       * then zip back into a single ssa def */
      nir_intrinsic_instr *A_load = nir_instr_as_intrinsic(A_load_instr);
      /* clamp the first load to 2 64bit components */
      A_load->num_components = A_load->dest.ssa.num_components = 2;

A_load must be clamped to a single slot location to avoid crossing the vec4 boundary, so this is done by changing the number of components to 2, which matches the now-changed type of A.

      b->cursor = nir_after_instr(A_load_instr);
      /* this is the second load instruction for the second half of the dvec3/4 components */
      nir_intrinsic_instr *B_load = nir_intrinsic_instr_create(b->shader, nir_intrinsic_load_deref);
      B_load->src[0] = nir_src_for_ssa(&B_deref->dest.ssa);
      B_load->num_components = total_num_components - 2;
      nir_ssa_dest_init(&B_load->instr, &B_load->dest, B_load->num_components, 64, NULL);
      nir_builder_instr_insert(b, &B_load->instr);

This is B_load, which loads a number of components that matches the type of B. It's inserted after A_load, though the before/after isn't important in this case. The key is just that this instruction is added before the next one.

      nir_ssa_def *def[4];
      /* create a new dvec3/4 comprised of all the loaded components from both variables */
      def[0] = nir_vector_extract(b, &A_load->dest.ssa, nir_imm_int(b, 0));
      def[1] = nir_vector_extract(b, &A_load->dest.ssa, nir_imm_int(b, 1));
      def[2] = nir_vector_extract(b, &B_load->dest.ssa, nir_imm_int(b, 0));
      if (total_num_components == 4)
         def[3] = nir_vector_extract(b, &B_load->dest.ssa, nir_imm_int(b, 1));
      nir_ssa_def *C_load = nir_vec(b, def, total_num_components);

Now that A_load and B_load both exist and are loading the corrected number of components, these components can be extracted and reassembled into a larger type for use in the shader, specifically the original dvec3 or dvec4 which is being used. nir_vector_extract performs this extraction from a given instruction by taking an index of the value to extract, and then the composite value is created by passing the extracted components to nir_vec as an array.

      /* use the assembled dvec3/4 for all other uses of the load */
      nir_ssa_def_rewrite_uses_after(&A_load->dest.ssa, nir_src_for_ssa(C_load), C_load->parent_instr);

Since this is all SSA, the NIR helpers can be used to trivially rewrite all the uses of the loaded value from the original A_load instruction to now use the assembled C_load value. It's important that only the uses after C_load has been created (i.e., nir_ssa_def_rewrite_uses_after) are those that are rewritten, however, or else the shader will also rewrite the original A_load value with C_load, breaking the shader entirely with an SSA-impossible as well as generally-impossible C_load = vec(C_load + B_load) assignment.

   }

   return true;
}

Progress has occurred, so the pass returns true to reflect that.
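
Stripped of the NIR machinery, the data flow of the rewritten load looks roughly like this. It's a minimal sketch with plain arrays standing in for shader input locations; toy_slot and toy_load_split_dvec are hypothetical names, and nothing here is NIR API.

```c
#include <assert.h>

/* Hypothetical input location: 16 bytes, i.e. two 64bit components. */
struct toy_slot {
   double v[2];
};

/* Loads a dvec3/dvec4 that starts at 'location' by reading two components
 * from that slot (the A load), the remaining one or two from the next
 * slot (the B load), and zipping them back into one vector (the C value).
 * For a dvec3, out[3] is left untouched. */
static void
toy_load_split_dvec(const struct toy_slot *inputs, unsigned location,
                    unsigned total_components, double out[4])
{
   assert(total_components == 3 || total_components == 4);
   const double *a = inputs[location].v;     /* always 2 components */
   const double *b = inputs[location + 1].v; /* remaining 1 or 2 */
   out[0] = a[0];
   out[1] = a[1];
   out[2] = b[0];
   if (total_components == 4)
      out[3] = b[1];
}
```

Neither individual read crosses a slot boundary, yet the consumer still sees a single dvec3 or dvec4, which is the property the pass is after.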

Now those large attributes are loaded according to Vulkan spec, and everything is great because, as expected, ANV has no bugs here.

24 Oct 2020 12:00am GMT

09 Nov 2011

feedPlanet KDE

Cool new stuff in CMake 2.8.6 (2): pkg-config compatible mode added for use e.g. with autotools

After introducing the automoc feature in my last blog, here comes the next part of this series. More will follow.

The new --find-package mode of CMake

Typically, in projects which are built using autotools or handwritten Makefiles, the tool pkg-config is used to find out whether and where a library used by the software is installed on the current system; it prints the respective command line options for the compiler to stdout.

Since CMake 2.8.6, CMake can also be used in addition to, or instead of, pkg-config in such projects to find installed libraries.

With version 2.8.6, CMake features the new command line flag --find-package. When called in this mode, CMake produces results compatible with pkg-config, and can thus be used in a similar way.

E.g. to get the compiler command line arguments for compiling an object file, it can be called like this:

   $ cmake --find-package -DNAME=LibXml2 -DLANGUAGE=C -DCOMPILER_ID=GNU -DMODE=COMPILE
   -I/usr/include/libxml2
   $

To get the flags needed for linking, do

   $ cmake --find-package -DNAME=LibXml2 -DLANGUAGE=C -DCOMPILER_ID=GNU -DMODE=LINK
   -rdynamic -lxml2
   $

As a result, the flags are printed to stdout, as you can see.

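The required parameters are NAME (the package to find), LANGUAGE (e.g. C), COMPILER_ID (e.g. GNU) and MODE (COMPILE or LINK in the examples above).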

So, you can insert calls like the above in your hand-written Makefiles.
For using CMake in autotools-based projects, you can use cmake.m4, which is now also installed by CMake.
This is used similarly to the pkg-config m4-macro, except that it uses CMake internally instead of pkg-config. So your configure.in could look something like this:

   ...
   PKG_CHECK_MODULES(XFT, xft >= 2.1.0, have_xft=true, have_xft=false)
    if test $have_xft = "true"; then
        AC_MSG_RESULT(Result: CFLAGS: $XFT_CFLAGS LIBS: $XFT_LIBS)
    fi

   CMAKE_FIND_PACKAGE(LibXml2, C, GNU)
   AC_MSG_RESULT(Result: CFLAGS: $LibXml2_CFLAGS LIBS: $LibXml2_LIBS)
   ...

This will define the variables LibXml2_CFLAGS and LibXml2_LIBS, which can then be used in the Makefile.in/Makefiles.

What does that mean for developers of CMake-based libraries?

You don't have to install pkg-config pc-files anymore, just install a Config.cmake file for CMake, and both CMake-based and also autotools-based or any other projects can make use of your library without problems.
Documentation on how this is done can be found here:

What does that mean for developers working on e.g. autotools-based projects, and using a project built with CMake?

Take a look at the cmake_find_package() m4-macro installed since CMake 2.8.6 in share/aclocal/cmake.m4; it contains documentation, and will help you use that library.
Thanks go to Matthias Kretz of Phonon fame, now working on HPC stuff, who wrote the cmake.m4 from scratch (which was necessary since it had to be BSD-licensed in order to be included in CMake).

Internals

Internally, CMake basically executes a find_package() with the given name, turns the results into the command line options for the compiler and prints them to stdout.
This means it works basically for all packages for which a FindFoo.cmake file exists or which install a FooConfig.cmake file.
There is one issue though: FindFoo.cmake files which execute try_compile() or try_run() commands internally are not supported, since this would require setting up and testing the compiler toolchain completely.
It works best for libraries which install a FooConfig.cmake file, since in these cases nothing has to be detected, all the information is already there.

All this stuff is still very new, and has not yet seen wide real world testing.
So, if you use it and find issues, or have suggestions how to improve it, please let me know.

Alex

09 Nov 2011 9:15pm GMT

Debugging nepomuk/virtuoso’s CPU usage

There was a lot of bug fixing regarding nepomuk and its indexing. However you might still get a high CPU-usage. Reporting this is a bit useless unless you can at least give some info about what's happening.

So what you can do is query virtuoso's status. On openSUSE it works like this: first find the .ini file currently in use to get the port virtuoso is using, then connect to virtuoso, and finally query virtuoso for its status and running statements. The latter are unfortunately truncated, so I would appreciate a hint on how to get around that.

ps aux | grep virtuoso

finds /usr/bin/virtuoso-t +foreground +configfile /tmp/virtuoso_T18122.ini +wait

cat /tmp/virtuoso_T18122.ini | grep Port

finds ServerPort=1111

isql-vt -H localhost -S 1111 -U dba -P dba

which connects to virtuoso

status();

which shows you some info and which queries are keeping the process busy. isql-vt is part of the virtuoso-server package but it might be that only recent packages have it compiled and older packages lack the tools.

Further you can do the following:

If you have any other hints regarding this piece of software feel free to mention them and I will add them to the post.


09 Nov 2011 3:10pm GMT

Help KDE e.V. secure funding for a sprint with just a few clicks

Some weeks ago Lydia blogged about a German bank giving away 1000 euros to each of the 1000 associations that get the most votes. Well, until four days ago we were at position 320; now we are at 735 and falling. Please read Lydia's post about how to vote and help KDE e.V.; it takes just a few clicks.

It surprises me that KDE e.V. has had only 3652 votes so far. If each person voted three times, which is allowed by the rules, that would be ~1218 people. Are there only 1218 people using KDE in the world?

Only the poll page is in German; the poll itself is not limited to German citizens. Anybody can vote, and according to the rules you can vote three times with your e-mail address.

09 Nov 2011 12:22pm GMT