19 Apr 2024

Tomeu Vizoso: Rockchip NPU update 3: Real-time object detection on RK3588

Progress

Yesterday I managed to implement all the remaining operations in my open-source driver, so the SSDLite MobileDet model can now run on Rockchip's NPU in the RK3588 SoC.

Performance is pretty good at 30 frames per second when using just one of the NPU's three cores.


I uploaded the generated video to YouTube at:

You can get the source code at my branch here.

Next steps

Now that we have reached this level of usefulness, I'm going to switch to writing a kernel driver suitable for inclusion in the Linux kernel, in the drivers/accel subsystem.

There is still a lot of work to do, but progress is fast, though as I write more drivers for different NPUs I will have to split my time among them. At least until we get more contributors! :)

19 Apr 2024 8:17am GMT

15 Apr 2024

Simon Ser: Status update, April 2024

Hi!

The X.Org Foundation election results are in, and I'm now officially part of the Board of Directors. I hope I can be of use to the community on more organizational issues! Speaking of which, I've spent quite a bit of time dealing with Code of Conduct matters lately. Of course I can't disclose details for privacy reasons, but hopefully our actions can gradually improve the contribution experience for FreeDesktop.Org projects.

New extensions have been merged in wayland-protocols. linux-drm-syncobj-v1 enables explicit synchronization, which is a better architecture than what we have today (implicit synchronization) and will improve NVIDIA support. alpha-modifier-v1 allows Wayland clients to set an alpha channel multiplier on their surfaces; it can be used to implement effects such as fade-in or fade-out without redrawing, and can even be offloaded to KMS. The tablet-v2 protocol we've used for many years has been stabilized.

In other Wayland news, a new API has been added to dynamically resize libwayland's internal buffer. By default, the server-side buffer size is still 4 KiB but the client-side buffer will grow as needed. This should help with bursts (e.g. long format lists) and high poll rate mice. I've added a new wayland-scanner mode to generate headers with only enums to help libraries such as wlroots which use these in their public API. And I've sent an announcement for the next Wayland release, it should happen at the end of May if all goes well.

With the help of Sebastian Wick, libdisplay-info has gained support for more bits, in particular DisplayID type II, III and VII timings, as well as CTA Video Format Preference blocks, Room Configuration blocks and Speaker Location blocks. I've worked on libicc to finish up the parser; next I'd like to add the math required to apply an ICC profile. gamja now has basic support for file uploads (only when pasting a file for now) and hides no-op nickname changes (e.g. from "emersion" to "emersion_" and back).

See you next month!

15 Apr 2024 10:00pm GMT

Christian Gmeiner: hwdb - The only truth

Trusting hardware, particularly the registers that describe its functionality, is fundamentally risky.

tl;dr

The etnaviv GPU stack is continuously improving and becoming more robust. This time, a hardware database was incorporated into Mesa, utilizing header files provided by the SoC vendors.

If you are interested in the implementation details, I recommend checking out this Mesa MR.

Are you employed at VeriSilicon and want to help? You could greatly simplify our work by supplying the community with a comprehensive header that includes all the models you offer.

Last but not least: I deeply appreciate Igalia's passion for open source GPU driver development, and I am grateful to be a part of the team. Their enthusiasm for open source work not only pushes the boundaries of technology but also builds a strong, collaborative community around it.

The good old days

Years ago, when I began dedicating time to hacking on etnaviv, the kernel driver in use would read a handful of registers and relay the gathered information to the user space blob. This blob driver was then capable of identifying the GPU (including model, revision, etc.), supported features (such as DXT texture compression, seamless cubemaps, etc.), and crucial limits (like the number of registers, number of varyings, and so on).

For reverse engineering purposes, this interface is super useful. Imagine if you could change one of these feature bits on a target running the binary blob.

With libvivhook it is possible to do exactly this. From time to time, I run such an old vendor driver stack on an i.MX 6QuadPlus SBC, which features a Vivante GC3000 as its GPU.

Somewhere, I have a collection of scripts that I utilized to acquire additional knowledge about unknown GPU states activated when a specific feature bit was set.

To explore a simple example, let's consider the case of misrepresenting a GPU's identity as a GC2000. This involves modifying the information provided by the kernel driver to the user space, making the user space driver believe it is interacting with a GC2000 GPU. This scenario could be used for testing, debugging, or understanding how specific features or optimizations are handled differently across GPU models.

# Report a GC2000 at revision 0x5108 instead of the real chip identity.
export ETNAVIV_CHIP_MODEL="0x2000"
export ETNAVIV_CHIP_REVISION="0x5108"
# Clear every feature bit the kernel driver reports...
export ETNAVIV_FEATURES0_CLEAR="0xFFFFFFFF"
export ETNAVIV_FEATURES1_CLEAR="0xFFFFFFFF"
export ETNAVIV_FEATURES2_CLEAR="0xFFFFFFFF"
# ...then set the feature words the blob should see instead.
export ETNAVIV_FEATURES0_SET="0xe0296cad"
export ETNAVIV_FEATURES1_SET="0xc9799eff"
export ETNAVIV_FEATURES2_SET="0x2efbf2d9"
# Interpose the vendor driver's ioctl() interface via libvivhook.
LD_PRELOAD="/lib/viv_interpose.so" ./test-case

If you capture the generated command stream and compare it with the one produced under the correct identity, you'll observe many differences. This is super useful - I love it.

Changing Tides: The Shift in ioctl() Interface

At some point in time, Vivante changed their ioctl() interface and modified the gcvHAL_QUERY_CHIP_IDENTITY command. Instead of providing a very detailed chip identity, they reduced the data set to a handful of identifiers: chip model, revision, product ID, eco ID and customer ID.

This shift could indeed hinder reverse engineering efforts significantly. All of a sudden, it becomes impossible to alter any feature value, and understanding how the vendor driver processes these values is out of reach. Determining the function or impact of an unknown feature bit now seems unattainable.

However, the kernel driver also requires a mechanism to verify the existing features of the GPU, as it needs to accommodate a wide variety of GPUs. Therefore, there must be some sort of system or method in place to ensure the kernel driver can effectively manage and support the diverse functionalities and capabilities of different GPUs.

A New Approach: The Hardware Database Dilemma

Let's welcome: gc_feature_database.h, or hwdb for short.

Vivante transitioned to using a database that stores entries for limit values and feature bits. This database is accessed by querying with the chip model, revision, product ID, eco ID and customer ID.
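To illustrate the lookup, here is a minimal Python sketch of a database keyed the same way. Everything in it is made up for illustration; these are not the actual Vivante or Mesa data structures.

# Minimal sketch of a hwdb-style lookup, keyed by the same identifiers
# Vivante queries with. All limits and feature names below are illustrative.
HWDB = {
    # (model, revision, product_id, eco_id, customer_id) -> limits and features
    (0x7000, 0x6214, 0x70003, 0x0, 0x0): {
        "NN_CORE_COUNT": 8,      # hypothetical NPU limit value
        "TEXTURE_ASTC": True,    # hypothetical feature bit
    },
}

def lookup(model, revision, product_id, eco_id, customer_id):
    """Return the database entry for a chip, or None if it is unknown."""
    return HWDB.get((model, revision, product_id, eco_id, customer_id))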

There is some speculation about why this move was made. My theory posits that they became frustrated with the recurring cycle of introducing feature bits to indicate the implementation of a feature, subsequently discovering problems with said feature, and then having to introduce additional feature bits to signal that the feature now truly operates as intended. It became far more straightforward to deactivate a malfunctioning feature by modifying information in the hardware database (hwdb). After they began utilizing the hwdb within the driver, updates to the feature registers in the hardware ceased.

Here is a concrete example of such a case that can be found in the etnaviv gallium driver:

/* ASTC is usable only if the TEXTURE_ASTC feature bit is set and the
 * later-added NO_ASTC bit (flagging broken ASTC) is not. */
screen->specs.tex_astc = VIV_FEATURE(screen, chipMinorFeatures4, TEXTURE_ASTC) &&
                         !VIV_FEATURE(screen, chipMinorFeatures6, NO_ASTC);

Meanwhile, in the etnaviv world there was a hybrid in the making. We stuck with the detailed feature words and found a smart way to convert from Vivante's hwdb entries to our own in-kernel database. There is even a full-blown Vivante -> etnaviv hwdb converter.

At that time, I did not fully understand all the consequences this approach would bring - more on that later. So, I dedicated my free time to reverse engineering and tweaking the user space driver, while letting the kernel developers do their thing.

About a year after the initial hwdb landed in the kernel, I thought it might be a good idea to read out the extra id values, and provide them via sysfs to the user space. At that time, I already had the idea of moving the hardware database to user space in mind. However, I was preoccupied with other priorities that were higher on my to-do list, and I ended up forgetting about it.

Challenge accepted

Tomeu Vizoso began to work on teflon and a Neural Processing Unit (NPU) driver within Mesa, leveraging a significant amount of the existing codebase and concepts, including the same kernel driver for the GPU. During this process, he encountered a need for some NPU-specific limit values. To address this, he added an in-kernel hwdb entry and made the limit values accessible to user space.

That's it - the kernel supplies all the values the NPU driver requires. We're finished, aren't we?

It turns out that there are many more NPU-related values that need to be exposed in the same manner, with seemingly no end in sight.

One of the major drawbacks when the hardware database (hwdb) resides in the kernel is the considerable amount of time it takes for hwdb patches to be written, reviewed, and eventually merged into Linus's git tree. This significantly slows down the development of user space drivers. For end users, this means they must either run a bleeding-edge kernel or backport the necessary changes on their own.

For me personally, the in-kernel hardware database should never have been implemented in its current form. If I could go back in time, I would have voiced my concerns.

As a result, moving the hardware database (hwdb) to user space quickly became a top priority on my to-do list, and I began working on it. However, during the testing phase of my proof of concept (PoC), I had to pause my work due to a kernel issue that made it unreliable for user space to trust the ID values provided by the kernel. Once my fix for this issue began to be incorporated into stable kernel versions, it was time to finalize the user space hwdb.

There is only one small but important detail we have not talked about yet: there are vendor-specific versions of gc_feature_database.h, based on different versions of the binary blob. For instance, there are ones from NXP, ST, Amlogic and a few more.

Here is a brief look at the differences:

nxp/gc_feature_database.h (autogenerated at 2023-10-24 16:06:00, 861 struct members, 27 entries)
stm/gc_feature_database.h (autogenerated at 2022-12-29 11:13:00, 833 struct members, 4 entries)
amlogic/gc_feature_database.h (autogenerated at 2021-04-12 17:20:00, 733 struct members, 8 entries)

We understand that these header files are generated and adhere to a specific structure. Therefore, all we need to do is write an intelligent Python script capable of merging the struct members into a single consolidated struct. This script will also convert the old struct entries to the new format and generate a header file that we can use.

I'm consistently amazed by how swiftly and effortlessly Python can be used for such tasks. Ninety-nine percent of the time, there's a ready-to-use Python module available, complete with examples and some documentation. To address the C header parsing challenge, I opted for pycparser.
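As a rough idea of what such a script can look like, here is a minimal sketch (not the actual Mesa script) that uses pycparser to collect the struct members from the vendor headers and merge them into one consolidated set. The preprocessor setup is an assumption, and the real script additionally converts the old entries to the new format and emits the final header.

#!/usr/bin/env python3
# Minimal sketch: merge the struct members found in several vendor copies of
# gc_feature_database.h into one consolidated set.
from pycparser import parse_file, c_ast

class StructMembers(c_ast.NodeVisitor):
    """Collect the member names of every struct definition in a parsed file."""
    def __init__(self):
        self.members = {}

    def visit_Struct(self, node):
        if node.decls:  # skip forward declarations
            name = node.name or "<anonymous>"
            self.members.setdefault(name, set()).update(
                decl.name for decl in node.decls)

def struct_members(path):
    # use_cpp runs the C preprocessor first; real headers may need extra
    # include paths or pycparser's fake libc headers passed via cpp_args.
    ast = parse_file(path, use_cpp=True)
    visitor = StructMembers()
    visitor.visit(ast)
    return visitor.members

merged = {}
for header in ("nxp/gc_feature_database.h",
               "stm/gc_feature_database.h",
               "amlogic/gc_feature_database.h"):
    for name, members in struct_members(header).items():
        merged.setdefault(name, set()).update(members)

for name, members in sorted(merged.items()):
    print(f"struct {name}: {len(members)} merged members")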

The final outcome is a generated hwdb.h file that looks and feels similar to those generated from the binary blob.

Future proof

This header merging approach offers several advantages:

While working on this topic, I decided to do a bigger refactoring with the end goal of providing a struct etna_core_info that lives outside of the gallium driver.

This makes the code future-proof and moves the filling of struct etna_core_info directly into the lowest layer - libetnaviv_drm (src/etnaviv/drm).

We have not yet talked about one important detail.

What happens if there is no entry in the user space hwdb?

The solution is straightforward: we fall back to the previous method and request all feature words from the kernel driver. However, in an ideal scenario, our user space hardware database should supply all necessary entries. If you find that an entry for your GPU/NPU is missing, please get in touch with me.
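Sketched in the same illustrative Python as above, the decision boils down to something like this; the kernel query is a hypothetical stand-in, not the real libetnaviv_drm code.

def core_info(ids, hwdb):
    """Return the info for the chip identified by `ids`, preferring the
    user-space hwdb. `ids` is (model, revision, product_id, eco_id,
    customer_id) and `hwdb` is a mapping as in the earlier sketch."""
    entry = hwdb.get(ids)
    if entry is not None:
        return entry
    # No entry found: fall back to the old behaviour and ask the kernel
    # driver for the raw feature words.
    return read_kernel_feature_words(ids)

def read_kernel_feature_words(ids):
    """Hypothetical stand-in for querying the feature words from the kernel."""
    raise NotImplementedError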

What about the in-kernel hwdb?

The existing system, despite its limitations, is set to remain indefinitely, with new entries being added to accommodate new GPUs. Although it will never contain as much information as the user space counterpart, this isn't necessarily a drawback. For the purposes at hand, only a handful of feature bits are required.

15 Apr 2024 12:00am GMT