16 Oct 2024
Planet Maemo
Adding buffering hysteresis to the WebKit GStreamer video player
The <video>
element implementation in WebKit does its job by using a multiplatform player that relies on a platform-specific implementation. In the specific case of glib platforms, which base their multimedia on GStreamer, that's MediaPlayerPrivateGStreamer.
The player private can have 3 buffering modes:
- On-disk buffering: This is the typical mode on desktop systems, but is frequently disabled on purpose on embedded devices to avoid wearing out their flash storage memories. All the video content is downloaded to disk, and the buffering percentage refers to the total size of the video. A GstDownloader element is present in the pipeline in this case. Buffering level monitoring is done by polling the pipeline every second, using the
fillTimerFired()
method. - In-memory buffering: This is the typical mode on embedded systems and on desktop systems in case of streamed (live) content. The video is downloaded progressively and only the part of it ahead of the current playback time is buffered. A GstQueue2 element is present in the pipeline in this case. Buffering level monitoring is done by listening to GST_MESSAGE_BUFFERING bus messages and using the buffering level stored on them. This is the case that motivates the refactoring described in this blog post, what we actually wanted to correct in Broadcom platforms, and what motivated the addition of hysteresis working on all the platforms.
- Local files: Files, MediaStream sources and other special origins of video don't do buffering at all (no GstDownloadBuffering nor GstQueue2 element is present on the pipeline). They work like the on-disk buffering mode in the sense that
fillTimerFired()
is used, but the reported level is relative, much like in the streaming case. In the initial version of the refactoring I was unaware of this third case, and only realized about it when tests triggered the assert that I added to ensure that the on-disk buffering method was working in GST_BUFFERING_DOWNLOAD mode.
The current implementation (actually, its wpe-2.38 version) was showing some buffering problems on some Broadcom platforms when doing in-memory buffering. The buffering levels monitored by MediaPlayerPrivateGStreamer weren't accurate because the Nexus multimedia subsystem used on Broadcom platforms was doing its own internal buffering. Data wasn't being accumulated in the GstQueue2 element of playbin, because BrcmAudFilter/BrcmVidFilter was accepting all the buffers that the queue could provide. Because of that, the player private buffering logic was erratic, leading to many transitions between "buffer completely empty" and "buffer completely full". This, it turn, caused many transitions between the HaveEnoughData, HaveFutureData and HaveCurrentData readyStates in the player, leading to frequent pauses and unpauses on Broadcom platforms.
So, one of the first thing I tried to solve this issue was to ask the Nexus PlayPump (the subsystem in charge of internal buffering in Nexus) about its internal levels, and add that to the levels reported by GstQueue2. There's also a GstMultiqueue in the pipeline that can hold a significant amount of buffers, so I also asked it for its level. Still, the buffering level unstability was too high, so I added a moving average implementation to try to smooth it.
All these tweaks only make sense on Broadcom platforms, so they were guarded by ifdefs in a first version of the patch. Later, I migrated those dirty ifdefs to the new quirks abstraction added by Phil. A challenge of this migration was that I needed to store some attributes that were considered part of MediaPlayerPrivateGStreamer before. They still had to be somehow linked to the player private but only accessible by the platform specific code of the quirks. A special HashMap attribute stores those quirks attributes in an opaque way, so that only the specific quirk they belong to knows how to interpret them (using downcasting). I tried to use move semantics when storing the data, but was bitten by object slicing when trying to move instances of the superclass. In the end, moving the responsibility of creating the unique_ptr that stored the concrete subclass to the caller did the trick.
Even with all those changes, undesirable swings in the buffering level kept happening, and when doing a careful analysis of the causes I noticed that the monitoring of the buffering level was being done from different places (in different moments) and sometimes the level was regarded as "enough" and the moment right after, as "insufficient". This was because the buffering level threshold was one single value. That's something that a hysteresis mechanism (with low and high watermarks) can solve. So, a logical level change to "full" would only happen when the level goes above the high watermark, and a logical level change to "low" when it goes under the low watermark level.
For the threshold change detection to work, we need to know the previous buffering level. There's a problem, though: the current code checked the levels from several scattered places, so only one of those places (the first one that detected the threshold crossing at a given moment) would properly react. The other places would miss the detection and operate improperly, because the "previous buffering level value" had been overwritten with the new one when the evaluation had been done before. To solve this, I centralized the detection in a single place "per cycle" (in updateBufferingStatus()), and then used the detection conclusions from updateStates().
So, with all this in mind, I refactored the buffering logic as https://commits.webkit.org/284072@main, so now WebKit GStreamer has a buffering code much more robust than before. The unstabilities observed in Broadcom devices were gone and I could, at last, close Issue 1309.
16 Oct 2024 6:12am GMT
10 Sep 2024
Planet Maemo
Don’t shoot yourself in the foot with the C++ move constructor
Move semantics can be very useful to transfer ownership of resources, but as many other C++ features, it's one more double edge sword that can harm yourself in new and interesting ways if you don't read the small print.
For instance, if object moving involves super and subclasses, you have to keep an extra eye on what's actually happening. Consider the following classes A and B, where the latter inherits from the former:
#include <stdio.h> #include <utility> #define PF printf("%s %p\n", __PRETTY_FUNCTION__, this) class A { public: A() { PF; } virtual ~A() { PF; } A(A&& other) { PF; std::swap(i, other.i); } int i = 0; }; class B : public A { public: B() { PF; } virtual ~B() { PF; } B(B&& other) { PF; std::swap(i, other.i); std::swap(j, other.j); } int j = 0; };
If your project is complex, it would be natural that your code involves abstractions, with part of the responsibility held by the superclass, and some other part by the subclass. Consider also that some of that code in the superclass involves move semantics, so a subclass object must be moved to become a superclass object, then perform some action, and then moved back to become the subclass again. That's a really bad idea!
Consider this usage of the classes defined before:
int main(int, char* argv[]) { printf("Creating B b1\n"); B b1; b1.i = 1; b1.j = 2; printf("b1.i = %d\n", b1.i); printf("b1.j = %d\n", b1.j); printf("Moving (B)b1 to (A)a. Which move constructor will be used?\n"); A a(std::move(b1)); printf("a.i = %d\n", a.i); // This may be reading memory beyond the object boundaries, which may not be // obvious if you think that (A)a is sort of a (B)b1 in disguise, but it's not! printf("(B)a.j = %d\n", reinterpret_cast<B&>(a).j); printf("Moving (A)a to (B)b2. Which move constructor will be used?\n"); B b2(reinterpret_cast<B&&>(std::move(a))); printf("b2.i = %d\n", b2.i); printf("b2.j = %d\n", b2.j); printf("^^^ Oops!! Somebody forgot to copy the j field when creating (A)a. Oh, wait... (A)a never had a j field in the first place\n"); printf("Destroying b2, a, b1\n"); return 0; }
If you've read the code, those printfs will have already given you some hints about the harsh truth: if you move a subclass object to become a superclass object, you're losing all the subclass specific data, because no matter if the original instance was one from a subclass, only the superclass move constructor will be used. And that's bad, very bad. This problem is called object slicing. It's specific to C++ and can also happen with copy constructors. See it with your own eyes:
Creating B b1 A::A() 0x7ffd544ca690 B::B() 0x7ffd544ca690 b1.i = 1 b1.j = 2 Moving (B)b1 to (A)a. Which move constructor will be used? A::A(A&&) 0x7ffd544ca6a0 a.i = 1 (B)a.j = 0 Moving (A)a to (B)b2. Which move constructor will be used? A::A() 0x7ffd544ca6b0 B::B(B&&) 0x7ffd544ca6b0 b2.i = 1 b2.j = 0 ^^^ Oops!! Somebody forgot to copy the j field when creating (A)a. Oh, wait... (A)a never had a j field in the first place Destroying b2, a, b1 virtual B::~B() 0x7ffd544ca6b0 virtual A::~A() 0x7ffd544ca6b0 virtual A::~A() 0x7ffd544ca6a0 virtual B::~B() 0x7ffd544ca690 virtual A::~A() 0x7ffd544ca690
Why can something that seems so obvious become such a problem, you may ask? Well, it depends on the context. It's not unusual for the codebase of a long lived project to have started using raw pointers for everything, then switching to using references as a way to get rid of null pointer issues when possible, and finally switch to whole objects and copy/move semantics to get rid or pointer issues (references are just pointers in disguise after all, and there are ways to produce null and dangling references by mistake). But this last step of moving from references to copy/move semantics on whole objects comes with the small object slicing nuance explained in this post, and when the size and all the different things to have into account about the project steals your focus, it's easy to forget about this.
So, please remember: never use move semantics that convert your precious subclass instance to a superclass instance thinking that the subclass data will survive. You can regret about it and create difficult to debug problems inadvertedly.
Happy coding!
10 Sep 2024 7:58am GMT
17 Jun 2024
Planet Maemo
Incorporating 3D Gaussian Splats into the graphics pipeline
3D Gaussian splatting is the emerging rendering technique that is overtaking NeRFs. Since it is centered around point primitives, it is more compatible with traditional graphics pipelines that already support point rendering.
Gaussian splats essentially enhance the concept of point rendering by converting the point primitive into a 3D ellipsoid, which is then projected into 2D during the rendering process.. This concept was initially described in 2002 [3], but the technique of extending Structure from Motion scans in this way was only detailed more recently [1].
In this post, I explore how to integrate Gaussian splats into the traditional graphics pipeline. This allows them to be used alongside triangle-based primitives and interact with them through the depth buffer for occlusion (see header image). This approach also simplifies deployment by eliminating the need for CUDA.
Storage
The original implementation uses .ply files as their checkpoint format, focusing on maintaining training-relevant data structures at the expense of storage efficiency, leading to increased file sizes.
For example, it stores the covariance as scaling and a rotation quaternion, necessitating reconstruction during rendering. A more efficient approach would be to leverage orthogonality, storing only the diagonal and upper triangular vectors, thereby eliminating reconstruction and reducing storage requirements.
Further analysis of the storage usage for each attribute shows that the spherical harmonics of orders 1-3 are the main contributors to the file size. However, according to the ablation study in the original publication [1], these harmonics only lead to a modest PSNR improvement of 0.5.
Therefore, the most straightforward way to decrease storage is by discarding the higher-order spherical harmonics. Additionally, the level 0 spherical harmonics can be converted into a diffuse color and merged with opacity to form a single RGBA value. These simple yet effective methods were implemented in one of the early WebGL implementations, resulting in the .splat format. As an added benefit, this format can be easily interpreted by viewers unaware of Gaussian splats as a simple colored point cloud:
By directly storing the covariance as previously mentioned we can reduce the precision from float32
to float16
, thereby halving the storage needed for that data. Furthermore, since most splats have limited spatial extents, we can also utilize float16
for position data, yielding additional storage savings.
With these changes, we achieve a storage requirement of 22 bytes per splat, in contrast to the 44 bytes needed by the .splat format and 236 bytes in the original implementation. Thus, we have attained a 10x reduction in storage compared to the original implementation simply by using more suitable data types.
Blending
The image formation model presented in the original paper [1] is similar to the NeRF rendering, as it is compared to it. This involves casting a ray and observing its intersection with the splats, which leads to front-to-back blending. This is precisely the approach taken by the provided CUDA implementation.
Blending remains a component of the fixed-function unit within the graphics pipeline, which can be set up for front-to-back blending [2] by using the factors (one_minus_dest_alpha, one)
and by multiplying color and alpha in the shader as color.rgb * color.a
. This results in the following equation:
\begin{aligned}C_{dst} &= (1 - \alpha_{dst}) \cdot \alpha_{src} C_{src} &+ C_{dst}\\ \alpha_{dst} &= (1 - \alpha_{dst})\cdot\alpha_{src} &+ \alpha_{dst}\end{aligned}
However, this method requires the framebuffer alpha value to be zero before rendering the splats, which is not typically the case as any previous render pass could have written an arbitrary alpha value.
A simple solution is to switch to back-to-front sorting and use the standard alpha blending factors (src_alpha, one_minus_src_alpha)
for the following blending equation:
C_{dst} = \alpha_{src} \cdot C_{src} + (1 - \alpha_{src}) \cdot C_{dst}
This allows us to regard Gaussian splats as a special type of particles that can be rendered together with other transparent elements within a scene.
References
- Kerbl, Bernhard, et al. "3d gaussian splatting for real-time radiance field rendering." ACM Transactions on Graphics 42.4 (2023): 1-14.
- Green, Simon. "Volumetric particle shadows." NVIDIA Developer Zone (2008).
- Zwicker, Matthias, et al. "EWA splatting." IEEE Transactions on Visualization and Computer Graphics 8.3 (2002): 223-238.
17 Jun 2024 1:28pm GMT