In my last blog, I explained how we resolved a throttling issue involving the Azure storage API. At the end, I mentioned that I was not sure of the root cause of the throttling issue.
Even though we no longer had any problems in the dev and preprod clusters, we still faced throttling issues in prod. The main difference is that we have about 80 PVs in prod versus 15 in the other environments. Given that we manage 1500 pods in prod, 80 PVs does not look like a lot.
To continue the investigation, I modified k8s-scheduled-volume-snapshotter to limit the number of snapshots taken in a single cron run (see the add maxSnapshotCount parameter pull request).
In prod, we used the modified snapshotter to trigger snapshots one by one.
Even with all previous snapshots cleaned up, we could not trigger a single new snapshot without being throttled. I guess that, in the cron job, just checking the list of PVs to snapshot was enough to exhaust our API quota.
The Azure docs mention that a leaky bucket algorithm is used for throttling: a full bucket holds tokens for 250 API calls, and the bucket gets 25 new tokens per second. Looks like that's not enough for us.
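To get a feel for how quickly such a bucket runs dry, here is a minimal back-of-the-envelope sketch in Python (the 250 and 25 come from the Azure docs quoted above; the call rates are made-up examples):

import math

BUCKET_SIZE = 250   # tokens when the bucket is full
REFILL_RATE = 25    # tokens added per second

def seconds_until_throttled(calls_per_second):
    # Net drain per second; at or below the refill rate we never run dry.
    drain = calls_per_second - REFILL_RATE
    if drain <= 0:
        return math.inf
    return BUCKET_SIZE / drain

# A controller listing 80 PVs with a handful of calls each can easily
# burst to a few hundred calls per second:
print(seconds_until_throttled(100))  # ~3.3 seconds of headroom
print(seconds_until_throttled(25))   # inf: at 25 calls/s we break even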
I was puzzled and out of ideas.
I looked for similar problems in the AKS issues on GitHub, where I found this comment recommending the useDataPlaneAPI parameter of the CSI file driver. That was it!
I was flabbergasted by this parameter: why is the CSI file driver able to use 2 APIs? Why is one of them so limited? And more importantly, why is the limited API the default one?
Anyway, setting useDataPlaneAPI: "true" in our VolumeSnapshotClass manifest was the right solution. It did indeed solve the throttling issue in our prod cluster.
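For reference, here is roughly what such a manifest can look like (a minimal sketch: the name is a placeholder, and apart from useDataPlaneAPI the fields are just the standard VolumeSnapshotClass boilerplate for the Azure file CSI driver):

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: azurefile-snapclass   # placeholder name
driver: file.csi.azure.com
deletionPolicy: Delete
parameters:
  useDataPlaneAPI: "true"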
It did not, however, solve the snapshot issue: amongst the 80 PVs, I still had 2 snapshots failing.
Fortunately, the error was mentioned in the description of the failed snapshots: we had too many (200) snapshots for these shared volumes.
What?? All these snapshots were cleaned up last week.
I then tried to delete these snapshots through the Azure console. But the console failed to delete them due to API throttling. Looks like the Azure console is not using the right API.
Anyway, I went back to the solution explained in my previous blog and listed all snapshots with the az command. I indeed had a lot of snapshots, many of them dated Jan 19 and 20, often with a new bogus snapshot created every minute.
These were created during the first attempt at fixing the throttling issue. I guess that even though the CSI file driver was throttled, a snapshot was still created in the storage account, but the CSI driver did not see it and retried a minute later. What a mess.
Anyway, I cleaned up these bogus snapshots again, and now snapshot creation is working fine.
A while ago I may have accidentally bought a ring of 12 RGB LEDs; I soldered temporary leads on it, connected it to a CircuitPython-supported board and played around with it for a while.
Then we had a couple of friends come over to attend FOSDEM remotely together, and I had talked with one of them about WS2812 / NeoPixels, so I brought the ring to the living room, in case there was a chance to show it in sort-of-use.
Then I was dealing with playing the various streams as we moved from one room to the next, which led to me being called "video team", which led to me wearing a video team shirt (from an old DebConf, not FOSDEM, but still video team), which led to somebody asking me whether I also had the sheet with the countdown to the end of the talk, and the answer was sort-of-yes (I should have the ones we used to use for our Linux Day), but not handy.
But I had a thing with twelve things in a clock-like circle.
A bit of fiddling on the CircuitPython REPL resulted, if I remember correctly, in something like:
import board
import neopixel
import time
num_pixels = 12
pixels = neopixel.NeoPixel(board.GP0, num_pixels)
pixels.brightness = 0.1
def end(min):
    # Clear the ring, then light one LED at a time, shifting from amber to red
    pixels.fill((0, 0, 0))
    for i in range(12):
        pixels[i] = (127 + 10 * i, 8 * (12 - i), 0)
        pixels[i - 1] = (0, 0, 0)  # turn the previous LED off again
        time.sleep(min * 5)  # min * 60 / 12
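So, to count down the last five minutes of a talk, the call would be:

end(5)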
Now, I wasn't very consistent in running end, especially since I wasn't sure whether I wanted to run it at the beginning of the talk with the full duration or just in the last 5-10 minutes depending on the length of the slot, but I've had at least one person agree that the general idea has potential, so I'm taking these notes to be able to work on it in the future.
One thing that needs to be fixed is the fact that with the ring just attached with temporary wires and left on the table it isn't clear which LED is number 0, so it will need a bit of a case or something, but that's something that can be dealt with before the next FOSDEM.
And I should probably add some input interface, so that it is self-contained and not tethered to a computer and run from the REPL.
(And then I may also have a vague idea for putting that ring into some wearable thing: good thing that I actually bought two :D )
I was recently pointed to Technologies and Projects supported by the Sovereign Tech Agency, which is financed by the German Federal Ministry for Economic Affairs and Climate Action. It is a subsidiary of the Federal Agency for Disruptive Innovation, SPRIND GmbH.
It is worth sending applications there for distinct projects, as that is their preferred method of funding. Distinguished developers can also apply for a fellowship position that pays for up to 40 hrs/week (32 hrs when freelancing) for a year. This is especially open to maintainers of larger numbers of packages in Debian (or any other Linux distribution).
There might be a chance that some of the Debian-related projects submitted to the Google Summer of Code that did not get funded could be retried with those foundations. As per the FAQ of the project: "The Sovereign Tech Agency focuses on securing and strengthening open and foundational digital technologies. These communities working on these are distributed all around the world, so we work with people, companies, and FOSS communities everywhere."
Similar funding organizations include the Open Technology Fund and FLOSS/fund. If you have a Debian-related project that fits these funding programs, they might be interesting options. This list is by no means exhaustive - just some hints I've received and wanted to share. More suggestions for such opportunities are welcome.
Year of code reviews
On the debian-devel mailing list, there was a long thread titled "Let's make 2025 a year when code reviews became common in Debian". It initially suggested something along the lines of: "Let's review MRs in Salsa." The discussion quickly expanded to include patches that have been sitting in the BTS for years, which deserve at least the same attention. One idea I'd like to emphasize is that associating BTS bugs with MRs could be very convenient. It's not only helpful for documentation but also the easiest way to apply patches.
I'd like to emphasize that no matter what workflow we use - BTS, MRs, or a mix - it is crucial to uphold Debian's reputation for high quality. However, this reputation is at risk as more and more old issues accumulate. While Debian is known for its technical excellence, long-standing bugs and orphaned packages remain a challenge. If we don't address these, we risk weakening the high standards that Debian is valued for. Revisiting old issues and ensuring that unmaintained packages receive attention is especially important as we prepare for the Trixie release.
Debian Publicity Team will no longer post on X/Twitter
The team is in charge of deciding the most suitable publication venue or venues for announcements and when they are published.
The team once decided to join Twitter, but circumstances have since changed. The current Press delegates have the institutional authority to leave X, just as their predecessors had the authority to join. I appreciate that the team carefully considered the matter, reinforced by the arguments developed on the debian-publicity list, and communicated its reasoning openly.
The RcppUUID package on CRAN has been providing UUIDs (based on the underlying Boost library) for several years. Written by Artem Klemsov and maintained in this gitlab repo, the package is a very nice example of clean and straightforward library binding.
When we did our annual BH upgrade to 1.87.0 and checked reverse dependencies, we noticed that RcppUUID needed a small and rather minor update, which we showed as a short diff in an issue filed. Neither I nor CRAN heard from Artem, so the package ended up being archived last week. Which in turn led me to make this minimal update to 1.1.2 to resurrect it, which CRAN processed more or less like a regular update given this explanation, and so it arrived last Friday.
To quote @MM0EFI and the GM0ESS gang, today was a particularly Amateur showing!
Having spent all weekend locked in the curling rink ruining my knees and inflicting mild liver damage in the Aberdeen City Open competition, I needed some outside time away from people to stretch the legs and loosen my knees.
With my teammates/guests shipped off early on account of our quality performance, and the days fair drawin' out now, I found myself with a free afternoon to have a quick run up something nearby before a 1640 sunset! Up the back of Bennachie is a quick, steady ascent, and in 13 years of living up here I've never summited the big hill! Now is as good a time as any. In SOTA terms, this hill is GM/ES-061. In geographical terms, it's around 20 miles inland from Aberdeen city here.
I've been experimenting with these AliExpress whips since the end of last year, and the forecast wind was low enough to take one into the hills. I cut and terminated 8x 2.5m radials for an effective ground plane last week and wanted to try that against the flat ribbon that it came with.
The ascent was pleasant enough, I got to the summit in good time, and out came my Quansheng radio to get the GM/ES-Society on 2m. First, my Nagoya whip: I called CQ and heard nothing. With generally poor reports in WhatsApp, I opted to get the Slim-G up my AliExpress fibreglass mast.
In an amateur showing last week, I had broken the tip of the mast on Cat Law while helping 2M0HSK do his first activation in the wind, and had forgotten this until I summited this week. Squeezing my antenna on was tough, and after many failed attempts to get it up (the mast kept collapsing as I was rushing and not getting the friction hold on each section correctly) and still not hearing anything at all, I changed location and tried again.
In my new position, I received 2M0RVZ 4/4 at best, but he was hearing me 5/9. Similarly, GM5ALX and GM4JXP were patiently receiving me loud and clear, but I couldn't hear them at all. I fiddled with settings and decided the receive path of the Quansheng must be fried or sad somehow, but I haven't yet run a full set of diagnostics.
I'll take my Anytone on the next hill and compare them against each other I think.
I gave up and moved to HF, getting my whip and new radials into the ground:
Quick to deploy, which is what I was after. My new 5m of coax with a choke fitted was attached to the radio and we were off to the races. A convenient thing of beauty when it's up:
I've made a single guy with a sotabeams top insulator to brace against wind if need be, but that didn't need to be used today.
I hit tune, and the G90 spent ages clicking away. In fact, tuning to 14.074, I could only see the famed FT8 signals at S2.
What could be wrong here? Was it my new radials? The whip has behaved before… Minutes turned into tens of minutes playing with everything, and eventually I worked out what was up - my coax only passed signal when I held the PL259 connector at the antenna juuuust right. Once I did that, I could take the tuner out of the system and work 20 spectacularly well. Until now, I'd been tuning the coax only.
Another Quality Hibby Build Job™️. That's what's wrong!
I managed to struggle my way through a touch of QRM and my wonky cable woes to make enough contacts with some very patient chasers and a summit to summit before my frustration at the situation won out, and down the hill I went after a quick pack up period. I managed to beat the sunset - I think if the system had worked fine, I'd have stayed on the hill for sunset.
I think it's time for a new mast and a coax retermination!
Most of my Debian contributions this month were sponsored by Freexian. If you appreciate this sort of work and are at a company that uses Debian, have a look to see whether you can pay for any of Freexian's services; as well as the direct benefits, that revenue stream helps to keep Debian development sustainable for me and several other lovely people.
You can also support my work directly via Liberapay.
Python team
We finally made Python 3.13 the default version in testing! I fixed various bugs that got in the way of this:
I helped with some testing of a debian-installer-utils patch as part of the /usr move. I need to get around to uploading this, since it looks OK now.
Other small things
Helmut Grohne reached out for help debugging a multi-arch coinstallability problem (you know it's going to be complicated when even Helmut can't figure it out on his own …) in binutils, and we had a call about that.
For many years I have wished I had a setup that would allow me to work (that is, code) productively outside in the bright sun. It's winter right now, but summer will come again, and this weekend I got closer to that goal.
TL;DR: Using code-server on a beefy machine seems to be quite neat.
Personal history
Looking back at my own old blog entries, I find one from 10 years ago describing how I bought a Kobo eBook reader with the intent of using it as an external monitor for my laptop. It seems that I got a proof-of-concept setup working, using VNC, but it was tedious to set up, and I never actually used that. I subsequently noticed that the eBook reader is rather useful for reading eBooks, and it has been in heavy use for that ever since.
Four years ago I gave this old idea another shot and bought an Onyx BOOX Max Lumi. This is an A4-sized tablet running Android, with the very promising feature of an HDMI input. So hopefully I'd attach it to my laptop and it would just work™. Turns out that this never worked as well as I hoped: even if I set the resolution to exactly the tablet's screen's resolution I got blurry output, and it also drained the battery a lot, so I gave up on this. I subsequently noticed that the tablet is rather useful for taking notes, and it has been in sporadic use for that.
Going off on this tangent: I later learned that the HDMI input of this device appears to the system like a camera input, and I don't have to use Boox's "monitor" app but could use other apps like FreeDCam as well. This somehow managed to fix the resolution issues, but the setup still wasn't convenient enough to be used regularly.
I also played around with pure terminal approaches, e.g. SSH'ing into a system, but since my usual workflow was never purely text-based (I was at least used to using a window manager instead of a terminal multiplexer like screen or tmux) that never led anywhere either.
My colleagues have said good things about using VSCode with the remote SSH extension to work on a beefy machine, so I gave this a try now as well, and while it's not a complete game changer for me, it does make certain tasks (rebuilding everything after switching branches, running the test suite) very convenient. And it's a bit spooky to run these workloads without the laptop's fan spinning up.
In this setup, the workspace is remote, but VSCode still runs locally. But it made me wonder about my old goal of being able to work reasonably efficiently on my eInk tablet. Can I replicate this setup there?
VSCode itself doesn't run on Android directly. There are projects that run Linux in a chroot or in Termux on the Android system, and then you can connect to it via VNC (e.g. on Andronix)… but that did not seem promising. It seemed fiddly, and I probably should take it easy on the tablet's system.
code-server, running remotely
A more promising option is code-server. This is a fork of VSCode (actually of VSCodium) that runs completely on the remote machine; the client machine just needs a browser. I set that up this weekend and found that I was able to do a little bit of work reasonably well.
Access
With code-server one has to decide how to expose it safely enough. I decided against the tunnel-over-SSH option, as I expected that to be somewhat tedious to set up (both initially and for each session) on the Android system, and I liked the idea of being able to use any device to work in my environment.
I also decided against the more involved "reverse proxy behind proper hostname with SSL" setups, because they involve a few extra steps, and some of them I cannot do as I do not have root access on the shared beefy machine I wanted to use.
That left me with the option of using code-server's built-in support for self-signed certificates and a password:
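A sketch of what such a user unit can look like (paths, the port and the nix profile location are illustrative placeholders; code-server's --cert flag without an argument generates a self-signed certificate, and the password lives in ~/.config/code-server/config.yaml):

[Unit]
Description=code-server

[Service]
# The nix profile needs to be on PATH so code-server finds its helpers
Environment=PATH=%h/.nix-profile/bin:/usr/bin:/bin
# --cert without an argument: generate and use a self-signed certificate
ExecStart=%h/.nix-profile/bin/code-server --bind-addr 0.0.0.0:8080 --cert
Restart=on-failure

[Install]
WantedBy=default.target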
(I am using nix as a package manager on a Debian system there, hence the additional PATH and complex ExecStart. If you have a more conventional setup then you do not have to worry about Environment and can likely use ExecStart=code-server.)
For this to survive me logging out I had to ask the system administrator to run loginctl enable-linger joachim, so that systemd allows my jobs to linger.
Git credentials
The next issue to be solved was how to access the git repositories. The work is all on public repositories, but I still need a way to push my work. With the classic VSCode-SSH-remote setup from my laptop, this is no problem: My local SSH key is forwarded using the SSH agent, so I can seamlessly use that on the other side. But with code-server there is no SSH key involved.
I could create a new SSH key and store it on the server. That did not seem appealing, though, because SSH keys on GitHub always have full access. It wouldn't be horrible, but I still wondered if I could do better.
I thought of creating fine-grained personal access tokens that allow only me to push code to specific repositories, and nothing else, and just storing them permanently on the remote server. Still a neat and convenient option, but creating PATs for our org requires approval and I didn't want to bother anyone on the weekend.
So I am experimenting with GitHub's git-credential-manager now. I have configured it to use git's credential cache with an elevated timeout, so that once I log in, I don't have to again for one workday.
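Concretely, this amounts to something like the following (the 8-hour timeout is my choice; credential.credentialStore and credential.cacheOptions are git-credential-manager's settings for picking the cache store and passing options to it):

git-credential-manager configure
git config --global credential.credentialStore cache
git config --global credential.cacheOptions "--timeout 28800"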
To log in, I have to open https://github.com/login/device on an authenticated device (e.g. my phone) and enter an 8-character code. Not too shabby in terms of security. I only wish that webpage would not require me to press Tab after each character…
This still grants rather broad permissions to the code-server, but at least only temporarily.
Android setup
On the client side I could now open https://host.example.com:8080 in Firefox on my eInk Android tablet, click through the warning about self-signed certificates, log in with the fixed password mentioned above, and start working!
I switched to a theme that supposedly is eInk-optimized (eInk by Mufanza). It's not perfect (e.g. git diffs are unhelpful because it is not possible to distinguish deleted from added lines), but it's a start. There are more eInk themes on the official Visual Studio Marketplace, but because code-server is a fork it cannot use that marketplace, and for example this theme isn't on Open-VSX.
For some reason the F11 key doesn't work, but going fullscreen is crucial, because screen estate is scarce in this setup. I can go fullscreen using VSCode's command palette (Ctrl-P) and invoking the command there, but Firefox often jumps out of fullscreen mode, which is annoying. I still have to pay attention to when that's happening; maybe it's the Esc key, which I am of course using a lot due to my vim bindings.
A more annoying problem was that on my Boox tablet, sometimes the on-screen keyboard would pop up, which is seriously annoying! It took me a while to track this down: the Boox has two virtual keyboards installed, the usual Google AOSP keyboard and the Onyx Keyboard. The former is clever enough to stay hidden when there is a physical keyboard attached, but the latter isn't. Moreover, pressing Shift-Ctrl on the physical keyboard rotates through the virtual keyboards. Now, VSCode has many keyboard shortcuts that require Shift-Ctrl (especially on an eInk device, where you really want to avoid using the mouse). And the limited settings exposed by the Boox Android system do not allow you to configure that or disable the Onyx keyboard! To solve this, I had to install the KISS Launcher, which would allow me to see more Android settings, and in particular allow me to disable the Onyx keyboard. So this is fixed.
I was hoping to improve the experience even more by opening the web page as a Progressive Web App (PWA), as described in the code-server FAQ. Unfortunately, that did not work. Firefox on Android did not recognize the site as a PWA (even though it recognizes a PWA test page). And I couldn't use Chrome either because (unlike Firefox) it would not consider a site with a self-signed certificate as a secure context, and then code-server does not work fully. Maybe this is just some bug that gets fixed in later versions.
I did not work enough with this yet to assess how much the smaller screen estate, the lack of colors and the slower refresh rate will bother me. I probably need to hide Lean's InfoView more often, and maybe use the Error Lens extension, to avoid having to split my screen vertically.
I also cannot easily work on a park bench this way, with a tablet and a separate external keyboard. I'd need at least a table, or some additional piece of hardware that turns tablet + keyboard into some laptop-like structure that I can put on my, well, lap. There are cases for Onyx products that include a keyboard, and maybe they work on the lap, but they don't have the Trackpoint that I have on my ThinkPad TrackPoint Keyboard II, and how can you live without that?
Conclusion
After this initial setup chances are good that entering and using this environment is convenient enough for me to actually use it; we will see when it gets warmer.
A few bits could be better. In particular, logging in and authenticating GitHub access could be both more convenient and safer - I could imagine that when I open the page I confirm that on my phone (maybe with a fingerprint), and that temporarily grants access to the code-server and to specific GitHub repositories only. Is that easily possible?
DeepSeek R1, the new entrant to the Large Language Model wars, has created quite a splash over the last few weeks. Its entrance into a space dominated by the Big Corps, while pursuing asymmetric and novel strategies, has been a refreshing eye-opener.
GPT AI improvement was starting to show signs of slowing down, and has been observed to be reaching a point of diminishing returns as it runs out of the data and compute required to train and fine-tune increasingly large models. This has turned the focus towards building "reasoning" models that are post-trained through reinforcement learning, with techniques such as inference-time and test-time scaling and search algorithms to make the models appear to think and reason better. OpenAI's o1-series models were the first to achieve this successfully with its inference-time scaling and Chain-of-Thought reasoning.
Intelligence as an emergent property of Reinforcement Learning (RL)
Reinforcement Learning (RL) has been used successfully in the past by Google's DeepMind team to build highly intelligent and specialized systems, where intelligence is observed as an emergent property of a rewards-based training approach that yielded achievements like AlphaGo (see my post on it here - AlphaGo: a journey to machine intuition).
DeepMind went on to build a series of Alpha* projects that achieved many notable feats using RL:
AlphaGo, defeated the world champion Lee Sedol in the game of Go
AlphaZero, a generalized system that learned to play games such as Chess, Shogi and Go without human input
AlphaStar, achieved high performance in the complex real-time strategy game StarCraft II.
AlphaFold, a tool for predicting protein structures which significantly advanced computational biology.
AlphaCode, a model designed to generate computer programs, performing competitively in coding challenges.
AlphaDev, a system developed to discover novel algorithms, notably optimizing sorting algorithms beyond human-derived methods.
All of these systems achieved mastery in their own areas through self-training/self-play, by optimizing and maximizing the cumulative reward over time through interaction with their environments, where intelligence was observed as an emergent property of the system.
RL mimics the process through which a baby would learn to walk, through trial, error and first principles.
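A toy example makes that loop concrete. This is not what DeepMind or DeepSeek run, just tabular Q-learning on a made-up one-dimensional world, but it shows the essential ingredients: act, observe a reward, and update towards the action that maximizes the cumulative reward:

import random

# Toy world: states 0..5 in a line; start at 0, reward only on reaching 5.
N_STATES, GOAL = 6, 5
ACTIONS = (-1, +1)                     # step left or step right
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # learning rate, discount, exploration

for episode in range(500):
    s = 0
    while s != GOAL:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore
        a = random.randrange(2) if random.random() < epsilon else Q[s].index(max(Q[s]))
        s2 = min(max(s + ACTIONS[a], 0), GOAL)
        r = 1.0 if s2 == GOAL else 0.0
        # Move the action value towards reward plus discounted future value
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# Learned policy per state (1 = step right); the goal state is never updated
print([row.index(max(row)) for row in Q])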
R1 model training pipeline
At a technical level, DeepSeek-R1 leverages a combination of Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT) for its training pipeline:
Using RL and DeepSeek-v3, an interim reasoning model called DeepSeek-R1-Zero was built, based purely on RL without relying on SFT; it demonstrated superior reasoning capabilities that matched the performance of OpenAI's o1 in certain benchmarks such as AIME 2024.
The model was, however, affected by poor readability and language-mixing, and is only an interim reasoning model built on RL principles and self-evolution.
DeepSeek-R1-Zero was then used to generate SFT data, which was combined with supervised data from DeepSeek-v3 to re-train the DeepSeek-v3-Base model.
The new DeepSeek-v3-Base model then underwent additional RL with prompts and scenarios to come up with the DeepSeek-R1 model.
The R1 model was then used to distill a number of smaller open source models such as Llama-8b and Qwen-7b/14b, which outperformed bigger models by a large margin, effectively making the smaller models more accessible and usable.
Key contributions of DeepSeek-R1
RL without the need for SFT for emergent reasoning capabilities
R1 was the first open research project to validate the efficacy of applying RL directly to the base model without relying on SFT as a first step, which resulted in the model developing advanced reasoning capabilities purely through self-reflection and self-verification.
Although it did degrade in its language capabilities during the process, its Chain-of-Thought (CoT) capabilities for solving complex problems were later used for further RL on the DeepSeek-v3-Base model, which became R1. This is a significant contribution back to the research community.
The analysis below of DeepSeek-R1-Zero and OpenAI o1-0912 shows that it is viable to attain robust reasoning capabilities through RL alone, which can be further augmented with other techniques to deliver even better reasoning performance.
It's quite interesting that the application of RL gives rise to seemingly human capabilities of "reflection" and arriving at "aha" moments, causing the model to pause, ponder and focus on a specific aspect of the problem, resulting in emergent capabilities to problem-solve as humans do.
Model distillation
DeepSeek-R1 also demonstrated that larger models can be distilled into smaller models, which makes advanced capabilities accessible to resource-constrained environments, such as your laptop. While it's not possible to run a 671b model on a stock laptop, you can still run a distilled 14b model, which still performs better than most publicly available models out there. This enables intelligence to be brought closer to the edge, allowing faster inference at the point of experience (such as on a smartphone, or on a Raspberry Pi), which paves the way for more use cases and possibilities for innovation.
Distilled models are very different from R1, which is a massive model with a completely different architecture than the distilled variants; they are not directly comparable in terms of capability, but are instead built to be smaller and more efficient for more constrained environments. This technique of distilling a larger model's capabilities down to a smaller model for portability, accessibility, speed, and cost will bring about a lot of possibilities for applying artificial intelligence in places where it would otherwise not have been possible. This is another key contribution of this technology from DeepSeek, which I believe has even further potential for the democratization and accessibility of AI.
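The exact recipe DeepSeek used for its distilled variants isn't spelled out here, but the classic logit-matching formulation of distillation gives a feel for the general idea; a minimal PyTorch sketch (all names and shapes are illustrative, not DeepSeek's specific method):

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions, then push the student towards the teacher.
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_logp = F.log_softmax(student_logits / t, dim=-1)
    # KL divergence, scaled by t^2 to keep gradient magnitudes comparable
    return F.kl_div(student_logp, teacher_probs, reduction="batchmean") * t * t

# Toy example: a batch of 4 positions over a 10-token vocabulary.
# In practice both sets of logits come from running the respective models.
teacher = torch.randn(4, 10)
student = torch.randn(4, 10, requires_grad=True)
loss = distillation_loss(student, teacher)
loss.backward()  # gradients flow into the student only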
Why is this moment so significant?
DeepSeek-R1 was a pivotal contribution in many ways.
The contributions to the state-of-the-art and the open research helps move the field forward where everybody benefits, not just a few highly funded AI labs building the next billion dollar model.
Open-sourcing and making the model freely available follows an asymmetric strategy to the prevailing closed nature of much of the model-sphere of the larger players. DeepSeek should be commended for making their contributions free and open.
It reminds us that it's not just a one-horse race, and it incentivizes competition, which has already resulted in OpenAI's o3-mini, a cost-effective reasoning model that now shows its Chain-of-Thought reasoning. Competition is a good thing.
We stand at the cusp of an explosion of small models that are hyper-specialized and optimized for a specific use case, and that can be trained and deployed cheaply for solving problems at the edge. It raises a lot of exciting possibilities and is why DeepSeek-R1 is one of the most pivotal moments in tech history.
Version 0.0.20 of RcppSpdlog arrived on CRAN early this morning and has been uploaded to Debian. RcppSpdlog bundles spdlog, a wonderful header-only C++ logging library with all the bells and whistles you would want that was written by Gabi Melman, and also includes fmt by Victor Zverovich. You can learn more at the nice package documentation site.
This release updates the code to the version 1.15.1 of spdlog which was released this morning as well. It also contains a contributed PR which illustrates logging in a multithreaded context.
The NEWS entry for this release follows.
Changes in RcppSpdlog version 0.0.20 (2025-02-01)
New multi-threaded logging example (Young Geun Kim and Dirk via #22)
Another short status update of what happened on my side last month. Mostly focused on quality-of-life improvements in phosh and on cleaning up and improving phoc this time around (including catching up with wlroots git), but some improvements for other things like phosh-osk-stub happened on the sidelines too.
Allow events to override the sound feedback with custom sounds (MR). Allows desktop/mobile shells like phosh to honour application prefs for notifications.
udev regression affecting gmobile (Bug). Many thanks to Yu Watanabe for providing the fix so quickly.
Reviews
This is not code by me but reviews of other people's code. The list is incomplete, but I hope to improve on this in the upcoming months. Thanks for the contributions!
As people around the world come to understand how LLMs behave, more and more wonder why these models hallucinate and what can be done to reduce it. This provocatively named article by Michael Townsen Hicks, James Humphries and Joe Slater brings us an excellent primer for better understanding how LLMs work and what to expect from them.
As humans carrying out our relations using language as our main tool, we are easily in awe of the apparent ease with which ChatGPT (the first widely available, and to this day probably the best-known, LLM-based automated chatbot) simulates human-like understanding, and of how it helps us easily carry out even daunting data-aggregation tasks. It is common for people to ask ChatGPT for an answer and, if it gets part of the answer wrong, to justify it by stating that it's just a hallucination. Hicks et al. invite us to switch from that characterization to a more correct one: LLMs are bullshitting. This term is formally presented by Frankfurt [1]. To bullshit is not the same as to lie, because lying requires knowing (and wanting to cover) the truth. A bullshitter does not necessarily know the truth; they just have to provide a compelling description, regardless of whether it is aligned with the truth.
After introducing Frankfurt's ideas, the authors explain the fundamental ideas behind LLM-based chatbots such as ChatGPT: a Generative Pre-trained Transformer (GPT) has as its only goal to produce human-like text, which it does mainly by matching the input's high-dimensional abstract vector representation and probabilistically outputting the next token (word), iteratively extending the text produced so far. Clearly, a GPT's task is not to seek truth or to convey useful information - they are built to provide a normal-seeming response to the prompts provided by their user. Core data are not queried to find optimal solutions for the user's requests; output is generated on the requested topic, attempting to mimic the style of the document set the model was trained on.
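The core loop is easy to sketch. The toy "model" below is just a made-up lookup table over the previous word, whereas a real GPT computes the next-token distribution from the whole context with billions of parameters, but the iterative, probabilistic structure is the same:

import random

# Toy next-token distributions; a real model computes these from the
# whole context with a neural network rather than a lookup table.
MODEL = {
    "the": {"cat": 0.5, "dog": 0.3, "answer": 0.2},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"sat": 0.4, "ran": 0.6},
    "sat": {".": 1.0},
    "ran": {".": 1.0},
}

def generate(token, max_len=5):
    out = [token]
    for _ in range(max_len):
        dist = MODEL.get(out[-1])
        if dist is None:
            break
        # Sample the next token in proportion to its probability
        words, probs = zip(*dist.items())
        out.append(random.choices(words, weights=probs)[0])
    return " ".join(out)

print(generate("the"))  # e.g. "the cat sat ."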
Erroneous data emitted by an LLM is thus not comparable with what a person could hallucinate; it appears because the model has no understanding of truth. In a way, this is very fitting with the current state of the world, a time often termed the age of post-truth [2]. Requesting an LLM to provide truth in its answers is basically impossible, given the difference between intelligence and consciousness: following Harari's definitions [3], LLM systems, or any AI-based system, can be seen as intelligent, as they have the ability to attain goals in various, flexible ways, but they cannot be seen as conscious, as they have no ability to experience subjectivity. That is, the LLM is, by definition, bullshitting its way towards an answer: its goal is to provide an answer, not to interpret the world in a trustworthy way.
The authors close their article with a plea for the literature on the topic to adopt the more correct "bullshit" term instead of the vacuous, anthropomorphizing "hallucination". Of course, the word being already loaded with a negative meaning, it is an unlikely request.
This is a great article that mixes together Computer Science and Philosophy, and can shed some light on a topic that is hard to grasp for many users.
[1] Frankfurt, Harry (2005). On Bullshit. Princeton University Press.
[2] Zoglauer, Thomas (2023). Constructed truths: truth and knowledge in a post-truth world. Springer.
[3] Harari, Yuval Noah (2024). Nexus: A Brief History of Information Networks From the Stone Age to AI. Random House.
Thrilled to announce a new package: zigg. It arrived on CRAN today after a few days of review in the 'newbies' queue. zigg provides the Ziggurat pseudo-random number generator for Normal, Exponential and Uniform draws proposed by Marsaglia and Tsang (JSS, 2000), and extended by Leong et al. (JSS, 2005).
I had picked up their work in the package RcppZiggurat and updated its code for the 64-bit world we now live in. That package already provided the Normal generator along with several competing implementations, which it compared rigorously and timed. As one of the generators was based on the GNU GSL via the implementation of Voss, we always ended up with a run-time dependency on the GSL too. No more: this new package is zero-dependency, zero-suggests and hence very easy to deploy. Moreover, we also include a demonstration of four distinct ways of accessing the compiled code from another R package: pure and straight-up C, similarly pure C++, inclusion of the header in C++, as well as via Rcpp.
The other advance is the resurrection of the second generator for the Exponential distribution. And following Burkardt we expose the Uniform one too. The main upside of these generators is their excellent speed, as can be seen in the comparison against the default R generators generated by the example script timings.R:
Needless to say, speed is not everything. This PRNG comes from the time of 32-bit computing, so the generator period is likely to be shorter than that of newer high-quality generators. If in doubt, forgo speed and stick with the high-quality default generators.
Fraudsters have been reselling used hard disks as factory-new. They did this by using new shrink-wrap bags and resetting the used hard disks' SMART attributes to factory-new values.
Luckily, Seagate has a proprietary extension, "Seagate FARM (Field Access Reliability Metrics)", implemented in their disks that ... the crooks did not reset.
Luckily ... because other manufacturers do not have that extension. And you think the crooks only re-sell used Seagate disks? Lol.
To get access to the Seagate FARM extension, you need smartctl from smartmontools v7.4 or later.
For Debian 12 (Bookworm) you can add the backports archive and then install with apt install smartmontools/bookworm-backports.
For Debian 11 (Bullseye) you can use a backport we created at my company:
You can also download static builds from https://builds.smartmontools.org/ which keeps the latest CI builds of the current development branch (v7.5 at the time of writing).
To check the state of your drives, compare the output from smartctl -x and smartctl -l farm. Double-checking Power_On_Hours vs. "Power on Hours" is the obvious first step. But the other values around "Head Flight Hours" and "Power Cycle Count" should also roughly match what you expect from a hard disk of a certain age. All near zero, of course, for a factory-new hard disk.
This is what it looks like for a hard disk that has gracefully serviced 4 years and 8 months so far. The smartctl -x and smartctl -l farm data match within some small margins:
$ smartctl -x /dev/sda
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.1.0-30-amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org