Sent out today were a set of input subsystem fixes for the near-final Linux 6.18 kernel. A bit of a notable addition via this "fixes" pull is getting both touchscreens working on the AYANEO Flip DS, a dual-screen gaming handheld device that can be loaded up with Linux...
When using MkDocs with Material, you can set default languages for code blocks in your mkdocs.yml configuration file. This is particularly useful for inline code examples that may not have explicit language tags.
You can see what this looks like in practice with Air's API reference for forms here: feldroy.github.io/air/api/forms/. With this configuration, any code block without a specified language defaults to Python syntax highlighting, making documentation clearer and more consistent.
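For example, something along these lines in mkdocs.yml is a common way to set this up via the pymdownx extensions; treat the exact option names as assumptions and check the Material for MkDocs / PyMdown Extensions docs for your versions:

# mkdocs.yml (sketch)
markdown_extensions:
  - pymdownx.highlight:
      default_lang: python        # assumed option: fallback language for fenced code blocks
  - pymdownx.inlinehilite:
      style_plain_text: python    # assumed option: highlight untagged inline code as Python
  - pymdownx.superfences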
I have just released FSet 2! You can get it from common-lisp.net or GitHub. A detailed description can be found via those links, but briefly, it makes the CHAMP implementations the default for sets and maps, and makes some minor changes to the API.
I am already working on 2.1, which will have some performance improvements for seqs.
I want to be upfront that this blog post is for me to write down some thoughts that I have on the idea of rewriting the Python Launcher for Unix from Rust to pure Python. This blog post is not meant to explicitly be educational or enlightening for others, but I figured if I was going to write this down I might as well just toss it online in case someone happens to find it interesting. Anyway, with that caveat out of the way...
I started working on the Python Launcher for Unix in May 2018. At the time I used it as my Rust starter project and I figured distributing it would be easiest as a single binary since if I wrote it in Python how do you bootstrap yourself in launching Python with Python? But in the intervening 7.5 years, a few things have happened:
I became a dad (that will make more sense as to why that matters later in this post)
All of this has come together for me to realize now is the time to reevaluate whether I want to stick with Rust or pivot to using pure Python.
Performance
The first question I need to answer for myself is whether performance is good enough to switch. My hypothesis is that the Python Launcher for Unix is mostly I/O-bound (specifically around file system access), and so using Python wouldn't be a hindrance. To test this, I re-implemented enough of the Python Launcher for Unix in pure Python to make py --version work (a rough sketch of the approach follows the list):
$VIRTUAL_ENV environment variable support
Detection of .venv in the current or parent directories
Searching $PATH for the newest version of Python
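Roughly, the logic looks something like the following. This is a simplified sketch of those three behaviours, not the actual 72-line implementation, and the helper name is mine:

#!/usr/bin/env python3
"""Simplified sketch of the search order described above (not the real launcher)."""
import os
import re
import subprocess
import sys
from pathlib import Path


def find_python() -> str:
    # 1. An activated virtual environment wins.
    venv = os.environ.get("VIRTUAL_ENV")
    if venv:
        return str(Path(venv) / "bin" / "python")
    # 2. Otherwise look for a .venv in the current or any parent directory.
    for directory in (Path.cwd(), *Path.cwd().parents):
        candidate = directory / ".venv" / "bin" / "python"
        if candidate.exists():
            return str(candidate)
    # 3. Fall back to the newest pythonX.Y found on $PATH.
    found = []
    for path_dir in os.environ.get("PATH", "").split(os.pathsep):
        for exe in Path(path_dir or ".").glob("python3.*"):
            match = re.fullmatch(r"python(\d+)\.(\d+)", exe.name)
            if match:
                found.append(((int(match[1]), int(match[2])), str(exe)))
    if not found:
        sys.exit("No Python interpreter found")
    return max(found)[1]


if __name__ == "__main__":
    # e.g. `python sketch.py --version` behaves like `py --version`.
    subprocess.run([find_python(), *sys.argv[1:]], check=False)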
My re-implementation only took 72 lines, so it was a quick hack. I compared the Rust version to the Python version on my machine running Fedora 43 by running hyperfine "py --version". If I give Rust an optimistic number by picking its average lower bound and Python a handicap by picking its average upper bound, we get:
3 ms for Rust (333 Hz)
33 ms for Python (30 Hz)
So 11x slower for Python. But when the absolute performance is fast enough to let you run the Python Launcher for Unix over 30 times a second, does it actually matter? And you're not about to run the Python Launcher for Unix in some tight loop or even in production (as it's a developer tool), so I don't think that worst-case performance number (on my machine) makes performance a concern in making my decision.
If I rewrote the Python Launcher for Unix in Python, could I get equivalent distribution channels? Substituting PyPI for crates.io makes that one easy. The various package managers also know how to package Python applications already, so they would take care of the bootstrapping problem of getting Python onto your machine to run the Python Launcher for Unix.
Add in the fact that I'm working towards prebuilt binaries for python.org and it wouldn't even necessarily be an impediment if the Python Launcher for Unix were ever to be distributed via python.org as well. I could imagine some shell script to download Python and then use it to run a Python script to get the Python Launcher for Unix installed on one's machine (if relative paths for shebangs were relative to the script being executed then I could see just shipping an internal copy of Python with the Python Launcher for Unix, but a quick search online suggests such relative paths are relative to the working directory). So I don't see using Python as being a detriment to distribution.
Maximizing the impact of my time
I am a dad to a toddler. That means my spare time is negligible and restricted to nap time (which is shrinking) or the evening (and I can't code past 21:00, else I have really wonky dreams or I simply can't fall asleep because my brain won't shut off). I know I should eventually get some spare time back, but that's currently measured in years according to other parents, so this time restriction on working on this fun project is not about to improve in the near to mid-future.
This has led me, as of late, to look at how best to use my spare time. I could continue to grow my Rust experience while solving problems, or I could lean into my Python experience and solve more problems in the same amount of time. This somewhat matters if I decide that increasing the functionality of the Python Launcher for Unix is more fun for me than getting more Rust experience at this point in my life.
And if I think the feature set is the most important thing, then doing it in Python has a greater chance of attracting external contributions from the Python Launcher for Unix's user base. Compare that to now, where there have been 11 human contributors over the project's entire lifetime.
Conclusion?
So have I talked myself into rewriting the Python Launcher for Unix into Python?
The more I work with large language models through provider-exposed APIs, the more I feel like we have built ourselves into quite an unfortunate API surface area. It might not actually be the right abstraction for what's happening under the hood. The way I like to think about this problem now is that it's actually a distributed state synchronization problem.
At its core, a large language model takes text, tokenizes it into numbers, and feeds those tokens through a stack of matrix multiplications and attention layers on the GPU. Using a large set of fixed weights, it produces activations and predicts the next token. If it weren't for temperature (randomization), you could think of it as a much more deterministic system, at least in principle.
As far as the core model is concerned, there's no magical distinction between "user text" and "assistant text": everything is just tokens. The only difference comes from special tokens and formatting that encode roles (system, user, assistant, tool), injected into the stream via the prompt template. You can look at the system prompt templates on Ollama for the different models to get an idea.
The Basic Agent State
Let's ignore for a second which APIs already exist and just think about what usually happens in an agentic system. If I were to have my LLM run locally on the same machine, there is still state to be maintained, but that state is very local to me. You'd maintain the conversation history as tokens in RAM, and the model would keep a derived "working state" on the GPU, mainly the attention key/value cache built from those tokens. The weights themselves stay fixed; what changes per step are the activations and the KV cache.
From a mental-model perspective, caching means "remember the computation you already did for a given prefix so you don't have to redo it." Internally, that usually means storing the attention KV cache for those prefix tokens on the server and letting you reuse it, not literally handing you raw GPU state.
There are probably some subtleties to this that I'm missing, but I think this is a pretty good model to think about it.
The Completion API
The moment you're working with completion-style APIs such as OpenAI's or Anthropic's, abstractions are put in place that make things a little different from this very simple system. The first difference is that you're not actually sending raw tokens around. The way the GPU looks at the conversation history and the way you look at it are on fundamentally different levels of abstraction. While you could count and manipulate tokens on one side of the equation, extra tokens are being injected into the stream that you can't see. Some of those tokens come from converting the JSON message representation into the underlying input tokens fed into the machine. But you also have things like tool definitions, which are injected into the conversation in proprietary ways. Then there's out-of-band information such as cache points.
And beyond that, there are tokens you will never see. For instance, with reasoning models you often don't see any real reasoning tokens, because some LLM providers try to hide as much as possible so that you can't retrain your own models with their reasoning state. On the other hand, they might give you some other informational text so that you have something to show to the user. Model providers also love to hide search results and how those results were injected into the token stream. Instead, you only get an encrypted blob back that you need to send back to continue the conversation. All of a sudden, you need to take some information on your side and funnel it back to the server so that state can be reconciled on either end.
In completion-style APIs, each new turn requires resending the entire prompt history. The size of each individual request grows linearly with the number of turns, but the cumulative amount of data sent over a long conversation grows quadratically because each linear-sized history is retransmitted at every step. This is one of the reasons long chat sessions feel increasingly expensive. On the server, the model's attention cost over that sequence also grows quadratically in sequence length, which is why caching starts to matter.
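To make the arithmetic concrete, here is a purely illustrative sketch (the 500-token turn size is invented for the example):

# Illustrative only: every request in a completion-style API resends the
# whole history, so the cumulative tokens sent grow quadratically in turns.
def tokens_sent(turns: int, turn_tokens: int = 500) -> int:
    history = 0
    total = 0
    for _ in range(turns):
        history += turn_tokens  # the visible history grows linearly
        total += history        # ...but each turn retransmits all of it
    return total                # ~ turn_tokens * turns * (turns + 1) / 2

print(tokens_sent(10))   # 27500 tokens sent over a 10-turn chat
print(tokens_sent(100))  # 2525000 tokens sent over a 100-turn chat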
The Responses API
One of the ways OpenAI tried to address this problem was to introduce the Responses API, which maintains the conversational history on the server (at least in the version with the saved state flag). But now you're in a bizarre situation where you're fully dealing with state synchronization: there's hidden state on the server and state on your side, but the API gives you very limited synchronization capabilities. To this point, it remains unclear to me how long you can actually continue that conversation. It's also unclear what happens if there is state divergence or corruption. I've seen the Responses API get stuck in ways where I couldn't recover it. It's also unclear what happens if there's a network partition, or if one side got the state update but the other didn't. The Responses API with saved state is quite a bit harder to use, at least as it's currently exposed.
Obviously, for OpenAI it's great because it allows them to hide more behind-the-scenes state that would otherwise have to be funneled through with every conversation message.
State Sync API
Regardless of whether you're using a completion-style API or the Responses API, the provider always has to inject additional context behind the scenes (prompt templates, role markers, system/tool definitions, sometimes even provider-side tool outputs) that never appears in your visible message list. Different providers handle this hidden context in different ways, and there's no common standard for how it's represented or synchronized. The underlying reality is much simpler than the message-based abstractions make it look: if you run an open-weights model yourself, you can drive it directly with token sequences and design APIs that are far cleaner than the JSON-message interfaces we've standardized around. The complexity gets even worse when you go through intermediaries like OpenRouter or SDKs like the Vercel AI SDK, which try to mask provider-specific differences but can't fully unify the hidden state each provider maintains. In practice, the hardest part of unifying LLM APIs isn't the user-visible messages; it's that each provider manages its own partially hidden state in incompatible ways.
It really comes down to how you pass this hidden state around in one form or another. I understand that from a model provider's perspective, it's nice to be able to hide things from the user. But synchronizing hidden state is tricky, and none of these APIs have been built with that mindset, as far as I can tell. Maybe it's time to start thinking about what a state synchronization API would look like, rather than a message-based API.
The more I work with these agents, the more I feel like I don't actually need a unified message API. The core idea of it being message-based in its current form is itself an abstraction that might not survive the passage of time.
Learn From Local First?
There's a whole ecosystem that has dealt with this kind of mess before: the local-first movement. Those folks spent a decade figuring out how to synchronize distributed state across clients and servers that don't trust each other, drop offline, fork, merge, and heal. Peer-to-peer sync and conflict-free replicated storage engines exist because "shared state but with gaps and divergence" is a hard problem that nobody could solve with naive message passing. Their architectures explicitly separate canonical state, derived state, and transport mechanics - exactly the kind of separation missing from most LLM APIs today.
Some of those ideas map surprisingly well to models: KV caches resemble derived state that could be checkpointed and resumed; prompt history is effectively an append-only log that could be synced incrementally instead of resent wholesale; provider-side invisible context behaves like a replicated document with hidden fields.
At the same time though, if the remote state gets wiped because the remote site doesn't want to hold it for that long, we would want to be in a situation where we can replay it entirely from scratch - which for instance the Responses API today does not allow.
Future Unified APIs
There's been plenty of talk about unifying message-based APIs, especially in the wake of MCP (Model Context Protocol). But if we ever standardize anything, it should start from how these models actually behave, not from the surface conventions we've inherited. A good standard would acknowledge hidden state, synchronization boundaries, replay semantics, and failure modes - because those are real issues. There is always the risk that we rush to formalize the current abstractions and lock in their weaknesses and faults. I don't know what the right abstraction looks like, but I'm increasingly doubtful that the status-quo solutions are the right fit.
Django 6.0 release candidate 1 is now available. It represents the final opportunity for you to try out a mosaic of modern tools and thoughtful design before Django 6.0 is released.
This release, 3.15.0a2, is the second of seven planned alpha releases. Alpha releases are intended to make it easier to test the current state of new features and bug fixes and to test the release process.
Today, "Updates to Django" is presented by Raffaella from Djangonaut Space! 🚀
Last week we had 17 pull requests merged into Django by 9 different contributors - including 2 first-time contributors! Congratulations to Hong Xu and Benedict Etzel for having their first commits merged into Django - welcome on board!
News in Django 6.1:
The admin site login view now redirects authenticated users to the next URL, if available, instead of always redirecting to the admin index page.
Inspectdb now introspects HStoreField when psycopg 3.2+ is installed and django.contrib.postgres is in INSTALLED_APPS.
This is the only annual discount available for lifetime access to three books by Will Vincent: Django for Beginners, Django for APIs, and Django for Professionals.
Learn how to use UUIDv7 today with stable releases of Python 3.14, Django 5.2 and PostgreSQL 18. A step by step guide showing how to generate UUIDv7 in Python, store them in Django models, use PostgreSQL native functions and build time ordered primary keys without writing SQL.
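As a minimal sketch of the Python/Django side, assuming Python 3.14's new uuid.uuid7() and an invented Ticket model (the PostgreSQL-native functions covered in the guide are a separate step):

import uuid
from django.db import models

class Ticket(models.Model):
    # uuid.uuid7() (new in Python 3.14) produces time-ordered UUIDs, which
    # index better than random UUIDv4 values when used as primary keys.
    id = models.UUIDField(primary_key=True, default=uuid.uuid7, editable=False)
    title = models.CharField(max_length=200)
    created_at = models.DateTimeField(auto_now_add=True)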
Modern browsers support native JavaScript modules and CSS features, so Django projects can skip frontend build tools while using ManifestStaticFilesStorage for production.
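For reference, a minimal sketch of the production setting this typically implies on Django 4.2+ (default file storage left unchanged):

# settings.py (sketch): hashed static filenames for cache busting,
# no frontend bundler required.
STORAGES = {
    "default": {
        "BACKEND": "django.core.files.storage.FileSystemStorage",
    },
    "staticfiles": {
        "BACKEND": "django.contrib.staticfiles.storage.ManifestStaticFilesStorage",
    },
}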
Practical overview of POST content types with Django examples showing request parsing and validation for form, multipart, JSON, NDJSON, text, XML, and binary.
Another week with a strong focus on security work. Most of the effort went into preparing and issuing the November security release, along with some follow-up permission and access reviews. CNA tasks and training also continued in the background.
This week we landed a migrations fix that prevents flaky CircularDependencyErrors when squashed replacements are in play. If you haven't tried squashing migrations in a while, check out main and give it another go!
We also fixed an unreleased regression in the urlize template filter. Big thanks to Mehraz Hossain Rumman for testing the beta. (Are you the next tester to report a regression before 6.0 final?)
Intel's new CEO, Lip-Bu Tan, has made listening to customers a top priority, saying at Intel Vision earlier this year: "Please be brutally honest with us. This is what I expect of you this week, and I believe harsh feedback is most valuable."
I'd been in regular meetings with Intel for several years before I joined, and I had been giving them technical direction on various projects, including at times some brutal feedback. When I finally interviewed for a role at Intel, I was told something unexpected: that I had already accomplished so much within Intel that I qualified to be an Intel Fellow candidate. I then had to pass several extra interviews to actually become a Fellow (and was told I may only be the third person in Intel's history to be hired as a Fellow), but what stuck with me was that I had already accomplished so much at a company I'd never worked for.
If you are in regular meetings with a hardware vendor as a customer (or potential customer) you can accomplish a lot by providing firm and tough feedback, particularly with Intel today. This is easier said than done, however.
Now that I've seen it from the other side I realize I could have accomplished more, and you can too. I regret the meetings where I wasn't really able to have my feedback land as the staff weren't really getting it, so I eventually gave up. After the meeting I'd crack jokes with my colleagues about how the product would likely fail. (Come on, at least I tried to tell them!)
Here's what I wish I had done in any hardware vendor meeting:
Prep before meetings: study the agenda items and look up attendees on LinkedIn and note what they do, how many staff they say they manage, etc.
Be aware of intellectual property risks: Don't accept meetings covered by some agreement that involves doing a transfer of intellectual property rights for your feedback (I wrote a post on this); ask your legal team for help.
Make sure feedback is documented in the meeting minutes (e.g., a shared Google doc) and that it isn't watered down. Be firm about what you know and don't know: it's just as important to assert when you haven't formed an opinion yet on some new topic.
Stick to technical criticisms that are constructive (uncompetitive, impractical, poor quality, poor performance, difficult to use, of limited use/useless) instead of trash talk (sucks, dumb, rubbish).
Check minutes include who was present and the date.
Ask how many staff are on projects if they say they don't have the resources to address your feedback (they may not answer if this is considered sensitive) and share industry expectations, for example: "This should only take one engineer one month, and your LinkedIn says you have over 100 staff."
Decline freeloading: If staff ask to be taught technical topics they should already know (likely because they just started a new role), decline; you're the customer, not a free training resource.
Ask "did you Google it?" a lot: Sometimes staff join customer meetings to elevate their own status within the company, and ask questions they could have easily answered with Google or ChatGPT.
Ask for staff/project bans: If particular staff or projects are consistently wasting your time, tell the meeting host (usually the sales rep) to take them off the agenda for at least a year, and don't join (or quit) meetings if they show up. Play bad cop; often no one else will.
Review attendees: From time to time, consider whether you are meeting all the right people, and review the minutes. E.g., if you're meeting Intel and have been talking about a silicon change, have any actual silicon engineers joined the call?
Avoid peer pressure: You may meet with the entire product team who are adamant that they are building something great, and you alone need to tell them it's garbage (using better words). Many times in my life I've been the only person to speak up and say uncomfortable things in meetings, yet I'm not the only person present who could.
Ask for status updates: Be prepared that even if everyone appears grateful and appreciative of your feedback, you may realize six months later that nothing was done with it. Ask for updates and review the prior meeting minutes to see what you asked for and when.
Speak to ELT/CEO: Once a year or so, ask to speak to someone on the executive leadership team (ELT; the leaders on the website) or the CEO. Share brutal feedback, and email them a copy of the meeting minutes showing the timeline of what you have shared and with whom. This may be the only way your feedback ever gets addressed, in particular for major changes. Ask to hear what they have been told about you and be prepared to refute details: your brutal feedback may have been watered down.
I'm now in meetings from the other side where we'd really appreciate brutal feedback, but some customers aren't comfortable doing this, even when prompted. It isn't easy to tell someone their project is doomed, or that their reasons for not doing something are BS. It isn't easy dealing with peer pressure and a room of warm and friendly staff begging you to say something, anything, nice about their terrible product for fear of losing their jobs -- and realizing you must be brutal to their faces, otherwise you're not helping the vendor or your own company. And it's extra effort to check meeting minutes and to push for meetings with the ELT or the CEO. Giving brutal feedback takes brutal effort.
The Drupal Association is excited to announce that our t-shirt design contest will be returning for DrupalCon Chicago!
We want to see the Drupal community's design ideas for the official t-shirt, available for all attendees to wear and enjoy. Do you have a fantastic idea in mind? Let's see your creativity!
The winner will get THEIR design on the front of the official t-shirt for DrupalCon Chicago!
What the judges are looking for
Judges are looking for a combination of creativity, impact, and relevance to the Drupal community. A design that tells a story and aligns with the values and aspirations of DrupalCon attendees is likely to capture attention.
While exploring bold ideas, consider how your design will resonate with a diverse audience. Think of classic elements that make a T-shirt memorable while pushing creative boundaries. Avoid overcomplicating things; sometimes less is more, especially if every element adds value to the message.
Now, for the finer details…
Your design must include the DrupalCon Chicago logo and will only be featured on the front of the t-shirt. Sponsor logos will be added to the t-shirt's sleeves after the design is finalized.
Specs:
PNG or PDF preferred
16 inches tall
Graphics need to be 300 DPI
All designs must be submitted by 21 December 2025 at 23:59 UTC, after which the submission form will close.
The Drupal Association will then select 4 designs to go forward to a public vote.
The top three designs as chosen by the Drupal Association will then be voted upon by the public, with voting open 5 January until 12 January 2026 at 23:59 UTC.*
The winning design will be printed on the front of the official DrupalCon Chicago t-shirt and the winner will receive a complimentary ticket to their choice of either DrupalCon Chicago 2026 or DrupalCon North America 2027.
Simply create your design, then fill out our submission form by 21 December 2025 to submit your final design. We also ask that you include a sentence or two describing why you chose your design and how it represents the Drupal community.
So, what are you waiting for? Submit your design now, and please help us spread the word throughout the Drupal community!
Good luck!
* Dates for public voting are subject to change, but voting will be open for a minimum of 1 week.
** Drupal Association staff and members of the DrupalCon Chicago Steering Committee will not be permitted to enter this contest.
With nearly 14 years at Open Social (via GoalGorilla), engineering lead Ronald te Brake has shifted from writing code to shaping how teams think, collaborate and solve problems. In this interview with TDT's Alka Elizabeth, Ronald explains why he views AI not as a magic bullet, but as a teammate that demands context, guardrails and documentation. He shares how governance, decision records and meaningful standards help developers stay effective in a world where automation and intelligent systems are growing.
They were created by our designer, Zohar Nir-Amitin. Zohar has been working with LPC since 2015, and has created all our wonderful t-shirts, badges and signage designs.
Vibe-coding is all the rage today. Who needs a developer when you can get an AI to develop an application for you? There are scads of application development tools now that promise to create that app you always wanted -- and surprisingly, these often work!
Hello and welcome back, I hope you are well! In this tutorial we will be exploring how to work with comments. I originally didn't think I would add many Twitter-like features, but I realised that having a self-referential model would actually be a useful lesson. In addition to demonstrating how to achieve this, we can look at how to complete a migration successfully.
This will involve adjusting our models, adding a form (and its validator), improving and expanding our controllers, adding the appropriate controller to our app, and tweaking our templates to accommodate the changes.
Note: There is also an improvement to be made in our models code: Mito provides convenience methods to get the id, created-at, and updated-at slots. We will integrate them as we alter our models.
src/models.lisp
When it comes to changes to the post model, it is very important that the :col-type is set to (or :post :null) and that :initform nil is also set. This is because, when you run the migrations, existing rows will not have data for the parent column, so the migration has to provide a default. It should be possible to use (or :post :integer) and set :initform 0 if you wished, but I chose :null and nil as my migration pattern.
This also ensures that new posts default to having no parent, which is the right design choice here.
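For reference, the relevant slot definition, excerpted from the full models.lisp listing below, looks like this:

(deftable post ()
  ((user    :col-type ningle-auth/models:user :initarg :user :accessor user)
   (parent  :col-type (or :post :null) :initarg :parent :reader parent :initform nil)
   (content :col-type (:varchar 140) :initarg :content :accessor content)))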
Comments are really a specialised type of post that happens to have a non-nil parent value, so we will take what we previously learned from working with post objects and extend it. In reality the only real difference is (sxql:where (:= parent :?)); perhaps I shall see if this could support conditionals inside it, but that's an experiment for another day.
I want to briefly remind you of what the :? does, as security is important!
The :? is a placeholder: it ensures that values are not placed into the SQL without being escaped, which prevents SQL injection attacks. retrieve-by-sql takes a keyword argument :binds, a list of values that will be interpolated into the right parts of the SQL query with the correct quoting.
We used this previously, but I want to remind you to not just inject values into a SQL query without quoting them.
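Condensed from the comments method in the full listing below, the pattern looks like this:

;; Each :? is a placeholder; the values given to :binds are escaped and
;; substituted in order, so user-supplied data never gets spliced raw
;; into the SQL string.
(mito:retrieve-by-sql
 (sxql:yield
  (sxql:select (:post.*)
    (sxql:from :post)
    (sxql:where (:= :parent :?))
    (sxql:limit 50)))
 :binds (list (mito:object-id post)))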
I had not originally planned on this, but as I was writing the comments code it became clear that I was creating lots of duplication (and maybe I still am), but I hit upon a way to simplify the model interface, at least. Ideally it makes no difference whether a user is logged in or not at the point the route is hit: the API should be to pass in the user object (whatever that might be, because it may be nil) and let a specialised method figure out what to do. So in addition to adding comments (which is what prompted this change), we will also refactor the separate logged-in-posts and not-logged-in-posts into a single, unified posts method, because it was silly of me to have split them like that.
There is also another small fix in this code, turns out there's a set of convenience methods that mito provides:
(mito:object-id ...)
(mito:created-at ...)
(mito:updated-at ...)
Previously we used mito.dao.mixin::id (and could have done the same for created-at and updated-at) in combination with slot-value, which means (slot-value user 'mito.dao.mixin::id) simply becomes (mito:object-id user), which is much nicer!
(defpackage ningle-tutorial-project/models
  (:use :cl :mito :sxql)
  (:import-from :ningle-auth/models #:user)
  (:export #:post
           #:id
           #:content
           #:comments
           #:likes
           #:user
           #:liked-post-p
           #:posts
           #:parent
           #:toggle-like))

(in-package ningle-tutorial-project/models)

(deftable post ()
  ((user :col-type ningle-auth/models:user :initarg :user :accessor user)
   (parent :col-type (or :post :null) :initarg :parent :reader parent :initform nil)
   (content :col-type (:varchar 140) :initarg :content :accessor content)))

(deftable likes ()
  ((user :col-type ningle-auth/models:user :initarg :user :reader user)
   (post :col-type post :initarg :post :reader post))
  (:unique-keys (user post)))

(defgeneric likes (post)
  (:documentation "Returns the number of likes a post has"))

(defmethod likes ((post post))
  (mito:count-dao 'likes :post post))

(defgeneric comments (post user)
  (:documentation "Gets the comments for a logged in user"))

(defmethod comments ((post post) (user user))
  (mito:retrieve-by-sql
   (sxql:yield
    (sxql:select (:post.* (:as :user.username :username)
                          (:as (:count :likes.id) :like_count)
                          (:as (:count :user_likes.id) :liked_by_user))
      (sxql:from :post)
      (sxql:where (:= :parent :?))
      (sxql:left-join :user :on (:= :post.user_id :user.id))
      (sxql:left-join :likes :on (:= :post.id :likes.post_id))
      (sxql:left-join (:as :likes :user_likes)
                      :on (:and (:= :post.id :user_likes.post_id)
                                (:= :user_likes.user_id :?)))
      (sxql:group-by :post.id)
      (sxql:order-by (:desc :post.created_at))
      (sxql:limit 50)))
   :binds (list (mito:object-id post)
                (mito:object-id user))))

(defmethod comments ((post post) (user null))
  (mito:retrieve-by-sql
   (sxql:yield
    (sxql:select (:post.* (:as :user.username :username)
                          (:as (:count :likes.id) :like_count))
      (sxql:from :post)
      (sxql:where (:= :parent :?))
      (sxql:left-join :user :on (:= :post.user_id :user.id))
      (sxql:left-join :likes :on (:= :post.id :likes.post_id))
      (sxql:group-by :post.id)
      (sxql:order-by (:desc :post.created_at))
      (sxql:limit 50)))
   :binds (list (mito:object-id post))))

(defgeneric toggle-like (user post)
  (:documentation "Toggles the like of a user to a given post"))

(defmethod toggle-like ((ningle-auth/models:user user) (post post))
  (let ((liked-post (liked-post-p user post)))
    (if liked-post
        (mito:delete-dao liked-post)
        (mito:create-dao 'likes :post post :user user))
    (not liked-post)))

(defgeneric liked-post-p (user post)
  (:documentation "Returns true if a user likes a given post"))

(defmethod liked-post-p ((ningle-auth/models:user user) (post post))
  (mito:find-dao 'likes :user user :post post))

(defgeneric posts (user)
  (:documentation "Gets the posts"))

(defmethod posts ((user user))
  (mito:retrieve-by-sql
   (sxql:yield
    (sxql:select (:post.* (:as :user.username :username)
                          (:as (:count :likes.id) :like_count)
                          (:as (:count :user_likes.id) :liked_by_user))
      (sxql:from :post)
      (sxql:left-join :user :on (:= :post.user_id :user.id))
      (sxql:left-join :likes :on (:= :post.id :likes.post_id))
      (sxql:left-join (:as :likes :user_likes)
                      :on (:and (:= :post.id :user_likes.post_id)
                                (:= :user_likes.user_id :?)))
      (sxql:group-by :post.id)
      (sxql:order-by (:desc :post.created_at))
      (sxql:limit 50)))
   :binds (list (mito:object-id user))))

(defmethod posts ((user null))
  (mito:retrieve-by-sql
   (sxql:yield
    (sxql:select (:post.* (:as :user.username :username)
                          (:as (:count :likes.id) :like_count))
      (sxql:from :post)
      (sxql:left-join :user :on (:= :post.user_id :user.id))
      (sxql:left-join :likes :on (:= :post.id :likes.post_id))
      (sxql:group-by :post.id)
      (sxql:order-by (:desc :post.created_at))
      (sxql:limit 50)))))
src/forms.lisp
All we have to do here is define our form and validators and ensure they are exported, not really a lot of work!
In our *post-parent-validator* we validate that the content of the parent field is not blank (as a comment needs a reference to a parent), and we use a custom validator built with clavier:fn, passing a lambda to verify the item is a positive integer.
We then create our comment form, which is very similar to our existing post form, except that it points to a different HTTP endpoint (/post/comment rather than just /post) and has a hidden parent field, which we set to 0 by default. That means the form starts out invalid, but that's fine: we can't possibly know the parent id until the form is rendered, and we set the parent id value at the point we render the form, so it really is nothing to worry about.
Here we begin by checking that the post exists. If someone sent a request to our server without a valid post, an error might be thrown and no response would be sent at all, which is not good, so we use unless as our "if not" check and return the standard HTTP code for not found, the good old 404!
If however there is no error (a post matching the id exists), we can continue: we build up the hash table with the "post", "liked", and "likes" properties of a post. Remember these are not direct properties of the post model but are calculated from information in other tables, especially toggle-like (it is very important to call toggle-like first, as it changes the database state that the subsequent likes call depends on). toggle-like returns the toggled status: if a user clicks once it will like the post, and if they click again it will "unlike" the post.
Now, with our single post, we display a lot more information (comments, likes, our new comment form, etc.), so we have to build up a more comprehensive single-post controller.
Where previously we just rendered the template, we now do a lot more! We can get the likes, comments, etc., which is a massive step up in functionality.
The next function to look at is post-content. Thankfully there isn't too much to change here: all we need to do is ensure we pass through the parent (which will be nil).
We have seen this pattern before, but with some minor differences in which form to load (comment instead of post), and setting the parent from the value injected into the form at the point the form is rendered.
(defpackage ningle-tutorial-project/controllers
  (:use :cl :sxql)
  (:import-from :ningle-tutorial-project/forms #:post #:content #:parent #:comment)
  (:export #:index
           #:post-likes
           #:single-post
           #:post-content
           #:post-comment
           #:logged-in-profile
           #:unauthorized-profile
           #:people
           #:person))

(in-package ningle-tutorial-project/controllers)

(defun index (params)
  (let* ((user (gethash :user ningle:*session*))
         (posts (ningle-tutorial-project/models:posts user)))
    (djula:render-template* "main/index.html" nil
                            :title "Home"
                            :user user
                            :posts posts
                            :form (if user (cl-forms:find-form 'post) nil))))

(defun post-likes (params)
  (let* ((user (gethash :user ningle:*session*))
         (post (mito:find-dao 'ningle-tutorial-project/models:post
                              :id (parse-integer (ingle:get-param :id params))))
         (res (make-hash-table :test 'equal)))
    ;; Bail out if post does not exist
    (unless post
      (setf (getf (lack.response:response-headers ningle:*response*) :content-type) "application/json")
      (setf (gethash "error" res) "post not found")
      (setf (lack.response:response-status ningle:*response*) 404)
      (return-from post-likes (com.inuoe.jzon:stringify res)))
    ;; success, continue
    (setf (gethash "post" res) (mito:object-id post))
    (setf (gethash "liked" res) (ningle-tutorial-project/models:toggle-like user post))
    (setf (gethash "likes" res) (ningle-tutorial-project/models:likes post))
    (setf (getf (lack.response:response-headers ningle:*response*) :content-type) "application/json")
    (setf (lack.response:response-status ningle:*response*) 201)
    (com.inuoe.jzon:stringify res)))

(defun single-post (params)
  (handler-case
      (let ((post (mito:find-dao 'ningle-tutorial-project/models:post
                                 :id (parse-integer (ingle:get-param :id params))))
            (form (cl-forms:find-form 'comment)))
        (cl-forms:set-field-value form 'ningle-tutorial-project/forms:parent (mito:object-id post))
        (djula:render-template* "main/post.html" nil
                                :title "Post"
                                :post post
                                :comments (ningle-tutorial-project/models:comments post (gethash :user ningle:*session*))
                                :likes (ningle-tutorial-project/models:likes post)
                                :form form
                                :user (gethash :user ningle:*session*)))
    (parse-error (err)
      (setf (lack.response:response-status ningle:*response*) 404)
      (djula:render-template* "error.html" nil :title "Error" :error err))))

(defun post-content (params)
  (let ((user (gethash :user ningle:*session*))
        (form (cl-forms:find-form 'post)))
    (handler-case
        (progn
          (cl-forms:handle-request form) ; Can throw an error if CSRF fails
          (multiple-value-bind (valid errors)
              (cl-forms:validate-form form)
            (when errors
              (format t "Errors: ~A~%" errors))
            (when valid
              (cl-forms:with-form-field-values (content) form
                (mito:create-dao 'ningle-tutorial-project/models:post :content content :user user :parent nil)
                (ingle:redirect "/")))))
      (simple-error (err)
        (setf (lack.response:response-status ningle:*response*) 403)
        (djula:render-template* "error.html" nil :title "Error" :error err)))))

(defun post-comment (params)
  (let ((user (gethash :user ningle:*session*))
        (form (cl-forms:find-form 'comment)))
    (handler-case
        (progn
          (cl-forms:handle-request form) ; Can throw an error if CSRF fails
          (multiple-value-bind (valid errors)
              (cl-forms:validate-form form)
            (when errors
              (format t "Errors: ~A~%" errors))
            (when valid
              (cl-forms:with-form-field-values (content parent) form
                (mito:create-dao 'ningle-tutorial-project/models:post :content content :user user :parent (parse-integer parent))
                (ingle:redirect "/")))))
      (simple-error (err)
        (setf (lack.response:response-status ningle:*response*) 403)
        (djula:render-template* "error.html" nil :title "Error" :error err)))))

(defun logged-in-profile (params)
  (let ((user (gethash :user ningle:*session*)))
    (djula:render-template* "main/profile.html" nil :title "Profile" :user user)))

(defun unauthorized-profile (params)
  (setf (lack.response:response-status ningle:*response*) 403)
  (djula:render-template* "error.html" nil :title "Error" :error "Unauthorized"))

(defun people (params)
  (let ((users (mito:retrieve-dao 'ningle-auth/models:user)))
    (djula:render-template* "main/people.html" nil :title "People" :users users :user (cu-sith:logged-in-p))))

(defun person (params)
  (let* ((username-or-email (ingle:get-param :person params))
         (person (first (mito:select-dao 'ningle-auth/models:user
                          (where (:or (:= :username username-or-email)
                                      (:= :email username-or-email)))))))
    (djula:render-template* "main/person.html" nil :title "Person" :person person :user (cu-sith:logged-in-p))))
src/main.lisp
The change to our main.lisp file is a single line that connects our controller to the urls we have declared we are using.
There are some small changes needed in the index.html file; they're largely just optimisations. The first is changing a boolean around likes to an integer. This gets into the weeds of JavaScript types, and ensuring things were of the Number type in JS just made things easier. Some of the previous code even treated booleans as strings, which was pretty bad; I don't write JS in any real capacity, so I often make mistakes with it, because it so very often appears to work instead of just throwing an error.
~ Lines 28 - 30
data-logged-in="true"
- data-liked="false"
+ data-liked="0"
aria-label="Like post ">
The changes to this file (the post template, main/post.html) are so substantial that it might as well be brand new, so in the interests of clarity I will simply show the file in full.
{% extends "base.html" %}
{% block content %}
<divclass="container"><divclass="row"><divclass="col-12"><divclass="card post mb-3"data-href="/post/{{ post.id }}"><divclass="card-body"><h5class="card-title mb-2">{{ post.content }}</h5><pclass="card-subtitle text-muted mb-0">@{{ post.user.username }}</p></div><divclass="card-footer d-flex justify-content-between align-items-center"><buttontype="button"class="btn btn-sm btn-outline-primary like-button"data-post-id="{{ post.id }}"data-logged-in="{% if user.username != ""%}true{%else%}false{%endif%}"data-liked="{% if post.liked-by-user == 1 %}1{% else %}0{% endif %}"aria-label="Like post {{ post.id }}">
{% if post.liked-by-user == 1 %}
<iclass="bi bi-hand-thumbs-up-fill text-primary"aria-hidden="true"></i>
{% else %}
<iclass="bi bi-hand-thumbs-up text-muted"aria-hidden="true"></i>
{% endif %}
<spanclass="ms-1 like-count">{{ likes }}</span></button><smallclass="text-muted">Posted on: {{ post.created-at }}</small></div></div></div></div><!-- Post form -->
{% if user %}
<divclass="row mb-4"><divclass="col">
{% if form %}
{% form form %}
{% endif %}
</div></div>
{% endif %}
{% if comments %}
<divclass="row mb-4"><divclass="col-12"><h2>Comments</h2></div></div>
{% endif %}
{% for comment in comments %}
<divclass="row mb-4"><divclass="col-12"><divclass="card post mb-3"data-href="/post/{{ comment.id }}"><divclass="card-body"><h5class="card-title mb-2">{{ comment.content }}</h5><pclass="card-subtitle text-muted mb-0">@{{ comment.username }}</p></div><divclass="card-footer d-flex justify-content-between align-items-center"><buttontype="button"class="btn btn-sm btn-outline-primary like-button"data-post-id="{{ comment.id }}"data-logged-in="{% if user.username != ""%}true{%else%}false{%endif%}"data-liked="{% if comment.liked-by-user == 1 %}1{% else %}0{% endif %}"aria-label="Like post {{ comment.id }}">
{% if comment.liked-by-user == 1 %}
<iclass="bi bi-hand-thumbs-up-fill text-primary"aria-hidden="true"></i>
{% else %}
<iclass="bi bi-hand-thumbs-up text-muted"aria-hidden="true"></i>
{% endif %}
<spanclass="ms-1 like-count">{{ comment.like-count }}</span></button><smallclass="text-muted">Posted on: {{ comment.created-at }}</small></div></div></div></div>
{% endfor %}
</div>
{% endblock %}
{% block js %}
document.querySelectorAll(".like-button").forEach(btn => {
btn.addEventListener("click", function (e) {
e.stopPropagation();
e.preventDefault();
// Check login
if (btn.dataset.loggedIn !== "true") {
alert("You must be logged in to like posts.");
return;
}
const postId = btn.dataset.postId;
const countSpan = btn.querySelector(".like-count");
const icon = btn.querySelector("i");
const liked = Number(btn.dataset.liked) === 1;
const previous = parseInt(countSpan.textContent, 10) || 0;
const url = `/post/${postId}/likes`;
// Optimistic UI toggle
countSpan.textContent = liked ? previous - 1 : previous + 1;
btn.dataset.liked = liked ? 0 : 1;
// Toggle icon classes optimistically
if (liked) {
// Currently liked, so unlike it
icon.className = "bi bi-hand-thumbs-up text-muted";
} else {
// Currently not liked, so like it
icon.className = "bi bi-hand-thumbs-up-fill text-primary";
}
const csrfTokenMeta = document.querySelector('meta[name="csrf-token"]');
const headers = { "Content-Type": "application/json" };
if (csrfTokenMeta) headers["X-CSRF-Token"] = csrfTokenMeta.getAttribute("content");
fetch(url, {
method: "POST",
headers: headers,
body: JSON.stringify({ toggle: true })
})
.then(resp => {
if (!resp.ok) {
// Revert optimistic changes on error
countSpan.textContent = previous;
btn.dataset.liked = liked ? 1 : 0;
icon.className = liked ? "bi bi-hand-thumbs-up-fill text-primary" : "bi bi-hand-thumbs-up text-muted";
throw new Error("Network response was not ok");
}
return resp.json();
})
.then(data => {
if (data && typeof data.likes !== "undefined") {
countSpan.textContent = data.likes;
btn.dataset.liked = data.liked ? 1 : 0;
icon.className = data.liked ? "bi bi-hand-thumbs-up-fill text-primary" : "bi bi-hand-thumbs-up text-muted";
}
})
.catch(err => {
console.error("Like failed:", err);
// Revert optimistic changes on error
countSpan.textContent = previous;
btn.dataset.liked = liked ? 1 : 0;
icon.className = liked ? "bi bi-hand-thumbs-up-fill text-primary" : "bi bi-hand-thumbs-up text-muted";
});
});
});
document.querySelectorAll(".card.post").forEach(card => {
card.addEventListener("click", function () {
const href = card.dataset.href;
if (href) {
window.location.href = href;
}
});
});
{% endblock %}
Conclusion
Learning Outcomes
Understand how to model a self-referential post table in Mito (using a nullable parent column) and why (or :post :null)/:initform nil are important for safe migrations and representing "top-level" posts versus comments.
Apply Mito, SXQL, and cl-forms to implement a comment system end-to-end: defining comments/posts generics, adding validators (including a custom clavier:fn), wiring controllers and routes, and rendering comments and like-buttons in templates.
Analyse and reduce duplication in the models/controllers layer by consolidating separate code paths (logged-in vs anonymous) into generic functions specialised on user/null, and by examining how SQL joins and binds shape the returned data.
Evaluate different design and safety choices in the implementation (nullable vs sentinel parents, optimistic UI vs server truth, HTTP status codes, SQL placeholders, CSRF and login checks) and judge which approaches are more robust and maintainable.
Github
The link for this tutorial's code is available here.
Common Lisp HyperSpec
defpackage (Macro, CLHS): Define project packages like ningle-tutorial-project/models, /forms, /controllers, and the main system package.
Here are some Django-related deals for this year's Black Friday (28th November) and Cyber Monday (1st December), including my own.
I'll keep this post up to date with any new deals I learn about. If you are also a creator, email me with details of your offer and I'll add it here.
My books
My four books have a 50% discount, for both individual and team licenses, until the end of Cyber Monday (1st December), including Boost Your GitHub DX, which I released last week. This deal stacks with the bundle offers and purchasing power parity discount for those in lower-income countries.
Aidas Bendoraitis of djangotricks.com has created three paid packages. Use the links below for a 20% discount, available until the end of the 1st December.
Django GDPR Cookie Consent - a customizable, self-hosted cookie consent screen. This package takes the pain out of setting up legally-mandated cookie banners and settings, without using an expensive or inflexible vendor.
Django Paddle Subscriptions - an integration with Paddle's billing API for subscriptions. This package allows you to collect SaaS payments internationally with a reliable payment processor.
Django Messaging - ready-to-use, real-time solution for Django that saves months of development. It offers private and group chats, embeddable widgets, flexible settings, a modern UI, and supports both WebSocket and polling.
This book is a tour through the advanced topic of asynchronous programming in Django. It covers the range of tools and protocols available for asynchronous behaviour in your web application. It's written by Paul Bailey, an experienced Principal Engineer.
Paul is offering ~50% off the book with the coupon link, discounting the book from $39.95 to $21. This is available until the 1st December.
Cory Zue's SaaS Pegasus is a configurable Django project template with many preset niceties, including teams, Stripe subscriptions, a front end pipeline, and built-in AI tooling. It can massively accelerate setting up a SaaS in Django.
The "unlimited license" is discounted 50%, from $999 to $499.50. This deal is available from the 21st November until the 3rd December.
Django is maintained by the Django Software Foundation (DSF), a non-profit organization that relies on donations to fund its work. So while it cannot run sales, supporting it is definitely a good deal!
I'm used to running pre-commit autoupdate regularly to update the versions of the linters/formatters that I use. Especially when there's some error.
For example, a couple of months ago there was a problem with ansible-lint. There are separate ansible-lint, ansible, and ansible-core packages, and one of them needed an upgrade. I'd get an error like this:
ModuleNotFoundError: No module named 'ansible.parsing.yaml.constructor'
The solution: pre-commit autoupdate, which grabbed a new ansible-lint version that solved the problem. Upgrading is good.
But... a little over a month ago, ansible-lint pinned Python to 3.13 in its pre-commit hook. So when you update, you suddenly need to have 3.13 on your machine. I have that locally, but on the often-used "ubuntu-latest" (24.04) GitHub Actions runner, only 3.12 is installed by default. Then you'd get this:
[INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/astral-sh/ruff-pre-commit.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/ansible-community/ansible-lint.git.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
An unexpected error has occurred: CalledProcessError: command:
('/opt/hostedtoolcache/Python/3.12.12/x64/bin/python', '-mvirtualenv',
'/home/runner/.cache/pre-commit/repomm4m0yuo/py_env-python3.13', '-p', 'python3.13')
return code: 1
stdout:
RuntimeError: failed to find interpreter for Builtin discover of python_spec='python3.13'
stderr: (none)
Check the log at /home/runner/.cache/pre-commit/pre-commit.log
Error: Process completed with exit code 3.
Ansible-lint's pre-commit hook needs 3.10+ or so, but won't accept anything except 3.13. Here's the change: https://github.com/ansible/ansible-lint/pull/4796 (including some comments that it is not ideal, including the github action problem).
The change apparently gives a good error message to people running too-old python versions, but it punishes those that do regular updates (and have perfectly fine non-3.13 python versions). A similar pin was done in "black" and later reverted (see the comments on this issue) as it caused too many problems.
Note: this comment gives some of the reasons for hardcoding 3.13. Pre-commit itself doesn't have a way to specify a minimum Python version. Apparently old Python versions can lead to weird install errors, though I haven't found a good ticket about that in the issue tracker. The number of issues in the tracker is impressively high, so I can imagine such a hardcoded version helping a bit.
Now on to the "fix". Override the language_version like this:
- repo: https://github.com/ansible-community/ansible-lint.git
  hooks:
    - id: ansible-lint
      language_version: python3  # or python3.12 or so
If you use ansible-lint a lot (like I do), you'll have to add that line to all your (django) project repositories when you update your pre-commit config...
In a three-article series published recently on this site (Part 1, Part 2, Part 3), I've been demonstrating the power of the AWS Cloud Development Kit (CDK) in the Infrastructure as Code (IaC) area, especially when coupled with the ubiquitous Java and its supersonic/subatomic cloud-native stack: Quarkus.
While focusing on the CDK fundamentals in Java, like Stack and Construct, together with their Quarkus implementations, that series was a bit frugal as far as the infrastructure elements were concerned. Indeed, for the sake of clarity and simplification, the infrastructure used to illustrate how to use the CDK with Java and Quarkus was deliberately kept simple and conventional. Hence the idea for a new series, of which this article is the first: one less concerned with CDK internals and more dedicated to the infrastructure itself.
Troubleshooting memory problems, such as memory leaks and OutOfMemoryError, can be an intimidating task even for experienced engineers. In this post, we would like to share simple tips, tools, and tricks so that even a novice engineer can isolate memory problems and resolve them quickly.
What Are Common Signs of a Java Memory Leak That Might Lead to OutOfMemoryError?
Before your application throws an OutOfMemoryError, it usually gives you a few warning signs. If you catch them early, you can prevent downtime and customer impact. Here's what you should keep an eye on:
I am just really bored by Lisp Machine romantics at this point: they should go away. I expect they never will.
History
Symbolics went bankrupt in early 1993. In the way of these things, various remnants of the company lingered on, in this case for decades. But 1993 was when the Lisp Machines died.
The death was not unexpected: by the time I started using mainstream Lisps in 1989[1], everyone knew that special hardware for Lisp was a dead idea. The common idea was that the arrival of RISC machines had killed it, but in fact machines like the Sun 3/260 in its 'AI' configuration[2] were already hammering nails into its coffin. In 1987 I read a report showing the Lisp performance of an early RISC machine, using Kyoto Common Lisp (not a famously fast implementation of CL), beating a Symbolics on the Gabriel benchmarks [PDF link].
1993 is 32 years ago. The Symbolics 3600, probably the first Lisp machine that sold in more than tiny numbers, was introduced in 1983, ten years earlier. People who used Lisp machines other than as historical artefacts are old today[3].
Lisp machines were both widely available and offered the best performance for Lisp for a period of about five years which ended nearly forty years ago. They were probably never competitive in terms of performance for the money.
It is time, and long past time, to let them go.
But still the romantics - some of them even old enough to remember the Lisp machines - repeat their myths.
'It was the development environment'
No, it wasn't.
The development environments offered by both families of Lisp machines were seriously cool, at least for the 1980s. I mean, they really were very cool indeed. Some of the ways they were cool matter today, but some don't. For instance, in the 1980s and early 1990s Lisp images were very large compared to available memory, and machines were also extremely slow in general. So good Lisp development environments did a lot of work to hide this slowness and, in general, to make sure you only very seldom had to restart everything, which took significant fractions of an hour, if not more. None of that matters today, because machines are so quick and Lisps so relatively small.
But that's not the only way they were cool. They really were just lovely things to use in many ways. But, despite what people might believe, this did not depend on the hardware: there is no reason at all why a development environment that cool could not be built on stock hardware. Perhaps (perhaps) that was not true in 1990: it is certainly true today.
So if a really cool Lisp development environment doesn't exist today, it is nothing to do with Lisp machines not existing. In fact, as someone who used Lisp machines, I find the LispWorks development environment at least as comfortable and productive as they were. But, oh no, the full-fat version is not free, and no version is open source. Neither, I remind you, were they.
Please, stop telling me things about machines I used: believe it or not, I know those things.
'They were user-microcodable'

Many machines were user-microcodable before about 1990. That meant that, technically, a user of the machine could implement their own instruction set. I am sure there are cases where people even did that, and a much smaller number of cases where doing that was not just a waste of time.
But in almost all cases the only people who wrote microcode were the people who built the machine. And the reason they wrote microcode was because it is the easiest way of implementing a very complex instruction set, especially when you can't use vast numbers of transistors. For instance if you're going to provide an 'add' instruction which will add numbers of any type, trapping back into user code for some cases, then by far the easiest way of doing that is going to be by writing code, not building hardware. And that's what the Lisp machines did.
Of course, the compiler could have generated that code for hardware without that instruction. But with the special instruction the compiler's job is much easier, and code is smaller. A small, quick compiler and small compiled code were very important with slow machines which had tiny amounts of memory. Of course a compiler not made of wet string could have used type information to avoid generating the full dispatch case, but wet string was all that was available.
What microcodable machines almost never meant was that users of the machines would write microcode.
At the time, the tradeoffs made by Lisp machines might even have been reasonable. CISC machines in general were probably good compromises given the expense of memory and how rudimentary compilers were: I can remember being horrified at the size of compiled code for RISC machines. But I was horrified because I wasn't thinking about it properly. Moore's law was very much in effect in about 1990 and, among other things, it meant that the amount of memory you could afford was rising exponentially with time: the RISC people understood that.
'They were Lisp all the way down'
This, finally, maybe, is a good point. They were, and you could dig around and change things on the fly, and this was pretty cool. Sometimes you could even replicate the things you'd done later. I remember playing with sound on a 3645 which was really only possible because you could get low-level access to the disk from Lisp, as the disk could just marginally provide data fast enough to stream sound.
On the other hand they had no isolation and thus no security at all: people didn't care about that in 1985, but if I was using a Lisp-based machine today I would certainly be unhappy if my web browser could modify my device drivers on the fly, or poke and peek at network buffers. A machine that was Lisp all the way down today would need to ensure that things like that couldn't happen.
So maybe it would be Lisp all the way down, but you absolutely would not have the kind of ability to poke around in and redefine parts of the guts that you had on Lisp machines. Maybe that's still worth it.
Not to mention that I'm just not very interested in spending a huge amount of time grovelling around in the guts of something like an SSL implementation: those things exist already, and I'd rather do something new and cool. I'd rather do something that Lisp is uniquely suited for, not reinvent wheels. Well, maybe that's just me.
Machines which were Lisp all the way down might, indeed, be interesting, although they could not look like 1980s Lisp machines if they were to be safe. But that does not mean they would need special hardware for Lisp: they wouldn't. If you want something like this, hardware is not holding you back: there's no need to endlessly mourn the lost age of Lisp machines, you can start making one now. Shut up and code.
And now we come to the really strange arguments, the arguments that we need special Lisp machines either for reasons which turn out to be straightforwardly false, or because we need something that Lisp machines never were.
'Good Lisp compilers are too hard to write for stock hardware'
This mantra is getting old.
The most important thing is that we have good stock-hardware Lisp compilers today. As an example, today's CL compilers are not far from Clang/LLVM for floating-point code. I tested SBCL and LispWorks: it would be interesting to know how many times more work has gone into LLVM than into them for such a relatively small improvement. I can't imagine a world where these two CL compilers would not be at least comparable to LLVM if similar effort were spent on them[4].
These things are so much better than the wet-cardboard-and-string compilers the LispMs had that it's not funny. In particular, if some mythical 'dedicated Lisp hardware' made it possible to write a Lisp compiler which generated significantly faster code, then code from Lisp compilers would comprehensively outperform C and Fortran compilers: does that seem plausible? I thought not.
A large amount of work is also going into compilation for other dynamically-typed, interactive languages which aim at high performance. That means on-the-fly compilation and recompilation of code where both the compilation and the resulting code must be quick. Example: Julia. Any of that development could be reused by Lisp compiler writers if they needed to or wanted to (I don't know if they do, or should).
Ah, but then it turns out that that's not what is meant by a 'good compiler' after all. It turns out that 'good' means 'compilation is fast'.
All these compilers are pretty quick: the computational resources used by even a pretty hairy compiler have not scaled anything like as fast as those needed for the problems we want to solve (that's why Julia can use LLVM on the fly). Compilation is also not an Amdahl bottleneck as it can happen on the node that needs the compiled code.
Compilers are so quick that a widely-used CL implementation exists where EVAL uses the compiler, unless you ask it not to.
Compilation options are also a thing: you can ask compilers to be quick, fussy, sloppy, safe, produce fast code and so on. Some radically modern languages also allow this to be done in a standardised (but extensible) way at the language level, so you can say 'make this inner loop really quick, and I have checked all the bounds so don't bother with that'.
The tradeoff between a fast Lisp compiler and a really good Lisp compiler is imaginary, at this point.
'They had wonderful keyboards'
Well, if you didn't mind the weird layouts: yes, they did[5]. And that has exactly nothing to do with Lisp.
And so it goes on.
Bored now
There's a well-known syndrome amongst photographers and musicians called GAS: gear acquisition syndrome. Sufferers from this[6] pursue an endless stream of purchases of gear - cameras, guitars, FX pedals, the last long-expired batch of a legendary printing paper - in the strange hope that the next camera, the next pedal, that paper, will bring out the Don McCullin, Jimmy Page or Chris Killip in them. Because, of course, Don McCullin and Chris Killip only took the pictures they did because they had the right cameras: it was nothing to do with talent, practice or courage, no.
GAS is a lie we tell ourselves to avoid the awkward reality that what we actually need to do is practice, a lot, and that even if we did that we might not actually be very talented.
Lisp machine romanticism is the same thing: a wall we build ourselves so that, somehow unable to climb over it or knock it down, we never have to face the fact that the only thing stopping us is us.
There is no purpose to arguing with Lisp machine romantics because they will never accept that the person building the endless barriers in their way is the same person they see in the mirror every morning. They're too busy building the walls.
As a footnote, I went to a talk by an HPC person in the early 90s (so: after the end of the cold war7 and when the HPC money had gone) where they said that HPC people needed to be aiming at machines based on what big commercial systems looked like as nobody was going to fund dedicated HPC designs any more. At the time that meant big cache-coherent SMP systems. Those hit their limits and have really died out now: the bank I worked for had dozens of fully-populated big SMP systems in 2007, it perhaps still has one or two they can't get rid of because of some legacy application. So HPC people now run on enormous shared-nothing farms of close-to-commodity processors with very fat interconnect and are wondering about / using GPUs. That's similar to what happened to Lisp systems, of course: perhaps, in the HPC world, there are romantics who mourn the lost glories of the Cray-3. Well, if I was giving a talk to people interested in the possibilities of hardware today I'd be saying that in a few years there are going to be a lot of huge farms of GPUs going very cheap if you can afford the power. People could be looking at whether those can be used for anything more interesting than the huge neural networks they were designed for. I don't know if they can.
Before that I had read about Common Lisp but actually written programs in Cambridge Lisp and Standard Lisp. ↩
This had a lot of memory and a higher-resolution screen, I think, and probably was bundled with a rebadged Lucid Common Lisp. ↩
I am at the younger end of people who used these machines in anger: I was not there for the early part of the history described here, and I was also not in the right part of the world at a time when that mattered more. But I wrote Lisp from about 1985 and used Lisp machines of both families from 1989 until the mid to late 1990s. I know from first-hand experience what these machines were like. ↩
If anyone has good knowledge of Arm64 (specifically Apple M1) assembler and performance, and the patience to pore over a couple of assembler listings and work out performance differences, please get in touch. I have written most of a document exploring the difference in performance, but I lost the will to live at the point where it came down to understanding just what details made the LLVM code faster. All the compilers seem to do a good job of the actual float code, but perhaps things like array access or loop overhead are a little slower in Lisp. The difference between SBCL & LLVM is a factor of under 1.2. ↩
The Sun type 3 keyboard was both wonderful and did not have a weird layout, so there's that. ↩
The real performance of any computer hardware in production is the result of the hardware, software, and tuning; the investment and sequence of these efforts can be pictured as a three-stage rocket:
I recently presented this embarrassingly simple diagram to Intel's executive leadership, and at the time realized the value of sharing it publicly. The Internet is awash with comparisons about Intel (and other vendors') product performance based on hardware performance alone, but the performance of software and then tuning can make a huge difference for your particular workload. You need all three stages to reach the highest, and most competitive, performance.
It's obvious why this is important for HW vendors to understand internally - they, like the Internet, can get overly focused on HW alone. But customers need to understand it as well. If a benchmark is comparing TensorFlow performance between HW vendors, was the Intel hardware tested using the Intel Extension for TensorFlow Software, and was it then tuned? The most accurate and realistic evaluation for HW involves selecting the best software and then tuning it, and doing this for all HW options.
I spend a lot of time on the final stage, tuning - what I call third-stage engineering. It's composed of roughly four parts: People, training, tools, and capabilities. You need staff, you need them trained to understand performance methodologies and SW and HW internals, they need tools to analyze the system (both observational and experimental), and finally they need capabilities to tune (tunable parameters, settings, config, code changes, etc.).
I see too many HW evaluations that are trying to understand customer performance but are considering HW alone, which is like only testing the first stage of a rocket. This doesn't help vendors or customers. I hope that's what my simple diagram makes obvious: We need all three stages to reach the highest altitude.
In Spring, the ApplicationContext is the central container object that manages all beans (i.e., components, services, repositories, etc.).
Its tasks include reading the configuration (Java Config, XML, annotations), creating and managing bean instances, handling dependency injection, and running the application lifecycle.
You're working on an application. Let's call it "FooApp". FooApp has a dependency on an open source library, let's call it "LibBar". You find a bug in LibBar that affects FooApp.
To envisage the best possible version of this scenario, let's say you actively like LibBar, both technically and socially. You've contributed to it in the past. But this bug is causing production issues in FooApp today, and LibBar's release schedule is quarterly. FooApp is your job; LibBar is (at best) your hobby. Blocking on the full upstream contribution cycle and waiting for a release is an absolute non-starter.
What do you do?
There are a few common reactions to this type of scenario, all of which are bad options.
I will enumerate them specifically here, because I suspect that some of them may resonate with many readers:
Find an alternative to LibBar, and switch to it.
This is a bad idea because a transition to a core infrastructure component could be extremely expensive.
Vendor LibBar into your codebase and fix your vendored version.
This is a bad idea because carrying this one fix now requires you to maintain all the tooling associated with a monorepo1: you have to be able to start pulling in new versions from LibBar regularly, reconcile your changes even though you now have a separate version history on your imported version, and so on.
This is a bad idea because you are now extremely tightly coupled to a specific version of LibBar. By modifying LibBar internally like this, you're inherently violating its compatibility contract, in a way which is going to be extremely difficult to test. You can test this change, of course, but as LibBar changes, you will need to replicate any relevant portions of its test suite (which may be its entire test suite) in FooApp. Lots of potential duplication of effort there.
Implement a workaround in your own code, rather than fixing it.
This is a bad idea because you are distorting the responsibility for correct behavior. LibBar is supposed to do LibBar's job, and unless you have a full wrapper for it in your own codebase, other engineers (including "yourself, personally") might later forget to go through the alternate, workaround codepath, and invoke the buggy LibBar behavior again in some new place.
Implement the fix upstream in LibBar anyway, because that's the Right Thing To Do, and burn credibility with management while you anxiously wait for a release with the bug in production.
This is a bad idea because you are betraying your users - by allowing the buggy behavior to persist - for the workflow convenience of your dependency providers. Your users are probably giving you money, and trusting you with their data. This means you have both ethical and economic obligations to consider their interests.
As much as it's nice to participate in the open source community and take on an appropriate level of burden to maintain the commons, this cannot sustainably be at the explicit expense of the population you serve directly.
Even if we only care about the open source maintainers here, there's still a problem: as you are likely to come under immediate pressure to ship your changes, you will inevitably relay at least a bit of that stress to the maintainers. Even if you try to be exceedingly polite, the maintainers will know that you are coming under fire for not having shipped the fix yet, and are likely to feel an even greater burden of obligation to ship your code fast.
Much as it's good to contribute the fix, it's not great to put this on the maintainers.
The respective incentive structures of software development - specifically, of corporate application development and open source infrastructure development - make options 1-4 very common.
On the corporate / application side, these issues are:
it's difficult for corporate developers to get clearance to spend even small amounts of their work hours on upstream open source projects, but clearance to spend time on the project they actually work on is implicit. If it takes 3 hours of wrangling with Legal2 and 3 hours of implementation work to fix the issue in LibBar, but 0 hours of wrangling with Legal and 40 hours of implementation work in FooApp, a FooApp developer will often perceive it as "easier" to fix the issue downstream.
it's difficult for corporate developers to get clearance from management to spend even small amounts of money sponsoring upstream reviewers, so even if they can find the time to contribute the fix, chances are high that it will remain stuck in review unless they are personally well-integrated members of the LibBar development team already.
even assuming there's zero pressure whatsoever to avoid open sourcing the upstream changes, there's still the fact inherent to any development team that FooApp's developers will be more familiar with FooApp's codebase and development processes than they are with LibBar's. It's just easier to work there, even if all other things are equal.
systems for tracking risk from open source dependencies often lack visibility into vendoring, particularly if you're doing a hybrid approach and only vendoring a few things to address work in progress, rather than a comprehensive and disciplined approach to a monorepo. If you fully absorb a vendored dependency and then modify it, Dependabot isn't going to tell you that a new version is available any more, because it won't be present in your dependency list. Organizationally this is bad of course but from the perspective of an individual developer this manifests mostly as fewer annoying emails.
But there are problems on the open source side as well. Those problems are all derived from one big issue: because we're often working with relatively small sums of money, it's hard for upstream open source developers to consume either money or patches from application developers. It's nice to say that you should contribute money to your dependencies, and you absolutely should, but the cost-benefit function is discontinuous. Before a project reaches the fiscal threshold where it can be at least one person's full-time job to worry about this stuff, there's often no-one responsible in the first place. Developers will therefore gravitate to the issues that are either fun, or relevant to their own job.
These mutually-reinforcing incentive structures are a big reason that users of open source infrastructure, even teams who work at corporate users with zillions of dollars, don't reliably contribute back.
The Answer We Want
All those options are bad. If we had a good option, what would it look like?
It is both practically necessary3 and morally required4 for you to have a way to temporarily rely on a modified version of an open source dependency, without permanently diverging.
Below, I will describe a desirable abstract workflow for achieving this goal.
Step 0: Report the Problem
Before you get started with any of these other steps, write up a clear description of the problem and report it to the project as an issue; specifically, in contrast to writing it up as a pull request. Describe the problem before submitting a solution.
You may not be able to wait for a volunteer-run open source project to respond to your request, but you should at least tell the project what you're planning on doing.
If you don't hear back from them at all, you will have at least made sure to comprehensively describe your issue and strategy beforehand, which will provide some clarity and focus to your changes.
If you do hear back from them, in the worst case scenario, you may discover that a hard fork will be necessary because they don't consider your issue valid, but even that information will save you time, if you know it before you get started. In the best case, you may get a reply from the project telling you that you've misunderstood its functionality and that there is already a configuration parameter or usage pattern that will resolve your problems with no new code. But in all cases, you will benefit from early coordination on what needs fixing before you get to how to fix it.
Step 1: Source Code and CI Setup
Fork the source code for your upstream dependency to a writable location where it can live at least for the duration of this one bug-fix, and possibly for the duration of your application's use of the dependency. After all, you might want to fix more than one bug in LibBar.
You want to have a place where you can put your edits, that will be version controlled and code reviewed according to your normal development process. This probably means you'll need to have your own main branch that diverges from your upstream's main branch.
Remember: you're going to need to deploy this to your production, so testing gates that your upstream only applies to final releases of LibBar will need to be applied to every commit here.
Depending on your LibBar's own development process, this may result in slightly unusual configurations where, for example, your fixes are written against the last LibBar release tag, rather than its current[5] main; if the project has a branch-freshness requirement, you might need two branches, one for your upstream PR (based on main) and one for your own use (based on the release branch with your changes).
Ideally for projects with really good CI and a strong "keep main release-ready at all times" policy, you can deploy straight from a development branch, but it's good to take a moment to consider this before you get started. It's usually easier to rebase changes from an older HEAD onto a newer one than it is to go backwards.
Speaking of CI, you will want to have your own CI system. The fact that GitHub Actions has become a de-facto lingua franca of continuous integration means that this step may be quite simple, and your forked repo can just run its own instance.
Optional Bonus Step 1a: Artifact Management
If you have an in-house artifact repository, you should set that up for your dependency too, and upload your own build artifacts to it. You can often treat your modified dependency as an extension of your own source tree and install from a GitHub URL, but if you've already gone to the trouble of having an in-house package repository, you can pretend you've taken over maintenance of the upstream package temporarily (which you kind of have) and leverage those workflows for caching and build-time savings as you would with any other internal repo.
Step 2: Do The Fix
Now that you've got somewhere to edit LibBar's code, you will want to actually fix the bug.
Step 2a: Local Filesystem Setup
Before you have a production version on your own deployed branch, you'll want to test locally, which means having both repositories in a single integrated development environment.
At this point, you will want to have a local filesystem reference to your LibBar dependency, so that you can make real-time edits, without going through a slow cycle of pushing to a branch in your LibBar fork, pushing to a FooApp branch, and waiting for all of CI to run on both.
This is useful in both directions: as you prepare the FooApp branch that makes any necessary updates on that end, you'll want to make sure that FooApp can exercise the LibBar fix in any integration tests. As you work on the LibBar fix itself, you'll also want to be able to use FooApp to exercise the code and see if you've missed anything - and this, you wouldn't get in CI, since LibBar can't depend on FooApp itself.
In short, you want to be able to treat both projects as an integrated development environment, with support from your usual testing and debugging tools, just as much as you want your deployment output to be an integrated artifact.
Step 2b: Branch Setup for PR
However, for continuous integration to work, you will also need to have a remote resource reference of some kind from FooApp's branch to LibBar. You will need 2 pull requests: the first to land your LibBar changes to your internal LibBar fork and make sure it's passing its own tests, and then a second PR to switch your LibBar dependency from the public repository to your internal fork.
At this step it is very important to ensure that there is an issue filed on your own internal backlog to drop your LibBar fork. You do not want to lose track of this work; it is technical debt that must be addressed.
Until it's addressed, automated tools like Dependabot will not be able to apply security updates to LibBar for you; you're going to need to manually integrate every upstream change. This type of work is itself very easy to drop or lose track of, so you might just end up stuck on a vulnerable version.
Step 3: Deploy Internally
Now that you're confident that the fix will work, and that your temporarily-internally-maintained version of LibBar isn't going to break anything on your site, it's time to deploy.
Some deployment history should help to provide some evidence that your fix is ready to land in LibBar, but at the next step, please remember that your production environment isn't necessarily representative of that of all LibBar users.
Step 4: Propose Externally
You've got the fix, you've tested the fix, you've got the fix in your own production, you've told upstream you want to send them some changes. Now, it's time to make the pull request.
You're likely going to get some feedback on the PR, even if you think it's already ready to go; as I said, despite having been proven in your production environment, you may get feedback about additional concerns from other users that you'll need to address before LibBar's maintainers can land it.
As you process the feedback, make sure that each new iteration of your branch gets re-deployed to your own production. It would be a huge bummer to go through all this trouble, and then end up unable to deploy the next publicly released version of LibBar within FooApp because you forgot to test that your responses to feedback still worked on your own environment.
Step 4a: Hurry Up And Wait
If you're lucky, upstream will land your changes to LibBar. But, there's still no release version available. Here, you'll have to stay in a holding pattern until upstream can finalize the release on their end.
Depending on some particulars, it might make sense at this point to archive your internal LibBar repository and move your pinned release version to a git hash of the LibBar version where your fix landed, in their repository.
Before you do this, check in with the LibBar core team and make sure that they understand that's what you're doing and they don't have any wacky workflows which may involve rebasing or eliding that commit as part of their release process.
Step 5: Unwind Everything
Finally, you eventually want to stop carrying any patches and move back to an official released version that integrates your fix.
You want to do this because this is what the upstream will expect when you are reporting bugs. Part of the benefit of using open source is benefiting from the collective work to do bug-fixes and such, so you don't want to be stuck off on a pinned git hash that the developers do not support for anyone else.
As I said in step 2b[6], make sure to maintain a tracking task for doing this work, because leaving this sort of relatively easy-to-clean-up technical debt lying around is something that can potentially create a lot of aggravation for no particular benefit. Make sure to put your internal LibBar repository into an appropriate state at this point as well.
Up Next
This is part 1 of a 2-part series. In part 2, I will explore in depth how to execute this workflow specifically for Python packages, using some popular tools. I'll discuss my own workflow, standards like PEP 517 and pyproject.toml, and of course, by the popular demand that I just know will come, uv.
if you already have all the tooling associated with a monorepo, including the ability to manage divergence and reintegrate patches with upstream, you already have the higher-overhead version of the workflow I am going to propose, so, never mind. but chances are you don't have that, very few companies do. ↩
In any business where one must wrangle with Legal, 3 hours is a wildly optimistic estimate. ↩
This is a talk about migrating from Plone 4 to 6 with the newest toolset.
There are several challenges when doing Plone migrations:
Highly customized source instances: custom workflow, add-ons, not all of them with versions that worked on Plone 6.
Complex data structures. For example, a Folder with a Link as its default page, which pointed to some other content that had meanwhile been moved.
Migrating Classic UI to Volto
Also, you might be migrating from a completely different CMS to Plone.
How do we do migrations in Plone in general?
In place migrations. Run migration steps on the source instance itself. Use the standard upgrade steps from Plone. Suitable for smaller sites with not so much complexity. Especially suitable if you do only a small Plone version update.
Export - import migrations. You extract data from the source, transform it, and load the structure in the new site. You transform the data outside of the source instance. Suitable for all kinds of migrations. Very safe approach: only once you are sure everything is fine, do you switch over to the newly migrated site. Can be more time consuming.
Let's look at export/import, which has three parts:
Extraction: you had collective.jsonify, transmogrifier, and now collective.exportimport and plone.exportimport.
Transformation: transmogrifier, collective.exportimport, and new: collective.transmute.
Transmogrifier is old, we won't talk about it now. collective.exportimport: written mostly by Philip Bauer. There is an @@export_all view, and then @@import_all to import it.
collective.transmute is a new tool. It is made to transform data from collective.exportimport to the plone.exportimport format. Potentially it can be used for other migrations as well. Highly customizable and extensible. Tested with pytest. It is standalone software with a nice CLI. No dependency on Plone packages.
Another tool: collective.html2blocks. This is a lightweight Python replacement for the JavaScript Blocks conversion tool. This is extensible and tested.
Lastly, plone.exportimport. This is a stripped-down version of collective.exportimport. It focuses on extract and load. No transforms. So this is best suited for importing into a Plone site with the same version.
collective.transmute is in alpha, probably a 1.0.0 release in the next weeks. Still missing quite some documentation. Test coverage needs some improvements. You can contribute with PRs, issues, docs.
We saw that cookieplone was coming up, and Docker, and, as a game changer, uv, which makes the installation of Python packages much faster.
With cookieplone you get a monorepo, with folders for backend, frontend, and devops. devops contains scripts to setup the server and deploy to it. Our sysadmins already had some other scripts. So we needed to integrate that.
First idea: let's fork it. Create our own copy of cookieplone. I explained this in my World Plone Day talk earlier this year. But cookieplone was changing a lot, so it was hard to keep our copy updated.
Maik Derstappen showed me copier, yet another templating tool. Our idea: create a cookieplone project, and then use copier to modify it.
What about the deployment? We are on GitLab. We host our own runners. We use the docker-in-docker service. We develop on a branch and create a merge request (pull request in GitHub terms). This triggers a pipeline to check, test, and build. When it is merged, we bump the version and use release-it.
Then we create deploy keys and tokens. We give these access to private GitLab repositories. We need some changes to SSH key management in pipelines, according to our sysadmins.
For deployment on the server: we do not yet have automatic deployments. We did not want to go too fast. We are testing the current pipelines and process, see if they work properly. In the future we can think about automating deployment. We just ssh to the server, and perform some commands there with docker.
Future improvements:
Start the docker containers and curl/wget the /ok endpoint.
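For illustration, a minimal health-check sketch in Python, assuming the /ok endpoint mentioned above; the host and port here are hypothetical:

import sys
import requests

# Poll the container's /ok endpoint (hypothetical host/port) and exit non-zero if it is unhealthy.
resp = requests.get("http://localhost:8080/ok", timeout=5)
sys.exit(0 if resp.ok else 1)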
Vision: plone.restapi aims to provide a complete, stable, documented, extensible, language-agnostic API for the Plone CMS.
New services
@site: global site settings. These are overall, public settings that are needed on all pages and that don't change per context.
@login: choose between multiple login providers.
@navroot: contextual data from the navigation root of the current context.
@inherit: contextual data from any behavior. It looks for the closest parent that has this behavior defined, and gets this data.
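For illustration, a minimal sketch of calling one of these services over HTTP from Python; the endpoint name comes from the list above, while the site URL and credentials are hypothetical:

import requests

# Fetch contextual navigation-root data for a page via the @navroot service.
resp = requests.get(
    "https://example.com/Plone/some/page/@navroot",  # hypothetical site and path
    headers={"Accept": "application/json"},
    auth=("editor", "secret"),  # hypothetical credentials
)
print(resp.json())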
Dynamic teaser blocks: you can choose to customize the teaser content. So the teaser links to the item you have selected, but if you want, you can change the title and other fields.
Roadmap:
Don't break it.
10.0 release for Plone 6.2: remove setuptools namespace.
Continue to support the migration path from older versions: use an old plone.restapi version on an old Plone version to export it, and be able to import this into the latest versions.
Recycle bin (work in progress): a lot of the work from Rohan is in Classic UI, but he is working on the restapi as well.
Wishlist, no one is working on this, but would be good to have:
@permissions endpoint
@catalog endpoint
missing control panel
folder type constraints
Any time that you find yourself going to the Classic UI to do something, that is a sign something is missing.
Some changes to relative paths to fix some use cases
Machine readable specifications for OpenAPI, MCP
New forms backend
Bulk operations
Streaming API
External functional test suite, that you could also run against e.g. guillotina or Nick to see if it works there as well.
Time travel: be able to see the state of the database from some time ago. The ZODB has some options here.
The most optimistic vision of generative AI[1] is that it will relieve us of the tedious, repetitive elements of knowledge work so that we can get to work on the really interesting problems that such tedium stands in the way of. Even if you fully believe in this vision, it's hard to deny that today, some tedium is associated with the process of using generative AI itself.
Generative AI also isn't free, and so, as responsible consumers, we need to ask: is it worth it? What's the ROI of genAI, and how can we tell? In this post, I'd like to explore a logical framework for evaluating genAI expenditures, to determine if your organization is getting its money's worth.
Perpetually Proffering Permuted Prompts
I think most LLM users would agree with me that a typical workflow with an LLM rarely involves prompting it only one time and getting a perfectly useful answer that solves the whole problem.
Generative AI best practices, even from the most optimistic vendors, all suggest that you should continuously evaluate everything. ChatGPT, which is really the only genAI product with significantly scaled adoption, still says at the bottom of every interaction:
ChatGPT can make mistakes. Check important info.
If we have to "check important info" on every interaction, it stands to reason that even if we think it's useful, some of those checks will find an error. Again, if we think it's useful, presumably the next thing to do is to perturb our prompt somehow and issue it again, in the hopes that the next invocation will produce a better answer, by dint of either:
enhanced application of our skill to engineer a better prompt based on the deficiencies of the current inference, or
better performance of the model by populating additional context in subsequent chained prompts.
Unfortunately, given the relative lack of reliable methods to re-generate the prompt and receive a better answer2, checking the output and re-prompting the model can feel like just kinda futzing around with it. You try, you get a wrong answer, you try a few more times, eventually you get the right answer that you wanted in the first place. It's a somewhat unsatisfying process, but if you get the right answer eventually, it does feel like progress, and you didn't need to use up another human's time.
In fact, the hottest buzzword of the last hype cycle is "agentic". While I have my own feelings about this particular word3, its current practical definition is "a generative AI system which automates the process of re-prompting itself, by having a deterministic program evaluate its outputs for correctness".
A better term for an "agentic" system would be a "self-futzing system".
However, the ability to automate some level of checking and re-prompting does not mean that you can fully delegate tasks to an agentic tool, either. It is, plainly put, not safe. If you leave the AI on its own, you will get terrible results that will at best make for a funny story[4][5] and at worst might end up causing serious damage[6][7].
Taken together, this all means that for any consequential task that you want to accomplish with genAI, you need an expert human in the loop. The human must be capable of independently doing the job that the genAI system is being asked to accomplish.
When the genAI guesses correctly and produces usable output, some of the human's time will be saved. When the genAI guesses wrong and produces hallucinatory gibberish or even "correct" output that nevertheless fails to account for some unstated but necessary property such as security or scale, some of the human's time will be wasted evaluating it and re-trying it.
Income from Investment in Inference
Let's evaluate an abstract, hypothetical genAI system that can automate some work for our organization. To avoid implicating any specific vendor, let's call the system "Mallory".
Is Mallory worth the money? How can we know?
Logically, there are only two outcomes that might result from using Mallory to do our work.
We prompt Mallory to do some work; we check its work, it is correct, and some time is saved.
We prompt Mallory to do some work; we check its work, it fails, and we futz around with the result; this time is wasted.
As a logical framework, this makes sense, but ROI is an arithmetical concept, not a logical one. So let's translate this into some terms.
In order to evaluate Mallory, let's define the Futzing Fraction, "FF", in terms of the following variables:
H: the average amount of time a Human worker would take to do a task, unaided by Mallory
I: the amount of time that Mallory takes to run one Inference[8]
C: the amount of time that a human has to spend Checking Mallory's output for each inference
P: the Probability that Mallory will produce a correct inference for each prompt
W: the average amount of time that it takes for a human to Write one prompt for Mallory
E: since we are normalizing everything to time, rather than money, we also have to account for the dollar cost of Mallory as a product, so we will include the Equivalent amount of human time we could purchase for the marginal cost of one[9] inference.
As in last week's example of simple ROI arithmetic, we will put our costs in the numerator, and our benefits in the denominator.
FF = (W + I + C + E) / (P × H)
The idea here is that for each prompt, the minimum amount of time-equivalent cost possible is W+I+C+E. The user must, at least once, write a prompt, wait for inference to run, then check the output; and, of course, pay any costs to Mallory's vendor.
If the probability of a correct answer is P = 1/3, then they will do this entire process 3 times[10], so we put P in the denominator. Finally, we divide everything by H, because we are trying to determine if we are actually saving any time or money, versus just letting our existing human, who has to be driving this process anyway, do the whole thing.
If the Futzing Fraction evaluates to a number greater than 1, as previously discussed, you are a bozo; you're spending more time futzing with Mallory than getting value out of it.
Figuring out the Fraction is Frustrating
In order to even evaluate the value of the Futzing Fraction though, you have to have a sound method to even get a vague sense of all the terms.
If you are a business leader, a lot of this is relatively easy to measure. You vaguely know what H is, because you know what your payroll costs, and similarly, you can figure out E with some pretty trivial arithmetic based on Mallory's pricing table. There are endless YouTube channels, spec sheets and benchmarks to give you I. W is probably going to be so small compared to H that it hardly merits consideration11.
But, are you measuring C? If your employees are not checking the outputs of the AI, you're on a path to catastrophe that no ROI calculation can capture, so it had better be greater than zero.
Are you measuring P? How often does the AI get it right on the first try?
Challenges to Computing Checking Costs
In the fraction defined above, the term C is going to be large. Larger than you think.
Measuring P and C with a high degree of precision is probably going to be very hard; possibly unreasonably so, or too expensive12 to bother with in practice. So you will undoubtedly need to work with estimates and proxy metrics. But you have to be aware that this is a problem domain where your normal method of estimating is going to be extremely vulnerable to inherent cognitive bias, and find ways to measure.
Margins, Money, and Metacognition
First let's discuss cognitive and metacognitive bias.
My favorite cognitive bias is the availability heuristic and a close second is its cousin salience bias. Humans are empirically predisposed towards noticing and remembering things that are more striking, and to overestimate their frequency.
If you are estimating the variables above based on the vibe that you're getting from the experience of using an LLM, you may be overestimating its utility.
Consider a slot machine.
If you put a dollar into a slot machine, and you lose that dollar, this is an unremarkable event. Expected, even. It doesn't seem interesting. You can repeat this over and over again, a thousand times, and each time it will seem equally unremarkable. If you do it a thousand times, you will probably get gradually more anxious as your sense of your dwindling bank account becomes slowly more salient, but losing one more dollar still seems unremarkable.
If you put a dollar in a slot machine and it gives you a thousand dollars, that will probably seem pretty cool. Interesting. Memorable. You might tell a story about this happening, but you definitely wouldn't really remember any particular time you lost one dollar.
Luckily, when you arrive at a casino with slot machines, you probably know well enough to set a hard budget in the form of some amount of physical currency you will have available to you. The odds are against you, you'll probably lose it all, but any responsible gambler will have an immediate, physical representation of their balance in front of them, so when they have lost it all, they can see that their hands are empty, and can try to resist the "just one more pull" temptation, after hitting that limit.
Now, consider Mallory.
If you put ten minutes into writing a prompt, and Mallory gives a completely off-the-rails, useless answer, and you lose ten minutes, well, that's just what using a computer is like sometimes. Mallory malfunctioned, or hallucinated, but it does that sometimes, everybody knows that. You only wasted ten minutes. It's fine. Not a big deal. Let's try it a few more times. Just ten more minutes. It'll probably work this time.
If you put ten minutes into writing a prompt, and it completes a task that would have otherwise taken you 4 hours, that feels amazing. Like the computer is magic! An absolute endorphin rush.
Very memorable. When it happens, it feels like P=1.
But... did you have a time budget before you started? Did you have a specified N such that "I will give up on Mallory as soon as I have spent N minutes attempting to solve this problem with it"? When the jackpot finally pays out that 4 hours, did you notice that you put 6 hours' worth of 10-minute prompt coins into it?
If you are attempting to use the same sort of heuristic intuition that probably works pretty well for other business leadership decisions, Mallory's slot-machine chat-prompt user interface is practically designed to subvert those sensibilities. Most business activities do not have nearly such an emotionally variable, intermittent reward schedule. They're not going to trick you with this sort of cognitive illusion.
Thus far we have been talking about cognitive bias, but there is a metacognitive bias at play too: while Dunning-Kruger, everybody's favorite metacognitive bias, does have some problems with it, the main underlying metacognitive bias is that we tend to believe our own thoughts and perceptions, and it requires active effort to distance ourselves from them, even if we know they might be wrong.
This means you must assume any intuitive estimate of C is going to be biased low; similarly P is going to be biased high. You will forget the time you spent checking, and you will underestimate the number of times you had to re-check.
To avoid this, you will need to decide on a Ulysses pact to provide some inputs to a calculation for these factors that you will not be able to fudge if they seem wrong to you.
Problematically Plausible Presentation
Another nasty little cognitive-bias landmine for you to watch out for is the authority bias, for two reasons:
People will tend to see Mallory as an unbiased, external authority, and thereby see it as more of an authority than a similarly-situated human13.
Being an LLM, Mallory will be overconfident in its answers14.
The nature of LLM training is also such that commonly co-occurring tokens in the training corpus produce higher likelihood of co-occurring in the output; they're just going to be closer together in the vector-space of the weights; that's, like, what training a model is, establishing those relationships.
If you've ever used an heuristic to informally evaluate someone's credibility by listening for industry-specific shibboleths or ways of describing a particular issue, that skill is now useless. Having ingested every industry's expert literature, commonly-occurring phrases will always be present in Mallory's output. Mallory will usually sound like an expert, but then make mistakes at random.[15]
While you might intuitively estimate C by thinking "well, if I asked a person, how could I check that they were correct, and how long would that take?" that estimate will be extremely optimistic, because the heuristic techniques you would use to quickly evaluate incorrect information from other humans will fail with Mallory. You need to go all the way back to primary sources and actually fully verify the output every time, or you will likely fall into one of these traps.
Mallory Mangling Mentorship
So far, I've been describing the effect Mallory will have in the context of an individual attempting to get some work done. If we are considering organization-wide adoption of Mallory, however, we must also consider the impact on team dynamics. There are a number of potential side effects that one might consider here, but I will focus on just one that I have observed.
I have a cohort of friends in the software industry, most of whom are individual contributors. I'm a programmer who likes programming, so are most of my friends, and we are also (sigh), charitably, pretty solidly middle-aged at this point, so we tend to have a lot of experience.
As such, we are often the folks that the team - or, in my case, the community - goes to when less-experienced folks need answers.
On its own, this is actually pretty great. Answering questions from more junior folks is one of the best parts of a software development job. It's an opportunity to be helpful, mostly just by knowing a thing we already knew. And it's an opportunity to help someone else improve their own agency by giving them knowledge that they can use in the future.
However, generative AI throws a bit of a wrench into the mix.
Let's imagine a scenario where we have 2 developers: Alice, a staff engineer who has a good understanding of the system being built, and Bob, a relatively junior engineer who is still onboarding.
The traditional interaction between Alice and Bob, when Bob has a question, goes like this:
Bob gets confused about something in the system being developed, because Bob's understanding of the system is incorrect.
Bob formulates a question based on this confusion.
Bob asks Alice that question.
Alice knows the system, so she gives an answer which accurately reflects the state of the system to Bob.
Bob's understanding of the system improves, and thus he will have fewer and better-informed questions going forward.
You can imagine how repeating this simple 5-step process will eventually transform Bob into a senior developer, and then he can start answering questions on his own. Making sufficient time for regularly iterating this loop is the heart of any good mentorship process.
Now, though, with Mallory in the mix, the process now has a new decision point, changing it from a linear sequence to a flow chart.
We begin the same way, with steps 1 and 2. Bob's confused, Bob formulates a question, but then:
Bob asks Mallory that question.
Here, our path then diverges into a "happy" path, a "meh" path, and a "sad" path.
The "happy" path proceeds like so:
Mallory happens to formulate a correct answer.
Bob's understanding of the system improves, and thus he will have fewer and better-informed questions going forward.
Great. Problem solved. We just saved some of Alice's time. But as we learned earlier,
Mallory can make mistakes. When that happens, we will need to check important info. So let's get checking:
Mallory happens to formulate an incorrect answer.
Bob investigates this answer.
Bob realizes that this answer is incorrect because it is inconsistent with some of his prior, correct knowledge of the system, or his investigation.
Bob asks Alice the same question; GOTO traditional interaction step 4.
On this path, Bob spent a while futzing around with Mallory, to no particular benefit. This wastes some of Bob's time, but then again, Bob could have ended up on the happy path, so perhaps it was worth the risk; at least Bob wasn't wasting any of Alice's much more valuable time in the process.[16]
Notice that beginning at the start of step 4, we must begin allocating all of Bob's time to C, so C already starts getting a bit bigger than if it were just Bob checking Mallory's output specifically on tasks that Bob is doing.
That brings us to the "sad" path.
Mallory happens to formulate an incorrect answer.
Bob investigates this answer.
Bob does not realize that this answer is incorrect because he is unable to recognize any inconsistencies with his existing, incomplete knowledge of the system.
Bob integrates Mallory's incorrect information of the system into his mental model.
Bob proceeds to make a larger and larger mess of his work, based on an incorrect mental model.
Eventually, Bob asks Alice a new, worse question, based on this incorrect understanding.
Sadly we cannot return to the happy path at this point, because now Alice must unravel the complex series of confusing misunderstandings that Mallory has unfortunately conveyed to Bob at this point. In the really sad case, Bob actually doesn't believe Alice for a while, because Mallory seems unbiased17, and Alice has to waste even more time convincing Bob before she can simply explain to him.
Now, we have wasted some of Bob's time, and some of Alice's time. Everything from step 5-10 is C, and as soon as Alice gets involved, we are now adding to C at double real-time. If more team members are pulled in to the investigation, you are now multiplying C by the number of investigators, potentially running at triple or quadruple real time.
But That's Not All
Here I've presented a brief selection of reasons why C will be both large, and larger than you expect. To review:
Gambling-style mechanics of the user interface will interfere with your own self-monitoring and developing a good estimate.
You can't use human heuristics for quickly spotting bad answers.
Wrong answers given to junior people who can't evaluate them will waste more time from your more senior employees.
But this is a small selection of ways that Mallory's output can cost you money and time. It's harder to simplistically model second-order effects like this, but there's also a broad range of possibilities for ways that, rather than simply checking and catching errors, an error slips through and starts doing damage. Or ways in which the output isn't exactly wrong, but still sub-optimal in ways which can be difficult to notice in the short term.
For example, you might successfully vibe-code your way to launch a series of applications, successfully "checking" the output along the way, but then discover that the resulting code is unmaintainable garbage that prevents future feature delivery, and needs to be re-written18. But this kind of intellectual debt isn't even specific to technical debt while coding; it can even affect such apparently genAI-amenable fields as LinkedIn content marketing19.
Problems with the Prediction of P
C isn't the only challenging term, though. P is just as, if not more, important, and just as hard to measure.
LLM marketing materials love to phrase their accuracy in terms of a percentage. Accuracy claims for LLMs in general tend to hover around 70%[20]. But these scores vary per field, and when you aggregate them across multiple topic areas, they start to trend down. This is exactly why "agentic" approaches for more immediately-verifiable LLM outputs (with checks like "did the code work") got popular in the first place: you need to try more than once.
Independently measured claims about accuracy tend to be quite a bit lower[21]. The field of AI benchmarks is exploding, but it probably goes without saying that LLM vendors game those benchmarks[22], because of course every incentive would encourage them to do that. Regardless of what their arbitrary scoring on some benchmark might say, all that matters to your business is whether it is accurate for the problems you are solving, for the way that you use it. Which is not necessarily going to correspond to any benchmark. You will need to measure it for yourself.
With that goal in mind, our formulation of P must be a somewhat harsher standard than "accuracy". It's not merely "was the factual information contained in any generated output accurate", but, "is the output good enough that some given real knowledge-work task is done and the human does not need to issue another prompt"?
Surprisingly Small Space for Slip-Ups
The problem with reporting these things as percentages at all, however, is that our actual definition for P is 1/attempts, where attempts for any given task must be an integer greater than or equal to 1.
Taken in aggregate, if we succeed on the first prompt more often than not, we could end up with a P > 1/2, but combined with the previous observation that you almost always have to prompt it more than once, the practical reality is that P will start at 50% and go down from there.
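To make that concrete, here is a minimal sketch with made-up attempt counts: even when three out of five tasks succeed on the first prompt, a couple of stubborn tasks drag the effective P down to one half.

>>> attempts = [1, 1, 1, 2, 5]  # hypothetical attempts needed per task
>>> P = 1 / (sum(attempts) / len(attempts))  # effective P is one over the average attempt count
>>> P
0.5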
If we plug in some numbers, trying to be as extremely optimistic as we can, and say that we have a uniform stream of tasks, every one of which can be addressed by Mallory, every one of which:
we can measure perfectly, with no overhead
would take a human 45 minutes
takes Mallory only a single minute to generate a response
Mallory will require only 1 re-prompt, so "good enough" half the time
takes a human only 5 minutes to write a prompt for
takes a human only 5 minutes to check the result of
has a per-prompt cost of the equivalent of a single second of a human's time
Thought experiments are a dicey basis for reasoning in the face of disagreements, so I have tried to formulate something that is absolutely, comically, over-the-top stacked in favor of the AI optimist here.
Would that be profitable? It sure seems like it, given that we are trading off 45 minutes of human time for 1 minute of Mallory-time and 10 minutes of human time. If we ask Python:
>>> def FF(H, I, C, P, W, E):
...     return (W + I + C + E) / (P * H)
...
>>> round(FF(H=45.0, I=1.0, C=5.0, P=1/2, W=5.0, E=1/60), 4)
0.4896
We get a futzing fraction of about 0.4896. Not bad! Sounds like, at least under these conditions, it would indeed be cost-effective to deploy Mallory. But… realistically, do you reliably get useful, done-with-the-task quality output on the second prompt? Let's bump up the denominator on P just a little bit there, and see how we fare:
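Here is a quick sketch of that sweep, reusing the FF function defined above and keeping every other wildly optimistic assumption fixed, with results rounded to four places:

>>> for attempts in range(2, 6):
...     print(attempts, "tries:", round(FF(H=45.0, I=1.0, C=5.0, P=1/attempts, W=5.0, E=1/60), 4))
...
2 tries: 0.4896
3 tries: 0.7344
4 tries: 0.9793
5 tries: 1.2241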
With this little test, we can see that by four tries per prompt we are already at 0.9793, and by five tries per prompt, even in this absolute fever-dream of an over-optimistic scenario, with a futzing fraction of 1.2241, Mallory is now a net detriment to our bottom line.
Harm to the Humans
We are treating H as functionally constant so far, an average around some hypothetical Gaussian distribution, but the distribution itself can also change over time.
Formally speaking, an increase to H would be good for our fraction. Maybe it would even be a good thing; it could mean we're taking on harder and harder tasks due to the superpowers that Mallory has given us.
But an observed increase to H would probably not be good. An increase could also mean your humans are getting worse at solving problems, because using Mallory has atrophied their skills[23] and sabotaged learning opportunities[24][25]. It could also go up because your senior, experienced people now hate their jobs[26].
For some more vulnerable folks, Mallory might just take a shortcut to all these complex interactions and drive them completely insane27 directly. Employees experiencing an intense psychotic episode are famously less productive than those who are not.
This could all be very bad if our futzing fraction eventually does head north of 1 and you need to consider reintroducing human-only workflows, without Mallory.
Abridging the Artificial Arithmetic (Alliteratively)
To reiterate, I have proposed this fraction:
FF = (W + I + C + E) / (P × H)
which shows us positive ROI when FF is less than 1, and negative ROI when it is more than 1.
This model is heavily simplified. A comprehensive measurement program that tests the efficacy of any technology, let alone one as complex and rapidly changing as LLMs, is more complex than could be captured in a single blog post.
Real-world work might be insufficiently uniform to fit into a closed-form solution like this. Perhaps an iterated simulation, with variables based on the range of values seen in your team's metrics, would give better results.
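As a rough illustration, such a simulation could be sketched like this; every distribution and bound below is a placeholder to be swapped for the ranges you actually observe, not a recommendation:

import random

def FF(H, I, C, P, W, E):
    # Same futzing-fraction formula as above.
    return (W + I + C + E) / (P * H)

def simulated_ff(trials=100_000):
    total = 0.0
    for _ in range(trials):
        H = max(random.gauss(45, 15), 5)  # human-only time for the task, in minutes
        W = random.uniform(2, 10)         # minutes spent writing each prompt
        I = random.uniform(0.5, 3)        # minutes spent waiting on each response
        C = random.uniform(2, 15)         # minutes spent checking each response
        E = 0.01                          # per-prompt cost, same tiny value as above
        attempts = random.randint(1, 6)   # prompts needed until "good enough"
        total += FF(H, I, C, 1 / attempts, W, E)
    return total / trials

print(simulated_ff())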
However, in this post, I want to illustrate that if you are going to try to evaluate an LLM-based tool, you need to at least include some representation of each of these terms somewhere. They are all fundamental to the way the technology works, and if you're not measuring them somehow, then you are flying blind into the genAI storm.
I also hope to show that a lot of existing assumptions about how benefits might be demonstrated, for example with user surveys about general impressions, or by evaluating artificial benchmark scores, are deeply flawed.
Even making what I consider to be wildly, unrealistically optimistic assumptions about these measurements, I hope I've shown:
in the numerator, C might be a lot higher than you expect,
in the denominator, P might be a lot lower than you expect,
repeated use of an LLM might make H go up, but despite the fact that it's in the denominator, that will ultimately be quite bad for your business.
Personally, I don't have all that many concerns about E and I. E is still seeing significant loss-leader pricing, and I might not be coming down as fast as vendors would like us to believe, but if the other numbers work out, I don't think they make a huge difference. However, there might still be surprises lurking in there, and if you want to rationally evaluate the effectiveness of a model, you need to be able to measure them and incorporate them as well.
In particular, I really want to stress the importance of the influence of LLMs on your team dynamic, as that can cause massive, hidden increases to C. LLMs present opportunities for junior employees to generate an endless stream of chaff that will simultaneously:
wreck your performance review process by making them look much more productive than they are,
increase stress and load on senior employees who need to clean up the unforeseen messes that this chaff creates,
and ruin their own opportunities for career development by skipping over learning opportunities.
If you've already deployed LLM tooling without measuring these things and without updating your performance management processes to account for the strange distortions that these tools make possible, your Futzing Fraction may be much, much greater than 1, creating hidden costs and technical debt that your organization will not notice until a lot of damage has already been done.
If you got all the way here, particularly if you're someone who is enthusiastic about these technologies, thank you for reading. I appreciate your attention and I am hopeful that if we can start paying attention to these details, perhaps we can all stop futzing around so much with this stuff and get back to doing real work.
I do not share this optimism, but I want to try very hard in this particular piece to take it as a given that genAI is in fact helpful. ↩
If we could have a better prompt on demand via some repeatable and automatable process, surely we would have used a prompt that got the answer we wanted in the first place. ↩
The software idea of a "user agent" straightforwardly comes from the legal principle of an agent, which has deep roots in common law, jurisprudence, philosophy, and math. When we think of an agent (some software) acting on behalf of a principal (a human user), this historical baggage imputes some important ethical obligations to the developer of the agent software. genAI vendors have been as eager as any software vendor to dodge responsibility for faithfully representing the user's interests even as there are some indications that at least some courts are not persuaded by this dodge, at least by the consumers of genAI attempting to pass on the responsibility all the way to end users. Perhaps it goes without saying, but I'll say it anyway: I don't like this newer interpretation of "agent". ↩
During which a human will be busy-waiting on an answer. ↩
Given the fluctuating pricing of these products, and fixed subscription overhead, this will obviously need to be amortized; including all the additional terms to actually convert this from your inputs is left as an exercise for the reader. ↩
I feel like I should emphasize explicitly here that everything is an average over repeated interactions. For example, you might observe that a particular LLM has a low probability of outputting acceptable work on the first prompt, but higher probability on subsequent prompts in the same context, such that it usually takes 4 prompts. For the purposes of this extremely simple closed-form model, we'd still consider that a P of 25%, even though a more sophisticated model, or a Monte Carlo simulation that sets progressive bounds on the probability, might produce more accurate values. ↩
It's worth noting that all this expensive measuring itself must be included in C until you have a solid grounding for all your metrics, but let's optimistically leave all of that out for the sake of simplicity. ↩
My father, also known as "R0ML", once described a methodology for evaluating volume purchases that I think needs to be more popular.
If you are a hardcore fan, you might know that he has already described this concept publicly in a talk at OSCON in 2005, among other places, but it has never found its way to the public Internet, so I'm giving it a home here, and in the process, appropriating some of his words.1
Let's say you're running a circus. The circus has many clowns. Ten thousand clowns, to be precise. They require bright red clown noses. Therefore, you must acquire a significant volume of clown noses. An enterprise licensing agreement for clown noses, if you will.
If the nose plays, it can really make the act. In order to make sure you're getting quality noses, you go with a quality vendor. You select a vendor who can supply noses for $100 each, at retail.
Do you want to buy retail? Ten thousand clowns, ten thousand noses, one hundred dollars: that's a million bucks worth of noses, so it's worth your while to get a good deal.
As a conscientious executive, you go to the golf course with your favorite clown accessories vendor and negotiate yourself a 50% discount, with a commitment to buy all ten thousand noses.
Is this a good deal? Should you take it?
To determine this, we will use an analytical tool called R0ML's Ratio (RR).
The ratio has 2 terms:
the Full Undiscounted Retail List Price of Units Used (FURLPoUU), which can of course be computed by multiplying the individual retail list price of a single unit (in our case, $100) by the number of units used
the Total Price of the Entire Enterprise Volume Licensing Agreement (TPotEEVLA), which in our case is $500,000.
It is expressed as:
RR = TPotEEVLA / FURLPoUU
Crucially, you must be able to compute the number of units used in order to complete this ratio. If, as expected, every single clown wears their nose at least once during the period of the license agreement, then our Units Used is 10,000, our FURLPoUU is $1,000,000 and our TPotEEVLA is $500,000, which makes our RR 0.5.
Congratulations. If R0ML's Ratio is less than 1, it's a good deal. Proceed.
But… maybe the nose doesn't play. Not every clown's costume is an exact clone of the traditional, stereotypical image of a clown. Many are avant-garde. Perhaps this plentiful proboscis pledge was premature. Here, I must quote the originator of this theoretical framework directly:
What if the wheeze doesn't please?
What if the schnozz gives some pause?
In other words: what if some clowns don't wear their noses?
If we were to do this deal, and then ask around afterwards to find out that only 200 of our 10,000 clowns were to use their noses, then FURLPoUU comes out to 200 * $100, for a total of $20,000. In that scenario, RR is 25, which you may observe is substantially greater than 1.
If you do a deal where R0ML's ratio is greater than 1, then you are the bozo.
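If you would rather let a computer do the arithmetic, the whole test fits in a few lines; here is a sketch using the numbers from the examples above:

def r0mls_ratio(total_agreement_price, retail_unit_price, units_used):
    # TPotEEVLA divided by FURLPoUU.
    return total_agreement_price / (retail_unit_price * units_used)

# Every clown wears the nose: RR = 0.5, and the deal is good.
print(r0mls_ratio(500_000, 100, 10_000))
# Only 200 clowns wear the nose: RR = 25, and you are the bozo.
print(r0mls_ratio(500_000, 100, 200))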
I apologize if I have belabored this point. As R0ML expressed in the email we exchanged about this many years ago,
I do not mind if you blog about it - and I don't mind getting the credit - although one would think it would be obvious.
And yeah, one would think this would be obvious? But I have belabored it because many discounted enterprise volume purchasing agreements still fail the R0ML's Ratio Bozo Test.2
In the case of clown noses, if you pay the discounted price, at least you get to keep the nose; maybe lightly-used clown noses have some resale value. But in software licensing or SaaS deals, once you've purchased the "discounted" software or service, once you have provisioned the "seats", the money is gone, and if your employees don't use it, then no value for your organization will ever result.
Measuring number of units used is very important. Without this number, you have no idea if you are a bozo or not.
It is often better to give your individual employees a corporate card and allow them to make arbitrary individual purchases of software licenses and SaaS tools, with minimal expense-reporting overhead; since every unit is then bought at retail, and only when someone actually uses it, R0ML's Ratio stays at exactly 1.0, and thus, you will never be a bozo.
It is always better to do that the first time you are purchasing a new software tool, because the first time making such a purchase you (almost by definition) have no information about "units used" yet. You have no idea - you cannot have any idea - if you are a bozo or not.
If you don't know who the bozo is, it's probably you.
Acknowledgments
Thank you for reading, and especially thank you to my patrons who are supporting my writing on this blog. Of course, extra thanks to dad for, like, having this idea and doing most of the work here beyond my transcription. If you like my dad's ideas and you'd like to see me post more of them, or you'd like to support my various open-source endeavors, you can support my work as a sponsor!
One of my other favorite posts on this blog was just stealing another one of his ideas, so hopefully this one will be good too. ↩
This concept was first developed in 2001, but it has some implications for extremely recent developments in the software industry; that's a post for another day. ↩