19 Jun 2026
Planet Grep
Wouter Verhelst: Agentic coding and Free Software

Through work, I have paid license to windsurf (recently renamed to "devin"), an application for LLM-based (aka, "Agentic") development.
I hadn't been using it that much, but in an effort to more clearly understand how this whole AI development thing works, I decided to give it a closer look recently.
My conclusions:
In its current form, this whole LLM wave is problematic for multiple reasons. But ignoring that, and looking at the technology only, I can say that:
- it is a paradigm shift;
- it is, at the technological level, a positive evolution;
- and it is a threat to free software.
Problems
Lest someone (incorrectly) assume that I am arguing in favour of the current state of affairs with regards to LLMs, let me state this first.
The way LLMs are built today is highly parasitic. Websites are downloaded in whole, at unsustainable rates, regardless of the consent of the people who made the original content. The result is predictable: servers get overloaded, server administrators attempt to implement various mitigations. Some of these mitigations work; some do, for a while; some are entirely useless. In actual fact, the mitigations are an arms race -- if too many people implement the same mitigation, then the people who try to build yet another LLM so they can extract rent will just try to work around the mitigation, eventually they will succeed, and you'll just have to come up with another mitigation. It's a bit like spam; you introduce regex-based spam filters, they introduce spelling mistakes, you introduce bayesian filters, they add a large batch of markov chain-generated semi-nonsense words made invisible by markup, you add filters to block emails with such markup, they move the text into an image. We have working mitigations today, but eventually we'll run out of ideas.
LLMs glob up everything they can while ignoring the license of the source material. The people who push those LLMs claim that pushing the source material through the machine learning algorithms makes the output of the algorithm distinct enough from the source material that the license no longer applies; I'm not so sure that this is true. I guess the New York Times v OpenAI lawsuit will teach us some of the answer to that question here, but even so the ethical questions about "is it OK to bring down another server just so we can download the internet for another for-pay LLM" are still open. And regardless of what the law states, my opinion on "you're using my copyleft code to generate code under a different license" is not something you might like if you agree with the rent seekers' opinion on the subject.
That all being said and true, the technology works. You can have a "conversation" with an LLM that resembles a human one. If you pass it some data, you can use plain english to ask it questions about that data, which is a lot easier than to ask it about that in a formal way. You can request it to generate some code, and it will generate something that looks like what you need and that will be mostly correct for like 95% of the time.
Now, yes, 95% of the time is not 100% of the time, and no, you can't ask it to "write me a piece of software that implements this 300-page requirements document and get back to me when you're done", because it will fail, and you won't know where it has failed, and you'll take it into production and expect everything to be fine because it won't and this one minor logic bug will cause half your servers to spin and consume credits with your infrastructure provider with nothing to show for it.
But that doesn't mean you can't use an LLM to build a large piece of software. It just means you have to understand the LLMs limitations and strenghts, and use them correctly.
Here's what an LLM is good at:
- Generating plausible text
- Interpreting text to figure out what a plausible meaning or summary of that text is
- Giving vague indications as to what the probable context of a given body of text is.
It turns out that that's enough to use the LLM to build a reliable piece of software, provided you do it right.
Paradigm shift
An LLM can generate text by the truckful. The generated text could be code. Given a good enough LLM, the generated text might even run and do something useful.
You can try to blindly run the code, and if it doesn't run correctly, you can paste the error message to the LLM, and it can tell you what went wrong and how you could possibly fix it. This creates a feedback loop: you ask it for an amount of code, you run the code, you receive an error, you tell it that the code is problematic and give it the error message, it makes changes to the code, now you have something that at least no longer fails at startup.
If you ask it to add tests to make sure that your code acts as per your specification, now you get an error if and when the code doesn't act as per your specification. Or, well, at least not as per the part of the specification that was correctly turned into a unit test by the LLM.
LLMs have a context window, so if the error message is pasted in the same conversation as where the code was generated, it is able to reuse the earlier prompts to refine how it should interpret the error message that you received.
You can't really paste the source code of an entire application into the prompt of your LLM, that would quickly overrun its context window. But LLMs also allow you to provide some form of background information -- a document, say -- on which you ask it to reason. It will interpret that document, but doing so uses less of the LLMs context window. So providing the LLM with your application's source code as background information can help it understand better how your code interacts. This is especially helpful if you only provide the LLM the background information relevant to the actual question.
So now if you are able to:
- Create background context with your application's source code
- Have the LLM generate a first draft of your requested change, plus the tests to make sure it works
- Compile (if applicable) the generated code (and tests) and run said tests
- Return any error messages to the LLM with a request to correct the error
Then the combination of "getting it 95% right off the bat" and the above feedback loop means you can generate syntactically correct code, that probably does what you need, in minutes.
I say "probably" for a reason. There are going to be cases where you specify a request without a number of details (because they are implied), and the LLM will get most of those details right but just not implement the one bit because it's an automaton and it doesn't think. Or you will ask it to make sure that two bits of the application look exactly the same, without specifying that they must act the same, now and in the future, and it will just generate the same block of code twice and then in a future change it will change one but not the other.
But if you review the changes, and you have experience as a programmer, you will be able to spot most cases where the LLM got it wrong. And so it's possible, if not necessarily easy at first, to use an LLM to generate mostly correct code.
There are certain places where "mostly correct" code is not desireable. But equally, there are also cases where, "mostly correct" is good enough.
After all, most of the software you run today -- the bits of it that weren't, yet, generated by an LLM -- is only "mostly correct", too, because to err is human and we all make mistakes. If not, there wouldn't be any CVEs and your software would never do anything wrong.
Now, doing the feedback loop described above is certainly something you could do manually. You could open an account on one of the LLM websites, upload the source code of your application, ask it to generate some new feature, download the newly generated feature, run it, and then copy/paste any error messages back into the LLM.
But that's a lot of manual work of the type that computers are pretty good at. So that's what the "windsurf" tool helps you with: you run it inside your IDE -- either a VSCode-based tool that you download from their website which comes with their product preinstalled, or a separate JetBrains plugin that you can install. You can then open your entire relevant codebase in a workspace in your IDE. You then ask the LLM, through the IDE, to generate a new feature in your codebase, and to also generate the test while it's at it. It will use a mixture of LLM interpretation and non-LLM functionality to scoop out the relevant bits of your codebase to send to the LLM as background information, will send it your prompt, will download the generated code and patch or create files, will compile (if required) and run the newly generated code and tests, and will refine the generated code if the tests produce any errors. All mostly automatic; by default, running anything requires explicit confirmation. You can turn that off completely (probably not a good idea), or you can give it a whitelist of things that you don't want to confirm (perhaps OK), and the tool also passes standing instructions to the LLM to never generate any command that deletes a file (which, like with any LLM, can be overridden, but it requires you to be very stubborn and to use more credits than you'd probably like).
All this put together means you can build something without writing any piece of code, provided you do it right.
A technically positive evolution
Don't go and say, "here's a 300-page document, read it and write whatever the document says". It will get it wrong, it will write a massive test suite that it will only run at the end, it will choke itself up trying to interpret the massive amount of failures it encounters, it will fill up its context window and it will start to forget some of the requirements. That won't work.
But what you can do -- what I did, in fact -- is this.
First, create an empty workspace. Don't put any code in it.
Then, tell the LLM to generate a backend framework using technology X and a frontend framework using technology Y that initially only says "hello, world". Also add tests to it, and run the tests.
It will do that. You'll not get much, but it will work.
Then, ask it to add some UI elements. A login page, perhaps. A navigation bar. Small things. Most of it doesn't have to be functional -- but tests must be there for the bits that are, and have it run the tests and evaluate the results.
Rinse, repeat, until you have a working application.
Importantly, in between the steps, you should also run the application yourself and see if the change was implemented correctly. Sometimes it won't be. Sometimes there will be a subtle bug -- I at one point had a the application hang after a few minutes. Sometimes you tell it that there's a subtle bug, and it will discover it more quickly than you could, and it will fix it, and in implementing the fix it will uncover another bug, and then you have to fix that one -- the fix it came up with for the hang was to move something to an async process on the server, which caused the application to start spinning while trying to create hundreds of async jobs (this is when I realized that the hang was a deadlock due to some part of the codebase doing something that indirectly triggered itself). Sometimes it will try to fix the bug you tell it about, and you'll see that it's going off on a tangent that has nothing to do with what you're seeing. It's important to keep an eye on what it's doing, so you can guide it back on track when that happens -- when I told it about the hang, it started investigating the part of the code which sends out emails, thinking that it could hang while waiting for sendmail to finish, but the hang was happening when the application was idle, not when it was sending out emails, and only when I told it about it happening when it was idle did it find the deadlock.
So it's not a fully automatic process, and it needs to be guided by someone who knows what they're doing. But if that is the case, you can come up with something that works. I spent evenings and breaks for about a week, and I managed to create a working application which, had I written it by hand, would have taken me a few months of full-time work to come up with. And I now have a side project, fully complete and working, that I had been thinking about doing for more than a decade, but never got around to actually doing, because of all the work that would be involved and I just didn't see myself having the time for.
It's not perfect code. But it's mostly good enough, and it will perform the job it needs to. And it looks far slicker than most of the side projects I've done in the past, because in the past I would prioritize between implementing new features or making something look slick, and I would decide that the new feature was more important because it's only for me and there's only me and nobody cares if it looks good or not and I don't have three weeks to come up with something that looks better. But here, I found myself sometimes spending 10 minutes writing a prompt with instructions on making things look better. Because what's 10 minutes when you just spent an hour writing down and refining specifications for functionality and tests?
There are a number of other things in which an LLM can help a programmer.
For instance.
I received a bug report recently in a project I'm paid to maintain that I couldn't make heads or tails of. I opened the source code in my windsurf IDE, pasted the bug report in the prompt, and then requested the tool to analyze the source code and the associated logs and tell me how the described behavior could be happening. It turned out that I had overlooked something, but with the help of the tool, I found the bug in minutes.
I was trying to understand a particular part of a large codebase that I didn't really grasp very well. I loaded the codebase in the tool, and asked it to explain to me how a particular action is performed by the code. I requested specific functions and line numbers. I now have a far better understanding of how the code works, and will be able to write that patch that I've been wanting to write for years -- without using the LLM.
I have been struggling for, literally, years with understanding why another tool that I maintain was misbehaving in a particular way but only in Firefox. I opened the codebase in Firefox, explained the buggy behavior in plain English, and asked it to explain how this could be happening. It picked up some obscure corner case behavior of ffmpeg and mp4 containers that I was not aware of and that perfectly explained why things were misbehaving in the way that they were.
At the same time, there are limitations. Giving an LLM a codebase that was originally generated by an LLM (either the same one or another one) seems to work well. Giving it a codebase that was written by a human and expecting it to correctly update it seems to be more error-prone. I did one or two of those as a trial, and it is more problematic than anything.
An LLM is also not intelligent, notwithstanding the popular term of "Artificial Intelligence". On multiple occasions, I've asked it to write a test case for some code that was not set up to do so; and rather than suggesting a refactor is required, it would instead copy the code that needed to be tested and then test the copy, rather than the original. The tool has made multiple similar errors. I have sometimes people describe agentic coding as "similar to interacting with junior programmers", but that is not the case. A junior programmer will either fill in the gaps in your specifications, or ask for clarification when something seems off. The LLM will not do that; it will do what you ask, exactly that and nothing more. If you missed a corner case in your specification, then all bets are off.
I remember learning about programming language generations in college. A first-generation language is "machine code", a second-generation language is "assembler", a third-generation language is any high-level language such as C, Perl, or Pascal. I've forgotten what set a 3rd-generation language apart from a 4th-generation language. But I remember the definition they gave me for a 5th-generation language: "you tell the computer what to do, and it will do it". At the time, I thought it was ridiculous. Nobody could ever write something like that.
But it's here.
And it's a threat to free software.
A threat to free software?
Yes.
There is the obvious part where most of the well-known LLMs are non-free software. I mean, there are some "open source" LLM models. The windsurf tool that I used doesn't allow you to use them (directly), but they're there. There are also open source applications that implement what the windsurf editor does. So it's definitely possible to work like this without resorting to non-free software and non-free services, even though the non-free LLMs might be a bit ahead of the curve of the free ones. But that's not what I mean.
And there is also the obvious thing which I mentioned earlier in this post, which is that the people who try to build LLMs are doing it in unethical, disgusting ways, causing downtimes and disregarding licenses for whatever they can get their grubby hands on. Ideally we wouldn't be in that situation, and ideally this wouldn't be a problem, but we are where we are.
And there's the obvious thing where the OSI sold itself out and declared that a machine learning program can be open source even when the very things it was built from -- the training data -- is not available. That's a major issue that the free software community needs to fight against, but there's not really anything that that is a threat to free software. You just build your own, free software, LLM, and you're done.
The actual threat is in funding and developer support.
Most large businesses do not care about free-as-in-freedom software. They like the free-as-in-beer part, and they appreciate that the free-as-in-freedom bits can make the software more customizable. They are (mostly) happy to do sponsorships of the free-as-in-freedom projects that they use if that means their free-as-in-beer usage of the software gets improved.
But why would you care about all that when you can just generate the code you need, rather than interacting with an open source community that may or may not care about your business's interests?
Where to go from here
Although I think the moral and environmental issues with LLMs are real and problematic, given the experiments I did I am not convinced that the concept of interacting with a computer system in natural language and to use it to generate code is necessarily deficient. There are pitfalls, but they can be managed. It is possible to use such a system to create throwaway, proof-of-concept type "good enough" code bases. It can be used to interpret code bases and to understand bug reports.
I believe that the major issue with LLMs has to do with that saying about hammers and nails:
If all you have is a hammer, then everything looks like a nail.
LLMs are an outgrowth of machine learning, pushed by large corporations. These large corporations have a lot of money. If all you have is money, then every problem can be fixed by throwing more money at it. The initial language models were promising but not (yet) good enough, and it seemed that one way in which they could be improved was to increase the scale of the statistics: throw more hardware (and thus money) at it, and rather than improving the efficiency of the models, just scale up.
Scaling up is something that megacorporations are very good at. It's only a money problem, after all. Does that mean that "scaling up" is the only way to improve the models, though? I'm not convinced.
Some hardware, such as most modern Apple and Samsung devices, ship with accelerator hardware for machine learning algorithms. There are some models that are small enough to be able to run on these devices. I don't see why it should not be possible to create a small(er) language model that can do some useful part of the above-described use cases; if not locally, then at least on a server that one can run on-prem rather than requiring that you pay rent to one of the LLM companies.
The Software Freedom Conservancy has published an aspirational statement on machine learning-assisted programming that, I think, gets a lot right. It's not quite a definition, but it's something to keep in mind.
Perhaps that's the way forward?
More questions than answers at this point, anyway.
19 Jun 2026 4:59pm GMT
Frank Goossens: Open Broadcast Radio: Switzerland calling
We went on holiday to the Swiss Alps earlier this month, in Ausserbinn not far from the border with Italy. I won't bore you with a description of what we did or saw, but I do want to share the souvenir we took home with us. While driving around we scanned the radio for nice music and we discovered "Open Broadcast", a 24/7 non-stop radio station. They are non-commercial and have a very eclectic…
19 Jun 2026 4:59pm GMT
Dries Buytaert: AI and the great CMS unbundling
The question I get most these days is: did AI kill the CMS? Should we still invest in a CMS, switch to AI agents, or wait until the market becomes clearer?
At a friend's birthday party recently, I was talking with engineers and startup CEOs. They were all smart people, but none of them worked in the CMS industry. From where they sat, AI seemed to make the CMS obsolete.
I understand why. AI can now generate copy, design pages, write code, translate content, and assemble websites. If that is what you think a CMS is for, it does look like the CMS is in trouble.
They may be right about one part of the CMS market. But I think they are wrong about the larger picture.
To see why, it helps to separate what a content management system, or CMS, does into two planes: the control plane and the execution plane.
The control plane governs content: who can edit it, what gets approved, which version is canonical, how translations move through workflow, and where content can be used.
The execution plane creates, assembles, and delivers that content into websites, mobile apps, feeds, and other customer experiences.
AI is unbundling these two planes. It is commoditizing the execution plane while making the control plane more valuable. That is why I think AI is killing one corner of the CMS market, but making the CMS more critical everywhere else.
AI lowers the cost of creation, not the cost of trust
We have seen this pattern before. The printing press made it cheap to produce and distribute content, but it did not make editors or publishers irrelevant. It made them more important, because more content created more need for judgment, trust, and standards.
AI is doing something similar to digital content. It makes production cheaper: drafting, generating, translating, designing, assembling pages, and adapting content for different channels.
But AI should not be the final authority on what is correct, approved, compliant, or safe to publish. It can help, but people and systems still need to own those decisions. The more content AI helps produce and distribute, the more that ownership matters.
As production gets cheaper, control becomes more important, not less.
That is the real test for a CMS. Not whether AI can generate content or build a page, but whether your organization needs a control layer: roles, review, approvals, publishing states, revision history, and more.
How shared is your work?
Two simple questions can help decide how much you need a CMS:
- How many people or agents create, review, and publish content?
- How many systems need to use, update, or trust that content?
Put those questions on a grid, and four use cases emerge.
The more people, agents, systems, and channels involved, the more a CMS matters as the control layer.
When one person creates and publishes content, and no other systems depend on it, you may not need a CMS. A lightweight publishing tool or AI site builder may be enough.
When multiple people or agents touch content, you need a CMS for coordination: roles, review, approvals, publishing states, and revision history. AI inside the CMS can help teams create, review, and publish faster without losing control.
When many systems touch content, you need a CMS as the trusted source for content, permissions, workflows, and publishing controls. AI around the CMS can coordinate work across tools, but it still depends on the CMS to know what content is approved, who can use it, and where it can go.
In short, when many people and many systems are involved, the CMS becomes the critical control layer for people, agents, and systems working together. It gives people and agents a safe place to create and approve content, and gives other tools a trusted system they can read from, write to, and build on.
The decision, by quadrant
1. Assist: one person, one system
This is the simplest case: one person, one system, and little coordination.
If you are creating a new website quickly, an AI site builder may be the right tool. It can turn a prompt into a working site in an afternoon. In that case, a CMS may slow you down more than it helps. This is 1a in the quadrant image: the job is to create, not to manage.
But one person does not always mean a CMS is unnecessary.
My website has been around for more than twenty years. It has more than 1,500 blog posts and 10,000 photos. That is not just a website to create; it is a body of content to manage. Drupal helps me manage that content as structured content: content types, fields, taxonomy, media, revisions, URLs, and search.
I would not move my site to a standalone AI site builder. But I do use an AI agent to work on it through Drupal: updating content, improving existing features, and building new ones. This is 1b on the chart. AI helps with the execution work, while Drupal remains the control plane. This is the CMS unbundling at the smallest scale.
So use an AI builder when speed to a new site matters most. Use a CMS when the work is about managing a large or growing body of content over time: keeping it structured, consistent, reusable, and reliable.
2. Relay: many people, one system
This is a clear case for a CMS.
When many people collaborate on one website, the work becomes a "relay": a designer uploads an image, a developer builds a component, a marketer writes the copy, an editor reviews the page, legal approves it, and someone presses publish.
AI does not remove that relay; it makes it move faster. The developer may use an AI coding agent, the marketer may use an AI writing assistant, and the editor may use an AI policy checker. More work moves through the same website, with less time between handoffs.
But the moment several people and several agents are working on the same website, you need a control layer to manage roles, permissions, approvals, revision history, and one source of truth.
A CMS lets teams move at AI speed without losing track of who changed what, which version is approved, and what is safe to publish.
3. Delegate: one person, many systems
In the Delegate scenario you are still one person, so there is little coordination with other people. But the work now spans many systems: a CMS, an email marketing platform, a commerce system, a CRM, and a planning tool.
When one person spans many systems, no single product sees the whole job. The center of gravity moves to the coordinator: an automation tool that connects your systems, or an AI agent that works across their APIs.
That is why this quadrant is debatable. For a short-lived campaign, you may not need a traditional CMS. You might use an AI builder for the site and an automation tool or agent to coordinate the rest. This is 3a on the chart.
But that only works while the content is small, short-lived, and easy to manage by hand. Once the content has to be structured, reused, updated, approved, or kept consistent across systems, you need a trusted source for it. This is 3b on the quadrant image.
4. Orchestrate: many people, many systems
This is the most complex environment, and the clearest case for a CMS.
A company campaign can involve many people and many systems at once: a marketer plans the campaign, a designer reviews the creative, legal approves the content, an editor publishes the page, marketing operations builds the email, and a commerce manager checks the discount. Every person has a role, and every system has a workflow.
AI can remove much of the coordination work: reminders, status updates, handoffs, and manual routing. But coordination is not control. Someone still has to approve the content, approve the promotion, and answer for the campaign's effectiveness.
In this quadrant, the CMS has two jobs. First, it has to govern and accelerate the work that happens inside the CMS. Second, it has to make that work usable by the broader digital ecosystem.
The CMS is not necessarily the orchestrator of that ecosystem. It is the governed workspace where people and agents can work safely, and the trusted source that other systems and agents can read from, write to, and build on.
At this scale, and at AI speed, a weak content foundation becomes expensive fast. A strong CMS is not optional.
From unbundling to rebundling
One thing the grid does not show is where the market is moving the fastest. Right now, most of the visible energy is on the bottom row of Assist and Delegate, sections 1a and 3a, where no control plane is needed: one person using AI to create and coordinate faster.
In Assist, that means AI site builders that turn an idea into a working website. In Delegate, it means agents and automation for single-person workflows across different systems.
Lovable reportedly reached roughly $400 million in annual recurring revenue less than two years after launch. n8n raised $180 million at a $2.5 billion valuation in 2025.
But once many people are involved, individual productivity is no longer enough. Organizations need productivity, coordination, and control.
The current wave of AI site builders is mostly making one person faster. The next wave has to make organizations faster without losing trust.
AI is unbundling creation from the CMS and driving its cost toward zero. But once creation becomes cheap and abundant, the value shifts to control.
That is where rebundling starts. The next generation of products will combine AI-powered creation with a trusted control plane.
So, is the CMS dead? No. Its role is changing.
The more AI you use to create, translate, update, and publish content, the more you need a system that keeps that work structured, approved, reusable, and safe.
That means that a CMS is not a competing line item to your AI budget. It is what makes that budget pay off.
And the real risk is not that AI replaces your CMS. It is running AI without one.
AI gives you speed. A CMS gives you control at speed.
19 Jun 2026 4:59pm GMT
Planet Debian
Wouter Verhelst: Agentic coding and Free Software

Through work, I have paid license to windsurf (recently renamed to "devin"), an application for LLM-based (aka, "Agentic") development.
I hadn't been using it that much, but in an effort to more clearly understand how this whole AI development thing works, I decided to give it a closer look recently.
My conclusions:
In its current form, this whole LLM wave is problematic for multiple reasons. But ignoring that, and looking at the technology only, I can say that:
- it is a paradigm shift;
- it is, at the technological level, a positive evolution;
- and it is a threat to free software.
Problems
Lest someone (incorrectly) assume that I am arguing in favour of the current state of affairs with regards to LLMs, let me state this first.
The way LLMs are built today is highly parasitic. Websites are downloaded in whole, at unsustainable rates, regardless of the consent of the people who made the original content. The result is predictable: servers get overloaded, server administrators attempt to implement various mitigations. Some of these mitigations work; some do, for a while; some are entirely useless. In actual fact, the mitigations are an arms race -- if too many people implement the same mitigation, then the people who try to build yet another LLM so they can extract rent will just try to work around the mitigation, eventually they will succeed, and you'll just have to come up with another mitigation. It's a bit like spam; you introduce regex-based spam filters, they introduce spelling mistakes, you introduce bayesian filters, they add a large batch of markov chain-generated semi-nonsense words made invisible by markup, you add filters to block emails with such markup, they move the text into an image. We have working mitigations today, but eventually we'll run out of ideas.
LLMs glob up everything they can while ignoring the license of the source material. The people who push those LLMs claim that pushing the source material through the machine learning algorithms makes the output of the algorithm distinct enough from the source material that the license no longer applies; I'm not so sure that this is true. I guess the New York Times v OpenAI lawsuit will teach us some of the answer to that question here, but even so the ethical questions about "is it OK to bring down another server just so we can download the internet for another for-pay LLM" are still open. And regardless of what the law states, my opinion on "you're using my copyleft code to generate code under a different license" is not something you might like if you agree with the rent seekers' opinion on the subject.
That all being said and true, the technology works. You can have a "conversation" with an LLM that resembles a human one. If you pass it some data, you can use plain english to ask it questions about that data, which is a lot easier than to ask it about that in a formal way. You can request it to generate some code, and it will generate something that looks like what you need and that will be mostly correct for like 95% of the time.
Now, yes, 95% of the time is not 100% of the time, and no, you can't ask it to "write me a piece of software that implements this 300-page requirements document and get back to me when you're done", because it will fail, and you won't know where it has failed, and you'll take it into production and expect everything to be fine because it won't and this one minor logic bug will cause half your servers to spin and consume credits with your infrastructure provider with nothing to show for it.
But that doesn't mean you can't use an LLM to build a large piece of software. It just means you have to understand the LLMs limitations and strenghts, and use them correctly.
Here's what an LLM is good at:
- Generating plausible text
- Interpreting text to figure out what a plausible meaning or summary of that text is
- Giving vague indications as to what the probable context of a given body of text is.
It turns out that that's enough to use the LLM to build a reliable piece of software, provided you do it right.
Paradigm shift
An LLM can generate text by the truckful. The generated text could be code. Given a good enough LLM, the generated text might even run and do something useful.
You can try to blindly run the code, and if it doesn't run correctly, you can paste the error message to the LLM, and it can tell you what went wrong and how you could possibly fix it. This creates a feedback loop: you ask it for an amount of code, you run the code, you receive an error, you tell it that the code is problematic and give it the error message, it makes changes to the code, now you have something that at least no longer fails at startup.
If you ask it to add tests to make sure that your code acts as per your specification, now you get an error if and when the code doesn't act as per your specification. Or, well, at least not as per the part of the specification that was correctly turned into a unit test by the LLM.
LLMs have a context window, so if the error message is pasted in the same conversation as where the code was generated, it is able to reuse the earlier prompts to refine how it should interpret the error message that you received.
You can't really paste the source code of an entire application into the prompt of your LLM, that would quickly overrun its context window. But LLMs also allow you to provide some form of background information -- a document, say -- on which you ask it to reason. It will interpret that document, but doing so uses less of the LLMs context window. So providing the LLM with your application's source code as background information can help it understand better how your code interacts. This is especially helpful if you only provide the LLM the background information relevant to the actual question.
So now if you are able to:
- Create background context with your application's source code
- Have the LLM generate a first draft of your requested change, plus the tests to make sure it works
- Compile (if applicable) the generated code (and tests) and run said tests
- Return any error messages to the LLM with a request to correct the error
Then the combination of "getting it 95% right off the bat" and the above feedback loop means you can generate syntactically correct code, that probably does what you need, in minutes.
I say "probably" for a reason. There are going to be cases where you specify a request without a number of details (because they are implied), and the LLM will get most of those details right but just not implement the one bit because it's an automaton and it doesn't think. Or you will ask it to make sure that two bits of the application look exactly the same, without specifying that they must act the same, now and in the future, and it will just generate the same block of code twice and then in a future change it will change one but not the other.
But if you review the changes, and you have experience as a programmer, you will be able to spot most cases where the LLM got it wrong. And so it's possible, if not necessarily easy at first, to use an LLM to generate mostly correct code.
There are certain places where "mostly correct" code is not desireable. But equally, there are also cases where, "mostly correct" is good enough.
After all, most of the software you run today -- the bits of it that weren't, yet, generated by an LLM -- is only "mostly correct", too, because to err is human and we all make mistakes. If not, there wouldn't be any CVEs and your software would never do anything wrong.
Now, doing the feedback loop described above is certainly something you could do manually. You could open an account on one of the LLM websites, upload the source code of your application, ask it to generate some new feature, download the newly generated feature, run it, and then copy/paste any error messages back into the LLM.
But that's a lot of manual work of the type that computers are pretty good at. So that's what the "windsurf" tool helps you with: you run it inside your IDE -- either a VSCode-based tool that you download from their website which comes with their product preinstalled, or a separate JetBrains plugin that you can install. You can then open your entire relevant codebase in a workspace in your IDE. You then ask the LLM, through the IDE, to generate a new feature in your codebase, and to also generate the test while it's at it. It will use a mixture of LLM interpretation and non-LLM functionality to scoop out the relevant bits of your codebase to send to the LLM as background information, will send it your prompt, will download the generated code and patch or create files, will compile (if required) and run the newly generated code and tests, and will refine the generated code if the tests produce any errors. All mostly automatic; by default, running anything requires explicit confirmation. You can turn that off completely (probably not a good idea), or you can give it a whitelist of things that you don't want to confirm (perhaps OK), and the tool also passes standing instructions to the LLM to never generate any command that deletes a file (which, like with any LLM, can be overridden, but it requires you to be very stubborn and to use more credits than you'd probably like).
All this put together means you can build something without writing any piece of code, provided you do it right.
A technically positive evolution
Don't go and say, "here's a 300-page document, read it and write whatever the document says". It will get it wrong, it will write a massive test suite that it will only run at the end, it will choke itself up trying to interpret the massive amount of failures it encounters, it will fill up its context window and it will start to forget some of the requirements. That won't work.
But what you can do -- what I did, in fact -- is this.
First, create an empty workspace. Don't put any code in it.
Then, tell the LLM to generate a backend framework using technology X and a frontend framework using technology Y that initially only says "hello, world". Also add tests to it, and run the tests.
It will do that. You'll not get much, but it will work.
Then, ask it to add some UI elements. A login page, perhaps. A navigation bar. Small things. Most of it doesn't have to be functional -- but tests must be there for the bits that are, and have it run the tests and evaluate the results.
Rinse, repeat, until you have a working application.
Importantly, in between the steps, you should also run the application yourself and see if the change was implemented correctly. Sometimes it won't be. Sometimes there will be a subtle bug -- I at one point had a the application hang after a few minutes. Sometimes you tell it that there's a subtle bug, and it will discover it more quickly than you could, and it will fix it, and in implementing the fix it will uncover another bug, and then you have to fix that one -- the fix it came up with for the hang was to move something to an async process on the server, which caused the application to start spinning while trying to create hundreds of async jobs (this is when I realized that the hang was a deadlock due to some part of the codebase doing something that indirectly triggered itself). Sometimes it will try to fix the bug you tell it about, and you'll see that it's going off on a tangent that has nothing to do with what you're seeing. It's important to keep an eye on what it's doing, so you can guide it back on track when that happens -- when I told it about the hang, it started investigating the part of the code which sends out emails, thinking that it could hang while waiting for sendmail to finish, but the hang was happening when the application was idle, not when it was sending out emails, and only when I told it about it happening when it was idle did it find the deadlock.
So it's not a fully automatic process, and it needs to be guided by someone who knows what they're doing. But if that is the case, you can come up with something that works. I spent evenings and breaks for about a week, and I managed to create a working application which, had I written it by hand, would have taken me a few months of full-time work to come up with. And I now have a side project, fully complete and working, that I had been thinking about doing for more than a decade, but never got around to actually doing, because of all the work that would be involved and I just didn't see myself having the time for.
It's not perfect code. But it's mostly good enough, and it will perform the job it needs to. And it looks far slicker than most of the side projects I've done in the past, because in the past I would prioritize between implementing new features or making something look slick, and I would decide that the new feature was more important because it's only for me and there's only me and nobody cares if it looks good or not and I don't have three weeks to come up with something that looks better. But here, I found myself sometimes spending 10 minutes writing a prompt with instructions on making things look better. Because what's 10 minutes when you just spent an hour writing down and refining specifications for functionality and tests?
There are a number of other things in which an LLM can help a programmer.
For instance.
I received a bug report recently in a project I'm paid to maintain that I couldn't make heads or tails of. I opened the source code in my windsurf IDE, pasted the bug report in the prompt, and then requested the tool to analyze the source code and the associated logs and tell me how the described behavior could be happening. It turned out that I had overlooked something, but with the help of the tool, I found the bug in minutes.
I was trying to understand a particular part of a large codebase that I didn't really grasp very well. I loaded the codebase in the tool, and asked it to explain to me how a particular action is performed by the code. I requested specific functions and line numbers. I now have a far better understanding of how the code works, and will be able to write that patch that I've been wanting to write for years -- without using the LLM.
I have been struggling for, literally, years with understanding why another tool that I maintain was misbehaving in a particular way but only in Firefox. I opened the codebase in Firefox, explained the buggy behavior in plain English, and asked it to explain how this could be happening. It picked up some obscure corner case behavior of ffmpeg and mp4 containers that I was not aware of and that perfectly explained why things were misbehaving in the way that they were.
At the same time, there are limitations. Giving an LLM a codebase that was originally generated by an LLM (either the same one or another one) seems to work well. Giving it a codebase that was written by a human and expecting it to correctly update it seems to be more error-prone. I did one or two of those as a trial, and it is more problematic than anything.
An LLM is also not intelligent, notwithstanding the popular term of "Artificial Intelligence". On multiple occasions, I've asked it to write a test case for some code that was not set up to do so; and rather than suggesting a refactor is required, it would instead copy the code that needed to be tested and then test the copy, rather than the original. The tool has made multiple similar errors. I have sometimes people describe agentic coding as "similar to interacting with junior programmers", but that is not the case. A junior programmer will either fill in the gaps in your specifications, or ask for clarification when something seems off. The LLM will not do that; it will do what you ask, exactly that and nothing more. If you missed a corner case in your specification, then all bets are off.
I remember learning about programming language generations in college. A first-generation language is "machine code", a second-generation language is "assembler", a third-generation language is any high-level language such as C, Perl, or Pascal. I've forgotten what set a 3rd-generation language apart from a 4th-generation language. But I remember the definition they gave me for a 5th-generation language: "you tell the computer what to do, and it will do it". At the time, I thought it was ridiculous. Nobody could ever write something like that.
But it's here.
And it's a threat to free software.
A threat to free software?
Yes.
There is the obvious part where most of the well-known LLMs are non-free software. I mean, there are some "open source" LLM models. The windsurf tool that I used doesn't allow you to use them (directly), but they're there. There are also open source applications that implement what the windsurf editor does. So it's definitely possible to work like this without resorting to non-free software and non-free services, even though the non-free LLMs might be a bit ahead of the curve of the free ones. But that's not what I mean.
And there is also the obvious thing which I mentioned earlier in this post, which is that the people who try to build LLMs are doing it in unethical, disgusting ways, causing downtimes and disregarding licenses for whatever they can get their grubby hands on. Ideally we wouldn't be in that situation, and ideally this wouldn't be a problem, but we are where we are.
And there's the obvious thing where the OSI sold itself out and declared that a machine learning program can be open source even when the very things it was built from -- the training data -- is not available. That's a major issue that the free software community needs to fight against, but there's not really anything that that is a threat to free software. You just build your own, free software, LLM, and you're done.
The actual threat is in funding and developer support.
Most large businesses do not care about free-as-in-freedom software. They like the free-as-in-beer part, and they appreciate that the free-as-in-freedom bits can make the software more customizable. They are (mostly) happy to do sponsorships of the free-as-in-freedom projects that they use if that means their free-as-in-beer usage of the software gets improved.
But why would you care about all that when you can just generate the code you need, rather than interacting with an open source community that may or may not care about your business's interests?
Where to go from here
Although I think the moral and environmental issues with LLMs are real and problematic, given the experiments I did I am not convinced that the concept of interacting with a computer system in natural language and to use it to generate code is necessarily deficient. There are pitfalls, but they can be managed. It is possible to use such a system to create throwaway, proof-of-concept type "good enough" code bases. It can be used to interpret code bases and to understand bug reports.
I believe that the major issue with LLMs has to do with that saying about hammers and nails:
If all you have is a hammer, then everything looks like a nail.
LLMs are an outgrowth of machine learning, pushed by large corporations. These large corporations have a lot of money. If all you have is money, then every problem can be fixed by throwing more money at it. The initial language models were promising but not (yet) good enough, and it seemed that one way in which they could be improved was to increase the scale of the statistics: throw more hardware (and thus money) at it, and rather than improving the efficiency of the models, just scale up.
Scaling up is something that megacorporations are very good at. It's only a money problem, after all. Does that mean that "scaling up" is the only way to improve the models, though? I'm not convinced.
Some hardware, such as most modern Apple and Samsung devices, ship with accelerator hardware for machine learning algorithms. There are some models that are small enough to be able to run on these devices. I don't see why it should not be possible to create a small(er) language model that can do some useful part of the above-described use cases; if not locally, then at least on a server that one can run on-prem rather than requiring that you pay rent to one of the LLM companies.
The Software Freedom Conservancy has published an aspirational statement on machine learning-assisted programming that, I think, gets a lot right. It's not quite a definition, but it's something to keep in mind.
Perhaps that's the way forward?
More questions than answers at this point, anyway.
19 Jun 2026 12:09pm GMT
Junichi Uekawa: looking for last.
looking for last. I realized it's gone. what's my replacement?
19 Jun 2026 5:30am GMT
Reproducible Builds (diffoscope): diffoscope 320 released
The diffoscope maintainers are pleased to announce the release of diffoscope version 320. This version includes the following changes:
[ Chris Lamb ]
* Support androguard 4 and previous versions. Thanks, linsui!
(Closes: #1140016)
* Use --long-form arguments when calling apktool in order to support apktool
version 3. Thanks again to linsui. (Closes: #1140015)
* Update copyright years.
You find out more by visiting the project homepage.
19 Jun 2026 12:00am GMT
18 Jun 2026
Planet Lisp
Joe Marshall: Controlled Unclassified Information
Back in the day, the US government had a program called SBIR (Small Business Innovation Research) that funded small businesses to do research and development. I recall sitting in our dorm in college, reading through a giant printed catalog of SBIR grants just to amuse ourselves by brainstorming solutions over bad pizza.
.
So, I got curious the other day: what does the SBIR landscape look like now?
I can tell you right now: do not even try to read an SBIR solicitation on your local machine. You are opening yourself up to a world of absolute, unmitigated pain.
You might think, what harm could there be in simply opening a file?
Well, in the modern compliance panopticon, any manipulation of digital information that comes from the govenment has the potential to spawn CUI (Controlled Unclassified Information). CUI is basically a digital pathogen; once you download that file, *anything whatsover* derived from it, including notes and metadata, instantly becomes CUI by association. The moment you read an SBIR on your computer, you've infected your system, rendering you subject to a nightmare of Byzantine federal regulations.
These days, the amount of beurocratic red tape surrounding CUI is insane. To even look at the file legally, you need a dedicated, air-gapped machine completely disconnected from the internet, conforming to a massive, expensive slew of NIST standards covering everything from hardware-level encryption to strict access controls. Alternatively you could contract with a cloud company that offers a pre-certified "CUI-compliant" environment.
And assuming you actually shell out the cash and jump through the hoops to set up this digital containment zone just to read a PDF, you must meticulously audit and account for every single action you take in its presence. Under current federal auditing logic, you are explicitly assumed to be attempting to defraud the government unless you can produce a mountain of paper proving otherwise. Want to bring in a partner to bounce ideas around? You can't just "know a guy." You have to navigate a labyrinth of federal subcontracting regulations.
I had intended on amusing myself by reading some SBIRs and daydreaming about solutions that might involve Lisp (an impossibility in the modern enterprise stack for entirely separate, depressing reasons). Instead, I quickly discovered I did not even own the physical hardware required to even read an SBIR without running afoul of federal regulations.
I wanted to read some clever and inspiring engineering proposals. I ended up reading a lot of very dry and boring compliance regulations.
18 Jun 2026 11:48am GMT
01 Jun 2026
Planet Lisp
Joe Marshall: Regression
Last year I wrote some Lisp related AI apps. There was a syntax highlighter that used the LLM to determine how to colorize and highlight syntax, and a prompt refiner that takes a wimpy LLM prompt and creates more elaborate prompt from them.
I took the apps down last week. They were `vibe coded' and therefore approximate and had bugs (but that's to be expected), but they had a security hole where you could hijack the LLM processing with your own prompt turning my app into an open relay using my API key. Last week I discovered that my AI spend on video creation was becoming serious. This is odd because I never create AI video. It turned out that my app was being hijacked by a proxy in Luxembourg and was generating videos on my dime.
So I shut down the apps. I knew they had the potential of being abused, and I was willing to tolerate a small amount of abuse, but it didn't occur to me that syntax highlighter could be hijacked to generate gigabytes of video at my expense. Future applications will be careful to obtain the API key from the user.
01 Jun 2026 7:00am GMT
31 May 2026
Planet Lisp
Joe Marshall: CLRHack: Meta-object Protocol
Metaobject Protocol (MOP) Implementation in CLRHack
The Metaobject Protocol in CLRHack is a high-performance implementation of the Common Lisp Object System (CLOS) integrated into the .NET 8.0 Common Language Runtime (CLR). It provides a complete meta-compilation pipeline that bridges the gap between dynamic Lisp semantics and the static CIL (Common Intermediate Language) execution model.
Core Architecture
The MOP is implemented through three primary layers:
- The Metaobject Hierarchy (C#): A set of foundational classes in
LispBaserepresenting classes, methods, generic functions, and slot definitions. - The Runtime Engine (
MopRuntime): A centralized orchestrator that manages class finalization, method combination, dispatch caching, and instance allocation. - The Compiler Bridge (Lisp): Transformations in
ast.lispthat translate high-level CLOS forms (defclass,defmethod) into optimized runtime calls.
Instance Representation
Because the CLR type system is strictly single-inheritance and statically defined, CLRHack decouples Lisp-level inheritance from C# inheritance. All CLOS instances are represented by the StandardObjectInstance class, which contains:
- A reference to its
ClassMetaobject. - A private
object[] storagearray for instance slots, indexed by locations calculated during class finalization.
The Dispatch Pipeline
Generic function invocation is the most complex part of the implementation. When a generic function is called:
- Cache Lookup: The
DiscriminatingFunctionfirst checks a thread-safedispatchCacheusing anInvocationCacheKey(a stack-allocatedstruct) to find a previously computed effective method. - Applicability & Precedence: If the cache misses, the runtime computes all applicable methods and sorts them based on specializer specificity and the Class Precedence List (CPL).
- Method Combination: The
ComputeEffectiveMethodlogic builds a nested execution chain following the Standard Method Combination rules::aroundmethods are called first, withcall-next-methodprogressing to the next around method or the main chain.- The main chain executes all
:beforemethods, the primary method, and finally all:aftermethods in reverse order.
- Fast Invocation: The resulting effective method is compiled into a
Func<object[], object>that uses direct delegate invocation to minimize overhead.
Challenges and Solutions
1. Thread-Safe Non-Local Exits (call-next-method)
Challenge: call-next-method and next-method-p require access to the current invocation's state (the remaining methods and original arguments). Passing this state through every function call would break compatibility with standard Lisp function signatures.
Solution: CLRHack utilizes [ThreadStatic] fields in MopRuntime to store the currentNextMethods and currentArguments. This ensures that even in highly concurrent environments (like a web server), each OS thread has its own isolated invocation context, allowing call-next-method to function correctly without state leakage.
2. Forward References and Lazy Finalization
Challenge: Lisp allows classes to refer to superclasses that haven't been defined yet. The runtime must handle these "zombie" classes without crashing the JIT compiler.
Solution: The system implements a ForwardReferencedClassMetaobject. When a class is defined, it is automatically finalized (computing its CPL and slot layout). If a superclass is missing, a forward reference is created. The EnsureFinalized protocol ensures that inheritance is resolved and slot locations are assigned the moment the class is first instantiated or used in dispatch.
3. Performance Overhead of the "MOP Bridge"
Challenge: A naive implementation of slot-value or generic dispatch using C# reflection or linear searches is orders of magnitude slower than native C# member access.
Solution: Three distinct optimizations were applied:
- O(1) Slot Access: Each
ClassMetaobjectmaintains aSlotDictionary. Slot names are mapped to physical array indices during finalization, allowingslot-valueto perform a direct array access after a single dictionary lookup. - Compiler Primitives: The compiler identifies
SLOT-VALUEandMAKE-INSTANCEcalls and emits direct CILcallinstructions to optimizedLisp.MopRuntimemethods, bypassing the generalFuncallpath. - Zero-Allocation Cache Hits: By making
InvocationCacheKeyareadonly structand avoiding the cloning of the argument array during cache probes, the hot-path for generic function dispatch generates zero garbage for the .NET Collector.
4. Bootstrapping the COMMON-LISP Package
Challenge: Core CLOS functions like make-instance must be available as symbols in the COMMON-LISP package before user code runs, but they rely on the MOP runtime being fully initialized.
Solution: A MopRuntime.Initialize() method is injected into the entry point (Main) of every generated assembly. This method interns the necessary symbols and binds them to GenericFunctionClosureAdapter objects, ensuring that the MOP is "alive" before the first line of Lisp code executes.
Vibe coding the MOP basically involved feeding chapters 4 and 5 of the Art of the Meta-Object Protocol into the LLM and telling it to make an implementation plan. It came up with a twenty-step plan to bootstrap CLOS. I then spent the rest of the day instructing an agent to take on each task of the twenty-step plan in sequential order. At the end of the day, I had a working MOP
This is the end of my series of posts on CLRHack.
31 May 2026 7:00am GMT
25 Apr 2026
FOSDEM 2026
All FOSDEM 2026 videos are online
All video recordings from FOSDEM 2026 that are worth publishing have been processed and released. Videos are linked from the individual schedule pages for the talks and the full schedule page. They are also available, organised by room, at video.fosdem.org/2026. While all released videos have been reviewed by a human, it remains possible that one or more issues fell through the cracks. If you notice any problem with a video you care about, please let us know as soon as possible so we can look into it before the video-processing infrastructure is shut down for this edition. To report any舰
25 Apr 2026 10:00pm GMT
29 Jan 2026
FOSDEM 2026
Join the FOSDEM Treasure Hunt!
Are you ready for another challenge? We're excited to host the second yearly edition of our treasure hunt at FOSDEM! Participants must solve five sequential challenges to uncover the final answer. Update: the treasure hunt has been successfully solved by multiple participants, and the main prizes have now been claimed. But the fun doesn't stop here. If you still manage to find the correct final answer and go to Infodesk K, you will receive a small consolation prize as a reward for your effort. If you're still looking for a challenge, the 2025 treasure hunt is still unsolved, so舰
29 Jan 2026 11:00pm GMT
26 Jan 2026
FOSDEM 2026
Call for volunteers
With FOSDEM just a few days away, it is time for us to enlist your help. Every year, an enthusiastic band of volunteers make FOSDEM happen and make it a fun and safe place for all our attendees. We could not do this without you. This year we again need as many hands as possible, especially for heralding during the conference, during the buildup (starting Friday at noon) and teardown (Sunday evening). No need to worry about missing lunch at the weekend, food will be provided. Would you like to be part of the team that makes FOSDEM tick?舰
26 Jan 2026 11:00pm GMT