03 Dec 2025

Planet Python

Caktus Consulting Group: LLM Basics: Ollama Function Calling

In our previous post, we introduced function calling and learned how to do it with OpenAI's LLMs. In this post, we'll call the same cactify_name function from that post using Meta's Llama 3.2 model, installed locally using Ollama. The techniques in this post should also work with other Ollama models that support function-calling.

03 Dec 2025 4:42pm GMT

Real Python: How to Use Google's Gemini CLI for AI Code Assistance

This tutorial will teach you how to use Gemini CLI to bring Google's AI-powered coding assistance directly into your terminal. After you authenticate with your Google account, this tool will be ready to help you analyze code, identify bugs, and suggest fixes, all without leaving your familiar development environment:

Gemini CLI

Imagine debugging code without switching between your console and browser, or picture getting instant explanations for unfamiliar projects. Like other command-line AI assistants, Google's Gemini CLI brings AI-powered coding assistance directly into your command line, allowing you to stay focused in your development workflow.

Whether you're troubleshooting a stubborn bug, understanding legacy code, or generating documentation, this tool acts as an intelligent pair-programming partner that understands your codebase's context.

You're about to install Gemini CLI, authenticate with Google's free tier, and put it to work on an actual Python project. You'll discover how natural language queries can help you understand code faster and catch bugs that might slip past manual review.

Prerequisites

To follow along with this tutorial, you'll need the following:

  • Google Account: A personal Google account is required to use Gemini CLI's free tier, which offers one thousand requests per day and sixty requests per minute at no charge.
  • Python 3.12 or Higher: You'll work with a Python command-line application to demonstrate Gemini CLI's capabilities. If you haven't already, install Python on your system, making sure the minimum version is Python 3.12.
  • Node.js 20 or Higher: Gemini CLI is distributed through npm, Node.js's package manager. You'll verify your Node.js installation in the next section.

Because Gemini CLI is a command-line tool, you should feel comfortable navigating your terminal and running basic shell commands.

Go ahead and download the supporting materials to get the Python project you'll be working with throughout this tutorial:

Once you've extracted the files, you'll find a todolist/ directory containing a complete Python CLI application, which is similar to the to-do app covered in another tutorial. This project will serve as your testing ground for Gemini CLI's code analysis and debugging features.


Step 1: Install and Set Up Gemini CLI

Before you can start using the AI-powered features of Gemini CLI, you need to get it installed on your system and authenticate with Google. In this step, you'll verify your Node.js installation, install Gemini CLI globally, and complete the authentication process to access the free tier.

Verify Your Node.js Installation

Gemini CLI is primarily implemented in TypeScript, which requires Node.js. You'll need Node.js version 20 or higher to run Gemini CLI. First, check if you have Node.js installed in the required version by opening your terminal and running this command:

Shell
$ node --version
v24.11.1

If you see a version number of 20 or higher, then you're all set. Otherwise, if you encounter a command not found error or have an older version, then you'll need to install or update Node.js before continuing.

Note: If you're on macOS or Linux, then you can leverage Homebrew to get Gemini CLI without having to install Node.js yourself.

The recommended approach to install Node.js is to use the Node Version Manager (nvm), which allows you to install and switch between multiple Node.js versions, much like pyenv does for Python. You can find detailed installation instructions for your operating system on the Node.js download page.
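For example, a minimal sketch of installing the current LTS release with nvm (assuming nvm itself is already installed):

Shell
$ nvm install --lts
$ nvm use --lts
$ node --version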

Once Node.js is installed, you'll also have access to the Node Package Manager (npm), which you'll use in the next step.

Install Gemini CLI Globally

With Node.js installed, you can now install Gemini CLI using npm. The -g flag installs the package globally, making the gemini command available from anywhere in your file system:

Shell
$ npm install -g @google/gemini-cli

Read the full article at https://realpython.com/how-to-use-gemini-cli/ »



03 Dec 2025 2:00pm GMT

Real Python: Quiz: How to Use Google's Gemini CLI for AI Code Assistance

In this quiz, you'll test your understanding of the How to Use Google's Gemini CLI for AI Code Assistance tutorial.

By working through these questions, you'll revisit how to install and verify prerequisites like Node.js, explore authentication options, and understand the CLI's permission and safety model. You'll also practice managing interactive sessions, enforcing stronger models, and approving or editing shell commands securely.

To deepen your understanding, review the sections on verifying your environment, authenticating safely, and running interactive prompts in the linked tutorial.



03 Dec 2025 12:00pm GMT

Django community aggregator: Community blog posts

Django: what’s new in 6.0

Django 6.0 was released today, starting another release cycle for the loved and long-lived Python web framework (now 20 years old!). It comes with a mosaic of new features, contributed to by many, some of which I am happy to have helped with. Below is my pick of highlights from the release notes.

Upgrade with help from django-upgrade

If you're upgrading a project from Django 5.2 or earlier, please try my tool django-upgrade. It will automatically update old Django code to use new features, fixing some deprecation warnings for you, including five fixers for Django 6.0. (One day, I'll propose django-upgrade to become an official Django project, when energy and time permit…)
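For example, one common way to run django-upgrade across a Git repository looks like this (a sketch; adjust the file selection and target version for your project):

$ git ls-files -z -- '*.py' | xargs -0 django-upgrade --target-version 6.0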

Template partials

There are four headline features in Django 6.0, which we'll cover before other notable changes, starting with this one:

The Django Template Language now supports template partials, making it easier to encapsulate and reuse small named fragments within a template file.

Partials are sections of a template marked by the new {% partialdef %} and {% endpartialdef %} tags. They can be reused within the same template or rendered in isolation. Let's look at examples for each use case in turn.

Reuse partials within the same template

The below template reuses a partial called filter_controls within the same template. It's defined once at the top of the template, then used twice later on. Using a partial allows the template to avoid repetition without pushing the content into a separate include file.

<section id=videos>
  {% partialdef filter_controls %}
    <form>
      {{ filter_form }}
    </form>
  {% endpartialdef %}

  {% partial filter_controls %}

  <ul>
    {% for video in videos %}
      <li>
        <h2>{{ video.title }}</h2>
        ...
      </li>
    {% endfor %}
  </ul>

  {% partial filter_controls %}
</section>

Actually, we can simplify this pattern further, by using the inline option on the partialdef tag, which causes the definition to also render in place:

<section id=videos>
  {% partialdef filter_controls inline %}
    <form>
      {{ filter_form }}
    </form>
  {% endpartialdef %}

  <ul>
    {% for video in videos %}
      <li>
        <h2>{{ video.title }}</h2>
        ...
      </li>
    {% endfor %}
  </ul>

  {% partial filter_controls %}
</section>

Reach for this pattern any time you find yourself repeating template code within the same template. Because partials can use variables, you can also use them to de-duplicate when rendering similar controls with different data.
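For instance, here's a sketch of that idea: a hypothetical video_card partial reads the video loop variable from the surrounding context, so the same fragment can serve two different lists (recent_videos and popular_videos are assumed context variables):

{% partialdef video_card %}
  <li>
    <h2>{{ video.title }}</h2>
    {{ video.view_count }} views
  </li>
{% endpartialdef %}

<h2>Recent</h2>
<ul>
  {% for video in recent_videos %}
    {% partial video_card %}
  {% endfor %}
</ul>

<h2>Popular</h2>
<ul>
  {% for video in popular_videos %}
    {% partial video_card %}
  {% endfor %}
</ul>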

Render partials in isolation

The below template defines a view_count partial that's intended to be re-rendered in isolation. It uses the inline option, so when the whole template is rendered, the partial is included.

The page uses htmx, via my django-htmx package, to periodically refresh the view count, through the hx-* attributes. The request from htmx goes to a dedicated view that re-renders the view_count partial.

{% load django_htmx %}
<!doctype html>
<html>
  <body>
    <h1>{{ video.title }}</h1>
    <video width=1280 height=720 controls>
      <source src="{{ video.file.url }}" type="video/mp4">
      Your browser does not support the video tag.
    </video>

    {% partialdef view_count inline %}
    <section
      class=view-count
      hx-trigger="every 1s"
      hx-swap=outerHTML
      hx-get="{% url 'video-view-count' video.id %}"
    >
      {{ video.view_count }} views
    </section>
    {% endpartialdef %}

    {% htmx_script %}
  </body>
</html>

The relevant code for the two views could look like this:

from django.shortcuts import render


def video(request, video_id):
    ...
    return render(request, "video.html", {"video": video})


def video_view_count(request, video_id):
    ...
    return render(request, "video.html#view_count", {"video": video})

The initial video view renders the full template video.html. The video_view_count view renders just the view_count partial, by appending #view_count to the template name. This syntax is similar to how you'd reference an HTML fragment by its ID in a URL.

History

htmx was the main motivation for this feature, as promoted by htmx creator Carson Gross in a cross-framework review post. Using partials definitely helps maintain "Locality of behaviour" within your templates, easing authoring, debugging, and maintenance by avoiding template file sprawl.

Django's support for template partials was initially developed by Carlton Gibson in the django-template-partials package, which remains available for older Django versions. The integration into Django itself was done in a Google Summer of Code project this year, worked on by student Farhan Ali and mentored by Carlton, in Ticket #36410. You can read more about the development process in Farhan's retrospective blog post. Many thanks to Farhan for authoring, Carlton for mentoring, and Natalia Bidart, Nick Pope, and Sarah Boyce for reviewing!

Tasks framework

The next headline feature we're covering:

Django now includes a built-in Tasks framework for running code outside the HTTP request-response cycle. This enables offloading work, such as sending emails or processing data, to background workers.

Basically, there's a new API for defining and enqueuing background tasks. Very cool!

Background tasks are a way of running code outside of the request-response cycle. They're a common requirement in web applications, used for sending emails, processing images, generating reports, and more.

Historically, Django has not provided any system for background tasks, and kind of ignored the problem space altogether. Developers have instead relied on third-party packages like Celery or Django Q2. While these systems are fine, they can be complex to set up and maintain, and often don't "go with the grain" of Django.

The new Tasks framework fills this gap by providing an interface to define background tasks, which task runner packages can then integrate with. This common ground allows third-party Django packages to define tasks in a standard way, assuming you'll be using a compatible task runner to execute them.

Define tasks with the new @task decorator:

from django.tasks import task


@task
def resize_video(video_id): ...

…and enqueue them for background execution with the Task.enqueue() method:

from example.tasks import resize_video


def upload_video(request):
    ...
    resize_video.enqueue(video.id)
    ...

Execute tasks

At this time, Django does not include a production-ready task backend, only two that are suitable for development and testing:

  • ImmediateBackend - runs tasks synchronously, blocking until they complete.
  • DummyBackend - does nothing when tasks are enqueued, but allows them to be inspected later. Useful for tests, where you can assert that tasks were enqueued without actually running them.

For production use, you'll need to use a third-party package that implements one, for which django-tasks, the reference implementation, is the primary option. It provides DatabaseBackend for storing tasks in your SQL database, a fine solution for many projects, avoiding extra infrastructure and allowing atomic task enqueuing within database transactions. We may see this backend merged into Django in due course, or at least become an official package, to help make Django "batteries included" for background tasks.

To use django-tasks' DatabaseBackend today, first install the package:

$ uv add django-tasks

Second, add these two apps to your INSTALLED_APPS setting:

INSTALLED_APPS = [
    # ...
    "django_tasks",
    "django_tasks.backends.database",
    # ...
]

Third, configure DatabaseBackend as your tasks backend in the new TASKS setting:

TASKS = {
    "default": {
        "BACKEND": "django_tasks.backends.database.DatabaseBackend",
    },
}

Fourth, run migrations to create the necessary database tables:

$ ./manage.py migrate

Finally, to run the task worker process, use the package's db_worker management command:

$ ./manage.py db_worker
Starting worker worker_id=jWLMLrms3C2NcUODYeatsqCFvd5rK6DM queues=default

This process runs indefinitely, polling for tasks and executing them, logging events as it goes:

Task id=10b794ed-9b64-4eed-950c-fcc92cd6784b path=example.tasks.echo state=RUNNING
Hello from test task!
Task id=10b794ed-9b64-4eed-950c-fcc92cd6784b path=example.tasks.echo state=SUCCEEDED

You'll want to run db_worker in production, and also in development if you want to test background task execution.

History

It's been a long path to get the Tasks framework into Django, and I'm super excited to see it finally available in Django 6.0. Jake Howard started on the idea for Wagtail, a Django-powered CMS, back in 2021, as they have a need for common task definitions across their package ecosystem. He upgraded the idea to target Django itself in 2024, when he proposed DEP 0014. As a member of the Steering Council at the time, I had the pleasure of helping review and accept the DEP.

Since then, Jake has been leading the implementation effort, building pieces first in the separate django-tasks package before preparing them for inclusion in Django itself. This step was done under Ticket #35859, with a pull request that took nearly a year to review and land. Thanks to Jake for his perseverance here, and to all reviewers: Andreas Nüßlein, Dave Gaeddert, Eric Holscher, Jacob Walls, Jake Howard, Kamal Mustafa, @rtr1, @tcely, Oliver Haas, Ran Benita, Raphael Gaschignard, and Sarah Boyce.

Read more about this feature and story in Jake's post celebrating when it was merged.

Content Security Policy support

Our third headline feature:

Built-in support for the Content Security Policy (CSP) standard is now available, making it easier to protect web applications against content injection attacks such as cross-site scripting (XSS). CSP allows declaring trusted sources of content by giving browsers strict rules about which scripts, styles, images, or other resources can be loaded.

I'm really excited about this, because I'm a bit of a security nerd who's been deploying CSP for client projects for years.

CSP is a security standard that can protect your site from cross-site scripting (XSS) and other code injection attacks. You set a content-security-policy header to declare which content sources are trusted for your site, and then browsers will block content from other sources. For example, you might declare that only scripts from your own domain are allowed, so an attacker who manages to inject a <script> tag pointing to evil.com would be thwarted, as the browser would refuse to load it.
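As a concrete illustration, a policy allowing scripts only from your own origin is delivered as a response header like this (an example value, not something Django sets by default):

content-security-policy: script-src 'self'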

Previously, Django had no built-in support for CSP, and developers had to rely on building their own, or using a third-party package like the very popular django-csp. But this was a little bit inconvenient, as it meant that other third-party packages couldn't reliably integrate with CSP, as there was no common API to do so.

The new CSP support provides all the core features that django-csp did, with a slightly tidier and more Djangoey API. To get started, first add ContentSecurityPolicyMiddleware to your MIDDLEWARE setting:

MIDDLEWARE = [
    # ...
    "django.middleware.csp.ContentSecurityPolicyMiddleware",
    # ...
]

Place it next to SecurityMiddleware, as it similarly adds security-related headers to all responses. (You do have SecurityMiddleware enabled, right?)

Second, configure your CSP policy using the new settings:

  • SECURE_CSP to configure the content-security-policy header, which is your actively enforced policy.
  • SECURE_CSP_REPORT_ONLY to configure the content-security-policy-report-only header, which sets a non-enforced policy for which browsers report violations to a specified endpoint. This option is useful for testing and monitoring a policy before enforcing it.

For example, to adopt the nonce-based strict CSP recommended by web.dev, you could start with the following setting:

from django.utils.csp import CSP

SECURE_CSP_REPORT_ONLY = {
    "script-src": [CSP.NONCE, CSP.STRICT_DYNAMIC],
    "object-src": [CSP.NONE],
    "base-uri": [CSP.NONE],
}

The CSP enum used above provides constants for CSP directives, to help avoid typos.

This policy is quite restrictive and will break most existing sites if deployed as-is, because it requires nonces, as covered next. That's why the example starts with the report-only header, to help track down places that need fixing before enforcing the policy. You'd later switch to the SECURE_CSP setting to enforce the policy.
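For example, the enforced version of the same policy would look like this (a sketch that simply mirrors the report-only setting above):

from django.utils.csp import CSP

SECURE_CSP = {
    "script-src": [CSP.NONCE, CSP.STRICT_DYNAMIC],
    "object-src": [CSP.NONE],
    "base-uri": [CSP.NONE],
}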

Anyway, those are the two basic steps to set up the new CSP support!

Nonce generation

A key part of the new feature is that nonce generation is now built into Django when using the CSP middleware. Nonces are a security feature in CSP that allow you to mark specific <script> and <style> tags as trusted with a nonce attribute:

<script src=/static/app.js type=module nonce=55vsH4w7ATHB85C3MbPr_g></script>

The nonce value is randomly generated per-request, and included in the CSP header. An attacker performing content injection couldn't guess the nonce, so browsers can trust only those tags that include the correct nonce. Because nonce generation is now part of Django, third-party packages can depend on it for their <script> and <style> tags and they'll continue to work if you adopt CSP with nonces.

Nonces are the recommended way to use CSP today, avoiding problems with previous allow-list based approaches. That's why the above recommended policy enables them. To adopt a nonce-based policy, you'll need to annotate your <script> and <style> tags with the nonce value through the following steps.

First, add the new csp template context processor to your TEMPLATES setting:

TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        "OPTIONS": {
            "context_processors": [
                # ...
                "django.template.context_processors.csp",
            ],
        },
    },
]

Second, annotate your <script> and <style> tags with nonce="{{ csp_nonce }}":

-   <script src="{% static 'app.js' %}" type="module"></script>
+   <script src="{% static 'app.js' %}" type="module" nonce="{{ csp_nonce }}"></script>

This can be tedious and error-prone, hence using the report-only mode first to monitor violations might be useful, especially on larger projects.

Anyway, deploying CSP right would be another post in itself, or even a book chapter, so we'll stop here for now. For more info, check out that web.dev article and the MDN CSP guide.

History

CSP itself was proposed for browsers way back in 2004, and was first implemented in Mozilla Firefox version 4, released 2011. That same year, Django Ticket #15727 was opened, proposing adding CSP support to Django. Mozilla created django-csp from 2010, before the first public availability of CSP, using it on their own Django-powered sites. The first comment on Ticket #15727 pointed to django-csp, and the community basically rolled with it as the de facto solution.

Over the years, CSP itself evolved, as did django-csp, with Rob Hudson ending up as its maintainer. Focusing on the package motivated him to finally get CSP into Django itself. He made a draft PR and posted on Ticket #15727 in 2024, which I enjoyed helping review. He iterated on the PR over the next 13 months until it was finally merged for Django 6.0. Thanks to Rob for his heroic dedication here, and to all reviewers: Benjamin Balder Bach, Carlton Gibson, Collin Anderson, David Sanders, David Smith, Florian Apolloner, Harro van der Klauw, Jake Howard, Natalia Bidart, Paolo Melchiorre, Sarah Boyce, and Sébastien Corbin.

Email API updates

The fourth and final headline feature:

Email handling in Django now uses Python's modern email API, introduced in Python 3.6. This API, centered around the email.message.EmailMessage class, offers a cleaner and Unicode-friendly interface for composing and sending emails.

This is a major change, but it's unlikely to affect projects using basic email features. You can still use Django's send_mail() function and EmailMessage class as before, like:

from django.core.mail import EmailMessage

email = EmailMessage(
    subject="🐼 Need more bamboo",
    body="We are desperately low, please restock before the pandas find out!",
    from_email="zookeeper@example.com",
    to=["supplies@example.com"],
)
email.attach_file("/media/bamboo_cupboard.jpg")
email.send()

The key change is that, under the hood, when you call send() on a Django EmailMessage object, it now translates itself into Python's newer email.message.EmailMessage type before sending.

Modernizing provides these benefits:

  1. Fewer bugs - many edge case bugs in Python's old email API have been fixed in the new one.
  2. Django is less hacky - a bunch of workarounds and security fixes in Django's email code have been removed.
  3. More convenient API - the new API supports some niceties, like the below inline attachment example.

Easier inline attachments with MIMEPart

Django's EmailMessage.attach() method allows you to add attachments to a message. Emails support images as inline attachments, which can be displayed within the HTML email body.

While you could previously use EmailMessage.attach() to add inline attachments, it was a bit fiddly, using a legacy class. Now, you can call the method with a Python email.message.MIMEPart object to add an inline attachment in a few steps:

import email.utils
from email.message import MIMEPart
from django.core.mail import EmailMultiAlternatives

message = EmailMultiAlternatives(
    subject="Cute Panda Alert",
    body="Here's a cute panda picture for you!",
    from_email="cute@example.com",
    to=["fans@example.com"],
)
with open("panda.jpg", "rb") as f:
    panda_jpeg = f.read()

cid = email.utils.make_msgid()
inline_image = MIMEPart()
inline_image.set_content(
    panda_jpeg,
    maintype="image",
    subtype="jpeg",
    disposition="inline",
    cid=cid,
)
message.attach(inline_image)
message.attach_alternative(
    f'<h1>Cute panda baby alert!</h1><img src="cid:{cid[1:-1]}">',
    "text/html",
)

It's not the simplest API, but it does expose all the power of the underlying email system, and it's better than the past situation.

History

The new email API was added to Python as provisional in version 3.4 (2014), and made stable in version 3.6 (2016). The legacy API, however, was never planned for deprecation, so there was never any deadline to upgrade Django's email handling.

In 2024, Mike Edmunds posted on the (old) django-developers mailing list, proposing the upgrade with strong reasoning and planning. This conversation led to Ticket #35581, which he worked on for eight months until it was merged. Many thanks to Mike for leading this effort, and to Sarah Boyce for reviewing! Email is not a glamorous feature, but it's a critical communication channel for nearly every Django project, so props for this.

Positional arguments in django.core.mail APIs

We're now out of the headline features and onto the "minor" changes, starting with this deprecation related to the above email changes:

django.core.mail APIs now require keyword arguments for less commonly used parameters. Using positional arguments for these now emits a deprecation warning and will raise a TypeError when the deprecation period ends:

  • All optional parameters (fail_silently and later) must be passed as keyword arguments to get_connection(), mail_admins(), mail_managers(), send_mail(), and send_mass_mail().
  • All parameters must be passed as keyword arguments when creating an EmailMessage or EmailMultiAlternatives instance, except for the first four (subject, body, from_email, and to), which may still be passed either as positional or keyword arguments.

Previously, Django would let you pass all parameters positionally, which gets a bit silly and hard to read with long parameter lists, like:

from django.core.mail import send_mail

send_mail(
    "🐼 Panda of the week",
    "This week's panda is Po Ping, sha-sha booey!",
    "updates@example.com",
    ["adam@example.com"],
    True,
)

The final True doesn't provide any clue what it means without looking up the function signature. Now, using positional arguments for those less-commonly-used parameters raises a deprecation warning, nudging you to write:

from django.core.mail import send_mail

send_mail(
    "🐼 Panda of the week",
    "This week's panda is Po Ping, sha-sha booey!",
    "updates@example.com",
    ["adam@example.com"],
    fail_silently=True,
)

This change is appreciated for API clarity, and Django is generally moving towards using keyword-only arguments more often. django-upgrade can automatically fix this one for you, via its mail_api_kwargs fixer.

Thanks to Mike Edmunds, again, for making this improvement in Ticket #36163.

Extended automatic shell imports

Next up:

Common utilities, such as django.conf.settings, are now automatically imported to the shell by default.

One of the headline features back in Django 5.2 was automatic model imports in the shell, making ./manage.py shell import all of your models automatically. Building on that DX boost, Django 6.0 now also imports other common utilities. You can see the full list by running ./manage.py shell with -v 2:

$ ./manage.py shell -v 2
6 objects imported automatically:

  from django.conf import settings
  from django.db import connection, models, reset_queries
  from django.db.models import functions
  from django.utils import timezone

...

(This is from a project without any models, so only the utilities are listed.)

So that's:

  • settings, useful for checking your runtime configuration:

    In [1]: settings.DEBUG
    Out[1]: False
    
  • connection and reset_queries(), great for checking the executed queries:

    In [1]: Book.objects.select_related('author')
    Out[1]: <QuerySet []>
    
    In [2]: connection.queries
    Out[2]:
    [{'sql': 'SELECT "example_book"."id", "example_book"."title", "example_book"."author_id", "example_author"."id", "example_author"."name" FROM "example_book" INNER JOIN "example_author" ON ("example_book"."author_id" = "example_author"."id") LIMIT 21',
      'time': '0.000'}]
    
  • models and functions, useful for advanced ORM work:

    In [1]: Book.objects.annotate(
       ...:   title_lower=functions.Lower("title")
       ...: ).filter(
       ...:   title_lower__startswith="a"
       ...: ).count()
    Out[1]: 71
    
  • timezone, useful for using Django's timezone-aware date and time utilities:

    In [1]: timezone.now()
    Out[1]: datetime.datetime(2025, 12, 1, 23, 42, 22, 558418, tzinfo=datetime.timezone.utc)
    

It remains possible to extend the automatic imports with whatever you'd like, as documented on the "How to customize the shell command" page.
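For instance, here's a sketch of that documented approach: subclass the shell command in one of your apps and extend get_auto_imports() (the example.utils module is a hypothetical stand-in for your own code):

# example/management/commands/shell.py
from django.core.management.commands import shell


class Command(shell.Command):
    def get_auto_imports(self):
        # Keep Django's defaults and add project-specific dotted paths.
        return super().get_auto_imports() + [
            "django.urls.reverse",
            "example.utils",
        ]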

Salvo Polizzi contributed the original automatic shell imports feature in Django 5.2. He then returned to offer these extra imports for Django 6.0, in Ticket #35680. Thanks to everyone who contributed to the forum discussion agreeing on which imports to add, and to Natalia Bidart and Sarah Boyce for reviewing!

Dynamic field refresh on save()

Now let's discuss a series of ORM improvements, starting with this big one:

GeneratedFields and fields assigned expressions are now refreshed from the database after save() on backends that support the RETURNING clause (SQLite, PostgreSQL, and Oracle). On backends that don't support it (MySQL and MariaDB), the fields are marked as deferred to trigger a refresh on subsequent accesses.

Django models support having the database generate field values for you in three cases:

  1. The db_default field option, which lets the database generate the default value when creating an instance:

    from django.db import models
    from django.db.models.functions import Now
    
    
    class Video(models.Model):
        ...
        created = models.DateTimeField(db_default=Now())
    
  2. The GeneratedField field type, which is always computed by the database based on other fields in the same instance:

    from django.db import models
    from django.db.models.functions import Concat
    
    
    class Video(models.Model):
        ...
        full_title = models.GeneratedField(
            models.TextField(),
            expression=Concat(
                "title",
                models.Value(" - "),
                "subtitle",
            ),
        )
    
  3. Assigning expression values to fields before saving:

    from django.db import models
    from django.db.models.functions import Now
    
    
    class Video(models.Model):
        ...
        last_updated = models.DateTimeField()
    
    
    video = Video.objects.get(id=1)
    ...
    video.last_updated = Now()
    video.save()
    

Previously, only the first method, using db_default, would refresh the field value from the database after saving. The other two methods would leave you with only the old value or the expression object, meaning you'd need to call Model.refresh_from_db() to get any updated value if necessary. This was hard to remember, and it cost an extra database query.
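In other words, before Django 6.0 you'd write something like this sketch to see the database-computed value:

video = Video.objects.get(id=1)
video.last_updated = Now()
video.save()
video.refresh_from_db()  # Extra query, and easy to forget
print(video.last_updated)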

Now Django takes advantage of the RETURNING SQL clause to save the model instance and fetch updated dynamic field values in a single query, on backends that support it (SQLite, PostgreSQL, and Oracle). A save() call may now issue a query like:

UPDATE "example_video"
SET "last_updated" = NOW()
WHERE "example_video"."id" = 1
RETURNING "example_video"."last_updated"

Django puts the return value into the model field, so you can read it immediately after saving:

video = Video.objects.get(id=1)
...
video.last_updated = Now()
video.save()
print(video.last_updated)  # Updated value from the database

On backends that don't support RETURNING (MySQL and MariaDB), Django now marks the dynamic fields as deferred after saving. That way, the later access, as in the above example, will automatically call Model.refresh_from_db(). This ensures that you always read the updated value, even if it costs an extra query.

History

This feature was proposed in Ticket #27222 way back in 2016, by Anssi Kääriäinen. It sat dormant for most of the nine years since, but ORM boss Simon Charette picked it up earlier this year, found an implementation, and pushed it through to completion. Thanks to Simon for continuing to push the ORM forward, and to all reviewers: David Sanders, Jacob Walls, Mariusz Felisiak, nessita, Paolo Melchiorre, Simon Charette, and Tim Graham.

Universal StringAgg aggregate

The next ORM change:

The new StringAgg aggregate returns the input values concatenated into a string, separated by the delimiter string. This aggregate was previously supported only for PostgreSQL.

This aggregate is often used for making comma-separated lists of related items, among other things. Previously, it was only supported on PostgreSQL, as part of django.contrib.postgres:

from django.contrib.postgres.aggregates import StringAgg
from example.models import Video

videos = Video.objects.annotate(
    chapter_ids=StringAgg("chapter", delimiter=","),
)

for video in videos:
    print(f"Video {video.id} has chapters: {video.chapter_ids}")

…which might give you output like:

Video 104 has chapters: 71,72,74
Video 107 has chapters: 88,89,138,90,91,93

Now this aggregate is available on all database backends supported by Django, imported from django.db.models:

from django.db.models import StringAgg, Value
from example.models import Video

videos = Video.objects.annotate(
    chapter_ids=StringAgg("chapter", delimiter=Value(",")),
)

for video in videos:
    print(f"Video {video.id} has chapters: {video.chapter_ids}")

Note the delimiter argument now requires a Value() expression wrapper for literal strings, as above. This change allows you to use database functions or fields as the delimiter if desired.
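Because the universal StringAgg builds on the general Aggregate.order_by support (see the history below), you can also control the concatenation order. Here's a sketch that orders the IDs by chapter:

from django.db.models import StringAgg, Value
from example.models import Video

videos = Video.objects.annotate(
    chapter_ids=StringAgg(
        "chapter",
        delimiter=Value(","),
        order_by="chapter",  # Order the concatenated IDs by chapter ID
    ),
)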

While most Django projects stick to PostgreSQL, having this aggregate available on all backends is a nice improvement for cross-database compatibility, and it means third-party packages can use it without affecting their database support.

History

The PostgreSQL-specific StringAgg was added way back in Django 1.9 (2015) by Andriy Sokolovskiy, in Ticket #24301. In Ticket #35444, Chris Muthig proposed adding the Aggregate.order_by option, something used by StringAgg to specify the ordering of concatenated elements, and as a side effect this made it possible to generalize StringAgg to all backends.

Thanks to Chris for proposing and implementing this change, and to all reviewers: Paolo Melchiorre, Sarah Boyce, and Simon Charette.

BigAutoField as the default primary key type

Next up:

DEFAULT_AUTO_FIELD setting now defaults to BigAutoField

This important change helps lock in scalable larger primary keys.

Django 3.2 (2021) introduced the DEFAULT_AUTO_FIELD setting for changing the default primary key type used in models. Django uses this setting to add a primary key field called id to models that don't explicitly define a primary key field. For example, if you define a model like this:

from django.db import models


class Video(models.Model):
    title = models.TextField()

…then it will have two fields: id and title, where id uses the type defined by DEFAULT_AUTO_FIELD.

The setting can also be overridden on a per-app basis by defining AppConfig.default_auto_field in the app's apps.py file:

from django.apps import AppConfig


class ChannelConfig(AppConfig):
    name = "channel"
    default_auto_field = "django.db.models.BigAutoField"

A key motivation for adding the setting was to allow projects to switch from AutoField (a 32-bit integer) to BigAutoField (a 64-bit integer) for primary keys, without needing changes to every model. AutoField can store values up to about 2.1 billion, which sounds large but it becomes easy to hit at scale. BigAutoField can store values up to about 9.2 quintillion, which is "more than enough" for every practical purpose.
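Those limits are simply the maximum values of signed 32-bit and 64-bit integers:

>>> 2**31 - 1  # AutoField: 32-bit signed integer maximum
2147483647
>>> 2**63 - 1  # BigAutoField: 64-bit signed integer maximum
9223372036854775807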

If a model using AutoField hits its maximum value, it can no longer accept new rows, a problem known as primary key exhaustion. The table is effectively blocked, requiring an urgent fix to switch the model from AutoField to BigAutoField via a locking database migration on a large table. For a great watch on how Kraken is fixing this problem, see Tim Bell's DjangoCon Europe 2025 talk, detailing some clever techniques to proactively migrate large tables with minimal downtime.

To stop this problem arising for new projects, Django 3.2 made new projects created with startproject set DEFAULT_AUTO_FIELD to BigAutoField, and new apps created with startapp set their AppConfig.default_auto_field to BigAutoField. It also added a system check to ensure that projects set DEFAULT_AUTO_FIELD explicitly, to ensure users were aware of the feature and could make an informed choice.

Now Django 6.0 changes the actual default values of the setting and app config attribute to BigAutoField. Projects using BigAutoField can remove the setting:

-DEFAULT_AUTO_FIELD = "django.db.models.BigAutoField"

…and app config attribute:

from django.apps import AppConfig

 class ChannelConfig(AppConfig):
     name = "channel"
-    default_auto_field = "django.db.models.BigAutoField"

The default startproject and startapp templates also no longer set these values. This change reduces the amount of boilerplate in new projects, and the problem of primary key exhaustion can fade into history, becoming something that most Django users no longer need to think about.

History

The addition of DEFAULT_AUTO_FIELD in Django 3.2 was proposed by Caio Ariede and implemented by Tom Forbes, in Ticket #31007. This new change in Django 6.0 was proposed and implemented by ex-Fellow Tim Graham, in Ticket #36564. Thanks to Tim for spotting that this cleanup was now possible, and to Jacob Walls and Clifford Gama for reviewing!

Template variable forloop.length

Moving on to templates, let's start with this nice little addition:

The new variable forloop.length is now available within a for loop.

This small extension makes it possible to write a template loop like this:

<ul>
  {% for goose in geese %}
    <li>
      <strong>{{ forloop.counter }}/{{ forloop.length }}</strong>: {{ goose.name }}
    </li>
  {% endfor %}
</ul>

Previously, you'd need to refer to the length in another way, like {{ geese|length }}, which is a bit less flexible.

Thanks to Jonathan Ströbele for contributing this idea and implementation in Ticket #36186, and to David Smith, Paolo Melchiorre, and Sarah Boyce for reviewing.

querystring template tag enhancements

There are two extensions to the querystring template tag, which was added in Django 5.1 to help with building links that modify the current request's query parameters.

  1. Release note:

    The querystring template tag now consistently prefixes the returned query string with a ?, ensuring reliable link generation behavior.

    This small change improves how the tag behaves when an empty mapping of query parameters is provided. Say you had a template like this:

    <a href="{% querystring params %}">Reset search</a>
    

    …where params is a dictionary that may sometimes be empty. Previously, if params was empty, the output would be:

    <a href="">Reset search</a>
    

    Browsers treat this as a link to the same URL including the query parameters, so it would not clear the query parameters as intended. Now, with this change, the output will be:

    <a href="?">Reset search</a>
    

    Browsers treat ? as a link to the same URL without any query parameters, clearing them as the user would expect.

    Thanks to Django Fellow Sarah Boyce for spotting this improvement and implementing the fix in Ticket #36268, and to Django Fellow Natalia Bidart for reviewing!

  2. Release note:

    The querystring template tag now accepts multiple positional arguments, which must be mappings, such as QueryDict or dict.

    This enhancement allows the tag to merge multiple sources of query parameters when building the output. For example, you might have a template like this:

    <a href="{% querystring request.GET super_search_params %}">Super search</a>
    

    …where super_search_params is a dictionary of extra parameters to add to make the current search "super". The tag merges the two mappings, with later mappings taking precedence for duplicate keys.

    Thanks again to Sarah Boyce for proposing this improvement in Ticket #35529, to Giannis Terzopoulos for implementing it, and to Natalia Bidart, Sarah Boyce, and Tom Carrick for reviewing!

Fin

That's a wrap! Thank you for reading my highlights. There are plenty more changes to read about in the release notes.

Also, there are always many more behind-the-scenes improvements and bug fixes that don't make it into the release notes. Optimizations and micro-improvements get merged all the time, so don't delay, upgrade today!

Thank you to all 174 people who contributed to Django 6.0, as counted in this list by Mariusz Felisiak.

May your upgrade be swift, smooth, safe, and secure,

-Adam

03 Dec 2025 6:00am GMT

01 Dec 2025

Django community aggregator: Community blog posts

Safe Django database migrations on Coolify

Deploying Django on Coolify has been fantastic so far: automatic builds, atomic releases, ephemeral containers, and simple scaling. In a previous article I explained how I set up my Django projects in Coolify, including the fact that migrations run as part of the build step. This works great for most migrations: failures abort the build, no partial deploys, no surprises.

But it breaks down for one specific class of migrations: schema-destructive migrations, i.e. migrations that remove fields, drop tables, rename columns, or otherwise make the existing running code incompatible with the current database schema.

Coolify builds your new container, applies the migrations, and only afterwards starts replacing the running container with the new one. During that in-between period (usually 1-2 minutes depending on your image size) your old Django container is still running code that expects the old schema. If a column has just been removed… boom: 500 errors for users until the new container becomes healthy and takes over.

Why you can't just move the migration step

Your first instinct might be to change when the migrations run. Let's look at some alternatives and why they don't solve the root problem.

Strategy A: the post-deploy hook

You could remove migrate from the Dockerfile and run it in a "Post-Deploy" hook (or as part of the container startup command). This means migrations run after the new container starts or just before the switch-over.

This fixes the "removing a field" problem, because the schema change happens after all old containers are gone. But it breaks the "adding a field" problem: if you add a new required column, the new code in the new container might start up and try to query that column before the migration finishes. Result: the new app crashes on startup, and the deployment fails.

Another issue: if migrations fail, your deployment still succeeds, leaving your app running with mismatched code and schema.

Verdict: worse than before.

Strategy B: separate migration job

You could create a second service in the same project that runs migrations as a one-shot job.

The problem is that you still have to choose: run it before the deploy, or after?

You've essentially re-implemented either Strategy A or the original "migrate during build" approach, just in a second container. The core compatibility problem remains.

Verdict: more complex and still unsafe.

Strategy C: blue/green databases

This is the most elaborate workaround: copy production to a new database, run migrations there, and then point the new code to the new DB.

This is incredibly complex to manage for data consistency, and it would be catastrophic for any e-commerce site:

10:00:00 - Start cloning production DB
10:00:30 - User places $500 order (written to old DB)
10:01:00 - Clone complete, run migrations on clone
10:01:30 - User registers account (written to old DB)
10:02:00 - Switch to new DB
10:02:01 - Previous order and user… GONE 💀

You're solving the wrong problem with a rocket launcher. This is enterprise-grade DevOps machinery for a basic schema compatibility issue.

Verdict: massive complexity, no real benefit.

The real solution: the two-phase deploy

The hard truth is that database migrations cannot be strictly zero-downtime if they break backward compatibility. The root problem is not when migrations run. The problem is that old and new code are talking to the same database during the deploy, so the schema must be compatible with both.

The solution is not infrastructure; it's application patterns. The key idea: Decouple the code change from the schema change. This is known as the Two-Phase Deploy Pattern, also called expand-and-contract, non-breaking migrations, or safe migrations.

Example 1: removing a field safely

Let's say we want to remove the phone_number field from our User model. The simplest way is sadly the wrong way: delete the field from models.py, run makemigrations, and deploy. This causes the server errors described above.

Instead, split it into two deploys.

Phase 1: expand (make schema compatible with both versions)

We stop using the field in code, but we keep the column in the database and make sure it doesn't break either version.

First we make the field nullable so the new code can ignore it:

# models.py
class User(models.Model):
    ...
    # We want to delete this, but first we make it nullable
    phone_number = models.TextField(null=True, blank=True)

Then remove all references to user.phone_number in templates, views, and serializers. Now it's safe to create a migration and deploy this version.

Result:

Even if some old code hits that field for a moment, it still exists, so nothing crashes.

Phase 2: contract (remove old schema elements)

Now that the production code no longer uses phone_number, we can safely drop it. Delete the field from models.py:

# models.py
class User(models.Model):
    ...
    # phone_number is gone

Create a migration and deploy this version.
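The generated migration will contain a single RemoveField operation, roughly like this sketch (the app label and dependency name are hypothetical):

from django.db import migrations


class Migration(migrations.Migration):
    dependencies = [
        ("accounts", "0002_alter_user_phone_number"),  # Hypothetical previous migration
    ]

    operations = [
        migrations.RemoveField(
            model_name="user",
            name="phone_number",
        ),
    ]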

Result:

No server errors.

Example 2: renaming a field safely

Renaming a field is just "remove old field + add new field," so the same pattern applies.

Phase 1: add new field + dual-write + data migration

class User(models.Model):
    old_name = models.CharField(max_length=100)
    new_name = models.CharField(max_length=100, null=True)

    def save(self, *args, **kwargs):
        # Dual-write during transition
        if self.old_name and not self.new_name:
            self.new_name = self.old_name
        super().save(*args, **kwargs)

Update your app to use new_name everywhere, but keep writing both fields for now.

Then add a data migration to your migration file:

from django.db.models import F

def copy_old_to_new(apps, schema_editor):
    User = apps.get_model('accounts', 'User')
    User.objects.filter(new_name__isnull=True).update(new_name=F('old_name'))

class Migration(migrations.Migration):
    operations = [
        migrations.AddField(...),
        migrations.RunPython(copy_old_to_new, migrations.RunPython.noop),
    ]

After deploying this version and letting it run for a while, all users should now have new_name populated, and dual-writes have ensured the values stayed in sync.
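If you want to be sure before moving on, a quick check in the Django shell (assuming the accounts app label used in the data migration above) might look like this:

>>> from accounts.models import User
>>> User.objects.filter(new_name__isnull=True).exists()
False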

Phase 2: remove old field

Now remove old_name and the dual-write logic:

class User(models.Model):
    new_name = models.CharField(max_length=100)

Generate the migration, deploy, and you're done.

Summary

Running migrations during your Coolify build is good practice - it catches failures early and keeps your deploys atomic. But schema-destructive migrations will always conflict with rolling updates if they break compatibility between old code and the new schema.

Instead of bending Coolify into a complicated orchestration engine, embrace the proven approach: deploy schema changes in two phases, keeping them backwards-compatible.

You should use this pattern for all kinds of destructive changes: removing fields, renaming columns, dropping tables, or otherwise changing the schema in a way the currently running code can't handle.

It's simple, safe, predictable, and works with every hosting platform, including Coolify.

01 Dec 2025 6:14pm GMT

Safe Django migrations without server errors

One of the best features of modern container-based deployment platforms (Coolify included) is that they give you zero downtime rolling updates out of the box. When you push code, a new image is built, migrations run, and traffic only shifts to the new container when it's healthy.

However, this convenience creates a hidden trap for Django developers. Because Coolify performs a rolling update, there is a window of time (usually 1-2 minutes) where both the old and new versions of your application are running simultaneously against the same database.

This works fine if you are just adding a new table. But if you run a destructive migration, like removing a field or renaming a column, your deployment becomes a race condition. The new container runs the migration, the database schema changes, and your old container (which is still serving live traffic) immediately starts throwing 500 errors because it's trying to query a column that no longer exists.

This isn't a limitation of Coolify; it is a fundamental constraint of rolling deployments. Heroku, Render, Fly.io, ECS, Kubernetes - anything that swaps containers while the old version is still serving traffic has the same constraint. You cannot fix this by changing where the migration runs in your Dockerfile. You have to fix it by changing how you write migrations.

Why you can't just fix the infrastructure

Your first instinct might be to tweak the deployment settings to avoid the overlap. Let's look at why the common infrastructure workarounds don't actually solve the problem.

Strategy A: the post-deploy hook

You could remove migrate from the build step and run it in a "Post-Deploy" hook. This means migrations run after the new container is live and the old one is dead.

This fixes the "removing a field" problem, because the schema change happens after the old container is no longer serving traffic. But it breaks the "adding a field" problem: if you add a new required column, the new code in the new container might start up and try to query that column before the migration finishes. Result: the new app crashes on startup, and the deployment fails.

Another issue: if migrations fail, your deployment still succeeds, leaving your app running with mismatched code and schema.

Verdict: worse than before.

Strategy B: separate migration job

You could create a second service in the same project that runs migrations as a one-shot job.

The problem is that you still have to choose: run it before the deploy, or after?

You've essentially re-implemented either Strategy A or the original "migrate during build" approach, just in a second container. The core compatibility problem remains.

Verdict: more complex and still unsafe.

Strategy C: blue/green databases

This is the most elaborate workaround: copy production to a new database, run migrations there, and then point the new code to the new DB.

This is incredibly complex to manage for data consistency, and it would be catastrophic for any e-commerce site:

10:00:00 - Start cloning production DB
10:00:30 - User places $500 order (written to old DB)
10:01:00 - Clone complete, run migrations on clone
10:01:30 - User registers account (written to old DB)
10:02:00 - Switch to new DB
10:02:01 - Previous order and user… GONE 💀

You're solving the wrong problem with a rocket launcher. This is enterprise-grade DevOps machinery for a basic schema compatibility issue.

Verdict: massive complexity, no real benefit.

The real solution: the two-phase deploy

The hard truth is that destructive migrations can't be made safe unless the schema stays compatible with both old and new code during the rollout.

The solution is not infrastructure; it's application patterns. The key idea: decouple the code change from the schema change. This is known as the Two-Phase Deploy Pattern, also called expand-and-contract, non-breaking migrations, or safe migrations.

Example 1: removing a field safely

Let's say we want to remove the phone_number field from our User model. The simplest way is sadly the wrong way: delete the field from models.py, run makemigrations, and deploy. This causes the server errors described above.

Instead, split it into two deploys.

Phase 1: expand (make schema compatible with both versions)

We stop using the field in code, but we keep the column in the database and make sure it doesn't break either version.

First we make the field nullable so the new code can ignore it:

# models.py
class User(models.Model):
    ...
    # We want to delete this, but first we make it nullable
    phone_number = models.TextField(null=True, blank=True)

Then remove all references to user.phone_number in templates, views, and serializers. Now it's safe to create a migration and deploy this version.

Result:

Even if some old code hits that field for a moment, it still exists, so nothing crashes.

Phase 2: contract (remove old schema elements)

Now that the production code no longer uses phone_number, we can safely drop it. Delete the field from models.py:

# models.py
class User(models.Model):
    ...
    # phone_number is gone

Create a migration and deploy this version.

Result:

No server errors.

Example 2: renaming a field safely

Renaming a field is just "remove old field + add new field," so the same pattern applies.

Phase 1: add new field + dual-write + data migration

class User(models.Model):
    old_name = models.CharField(max_length=100)
    new_name = models.CharField(max_length=100, null=True)

    def save(self, *args, **kwargs):
        # Dual-write during transition
        if self.old_name and not self.new_name:
            self.new_name = self.old_name
        super().save(*args, **kwargs)

Update your app to use new_name everywhere, but keep writing both fields for now.

Then add a data migration to your migration file:

from django.db.models import F

def copy_old_to_new(apps, schema_editor):
    User = apps.get_model('accounts', 'User')
    User.objects.filter(new_name__isnull=True).update(new_name=F('old_name'))

class Migration(migrations.Migration):
    operations = [
        migrations.AddField(...),
        migrations.RunPython(copy_old_to_new, migrations.RunPython.noop),
    ]

After deploying this version and letting it run for a while, all users should now have new_name populated, and dual-writes have ensured the values stayed in sync.

Phase 2: remove old field

Now remove old_name and the dual-write logic:

class User(models.Model):
    new_name = models.CharField(max_length=100)

Generate the migration, deploy, and you're done.

Summary

Running migrations during your Coolify build is good practice - it catches failures early and keeps your deploys atomic. But schema-destructive migrations will always conflict with rolling updates if they break compatibility between old code and the new schema.

Instead of bending Coolify into a complicated orchestration engine, embrace the proven approach: deploy schema changes in two phases, keeping them backwards-compatible.

You should use this pattern for all kinds of destructive changes: removing fields, renaming columns, dropping tables, or otherwise changing the schema in a way the currently running code can't handle.

It's simple, safe, predictable, and works with every hosting platform, including Coolify.

01 Dec 2025 6:00pm GMT

11 Nov 2025

Planet Twisted

Glyph Lefkowitz: The “Dependency Cutout” Workflow Pattern, Part I

Tell me if you've heard this one before.

You're working on an application. Let's call it "FooApp". FooApp has a dependency on an open source library, let's call it "LibBar". You find a bug in LibBar that affects FooApp.

To envisage the best possible version of this scenario, let's say you actively like LibBar, both technically and socially. You've contributed to it in the past. But this bug is causing production issues in FooApp today, and LibBar's release schedule is quarterly. FooApp is your job; LibBar is (at best) your hobby. Blocking on the full upstream contribution cycle and waiting for a release is an absolute non-starter.

What do you do?

There are a few common reactions to this type of scenario, all of which are bad options.

I will enumerate them specifically here, because I suspect that some of them may resonate with many readers:

  1. Find an alternative to LibBar, and switch to it.

    This is a bad idea because a transition to a core infrastructure component could be extremely expensive.

  2. Vendor LibBar into your codebase and fix your vendored version.

    This is a bad idea because carrying this one fix now requires you to maintain all the tooling associated with a monorepo1: you have to be able to start pulling in new versions from LibBar regularly, reconcile your changes even though you now have a separate version history on your imported version, and so on.

  3. Monkey-patch LibBar to include your fix.

    This is a bad idea because you are now extremely tightly coupled to a specific version of LibBar. By modifying LibBar internally like this, you're inherently violating its compatibility contract, in a way which is going to be extremely difficult to test. You can test this change, of course, but as LibBar changes, you will need to replicate any relevant portions of its test suite (which may be its entire test suite) in FooApp. Lots of potential duplication of effort there.

  4. Implement a workaround in your own code, rather than fixing it.

    This is a bad idea because you are distorting the responsibility for correct behavior. LibBar is supposed to do LibBar's job, and unless you have a full wrapper for it in your own codebase, other engineers (including "yourself, personally") might later forget to go through the alternate, workaround codepath, and invoke the buggy LibBar behavior again in some new place.

  5. Implement the fix upstream in LibBar anyway, because that's the Right Thing To Do, and burn credibility with management while you anxiously wait for a release with the bug in production.

    This is a bad idea because you are betraying your users - by allowing the buggy behavior to persist - for the workflow convenience of your dependency providers. Your users are probably giving you money, and trusting you with their data. This means you have both ethical and economic obligations to consider their interests.

    As much as it's nice to participate in the open source community and take on an appropriate level of burden to maintain the commons, this cannot sustainably be at the explicit expense of the population you serve directly.

    Even if we only care about the open source maintainers here, there's still a problem: as you are likely to come under immediate pressure to ship your changes, you will inevitably relay at least a bit of that stress to the maintainers. Even if you try to be exceedingly polite, the maintainers will know that you are coming under fire for not having shipped the fix yet, and are likely to feel an even greater burden of obligation to ship your code fast.

    Much as it's good to contribute the fix, it's not great to put this on the maintainers.

The respective incentive structures of software development - specifically, of corporate application development and open source infrastructure development - make options 1-4 very common.

On the corporate / application side, these issues are:

But there are problems on the open source side as well. Those problems are all derived from one big issue: because we're often working with relatively small sums of money, it's hard for upstream open source developers to consume either money or patches from application developers. It's nice to say that you should contribute money to your dependencies, and you absolutely should, but the cost-benefit function is discontinuous. Before a project reaches the fiscal threshold where it can be at least one person's full-time job to worry about this stuff, there's often no-one responsible in the first place. Developers will therefore gravitate to the issues that are either fun, or relevant to their own job.

These mutually-reinforcing incentive structures are a big reason that users of open source infrastructure, even teams who work at corporate users with zillions of dollars, don't reliably contribute back.

The Answer We Want

All those options are bad. If we had a good option, what would it look like?

It is both practically necessary3 and morally required4 for you to have a way to temporarily rely on a modified version of an open source dependency, without permanently diverging.

Below, I will describe a desirable abstract workflow for achieving this goal.

Step 0: Report the Problem

Before you get started with any of these other steps, write up a clear description of the problem and report it to the project as an issue; specifically, in contrast to writing it up as a pull request. Describe the problem before submitting a solution.

You may not be able to wait for a volunteer-run open source project to respond to your request, but you should at least tell the project what you're planning on doing.

If you don't hear back from them at all, you will have at least made sure to comprehensively describe your issue and strategy beforehand, which will provide some clarity and focus to your changes.

If you do hear back from them, in the worst case scenario, you may discover that a hard fork will be necessary because they don't consider your issue valid, but even that information will save you time, if you know it before you get started. In the best case, you may get a reply from the project telling you that you've misunderstood its functionality and that there is already a configuration parameter or usage pattern that will resolve your problems with no new code. But in all cases, you will benefit from early coordination on what needs fixing before you get to how to fix it.

Step 1: Source Code and CI Setup

Fork the source code for your upstream dependency to a writable location where it can live at least for the duration of this one bug-fix, and possibly for the duration of your application's use of the dependency. After all, you might want to fix more than one bug in LibBar.

You want to have a place where you can put your edits, that will be version controlled and code reviewed according to your normal development process. This probably means you'll need to have your own main branch that diverges from your upstream's main branch.

Remember: you're going to need to deploy this to your production, so testing gates that your upstream only applies to final releases of LibBar will need to be applied to every commit here.

Depending on your LibBar's own development process, this may result in slightly unusual configurations where, for example, your fixes are written against the last LibBar release tag, rather than its current5 main; if the project has a branch-freshness requirement, you might need two branches, one for your upstream PR (based on main) and one for your own use (based on the release branch with your changes).

Ideally for projects with really good CI and a strong "keep main release-ready at all times" policy, you can deploy straight from a development branch, but it's good to take a moment to consider this before you get started. It's usually easier to rebase changes from an older HEAD onto a newer one than it is to go backwards.

Speaking of CI, you will want to have your own CI system. The fact that GitHub Actions has become a de-facto lingua franca of continuous integration means that this step may be quite simple, and your forked repo can just run its own instance.

Optional Bonus Step 1a: Artifact Management

If you have an in-house artifact repository, you should set that up for your dependency too, and upload your own build artifacts to it. You can often treat your modified dependency as an extension of your own source tree and install from a GitHub URL, but if you've already gone to the trouble of having an in-house package repository, you can pretend you've taken over maintenance of the upstream package temporarily (which you kind of have) and leverage those workflows for caching and build-time savings as you would with any other internal repo.

Step 2: Do The Fix

Now that you've got somewhere to edit LibBar's code, you will want to actually fix the bug.

Step 2a: Local Filesystem Setup

Before you have a production version on your own deployed branch, you'll want to test locally, which means having both repositories in a single integrated development environment.

At this point, you will want to have a local filesystem reference to your LibBar dependency, so that you can make real-time edits, without going through a slow cycle of pushing to a branch in your LibBar fork, pushing to a FooApp branch, and waiting for all of CI to run on both.

This is useful in both directions: as you prepare the FooApp branch that makes any necessary updates on that end, you'll want to make sure that FooApp can exercise the LibBar fix in any integration tests. As you work on the LibBar fix itself, you'll also want to be able to use FooApp to exercise the code and see if you've missed anything - and this, you wouldn't get in CI, since LibBar can't depend on FooApp itself.

In short, you want to be able to treat both projects as an integrated development environment, with support from your usual testing and debugging tools, just as much as you want your deployment output to be an integrated artifact.

Step 2b: Branch Setup for PR

However, for continuous integration to work, you will also need to have a remote resource reference of some kind from FooApp's branch to LibBar. You will need 2 pull requests: the first to land your LibBar changes to your internal LibBar fork and make sure it's passing its own tests, and then a second PR to switch your LibBar dependency from the public repository to your internal fork.

At this step it is very important to ensure that there is an issue filed on your own internal backlog to drop your LibBar fork. You do not want to lose track of this work; it is technical debt that must be addressed.

Until it's addressed, automated tools like Dependabot will not be able to apply security updates to LibBar for you; you're going to need to manually integrate every upstream change. This type of work is itself very easy to drop or lose track of, so you might just end up stuck on a vulnerable version.

Step 3: Deploy Internally

Now that you're confident that the fix will work, and that your temporarily-internally-maintained version of LibBar isn't going to break anything on your site, it's time to deploy.

Some deployment history should help to provide some evidence that your fix is ready to land in LibBar, but at the next step, please remember that your production environment isn't necessarily emblematic of that of all LibBar users.

Step 4: Propose Externally

You've got the fix, you've tested the fix, you've got the fix in your own production, you've told upstream you want to send them some changes. Now, it's time to make the pull request.

You're likely going to get some feedback on the PR, even if you think it's already ready to go; as I said, despite having been proven in your production environment, you may get feedback about additional concerns from other users that you'll need to address before LibBar's maintainers can land it.

As you process the feedback, make sure that each new iteration of your branch gets re-deployed to your own production. It would be a huge bummer to go through all this trouble, and then end up unable to deploy the next publicly released version of LibBar within FooApp because you forgot to test that your responses to feedback still worked on your own environment.

Step 4a: Hurry Up And Wait

If you're lucky, upstream will land your changes to LibBar. But, there's still no release version available. Here, you'll have to stay in a holding pattern until upstream can finalize the release on their end.

Depending on some particulars, it might make sense at this point to archive your internal LibBar repository and move your pinned release version to a git hash of the LibBar version where your fix landed, in their repository.

Before you do this, check in with the LibBar core team and make sure that they understand that's what you're doing and they don't have any wacky workflows which may involve rebasing or eliding that commit as part of their release process.

Step 5: Unwind Everything

Finally, you eventually want to stop carrying any patches and move back to an official released version that integrates your fix.

You want to do this because this is what the upstream will expect when you are reporting bugs. Part of the benefit of using open source is benefiting from the collective work to do bug-fixes and such, so you don't want to be stuck off on a pinned git hash that the developers do not support for anyone else.

As I said in step 2b6, make sure to maintain a tracking task for doing this work, because leaving this sort of relatively easy-to-clean-up technical debt lying around is something that can potentially create a lot of aggravation for no particular benefit. Make sure to put your internal LibBar repository into an appropriate state at this point as well.

Up Next

This is part 1 of a 2-part series. In part 2, I will explore in depth how to execute this workflow specifically for Python packages, using some popular tools. I'll discuss my own workflow, standards like PEP 517 and pyproject.toml, and of course, by the popular demand that I just know will come, uv.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you've read here and you'd like to read more of it, or you'd like to support my various open-source endeavors, you can support my work as a sponsor!


  1. if you already have all the tooling associated with a monorepo, including the ability to manage divergence and reintegrate patches with upstream, you already have the higher-overhead version of the workflow I am going to propose, so, never mind. but chances are you don't have that, very few companies do.

  2. In any business where one must wrangle with Legal, 3 hours is a wildly optimistic estimate.

  3. c.f. @mcc@mastodon.social

  4. c.f. @geofft@mastodon.social

  5. In an ideal world every project would keep its main branch ready to release at all times, no matter what, but we do not live in an ideal world.

  6. In this case, there is no question. It's 2b only, no not-2b.

11 Nov 2025 1:44am GMT

15 Oct 2025

feedPlanet Plone - Where Developers And Integrators Write

Maurits van Rees: Jakob Kahl and Erico Andrei: Flying from one Plone version to another

This is a talk about migrating from Plone 4 to 6 with the newest toolset.

There are several challenges when doing Plone migrations:

  • Highly customized source instances: custom workflow, add-ons, not all of them with versions that worked on Plone 6.
  • Complex data structures. For example, a Folder with a Link as its default page, which pointed to some other content that had meanwhile been moved.
  • Migrating Classic UI to Volto
  • Also, you might be migrating from a completely different CMS to Plone.

How do we do migrations in Plone in general?

  • In-place migrations. Run migration steps on the source instance itself. Use the standard upgrade steps from Plone. Suitable for smaller sites with not so much complexity. Especially suitable if you do only a small Plone version update.
  • Export/import migrations. You extract data from the source, transform it, and load the structure into the new site. You transform the data outside of the source instance. Suitable for all kinds of migrations. Very safe approach: only once you are sure everything is fine do you switch over to the newly migrated site. Can be more time consuming.

Let's look at export/import, which has three parts:

  • Extraction: you had collective.jsonify, transmogrifier, and now collective.exportimport and plone.exportimport.
  • Transformation: transmogrifier, collective.exportimport, and new: collective.transmute.
  • Load: Transmogrifier, collective.exportimport, plone.exportimport.

Transmogrifier is old, so we won't talk about it now. collective.exportimport was written mostly by Philip Bauer. There is an @@export_all view, and then @@import_all to import the data.

collective.transmute is a new tool. This is made to transform data from collective.exportimport to the plone.exportimport format. Potentially it can be used for other migrations as well. Highly customizable and extensible. Tested by pytest. It is standalone software with a nice CLI. No dependency on Plone packages.

Another tool: collective.html2blocks. This is a lightweight Python replacement for the JavaScript Blocks conversion tool. This is extensible and tested.

Lastly, plone.exportimport. This is a stripped-down version of collective.exportimport. This focuses on extract and load. No transforms. So this is best suited for importing to a Plone site with the same version.

collective.transmute is in alpha, probably a 1.0.0 release in the next weeks. Still missing quite some documentation. Test coverage needs some improvements. You can contribute with PRs, issues, docs.

15 Oct 2025 3:44pm GMT

Maurits van Rees: Mikel Larreategi: How we deploy cookieplone based projects.

We saw that cookieplone was coming up, and Docker, and, as a game changer, uv, which makes the installation of Python packages much faster.

With cookieplone you get a monorepo, with folders for backend, frontend, and devops. devops contains scripts to setup the server and deploy to it. Our sysadmins already had some other scripts. So we needed to integrate that.

First idea: let's fork it. Create our own copy of cookieplone. I explained this in my World Plone Day talk earlier this year. But cookieplone was changing a lot, so it was hard to keep our copy updated.

Maik Derstappen showed me copier, yet another templating tool. Our idea: create a cookieplone project, and then use copier to modify it.

What about the deployment? We are on GitLab. We host our own runners. We use the docker-in-docker service. We develop on a branch and create a merge request (pull request in GitHub terms). This activates a pipeline to check, test, and build. When it is merged, we bump the version using release-it.

Then we create deploy keys and tokens. We give these access to private GitLab repositories. We need some changes to SSH key management in pipelines, according to our sysadmins.

For deployment on the server: we do not yet have automatic deployments. We did not want to go too fast. We are testing the current pipelines and process, see if they work properly. In the future we can think about automating deployment. We just ssh to the server, and perform some commands there with docker.

Future improvements:

  • Start the docker containers and curl/wget the /ok endpoint.
  • lock files for the backend, with pip/uv.

15 Oct 2025 3:41pm GMT

Maurits van Rees: David Glick: State of plone.restapi

[Missed the first part.]

Vision: plone.restapi aims to provide a complete, stable, documented, extensible, language-agnostic API for the Plone CMS.

New services

  • @site: global site settings. These are overall, public settings that are needed on all pages and that don't change per context.
  • @login: choose between multiple login provider.
  • @navroot: contextual data from the navigation root of the current context.
  • @inherit: contextual data from any behavior. It looks for the closest parent that has this behavior defined, and gets this data.

Dynamic teaser blocks: you can choose to customize the teaser content. So the teaser links to the item you have selected, but if you want, you can change the title and other fields.

Roadmap:

  • Don't break it.
  • 10.0 release for Plone 6.2: remove setuptools namespace.
  • Continue to support migration path from older versions: use an old plone.restapi version on an old Plone version to export it, and being able to import this to the latest versions.
  • Recycle bin (work in progress): a lot of the work from Rohan is in Classic UI, but he is working on the restapi as well.

Wishlist, no one is working on this, but would be good to have:

  • @permissions endpoint
  • @catalog endpoint
  • missing control panel
  • folder type constraints
  • Any time that you find yourself going to the Classic UI to do something, that is a sign something is missing.
  • Some changes to relative paths to fix some use cases
  • Machine readable specifications for OpenAPI, MCP
  • New forms backend
  • Bulk operations
  • Streaming API
  • External functional test suite, that you could also run against e.g. guillotina or Nick to see if it works there as well.
  • Time travel: be able to see the state of the database from some time ago. The ZODB has some options here.

15 Oct 2025 3:39pm GMT

15 Aug 2025

feedPlanet Twisted

Glyph Lefkowitz: The Futzing Fraction

The most optimistic vision of generative AI1 is that it will relieve us of the tedious, repetitive elements of knowledge work so that we can get to work on the really interesting problems that such tedium stands in the way of. Even if you fully believe in this vision, it's hard to deny that today, some tedium is associated with the process of using generative AI itself.

Generative AI also isn't free, and so, as responsible consumers, we need to ask: is it worth it? What's the ROI of genAI, and how can we tell? In this post, I'd like to explore a logical framework for evaluating genAI expenditures, to determine if your organization is getting its money's worth.

Perpetually Proffering Permuted Prompts

I think most LLM users would agree with me that a typical workflow with an LLM rarely involves prompting it only one time and getting a perfectly useful answer that solves the whole problem.

Generative AI best practices, even from the most optimistic vendors, all suggest that you should continuously evaluate everything. ChatGPT, which is really the only genAI product with significantly scaled adoption, still says at the bottom of every interaction:

ChatGPT can make mistakes. Check important info.

If we have to "check important info" on every interaction, it stands to reason that even if we think it's useful, some of those checks will find an error. Again, if we think it's useful, presumably the next thing to do is to perturb our prompt somehow, and issue it again, in the hopes that the next invocation will, by dint of either:

  1. better luck this time with the stochastic aspect of the inference process,
  2. enhanced application of our skill to engineer a better prompt based on the deficiencies of the current inference, or
  3. better performance of the model by populating additional context in subsequent chained prompts.

Unfortunately, given the relative lack of reliable methods to re-generate the prompt and receive a better answer2, checking the output and re-prompting the model can feel like just kinda futzing around with it. You try, you get a wrong answer, you try a few more times, eventually you get the right answer that you wanted in the first place. It's a somewhat unsatisfying process, but if you get the right answer eventually, it does feel like progress, and you didn't need to use up another human's time.

In fact, the hottest buzzword of the last hype cycle is "agentic". While I have my own feelings about this particular word3, its current practical definition is "a generative AI system which automates the process of re-prompting itself, by having a deterministic program evaluate its outputs for correctness".

A better term for an "agentic" system would be a "self-futzing system".

However, the ability to automate some level of checking and re-prompting does not mean that you can fully delegate tasks to an agentic tool, either. It is, plainly put, not safe. If you leave the AI on its own, you will get terrible results that will at best make for a funny story45 and at worst might end up causing serious damage67.

Taken together, this all means that for any consequential task that you want to accomplish with genAI, you need an expert human in the loop. The human must be capable of independently doing the job that the genAI system is being asked to accomplish.

When the genAI guesses correctly and produces usable output, some of the human's time will be saved. When the genAI guesses wrong and produces hallucinatory gibberish or even "correct" output that nevertheless fails to account for some unstated but necessary property such as security or scale, some of the human's time will be wasted evaluating it and re-trying it.

Income from Investment in Inference

Let's evaluate an abstract, hypothetical genAI system that can automate some work for our organization. To avoid implicating any specific vendor, let's call the system "Mallory".

Is Mallory worth the money? How can we know?

Logically, there are only two outcomes that might result from using Mallory to do our work.

  1. We prompt Mallory to do some work; we check its work, it is correct, and some time is saved.
  2. We prompt Mallory to do some work; we check its work, it fails, and we futz around with the result; this time is wasted.

As a logical framework, this makes sense, but ROI is an arithmetical concept, not a logical one. So let's translate this into some terms.

In order to evaluate Mallory, let's define the Futzing Fraction, "FF", in terms of the following variables:

H: the average amount of time a Human worker would take to do a task, unaided by Mallory

I: the amount of time that Mallory takes to run one Inference8

C: the amount of time that a human has to spend Checking Mallory's output for each inference

P: the Probability that Mallory will produce a correct inference for each prompt

W: the average amount of time that it takes for a human to Write one prompt for Mallory

E: since we are normalizing everything to time, rather than money, we do also have to account for the dollar cost of Mallory as a product, so we will include the Equivalent amount of human time we could purchase for the marginal cost of one9 inference

As in last week's example of simple ROI arithmetic, we will put our costs in the numerator, and our benefits in the denominator.

FF = (W + I + C + E) / (P × H)

The idea here is that for each prompt, the minimum amount of time-equivalent cost possible is W+I+C+E. The user must, at least once, write a prompt, wait for inference to run, then check the output; and, of course, pay any costs to Mallory's vendor.

If the probability of a correct answer is P = 1/3, then they will do this entire process 3 times10, so we put P in the denominator. Finally, we divide everything by H, because we are trying to determine if we are actually saving any time or money, versus just letting our existing human, who has to be driving this process anyway, do the whole thing.

If the Futzing Fraction evaluates to a number greater than 1, as previously discussed, you are a bozo; you're spending more time futzing with Mallory than getting value out of it.

Figuring out the Fraction is Frustrating

In order to even evaluate the value of the Futzing Fraction though, you have to have a sound method to even get a vague sense of all the terms.

If you are a business leader, a lot of this is relatively easy to measure. You vaguely know what H is, because you know what your payroll costs, and similarly, you can figure out E with some pretty trivial arithmetic based on Mallory's pricing table. There are endless YouTube channels, spec sheets and benchmarks to give you I. W is probably going to be so small compared to H that it hardly merits consideration11.

But, are you measuring C? If your employees are not checking the outputs of the AI, you're on a path to catastrophe that no ROI calculation can capture, so it had better be greater than zero.

Are you measuring P? How often does the AI get it right on the first try?

Challenges to Computing Checking Costs

In the fraction defined above, the term C is going to be large. Larger than you think.

Measuring P and C with a high degree of precision is probably going to be very hard; possibly unreasonably so, or too expensive12 to bother with in practice. So you will undoubtedly need to work with estimates and proxy metrics. But you have to be aware that this is a problem domain where your normal method of estimating is going to be extremely vulnerable to inherent cognitive bias, and find ways to measure.

Margins, Money, and Metacognition

First let's discuss cognitive and metacognitive bias.

My favorite cognitive bias is the availability heuristic and a close second is its cousin salience bias. Humans are empirically predisposed towards noticing and remembering things that are more striking, and to overestimate their frequency.

If you are estimating the variables above based on the vibe that you're getting from the experience of using an LLM, you may be overestimating its utility.

Consider a slot machine.

If you put a dollar into a slot machine, and you lose that dollar, this is an unremarkable event. Expected, even. It doesn't seem interesting. You can repeat this over and over again, a thousand times, and each time it will seem equally unremarkable. If you do it a thousand times, you will probably get gradually more anxious as your sense of your dwindling bank account becomes slowly more salient, but losing one more dollar still seems unremarkable.

If you put a dollar in a slot machine and it gives you a thousand dollars, that will probably seem pretty cool. Interesting. Memorable. You might tell a story about this happening, but you definitely wouldn't really remember any particular time you lost one dollar.

Luckily, when you arrive at a casino with slot machines, you probably know well enough to set a hard budget in the form of some amount of physical currency you will have available to you. The odds are against you, you'll probably lose it all, but any responsible gambler will have an immediate, physical representation of their balance in front of them, so when they have lost it all, they can see that their hands are empty, and can try to resist the "just one more pull" temptation, after hitting that limit.

Now, consider Mallory.

If you put ten minutes into writing a prompt, and Mallory gives a completely off-the-rails, useless answer, and you lose ten minutes, well, that's just what using a computer is like sometimes. Mallory malfunctioned, or hallucinated, but it does that sometimes, everybody knows that. You only wasted ten minutes. It's fine. Not a big deal. Let's try it a few more times. Just ten more minutes. It'll probably work this time.

If you put ten minutes into writing a prompt, and it completes a task that would have otherwise taken you 4 hours, that feels amazing. Like the computer is magic! An absolute endorphin rush.

Very memorable. When it happens, it feels like P=1.

But... did you have a time budget before you started? Did you have a specified N such that "I will give up on Mallory as soon as I have spent N minutes attempting to solve this problem with it"? When the jackpot finally pays out that 4 hours, did you notice that you put 6 hours' worth of 10-minute prompt coins into it?

If you are attempting to use the same sort of heuristic intuition that probably works pretty well for other business leadership decisions, Mallory's slot-machine chat-prompt user interface is practically designed to subvert those sensibilities. Most business activities do not have nearly such an emotionally variable, intermittent reward schedule. They're not going to trick you with this sort of cognitive illusion.

Thus far we have been talking about cognitive bias, but there is a metacognitive bias at play too: while Dunning-Kruger, everybody's favorite metacognitive bias, does have some problems with it, the main underlying metacognitive bias is that we tend to believe our own thoughts and perceptions, and it requires active effort to distance ourselves from them, even if we know they might be wrong.

This means you must assume any intuitive estimate of C is going to be biased low; similarly P is going to be biased high. You will forget the time you spent checking, and you will underestimate the number of times you had to re-check.

To avoid this, you will need to decide on a Ulysses pact to provide some inputs to a calculation for these factors that you will not be able to fudge if they seem wrong to you.

Problematically Plausible Presentation

Another nasty little cognitive-bias landmine for you to watch out for is the authority bias, for two reasons:

  1. People will tend to see Mallory as an unbiased, external authority, and thereby see it as more of an authority than a similarly-situated human13.
  2. Being an LLM, Mallory will be overconfident in its answers14.

The nature of LLM training is also such that commonly co-occurring tokens in the training corpus produce higher likelihood of co-occurring in the output; they're just going to be closer together in the vector-space of the weights; that's, like, what training a model is, establishing those relationships.

If you've ever used an heuristic to informally evaluate someone's credibility by listening for industry-specific shibboleths or ways of describing a particular issue, that skill is now useless. Because Mallory has ingested every industry's expert literature, commonly-occurring phrases will always be present in its output. Mallory will usually sound like an expert, but then make mistakes at random15.

While you might intuitively estimate C by thinking "well, if I asked a person, how could I check that they were correct, and how long would that take?" that estimate will be extremely optimistic, because the heuristic techniques you would use to quickly evaluate incorrect information from other humans will fail with Mallory. You need to go all the way back to primary sources and actually fully verify the output every time, or you will likely fall into one of these traps.

Mallory Mangling Mentorship

So far, I've been describing the effect Mallory will have in the context of an individual attempting to get some work done. If we are considering organization-wide adoption of Mallory, however, we must also consider the impact on team dynamics. There are a number of potential side effects one might consider when looking at team-wide adoption, but here I will focus on just one that I have observed.

I have a cohort of friends in the software industry, most of whom are individual contributors. I'm a programmer who likes programming, so are most of my friends, and we are also (sigh), charitably, pretty solidly middle-aged at this point, so we tend to have a lot of experience.

As such, we are often the folks that the team - or, in my case, the community - goes to when less-experienced folks need answers.

On its own, this is actually pretty great. Answering questions from more junior folks is one of the best parts of a software development job. It's an opportunity to be helpful, mostly just by knowing a thing we already knew. And it's an opportunity to help someone else improve their own agency by giving them knowledge that they can use in the future.

However, generative AI throws a bit of a wrench into the mix.

Let's imagine a scenario where we have 2 developers: Alice, a staff engineer who has a good understanding of the system being built, and Bob, a relatively junior engineer who is still onboarding.

The traditional interaction between Alice and Bob, when Bob has a question, goes like this:

  1. Bob gets confused about something in the system being developed, because Bob's understanding of the system is incorrect.
  2. Bob formulates a question based on this confusion.
  3. Bob asks Alice that question.
  4. Alice knows the system, so she gives an answer which accurately reflects the state of the system to Bob.
  5. Bob's understanding of the system improves, and thus he will have fewer and better-informed questions going forward.

You can imagine how repeating this simple 5-step process will eventually transform Bob into a senior developer, and then he can start answering questions on his own. Making sufficient time for regularly iterating this loop is the heart of any good mentorship process.

Now, though, with Mallory in the mix, the process now has a new decision point, changing it from a linear sequence to a flow chart.

We begin the same way, with steps 1 and 2. Bob's confused, Bob formulates a question, but then:

  3. Bob asks Mallory that question.

Here, our path then diverges into a "happy" path, a "meh" path, and a "sad" path.

The "happy" path proceeds like so:

  4. Mallory happens to formulate a correct answer.
  5. Bob's understanding of the system improves, and thus he will have fewer and better-informed questions going forward.

Great. Problem solved. We just saved some of Alice's time. But as we learned earlier, Mallory can make mistakes. When that happens, we will need to check important info. So let's get checking:

  4. Mallory happens to formulate an incorrect answer.
  5. Bob investigates this answer.
  6. Bob realizes that this answer is incorrect because it is inconsistent with some of his prior, correct knowledge of the system, or his investigation.
  7. Bob asks Alice the same question; GOTO traditional interaction step 4.

On this path, Bob spent a while futzing around with Mallory, to no particular benefit. This wastes some of Bob's time, but then again, Bob could have ended up on the happy path, so perhaps it was worth the risk; at least Bob wasn't wasting any of Alice's much more valuable time in the process.16

Notice that beginning at the start of step 4, we must begin allocating all of Bob's time to C, so C already starts getting a bit bigger than if it were just Bob checking Mallory's output specifically on tasks that Bob is doing.

That brings us to the "sad" path.

  4. Mallory happens to formulate an incorrect answer.
  5. Bob investigates this answer.
  6. Bob does not realize that this answer is incorrect because he is unable to recognize any inconsistencies with his existing, incomplete knowledge of the system.
  7. Bob integrates Mallory's incorrect information about the system into his mental model.
  8. Bob proceeds to make a larger and larger mess of his work, based on an incorrect mental model.
  9. Eventually, Bob asks Alice a new, worse question, based on this incorrect understanding.
  10. Sadly, we cannot return to the happy path at this point, because now Alice must unravel the complex series of confusing misunderstandings that Mallory has conveyed to Bob. In the really sad case, Bob actually doesn't believe Alice for a while, because Mallory seems unbiased17, and Alice has to waste even more time convincing Bob before she can even begin explaining.

Now, we have wasted some of Bob's time, and some of Alice's time. Everything from step 5-10 is C, and as soon as Alice gets involved, we are now adding to C at double real-time. If more team members are pulled in to the investigation, you are now multiplying C by the number of investigators, potentially running at triple or quadruple real time.

But That's Not All

Here I've presented a brief selection of reasons why C will be both large, and larger than you expect. To review:

  1. Gambling-style mechanics of the user interface will interfere with your own self-monitoring and developing a good estimate.
  2. You can't use human heuristics for quickly spotting bad answers.
  3. Wrong answers given to junior people who can't evaluate them will waste more time from your more senior employees.

But this is a small selection of ways that Mallory's output can cost you money and time. It's harder to simplistically model second-order effects like this, but there's also a broad range of possibilities for ways that, rather than simply checking and catching errors, an error slips through and starts doing damage. Or ways in which the output isn't exactly wrong, but still sub-optimal in ways which can be difficult to notice in the short term.

For example, you might successfully vibe-code your way to launch a series of applications, successfully "checking" the output along the way, but then discover that the resulting code is unmaintainable garbage that prevents future feature delivery, and needs to be re-written18. But this kind of intellectual debt isn't limited to technical debt in code; it can even affect such apparently genAI-amenable fields as LinkedIn content marketing19.

Problems with the Prediction of P

C isn't the only challenging term, though. P is just as important, if not more so, and just as hard to measure.

LLM marketing materials love to phrase their accuracy in terms of a percentage. Accuracy claims for LLMs in general tend to hover around 70%20. But these scores vary per field, and when you aggregate them across multiple topic areas, they start to trend down. This is exactly why "agentic" approaches for more immediately-verifiable LLM outputs (with checks like "did the code work") got popular in the first place: you need to try more than once.

Independently measured claims about accuracy tend to be quite a bit lower21. The field of AI benchmarks is exploding, but it probably goes without saying that LLM vendors game those benchmarks22, because of course every incentive would encourage them to do that. Regardless of what their arbitrary scoring on some benchmark might say, all that matters to your business is whether it is accurate for the problems you are solving, for the way that you use it. Which is not necessarily going to correspond to any benchmark. You will need to measure it for yourself.

With that goal in mind, our formulation of P must be a somewhat harsher standard than "accuracy". It's not merely "was the factual information contained in any generated output accurate", but, "is the output good enough that some given real knowledge-work task is done and the human does not need to issue another prompt"?

Surprisingly Small Space for Slip-Ups

The problem with reporting these things as percentages at all, however, is that our actual definition for P is 1/attempts, where attempts, for any given task at least, must be an integer greater than or equal to 1.

Taken in aggregate, if we succeed on the first prompt more often than not, we could end up with a P > 1/2, but combined with the previous observation that you almost always have to prompt it more than once, the practical reality is that P will start at 50% and go down from there.

If we plug in some numbers, trying to be as extremely optimistic as we can, and say that we have a uniform stream of tasks, every one of which can be addressed by Mallory, every one of which:

  • would take a human 45 minutes to do unaided (H = 45),
  • takes 5 minutes to write a prompt for (W = 5),
  • takes 1 minute of inference to answer (I = 1),
  • takes only 5 minutes to check (C = 5),
  • costs the equivalent of a fraction of a minute of human time per inference (E = 0.01), and
  • gets a good-enough answer by the second prompt on average (P = 1/2).

Thought experiments are a dicey basis for reasoning in the face of disagreements, so I have tried to formulate something that is absolutely, comically, over-the-top stacked in favor of the AI optimist here.

Would that be profitable? It sure seems like it, given that we are trading off 45 minutes of human time for 1 minute of Mallory-time and 10 minutes of human time. If we ask Python:

>>> def FF(H, I, C, P, W, E):
...     return (W + I + C + E) / (P * H)
...
>>> FF(H=45.0, I=1.0, C=5.0, P=1/2, W=5.0, E=0.01)
0.48933333333333334

We get a futzing fraction of about 0.489. Not bad! Sounds like, at least under these conditions, it would indeed be cost-effective to deploy Mallory. But… realistically, do you reliably get useful, done-with-the-task quality output on the second prompt? Let's bump up the denominator on P just a little bit there, and see how we fare:

>>> FF(H=45.0, I=1.0, C=5.0, P=1/3, W=5.0, E=0.01)
0.734

Oof. Still cost-effective at 0.734, but not quite as good. Where do we cap out, exactly?

>>> from itertools import count
>>> for A in count(start=4):
...     print(A, result := FF(H=45.0, I=1.0, C=5.0, P=1 / A, W=5.0, E=1/60.))
...     if result > 1:
...         break
...
4 0.9792592592592594
5 1.224074074074074
>>>

With this little test, we can see that at our next iteration we are already at 0.9792, and by five attempts per task, even in this absolute fever-dream of an over-optimistic scenario, with a futzing fraction of 1.2240, Mallory is now a net detriment to our bottom line.

Harm to the Humans

We are treating H as functionally constant so far, an average around some hypothetical Gaussian distribution, but the distribution itself can also change over time.

Formally speaking, an increase to H would be good for our fraction. Maybe it would even be a good thing; it could mean we're taking on harder and harder tasks due to the superpowers that Mallory has given us.

But an observed increase to H would probably not be good. An increase could also mean your humans are getting worse at solving problems, because using Mallory has atrophied their skills23 and sabotaged learning opportunities2425. It could also go up because your senior, experienced people now hate their jobs26.

For some more vulnerable folks, Mallory might just take a shortcut to all these complex interactions and drive them completely insane27 directly. Employees experiencing an intense psychotic episode are famously less productive than those who are not.

This could all be very bad, if our futzing fraction eventually does head north of 1 and you need to consider reintroducing human-only workflows, without Mallory.

Abridging the Artificial Arithmetic (Alliteratively)

To reiterate, I have proposed this fraction:

FF = (W + I + C + E) / (P × H)

which shows us positive ROI when FF is less than 1, and negative ROI when it is more than 1.

This model is heavily simplified. A comprehensive measurement program that tests the efficacy of any technology, let alone one as complex and rapidly changing as LLMs, is more complex than could be captured in a single blog post.

Real-world work might be insufficiently uniform to fit into a closed-form solution like this. Perhaps an iterated simulation with variables based on the range of values seen from your team's metrics would give better results.
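
As a rough illustration of what that could look like, here is a toy Monte Carlo sketch of the same fraction; every distribution and parameter below is a made-up placeholder, to be replaced with values measured from your own team:

import random

def futzing_fraction(H, I, C, P, W, E):
    return (W + I + C + E) / (P * H)

def simulate(trials=10_000):
    total = 0.0
    for _ in range(trials):
        # Placeholder distributions, all in minutes; measure your own.
        H = max(random.gauss(45, 15), 5)   # human-only time per task
        W = random.uniform(2, 10)          # time to write a prompt
        I = random.uniform(0.5, 2)         # inference time
        C = random.uniform(3, 20)          # time to check one answer
        E = 0.01                           # human-time equivalent of inference cost
        attempts = random.randint(1, 6)    # prompts until the task is done
        total += futzing_fraction(H, I, C, 1 / attempts, W, E)
    return total / trials

print(simulate())

Anything that persistently comes out above 1 in a simulation like this is a signal to re-examine the deployment.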

However, in this post, I want to illustrate that if you are going to try to evaluate an LLM-based tool, you need to at least include some representation of each of these terms somewhere. They are all fundamental to the way the technology works, and if you're not measuring them somehow, then you are flying blind into the genAI storm.

I also hope to show that a lot of existing assumptions about how benefits might be demonstrated, for example with user surveys about general impressions, or by evaluating artificial benchmark scores, are deeply flawed.

Even making what I consider to be wildly, unrealistically optimistic assumptions about these measurements, I hope I've shown:

  1. in the numerator, C might be a lot higher than you expect,
  2. in the denominator, P might be a lot lower than you expect,
  3. repeated use of an LLM might make H go up, but despite the fact that it's in the denominator, that will ultimately be quite bad for your business.

Personally, I don't have all that many concerns about E and I. E is still seeing significant loss-leader pricing, and I might not be coming down as fast as vendors would like us to believe, but if the other numbers work out, I don't think they make a huge difference. However, there might still be surprises lurking in there, and if you want to rationally evaluate the effectiveness of a model, you need to be able to measure them and incorporate them as well.

In particular, I really want to stress the importance of the influence of LLMs on your team dynamic, as that can cause massive, hidden increases to C. LLMs present opportunities for junior employees to generate an endless stream of chaff that will simultaneously:

If you've already deployed LLM tooling without measuring these things and without updating your performance management processes to account for the strange distortions that these tools make possible, your Futzing Fraction may be much, much greater than 1, creating hidden costs and technical debt that your organization will not notice until a lot of damage has already been done.

If you got all the way here, particularly if you're someone who is enthusiastic about these technologies, thank you for reading. I appreciate your attention and I am hopeful that if we can start paying attention to these details, perhaps we can all stop futzing around so much with this stuff and get back to doing real work.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you've read here and you'd like to read more of it, or you'd like to support my various open-source endeavors, you can support my work as a sponsor!


  1. I do not share this optimism, but I want to try very hard in this particular piece to take it as a given that genAI is in fact helpful.

  2. If we could have a better prompt on demand via some repeatable and automatable process, surely we would have used a prompt that got the answer we wanted in the first place.

  3. The software idea of a "user agent" straightforwardly comes from the legal principle of an agent, which has deep roots in common law, jurisprudence, philosophy, and math. When we think of an agent (some software) acting on behalf of a principal (a human user), this historical baggage imputes some important ethical obligations to the developer of the agent software. genAI vendors have been as eager as any software vendor to dodge responsibility for faithfully representing the user's interests even as there are some indications that at least some courts are not persuaded by this dodge, at least by the consumers of genAI attempting to pass on the responsibility all the way to end users. Perhaps it goes without saying, but I'll say it anyway: I don't like this newer interpretation of "agent".

  4. "Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents", Axel Backlund, Lukas Petersson, Feb 20, 2025

  5. "random thing are happening, maxed out usage on api keys", @leojr94 on Twitter, Mar 17, 2025

  6. "New study sheds light on ChatGPT's alarming interactions with teens"

  7. "Lawyers submitted bogus case law created by ChatGPT. A judge fined them $5,000", by Larry Neumeister for the Associated Press, June 22, 2023

  8. During which a human will be busy-waiting on an answer.

  9. Given the fluctuating pricing of these products, and fixed subscription overhead, this will obviously need to be amortized; including all the additional terms to actually convert this from your inputs is left as an exercise for the reader.

  10. I feel like I should emphasize explicitly here that everything is an average over repeated interactions. For example, you might observe that a particular LLM has a low probability of outputting acceptable work on the first prompt, but higher probability on subsequent prompts in the same context, such that it usually takes 4 prompts. For the purposes of this extremely simple closed-form model, we'd still consider that a P of 25%, even though a more sophisticated model, or a monte carlo simulation that sets progressive bounds on the probability, might produce more accurate values.

  11. No it isn't, actually, but for the sake of argument let's grant that it is.

  12. It's worth noting that all this expensive measuring itself must be included in C until you have a solid grounding for all your metrics, but let's optimistically leave all of that out for the sake of simplicity.

  13. "AI Company Poll Finds 45% of Workers Trust the Tech More Than Their Peers", by Suzanne Blake for Newsweek, Aug 13, 2025

  14. AI Chatbots Remain Overconfident - Even When They're Wrong by Jason Bittel for the Dietrich College of Humanities and Social Sciences at Carnegie Mellon University, July 22, 2025

  15. AI Mistakes Are Very Different From Human Mistakes by Bruce Schneier and Nathan E. Sanders for IEEE Spectrum, Jan 13, 2025

  16. Foreshadowing is a narrative device in which a storyteller gives an advance hint of an upcoming event later in the story.

  17. "People are worried about the misuse of AI, but they trust it more than humans"

  18. "Why I stopped using AI (as a Senior Software Engineer)", theSeniorDev YouTube channel, Jun 17, 2025

  19. "I was an AI evangelist. Now I'm an AI vegan. Here's why.", Joe McKay for the greatchatlinkedin YouTube channel, Aug 8, 2025

  20. "What LLM is The Most Accurate?"

  21. "Study Finds That 52 Percent Of ChatGPT Answers to Programming Questions are Wrong", by Sharon Adarlo for Futurism, May 23, 2024

  22. "Off the Mark: The Pitfalls of Metrics Gaming in AI Progress Races", by Tabrez Syed on BoxCars AI, Dec 14, 2023

  23. "I tried coding with AI, I became lazy and stupid", by Thomasorus, Aug 8, 2025

  24. "How AI Changes Student Thinking: The Hidden Cognitive Risks" by Timothy Cook for Psychology Today, May 10, 2025

  25. "Increased AI use linked to eroding critical thinking skills" by Justin Jackson for Phys.org, Jan 13, 2025

  26. "AI could end my job - Just not the way I expected" by Manuel Artero Anguita on dev.to, Jan 27, 2025

  27. "The Emerging Problem of "AI Psychosis"" by Gary Drevitch for Psychology Today, July 21, 2025.

15 Aug 2025 7:51am GMT

09 Aug 2025

feedPlanet Twisted

Glyph Lefkowitz: R0ML’s Ratio

My father, also known as "R0ML", once described a methodology for evaluating volume purchases that I think needs to be more popular.

If you are a hardcore fan, you might know that he has already described this concept publicly in a talk at OSCON in 2005, among other places, but it has never found its way to the public Internet, so I'm giving it a home here, and in the process, appropriating some of his words.1


Let's say you're running a circus. The circus has many clowns. Ten thousand clowns, to be precise. They require bright red clown noses. Therefore, you must acquire a significant volume of clown noses. An enterprise licensing agreement for clown noses, if you will.

If the nose plays, it can really make the act. In order to make sure you're getting quality noses, you go with a quality vendor. You select a vendor who can supply noses for $100 each, at retail.

Do you want to buy retail? Ten thousand clowns, ten thousand noses, one hundred dollars: that's a million bucks worth of noses, so it's worth your while to get a good deal.

As a conscientious executive, you go to the golf course with your favorite clown accessories vendor and negotiate yourself a 50% discount, with a commitment to buy all ten thousand noses.

Is this a good deal? Should you take it?

To determine this, we will use an analytical tool called R0ML's Ratio (RR).

The ratio has 2 terms:

  1. the Full Undiscounted Retail List Price of Units Used (FURLPoUU), which can of course be computed by the individual retail list price of a single unit (in our case, $100) multiplied by the number of units used
  2. the Total Price of the Entire Enterprise Volume Licensing Agreement (TPotEEVLA), which in our case is $500,000.

It is expressed as:

RR = TPotEEVLA / FURLPoUU

Crucially, you must be able to compute the number of units used in order to complete this ratio. If, as expected, every single clown wears their nose at least once during the period of the license agreement, then our Units Used is 10,000, our FURLPoUU is $1,000,000 and our TPotEEVLA is $500,000, which makes our RR 0.5.

Congratulations. If R0ML's Ratio is less than 1, it's a good deal. Proceed.

But… maybe the nose doesn't play. Not every clown's costume is an exact clone of the traditional, stereotypical image of a clown. Many are avant-garde. Perhaps this plentiful proboscis pledge was premature. Here, I must quote the originator of this theoretical framework directly:

What if the wheeze doesn't please?

What if the schnozz gives some pause?

In other words: what if some clowns don't wear their noses?

If we were to do this deal, and then ask around afterwards to find out that only 200 of our 10,000 clowns were to use their noses, then FURLPoUU comes out to 200 * $100, for a total of $20,000. In that scenario, RR is 25, which you may observe is substantially greater than 1.
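
To make the arithmetic concrete, here is a tiny Python sketch of the ratio with both scenarios plugged in:

def r0mls_ratio(tpoteevla, furlpouu):
    # Total Price of the Enterprise Volume Licensing Agreement, divided by
    # the Full Undiscounted Retail List Price of the Units actually Used.
    return tpoteevla / furlpouu

# Every clown wears their nose: 10,000 noses used at $100 retail.
print(r0mls_ratio(500_000, 10_000 * 100))  # 0.5 -> good deal

# Only 200 clowns ever wear their noses.
print(r0mls_ratio(500_000, 200 * 100))     # 25.0 -> you are the bozo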

If you do a deal where R0ML's ratio is greater than 1, then you are the bozo.


I apologize if I have belabored this point. As R0ML expressed in the email we exchanged about this many years ago,

I do not mind if you blog about it - and I don't mind getting the credit - although one would think it would be obvious.

And yeah, one would think this would be obvious? But I have belabored it because many discounted enterprise volume purchasing agreements still fail the R0ML's Ratio Bozo Test.2

In the case of clown noses, if you pay the discounted price, at least you get to keep the nose; maybe lightly-used clown noses have some resale value. But in software licensing or SaaS deals, once you've purchased the "discounted" software or service, once you have provisioned the "seats", the money is gone, and if your employees don't use it, then no value for your organization will ever result.

Measuring number of units used is very important. Without this number, you have no idea if you are a bozo or not.

It is often better to give your individual employees a corporate card and allow them to make arbitrary individual purchases of software licenses and SaaS tools, with minimal expense-reporting overhead; this will always keep R0ML's Ratio at 1.0, and thus, you will never be a bozo.

It is always better to do that the first time you are purchasing a new software tool, because the first time making such a purchase you (almost by definition) have no information about "units used" yet. You have no idea - you cannot have any idea - if you are a bozo or not.

If you don't know who the bozo is, it's probably you.

Acknowledgments

Thank you for reading, and especially thank you to my patrons who are supporting my writing on this blog. Of course, extra thanks to dad for, like, having this idea and doing most of the work here beyond my transcription. If you like my dad's ideas and you'd like me to post more of them, or you'd like to support my various open-source endeavors, you can support my work as a sponsor!


  1. One of my other favorite posts on this blog was just stealing another one of his ideas, so hopefully this one will be good too.

  2. This concept was first developed in 2001, but it has some implications for extremely recent developments in the software industry; but that's a post for another day.

09 Aug 2025 4:41am GMT