23 Jan 2026
Planet Python
death and gravity: DynamoDB crash course: part 1 – philosophy
This is part one of a series covering core DynamoDB concepts and patterns, from the data model and features all the way up to single-table design.
The goal is to get you to understand what idiomatic usage looks like and what the trade-offs are in under an hour, providing entry points to detailed documentation.
(Don't get me wrong, the AWS documentation is comprehensive, but can be quite complex, and DynamoDB being a relatively low level product with lots of features added over the years doesn't really help with that.)
Today, we're looking at what DynamoDB is and why it is the way it is.
What is DynamoDB?
Quoting Wikipedia:
Amazon DynamoDB is a managed NoSQL database service provided by AWS. It supports key-value and document data structures and is designed to handle a wide range of applications requiring scalability and performance.
See also
This definition should suffice for now; for a more detailed refresher, see:
The DynamoDB data model can be summarized as follows:
A table is a collection of items, and an item is a collection of named attributes. Items are uniquely identified by a partition key attribute and an optional sort key attribute. The partition key determines where (i.e. on what computer) an item is stored. The sort key is used to get ordered ranges of items from a specific partition.
That's it, that's the whole data model. Sure, there are indexes and transactions and other features, but at its core, this is it. Put another way:
A DynamoDB table is a hash table of B-trees[1]: partition keys are hash table keys, and sort keys are B-tree keys. Because of this, any access not based on partition and sort key is expensive, since you end up doing a full table scan.
If you were to implement this model in Python, it'd look something like this:
from collections import defaultdict

from sortedcontainers import SortedDict


class Table:
    def __init__(self, pk_name, sk_name):
        self._pk_name = pk_name
        self._sk_name = sk_name
        # partition key -> SortedDict mapping sort key -> item
        self._partitions = defaultdict(SortedDict)

    def put_item(self, item):
        # replaces the whole item: attributes not in `item` are dropped
        pk, sk = item[self._pk_name], item[self._sk_name]
        old_item = self._partitions[pk].setdefault(sk, {})
        old_item.clear()
        old_item.update(item)

    def get_item(self, pk, sk):
        # return a copy, so callers can't mutate the stored item
        return dict(self._partitions[pk][sk])

    def query(self, pk, minimum=None, maximum=None, inclusive=(True, True), reverse=False):
        # in the real DynamoDB, this operation is paginated
        partition = self._partitions[pk]
        for sk in partition.irange(minimum, maximum, inclusive, reverse):
            yield dict(partition[sk])

    def scan(self):
        # in the real DynamoDB, this operation is paginated
        for partition in self._partitions.values():
            for item in partition.values():
                yield dict(item)

    def update_item(self, item):
        # merges attributes into the existing item, creating it if missing
        pk, sk = item[self._pk_name], item[self._sk_name]
        old_item = self._partitions[pk].setdefault(sk, {})
        old_item.update(item)

    def delete_item(self, pk, sk):
        del self._partitions[pk][sk]

>>> table = Table('Artist', 'SongTitle')
>>>
>>> table.put_item({'Artist': '1000mods', 'SongTitle': 'Vidage', 'Year': 2011})
>>> table.put_item({'Artist': '1000mods', 'SongTitle': 'Claws', 'Album': 'Vultures'})
>>> table.put_item({'Artist': 'Kyuss', 'SongTitle': 'Space Cadet'})
>>>
>>> table.get_item('1000mods', 'Claws')
{'Artist': '1000mods', 'SongTitle': 'Claws', 'Album': 'Vultures'}
>>> [i['SongTitle'] for i in table.query('1000mods')]
['Claws', 'Vidage']
>>> [i['SongTitle'] for i in table.query('1000mods', minimum='Loose')]
['Vidage']
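The cost of a full table scan is easier to see in a small, standalone sketch. This is not DynamoDB code. The `PartitionedStore` class and its `examined` counter are hypothetical and exist only to count how many items each kind of access touches. A query reads one partition. Filtering on a non-key attribute has to look at every item in every partition.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class PartitionedStore:
    """Hypothetical toy store: partition key -> (sorted sort keys, {sort key: item})."""

    def __init__(self):
        self._partitions = defaultdict(lambda: ([], {}))
        self.examined = 0  # how many items operations have looked at

    def put(self, pk, sk, item):
        keys, items = self._partitions[pk]
        if sk not in items:
            insort(keys, sk)  # keep sort keys ordered, like a B-tree would
        items[sk] = item

    def query(self, pk, start=None):
        """Key-based access: one partition, starting at `start` (inclusive)."""
        keys, items = self._partitions.get(pk, ([], {}))
        i = 0 if start is None else bisect_left(keys, start)
        for sk in keys[i:]:
            self.examined += 1
            yield items[sk]

    def scan_filter(self, predicate):
        """Non-key access: every item in every partition is examined."""
        for keys, items in self._partitions.values():
            for sk in keys:
                self.examined += 1
                if predicate(items[sk]):
                    yield items[sk]


store = PartitionedStore()
for artist in range(100):
    for song in range(10):
        store.put(f"artist-{artist}", f"song-{song}", {"year": 2000 + song})

store.examined = 0
list(store.query("artist-7"))
print(store.examined)  # 10

store.examined = 0
hits = list(store.scan_filter(lambda item: item["year"] == 2003))
print(len(hits), store.examined)  # 100 1000
```

Both calls return a similar amount of data. The scan, however, reads every item in the table to do it.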
Philosophy
One can't help but feel this kind of simplicity would be severely limiting.
A consequence of DynamoDB being this low level is that, unlike with most relational databases, query planning and sometimes index management happen at the application level, i.e. you have to do them yourself in code. In turn, this means you need to have a clear, upfront understanding of your application's access patterns, and accept that changes in access patterns will require changes to the application.
In return, you get a fully managed, highly available database that scales infinitely[2]: there are no servers to take care of, there's almost no downtime, and there are no limits on table size or the number of items in a table; where limits do exist, they are clearly documented, allowing for predictable performance.
This highlights an intentional design decision that is essentially DynamoDB's main proposition to you as its user: data modeling complexity is always preferable to complexity coming from infrastructure maintenance, availability, and scalability (what AWS marketing calls "undifferentiated heavy lifting").
To help manage this complexity, a number of design patterns have arisen, covered extensively by the official documentation, and which we'll discuss in a future article. Even so, the toll can be heavy - by AWS's own admission, the prime disadvantage of single table design, the fundamental design pattern, is that:
[the] learning curve can be steep due to paradoxical design compared to relational databases
As this walkthrough puts it:
a well-optimized single-table DynamoDB layout looks more like machine code than a simple spreadsheet
...which, admittedly, sounds pretty cool, but also why would I want that? After all, most useful programming most people do is one or two abstraction levels above assembly, itself one over machine code.
See also
- NoSQL design
- (unofficial) The DynamoDB philosophy of limits
A bit of history
Perhaps it's worth having a look at where DynamoDB comes from.
Amazon.com used Oracle databases for a long time. To cope with the increasing scale, they first adopted a database-per-service model, and then sharding, with all the architectural and operational overhead you would expect. At its 2017 peak (five years after DynamoDB was released in AWS, and over ten years after some version of it was available internally), they still had 75 PB of data in nearly 7500 Oracle databases, owned by 100+ teams, with thousands of applications, for OLTP workloads alone. That sounds pretty traumatic - it was definitely bad enough to allegedly ban OLTP relational databases internally, and require that teams get VP approval to use one.
Yeah, coming from that, it's hard to argue DynamoDB adds complexity.
That is not to say relational databases cannot be as scalable as DynamoDB, just that Amazon doesn't believe in them - distributed SQL databases like Google's Spanner and CockroachDB have existed for a while now, and even AWS seems to be warming up to the idea.
This might also explain why the design patterns are so slow to make their way into SDKs, or even better, into DynamoDB itself; when you have so many applications and so many experienced teams, the cost of yet another bit of code to do partition key sharding just isn't that great.
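Here is a rough idea of what that "bit of code" looks like. The sketch below shows write sharding with a random suffix appended to the partition key, which is roughly the approach the DynamoDB best-practices guide describes for spreading writes. `SHARD_COUNT`, the `#` separator and the function names are illustrative assumptions, not a DynamoDB API. `query` is a hypothetical callable standing in for a real Query request. Writes spread across several partitions. The price is that every read has to query all the shards and merge the results.

```python
import heapq
import random

SHARD_COUNT = 4  # hypothetical; sized to the expected write throughput


def sharded_pk(pk: str, shard_count: int = SHARD_COUNT) -> str:
    """Physical partition key for a write: logical key plus a random suffix."""
    return f"{pk}#{random.randrange(shard_count)}"


def all_shard_pks(pk: str, shard_count: int = SHARD_COUNT) -> list[str]:
    """Every physical partition key a read must query to see all of `pk`."""
    return [f"{pk}#{shard}" for shard in range(shard_count)]


def query_all_shards(query, pk: str, shard_count: int = SHARD_COUNT) -> list:
    """Fan out one query per shard and merge the results by sort key.

    `query` is any callable that takes a physical partition key and returns
    (sort_key, item) pairs already sorted by sort key, e.g. a thin wrapper
    around a real Query call (hypothetical here).
    """
    results = (query(shard_pk) for shard_pk in all_shard_pks(pk, shard_count))
    return list(heapq.merge(*results, key=lambda pair: pair[0]))
```

A single hot logical key now costs `SHARD_COUNT` queries to read back. Application code ends up carrying this kind of trade-off.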
See also
- (paper) Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service (2022)
- (paper) Dynamo: Amazon's Highly Available Key-value Store (2007)
Anyway, that's it for now.
In the next article, we'll have a closer look at the DynamoDB data model and features.
[1] Or any other sorted data structure that allows fast searches, sequential access, insertions, and deletions.
[2] As the saying goes, the cloud is just someone else's computers. Here, "infinitely" means it scales horizontally, and you'll run out of money before AWS runs out of computers.
23 Jan 2026 8:40am GMT
22 Jan 2026
Planet Python
The Python Coding Stack: The Orchestra Conductor, The Senior Programmer, and AI • [Club]
I spent a few years learning to play the piano when I was a child. It was always clear I would never be a concert pianist. Or a pianist of any description. This is to say that I don't know much about music. I still struggle to understand why an orchestra needs a conductor. Don't the musicians all have the score, which they can play nearly perfectly?
And many people who comment about programming and AI know as much about programming as I know about music…and probably even less about AI.
But the orchestra conductor analogy seems a good one. Let me explore it further.
22 Jan 2026 10:33pm GMT
Reuven Lerner: Learn to code with AI — not just write prompts
The AI revolution is here. Engineers at major companies are now using AI instead of writing code directly.
But there's a gap: Most developers know how to write code OR how to prompt AI, but not both. When working with real data, vague AI prompts produce code that might work on sample datasets but creates silent errors, performance issues, or incorrect analyses with messy, real-world data that requires careful handling.
I've spent 30 years teaching Python at companies like Apple, Intel, and Cisco, plus at conferences worldwide. I'm adapting my teaching for the AI era.
Specifically: I'm launching AI-Powered Python Practice Workshops. These are hands-on sessions where you'll solve real problems using Claude Code, then learn to critically evaluate and improve the results.
Here's how it works:
- I present a problem
- You solve it using Claude Code
- We compare prompts, discuss what worked (and what didn't)
- I provide deep-dives on both the Python concepts AND the AI collaboration techniques
In 3 hours, we'll cover 3-4 exercises. That'll give you a chance to learn two skills: Python/Pandas AND effective AI collaboration. That'll make you more effective at coding, and at the data analysis techniques that actually work with messy, real-world datasets.
Each workshop costs $200 for LernerPython members. Not a member? Total cost is $700 ($500 annual membership + $200 workshop fee). Want both workshops? $900 total ($500 membership + $400 for both workshops). Plus you get 40+ courses, 500+ exercises, office hours, Discord, and personal mentorship.
AI-Powered Python Practice Workshop
- Focus is on the Python language, standard library, and common packages
- Monday, February 2nd
- 10 a.m. - 1 p.m. Eastern / 3 p.m. - 6 p.m. London / 5 p.m. - 8 p.m. Israel
- Sign up here: https://lernerpython.com/product/ai-python-workshop-1/
AI-Powered Pandas Practice Workshop
- Focus is on data analysis with Pandas
- Monday, February 9th
- 10 a.m. - 1 p.m. Eastern / 3 p.m. - 6 p.m. London / 5 p.m. - 8 p.m. Israel
- Sign up here: https://lernerpython.com/product/ai-pandas-workshop-1/
I want to encourage lots of discussion and interactions, so I'm limiting the class to 20 total participants. Both sessions will be recorded, and will be available to all participants.
Questions? Just e-mail me at reuven@lernerpython.com.
The post Learn to code with AI - not just write prompts appeared first on Reuven Lerner.
22 Jan 2026 3:53pm GMT
Planet Plone - Where Developers And Integrators Write
Maurits van Rees: Mikel Larreategi: How we deploy cookieplone based projects.

We saw that cookieplone was coming up, and Docker, and uv as a game changer, making the installation of Python packages much faster.
With cookieplone you get a monorepo, with folders for backend, frontend, and devops. devops contains scripts to set up the server and deploy to it. Our sysadmins already had some other scripts, so we needed to integrate those.
First idea: let's fork it. Create our own copy of cookieplone. I explained this in my World Plone Day talk earlier this year. But cookieplone was changing a lot, so it was hard to keep our copy updated.
Maik Derstappen showed me copier, yet another templating language. Our idea: create a cookieplone project, and then use copier to modify it.
What about the deployment? We are on GitLab. We host our own runners. We use the docker-in-docker service. We develop on a branch and create a merge request (pull request in GitHub terms). This triggers a pipeline to check, test, and build. When it is merged, we bump the version using release-it.
Then we create deploy keys and tokens. We give these access to private GitLab repositories. We need some changes to SSH key management in pipelines, according to our sysadmins.
For deployment on the server: we do not yet have automatic deployments. We did not want to go too fast. We are testing the current pipelines and process, see if they work properly. In the future we can think about automating deployment. We just ssh to the server, and perform some commands there with docker.
Future improvements:
- Start the docker containers and curl/wget the /ok endpoint.
- Lock files for the backend, with pip/uv.
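The first improvement (start the containers, then hit the /ok endpoint) could be a small smoke check in the pipeline. Below is a minimal standard-library sketch. The URL, timeout and polling interval are placeholders, not values from the talk. `fetch_status` and `wait_until_ok` are hypothetical helpers, and `fetch` is injectable so the check can be tested without a running server.

```python
import time
import urllib.request


def fetch_status(url: str) -> int:
    """GET `url` and return the HTTP status code (urlopen raises on 4xx/5xx)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.status


def wait_until_ok(url, timeout=120.0, interval=2.0, fetch=fetch_status):
    """Poll `url` until it returns 200; give up after `timeout` seconds.

    Returns True if the endpoint came up in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            # URLError/HTTPError are OSError subclasses, so this covers
            # "connection refused" while the container is still starting
            # as well as error responses such as 502/503.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In CI this would run right after `docker compose up -d`, for example as `wait_until_ok("http://localhost:8080/ok")` with a hypothetical port, failing the job when it returns False.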
22 Jan 2026 9:43am GMT
Maurits van Rees: Jakob Kahl and Erico Andrei: Flying from one Plone version to another

This is a talk about migrating from Plone 4 to 6 with the newest toolset.
There are several challenges when doing Plone migrations:
- Highly customized source instances: custom workflow, add-ons, not all of them with versions that worked on Plone 6.
- Complex data structures. For example, a Folder with a Link as default page, which pointed to some other content that had since been moved.
- Migrating Classic UI to Volto
- Also, you might be migrating from a completely different CMS to Plone.
How do we do migrations in Plone in general?
- In place migrations. Run migration steps on the source instance itself. Use the standard upgrade steps from Plone. Suitable for smaller sites with not so much complexity. Especially suitable if you do only a small Plone version update.
- Export - import migrations. You extract data from the source, transform it, and load the structure in the new site. You transform the data outside of the source instance. Suitable for all kinds of migrations. Very safe approach: only once you are sure everything is fine, do you switch over to the newly migrated site. Can be more time consuming.
Let's look at export/import, which has three parts:
- Extraction: you had collective.jsonify, transmogrifier, and now collective.exportimport and plone.exportimport.
- Transformation: transmogrifier, collective.exportimport, and new: collective.transmute.
- Load: Transmogrifier, collective.exportimport, plone.exportimport.
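One way to picture the transformation step is as a chain of small functions applied to each exported item. This is a generic, hypothetical sketch of that idea, not the collective.transmute API. The step names, the content types and the `@type`/`id` fields are illustrative assumptions about what exported JSON items might contain.

```python
from typing import Callable, Iterable, Iterator, Optional

Item = dict
Step = Callable[[Item], Optional[Item]]  # a step returns None to drop the item


def drop_types(*types: str) -> Step:
    """Step that drops items whose content type has no place in the new site."""
    def step(item: Item) -> Optional[Item]:
        return None if item.get("@type") in types else item
    return step


def rename_type(old: str, new: str) -> Step:
    """Step that maps a legacy content type onto a new one (without mutating)."""
    def step(item: Item) -> Optional[Item]:
        return {**item, "@type": new} if item.get("@type") == old else item
    return step


def transform(items: Iterable[Item], steps: list[Step]) -> Iterator[Item]:
    """Run each item through all steps in order; skip items a step drops."""
    for item in items:
        for step in steps:
            item = step(item)
            if item is None:
                break
        else:
            yield item


# Hypothetical exported items and content types, for illustration only.
exported = [
    {"@type": "Document", "id": "about"},
    {"@type": "LegacyEvent", "id": "meetup"},
    {"@type": "OldPoll", "id": "vote"},
]
steps = [drop_types("OldPoll"), rename_type("LegacyEvent", "Event")]
print(list(transform(exported, steps)))
# [{'@type': 'Document', 'id': 'about'}, {'@type': 'Event', 'id': 'meetup'}]
```

Because the steps run outside the source instance and never mutate their input, the same export can be re-transformed until the result looks right, which fits the "switch over only once you are sure" approach.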
Transmogrifier is old, we won't talk about it now. collective.exportimport: written by Philip Bauer mostly. There is an @@export_all view, and then @@import_all to import it.
collective.transmute is a new tool. This is made to transform data from collective.exportimport to the plone.exportimport format. Potentially it can be used for other migrations as well. Highly customizable and extensible. Tested by pytest. It is standalone software with a nice CLI. No dependency on Plone packages.
Another tool: collective.html2blocks. This is a lightweight Python replacement for the JavaScript Blocks conversion tool. This is extensible and tested.
Lastly plone.exportimport. This is a stripped down version of collective.exportimport. This focuses on extract and load. No transforms. So this is best suited for importing to a Plone site with the same version.
collective.transmute is in alpha, with a 1.0.0 release probably in the next few weeks. It is still missing quite a bit of documentation, and test coverage needs some improvement. You can contribute with PRs, issues, docs.
22 Jan 2026 9:43am GMT
Maurits van Rees: Fred van Dijk: Behind the screens: the state and direction of Plone community IT

This is a talk I did not want to give.
I am team lead of the Plone Admin team, and work at kitconcept.
The current state: see the keynotes, lots happening on the frontend. Good.
The current state of our IT: very troubling and daunting.
This is not a 'blame game'. But at this conference, focusing on resources and people should be a first priority. We are a real volunteer organisation, nobody is pushing anybody around. That is a strength, but also a weakness. We also see that in the Admin team.
The Admin team is 4 senior Plonistas as all-round admins, 2 release managers, 2 CI/CD experts, and 3 former board members, everyone overburdened with work. We had all kinds of plans for this year, but we have mostly been putting out fires.
We are a volunteer organisation, and don't have a big company behind us that can throw money at the problems. Strength and weakness. In all society it is a problem that volunteers are decreasing.
Root causes:
- We failed to scale down in time in our IT landscape and usage.
- We have no clean role descriptions, team descriptions, we can't ask a minimum effort per week or month.
- The trend is more communication channels, platforms to join and promote yourself, apps to use.
Overview of what we have to keep running as admin team:
- Support main development process: github, CI/CD, Jenkins main and runners, dist.plone.org.
- Main communication, documentation: plone.org, docs.plone.org, training.plone.org, conf and country sites, Matomo.
- Community office automation: Google Docs, Workspace, Quaive, Signal, Slack
- Broader: Discourse and Discord
The first two are really needed, the second we already have some problems with.
Some services are self hosted, but also a lot of SAAS services/platforms. In all, it is quite a bit.
The Admin team does not officially support all of these, but it does provide fallback support. It is too much for the current team.
There are plans for what we can improve in the short term. Thank you to a lot of people that I have already talked to about this. 3 areas: GitHub setup and config, Google Workspace, user management.
On GitHub we have a sponsored OSS plan. So we have extra features for free, but it is not nearly enough. User management: it is hard to get people out. You can't contact your members directly; e-mail has been removed, for privacy. Features get added on GitHub, and there is no complete changelog.
Challenge on GitHub: we have public repositories, but we also have our deployments in there. The only really secure option would be private repositories; otherwise the danger is that credentials or secrets could get stolen. Every developer with access becomes an attack vector. Auditing is available for only 6 months. A simple question like "who has been active for the last 2 years?" can't be answered.
Some actionable items on GitHub:
- We will separate the contributor agreement check from the organisation membership. We create a hidden team for those who signed, and use that in the check.
- Cleanup users, use Contributors team, Developers
- Active members: check who has contributed the last years.
- There have been security incidents. Someone accidentally removed a few repositories. Someone's account got hacked, luckily discovered within a few hours, and some actions had already been taken.
- More fine grained teams to control repository access.
- Use of GitHub Discussions for some central communication of changes.
- Use project management better.
- The elephant in the room that we have practice on this year, and ongoing: the Collective organisation. This was free for all, very nice, but the development world is not a nice and safe place anymore. So we already needed to lock down some things there.
- Keep deployments and the secrets all out of GitHub, so no secrets can be stolen.
Google Workspace:
- We are dependent on this.
- No user management. Admins have had access because they were on the board, but they kept access after leaving the board. So remove most inactive users.
- Spam and moderation issues
- We could move to Google docs for all kinds of things. Use Google workspace drives for all things. But the Drive UI is a mess, so docs can be in your personal account without you realizing it.
User management:
- We need separate standalone user management, but implementation is not clear.
- We cannot contact our members one on one.
Oh yes, Plone websites:
- upgrade plone.org
- self preservation: I know what needs to be done, and can do it, but have no time, focusing on the previous points instead.
22 Jan 2026 9:43am GMT
Django community aggregator: Community blog posts
Python Leiden meetup: PostgreSQL + Python in 2026 -- Aleksandr Dinu
(One of my summaries of the Python Leiden meetup in Leiden, NL).
He's going to revisit common gotchas of Python ORM usage. Plus some Postgresql-specific tricks.
ORM (object relational mappers) define tables, columns etc using Python concepts: classes, attributes and methods. In your software, you work with objects instead of rows. They can help with database schema management (migrations and so). It looks like this:
from django.db import models


class Question(models.Model):
    question = models.CharField(...)
    answer = models.CharField(...)
You often have Python "context managers" for database sessions.
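The sketch below illustrates that pattern. It uses the standard library's sqlite3 in place of PostgreSQL and an ORM, a deliberate swap to keep it self-contained. `session` is a hypothetical helper, not an ORM API. It commits when the block finishes normally, rolls back if the block raises, and always closes the connection.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def session(path=":memory:"):
    """Open a connection; commit on success, roll back on error, always close."""
    conn = sqlite3.connect(path)
    try:
        yield conn
        conn.commit()
    except BaseException:
        conn.rollback()
        raise
    finally:
        conn.close()
```

Used as `with session("app.db") as conn: conn.execute(...)`. Either all writes in the block are committed or none are. ORM session objects (for example SQLAlchemy's `Session`) offer the same idea through their own context-manager support.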
ORMs are handy, but you must be aware of what you're fetching:
# Bad: grabs all objects and then takes the length using Python:
questions_count = len(Question.objects.all())
# Good: let the database do it,
# the code does the equivalent of "SELECT COUNT(*)":
questions_count = Question.objects.all().count()
Relational databases allow 1:M and N:M relations. You use them with JOIN in SQL. If you use an ORM, make sure you use the database to follow the relations. If you first grab the first set of objects and then grab the second kind of objects with python, your code will be much slower.
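The following sketch shows the difference. Stdlib sqlite3 stands in for PostgreSQL and raw SQL stands in for the ORM, and the `author`/`book` schema is invented for the example. `set_trace_callback` records every statement sent to the database, which makes the query count visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES author(id));
""")
conn.executemany("INSERT INTO author VALUES (?, ?)",
                 [(i, f"author {i}") for i in range(100)])
conn.executemany("INSERT INTO book (title, author_id) VALUES (?, ?)",
                 [(f"book {i}", i % 100) for i in range(300)])

statements = []
conn.set_trace_callback(statements.append)  # record every SQL statement run

# N+1: one query for the books, then one extra query per book for its author
books = conn.execute("SELECT title, author_id FROM book").fetchall()
n_plus_one = [
    (title, conn.execute("SELECT name FROM author WHERE id = ?", (author_id,)).fetchone()[0])
    for title, author_id in books
]
n_plus_one_queries = len(statements)

statements.clear()
# JOIN: the database follows the relation in a single query
joined = conn.execute(
    "SELECT book.title, author.name FROM book JOIN author ON author.id = book.author_id"
).fetchall()
join_queries = len(statements)

print(n_plus_one_queries, join_queries)  # 301 1
```

Both approaches return the same rows. In Django, `select_related()` is the usual way to ask the ORM for the JOIN version of a foreign-key lookup.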
"Migrations" generated by your ORM to move from one version of your schema to the next are real handy. But not all SQL concepts can be expressed in an ORM. Custom types, stored procedures. You have to handle them yourselves. You can get undesired behaviour as specific database versions can take a long time rebuilding after a change.
Migrations are nice, but they can lead to other problems from a database maintainer's point of view, like the performance suddenly dropping. And optimising is hard as often you don't know which server is connecting how much and also you don't know what is queried. Some solutions for postgresql:
- log_line_prefix = '%a %u %d' to show who is connecting to which database.
- log_min_duration_statement = 1000 logs every query taking more than 1000ms.
- log_lock_waits = on for feedback on blocking operations (like migrations).
- Handy: feedback on the number of queries being done, as simple programming errors can translate into lots of small queries instead of one faster bigger one.
If you've found a slow query, run it with EXPLAIN (ANALYZE, BUFFERS) the-query. BUFFERS tells you how many 8 kB pages the server uses for your query (and whether those were found in memory or read from disk). This is so useful that it was made the default in PostgreSQL 18.
Some tools:
- RegreSQL: performance regression testing. You feed it a list of queries that you worry about. It will store how those queries are executed and compare it with the new version of your code and warn you when one of those queries suddenly takes a lot more time.
- Squawk: tells you (in CI, like github actions) which migrations are backward-incompatible or that might take a long time.
- You can look at one of the branching tools: aimed at getting access to production databases for testing. Like running your migration against a "branch"/copy of production. There are several tricks that are used, like filesystem layers. "pg_branch" and "pgcow" are examples. Several DB-as-a-service products also provide it (Databricks Lakebase, Neon, Heroku, Postgres.ai).
22 Jan 2026 5:00am GMT
Python Leiden meetup: PR vs ROC curves, which to use - Sultan K. Imangaliyev
(One of my summaries of the Python Leiden meetup in Leiden, NL).
Precision-recall (PR) versus Receiver Operating Characteristics (ROC) curves: which one to use if data is imbalanced?
Imbalanced data: for instance when you're investigating rare diseases. "Rare" means few people have them. So if you have data, most of the data will be of healthy people, there's a huge imbalance in the data.
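Sensitivity versus specificity: high sensitivity means you find most of the sick people (few false negatives); high specificity means you correctly clear most of the healthy people (few false positives). Sensitivity/specificity looks a bit like precision/recall: sensitivity is the same thing as recall, but precision is not the same as specificity.
- Sensitivity: true positive rate.
- Specificity: true negative rate (1 minus the false positive rate).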
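A small worked example makes the definitions concrete. It uses made-up numbers for a rare condition: 1000 people, 10 of them sick. The counts are hypothetical; the formulas are the standard ones.

```python
def rates(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Standard rates computed from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate, a.k.a. recall
        "specificity": tn / (tn + fp),  # true negative rate
        "fpr": fp / (fp + tn),          # false positive rate = 1 - specificity
        "precision": tp / (tp + fp),    # share of flagged people who are sick
    }


# Hypothetical screening of 1000 people: 10 sick (8 caught, 2 missed),
# 990 healthy (50 wrongly flagged, 940 correctly cleared).
for name, value in rates(tp=8, fn=2, fp=50, tn=940).items():
    print(f"{name}: {value:.3f}")
# sensitivity: 0.800
# specificity: 0.949
# fpr: 0.051
# precision: 0.138
```

Specificity and the false positive rate look excellent. Yet fewer than 14% of the people flagged are actually sick, because the 990 healthy people swamp the 10 sick ones.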
If you classify, you can classify immediately into healthy/sick, but you can also use a probabilistic classifier which returns a chance (percentage) that someone can be classified as sick. You can then tweak which threshold you want to use: how sensitive and/or specific do you want to be?
PR and ROC curves (a curve here is a graph of how two rates trade off as the threshold changes: ROC plots the true positive rate against the false positive rate, PR plots precision against recall) are two ways of measuring/visualising that trade-off. He showed some data: if the data is imbalanced, PR is much better at evaluating your model. He compared balanced and imbalanced data with ROC and there was hardly a change in the curve.
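The curves come from sweeping the classification threshold. The pure-Python sketch below computes one ROC point (FPR, TPR) and one PR point (recall, precision) per threshold. The scores and labels are invented for illustration, with 2 positives among 20 cases. scikit-learn's `roc_curve` and `precision_recall_curve` do the same job more thoroughly.

```python
def curve_points(scores, labels, thresholds):
    """For each threshold, treat score >= threshold as positive and return
    (threshold, fpr, tpr, precision) tuples."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        tpr = tp / positives  # recall / sensitivity
        fpr = fp / negatives
        precision = tp / (tp + fp) if tp + fp else 1.0  # convention: nothing flagged
        points.append((t, fpr, tpr, precision))
    return points


# Hypothetical imbalanced data: 2 sick (label 1), 18 healthy (label 0).
scores = [0.9, 0.6, 0.7, 0.5] + [0.1] * 16
labels = [1, 1, 0, 0] + [0] * 16

for t, fpr, tpr, precision in curve_points(scores, labels, [0.8, 0.55, 0.3]):
    print(f"t={t:.2f}  ROC: fpr={fpr:.3f} tpr={tpr:.2f}  "
          f"PR: recall={tpr:.2f} precision={precision:.2f}")
# t=0.80  ROC: fpr=0.000 tpr=0.50  PR: recall=0.50 precision=1.00
# t=0.55  ROC: fpr=0.056 tpr=1.00  PR: recall=1.00 precision=0.67
# t=0.30  ROC: fpr=0.111 tpr=1.00  PR: recall=1.00 precision=0.50
```

Going from one false positive to two barely moves the ROC point, since there are 18 negatives to divide by, but it drops precision from 0.67 to 0.50. This is why PR curves react more strongly to imbalanced data.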
He used scikit-learn for his data evaluations and demos.
22 Jan 2026 5:00am GMT
20 Jan 2026
Django community aggregator: Community blog posts
Django Icon packs with template partials
As I have been building out the startup (launching very very soon), I was struck by the pattern that I mentioned last year of Claude grouping lots of partials into a single file. I think we are almost at the point of being able to define simple components in a template without a templatetag. My goal here is discovering how much we can do with the vanilla Django Template language with the recent additions. I still need to work on the specifics, but currently I have a file called icons.html in my project which has lots of the following:
{# Heroicons (outline, 24px) as template partials #}
{# Usage: {% include "icons/heroicons.html#wallet" %} #}
{% partialdef wallet %}
<svg class="size-5 text-gray-500 group-has-[:checked]:text-white"
viewBox="0 0 24 24"
fill="none"
stroke="currentColor"
stroke-width="1.5"
stroke-linecap="round"
stroke-linejoin="round">
<path d="M21 12C21 10.7574 19.9926 9.75 18.75 9.75H15C15 11.4069 13.6569 12.75 12 12.75C10.3431 12.75 9 11.4069 9 9.75H5.25C4.00736 9.75 3 10.7574 3 12M21 12V18C21 19.2426 19.9926 20.25 18.75 20.25H5.25C4.00736 20.25 3 19.2426 3 18V12M21 12V9M3 12V9M21 9C21 7.75736 19.9926 6.75 18.75 6.75H5.25C4.00736 6.75 3 7.75736 3 9M21 9V6C21 4.75736 19.9926 3.75 18.75 3.75H5.25C4.00736 3.75 3 4.75736 3 6V9" />
</svg>
{% endpartialdef wallet %}
{% partialdef percent-badge %}
<svg class="size-5 text-gray-500 group-has-[:checked]:text-white">
....
</svg>
{% endpartialdef percent-badge %}
{# etc.. #}
The general problem here is that quite a few things are hardcoded, so having different coloured icons would mean repeating each partial, which is obviously not ideal. I have a proof of concept that would allow this kind of repeatability. It looks like this:
{% partialdef star-solid %}
<path fill-rule="evenodd" d="M10.788 3.21c.448-1.077 1.976-1.077 2.424 0l2.082 5.006 5.404.434c1.164.093 1.636 1.545.749 2.305l-4.117 3.527 1.257 5.273c.271 1.136-.964 2.033-1.96 1.425L12 18.354 7.373 21.18c-.996.608-2.231-.29-1.96-1.425l1.257-5.273-4.117-3.527c-.887-.76-.415-2.212.749-2.305l5.404-.434 2.082-5.005Z" clip-rule="evenodd" />
{% endpartialdef star-solid %}
{% partialdef credit-card %}
<path stroke-linecap="round" stroke-linejoin="round" d="M2.25 8.25h19.5M2.25 9h19.5m-16.5 5.25h6m-6 2.25h3m-3.75 3h15a2.25 2.25 0 0 0 2.25-2.25V6.75A2.25 2.25 0 0 0 19.5 4.5h-15a2.25 2.25 0 0 0-2.25 2.25v10.5A2.25 2.25 0 0 0 4.5 19.5Z" />
{% endpartialdef credit-card %}
{% partialdef icon %}
<svg class="{{ css_classes }}" fill="none" stroke-width="1.5"
stroke="currentColor" viewBox="0 0 24 24">
{% with icon_path="icons/heroicons.html#"|add:icon_name %}
{% include icon_path %}
{% endwith %}
</svg>
{% endpartialdef icon %}
and is used like so:
{% include "icons/heroicons.html#icon" with icon_name='credit-card' css_classes='size-10' %}
There are a couple of issues that need to be figured out. The first is being able to specify different HTML attributes on the svg tag, such as fill, stroke, stroke-width, etc. I don't have a solution, but I do know third-party libraries have template tags to handle this. The second is the hack to dynamically include a partial in the same file. Ideally, in the icon partial, we could specify the partial directly with a variable instead of using the with and include tags.
I'm going to keep playing with this, but I like the appeal of having a single drop in template which then has a whole template pack available to a project. I could easily see a package which then has a file per icon pack (heroicons, font awesome, flaticon etc).
20 Jan 2026 6:00am GMT
05 Jan 2026
Planet Twisted
Glyph Lefkowitz: How To Argue With Me About AI, If You Must
As you already know if you've read any of this blog in the last few years, I am a somewhat reluctant - but nevertheless quite staunch - critic of LLMs. This means that I have enthusiasts of varying degrees sometimes taking issue with my stance.
It seems that I am not going to get away from discussions, and, let's be honest, pretty intense arguments about "AI" any time soon. These arguments are starting to make me quite upset. So it might be time to set some rules of engagement.
I've written about all of these before at greater length, but this is a short post because it's not about the technology or making a broader point, it's about me. These are rules for engaging with me, personally, on this topic. Others are welcome to adopt these rules if they so wish but I am not encouraging anyone to do so.
Thus, I've made this post as short as I can so everyone interested in engaging can read the whole thing. If you can't make it through to the end, then please just follow Rule Zero.
Rule Zero: Maybe Don't
You are welcome to ignore me. You can think my take is stupid and I can think yours is. We don't have to get into an Internet Fight about it; we can even remain friends. You do not need to instigate an argument with me at all, if you think that my analysis is so bad that it doesn't require rebutting.
Rule One: No 'Just'
As I explained in a post with perhaps the least-predictive title I've ever written, "I Think I'm Done Thinking About genAI For Now", I've already heard a bunch of bad arguments. Don't tell me to 'just' use a better model, use an agentic tool, use a more recent version, or use some prompting trick that you personally believe works better. If you skim my work and think that I must not have deeply researched anything or read about it because you don't like my conclusion, that is wrong.
Rule Two: No 'Look At This Cool Thing'
Purely as a productivity tool, I have had a terrible experience with genAI. Perhaps you have had a great one. Neat. That's great for you. As I explained at great length in "The Futzing Fraction", my concern with generative AI is that I believe it is probably a net negative impact on productivity, based on both my experience and plenty of citations. Go check out the copious footnotes if you're interested in more detail.
Therefore, I have already acknowledged that you can get an LLM to do various impressive, cool things, sometimes. If I tell you that you will, on average, lose money betting on a slot machine, a picture of a slot machine hitting a jackpot is not evidence against my position.
Rule Two And A Half: Engage In Metacognition
I specifically didn't title the previous rule "no anecdotes" because data beyond anecdotes may be extremely expensive to produce. I don't want to say you can never talk to me unless you're doing a randomized controlled trial. However, if you are going to tell me an anecdote about the way that you're using an LLM, I am interested in hearing how you are compensating for the well-documented biases that LLM use tends to induce. Try to measure what you can.
Rule Three: Do Not Cite The Deep Magic To Me
As I explained in "A Grand Unified Theory of the AI Hype Cycle", I already know quite a bit of history of the "AI" label. If you are tempted to tell me something about how "AI" is really such a broad field, and it doesn't just mean LLMs, especially if you are trying to launder the reputation of LLMs under the banner of jumbling them together with other things that have been called "AI", I assure you that this will not be convincing to me.
Rule Four: Ethics Are Not Optional
I have made several arguments in my previous writing: there are ethical arguments, efficacy arguments, structuralist arguments, efficiency arguments and aesthetic arguments.
I am happy to, for the purposes of a good-faith discussion, focus on a specific set of concerns or an individual point that you want to make where you think I got something wrong. If you convince me that I am entirely incorrect about the effectiveness or predictability of LLMs in general or of a specific LLM product, you don't need to make a comprehensive argument about whether one should use the technology overall. I will even assume that you have your own ethical arguments.
However, if you scoff at the idea that one should have any ethical boundaries at all, and think that there's no reason to care about the overall utilitarian impact of this technology, that it's worth using no matter what else it does as long as it makes you 5% better at your job, that's sociopath behavior.
This includes extreme whataboutism regarding things like the water use of datacenters, other elements of the surveillance technology stack, and so on.
Consequences
These are rules, once again, just for engaging with me. I have no particular power to enact broader sanctions upon you, nor would I be inclined to do so if I could. However, if you can't stay within these basic parameters and you insist upon continuing to direct messages to me about this topic, I will summarily block you with no warning, on mastodon, email, GitHub, IRC, or wherever else you're choosing to do that. This is for your benefit as well: such a discussion will not be a productive use of either of our time.
05 Jan 2026 5:22am GMT
02 Jan 2026
Planet Twisted
Glyph Lefkowitz: The Next Thing Will Not Be Big
The dawning of a new year is an opportune moment to contemplate what has transpired in the old year, and consider what is likely to happen in the new one.
Today, I'd like to contemplate that contemplation itself.
The 20th century was an era characterized by rapidly accelerating change in technology and industry, creating shorter and shorter cultural cycles of changes in lifestyles. Thus far, the 21st century seems to be following that trend, at least in its recently concluded first quarter.
The early half of the twentieth century saw the massive disruption caused by electrification, radio, motion pictures, and then television.
In 1971, Intel poured gasoline on that fire by releasing the 4004, a microchip generally recognized as the first general-purpose microprocessor. Popular innovations rapidly followed: the computerized cash register, the personal computer, credit cards, cellular phones, text messaging, the Internet, the web, online games, mass surveillance, app stores, social media.
These innovations have arrived faster than previous generations, but also, they have crossed a crucial threshold: that of the human lifespan.
While the entire second millennium A.D. has been characterized by a gradually accelerating rate of technological and social change - the printing press and the industrial revolution were no slouches, in terms of changing society, and those predate the 20th century - most of those changes had the benefit of unfolding throughout the course of a generation or so.
Which means that any individual person in any given century up to the 20th might remember one major world-altering social shift within their lifetime, not five to ten of them. The diversity of human experience is vast, but most people would not expect that the defining technology of their lifetime was merely the latest in a progression of predictable civilization-shattering marvels.
Along with each of these successive generations of technology, we minted a new generation of industry titans. Westinghouse, Carnegie, Sarnoff, Edison, Ford, Hughes, Gates, Jobs, Zuckerberg, Musk. Not just individual rich people, but entire new classes of rich people that did not exist before. "Radio DJ", "Movie Star", "Rock Star", "Dot Com Founder", were all new paths to wealth opened (and closed) by specific technologies. While most of these people did come from at least some level of generational wealth, they no longer came from a literal hereditary aristocracy.
To describe this new feeling of constant acceleration, a new phrase was coined: "The Next Big Thing". In addition to denoting that some Thing was coming and that it would be Big (i.e.: that it would change a lot about our lives), this phrase also carries the strong implication that such a Thing would be a product. Not a development in social relationships or a shift in cultural values, but some new and amazing form of conveying salted meat paste or what-have-you, that would make whatever lucky tinkerer who stumbled into it into a billionaire - along with any friends and family lucky enough to believe in their vision and get in on the ground floor with an investment.
In the latter part of the 20th century, our entire model of capital allocation shifted to account for this widespread belief. No longer were mega-businesses built by bank loans, stock issuances, and reinvestment of profit; the new model was "Venture Capital". Venture capital is a model of capital allocation explicitly predicated on the idea that carefully considering each bet on a likely-to-succeed business and reducing one's risk was a waste of time, because the return on the equity from the Next Big Thing would be so disproportionately huge - 10x, 100x, 1000x - that one could afford to make at least 10 bad bets for each good one, and still come out ahead.
The biggest risk was in missing the deal, not in giving a bunch of money to a scam. Thus, value investing and focus on fundamentals have been broadly disregarded in favor of the pursuit of the Next Big Thing.
If Americans of the twentieth century were temporarily embarrassed millionaires, those of the twenty-first are all temporarily embarrassed FAANG CEOs.
The predicament that this tendency leaves us in today is that the world is increasingly run by generations - GenX and Millennials - with the shared experience that the computer industry, either hardware or software, would produce some radical innovation every few years. We assume that to be true.
But all things change, even change itself, and that industry is beginning to slow down. Physically, transistor density is starting to brush up against physical limits. Economically, most people are drowning in more compute power than they know what to do with anyway. Users already have most of what they need from the Internet.
The big new feature in every operating system is a bunch of useless junk nobody really wants and is seeing remarkably little uptake. Social media and smartphones changed the world, true, but… those are both innovations from 2008. They're just not new any more.
So we are all - collectively, culturally - looking for the Next Big Thing, and we keep not finding it.
It wasn't 3D printing. It wasn't crowdfunding. It wasn't smart watches. It wasn't VR. It wasn't the Metaverse, it wasn't Bitcoin, it wasn't NFTs1.
It's also not AI, but this is why so many people assume that it will be AI. Because it's got to be something, right? If it's got to be something then AI is as good a guess as anything else right now.
The fact is, our lifetimes have been an extreme anomaly. Things like the Internet used to come along every thousand years or so, and while we might expect that the pace will stay a bit higher than that, it is not reasonable to expect that something new like "personal computers" or "the Internet"3 will arrive again.
We are not going to get rich by getting in on the ground floor of the next Apple or the next Google because the next Apple and the next Google are Apple and Google. The industry is maturing. Software technology, computer technology, and internet technology are all maturing.
There Will Be Next Things
Research and development is happening in all fields all the time. Amazing new developments quietly and regularly occur in pharmaceuticals and in materials science. But these are not predictable. They do not inhabit the public consciousness until they've already happened, and they are rarely so profound and transformative that they change everybody's life.
There will even be new things in the computer industry, both software and hardware. Foldable phones do address a real problem (I wish the screen were even bigger but I don't want to carry around such a big device), and would probably be more popular if they got the costs under control. One day somebody's going to crack the problem of volumetric displays, probably. Some VR product will probably, eventually, hit a more realistic price/performance ratio where the niche will expand at least a little more.
Maybe there will even be something genuinely useful, which is recognizably adjacent to the current "AI" fad, but if it is, it will be some new development that we haven't seen yet. If current AI technology were sufficient to drive some interesting product, it would already be doing it, not using marketing disguised as science to conceal diminishing returns on current investments.
But They Will Not Be Big
The impulse to find the One Big Thing that will dominate the next five years is a fool's errand. Incremental gains are diminishing across the board. The markets for time and attention2 are largely saturated. There's no need for another streaming service if 100% of your leisure time is already committed to TikTok, YouTube and Netflix; famously, Netflix has already considered sleep its primary competitor for close to a decade - years before the pandemic.
Those rare tech markets which aren't saturated are suffering from pedestrian economic problems like wealth inequality, not technological bottlenecks.
For example, the thing preventing the development of a robot that can do your laundry and your dishes without your input is not necessarily that we couldn't build something like that, but that most households just can't afford it without wage growth catching up to productivity growth. It doesn't make sense for anyone to commit to the substantial R&D investment that such a thing would take, if the market doesn't exist because the average worker isn't paid enough to afford it on top of all the other tech which is already required to exist in society.
The projected income from the tiny, wealthy sliver of the population who could pay for the hardware, cannot justify an investment in the software past a fake version remotely operated by workers in the global south, only made possible by Internet wage arbitrage, i.e. a more palatable, modern version of indentured servitude.
Even if we were to accept the premise of an actually-"AI" version of this, that is still just a wish that ChatGPT could somehow improve enough behind the scenes to replace that worker, not any substantive investment in a novel, proprietary-to-the-chores-robot software system which could reliably perform specific functions.
What, Then?
The expectation for, and lack of, a "big thing" is a big problem. There are others who could describe its economic, political, and financial dimensions better than I can. So then let me speak to my expertise and my audience: open source software developers.
When I began my own involvement with open source, a big part of the draw for me was participating in a low-cost (to the corporate developer) but high-value (to society at large) positive externality. None of my employers would ever have cared about many of the applications for which Twisted forms a core bit of infrastructure; nor would I have been able to predict those applications' existence. Yet, it is nice to have contributed to their development, even a little bit.
However, it's not actually a positive externality if the public at large can't directly benefit from it.
When real world-changing, disruptive developments are occurring, the bean-counters are not watching positive externalities too closely. As we discovered with many of the other benefits that temporarily accrued to labor in the tech economy, Open Source that is usable by individuals and small companies may have been a ZIRP. If you know you're gonna make a billion dollars you're not going to worry about giving away a few hundred thousand here and there.
When gains are smaller and harder to realize, and margins are starting to get squeezed, it's harder to justify the investment in vaguely good vibes.
But this, itself, is not a call to action. I doubt very much that anyone reading this can do anything about the macroeconomic reality of higher interest rates. The technological reality of "development is happening slower" is inherently something that you can't change on purpose.
However, what we can do is to be aware of this trend in our own work.
Fight Scale Creep
It seems to me that more and more open source infrastructure projects are tools for hyper-scale application development, only relevant to massive cloud companies. This is just a subjective assessment on my part - I'm not sure what tools even exist today to measure this empirically - but I remember a big part of the open source community when I was younger being things like Inkscape, Themes.Org and Slashdot, not React, Docker Hub and Hacker News.
This is not to say that the hobbyist world no longer exists. There is of course a ton of stuff going on with Raspberry Pi, Home Assistant, OwnCloud, and so on. If anything there's a bit of a resurgence of self-hosting. But the interests of self-hosters and corporate developers are growing apart; there seems to be far less of a beneficial overflow from corporate infrastructure projects into these enthusiast or prosumer communities.
This is the concrete call to action: if you are employed in any capacity as an open source maintainer, dedicate more energy to medium- or small-scale open source projects.
If your assumption is that you will eventually reach a hyper-scale inflection point, then mimicking Facebook and Netflix is likely to be a good idea. However, if we can all admit to ourselves that we're not going to achieve a trillion-dollar valuation and a hundred thousand engineer headcount, we can begin to consider ways to make our Next Thing a bit smaller, and to accommodate the world as it is rather than as we wish it would be.
Be Prepared to Scale Down
Here are some design guidelines you might consider, for just about any open source project, particularly infrastructure ones:
-
Don't assume that your software can sustain an arbitrarily large fixed overhead because "you just pay that cost once" and you're going to be running a billion instances so it will always amortize; maybe you're only going to be running ten.
-
Remember that such fixed overhead includes not just CPU, RAM, and filesystem storage, but also the learning curve for developers. Front-loading a massive amount of conceptual complexity to accommodate the problems of hyper-scalers is a common mistake. Try to smooth out these complexities and introduce them only when necessary.
-
Test your code on edge devices. This means supporting Windows and macOS, and even Android and iOS. If you want your tool to help empower individual users, you will need to meet them where they are, which is not on an EC2 instance.
-
This includes considering Desktop Linux as a platform, as distinct from Server Linux; while the two certainly have plenty in common, they differ in some details. Consider the highly specific example of secret storage: if you are writing something that intends to live in a cloud environment, and you need to configure it with a secret, you will probably want to provide it via a text file or an environment variable. By contrast, if you want this same code to run on a desktop system, your users will expect you to support the Secret Service. This will likely only require a few lines of code to accommodate, but it is a massive difference to the user experience.
-
Don't rely on LLMs remaining cheap or free. If you have LLM-related features4, make sure that they are sufficiently severable from the rest of your offering that if ChatGPT starts costing $1000 a month, your tool doesn't break completely. Similarly, do not require that your users have easy access to half a terabyte of VRAM and a rack full of 5090s in order to run a local model.
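To make the secret-storage point concrete, here is a minimal Python sketch, assuming an application that accepts an environment variable on servers and falls back to the desktop credential store otherwise. The optional third-party `keyring` package is one real way to reach the Secret Service on desktop Linux (and the native stores on macOS and Windows); the variable name `MYAPP_API_TOKEN`, the service name `myapp`, and both function names are hypothetical.

```python
import os
from typing import Callable, Mapping, Optional

# A desktop lookup takes (service, username) and returns the secret or None.
DesktopLookup = Callable[[str, str], Optional[str]]


def keyring_lookup(service: str, username: str) -> Optional[str]:
    """Ask the optional `keyring` package for a secret, if it is installed."""
    try:
        import keyring  # optional: server deployments never need it
    except ImportError:
        return None
    return keyring.get_password(service, username)


def load_secret(
    env_var: str,
    service: str,
    username: str,
    environ: Mapping[str, str] = os.environ,
    desktop_lookup: DesktopLookup = keyring_lookup,
) -> str:
    """Prefer an environment variable (cloud style), else the desktop store."""
    value = environ.get(env_var)
    if value:
        return value
    value = desktop_lookup(service, username)
    if value:
        return value
    raise LookupError(
        f"secret not found: set ${env_var} or store it for {service!r}/{username!r}"
    )
```

A call such as `load_secret("MYAPP_API_TOKEN", "myapp", "alice")` would work unchanged on an EC2 instance and on a laptop. The lookup is injectable so that tests, and servers without a keychain, never touch a real credential store.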
Even if you were going to scale up to infinity, the ability to scale down and consider smaller deployments means that you can run more comfortably on, for example, a developer's laptop. So even if you can't convince your employer that this is where the economy and the future of technology in our lifetimes is going, it can be easy enough to justify this sort of design shift, particularly as individual choices. Make your onboarding cheaper, your development feedback loops tighter, and your systems generally more resilient to economic headwinds.
So, please design your open source libraries, applications, and services to run on smaller devices, with less complexity. It will be worth your time as well as your users'.
But if you can fix the whole wealth inequality thing, do that first.
Acknowledgments
Thank you to my patrons who are supporting my writing on this blog. If you like what you've read here and you'd like to read more of it, or you'd like to support my various open-source endeavors, you can support my work as a sponsor!
-
These sorts of lists are pretty funny reads, in retrospect. ↩
-
Which is to say, "distraction". ↩
-
... or even their lesser-but-still-profound aftershocks like "Social Media", "Smartphones", or "On-Demand Streaming Video" ... secondary manifestations of the underlying innovation of a packet-switched global digital network ... ↩
-
My preference would of course be that you just didn't have such features at all, but perhaps even if you agree with me, you are part of an organization with some mandate to implement LLM stuff. Just try not to wrap the chain of this anchor all the way around your code's neck. ↩
02 Jan 2026 1:59am GMT
11 Nov 2025
Planet Twisted
Glyph Lefkowitz: The “Dependency Cutout” Workflow Pattern, Part I
Tell me if you've heard this one before.
You're working on an application. Let's call it "FooApp". FooApp has a dependency on an open source library, let's call it "LibBar". You find a bug in LibBar that affects FooApp.
To envisage the best possible version of this scenario, let's say you actively like LibBar, both technically and socially. You've contributed to it in the past. But this bug is causing production issues in FooApp today, and LibBar's release schedule is quarterly. FooApp is your job; LibBar is (at best) your hobby. Blocking on the full upstream contribution cycle and waiting for a release is an absolute non-starter.
What do you do?
There are a few common reactions to this type of scenario, all of which are bad options.
I will enumerate them specifically here, because I suspect that some of them may resonate with many readers:
-
Find an alternative to LibBar, and switch to it.
This is a bad idea because replacing a core infrastructure component could be extremely expensive.
-
Vendor LibBar into your codebase and fix your vendored version.
This is a bad idea because carrying this one fix now requires you to maintain all the tooling associated with a monorepo1: you have to be able to start pulling in new versions from LibBar regularly, reconcile your changes even though you now have a separate version history on your imported version, and so on.
-
Monkey-patch LibBar to include your fix.
This is a bad idea because you are now extremely tightly coupled to a specific version of LibBar. By modifying LibBar internally like this, you're inherently violating its compatibility contract, in a way which is going to be extremely difficult to test. You can test this change, of course, but as LibBar changes, you will need to replicate any relevant portions of its test suite (which may be its entire test suite) in FooApp. Lots of potential duplication of effort there.
-
Implement a workaround in your own code, rather than fixing it.
This is a bad idea because you are distorting the responsibility for correct behavior. LibBar is supposed to do LibBar's job, and unless you have a full wrapper for it in your own codebase, other engineers (including "yourself, personally") might later forget to go through the alternate, workaround codepath, and invoke the buggy LibBar behavior again in some new place.
-
Implement the fix upstream in LibBar anyway, because that's the Right Thing To Do, and burn credibility with management while you anxiously wait for a release with the bug in production.
This is a bad idea because you are betraying your users - by allowing the buggy behavior to persist - for the workflow convenience of your dependency providers. Your users are probably giving you money, and trusting you with their data. This means you have both ethical and economic obligations to consider their interests.
As much as it's nice to participate in the open source community and take on an appropriate level of burden to maintain the commons, this cannot sustainably be at the explicit expense of the population you serve directly.
Even if we only care about the open source maintainers here, there's still a problem: as you are likely to come under immediate pressure to ship your changes, you will inevitably relay at least a bit of that stress to the maintainers. Even if you try to be exceedingly polite, the maintainers will know that you are coming under fire for not having shipped the fix yet, and are likely to feel an even greater burden of obligation to ship your code fast.
Much as it's good to contribute the fix, it's not great to put this on the maintainers.
The respective incentive structures of software development - specifically, of corporate application development and open source infrastructure development - make options 1-4 very common.
On the corporate / application side, these issues are:
-
it's difficult for corporate developers to get clearance to spend even small amounts of their work hours on upstream open source projects, but clearance to spend time on the project they actually work on is implicit. If it takes 3 hours of wrangling with Legal2 and 3 hours of implementation work to fix the issue in LibBar, but 0 hours of wrangling with Legal and 40 hours of implementation work in FooApp, a FooApp developer will often perceive it as "easier" to fix the issue downstream.
-
it's difficult for corporate developers to get clearance from management to spend even small amounts of money sponsoring upstream reviewers, so even if they can find the time to contribute the fix, chances are high that it will remain stuck in review unless they are personally well-integrated members of the LibBar development team already.
-
even assuming there's zero pressure whatsoever to avoid open sourcing the upstream changes, there's still the fact inherent to any development team that FooApp's developers will be more familiar with FooApp's codebase and development processes than they are with LibBar's. It's just easier to work there, even if all other things are equal.
-
systems for tracking risk from open source dependencies often lack visibility into vendoring, particularly if you're doing a hybrid approach and only vendoring a few things to address work in progress, rather than a comprehensive and disciplined approach to a monorepo. If you fully absorb a vendored dependency and then modify it, Dependabot isn't going to tell you that a new version is available any more, because it won't be present in your dependency list. Organizationally this is bad of course but from the perspective of an individual developer this manifests mostly as fewer annoying emails.
But there are problems on the open source side as well. Those problems are all derived from one big issue: because we're often working with relatively small sums of money, it's hard for upstream open source developers to consume either money or patches from application developers. It's nice to say that you should contribute money to your dependencies, and you absolutely should, but the cost-benefit function is discontinuous. Before a project reaches the fiscal threshold where it can be at least one person's full-time job to worry about this stuff, there's often no-one responsible in the first place. Developers will therefore gravitate to the issues that are either fun, or relevant to their own job.
These mutually-reinforcing incentive structures are a big reason that users of open source infrastructure, even teams who work at corporate users with zillions of dollars, don't reliably contribute back.
The Answer We Want
All those options are bad. If we had a good option, what would it look like?
It is both practically necessary3 and morally required4 for you to have a way to temporarily rely on a modified version of an open source dependency, without permanently diverging.
Below, I will describe a desirable abstract workflow for achieving this goal.
Step 0: Report the Problem
Before you get started with any of these other steps, write up a clear description of the problem and report it to the project as an issue; specifically, in contrast to writing it up as a pull request. Describe the problem before submitting a solution.
You may not be able to wait for a volunteer-run open source project to respond to your request, but you should at least tell the project what you're planning on doing.
If you don't hear back from them at all, you will have at least made sure to comprehensively describe your issue and strategy beforehand, which will provide some clarity and focus to your changes.
If you do hear back from them, in the worst case scenario, you may discover that a hard fork will be necessary because they don't consider your issue valid, but even that information will save you time, if you know it before you get started. In the best case, you may get a reply from the project telling you that you've misunderstood its functionality and that there is already a configuration parameter or usage pattern that will resolve your problems with no new code. But in all cases, you will benefit from early coordination on what needs fixing before you get to how to fix it.
Step 1: Source Code and CI Setup
Fork the source code for your upstream dependency to a writable location where it can live at least for the duration of this one bug-fix, and possibly for the duration of your application's use of the dependency. After all, you might want to fix more than one bug in LibBar.
You want to have a place where you can put your edits, that will be version controlled and code reviewed according to your normal development process. This probably means you'll need to have your own main branch that diverges from your upstream's main branch.
Remember: you're going to need to deploy this to your production, so testing gates that your upstream only applies to final releases of LibBar will need to be applied to every commit here.
Depending on LibBar's own development process, this may result in slightly unusual configurations where, for example, your fixes are written against the last LibBar release tag, rather than its current5 main; if the project has a branch-freshness requirement, you might need two branches, one for your upstream PR (based on main) and one for your own use (based on the release branch with your changes).
Ideally for projects with really good CI and a strong "keep main release-ready at all times" policy, you can deploy straight from a development branch, but it's good to take a moment to consider this before you get started. It's usually easier to rebase changes from an older HEAD onto a newer one than it is to go backwards.
Speaking of CI, you will want to have your own CI system. The fact that GitHub Actions has become a de-facto lingua franca of continuous integration means that this step may be quite simple, and your forked repo can just run its own instance.
Optional Bonus Step 1a: Artifact Management
If you have an in-house artifact repository, you should set that up for your dependency too, and upload your own build artifacts to it. You can often treat your modified dependency as an extension of your own source tree and install from a GitHub URL, but if you've already gone to the trouble of having an in-house package repository, you can pretend you've taken over maintenance of the upstream package temporarily (which you kind of have) and leverage those workflows for caching and build-time savings as you would with any other internal repo.
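If your in-house repository will host rebuilt LibBar artifacts, one way to keep them distinguishable from upstream releases is a PEP 440 local version label, which the spec intends for patched versions of upstream projects. A minimal sketch; the helper name and the `foocorp` label are hypothetical:

```python
import re

# PEP 440 local version label: alphanumeric runs separated by '.', '-' or '_'.
_LOCAL_LABEL = re.compile(r"^[a-z0-9]+(?:[-_.][a-z0-9]+)*$", re.IGNORECASE)


def internal_version(upstream_version: str, org: str, build: int) -> str:
    """Tag an internal rebuild of an upstream release with a local label.

    For example, upstream "1.2.3", org "foocorp", build 2 -> "1.2.3+foocorp.2".
    """
    if "+" in upstream_version:
        raise ValueError("upstream version already carries a local label")
    if build < 1:
        raise ValueError("build numbers start at 1")
    label = f"{org}.{build}"
    if not _LOCAL_LABEL.match(label):
        raise ValueError(f"{label!r} is not a valid PEP 440 local label")
    return f"{upstream_version}+{label}"
```

One consequence worth knowing: under PEP 440, a specifier without a local label, such as `libbar==1.2.3`, also matches `1.2.3+foocorp.2`, so make sure your installer resolves from the index you intend.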
Step 2: Do The Fix
Now that you've got somewhere to edit LibBar's code, you will want to actually fix the bug.
Step 2a: Local Filesystem Setup
Before you have a production version on your own deployed branch, you'll want to test locally, which means having both repositories in a single integrated development environment.
At this point, you will want to have a local filesystem reference to your LibBar dependency, so that you can make real-time edits, without going through a slow cycle of pushing to a branch in your LibBar fork, pushing to a FooApp branch, and waiting for all of CI to run on both.
This is useful in both directions: as you prepare the FooApp branch that makes any necessary updates on that end, you'll want to make sure that FooApp can exercise the LibBar fix in any integration tests. As you work on the LibBar fix itself, you'll also want to be able to use FooApp to exercise the code and see if you've missed anything - and this, you wouldn't get in CI, since LibBar can't depend on FooApp itself.
In short, you want to be able to treat both projects as an integrated development environment, with support from your usual testing and debugging tools, just as much as you want your deployment output to be an integrated artifact.
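With an editable install of your LibBar checkout (for example `pip install -e ../libbar`, where the path is a placeholder), it is easy to end up silently importing some other installed copy instead. A small guard like this hypothetical `imported_from` helper, built on the standard `importlib.util.find_spec`, can confirm which copy FooApp will actually load:

```python
import importlib.util
from pathlib import Path


def imported_from(module_name: str, checkout: Path) -> bool:
    """Return True if `module_name` would be loaded from inside `checkout`."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.has_location or spec.origin is None:
        return False  # not installed, built-in/frozen, or a namespace package
    origin = Path(spec.origin).resolve()
    try:
        origin.relative_to(checkout.resolve())
    except ValueError:
        return False  # found, but somewhere outside the checkout
    return True
```

You might call it from FooApp's test configuration, e.g. `assert imported_from("libbar", Path("~/src/libbar").expanduser())`, adjusting the path to your own layout. For a top-level name, `find_spec` locates the module without importing it.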
Step 2b: Branch Setup for PR
However, for continuous integration to work, you will also need to have a remote resource reference of some kind from FooApp's branch to LibBar. You will need 2 pull requests: the first to land your LibBar changes to your internal LibBar fork and make sure it's passing its own tests, and then a second PR to switch your LibBar dependency from the public repository to your internal fork.
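The dependency switch in that second PR, and the later moves back toward upstream, each come down to changing one requirement line. A sketch using PEP 508 direct-reference syntax, which pip understands; the `Phase` names, the helper, and both URLs are placeholders, not anything LibBar or pip defines:

```python
from enum import Enum

# Placeholder URLs: substitute your own fork and the real upstream repository.
INTERNAL_FORK_URL = "https://git.example.com/foocorp/libbar.git"
UPSTREAM_URL = "https://github.com/example/libbar.git"


class Phase(Enum):
    UPSTREAM_RELEASE = "upstream release"  # before the bug, and again after unwinding
    INTERNAL_FORK = "internal fork"        # while you carry the fix yourself
    UPSTREAM_COMMIT = "upstream commit"    # fix merged upstream but not yet released


def libbar_requirement(phase: Phase, ref: str, name: str = "libbar") -> str:
    """Render a requirement line: `ref` is a version for releases, else a commit hash."""
    if phase is Phase.UPSTREAM_RELEASE:
        return f"{name}=={ref}"
    url = INTERNAL_FORK_URL if phase is Phase.INTERNAL_FORK else UPSTREAM_URL
    return f"{name} @ git+{url}@{ref}"
```

Keeping the URL choice in one place makes each transition a one-line diff that is easy to review, and easy to grep for when it is time to unwind.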
At this step it is very important to ensure that there is an issue filed on your own internal backlog to drop your LibBar fork. You do not want to lose track of this work; it is technical debt that must be addressed.
Until it's addressed, automated tools like Dependabot will not be able to apply security updates to LibBar for you; you're going to need to manually integrate every upstream change. This type of work is itself very easy to drop or lose track of, so you might just end up stuck on a vulnerable version.
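One way to keep that tracking task from evaporating is to make CI aware of it. The sketch below keeps a hand-maintained list of temporary forks, each with its backlog item and a review date, and reports any that are overdue; the `TemporaryFork` type, the issue ID `FOO-1234`, and the date are all hypothetical:

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class TemporaryFork:
    """One dependency we are carrying patches for."""
    package: str
    tracking_issue: str  # internal backlog item for dropping the fork
    review_by: date      # when someone must re-check upstream for a release


def overdue_forks(forks: List[TemporaryFork], today: date) -> List[str]:
    """Describe every fork whose review date has passed."""
    return [
        f"{f.package}: review overdue since {f.review_by.isoformat()} (see {f.tracking_issue})"
        for f in forks
        if today > f.review_by
    ]


FORKS = [TemporaryFork("libbar", "FOO-1234", date(2026, 3, 1))]
```

In CI, failing the build whenever `overdue_forks(FORKS, date.today())` is non-empty turns a forgotten fork into a visible failure. It doesn't replace Dependabot; it only reminds a human to do the manual upstream integration that Dependabot can no longer do.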
Step 3: Deploy Internally
Now that you're confident that the fix will work, and that your temporarily-internally-maintained version of LibBar isn't going to break anything on your site, it's time to deploy.
Some deployment heritage should help to provide some evidence that your fix is ready to land in LibBar, but at the next step, please remember that your production environment isn't necessarily emblematic of that of all LibBar users.
Step 4: Propose Externally
You've got the fix, you've tested the fix, you've got the fix in your own production, you've told upstream you want to send them some changes. Now, it's time to make the pull request.
You're likely going to get some feedback on the PR, even if you think it's already ready to go; as I said, despite having been proven in your production environment, you may get feedback about additional concerns from other users that you'll need to address before LibBar's maintainers can land it.
As you process the feedback, make sure that each new iteration of your branch gets re-deployed to your own production. It would be a huge bummer to go through all this trouble, and then end up unable to deploy the next publicly released version of LibBar within FooApp because you forgot to test that your responses to feedback still worked on your own environment.
Step 4a: Hurry Up And Wait
If you're lucky, upstream will land your changes to LibBar. But there's still no released version available. Here, you'll have to stay in a holding pattern until upstream can finalize the release on their end.
Depending on some particulars, it might make sense at this point to archive your internal LibBar repository and move your pin to the git hash of the commit in the upstream LibBar repository where your fix landed.
Before you do this, check in with the LibBar core team. Make sure that they understand that's what you're doing, and that they don't have any wacky workflows that might rebase or elide that commit as part of their release process.
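To make "move your pin to a git hash" a little more concrete, here is a minimal sketch, assuming a Python project that declares its dependencies as PEP 508 requirement strings (in pyproject.toml or a requirements file). The `pin_to_commit` and `is_git_pinned` helpers, the `libbar` name, and the example URL are all hypothetical; the only real piece is the `name @ git+URL@commit` direct-reference form that pip understands.

```python
import re

# Assumes a SHA-1 git repository, where a full commit hash is 40 hex digits.
_FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def pin_to_commit(name: str, repo_url: str, commit: str) -> str:
    """Build a PEP 508 direct reference pinning `name` to one git commit.

    Hypothetical helper for illustration only.
    """
    if not _FULL_SHA1.fullmatch(commit):
        # Branch names move and abbreviated hashes can become ambiguous;
        # a full hash names exactly one commit, as long as upstream never
        # rewrites history (hence checking in with the maintainers first).
        raise ValueError(f"expected a full 40-character commit hash, got {commit!r}")
    return f"{name} @ git+{repo_url}@{commit}"


def is_git_pinned(requirement: str) -> bool:
    """Crude check for a leftover git direct reference that still needs unwinding."""
    return " @ git+" in requirement


# Usage with placeholder values:
req = pin_to_commit(
    "libbar",
    "https://example.com/libbar.git",
    "0123456789abcdef0123456789abcdef01234567",
)
# req == "libbar @ git+https://example.com/libbar.git@0123456789abcdef0123456789abcdef01234567"
```

Requiring the full hash, rather than a branch name like `main`, is what keeps the pin reproducible. `is_git_pinned` is only a substring check, but something like it is enough to flag pins you still need to unwind once a release that includes your fix ships.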
Step 5: Unwind Everything
Finally, you want to stop carrying any patches and move back to an official released version that integrates your fix.
You want to do this because that is what upstream will expect when you report bugs. Part of the benefit of using open source is the collective work on bug fixes and such, so you don't want to be stuck on a pinned git hash that the developers do not support for anyone else.
As I said in step 2b6, make sure to maintain a tracking task for doing this work, because leaving this sort of relatively easy-to-clean-up technical debt lying around can create a lot of aggravation for no particular benefit. Make sure to put your internal LibBar repository into an appropriate state (for example, archived) at this point as well.
Up Next
This is part 1 of a 2-part series. In part 2, I will explore in depth how to execute this workflow specifically for Python packages, using some popular tools. I'll discuss my own workflow, standards like PEP 517 and pyproject.toml, and of course, by the popular demand that I just know will come, uv.
Acknowledgments
Thank you to my patrons who are supporting my writing on this blog. If you like what you've read here and you'd like to read more of it, or you'd like to support my various open-source endeavors, you can support my work as a sponsor!
- If you already have all the tooling associated with a monorepo, including the ability to manage divergence and reintegrate patches with upstream, you already have the higher-overhead version of the workflow I am going to propose, so, never mind. But chances are you don't have that; very few companies do.
- In any business where one must wrangle with Legal, 3 hours is a wildly optimistic estimate.
- In an ideal world, every project would keep its main branch ready to release at all times, no matter what, but we do not live in an ideal world.
- In this case, there is no question. It's 2b only, no not-2b.
11 Nov 2025 1:44am GMT
