20 May 2025
Planet Python
PyCoder’s Weekly: Issue #682: pylock.toml, NumPy, t-strings, and More (May 20, 2025)
#682 - MAY 20, 2025
View in Browser »
Going Beyond requirements.txt With pylock.toml
What is the best way to record the Python dependencies for the reproducibility of your projects? What advantages will lock files provide for those projects? This week on the show, we welcome back Python Core Developer Brett Cannon to discuss his journey to bring PEP 751 and the pylock.toml file format to the community.
REAL PYTHON podcast
I Don't Like NumPy
This opinion piece talks about why NumPy gets difficult fast. Two dimensions to your array? No problem, the calc is mostly self-evident. Add a couple more dimensions and it gets messy fast. See also the associated HN discussion, which also includes possible solutions.
DYNOMIGHT
Webinar: Inside Dust's AI-Powered Vision for the Future of Work
Join us on June 5 for a behind-the-scenes look at how Dust is building the future of work with AI and why Temporal is at the heart of their platform. We'll explore how Dust is transforming those everyday tasks with intelligent agents that are deeply contextual, resilient, and production-ready →
TEMPORAL sponsor
Unravelling t-strings
PEP 750 introduced t-strings for Python 3.14. These are a template string mechanism similar to f-strings. Although they are in 3.14.0b1, there isn't any documentation yet, so this post explains what they are and how they can be used.
BRETT CANNON
Python Jobs
Senior Software Engineer - Quant Investment Platform (LA or Dallas) (Los Angeles, CA, USA)
Causeway Capital Management LLC
Articles & Tutorials
Working With Missing Data in Polars
In this video course, you'll learn how to deal with missing data in Polars to ensure it doesn't interfere with your data analysis. You'll discover how to check for missing values, update them, and remove them.
REAL PYTHON course
Django Security Best Practices
This guide contains a long list of best practices for securing your Django site. It includes topics such as ensuring HTTPS everywhere, hardening your database, using key built-in security features, and more.
AHMAD SADEDDIN
Read My Lips: Automate Lip Syncing with Wav2Lip
Don't let your message be misunderstood! Use Wav2Lip for automated and AI-powered lip syncing. Get the code now.
INTEL CORPORATION sponsor
t-strings in Python (PEP 750)
Python 3.14 will include the new t-string concept, an f-string-like mechanism for string templating. Talk Python interviews Paul Everitt, David Peck, and Jim Baker about the upcoming feature.
KENNEDY ET AL podcast
The Database Row That Did and Didn't Exist
Misty ran into a confounding bug, where loading and saving certain Django ORM objects was failing, but not all of them. Read on to understand what went wrong.
MISTY DE MÉO
The Appearing Built-In
Rodrigo discovered that examining the builtins module in the REPL changes the size of the builtins module. Read on to find out what causes this oddity.
RODRIGO GIRÃO SERRÃO
What Does @Slot() Do in Qt?
When working with Qt slots and signals in PySide6, you might discover the @Slot decorator. This post tells you what it does and why you might use it.
MARTIN FITZPATRICK
5 Steps to N-Body Simulation
This article shows you how to simulate gravity in an N-body problem. It starts out with a simple simulation and then adds optimizations as it goes along.
ALVIN NG
PEPs & Co.
A little bit of Python history: where the term "PEP" came from, along with a list of similar types of documents for other languages.
HUGO VAN KEMENADE
The First Year of Free-Threaded Python
A recap of the first year of work on enabling support for the free-threaded build of CPython in community packages.
NATHAN GOLDBAUM
Pandas: Drop Columns by Name in DataFrames
A quick overview of the variety of ways to drop columns from a pandas DataFrame, referencing the columns by name.
ERIK MARSJA
Working on Complex Systems: What I Learned at Google
Explore the key characteristics of complex systems and learn practical patterns for tackling complex problems.
TEIVA HARSANYI
Building a Real-Time Dashboard With Flask and Svelte
Learn how to build a real-time analytics dashboard using Flask and Svelte with server-sent events.
AMIR TADRISI
Projects & Code
Events
Python Leiden User Group
May 22, 2025
PYTHONLEIDEN.NL
Python Sheffield
May 27, 2025
GOOGLE.COM
PyCon Italia 2025
May 28 to June 1, 2025
PYCON.IT
PyKla Monthly Meetup
May 28, 2025
MEETUP.COM
Happy Pythoning!
This was PyCoder's Weekly Issue #682.
View in Browser »
[ Subscribe to 🐍 PyCoder's Weekly 💌 - Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]
20 May 2025 7:30pm GMT
Ari Lamstein: Course Review: Build AI Agents with Python
I was slow to jump on the LLM bandwagon.
In some ways this is surprising. Although it was a while ago, I did take several AI courses in grad school. So I am probably in a good position to dive into new developments in the field.
But when the LLM wave started I was busy trying to develop software engineering expertise in Python. I wanted to finish that "project" before moving onto something new. It was only a few months ago, when the PR of my multiyear
module was merged to the censusdis
package (link), that I felt like I could turn my attention elsewhere.
Coincidentally, around that time Kevin Markham created his first course on LLMs. I wrote a review of that course here. In this post I review his followup course: Build AI Agents with Python.
The Problem AI Agents Solve
Something I learned in this course is that LLMs like GPT-4o mini are called "foundational models". Foundational models have very real limitations. Consider this conversation I just had with GPT-4o mini:
User: What is your cutoff date?
AI: My knowledge cutoff date is October 2021. This means I do not have information on events or developments that occurred after that date. If you have questions based on information available up to then, feel free to ask!
User: Do you have access to the web?
AI: No, I do not have access to the web or the ability to browse the internet. My responses are generated based on a mixture of licensed data, data created by human trainers, and publicly available information up to my knowledge cutoff date in October 2021. If you have any questions within that scope, feel free to ask!
User: What is the latest version of Python?
AI: As of my last knowledge update in October 2021, the latest stable release of Python was Python 3.10, which was released on October 4, 2021. For any updates or newer versions released after that date, I recommend checking the official Python website at https://www.python.org/.
In short, the answers that foundational models give are limited to the data they were trained on. For reference, the latest version of Python is 3.13.3, which was released in April 2025.
Augmenting Foundational Models
You can think of AI Agents as augmenting foundational models with tools, data and prompts that you provide them. This allows them to provide better answers to certain questions.
As an example, Build AI Agents with Python walks you through the process of giving GPT-4o mini the "tool" of web search using Tavily. This resulted in the following conversation:
User: What is your cutoff date?
AI: My knowledge cutoff date is in October 2021. I do not have information on events or developments that occurred after that date. However, I can access real-time information through specific tools. If you have a particular query, feel free to ask!
User: Do you have access to the web?
AI: Yes, I can access real-time information through a specific tool to search the web for current events and updates. If you have a particular topic or question in mind, let me know, and I can look it up for you!
User: What is the latest version of python?
AI: The latest stable version of Python is 3.13.3. You can download it from the official Python website https://www.python.org/downloads/.
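Mechanically, giving a model a "tool" means describing the tool in the API request and letting the model decide when to call it; your own code then runs the tool (for example, a Tavily search) and feeds the result back. This is not the course's code, and the web_search schema below is purely illustrative, but a minimal sketch with the OpenAI Python SDK looks roughly like this:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Describe one tool the model is allowed to call. The schema is illustrative;
# the actual search would be performed by your own code and fed back.
tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for current information.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the latest version of Python?"}],
    tools=tools,
)

# Instead of answering from stale training data, the model can now return a
# tool call, which the agent loop executes before asking it to answer again.
print(response.choices[0].message.tool_calls)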
I had not heard of Tavily prior to taking this course. It turns out that there is an entire ecosystem of tools around LLMs!
In addition to adding in a tool for web search, the course guides you through integrating custom data sources using Retrieval Augmented Generation (RAG) and building a Q&A system that can interact with a SQL database. For a course that only contains two hours of videos, I felt that it covered a lot.
Like with all online courses, there is an argument that I could have learned all this on my own. But "get your hands dirty with LLMs" had been on my todo list for a while. It was only taking this course that made it happen. If you are looking for a similar push, then you might enjoy this course too.
Wish List
Kevin is gifted at coming up with good examples and clearly explaining his solutions. If there were one change I could make to the course, it would be for him to add exercises to it.
Longtime readers might remember that I have created and taught several courses myself. My initial courses did not have exercises. It was only when I read Telling Ain't Training that I learned how valuable they are for students. That book also presents a framework for creating exercises that I still use today.
Next Steps
After taking an online course, I often want to apply what I learned to a project of my own. In this case, I can now imagine adding an LLM feature to my Covid Demographics Explorer.
The dataset that powers that app is in a single CSV file. I think it would be fun to create a chatbot that has access to that file and uses it to answer questions. That would allow users to ask questions about the dataset using natural language and get answers in natural language.
20 May 2025 3:21pm GMT
Real Python: First Steps With LangChain
You've likely interacted with large language models (LLMs), like the ones behind OpenAI's ChatGPT, and experienced their remarkable ability to answer questions, summarize documents, write code, and much more.
While LLMs are remarkable by themselves, with a little programming knowledge, you can leverage libraries like LangChain to create your own LLM-powered applications that can do just about anything.
In this video course, you'll learn how to:
- Use LangChain to build LLM-powered applications
- Create reusable instructions with prompt templates
- Create and extend LangChain chains
- Debug what happens when a chain executes
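As a taste of what that looks like, here is a minimal sketch of a prompt template piped into a chain. It assumes the langchain-core and langchain-openai packages and an OPENAI_API_KEY in the environment; the course's own examples will differ:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# A reusable prompt template with a {text} placeholder...
prompt = ChatPromptTemplate.from_template(
    "Summarize the following text in one sentence:\n\n{text}"
)

# ...piped into a model and an output parser to form a chain.
chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()

print(chain.invoke({"text": "LangChain lets you compose prompts, models, and parsers."}))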
[ Improve Your Python With 🐍 Python Tricks 💌 - Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
20 May 2025 2:00pm GMT
Django community aggregator: Community blog posts
Python: a quick cProfile recipe with pstats
Python comes with two built-in profilers for measuring the performance of your code: cProfile and profile. They have the same API, but cProfile is a C extension, while profile is implemented in Python. You nearly always want to use cProfile, as it's faster and doesn't skew measurements as much.
By default, cProfile's CLI profiles a command and displays its profile statistics afterwards. But that can be a bit limited, especially for reading large profiles or re-sorting the same data in different ways.
For more flexibility, cProfile can instead save the profile data to a file, which you can then read with the pstats module. This is my preferred way of using it, and this post covers a recipe for doing so, with a worked example.
The recipe
First, profile your script:
$ python -m cProfile -o profile <script> [args]
Replace <script> with the path to your Python file, and [args] with any arguments you want to pass to it. cProfile will run your script under its profiling machinery, saving the results to a file called profile, as specified by the -o option.
Second, view the profile file using pstats:
$ python -m pstats profile <<< $'sort cumtime\nstats 1000' | less
The pstats CLI provides a REPL for interacting with profile files, based on its Stats class. The CLI is oddly undocumented, but its help command lists the available commands.
The above command passes several commands to pstats in a string. The string uses the $'...' syntax, a Bash feature for C-style strings, allowing \n to represent a newline, passing two commands:
- sort cumtime: Sort the output by cumulative time, largest first. This means the time spent in a function and all its callees.
- stats 1000: Show the first 1,000 lines of the profile.
The output is passed to less, a common pager, allowing you to scroll through the results. Press q to quit when you're done!
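If you'd rather stay in Python than use the pstats REPL, the same data can be loaded with the pstats.Stats class; a rough equivalent of the command above is:

import pstats

stats = pstats.Stats("profile")  # the file written by `python -m cProfile -o profile ...`
stats.sort_stats("cumtime").print_stats(1000)  # same as `sort cumtime` then `stats 1000`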
Profile a module
If you're running a module instead of a script, add -m like:
$ python -m cProfile -o profile -m <module> [args]
Replace <module> with the name of the module you want to profile, and [args] with any arguments you want to pass to it.
Multiple profiles
If you're profiling code before and after, consider using different profile file names instead of just profile. For example, for checking the results of some optimization, I often use the names before.profile and after.profile, like:
$ python -m cProfile -o before.profile example.py
$ git switch optimize_all_the_things
$ python -m cProfile -o after.profile example.py
Alternative sort orders
To sort by other metrics, swap cumtime in sort cumtime for one of these values, per the Stats.sort_stats() documentation:
- time: internal time, the time spent in the function itself, excluding calls to other functions. This is useful for finding the slowest functions in your code.
- calls: number of calls to the function. This is useful for finding functions that are called many times and may be candidates for optimization, such as caching.
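For example, to see the functions with the highest internal time, you could run:

$ python -m pstats profile <<< $'sort tottime\nstats 20' | less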
A Djangoey example
Here's a worked example showing how to apply this recipe to a Django management command. Say you are testing a database migration locally:
$ ./manage.py migrate example 0002
Operations to perform:
Target specific migration: 0002_complexito, from example
Running migrations:
Applying example.0002_complexito... OK
While it did pass, it was unexpectedly slow. To profile it, you would first reverse the migration to reset your test database:
$ ./manage.py migrate example 0001
...
Then you could apply the recipe to profile the migration.
First, stick the cProfile command in front of the migration command:
$ python -m cProfile -o profile ./manage.py migrate example 0002
Operations to perform:
Target specific migration: 0002_complexito, from example
Running migrations:
Applying example.0002_complexito... OK
Then, run the second pstats command to view the results:
$ python -m pstats profile <<< $'sort cumtime\nstats 1000' | less
This opens less with a long table, starting:
Welcome to the profile statistics browser.
profile% profile% Mon May 19 23:52:37 2025 profile
213287 function calls (206021 primitive calls) in 1.150 seconds
Ordered by: cumulative time
List reduced from 3576 to 1000 due to restriction <1000>
ncalls tottime percall cumtime percall filename:lineno(function)
425/1 0.001 0.000 1.150 1.150 {built-in method builtins.exec}
1 0.000 0.000 1.150 1.150 ./manage.py:1(<module>)
1 0.000 0.000 1.150 1.150 ./manage.py:7(main)
1 0.000 0.000 1.109 1.109 /.../django/core/management/__init__.py:439(execute_from_command_line)
...
The header tells us how many function calls were made, how many were primitive calls, and how long the code took to run. Then there's the table of all function calls, limited to 1,000 entries.
Since we're sorting by cumtime, cumulative time spent in each function, the first line shows the total time spent in all functions. That exec is cProfile running your code, and the later lines represent the top-level wrappers from Django.
Generally, it's best to find the first listed function within your code base. In this profile, you would search for example/ and find this entry:
ncalls tottime percall cumtime percall filename:lineno(function)
...
1 0.000 0.000 1.005 1.005 /.../example/migrations/0002_complexito.py:4(forward)
...
One call to the forward() function in the migration file took 1.005 seconds, nearly all of the 1.150 seconds total runtime. That's a bit suspicious!
Right above that entry, you might also spot the time spent running queries:
ncalls tottime percall cumtime percall filename:lineno(function)
...
13 0.000 0.000 1.007 0.077 /.../django/db/backends/utils.py:78(execute)
13 0.000 0.000 1.007 0.077 /.../django/db/backends/utils.py:88(_execute_with_wrappers)
13 0.000 0.000 1.007 0.077 /.../django/db/backends/utils.py:94(_execute)
13 0.000 0.000 1.007 0.077 /.../django/db/backends/sqlite3/base.py:354(execute)
13 1.006 0.077 1.006 0.077 {function SQLiteCursorWrapper.execute at 0x1054f7f60}
...
This stack of functions all show 13 calls, with a cumulative time of 1.007 or 1.006 seconds. They represent Django's database backend wrappers, which eventually pass the query to Python's SQLiteCursorWrapper.execute(), which is displayed differently because it's implemented in C.
So, we can tell that the migration ran 13 queries in total, and at least one of them was slow and was run by forward(). At this point, you might look at the source of forward() to see if you can find the slow query. But first, you might want to re-display the profile to show only the forward() function and its callees (the functions it called), which might shed some light on what it was doing.
To show only forward() and its callees, you can use the pstats callees command. This takes a regular expression to match the function names you want to show:
$ python -m pstats profile <<< $'sort cumtime\ncallees \\bforward\\b' | less
Welcome to the profile statistics browser.
profile% profile% Ordered by: cumulative time
List reduced from 3576 to 1 due to restriction <'\\bforward\\b'>
Function
called...
ncalls tottime cumtime
/.../example/migrations/0002_complexito.py:4(forward)
-> 1 0.000 0.000 /.../django/db/backends/utils.py:41(__enter__)
1 0.000 0.000 /.../django/db/backends/utils.py:44(__exit__)
1 0.000 1.005 /.../django/db/backends/utils.py:78(execute)
1 0.000 0.000 /.../django/utils/asyncio.py:15(inner)
1 0.000 0.000 {method 'create_function' of 'sqlite3.Connection' objects}
profile%
Goodbye.
(Output wrapped.)
This has revealed:
- forward() only calls execute() once, so there's only one slow query.
- There's also a call to SQLite's create_function(). It's fast, rounding down to 0.000 seconds, but it may have something to do with the slow query.
Okay, time to look at the source:
def forward(apps, schema_editor):
    import time
    schema_editor.connection.connection.create_function(
        "sleep",
        1,
        time.sleep,
    )
    with schema_editor.connection.cursor() as cursor:
        cursor.execute("SELECT sleep(1)")
Ah, it's a deliberate pause that I added to show you this example. Well, that solves that mystery.
20 May 2025 4:00am GMT
16 May 2025
Django community aggregator: Community blog posts
Django News - Django News is at PyCon US this weekend! - May 16th 2025
Introduction
Django News is at PyCon US this weekend!
Jeff and Will are at PyCon US in Pittsburgh this weekend and would love to meet fellow Django enthusiasts. Drop by the DSF or JetBrains booth to say hello and connect with the many Django community members and DSF folks who will be around all weekend.
Django Newsletter
News
Google Summer of Code 2025 - Django Projects
Three projects out of many worthy proposals were accepted: improvements to the Django admin, adding django-template-partials to core, and automating processes in the Django contribution workflow.
Waiting for Postgres 18: Accelerating Disk Reads with Asynchronous I/O
Postgres 18 introduces asynchronous I/O with new io_method options (worker and io_uring), which can double or triple read performance in high-latency cloud environments.
Django Software Foundation
Simon Charette is the DSF member of the month
Simon Charette is a longtime Django contributor and community member. He served on the Django 5.x Steering Council and is part of the Security team and the Triage and Review team.
Updates to Django
Today 'Updates to Django' is presented by Abigail Afi Gbadago from the DSF Board and Djangonaut Space!🚀
Last week we had 10 pull requests merged into Django by 7 different contributors - including a first-time contributor! Congratulations to Safrone for having their first commits merged into Django - welcome on board!🎉
This week's Django highlights 🌟
- Security release of Django 5.2.1, 5.1.9 and 4.2.21.
- Field names have been added to hints in admin duplicated fields errors.
- Maximum bulk size for SQLite bulk_create and bulk_update methods now respect SQLITE_LIMIT_VARIABLE_NUMBER.
Django Newsletter
Wagtail CMS
Our four contributors for Google Summer of Code 2025
Four GSoC 2025 contributors will extend Wagtail with grid-aware sustainability, strict CSP compatibility, improved media listings, and enhanced keyboard shortcut accessibility.
Sponsored Link 1
Hire Django developers without the hassle!
Building a team of skilled Django developers has never been easier. Trust HackSoft to help you with strategic Django team augmentation. Learn more!
Articles
Django Security Best Practices: A Comprehensive Guide for Software Engineers
Enforce up-to-date Django versions, HTTPS, strong SECRET_KEY, ORM usage, built-in security middleware, XSS/CSRF defenses, robust authentication, dependency auditing, logging, and monitoring.
18 Years of REVSYS
Revsys marks 18 years offering Python and Django expertise, including code reviews, architectural design, cloud migrations, Kubernetes, CI/CD, AI integration, and team training.
Django: model field choices that can change without a database migration
Use Django 5.0 callable choices to avoid no-op migrations when updating model field choices, though database constraints still require migrations for data integrity.
Algorithms: Learning One's Learnings
Use Big O notation to choose efficient sorting in Django apps, leveraging Python's built-in Timsort or Quick Sort instead of Bubble Sort to improve performance.
Birds and Angles: Dabbling in Django Components
Combining django-bird and dj-angles enables Web-component style reusable Django template components for cleaner syntax and improved readability, despite limited filter parsing for props.
Setting up NGINX Unit (and switching from uWSGI)
Switch Django apps from uWSGI to NGINX Unit using JSON configuration, add SECURE_PROXY_SSL_HEADER, adjust socket proxy_pass, and enable ASGI/WSGI deployments.
My DjangoCon Europe 2025
Paolo Melchiorre recaps his DjangoCon Europe 2025 experience in Dublin through Mastodon posts covering keynotes, talks on testing, migrations, community events, and mentoring.
Tutorials
Rapid AI-powered applications with Django MongoDB and Voyage API
Learn how to build an LLM-powered recipe recommendation website with Django and MongoDB.
Podcasts
Django Chat #182: Event Sourcing with Chris May
Chris is a Senior Staff Engineer at WellSky, a software company in the health industry. We discuss his background as a graphic designer, learning Python (and Django) as an adult, his multiple conference talks on HTMX, why he's a fan of event sourcing, and more.
Talk Python #505: t-strings in Python (PEP 750)
A panel discussion of PEP 750 on t-strings, scheduled for Python 3.14, which build on the idea of f-strings to produce a template object rather than a standard string.
Django News Jobs
Python / Django Software Developer - full-time at Off Duty Management 🆕
Backend Python Developer (Django/DRF) at Paytree
Django Newsletter
Projects
astral-sh/ty
An extremely fast Python type checker and language server, written in Rust.
pydantic/pydantic-ai
Agent Framework / shim to use Pydantic with LLMs.
This RSS feed is published on https://django-news.com/. You can also subscribe via email.
16 May 2025 3:00pm GMT
My second DjangoCon Europe
Well I have been meaning to write this for over 2 weeks now, but better late than never! Towards the end of April 2025 I attended the DjangoCon Europe conference and Sprints and it was brilliant and exhausting all in one go.
Let's begin with the travel there, I decided to join those doing the SailRail for a relaxed train ride and crossing the sea to Dublin. This was great as I managed to make some use of the day (work and a blog post) while travelling as well as having some travel companions in the form of Thibaud, Sage, Tom & Daniele.
The next day kicked off the conference with an excellent keynote from Sarah Boyce, and other talks followed throughout the next 2 days. Databases were a big theme, along with community engagement and HTMX. However, for me the highlight was walking into the room and meeting folks from the community in person that I have interacted with online for the past couple of years. This was also coupled with great conversations with friends new & old (mostly around making Django better). I also plucked up the courage and gave a lightning talk on the last day about my year of 100 words.
The evening socials again were excellent! Django Social on Wednesday and the official party on Friday, with a more chill evening going climbing with a couple of interested attendees. The weekend brought the Sprints which were just perfect. I managed to crack on with an open ticket/PR I have for the messages app in Django and also make some good progress on django-prodserver.
It was sad to leave, but it reminds me that I want to go next year (if I am allowed by the family!). I am also excited by the energy I felt across the week, reminding me that Django is going strong as ever and the community has a bright future. I could write more, but I am aware that I need to crack on with today's work, so I will leave you with the recommendation of getting to a DjangoCon if you use Django in any form; you will not be disappointed.
16 May 2025 5:00am GMT
17 Apr 2025
Planet Twisted
Glyph Lefkowitz: Stop Writing `__init__` Methods
The History
Before dataclasses were added to Python in version 3.7 - in June of 2018 - the __init__
special method had an important use. If you had a class representing a data structure - for example a 2DCoordinate
, with x
and y
attributes - you would want to be able to construct it as 2DCoordinate(x=1, y=2)
, which would require you to add an __init__
method with x
and y
parameters.
The other options available at the time all had pretty bad problems:
- You could remove
2DCoordinate
from your public API and instead expose amake_2d_coordinate
function and make it non-importable, but then how would you document your return or parameter types? - You could document the
x
andy
attributes and make the user assign each one themselves, but then2DCoordinate()
would return an invalid object. - You could default your coordinates to 0 with class attributes, and while that would fix the problem with option 2, this would now require all
2DCoordinate
objects to be not just mutable, but mutated at every call site. - You could fix the problems with option 1 by adding a new abstract class that you could expose in your public API, but this would explode the complexity of every new public class, no matter how simple. To make matters worse,
typing.Protocol
didn't even arrive until Python 3.8, so, in the pre-3.7 world this would condemn you to using concrete inheritance and declaring multiple classes even for the most basic data structure imaginable.
Also, an __init__
method that does nothing but assign a few attributes doesn't have any significant problems, so it is an obvious choice in this case. Given all the problems that I just described with the alternatives, it makes sense that it became the obvious default choice, in most cases.
However, by accepting "define a custom __init__
" as the default way to allow users to create your objects, we make a habit of beginning every class with a pile of arbitrary code that gets executed every time it is instantiated.
Wherever there is arbitrary code, there are arbitrary problems.
The Problems
Let's consider a data structure more complex than one that simply holds a couple of attributes. We will create one that represents a reference to some I/O in the external world: a FileReader
.
Of course Python has its own open-file object abstraction, but I will be ignoring that for the purposes of the example.
Let's assume a world where we have the following functions, in an imaginary fileio
module:
open(path: str) -> int
read(fileno: int, length: int)
close(fileno: int)
Our hypothetical fileio.open
returns an integer representing a file descriptor1, fileio.read
allows us to read length
bytes from an open file descriptor, and fileio.close
closes that file descriptor, invalidating it for future use.
With the habit that we have built from writing thousands of __init__
methods, we might want to write our FileReader
class like this:
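Something along these lines, with the path-to-descriptor conversion happening inside __init__ (the read and close methods here are an assumed sketch to round out the example):

import fileio  # the imaginary I/O module described above

class FileReader:
    def __init__(self, path: str) -> None:
        # The side effect (opening the file) happens at construction time.
        self._fd = fileio.open(path)

    def read(self, length: int) -> bytes:
        return fileio.read(self._fd, length)

    def close(self) -> None:
        fileio.close(self._fd)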
For our initial use-case, this is fine. Client code creates a FileReader
by doing something like FileReader("./config.json")
, which always creates a FileReader
that maintains its file descriptor int
internally as private state. This is as it should be; we don't want user code to see or mess with _fd
, as that might violate FileReader
's invariants. All the necessary work to construct a valid FileReader
- i.e. the call to open
- is always taken care of for you by FileReader.__init__
.
However, additional requirements will creep in, and as they do, FileReader.__init__
becomes increasingly awkward.
Initially we only care about fileio.open
, but later, we may have to deal with a library that has its own reasons for managing the call to fileio.open
by itself, and wants to give us an int
that we use as our _fd
, we now have to resort to weird workarounds like:
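For example, a factory function that sidesteps __init__ entirely; a sketch of what such a reader_from_fd workaround might look like:

def reader_from_fd(fd: int) -> FileReader:
    # Bypass __init__ so that fileio.open is not called a second time.
    reader = object.__new__(FileReader)
    reader._fd = fd
    return reader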
Now, all those nice properties that we got from trying to force object construction to give us a valid object are gone. reader_from_fd
's type signature, which takes a plain int
, has no way of even suggesting to client code how to ensure that it has passed in the right kind of int
.
Testing is much more of a hassle, because we have to patch in our own copy of fileio.open
any time we want an instance of a FileReader
in a test without doing any real-life file I/O, even if we could (for example) share a single file descriptor among many FileReader
s for testing purposes.
All of this also assumes a fileio.open
that is synchronous. Although for literal file I/O this is more of a hypothetical concern, there are many types of networked resource which are really only available via an asynchronous (and thus: potentially slow, potentially error-prone) API. If you've ever found yourself wanting to type async def __init__(self): ...
then you have seen this limitation in practice.
Comprehensively describing all the possible problems with this approach would end up being a book-length treatise on a philosophy of object oriented design, so I will sum up by saying that the cause of all these problems is the same: we are inextricably linking the act of creating a data structure with whatever side-effects are most often associated with that data structure. If they are "often" associated with it, then by definition they are not "always" associated with it, and all the cases where they aren't associated become unwieldy and potentially broken.
Defining an __init__
is an anti-pattern, and we need a replacement for it.
The Solutions
I believe this tripartite assemblage of design techniques will address the problems raised above:
- using
dataclass
to define attributes, - replacing behavior that previously would have previously been in
__init__
with a new classmethod that does the same thing, and - using precise types to describe what a valid instance looks like.
Using dataclass
attributes to create an __init__
for you
To begin, let's refactor FileReader
into a dataclass
. This does get us an __init__
method, but it won't be an arbitrary one we define ourselves; it will get the useful constraint enforced on it that it will just assign attributes.
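A sketch of that refactoring, with the descriptor stored as a plain int attribute:

from dataclasses import dataclass

@dataclass
class FileReader:
    _fd: int  # the generated __init__ does nothing but assign this attribute

    def read(self, length: int) -> bytes:
        return fileio.read(self._fd, length)

    def close(self) -> None:
        fileio.close(self._fd)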
Except... oops. In fixing the problems that we created with our custom __init__
that calls fileio.open
, we have re-introduced several problems that it solved:
- We have removed all the convenience of
FileReader("path")
. Now the user needs to import the low-levelfileio.open
again, making the most common type of construction both more verbose and less discoverable; if we want users to know how to build aFileReader
in a practical scenario, we will have to add something in our documentation to point at a separate module entirely. - There's no enforcement of the validity of
_fd
as a file descriptor; it's just some integer, which the user could easily pass an incorrect instance of, with no error.
In isolation, dataclass
by itself can't solve all our problems, so let's add in the second technique.
Using classmethod
factories to create objects
We don't want to require any additional imports, or require users to go looking at any other modules - or indeed anything other than FileReader
itself - to figure out how to create a FileReader
for its intended usage.
Luckily we have a tool that can easily address all of these concerns at once: @classmethod
. Let's define a FileReader.open
class method:
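Roughly like this, keeping the dataclass from the previous step and moving the call to fileio.open into a named constructor:

from __future__ import annotations

from dataclasses import dataclass

@dataclass
class FileReader:
    _fd: int

    @classmethod
    def open(cls, path: str) -> FileReader:
        # The side effect now lives in a discoverable, appropriately named factory.
        return cls(fileio.open(path))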
Now, your callers can replace FileReader("path")
with FileReader.open("path")
, and get all the same benefits.
Additionally, if we needed to await fileio.open(...)
, and thus we needed its signature to be @classmethod async def open
, we are freed from the constraint of __init__
as a special method. There is nothing that would prevent a @classmethod
from being async
, or indeed, from having any other modification to its return value, such as returning a tuple
of related values rather than just the object being constructed.
Using NewType
to address object validity
Next, let's address the slightly trickier issue of enforcing object validity.
Our type signature calls this thing an int
, and indeed, that is unfortunately what the lower-level fileio.open
gives us, and that's beyond our control. But for our own purposes, we can be more precise in our definitions, using NewType
:
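The definition itself is tiny:

from typing import NewType

FileDescriptor = NewType("FileDescriptor", int)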
There are a few different ways to address the underlying library, but for the sake of brevity and to illustrate that this can be done with zero run-time overhead, let's just insist to Mypy that we have versions of fileio.open
, fileio.read
, and fileio.write
which actually already take FileDescriptor
integers rather than regular ones.
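For illustration, type-checker-only declarations along these lines would do it; exactly how you wire this up (a .pyi stub, a cast, or a small typed wrapper module) is a detail, and this stub is an assumed sketch:

# fileio.pyi - seen only by the type checker; FileDescriptor would be imported here
def open(path: str) -> FileDescriptor: ...
def read(fileno: FileDescriptor, length: int) -> bytes: ...
def write(fileno: FileDescriptor, data: bytes) -> int: ...
def close(fileno: FileDescriptor) -> None: ...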
We do of course have to slightly adjust FileReader
, too, but the changes are very small. Putting it all together, we get:
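A sketch of the final shape, combining the dataclass, the classmethod constructor, and the FileDescriptor type:

from __future__ import annotations

from dataclasses import dataclass

@dataclass
class FileReader:
    _fd: FileDescriptor

    @classmethod
    def open(cls, path: str) -> FileReader:
        return cls(fileio.open(path))

    def read(self, length: int) -> bytes:
        return fileio.read(self._fd, length)

    def close(self) -> None:
        fileio.close(self._fd)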
Note that the main technique here is not necessarily using NewType
specifically, but rather aligning an instance's property of "has all attributes set" as closely as possible with an instance's property of "fully valid instance of its class"; NewType
is just a handy tool to enforce any necessary constraints on the places where you need to use a primitive type like int
, str
or bytes
.
In Summary - The New Best Practice
From now on, when you're defining a new Python class:
- Make it a dataclass2.
- Use its default
__init__
method3. - Add
@classmethod
s to provide your users convenient and discoverable ways to build your objects. - Require that all dependencies be satisfied by attributes, so you always start with a valid object.
- Use
typing.NewType
to enforce any constraints on primitive data types (likeint
andstr
) which might have magical external attributes, like needing to come from a particular library, needing to be random, and so on.
If you define all your classes this way, you will get all the benefits of a custom __init__
method:
- All consumers of your data structures will receive valid objects, because an object with all its attributes populated correctly is inherently valid.
- Users of your library will be presented with convenient ways to create your objects that do as much work as is necessary to make them easy to use, and they can discover these just by looking at the methods on your class itself.
Along with some nice new benefits:
- You will be future-proofed against new requirements for different ways that users may need to construct your object.
- If there are already multiple ways to instantiate your class, you can now give each of them a meaningful name; no need to have monstrosities like
def __init__(self, maybe_a_filename: int | str | None = None):
- Your test suite can always construct an object by satisfying all its dependencies; no need to monkey-patch anything when you can always call the type and never do any I/O or generate any side effects.
Before dataclasses, it was always a bit weird that such a basic feature of the Python language - giving data to a data structure to make it valid - required overriding a method with 4 underscores in its name. __init__
stuck out like a sore thumb. Other such methods like __add__
or even __repr__
were inherently customizing esoteric attributes of classes.
For many years now, that historical language wart has been resolved. @dataclass
, @classmethod
, and NewType
give you everything you need to build classes which are convenient, idiomatic, flexible, testable, and robust.
Acknowledgments
Thank you to my patrons who are supporting my writing on this blog. If you like what you've read here and you'd like to read more of it, or you'd like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics like "but what is a 'class', really?".
-
If you aren't already familiar, a "file descriptor" is an integer which has meaning only within your program; you tell the operating system to open a file, it says "I have opened file 7 for you", and then whenever you refer to "7" it is that file, until you
close(7)
. ↩ -
Or an attrs class, if you're nasty. ↩
-
Unless you have a really good reason to, of course. Backwards compatibility, or compatibility with another library, might be good reasons to do that. Or certain types of data-consistency validation which cannot be expressed within the type system. The most common example of these would be a class that requires consistency between two different fields, such as a "range" object where
start
must always be less thanend
. There are always exceptions to these types of rules. Still, it's pretty much never a good idea to do any I/O in__init__
, and nearly all of the remaining stuff that may sometimes be a good idea in edge-cases can be achieved with a__post_init__
rather than writing a literal__init__
. ↩
17 Apr 2025 10:35pm GMT
01 Apr 2025
Planet Twisted
Glyph Lefkowitz: A Bigger Database
A Database File
When I was 10 years old, and going through a fairly difficult time, I was lucky enough to come into the possession of a piece of software called Claris FileMaker Pro™.
FileMaker allowed its users to construct arbitrary databases, and to associate their tables with a customized visual presentation. FileMaker also had a rudimentary scripting language, which would allow users to imbue these databases with behavior.
As a mentally ill pre-teen, lacking a sense of control over anything or anyone in my own life, including myself, I began building a personalized database to catalogue the various objects and people in my immediate vicinity. If one were inclined to be generous, one might assess this behavior and say I was systematically taxonomizing the objects in my life and recording schematized information about them.
As I saw it at the time, if I collected the information, I could always use it later, to answer questions that I might have. If I didn't collect it, then what if I needed it? Surely I would regret it! Thus I developed a categorical imperative to spend as much of my time as possible collecting and entering data about everything that I could reasonably arrange into a common schema.
Having thus summoned this specter of regret for all lost data-entry opportunities, it was hard to dismiss. We might label it "Claris's Basilisk", for obvious reasons.
Therefore, a less-generous (or more clinically-minded) observer might have replaced the word "systematically" with "obsessively" in the assessment above.
I also began writing what scripts were within my marginal programming abilities at the time, just because I could: things like computing the sum of every street number of every person in my address book. Why was this useful? Wrong question: the right question is "was it possible" to which my answer was "yes".
If I was obliged to collect all the information which I could observe - in case it later became interesting - I was similarly obliged to write and run every program I could. It might, after all, emit some other interesting information.
I was an avid reader of science fiction as well.
I had this vague sense that computers could kind of think. This resulted in a chain of reasoning that went something like this:
- human brains are kinda like computers,
- the software running in the human brain is very complex,
- I could only write simple computer programs, but,
- when you really think about it, a "complex" program is just a collection of simpler programs
Therefore: if I just kept collecting data, collecting smaller programs that could solve specific problems, and connecting them all together in one big file, eventually the database as a whole would become self-aware and could solve whatever problem I wanted. I just needed to be patient; to "keep grinding" as the kids would put it today.
I still feel like this is an understandable way to think - if you are a highly depressed and anxious 10-year-old in 1990.
Anyway.
35 Years Later
OpenAI is a company that produces transformer architecture machine learning generative AI models; their current generation was trained on about 10 trillion words, obtained in a variety of different ways from a large variety of different, unrelated sources.
A few days ago, on March 26, 2025 at 8:41 AM Pacific Time, Sam Altman took to "X™, The Everything App™," and described the trajectory of his career of the last decade at OpenAI as, and I quote, a "grind for a decade trying to help make super-intelligence to cure cancer or whatever" (emphasis mine).
I really, really don't want to become a full-time AI skeptic, and I am not an expert here, but I feel like I can identify a logically flawed premise when I see one.
This is not a system-design strategy. It is a trauma response.
You can't cure cancer "or whatever". If you want to build a computer system that does some thing, you actually need to hire experts in that thing, and have them work to both design and validate that the system is fit for the purpose of that thing.
Aside: But... are they, though?
I am not an oncologist; I do not particularly want to be writing about the specifics here, but, if I am going to make a claim like "you can't cure cancer this way" I need to back it up.
My first argument - and possibly my strongest - is that cancer is not cured.
QED.
But I guess, to Sam's credit, there is at least one other company partnering with OpenAI to do things that are specifically related to cancer. However, that company is still in a self-described "initial phase" and it's not entirely clear that it is going to work out very well.
Almost everything I can find about it online was from a PR push in the middle of last year, so it all reads like a press release. I can't easily find any independently-verified information.
A lot of AI hype is like this. A promising demo is delivered; claims are made that surely if the technology can solve this small part of the problem now, within 5 years surely it will be able to solve everything else as well!
But even the light-on-content puff-pieces tend to hedge quite a lot. For example, as the Wall Street Journal quoted one of the users initially testing it (emphasis mine):
The most promising use of AI in healthcare right now is automating "mundane" tasks like paperwork and physician note-taking, he said. The tendency for AI models to "hallucinate" and contain bias presents serious risks for using AI to replace doctors. Both Color's Laraki and OpenAI's Lightcap are adamant that doctors be involved in any clinical decisions.
I would probably not personally characterize "'mundane' tasks like paperwork and … note-taking" as "curing cancer". Maybe an oncologist could use some code I developed too; even if it helped them, I wouldn't be stealing valor from them on the curing-cancer part of their job.
Even fully giving it the benefit of the doubt that it works great, and improves patient outcomes significantly, this is medical back-office software. It is not super-intelligence.
It would not even matter if it were "super-intelligence", whatever that means, because "intelligence" is not how you do medical care or medical research. It's called "lab work" not "lab think".
To put a fine point on it: biomedical research fundamentally cannot be done entirely by reading papers or processing existing information. It cannot even be done by testing drugs in computer simulations.
Biological systems are enormously complex, and medical research on new therapies inherently requires careful, repeated empirical testing to validate the correspondence of existing research with reality. Not "an experiment", but a series of coordinated experiments that all test the same theoretical model. The data (which, in an LLM context, is "training data") might just be wrong; it may not reflect reality, and the only way to tell is to continuously verify it against reality.
Previous observations can be tainted by methodological errors, by data fraud, and by operational mistakes by practitioners. If there were a way to do verifiable development of new disease therapies without the extremely expensive ladder going from cell cultures to animal models to human trials, we would already be doing it, and "AI" would just be an improvement to efficiency of that process. But there is no way to do that and nothing about the technologies involved in LLMs is going to change that fact.
Knowing Things
The practice of science - indeed any practice of the collection of meaningful information - must be done by intentionally and carefully selecting inclusion criteria, methodically and repeatedly curating our data, building a model that operates according to rules we understand and can verify, and verifying the data itself with repeated tests against nature. We cannot just hoover up whatever information happens to be conveniently available with no human intervention and hope it resolves to a correct model of reality by accident. We need to look where the keys are, not where the light is.
Piling up more and more information in a haphazard and increasingly precarious pile will not allow us to climb to the top of that pile, all the way to heaven, so that we can attack and dethrone God.
Eventually, we'll just run out of disk space, and then lose the database file when the family gets a new computer anyway.
Acknowledgments
Thank you to my patrons who are supporting my writing on this blog. If you like what you've read here and you'd like to read more of it, or you'd like to support my various open-source endeavors, you can support my work as a sponsor! Special thanks also to Itamar Turner-Trauring and Thomas Grainger for pre-publication feedback on this article; any errors of course remain my own.
01 Apr 2025 12:47am GMT
15 Jan 2025
Planet Twisted
Glyph Lefkowitz: Small PINPal Update
Today on stream, I updated PINPal to fix the memorization algorithm.
If you haven't heard of PINPal before, it is a vault password memorization tool. For more detail on what that means, you can check out the README, and why not give it a ⭐ while you're at it.
As I started writing up an update post I realized that I wanted to contextualize it a bit more, because it's a tool I really wish were more popular. It solves one of those small security problems that you can mostly ignore, right up until the point where it's a huge problem and it's too late to do anything about it.
In brief, PINPal helps you memorize new secure passcodes for things you actually have to remember and can't simply put into your password manager, like the password to your password manager, your PC user account login, your email account1, or the PIN code to your phone or debit card.
Too often, even if you're properly using a good password manager for your passwords, you'll be protecting it with a password optimized for memorability, which is to say, one that isn't random and thus isn't secure. But I have also seen folks veer too far in the other direction, trying to make a really secure password that they then forget right after switching to a password manager. Forgetting your vault password can also be a really big deal, making you do password resets across every app you've loaded into it so far, so having an opportunity to practice it periodically is important.
PINPal uses spaced repetition to ensure that you remember the codes it generates.
While periodic forced password resets are a bad idea, if (and only if!) you can actually remember the new password, it is a good idea to get rid of old passwords eventually - like, let's say, when you get a new computer or phone. Doing so reduces the risk that a password stored somewhere on a very old hard drive or darkweb data dump is still floating around out there, forever haunting your current security posture. If you do a reset every 2 years or so, you know you've never got more than 2 years of history to worry about.
PINPal is also particularly secure in the way it incrementally generates your password; the computer you install it on only ever stores the entire password in memory when you type it in. It stores even the partial fragments that you are in the process of memorizing using the secure keyring
module, avoiding plain-text whenever possible.
I've been using PINPal to generate and memorize new codes for a while, just in case2, and the change I made today was because I encountered a recurring problem. The problem was, I'd forget a token after it had been hidden, and there was never any going back. The moment that a token was hidden from the user, it was removed from storage, so you could never get a reminder. While I've successfully memorized about 10 different passwords with it so far, I've had to delete 3 or 4.
So, in the updated algorithm, the visual presentation now hides tokens in the prompt several memorizations before they're removed. Previously, if the password you were generating was 'hello world', you'd see hello world 5 times or so, then •••• world; if you ever got it wrong past that point, too bad, start over. Now, you'll see hello world, then °°°° world, then after you have gotten the prompt right without seeing the token a few times, you'll see •••• world after the backend has locked it in and it's properly erased from your computer.
If you get the prompt wrong, breaking your streak reveals the recently-hidden token until you get it right again. I also did a new release on that same livestream, so if this update sounds like it might make the memorization process more appealing, check it out via pip install pinpal
today.
Right now this tool is still extremely for a specific type of nerd - it's command-line only, and you probably need to hand-customize your shell prompt to invoke it periodically. But I'm working on making it more accessible to a broader audience. It's open source, of course, so you can feel free to contribute your own code!
Acknowledgments
Thank you to my patrons who are supporting my writing on this blog. If you like what you've read here and you'd like to read more things like it, or you'd like to support my various open-source endeavors, you can support my work as a sponsor!
-
Your email account password can be stored in your password manager, of course, but given that email is the root-of-trust reset factor for so many things, being able to remember that password is very helpful in certain situations. ↩
-
Funny story: at one point, Apple had an outage which made it briefly appear as if a lot of people needed to reset their iCloud passwords, myself included. Because I'd been testing PINPal a bunch, I actually had several highly secure random passwords already memorized. It was a strange feeling to just respond to the scary password reset prompt with a new, highly secure password and just continue on with my day secure in the knowledge I wouldn't forget it. ↩
15 Jan 2025 12:54am GMT
29 Nov 2024
Planet Plone - Where Developers And Integrators Write
Maurits van Rees: Lightning talks Friday
Bonnie Tyler Sprint
On 12 August 2026 there is a total solar eclipse that can be seen from Valencia, Spain. So we organise a sprint there.
This conference
We had 291 participants, 234 in person and 57 online. 13 Brazilian states (that is all of them), 14 countries.
24.5 percent women, up from 13% in 2013, so that has gone up, but we are not there yet. Thank you to PyLadies and Django Girls for making this happen.
We had more than 80 presenters, about 30 lightning talks, lots of talk in the hall ways.
Thanks also to the team!
Ramiro Luz: Yoga time
Yoga exercise.
Rikupekka: University case student portal
We have a student portal at the university. But mostly:
Welcome to Jyväskylä university in Finland for Plone conference 2025, October 13-19!
Jakob: Beethovensprint
26-30 May 2025 in Bonn, Germany.
Afterwards, on May 30 and June 1 there will be FedCon in Bonn, a SciFi convention.
Piero/Victor: BYOUI
Add-ons first development with @plone/registry. See https://plone-registry.readthedocs.io/
It allows for development that is framework agnostic, so it is not only for Plone. It is around configuration that can be extended and injected, which is tricky in most javascript frameworks.
Imagine it.
Ana Dulce: 3D printing
For a difficult model I had to trust the process; it took a week, but it worked.
Renan & Iza: Python Brasil
We organised the Python Brasil conference from 16 to 23 October this year in Rio de Janeiro.
Next year 21-27 October in São Paulo.
Erico: Python Cerrado
31 July to 2 August 2025 is the next Python Cerrado conference.
29 Nov 2024 10:25pm GMT
Maurits van Rees: Paul Roeland: The value of longevity
Link to talk information on Plone conference website.
I work for the Clean Clothes Campaign: https://cleanclothes.org/
After three large disasters in factories in 2012 and 2013 with over 1000 deaths, it took three years to get an agreement with clothes manufacturers for 30 million dollars in compensation. It does not bring lives back, but it helps the survivors.
See Open Supply Hub for open data that we collected, for checking which brands are produced in which factories.
Documenting history matters. Stories must be told.
The global clothing industry is worth around 1.8 trillion dollars; as a country, that would put it in 12th place in the world. 75 million workers.
Our strongest weapon: backlinks. We have links from OECD, UN, wikipedia, school curriculum, books. Especially those last two don't change ever, so you should never change urls.
Plone: enable the sitemap, please, why not by default? Create a good robots.txt. I weekly check Google Search console, looking for broken links. Tag early, tag often, great tool, even if you have an AI do it.
Our website: started 1998 written in Notepad, 2004 Dreamweaver, 2006 Bluefish, 2010 Joomla, 2013 Plone 4, 2020 Castle CMS (opinionated distribution of Plone, but does not really exist anymore) 2024 Plone 6 with Volto Light Theme (work in progress). Thank you kitconcept for all the help, especially Jonas.
Migrations are painful. Along the years we used wget to csv to SQL to csv, Python script, "Franken-mogrifier", collective.exportimport.
Lessons learned: stable urls are awesome, migrations are painful. Please don't try to salvage CSS from your old site, just start fresh in your new system. Do not try to migrate composite pages or listings.
What if your website does not provide an export? Use wget, still works and is better than httrack. sed/awk/regex are your friend. archivebox (WARC).
Document your steps for your own sanity.
To manage json, jq or jello can be used. sq is a Swiss knife for json/sql/csv. emuto is a hybrid between jq and GraphQL.
Normalize import/export. We have `plone.exportimport` in core now.
In the future I would like a plone exporter script that accepts a regex and exports only matching pages. Switch backends: ZODB, relstorage, nick, quantum-db. Sitewide search/replace/sed. Sneakernet is useful in difficult countries where you cannot send data over the internet: so export to a usb stick.
A backup is only a backup if it regularly gets restored so you know that it works.
- Keeping content and URL stability is a superpower.
- Assuming that export/import/backup/restore/migration are rare occurrences, is wrong.
- Quick export/import is very useful.
Do small migrations, treat it as maintenance. Don't be too far behind. Large migrations one every five years will be costly. Do a small migration every year. Do your part. Clients should also do their part, by budgeting this yearly. That is how budgeting works. Use every iteration to review custom code.
Make your sites live long and prosper.
29 Nov 2024 8:58pm GMT
Maurits van Rees: Fred van Dijk: Run Plone in containers on your own cluster with coolify.io
Link to talk information on Plone conference website.
Sorry, I ran out of time trying to set up https://coolify.io
So let's talk about another problem. Running applications (stacks) in containers is the future. Well: abstraction and isolation is the future, and containers is the current phase.
I am on the Plone A/I team, with Paul, Kim, Erico. All senior sysadmins, so we kept things running. In 2022 we worked on containerisation. Kubernetes was the kool kid then, but Docker Swarm was easier. Check out Erico's training with the new cookieplone templates.
Doing devops well is hard. You have a high workload, but still need to keep learning new stuff to keep up with what is changing.
I want to plug Coolify, which is a full open source product. "Self-hosting with super powers." The main developer, Andras Bacsal, believes in open source and 'hates' pay by usage cloud providers with a vengeance.
Coolify is still docker swarm. We also want Kubernetes support. But we still need sysadmins. Someone will still need to install coolify, and keep it updated.
I would like to run an online DevOps course somewhere January-March 2025. 4-6 meetings of 2 hours, maybe Friday afternoon. Talk through devops and sysadmin concepts, show docker swarm, try coolify, etc.
29 Nov 2024 7:58pm GMT