20 May 2025

Planet Python

PyCoder’s Weekly: Issue #682: pylock.toml, NumPy, t-strings, and More (May 20, 2025)

#682 - MAY 20, 2025
View in Browser »



Going Beyond requirements.txt With pylock.toml

What is the best way to record the Python dependencies for the reproducibility of your projects? What advantages will lock files provide for those projects? This week on the show, we welcome back Python Core Developer Brett Cannon to discuss his journey to bring PEP 751 and the pylock.toml file format to the community.
REAL PYTHON podcast

I Don't Like NumPy

This opinion piece talks about why NumPy gets difficult quickly. Two dimensions to your array? No problem, the calculation is mostly self-evident. Add a couple more dimensions and it gets messy fast. See also the associated HN discussion, which also includes possible solutions.
DYNOMIGHT

Webinar: Inside Dust's AI-Powered Vision for the Future of Work


Join us on June 5 for a behind-the-scenes look at how Dust is building the future of work with AI and why Temporal is at the heart of their platform. We'll explore how Dust is transforming those everyday tasks with intelligent agents that are deeply contextual, resilient, and production-ready →
TEMPORAL sponsor

Unravelling t-strings

PEP 750 introduced t-strings for Python 3.14. These are a template string mechanism similar to f-strings. Although they are in 3.14.0b1, there isn't any documentation yet, so this post explains what they are and how they can be used.
BRETT CANNON

Quiz: Getting Started With Python IDLE

REAL PYTHON

Python Jobs

Senior Software Engineer - Quant Investment Platform (LA or Dallas) (Los Angeles, CA, USA)

Causeway Capital Management LLC

More Python Jobs >>>

Articles & Tutorials

Working With Missing Data in Polars

In this video course, you'll learn how to deal with missing data in Polars to ensure it doesn't interfere with your data analysis. You'll discover how to check for missing values, update them, and remove them.
REAL PYTHON course

Django Security Best Practices

This guide contains a long list of best practices for securing your Django site. It includes topics such as ensuring HTTPS everywhere, hardening your database, using key built-in security features, and more.
AHMAD SADEDDIN

Read My Lips: Automate Lip Syncing with Wav2Lip

Don't let your message be misunderstood! Use Wav2Lip for automated and AI-powered lip syncing. Get the code now.
INTEL CORPORATION sponsor

t-strings in Python (PEP 750)

Python 3.14 will include the new t-string concept, an f-string-like mechanism for string templating. Talk Python interviews Paul Everitt, David Peck, and Jim Baker about the upcoming feature.
KENNEDY ET AL podcast

The Database Row That Did and Didn't Exist

Misty ran into a confounding bug, where loading and saving certain Django ORM objects was failing, but not all of them. Read on to understand what went wrong.
MISTY DE MÉO

The Appearing Built-In

Rodrigo discovered that examining the builtins module in the REPL changes the size of the builtins module. Read on to find what causes this oddity.
RODRIGO GIRÃO SERRÃO

What Does @Slot() Do in Qt?

When working with Qt slots and signals in PySide6 you might discover the @Slot decorator. This post tells you what it does and why you might use it.
MARTIN FITZPATRICK

5 Steps to N-Body Simulation

This article shows you how to simulate gravity in an N-body problem. It starts out with simpler simulation then adds optimizations as it goes along.
ALVIN NG

PEPs & Co.

A little bit of Python history: where the term "PEP" came from, along with a list of similar types of documents for other languages.
HUGO VAN KEMENADE

The First Year of Free-Threaded Python

A recap of the first year of work on enabling support for the free-threaded build of CPython in community packages.
NATHAN GOLDBAUM

Pandas: Drop Columns by Name in DataFrames

A quick overview of the variety of ways to drop columns from a pandas DataFrame, referencing the columns by name.
ERIK MARSJA

Working on Complex Systems: What I Learned at Google

Explore the key characteristics of complex systems and learn practical patterns for tackling complex problems.
TEIVA HARSANYI

Building a Real-Time Dashboard With Flask and Svelte

Learn how to build a real-time analytics dashboard using Flask and Svelte with server-sent events.
AMIR TADRISI

Projects & Code

manim: Framework for Mathematical Animations

GITHUB.COM/MANIMCOMMUNITY

pytest-benchmark: pytest Fixture for Benchmarking Code

GITHUB.COM/IONELMC

param: Clearer Python by Declaring Parameters

GITHUB.COM/HOLOVIZ

pyftpdlib: Fast and Scalable Python FTP Server Library

GITHUB.COM/GIAMPAOLO

pyrefly: A Fast Type Checker and IDE for Python

GITHUB.COM/FACEBOOK

Events

Weekly Real Python Office Hours Q&A (Virtual)

May 21, 2025
REALPYTHON.COM

Python Leiden User Group

May 22, 2025
PYTHONLEIDEN.NL

PyLadies Amsterdam: Dapr in Action: From Core Concepts to AI Agents

May 27, 2025
MEETUP.COM

Python Sheffield

May 27, 2025
GOOGLE.COM

PyCon Italia 2025

May 28 to June 1, 2025
PYCON.IT

PyKla Monthly Meetup

May 28, 2025
MEETUP.COM


Happy Pythoning!
This was PyCoder's Weekly Issue #682.
View in Browser »



[ Subscribe to 🐍 PyCoder's Weekly 💌 - Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

20 May 2025 7:30pm GMT

Ari Lamstein: Course Review: Build AI Agents with Python

I was slow to jump on the LLM bandwagon.

In some ways this is surprising. Although it was a while ago, I did take several AI courses in grad school. So I am probably in a good position to dive into new developments in the field.

But when the LLM wave started I was busy trying to develop software engineering expertise in Python. I wanted to finish that "project" before moving on to something new. It was only a few months ago, when the PR of my multiyear module was merged to the censusdis package (link), that I felt like I could turn my attention elsewhere.

Coincidentally, around that time Kevin Markham created his first course on LLMs. I wrote a review of that course here. In this post I review his followup course: Build AI Agents with Python.

The Problem AI Agents Solve

Something I learned in this course is that LLMs like GPT-4o mini are called "foundational models". Foundational models have very real limitations. Consider this conversation I just had with GPT-4o mini:

User: What is your cutoff date?
AI: My knowledge cutoff date is October 2021. This means I do not have information on events or developments that occurred after that date. If you have questions based on information available up to then, feel free to ask!

User: Do you have access to the web?
AI: No, I do not have access to the web or the ability to browse the internet. My responses are generated based on a mixture of licensed data, data created by human trainers, and publicly available information up to my knowledge cutoff date in October 2021. If you have any questions within that scope, feel free to ask!

User: What is the latest version of Python?
AI: As of my last knowledge update in October 2021, the latest stable release of Python was Python 3.10, which was released on October 4, 2021. For any updates or newer versions released after that date, I recommend checking the official Python website at https://www.python.org/.

In short, the answers that foundational models give are limited to the data they were trained on. For reference, the latest version of Python is 3.13.3, which was released in April 2025.

Augmenting Foundational Models

You can think of AI Agents as augmenting foundational models with tools, data and prompts that you provide them. This allows them to provide better answers to certain questions.
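To make that idea concrete, here is a rough sketch of the underlying tool-calling pattern, written against the OpenAI Python SDK rather than the course's own stack; the web_search tool here is a hypothetical stand-in for whatever search backend (such as Tavily) you actually wire in:

from openai import OpenAI

client = OpenAI()

# Describe a tool the model is allowed to call. "web_search" is a hypothetical
# stand-in for a real search backend such as Tavily.
tools = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return relevant snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the latest version of Python?"}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:
    # The model asked for fresh data: run the search yourself, append the result
    # as a "tool" message, and call the API again so the model can answer from it.
    print(message.tool_calls[0].function.arguments)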

As an example, Build AI Agents with Python walks you through the process of giving GPT-4o mini the "tool" of web search using Tavily. This resulted in the following conversation:

User: What is your cutoff date?
AI: My knowledge cutoff date is in October 2021. I do not have information on events or developments that occurred after that date. However, I can access real-time information through specific tools. If you have a particular query, feel free to ask!

User: Do you have access to the web?
AI: Yes, I can access real-time information through a specific tool to search the web for current events and updates. If you have a particular topic or question in mind, let me know, and I can look it up for you!

User: What is the latest version of python?
AI: The latest stable version of Python is 3.13.3. You can download it from the official Python website https://www.python.org/downloads/.

I had not heard of Tavily prior to taking this course. It turns out that there is an entire ecosystem of tools around LLMs!

In addition to adding in a tool for web search, the course guides you through integrating custom data sources using Retrieval Augmented Generation (RAG) and building a Q&A system that can interact with a SQL database. For a course that only contains two hours of videos, I felt that it covered a lot.

Like with all online courses, there is an argument that I could have learned all this on my own. But "get your hands dirty with LLMs" had been on my todo list for a while. It was only taking this course that made it happen. If you are looking for a similar push, then you might enjoy this course too.

Wish List

Kevin is gifted at coming up with good examples and clearly explaining his solutions. If there was one change I could make to the course it would be for him to add exercises to it.

Longtime readers might remember that I have created and taught several courses myself. My initial courses did not have exercises. It was only when I read Telling Ain't Training that I learned how valuable they are for students. That book also presents a framework for creating exercises that I still use today.

Next Steps

After taking an online course, I often want to apply what I learned to a project of my own. In this case, I can now imagine adding an LLM feature to my Covid Demographics Explorer.

The dataset that powers that app is in a single CSV file. I think it would be fun to create a chatbot that has access to that file and uses it to answer questions. That would allow users to ask questions about the dataset using natural language and get answers in natural language.

20 May 2025 3:21pm GMT

Real Python: First Steps With LangChain

You've likely interacted with large language models (LLMs), like the ones behind OpenAI's ChatGPT, and experienced their remarkable ability to answer questions, summarize documents, write code, and much more.

While LLMs are remarkable by themselves, with a little programming knowledge, you can leverage libraries like LangChain to create your own LLM-powered applications that can do just about anything.

In this video course, you'll learn how to:


[ Improve Your Python With 🐍 Python Tricks 💌 - Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

20 May 2025 2:00pm GMT

Django community aggregator: Community blog posts

Python: a quick cProfile recipe with pstats

Python comes with two built-in profilers for measuring the performance of your code: cProfile and profile. They have the same API, but cProfile is a C extension, while profile is implemented in Python. You nearly always want to use cProfile, as it's faster and doesn't skew measurements as much.

By default, cProfile's CLI profiles a command and displays its profile statistics afterwards. But that can be a bit limited, especially for reading large profiles or re-sorting the same data in different ways.

For more flexibility, cProfile can instead save the profile data to a file, which you can then read with the pstats module. This is my preferred way of using it, and this post covers a recipe for doing so, with a worked example.

The recipe

First, profile your script:

$ python -m cProfile -o profile <script> [args]

Replace <script> with the path to your Python file, and [args] with any arguments you want to pass to it. cProfile will run your script under its profiling machinery, saving the results to a file called profile, as specified by the -o option.

Second, view the profile file using pstats:

$ python -m pstats profile <<< $'sort cumtime\nstats 1000' | less

The pstats CLI provides a REPL for interacting with profile files, based on its Stats class. The CLI is oddly undocumented, but its help command lists the available commands.

The above command passes a string of commands to pstats. The string uses the $'...' syntax, a Bash feature for ANSI-C quoting, which allows \n to represent a newline between the two commands:

  1. sort cumtime: Sort the output by cumulative time, largest first. This means the time spent in a function and all its callees.
  2. stats 1000: Show the first 1,000 lines of the profile.

The output is passed to less, a common pager, allowing you to scroll through the results. Press q to quit when you're done!
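If you'd rather stay in Python than drive the pstats REPL, the same two steps can be done with the pstats.Stats class; a minimal sketch, assuming you've already saved a file named profile:

import pstats

# Load the saved profile data, then print the top entries by cumulative time,
# equivalent to "sort cumtime" followed by "stats 1000" in the pstats CLI.
stats = pstats.Stats("profile")
stats.sort_stats("cumtime").print_stats(1000)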

Profile a module

If you're running a module instead of a script, add -m like:

$ python -m cProfile -o profile -m <module> [args]

Replace <module> with the name of the module you want to profile, and [args] with any arguments you want to pass to it.

Multiple profiles

If you're profiling code before and after, consider using different profile file names instead of just profile. For example, for checking the results of some optimization, I often use the names before.profile and after.profile, like:

$ python -m cProfile -o before.profile example.py

$ git switch optimize_all_the_things

$ python -m cProfile -o after.profile example.py

Alternative sort orders

To sort by other metrics, swap cumtime in sort cumtime for one of these values, per the Stats.sort_stats() documentation:

  • time: internal time, the time spent in the function itself, excluding calls to other functions.

    This is useful for finding the slowest functions in your code.

  • calls: number of calls to the function.

    This is useful for finding functions that are called many times and may be candidates for optimization, such as caching.

A Djangoey example

Here's a worked example showing how to apply this recipe to a Django management command. Say you are testing a database migration locally:

$ ./manage.py migrate example 0002
Operations to perform:
  Target specific migration: 0002_complexito, from example
Running migrations:
  Applying example.0002_complexito... OK

While it did pass, it was unexpectedly slow. To profile it, you would first reverse the migration to reset your test database:

$ ./manage.py migrate example 0001
...

Then you could apply the recipe to profile the migration.

First, stick the cProfile command in front of the migration command:

$ python -m cProfile -o profile ./manage.py migrate example 0002
Operations to perform:
  Target specific migration: 0002_complexito, from example
Running migrations:
  Applying example.0002_complexito... OK

Then, run the second pstats command to view the results:

$ python -m pstats profile <<< $'sort cumtime\nstats 1000' | less

This opens less with a long table, starting:

Welcome to the profile statistics browser.
profile% profile% Mon May 19 23:52:37 2025    profile

         213287 function calls (206021 primitive calls) in 1.150 seconds

   Ordered by: cumulative time
   List reduced from 3576 to 1000 due to restriction <1000>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   425/1    0.001    0.000    1.150    1.150 {built-in method builtins.exec}
       1    0.000    0.000    1.150    1.150 ./manage.py:1(<module>)
       1    0.000    0.000    1.150    1.150 ./manage.py:7(main)
       1    0.000    0.000    1.109    1.109 /.../django/core/management/__init__.py:439(execute_from_command_line)
   ...

The header tells us how many function calls were made, how many were primitive calls, and how long the code took to run. Then there's the table of all function calls, limited to 1,000 entries.

Since we're sorting by cumtime, cumulative time spent in each function, the first line shows the total time spent in all functions. That exec is cProfile running your code, and the later lines represent the top-level wrappers from Django.

Generally, it's best to find the first listed function within your code base. In this profile, you would search for example/ and find this entry:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
...
    1    0.000    0.000    1.005    1.005 /.../example/migrations/0002_complexito.py:4(forward)
...

One call to the forward() function in the migration file took 1.005 seconds, nearly all of the 1.150 seconds total runtime. That's a bit suspicious!

Right above that entry, you might also spot the time spent running queries:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
...
   13    0.000    0.000    1.007    0.077 /.../django/db/backends/utils.py:78(execute)
   13    0.000    0.000    1.007    0.077 /.../django/db/backends/utils.py:88(_execute_with_wrappers)
   13    0.000    0.000    1.007    0.077 /.../django/db/backends/utils.py:94(_execute)
   13    0.000    0.000    1.007    0.077 /.../django/db/backends/sqlite3/base.py:354(execute)
   13    1.006    0.077    1.006    0.077 {function SQLiteCursorWrapper.execute at 0x1054f7f60}
...

This stack of functions all show 13 calls, with a cumulative time of 1.007 or 1.006 seconds. They represent Django's database backend wrappers, which eventually pass the query to Python's SQLiteCursorWrapper.execute(), which is displayed differently because it's implemented in C.

So, we can tell that the migration ran 13 queries in total, and at least one of them was slow and was run by forward(). At this point, you might look at the source of forward() to see if you can find the slow query. But first, you might want to re-display the profile to show only the forward() function and its callees (the functions it called), which might shed some light on what it was doing.

To show only forward() and its callees, you can use the pstats callees command. This takes a regular expression to match the function names you want to show:

$ python -m pstats profile <<< $'sort cumtime\ncallees \\bforward\\b' | less
Welcome to the profile statistics browser.
profile% profile%    Ordered by: cumulative time
   List reduced from 3576 to 1 due to restriction <'\\bforward\\b'>

Function
called...
ncalls  tottime  cumtime
/.../example/migrations/0002_complexito.py:4(forward)
  ->       1    0.000    0.000  /.../django/db/backends/utils.py:41(__enter__)
1    0.000    0.000  /.../django/db/backends/utils.py:44(__exit__)
1    0.000    1.005  /.../django/db/backends/utils.py:78(execute)
1    0.000    0.000  /.../django/utils/asyncio.py:15(inner)
1    0.000    0.000  {method 'create_function' of 'sqlite3.Connection' objects}

profile%
Goodbye.

(Output wrapped.)

This has revealed:

  • forward() only calls execute() once, so there's only one slow query.
  • There's also a call to SQLite's create_function(). It's fast, rounding down to 0.000 seconds, but it may have something to do with the slow query.

Okay, time to look at the source:

def forward(apps, schema_editor):
    import time

    schema_editor.connection.connection.create_function(
        "sleep",
        1,
        time.sleep,
    )
    with schema_editor.connection.cursor() as cursor:
        cursor.execute("SELECT sleep(1)")

Ah, it's a deliberate pause that I added to show you this example. Well, that solves that mystery.

Fin

May you cook up some great profiles with this recipe!

-Adam

20 May 2025 4:00am GMT

16 May 2025

Django community aggregator: Community blog posts

Django News -  Django News is at PyCon US this weekend! - May 16th 2025

Introduction

Django News is at PyCon US this weekend!

Jeff and Will are at PyCon US in Pittsburgh this weekend and would love to meet fellow Django enthusiasts. Drop by the DSF or JetBrains booth to say hello and connect with the many Django community members and DSF folks who will be around all weekend.

Django Newsletter

News

Google Summer of Code 2025 - Django Projects

Three projects out of many worthy proposals were accepted: improvements to the Django admin, adding django-template-partials to core, and automating processes in the Django contribution workflow.

withgoogle.com

Waiting for Postgres 18: Accelerating Disk Reads with Asynchronous I/O

Postgres 18 introduces asynchronous I/O with new io_method options (worker and io_uring), which can double or triple read performance in high-latency cloud environments.

pganalyze.com

Django Software Foundation

Simon Charette is the DSF member of the month

Simon Charette is a longtime Django contributor and community member. He served on the Django 5.x Steering Council and is part of the Security team and the Triage and Review team.

djangoproject.com

Updates to Django

Today 'Updates to Django' is presented by Abigail Afi Gbadago from the DSF Board and Djangonaut Space!🚀

Last week we had 10 pull requests merged into Django by 7 different contributors - including a first-time contributor! Congratulations to Safrone for having their first commits merged into Django - welcome on board!🎉

This week's Django highlights 🌟

Django Newsletter

Wagtail CMS

Our four contributors for Google Summer of Code 2025

Four GSoC 2025 contributors will extend Wagtail with grid-aware sustainability, strict CSP compatibility, improved media listings, and enhanced keyboard shortcut accessibility.

wagtail.org

Sponsored Link 1

Hire Django developers without the hassle!

Building a team of skilled Django developers has never been easier. Trust HackSoft to help you with strategic Django team augmentation. Learn more!

hacksoft.io

Articles

Django Security Best Practices: A Comprehensive Guide for Software Engineers

Enforce up-to-date Django versions, HTTPS, strong SECRET_KEY, ORM usage, built-in security middleware, XSS/CSRF defenses, robust authentication, dependency auditing, logging, and monitoring.

corgea.com

18 Years of REVSYS

Revsys marks 18 years offering Python and Django expertise, including code reviews, architectural design, cloud migrations, Kubernetes, CI/CD, AI integration, and team training.

revsys.com

Django: model field choices that can change without a database migration

Use Django 5.0 callable choices to avoid no-op migrations when updating model field choices, though database constraints still require migrations for data integrity.

adamj.eu

Algorithms: Learning One's Learnings

Use Big O notation to choose efficient sorting in Django apps, leveraging Python's built-in Timsort or Quick Sort instead of Bubble Sort to improve performance.

djangotricks.com

Birds and Angles: Dabbling in Django Components

Combining django-bird and dj-angles enables Web-component style reusable Django template components for cleaner syntax and improved readability, despite limited filter parsing for props.

bencardy.co.uk

Setting up NGINX Unit (and switching from uWSGI)

Switch Django apps from uWSGI to NGINX Unit using JSON configuration, add SECURE_PROXY_SSL_HEADER, adjust socket proxy_pass, and enable ASGI/WSGI deployments.

shenbergertech.com

My DjangoCon Europe 2025

Paolo Melchiorre recaps his DjangoCon Europe 2025 experience in Dublin through Mastodon posts covering keynotes, talks on testing, migrations, community events, and mentoring.

paulox.net

Tutorials

Rapid AI-powered applications with Django MongoDB and Voyage API

Learn how to build an LLM-powered recipe recommendation website with Django and MongoDB.

dev.to

Podcasts

Django Chat #182: Event Sourcing with Chris May

Chris is a Senior Staff Engineer at WellSky, a software company in the health industry. We discuss his background as a graphic designer, learning Python (and Django) as an adult, his multiple conference talks on HTMX, why he's a fan of event sourcing, and more.

simplecast.com

Talk Python #505: t-strings in Python (PEP 750)

A panel discussion of PEP 750 on t-strings, scheduled for Python 3.14, which build on the idea of f-strings to produce a template object rather than a standard string.

talkpython.fm

Django News Jobs

Python / Django Software Developer - full-time at Off Duty Management 🆕

Backend Python Developer (Django/DRF) at Paytree

Django Newsletter

Projects

astral-sh/ty

An extremely fast Python type checker and language server, written in Rust.

github.com

pydantic/pydantic-ai

Agent Framework / shim to use Pydantic with LLMs.

github.com


This RSS feed is published on https://django-news.com/. You can also subscribe via email.

16 May 2025 3:00pm GMT

My second DjangoCon Europe

Well I have been meaning to write this for over 2 weeks now, but better late than never! Towards the end of April 2025 I attended the DjangoCon Europe conference and Sprints and it was brilliant and exhausting all in one go.

Let's begin with the travel there: I decided to join those doing the SailRail for a relaxed train ride and sea crossing to Dublin. This was great, as I managed to make some use of the day (work and a blog post) while travelling, as well as having some travel companions in the form of Thibaud, Sage, Tom & Daniele.

The next day kicked off the conference with an excellent keynote from Sarah Boyce, and other talks followed throughout the next 2 days. Databases were a big theme, along with community engagement and HTMX. However, for me the highlight was walking into the room and meeting folks from the community in person that I have interacted with online for the past couple of years. This was also coupled with great conversations with friends new & old (mostly around making Django better). I also plucked up the courage and gave a lightning talk on the last day about my year of 100 words.

The evening socials again were excellent! Django Social on Wednesday and the official party on Friday, with a more chill evening going climbing with a couple of interested attendees. The weekend brought the Sprints which were just perfect. I managed to crack on with an open ticket/PR I have for the messages app in Django and also make some good progress on django-prodserver.

It was sad to leave, but it reminds me that I want to go next year (if I am allowed by the family!). I am also excited by the energy I felt across the week, reminding me that Django is going as strong as ever and the community has a bright future. I could write more, but I am aware that I need to crack on with today's work, so I will leave you with the recommendation of getting to a DjangoCon if you use Django in any form; you will not be disappointed.

16 May 2025 5:00am GMT

17 Apr 2025

Planet Twisted

Glyph Lefkowitz: Stop Writing `__init__` Methods

The History

Before dataclasses were added to Python in version 3.7 - in June of 2018 - the __init__ special method had an important use. If you had a class representing a data structure - for example a 2DCoordinate, with x and y attributes - you would want to be able to construct it as 2DCoordinate(x=1, y=2), which would require you to add an __init__ method with x and y parameters.

The other options available at the time all had pretty bad problems:

  1. You could remove 2DCoordinate from your public API and instead expose a make_2d_coordinate function and make it non-importable, but then how would you document your return or parameter types?
  2. You could document the x and y attributes and make the user assign each one themselves, but then 2DCoordinate() would return an invalid object.
  3. You could default your coordinates to 0 with class attributes, and while that would fix the problem with option 2, this would now require all 2DCoordinate objects to be not just mutable, but mutated at every call site.
  4. You could fix the problems with option 1 by adding a new abstract class that you could expose in your public API, but this would explode the complexity of every new public class, no matter how simple. To make matters worse, typing.Protocol didn't even arrive until Python 3.8, so, in the pre-3.7 world this would condemn you to using concrete inheritance and declaring multiple classes even for the most basic data structure imaginable.

Also, an __init__ method that does nothing but assign a few attributes doesn't have any significant problems, so it is an obvious choice in this case. Given all the problems that I just described with the alternatives, it makes sense that it became the obvious default choice, in most cases.

However, by accepting "define a custom __init__" as the default way to allow users to create your objects, we make a habit of beginning every class with a pile of arbitrary code that gets executed every time it is instantiated.

Wherever there is arbitrary code, there are arbitrary problems.

The Problems

Let's consider a data structure more complex than one that simply holds a couple of attributes. We will create one that represents a reference to some I/O in the external world: a FileReader.

Of course Python has its own open-file object abstraction, but I will be ignoring that for the purposes of the example.

Let's assume a world where we have the following functions, in an imaginary fileio module:

Our hypothetical fileio.open returns an integer representing a file descriptor1, fileio.read allows us to read length bytes from an open file descriptor, and fileio.close closes that file descriptor, invalidating it for future use.
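As a minimal sketch, the signatures implied by that description might look like this (the bodies are placeholders, not a real implementation):

def open(path: str) -> int:
    """Open the file at path, returning an integer file descriptor."""
    ...

def read(fd: int, length: int) -> bytes:
    """Read up to length bytes from the open file descriptor fd."""
    ...

def close(fd: int) -> None:
    """Close fd, invalidating it for future use."""
    ...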

With the habit that we have built from writing thousands of __init__ methods, we might want to write our FileReader class like this:

class FileReader:
    def __init__(self, path: str) -> None:
        self._fd = fileio.open(path)
    def read(self, length: int) -> bytes:
        return fileio.read(self._fd, length)
    def close(self) -> None:
        fileio.close(self._fd)

For our initial use-case, this is fine. Client code creates a FileReader by doing something like FileReader("./config.json"), which always creates a FileReader that maintains its file descriptor int internally as private state. This is as it should be; we don't want user code to see or mess with _fd, as that might violate FileReader's invariants. All the necessary work to construct a valid FileReader - i.e. the call to open - is always taken care of for you by FileReader.__init__.

However, additional requirements will creep in, and as they do, FileReader.__init__ becomes increasingly awkward.

Initially we only care about fileio.open, but later, we may have to deal with a library that has its own reasons for managing the call to fileio.open by itself, and wants to give us an int that we use as our _fd. Now we have to resort to weird workarounds like:

def reader_from_fd(fd: int) -> FileReader:
    fr = object.__new__(FileReader)
    fr._fd = fd
    return fr

Now, all those nice properties that we got from trying to force object construction to give us a valid object are gone. reader_from_fd's type signature, which takes a plain int, has no way of even suggesting to client code how to ensure that it has passed in the right kind of int.

Testing is much more of a hassle, because we have to patch in our own copy of fileio.open any time we want an instance of a FileReader in a test without doing any real-life file I/O, even if we could (for example) share a single file descriptor among many FileReader instances for testing purposes.

All of this also assumes a fileio.open that is synchronous. Although for literal file I/O this is more of a hypothetical concern, there are many types of networked resource which are really only available via an asynchronous (and thus: potentially slow, potentially error-prone) API. If you've ever found yourself wanting to type async def __init__(self): ... then you have seen this limitation in practice.

Comprehensively describing all the possible problems with this approach would end up being a book-length treatise on a philosophy of object oriented design, so I will sum up by saying that the cause of all these problems is the same: we are inextricably linking the act of creating a data structure with whatever side-effects are most often associated with that data structure. If they are "often" associated with it, then by definition they are not "always" associated with it, and all the cases where they aren't associated become unwieldy and potentially broken.

Defining an __init__ is an anti-pattern, and we need a replacement for it.

The Solutions

I believe this tripartite assemblage of design techniques will address the problems raised above:

Using dataclass attributes to create an __init__ for you

To begin, let's refactor FileReader into a dataclass. This does get us an __init__ method, but it won't be an arbitrary one we define ourselves; it will have the useful constraint enforced on it that it just assigns attributes.

from dataclasses import dataclass

@dataclass
class FileReader:
    _fd: int
    def read(self, length: int) -> bytes:
        return fileio.read(self._fd, length)
    def close(self) -> None:
        fileio.close(self._fd)

Except... oops. In fixing the problems that we created with our custom __init__ that calls fileio.open, we have re-introduced several problems that it solved:

  1. We have removed all the convenience of FileReader("path"). Now the user needs to import the low-level fileio.open again, making the most common type of construction both more verbose and less discoverable; if we want users to know how to build a FileReader in a practical scenario, we will have to add something in our documentation to point at a separate module entirely.
  2. There's no enforcement of the validity of _fd as a file descriptor; it's just some integer, which the user could easily pass an incorrect instance of, with no error.

In isolation, dataclass by itself can't solve all our problems, so let's add in the second technique.

Using classmethod factories to create objects

We don't want to require any additional imports, or require users to go looking at any other modules - or indeed anything other than FileReader itself - to figure out how to create a FileReader for its intended usage.

Luckily we have a tool that can easily address all of these concerns at once: @classmethod. Let's define a FileReader.open class method:

from dataclasses import dataclass
from typing import Self
@dataclass
class FileReader:
    _fd: int
    @classmethod
    def open(cls, path: str) -> Self:
        return cls(fileio.open(path))

Now, your callers can replace FileReader("path") with FileReader.open("path"), and get all the same benefits.

Additionally, if we needed to await fileio.open(...), and thus we needed its signature to be @classmethod async def open, we are freed from the constraint of __init__ as a special method. There is nothing that would prevent a @classmethod from being async, or indeed, from having any other modification to its return value, such as returning a tuple of related values rather than just the object being constructed.
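For illustration, a hedged sketch of what such an async factory could look like, assuming a hypothetical fileio.open_async coroutine:

from dataclasses import dataclass
from typing import Self

@dataclass
class FileReader:
    _fd: int

    @classmethod
    async def open(cls, path: str) -> Self:
        # fileio.open_async is hypothetical here, standing in for any awaitable
        # constructor dependency; a classmethod factory can await it, __init__ never could.
        return cls(await fileio.open_async(path))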

Using NewType to address object validity

Next, let's address the slightly trickier issue of enforcing object validity.

Our type signature calls this thing an int, and indeed, that is unfortunately what the lower-level fileio.open gives us, and that's beyond our control. But for our own purposes, we can be more precise in our definitions, using NewType:

from typing import NewType
FileDescriptor = NewType("FileDescriptor", int)

There are a few different ways to address the underlying library, but for the sake of brevity and to illustrate that this can be done with zero run-time overhead, let's just insist to Mypy that we have versions of fileio.open, fileio.read, and fileio.close which actually already take FileDescriptor integers rather than regular ones.

from typing import Callable
_open: Callable[[str], FileDescriptor] = fileio.open  # type:ignore[assignment]
_read: Callable[[FileDescriptor, int], bytes] = fileio.read
_close: Callable[[FileDescriptor], None] = fileio.close

We do of course have to slightly adjust FileReader, too, but the changes are very small. Putting it all together, we get:

from dataclasses import dataclass
from typing import Self
@dataclass
class FileReader:
    _fd: FileDescriptor
    @classmethod
    def open(cls, path: str) -> Self:
        return cls(_open(path))
    def read(self, length: int) -> bytes:
        return _read(self._fd, length)
    def close(self) -> None:
        _close(self._fd)

Note that the main technique here is not necessarily using NewType specifically, but rather aligning an instance's property of "has all attributes set" as closely as possible with an instance's property of "fully valid instance of its class"; NewType is just a handy tool to enforce any necessary constraints on the places where you need to use a primitive type like int, str or bytes.

In Summary - The New Best Practice

From now on, when you're defining a new Python class:

If you define all your classes this way, you will get all the benefits of a custom __init__ method:

Along with some nice new benefits:

Before dataclasses, it was always a bit weird that such a basic feature of the Python language - giving data to a data structure to make it valid - required overriding a method with 4 underscores in its name. __init__ stuck out like a sore thumb. Other such methods like __add__ or even __repr__ were inherently customizing esoteric attributes of classes.

For many years now, that historical language wart has been resolved. @dataclass, @classmethod, and NewType give you everything you need to build classes which are convenient, idiomatic, flexible, testable, and robust.


Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you've read here and you'd like to read more of it, or you'd like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics like "but what is a 'class', really?".


  1. If you aren't already familiar, a "file descriptor" is an integer which has meaning only within your program; you tell the operating system to open a file, it says "I have opened file 7 for you", and then whenever you refer to "7" it is that file, until you close(7).

  2. Or an attrs class, if you're nasty.

  3. Unless you have a really good reason to, of course. Backwards compatibility, or compatibility with another library, might be good reasons to do that. Or certain types of data-consistency validation which cannot be expressed within the type system. The most common example of these would be a class that requires consistency between two different fields, such as a "range" object where start must always be less than end. There are always exceptions to these types of rules. Still, it's pretty much never a good idea to do any I/O in __init__, and nearly all of the remaining stuff that may sometimes be a good idea in edge-cases can be achieved with a __post_init__ rather than writing a literal __init__.

17 Apr 2025 10:35pm GMT

01 Apr 2025

Planet Twisted

Glyph Lefkowitz: A Bigger Database

A Database File

When I was 10 years old, and going through a fairly difficult time, I was lucky enough to come into the possession of a piece of software called Claris FileMaker Pro™.

FileMaker allowed its users to construct arbitrary databases, and to associate their tables with a customized visual presentation. FileMaker also had a rudimentary scripting language, which would allow users to imbue these databases with behavior.

As a mentally ill pre-teen, lacking a sense of control over anything or anyone in my own life, including myself, I began building a personalized database to catalogue the various objects and people in my immediate vicinity. If one were inclined to be generous, one might assess this behavior and say I was systematically taxonomizing the objects in my life and recording schematized information about them.

As I saw it at the time, if I collected the information, I could always use it later, to answer questions that I might have. If I didn't collect it, then what if I needed it? Surely I would regret it! Thus I developed a categorical imperative to spend as much of my time as possible collecting and entering data about everything that I could reasonably arrange into a common schema.

Having thus summoned this specter of regret for all lost data-entry opportunities, it was hard to dismiss. We might label it "Claris's Basilisk", for obvious reasons.

Therefore, a less-generous (or more clinically-minded) observer might have replaced the word "systematically" with "obsessively" in the assessment above.

I also began writing what scripts were within my marginal programming abilities at the time, just because I could: things like computing the sum of every street number of every person in my address book. Why was this useful? Wrong question: the right question is "was it possible" to which my answer was "yes".

If I was obliged to collect all the information which I could observe - in case it later became interesting - I was similarly obliged to write and run every program I could. It might, after all, emit some other interesting information.

I was an avid reader of science fiction as well.

I had this vague sense that computers could kind of think. This resulted in a chain of reasoning that went something like this:

  1. human brains are kinda like computers,
  2. the software running in the human brain is very complex,
  3. I could only write simple computer programs, but,
  4. when you really think about it, a "complex" program is just a collection of simpler programs

Therefore: if I just kept collecting data, collecting smaller programs that could solve specific problems, and connecting them all together in one big file, eventually the database as a whole would become self-aware and could solve whatever problem I wanted. I just needed to be patient; to "keep grinding" as the kids would put it today.

I still feel like this is an understandable way to think - if you are a highly depressed and anxious 10-year-old in 1990.

Anyway.


35 Years Later

OpenAI is a company that produces transformer architecture machine learning generative AI models; their current generation was trained on about 10 trillion words, obtained in a variety of different ways from a large variety of different, unrelated sources.

A few days ago, on March 26, 2025 at 8:41 AM Pacific Time, Sam Altman took to "X™, The Everything App™," and described the trajectory of his career of the last decade at OpenAI as, and I quote, a "grind for a decade trying to help make super-intelligence to cure cancer or whatever" (emphasis mine).

I really, really don't want to become a full-time AI skeptic, and I am not an expert here, but I feel like I can identify a logically flawed premise when I see one.

This is not a system-design strategy. It is a trauma response.

You can't cure cancer "or whatever". If you want to build a computer system that does some thing, you actually need to hire experts in that thing, and have them work to both design and validate that the system is fit for the purpose of that thing.


Aside: But... are they, though?

I am not an oncologist; I do not particularly want to be writing about the specifics here, but, if I am going to make a claim like "you can't cure cancer this way" I need to back it up.

My first argument - and possibly my strongest - is that cancer is not cured.

QED.

But I guess, to Sam's credit, there is at least one other company partnering with OpenAI to do things that are specifically related to cancer. However, that company is still in a self-described "initial phase" and it's not entirely clear that it is going to work out very well.

Almost everything I can find about it online was from a PR push in the middle of last year, so it all reads like a press release. I can't easily find any independently-verified information.

A lot of AI hype is like this. A promising demo is delivered; claims are made that surely if the technology can solve this small part of the problem now, within 5 years surely it will be able to solve everything else as well!

But even the light-on-content puff-pieces tend to hedge quite a lot. For example, as the Wall Street Journal quoted one of the users initially testing it (emphasis mine):

The most promising use of AI in healthcare right now is automating "mundane" tasks like paperwork and physician note-taking, he said. The tendency for AI models to "hallucinate" and contain bias presents serious risks for using AI to replace doctors. Both Color's Laraki and OpenAI's Lightcap are adamant that doctors be involved in any clinical decisions.

I would probably not personally characterize "'mundane' tasks like paperwork and … note-taking" as "curing cancer". Maybe an oncologist could use some code I developed too; even if it helped them, I wouldn't be stealing valor from them on the curing-cancer part of their job.

Even fully giving it the benefit of the doubt that it works great, and improves patient outcomes significantly, this is medical back-office software. It is not super-intelligence.

It would not even matter if it were "super-intelligence", whatever that means, because "intelligence" is not how you do medical care or medical research. It's called "lab work" not "lab think".

To put a fine point on it: biomedical research fundamentally cannot be done entirely by reading papers or processing existing information. It cannot even be done by testing drugs in computer simulations.

Biological systems are enormously complex, and medical research on new therapies inherently requires careful, repeated empirical testing to validate the correspondence of existing research with reality. Not "an experiment", but a series of coordinated experiments that all test the same theoretical model. The data (which, in an LLM context, is "training data") might just be wrong; it may not reflect reality, and the only way to tell is to continuously verify it against reality.

Previous observations can be tainted by methodological errors, by data fraud, and by operational mistakes by practitioners. If there were a way to do verifiable development of new disease therapies without the extremely expensive ladder going from cell cultures to animal models to human trials, we would already be doing it, and "AI" would just be an improvement to efficiency of that process. But there is no way to do that and nothing about the technologies involved in LLMs is going to change that fact.


Knowing Things

The practice of science - indeed any practice of the collection of meaningful information - must be done by intentionally and carefully selecting inclusion criteria, methodically and repeatedly curating our data, building a model that operates according to rules we understand and can verify, and verifying the data itself with repeated tests against nature. We cannot just hoover up whatever information happens to be conveniently available with no human intervention and hope it resolves to a correct model of reality by accident. We need to look where the keys are, not where the light is.

Piling up more and more information in a haphazard and increasingly precarious pile will not allow us to climb to the top of that pile, all the way to heaven, so that we can attack and dethrone God.

Eventually, we'll just run out of disk space, and then lose the database file when the family gets a new computer anyway.


Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you've read here and you'd like to read more of it, or you'd like to support my various open-source endeavors, you can support my work as a sponsor! Special thanks also to Itamar Turner-Trauring and Thomas Grainger for pre-publication feedback on this article; any errors of course remain my own.

01 Apr 2025 12:47am GMT

15 Jan 2025

Planet Twisted

Glyph Lefkowitz: Small PINPal Update

Today on stream, I updated PINPal to fix the memorization algorithm.

If you haven't heard of PINPal before, it is a vault password memorization tool. For more detail on what that means, you can check out the README, and why not give it a ⭐ while you're at it.


As I started writing up an update post I realized that I wanted to contextualize it a bit more, because it's a tool I really wish were more popular. It solves one of those small security problems that you can mostly ignore, right up until the point where it's a huge problem and it's too late to do anything about it.

In brief, PINPal helps you memorize new secure passcodes for things you actually have to remember and can't simply put into your password manager, like the password to your password manager, your PC user account login, your email account1, or the PIN code to your phone or debit card.

Too often, even if you're properly using a good password manager for your passwords, you'll be protecting it with a password optimized for memorability, which is to say, one that isn't random and thus isn't secure. But I have also seen folks veer too far in the other direction, trying to make a really secure password that they then forget right after switching to a password manager. Forgetting your vault password can also be a really big deal, making you do password resets across every app you've loaded into it so far, so having an opportunity to practice it periodically is important.

PINPal uses spaced repetition to ensure that you remember the codes it generates.

While periodic forced password resets are a bad idea, if (and only if!) you can actually remember the new password, it is a good idea to get rid of old passwords eventually - like, let's say, when you get a new computer or phone. Doing so reduces the risk that a password stored somewhere on a very old hard drive or darkweb data dump is still floating around out there, forever haunting your current security posture. If you do a reset every 2 years or so, you know you've never got more than 2 years of history to worry about.

PINPal is also particularly secure in the way it incrementally generates your password; the computer you install it on only ever stores the entire password in memory when you type it in. It stores even the partial fragments that you are in the process of memorizing using the secure keyring module, avoiding plain-text whenever possible.


I've been using PINPal to generate and memorize new codes for a while, just in case2, and the change I made today was because I encountered a recurring problem. The problem was, I'd forget a token after it had been hidden, and there was never any going back. The moment that a token was hidden from the user, it was removed from storage, so you could never get a reminder. While I've successfully memorized about 10 different passwords with it so far, I've had to delete 3 or 4.

So, in the updated algorithm, the visual presentation now hides tokens in the prompt several memorizations before they're removed. Previously, if the password you were generating was 'hello world', you'd see hello world 5 times or so, then •••• world; if you ever got it wrong past that point, too bad, start over. Now, you'll see hello world, then °°°° world, and then, after you have gotten the prompt right without seeing the token a few times, •••• world once the backend has locked it in and it's properly erased from your computer.
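As an illustration only, not PINPal's actual code, the described progression could be modeled by picking a display form for each token based on how many successful recalls it has had (the names and thresholds here are made up):

def display_token(token: str, successful_recalls: int) -> str:
    # Hypothetical thresholds, for illustration only.
    reveal_until = 5   # shown in full for the first few repetitions
    lock_in_at = 8     # after this many correct recalls, erase it from storage
    if successful_recalls < reveal_until:
        return token                # still shown in full, e.g. "hello"
    elif successful_recalls < lock_in_at:
        return "°" * len(token)     # hidden in the prompt, but still recoverable
    else:
        return "•" * len(token)     # locked in; the plain text is gone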

If you get the prompt wrong, breaking your streak reveals the recently-hidden token until you get it right again. I also did a new release on that same livestream, so if this update sounds like it might make the memorization process more appealing, check it out via pip install pinpal today.

Right now this tool is still only extremely for a specific type of nerd - it's command-line only, and you probably need to hand-customize your shell prompt to invoke it periodically. But I'm working on making it more accessible to a broader audience. It's open source, of course, so you can feel free to contribute your own code!

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you've read here and you'd like to read more things like it, or you'd like to support my various open-source endeavors, you can support my work as a sponsor!


  1. Your email account password can be stored in your password manager, of course, but given that email is the root-of-trust reset factor for so many things, being able to remember that password is very helpful in certain situations.

  2. Funny story: at one point, Apple had an outage which made it briefly appear as if a lot of people needed to reset their iCloud passwords, myself included. Because I'd been testing PINPal a bunch, I actually had several highly secure random passwords already memorized. It was a strange feeling to just respond to the scary password reset prompt with a new, highly secure password and just continue on with my day secure in the knowledge I wouldn't forget it.

15 Jan 2025 12:54am GMT

29 Nov 2024

Planet Plone - Where Developers And Integrators Write

Maurits van Rees: Lightning talks Friday

Bonnie Tyler Sprint

On 12 August 2026 there is a total solar eclipse that can be seen from Valencia, Spain. So we organise a sprint there.

This conference

We had 291 participants, 234 in person and 57 online. 13 Brazilian states (that is all of them), 14 countries.

24.5 percent women, was 13% in 2013, so that has gone up, but we are not there yet. Thank you to PyLadies and Django Girls for making this happen.

We had more than 80 presenters, about 30 lightning talks, lots of talk in the hall ways.

Thanks also to the team!

Ramiro Luz: Yoga time

Yoga exercise.

Rikupekka: University case student portal

We have a student portal at the university. But mostly:

Welcome to Jyväskylä university in Finland for Plone conference 2025, October 13-19!

Jakob: Beethovensprint

26-30 May 2025 in Bonn, Germany.

Afterwards, on May 30 and June 1 there will be FedCon in Bonn, a SciFi convention.

Piero/Victor: BYOUI

Add-ons first development with @plone/registry. See https://plone-registry.readthedocs.io/

It allows for development that is framework agnostic, so it is not only for Plone. It is around configuration that can be extended and injected, which is tricky in most javascript frameworks.

Imagine it.

Ana Dulce: 3D printing

For a difficult model I had to trust the process; it took a week, but it worked.

Renan & Iza: Python Brasil

We organised the Python Brasil conference from 16 to 23 October this year in Rio de Janeiro.

Next year 21-27 October in São Paulo.

Erico: Python Cerrado

31 July to 2 August 2025 is the next Python Cerrado conference.

29 Nov 2024 10:25pm GMT

Maurits van Rees: Paul Roeland: The value of longevity

Link to talk information on Plone conference website.

I work for the Clean Clothes Campaign: https://cleanclothes.org/

After three large disasters in factories in 2012 and 2013 with over 1000 deaths, it took three years to get an agreement with clothes manufacturers to get 30 million dollar compensation. It does not bring lives back, but it helps the survivors.

See Open Supply Hub for open data that we collected, for checking which brands are produced in which factories.

Documenting history matters. Stories must be told.

The global clothing industry is worth around 1.8 trillion dollars; if it were a country, that would put it in 12th place in the world. 75 million workers.

Our strongest weapon: backlinks. We have links from OECD, UN, wikipedia, school curriculum, books. Especially those last two don't change ever, so you should never change urls.

Plone: enable the sitemap, please, why not by default? Create a good robots.txt. I weekly check Google Search console, looking for broken links. Tag early, tag often, great tool, even if you have an AI do it.

Our website: started 1998 written in Notepad, 2004 Dreamweaver, 2006 Bluefish, 2010 Joomla, 2013 Plone 4, 2020 Castle CMS (opinionated distribution of Plone, but does not really exist anymore) 2024 Plone 6 with Volto Light Theme (work in progress). Thank you kitconcept for all the help, especially Jonas.

Migrations are painful. Along the years we used wget to csv to SQL to csv, Python script, "Franken-mogrifier", collective.exportimport.

Lessons learned: stable urls are awesome, migrations are painful. Please don't try to salvage CSS from your old site, just start fresh in your new system. Do not try to migrate composite pages or listings.

What if your website does not provide an export? Use wget, still works and is better than httrack. sed/awk/regex are your friend. archivebox (WARC).

Document your steps for your own sanity.

To manage json, jq or jello can be used. sq is a Swiss knife for json/sql/csv. emuto is a hybrid between jq and GraphQL.

Normalize import/export. We have `plone.exportimport` in core now.

In the future I would like a plone exporter script that accepts a regex and exports only matching pages. Switch backends: ZODB, relstorage, nick, quantum-db. Sitewide search/replace/sed. Sneakernet is useful in difficult countries where you cannot send data over the internet: so export to a usb stick.

A backup is only a backup if it regularly gets restored so you know that it works.

  • Keeping content and URL stability is a superpower.
  • Assuming that export/import/backup/restore/migration are rare occurrences, is wrong.
  • Quick export/import is very useful.

Do small migrations; treat them as maintenance. Don't be too far behind. A large migration once every five years will be costly. Do a small migration every year. Do your part. Clients should also do their part, by budgeting for this yearly. That is how budgeting works. Use every iteration to review custom code.

Make your sites live long and prosper.

29 Nov 2024 8:58pm GMT

Maurits van Rees: Fred van Dijk: Run Plone in containers on your own cluster with coolify.io

Link to talk information on Plone conference website.

Sorry, I ran out of time trying to set up https://coolify.io

So let's talk about another problem. Running applications (stacks) in containers is the future. Well: abstraction and isolation is the future, and containers is the current phase.

I am on the Plone A/I team, with Paul, Kim, and Erico. All senior sysadmins, so we kept things running. In 2022 we worked on containerisation. Kubernetes was the kool kid then, but Docker Swarm was easier. Check out Erico's training with the new cookieplone templates.

Doing devops well is hard. You have a high workload, but still need to keep learning new stuff to keep up with what is changing.

I want to plug Coolify, which is a full open source product. "Self-hosting with super powers." The main developer, Andras Bacsal, believes in open source and 'hates' pay by usage cloud providers with a vengeance.

Coolify is still docker swarm. We also want Kubernetes support. But we still need sysadmins. Someone will still need to install coolify, and keep it updated.

I would like to run an online DevOps course somewhere January-March 2025. 4-6 meetings of 2 hours, maybe Friday afternoon. Talk through devops and sysadmin concepts, show docker swarm, try coolify, etc.

29 Nov 2024 7:58pm GMT