06 Jul 2025
Planet Python
Anwesha Das: Creating Pull request with GitHub Action
---
name: Testing Gha
on:
  workflow_dispatch:
    inputs:
      GIT_BRANCH:
        description: The git branch to be worked on
        required: true
jobs:
  test-pr-creation:
    name: Creates test PR
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
      contents: write
    env:
      GIT_BRANCH: ${{ inputs.GIT_BRANCH }}
    steps:
      - uses: actions/checkout@v4
      - name: Updates README
        run: echo date >> README.md
      - name: Set up git
        run: |
          git switch --create "${GIT_BRANCH}"
          ACTOR_NAME="$(curl -s https://api.github.com/users/"${GITHUB_ACTOR}" | jq --raw-output '.name // .login')"
          git config --global user.name "${ACTOR_NAME}"
          git config --global user.email "${GITHUB_ACTOR_ID}+${GITHUB_ACTOR}@users.noreply.github.com"
      - name: Add README
        run: git add README.md
      - name: Commit
        run: >-
          git diff-index --quiet HEAD ||
          git commit -m "test commit msg"
      - name: Push to the repo
        run: git push origin "${GIT_BRANCH}"
      - name: Create PR as draft
        env:
          GITHUB_TOKEN: ${{ github.token }}
        run: >-
          gh pr create
          --draft
          --base main
          --head "${GIT_BRANCH}"
          --title "test commit msg"
          --body "pr body"
      - name: Retrieve the existing PR URL
        id: existing-pr
        env:
          GITHUB_TOKEN: ${{ github.token }}
        # In a folded (>) block, the blank line keeps the two commands on separate lines.
        run: >
          echo -n pull_request_url= >> "${GITHUB_OUTPUT}"

          gh pr view
          --json 'url'
          --jq '.url'
          --repo '${{ github.repository }}'
          '${{ env.GIT_BRANCH }}'
          >> "${GITHUB_OUTPUT}"
      - name: Select the actual PR URL
        id: pr
        env:
          GITHUB_TOKEN: ${{ github.token }}
        run: >
          echo -n pull_request_url=
          >> "${GITHUB_OUTPUT}"

          echo '${{steps.existing-pr.outputs.pull_request_url}}'
          >> "${GITHUB_OUTPUT}"
      - name: Log the pull request details
        run: >-
          echo 'PR URL: ${{ steps.pr.outputs.pull_request_url }}' | tee -a "${GITHUB_STEP_SUMMARY}"
      - name: Instruct the maintainers to trigger CI by undrafting the PR
        env:
          GITHUB_TOKEN: ${{ github.token }}
        run: >-
          gh pr comment
          --body 'Please mark the PR as ready for review to trigger PR checks.'
          --repo '${{ github.repository }}'
          '${{ steps.pr.outputs.pull_request_url }}'
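The above is an example of how to create a draft PR via GitHub Actions. Besides the permissions block in the job, the repository itself must allow GitHub Actions to create pull requests: in the repository's Actions settings, under Workflow permissions, enable "Allow GitHub Actions to create and approve pull requests".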
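The two URL steps pass data between steps through the file whose path GitHub provides in GITHUB_OUTPUT: a `name=value` line written there becomes `steps.<id>.outputs.<name>`. Because `echo -n pull_request_url=` writes the key without a newline, the URL printed by `gh pr view` completes the same line. The Python sketch below imitates this against a temporary file. It is a simplification: it handles only single-line `name=value` entries (GitHub also documents a multi-line `name<<DELIMITER` form, ignored here), and the helper names and sample URL are made up.

```python
import os
import tempfile


def append_raw(path: str, text: str) -> None:
    """Append text to the file, like `>> "${GITHUB_OUTPUT}"` in the shell."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(text)


def parse_outputs(path: str) -> dict[str, str]:
    """Parse single-line `name=value` entries; this sketch skips any other line."""
    outputs: dict[str, str] = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if "=" in line:
                name, value = line.split("=", 1)
                outputs[name] = value
    return outputs


def simulate_step(url: str, use_echo_n: bool = True) -> dict[str, str]:
    """Imitate the 'Retrieve the existing PR URL' step against a temporary file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # `echo -n` writes the key without a trailing newline; plain `echo` adds one.
        append_raw(path, "pull_request_url=" if use_echo_n else "pull_request_url=\n")
        # `gh pr view --jq '.url'` prints the URL followed by a newline.
        append_raw(path, url + "\n")
        return parse_outputs(path)
    finally:
        os.remove(path)


print(simulate_step("https://github.com/octo-org/demo/pull/1"))
print(simulate_step("https://github.com/octo-org/demo/pull/1", use_echo_n=False))
```

The second call shows why `-n` matters: with a plain `echo`, the key line is terminated early, and in this simplified parser the output ends up empty.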
Hopefully, this blogpost will help my future self.
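For reference, the "Set up git" step derives the commit identity from the user who triggered the workflow. jq's `//` operator keeps `.name` unless it is null or false and otherwise falls back to `.login`. The e-mail uses GitHub's `ID+login@users.noreply.github.com` noreply form. A Python sketch of the same logic applied to an already-fetched user JSON document follows. It makes no network call, and the function name and sample ID are made up.

```python
import json


def git_identity(user_json: str, actor: str, actor_id: str) -> tuple[str, str]:
    """Mimic `jq --raw-output '.name // .login'` plus the noreply e-mail format."""
    user = json.loads(user_json)
    name = user.get("name")  # a missing key behaves like jq's null
    # jq's `//` falls back only when the left side is null or false;
    # an empty string is kept, just as jq would keep it.
    if name is None or name is False:
        name = user["login"]
    email = f"{actor_id}+{actor}@users.noreply.github.com"
    return name, email


print(git_identity('{"login": "octocat", "name": null}', "octocat", "12345"))
print(git_identity('{"login": "octocat", "name": "The Octocat"}', "octocat", "12345"))
```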
06 Jul 2025 6:22pm GMT
Django community aggregator: Community blog posts
Using Google Consent Mode v2 on a Django Website
A decade ago, adding website analytics was simple: you'd just paste a JavaScript snippet from Google Analytics, and that was it.
But things changed. As people became more concerned about their privacy, countries introduced strict privacy laws, like GDPR in the European Union, PIPEDA in Canada, and APPI in Japan. These laws gave users more control over their data and placed new responsibilities on us developers.
One of those responsibilities is showing Cookie Consent banners and respecting users' choices before loading any tracking scripts.
Today, if you want to add Google Analytics to your website, it's not just about copying a script. You need to understand Google Tag Manager (GTM), a tool that lets you manage which scripts run on your site and when, using a web-based dashboard.
When you add Google Analytics as a Google tag through GTM, it doesn't automatically start collecting data. It waits for a signal called Consent Mode, which tells Google whether the user has accepted or denied tracking. This signal must be sent from your website to GTM as an event.
That's where your Cookie Consent widget comes in. For Django websites, I created Django GDPR Cookie Consent, which lets you manage user consent and send the proper signals to Google.
In this article, I'll show you how to make all three (Django GDPR Cookie Consent, Google Tag Manager, and Google Consent Mode v2) work together smoothly.
1. Set up Google Analytics
Go to Google Analytics and set up a property for your website. You'll need your Google Analytics 4 Measurement ID (e.g., G-XXXXXXX) for Google Tag Manager.
2. Set up Google Tag Manager
You'll use the Google Tag Manager (GTM) web interface to add and manage all your tracking scripts (called tags). You won't have to edit your website code every time you want to change something - GTM handles it all from one place.
Note the GTM container ID bound to your website, e.g., GTM-XXXXXXX. You'll need it later for the scripts.
Here's how to configure GTM to load scripts only after the user gives consent, using the signals from Google Consent Mode.
GA4 Configuration Tag
This tag initializes Google Analytics 4 (GA4) tracking.
- Tag Type: Google Tag
- Measurement ID: Your GA4 ID (G-XXXXXXX)
- Trigger: Consent Initialization - All Pages (this runs very early, before other tags)
This setup allows GA4 to start in a consent-aware way. It reads the default denied state at page load and will automatically adjust when the user accepts cookies later.
GA4 Event Tags
These are additional GA4 tags to track specific actions (like form submissions or button clicks).
- Tag Type: Google Analytics: GA4 Event
- Trigger: Choose based on the action (e.g., form submit, button click)
You don't need to check for consent manually here - GA4 automatically tracks or holds data based on the user's consent provided through Google Consent Mode.
Google Ads Tags
If you are using Google Ads, set these for conversion tracking and remarketing (showing ads to users who visited your site):
- Tag Type: Google Ads Conversion Tracking or Google Ads Remarketing
- Trigger: Set this to fire after a successful action (like a purchase or sign-up)
These tags respect consent choices, like whether the user allowed ad_storage.
3. Set up Django GDPR Cookie Consent
Download and install the package
Get Django GDPR Cookie Consent from Gumroad.
Create a directory private_wheels/ in your project's repository and add the app's wheel file there.
Link to this file in your requirements.txt:
Django==5.2
file:./private_wheels/django_gdpr_cookie_consent-4.1.2-py2.py3-none-any.whl
Install the pip requirements from the requirements.txt file into your project's virtual environment:
(venv)$ pip install -r requirements.txt
Add the app to INSTALLED_APPS
INSTALLED_APPS = [
    # …
    "gdpr_cookie_consent",
    # …
]
Add the context processor
Add gdpr_cookie_consent to the context processors in your Django settings:
TEMPLATES = [
    {
        # …
        "OPTIONS": {
            "context_processors": [
                # …
                "gdpr_cookie_consent.context_processors.gdpr_cookie_consent",
            ],
        },
    },
]
Add URL path to urlpatterns
from django.urls import path, include

urlpatterns = [
    # …
    path(
        "cookies/",
        include("gdpr_cookie_consent.urls", namespace="cookie_consent"),
    ),
    # …
]
Prepare cookie consent configuration
Create a COOKIE_CONSENT_SETTINGS configuration in your Django project settings with these cookie sections:
- Essential (strictly necessary) for cookies related to Django sessionid, CSRF token, and cookie consent,
- Analytics (optional) for website usage statistics with Google Analytics,
- Marketing (optional) for tracking cross-website ad statistics with Google Ads.
from django.utils.translation import gettext_lazy as _

COOKIE_CONSENT_SETTINGS = {
    "base_template_name": "base.html",
    "description_template_name": "gdpr_cookie_consent/descriptions/what_are_cookies.html",
    "dialog_position": "center",
    "consent_cookie_max_age": 60 * 60 * 24 * 30 * 6,  # 180 days (about six months), in seconds
    "sections": [
        {
            "slug": "essential",
            "title": _("Essential Cookies"),
            "required": True,
            "preselected": True,
            "summary": _(
                "These cookies are always on, as they're essential for making this website work, and making it safe. Without these cookies, services you've asked for can't be provided."),
            "description": _(
                "These cookies are always on, as they're essential for making this website work, and making it safe. Without these cookies, services you've asked for can't be provided."),
            "providers": [
                {
                    "title": _("This website"),
                    "cookies": [
                        {
                            "cookie_name": "sessionid",
                            "duration": _("2 Weeks"),
                            "description": _(
                                "Session ID used to authenticate you and give permissions to use the site."),
                            "domain": ".example.com",
                        },
                        {
                            "cookie_name": "csrftoken",
                            "duration": _("Session"),
                            "description": _(
                                "Security token used to ensure that no hackers are posting forms on your behalf."),
                            "description_template_name": "",
                            "domain": ".example.com",
                        },
                        {
                            "cookie_name": "cookie_consent",
                            # Matches consent_cookie_max_age above.
                            "duration": _("6 Months"),
                            "description": _("Settings of Cookie Consent preferences."),
                            "description_template_name": "",
                            "domain": ".example.com",
                        },
                    ]
                },
            ],
        },
        {
            "slug": "analytics",
            "title": _("Analytics Cookies"),
            "required": False,
            "summary": _(
                "These cookies help us analyse how many people are using this website, where they come from and how they're using it. If you opt out of these cookies, we can't get feedback to make this website better for you and all our users."),
            "description": _(
                "These cookies help us analyse how many people are using this website, where they come from and how they're using it. If you opt out of these cookies, we can't get feedback to make this website better for you and all our users."),
            "providers": [
                {
                    "title": _("Google Analytics"),
                    "description": _("Google Analytics is used to track website usage statistics."),
                    "description_template_name": "",
                    "cookies": [
                        {
                            "cookie_name": "_ga",
                            "duration": _("2 Years"),
                            "description": _("Used to distinguish users."),
                            "description_template_name": "",
                            "domain": ".example.com",
                        },
                        {
                            "cookie_name": "_gid",
                            "duration": _("24 Hours"),
                            "description": _("Used to distinguish users."),
                            "description_template_name": "",
                            "domain": ".example.com",
                        },
                        {
                            "cookie_name": "_ga_*",
                            "duration": _("2 Years"),
                            "description": _("Used to persist session state."),
                            "description_template_name": "",
                            "domain": ".example.com",
                        },
                        {
                            "cookie_name": "_gat_gtag_UA_*",
                            "duration": _("1 Minute"),
                            "description": _("Stores unique user ID."),
                            "description_template_name": "",
                            "domain": ".example.com",
                        },
                    ]
                },
            ],
        },
        {
            "slug": "marketing",
            "title": _("Marketing Cookies"),
            "required": False,
            "summary": _(
                "These cookies are set by our advertising partners to track your activity and show you relevant ads on other sites as you browse the internet."),
            "description": _(
                "These cookies are set by our advertising partners to track your activity and show you relevant ads on other sites as you browse the internet."),
            "providers": [
                {
                    "title": _("Google Ads"),
                    "description": _("These cookies are related to Google Ads conversion tracking."),
                    "description_template_name": "",
                    "cookies": [
                        {
                            "cookie_name": "_gac_gb_*",
                            "duration": _("90 Days"),
                            "description": _(
                                "Contains campaign related information. If you have linked your Google Analytics and Google Ads accounts, Google Ads website conversion tags will read this cookie unless you opt-out."),
                            "description_template_name": "",
                            "domain": ".example.com",
                        },
                    ]
                },
            ],
        },
    ]
}
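Before wiring up GTM, it can help to list every cookie the configuration declares and to double-check the consent cookie lifetime (60 * 60 * 24 * 30 * 6 seconds is 180 days, roughly six months). The helper below is hypothetical and is not part of Django GDPR Cookie Consent. It only walks the nested sections, providers and cookies structure shown above, and `SAMPLE` is a trimmed-down stand-in that uses plain strings instead of `gettext_lazy`.

```python
SECONDS_PER_DAY = 60 * 60 * 24


def cookie_rows(settings: dict) -> list[tuple[str, str, str, str]]:
    """Flatten sections -> providers -> cookies into (section, provider, cookie, duration) rows."""
    rows = []
    for section in settings.get("sections", []):
        for provider in section.get("providers", []):
            for cookie in provider.get("cookies", []):
                rows.append((
                    section["slug"],
                    str(provider["title"]),  # str() would also force lazy translations
                    cookie["cookie_name"],
                    str(cookie["duration"]),
                ))
    return rows


def max_age_days(settings: dict) -> float:
    """Convert consent_cookie_max_age from seconds to days."""
    return settings["consent_cookie_max_age"] / SECONDS_PER_DAY


# Hypothetical, trimmed-down stand-in for COOKIE_CONSENT_SETTINGS.
SAMPLE = {
    "consent_cookie_max_age": 60 * 60 * 24 * 30 * 6,
    "sections": [
        {"slug": "essential", "providers": [
            {"title": "This website", "cookies": [
                {"cookie_name": "cookie_consent", "duration": "6 Months"},
            ]},
        ]},
        {"slug": "analytics", "providers": [
            {"title": "Google Analytics", "cookies": [
                {"cookie_name": "_ga", "duration": "2 Years"},
            ]},
        ]},
    ],
}

for row in cookie_rows(SAMPLE):
    print(row)
print(max_age_days(SAMPLE))  # 180.0
```

Run against the real settings (for example from `python manage.py shell`), the rows make it easy to compare the declared durations with what the browser actually stores.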
Include the widget in your base.html template
Load the CSS somewhere in the <head> section:
{% load static %}
<link href="{% static 'gdpr-cookie-consent/css/gdpr-cookie-consent.css' %}" rel="stylesheet" />
Include the widget just before the closing </body> tag:
{% include "gdpr_cookie_consent/includes/cookie_consent.html" %}
Link to the cookie management view, for example, in the website's footer:
{% load i18n %}
{% url "cookie_consent:cookies_management" as cookie_management_url %}
<a href="{{ cookie_management_url }}" rel="nofollow">
  {% trans "Manage Cookies" %}
</a>
Add GTM and Google Consent Mode snippets to your base.html template.
Add these two scripts to the <head> section, with the consent defaults placed before the GTM loader:
<script nonce="{{ request.csp_nonce }}">
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('consent', 'default', {
    'analytics_storage': '{% if "analytics" in cookie_consent_controller.checked_sections %}granted{% else %}denied{% endif %}',
    'ad_storage': '{% if "marketing" in cookie_consent_controller.checked_sections %}granted{% else %}denied{% endif %}',
    'ad_user_data': '{% if "marketing" in cookie_consent_controller.checked_sections %}granted{% else %}denied{% endif %}',
    'ad_personalization': '{% if "marketing" in cookie_consent_controller.checked_sections %}granted{% else %}denied{% endif %}',
    'functionality_storage': 'denied',
    'personalization_storage': 'denied',
    'security_storage': 'granted' // usually okay to grant by default
  });
</script>
<!-- Google Tag Manager (head) -->
<script nonce="{{ request.csp_nonce }}">(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
})(window,document,'script','dataLayer','GTM-XXXXXXX');</script>
Add this iframe immediately after the opening <body> tag:
<!-- Google Tag Manager (body) -->
<noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-XXXXXXX"
height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>
Replace GTM-XXXXXXX with your real GTM container ID in both snippets.
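The consent-default script repeats the same `"marketing" in cookie_consent_controller.checked_sections` test for three keys. One alternative, sketched here under assumptions, is to build that dictionary in Python (for example in your own context processor or view) and hand it to the template. The function and constant names are hypothetical. The input is assumed to be an iterable of checked section slugs, which is how `cookie_consent_controller.checked_sections` is used in the template above.

```python
# Consent Mode keys controlled by each cookie-consent section,
# mirroring the template conditionals in the consent-default script.
SECTION_TO_CONSENT_KEYS = {
    "analytics": ("analytics_storage",),
    "marketing": ("ad_storage", "ad_user_data", "ad_personalization"),
}

# Keys with a fixed default, as in the template.
FIXED_DEFAULTS = {
    "functionality_storage": "denied",
    "personalization_storage": "denied",
    "security_storage": "granted",
}


def consent_defaults(checked_sections) -> dict[str, str]:
    """Build the gtag('consent', 'default', ...) payload from checked section slugs."""
    checked = set(checked_sections)
    defaults = dict(FIXED_DEFAULTS)
    for section, keys in SECTION_TO_CONSENT_KEYS.items():
        state = "granted" if section in checked else "denied"
        for key in keys:
            defaults[key] = state
    return defaults


print(consent_defaults([]))  # first visit: everything except security_storage is denied
print(consent_defaults(["essential", "analytics"]))
```

In the template, Django's `json_script` filter (`{{ consent_defaults|json_script:"consent-defaults" }}`) would emit the data as a non-executable JSON element. The nonce-protected inline script could then read it with `JSON.parse(document.getElementById("consent-defaults").textContent)` before calling `gtag('consent', 'default', ...)`.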
Create JavaScript custom event handlers for granting and denying consent
Create a custom JavaScript file main.js and include it in the base.html template somewhere before the closing </body> tag:
<script src="{% static 'site/js/main.js' %}"></script>
In the JavaScript file, add these custom event handlers that grant or deny cookie consent at Google. The grantGDPRCookieConsent and denyGDPRCookieConsent events they listen for are new in Django GDPR Cookie Consent v4:
document.addEventListener('grantGDPRCookieConsent', (e) => {
  console.log(`${e.detail.section} cookies granted`);
  if (e.detail.section === 'analytics') {
    gtag('consent', 'update', {
      'analytics_storage': 'granted'
    });
    dataLayer.push({'event': 'analytics_consent_granted'});
  } else if (e.detail.section === 'marketing') {
    gtag('consent', 'update', {
      'ad_storage': 'granted',
      'ad_user_data': 'granted',
      'ad_personalization': 'granted'
    });
    dataLayer.push({'event': 'marketing_consent_granted'});
  }
});
document.addEventListener('denyGDPRCookieConsent', (e) => {
  console.log(`${e.detail.section} cookies denied`);
  if (e.detail.section === 'analytics') {
    gtag('consent', 'update', {
      'analytics_storage': 'denied'
    });
    dataLayer.push({'event': 'analytics_consent_denied'});
  } else if (e.detail.section === 'marketing') {
    gtag('consent', 'update', {
      'ad_storage': 'denied',
      'ad_user_data': 'denied',
      'ad_personalization': 'denied'
    });
    dataLayer.push({'event': 'marketing_consent_denied'});
  }
});
Check if your setup is correct
Check the correctness of your configuration with the following:
(venv)$ python manage.py check gdpr_cookie_consent
4. Test and debug with GTM preview mode
- Deploy your project to production.
- Install Tag Assistant Chrome extension.
- Enable Preview in Google Tag Manager dashboard.
- Visit your website via the preview mode.
- Accept cookies.
- You should see these events:
  - analytics_consent_granted - GA4 tag should fire.
  - marketing_consent_granted - Google Ads tag should fire.
- Check which tags fired - they should match the user's choices.
- The Consent tab of each event should show the correct preferred consent choices.
- Google Analytics realtime report should track you only when the consent was given.
How it works
When a user visits your website, the default cookie consent mode is set based on their previously saved preferences. If it's their first visit, consent will default to "denied."
When the user sets or updates their cookie preferences (via the modal dialog or the preferences form), your consent widget fires the custom JavaScript events grantGDPRCookieConsent or denyGDPRCookieConsent, available since Django GDPR Cookie Consent v4.
Your JavaScript handler will listen for these events, update the Google Consent Mode accordingly, and send the updated values to Google Tag Manager.
Based on those values, Google Tag Manager will decide whether to activate tracking tags such as Google Analytics and Google Ads. These tags can then track usage statistics and, if allowed, ad-related cross-site behavior.
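The flow described above can be modelled as a small state machine. It starts from the first-visit state (everything denied), then applies each grant or deny event and records what the handlers push to the data layer. This is a Python simulation for reasoning about or unit-testing the mapping, not code that runs in the browser. The class and helper names are hypothetical, while the event names and consent keys mirror the JavaScript handlers shown earlier.

```python
from dataclasses import dataclass, field

# Consent Mode keys touched for each section, mirroring the JavaScript handlers.
SECTION_KEYS = {
    "analytics": ["analytics_storage"],
    "marketing": ["ad_storage", "ad_user_data", "ad_personalization"],
}

EVENT_VALUES = {
    "grantGDPRCookieConsent": "granted",
    "denyGDPRCookieConsent": "denied",
}


def first_visit_state() -> dict[str, str]:
    """With no saved preferences, every section-controlled key starts as denied."""
    return {key: "denied" for keys in SECTION_KEYS.values() for key in keys}


@dataclass
class ConsentSimulator:
    state: dict[str, str] = field(default_factory=first_visit_state)
    # Only the custom events are recorded; real gtag() calls also go through dataLayer.
    data_layer: list[dict[str, str]] = field(default_factory=list)

    def handle(self, event_type: str, section: str) -> None:
        value = EVENT_VALUES[event_type]  # raises KeyError for unknown event types
        if section not in SECTION_KEYS:
            return  # e.g. "essential": the handlers only react to analytics/marketing
        for key in SECTION_KEYS[section]:
            self.state[key] = value
        self.data_layer.append({"event": f"{section}_consent_{value}"})


sim = ConsentSimulator()
sim.handle("grantGDPRCookieConsent", "analytics")
sim.handle("denyGDPRCookieConsent", "marketing")
print(sim.state)
print(sim.data_layer)
```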
Final words
Now you should be all set. Google Analytics should respect users' privacy based on their choices in the cookie consent widget provided by Django GDPR Cookie Consent. Your website will be compliant with Google Consent Mode and will fire analytics and marketing tags only after consent.
06 Jul 2025 5:00pm GMT
05 Jul 2025
Django community aggregator: Community blog posts
Weeknotes (2025 week 27)
I have again missed a few weeks, so the releases section will be longer than usual since it covers six weeks.
django-prose-editor
I have totally restructured the documentation to make it clearer. The configuration chapter is shorter and more focussed, and the custom extensions chapter actually shows all required parts now.
The most visible change is probably the refactored menu system. Extensions now have an addMenuItems method where they can add their own buttons to the menu bar. I wanted to do this for a long time but have only just this week found a way to achieve this which I actually like.
I've reported a bug to Tiptap where a .can() chain always succeeded even though the actual operation could fail (#6306).
Finally, I have also switched from esbuild to rslib; I'm a heavy user of rspack anyway and am more at home with its configuration.
django-content-editor
The 7.4 release mostly contains minor changes; one new feature is the content_editor.admin.RefinedModelAdmin class. It includes tweaks to Django's standard behavior such as supporting a Ctrl-S shortcut for the "Save and continue editing" functionality and an additional warning for when people want to delete inlines but end up deleting the whole object instead. This seems to happen often even though people are shown the full list of objects which will be deleted.
Releases
- django-prose-editor 0.15: See above.
- django-content-editor 7.4.1: See above.
- django-json-schema-editor 0.5.1: Now supports customizing the prose editor configuration (when using format: "prose") and also includes validation support for foreign key references in the JSON data.
- html-sanitizer 2.6: The sanitizer started crashing when used with lxml>=6 when being fed strings with control characters inside.
- django-recent-objects 0.1.1: Changed the code to use UNION ALL instead of UNION when determining which objects to fetch from all tables.
- feincms3 5.4.1: Added experimental support for rendering sections. Sections can be nested, so they are more powerful than subregions. Also added warnings when registering plugin proxies for rendering and fetching, since that will most likely lead to duplicated objects in the rendered output.
- django-tree-queries 0.20: Added tree_info and recursetree template tags. Optimized performance by avoiding the rank table when easily possible. Added stronger recommendations to pre-filter the table using .tree_filter() or .tree_exclude() when working with small subsets of large datasets.
- django-ckeditor 6.7.3: Added a trove identifier for recent Django versions. It still works fine, but it's deprecated and shouldn't be used since it still uses the unmaintained CKEditor 4 line (since we do not ship the commercial LTS version).
- feincms3-cookiecontrol 1.6.1: Golfed the generated CSS and JavaScript bundle down to below 4000 bytes again, including the YouTube/Vimeo/etc. wrapper which only loads external content when users consent.
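The django-recent-objects change relies on a standard SQL distinction. UNION removes duplicate rows from the combined result, so the database has to compare rows. UNION ALL simply concatenates the results, which makes it the cheaper choice whenever duplicates cannot occur or do not matter, for example when each row carries a label for the table it came from. The sqlite3 illustration below uses made-up `article` and `event` tables and is not the package's actual query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE article (id INTEGER PRIMARY KEY, created TEXT);
    CREATE TABLE event (id INTEGER PRIMARY KEY, created TEXT);
    INSERT INTO article (id, created) VALUES (1, '2025-07-01'), (2, '2025-07-03');
    INSERT INTO event (id, created) VALUES (1, '2025-07-01'), (2, '2025-07-02');
    """
)

# Without a distinguishing column, UNION silently merges article 1 and event 1.
plain_union = conn.execute(
    "SELECT id, created FROM article UNION SELECT id, created FROM event"
).fetchall()

# With a table label every row is already distinct, so UNION ALL returns the
# same rows UNION would, without the duplicate-elimination step.
labelled = conn.execute(
    """
    SELECT 'article' AS kind, id, created FROM article
    UNION ALL
    SELECT 'event' AS kind, id, created FROM event
    ORDER BY created DESC, kind
    LIMIT 3
    """
).fetchall()

print(len(plain_union))  # 3: one "duplicate" row was removed
print(labelled)
```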
05 Jul 2025 5:00pm GMT
04 Jul 2025
Django community aggregator: Community blog posts
Django News - Django 2024 Annual Impact Report and Django 5.2.4 - Jul 4th 2025
News
Django 5.2.4 bugfix release
Django 5.2.4 fixes regressions in media type preference, JSON null serialization, and composite primary key lookups to improve framework robustness.
Django Joins curl in Pushing Back on AI Slop Security Report...
Django updates its security guidelines to mandate verified AI-assisted vulnerability reports, reducing fabricated submissions and ensuring human oversight in vulnerability triage.
W2D Special Event Station announcement
Calling amateur radio operators (and anyone interested in radio) who also use Django: the special event callsign W2D has been reserved to celebrate Django's 20th birthday.
Django Software Foundation
Django's 2024 Annual Impact Report
Django Software Foundation's annual impact report details community milestones, funding initiatives, and strategic support to drive continued growth and innovation in Django development.
Django's Ecosystem
The Django project now has an ecosystem page featuring third-party apps and add-ons.
Updates to Django
Today 'Updates to Django' is presented by Pradhvan from the Djangonaut Space!🚀
Last week we had 8 pull requests merged into Django by 7 different contributors.
This week's Django highlights 🦄
Content Security Policy lands in Django core: built-in CSP middleware and nonce support finally arrives, closing the ticket #15727. Shoutout to Rob Hudson for finally bringing CSP to Django core.
Enhanced MariaDB GIS support: Added __coveredby lookups plus Collect, GeoHash, and IsValid functions for MariaDB 12.0.1+.
Admin messaging gets visual polish: INFO and DEBUG messages now have proper styles and icons in the admin interface, closing ticket #36386.
Django Newsletter
Sponsored Link 1
Scout Monitoring: Logs, Traces, Error (coming soon). Made for devs who own products, not just tickets.
Articles
Drag and Drop and Django
Integrates custom HTML drag and drop components with Django template data and CSRF tokens to create interactive scheduling interfaces using fetch and custom elements.
Hosting your Django sites with Coolify
This post details how a complex, self-managed Django deployment stack was replaced with Coolify, an open-source self-hosted PaaS that offers zero downtime deployments, built-in backups, and a Git-based workflow, all while running on personal hardware.
From Rock Bottom to Production Code
Matthew Raynor's transformation from personal adversity to constructing production-level Django applications using full-stack development, custom authentication, and integrated AI features.
Pycliché & Djereo
Starting a Python or Django project? Steal Alberto Morón Hernández's templates: pycliché & djereo, opinionated project templates for Python and Django, respectively.
Django Fellow Report
Django Fellow Report - Sarah Boyce
5 tickets triaged, 22 reviewed, 1 authored, and set up mssql-django to test a ticket.
Django Fellow Report - Natalia Bidart
3 tickets triaged, 7 reviewed, 1 authored, and other misc.
Events
Visa to DjangoCon Africa
From the DSF President, tips on how to get a visa application done for DjangoCon Africa in Tanzania 🇹🇿.
DjangoCon Videos
Passkeys in Django: the best of all possible worlds - Tom Carrick
Secure, accessible, usable - pick any three.
Why compromise when you can have it all? This talk shows how easy it is to integrate support for passkeys (Face ID, fingerprint scans, etc.) into your Django app in almost no time at all.
100 Million Parking Transactions Per Year with Django - Wouter Steenstra
For several Dutch municipalities, Django applications power the monitoring of both on-street and off-street parking transactions. What started as a straightforward tool for extracting data from parking facilities has evolved into a robust ETL platform with a feature-rich dashboard. This talk delves into how Django remains the backbone of our operations and why it continues to be the foundation of our business success.
How we make decisions in Django - Carlton Gibson
Django is an inclusive community. We seek to invite contributions, and aim for consensus in our decision making. As the project has grown - and as with all large open source projects - that's led to difficulties, as even simple proposals get drawn out into long discussions that suck time, energy, and enthusiasm from all. It's time we refreshed our approach. We're going to look at how we got here, what we need to maintain, and how we can move forwards towards a better process.
Podcasts
Episode 9: with Tamara Atanasoska
Learn about Tamara's journey. Tamara has been contributing to open source projects since 2012. She participated in Google Summer of Code to contribute to projects like Gnome and e-cidadania.
Django News Jobs
Senior Backend Python Developer at Gravitas Recruitment 🆕
Senior/Staff Software Engineer at Clerq
Full Stack Software Engineer at Switchboard
Django Fellow at Django Software Foundation
Senior Software Engineer at Simons Foundation
Projects
wsvincent/official-django-polls-tutorial
Source code for the official Django Polls tutorial.
justinmayer/typogrify
A set of Django template filters to make caring about typography on the web a bit easier.
This RSS feed is published on https://django-news.com/. You can also subscribe via email.
04 Jul 2025 3:00pm GMT
Planet Python
Real Python: The Real Python Podcast – Episode #256: Solving Problems and Saving Time in Chemistry With Python
What motivates someone to learn how to code as a scientist? How do you harness the excitement of solving problems quickly and make the connection to the benefits of coding in your scientific work? This week on the show, we speak with Ben Lear and Christopher Johnson about their book "Coding For Chemists."
04 Jul 2025 12:00pm GMT
PyPy: PyPy v7.3.20 release
PyPy v7.3.20: release of python 2.7, 3.11
The PyPy team is proud to release version 7.3.20 of PyPy after the previous release on Feb 26, 2025. The release fixes some subtle bugs in ctypes and OrderedDict and makes PyPy3.11 compatible with an upcoming release of Cython.
The release includes two different interpreters:
- PyPy2.7, which is an interpreter supporting the syntax and the features of Python 2.7 including the stdlib for CPython 2.7.18+ (the + is for backported security updates)
- PyPy3.11, which is an interpreter supporting the syntax and the features of Python 3.11, including the stdlib for CPython 3.11.13.
The interpreters are based on much the same codebase, thus the double release. This is a micro release; all APIs are compatible with the other 7.3 releases.
We recommend updating. You can find links to download the releases here:
We would like to thank our donors for the continued support of the PyPy project. If PyPy is not quite good enough for your needs, we are available for direct consulting work. If PyPy is helping you out, we would love to hear about it and encourage submissions to our blog via a pull request to https://github.com/pypy/pypy.org
We would also like to thank our contributors and encourage new people to join the project. PyPy has many layers and we need help with all of them: bug fixes, PyPy and RPython documentation improvements, or general help with making RPython's JIT even better.
If you are a python library maintainer and use C-extensions, please consider making a HPy / CFFI / cppyy version of your library that would be performant on PyPy. In any case, cibuildwheel supports building wheels for PyPy.
What is PyPy?
PyPy is a Python interpreter, a drop-in replacement for CPython. It's fast (PyPy and CPython performance comparison) due to its integrated tracing JIT compiler.
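Because PyPy is a drop-in replacement, code usually doesn't need to know which interpreter it runs on. When it does, for example to skip a test or report the environment, the standard library can tell: `platform.python_implementation()` returns a string such as 'CPython' or 'PyPy', and PyPy additionally exposes `sys.pypy_version_info`. A small sketch (the function name is made up):

```python
import platform
import sys


def describe_interpreter() -> str:
    """Return e.g. 'CPython 3.11.9', or 'PyPy 7.3.20 (Python 3.11.13)' on PyPy."""
    impl = platform.python_implementation()
    python_version = platform.python_version()
    if impl == "PyPy":
        # sys.pypy_version_info only exists on PyPy.
        v = sys.pypy_version_info
        return f"PyPy {v.major}.{v.minor}.{v.micro} (Python {python_version})"
    return f"{impl} {python_version}"


print(describe_interpreter())
```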
We also welcome developers of other dynamic languages to see what RPython can do for them.
We provide binary builds for:
- x86 machines on most common operating systems (Linux 32/64 bits, Mac OS 64 bits, Windows 64 bits)
- 64-bit ARM machines running Linux (aarch64) and macOS (macos_arm64).
PyPy supports Windows 32-bit, Linux PPC64 big- and little-endian, Linux ARM 32 bit, RISC-V RV64IMAFD Linux, and s390x Linux but does not release binaries. Please reach out to us if you wish to sponsor binary releases for those platforms. Downstream packagers provide binary builds for Debian, Fedora, conda, OpenBSD, FreeBSD, Gentoo, and more.
What else is new?
For more information about the 7.3.20 release, see the full changelog.
Please update, and continue to help us make pypy better.
Cheers, The PyPy Team
04 Jul 2025 12:00pm GMT
05 Jun 2025
Planet Twisted
Glyph Lefkowitz: I Think I’m Done Thinking About genAI For Now
The Problem
Like many other self-styled thinky programmer guys, I like to imagine myself as a sort of Holmesian genius, making trenchant observations, collecting them, and then synergizing them into brilliant deductions with the keen application of my powerful mind.
However, several years ago, I had an epiphany in my self-concept. I finally understood that, to the extent that I am usefully clever, it is less in a Holmesian idiom, and more, shall we say, Monkesque.
For those unfamiliar with either of the respective franchises:
- Holmes is a towering intellect honed by years of training, who catalogues intentional, systematic observations and deduces logical, factual conclusions from those observations.
- Monk, on the other hand, while also a reasonably intelligent guy, is highly neurotic, wracked by unresolved trauma and profound grief. As both a consulting job and a coping mechanism, he makes a habit of erratically wandering into crime scenes, and, driven by a carefully managed jenga tower of mental illnesses, leverages his dual inabilities to solve crimes. First, he is unable to filter out apparently inconsequential details, building up a mental rat's nest of trivia about the problem; second, he is unable to let go of any minor incongruity, obsessively ruminating on the collection of facts until they all make sense in a consistent timeline.
Perhaps surprisingly, this tendency serves both this fictional wretch of a detective, and myself, reasonably well. I find annoying incongruities in abstractions and I fidget and fiddle with them until I end up building something that a lot of people like, or perhaps something that a smaller number of people get really excited about. At worst, at least I eventually understand what's going on. This is a self-soothing activity but it turns out that, managed properly, it can very effectively soothe others as well.
All that brings us to today's topic, which is an incongruity I cannot smooth out or fit into a logical framework to make sense. I am, somewhat reluctantly, a genAI skeptic. However, I am, even more reluctantly, exposed to genAI Discourse every damn minute of every damn day. It is relentless, inescapable, and exhausting.
This preamble about personality should hopefully help you, dear reader, to understand how I usually address problematical ideas by thinking and thinking and fidgeting with them until I manage to write some words - or perhaps a new open source package - that logically orders the ideas around it in a way which allows my brain to calm down and let it go, and how that process is important to me.
In this particular instance, however, genAI has defeated me. I cannot make it make sense, but I need to stop thinking about it anyway. It is too much and I need to give up.
My goal with this post is not to convince anyone of anything in particular - and we'll get to why that is a bit later - but rather:
- to set out my current understanding in one place, including all the various negative feelings which are still bothering me, so I can stop repeating it elsewhere,
- to explain why I cannot build a case that I think should be particularly convincing to anyone else, particularly to someone who actively disagrees with me,
- in so doing, to illustrate why I think the discourse is so fractious and unresolvable, and finally
- to give myself, and hopefully by proxy to give others in the same situation, permission to just peace out of this nightmare quagmire corner of the noosphere.
But first, just because I can't prove that my interlocutors are Wrong On The Internet, doesn't mean I won't explain why I feel like they are wrong.
The Anti-Antis
Most recently, at time of writing, there has been a spate of "the genAI discourse is bad" articles, almost exclusively written from the perspective of, not boosters exactly, but pragmatically minded (albeit concerned) genAI users, wishing for the skeptics to be more pointed and accurate in our critiques. This is anti-anti-genAI content.
I am not going to link to any of these, because, as part of their self-fulfilling prophecy about the "genAI discourse", they're also all bad.
Mostly, however, they had very little worthwhile to respond to because they were straw-manning their erstwhile interlocutors. They are all getting annoyed at "bad genAI criticism" while failing to engage with - and often failing to even mention - most of the actual substance of any serious genAI criticism. At least, any of the criticism that I've personally read.
I understand wanting to avoid a callout or Gish-gallop culture and just express your own ideas. So, I understand that they didn't link directly to particular sources or go point-by-point on anyone else's writing. Obviously I get it, since that's exactly what this post is doing too.
But if you're going to talk about how bad the genAI conversation is, without even mentioning huge categories of problem like "climate impact" or "disinformation"[1] even once, I honestly don't know what conversation you're even talking about. This is peak "make up a guy to get mad at" behavior, which is especially confusing in this circumstance, because there's an absolutely huge crowd of actual people that you could already be mad at.
The people writing these pieces have historically seemed very thoughtful to me. Some of them I know personally. It is worrying to me that their critical thinking skills appear to have substantially degraded specifically after spending a bunch of time intensely using this technology which I believe has a scary risk of degrading one's critical thinking skills. Correlation is not causation or whatever, and sure, from a rhetorical perspective this is "post hoc ergo propter hoc" and maybe a little "ad hominem" for good measure, but correlation can still be concerning.
Yet, I cannot effectively respond to these folks, because they are making a practical argument that I cannot, despite my best efforts, find compelling evidence to refute categorically. My experiences of genAI are all extremely bad, but that is barely even anecdata. Their experiences are neutral-to-positive. Little scientific data exists. How to resolve this?[2]
The Aesthetics
As I begin to state my own position, let me lead with this: my factual analysis of genAI is hopelessly negatively biased. I find the vast majority of the aesthetic properties of genAI to be intensely unpleasant.
I have been trying very hard to correct for this bias, to try to pay attention to the facts and to have a clear-eyed view of these systems' capabilities. But the feelings are visceral, and the effort to compensate is tiring. It is, in fact, the desire to stop making this particular kind of effort that has me writing up this piece and trying to take an intentional break from the subject, despite its intense relevance.
When I say its "aesthetic qualities" are unpleasant, I don't just mean the aesthetic elements of output of genAIs themselves. The aesthetic quality of genAI writing, visual design, animation and so on, while mostly atrocious, is also highly variable. There are cherry-picked examples which look… fine. Maybe even good. For years now, there have been, famously, literally award-winning aesthetic outputs of genAI[3].
While I am ideologically predisposed to see any "good" genAI art as accruing the benefits of either a survivorship bias from thousands of terrible outputs or simple plagiarism rather than its own inherent quality, I cannot deny that in many cases it is "good".
However, I am not just talking about the product, but the process; the aesthetic experience of interfacing with the genAI system itself, rather than the aesthetic experience of the outputs of that system.
I am not a visual artist and I am not really a writer[4], particularly not a writer of fiction or anything else whose experience is primarily aesthetic. So I will speak directly to the experience of software development.
I have seen very few successful examples of using genAI to produce whole, working systems. There is no shortage of highly public miserable failures, particularly from the vendors of these systems themselves, where the outputs are confused, self-contradictory, full of subtle errors and generally unusable. While few studies exist, it sure looks like this is an automated way of producing a Net Negative Productivity Programmer, throwing out chaff to slow down the rest of the team.[5]
Juxtapose this with my aforementioned psychological motivations - to wit, I want everything in the computer to be orderly and make sense - and I'm sure most of you would have no trouble imagining that sitting through this sort of practice would make me extremely unhappy.
Despite this plethora of negative experiences, executives are aggressively mandating the use of AI[6]. It looks like without such mandates, most people will not bother to use such tools, so the executives will need muscular policies to enforce their use.[7]
Being forced to sit and argue with a robot while it struggles and fails to produce a working output, while you have to rewrite the code at the end anyway, is incredibly demoralizing. This is the kind of activity that activates every single major cause of burnout at once.
But, at least in that scenario, the thing ultimately doesn't work, so there's a hope that after a very stressful six month pilot program, you can go to management with a pile of meticulously collected evidence, and shut the whole thing down.
I am inclined to believe that, in fact, it doesn't work well enough to be used this way, and that we are going to see a big crash. But that is not the most aesthetically distressing thing. The most distressing thing is that maybe it does work; if not well enough to actually do the work, at least ambiguously enough to fool the executives long-term.
This project, in particular, stood out to me as an example. Its author, a self-professed "AI skeptic" who "thought LLMs were glorified Markov chain generators that didn't actually understand code and couldn't produce anything novel", did a green-field project to test this hypothesis.
Now, this particular project is not totally inconsistent with a world in which LLMs cannot produce anything novel. One could imagine that, out in the world of open source, perhaps there is enough "OAuth provider written in TypeScript" blended up into the slurry of "borrowed[8]" training data that the minor constraint of "make it work on Cloudflare Workers" is a small tweak[9]. It is not fully dispositive of the question of the viability of "genAI coding".
But it is a data point related to that question, and thus it did make me contend with what might happen if it were actually a fully demonstrative example. I reviewed the commit history, as the author suggested. For the sake of argument, I tried to ask myself if I would like working this way. Just for clarity on this question, I wanted to suspend judgement about everything else; assuming:
- the model could be created with ethically, legally, voluntarily sourced training data
- its usage involved consent from labor rather than authoritarian mandates
- sensible levels of energy expenditure, with minimal CO2 impact
- it is substantially more efficient to work this way than to just write the code yourself
and so on, and so on… would I like to use this magic robot that could mostly just emit working code for me? Would I use it if it were free, in all senses of the word?
No. I absolutely would not.
I found the experience of reading this commit history and imagining myself using such a tool - without exaggeration - nauseating.
Unlike many programmers, I love code review. I find that it is one of the best parts of the process of programming. I can help people learn, and develop their skills, and learn from them, and appreciate the decisions they made, develop an impression of a fellow programmer's style. It's a great way to build a mutual theory of mind.
Of course, it can still be really annoying; people make mistakes, often can't see things I find obvious, and in particular when you're reviewing a lot of code from a lot of different people, you often end up having to repeat explanations of the same mistakes. So I can see why many programmers, particularly those more introverted than I am, hate it.
But, ultimately, when I review their code and work hard to provide clear and actionable feedback, people learn and grow and it's worth that investment in inconvenience.
The process of coding with an "agentic" LLM appears to be the process of carefully distilling all the worst parts of code review, and removing and discarding all of its benefits.
The lazy, dumb, lying robot asshole keeps making the same mistakes over and over again, never improving, never genuinely reacting, always obsequiously pretending to take your feedback on board.
Even when it "does" actually "understand" and manages to load your instructions into its context window, 200K tokens later it will slide cleanly out of its memory and you will have to say it again.
All the while, it is attempting to trick you. It gets most things right, but it consistently makes mistakes in the places that you are least likely to notice. In places where a person wouldn't make a mistake. Your brain keeps trying to develop a theory of mind to predict its behavior but there's no mind there, so it always behaves infuriatingly randomly.
I don't think I am the only one who feels this way.
The Affordances
Whatever our environments afford, we tend to do more of. Whatever they resist, we tend to do less of. So in a world where we were all writing all of our code and emails and blog posts and texts to each other with LLMs, what do they afford that existing tools do not?
As a weirdo who enjoys code review, I also enjoy process engineering. The central question of almost all process engineering is to continuously ask: how shall we shape our tools, to better shape ourselves?
LLMs are an affordance for producing more text, faster. How is that going to shape us?
Again arguing in the alternative here, assuming the text is free from errors and hallucinations and whatever, it's all correct and fit for purpose, that means it reduces the pain of circumstances where you have to repeat yourself. Less pain! Sounds great; I don't like pain.
Every codebase has places where you need boilerplate. Every organization has defects in its information architecture that require repetition of certain information rather than a link back to the authoritative source of truth. Often, these problems persist for a very long time, because it is difficult to overcome the institutional inertia required to make real progress rather than going along with the status quo. But this is often where the highest-value projects can be found. Where there's muck, there's brass.
The process-engineering function of an LLM, therefore, is to prevent fundamental problems from ever getting fixed, to reward the rapid-fire overwhelm of infrastructure teams with an immediate, catastrophic cascade of legacy code that is now much harder to delete than it is to write.
There is a scene in Game of Thrones where Khal Drogo kills himself. He does so by replacing a stinging, burning, therapeutic antiseptic wound dressing with some cool, soothing mud. The mud felt nice, addressed the immediate pain, removed the discomfort of the antiseptic, and immediately gave him a lethal infection.
The pleasing feeling of immediate progress when one prompts an LLM to solve some problem feels like cool mud on my brain.
The Economics
We are in the middle of a mania around this technology. As I have written about before, I believe the mania will end. There will then be a crash, and a "winter". But, as I may not have stressed sufficiently, this crash will be the biggest of its kind - so big, that it is arguably not of a kind at all. The level of investment in these technologies is bananas and the possibility that the investors will recoup their investment seems close to zero. Meanwhile, that cost keeps going up, and up, and up.
Others have reported on this in detail[10], and I will not reiterate it all here, but in addition to being a looming and scary industry-wide (if we are lucky; more likely it's probably "world-wide") economic threat, it is also going to drive some panicked behavior from management.
Panicky behavior from managers who are stressed that their idea is not panning out is, famously, the cause of much human misery. I expect that even the "good" scenario, where some profit is ultimately achieved, will still involve mass layoffs rocking the industry, panicked re-hiring, and destruction of large amounts of wealth.
It feels bad to think about this.
The Energy Usage
For a long time I believed that the energy impact was overstated. I am even on record, about a year ago, saying I didn't think the energy usage was a big deal. I think I was wrong about that.
It initially seemed like it was letting regular old data centers off the hook. But recently I have learned that, while the numbers are incomplete because the vendors aren't sharing information, they're also extremely bad.[11]
I think there's probably a version of this technology that isn't a climate emergency nightmare, but that's not the version that the general public has access to today.
The Educational Impact
LLMs are making academic cheating incredibly rampant.[12]
Not only is it so common as to be nearly universal, it's also extremely harmful to learning.[13]
For learning, genAI is a forklift at the gym.
To some extent, LLMs are simply revealing a structural rot within education and academia that has been building for decades if not centuries. But it was within those inefficiencies and the inconveniences of the academic experience that real learning was, against all odds, still happening in schools.
LLMs produce a frictionless, streamlined process where students can effortlessly glide through the entire credential, learning nothing. Once again, they dull the pain without regard to its cause.
This is not good.
The Invasion of Privacy
This is obviously only a problem with the big cloud models, but then, the big cloud models are the only ones that people actually use. If you are having conversations about anything private with ChatGPT, you are sending all of that private information directly to Sam Altman, to do with as he wishes.
Even if you don't think he is a particularly bad guy, maybe he won't even create the privacy nightmare on purpose. Maybe he will be forced to do so as a result of some bizarre Kafkaesque accident.[14]
Imagine the scenario, for example, where a woman is tracking her cycle and uploading the logs to ChatGPT so she can chat with it about a health concern. Except, surprise, you don't have to imagine, you can just search for it, as I have personally, organically, seen three separate women on YouTube, at least one of whom lives in Texas, not only do this on camera but recommend doing this to their audiences.
Citation links withheld on this particular claim for hopefully obvious reasons.
I assure you that I am neither particularly interested in menstrual products nor genAI content, and if I am seeing this more than once, it is probably a distressingly large trend.
The Stealing
The training data for LLMs is stolen. I don't mean like "pirated" in the sense where someone illicitly shares a copy they obtained legitimately; I mean their scrapers are ignoring both norms[15] and laws[16] to obtain copies under false pretenses, destroying other people's infrastructure[17].
The Fatigue
I have provided references to numerous articles outlining rhetorical and sometimes data-driven cases for the existence of certain properties and consequences of genAI tools. But I can't prove any of these properties, either at a point in time or as a durable ongoing problem.
The LLMs themselves are simply too large to model with the usual kind of heuristics one would use to think about software. I'd sooner be able to predict the physics of dice in a casino than a 2 trillion parameter neural network. They resist scientific understanding, not just because of their size and complexity, but because unlike a natural phenomenon (which could of course be considerably larger and more complex) they resist experimentation.
The first form of genAI resistance to experiment is that every discussion is a motte-and-bailey. If I use a free model and get a bad result I'm told it's because I should have used the paid model. If I get a bad result with ChatGPT I should have used Claude. If I get a bad result with a chatbot I need to start using an agentic tool. If an agentic tool deletes my hard drive by putting `os.system("rm -rf ~/")` into `sitecustomize.py`, then I guess I should have built my own MCP integration with a completely novel heretofore never even considered security sandbox or something?
What configuration, exactly, would let me make a categorical claim about these things? What specific methodological approach should I stick to, to get reliably adequate prompts?
For the record though, if the idea of the free models is that they are going to be provocative demonstrations of the impressive capabilities of the commercial models, and the results are consistently dogshit, I am finding it increasingly hard to care how much better the paid ones are supposed to be, especially since the "better"-ness cannot really be quantified in any meaningful way.
The motte-and-bailey doesn't stop there though. It's a war on all fronts. Concerned about energy usage? That's OK, you can use a local model. Concerned about infringement? That's okay, somewhere, somebody, maybe, has figured out how to train models consensually[18]. Worried about the politics of enriching the richest monsters in the world? Don't worry, you can always download an "open source" model from Hugging Face. It doesn't matter that many of these properties are mutually exclusive and attempting to fix one breaks two others; there's always an answer, the field is so abuzz with so many people trying to pull in so many directions at once that it is legitimately difficult to understand what's going on.
Even here though, I can see that characterizing everything this way is unfair to a hypothetical sort of person. If there is someone working at one of these thousands of AI companies that have been springing up like toadstools after a rain, and they really are solving one of these extremely difficult problems, how can I handwave that away? We need people working on problems, that's like, the whole point of having an economy. And I really don't like shitting on other people's earnest efforts, so I try not to dismiss whole fields. Given how AI has gotten into everything, in a way that e.g. cryptocurrency never did, painting with that broad a brush inevitably ends up tarring a bunch of stuff that isn't even really AI at all.
The second form of genAI resistance to experiment is the inherent obfuscation of productization. The models themselves are already complicated enough, but the products that are built around the models are evolving extremely rapidly. ChatGPT is not just a "model", and with the rapid[19] deployment of Model Context Protocol tools, the edges of all these things will blur even further. Every LLM is now just an enormous unbounded soup of arbitrary software doing arbitrary whatever. How could I possibly get my arms around that to understand it?
The Challenge
I have woefully little experience with these tools.
I've tried them out a little bit, and almost every single time the result has been a disaster that has not made me curious to push further. Yet, I keep hearing from all over the industry that I should.
To some extent, I feel like the motte-and-bailey characterization above is fair; if the technology itself can really do real software development, it ought to be able to do it in multiple modalities, and there's nothing anyone can articulate to me about GPT-4o which puts it in a fundamentally different class than GPT-3.5.
But, also, I consistently hear that the subjective experience of using the premium versions of the tools is actually good, and the free ones are actually bad.
I keep struggling to find ways to try them "the right way", the way that people I know and otherwise respect claim to be using them, but I haven't managed to do so in any meaningful way yet.
I do not want to be using the cloud versions of these models with their potentially hideous energy demands; I'd like to use a local model. But there is obviously not a nicely composed way to use local models like this.
Since there are apparently zero models with ethically-sourced training data, and litigation is ongoing[20] to determine the legal relationships of training data and outputs, even if I can be comfortable with some level of plagiarism on a project, I don't feel that I can introduce the existential legal risk into other people's infrastructure, so I would need to make a new project.
Others have differing opinions of course, including some within my dependency chain, which does worry me, but I still don't feel like I can freely contribute further to the problem; it's going to be bad enough to unwind any impact upstream. Even just for my own sake, I don't want to make it worse.
This especially presents a problem because I have way too much stuff going on already. A new project is not practical.
Finally, even if I did manage to satisfy all of my quirky[21] constraints, would this experiment really be worth anything? The models and tools that people are raving about are the big, expensive, harmful ones. If I proved to myself yet again that a small model with bad tools was unpleasant to use, I wouldn't really be addressing my opponents' views.
I'm stuck.
The Surrender
I am writing this piece to make my peace with giving up on this topic, at least for a while. While I do idly hope that some folks might find bits of it convincing, and perhaps find ways to be more mindful with their own usage of genAI tools, and consider the harm they may be causing, that's not actually the goal. And that is not the goal because it is just so much goddamn work to prove.
Here, I must return to my philosophical hobbyhorse of sprachspiel. In this case, specifically to use it as an analytical tool, not just to understand what I am trying to say, but what the purpose for my speech is.
The concept of sprachspiel is most frequently deployed to describe the goal of the language game being played, but in game theory, that's only half the story. Speech - particularly rigorously justified speech - has a cost, as well as a benefit. I can make shit up pretty easily, but if I want to do anything remotely like scientific or academic rigor, that cost can be astronomical. In the case of developing an abstract understanding of LLMs, the cost is just too high.
So what is my goal, then? To be King Canute, standing astride the shore of "tech", whatever that is, commanding the LLM tide not to rise? This is a multi-trillion dollar juggernaut.
Even the rump, loser, also-ran fragment of it has the power to literally suffocate us in our homes[22] if they so choose, completely insulated from any consequence. If the power curve starts there, imagine what the winners in this industry are going to be capable of, irrespective of the technology they're building - just with the resources they have to hand. Am I going to write a blog post that can rival their propaganda apparatus? Doubtful.
Instead, I will just have to concede that maybe I'm wrong. I don't have the skill, or the knowledge, or the energy, to demonstrate with any level of rigor that LLMs are generally, in fact, hot garbage. Intellectually, I will have to acknowledge that maybe the boosters are right. Maybe it'll be OK.
Maybe the carbon emissions aren't so bad. Maybe everybody is keeping them secret in ways that they don't for other types of datacenter for perfectly legitimate reasons. Maybe the tools really can write novel and correct code, and with a little more tweaking, it won't be so difficult to get them to do it. Maybe by the time they become a mandatory condition of access to developer tools, they won't be miserable.
Sure, I even sincerely agree, intellectual property really has been a pretty bad idea from the beginning. Maybe it's OK that we've made an exception to those rules. The rules were stupid anyway, so what does it matter if we let a few billionaires break them? Really, everybody should be able to break them (although of course, regular people can't, because we can't afford the lawyers to fight off the MPAA and RIAA, but that's a problem with the legal system, not tech).
I come not to praise "AI skepticism", but to bury it.
Maybe it really is all going to be fine. Perhaps I am simply catastrophizing; I have been known to do that from time to time. I can even sort of believe it, in my head. Still, even after writing all this out, I can't quite manage to believe it in the pit of my stomach.
Unfortunately, that feeling is not something that you, or I, can argue with.
Acknowledgments
Thank you to my patrons. Normally, I would say, "who are supporting my writing on this blog", but in the case of this piece, I feel more like I should apologize to them for this than to thank them; these thoughts have been preventing me from thinking more productive, useful things that I actually have relevant skill and expertise in; this felt more like a creative blockage that I just needed to expel than a deliberately written article. If you like what you've read here and you'd like to read more of it, well, too bad; I am sincerely determined to stop writing about this topic. But, if you'd like to read more stuff like other things I have written, or you'd like to support my various open-source endeavors, you can support my work as a sponsor!
1. And yes, disinformation is still an issue even if you're "just" using it for coding. Even sidestepping the practical matter that technology is inherently political, validation and propagation of poor technique is a form of disinformation.
2. I can't resolve it, that's the whole tragedy here, but I guess we have to pretend I will to maintain narrative momentum here.
3. The story in Creative Bloq, or the NYT, if you must
4. although it's not for lack of trying, Jesus, look at the word count on this
5. These are sometimes referred to as "10x" programmers, because they make everyone around them 10x slower.
6. Douglas B. Laney at Forbes, "Viral Shopify CEO Manifesto Says AI Now Mandatory For All Employees"
7. The National CIO Review, "AI Mandates, Minimal Use: Closing the Workplace Readiness Gap"
8. Matt O'Brien at the AP, "Reddit sues AI company Anthropic for allegedly 'scraping' user comments to train chatbot Claude"
9. Using the usual tricks to find plagiarism like searching for literal transcriptions of snippets of training data did not pull up anything when I tried, but then, that's not how LLMs work these days, is it? If it didn't obfuscate the plagiarism it wouldn't be a very good plagiarism-obfuscator.
10. David Gerard at Pivot to AI, "Microsoft and AI: spending billions to make millions", Edward Zitron at Where's Your Ed At, "The Era Of The Business Idiot", both sobering reads
11. James O'Donnell and Casey Crownhart at the MIT Technology Review, "We did the math on AI's energy footprint. Here's the story you haven't heard."
12. Lucas Ropek at Gizmodo, "AI Cheating Is So Out of Hand In America's Schools That the Blue Books Are Coming Back"
13. James D. Walsh at the New York Magazine Intelligencer, "Everyone Is Cheating Their Way Through College"
14. Ashley Belanger at Ars Technica, "OpenAI slams court order to save all ChatGPT logs, including deleted chats"
15. Ashley Belanger at Ars Technica, "AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt"
16. Blake Brittain at Reuters, "Judge in Meta case warns AI could 'obliterate' market for original works"
17. Xkeeper, "TCRF has been getting DDoSed"
18. Kate Knibbs at Wired, "Here's Proof You Can Train an AI Model Without Slurping Copyrighted Content"
19. and, I should note, extremely irresponsible
20. Porter Anderson at Publishing Perspectives, "Meta AI Lawsuit: US Publishers File Amicus Brief"
21. It feels bizarre to characterize what feel like baseline ethical concerns this way, but the fact remains that within the "genAI community", this places me into a tiny and obscure minority.
22. Ariel Wittenberg for Politico, "'How come I can't breathe?': Musk's data company draws a backlash in Memphis"
05 Jun 2025 5:22am GMT
17 Apr 2025
Planet Twisted
Glyph Lefkowitz: Stop Writing `__init__` Methods
The History
Before dataclasses were added to Python in version 3.7 - in June of 2018 - the `__init__` special method had an important use. If you had a class representing a data structure - for example a `2DCoordinate`, with `x` and `y` attributes - you would want to be able to construct it as `2DCoordinate(x=1, y=2)`, which would require you to add an `__init__` method with `x` and `y` parameters.
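For illustration, here is a minimal sketch of that pre-3.7 pattern next to its `dataclasses` replacement. It uses the hypothetical names `Coordinate2DOld` and `Coordinate2D`, because a real Python identifier cannot begin with a digit the way `2DCoordinate` does.

```python
from dataclasses import dataclass


class Coordinate2DOld:
    """Pre-3.7 style: a hand-written __init__ that only assigns attributes."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y


@dataclass
class Coordinate2D:
    """3.7+ style: @dataclass generates __init__, __repr__ and __eq__."""

    x: int
    y: int


old = Coordinate2DOld(x=1, y=2)
new = Coordinate2D(x=1, y=2)
print(old.x, old.y)  # 1 2
print(new)           # Coordinate2D(x=1, y=2)
```

Both classes are constructed the same way. The dataclass additionally generates a readable `__repr__` and a field-by-field `__eq__`, so two `Coordinate2D(1, 2)` instances compare equal, while two `Coordinate2DOld(1, 2)` instances fall back to identity-based equality and do not.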
The other options available at the time all had pretty bad problems:
- You could remove `2DCoordinate` from your public API and instead expose a `make_2d_coordinate` function and make it non-importable, but then how would you document your return or parameter types?
- You could document the `x` and `y` attributes and make the user assign each one themselves, but then `2DCoordinate()` would return an invalid object.
- You could default your coordinates to 0 with class attributes, and while that would fix the problem with option 2, this would now require all `2DCoordinate` objects to be not just mutable, but mutated at every call site.
- You could fix the problems with option 1 by adding a new abstract class that you could expose in your public API, but this would explode the complexity of every new public class, no matter how simple. To make matters worse, `typing.Protocol` didn't even arrive until Python 3.8, so, in the pre-3.7 world this would condemn you to using concrete inheritance and declaring multiple classes even for the most basic data structure imaginable.
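To make option 3 concrete, here is a minimal sketch, again using the hypothetical name `Coordinate2D` and a hypothetical helper `make_point`. Class-attribute defaults make the bare constructor call legal, but every caller then has to mutate the instance to give it the values it actually wants.

```python
class Coordinate2D:
    # Option 3: class-attribute defaults instead of an __init__.
    x: int = 0
    y: int = 0


def make_point() -> Coordinate2D:
    point = Coordinate2D()  # constructible, but holds placeholder values
    point.x = 1             # every call site has to mutate the object...
    point.y = 2             # ...so instances can never be immutable
    return point


p = make_point()
print(p.x, p.y)  # 1 2
```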
Also, an `__init__` method that does nothing but assign a few attributes doesn't have any significant problems, so it is an obvious choice in this case. Given all the problems that I just described with the alternatives, it makes sense that it became the obvious default choice, in most cases.
However, by accepting "define a custom `__init__`" as the default way to allow users to create your objects, we make a habit of beginning every class with a pile of arbitrary code that gets executed every time it is instantiated.
Wherever there is arbitrary code, there are arbitrary problems.
The Problems
Let's consider a data structure more complex than one that simply holds a couple of attributes. We will create one that represents a reference to some I/O in the external world: a `FileReader`.
Of course Python has its own open-file object abstraction, but I will be ignoring that for the purposes of the example.
Let's assume a world where we have the following functions, in an imaginary `fileio` module:
open(path: str) -> int
read(fileno: int, length: int)
close(fileno: int)
Our hypothetical `fileio.open` returns an integer representing a file descriptor[1], `fileio.read` allows us to read `length` bytes from an open file descriptor, and `fileio.close` closes that file descriptor, invalidating it for future use.
With the habit that we have built from writing thousands of `__init__` methods, we might want to write our `FileReader` class like this:
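A minimal sketch of such a class follows. Because `fileio` is imaginary, the sketch assumes a hypothetical stand-in built on the real `os.open`, `os.read` and `os.close` calls; the file name and contents in the usage lines are illustrative only.

```python
import os
import tempfile
from types import SimpleNamespace

# Hypothetical stand-in for the imaginary `fileio` module,
# implemented with the real os.open / os.read / os.close calls.
fileio = SimpleNamespace(
    open=lambda path: os.open(path, os.O_RDONLY),
    read=lambda fileno, length: os.read(fileno, length),
    close=lambda fileno: os.close(fileno),
)


class FileReader:
    def __init__(self, path: str) -> None:
        # Creating the object and opening the file are welded together here.
        self._fd = fileio.open(path)

    def read(self, length: int) -> bytes:
        return fileio.read(self._fd, length)

    def close(self) -> None:
        fileio.close(self._fd)


# Usage: write a throwaway file, then read its first five bytes back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    with open(path, "wb") as f:
        f.write(b'{"debug": true}')
    reader = FileReader(path)
    data = reader.read(5)
    reader.close()

print(data)  # b'{"deb'
```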
For our initial use-case, this is fine. Client code creates a `FileReader` by doing something like `FileReader("./config.json")`, which always creates a `FileReader` that maintains its file descriptor `int` internally as private state. This is as it should be; we don't want user code to see or mess with `_fd`, as that might violate `FileReader`'s invariants. All the necessary work to construct a valid `FileReader` - i.e. the call to `open` - is always taken care of for you by `FileReader.__init__`.
However, additional requirements will creep in, and as they do, `FileReader.__init__` becomes increasingly awkward.
Initially we only care about fileio.open, but later we may have to deal with a library that has its own reasons for managing the call to fileio.open by itself, and wants to give us an int to use as our _fd. Now we have to resort to weird workarounds like:
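One such workaround, sketched under the same assumption (os functions standing in for the imaginary fileio), is a hypothetical reader_from_fd helper that bypasses __init__ entirely and sets the private attribute from outside:

```python
import os


class FileReader:
    def __init__(self, path: str) -> None:
        self._fd = os.open(path, os.O_RDONLY)  # stand-in for fileio.open

    def read(self, length: int) -> bytes:
        return os.read(self._fd, length)  # stand-in for fileio.read


def reader_from_fd(fd: int) -> FileReader:
    # Allocate an instance without running __init__ (which would open a file),
    # then reach into its private state by hand.
    reader = object.__new__(FileReader)
    reader._fd = fd
    return reader
```

It works, but only by reaching into FileReader's private state, which is exactly the kind of access the class was meant to prevent.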
Now, all those nice properties that we got from trying to force object construction to give us a valid object are gone. reader_from_fd's type signature, which takes a plain int, has no way of even suggesting to client code how to ensure that it has passed in the right kind of int.
Testing is much more of a hassle, because we have to patch in our own copy of fileio.open any time we want an instance of a FileReader in a test without doing any real-life file I/O, even if we could (for example) share a single file descriptor among many FileReaders for testing purposes.
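To make that concrete, here is a sketch of the kind of test helper this forces on us, using unittest.mock.patch.object to swap out the open function while the constructor runs. The fileio stand-in and the make_test_reader helper are hypothetical names for illustration.

```python
import os
import types
from unittest import mock

# Hypothetical stand-in for the imaginary "fileio" module.
fileio = types.SimpleNamespace(
    open=lambda path: os.open(path, os.O_RDONLY),
    read=os.read,
    close=os.close,
)


class FileReader:
    def __init__(self, path: str) -> None:
        self._fd = fileio.open(path)

    def read(self, length: int) -> bytes:
        return fileio.read(self._fd, length)


def make_test_reader(fd: int) -> FileReader:
    # The only way to wrap an existing fd is to replace fileio.open for the
    # duration of construction, so __init__ "opens" nothing and gets our fd.
    with mock.patch.object(fileio, "open", return_value=fd):
        return FileReader("this-path-is-ignored")
```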
All of this also assumes a fileio.open that is synchronous. Although for literal file I/O this is more of a hypothetical concern, there are many types of networked resource which are really only available via an asynchronous (and thus potentially slow, potentially error-prone) API. If you've ever found yourself wanting to type async def __init__(self): ... then you have seen this limitation in practice.
Comprehensively describing all the possible problems with this approach would end up being a book-length treatise on a philosophy of object oriented design, so I will sum up by saying that the cause of all these problems is the same: we are inextricably linking the act of creating a data structure with whatever side-effects are most often associated with that data structure. If they are "often" associated with it, then by definition they are not "always" associated with it, and all the cases where they aren't associated become unwieldy and potentially broken.
Defining an __init__ is an anti-pattern, and we need a replacement for it.
The Solutions
I believe this tripartite assemblage of design techniques will address the problems raised above:
- using dataclass to define attributes,
- replacing behavior that previously would have been in __init__ with a new classmethod that does the same thing, and
- using precise types to describe what a valid instance looks like.
Using dataclass attributes to create an __init__ for you
To begin, let's refactor FileReader into a dataclass. This does get us an __init__ method, but it won't be an arbitrary one we define ourselves; it will have the useful constraint enforced on it that it only assigns attributes.
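A sketch of the dataclass version, again assuming os.read and os.close as stand-ins for the imaginary fileio.read and fileio.close:

```python
import os
from dataclasses import dataclass


@dataclass
class FileReader:
    # The generated __init__ just assigns this attribute; no I/O happens.
    _fd: int

    def read(self, length: int) -> bytes:
        return os.read(self._fd, length)  # stand-in for fileio.read

    def close(self) -> None:
        os.close(self._fd)  # stand-in for fileio.close
```

Constructing one for a real file now means doing the open yourself, e.g. FileReader(os.open("./config.json", os.O_RDONLY)).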
Except... oops. In fixing the problems that we created with our custom __init__ that calls fileio.open, we have re-introduced several problems that it solved:
- We have removed all the convenience of FileReader("path"). Now the user needs to import the low-level fileio.open again, making the most common type of construction both more verbose and less discoverable; if we want users to know how to build a FileReader in a practical scenario, we will have to add something in our documentation to point at a separate module entirely.
- There's no enforcement of the validity of _fd as a file descriptor; it's just some integer, which the user could easily pass an incorrect instance of, with no error.
In isolation, dataclass by itself can't solve all our problems, so let's add in the second technique.
Using classmethod factories to create objects
We don't want to require any additional imports, or require users to go looking at any other modules (or indeed anything other than FileReader itself) to figure out how to create a FileReader for its intended usage.
Luckily we have a tool that can easily address all of these concerns at once: @classmethod. Let's define a FileReader.open class method:
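A sketch of that factory, under the same os-as-fileio assumption; cls(...) calls the plain, attribute-assigning __init__ that dataclass generated:

```python
import os
from dataclasses import dataclass


@dataclass
class FileReader:
    _fd: int

    @classmethod
    def open(cls, path: str) -> "FileReader":
        # The convenience of the old __init__, as an explicit, named factory.
        return cls(os.open(path, os.O_RDONLY))  # stand-in for fileio.open

    def read(self, length: int) -> bytes:
        return os.read(self._fd, length)  # stand-in for fileio.read

    def close(self) -> None:
        os.close(self._fd)  # stand-in for fileio.close
```

Because the factory calls cls rather than naming FileReader directly, a subclass inherits a factory that builds instances of that subclass.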
Now, your callers can replace FileReader("path") with FileReader.open("path"), and get all the same benefits.
Additionally, if we needed to await fileio.open(...), and thus we needed its signature to be @classmethod async def open, we are freed from the constraint of __init__ as a special method. There is nothing that would prevent a @classmethod from being async, or indeed, from having any other modification to its return value, such as returning a tuple of related values rather than just the object being constructed.
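For example, here is a sketch of an async factory. The async_open coroutine is a hypothetical async counterpart of fileio.open; here it simply runs the blocking os.open in a worker thread via asyncio.to_thread (Python 3.9+).

```python
import asyncio
import os
from dataclasses import dataclass


async def async_open(path: str) -> int:
    # Hypothetical async version of fileio.open.
    return await asyncio.to_thread(os.open, path, os.O_RDONLY)


@dataclass
class FileReader:
    _fd: int

    @classmethod
    async def open(cls, path: str) -> "FileReader":
        # Something __init__ can never be: a coroutine that awaits before building.
        return cls(await async_open(path))

    def read(self, length: int) -> bytes:
        return os.read(self._fd, length)

    def close(self) -> None:
        os.close(self._fd)


async def main() -> bytes:
    reader = await FileReader.open(os.devnull)
    try:
        return reader.read(16)
    finally:
        reader.close()
```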
Using NewType to address object validity
Next, let's address the slightly trickier issue of enforcing object validity.
Our type signature calls this thing an int, and indeed, that is unfortunately what the lower-level fileio.open gives us, and that's beyond our control. But for our own purposes, we can be more precise in our definitions, using NewType:
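The definition itself is a one-liner; the describe function around it is a hypothetical illustration of how the new type behaves for a type checker versus at run time:

```python
from typing import NewType

# To a type checker, a distinct subtype of int; at run time,
# FileDescriptor(x) just returns x unchanged.
FileDescriptor = NewType("FileDescriptor", int)


def describe(fd: FileDescriptor) -> str:
    return f"file descriptor {fd}"


describe(FileDescriptor(3))  # accepted by the type checker
describe(3)  # also runs, but Mypy reports the plain int as an incompatible argument
```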
There are a few different ways to address the underlying library, but for the sake of brevity and to illustrate that this can be done with zero run-time overhead, let's just insist to Mypy that we have versions of fileio.open, fileio.read, and fileio.close which actually already return or take FileDescriptor integers rather than regular ones.
We do of course have to slightly adjust FileReader, too, but the changes are very small. Putting it all together, we get:
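A sketch of how the pieces might fit together: a dataclass for the attributes, a classmethod factory for the convenient path, and a NewType for the file descriptor. It assumes hypothetical cast-based shims over os in place of the imaginary fileio module.

```python
import os
from dataclasses import dataclass
from typing import Callable, NewType, cast

FileDescriptor = NewType("FileDescriptor", int)


def _untyped_open(path: str) -> int:
    # Hypothetical stand-in for the imaginary fileio.open.
    return os.open(path, os.O_RDONLY)


open_fd = cast(Callable[[str], FileDescriptor], _untyped_open)
read_fd = cast(Callable[[FileDescriptor, int], bytes], os.read)
close_fd = cast(Callable[[FileDescriptor], None], os.close)


@dataclass
class FileReader:
    # A plain int no longer type-checks here; callers need a FileDescriptor.
    _fd: FileDescriptor

    @classmethod
    def open(cls, path: str) -> "FileReader":
        return cls(open_fd(path))

    def read(self, length: int) -> bytes:
        return read_fd(self._fd, length)

    def close(self) -> None:
        close_fd(self._fd)
```

Tests can now build a FileReader around any descriptor, such as one end of os.pipe(), without patching anything.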
Note that the main technique here is not necessarily using NewType specifically, but rather aligning an instance's property of "has all attributes set" as closely as possible with an instance's property of "fully valid instance of its class"; NewType is just a handy tool to enforce any necessary constraints on the places where you need to use a primitive type like int, str or bytes.
In Summary - The New Best Practice
From now on, when you're defining a new Python class:
- Make it a dataclass[2].
- Use its default __init__ method[3].
- Add @classmethods to provide your users convenient and discoverable ways to build your objects.
- Require that all dependencies be satisfied by attributes, so you always start with a valid object.
- Use typing.NewType to enforce any constraints on primitive data types (like int and str) which might have magical external attributes, like needing to come from a particular library, needing to be random, and so on.
If you define all your classes this way, you will get all the benefits of a custom __init__ method:
- All consumers of your data structures will receive valid objects, because an object with all its attributes populated correctly is inherently valid.
- Users of your library will be presented with convenient ways to create your objects that do as much work as is necessary to make them easy to use, and they can discover these just by looking at the methods on your class itself.
Along with some nice new benefits:
- You will be future-proofed against new requirements for different ways that users may need to construct your object.
- If there are already multiple ways to instantiate your class, you can now give each of them a meaningful name; no need to have monstrosities like def __init__(self, maybe_a_filename: int | str | None = None):
- Your test suite can always construct an object by satisfying all its dependencies; no need to monkey-patch anything when you can always call the type and never do any I/O or generate any side effects.
Before dataclasses, it was always a bit weird that such a basic feature of the Python language (giving data to a data structure to make it valid) required overriding a method with 4 underscores in its name. __init__ stuck out like a sore thumb. Other such methods like __add__ or even __repr__ were inherently customizing esoteric attributes of classes.
For many years now, that historical language wart has been resolved. @dataclass, @classmethod, and NewType give you everything you need to build classes which are convenient, idiomatic, flexible, testable, and robust.
Acknowledgments
Thank you to my patrons who are supporting my writing on this blog. If you like what you've read here and you'd like to read more of it, or you'd like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics like "but what is a 'class', really?".
[1] If you aren't already familiar, a "file descriptor" is an integer which has meaning only within your program; you tell the operating system to open a file, it says "I have opened file 7 for you", and then whenever you refer to "7" it is that file, until you close(7).
[2] Or an attrs class, if you're nasty.
[3] Unless you have a really good reason to, of course. Backwards compatibility, or compatibility with another library, might be good reasons to do that, as might certain types of data-consistency validation which cannot be expressed within the type system. The most common example of these would be a class that requires consistency between two different fields, such as a "range" object where start must always be less than end. There are always exceptions to these types of rules. Still, it's pretty much never a good idea to do any I/O in __init__, and nearly all of the remaining stuff that may sometimes be a good idea in edge-cases can be achieved with a __post_init__ rather than writing a literal __init__.
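As an illustration of that last point, a sketch of the "range" example from this note (the Range class is hypothetical): dataclass still generates the attribute-assigning __init__, and __post_init__ only checks the cross-field invariant, with no I/O.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    start: int
    end: int

    def __post_init__(self) -> None:
        # Runs right after the generated __init__ assigns start and end.
        # A consistency check between two fields, which types alone can't express.
        if not self.start < self.end:
            raise ValueError(f"start ({self.start}) must be less than end ({self.end})")
```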
17 Apr 2025 10:35pm GMT
01 Apr 2025
Planet Twisted
Glyph Lefkowitz: A Bigger Database
A Database File
When I was 10 years old, and going through a fairly difficult time, I was lucky enough to come into the possession of a piece of software called Claris FileMaker Pro™.
FileMaker allowed its users to construct arbitrary databases, and to associate their tables with a customized visual presentation. FileMaker also had a rudimentary scripting language, which would allow users to imbue these databases with behavior.
As a mentally ill pre-teen, lacking a sense of control over anything or anyone in my own life, including myself, I began building a personalized database to catalogue the various objects and people in my immediate vicinity. If one were inclined to be generous, one might assess this behavior and say I was systematically taxonomizing the objects in my life and recording schematized information about them.
As I saw it at the time, if I collected the information, I could always use it later, to answer questions that I might have. If I didn't collect it, then what if I needed it? Surely I would regret it! Thus I developed a categorical imperative to spend as much of my time as possible collecting and entering data about everything that I could reasonably arrange into a common schema.
Having thus summoned this specter of regret for all lost data-entry opportunities, it was hard to dismiss. We might label it "Claris's Basilisk", for obvious reasons.
Therefore, a less-generous (or more clinically-minded) observer might have replaced the word "systematically" with "obsessively" in the assessment above.
I also began writing what scripts were within my marginal programming abilities at the time, just because I could: things like computing the sum of every street number of every person in my address book. Why was this useful? Wrong question: the right question is "was it possible?", to which my answer was "yes".
If I was obliged to collect all the information which I could observe - in case it later became interesting - I was similarly obliged to write and run every program I could. It might, after all, emit some other interesting information.
I was an avid reader of science fiction as well.
I had this vague sense that computers could kind of think. This resulted in a chain of reasoning that went something like this:
- human brains are kinda like computers,
- the software running in the human brain is very complex,
- I could only write simple computer programs, but,
- when you really think about it, a "complex" program is just a collection of simpler programs
Therefore: if I just kept collecting data, collecting smaller programs that could solve specific problems, and connecting them all together in one big file, eventually the database as a whole would become self-aware and could solve whatever problem I wanted. I just needed to be patient; to "keep grinding" as the kids would put it today.
I still feel like this is an understandable way to think - if you are a highly depressed and anxious 10-year-old in 1990.
Anyway.
35 Years Later
OpenAI is a company that produces transformer architecture machine learning generative AI models; their current generation was trained on about 10 trillion words, obtained in a variety of different ways from a large variety of different, unrelated sources.
A few days ago, on March 26, 2025 at 8:41 AM Pacific Time, Sam Altman took to "X™, The Everything App™," and described the trajectory of his career of the last decade at OpenAI as, and I quote, a "grind for a decade trying to help make super-intelligence to cure cancer or whatever" (emphasis mine).
I really, really don't want to become a full-time AI skeptic, and I am not an expert here, but I feel like I can identify a logically flawed premise when I see one.
This is not a system-design strategy. It is a trauma response.
You can't cure cancer "or whatever". If you want to build a computer system that does some thing, you actually need to hire experts in that thing, and have them work to both design and validate that the system is fit for the purpose of that thing.
Aside: But... are they, though?
I am not an oncologist; I do not particularly want to be writing about the specifics here, but, if I am going to make a claim like "you can't cure cancer this way" I need to back it up.
My first argument - and possibly my strongest - is that cancer is not cured.
QED.
But I guess, to Sam's credit, there is at least one other company partnering with OpenAI to do things that are specifically related to cancer. However, that company is still in a self-described "initial phase" and it's not entirely clear that it is going to work out very well.
Almost everything I can find about it online was from a PR push in the middle of last year, so it all reads like a press release. I can't easily find any independently-verified information.
A lot of AI hype is like this. A promising demo is delivered; claims are made that surely if the technology can solve this small part of the problem now, within 5 years surely it will be able to solve everything else as well!
But even the light-on-content puff-pieces tend to hedge quite a lot. For example, as the Wall Street Journal quoted one of the users initially testing it (emphasis mine):
The most promising use of AI in healthcare right now is automating "mundane" tasks like paperwork and physician note-taking, he said. The tendency for AI models to "hallucinate" and contain bias presents serious risks for using AI to replace doctors. Both Color's Laraki and OpenAI's Lightcap are adamant that doctors be involved in any clinical decisions.
I would probably not personally characterize "'mundane' tasks like paperwork and … note-taking" as "curing cancer". Maybe an oncologist could use some code I developed too; even if it helped them, I wouldn't be stealing valor from them on the curing-cancer part of their job.
Even fully giving it the benefit of the doubt that it works great, and improves patient outcomes significantly, this is medical back-office software. It is not super-intelligence.
It would not even matter if it were "super-intelligence", whatever that means, because "intelligence" is not how you do medical care or medical research. It's called "lab work" not "lab think".
To put a fine point on it: biomedical research fundamentally cannot be done entirely by reading papers or processing existing information. It cannot even be done by testing drugs in computer simulations.
Biological systems are enormously complex, and medical research on new therapies inherently requires careful, repeated empirical testing to validate the correspondence of existing research with reality. Not "an experiment", but a series of coordinated experiments that all test the same theoretical model. The data (which, in an LLM context, is "training data") might just be wrong; it may not reflect reality, and the only way to tell is to continuously verify it against reality.
Previous observations can be tainted by methodological errors, by data fraud, and by operational mistakes by practitioners. If there were a way to do verifiable development of new disease therapies without the extremely expensive ladder going from cell cultures to animal models to human trials, we would already be doing it, and "AI" would just be an improvement to efficiency of that process. But there is no way to do that and nothing about the technologies involved in LLMs is going to change that fact.
Knowing Things
The practice of science - indeed any practice of the collection of meaningful information - must be done by intentionally and carefully selecting inclusion criteria, methodically and repeatedly curating our data, building a model that operates according to rules we understand and can verify, and verifying the data itself with repeated tests against nature. We cannot just hoover up whatever information happens to be conveniently available with no human intervention and hope it resolves to a correct model of reality by accident. We need to look where the keys are, not where the light is.
Piling up more and more information in a haphazard and increasingly precarious pile will not allow us to climb to the top of that pile, all the way to heaven, so that we can attack and dethrone God.
Eventually, we'll just run out of disk space, and then lose the database file when the family gets a new computer anyway.
Acknowledgments
Thank you to my patrons who are supporting my writing on this blog. If you like what you've read here and you'd like to read more of it, or you'd like to support my various open-source endeavors, you can support my work as a sponsor! Special thanks also to Itamar Turner-Trauring and Thomas Grainger for pre-publication feedback on this article; any errors of course remain my own.
01 Apr 2025 12:47am GMT
29 Nov 2024
Planet Plone - Where Developers And Integrators Write
Maurits van Rees: Lightning talks Friday
Bonnie Tyler Sprint
On 12 August 2026 there is a total solar eclipse that can be seen from Valencia, Spain. So we organise a sprint there.
This conference
We had 291 participants, 234 in person and 57 online. 13 Brazilian states (that is all of them), 14 countries.
24.5 percent women, up from 13% in 2013, so that has gone up, but we are not there yet. Thank you to PyLadies and Django Girls for making this happen.
We had more than 80 presenters, about 30 lightning talks, and lots of talk in the hallways.
Thanks also to the team!
Ramiro Luz: Yoga time
Yoga exercise.
Rikupekka: University case student portal
We have a student portal at the university. But mostly:
Welcome to the University of Jyväskylä in Finland for Plone Conference 2025, October 13-19!
Jakob: Beethovensprint
26-30 May 2025 in Bonn, Germany.
Afterwards, on May 30 and June 1 there will be FedCon in Bonn, a SciFi convention.
Piero/Victor: BYOUI
Add-ons first development with @plone/registry. See https://plone-registry.readthedocs.io/
It allows for development that is framework agnostic, so it is not only for Plone. It is built around configuration that can be extended and injected, which is tricky in most JavaScript frameworks.
Imagine it.
Ana Dulce: 3D printing
For a difficult model I had to trust the process; it took a week, but it worked.
Renan & Iza: Python Brasil
We organised the Python Brasil conference from 16 to 23 October this year in Rio de Janeiro.
Next year 21-27 October in São Paulo.
Erico: Python Cerrado
31 July to 2 August 2025 is the next Python Cerrado conference.
29 Nov 2024 10:25pm GMT
Maurits van Rees: Paul Roeland: The value of longevity
Link to talk information on Plone conference website.
I work for the Clean Clothes Campaign: https://cleanclothes.org/
After three large disasters in factories in 2012 and 2013 with over 1000 deaths, it took three years to get an agreement with clothing manufacturers for 30 million dollars in compensation. It does not bring lives back, but it helps the survivors.
See Open Supply Hub for open data that we collected, for checking which brands are produced in which factories.
Documenting history matters. Stories must be told.
The global clothing industry is worth around 1.8 trillion dollars; if it were a country, that would put it in 12th place in the world. 75 million workers.
Our strongest weapon: backlinks. We have links from the OECD, the UN, Wikipedia, school curricula, and books. Especially those last two never change, so you should never change URLs.
Plone: enable the sitemap, please; why is it not on by default? Create a good robots.txt. I check Google Search Console weekly, looking for broken links. Tag early, tag often: great tool, even if you have an AI do it.
Our website: started 1998 written in Notepad, 2004 Dreamweaver, 2006 Bluefish, 2010 Joomla, 2013 Plone 4, 2020 Castle CMS (opinionated distribution of Plone, but does not really exist anymore) 2024 Plone 6 with Volto Light Theme (work in progress). Thank you kitconcept for all the help, especially Jonas.
Migrations are painful. Along the years we used wget to csv to SQL to csv, Python script, "Franken-mogrifier", collective.exportimport.
Lessons learned: stable urls are awesome, migrations are painful. Please don't try to salvage CSS from your old site, just start fresh in your new system. Do not try to migrate composite pages or listings.
What if your website does not provide an export? Use wget, still works and is better than httrack. sed/awk/regex are your friend. archivebox (WARC).
Document your steps for your own sanity.
To manage json, jq or jello can be used. sq is a Swiss Army knife for json/sql/csv. emuto is a hybrid between jq and GraphQL.
Normalize import/export. We have `plone.exportimport` in core now.
In the future I would like a plone exporter script that accepts a regex and exports only matching pages. Switch backends: ZODB, relstorage, nick, quantum-db. Sitewide search/replace/sed. Sneakernet is useful in difficult countries where you cannot send data over the internet: so export to a usb stick.
A backup is only a backup if it regularly gets restored so you know that it works.
- Keeping content and URL stability is a superpower.
- Assuming that export/import/backup/restore/migration are rare occurrences, is wrong.
- Quick export/import is very useful.
Do small migrations, treat it as maintenance. Don't be too far behind. Large migrations once every five years will be costly. Do a small migration every year. Do your part. Clients should also do their part, by budgeting this yearly. That is how budgeting works. Use every iteration to review custom code.
Make your sites live long and prosper.
29 Nov 2024 8:58pm GMT
Maurits van Rees: Fred van Dijk: Run Plone in containers on your own cluster with coolify.io
Link to talk information on Plone conference website.
Sorry, I ran out of time trying to set up https://coolify.io
So let's talk about another problem. Running applications (stacks) in containers is the future. Well: abstraction and isolation is the future, and containers is the current phase.
I am on the Plone A/I team, with Paul, Kim, Erico. All senior sysadmins, so we kept things running. In 2022 we worked on containerisation. Kubernetes was the kool kid then, but Docker Swarm was easier. Check out Erico's training with new cookieplone templates.
Doing devops well is hard. You have a high workload, but still need to keep learning new stuff to keep up with what is changing.
I want to plug Coolify, which is a full open source product. "Self-hosting with super powers." The main developer, Andras Bacsal, believes in open source and 'hates' pay by usage cloud providers with a vengeance.
Coolify is still docker swarm. We also want Kubernetes support. But we still need sysadmins. Someone will still need to install coolify, and keep it updated.
I would like to run an online DevOps course sometime in January-March 2025. 4-6 meetings of 2 hours, maybe Friday afternoon. Talk through devops and sysadmin concepts, show docker swarm, try coolify, etc.
29 Nov 2024 7:58pm GMT