31 Jul 2025

feedLXer Linux News

AMD Threadripper 9980X + 9970X Linux Benchmarks: Incredible Workstation Performance

Ahead of the Threadripper 9000 series hitting store shelves tomorrow, today the review embargo lifts on these new high-end desktop/workstation Zen 5 processors. I have been testing out the Threadripper 9970X and 9980X this month and have been extremely excited about the generational uplift and all-around performance of these new AMD Ryzen Threadripper 9970X/9980X processors on Linux for delivering the best possible workstation performance in 2025.

31 Jul 2025 11:36am GMT

Archinstall 3.0.9 Rolls Out with U2F and Bluetooth Support

Archinstall 3.0.9, a guided installer for Arch Linux, adds U2F authentication, LUKS iteration tweaks, and Bluetooth support.

31 Jul 2025 10:05am GMT

feedDrupal.org aggregator

The Drop Times: What Bram Driesen Sees from Inside the Drupal Ecosystem

From volunteer photography to security policy, Bram Driesen has spent years shaping the Drupal ecosystem from the inside. In this profile, he reflects on overlooked responsibilities, hidden work, and the standards that keep open-source projects stable, collaborative, and worth contributing to.

31 Jul 2025 9:00am GMT

feedLXer Linux News

Linux Begins Preparing For The Lenovo Legion Go 2 Handheld

In recent days there has been an increasing flow of leaks surrounding the Legion Go 2 as the next-generation handheld from Lenovo. The Lenovo Legion Go 2 is reported to be launching later this year with an AMD Ryzen Z2 Extreme SoC, a 144Hz OLED display, and a variety of other hardware upgrades over the original Lenovo Legion Go. Linux driver activity around the Legion Go 2 has begun...

31 Jul 2025 8:33am GMT

feedDrupal.org aggregator

Droptica: Drupal 7 vs Drupal 11 – How Have This System and Its Functionalities Changed?


Drupal 7 was released in 2011 and has been the foundation of many websites for years. However, in the world of technology, 14 years is almost an eternity - during this time, Drupal has undergone a huge evolution. Today, the latest version, Drupal 11, is a modern system based on current coding standards and offering features that were once only a dream. In this article, we will look at the key differences between Drupal 7 and Drupal 11. If you last worked with Drupal years ago (e.g., with version 6 or 7), prepare yourself for the "wow" effect - Drupal 11 is a whole new experience.

31 Jul 2025 7:01am GMT

feedPlanet Lisp

Joe Marshall: JRM runs off at the mouth

Although LLMs perform a straightforward operation - they predict the next tokens from a sequence of tokens - they can be almost magical in their results if the stars are aligned. And from the look of it, the stars align often enough to be useful. But if you're unlucky, you can end up with a useless pile of garbage. My LLM started spitting out such gems as Cascadescontaminantsunnatural and exquisiteacquire the other day when I requested it imagine some dialog. Your mileage will vary, a lot.

The question is whether the magic outweighs the glossolalia. Can we keep the idiot savant LLM from evangelically speaking in tongues?

Many people at work are reluctant to use LLMs as an aid to programming, preferring to hand craft all their code. I understand the sentiment, but I think it is a mistake. LLMs are a tool of extraordinary power, but you need to develop the skill to use them, and that takes a lot of time and practice.

The initial key to using LLMs is to get good at prompting them. Here a trained programmer has a distinct advantage over a layperson. When you program at a high level, you are not only thinking about how to solve your problem, but also all the ways you can screw up. This is "defensive programming". You check your inputs, you write code to handle "impossible" cases, you write test cases that exercise the edge cases. (I'm no fan of test-driven development, but if I have code that is supposed to exhibit some complex behavior, I'll often write a few test cases to prove that the code isn't egregiously broken.)

When you prompt an LLM, it helps a lot to think in the same way you program. You need to be aware of the ways the LLM can misinterpret your prompt, and you need to write your prompt so that it is as clear as possible. You might think that this defeats the purpose. You are essentially performing the act of programming with an extra natural language translation step in the middle. This is true, and you will get good results if you approach the task with this in mind. Learning to effectively prompt an LLM is very similar to learning a new programming language. It is a skill that a trained programmer will have honed over time. Laypeople will find it possible to generate useful code with an LLM, but they will encounter bugs and problems that they will have difficulty overcoming. A trained programmer will know precisely how to craft additional clauses to the prompt to avoid these problems.

Context engineering is the art of crafting a series of prompts to guide the LLM to produce the results you want. If you know how to program, you don't necessarily know how to engineer large systems. If you know how to prompt, you don't necessarily know how to engineer the context. Think of Mickey Mouse in Fantasia. He quickly learns the prompts that get the broom to carry the water, but he doesn't foresee the consequences of exponential replication.

Ever write a program that seems to be taking an awfully long time to run? You do a back-of-the-envelope calculation and realize that the expected runtime will be on the order of 10^50 seconds. This sort of problem won't go away with an LLM, but the relative number of people ill-equipped to diagnose and deal with the problem will certainly go up. Logical thinking and foreseeing consequences will be skills in higher demand than ever in the future.
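For instance, here's roughly what such a back-of-the-envelope check looks like in Python (the operation counts are purely illustrative):

operations = 10 ** 50        # illustrative amount of work
ops_per_second = 10 ** 9     # a generously fast machine
seconds = operations / ops_per_second
print(f"{seconds:.0e} seconds")  # 1e+41 seconds: it will never finish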

You won't be able to become a "machine whisperer" without a significant investment of time and effort. As a programmer, you already have a huge head start. Turn on the LLM and use it in your daily workflow. Get a good feel for its strengths and weaknesses (they'll surprise you). Then leverage this crazy tool for your advantage. It will make you a better programmer.

31 Jul 2025 2:14am GMT

feedPlanet Python

Daniel Roy Greenfeld: Unpack for keyword arguments

Previously I wrote a TIL on how to better type annotate callables with *args and **kwargs - in essence you ignore the container and worry just about the content of the container. This makes sense, as *args are a tuple and **kwargs keys are strings.
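In code, that annotation style looks something like this (a minimal sketch; the int and str element types are just for illustration):

def typed_func(*args: int, **kwargs: str) -> None:
    # Annotate the contents, not the containers: args is a
    # tuple[int, ...] and kwargs is a dict[str, str].
    ...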

Here's an example of that in action:

>>> def func(*args, **kwargs):
...     print(f'{args=}')
...     print(f'{kwargs=}')
...
>>> func(1, 2, 3, one=1, two=2)
args=(1, 2, 3)
kwargs={'one': 1, 'two': 2}

In fact, if you try to force **kwargs to accept a non-string type Python stops you with a TypeError:

>>> func(**{1:2})
Traceback (most recent call last):
  File "<python-input-9>", line 1, in <module>
    func(**{1:2})
    ~~~~^^^^^^^^^
TypeError: keywords must be strings

This is all great, but what if you want your keyword arguments to consistently accept a particular pattern of arguments? This is where TypedDict and Unpack come in. The following passes type checks:

from typing import TypedDict, Unpack

class Cheese(TypedDict):
    name: str
    price: int


def func(**cheese: Unpack[Cheese]) -> None:
    print(cheese)

Let's try it out:

>>> func(name='Paski Sir', price=30)
{'name': 'Paski Sir', 'price': 30}

Works great! Now let's break it by forgetting a keyword argument:

>>> func(name='Paski Sir')
{'name': 'Paski Sir'}

What? How about adding an extra keyword argument and replacing the int with a float:

>>> func(name='Paski Sir', price=30.5, country='Croatia')
{'name': 'Paski Sir', 'price': 30.5, 'country': 'Croatia'}

Still no errors? What gives? The answer is that type annotations are for type checkers, and don't catch during runtime. See the [note at the top of the core Python docs on typing]:

Note The Python runtime does not enforce function and variable type annotations. They can be used by third party tools such as type checkers, IDEs, linters, etc.

For those times when we do need runtime evaluations of types, we lean on built-ins like isinstance and issubclass, which are quite separate from type hints and annotations.
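For example, a minimal runtime guard for the Cheese example might look like this (my sketch, not part of the original TIL):

from typing import TypedDict, Unpack

class Cheese(TypedDict):
    name: str
    price: int

def func(**cheese: Unpack[Cheese]) -> None:
    # Annotations aren't enforced at runtime, so check explicitly:
    if not isinstance(cheese.get("name"), str):
        raise TypeError("name must be a str")
    if not isinstance(cheese.get("price"), int):
        raise TypeError("price must be an int")
    print(cheese)

With this in place, func(name='Paski Sir', price=30.5) raises a TypeError at runtime instead of passing silently.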

Thanks to the astute Luke Plant for pointing out Unpack to me and sending me down a quite pleasant rabbit hole.

31 Jul 2025 12:12am GMT

30 Jul 2025

feedDZone Java Zone

Immutable Objects Using Record in Java

It is often useful to have objects that, once created, don't change their content. For a complete description of how to build such a class, you can read my previous article "Immutable Objects in Java".

Let's imagine we want to build a PersonClass with two fields: firstName and lastName. To create immutable instances, this class must:

30 Jul 2025 8:00pm GMT

feedDrupal.org aggregator

Dries Buytaert: Why Drupal is built for the AI era

[Image: An astronaut explores a surreal landscape beneath rainbow-colored planetary rings, symbolizing the journey into AI's transformative potential for Drupal.]

In my previous post, [The great digital agency unbundling](https://dri.es/ai-and-the-great-digital-agency-unbundling), I explored how AI is transforming the work of digital agencies. As AI automates more technical tasks, agencies will be shifting their focus toward orchestration, strategic thinking, and accountability. This shift also changes what they need from their tools. Content management systems like [Drupal](https://new.drupal.org/home) must evolve with them. This is not just about adding AI features. It is about becoming a platform that strengthens the new agency model. Because as agencies take on new roles, they will adopt the tools that help them succeed. As I wrote then:

> "As the Project Lead of Drupal, I think about how Drupal, the product, and its ecosystem of digital agencies can evolve _together_. They need to move in step to navigate change and help shape what comes next."

The good news is that the Drupal community is embracing AI in a coordinated and purposeful way. Today, Drupal CMS already ships with 22 AI agents, and through the [Drupal AI Initiative](https://dri.es/accelerating-ai-innovation-in-drupal), we are building additional infrastructure and tooling to bring more AI capabilities to Drupal. In this post, I want to share why I believe Drupal is not just ready to evolve, but uniquely positioned to thrive in the AI era.

### Drupal is built for AI

Imagine an AI agent that plans, executes, and measures complex marketing campaigns across your CMS, CRM, email platform, and analytics tools without requiring manual handoff at every step. To support that level of orchestration, a platform must expose its content models, configuration data, state, user roles and permissions, and business logic in a structured, machine-readable way. That means making things like entity types, field definitions, relationships, and workflows available through APIs that AI systems can discover, inspect, and act on safely. Most platforms were not designed with this kind of structured access in mind. Drupal has been moving in that direction for more than a decade.

Since Drupal 7, the community has invested deeply in modernizing the platform. We introduced a unified Entity API, adopted a service container with dependency injection, and expanded support for REST, JSON:API, and GraphQL. We also built a robust configuration management system, improved testability, and added more powerful workflows with granular revisioning and strong rollback support. Drupal also has excellent API documentation.

These changes made Drupal not only more programmable but also more introspectable. AI agents can query Drupal's structure, understand relationships between entities, and make informed decisions based on both content and configuration. This enables AI to take meaningful action inside the system rather than just operating at the surface. And because Drupal's APIs are open and well-documented, these capabilities are easier for developers and AI systems to discover and build on.

Making these architectural improvements was not easy. Upgrading from Drupal 7 was painful for many, and at the time, the benefits of Drupal 8's redesign were not obvious. We were not thinking about AI at the time, but in hindsight, we built exactly the kind of modern, flexible foundation that makes deep AI integration possible today. As is often the case, there is [pain before the payoff](https://dri.es/the-pain-before-the-payoff).

### AI makes Drupal's power more accessible

I think this is exciting because AI can help make Drupal's flexibility more accessible. Drupal is one of the most flexible content management systems available. It powers everything from small websites to large, complex digital platforms. That flexibility is a strength, but it also introduces complexity. For newcomers, Drupal's flexibility can be overwhelming. Building a Drupal site requires understanding how to select and configure contributed modules, creating content types and relationships, defining roles and permissions, building Views, developing a custom theme, and more. The learning curve is steep and often prevents people from experiencing Drupal's power and flexibility.

AI has the potential to change that. In the future, you might describe your needs by saying something like, "I need a multi-language news site with editorial workflows and social media integration". An AI assistant could ask a few follow-up questions, then generate a working starting point. I've demonstrated early prototypes of this vision in recent [DriesNotes](https://dri.es/tag/state-of-drupal), including [DrupalCon Barcelona 2024](https://dri.es/state-of-drupal-presentation-september-2024) and [DrupalCon Atlanta 2025](https://dri.es/state-of-drupal-presentation-march-2025). Much of that code has been productized in the [Drupal AI modules](https://www.drupal.org/project/ai).

In my Barcelona keynote, I said that "AI is the new UI". AI helps lower the barrier to entry by turning complex setup tasks into simple prompts and conversations. With the right design, it can guide new users while still giving experts full control. In my last post, [The great digital agency unbundling](https://dri.es/ai-and-the-great-digital-agency-unbundling), I shared a similar perspective:

> "Some of the hardest challenges the Drupal community has faced, such as improving usability or maintaining documentation, may finally become more manageable. I see ways AI can support Drupal's mission, lower barriers to online publishing, make Drupal more accessible, and help build a stronger, more inclusive Open Web. The future is both exciting and uncertain."

Of course, AI comes with both promise and risk. It raises ethical questions and often fails to meet expectations. But ignoring AI is _not_ a strategy. AI is already changing how digital work gets done. If we want Drupal to stay relevant, we need to explore its potential. That means experimenting thoughtfully, sharing what we learn, and helping shape how these tools are built and used.

### Drupal's AI roadmap helps agencies

AI is changing how digital work gets done. Some platforms can now generate full websites, marketing campaigns, or content strategies in minutes. For simple use cases, that may be enough. But many client needs are more complex. As requirements grow and automations become more sophisticated, agencies continue to play a critical role. They bring context, strategy, and accountability to challenges that off-the-shelf tools cannot solve.

That is the future we want Drupal to support. We are _not_ building AI to replace digital agencies, but to strengthen them. Through the [Drupal AI Initiative](https://dri.es/accelerating-ai-innovation-in-drupal), Drupal agencies are actively helping shape the tools they want to use in an AI-driven world. As agencies evolve in response to AI, they will need tools that evolve with them. Drupal is not only keeping pace but helping lead the way. By investing in AI in collaboration with the agencies who rely on it, we are making Drupal stronger, more capable, and more relevant.

### Now is the moment to move

The shift toward AI-powered digital work is inevitable. Platforms will succeed or fail based on how well they adapt to this reality. Drupal's investments in modern architecture, open development, and community collaboration have created something unique: a platform that doesn't just add AI features but fundamentally supports AI-driven workflows. While other systems scramble to retrofit AI capabilities, Drupal's foundation makes deep integration possible.

The question isn't whether AI will change digital agencies and content management. It already has. The question is which platforms will help agencies and developers thrive in that new reality. Drupal is positioning itself to be one of them.

30 Jul 2025 6:09pm GMT

feedPlanet Lisp

Joe Marshall: Novice to LLMs — LLM calls Lisp

I'm a novice to the LLM API, and I'm assuming that at least some of my readers are too. I'm not the very last person to the party, am I?

When integrating the LLM with Lisp, we want to allow the LLM to direct queries back to the Lisp that is invoking it. This is done through the function call protocol. The client supplies to the LLM a list of functions that the LLM may invoke. When the LLM wants to invoke a function, instead of returning a block of generated text, it returns a JSON object indicating a function call. This contains the name of the function and the arguments. The client is supposed to invoke the function, but to return the answer it actually makes a new call into the LLM, concatenating the entire conversation so far along with the result of the function call. It is bizarro continuation-passing-style where the client acts as a trampoline and keeps track of the continuation.
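In Python-flavored pseudocode, the trampoline loop has roughly this shape (a sketch of the protocol only; invoke_llm and call_lisp_function are hypothetical stand-ins, not a real API):

def run_conversation(prompt, function_declarations):
    history = [{"role": "user", "content": prompt}]
    while True:
        reply = invoke_llm(history, tools=function_declarations)  # hypothetical
        if reply.kind == "text":
            return reply.text  # ordinary generated text: we're done
        # The LLM "returned" a function call, so the client acts as the
        # trampoline: run the function, append the call and its result to
        # the entire conversation so far, and call back into the LLM.
        result = call_lisp_function(reply.name, reply.args)  # hypothetical
        history.append({"role": "model", "function_call": reply})
        history.append({"role": "tool", "name": reply.name, "content": result})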

So, for example, by exposing lisp-implementation-type and lisp-implementation-version, we can then query the LLM:

> (invoke-gemini "gemini-2.5-flash" "What is the type and version of the lisp implementation?")
"The Lisp implementation is SBCL version 2.5.4."

30 Jul 2025 2:49pm GMT

feedPlanet Python

Test and Code: 236: Git Tips for Testing - Adam Johnson

In this episode, host Brian Okken and guest Adam Johnson explore essential Git features, highlighted by Adam's updated book, "Boost Your Git DX."

Key topics include

  • "cherry picking" for selective commits
  • "git stash" for managing in-progress work
  • "git diff", and specifically its --name-only flag, which provides a streamlined method for developers to identify which files have changed, which can be used to determine which tests need to be run
  • "git bisect" for efficiently pinpointing bugs

This conversation offers valuable strategies for developers at any skill level to enhance their Git proficiency and optimize their coding workflows.


Links:

  • Boost Your Git DX (Adam's book): https://adamchainz.gumroad.com/l/bygdx


Help support the show AND learn pytest:

  • The Complete pytest course is now a bundle, with each part available separately:
      • pytest Primary Power teaches the super powers of pytest that you need to learn to use pytest effectively: https://courses.pythontest.com/pytest-primary-power
      • Using pytest with Projects has lots of "when you need it" sections like debugging failed tests, mocking, testing strategy, and CI: https://courses.pythontest.com/using-pytest-with-projects
      • Then pytest Booster Rockets can help with advanced parametrization and building plugins: https://courses.pythontest.com/pytest-booster-rockets
  • Whether you need to get started with pytest today, or want to power up your pytest skills, PythonTest has a course for you: https://courses.pythontest.com

★ Support this podcast on Patreon ★: https://www.patreon.com/c/testpodcast

30 Jul 2025 2:01pm GMT

Real Python: Python's asyncio: A Hands-On Walkthrough

Python's asyncio library enables you to write concurrent code using the async and await keywords. The core building blocks of async I/O in Python are awaitable objects-most often coroutines-that an event loop schedules and executes asynchronously. This programming model lets you efficiently manage multiple I/O-bound tasks within a single thread of execution.

In this tutorial, you'll learn how Python asyncio works, how to define and run coroutines, and when to use asynchronous programming for better performance in applications that perform I/O-bound tasks.

By the end of this tutorial, you'll understand that:

  • Python's asyncio provides a framework for writing single-threaded concurrent code using coroutines, event loops, and non-blocking I/O operations.
  • For I/O-bound tasks, async I/O can often outperform multithreading-especially when managing a large number of concurrent tasks-because it avoids the overhead of thread management.
  • You should use asyncio when your application spends significant time waiting on I/O operations, such as network requests or file access, and you want to run many of these tasks concurrently without creating extra threads or processes.

Through hands-on examples, you'll gain the practical skills to write efficient Python code using asyncio that scales gracefully with increasing I/O demands.

Get Your Code: Click here to download the free sample code that you'll use to learn about async I/O in Python.

Take the Quiz: Test your knowledge with our interactive "Python's asyncio: A Hands-On Walkthrough" quiz, which covers coroutines, event loops, and efficient I/O-bound task management. You'll receive a score upon completion to help you track your learning progress.

A First Look at Async I/O

Before exploring asyncio, it's worth taking a moment to compare async I/O with other concurrency models to see how it fits into Python's broader, sometimes dizzying, landscape. Here are some essential concepts to start with:

  • Parallelism consists of executing multiple operations at the same time.
  • Multiprocessing is a means of achieving parallelism that entails spreading tasks over a computer's central processing unit (CPU) cores. Multiprocessing is well-suited for CPU-bound tasks, such as tightly bound for loops and mathematical computations.
  • Concurrency is a slightly broader term than parallelism, suggesting that multiple tasks have the ability to run in an overlapping manner. Concurrency doesn't necessarily imply parallelism.
  • Threading is a concurrent execution model in which multiple threads take turns executing tasks. A single process can contain multiple threads. Python's relationship with threading is complicated due to the global interpreter lock (GIL), but that's beyond the scope of this tutorial.

Threading is good for I/O-bound tasks. An I/O-bound job is dominated by a lot of waiting on input/output (I/O) to complete, while a CPU-bound task is characterized by the computer's cores continually working hard from start to finish.

The Python standard library has offered longstanding support for these models through its multiprocessing, concurrent.futures, and threading packages.

Now it's time to add a new member to the mix. In recent years, a separate model has been more comprehensively built into CPython: asynchronous I/O, commonly called async I/O. This model is enabled through the standard library's asyncio package and the async and await keywords.

Note: Async I/O isn't a new concept. It exists in-or is being built into-other languages such as Go, C#, and Rust.

The asyncio package is billed by the Python documentation as a library to write concurrent code. However, async I/O isn't threading or multiprocessing. It's not built on top of either of these.

Async I/O is a single-threaded, single-process technique that uses cooperative multitasking. Async I/O gives a feeling of concurrency despite using a single thread in a single process. Coroutines-or coro for short-are a central feature of async I/O and can be scheduled concurrently, but they're not inherently concurrent.
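A minimal example makes this concrete - three coroutines that each wait on a (pretend) I/O operation can overlap their waits inside one thread:

import asyncio

async def count():
    print("One")
    await asyncio.sleep(1)  # suspend here; the event loop runs other coroutines
    print("Two")

async def main():
    # Three coroutines scheduled concurrently: total runtime is ~1 second, not 3.
    await asyncio.gather(count(), count(), count())

asyncio.run(main())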

To reiterate, async I/O is a model of concurrent programming, but it's not parallelism. It's more closely aligned with threading than with multiprocessing, but it's different from both and is a standalone member of the concurrency ecosystem.

That leaves one more term. What does it mean for something to be asynchronous? This isn't a rigorous definition, but for the purposes of this tutorial, you can think of two key properties:

  1. Asynchronous routines can pause their execution while waiting for a result and allow other routines to run in the meantime.
  2. Asynchronous code facilitates the concurrent execution of tasks by coordinating asynchronous routines.

Here's a diagram that puts it all together. The white terms represent concepts, and the green terms represent the ways they're implemented:

[Diagram: Concurrency versus parallelism in Python (threading, async I/O, multiprocessing)]

For a thorough exploration of threading versus multiprocessing versus async I/O, pause here and check out the Speed Up Your Python Program With Concurrency tutorial. For now, you'll focus on async I/O.

Async I/O Explained

Async I/O may seem counterintuitive and paradoxical at first. How does something that facilitates concurrent code use a single thread in a single CPU core? Miguel Grinberg's PyCon talk explains everything quite beautifully:

Chess master Judit Polgár hosts a chess exhibition in which she plays multiple amateur players. She has two ways of conducting the exhibition: synchronously and asynchronously.

Assumptions:

  • 24 opponents
  • Judit makes each chess move in 5 seconds
  • Opponents each take 55 seconds to make a move
  • Games average 30 pair-moves (60 moves total)

Synchronous version: Judit plays one game at a time, never two at the same time, until the game is complete. Each game takes (55 + 5) * 30 == 1800 seconds, or 30 minutes. The entire exhibition takes 24 * 30 == 720 minutes, or 12 hours.

Asynchronous version: Judit moves from table to table, making one move at each table. She leaves the table and lets the opponent make their next move during the wait time. One move on all 24 games takes Judit 24 * 5 == 120 seconds, or 2 minutes. The entire exhibition is now cut down to 120 * 30 == 3600 seconds, or just 1 hour. (Source)
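As a toy sketch (mine, not from the tutorial), you can reproduce the asynchronous exhibition with asyncio, modeling Judit as a lock because she can only stand at one table at a time, and scaling seconds down so the demo finishes quickly:

import asyncio
import time

JUDIT_MOVE, OPPONENT_MOVE, PAIR_MOVES, GAMES = 5, 55, 30, 24
SCALE = 0.001  # shrink each simulated second to a millisecond

async def play_game(judit):
    for _ in range(PAIR_MOVES):
        async with judit:                           # Judit is at this table
            await asyncio.sleep(JUDIT_MOVE * SCALE)
        await asyncio.sleep(OPPONENT_MOVE * SCALE)  # opponent thinks while
                                                    # Judit visits other tables

async def main():
    judit = asyncio.Lock()
    start = time.perf_counter()
    await asyncio.gather(*(play_game(judit) for _ in range(GAMES)))
    minutes = (time.perf_counter() - start) / SCALE / 60
    print(f"Exhibition finished in roughly {minutes:.0f} simulated minutes")

asyncio.run(main())

The printed result lands near the 60 simulated minutes from the example above, since Judit's own 3600 seconds of moves are the bottleneck.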

Read the full article at https://realpython.com/async-io-python/ »



30 Jul 2025 2:00pm GMT

feedLinux Today

Rescuezilla 2.6.1 Released with Ubuntu 25.04 Base

Discover the latest Rescuezilla 2.6.1 release, now based on Ubuntu 25.04. Explore new features and enhancements for efficient system recovery.

The post Rescuezilla 2.6.1 Released with Ubuntu 25.04 Base appeared first on Linux Today.

30 Jul 2025 1:46pm GMT

Best Free and Open Source Alternatives to Autodesk Fusion

Discover the best free and open source alternatives to Autodesk Fusion. Explore powerful tools that enhance your design capabilities without the cost.

The post Best Free and Open Source Alternatives to Autodesk Fusion appeared first on Linux Today.

30 Jul 2025 1:38pm GMT

GStreamer 1.26.4 Rolls Out with Bug Fixes and Performance Tweaks

Discover the latest GStreamer 1.26.4 update featuring essential bug fixes and performance enhancements. Improve your multimedia experience today!

The post GStreamer 1.26.4 Rolls Out with Bug Fixes and Performance Tweaks appeared first on Linux Today.

30 Jul 2025 1:32pm GMT

feedDjango community aggregator: Community blog posts

Django: split ModelAdmin.get_queryset() by view

Within Django's popular admin site, you can override ModelAdmin.get_queryset() to customize the queryset used by the admin views. It's often used for performance optimizations, such as adding a select_related() call to batch-fetch related objects:

from django.contrib import admin

from example.models import Book


@admin.register(Book)
class BookAdmin(admin.ModelAdmin):
    def get_queryset(self, request):
        return super().get_queryset(request).select_related("author")

However, one thing this approach lacks is granularity-the queryset returned by get_queryset() is used for all admin views, such as the change list, change form, and any custom views that you might add. That can mean that adding an optimization in get_queryset() for one view can impose a performance cost on other views that don't need it. For example, the above select_related() call might optimize showing author details shown on the change list view, but other pages that don't show the author will still incur the cost of the join.

There isn't an easy way to customize the queryset for individual views without overriding a lot of their code. However, the get_queryset() method is passed the current request object as context, which allows you to differentiate between views based on request.resolver_match. I think the most robust way to check the current admin view from there is with the __name__ attribute of the func:

from django.contrib import admin

from example.models import Book


@admin.register(Book)
class BookAdmin(admin.ModelAdmin):
    def get_queryset(self, request):
        queryset = super().get_queryset(request)

        if request.resolver_match.func.__name__ == "changelist_view":
            queryset = queryset.select_related("author")

        return queryset

request.resolver_match.func is the current view function, which will be a method of the current ModelAdmin instance, wrapped with the AdminSite.admin_view decorator. Its __name__ attribute gives the name of the view function, which you can use to differentiate between views. For the built-in admin views, it will be one of the following:

  • changelist_view
  • change_view
  • delete_view
  • history_view

(add_view is not included here as it does not call get_queryset().)

Fin

May your admin views run swift and true,

-Adam

30 Jul 2025 4:00am GMT

29 Jul 2025

feedDjango community aggregator: Community blog posts

Our tools are still not designed for the AI future

First a disclaimer on this one: I am making the assumption that the AI trend is here to stay in some form and that an economic crash/bubble doesn't make using these tools untenable; also, I have yet to experiment with every tool out there!

With that said, a brief personal history of my usage of LLMs and the current wave of AI. I tried out ChatGPT when it was first released and was fairly impressed by the results, but the crucial missing step for me was the lack of browser integration: searching Google was still much quicker from a new tab page, and the results from ChatGPT felt isolated; there was too much friction in my workflow for it to be usable. I tried out a different product (I forget the name), which let me search from a new tab page and get AI results and normal search results in one go. This was better, but it still didn't stick, and so I kept experimenting with the tools on an ad-hoc basis, solving small challenges, but they never became a daily driver. In this period I experimented with local LLMs and Zed's AI integration.

This changed earlier this year when I experimented with Manus.im and with Claude in Zed's agent mode. Both of these unlocked ideas or wrote decent code directly into my project for me to review, saving me time that I could measure. Since then I have used Zed's agent mode more frequently; that said, I still enjoy coding myself, so I sometimes forget to use the tools available.

This daily-to-weekly usage has led me to consider what an AI-first IDE would look like. At this point the newly released Kiro comes to mind, or Cursor, or others such as Zed or VSCode plus extensions, but they are all really designed from a developer-first point of view, since their beginnings date from a time before agent workflows existed.

Personally I am looking forward to the IDE that is built from the ground up to create, run, and manage agents, following the trend of 'spec-driven development' that has started to form recently within AI circles. Generally I would expect the following:

Is this something of a wishlist? Yes, but I could easily see a version of this starting soon, because I think there is a paradigm shift coming, where the day-to-day activity goes from one of writing code on our local machines to one where review on our local machines is the priority. Let me know what you think!

29 Jul 2025 5:00am GMT

28 Jul 2025

feedDjango community aggregator: Community blog posts

User Timezones in Django

When you create a local website, the local time usually matches your country's timezone, and all visitors see times in that timezone. That's not a big issue if your country has only one timezone and your audience is local.

But when building a social platform like pybazaar.com, users are international and need to see times in their timezones. In this article, I'll show you how to handle that in Django.

Time Zone Database

Since version 4.0, Django has used the zoneinfo library for managing timezones, and it used pytz up to version 3.2. Both rely on the IANA Time Zone Database (tzdata). IANA is the same organization that manages the DNS root zone, IP addresses, and other global internet resources.

Install tzdata in your virtual environment as usual:

(venv)$ pip install --upgrade tzdata

Timezone Changes

Timezone information changes several times a year due to:

  1. Daylight Saving Time (DST) adjustments
  2. Political and border changes
  3. Shifts in standard time offset

Daylight Saving Time (DST) was first introduced in 1914 in Canada and later standardized in the U.S. in 1966. When dealing with historic dates before 1966-or future dates with uncertain timezone rules-precise time calculations can be unreliable.

from datetime import datetime

# Before U.S. DST standardization:
old_date = datetime(1960, 6, 15, 12, 0)

# DST rules may change in the future:
future_date = datetime(2030, 6, 15, 12, 0)

Some timezone changes are driven by politics:

And countries sometimes adjust their UTC offsets:

Best Practices for Django

Timezone Management for a Social Platform

For platforms with global users:

1. Enable Timezone Support in Django Settings

Set the default timezone to UTC:

# settings.py
USE_TZ = True
TIME_ZONE = "UTC"  # Store everything in UTC

2. Add a timezone Field to the Custom User Model

Use a function for dynamic timezone choices, so you don't need new migrations when the list changes.

import zoneinfo

from django.contrib.auth.models import AbstractUser
from django.db import models
from django.utils.translation import gettext_lazy as _


def get_timezone_choices():
    return [(tz, tz) for tz in sorted(zoneinfo.available_timezones())]


class User(AbstractUser):
    # ...
    timezone = models.CharField(
        _("Timezone"), max_length=50, choices=get_timezone_choices, default="UTC"
    )

3. Detect Timezone on the Frontend

Add hidden fields in your Login and Signup forms to capture the user's timezone from their browser:

document.addEventListener('DOMContentLoaded', function () {
    const userTimezone = Intl.DateTimeFormat().resolvedOptions().timeZone;
    const timezoneInput = document.getElementById('id_timezone');
    if (timezoneInput) {
        timezoneInput.value = userTimezone;
    }
});
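On the Django side, the corresponding hidden field might look like this (a sketch; the article doesn't show the form code):

from django import forms

class SignupForm(forms.Form):
    # ... username, email, password fields ...
    # Rendered with id="id_timezone" by default, so the JavaScript
    # above can fill it in before the form is submitted.
    timezone = forms.CharField(widget=forms.HiddenInput(), initial="UTC")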

You can also let users change their timezone manually in account settings.

4. Use a Custom DateTime Field in Forms

This field will convert datetimes between UTC and the user's local timezone:

import datetime
from zoneinfo import ZoneInfo
from django import forms
from django.utils import timezone
from django.utils.dateparse import parse_datetime

class TimezoneAwareDateTimeField(forms.DateTimeField):
    widget = forms.DateTimeInput(attrs={"type": "datetime-local"})

    def __init__(self, user_timezone=None, *args, **kwargs):
        self.user_timezone = user_timezone
        super().__init__(*args, **kwargs)

    def prepare_value(self, value):
        if value and self.user_timezone:
            try:
                user_tz = ZoneInfo(self.user_timezone)
                if timezone.is_aware(value):
                    value = value.astimezone(user_tz)
            except Exception:
                pass
        return value

    def to_python(self, value):
        if value in self.empty_values:
            return None
        if isinstance(value, datetime.datetime):
            result = value
        elif isinstance(value, datetime.date):
            result = datetime.datetime(value.year, value.month, value.day)
        else:
            try:
                result = parse_datetime(value.strip())
            except ValueError:
                raise forms.ValidationError(
                    self.error_messages["invalid"], code="invalid"
                )
        if not result:
            result = super().to_python(value)
        if result and self.user_timezone:
            try:
                user_tz = ZoneInfo(self.user_timezone)
                if timezone.is_naive(result):
                    result = result.replace(tzinfo=user_tz)
                result = result.astimezone(ZoneInfo("UTC"))
            except Exception:
                pass
        return result

The type="datetime-local" widget uses the browser's native date/time picker.

Use the custom field like this:

from django import forms
from django.utils.translation import gettext_lazy as _
from myproject.apps.core.form_fields import TimezoneAwareDateTimeField
from .models import Post

class PostForm(forms.ModelForm):
    class Meta:
        model = Post
        fields = ["title", "content", "published_from"]

    def __init__(self, request, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.request = request
        self.fields["published_from"] = TimezoneAwareDateTimeField(
            label=_("Published from"),
            help_text=_("Enter date and time in your local timezone."),
            required=False,
            user_timezone=self.request.user.timezone,
        )

5. Output Dates and Times in User's Timezone

{% load tz %}
{% with user_timezone=request.user.timezone|default:"UTC" %}
    {{ post.published_from|timezone:user_timezone|date:"j M, Y H:i" }}
{% endwith %}

Other Options

You can also detect the visitor's timezone in JavaScript and send it via Ajax to be saved in the Django session. Then you can use it even for anonymous users.
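A minimal sketch of that session-based variant might look like this (the view name, URL wiring, and middleware are my assumptions, not code from the article):

# views.py: store the browser-reported timezone in the session
import zoneinfo

from django.http import HttpResponse, HttpResponseBadRequest

def set_timezone(request):
    tz = request.POST.get("timezone", "")
    if tz not in zoneinfo.available_timezones():
        return HttpResponseBadRequest("Unknown timezone")
    request.session["timezone"] = tz
    return HttpResponse(status=204)

# middleware.py: activate the stored timezone for every request,
# including anonymous users
from django.utils import timezone

class TimezoneMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        tz = request.session.get("timezone")
        if tz:
            timezone.activate(zoneinfo.ZoneInfo(tz))
        else:
            timezone.deactivate()
        return self.get_response(request)

With the middleware active, timezone.activate() makes Django render aware datetimes in the visitor's timezone by default.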

Final Words

Timezones aren't so scary if you follow Django's best practices:

  • Keep USE_TZ = True and store everything in UTC.
  • Save each user's timezone (in a user model field or in the session).
  • Convert to and from local time only at the boundaries: forms and template output.

This keeps your website accurate, user-friendly, and ready for global audiences.


Cover photo by Andrey Grushnikov

28 Jul 2025 5:00pm GMT

feedPlanet Lisp

Joe Marshall: Pseudo

I was wondering what it would look like if a large language model were part of your programming language. I'm not talking about calling the model as an API, but rather embedding it as a language construct. I came up with this idea as a first cut.

The pseudo macro allows you to embed pseudocode expressions in your Common Lisp code. It takes a string description and uses an LLM to expand it into an s-expression. You can use pseudo anywhere an expression would be expected.

(defun my-func (a b)
  (pseudo "multiply b by factorial of a."))
MY-FUNC

(my-func 5 3)
360

(defun quadratic (a b c)
  (let ((d (sqrt (pseudo "compute discriminant of quadratic equation"))))
    (values (/ (+ (- b) d) (* 2 a)) (/ (- (- b) d) (* 2 a)))))
QUADRATIC

(quadratic 1 2 -3)
1.0
-3.0

The pseudo macro gathers contextual information and packages it up in a big set of system instructions to the LLM. The instructions include

pseudo sets the LLM to use a low temperature for more predictable generation. It prints the "thinking" of the LLM.

Lisp is a big win here. Since Lisp's macro system operates at the level of s-expressions, it has more contextual information available to it than a macro system that is just text expansion. The s-expression representation means that we don't need to interface with the language's parser or compiler to operate on the syntax tree of the code. Adding pseudo to a language like Java would be a much more significant undertaking.

pseudo has the usual LLM caveats:

pseudo has one dependency on SBCL: a function to extract the lexically visible variables from the macro environment. If you port it to another Common Lisp, you'll want to provide an equivalent function.

pseudo was developed using Google's Gemini as the back end, but there's no reason it couldn't be adapted to use other LLMs. To try it out, you'll need the gemini library, available at https://github.com/jrm-code-project/gemini, and a Google API key.

Download pseudo from https://github.com/jrm-code-project/pseudo.

You'll also need these dependencies.

If you try it, let me know how it goes.

28 Jul 2025 9:41am GMT

25 Jul 2025

feedKernel Planet

Linux Plumbers Conference: All Microconferences have been Accepted!

Good news! All Microconferences have been accepted and are now accepting submissions. The accepted Microconferences are:

You can start submitting topics to these Microconferences. Remember to read the Blog on what makes the ideal Microconference topic before submitting.

After that, submit your topic and make sure that you select the appropriate track that you are submitting for (they are all listed under LPC Microconference Proposals and end with MC).

25 Jul 2025 8:12pm GMT

feedDZone Java Zone

Smart-Doc: Generating gRPC API Documentation in Java Projects

Foreword

In modern Java microservices, gRPC simplifies inter-service communication with its efficient binary protocol and multi-language support. However, maintaining gRPC API documentation can be challenging as projects grow. Among various AI tools, smart-doc stands out as the optimal solution for generating gRPC API documentation in Java projects.

Advantages of Smart-Doc in Java Projects

1. Fast Speed

Smart-doc is designed to quickly scan code and generate documentation without additional runtime dependencies. It directly extracts .proto files, compiles them into Java code using protoc, and then generates documentation by parsing the Java code and comments. This process is much faster than AI tools.

25 Jul 2025 1:00pm GMT

24 Jul 2025

feedKernel Planet

Dave Airlie (blogspot): ramalama/mesa : benchmarks on my hardware and open source vs proprietary

One of my pet peeves around running local LLMs and inferencing is the sheer mountain of shit^W^W^W complexity of compute stacks needed to run any of this stuff in a mostly optimal way on a piece of hardware.

CUDA, ROCm, and Intel oneAPI all to my mind scream over-engineering on a massive scale at least for a single task like inferencing. The combination of closed source, over the wall open source, and open source that is insurmountable for anyone to support or fix outside the vendor, screams that there has to be a simpler way. Combine that with the pytorch ecosystem and insanity of deploying python and I get a bit unstuck.

What can be done about it?

llama.cpp to me seems like the best answer to the problem at present, (a rust version would be a personal preference, but can't have everything). I like how ramalama wraps llama.cpp to provide a sane container interface, but I'd like to eventually get to the point where container complexity for a GPU compute stack isn't really needed except for exceptional cases.

On the compute stack side, Vulkan exposes most features of GPU hardware in a possibly suboptimal way, but with extensions all can be forgiven. Jeff Bolz from NVIDIA's talk at Vulkanised 2025 started to give me hope that maybe the dream was possible.

The main issue I have is Jeff is writing driver code for the NVIDIA proprietary vulkan driver which reduces complexity but doesn't solve my open source problem.

Enter NVK, the open source driver for NVIDIA GPUs. Karol Herbst and myself are taking a look at closing the feature gap with the proprietary one. For mesa 25.2 the initial support for VK_KHR_cooperative_matrix was landed, along with some optimisations, but there is a bunch of work to get VK_NV_cooperative_matrix2 and a truckload of compiler optimisations to catch up with NVIDIA.

But since mesa 25.2 was coming soon I wanted to try and get some baseline figures out.

I benchmarked on two systems (because my AMD 7900XT wouldn't fit in the case), both with Ryzen CPUs. In the first system I put an RTX5080, then an RTX6000 Ada, and then the Intel A770. The second I used for the RX7900XT. The Intel SYCL stack unfortunately failed to launch inside ramalama, and I hacked llama.cpp to use the A770 MMA accelerators.

ramalama bench hf://unsloth/Qwen3-8B-GGUF:UD-Q4_K_XL

I picked this model at random, and I've no idea if it was a good idea.


Some analysis:

The token generation workload is a lot less matmul heavy than prompt processing, it also does a lot more synchronising. Jeff has stated CUDA wins here mostly due to CUDA graphs and most of the work needed is operation fusion on the llama.cpp side. Prompt processing is a lot more matmul heavy, extensions like NV_coopmat2 will help with that (NVIDIA vulkan already uses it in the above), but there may be further work to help close the CUDA gap. On AMD radv (open source) Vulkan is already better at TG than ROCm, but behind in prompt processing. Again coopmat2 like extensions should help close the gap there.

NVK is starting from a fair way behind, we just pushed support for the most basic coopmat extension and we know there is a long way to go, but I think most of it is achievable as we move forward and I hope to update with new scores on a semi regular basis. We also know we can definitely close the gap on the NVIDIA proprietary Vulkan driver if we apply enough elbow grease and register allocation :-)

I think it might also be worth putting some effort into radv coopmat2 support, I think if radv could overtake ROCm for both of these it would remove a large piece of complexity from the basic users stack.

As for Intel I've no real idea, I hope to get their SYCL implementation up and running, and maybe I should try and get my hands on a B580 card as a better baseline. When I had SYCL running once before I kinda remember it being 2-4x the vulkan driver, but there's been development on both sides.

(The graphs were generated by Gemini.)

24 Jul 2025 10:19pm GMT

23 Jul 2025

feedDZone Java Zone

Undocumented Java 16 Feature: The End-of-File Comment

While working on some code where I wanted to obscure parts of it using Unicode escapes instead of the actual source, I accidentally stumbled upon an undocumented feature that's been around since Java 16: what I call the end-of-file comment.

In Java, we typically have three types of comments:

23 Jul 2025 6:00pm GMT

22 Jul 2025

feedKernel Planet

Pete Zaitcev: Floating Point

I'm unemployed right now and I go to job interviews once in a while. One time, the company was doing another AI thing, having to do with verifying that training computations were doing something useful, and not just "dumping a stream of floating point numbers".

Until now I didn't think of it, but apparently AI is all in FP. And it reminded me how I worked in a CPU design place, where they had a group focused on FP. Those guys were doing FP since the days of the transistor. They migrated their designs, generation by generation, through TTL, ECL, Bi-CMOS, CMOS. When I heard from them last, they were tinkering with "deep sub-micron".

One remarkable part about their thing was that because they started out in transistors, their FPU didn't have any microcode. It was all in hardware. Even divisions! Just a bunch of counters that sequenced whatever necessary.

For a long time during the reign of x86, the group was somewhat de-prioritized, because many microprocessors at the time treated FP performance as an afterthought. A number of desktop CPUs shipped with no hardware FP at all. But look how the tables have turned. I honestly hope that it was not too late and AI has become a boon for the successors of my past colleagues.

22 Jul 2025 5:28pm GMT

05 Jun 2025

feedPlanet Twisted

Glyph Lefkowitz: I Think I’m Done Thinking About genAI For Now

The Problem

Like many other self-styled thinky programmer guys, I like to imagine myself as a sort of Holmesian genius, making trenchant observations, collecting them, and then synergizing them into brilliant deductions with the keen application of my powerful mind.

However, several years ago, I had an epiphany in my self-concept. I finally understood that, to the extent that I am usefully clever, it is less in a Holmesian idiom, and more, shall we say, Monkesque.

For those unfamiliar with either of the respective franchises:

Perhaps surprisingly, this tendency serves both this fictional wretch of a detective, and myself, reasonably well. I find annoying incongruities in abstractions and I fidget and fiddle with them until I end up building something that a lot of people like, or perhaps something that a smaller number of people get really excited about. At worst, at least I eventually understand what's going on. This is a self-soothing activity but it turns out that, managed properly, it can very effectively soothe others as well.

All that brings us to today's topic, which is an incongruity I cannot smooth out or fit into a logical framework to make sense. I am, somewhat reluctantly, a genAI skeptic. However, I am, even more reluctantly, exposed to genAI Discourse every damn minute of every damn day. It is relentless, inescapable, and exhausting.

This preamble about personality should hopefully help you, dear reader, to understand how I usually address problematical ideas by thinking and thinking and fidgeting with them until I manage to write some words - or perhaps a new open source package - that logically orders the ideas around it in a way which allows my brain to calm down and let it go, and how that process is important to me.

In this particular instance, however, genAI has defeated me. I cannot make it make sense, but I need to stop thinking about it anyway. It is too much and I need to give up.

My goal with this post is not to convince anyone of anything in particular - and we'll get to why that is a bit later - but rather:

  1. to set out my current understanding in one place, including all the various negative feelings which are still bothering me, so I can stop repeating it elsewhere,
  2. to explain why I cannot build a case that I think should be particularly convincing to anyone else, particularly to someone who actively disagrees with me,
  3. in so doing, to illustrate why I think the discourse is so fractious and unresolvable, and finally
  4. to give myself, and hopefully by proxy to give others in the same situation, permission to just peace out of this nightmare quagmire corner of the noosphere.

But first, just because I can't prove that my interlocutors are Wrong On The Internet, doesn't mean I won't explain why I feel like they are wrong.

The Anti-Antis

Most recently, at time of writing, there have been a spate of "the genAI discourse is bad" articles, almost exclusively written from the perspective of, not boosters exactly, but pragmatically minded (albeit concerned) genAI users, wishing for the skeptics to be more pointed and accurate in our critiques. This is anti-anti-genAI content.

I am not going to link to any of these, because, as part of their self-fulfilling prophecy about the "genAI discourse", they're also all bad.

Mostly, however, they had very little worthwhile to respond to because they were straw-manning their erstwhile interlocutors. They are all getting annoyed at "bad genAI criticism" while failing to engage with - and often failing to even mention - most of the actual substance of any serious genAI criticism. At least, any of the criticism that I've personally read.

I understand wanting to avoid a callout or Gish-gallop culture and just express your own ideas. So, I understand that they didn't link directly to particular sources or go point-by-point on anyone else's writing. Obviously I get it, since that's exactly what this post is doing too.

But if you're going to talk about how bad the genAI conversation is, without even mentioning huge categories of problem like "climate impact" or "disinformation"1 even once, I honestly don't know what conversation you're even talking about. This is peak "make up a guy to get mad at" behavior, which is especially confusing in this circumstance, because there's an absolutely huge crowd of actual people that you could already be mad at.

The people writing these pieces have historically seemed very thoughtful to me. Some of them I know personally. It is worrying to me that their critical thinking skills appear to have substantially degraded specifically after spending a bunch of time intensely using this technology which I believe has a scary risk of degrading one's critical thinking skills. Correlation is not causation or whatever, and sure, from a rhetorical perspective this is "post hoc ergo propter hoc" and maybe a little "ad hominem" for good measure, but correlation can still be concerning.

Yet, I cannot effectively respond to these folks, because they are making a practical argument that I cannot, despite my best efforts, find compelling evidence to refute categorically. My experiences of genAI are all extremely bad, but that is barely even anecdata. Their experiences are neutral-to-positive. Little scientific data exists. How to resolve this?2

The Aesthetics

As I begin to state my own position, let me lead with this: my factual analysis of genAI is hopelessly negatively biased. I find the vast majority of the aesthetic properties of genAI to be intensely unpleasant.

I have been trying very hard to correct for this bias, to try to pay attention to the facts and to have a clear-eyed view of these systems' capabilities. But the feelings are visceral, and the effort to compensate is tiring. It is, in fact, the desire to stop making this particular kind of effort that has me writing up this piece and trying to take an intentional break from the subject, despite its intense relevance.

When I say its "aesthetic qualities" are unpleasant, I don't just mean the aesthetic elements of output of genAIs themselves. The aesthetic quality of genAI writing, visual design, animation and so on, while mostly atrocious, is also highly variable. There are cherry-picked examples which look… fine. Maybe even good. For years now, there have been, famously, literally award-winning aesthetic outputs of genAI3.

While I am ideologically predisposed to see any "good" genAI art as accruing the benefits of either a survivorship bias from thousands of terrible outputs or simple plagiarism rather than its own inherent quality, I cannot deny that in many cases it is "good".

However, I am not just talking about the product, but the process; the aesthetic experience of interfacing with the genAI system itself, rather than the aesthetic experience of the outputs of that system.

I am not a visual artist and I am not really a writer4, particularly not a writer of fiction or anything else whose experience is primarily aesthetic. So I will speak directly to the experience of software development.

I have seen very few successful examples of using genAI to produce whole, working systems. There are no shortage of highly public miserable failures, particularly from the vendors of these systems themselves, where the outputs are confused, self-contradictory, full of subtle errors and generally unusable. While few studies exist, it sure looks like this is an automated way of producing a Net Negative Productivity Programmer, throwing out chaff to slow down the rest of the team.5

Juxtapose this with my aforementioned psychological motivations (to wit, I want to have everything in the computer be orderly and make sense), and I'm sure most of you would have no trouble imagining that sitting through this sort of practice would make me extremely unhappy.

Despite this plethora of negative experiences, executives are aggressively mandating the use of AI6. It looks like without such mandates, most people will not bother to use such tools, so the executives will need muscular policies to enforce its use.7

Being forced to sit and argue with a robot while it struggles and fails to produce a working output, while you have to rewrite the code at the end anyway, is incredibly demoralizing. This is the kind of activity that activates every single major cause of burnout at once.

But, at least in that scenario, the thing ultimately doesn't work, so there's a hope that after a very stressful six month pilot program, you can go to management with a pile of meticulously collected evidence, and shut the whole thing down.

I am inclined to believe that, in fact, it doesn't work well enough to be used this way, and that we are going to see a big crash. But that is not the most aesthetically distressing thing. The most distressing thing is that maybe it does work; if not well enough to actually do the work, at least ambiguously enough to fool the executives long-term.

This project, in particular, stood out to me as an example. Its author, a self-professed "AI skeptic" who "thought LLMs were glorified Markov chain generators that didn't actually understand code and couldn't produce anything novel", did a green-field project to test this hypothesis.

Now, this particular project is not totally inconsistent with a world in which LLMs cannot produce anything novel. One could imagine that, out in the world of open source, perhaps there is enough "OAuth provider written in TypeScript" blended up into the slurry of "borrowed8" training data that the minor constraint of "make it work on Cloudflare Workers" is a small tweak9. It is not fully dispositive of the question of the viability of "genAI coding".

But it is a data point related to that question, and thus it did make me contend with what might happen if it were actually a fully demonstrative example. I reviewed the commit history, as the author suggested. For the sake of argument, I tried to ask myself if I would like working this way. Just for clarity on this question, I wanted to suspend judgement about everything else; assuming:

and so on, and so on… would I like to use this magic robot that could mostly just emit working code for me? Would I use it if it were free, in all senses of the word?

No. I absolutely would not.

I found the experience of reading this commit history and imagining myself using such a tool - without exaggeration - nauseating.

Unlike many programmers, I love code review. I find that it is one of the best parts of the process of programming. I can help people learn and develop their skills, learn from them in turn, appreciate the decisions they made, and develop an impression of a fellow programmer's style. It's a great way to build a mutual theory of mind.

Of course, it can still be really annoying; people make mistakes, often can't see things I find obvious, and in particular when you're reviewing a lot of code from a lot of different people, you often end up having to repeat explanations of the same mistakes. So I can see why many programmers, particularly those more introverted than I am, hate it.

But, ultimately, when I review their code and work hard to provide clear and actionable feedback, people learn and grow and it's worth that investment in inconvenience.

The process of coding with an "agentic" LLM appears to be the process of carefully distilling all the worst parts of code review, and removing and discarding all of its benefits.

The lazy, dumb, lying robot asshole keeps making the same mistakes over and over again, never improving, never genuinely reacting, always obsequiously pretending to take your feedback on board.

Even when it "does" actually "understand" and manages to load your instructions into its context window, 200K tokens later it will slide cleanly out of its memory and you will have to say it again.

All the while, it is attempting to trick you. It gets most things right, but it consistently makes mistakes in the places that you are least likely to notice. In places where a person wouldn't make a mistake. Your brain keeps trying to develop a theory of mind to predict its behavior but there's no mind there, so it always behaves infuriatingly randomly.

I don't think I am the only one who feels this way.

The Affordances

Whatever our environments afford, we tend to do more of. Whatever they resist, we tend to do less of. So in a world where we were all writing all of our code and emails and blog posts and texts to each other with LLMs, what do they afford that existing tools do not?

As a weirdo who enjoys code review, I also enjoy process engineering. The central question of almost all process engineering is to continuously ask: how shall we shape our tools, to better shape ourselves?

LLMs are an affordance for producing more text, faster. How is that going to shape us?

Again arguing in the alternative here: assuming the text is free from errors and hallucinations and whatever, that it's all correct and fit for purpose, that means it reduces the pain of circumstances where you have to repeat yourself. Less pain! Sounds great; I don't like pain.

Every codebase has places where you need boilerplate. Every organization has defects in its information architecture that require repetition of certain information rather than a link back to the authoritative source of truth. Often, these problems persist for a very long time, because it is difficult to overcome the institutional inertia required to make real progress rather than going along with the status quo. But this is often where the highest-value projects can be found. Where there's muck, there's brass.

The process-engineering function of an LLM, therefore, is to prevent fundamental problems from ever getting fixed, to reward the rapid-fire overwhelm of infrastructure teams with an immediate, catastrophic cascade of legacy code that is now much harder to delete than it is to write.


There is a scene in Game of Thrones where Khal Drogo kills himself. He does so by replacing a stinging, burning, therapeutic antiseptic wound dressing with some cool, soothing mud. The mud felt nice, addressed the immediate pain, removed the discomfort of the antiseptic, and immediately gave him a lethal infection.

The pleasing feeling of immediate progress when one prompts an LLM to solve some problem feels like cool mud on my brain.

The Economics

We are in the middle of a mania around this technology. As I have written about before, I believe the mania will end. There will then be a crash, and a "winter". But, as I may not have stressed sufficiently, this crash will be the biggest of its kind - so big, that it is arguably not of a kind at all. The level of investment in these technologies is bananas and the possibility that the investors will recoup their investment seems close to zero. Meanwhile, that cost keeps going up, and up, and up.

Others have reported on this in detail10, and I will not reiterate it all here, but in addition to being a looming and scary industry-wide (if we are lucky; more likely it's probably "world-wide") economic threat, it is also going to drive some panicked behavior from management.

Panicky behavior from management, stressed that their idea is not panning out, is, famously, the cause of much human misery. I expect that even the "good" scenario, where some profit is ultimately achieved, will still involve mass layoffs rocking the industry, panicked re-hiring, and the destruction of large amounts of wealth.

It feels bad to think about this.

The Energy Usage

For a long time I believed that the energy impact was overstated. I am even on record, about a year ago, saying I didn't think the energy usage was a big deal. I think I was wrong about that.

It initially seemed like it was letting regular old data centers off the hook. But recently I have learned that, while the numbers are incomplete because the vendors aren't sharing information, they're also extremely bad.11

I think there's probably a version of this technology that isn't a climate emergency nightmare, but that's not the version that the general public has access to today.

The Educational Impact

LLMs are making academic cheating incredibly rampant.12

Not only is it so common as to be nearly universal, it's also extremely harmful to learning.13

For learning, genAI is a forklift at the gym.

To some extent, LLMs are simply revealing a structural rot within education and academia that has been building for decades if not centuries. But it was within those inefficiencies and the inconveniences of the academic experience that real learning was, against all odds, still happening in schools.

LLMs produce a frictionless, streamlined process where students can effortlessly glide through the entire credential, learning nothing. Once again, they dull the pain without regard to its cause.

This is not good.

The Invasion of Privacy

This is obviously only a problem with the big cloud models, but then, the big cloud models are the only ones that people actually use. If you are having conversations about anything private with ChatGPT, you are sending all of that private information directly to Sam Altman, to do with as he wishes.

Even if you don't think he is a particularly bad guy, maybe he won't even create the privacy nightmare on purpose. Maybe he will be forced to do so as a result of some bizarre kafkaesque accident.14

Imagine the scenario, for example, where a woman is tracking her cycle and uploading the logs to ChatGPT so she can chat with it about a health concern. Except, surprise, you don't have to imagine, you can just search for it, as I have personally, organically, seen three separate women on YouTube, at least one of whom lives in Texas, not only do this on camera but recommend doing this to their audiences.

Citation links withheld on this particular claim for hopefully obvious reasons.

I assure you that I am not particularly interested in either menstrual products or genAI content, and if I am seeing this more than once, it is probably a distressingly large trend.

The Stealing

The training data for LLMs is stolen. I don't mean like "pirated" in the sense where someone illicitly shares a copy they obtained legitimately; I mean their scrapers are ignoring both norms15 and laws16 to obtain copies under false pretenses, destroying other people's infrastructure17.

The Fatigue

I have provided references to numerous articles outlining rhetorical and sometimes data-driven cases for the existence of certain properties and consequences of genAI tools. But I can't prove any of these properties, either at a point in time or as a durable ongoing problem.

The LLMs themselves are simply too large to model with the usual kind of heuristics one would use to think about software. I'd sooner be able to predict the physics of dice in a casino than a 2 trillion parameter neural network. They resist scientific understanding, not just because of their size and complexity, but because unlike a natural phenomenon (which could of course be considerably larger and more complex) they resist experimentation.

The first form of genAI resistance to experiment is that every discussion is a motte-and-bailey. If I use a free model and get a bad result I'm told it's because I should have used the paid model. If I get a bad result with ChatGPT I should have used Claude. If I get a bad result with a chatbot I need to start using an agentic tool. If an agentic tool deletes my hard drive by putting os.system("rm -rf ~/") into sitecustomize.py then I guess I should have built my own MCP integration with a completely novel heretofore never even considered security sandbox or something?

What configuration, exactly, would let me make a categorical claim about these things? What specific methodological approach should I stick to, to get reliably adequate prompts?

For the record though, if the idea of the free models is that they are going to be provocative demonstrations of the impressive capabilities of the commercial models, and the results are consistently dogshit, I am finding it increasingly hard to care how much better the paid ones are supposed to be, especially since the "better"-ness cannot really be quantified in any meaningful way.

The motte-and-bailey doesn't stop there though. It's a war on all fronts. Concerned about energy usage? That's OK, you can use a local model. Concerned about infringement? That's okay, somewhere, somebody, maybe, has figured out how to train models consensually18. Worried about the politics of enriching the richest monsters in the world? Don't worry, you can always download an "open source" model from Hugging Face. It doesn't matter that many of these properties are mutually exclusive and attempting to fix one breaks two others; there's always an answer, the field is so abuzz with so many people trying to pull in so many directions at once that it is legitimately difficult to understand what's going on.

Even here though, I can see that characterizing everything this way is unfair to a hypothetical sort of person. If there is someone working at one of these thousands of AI companies that have been springing up like toadstools after a rain, and they really are solving one of these extremely difficult problems, how can I handwave that away? We need people working on problems, that's like, the whole point of having an economy. And I really don't like shitting on other people's earnest efforts, so I try not to dismiss whole fields. Given how AI has gotten into everything, in a way that e.g. cryptocurrency never did, painting with that broad a brush inevitably ends up tarring a bunch of stuff that isn't even really AI at all.

The second form of genAI resistance to experiment is the inherent obfuscation of productization. The models themselves are already complicated enough, but the products that are built around the models are evolving extremely rapidly. ChatGPT is not just a "model", and with the rapid19 deployment of Model Context Protocol tools, the edges of all these things will blur even further. Every LLM is now just an enormous unbounded soup of arbitrary software doing arbitrary whatever. How could I possibly get my arms around that to understand it?

The Challenge

I have woefully little experience with these tools.

I've tried them out a little bit, and almost every single time the result has been a disaster that has not made me curious to push further. Yet, I keep hearing from all over the industry that I should.

To some extent, I feel like the motte-and-bailey characterization above is fair; if the technology itself can really do real software development, it ought to be able to do it in multiple modalities, and there's nothing anyone can articulate to me about GPT-4o which puts it in a fundamentally different class than GPT-3.5.

But, also, I consistently hear that the subjective experience of using the premium versions of the tools is actually good, and the free ones are actually bad.

I keep struggling to find ways to try them "the right way", the way that people I know and otherwise respect claim to be using them, but I haven't managed to do so in any meaningful way yet.

I do not want to be using the cloud versions of these models with their potentially hideous energy demands; I'd like to use a local model. But there is obviously not a nicely composed way to use local models like this.

Since there are apparently zero models with ethically-sourced training data, and litigation is ongoing20 to determine the legal relationships of training data and outputs, even if I can be comfortable with some level of plagiarism on a project, I don't feel that I can introduce the existential legal risk into other people's infrastructure, so I would need to make a new project.

Others have differing opinions of course, including some within my dependency chain, which does worry me, but I still don't feel like I can freely contribute further to the problem; it's going to be bad enough to unwind any impact upstream. Even just for my own sake, I don't want to make it worse.

This especially presents a problem because I have way too much stuff going on already. A new project is not practical.

Finally, even if I did manage to satisfy all of my quirky21 constraints, would this experiment really be worth anything? The models and tools that people are raving about are the big, expensive, harmful ones. If I proved to myself yet again that a small model with bad tools was unpleasant to use, I wouldn't really be addressing my opponents' views.

I'm stuck.

The Surrender

I am writing this piece to make my peace with giving up on this topic, at least for a while. While I do idly hope that some folks might find bits of it convincing, and perhaps find ways to be more mindful with their own usage of genAI tools, and consider the harm they may be causing, that's not actually the goal. And that is not the goal because it is just so much goddamn work to prove.

Here, I must return to my philosophical hobbyhorse of sprachspiel. In this case, specifically to use it as an analytical tool, not just to understand what I am trying to say, but what the purpose for my speech is.

The concept of sprachspiel is most frequently deployed to describe the goal of the language game being played, but in game theory, that's only half the story. Speech - particularly rigorously justified speech - has a cost, as well as a benefit. I can make shit up pretty easily, but if I want to do anything remotely like scientific or academic rigor, that cost can be astronomical. In the case of developing an abstract understanding of LLMs, the cost is just too high.

So what is my goal, then? To be King Canute, standing astride the shore of "tech", whatever that is, commanding the LLM tide not to rise? This is a multi-trillion dollar juggernaut.

Even the rump, loser, also-ran fragment of it has the power to literally suffocate us in our homes22 if they so choose, completely insulated from any consequence. If the power curve starts there, imagine what the winners in this industry are going to be capable of, irrespective of the technology they're building - just with the resources they have to hand. Am I going to write a blog post that can rival their propaganda apparatus? Doubtful.

Instead, I will just have to concede that maybe I'm wrong. I don't have the skill, or the knowledge, or the energy, to demonstrate with any level of rigor that LLMs are generally, in fact, hot garbage. Intellectually, I will have to acknowledge that maybe the boosters are right. Maybe it'll be OK.

Maybe the carbon emissions aren't so bad. Maybe everybody is keeping them secret in ways that they don't for other types of datacenter for perfectly legitimate reasons. Maybe the tools really can write novel and correct code, and with a little more tweaking, it won't be so difficult to get them to do it. Maybe by the time they become a mandatory condition of access to developer tools, they won't be miserable.

Sure, I even sincerely agree, intellectual property really has been a pretty bad idea from the beginning. Maybe it's OK that we've made an exception to those rules. The rules were stupid anyway, so what does it matter if we let a few billionaires break them? Really, everybody should be able to break them (although of course, regular people can't, because we can't afford the lawyers to fight off the MPAA and RIAA, but that's a problem with the legal system, not tech).

I come not to praise "AI skepticism", but to bury it.

Maybe it really is all going to be fine. Perhaps I am simply catastrophizing; I have been known to do that from time to time. I can even sort of believe it, in my head. Still, even after writing all this out, I can't quite manage to believe it in the pit of my stomach.

Unfortunately, that feeling is not something that you, or I, can argue with.


Acknowledgments

Thank you to my patrons. Normally, I would say, "who are supporting my writing on this blog", but in the case of this piece, I feel more like I should apologize to them for this than to thank them; these thoughts have been preventing me from thinking more productive, useful things that I actually have relevant skill and expertise in; this felt more like a creative blockage that I just needed to expel than a deliberately written article. If you like what you've read here and you'd like to read more of it, well, too bad; I am sincerely determined to stop writing about this topic. But, if you'd like to read more stuff like other things I have written, or you'd like to support my various open-source endeavors, you can support my work as a sponsor!


  1. And yes, disinformation is still an issue even if you're "just" using it for coding. Even sidestepping the practical matter that technology is inherently political, validation and propagation of poor technique is a form of disinformation.

  2. I can't resolve it, that's the whole tragedy here, but I guess we have to pretend I will to maintain narrative momentum here.

  3. The story in Creative Bloq, or the NYT, if you must

  4. although it's not for lack of trying, Jesus, look at the word count on this

  5. These are sometimes referred to as "10x" programmers, because they make everyone around them 10x slower.

  6. Douglas B. Laney at Forbes, Viral Shopify CEO Manifesto Says AI Now Mandatory For All Employees

  7. The National CIO Review, AI Mandates, Minimal Use: Closing the Workplace Readiness Gap

  8. Matt O'Brien at the AP, Reddit sues AI company Anthropic for allegedly 'scraping' user comments to train chatbot Claude

  9. Using the usual tricks to find plagiarism like searching for literal transcriptions of snippets of training data did not pull up anything when I tried, but then, that's not how LLMs work these days, is it? If it didn't obfuscate the plagiarism it wouldn't be a very good plagiarism-obfuscator.

  10. David Gerard at Pivot to AI, "Microsoft and AI: spending billions to make millions", Edward Zitron at Where's Your Ed At, "The Era Of The Business Idiot", both sobering reads

  11. James O'Donnell and Casey Crownhart at the MIT Technology Review, We did the math on AI's energy footprint. Here's the story you haven't heard.

  12. Lucas Ropek at Gizmodo, AI Cheating Is So Out of Hand In America's Schools That the Blue Books Are Coming Back

  13. James D. Walsh at the New York Magazine Intelligencer, Everyone Is Cheating Their Way Through College

  14. Ashley Belanger at Ars Technica, OpenAI slams court order to save all ChatGPT logs, including deleted chats

  15. Ashley Belanger at Ars Technica, AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt

  16. Blake Brittain at Reuters, Judge in Meta case warns AI could 'obliterate' market for original works

  17. Xkeeper, TCRF has been getting DDoSed

  18. Kate Knibbs at Wired, Here's Proof You Can Train an AI Model Without Slurping Copyrighted Content

  19. and, I should note, extremely irresponsible

  20. Porter Anderson at Publishing Perspectives, Meta AI Lawsuit: US Publishers File Amicus Brief

  21. It feels bizarre to characterize what feel like baseline ethical concerns this way, but the fact remains that within the "genAI community", this places me into a tiny and obscure minority.

  22. Ariel Wittenberg for Politico, 'How come I can't breathe?': Musk's data company draws a backlash in Memphis

05 Jun 2025 5:22am GMT

17 Apr 2025

feedPlanet Twisted

Glyph Lefkowitz: Stop Writing `__init__` Methods

The History

Before dataclasses were added to Python in version 3.7 - in June of 2018 - the __init__ special method had an important use. If you had a class representing a data structure - for example a 2DCoordinate, with x and y attributes - you would want to be able to construct it as 2DCoordinate(x=1, y=2), which would require you to add an __init__ method with x and y parameters.
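
To make that concrete, here is a minimal sketch of the old pattern (the class name is spelled out below, since "2DCoordinate" is not actually a legal Python identifier):

class TwoDCoordinate:
    def __init__(self, x: float, y: float) -> None:
        # nothing but attribute assignment
        self.x = x
        self.y = y

coord = TwoDCoordinate(x=1, y=2)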

The other options available at the time all had pretty bad problems:

  1. You could remove 2DCoordinate from your public API and instead expose a make_2d_coordinate function and make it non-importable, but then how would you document your return or parameter types?
  2. You could document the x and y attributes and make the user assign each one themselves, but then 2DCoordinate() would return an invalid object.
  3. You could default your coordinates to 0 with class attributes, and while that would fix the problem with option 2, this would now require all 2DCoordinate objects to be not just mutable, but mutated at every call site.
  4. You could fix the problems with option 1 by adding a new abstract class that you could expose in your public API, but this would explode the complexity of every new public class, no matter how simple. To make matters worse, typing.Protocol didn't even arrive until Python 3.8, so, in the pre-3.7 world this would condemn you to using concrete inheritance and declaring multiple classes even for the most basic data structure imaginable.

Also, an __init__ method that does nothing but assign a few attributes doesn't have any significant problems, so it is an obvious choice in this case. Given all the problems that I just described with the alternatives, it makes sense that it became the obvious default choice, in most cases.

However, by accepting "define a custom __init__" as the default way to allow users to create your objects, we make a habit of beginning every class with a pile of arbitrary code that gets executed every time it is instantiated.

Wherever there is arbitrary code, there are arbitrary problems.

The Problems

Let's consider a data structure more complex than one that simply holds a couple of attributes. We will create one that represents a reference to some I/O in the external world: a FileReader.

Of course Python has its own open-file object abstraction, but I will be ignoring that for the purposes of the example.

Let's assume a world where we have the following functions, in an imaginary fileio module:

Our hypothetical fileio.open returns an integer representing a file descriptor1, fileio.read allows us to read length bytes from an open file descriptor, and fileio.close closes that file descriptor, invalidating it for future use.
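
Concretely, the imaginary module might look something like this (a sketch only; the os-based bodies are hypothetical stand-ins for whatever real I/O layer you prefer):

import os

def open(path: str) -> int:
    # Open a file for reading; return an OS-level file descriptor.
    return os.open(path, os.O_RDONLY)

def read(fd: int, length: int) -> bytes:
    # Read up to `length` bytes from an open file descriptor.
    return os.read(fd, length)

def close(fd: int) -> None:
    # Close the file descriptor, invalidating it for future use.
    os.close(fd)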

With the habit that we have built from writing thousands of __init__ methods, we might want to write our FileReader class like this:

class FileReader:
    def __init__(self, path: str) -> None:
        self._fd = fileio.open(path)
    def read(self, length: int) -> bytes:
        return fileio.read(self._fd, length)
    def close(self) -> None:
        fileio.close(self._fd)

For our initial use-case, this is fine. Client code creates a FileReader by doing something like FileReader("./config.json"), which always creates a FileReader that maintains its file descriptor int internally as private state. This is as it should be; we don't want user code to see or mess with _fd, as that might violate FileReader's invariants. All the necessary work to construct a valid FileReader - i.e. the call to open - is always taken care of for you by FileReader.__init__.

However, additional requirements will creep in, and as they do, FileReader.__init__ becomes increasingly awkward.

Initially we only care about fileio.open, but later, we may have to deal with a library that has its own reasons for managing the call to fileio.open by itself, and wants to give us an int that we use as our _fd. Now we have to resort to weird workarounds like:

def reader_from_fd(fd: int) -> FileReader:
    fr = object.__new__(FileReader)
    fr._fd = fd
    return fr

Now, all those nice properties that we got from trying to force object construction to give us a valid object are gone. reader_from_fd's type signature, which takes a plain int, has no way of even suggesting to client code how to ensure that it has passed in the right kind of int.

Testing is much more of a hassle, because we have to patch in our own copy of fileio.open any time we want an instance of a FileReader in a test without doing any real-life file I/O, even if we could (for example) share a single file descriptor among many FileReader instances for testing purposes.

All of this also assumes a fileio.open that is synchronous. Although for literal file I/O this is more of a hypothetical concern, there are many types of networked resource which are really only available via an asynchronous (and thus: potentially slow, potentially error-prone) API. If you've ever found yourself wanting to type async def __init__(self): ... then you have seen this limitation in practice.

Comprehensively describing all the possible problems with this approach would end up being a book-length treatise on a philosophy of object oriented design, so I will sum up by saying that the cause of all these problems is the same: we are inextricably linking the act of creating a data structure with whatever side-effects are most often associated with that data structure. If they are "often" associated with it, then by definition they are not "always" associated with it, and all the cases where they aren't associated become unwieldy and potentially broken.

Defining an __init__ is an anti-pattern, and we need a replacement for it.

The Solutions

I believe this tripartite assemblage of design techniques will address the problems raised above:

  1. dataclass attributes, to generate an __init__ that does nothing but assign them,
  2. classmethod factories, to encapsulate any construction-time side effects, and
  3. NewType, to enforce object validity.

Using dataclass attributes to create an __init__ for you

To begin, let's refactor FileReader into a dataclass. This does get us an __init__ method, but it won't be an arbitrary one we define ourselves; it will have the useful constraint enforced on it that it does nothing but assign attributes.

from dataclasses import dataclass

@dataclass
class FileReader:
    _fd: int
    def read(self, length: int) -> bytes:
        return fileio.read(self._fd, length)
    def close(self) -> None:
        fileio.close(self._fd)

Except... oops. In fixing the problems that we created with our custom __init__ that calls fileio.open, we have re-introduced several problems that it solved:

  1. We have removed all the convenience of FileReader("path"). Now the user needs to import the low-level fileio.open again, making the most common type of construction both more verbose and less discoverable; if we want users to know how to build a FileReader in a practical scenario, we will have to add something in our documentation to point at a separate module entirely.
  2. There's no enforcement of the validity of _fd as a file descriptor; it's just some integer, which the user could easily pass an incorrect instance of, with no error.

In isolation, dataclass by itself can't solve all our problems, so let's add in the second technique.

Using classmethod factories to create objects

We don't want to require any additional imports, or require users to go looking at any other modules - or indeed anything other than FileReader itself - to figure out how to create a FileReader for its intended usage.

Luckily we have a tool that can easily address all of these concerns at once: @classmethod. Let's define a FileReader.open class method:

from dataclasses import dataclass
from typing import Self

@dataclass
class FileReader:
    _fd: int
    @classmethod
    def open(cls, path: str) -> Self:
        return cls(fileio.open(path))

Now, your callers can replace FileReader("path") with FileReader.open("path"), and get all the same benefits.
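
This also addresses the earlier testing hassle: a test can now hand the dataclass a descriptor it already controls, with no patching of fileio.open required. A minimal sketch, assuming a Unix-like platform (and that fileio wraps the real OS calls):

import os

fd = os.open(os.devnull, os.O_RDONLY)  # a descriptor we control; no patching needed
reader = FileReader(fd)                # plain dataclass construction: just assigns _fd
assert reader.read(4) == b""           # /dev/null always reads as empty
reader.close()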

Additionally, if we needed to await fileio.open(...), and thus we needed its signature to be @classmethod async def open, we are freed from the constraint of __init__ as a special method. There is nothing that would prevent a @classmethod from being async, or indeed, from having any other modification to its return value, such as returning a tuple of related values rather than just the object being constructed.
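
For illustration, a sketch of what such an async factory might look like; aio_fileio here is a hypothetical asynchronous counterpart to our imaginary fileio module:

from dataclasses import dataclass
from typing import Self

@dataclass
class AsyncFileReader:
    _fd: int

    @classmethod
    async def open(cls, path: str) -> Self:
        # An async factory: impossible to express as __init__ (which must
        # synchronously return None), but trivial as a classmethod.
        return cls(await aio_fileio.open(path))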

Using NewType to address object validity

Next, let's address the slightly trickier issue of enforcing object validity.

Our type signature calls this thing an int, and indeed, that is unfortunately what the lower-level fileio.open gives us, and that's beyond our control. But for our own purposes, we can be more precise in our definitions, using NewType:

from typing import NewType
FileDescriptor = NewType("FileDescriptor", int)

There are a few different ways to address the underlying library, but for the sake of brevity and to illustrate that this can be done with zero run-time overhead, let's just insist to Mypy that we have versions of fileio.open, fileio.read, and fileio.close which actually already take FileDescriptor integers rather than regular ones.

from typing import Callable
_open: Callable[[str], FileDescriptor] = fileio.open  # type:ignore[assignment]
_read: Callable[[FileDescriptor, int], bytes] = fileio.read
_close: Callable[[FileDescriptor], None] = fileio.close

We do of course have to slightly adjust FileReader, too, but the changes are very small. Putting it all together, we get:

from dataclasses import dataclass
from typing import Self

@dataclass
class FileReader:
    _fd: FileDescriptor
    @classmethod
    def open(cls, path: str) -> Self:
        return cls(_open(path))
    def read(self, length: int) -> bytes:
        return _read(self._fd, length)
    def close(self) -> None:
        _close(self._fd)

Note that the main technique here is not necessarily using NewType specifically, but rather aligning an instance's property of "has all attributes set" as closely as possible with an instance's property of "fully valid instance of its class"; NewType is just a handy tool to enforce any necessary constraints on the places where you need to use a primitive type like int, str or bytes.
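
To see that enforcement in action, here is a small sketch of what Mypy will and won't accept (variable names are illustrative; NewType is erased at run time, so only the type checker complains):

fd = FileDescriptor(3)   # fine: explicit, deliberate construction
ok = FileReader(fd)      # fine: matches the declared field type
bad = FileReader(3)      # Mypy error: "int" is not "FileDescriptor"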

In Summary - The New Best Practice

From now on, when you're defining a new Python class:

  • do declare your fields as dataclass2 attributes, and let the generated __init__ simply assign them;
  • do provide @classmethod factories for any construction that involves side effects, I/O, or derived values;
  • do use NewType so that "all attributes assigned" coincides with "valid instance of its class"; and
  • don't write a custom __init__.3

If you define all your classes this way, you will get all the benefits of a custom __init__ method:

  • convenient, discoverable construction directly on the class itself, and
  • confidence that a fully-constructed instance is a valid one.

Along with some nice new benefits:

  • factories can be async, can fail cleanly, or can return more than just the new instance;
  • objects wrapping externally-managed resources (like our reader_from_fd example) get a first-class construction path rather than an object.__new__ hack; and
  • tests can build instances directly, without patching out side-effecting calls.

Before dataclasses, it was always a bit weird that such a basic feature of the Python language - giving data to a data structure to make it valid - required overriding a method with 4 underscores in its name. __init__ stuck out like a sore thumb. Other such methods like __add__ or even __repr__ were inherently customizing esoteric attributes of classes.

For many years now, that historical language wart has been resolved. @dataclass, @classmethod, and NewType give you everything you need to build classes which are convenient, idiomatic, flexible, testable, and robust.


Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you've read here and you'd like to read more of it, or you'd like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics like "but what is a 'class', really?".


  1. If you aren't already familiar, a "file descriptor" is an integer which has meaning only within your program; you tell the operating system to open a file, it says "I have opened file 7 for you", and then whenever you refer to "7" it is that file, until you close(7).

  2. Or an attrs class, if you're nasty.

  3. Unless you have a really good reason to, of course. Backwards compatibility, or compatibility with another library, might be good reasons to do that. Or certain types of data-consistency validation which cannot be expressed within the type system. The most common example of these would be a class that requires consistency between two different fields, such as a "range" object where start must always be less than end. There are always exceptions to these types of rules. Still, it's pretty much never a good idea to do any I/O in __init__, and nearly all of the remaining stuff that may sometimes be a good idea in edge-cases can be achieved with a __post_init__ rather than writing a literal __init__.

17 Apr 2025 10:35pm GMT

01 Apr 2025

feedPlanet Twisted

Glyph Lefkowitz: A Bigger Database

A Database File

When I was 10 years old, and going through a fairly difficult time, I was lucky enough to come into the possession of a piece of software called Claris FileMaker Pro™.

FileMaker allowed its users to construct arbitrary databases, and to associate their tables with a customized visual presentation. FileMaker also had a rudimentary scripting language, which would allow users to imbue these databases with behavior.

As a mentally ill pre-teen, lacking a sense of control over anything or anyone in my own life, including myself, I began building a personalized database to catalogue the various objects and people in my immediate vicinity. If one were inclined to be generous, one might assess this behavior and say I was systematically taxonomizing the objects in my life and recording schematized information about them.

As I saw it at the time, if I collected the information, I could always use it later, to answer questions that I might have. If I didn't collect it, then what if I needed it? Surely I would regret it! Thus I developed a categorical imperative to spend as much of my time as possible collecting and entering data about everything that I could reasonably arrange into a common schema.

Having thus summoned this specter of regret for all lost data-entry opportunities, it was hard to dismiss. We might label it "Claris's Basilisk", for obvious reasons.

Therefore, a less-generous (or more clinically-minded) observer might have replaced the word "systematically" with "obsessively" in the assessment above.

I also began writing what scripts were within my marginal programming abilities at the time, just because I could: things like computing the sum of every street number of every person in my address book. Why was this useful? Wrong question: the right question is "was it possible" to which my answer was "yes".

If I was obliged to collect all the information which I could observe - in case it later became interesting - I was similarly obliged to write and run every program I could. It might, after all, emit some other interesting information.

I was an avid reader of science fiction as well.

I had this vague sense that computers could kind of think. This resulted in a chain of reasoning that went something like this:

  1. human brains are kinda like computers,
  2. the software running in the human brain is very complex,
  3. I could only write simple computer programs, but,
  4. when you really think about it, a "complex" program is just a collection of simpler programs

Therefore: if I just kept collecting data, collecting smaller programs that could solve specific problems, and connecting them all together in one big file, eventually the database as a whole would become self-aware and could solve whatever problem I wanted. I just needed to be patient; to "keep grinding" as the kids would put it today.

I still feel like this is an understandable way to think - if you are a highly depressed and anxious 10-year-old in 1990.

Anyway.


35 Years Later

OpenAI is a company that produces transformer architecture machine learning generative AI models; their current generation was trained on about 10 trillion words, obtained in a variety of different ways from a large variety of different, unrelated sources.

A few days ago, on March 26, 2025 at 8:41 AM Pacific Time, Sam Altman took to "X™, The Everything App™," and described the trajectory of his career of the last decade at OpenAI as, and I quote, a "grind for a decade trying to help make super-intelligence to cure cancer or whatever" (emphasis mine).

I really, really don't want to become a full-time AI skeptic, and I am not an expert here, but I feel like I can identify a logically flawed premise when I see one.

This is not a system-design strategy. It is a trauma response.

You can't cure cancer "or whatever". If you want to build a computer system that does some thing, you actually need to hire experts in that thing, and have them work to both design and validate that the system is fit for the purpose of that thing.


Aside: But... are they, though?

I am not an oncologist; I do not particularly want to be writing about the specifics here, but, if I am going to make a claim like "you can't cure cancer this way" I need to back it up.

My first argument - and possibly my strongest - is that cancer is not cured.

QED.

But I guess, to Sam's credit, there is at least one other company partnering with OpenAI to do things that are specifically related to cancer. However, that company is still in a self-described "initial phase" and it's not entirely clear that it is going to work out very well.

Almost everything I can find about it online was from a PR push in the middle of last year, so it all reads like a press release. I can't easily find any independently-verified information.

A lot of AI hype is like this. A promising demo is delivered; claims are made that surely if the technology can solve this small part of the problem now, within 5 years surely it will be able to solve everything else as well!

But even the light-on-content puff-pieces tend to hedge quite a lot. For example, as the Wall Street Journal quoted one of the users initially testing it (emphasis mine):

The most promising use of AI in healthcare right now is automating "mundane" tasks like paperwork and physician note-taking, he said. The tendency for AI models to "hallucinate" and contain bias presents serious risks for using AI to replace doctors. Both Color's Laraki and OpenAI's Lightcap are adamant that doctors be involved in any clinical decisions.

I would probably not personally characterize "'mundane' tasks like paperwork and … note-taking" as "curing cancer". Maybe an oncologist could use some code I developed too; even if it helped them, I wouldn't be stealing valor from them on the curing-cancer part of their job.

Even fully giving it the benefit of the doubt that it works great, and improves patient outcomes significantly, this is medical back-office software. It is not super-intelligence.

It would not even matter if it were "super-intelligence", whatever that means, because "intelligence" is not how you do medical care or medical research. It's called "lab work" not "lab think".

To put a fine point on it: biomedical research fundamentally cannot be done entirely by reading papers or processing existing information. It cannot even be done by testing drugs in computer simulations.

Biological systems are enormously complex, and medical research on new therapies inherently requires careful, repeated empirical testing to validate the correspondence of existing research with reality. Not "an experiment", but a series of coordinated experiments that all test the same theoretical model. The data (which, in an LLM context, is "training data") might just be wrong; it may not reflect reality, and the only way to tell is to continuously verify it against reality.

Previous observations can be tainted by methodological errors, by data fraud, and by operational mistakes by practitioners. If there were a way to do verifiable development of new disease therapies without the extremely expensive ladder going from cell cultures to animal models to human trials, we would already be doing it, and "AI" would just be an improvement to efficiency of that process. But there is no way to do that and nothing about the technologies involved in LLMs is going to change that fact.


Knowing Things

The practice of science - indeed any practice of the collection of meaningful information - must be done by intentionally and carefully selecting inclusion criteria, methodically and repeatedly curating our data, building a model that operates according to rules we understand and can verify, and verifying the data itself with repeated tests against nature. We cannot just hoover up whatever information happens to be conveniently available with no human intervention and hope it resolves to a correct model of reality by accident. We need to look where the keys are, not where the light is.

Piling up more and more information in a haphazard and increasingly precarious pile will not allow us to climb to the top of that pile, all the way to heaven, so that we can attack and dethrone God.

Eventually, we'll just run out of disk space, and then lose the database file when the family gets a new computer anyway.


Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you've read here and you'd like to read more of it, or you'd like to support my various open-source endeavors, you can support my work as a sponsor! Special thanks also to Itamar Turner-Trauring and Thomas Grainger for pre-publication feedback on this article; any errors of course remain my own.

01 Apr 2025 12:47am GMT

29 Nov 2024

feedPlanet Plone - Where Developers And Integrators Write

Maurits van Rees: Lightning talks Friday

Bonnie Tyler Sprint

On 12 August 2026 there is a total solar eclipse that can be seen from Valencia, Spain. So we organise a sprint there.

This conference

We had 291 participants, 234 in person and 57 online. 13 Brazilian states (that is all of them), 14 countries.

24.5 percent of participants were women, up from 13% in 2013, so that has gone up, but we are not there yet. Thank you to PyLadies and Django Girls for making this happen.

We had more than 80 presenters, about 30 lightning talks, and lots of talk in the hallways.

Thanks also to the team!

Ramiro Luz: Yoga time

Yoga exercise.

Rikupekka: University case student portal

We have a student portal at the university. But mostly:

Welcome to Jyväskylä University in Finland for Plone Conference 2025, October 13-19!

Jakob: Beethovensprint

26-30 May 2025 in Bonn, Germany.

Afterwards, from May 30 to June 1, there will be FedCon in Bonn, a sci-fi convention.

Piero/Victor: BYOUI

Add-ons first development with @plone/registry. See https://plone-registry.readthedocs.io/

It allows for development that is framework agnostic, so it is not only for Plone. It is around configuration that can be extended and injected, which is tricky in most javascript frameworks.

Imagine it.

Ana Dulce: 3D printing

For a difficult model I had to trust the process; it took a week, but it worked.

Renan & Iza: Python Brasil

We organised the Python Brasil conference from 16 to 23 October this year in Rio de Janeiro.

Next year 21-27 October in São Paulo.

Erico: Python Cerrado

31 July to 2 August 2025 is the next Python Cerrado conference.

29 Nov 2024 10:25pm GMT

Maurits van Rees: Paul Roeland: The value of longevity

Link to talk information on Plone conference website.

I work for the Clean Clothes Campaign: https://cleanclothes.org/

After three large disasters in factories in 2012 and 2013, with over 1000 deaths, it took three years to get an agreement with clothing manufacturers for 30 million dollars in compensation. It does not bring lives back, but it helps the survivors.

See Open Supply Hub for open data that we collected, for checking which brands are produced in which factories.

Documenting history matters. Stories must be told.

The global clothing industry is worth around 1.8 trillion dollars; if it were a country, that would put it in 12th place in the world. It employs 75 million workers.

Our strongest weapon: backlinks. We have links from OECD, UN, wikipedia, school curriculum, books. Especially those last two don't change ever, so you should never change urls.

Plone: enable the sitemap, please, why not by default? Create a good robots.txt. I check Google Search Console weekly, looking for broken links. Tag early, tag often, great tool, even if you have an AI do it.

Our website: started 1998 written in Notepad, 2004 Dreamweaver, 2006 Bluefish, 2010 Joomla, 2013 Plone 4, 2020 Castle CMS (opinionated distribution of Plone, but does not really exist anymore) 2024 Plone 6 with Volto Light Theme (work in progress). Thank you kitconcept for all the help, especially Jonas.

Migrations are painful. Along the years we used wget to csv to SQL to csv, Python script, "Franken-mogrifier", collective.exportimport.

Lessons learned: stable urls are awesome, migrations are painful. Please don't try to salvage CSS from your old site, just start fresh in your new system. Do not try to migrate composite pages or listings.

What if your website does not provide an export? Use wget, still works and is better than httrack. sed/awk/regex are your friend. archivebox (WARC).

Document your steps for your own sanity.

To manage JSON, jq or jello can be used. sq is a Swiss Army knife for json/sql/csv. emuto is a hybrid between jq and GraphQL.

Normalize import/export. We have `plone.exportimport` in core now.

In the future I would like a plone exporter script that accepts a regex and exports only matching pages. Switch backends: ZODB, relstorage, nick, quantum-db. Sitewide search/replace/sed. Sneakernet is useful in difficult countries where you cannot send data over the internet: so export to a usb stick.

A backup is only a backup if it regularly gets restored so you know that it works.

  • Keeping content and URL stability is a superpower.
  • Assuming that export/import/backup/restore/migration are rare occurrences, is wrong.
  • Quick export/import is very useful.

Do small migrations; treat them as maintenance. Don't fall too far behind. A large migration once every five years will be costly. Do a small migration every year. Do your part. Clients should also do their part, by budgeting for this yearly. That is how budgeting works. Use every iteration to review custom code.

Make your sites live long and prosper.

29 Nov 2024 8:58pm GMT

Maurits van Rees: Fred van Dijk: Run Plone in containers on your own cluster with coolify.io

Link to talk information on Plone conference website.

Sorry, I ran out of time trying to set up https://coolify.io

So let's talk about another problem. Running applications (stacks) in containers is the future. Well: abstraction and isolation is the future, and containers is the current phase.

I am on the Plone A/I team, with Paul, Kim, and Erico. All senior sysadmins, so we kept things running. In 2022 we worked on containerisation. Kubernetes was the kool kid then, but Docker Swarm was easier. Check out Erico's training with the new cookieplone templates.

Doing devops well is hard. You have a high workload, but still need to keep learning new stuff to keep up with what is changing.

I want to plug Coolify, which is a full open source product. "Self-hosting with super powers." The main developer, Andras Bacsal, believes in open source and 'hates' pay by usage cloud providers with a vengeance.

Coolify is still docker swarm. We also want Kubernetes support. But we still need sysadmins. Someone will still need to install coolify, and keep it updated.

I would like to run an online DevOps course somewhere January-March 2025. 4-6 meetings of 2 hours, maybe Friday afternoon. Talk through devops and sysadmin concepts, show docker swarm, try coolify, etc.

29 Nov 2024 7:58pm GMT