01 Jun 2026

Feed: Planet Python

death and gravity: DynamoDB crash course: part 3 – design patterns

Previously

This is the last part of a series covering core DynamoDB concepts. The goal is to help you understand idiomatic usage and trade-offs in under an hour.

In the first part, I summarized DynamoDB's main proposition to its users like so:

data modeling complexity is always preferable to complexity coming from infrastructure maintenance, availability, and scalability

Today, we're looking at the design patterns that help manage this complexity, making the most of its data model and features and working around its limits.

Composite keys

Composite (aka synthetic) keys underpin most other patterns.

The idea is simple: keys don't have to be natural attributes of your data, they can be composed of other attributes that enable specific access patterns. This works both with table and index keys.

How do you compose keys? By string concatenation, of course! Careful with numbers, though: sort keys built this way are compared as strings, so numbers need zero-padding to sort in numeric order.
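A minimal sketch of why padding matters, using plain Python string sorting as a stand-in for how string sort keys are ordered. The composite_key and padded helpers, the track prefix, and the width of 5 are made up for illustration:

```python
def composite_key(*parts, sep="#"):
    """Join attribute values into a single key string."""
    return sep.join(str(part) for part in parts)


def padded(number, width=5):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    return str(number).zfill(width)


track_numbers = [2, 10, 1]

# Without padding, "10" sorts before "2" because keys are compared as strings.
unpadded = sorted(composite_key("track", n) for n in track_numbers)

# With padding, string order and numeric order agree.
with_padding = sorted(composite_key("track", padded(n)) for n in track_numbers)

print(unpadded)
print(with_padding)
```

The width has to be chosen up front to fit the largest number you expect, since changing it later means rewriting existing keys.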

Example

To sort lexicographically by more than one attribute, you group them in a sort key, e.g. {Album}#{Song}.

Or, in single table design, you distinguish between item types by prefixing keys with the type, e.g. album#{Album}.

Or, in partition key sharding, you spread the load on a GSI partition by splitting one partition key into multiple ones, e.g. {Genre}#{shard}.

But denormalization has its trade-offs. For sort key {Album}#{Song}, should Album and Song also be separate attributes? If yes, you need to ensure they never change, but you can use them in indexes (e.g. a GSI with Album as primary key). If no, items can't become inconsistent, but you need to parse the key to get them.

This was inconvenient enough that DynamoDB finally added multi-attribute keys support to GSIs in 2025 (although not inconvenient enough to also add it to tables).

See also

Single table design

The AWS guidance is to use as few tables as possible:

As a general rule, you should maintain as few tables as possible in a DynamoDB application. [...] A single table with inverted indexes can usually enable simple queries to create and retrieve the complex hierarchical data structures required by your application.

This culminates in single table design, where you put all entities in the same table, and tell them apart based on the key format, usually using a prefix. With this pattern, one DynamoDB table corresponds to a whole relational database.

The easiest way is to put items related to a top-level entity on the same partition. The main benefit is that joins with the top-level entity become trivial. A second one is that you can sometimes get different entity types in a single query, which can be both faster and cheaper (fewer queries; small items pack into fewer capacity units).

Example

You can group items related to an Artist on the same partition, with sort keys like artist, album#{Album}, and song#{Album}#{Song}.

# table Music (partition key: Artist, sort key: sk)
Solar Fields: !btree
  'album#Leaving Home': { Genre: Electronic }
  'artist': { Variations: [ Solarfields ] }
  'song#Leaving Home#Air Song': { Duration: 741 }
  'song#Leaving Home#Monogram': { Duration: 944 }

Besides getting items of a single type, you can also get artist details and albums in a single query (sk BETWEEN "album#" AND "artist").

But choose wisely - queries can have only one sort key condition, so you can't also get album details and songs in a single query with this schema; sort keys {Album} and {Album}#{Song} would do it, at the expense of the first query.
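To see which items a sort key condition picks up, here is a small in-memory sketch of the Solar Fields partition above. It is not DynamoDB code: the between and begins_with helpers are stand-ins for the corresponding sort key conditions, and plain Python string comparison is assumed to match the key ordering for these ASCII keys.

```python
from bisect import bisect_left, bisect_right

# One partition of the Music table: sort key -> other attributes.
partition = {
    "album#Leaving Home": {"Genre": "Electronic"},
    "artist": {"Variations": ["Solarfields"]},
    "song#Leaving Home#Air Song": {"Duration": 741},
    "song#Leaving Home#Monogram": {"Duration": 944},
}
sort_keys = sorted(partition)


def between(low, high):
    """Sort keys k with low <= k <= high (both ends inclusive)."""
    return sort_keys[bisect_left(sort_keys, low):bisect_right(sort_keys, high)]


def begins_with(prefix):
    """Sort keys that start with the given prefix, in key order."""
    matches = []
    for key in sort_keys[bisect_left(sort_keys, prefix):]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches


# Artist details and albums come back together, because "album#..." keys
# and the "artist" key are adjacent in sort order.
artist_and_albums = between("album#", "artist")

# Songs of one album form their own contiguous range.
songs_on_album = begins_with("song#Leaving Home#")

print(artist_and_albums)
print(songs_on_album)
```

Because a condition selects one contiguous range of sort keys, item types you want in the same query have to sit next to each other in key order.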

Sometimes, it can be useful to put some sub-entities on dedicated partitions, accepting that joins will have to be done in code.

Example

In the example above, a popular artist with lots of songs can lead to:

Perhaps it's better to put the songs in each album on separate partitions:

# table Music (partition key: pk, sort key: sk)
'artist#Solar Fields': !btree
  'album#Leaving Home': { Genre: Electronic }
  'artist': { Variations: [ Solarfields ] }
'song#Solar Fields#Leaving Home': !btree
  'Air Song': { Duration: 741 }
  'Monogram': { Duration: 944 }

This spreads the load onto multiple partitions, which should fix throttling.

The downside is that listing the songs for an artist is now a two-step operation: first one query for the albums, then one query per album for the songs. The upside is that the per-album queries can be done in parallel, which wasn't possible before.

A consequence of this design is that you need a GSI to list items of a specific type (otherwise, you have to do a full table scan). Of note, exceeding the GSI partition throughput limit will cause write throttling on the base table; in the absence of a natural high-cardinality GSI partition key, sharding or some other composite key can help.

A final benefit of using a single table is better utilization with provisioned mode: usage gets averaged across entities and tends to be smoother, and spikes can share the same spare capacity.

See also

GSI overloading

GSI overloading is just single table design for indexes - you put different values in the GSI key attributes, depending on item type. This way you can index more attributes than the 20 GSIs per table quota, and it can be cheaper too, since, like with tables, fewer indexes make better use of spare provisioned capacity.

Example

For a table that contains both artist and album items, a single GSI can be used for entirely different purposes:

# table Music (partition key: Artist, sort key: sk)
2 Bit Pie: !btree
  'album#2 Pie Island': { gsi1pk: 'album#Electronic' }
  'artist': { gsi1pk: 'artist#United Kingdom' }
Ishome: !btree
  'album#Confession': { gsi1pk: 'album#Electronic' }
  'artist': { gsi1pk: 'artist#Russia' }
# GSI GSI1 (partition key: gsi1pk, sort key: Artist)
'artist#United Kingdom': !btree
  2 Bit Pie: { sk: 'artist' }
'artist#Russia': !btree
  Ishome: { sk: 'artist' }
'album#Electronic': !btree
  2 Bit Pie: { sk: 'album#2 Pie Island' }
  Ishome: { sk: 'album#Confession' }
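The projection above can be sketched in memory as follows. This only illustrates how one overloaded key attribute yields differently purposed index partitions; project_index is a hypothetical helper, not a DynamoDB API.

```python
from collections import defaultdict

# Base table items; gsi1pk holds a different kind of value per item type.
items = [
    {"Artist": "2 Bit Pie", "sk": "album#2 Pie Island", "gsi1pk": "album#Electronic"},
    {"Artist": "2 Bit Pie", "sk": "artist", "gsi1pk": "artist#United Kingdom"},
    {"Artist": "Ishome", "sk": "album#Confession", "gsi1pk": "album#Electronic"},
    {"Artist": "Ishome", "sk": "artist", "gsi1pk": "artist#Russia"},
]


def project_index(items, pk_attr, sk_attr):
    """Group items by the index partition key, ordered by the index sort key."""
    index = defaultdict(list)
    for item in items:
        index[item[pk_attr]].append(item)
    for rows in index.values():
        rows.sort(key=lambda row: row[sk_attr])
    return dict(index)


gsi1 = project_index(items, "gsi1pk", "Artist")

# The same index answers "albums by genre" and "artists by country".
albums = [(row["Artist"], row["sk"]) for row in gsi1["album#Electronic"]]
uk_artists = [row["Artist"] for row in gsi1["artist#United Kingdom"]]

print(albums)
print(uk_artists)
```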

See also

Partition key sharding

Sometimes, a partition key composed of multiple natural attributes is not enough to spread the load evenly across partitions; you can deal with this by putting items with the same natural attributes on multiple partitions.

So, what partition key should you use? One option is to use a random suffix from a known range; this allows you to list items for a natural attribute value by doing multiple queries, one for each suffix.

Example

For a table of songs, using Album as the partition key won't work, since not all songs are released on an album; Artist always has a value, but some artists have hundreds or even thousands of songs, which can lead to throttling.

Instead, we can use {Artist}#{randrange(10)} as partition key, which allows ten times as many items before we reach throughput limits. To list an artist's songs:

def list_songs(artist):
    # dynamodb.query() is shorthand for a Query on a single partition key
    for shard in range(10):
        yield from dynamodb.query(f"{artist}#{shard}")

A downside of random suffixes is that you can't get a specific item, because you don't know what its suffix is. A better option is to calculate the suffix from an attribute that you do know, for example using its hash modulo N.

Example

With partition key {Artist}#{hash(Song) % 10}, we can get a song like this:

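from hashlib import sha256

def hash(s):
    # stable across processes and machines, unlike the built-in hash()
    return int.from_bytes(sha256(s.encode()).digest(), "big")

shard = hash(song_title) % 10
dynamodb.get_item(f"{artist}#{shard}", song_title)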

A lot of times you need to list items by a low-cardinality attribute, so sharding may be even more important for GSIs.

Example

Assuming dedicated album items, you can list all the albums by putting them in a single GSI partition key called albums, but this will definitely cause throttling.

To avoid it, you can use GSI partition key album#{hash(Album) % 100} if you don't care about the order, or something like album#{Album[:2].lower()} if you do (but likely more sophistication is needed - th will be a very common album title prefix, and some album titles don't contain letters at all).

Even if throttling is not an issue (e.g. single infrequent reader), sharding allows you to query multiple partitions in parallel, which can speed up getting the entire result set.
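A rough sketch of fanning out over shards in parallel, with an in-memory dict standing in for the index. The put_album, query, and list_albums names and the shard count of 4 are assumptions for illustration; with a real client, query would be a network call, which is where the threads would pay off.

```python
from concurrent.futures import ThreadPoolExecutor
from hashlib import sha256

SHARDS = 4  # made-up shard count for the example

# Stand-in for the index: partition key -> list of album titles.
table = {}


def shard_for(value, shards=SHARDS):
    """Derive a stable shard number from a value we always know."""
    digest = sha256(value.encode()).digest()
    return int.from_bytes(digest, "big") % shards


def put_album(album):
    table.setdefault(f"album#{shard_for(album)}", []).append(album)


def query(partition_key):
    """Stand-in for a query on one partition; returns its items."""
    return list(table.get(partition_key, []))


def list_albums():
    """Query every shard concurrently and merge the results."""
    keys = [f"album#{shard}" for shard in range(SHARDS)]
    with ThreadPoolExecutor(max_workers=SHARDS) as pool:
        pages = list(pool.map(query, keys))
    # Shards are unordered relative to each other, so sort after merging.
    return sorted(album for page in pages for album in page)


for title in ["Leaving Home", "Confession", "2 Pie Island"]:
    put_album(title)

print(list_albums())
```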


So, how many shards should you have? That depends on the number, size, and how often you access the items, and is also a trade-off - too many shards means additional queries and latency, too few shards means you still overload the partitions sometimes.

Importantly, increasing the number of shards is non-trivial. For tables, you usually need to rebalance the items in place. For indexes, it's cleaner to move to a new index, or if you just need to list items by type, you can put all new items on new shards.

Regardless, you have to support it in code, do a backfill, and orchestrate the migration, which all become more complex if downtime and inconsistencies are not acceptable (e.g. if you expose a pagination token based on LastEvaluatedKey, you may want to support both versions during the switch).

See also

Sparse indexes

An item with missing index partition/sort key attributes won't appear in the index, and you won't pay for it. This can be used deliberately to query a subset of the items in the table, like those of a specific type or in a specific state.

Example

Assuming dedicated album items, an alternative way to list all the albums is to have a GSI with {Album} as partition key, and just scan the entire index (the primary key has to be a dedicated attribute that only albums have, so that only album items appear in the index).

Or, you can use a dedicated GSI with CoverOf as primary key to list cover songs.
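As a toy illustration of the cover songs case, the sketch below filters an in-memory list the way a sparse index would: only items that have the key attribute show up. The sample items and the sparse_index helper are made up for the example.

```python
# Sample song items; only covers carry the CoverOf attribute.
songs = [
    {"Artist": "Johnny Cash", "Song": "Hurt", "CoverOf": "Nine Inch Nails"},
    {"Artist": "Nine Inch Nails", "Song": "Hurt"},
    {"Artist": "Solar Fields", "Song": "Air Song"},
]


def sparse_index(items, key_attr):
    """Items missing the index key attribute are simply not in the index."""
    return [item for item in items if key_attr in item]


covers = sparse_index(songs, "CoverOf")

print(covers)
```

Scanning such an index only touches the items in it, so its cost follows the number of covers rather than the size of the table.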

See also

Base table indexes

In some cases, GSIs won't cut it - maybe you need a strongly consistent index, or need to model a many-to-one relationship (indexes map one item in the base table to one item in the index).

Instead, you can maintain an index in the base table by having additional index items associated with the main item; to guarantee atomic updates, use transactions. You then go from the main item to the index items via a main item attribute, and from the index items to the main item via their partition key.

Example

Songs have different identifiers in external systems, such as ISRC, ISWC, or MBID. To query songs by multiple external ids, you'd structure your database like this:

(Alternatively, you could have one sparse index per external id type, but then you lose strong consistency, and risk running out of GSIs).

Note that modeling one-to-many relationships isn't this involved, since it fits neatly into the related-items-same-partition variant of single table design.

See also

Optimistic locking

Optimistic locking is a concurrency control method useful when conflicts are rare, so instead of acquiring a lock to do changes, you check if someone else changed the data right before committing, as part of an atomic operation.

In DynamoDB, that operation is a conditional write; items get an integer version attribute, and every time you want to update an item, you:

  1. read the item, including the version
  2. increment the version and modify the item
  3. update the item, using a condition expression to ensure the version matches
    1. if successful, you're done
    2. else, start over from the beginning
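The steps above can be sketched against a tiny in-memory table. This is an illustration of the read-modify-conditional-write loop, not DynamoDB code: the Table class, ConditionFailed, and update_with_retry are hypothetical names, and put with expected_version plays the role of a write with a condition expression on the version.

```python
class ConditionFailed(Exception):
    """Raised when the stored version is not the expected one."""


class Table:
    """Minimal stand-in for a table that supports conditional writes."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        return dict(self._items[key])

    def put(self, key, item, expected_version):
        current = self._items.get(key)
        current_version = current["version"] if current else None
        if current_version != expected_version:
            raise ConditionFailed(key)
        self._items[key] = dict(item)


def update_with_retry(table, key, modify, max_attempts=5):
    for _ in range(max_attempts):
        item = table.get(key)            # 1. read the item and its version
        expected = item["version"]
        modify(item)                     # 2. modify and bump the version
        item["version"] = expected + 1
        try:                             # 3. write only if nobody else did
            table.put(key, item, expected_version=expected)
        except ConditionFailed:
            continue                     # conflict: start over
        return item
    raise RuntimeError("too many conflicts")


table = Table()
table.put("song", {"version": 1, "Plays": 0}, expected_version=None)


def add_play(item):
    item["Plays"] += 1


result = update_with_retry(table, "song", add_play)
print(result)
```

Note that modify may run more than once, which is why it should not have side effects outside the item.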

You can also do this in transactions to update groups of related items, like in the base table index pattern above, with only the main item needing a version.

The upside of optimistic locking is that it is faster on average, since updates usually succeed on the first try; for fewer conflicts, use strongly consistent reads.

The downside is that it requires explicit support - it must be possible to start over from the beginning, which complicates logic, especially if you need to interact with other systems besides updating the item (e.g. to send a notification).

See also


Anyway, that's it for now.

See also

For more details and examples, check out the official documentation:

Learned something new today? Share it with others, it really helps!

Want to know when new articles come out? Subscribe here to get new stuff straight to your inbox!

01 Jun 2026 10:44am GMT

Speed Matters: Scandir Rs


Blazing-fast directory traversal for Python - up to 70× faster than os.walk. (Posted 2026-06-01 08:40:00 +0100.)

scandir-rs: High-Performance Directory Traversal for Python

File system traversal is often a hidden bottleneck.

Whether you're indexing files, collecting statistics, searching large directory trees, or building developer tools, performance matters. That's why I created scandir-rs: a Rust-powered Python library designed to be a drop-in replacement for os.walk() and os.scandir(), while delivering dramatically better performance and additional functionality.

A new version (2.9.9) is available, with the following changes compared to the version I introduced here last time (2.7.1):

Why scandir-rs?

Because speed matters…

🚀 Significant Performance Improvements

Compared to Python's built-in implementations:

When processing millions of files, these speedups can turn minutes into seconds.

Benchmark results for running scandir on the linux-5.9 folder:

[Figure: scandir-rs Walk benchmark on Linux (kernel 5.9)]
[Figure: scandir-rs Walk benchmark on Windows (kernel 5.9)]

🔍 Richer Metadata

Beyond the standard os.walk() and os.scandir() APIs, scandir-rs can return:

⚡ Background Processing

Long-running scans can run asynchronously in the background, allowing your application to process results while scanning is still in progress.

Installation

pip install scandir-rs

Usage Examples

Directory Statistics

Get fast statistics for an entire directory tree:

import scandir_rs as scandir

print(scandir.Count("/usr").collect())

Extended Statistics

Include additional metadata and hardlink detection:

import scandir_rs as scandir

print(
    scandir.Count(
        "/usr",
        return_type=scandir.ReturnType.Ext
    ).collect()
)

Background Scanning

Process results while scanning continues in the background:

import scandir_rs as scandir

counter = scandir.Count("/usr")

with counter:
    while counter.busy:
        results = counter.results()
        # Process intermediate results

# Final results as JSON
results = counter.to_json()

Faster os.walk()

A familiar interface with significantly better performance:

import scandir_rs as scandir

for root, dirs, files in scandir.Walk("/usr"):
    pass  # Process files here

Extended Walk Information

Retrieve additional file categories and error information:

import scandir_rs as scandir

for root, dirs, files, symlinks, other, errors in scandir.Walk(
    "/usr",
    return_type=scandir.ReturnType.Ext
):
    pass  # Process files here

On Unix systems, other includes special file types such as pipes and devices.

Faster os.scandir()

Collect all entries at once:

import scandir_rs as scandir

entries, errors = scandir.Scandir("/usr").collect()

Or iterate lazily:

import scandir_rs as scandir

for entry in scandir.Scandir("/usr"):
    pass  # Process entry here

Extended Metadata

Request detailed information for each directory entry:

import scandir_rs as scandir

for entry in scandir.Scandir(
    "/usr",
    return_type=scandir.ReturnType.Ext
):
    pass  # Process entry here

Entries are returned as DirEntryExt objects. Errors are reported as tuples containing:

(relative_path, error_message)

allowing scans to continue even when individual files cannot be accessed.

Benchmark Results

Walk Performance

Operation          Linux               Windows
Walk vs os.walk    Up to 13× faster    Up to 70× faster

Scandir Performance

Operation                Linux                Windows
Scandir vs os.scandir    Up to 6.5× faster    Up to 6.5× faster

For detailed benchmark data and methodology, see the benchmark documentation:

https://github.com/brmmm3/scandir-rs/blob/master/pyscandir/doc/benchmarks.md

Get Started

If your application spends time traversing large directory trees, scandir-rs can provide substantial performance improvements with minimal code changes.

The API is intentionally familiar, making migration from os.walk() and os.scandir() straightforward while unlocking additional capabilities and significantly faster execution.

Source code, documentation, and issue tracker:

https://github.com/brmmm3/scandir-rs

Licensed under the MIT License.

01 Jun 2026 12:00am GMT

31 May 2026

Feed: Django community aggregator: Community blog posts

Django: introducing django-integrity-policy

Back in January, Firefox's Security & Privacy Newsletter for 2025 Q4 piqued my interest with this mention:

Integrity-Policy: Firefox 145 has added support for the Integrity-Policy response header. The header allows websites to ensure that only scripts with an integrity attribute will load.

A new security header! That's right up my street: I've cared about getting security headers right since 2018, when I created django-permissions-policy to set the Permissions-Policy header. (At the time, it was called Feature-Policy: why they changed it, I can't say, people just liked it better that way.)

The new Integrity-Policy header helps with subresource integrity, a tool for securely including third-party scripts and stylesheets on your website. Browsers support the integrity attribute on <script> and <link> tags, which allows you to specify a hash of the expected content, like:

<script
  src="https://cdn.jsdelivr.net/npm/htmx.org@4.0.0-beta4/dist/htmx.min.js"
  integrity="sha384-aWZK1NtOs/aWb/+YZdTM8q2JkWEshlMc9mgZ189numT9bwFhyAyYEoO4nO/2dTXt"
  crossorigin="anonymous"></script>

If the content downloaded from the external source doesn't match the expected hash, the browser blocks it from loading. This is a great defense against the target URL changing its contents, executing a supply chain attack against your visitors.
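For reference, an integrity value is the hash algorithm name, a hyphen, and the base64-encoded digest of the file's bytes. A minimal sketch of computing one with the standard library follows; the sri_hash name and the sample content are made up for illustration:

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return a value suitable for an integrity attribute."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


# Sample content standing in for a downloaded script file.
script = b"console.log('hello');\n"
integrity = sri_hash(script)

print(integrity)
```

In practice you would hash the exact bytes that the browser will download, for example by reading the file in binary mode.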

(Generally, I recommend you avoid loading anything from a third-party URL, per Reasons to avoid Javascript CDNs. But sometimes, you gotta do what you gotta do, and declaring integrity is a great idea, then.)

Integrity-Policy allows you to opt in to requiring integrity attributes on your page, ensuring that you can never load potentially-compromised resources. The header is fairly simple, at least right now. Here's a complete example that requires integrity for all scripts and stylesheets:

Integrity-Policy: blocked-destinations=(script style)

You can add in endpoints to tell browsers where to send violation reports to, and there's the second Integrity-Policy-Report-Only header which lets you test a policy without enforcing it.

Note there's no possibility to differentiate between first- and third-party resources in Integrity-Policy. If you set it, you'll need to add integrity attributes to all your scripts and stylesheets, including those you host yourself. This is by design: the header is being developed as part of Web Application Integrity, Consistency and Transparency (WAICT), an initiative to bring app-store-level "code signing" to the web, where users can be sure that CDNs and other intermediaries haven't tampered with served code.

Integrity-Policy is supported on Firefox 145+ and Chrome 138+.

django-integrity-policy

My new package, django-integrity-policy, provides a middleware for setting the Integrity-Policy headers in a familiar Django-style way. You install it, add the middleware:

MIDDLEWARE = [
    ...,
    "django.middleware.security.SecurityMiddleware",
    "django_integrity_policy.IntegrityPolicyMiddleware",
    ...,
]

…and then configure the appropriate setting(s):

INTEGRITY_POLICY = {
    "blocked-destinations": ["script", "style"],
}

So far, it's pretty basic, but I expect WAICT will increase the complexity of Integrity-Policy over time, and I'll add support for new options as they come along.
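To make the mapping from setting to header concrete, here is a rough sketch of how a dictionary shaped like INTEGRITY_POLICY could be serialized into the header value shown earlier. This is not the package's actual implementation; build_integrity_policy is a hypothetical helper, and joining multiple keys with a comma and a space is an assumption.

```python
def build_integrity_policy(policy: dict[str, list[str]]) -> str:
    """Serialize {"name": ["a", "b"]} into 'name=(a b)' pairs."""
    parts = []
    for name, values in policy.items():
        parts.append(f"{name}=({' '.join(values)})")
    # Assumption: multiple keys are separated by a comma and a space.
    return ", ".join(parts)


INTEGRITY_POLICY = {
    "blocked-destinations": ["script", "style"],
}

header_value = build_integrity_policy(INTEGRITY_POLICY)

print(f"Integrity-Policy: {header_value}")
```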

Once Integrity-Policy is set, the browser will block any scripts or stylesheets (depending on configuration) that lack a valid integrity attribute, including your first-party resources. That means you need to add integrity attributes to all your static files. Luckily, this has been considered before by the legendary Jake Howard, who made a package called django-sri. It provides template tags to generate appropriately hashed HTML tags. For example:

{% load sri %}

{% sri_static "app.js" %}
{% sri_static "app.css" %}

…will output:

<script src="/static/app.js" integrity="sha256-..."></script>
<link rel="stylesheet" href="/static/app.css" integrity="sha256-..."/>

These tags would be allowed under a maximally strict integrity policy.

See the example application in the django-integrity-policy repository for a full working project.

LLM generation

I built this package using just two prompts to Claude. I copied the repository for my previous security header package, django-permissions-policy, and reset its Git history. I then used this prompt to Claude, inside Zed:

The current repository is a copy of my package django-permissions-policy

It's time to turn it into django-integrity-policy, for the relevant headers per the below mdn docs

(Integrity-Policy MDN docs from https://github.com/mdn/content/blob/main/files/en-us/web/http/reference/headers/integrity-policy/index.md?plain=1)

(Integrity-Policy-Report-Only MDN docs from https://github.com/mdn/content/blob/main/files/en-us/web/http/reference/headers/integrity-policy-report-only/index.md?plain=1)

Check and edit every file to be that new package, copyright 2026, keeping the general testing infrastructure and so on.

Don't run any commands yet, just check and edit every file

I then took a 20-minute nap and woke up to a near-complete package. I reviewed it, ran the tests, made some minor edits, committed, and pushed to PyPI!

LLMs are rightfully a hot topic, with heady supporters and heavy detractors. I use them begrudgingly and somewhat sparingly, and I cannot wait for the future where I can stick to local models. (Gemma 4 can run on my M1 Mac and approaches Claude's performance on many tasks, so we're getting there.)

This task, though, was a perfect fit for LLM code generation: the existing repository acted as great context for structure, the new package was very similar in shape ("same same but different"), and the documentation provided a clear specification for what to build. The LLM could mash things up for me with minimal oversight, and I could check the work quickly.

I would guess that overall, using an LLM saved me a couple of hours of mostly grunt-work, like checking every copied-over configuration file. That's pretty valuable for me, and honestly made creating the package feasible.

The future

I'm not sure how widely used Integrity-Policy will be, and therefore how popular django-integrity-policy will end up. But this was an interesting exercise and I am interested to see how the header and other work from WAICT evolves. I will try to keep the package updated, and we'll see if it ever reaches a point where proposing support in Django itself makes sense.

Fin

May integrity always be your policy,

-Adam

31 May 2026 4:00am GMT

30 May 2026

Feed: Planet Python

Kay Hayen: Nuitka Release 4.1

This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler, "download now".

This release adds many new features and corrections, with a focus on async code compatibility, missing generics features, Python 3.14 compatibility, and, yet again, Python compilation scalability.

Bug Fixes

Package Support

New Features

Optimization

Anti-Bloat

Organizational

Tests

Cleanups

Summary

This release builds on the scalability improvements established in 4.0, with enhanced Python 3.14 support, expanded package compatibility, and significant optimization work.

The --project option seems usable now.

Python 3.14 support remains experimental; it only barely missed the cut, and will probably get there in hotfixes. Some of the corrections came in so late before the release that it was just not possible to feel good about declaring it fully supported just yet.

30 May 2026 10:00pm GMT

29 May 2026

Feed: Django community aggregator: Community blog posts

Issue 339: Early Bird DjangoCon US Tickets Ending Soon

News

DjangoCon US 2026: Early Bird Tickets End May 31st!

Early bird ticket sales for DjangoCon US 2026 end on May 31, 2026, with discounted pricing available. The conference runs five days at Voco Chicago Downtown and includes community-selected talks plus Django contribution sprints.


Wagtail CMS News

Wagtail Space NL - June 12

A full-day conference in Rotterdam, The Netherlands on Wagtail, with talks covering a range of topics, lightning talks, hallway discussions, and more.



Updates to Django

Today, "Updates to Django" is presented by Pradhvan from Djangonaut Space! 🚀

Last week we had 16 pull requests merged into Django by 10 different contributors.

This week's Django highlights: 🦄

If you haven't already, give Django 6.1 alpha 1 a spin and report anything suspicious to the issue tracker! 🎉

That's all for this week in Django development! 🐍🦄


Articles

Upgrade PostgreSQL from 17 to 18 on Ubuntu 26.04

After moving to Ubuntu 26.04, upgrade an existing 17/main cluster to 18 by running pg_upgradecluster 17 main -v 18, then verify the new 18/main cluster is online. Once confirmed, drop the old 17 cluster with pg_dropcluster 17 main and optionally purge postgresql-17 and postgresql-client-17 packages.

My not-so-static new static website

Jake Howard walks through his eighth website rewrite, this time ditching Wagtail for a custom "semi-static" Django setup that renders Markdown content into SQLite at startup and serves it dynamically with Jinja2 templates.

Improving First Byte and Contentful Paint on a Django Website

A look at how to use Django's StreamingHttpResponse to send the <head> and above-the-fold content first, letting the browser fetch static assets and start painting while the rest of the page renders.

PyCon US 2026 Recap - Black Python Devs

A recap, from the community booth to open spaces, the hallway track, and Jay Miller receiving the PSF Community Service Award.

django-removals 1.2.0 - Now with Django 6.1 deprecations

How the maintainers of django-removals shipped new warnings for the Django 6.1 deprecation wave.

Mentoring GSoC 2026: Experimental Flags - Software Crafts

Mentor and mentee are starting a GSoC 2026 project around an "Experimental Flags" framework for Django core, using the forum to gather requirements and drive early consensus. The plan balances fast iteration with faster-than-normal Django consensus, including an initial third-party package to test ideas before wider adoption.


Django Forum

GSoC 2026: Implementing a Formal Experimental API Framework for Django Core

A lively discussion around how experimental features can be merged into the main repository but remain explicitly non-stable.

Thoughts on advertising on djangoproject.com

New thoughts and comments on the age-old question.


Django Fellow Reports

Jacob Walls

Not much going on, "just" the 6.1 Feature Freeze/alpha release, a sprint at PyCon US, and a kickoff meeting with Google Summer of Code participants & mentors.

Sarah Boyce

As we had the feature freeze, I focused on a few feature PRs I had prioritized for the 6.1 release.

Natalia Bidart

This week was mostly about returning from PyCon, which was quite exhausting. I arrived back on Wednesday, fairly drained (and very hungry), so I worked during Thu and Fri catching up on a large backlog of email notifications and syncing with the other Fellows.


Events

Django on the Med - September 23-25 in Pescara, Italy

PyCon Italia this week has had Django members in attendance, so it is a good time to remind readers that Django on the Med will be back in Italy later in the year.


Django Job Board

Founding Engineer at MyDataValue


Projects

feincms/feincms3-cookiecontrol

Cookie banner with support for embedded media.

emfpdlzj/django-deploy-probes

HTTP deployment probes for Django applications.

29 May 2026 2:00pm GMT

27 May 2026

Feed: Django community aggregator: Community blog posts

Please add an RSS Feed to Your Site

Why syndication feeds are having a moment in 2026.

27 May 2026 9:57pm GMT

22 May 2026

Feed: Planet Twisted

Glyph Lefkowitz: Opaque Types in Python

Let's say you're writing a Python library.

In this library, you have some collection of state that represents "options" or "configuration" for a bunch of operations. Such a set of options is a bundle of potentially ever-increasing complexity. Thus, you will want it to have an extremely minimal compatibility surface, with a very carefully chosen public interface, that is either small, or perhaps nothing at all. Such an object conveys state and might have some private behavior, but all you want consumers to be able to do is build it in very constrained, specific ways, and then pass it along as a parameter to your own APIs.

By way of example, imagine that you're wrapping a library that handles shipping physical packages.

There are a zillion ways to ship a package. There are different carriers who can ship it for you. There's air freight, and ground freight, and sea freight. There's overnight shipping. There's the option to require a signature. There's package tracking and certified mail. Suffice it to say, lots of stuff.

If you are starting out to implement such a library, you might need an object called something like ShippingOptions that encapsulates some of this. At the core of your library you might have a function like this:

async def shipPackage(
        how: ShippingOptions,
        where: Address,
    ) -> ShippingStatus:
    ...

If you are starting out implementing such a library, you know that you're going to get the initial implementation of ShippingOptions wrong; or, at the very least, if not "wrong", then "incomplete". You should not want to commit to an expansive public API with a ton of different attributes until you really understand the problem domain pretty well.

Yet, ShippingOptions is absolutely vital to the rest of your library. You'll need to construct it and pass it to various methods like estimateShippingCost and shipPackage. So you're not going to want a ton of complexity and churn as you evolve it to be more complex.

Worse yet, this object has to hold a ton of state. It's got attributes, maybe even quite complex internal attributes that relate to different shipping services.

Right now, today, you need to add something so you can have "no rush", "standard" and "expedited" options. You can't just put off implementing that indefinitely until you can come up with the perfect shape. What to do?

The tool you want here is the opaque data type design pattern. C is lousy with such things (FILE, pthread_*_t, fd_set, etc). A typedef in a header file can easily achieve this.

But in Python, if you expose a dataclass - or any class, really - even if you keep all your fields private, the constructor is still, inherently, public. You can make it raise an exception or something, but your type checker still won't help your users; it'll still look like it's a normal class.

Luckily, Python typing provides a tool for this: typing.NewType.

Let's review our requirements:

  1. We need a type that our client code can use in its type annotations; it needs to be public.
  2. They need to be able to construct it somehow, even if they shouldn't be able to see its attributes or its internal constructor arguments.
  3. They need to be able to express high-level things (like "ship fast") that should stay supported as we add more nuanced and complex configurations in the future (like "ship with the fastest possible option provided by the lowest-cost carrier that supports signature verification").

In order to solve these problems respectively, we will use:

  1. a public NewType, which gives us our public name...
  2. which wraps a private class with entirely private attributes, to give us an actual data structure, while not exposing the constructor,
  3. a set of public constructor functions, which return our NewType.

When we put that all together, it looks like this:

from dataclasses import dataclass
from typing import Literal, NewType

@dataclass
class _RealShipOpts:
    _speed: Literal["fast", "normal", "slow"]

ShippingOptions = NewType("ShippingOptions", _RealShipOpts)

def shipFast() -> ShippingOptions:
    return ShippingOptions(_RealShipOpts("fast"))

def shipNormal() -> ShippingOptions:
    return ShippingOptions(_RealShipOpts("normal"))

def shipSlow() -> ShippingOptions:
    return ShippingOptions(_RealShipOpts("slow"))

As a snapshot in time, this is not all that interesting; we could have just exposed _RealShipOpts as a public class and saved ourselves some time. The fact that this exposes a constructor that takes a string is not a big deal for the present moment. For an initial quick and dirty implementation, we can just do checks like if options._speed == "fast" in our shipping and estimation code.

However, the main thing we are doing here is preserving our flexibility to evolve the related APIs into the future, so let's see how we might do that. For example, let's allow the shipping options to contain a concrete and specific carrier and freight method:

from dataclasses import dataclass
from enum import Enum, auto
from typing import NewType

class Carrier(Enum):
    FedEx = auto()
    USPS = auto()
    DHL = auto()
    UPS = auto()

class Conveyance(Enum):
    air = auto()
    truck = auto()
    train = auto()

@dataclass
class _RealShipOpts:
    _carrier: Carrier
    _freight: Conveyance

ShippingOptions = NewType("ShippingOptions", _RealShipOpts)

def shipFast() -> ShippingOptions:
    return ShippingOptions(_RealShipOpts(Carrier.FedEx, Conveyance.air))

def shipNormal() -> ShippingOptions:
    return ShippingOptions(_RealShipOpts(Carrier.UPS, Conveyance.truck))

def shipSlow() -> ShippingOptions:
    return ShippingOptions(_RealShipOpts(Carrier.USPS, Conveyance.train))

def shippingDetailed(
    carrier: Carrier, conveyance: Conveyance
) -> ShippingOptions:
    return ShippingOptions(_RealShipOpts(carrier, conveyance))

As a NewType, our public ShippingOptions type doesn't have a constructor. Since _RealShipOpts is private, and all its attributes are private, we can completely remove the old version of the class and its _speed field without breaking any client code.

Anything within our shipping library can still access the private variables on ShippingOptions; as a NewType, it's the same type as its base at runtime, so it presents minimal[1] overhead.

Clients outside our shipping library can still call all of our public constructors: shipFast, shipNormal, and shipSlow all still work with the same (as far as calling code knows) signature and behavior.

If you need to build and convey some state within your public API, while avoiding breakages associated with compatibility churn, hopefully this technique can help you do that!


Acknowledgments

Thanks for reading, and thank you to my patrons who are supporting my writing on this blog. If you like what you've read here and you'd like to read more of it, or you'd like to support my various open-source endeavors, you can support my work as a sponsor.


  1. The overhead is minimal, but it is not completely zero. The suggested idiom for converting to a NewType is to call it like a function, as I've done in these examples, but if you want to use this pattern inside a hot loop, you can return the underlying object directly and add a # type: ignore[return-value] comment to avoid that small cost.

22 May 2026 12:33am GMT

04 Apr 2026

feedPlanet Twisted

Donovan Preston: Using osascript with terminal agents on macOS

Here is a useful trick that is unreasonably effective for simple computer-use goals with modern terminal agents. On macOS, there has been a terminal osascript command since the original release of Mac OS X. All you have to do is suggest that your agent use it, and it can perform any application control action available in any AppleScript dictionary for any Mac app. No MCP setup or tools required at all. Agents are much more adept at using raw terminal commands, especially ones that haven't changed in 30 years. Having a computer control interface that hasn't changed in 30 years and has extensive examples in the Internet corpus makes modern models understand how to use these tools basically effortlessly.

macOS locks down these permissions pretty heavily nowadays, though, so you will have to grant the application control permission to your terminal. But once you have done that, the range of possibilities for commanding applications using natural language is quite extensive.

Also, for both Safari and Chrome on Mac, you are going to want to turn on the JavaScript-over-AppleScript permission. This basically allows Claude or another agent to debug your web applications live for you as you are using them. In Chrome, go to the View menu, Developer submenu, and choose "Allow JavaScript from Apple events". In Safari, it's under the Safari menu, Settings, Developer, "Allow JavaScript from Apple events".

Then you can say something like "Hey Claude, would you please use osascript to navigate the front Chrome tab to Hacker News". Once you suggest using osascript in a session, it will figure out pretty quickly what it can do with it. Of course, you can ask it to do casual things like open your mail app or whatever. Then you can figure out what other things will work, like "please click around my web app" or "check the JavaScript console for errors".

Another very important tip for using modern agents is to practice using speech to text. I think speaking might be something like five times faster than typing. It takes a lot of time to get used to, especially after a lifetime of programming by typing, but it's a very interesting and different experience, and once you have a lot of practice it starts to feel effortless.
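
For a sense of what the agent ends up running for the Hacker News request, here is a rough Python sketch that builds an osascript invocation. The helper names are made up for illustration, the AppleScript line assumes Chrome's standard scripting dictionary, and the URL is interpolated naively, with no escaping of quotes.

```python
import subprocess
import sys

def osascript_command(script: str) -> list[str]:
    """Build the argv for running one AppleScript snippet via `osascript -e`."""
    return ["osascript", "-e", script]

def navigate_front_chrome_tab(url: str) -> str:
    """Return AppleScript that points Chrome's front tab at `url`.

    Naive string interpolation: assumes `url` contains no double quotes.
    """
    return (
        'tell application "Google Chrome" to '
        f'set URL of active tab of front window to "{url}"'
    )

def run_applescript(script: str) -> str:
    """Run the script and return its output (macOS only).

    The terminal needs permission to control the target application.
    """
    if sys.platform != "darwin":
        raise RuntimeError("osascript is only available on macOS")
    result = subprocess.run(
        osascript_command(script), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

On a Mac with the permission granted, run_applescript(navigate_front_chrome_tab("https://news.ycombinator.com")) would perform the navigation; in practice the agent usually just types the equivalent osascript -e command into the shell itself.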

04 Apr 2026 1:31pm GMT

16 Mar 2026

feedPlanet Twisted

Donovan Preston: "Start Drag" and "Drop" to select text with macOS Voice Control

I have been using macOS Voice Control for about three years. At first, it was a way to reduce pain from excessive computer use. It has been a real struggle. Decades of computer use habits with typing and the mouse are hard to overcome!

Text selection and manipulation commands work quite well in macOS native apps, like apps written in Swift, or in Safari with an accessibly tagged webpage. However, many webpages and Electron apps (Visual Studio Code) have serious problems manipulating the selection: "select foo", where foo is a word in the text box to select, may not work at all, and there are off-by-one errors when manipulating the cursor position or extending the selection.

I only recently expanded my repertoire with the "start drag" and "drop" commands, having previously used "Click and hold mouse", "move cursor to x", and "release mouse". Well, now I have discovered that using "start drag x" and "drop x" makes a fantastic text selection method! This is really going to improve my speed.

In the long run, I believe computer voice control in general is going to end up being faster than WIMP, but for now the awkwardly rigid command phrasing and the number of times it misses or misunderstands commands still really hold it back. I've been learning the macOS Voice Control command set for years now, and I still reach for the keyboard and mouse way too often.

16 Mar 2026 11:04am GMT