25 Jun 2026
Planet Python
Artem Golubin: Hexora v0.3: New features and improvements
Recently, I've improved my Python library, hexora. I wrote it to detect malicious Python code using static analysis.
In the new v.0.3.0 release, I've added new detections, and we now also use a simple machine learning model to analyze the whole file. The machine learning model uses code structure features, semantic features, and static code analysis to assess the entire Python file.
Although the model can detect malicious code without any detections coming from static analysis, its main use case is to filter false positives.
I've been testing it against newly published PyPI packages and it detects 2-10 new malicious packages each day.

Due to the number of published packages, before the machine learning model, I was getting around 5-10 false positives for 1[......]
25 Jun 2026 1:37pm GMT
Django Weblog: How the Django Software Foundation Became a CNA
Why the DSF pursued CNA status
Django has a long history of responsible security practices: a dedicated, private security mailing list, clear advisory policies, and predictable security releases. Even so, we relied on external organizations to assign CVE IDs (Common Vulnerabilities and Exposures). This sometimes introduced administrative delays and extra coordination overhead.
Becoming a CNA (CVE Numbering Authority) allows the DSF to:
- Assign CVEs ourselves for vulnerabilities in Django and selected community projects.
- Publish advisories more efficiently and in closer alignment with Django's established release workflow.
- Maintain strong independence in how Django handles security incidents.
The initial exploration
The process began with internal discussions within the DSF Board and Django Security Team. We evaluated:
- Whether our existing security process already met CNA expectations.
- Whether we had the organizational stability to take on long term responsibility for CVE assignment.
- The scope of projects we would cover.
- How to ensure we could meet the operational requirements without overloading volunteers and Django Fellows.
After confirming that our policies were mature and that the administrative workload would be manageable, we initiated the CNA application with MITRE.
Preparing the application
MITRE requires that new CNAs document their security processes and demonstrate that they can meet CNA obligations. Our preparation included:
-
Reviewing and updating the Django Security Policy.
-
Mapping our existing workflows to MITRE's CNA rules, including:
- How reports are received.
- How vulnerabilities are validated.
- How advisories are produced.
- How CVEs will be assigned and published.
-
Defining the scope of the CNA:
- Django itself as the core product.
- A small, clearly bounded set of related ecosystem projects.
-
Ensuring we had private communication channels and documented procedures for confidential handling.
-
Drafting the required procedural documentation for MITRE.
Most of the work here was not about creating new processes but about articulating long standing Django practices in the format MITRE expects.
Training and review
Once our initial documentation was accepted, MITRE scheduled us for CNA onboarding training. This covered:
- CNA rules and responsibilities.
- Required data fields for CVE Records.
- Expectations for coordination with reporters and downstream consumers.
- The publication workflow for CVE Records.
We also completed MITRE's required CNA onboarding exercises. As part of this process, we worked through sample security reports and demonstrated how we would determine CVE assignments, including cases where multiple CVEs may or may not be warranted for a single report.
Approval and onboarding
After MITRE approved our documentation, training, and exercise submissions, the DSF was formally granted CNA status. The announcements steps were:
- MITRE published our CNA scope on the official CNA directory.
- MITRE issued a press release.
- The DSF published a blog post announcing our new CNA status.
Lessons learned
A few procedural insights for other projects considering CNA status:
- If your project already has a mature and documented security process, becoming a CNA is mostly about formalizing what you already do.
- The documentation and validation steps take time. Most delays come from ensuring that all fields conform to the CVE schema. The whole process took about four months.
- The training is detailed and helpful. It clarifies exactly what CNAs are expected to produce and how CVE Records flow through the broader ecosystem.
- It is worth being explicit about scope early. Defining the boundary of responsibility reduces ambiguity later.
What changes for Django users
For most contributors and users, nothing changes. Django will continue to follow its established process for receiving reports, coordinating fixes, and publishing security releases.
The difference is that the DSF can now assign CVE IDs directly, which simplifies coordination and allows us to publish advisories with fewer external dependencies.
Acknowledgments
This work was led by Django Fellows Natalia Bidart and Jacob Walls, with support from the Django Security Team and the DSF Board. We are grateful to MITRE for their guidance during the onboarding process.
If you have questions about Django's CNA scope or security process, contact the Django Security Team.
25 Jun 2026 11:00am GMT
PyCharm: Explicit Lazy Imports Are Coming to Python 3.15
A while ago at PyCon US 2026, I had the pleasure of listening to the Python Steering Council give updates about new features that are being added in Python 3.15. One that stood out was explicit lazy imports (via PEP 810), which defer module loading until first use. I am curious to see how this new feature works, and I want to benchmark its performance with PyCharm. Let's take a look together.
Overview of explicit lazy imports
PEP 810 introduces an explicit syntax for lazy imports, allowing you to defer the loading and execution of modules until their attributes are actually accessed, unlike standard eager imports that execute immediately. This feature aims to significantly reduce startup latency and memory consumption. Explicitly marking modules as `lazy` can deliver substantial improvements in initial responsiveness and baseline resource usage in large-scale applications and command-line tools.
Because the implementation approach uses proxy objects within the module's namespace instead of modifying Python's fundamental dictionary structures, it preserves critical interpreter optimizations.
This mechanism defers both the finding and the loading of the module to maximize efficiency, especially in environments with high-latency filesystems. To manage potential side effects and ensure backward compatibility, the proposal includes global control flags and a transitional variable for progressive adoption across different Python versions.
In short, Python 3.15 will let you optimize application performance by significantly reducing startup latency and memory consumption, as the loading and execution of modules are deferred until their attributes are actually accessed.
Trying them out in Python 3.15.0b1
At the time this is being written, Python 3.15.0b1 is already out, so we can give this new feature a try. You can build it from source at the CPython GitHub repo, but since getting Python 3.15.0b1 is easy when using `uv` or `pyenv`, we will do that instead.
Make sure you have the latest version of `uv` or `pyenv`, and then download Python 3.15.0b1 via either of the following commands:
- `uv python install 3.15.0b1`
- `pyenv install 3.15.0b1`
After that, select the new interpreter in your project in PyCharm.

Now you will need to reinstall the dependencies for your project. You may have to build some of the libraries from source, as most of the libraries will not have a Python 3.15 wheel for download.
Profiling against normal imports
It is a common joke that the first thing data scientists will do is type `import pandas as pd` and `import numpy as np`, even if they are not actually going to use them. Let's assume this is the case, and you received a script like this from your colleague:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
def main():
print("Initializing example data science project...")
# Generate some dummy data
data = {
'x': np.linspace(0, 10, 100),
'y': np.sin(np.linspace(0, 10, 100)) + np.random.normal(0, 0.1, 100)
}
# Plotting
plt.figure(figsize=(10, 6))
plt.plot(data['x'], data['y'], label='Sine Wave with Noise')
plt.title('Sample Visualization')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
# Save the plot instead of showing it (since this is non-interactive)
plt.savefig('sine_wave.png')
print("Project executed successfully. Plot saved as sine_wave.png.")
if __name__ == "__main__":
main()
As you see, PyCharm highlights the unused pandas import for you, so removing it would be straightforward. However, for our experiment here, we'll keep it.

To get a better visualization of import profiles, install a tool from PyPI called tuna.

You can profile your script by setting a custom run script with this script text:
python -X importtime main.py 2> import_log.txt; tuna import_log.txt

When you use it, a new browser window will pop up with the import graph.

As you see, importing pandas accounts for half of the time it takes to load all the modules, and we never use it!
Now let's add `lazy` to all the imports.

Don't worry about the syntax highlighting. PyCharm just doesn't recognize it yet since `lazy` is a new keyword that has not been officially released.
Let's profile the script again.

Now we see the pandas import is gone, and loading everything takes way less time.
So, if you have a script that imports a lot of large libraries, and some of them are only used in certain conditions (e.g. in if-else clauses), lazy import can save time by loading modules only when they are first used.
Checking the inner workings with lazy imports
Let's see how lazy imports are handled internally.
When a module is imported "lazily", meaning `__lazy_import__` is called instead of `__import__`, a `types.LazyImportType` proxy object will be created. The module name will then be listed in `sys.lazy_modules` instead of `sys.modules`. (See the Lazy import mechanism section in PEP 810.)
When a lazy object is used, it needs to be reified. CPython will try resolving the import at that point and replacing the proxy object with the actual module itself. In this process, `__import__` is called to resolve the import. At the same time, the module is removed from `sys.lazy_modules`.
If there's an error during reification, AKA importing the module, the lazy object is not reified or replaced. The next time the lazy module is used, the import will try again. The exception raised during reification will also show both where the lazy import was defined and where it was accessed. (See the Reification section in PEP 810.)
To experiment with it ourselves, let's add some breakpoints with `pdb` and check what's happening in the code:
import pdb pdb.set_trace() lazy import pandas as pd lazy import matplotlib.pyplot as plt lazy import numpy as np pdb.set_trace() …
And
# Generate some dummy data
data = {
'x': np.linspace(0, 10, 100),
'y': np.sin(np.linspace(0, 10, 100)) + np.random.normal(0, 0.1, 100)
}
pdb.set_trace()
# Plotting
plt.figure(figsize=(10, 6))
pdb.set_trace()
…
Now run the script in the console:
python main.py
Note that PyCharm 2026.1 does not yet support Python 3.15, so using the Run or Debug button to run a script using lazy import may result in unexpected behavior.

When it hits the first line of ` pdb.set_trace()` at the top, there should not be any module loaded in. Let's check:
(Pdb) import sys (Pdb) sys.lazy_modules

As expected, none of our libraries - pandas, numpy, and matplotlib - are listed.
Now, let's continue running the program and let it stop at the next breakpoint. In the console, type `continue` and once it stops, we can check by typing `sys.lazy_modules` again:

Here, we see that all of our modules are in `lazy_modules`. Let's check whether pandas is in `sys.modules`:
(Pdb) 'pandas' in sys.modules

Nope, it's not. You can try with numpy and matplotlib, and you will see that neither of those is in `sys.module`.
Now let's type `continue` again and reach the next breakpoint, which occurs after numpy is used. Check `sys.lazy_module` again, and you'll see that numpy is no longer on the list. When we check whether it is in `sys.module`, we get `True` this time.

However, pandas and matplotlib are still not in `sys.modules`.

When you check the next breakpoint, you'll see that matplotlib is similarly removed from `sys.lazy_modules` and added to `sys.modules` after it is used.

Trying it yourself with PyCharm
Download the latest version of PyCharm to experiment with Python 3.15.0b1 and experience firsthand how explicit lazy imports can optimize your application's performance by significantly reducing startup latency and memory consumption.
25 Jun 2026 8:21am GMT
24 Jun 2026
Django community aggregator: Community blog posts
Supporting Django's Next Chapter
The path to hiring an Executive Director gained real momentum at DjangoCon US 2024, when Jacob Kaplan-Moss shared a vision for what dedicated resources could mean for the future of Django. In his blog post If We Had $1,000,000, he invited companies and supporters to help get the initiative off the ground. The response from the community was inspiring, and we're proud to see that vision become reality.
24 Jun 2026 7:00pm GMT
Wagtail as Django admin on steroids
Many of you have probably heard of Wagtail CMS, but not everyone knows that Wagtail, in a nutshell, is a supercharged admin backend for Django. At least that's how I see it, and how I often pitch it to fellow Django developers.
Django comes with its own django.contrib.admin …
24 Jun 2026 9:38am GMT
23 Jun 2026
Planet Twisted
Glyph Lefkowitz: Adversarial Communication
As I have discussed in previous posts, "AIs" can make mistakes. In fact, they do make mistakes, and their mistake-making patterns are such that where and how they will make mistakes is both uncertain and constantly changing.
Thus, in any scenario where you want to attempt to make "productive" use of "AI", you must have a system in place for checking every result. Not checking some results; checking every result. If each result might have a consequence for you (and if it didn't have a consequence, why bother automating it?) and you cannot predict in advance which kinds of results will need verification, then verification is always required.
The verification often ends up being just as expensive as doing the work in the first place, which means that if you want your usage of "AI" to be personally profitable, you have to find someone else to externalize the cost of verification onto. This person becomes your adversary, and, if you are successful, your "AI's" victim.
The Ladder-Climber And Their Reverse-Centaur Rungs
One way that this constellation of facts can straightforwardly assemble themselves into a dystopian nightmare is the phenomenon, described by Cory Doctorow, of the reverse centaur. This is when your employer non-consensually turns you into the verification system. The "AI" does the fun part of initially performing the work, and then you do the boring part where you check if the robot is right and clean up its messes, even if everyone already knows that it would, in aggregate, be cheaper for you to do the work in the first place.
Reverse centaurs can be made from any automation, not only "AI" automation. I think that there is a reason that this term happens to have emerged in the "age of AI", though, and not with earlier automation technologies (even those which were considerably more viscerally horrific). That reason is: the wrongness of "AI" output is not merely a technical feature that must be compensated for, it is a generalized externality.
As I mentioned above, if you are responsible for the entirety of the work, both extruding the "AI" output and checking it, it's usually cheaper to have humans do the entirety of the work to begin with. When humans do the writing directly, we can check as we go, and thus verification doesn't need to be as comprehensive.
When "AI" coding advocates say "code review is the bottleneck", what they are observing is that the LLM is still rolling the dice for each PR, and a human is still necessary to verify that each of those rolls is a winner. But calling this process "code review" is a bit of a misnomer; it's not really "code review" in the traditional sense, it's human understanding.
Before the advent of "AI", the human understanding was implicit in the process of writing the code in the first place1, and the code review was a way of diffusing and extending that understanding. Now that the code can be authored with no initial understanding taking place, that cost has not gone away, it has moved.
Human understanding was always the bottleneck.
However, this is taking a collaborative view of a software project, where satisfying the needs and solving the problems of your customers are the goals. We can see that "AI" is a bad tool to satisfy those goals, because all it's doing is converting the first half of the work, that of understanding the code as you write it, to understanding the agent's output as you read it.
What if, instead, we were to take the view that every software company is a Hobbesian nightmare, red in tooth and claw? In this view, the only goal of a software project is for the individual developers to make their promo cycles and get their bonuses. Given that there is only a certain amount of money to go around, this is a zero-sum game where each programmer wants to look more productive than their colleagues.
Pretty much every organization finds it easy to reward "productivity" as expressed by lines of code emitted, but the benefits of doing thorough and thoughtful design, analysis, and code review very difficult to reward. In this world, an LLM is an invaluable tool for the sociopathic ladder-climber, particularly if your legacy organization is still structuring their workflows as if the person prompting the bot is "writing" the code, and then they get to foist off the act of "reviewing" the code onto someone else.
Here, the prompter effectively externalizes the cost of the LLM's failures but internalizes any benefits. The prompter will vibe-code a big feature, so large that the assigned reviewer can't possibly comprehend it all effectively. When this happens, the reviewer will, eventually, be pressured to approve it, even if they can try to spot a few problems along the way. The reviewer has their own work to get back to, after all, the obligation to review the prompter's (read: the bot's) code is a drain on their time that they are not going to get rewarded for.
If this feature is a big success, the prompter gets a promotion. If it causes a big issue, well, the reviewer must not have been careful enough.
This is why LLMs are "good for coding", and also why their biggest promoters keep having outages.
The Generative Gish Galloper
Coding is the biggest "success story" of this type of adversarial communication, but it is by far not the only instance of such a thing. LLMs create a new form of leverage that can turn Brandolini's law from a linear advantage into an exponential one. If you are engaged in a political debate where you want to overwhelm the other side in nonsense, an LLM can generate bullshit faster than it is physically possible for a human being to type, let alone respond thoughtfully. There is an asymmetry to the utility of this weapon as well: only one side of the political spectrum wants to flood the zone and destroy trust in institutions and the concept of truth. There's a good reason that the fascists love it.
Straightforward Spam and Fraud
This is kind of obvious, but LLMs can generate lightly-customized, plausible-looking text much more quickly than any human being. This facilitates their use in fraud, spam, and scams. In a spamming or fraudulent interaction, once again, the costs are externalized onto the victim: the recipient of a spam message has to do all the work of "checking" the LLM's output. Spammers already expect very low hit rates from boilerplate, and if the LLM can increase those percentages from 1% to 5% the technology will pay for itself; they don't need anything like reliable accuracy.
Customer "Support"
If you have any kind of commercial relationship with a company, I probably don't even need to mention this: customer "support" bots are a misery. Everybody knows it at this point. But customer support is usually conceptualized by businesses as an adversarial interaction, because it is a cost center. They maintain internal metrics on time-to-resolution and try to optimize them. Implicitly, this creates a dynamic where the goal of the customer service agent's job is not to solve your problem, but to emit noise that will cause you to think your problem is resolved, or to give up, as fast as possible. Unsurprisingly, LLMs can emit this noise faster than humans can, getting those customers off the phone. But those customers will remember those interactions, and the story outside the TTR metrics is horrible.
Similarly to the situation in software development, LLMs can look very good on paper for customer support, but mostly what they are doing is illuminating the problems with the industry's existing metrics, by turning "winning the metrics battle against the customer" into a more obvious and immediate defeat for the company's long term reputation.
"Education"
In 2026 it is sadly a fact of life that students cheat all the time using "AI", and that this cheating is very successful, in that the teachers find it very hard to detect.
LLMs are great for cheating on schoolwork because the student is externalizing the work of the checking onto the teachers, who are often starting at a disadvantage to begin with, at least in the US.
My view is that this is happening because of a divergence in the way that students vs. teachers (or, more accurately, "the broader educational system") view grading.
When a student is asked to write an essay, the teachers see the effort as both intrinsically worthwhile for the student, as well as useful as a pedagogical tool to evaluate and react to the student's progress. The student, by contrast, sees a stumbling block designed to knock them off the path to success and into a permanent underclass. It is no wonder that the student sees "AI" as useful to their own goals and has no compunction about deploying it.
There is a bitter irony that the ability to understand the inherent value of actually writing the essay on their own is the sort of thing that students can really only learn by writing a bunch of essays. There's no way that I can think of which makes the benefit legible as long as a shortcut is available.
The net effect here is a downward spiral, where the already-wobbling educational system is sustaining an attack that it doesn't have the resources to recover from. The individual students' attacks against their teachers and their schools' grading systems might appear to momentarily succeed, but they will win the battle and lose the war.
Spamming "For Good"?
Usually when we talk about someone unilaterally choosing to enter into an adversarial relationship, that's an "attack" and for good reasons we have a negative impression of the attacker. However, I would be remiss if I did not point out that there are some cases where the relationship was already adversarial; just because you're the attacker doesn't mean that you are evil.
For example we might imagine use-cases like automatically filing appeals for prior authorizations against health insurance. It's relatively well-known at this point that the main way for-profit insurers maintain their margins is by denying claims right up to the line of the policies themselves being fraud, so using a spamming tool to fight them might be entirely justifiable2 in that case.
Similarly, using an LLM could be justified in a fight against a company refusing to honor a warranty. One could imagine using an LLM to immediately generate replies and escalations.
However, even in imagined cases like these, the underlying problem is that the insurers and the vendors already have a tremendous amount of structural power, so it is more likely that they will have the advantage in deploying a communications weapon like an LLM, as well as enacting policies to simply ignore any LLM-based communication that you might submit. Worse, if these strategies were to become widespread, they might provide an excuse to reject any communications by feeding them into an unreliable "LLM detector" and issuing an automated "computer says no" even to hand-written correspondence.
It is also worth stressing that these cases are imagined, as compared to the very real coworker-abuse, spam, scam, fraud, and disinformation campaigns being waged in real life today.
Therefore, while legitimate uses might exist, it's hard to imagine that there's anywhere they would be genuinely valuable and sustainable. In the best case "AI" will provide a temporary advantage for underdogs that will provoke an arms race which the resource-advantaged adversaries will win in the long run, in the worst case the arms race itself will cement permanent structural change that will make things worse.
"Search" By Stealing
Most of the adversarial utility of "AI" is on the "write" side, since write-amplification is more obviously aggressive than reading. But the "read" side of LLMs - summarization and question-answering - can be a form of attack as well.
To begin with, the act of reading itself is currently enormously destructive, but that's arguably not a fundamental aspect of this technology. They could set reasonable rate-limits and respect things like robots.txt, as search engines have for decades now. They could also refrain from committing criminal levels of copyright infringement. But, today, using "AI" tools does suborn this sort of out-of-control crawling.
More insidiously, consider the scenario described in this YouTube video. The LTT Bros decided to try Linux again, and in the course of so doing, they had problems. When trying to solve these problems, they were faced with a choice: they could consult Reddit, or they could ask an LLM. Asking an LLM would "gaslight the heck out of" them, but they still found it preferable, because they would at least get an answer without getting yelled at.
Initially this sounds great. But it also means that you want to extract knowledge from a community, while mechanically eliding any values or norms that the community may want to impart as part of offering that knowledge. As someone who spent many years in a community tech support role, this is worrying. Many requests for support are people asking how to do things that will momentarily solve a superficial problem but create a long-term reliability problem or even an immediate security risk, that the question-asker doesn't want to hear about. Consider the question "I'm tired of entering my password so much, how do I make it so my laptop unlocks automatically". An obsequious chatbot will helpfully tell you how to do this without pushback.
But, this is also a sort of ethically murky area. The Linux community is somewhat famously, for many years now, a toxic cesspool of general hostility, misogyny, etc. It is certainly a good thing that people can get access to this knowledge without subjecting themselves to abuse. But it also means that the people with the power and the privilege to change the community for the better can just quietly withdraw, rather than fixing the problems. It also means that the positive elements of culture cannot be transmitted, and people will have no opportunity to learn about unknown unknowns.
In this case, the "adversarial" communication is with society. The thing that using an LLM for search lets you do is withdraw from society and avoid forming any personal connections. There are some personal connections which are painful and annoying, and so that can feel like a momentary balm. But the need to make connections in general is, like, the concept of society itself.
Who Am I Hurting?
LLMs are good at adversarial communication. They are so good at it, relative to their other benefits, that they will tend to make communications adversarial if you are not remaining vigilant about the possibility that it might do so. My request to you, dear reader, if you are going to use such tools, is to always ask yourself, "who might I be hurting, if I use an LLM for this?"
If you're using an "AI", who is its adversary? If you haven't given it one yet, who might the "AI" turn into an adversary? Who might you overwhelm with an asymmetric amount of output, or, if you're receiving information and not sending it, who are you taking that information from without consulting?
Figure out the answers to these questions and conduct yourself accordingly; the answer might be "yourself".
Acknowledgments
Thank you to my patrons who are supporting my writing on this blog. If you like what you've read here and you'd like to read more of it, or you'd like to support my various open-source endeavors, you can support my work as a sponsor!
-
One of the reasons that software developers tend to prefer greenfield development is that when you are given a blank page, you can project your own specific understanding onto it. You can structure the codebase in a way that works for your brain, down to the variable naming conventions and the module layouts. LLM-assisted development makes everything into instant brownfield work, which makes developers instantly miserable; even those who are excited about the technology will frequently complain about how it feels like their agency has been stolen and their joy in the work has been diminished. But I digress. ↩
-
Modulo the massive amount of other externalities involved in using LLMs, of course, but I don't have the time or energy to get into those here. ↩
23 Jun 2026 8:06pm GMT
Django community aggregator: Community blog posts
Boolean algebra
The third article in the series, still on conditions. The previous installment was about their shape - merging ifs, factoring shared decisions, dropping checks that earn nothing. This one reaches for the other lever: the algebra of the conditions themselves - not a textbook tour, just the handful of transformations I lean on in everyday code.

23 Jun 2026 12:00pm GMT
09 Jun 2026
Planet Twisted
Hynek Schlawack: How to Ditch Codecov for Python Projects
Codecov's unreliability breaking CI on my open source projects has been a constant source of frustration for me for years. I have found a way to enforce coverage over a whole GitHub Actions build matrix that doesn't rely on third-party services.
09 Jun 2026 12:00am GMT
22 May 2026
Planet Twisted
Glyph Lefkowitz: Opaque Types in Python
Let's say you're writing a Python library.
In this library, you have some collection of state that represents "options" or "configuration" for a bunch of operations. Such a set of options is a bundle of potentially ever-increasing complexity. Thus, you will want it to have an extremely minimal compatibility surface, with a very carefully chosen public interface, that is either small, or perhaps nothing at all. Such an object conveys state and might have some private behavior, but all you want consumers to be able to do is build it in very constrained, specific ways, and then pass it along as a parameter to your own APIs.
By way of example, imagine that you're wrapping a library that handles shipping physical packages.
There are a zillion ways to do it ship a package. There are different carriers who can ship it for you. There's air freight, and ground freight, and sea freight. There's overnight shipping. There's the option to require a signature. There's package tracking and certified mail. Suffice it to say, lots of stuff.
If you are starting out to implement such a library, you might need an object called something like ShippingOptions that encapsulates some of this. At the core of your library you might have a function like this:
1 2 3 4 5 |
|
If you are starting out implementing such a library, you know that you're going to get the initial implementation of ShippingOptions wrong; or, at the very least, if not "wrong", then "incomplete". You should not want to commit to an expansive public API with a ton of different attributes until you really understand the problem domain pretty well.
Yet, ShippingOptions is absolutely vital to the rest of your library. You'll need to construct it and pass it to various methods like estimateShippingCost and shipPackage. So you're not going to want a ton of complexity and churn as you evolve it to be more complex.
Worse yet, this object has to hold a ton of state. It's got attributes, maybe even quite complex internal attributes that relate to different shipping services.
Right now, today, you need to add something so you can have "no rush", "standard" and "expedited" options. You can't just put off implementing that indefinitely until you can come up with the perfect shape. What to do?
The tool you want here is the opaque data type design pattern. C is lousy with such things (FILE, pthread_*_t, fd_set, etc). A typedef in a header file can easily achieve this.
But in Python, if you expose a dataclass - or any class, really - even if you keep all your fields private, the constructor is still, inherently, public. You can make it raise an exception or something, but your type checker still won't help your users; it'll still look like it's a normal class.
Luckily, Python typing provides a tool for this: typing.NewType.
Let's review our requirements:
- We need a type that our client code can use in its type annotations; it needs to be public.
- They need to be able to consruct it somehow, even if they shouldn't be able to see its attributes or its internal constructor arguments.
- To express high-level things (like "ship fast") that should stay supported as we add more nuanced and complex configurations in the future (like "ship with the fastest possible option provided by the lowest-cost carrier that supports signature verification").
In order to solve these problems respectively, we will use:
- a public
NewType, which gives us our public name... - which wraps a private class with entirely private attributes, to give us an actual data structure, while not exposing the constructor,
- a set of public constructor functions, which returns our
NewType.
When we put that all together, it looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
As a snapshot in time, this is not all that interesting; we could have just exposed _RealShipOpts as a public class and saved ourselves some time. The fact that this exposes a constructor that takes a string is not a big deal for the present moment. For an initial quick and dirty implementation, we can just do checks like if options._speed == "fast" in our shipping and estimation code.
However, the main thing we are doing here is preserving our flexibility to evolve the related APIs into the future, so let's see how we might do that. For example, let's allow the shipping options to contain a concrete and specific carrier and freight method:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
|
As a NewType, our public ShippingOptions type doesn't have a constructor. Since _RealShipOpts is private, and all its attributes are private, we can completely remove the old versions.
Anything within our shipping library can still access the private variables on ShippingOptions; as a NewType, it's the same type as its base at runtime, so it presents minimal1 overhead.
Clients outside our shipping library can still call all of our public constructors: shipFast, shipNormal, and shipSlow all still work with the same (as far as calling code knows) signature and behavior.
If you need to build and convey some state within your public API, while avoiding breakages associated with compatibility churn, hopefully this technique can help you do that!
Acknowledgments
Thanks for reading, and thank you to my patrons who are supporting my writing on this blog. If you like what you've read here and you'd like to read more of it, or you'd like to support my various open-source endeavors, you can support my work as a sponsor.
-
The overhead is minimal, but it is not completely zero. The suggested idiom for converting to a
NewTypeis to call it like a function, as I've done in these examples, but if you are wanting to use this pattern inside of a hot loop, you can use# type: ignore[return-value]comments to avoid that small cost. ↩
22 May 2026 12:33am GMT
