15 Nov 2025
Planet Python
Kay Hayen: Nuitka Release 2.8
This is to inform you about the new stable release of Nuitka. It is the extremely compatible Python compiler, "download now".
This release adds a ton of new features and corrections.
Bug Fixes
- Standalone: For the "Python Build Standalone" flavor, ensured that debug builds correctly recognize all their specific built-in modules, preventing potential errors. (Fixed in 2.7.2 already.)
- Linux: Fixed a crash when attempting to modify the RPATH of statically linked executables (e.g., from imageio-ffmpeg). (Fixed in 2.7.2 already.)
- Anaconda: Updated PySide2 support to correctly handle path changes in newer Conda packages and improved path normalization for robustness. (Fixed in 2.7.2 already.)
- macOS: Corrected handling of QtWebKit framework resources. Previous special handling was removed as symlinking is now the default, which also resolved an issue of file duplication. (Fixed in 2.7.2 already.)
- Debugging: Resolved an issue in debug builds where an incorrect assertion was made during the addition of distribution metadata. (Fixed in 2.7.1 already.)
- Module: Corrected an issue preventing stubgen from functioning with Python versions earlier than 3.6. (Fixed in 2.7.1 already.)
- UI: Prevented Nuitka from crashing when --include-module was used with a built-in module. (Fixed in 2.7.1 already.)
- Module: Addressed a compatibility issue where the code mode for the constants blob failed with the C++ fallback. This fallback is utilized on very old GCC versions (e.g., the default on CentOS 7), which are generally not recommended. (Fixed in 2.7.1 already.)
- Standalone: Resolved an assertion error that could occur in certain Python setups due to extension module suffix ordering. The issue involved incorrect calculation of the derived module name when the wrong suffix was applied (e.g., using .so to derive a module name like gdbmmodule instead of just gdbm). This was observed with Python 2 on CentOS 7 but could potentially affect other versions with unconventional extension module configurations. (Fixed in 2.7.1 already.)
- Python 3.12.0: Corrected the usage of an internal structure identifier that is only available in Python 3.12.1 and later versions. (Fixed in 2.7.1 already.)
- Plugins: Prevented crashes in Python setups where importing pkg_resources results in a PermissionError. This typically occurs in broken installations, for instance where some packages are installed with root privileges. (Fixed in 2.7.1 already.)
- macOS: Implemented a workaround for data file names that previously could not be signed within app bundles. The attempt in release 2.7 to sign these files inadvertently caused a regression for cases involving illegal filenames. (Fixed in 2.7.1 already.)
- Python 2.6: Addressed an issue where staticmethod objects lacked the __func__ attribute. Nuitka now tracks the original function as a distinct value. (Fixed in 2.7.1 already.)
- Corrected behavior for orderedset implementations that lack a union method, ensuring Nuitka does not attempt to use it. (Fixed in 2.7.1 already.)
- Python 2.6: Ensured compatibility for setups where the _PyObject_GC_IS_TRACKED macro is unavailable. This macro is now used beyond assertions, necessitating support outside of debug mode. (Fixed in 2.7.1 already.)
- Python 2.6: Resolved an issue caused by the absence of sys.version_info.releaselevel by utilizing a numeric index instead and adding a new helper function to access it. (Fixed in 2.7.1 already.)
- Module: Corrected the __compiled__.main value to accurately reflect the package in which a module is loaded; this was not the case for Python versions prior to 3.12. (Fixed in 2.7.1 already.)
- Plugins: Further improved the dill-compat plugin by preventing assertions related to empty annotations and by removing hard-coded module names for greater flexibility. (Fixed in 2.7.1 already.)
- Windows: For onefile mode using DLL mode, ensure all necessary environment variables are correctly set for QtWebEngine. Previously, default Qt paths could point incorrectly near the onefile binary. (Fixed in 2.7.3 already.)
- PySide6: Fixed an issue with PySide6 where slots defined in base classes might not be correctly handled, leading to them only working for the first class that used them. (Fixed in 2.7.3 already.)
- Plugins: Enhanced Qt binding plugin support by checking for module presence without strictly requiring metadata. This improves compatibility with environments like Homebrew or uv where package metadata might be absent. (Fixed in 2.7.3 already.)
- macOS: Ensured the apple target is specified during linking to prevent potential linker warnings about using an unknown target in certain configurations. (Fixed in 2.7.3 already.)
- macOS: Disabled the use of static libpython with pyenv installations, as this configuration is currently broken. (Fixed in 2.7.3 already.)
- macOS: Improved error handling for the --macos-app-protected-resource option by catching cases where a description is not provided. (Fixed in 2.7.3 already.)
- Plugins: Enhanced workarounds for PySide6, now also covering single-shot timer callbacks. (Fixed in 2.7.4 already.)
- Plugins: Ensured that the Qt binding module is included when using accelerated mode with Qt bindings. (Fixed in 2.7.4 already.)
- macOS: Avoided signing through symlinks and minimized their use to prevent potential issues, especially during code signing of application bundles. (Fixed in 2.7.4 already.)
- Windows: Implemented path shortening for paths used in onefile DLL mode to prevent issues with long or Unicode paths. This also benefits module mode. (Fixed in 2.7.4 already.)
- UI: The options nanny plugin no longer uses a deprecated option for macOS app bundles, preventing potential warnings or issues. (Fixed in 2.7.4 already.)
- Plugins: Ensured the correct macOS target architecture is used. This is particularly useful for PySide2 with universal CPython binaries, to prevent compile-time crashes, e.g. when cross-compiling for a different architecture. (Fixed in 2.7.4 already.)
- UI: Fixed a crash that occurred on macOS if the ccache download was rejected by the user. (Fixed in 2.7.4 already.)
- UI: Improved the warning message related to macOS application icons for better clarity. (Added in 2.7.4 already.)
- Standalone: Corrected an issue with QML plugins on macOS when using newer PySide6 versions. (Fixed in 2.7.4 already.)
- Python 3.10+: Fixed a memory leak where the matched value in pattern matching constructs was not being released. (Fixed in 2.7.4 already.)
- Python3: Fixed an issue where exception exits for larger range objects, which are not optimized away, were not correctly annotated by the compiler. (Fixed in 2.7.4 already.)
- Windows: Corrected an issue with the automatic use of icons for PySide6 applications on non-Windows platforms if Windows icon options were used. (Fixed in 2.7.4 already.)
- Onefile: When using DLL mode, there was a load error for the DLL with MSVC 14.2 or earlier, even though older MSVC versions are meant to be supported. (Fixed in 2.7.5 already.)
- Onefile: Fix, the splash screen was showing in DLL mode twice or more; these extra copies couldn't be stopped. (Fixed in 2.7.5 already.)
- Standalone: Fixed an issue where data files were no longer checked for conflicts with included DLLs. The order of data file and DLL copying was restored, and macOS app signing was made a separate step to remove the order dependency. (Fixed in 2.7.6 already.)
- macOS: Corrected our workaround using symlinks for files that cannot be signed when --output-directory was used, as it made incorrect assumptions about the dist folder path. (Fixed in 2.7.6 already.)
- UI: Prevented checks on onefile target specifications when not actually compiling in onefile mode, e.g. on macOS with --mode=app. (Fixed in 2.7.6 already.)
- UI: Improved error messages for data directory options by including the relevant part in the output. (Fixed in 2.7.6 already.)
- Plugins: Suppressed UserWarning messages from the pkg_resources module during compilation. (Fixed in 2.7.6 already.)
- Python3.11+: Fixed an issue where descriptors for compiled methods were incorrectly exposed for Python 3.11 and 3.12. (Fixed in 2.7.7 already.)
- Plugins: Avoided loading modules when checking for data file existence. This prevents unnecessary module loading and potential crashes in broken installations. (Fixed in 2.7.9 already.)
- Plugins: The global_change_function anti-bloat feature now operates on what should be the qualified names (__qualname__) instead of just function names, preventing incorrect replacements of methods with the same name in different classes. (Fixed in 2.7.9 already.)
- Onefile: The containing_dir attribute of the __compiled__ object was regressed in DLL mode on Windows, pointing to the temporary DLL directory instead of the directory containing the onefile binary. (Fixed in 2.7.10 already; note that the solution in 2.7.9 had a regression.)
- Compatibility: Fixed a crash that occurred when an import attempted to go outside its package boundaries. (Fixed in 2.7.11 already.)
- macOS: Ignored a warning from codesign when using self-signed certificates. (Fixed in 2.7.11 already.)
- Onefile: Fixed an issue in DLL mode where environment variables from other onefile processes (related to temporary paths and process IDs) were not being ignored, which could lead to conflicts. (Fixed in 2.7.12 already.)
- Compatibility: Fixed a potential crash that could occur when processing an empty code body. (Fixed in 2.7.13 already.)
- Plugins: Ensured that DLL directories created by plugins could be at the top level when necessary, improving flexibility. (Fixed in 2.7.13 already.)
- Onefile: On Windows, corrected an issue in DLL mode where original_argv0 was None; it is now properly set. (Fixed in 2.7.13 already.)
- macOS: Avoided a warning that appeared on newer macOS versions. (Fixed in 2.7.13 already.)
- macOS: Allowed another DLL to be missing for PySide6 to support more setups. (Fixed in 2.7.13 already.)
- Standalone: Corrected the existing import workaround for Python 3.12 that was incorrectly renaming existing modules of matching names into sub-modules of the currently imported module. (Fixed in 2.7.14 already.)
- Standalone: On Windows, ensured that the DLL search path correctly uses the proper DLL directory. (Fixed in 2.7.14 already.)
- Python 3.5+: Fixed a memory leak where the called object could be leaked in calls with keyword arguments following a star dict argument. (Fixed in 2.7.14 already.)
- Python 3.13: Fixed an issue where PyState_FindModule was not working correctly with extension modules due to sub-interpreter changes. (Fixed in 2.7.14 already.)
- Onefile: Corrected an issue where the process ID (PID) was not set in a timely manner, which could affect onefile operations. (Fixed in 2.7.14 already.)
- Compatibility: Fixed a crash that could occur when a function with both a star-list argument and keyword-only arguments was called without any arguments. (Fixed in 2.7.16 already.)
- Standalone: Corrected an issue where distribution names were not checked case-insensitively, which could lead to metadata not being included. (Fixed in 2.7.16 already.)
- Linux: Avoid using the full zlib with extern declarations, and instead only use the CRC32 functions we need. Otherwise conflicts with OS headers could occur.
- Standalone: Fixed an issue where scanning for standard library dependencies was unnecessarily performed.
- Plugins: Made the runtime query code robust against modules that print to stdout during import. This affected at least toga, giving some warnings on Windows with mere stdout prints. We now have a marker for the start of our output that we look for, and we safely ignore anything before it.
- Windows: Do not attempt to attach to the console when running in DLL mode. For onefile with DLL mode, this was unnecessary as the bootstrap already handles it, and for pure DLL mode, it is not desired.
- Onefile: Removed unnecessary parent process monitoring in onefile mode, as there is no child process launched.
- Anaconda: Determine version and project name for conda packages more reliably. It seems Anaconda gives variables in package metadata and often no project name, so we derive them from the conda files and their metadata in those cases.
- macOS: Make sure the SSL certificates are found when downloading on macOS, ensuring successful downloads.
- Windows: Fixed an issue where console mode attach was not working in onefile DLL mode.
- Scons: Fixed an issue where pragma was used with older gcc, which can give warnings about it. This fixes building on older OSes with the system gcc.
- Compatibility: Fix, need to avoid using filenames with more than 250 chars for long module names.
  - For cache files, const files, and C files, we need to make sure we don't exceed the 255-char limit per path element that literally every OS has.
  - Also enhanced the check code for legal paths to cover this, so user options are covered against these errors too.
  - Moved file hashing to file operations, where it makes more sense, to allow module names to use hashing to provide a legal filename to refer to themselves.
- Compatibility: Fixed an issue where walking included compiled packages through the Nuitka loader could produce incorrect names in some cases.
- Windows: Fixed wrong calls made when checking stderr properties during launch if it was None.
- Debugging: Fixed an issue where the segfault non-deployment handler disabled itself before doing anything else.
- Plugins: Fix, the warning to choose a GUI plugin for matplotlib was still given with the tk-inter plugin enabled, which is of course not appropriate.
- Distutils: Fix, do not recreate the build folder with a .gitignore file. We were re-creating it as soon as we looked at what it would be; now it's created only when asking for that to happen.
- No-GIL: Addressed compile errors for the free-threaded dictionary implementation that were introduced by necessary hot-fixes in version 2.7.
- Compatibility: Fixed handling of generic classes and generic type declarations in Python 3.12.
- macOS: Fixed an issue where entitlements were not properly provided for code signing.
- Onefile: Fixed delayed shutdown for terminal applications in onefile DLL mode. It was waiting for unused child processes, which don't exist, and then for the timeout of that operation, which always happened on CTRL-C or terminal shutdown.
- Python3.13: Fix, it seems interpreter frames with None code objects exist and need to be handled as well.
- Standalone: Fix, need to allow the setuptools package to be user-provided.
- Windows: Avoided using non-encodable dist and build folder names. Some paths don't become short but are still non-encodable by the file system for tools. In these cases, temporary filenames are used to avoid errors from C compilers and other tools.
- Python3.13: Fix, ignore the stdlib cgi module that might be left over from previous installs. The module was removed during development, and if you install a newer Python over an old alpha version of 3.13, Nuitka would crash on it.
- macOS: Allowed the lib folder for the Python Build Standalone flavor, improving compatibility.
- macOS: Allowed libraries for rpath resolution to be found in all Homebrew folders and not just lib.
- Onefile: Need to allow .. in paths to allow outside installation paths.
Package Support
- Standalone: Introduced support for the nicegui package. (Added in 2.7.1 already.)
- Standalone: Extended support to include xgboost.core on macOS. (Added in 2.7.1 already.)
- Standalone: Added needed data files for the ursina package. (Added in 2.7.1 already.)
- Standalone: Added support for newer versions of the pydantic package. (Added in 2.7.4 already.)
- Standalone: Extended libonnxruntime support to macOS, enabling its use in compiled applications on this platform. (Added in 2.7.4 already.)
- Standalone: Added necessary data files for the pygameextra package. (Added in 2.7.4 already.)
- Standalone: Included GL backends for the arcade package. (Added in 2.7.4 already.)
- Standalone: Added more data directories for the ursina and panda3d packages, improving their out-of-the-box compatibility. (Added in 2.7.4 already.)
- Standalone: Added support for the newer skimage package. (Added in 2.7.5 already.)
- Standalone: Added support for the PyTaskbar package. (Added in 2.7.6 already.)
- macOS: Added tk-inter support for Python 3.13 with official CPython builds, which now use framework files for Tcl/Tk. (Added in 2.7.6 already.)
- Standalone: Added support for the paddlex package. (Added in 2.7.6 already.)
- Standalone: Added support for the jinxed package, which dynamically loads terminal information. (Added in 2.7.6 already.)
- Windows: Added support for the ansicon package by including a missing DLL. (Added in 2.7.6 already.)
- macOS: Enhanced configuration for the pypylon package; however, it's not sufficient. (Added in 2.7.6 already.)
- Standalone: Added support for newer numpy versions. (Added in 2.7.7 already.)
- Standalone: Added support for the older vtk package. (Added in 2.7.8 already.)
- Standalone: Added support for newer certifi versions that use importlib.resources. (Added in 2.7.9 already.)
- Standalone: Added support for the reportlab.graphics.barcode module. (Added in 2.7.9 already.)
- Standalone: Added support for newer versions of the transformers package. (Added in 2.7.11 already.)
- Standalone: Added support for newer versions of the sklearn package. (Added in 2.7.12 already.)
- Standalone: Added support for newer versions of the scipy package. (Added in 2.7.12 already.)
- Standalone: Added support for older versions of the cv2 package (specifically version 4.4). (Added in 2.7.12 already.)
- Standalone: Added initial support for the vllm package. (Added in 2.7.12 already.)
- Standalone: Ensured all necessary DLLs for the pygame package are included. (Added in 2.7.12 already.)
- Standalone: Added support for newer versions of the zaber_motion package. (Added in 2.7.13 already.)
- Standalone: Added missing dependencies for the pymediainfo package. (Added in 2.7.13 already.)
- Standalone: Added support for newer versions of the sklearn package by including a missing dependency. (Added in 2.7.13 already.)
- Standalone: Added support for newer versions of the toga package. (Added in 2.7.14 already.)
- Standalone: Added support for the wordninja-enhanced package. (Added in 2.7.14 already.)
- Standalone: Added support for the Fast-SSIM package. (Added in 2.7.14 already.)
- Standalone: Added a missing data file for the rfc3987_syntax package. (Added in 2.7.14 already.)
- Standalone: Added missing data files for the trimesh package. (Added in 2.7.15 already.)
- Standalone: Added support for the gdsfactory, klayout, and kfactory packages. (Added in 2.7.15 already.)
- Standalone: Added support for the vllm package. (Added in 2.7.16 already.)
- Standalone: Added support for newer versions of the tkinterweb package. (Added in 2.7.15 already.)
- Standalone: Added support for newer versions of the cmsis_pack_manager package. (Added in 2.7.15 already.)
- Standalone: Added missing data files for the idlelib package. (Added in 2.7.15 already.)
- Standalone: Avoid including the debug binary on non-Windows for Qt WebKit.
- Standalone: Add dependencies for the pymediainfo package.
- Standalone: Added support for the winpty package.
- Standalone: Added support for newer versions of the gi package.
- Standalone: Added support for newer versions of the litellm package.
- Standalone: Added support for the traits and pyface packages.
- Standalone: Added support for newer versions of the transformers package.
- Standalone: Added data files for the rasterio package.
- Standalone: Added support for the ortools package.
- Standalone: Added support for the newer vtk package.
New Features
- Python3.14: Added experimental support for Python 3.14; not recommended for use yet, as this is very fresh and might be missing a lot of fixes.
- Release: Added an extra dependency group for the Nuitka build backend, intended for use in pyproject.toml and other build-system dependencies. To use it, depend on Nuitka[build-wheel] instead of Nuitka. (Added in 2.7.7 already.) For release we also added Nuitka[onefile], Nuitka[standalone], and Nuitka[app] as extra dependency groups. If icon conversions are used, e.g. Nuitka[onefile,icon-conversion] adds the necessary packages for that. If you don't care about what's being pulled in, Nuitka[all] can be used; by default Nuitka only comes with the bare minimum needed and will inform about missing packages.
- macOS: Added --macos-sign-keyring-filename and --macos-sign-keyring-password to automatically unlock a keyring for use during signing. This is very useful for CI where no UI prompt can be used.
- Windows: Detect when input cannot be used because there is no console, or the console does not provide proper standard input, and produce a dialog for entry instead. Shells like cmd.exe execute inputs as commands entered when attaching to them. With this, the user is informed to make the input into the dialog instead. In case of no terminal, this just brings up the dialog for GUI mode.
- Plugins: Introduced global_change_function to the anti-bloat engine, allowing function replacements across all sub-modules of a package at once. (Added in 2.7.6 already.)
- Reports: For Python 3.13+, the compilation report now includes information on GIL usage. (Added in 2.7.7 already.)
- macOS: Added an option to prevent an application from running in multiple instances. (Added in 2.7.7 already.)
- AIX: Added support for this OS as well; standalone and module mode now work there too.
- Scons: When a C compilation fails due to warnings in --debug mode, recognize that and provide the proper extra options to use if you want to ignore them.
- Non-Deployment: Added a non-deployment handler to catch modules that error-exit on import while being assumed to work perfectly. This will give people an indication that the numpy module is expected to work and that maybe just the newest version is not, and we need to be told about it.
- Non-Deployment: Added a non-deployment handler for DistributionNotFound exceptions in the main program, which now points the user to the necessary metadata options.
- UI: Made --include-data-files-external the primary option for placing data files alongside the created program. This now works with standalone mode too, and is no longer onefile-specific; the name should reflect that and people can now use it more broadly.
- Plugins: Added support for multiple warnings of the same kind. The dill-compat plugin needs that as it supports multiple packages.
- Plugins: Added a detector for the dill-compat plugin that detects usages of dill, cloudpickle and ray.cloudpickle.
- Standalone: Added support for including Visual C++ runtime DLLs on Windows.
  - When MSVC (Visual Studio) is installed, we take the runtime DLLs from its folders. We cannot take the ones from the redist packages installed to system folders for license reasons.
  - Gives a warning when these DLLs would be needed but were not found.
  - We might want to add an option later to exclude them again, for size purposes, but correctness out of the box is more important for now.
- UI: Make sure the distribution name is correct for --include-distribution-metadata option values.
- Plugins: Added support for configuring re-compilation of extension modules from their source code.
  - When we have both Python code and an extension module, we previously only had a global option available on the command line.
  - This adds --recompile-extension-modules for more fine-grained choices, as it allows specifying names and patterns.
  - For zmq, we need to enforce that it is never re-compiled, as it checks at runtime whether it is compiled with Cython, so re-compilation is never possible.
- Reports: Include environment flags for the C compiler and linker picked up for the compilation. Sometimes these cause compilation errors, and this will reveal their presence.
Optimization
- Enhanced detection of raise statements that use compile-time constant values which are not actual exception instances. This improvement prevents Nuitka from crashing during code generation when encountering syntactically valid but semantically incorrect code, such as raise NotImplemented. While such code is erroneous, it should not cause a compiler crash. (Added in 2.7.1 already.)
- With unknown locals dictionary variables, trust very hard values there too.
  - With this, hard import names also optimize inside of classes.
  - This makes gcloud metadata work, which previously wasn't resolved in their code.
- macOS: Enhanced PySide2 support by removing the general requirement for onefile mode. Onefile mode is now only enforced for QtWebEngine due to its specific stability issues when not bundled this way. (Added in 2.7.4 already.)
- Scons: Added support for C23 embedding of the constants blob with ClangCL, avoiding the use of resources. Since the onefile bootstrap does not yet honor this for its payload, this feature is not yet complete but could help with size limitations in the future.
- Plugins: Overhauled the UPX plugin. It uses better compression than before, hints the user at disabling onefile compression where applicable to avoid double compression, outputs warnings for files that are not considered compressible, and checks for the upx binary sooner.
- Scons: Avoid compiling hacl code for macOS where it's not needed.
Anti-Bloat
- Improved handling of the astropy package by implementing global replacements instead of per-module ones. Similar global handling has also been applied to IPython to reduce overhead. (Added in 2.7.1 already.)
- Avoid docutils usage in the markdown2 package. (Added in 2.7.1 already.)
- Reduced compiled size by avoiding the use of docutils within the markdown2 package. (Added in 2.7.1 already.)
- Avoid including the testing framework from the langsmith package. (Added in 2.7.6 already.)
- Avoid including setuptools from jax.version. (Added in 2.7.6 already.)
- Avoid including unittest from the reportlab package. (Added in 2.7.6 already.)
- Avoid including IPython for the keras package using a more global approach. (Added in 2.7.11 already.)
- Avoid including the triton package when compiling transformers. (Added in 2.7.11 already.)
- Avoid a bloat warning for an optional import in the seaborn package. (Added in 2.7.13 already.)
- Avoid compiling generated google.protobuf.*_pb2 files. (Added in 2.7.7 already.)
- Avoid including triton and setuptools when using the xformers package. (Added in 2.7.16 already.)
- Refined dask support to not remove pandas.testing when pytest usage is allowed. (Added in 2.7.16 already.)
- Avoid compiling the tensorflow module that is very slow and contains generated code.
- Avoid using setuptools in the cupy package.
- Avoid a false bloat warning in the seadoc package.
- Avoid using dask in the sklearn package.
- Avoid using cupy.testing in the cupy package.
- Avoid using IPython in the roboflow package.
- Avoid including ray for the vllm package.
- Avoid using dill in the torch package.
Organizational
- UI: Remove obsolete options to control the compilation mode from help output. We are keeping them only to not break existing workflows, but --mode=... should be used now, and these options will start triggering warnings soon.
- Python3.13.4: Reject the broken CPython official release for Windows. The link library included is not the one needed for the GIL, and as such it breaks Nuitka heavily and must be errored out on; all smaller or larger micro versions work, but this one does not.
- Release: Do not use Nuitka 2.7.9, as it broke data file access via __file__ in onefile mode on Windows. This is a brown paper bag release, with 2.7.10 containing only the fix for that. Sorry for the inconvenience.
- Release: Ensured proper handling of newer setuptools versions during Nuitka installation. (Fixed in 2.7.4 already.)
- UI: Sort --list-distribution-metadata output and remove duplicates. (Changed in 2.7.8 already.)
- Visual Code: Added a Python 2.6 configuration for Win32 to aid in comparisons and legacy testing.
- UI: Now lists available Qt plugin families if --include-qt-plugin cannot find one.
- UI: Warn about compiling a file named __main__.py, which should be avoided; instead, you should specify the package directory in that case.
- UI: Make it an error to compile a file named __init__.py for standalone mode.
- Debugging: The --edit option now correctly finds files even when using long, non-shortened temporary file paths.
- Debugging: The pyside6 plugin now enforces --no-debug-immortal-assumptions when --debug is on, because PySide6 violates these assumptions and we don't need Nuitka to check for that then, as it would abort when it finds them.
- Quality: Avoid writing auto-formatted files with the same contents.
  - That avoids stirring up tools that listen to changes.
  - For example, the Nuitka website auto-builder otherwise rebuilt per release post on docs updates.
- Quality: Use the latest version of deepdiff.
- Quality: Added autoformat for JSON files.
- Release: The man pages were using outdated options and had no examples for standalone or app modes. Also, the actual options were no longer included.
- GitHub: Use the --mode options in the issue template as well.
- GitHub: Enhanced wording of the bug report template to give more direction and more space for excellent reports to be made.
- GitHub: The bug report template now requests the output of our package metadata listing tool, as it provides more insight into how Nuitka perceives the environment.
- Debugging: Re-enabled important warnings for Clang, which had gone unnoticed for a long time and prevented a few things from being recognized.
- Debugging: Support arbitrary debuggers through --debugger-choice. Arbitrary debuggers can be used in the --debugger mode; if you specify all of their command line, you can do anything there. Also added a predefined valgrind-memcheck mode for the Valgrind memory checker tool.
- UI: Added rich as a progress bar that can be used. Since it's available via pip, it can likely be found and requires no inline copy. Added colors and similar behavior for tqdm as well.
- UI: Removed the obsolete warning for Linux with the upx plugin. We haven't used appimage for a while now, so its constraints no longer apply.
- UI: Add warnings for module-specific options too. The logic to not warn on GitHub Actions was inverted; this restores warnings for normal users.
- UI: Output the module name in question for options-nanny plugin and parameter warnings.
- UI: When a forbidden import comes from an implicit import, report it properly. Sometimes .pyi files from extension modules cause an import, but it was not clear which one; now it will indicate the module causing it.
- UI: Clearer error message in case a Python for Scons was not found.
- Actions: Cover debug mode compilation at least once.
- Quality: Resolve paths from all OSes in --edit. Sometimes I want to look at a file on a different OS, and there is no need to enforce being on the same one for path resolution to work.
- Actions: Updated to a newer Ubuntu version for testing, as clang-format could not be installed on the old one anymore.
- Debugging: Allow for C stack output in signal handlers; this is most useful for the non-deployment handler that catches them, to know more precisely where they came from.
- UI: Show no-GIL in the output of the Python flavor in compilation if relevant.
Tests
- Removed Azure CI configuration, as testing has been fully migrated to GitHub Actions. (Changed in 2.7.9 already.)
- Improved test robustness against short paths for package-containing directories. (Added in 2.7.4 already.)
- Prevented test failures caused by rejected download prompts during test execution, making CI more stable. (Added in 2.7.4 already.)
- Refactored common testing code to avoid using doctests, preventing warnings in specific standalone mode test scenarios related to reference counting. (Added in 2.7.4 already.)
- Tests: Cover the memory leaking call re-formulation with a reference count test.
Cleanups
- Plugins: Improved pkg_resources integration by using the __loader__ attribute of the registering module for loader type registration, avoiding modification of the global builtins dictionary. (Fixed in 2.7.2 already.)
- Improved the logging mechanism for module search scans. It is now possible to enable tracing for individual locateModule calls, significantly enhancing readability and aiding debugging efforts.
- Scons: Refactored architecture-specific options into dedicated functions to improve code clarity.
- Spelling: Various spelling and wording cleanups.
  - Avoid using #ifdef in C code templates, and let's just avoid it generally.
  - Added missing slot function names to the ignored word list.
  - Renamed variables related to slots to be more verbose and properly spelled as a result, as that makes their use easier to understand anyway.
- Scons: Specify the versions supported for Scons by excluding the ones that are not, rather than manually maintaining a list. This adds automatic support for Python 3.14.
- Plugins: Removed a useless call to intern, as it did not have the effect we thought it did.
- Attach copyright during code generation for code specializations.
  - This also enhances the formatting of almost all files by making leading and trailing new lines more consistent.
  - One C file turned out to be unused and was removed as a leftover from a previous refactoring.
Summary
This release was supposed to focus on scalability, but that didn't happen again, due to a variety of important issues coming up as well as downtime caused by serious private difficulties following a planned surgery. However, the upcoming release will finally have it.
The onefile DLL mode as used on Windows has driven a lot of need for corrections, some of which are only in the final release, and this is probably the first time it should be usable for everything.
For compatibility, working with the popular (yet not yet recommended) UV-Python, Windows UI fixes for temporary onefile, and macOS improvements, as well as improved Android support, are excellent additions.
The next release of Nuitka, however, will have to focus on scalability and maintenance only. But as usual, I am not sure if that can happen.
15 Nov 2025 1:52pm GMT
14 Nov 2025
Django community aggregator: Community blog posts
Django News - PyCharm 30% Promotion Extended! - Nov 14th 2025
News
Support the Django Software Foundation by buying PyCharm at a 30% Discount
The Django Software Foundation's primary fundraiser has been extended, so you can get 30 percent off PyCharm Pro and support Django until November 19.
Call for Proposals for DjangoCon US 2026 Website!
DjangoCon US 2026 requests proposals to redesign branding, illustrations, and the 11ty and Tailwind website for Chicago, including swag, signage, and starter code.
"Boost Your GitHub DX" out now
Boost Your GitHub DX by Adam Johnson provides practical guidance on GitHub features, gh CLI, and Actions to streamline collaboration and speed software delivery.
Django Software Foundation
Five ways to discover Django packages
New Django ecosystem page plus resources like State of Django survey, Django Packages, Awesome Django, Reddit and newsletters help developers discover third-party Django packages.
Django at PyCon FR 2025
Highlights from PyCon France where 27 contributors joined together in sprints, discussions of Django's direction, htmx presentations, and more.
Python Software Foundation
Trusted Publishing is popular, now for GitLab Self-Managed and Organizations
Django projects can now use PyPI Trusted Publishing to securely publish packages, with GitLab Self Managed beta support and organization pending publishers.
Updates to Django
Today, "Updates to Django" is presented by Raffaella from Djangonaut Space! 🚀
Last week we had 14 pull requests merged into Django by 11 different contributors - including 3 first-time contributors! Congratulations to Hal Blackburn, Mehraz Hossain Rumman, and Harsh Jain for having their first commits merged into Django - welcome on board!
Fixed a bug in Django 5.2 where proxy models having a CompositePrimaryKey incorrectly raised a models.E042 system check error.
Refactored async code to use asyncio.TaskGroup for cleaner, modern concurrency management. Thank you for the hard work on this. 🎉
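For readers who haven't used it, asyncio.TaskGroup (available since Python 3.11) is the structured-concurrency API that refactoring adopts; the snippet below is a generic illustration of its use, not Django's actual code.

    # Generic asyncio.TaskGroup illustration (Python 3.11+), not code from Django.
    # Tasks created in the group are awaited together; if one raises, the rest
    # are cancelled and the error propagates out of the "async with" block.
    import asyncio


    async def fetch(name: str, delay: float) -> str:
        await asyncio.sleep(delay)
        return f"{name} done"


    async def main() -> None:
        async with asyncio.TaskGroup() as tg:
            first = tg.create_task(fetch("first", 0.1))
            second = tg.create_task(fetch("second", 0.2))
        # Both tasks are complete once the block exits.
        print(first.result(), second.result())


    asyncio.run(main())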
Django Newsletter
Sponsored Link 1
Peace of Mind for Your Django Projects
Great code doesn't keep you up at night. From maintenance to scalability, we've got your Django project under control. 🧑💻 Partner with HackSoft today!
Articles
Django Admin Deux: Bringing Django's Admin Back to Django
Django Admin Deux is a proof of concept admin built on Django generic class-based views, plugin-first architecture, and action-based CRUD.
Preserving referential integrity with JSON fields and Django
Adds referential integrity for model references stored in JSONField within Django by registering model links and enforcing on_delete protection using django-json-schema-editor.
Django-Tailwind v4.4: Now with Zero Node.js Setup via Standalone Tailwind CLI
Django-Tailwind 4.4 adds support for Tailwind's Standalone CLI via pytailwindcss, enabling Tailwind CSS workflows without requiring Node.js, and integrates it into manage.py.
django-deadcode: idea to release in under 2 hours
django-deadcode was prototyped and published in about two hours using Agent OS and Claude to analyze Django projects for removable dead code.
Django Fellow Report
Django Fellow Report - Natalia
A very security-heavy week. Most of my energy went into preparing and testing patches for the upcoming security release, including a tough vulnerability that I spent quite some time wrestling with. It was demanding and a bit exhausting, but everything is now on track for next week's release.
Django Fellow Report - Jacob
This week we landed the JSONNull expression I mentioned last week. We also landed a follow-up to the database delete behavior feature to add support in inspectdb.
Events
Behind the Curtain as a Conference Chair
Chairing DjangoCon US 2025 taught that effective leadership means creating and holding welcoming spaces for community, volunteers, and speakers to collaborate and thrive.
Videos
PyBay 2025 - YouTube
PyBay 2025 features talks on Python tooling, robust testing, typing, async performance, LLM integration, and data validation relevant to Django backends.
The future of Python and AI with Guido van Rossum
Guido van Rossum discusses Python's role in the AI era, TypeAgent and typing tools like Pyright, and AI coding workflows with VS Code and Copilot. There are some nice Django and DjangoCon US shoutouts here.
Podcasts
Django Chat #189: Django 20 Years Later with Adrian Holovaty
Adrian finally agreed to come on the podcast! This episode was so much fun to record. Adrian is one of the original creators of Django and we discussed everything from initial design decisions with twenty years of hindsight, why modern JavaScript is so complicated, coding with LLMs, and much more.
Django News Jobs
Looking for your next Django focused role? Here are the latest openings across security engineering, backend development, and university innovation.
Job Application for Senior Application Security Engineer at Energy Solutions - USA 🆕
Senior Python Developer at Basalt Health 🆕
Senior Back-End Developer at Showcare 🆕
Software Engineer Lead at Center for Academic Innovation, University of Michigan
Part-Time Senior Full-Stack Engineer (Python/Django) (gn) at voiio
Django Newsletter
Django Forum
DEP 15 - Improved startproject interface - Django Internals
DEP 15 standardizes and extends startproject to support multiple modern project layouts while preserving backwards compatibility and encouraging consistent, opinionated project structures.
Projects
stuartmaxwell/djcheckup
DJ Checkup is a security scanner for Django sites. This package provides a command-line interface to run the security checks against your Django site.
wsvincent/djangoforai
Django + local LLM + server side events + HTMX demo. As presented during DjangoCon US 2025 talk.
This RSS feed is published on https://django-news.com/. You can also subscribe via email.
14 Nov 2025 5:00pm GMT
Planet Python
Real Python: The Real Python Podcast – Episode #274: Preparing Data Science Projects for Production
How do you prepare your Python data science projects for production? What are the essential tools and techniques to make your code reproducible, organized, and testable? This week on the show, Khuyen Tran from CodeCut discusses her new book, "Production Ready Data Science."
14 Nov 2025 12:00pm GMT
EuroPython Society: Recognising Michael Foord as an Honorary EuroPython Society Fellow
Hi everyone. Today, we are honoured to announce a very special recognition.
The EuroPython Society has posthumously elected Michael Foord (aka voidspace) as an Honorary EuroPython Society Fellow.
Michael Foord (1974-2025)
Michael was a long-time and deeply influential member of the Python community. He began using Python in 2002, became a Python core developer, and left a lasting mark on the language through his work on unittest and the creation of the mock library. He also started the tradition of the Python Language Summits at PyCon US, and he consistently supported and connected the Python community across Europe and beyond.
However, his legacy extends far beyond code. Many of us first met Michael through his writing and tools, but what stayed with people was the example he set through his contributions, and how he showed up for others. He answered questions with patience, welcomed newcomers, and cared about doing the right thing in small, everyday ways. He made space for people to learn. He helped the Python community in Europe grow stronger and more connected. He made our community feel like a community.
His impact was celebrated widely across the community, with many tributes reflecting his kindness, humour, and dedication:
At EuroPython 2025, we held a memorial and kept a seat for him in the Forum Hall:
A lasting tribute
EuroPython Society Fellows are people whose work and care move our mission forward. By naming Michael an Honorary Fellow, we acknowledge his technical contributions and also the kindness and curiosity that defined his presence among us. We are grateful for the example he set, and we miss him.
Our thoughts and thanks are with Michael's friends, collaborators, and family. His work lives on in our tools. His spirit lives on in how we treat each other.
With gratitude,
Your friends at EuroPython Society
14 Nov 2025 9:00am GMT
Django community aggregator: Community blog posts
How to use UUIDv7 in Python, Django and PostgreSQL
Learn how to use UUIDv7 today with stable releases of Python 3.14, Django 5.2 and PostgreSQL 18. A step by step guide showing how to generate UUIDv7 in Python, store them in Django models, use PostgreSQL native functions and build time ordered primary keys without writing SQL.
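The gist of the approach, as a rough sketch rather than code from the linked article: Python 3.14 ships uuid.uuid7(), and any callable can serve as a Django field default, so a time-ordered primary key needs no SQL. The model and field names below are illustrative only.

    # Minimal sketch, assuming Python 3.14 (uuid.uuid7) and Django 5.2.
    # Model and field names are illustrative, not taken from the article.
    import uuid

    from django.db import models


    class Order(models.Model):
        # UUIDv7 values are time-ordered, so rows sort roughly by creation time
        # and index locality is better than with random UUIDv4 keys.
        id = models.UUIDField(primary_key=True, default=uuid.uuid7, editable=False)
        created_at = models.DateTimeField(auto_now_add=True)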
14 Nov 2025 5:00am GMT
13 Nov 2025
Django community aggregator: Community blog posts
django-deadcode: idea to release in under 2 hours
A few weeks ago I noticed a toot from Jeff Triplett about Anthropic releasing Claude Code for the Web. This was the final spark that coalesced a few different thoughts that had been lingering in the back of my head, some of which I have written a bit about before. The first thought was the speed of prototyping that agentic AI enables. Essentially, ideas or tasks can simply be executed rather than being written down, and the time allocated to develop these ideas goes from weeks to days or even hours. The second thought is related to the first, in that tools like Agent OS allow AI to build out products in a more reliable way, for the most part. I have also been pondering how I can use my mobile more as an engineer; the Github app is ok for PR reviews, but to date building anything needs a larger screen.
The final thought goes back to the toot from Jeff, with Claude Code on the Web being possibly the closest thing so far to my post from day 282 about how our tooling doesn't yet fully leverage what AI can do for us.
Well, this led to me creating two things this week. First was a template repo on Github which is only loaded with an install of my Django Agent OS profile. This enables me to quickly start a project in the browser without having to open a terminal or potentially even be at my laptop; I could start a project from my phone. Second was an experiment to see how much I could get Claude to build from the browser. I took my idea from Day 71 about analysing a Django codebase for dead code that could be removed. Over the course of about 2 hours of my time, letting Claude work along with Agent OS, I have a package released on PyPI.
The unlock here is that I have yet to clone the repo to my laptop. In fact, the most time consuming part has been getting CI to work nicely to release new versions. Upon reflection this is something to go into the template repository, but then not every project needs to be uploaded to PyPI.
It's been a fun experiment to get a working proof of concept out the door so quickly, but it needs a bit more refinement, testing and review before I recommend anyone else use it! If you want to have a peek, the repo is here and the package is here.
13 Nov 2025 6:00am GMT
11 Nov 2025
Planet Twisted
Glyph Lefkowitz: The “Dependency Cutout” Workflow Pattern, Part I
Tell me if you've heard this one before.
You're working on an application. Let's call it "FooApp". FooApp has a dependency on an open source library, let's call it "LibBar". You find a bug in LibBar that affects FooApp.
To envisage the best possible version of this scenario, let's say you actively like LibBar, both technically and socially. You've contributed to it in the past. But this bug is causing production issues in FooApp today, and LibBar's release schedule is quarterly. FooApp is your job; LibBar is (at best) your hobby. Blocking on the full upstream contribution cycle and waiting for a release is an absolute non-starter.
What do you do?
There are a few common reactions to this type of scenario, all of which are bad options.
I will enumerate them specifically here, because I suspect that some of them may resonate with many readers:
-
Find an alternative to LibBar, and switch to it.
This is a bad idea because a transition to a core infrastructure component could be extremely expensive.
-
Vendor LibBar into your codebase and fix your vendored version.
This is a bad idea because carrying this one fix now requires you to maintain all the tooling associated with a monorepo[1]: you have to be able to start pulling in new versions from LibBar regularly, reconcile your changes even though you now have a separate version history on your imported version, and so on.
-
Monkey-patch LibBar to include your fix.
This is a bad idea because you are now extremely tightly coupled to a specific version of LibBar. By modifying LibBar internally like this, you're inherently violating its compatibility contract, in a way which is going to be extremely difficult to test. You can test this change, of course, but as LibBar changes, you will need to replicate any relevant portions of its test suite (which may be its entire test suite) in FooApp. Lots of potential duplication of effort there.
-
Implement a workaround in your own code, rather than fixing it.
This is a bad idea because you are distorting the responsibility for correct behavior. LibBar is supposed to do LibBar's job, and unless you have a full wrapper for it in your own codebase, other engineers (including "yourself, personally") might later forget to go through the alternate, workaround codepath, and invoke the buggy LibBar behavior again in some new place.
-
Implement the fix upstream in LibBar anyway, because that's the Right Thing To Do, and burn credibility with management while you anxiously wait for a release with the bug in production.
This is a bad idea because you are betraying your users - by allowing the buggy behavior to persist - for the workflow convenience of your dependency providers. Your users are probably giving you money, and trusting you with their data. This means you have both ethical and economic obligations to consider their interests.
As much as it's nice to participate in the open source community and take on an appropriate level of burden to maintain the commons, this cannot sustainably be at the explicit expense of the population you serve directly.
Even if we only care about the open source maintainers here, there's still a problem: as you are likely to come under immediate pressure to ship your changes, you will inevitably relay at least a bit of that stress to the maintainers. Even if you try to be exceedingly polite, the maintainers will know that you are coming under fire for not having shipped the fix yet, and are likely to feel an even greater burden of obligation to ship your code fast.
Much as it's good to contribute the fix, it's not great to put this on the maintainers.
The respective incentive structures of software development - specifically, of corporate application development and open source infrastructure development - make options 1-4 very common.
On the corporate / application side, these issues are:
-
it's difficult for corporate developers to get clearance to spend even small amounts of their work hours on upstream open source projects, but clearance to spend time on the project they actually work on is implicit. If it takes 3 hours of wrangling with Legal[2] and 3 hours of implementation work to fix the issue in LibBar, but 0 hours of wrangling with Legal and 40 hours of implementation work in FooApp, a FooApp developer will often perceive it as "easier" to fix the issue downstream.
-
it's difficult for corporate developers to get clearance from management to spend even small amounts of money sponsoring upstream reviewers, so even if they can find the time to contribute the fix, chances are high that it will remain stuck in review unless they are personally well-integrated members of the LibBar development team already.
-
even assuming there's zero pressure whatsoever to avoid open sourcing the upstream changes, there's still the fact inherent to any development team that FooApp's developers will be more familiar with FooApp's codebase and development processes than they are with LibBar's. It's just easier to work there, even if all other things are equal.
-
systems for tracking risk from open source dependencies often lack visibility into vendoring, particularly if you're doing a hybrid approach and only vendoring a few things to address work in progress, rather than a comprehensive and disciplined approach to a monorepo. If you fully absorb a vendored dependency and then modify it, Dependabot isn't going to tell you that a new version is available any more, because it won't be present in your dependency list. Organizationally this is bad of course but from the perspective of an individual developer this manifests mostly as fewer annoying emails.
But there are problems on the open source side as well. Those problems are all derived from one big issue: because we're often working with relatively small sums of money, it's hard for upstream open source developers to consume either money or patches from application developers. It's nice to say that you should contribute money to your dependencies, and you absolutely should, but the cost-benefit function is discontinuous. Before a project reaches the fiscal threshold where it can be at least one person's full-time job to worry about this stuff, there's often no-one responsible in the first place. Developers will therefore gravitate to the issues that are either fun, or relevant to their own job.
These mutually-reinforcing incentive structures are a big reason that users of open source infrastructure, even teams who work at corporate users with zillions of dollars, don't reliably contribute back.
The Answer We Want
All those options are bad. If we had a good option, what would it look like?
It is both practically necessary[3] and morally required[4] for you to have a way to temporarily rely on a modified version of an open source dependency, without permanently diverging.
Below, I will describe a desirable abstract workflow for achieving this goal.
Step 0: Report the Problem
Before you get started with any of these other steps, write up a clear description of the problem and report it to the project as an issue; specifically, in contrast to writing it up as a pull request. Describe the problem before submitting a solution.
You may not be able to wait for a volunteer-run open source project to respond to your request, but you should at least tell the project what you're planning on doing.
If you don't hear back from them at all, you will have at least made sure to comprehensively describe your issue and strategy beforehand, which will provide some clarity and focus to your changes.
If you do hear back from them, in the worst case scenario, you may discover that a hard fork will be necessary because they don't consider your issue valid, but even that information will save you time, if you know it before you get started. In the best case, you may get a reply from the project telling you that you've misunderstood its functionality and that there is already a configuration parameter or usage pattern that will resolve your problems with no new code. But in all cases, you will benefit from early coordination on what needs fixing before you get to how to fix it.
Step 1: Source Code and CI Setup
Fork the source code for your upstream dependency to a writable location where it can live at least for the duration of this one bug-fix, and possibly for the duration of your application's use of the dependency. After all, you might want to fix more than one bug in LibBar.
You want to have a place where you can put your edits, that will be version controlled and code reviewed according to your normal development process. This probably means you'll need to have your own main branch that diverges from your upstream's main branch.
Remember: you're going to need to deploy this to your production, so testing gates that your upstream only applies to final releases of LibBar will need to be applied to every commit here.
Depending on your LibBar's own development process, this may result in slightly unusual configurations where, for example, your fixes are written against the last LibBar release tag, rather than its current[5] main; if the project has a branch-freshness requirement, you might need two branches, one for your upstream PR (based on main) and one for your own use (based on the release branch with your changes).
Ideally for projects with really good CI and a strong "keep main release-ready at all times" policy, you can deploy straight from a development branch, but it's good to take a moment to consider this before you get started. It's usually easier to rebase changes from an older HEAD onto a newer one than it is to go backwards.
Speaking of CI, you will want to have your own CI system. The fact that GitHub Actions has become a de-facto lingua franca of continuous integration means that this step may be quite simple, and your forked repo can just run its own instance.
Optional Bonus Step 1a: Artifact Management
If you have an in-house artifact repository, you should set that up for your dependency too, and upload your own build artifacts to it. You can often treat your modified dependency as an extension of your own source tree and install from a GitHub URL, but if you've already gone to the trouble of having an in-house package repository, you can pretend you've taken over maintenance of the upstream package temporarily (which you kind of have) and leverage those workflows for caching and build-time savings as you would with any other internal repo.
Step 2: Do The Fix
Now that you've got somewhere to edit LibBar's code, you will want to actually fix the bug.
Step 2a: Local Filesystem Setup
Before you have a production version on your own deployed branch, you'll want to test locally, which means having both repositories in a single integrated development environment.
At this point, you will want to have a local filesystem reference to your LibBar dependency, so that you can make real-time edits, without going through a slow cycle of pushing to a branch in your LibBar fork, pushing to a FooApp branch, and waiting for all of CI to run on both.
This is useful in both directions: as you prepare the FooApp branch that makes any necessary updates on that end, you'll want to make sure that FooApp can exercise the LibBar fix in any integration tests. As you work on the LibBar fix itself, you'll also want to be able to use FooApp to exercise the code and see if you've missed anything - and this, you wouldn't get in CI, since LibBar can't depend on FooApp itself.
In short, you want to be able to treat both projects as an integrated development environment, with support from your usual testing and debugging tools, just as much as you want your deployment output to be an integrated artifact.
Step 2b: Branch Setup for PR
However, for continuous integration to work, you will also need to have a remote resource reference of some kind from FooApp's branch to LibBar. You will need 2 pull requests: the first to land your LibBar changes to your internal LibBar fork and make sure it's passing its own tests, and then a second PR to switch your LibBar dependency from the public repository to your internal fork.
At this step it is very important to ensure that there is an issue filed on your own internal backlog to drop your LibBar fork. You do not want to lose track of this work; it is technical debt that must be addressed.
Until it's addressed, automated tools like Dependabot will not be able to apply security updates to LibBar for you; you're going to need to manually integrate every upstream change. This type of work is itself very easy to drop or lose track of, so you might just end up stuck on a vulnerable version.
Step 3: Deploy Internally
Now that you're confident that the fix will work, and that your temporarily-internally-maintained version of LibBar isn't going to break anything on your site, it's time to deploy.
Some deployment history should help to provide evidence that your fix is ready to land in LibBar, but at the next step, please remember that your production environment isn't necessarily representative of all LibBar users' environments.
Step 4: Propose Externally
You've got the fix, you've tested the fix, you've got the fix in your own production, you've told upstream you want to send them some changes. Now, it's time to make the pull request.
You're likely going to get some feedback on the PR, even if you think it's already ready to go; as I said, despite having been proven in your production environment, you may get feedback about additional concerns from other users that you'll need to address before LibBar's maintainers can land it.
As you process the feedback, make sure that each new iteration of your branch gets re-deployed to your own production. It would be a huge bummer to go through all this trouble, and then end up unable to deploy the next publicly released version of LibBar within FooApp because you forgot to test that your responses to feedback still worked on your own environment.
Step 4a: Hurry Up And Wait
If you're lucky, upstream will land your changes to LibBar. But, there's still no release version available. Here, you'll have to stay in a holding pattern until upstream can finalize the release on their end.
Depending on some particulars, it might make sense at this point to archive your internal LibBar repository and move your pinned release version to a git hash of the LibBar version where your fix landed, in their repository.
Before you do this, check in with the LibBar core team and make sure that they understand that's what you're doing and they don't have any wacky workflows which may involve rebasing or eliding that commit as part of their release process.
Step 5: Unwind Everything
Finally, you eventually want to stop carrying any patches and move back to an official released version that integrates your fix.
You want to do this because this is what the upstream will expect when you are reporting bugs. Part of the benefit of using open source is benefiting from the collective work to do bug-fixes and such, so you don't want to be stuck off on a pinned git hash that the developers do not support for anyone else.
As I said in step 2b[6], make sure to maintain a tracking task for doing this work, because leaving this sort of relatively easy-to-clean-up technical debt lying around can create a lot of aggravation for no particular benefit. Make sure to put your internal LibBar repository into an appropriate state at this point as well.
Up Next
This is part 1 of a 2-part series. In part 2, I will explore in depth how to execute this workflow specifically for Python packages, using some popular tools. I'll discuss my own workflow, standards like PEP 517 and pyproject.toml, and of course, by the popular demand that I just know will come, uv.
Acknowledgments
Thank you to my patrons who are supporting my writing on this blog. If you like what you've read here and you'd like to read more of it, or you'd like to support my various open-source endeavors, you can support my work as a sponsor!
-
if you already have all the tooling associated with a monorepo, including the ability to manage divergence and reintegrate patches with upstream, you already have the higher-overhead version of the workflow I am going to propose, so, never mind. But chances are you don't have that; very few companies do. ↩
-
In any business where one must wrangle with Legal, 3 hours is a wildly optimistic estimate. ↩
-
In an ideal world every project would keep its main branch ready to release at all times, no matter what, but we do not live in an ideal world. ↩
-
In this case, there is no question. It's 2b only, no not-2b. ↩
11 Nov 2025 1:44am GMT
15 Oct 2025
Planet Plone - Where Developers And Integrators Write
Maurits van Rees: Jakob Kahl and Erico Andrei: Flying from one Plone version to another

This is a talk about migrating from Plone 4 to 6 with the newest toolset.
There are several challenges when doing Plone migrations:
- Highly customized source instances: custom workflow, add-ons, not all of them with versions that worked on Plone 6.
- Complex data structures. For example, a Folder with a Link as its default page, which pointed to some other content that had meanwhile been moved.
- Migrating Classic UI to Volto
- Also, you might be migrating from a completely different CMS to Plone.
How do we do migrations in Plone in general?
- In place migrations. Run migration steps on the source instance itself. Use the standard upgrade steps from Plone. Suitable for smaller sites with not so much complexity. Especially suitable if you do only a small Plone version update.
- Export - import migrations. You extract data from the source, transform it, and load the structure in the new site. You transform the data outside of the source instance. Suitable for all kinds of migrations. Very safe approach: only once you are sure everything is fine, do you switch over to the newly migrated site. Can be more time consuming.
Let's look at export/import, which has three parts:
- Extraction: you had collective.jsonify, transmogrifier, and now collective.exportimport and plone.exportimport.
- Transformation: transmogrifier, collective.exportimport, and new: collective.transmute.
- Load: Transmogrifier, collective.exportimport, plone.exportimport.
Transmogrifier is old, we won't talk about it now. collective.exportimport: written by Philip Bauer mostly. There is an @@export_all view, and then @@import_all to import it.
collective.transmute is a new tool. This is made to transform data from collective.exportimport to the plone.exportimport format. Potentially it can be used for other migrations as well. Highly customizable and extensible. Tested by pytest. It is standalone software with a nice CLI. No dependency on Plone packages.
Another tool: collective.html2blocks. This is a lightweight Python replacement for the JavaScript Blocks conversion tool. This is extensible and tested.
Lastly plone.exportimport. This is a stripped down version of collective.exportimport. This focuses on extract and load. No transforms. So this is best suited for importing to a Plone site with the same version.
collective.transmute is in alpha, probably a 1.0.0 release in the next weeks. Still missing quite some documentation. Test coverage needs some improvements. You can contribute with PRs, issues, docs.
15 Oct 2025 3:44pm GMT
Maurits van Rees: Mikel Larreategi: How we deploy cookieplone based projects.

We saw that cookieplone was coming up, along with Docker, and, as a game changer, uv, which makes the installation of Python packages much faster.
With cookieplone you get a monorepo, with folders for backend, frontend, and devops. devops contains scripts to setup the server and deploy to it. Our sysadmins already had some other scripts. So we needed to integrate that.
First idea: let's fork it. Create our own copy of cookieplone. I explained this in my World Plone Day talk earlier this year. But cookieplone was changing a lot, so it was hard to keep our copy updated.
Maik Derstappen showed me copier, yet another templating language. Our idea: create a cookieplone project, and then use copier to modify it.
What about the deployment? We are on GitLab. We host our own runners. We use the docker-in-docker service. We develop on a branch and create a merge request (pull request in GitHub terms). This activates a pipeline to check, test, and build. When it is merged, we bump the version, using release-it.
Then we create deploy keys and tokens. We give these access to private GitLab repositories. We need some changes to SSH key management in pipelines, according to our sysadmins.
For deployment on the server: we do not yet have automatic deployments. We did not want to go too fast. We are testing the current pipelines and process, see if they work properly. In the future we can think about automating deployment. We just ssh to the server, and perform some commands there with docker.
Future improvements:
- Start the docker containers and curl/wget the /ok endpoint.
- Lock files for the backend, with pip/uv.
15 Oct 2025 3:41pm GMT
Maurits van Rees: David Glick: State of plone.restapi

[Missed the first part.]
Vision: plone.restapi aims to provide a complete, stable, documented, extensible, language-agnostic API for the Plone CMS.
New services
- @site: global site settings. These are overall, public settings that are needed on all pages and that don't change per context.
- @login: choose between multiple login providers.
- @navroot: contextual data from the navigation root of the current context.
- @inherit: contextual data from any behavior. It looks for the closest parent that has this behavior defined, and gets this data.
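As a rough illustration of how such endpoints are consumed, here is a minimal sketch using the requests library, assuming a hypothetical local site at http://localhost:8080/Plone with local admin credentials:

    import requests

    # plone.restapi answers with JSON when the client explicitly asks for it.
    response = requests.get(
        "http://localhost:8080/Plone/some/page/@navroot",
        headers={"Accept": "application/json"},
        auth=("admin", "admin"),  # assumed local credentials, for illustration only
    )
    print(response.json())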
Dynamic teaser blocks: you can choose to customize the teaser content. So the teaser links to the item you have selected, but if you want, you can change the title and other fields.
Roadmap:
- Don't break it.
- 10.0 release for Plone 6.2: remove setuptools namespace.
- Continue to support a migration path from older versions: use an old plone.restapi version on an old Plone version to export content, and be able to import it into the latest versions.
- Recycle bin (work in progress): a lot of the work from Rohan is in Classic UI, but he is working on the restapi as well.
Wishlist, no one is working on this, but would be good to have:
- @permissions endpoint
- @catalog endpoint
- missing control panels
- folder type constraints
- Any time that you find yourself going to the Classic UI to do something, that is a sign something is missing.
- Some changes to relative paths to fix some use cases
- Machine readable specifications for OpenAPI, MCP
- New forms backend
- Bulk operations
- Streaming API
- External functional test suite, that you could also run against e.g. guillotina or Nick to see if it works there as well.
- Time travel: be able to see the state of the database from some time ago. The ZODB has some options here.
15 Oct 2025 3:39pm GMT
15 Aug 2025
Planet Twisted
Glyph Lefkowitz: The Futzing Fraction
The most optimistic vision of generative AI1 is that it will relieve us of the tedious, repetitive elements of knowledge work so that we can get to work on the really interesting problems that such tedium stands in the way of. Even if you fully believe in this vision, it's hard to deny that today, some tedium is associated with the process of using generative AI itself.
Generative AI also isn't free, and so, as responsible consumers, we need to ask: is it worth it? What's the ROI of genAI, and how can we tell? In this post, I'd like to explore a logical framework for evaluating genAI expenditures, to determine if your organization is getting its money's worth.
Perpetually Proffering Permuted Prompts
I think most LLM users would agree with me that a typical workflow with an LLM rarely involves prompting it only one time and getting a perfectly useful answer that solves the whole problem.
Generative AI best practices, even from the most optimistic vendors all suggest that you should continuously evaluate everything. ChatGPT, which is really the only genAI product with significantly scaled adoption, still says at the bottom of every interaction:
ChatGPT can make mistakes. Check important info.
If we have to "check important info" on every interaction, it stands to reason that even if we think it's useful, some of those checks will find an error. Again, if we think it's useful, presumably the next thing to do is to perturb our prompt somehow, and issue it again, in the hopes that the next invocation will, by dint of either:
- better luck this time with the stochastic aspect of the inference process,
- enhanced application of our skill to engineer a better prompt based on the deficiencies of the current inference, or
- better performance of the model by populating additional context in subsequent chained prompts.
Unfortunately, given the relative lack of reliable methods to re-generate the prompt and receive a better answer2, checking the output and re-prompting the model can feel like just kinda futzing around with it. You try, you get a wrong answer, you try a few more times, eventually you get the right answer that you wanted in the first place. It's a somewhat unsatisfying process, but if you get the right answer eventually, it does feel like progress, and you didn't need to use up another human's time.
In fact, the hottest buzzword of the last hype cycle is "agentic". While I have my own feelings about this particular word3, its current practical definition is "a generative AI system which automates the process of re-prompting itself, by having a deterministic program evaluate its outputs for correctness".
A better term for an "agentic" system would be a "self-futzing system".
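To make that definition concrete, here is a minimal sketch of such a loop, where generate() and check() are hypothetical stand-ins for a model call and a deterministic verifier (a compiler, test suite, linter, and so on):

    def self_futzing_loop(prompt, generate, check, max_attempts=5):
        """Re-prompt a model until a deterministic check passes or we give up."""
        for attempt in range(1, max_attempts + 1):
            output = generate(prompt)
            ok, feedback = check(output)
            if ok:
                return output
            # Feed the verifier's complaint back in as additional context.
            prompt = f"{prompt}\n\nPrevious attempt failed: {feedback}"
        raise RuntimeError("still failing after max_attempts; a human takes over")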
However, the ability to automate some level of checking and re-prompting does not mean that you can fully delegate tasks to an agentic tool, either. It is, plainly put, not safe. If you leave the AI on its own, you will get terrible results that will at best make for a funny story[4][5] and at worst might end up causing serious damage[6][7].
Taken together, this all means that for any consequential task that you want to accomplish with genAI, you need an expert human in the loop. The human must be capable of independently doing the job that the genAI system is being asked to accomplish.
When the genAI guesses correctly and produces usable output, some of the human's time will be saved. When the genAI guesses wrong and produces hallucinatory gibberish or even "correct" output that nevertheless fails to account for some unstated but necessary property such as security or scale, some of the human's time will be wasted evaluating it and re-trying it.
Income from Investment in Inference
Let's evaluate an abstract, hypothetical genAI system that can automate some work for our organization. To avoid implicating any specific vendor, let's call the system "Mallory".
Is Mallory worth the money? How can we know?
Logically, there are only two outcomes that might result from using Mallory to do our work.
- We prompt Mallory to do some work; we check its work, it is correct, and some time is saved.
- We prompt Mallory to do some work; we check its work, it fails, and we futz around with the result; this time is wasted.
As a logical framework, this makes sense, but ROI is an arithmetical concept, not a logical one. So let's translate this into some terms.
In order to evaluate Mallory, let's define the Futzing Fraction, "FF", in terms of the following variables:
- H: the average amount of time a Human worker would take to do a task, unaided by Mallory
- I: the amount of time that Mallory takes to run one Inference[8]
- C: the amount of time that a human has to spend Checking Mallory's output for each inference
- P: the Probability that Mallory will produce a correct inference for each prompt
- W: the average amount of time that it takes for a human to Write one prompt for Mallory
- E: since we are normalizing everything to time, rather than money, we do also have to account for the dollar cost of Mallory as a product, so we will include the Equivalent amount of human time we could purchase for the marginal cost of one[9] inference.
As in last week's example of simple ROI arithmetic, we will put our costs in the numerator, and our benefits in the denominator.
The idea here is that for each prompt, the minimum amount of time-equivalent cost possible is W+I+C+E. The user must, at least once, write a prompt, wait for inference to run, then check the output; and, of course, pay any costs to Mallory's vendor.
If the probability of a correct answer is P = 1/3, then they will do this entire process 3 times[10], so we put P in the denominator. Finally, we divide everything by H, because we are trying to determine if we are actually saving any time or money, versus just letting our existing human, who has to be driving this process anyway, do the whole thing.
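Putting those terms together, the fraction is:

FF = (W + I + C + E) / (P × H)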
If the Futzing Fraction evaluates to a number greater than 1, as previously discussed, you are a bozo; you're spending more time futzing with Mallory than getting value out of it.
Figuring out the Fraction is Frustrating
In order to even evaluate the value of the Futzing Fraction though, you have to have a sound method to even get a vague sense of all the terms.
If you are a business leader, a lot of this is relatively easy to measure. You vaguely know what H is, because you know what your payroll costs, and similarly, you can figure out E with some pretty trivial arithmetic based on Mallory's pricing table. There are endless YouTube channels, spec sheets and benchmarks to give you I. W is probably going to be so small compared to H that it hardly merits consideration11.
But, are you measuring C? If your employees are not checking the outputs of the AI, you're on a path to catastrophe that no ROI calculation can capture, so it had better be greater than zero.
Are you measuring P? How often does the AI get it right on the first try?
Challenges to Computing Checking Costs
In the fraction defined above, the term C is going to be large. Larger than you think.
Measuring P and C with a high degree of precision is probably going to be very hard; possibly unreasonably so, or too expensive12 to bother with in practice. So you will undoubtedly need to work with estimates and proxy metrics. But you have to be aware that this is a problem domain where your normal method of estimating is going to be extremely vulnerable to inherent cognitive bias, and find ways to measure.
Margins, Money, and Metacognition
First let's discuss cognitive and metacognitive bias.
My favorite cognitive bias is the availability heuristic and a close second is its cousin salience bias. Humans are empirically predisposed towards noticing and remembering things that are more striking, and to overestimate their frequency.
If you are estimating the variables above based on the vibe that you're getting from the experience of using an LLM, you may be overestimating its utility.
Consider a slot machine.
If you put a dollar in to a slot machine, and you lose that dollar, this is an unremarkable event. Expected, even. It doesn't seem interesting. You can repeat this over and over again, a thousand times, and each time it will seem equally unremarkable. If you do it a thousand times, you will probably get gradually more anxious as your sense of your dwindling bank account becomes slowly more salient, but losing one more dollar still seems unremarkable.
If you put a dollar in a slot machine and it gives you a thousand dollars, that will probably seem pretty cool. Interesting. Memorable. You might tell a story about this happening, but you definitely wouldn't really remember any particular time you lost one dollar.
Luckily, when you arrive at a casino with slot machines, you probably know well enough to set a hard budget in the form of some amount of physical currency you will have available to you. The odds are against you, you'll probably lose it all, but any responsible gambler will have an immediate, physical representation of their balance in front of them, so when they have lost it all, they can see that their hands are empty, and can try to resist the "just one more pull" temptation, after hitting that limit.
Now, consider Mallory.
If you put ten minutes into writing a prompt, and Mallory gives a completely off-the-rails, useless answer, and you lose ten minutes, well, that's just what using a computer is like sometimes. Mallory malfunctioned, or hallucinated, but it does that sometimes, everybody knows that. You only wasted ten minutes. It's fine. Not a big deal. Let's try it a few more times. Just ten more minutes. It'll probably work this time.
If you put ten minutes into writing a prompt, and it completes a task that would have otherwise taken you 4 hours, that feels amazing. Like the computer is magic! An absolute endorphin rush.
Very memorable. When it happens, it feels like P=1.
But... did you have a time budget before you started? Did you have a specified N such that "I will give up on Mallory as soon as I have spent N minutes attempting to solve this problem with it"? When the jackpot finally pays out that 4 hours, did you notice that you had already put 6 hours' worth of 10-minute prompt coins into it?
If you are attempting to use the same sort of heuristic intuition that probably works pretty well for other business leadership decisions, Mallory's slot-machine chat-prompt user interface is practically designed to subvert those sensibilities. Most business activities do not have nearly such an emotionally variable, intermittent reward schedule. They're not going to trick you with this sort of cognitive illusion.
Thus far we have been talking about cognitive bias, but there is a metacognitive bias at play too: while Dunning-Kruger, everybody's favorite metacognitive bias, does have some problems with it, the main underlying metacognitive bias is that we tend to believe our own thoughts and perceptions, and it requires active effort to distance ourselves from them, even if we know they might be wrong.
This means you must assume any intuitive estimate of C is going to be biased low; similarly P is going to be biased high. You will forget the time you spent checking, and you will underestimate the number of times you had to re-check.
To avoid this, you will need to decide on a Ulysses pact to provide some inputs to a calculation for these factors that you will not be able to fudge if they seem wrong to you.
Problematically Plausible Presentation
Another nasty little cognitive-bias landmine for you to watch out for is the authority bias, for two reasons:
- People will tend to see Mallory as an unbiased, external authority, and thereby see it as more of an authority than a similarly-situated human13.
- Being an LLM, Mallory will be overconfident in its answers14.
The nature of LLM training is also such that commonly co-occurring tokens in the training corpus produce higher likelihood of co-occurring in the output; they're just going to be closer together in the vector-space of the weights; that's, like, what training a model is, establishing those relationships.
If you've ever used a heuristic to informally evaluate someone's credibility by listening for industry-specific shibboleths or ways of describing a particular issue, that skill is now useless. Having ingested every industry's expert literature, commonly-occurring phrases will always be present in Mallory's output. Mallory will usually sound like an expert, but then make mistakes at random.[15]
While you might intuitively estimate C by thinking "well, if I asked a person, how could I check that they were correct, and how long would that take?" that estimate will be extremely optimistic, because the heuristic techniques you would use to quickly evaluate incorrect information from other humans will fail with Mallory. You need to go all the way back to primary sources and actually fully verify the output every time, or you will likely fall into one of these traps.
Mallory Mangling Mentorship
So far, I've been describing the effect Mallory will have in the context of an individual attempting to get some work done. If we are considering organization-wide adoption of Mallory, however, we must also consider the impact on team dynamics. There are a number of potential side effects one might consider here, but I will focus on just one that I have observed.
I have a cohort of friends in the software industry, most of whom are individual contributors. I'm a programmer who likes programming, so are most of my friends, and we are also (sigh), charitably, pretty solidly middle-aged at this point, so we tend to have a lot of experience.
As such, we are often the folks that the team - or, in my case, the community - goes to when less-experienced folks need answers.
On its own, this is actually pretty great. Answering questions from more junior folks is one of the best parts of a software development job. It's an opportunity to be helpful, mostly just by knowing a thing we already knew. And it's an opportunity to help someone else improve their own agency by giving them knowledge that they can use in the future.
However, generative AI throws a bit of a wrench into the mix.
Let's imagine a scenario where we have 2 developers: Alice, a staff engineer who has a good understanding of the system being built, and Bob, a relatively junior engineer who is still onboarding.
The traditional interaction between Alice and Bob, when Bob has a question, goes like this:
- Bob gets confused about something in the system being developed, because Bob's understanding of the system is incorrect.
- Bob formulates a question based on this confusion.
- Bob asks Alice that question.
- Alice knows the system, so she gives an answer which accurately reflects the state of the system to Bob.
- Bob's understanding of the system improves, and thus he will have fewer and better-informed questions going forward.
You can imagine how repeating this simple 5-step process will eventually transform Bob into a senior developer, and then he can start answering questions on his own. Making sufficient time for regularly iterating this loop is the heart of any good mentorship process.
Now, though, with Mallory in the mix, the process has a new decision point, changing it from a linear sequence to a flow chart.
We begin the same way, with steps 1 and 2. Bob's confused, Bob formulates a question, but then:
- Bob asks Mallory that question.
Here, our path then diverges into a "happy" path, a "meh" path, and a "sad" path.
The "happy" path proceeds like so:
- Mallory happens to formulate a correct answer.
- Bob's understanding of the system improves, and thus he will have fewer and better-informed questions going forward.
Great. Problem solved. We just saved some of Alice's time. But as we learned earlier,
Mallory can make mistakes. When that happens, we will need to check important info. So let's get checking:
- Mallory happens to formulate an incorrect answer.
- Bob investigates this answer.
- Bob realizes that this answer is incorrect because it is inconsistent with some of his prior, correct knowledge of the system, or his investigation.
- Bob asks Alice the same question; GOTO traditional interaction step 4.
On this path, Bob spent a while futzing around with Mallory, to no particular benefit. This wastes some of Bob's time, but then again, Bob could have ended up on the happy path, so perhaps it was worth the risk; at least Bob wasn't wasting any of Alice's much more valuable time in the process.16
Notice that beginning at the start of step 4, we must begin allocating all of Bob's time to C, so C already starts getting a bit bigger than if it were just Bob checking Mallory's output specifically on tasks that Bob is doing.
That brings us to the "sad" path.
- Mallory happens to formulate an incorrect answer.
- Bob investigates this answer.
- Bob does not realize that this answer is incorrect because he is unable to recognize any inconsistencies with his existing, incomplete knowledge of the system.
- Bob integrates Mallory's incorrect information about the system into his mental model.
- Bob proceeds to make a larger and larger mess of his work, based on an incorrect mental model.
- Eventually, Bob asks Alice a new, worse question, based on this incorrect understanding.
- Sadly, we cannot return to the happy path at this point, because now Alice must unravel the complex series of confusing misunderstandings that Mallory has unfortunately conveyed to Bob. In the really sad case, Bob actually doesn't believe Alice for a while, because Mallory seems unbiased[17], and Alice has to waste even more time convincing Bob before she can even begin to explain things to him.
Now, we have wasted some of Bob's time, and some of Alice's time. Everything from step 5-10 is C, and as soon as Alice gets involved, we are now adding to C at double real-time. If more team members are pulled in to the investigation, you are now multiplying C by the number of investigators, potentially running at triple or quadruple real time.
But That's Not All
Here I've presented a brief selection of reasons why C will be both large, and larger than you expect. To review:
- Gambling-style mechanics of the user interface will interfere with your own self-monitoring and developing a good estimate.
- You can't use human heuristics for quickly spotting bad answers.
- Wrong answers given to junior people who can't evaluate them will waste more time from your more senior employees.
But this is a small selection of ways that Mallory's output can cost you money and time. It's harder to simplistically model second-order effects like this, but there's also a broad range of possibilities for ways that, rather than simply checking and catching errors, an error slips through and starts doing damage. Or ways in which the output isn't exactly wrong, but still sub-optimal in ways which can be difficult to notice in the short term.
For example, you might successfully vibe-code your way to launching a series of applications, successfully "checking" the output along the way, but then discover that the resulting code is unmaintainable garbage that prevents future feature delivery and needs to be re-written[18]. And this kind of intellectual debt isn't limited to technical debt incurred while coding; it can affect such apparently genAI-amenable fields as LinkedIn content marketing[19].
Problems with the Prediction of P
C isn't the only challenging term, though. P is just as important, if not more so, and just as hard to measure.
LLM marketing materials love to phrase their accuracy in terms of a percentage. Accuracy claims for LLMs in general tend to hover around 70%[20]. But these scores vary per field, and when you aggregate them across multiple topic areas, they start to trend down. This is exactly why "agentic" approaches for more immediately-verifiable LLM outputs (with checks like "did the code work") got popular in the first place: you need to try more than once.
Independently measured claims about accuracy tend to be quite a bit lower[21]. The field of AI benchmarks is exploding, but it probably goes without saying that LLM vendors game those benchmarks[22], because of course every incentive would encourage them to do that. Regardless of what their arbitrary scoring on some benchmark might say, all that matters to your business is whether it is accurate for the problems you are solving, for the way that you use it. Which is not necessarily going to correspond to any benchmark. You will need to measure it for yourself.
With that goal in mind, our formulation of P must be a somewhat harsher standard than "accuracy". It's not merely "was the factual information contained in any generated output accurate", but, "is the output good enough that some given real knowledge-work task is done and the human does not need to issue another prompt"?
Surprisingly Small Space for Slip-Ups
The problem with reporting these things as percentages at all, however, is that our actual definition for P is 1/attempts, where attempts, for any given task at least, must be an integer greater than or equal to 1.
Taken in aggregate, if we succeed on the first prompt more often than not, we could end up with a P > 1/2, but combined with the previous observation that you almost always have to prompt it more than once, the practical reality is that P will start at 50% and go down from there.
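If you want something better than a vibe for this number, a minimal sketch of estimating P from your own usage data, assuming you log how many prompts each completed task actually took, might look like this:

    # Hypothetical log: number of prompts each completed task needed before
    # its output was good enough to stop futzing.
    prompts_per_task = [2, 5, 1, 3, 4]

    # Under the simple closed-form model, P is one over the average number
    # of prompts per task: tasks completed divided by prompts issued.
    P = len(prompts_per_task) / sum(prompts_per_task)
    print(P)  # 5 / 15 -> 0.333...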
If we plug in some numbers, trying to be as extremely optimistic as we can, and say that we have a uniform stream of tasks, every one of which can be addressed by Mallory, every one of which:
- we can measure perfectly, with no overhead
- would take a human 45 minutes
- takes Mallory only a single minute to generate a response
- Mallory will require only 1 re-prompt, so "good enough" half the time
- takes a human only 5 minutes to write a prompt for
- takes a human only 5 minutes to check the result of
- has a per-prompt cost of the equivalent of a single second of a human's time
Thought experiments are a dicey basis for reasoning in the face of disagreements, so I have tried to formulate something here that is absolutely, comically, over-the-top stacked in favor of the AI optimist.
Would that be profitable? It sure seems like it, given that we are trading off 45 minutes of human time for 1 minute of Mallory-time and 10 minutes of human time. If we ask Python:
    H = 45        # minutes for the unaided human to do the task
    W, I, C = 5, 1, 5   # minutes to write a prompt, run inference, check output
    E = 1 / 60    # one second of human-equivalent cost per prompt
    P = 1 / 2     # a good-enough answer every second prompt
    print((W + I + C + E) / (P * H))  # 0.4896...
We get a futzing fraction of about 0.4896. Not bad! Sounds like, at least under these conditions, it would indeed be cost-effective to deploy Mallory. But… realistically, do you reliably get useful, done-with-the-task quality output on the second prompt? Let's bump up the denominator on P just a little bit there, and see how we fare:
    P = 1 / 3     # now it takes three prompts to get a good-enough answer
    print((W + I + C + E) / (P * H))  # 0.7344...
Oof. Still cost-effective at 0.734, but not quite as good. Where do we cap out, exactly?
    for attempts in range(2, 6):
        P = 1 / attempts
        ff = (W + I + C + E) / (P * H)
        print(attempts, ff)
    # 2 0.4896...
    # 3 0.7344...
    # 4 0.9792...
    # 5 1.2240...
With this little test, we can see that at our next iteration we are already at 0.9792, and by 5 tries per prompt, even in this absolute fever-dream of an over-optimistic scenario, with a futzing fraction of 1.2240, Mallory is now a net detriment to our bottom line.
Harm to the Humans
We are treating H as functionally constant so far, an average around some hypothetical Gaussian distribution, but the distribution itself can also change over time.
Formally speaking, an increase to H would be good for our fraction. Maybe it would even be a good thing; it could mean we're taking on harder and harder tasks due to the superpowers that Mallory has given us.
But an observed increase to H would probably not be good. An increase could also mean your humans are getting worse at solving problems, because using Mallory has atrophied their skills[23] and sabotaged learning opportunities[24][25]. It could also go up because your senior, experienced people now hate their jobs[26].
For some more vulnerable folks, Mallory might just take a shortcut past all these complex interactions and drive them completely insane[27] directly. Employees experiencing an intense psychotic episode are famously less productive than those who are not.
This could all be very bad if our futzing fraction eventually does head north of 1 and you need to consider reintroducing human-only workflows, without Mallory.
Abridging the Artificial Arithmetic (Alliteratively)
To reiterate, I have proposed this fraction:

FF = (W + I + C + E) / (P × H)

which shows us positive ROI when FF is less than 1, and negative ROI when it is more than 1.
This model is heavily simplified. A comprehensive measurement program that tests the efficacy of any technology, let alone one as complex and rapidly changing as LLMs, is more complex than could be captured in a single blog post.
Real-world work might be insufficiently uniform to fit into a closed-form solution like this. Perhaps an iterated simulation with variables based on the range of values seen in your team's metrics would give better results.
However, in this post, I want to illustrate that if you are going to try to evaluate an LLM-based tool, you need to at least include some representation of each of these terms somewhere. They are all fundamental to the way the technology works, and if you're not measuring them somehow, then you are flying blind into the genAI storm.
I also hope to show that a lot of existing assumptions about how benefits might be demonstrated, for example with user surveys about general impressions, or by evaluating artificial benchmark scores, are deeply flawed.
Even making what I consider to be wildly, unrealistically optimistic assumptions about these measurements, I hope I've shown:
- in the numerator, C might be a lot higher than you expect,
- in the denominator, P might be a lot lower than you expect,
- repeated use of an LLM might make H go up, but despite the fact that it's in the denominator, that will ultimately be quite bad for your business.
Personally, I don't have all that many concerns about E and I. E is still seeing significant loss-leader pricing, and I might not be coming down as fast as vendors would like us to believe, but if the other numbers work out, I don't think they make a huge difference. However, there might still be surprises lurking in there, and if you want to rationally evaluate the effectiveness of a model, you need to be able to measure them and incorporate them as well.
In particular, I really want to stress the importance of the influence of LLMs on your team dynamic, as that can cause massive, hidden increases to C. LLMs present opportunities for junior employees to generate an endless stream of chaff that will simultaneously:
- wreck your performance review process by making them look much more productive than they are,
- increase stress and load on senior employees who need to clean up unforeseen messes created by their LLM output,
- and ruin their own opportunities for career development by skipping over learning opportunities.
If you've already deployed LLM tooling without measuring these things and without updating your performance management processes to account for the strange distortions that these tools make possible, your Futzing Fraction may be much, much greater than 1, creating hidden costs and technical debt that your organization will not notice until a lot of damage has already been done.
If you got all the way here, particularly if you're someone who is enthusiastic about these technologies, thank you for reading. I appreciate your attention and I am hopeful that if we can start paying attention to these details, perhaps we can all stop futzing around so much with this stuff and get back to doing real work.
Acknowledgments
Thank you to my patrons who are supporting my writing on this blog. If you like what you've read here and you'd like to read more of it, or you'd like to support my various open-source endeavors, you can support my work as a sponsor!
-
I do not share this optimism, but I want to try very hard in this particular piece to take it as a given that genAI is in fact helpful. ↩
-
If we could have a better prompt on demand via some repeatable and automatable process, surely we would have used a prompt that got the answer we wanted in the first place. ↩
-
The software idea of a "user agent" straightforwardly comes from the legal principle of an agent, which has deep roots in common law, jurisprudence, philosophy, and math. When we think of an agent (some software) acting on behalf of a principal (a human user), this historical baggage imputes some important ethical obligations to the developer of the agent software. genAI vendors have been as eager as any software vendor to dodge responsibility for faithfully representing the user's interests even as there are some indications that at least some courts are not persuaded by this dodge, at least by the consumers of genAI attempting to pass on the responsibility all the way to end users. Perhaps it goes without saying, but I'll say it anyway: I don't like this newer interpretation of "agent". ↩
-
"Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents", Axel Backlund, Lukas Petersson, Feb 20, 2025 ↩
-
"random thing are happening, maxed out usage on api keys", @leojr94 on Twitter, Mar 17, 2025 ↩
-
"New study sheds light on ChatGPT's alarming interactions with teens" ↩
-
"Lawyers submitted bogus case law created by ChatGPT. A judge fined them $5,000", by Larry Neumeister for the Associated Press, June 22, 2023 ↩
-
During which a human will be busy-waiting on an answer. ↩
-
Given the fluctuating pricing of these products, and fixed subscription overhead, this will obviously need to be amortized; including all the additional terms to actually convert this from your inputs is left as an exercise for the reader. ↩
-
I feel like I should emphasize explicitly here that everything is an average over repeated interactions. For example, you might observe that a particular LLM has a low probability of outputting acceptable work on the first prompt, but higher probability on subsequent prompts in the same context, such that it usually takes 4 prompts. For the purposes of this extremely simple closed-form model, we'd still consider that a P of 25%, even though a more sophisticated model, or a monte carlo simulation that sets progressive bounds on the probability, might produce more accurate values. ↩
-
No it isn't, actually, but for the sake of argument let's grant that it is. ↩
-
It's worth noting that all this expensive measuring itself must be included in C until you have a solid grounding for all your metrics, but let's optimistically leave all of that out for the sake of simplicity. ↩
-
"AI Company Poll Finds 45% of Workers Trust the Tech More Than Their Peers", by Suzanne Blake for Newsweek, Aug 13, 2025 ↩
-
AI Chatbots Remain Overconfident - Even When They're Wrong by Jason Bittel for the Dietrich College of Humanities and Social Sciences at Carnegie Mellon University, July 22, 2025 ↩
-
AI Mistakes Are Very Different From Human Mistakes by Bruce Schneier and Nathan E. Sanders for IEEE Spectrum, Jan 13, 2025 ↩
-
Foreshadowing is a narrative device in which a storyteller gives an advance hint of an upcoming event later in the story. ↩
-
"People are worried about the misuse of AI, but they trust it more than humans" ↩
-
"Why I stopped using AI (as a Senior Software Engineer)", theSeniorDev YouTube channel, Jun 17, 2025 ↩
-
"I was an AI evangelist. Now I'm an AI vegan. Here's why.", Joe McKay for the greatchatlinkedin YouTube channel, Aug 8, 2025 ↩
-
"Study Finds That 52 Percent Of ChatGPT Answers to Programming Questions are Wrong", by Sharon Adarlo for Futurism, May 23, 2024 ↩
-
"Off the Mark: The Pitfalls of Metrics Gaming in AI Progress Races", by Tabrez Syed on BoxCars AI, Dec 14, 2023 ↩
-
"I tried coding with AI, I became lazy and stupid", by Thomasorus, Aug 8, 2025 ↩
-
"How AI Changes Student Thinking: The Hidden Cognitive Risks" by Timothy Cook for Psychology Today, May 10, 2025 ↩
-
"Increased AI use linked to eroding critical thinking skills" by Justin Jackson for Phys.org, Jan 13, 2025 ↩
-
"AI could end my job - Just not the way I expected" by Manuel Artero Anguita on dev.to, Jan 27, 2025 ↩
-
"The Emerging Problem of "AI Psychosis"" by Gary Drevitch for Psychology Today, July 21, 2025. ↩
15 Aug 2025 7:51am GMT
09 Aug 2025
Planet Twisted
Glyph Lefkowitz: R0ML’s Ratio
My father, also known as "R0ML" once described a methodology for evaluating volume purchases that I think needs to be more popular.
If you are a hardcore fan, you might know that he has already described this concept publicly in a talk at OSCON in 2005, among other places, but it has never found its way to the public Internet, so I'm giving it a home here, and in the process, appropriating some of his words.1
Let's say you're running a circus. The circus has many clowns. Ten thousand clowns, to be precise. They require bright red clown noses. Therefore, you must acquire a significant volume of clown noses. An enterprise licensing agreement for clown noses, if you will.
If the nose plays, it can really make the act. In order to make sure you're getting quality noses, you go with a quality vendor. You select a vendor who can supply noses for $100 each, at retail.
Do you want to buy retail? Ten thousand clowns, ten thousand noses, one hundred dollars: that's a million bucks worth of noses, so it's worth your while to get a good deal.
As a conscientious executive, you go to the golf course with your favorite clown accessories vendor and negotiate yourself a 50% discount, with a commitment to buy all ten thousand noses.
Is this a good deal? Should you take it?
To determine this, we will use an analytical tool called R0ML's Ratio (RR).
The ratio has 2 terms:
- the Full Undiscounted Retail List Price of Units Used (FURLPoUU), which can of course be computed by the individual retail list price of a single unit (in our case, $100) multiplied by the number of units used
- the Total Price of the Entire Enterprise Volume Licensing Agreement (TPotEEVLA), which in our case is $500,000.
It is expressed as:
RR = TPotEEVLA / FURLPoUU
Crucially, you must be able to compute the number of units used in order to complete this ratio. If, as expected, every single clown wears their nose at least once during the period of the license agreement, then our Units Used is 10,000, our FURLPoUU is $1,000,000 and our TPotEEVLA is $500,000, which makes our RR 0.5.
Congratulations. If R0ML's Ratio is less than 1, it's a good deal. Proceed.
But… maybe the nose doesn't play. Not every clown's costume is an exact clone of the traditional, stereotypical image of a clown. Many are avant-garde. Perhaps this plentiful proboscis pledge was premature. Here, I must quote the originator of this theoretical framework directly:
What if the wheeze doesn't please?
What if the schnozz gives some pause?
In other words: what if some clowns don't wear their noses?
If we were to do this deal, and then ask around afterwards to find out that only 200 of our 10,000 clowns were to use their noses, then FURLPoUU comes out to 200 * $100, for a total of $20,000. In that scenario, RR is 25, which you may observe is substantially greater than 1.
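As a quick sketch of that arithmetic in code, with a hypothetical helper function and the numbers from both scenarios:

    def r0mls_ratio(deal_price, unit_list_price, units_used):
        # TPotEEVLA divided by FURLPoUU
        return deal_price / (unit_list_price * units_used)

    # Every one of the 10,000 clowns wears a nose at least once:
    print(r0mls_ratio(500_000, 100, 10_000))  # 0.5 -> good deal

    # Only 200 clowns ever wear their noses:
    print(r0mls_ratio(500_000, 100, 200))     # 25.0 -> you are the bozo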
If you do a deal where R0ML's ratio is greater than 1, then you are the bozo.
I apologize if I have belabored this point. As R0ML expressed in the email we exchanged about this many years ago,
I do not mind if you blog about it - and I don't mind getting the credit - although one would think it would be obvious.
And yeah, one would think this would be obvious? But I have belabored it because many discounted enterprise volume purchasing agreements still fail the R0ML's Ratio Bozo Test.2
In the case of clown noses, if you pay the discounted price, at least you get to keep the nose; maybe lightly-used clown noses have some resale value. But in software licensing or SaaS deals, once you've purchased the "discounted" software or service, once you have provisioned the "seats", the money is gone, and if your employees don't use it, then no value for your organization will ever result.
Measuring number of units used is very important. Without this number, you have no idea if you are a bozo or not.
It is often better to give your individual employees a corporate card and allow them to make arbitrary individual purchases of software licenses and SaaS tools, with minimal expense-reporting overhead; since every unit is then bought at list price, and only when someone actually uses it, this will always keep R0ML's Ratio at 1.0, and thus, you will never be a bozo.
It is always better to do that the first time you are purchasing a new software tool, because the first time making such a purchase you (almost by definition) have no information about "units used" yet. You have no idea - you cannot have any idea - if you are a bozo or not.
If you don't know who the bozo is, it's probably you.
Acknowledgments
Thank you for reading, and especially thank you to my patrons who are supporting my writing on this blog. Of course, extra thanks to dad for, like, having this idea and doing most of the work here beyond my transcription. If you like my dad's ideas and you'd like to post more of them, or you'd like to support my various open-source endeavors, you can support my work as a sponsor!
09 Aug 2025 4:41am GMT
