25 Jan 2025
LXer Linux News
ESP32-S3 Development Board with 2-Inch Display, Camera Interface, and 6-Axis IMU
The ESP32-S3-LCD-2 is a compact development board based on the ESP32-S3R8 processor. It integrates features such as wireless connectivity, a small LCD display, and various interface options, including a battery connector for portable applications.
25 Jan 2025 6:21pm GMT
TEAMGROUP MP44L 2TB M.2 NVMe SSD Review
The SSD uses the Phison PS5021-E21T controller. It's a DRAM-less quad-channel controller that supports a PCIe 4.0 x4 interface.
25 Jan 2025 4:50pm GMT
x86 32-bit Operating Systems Aren't Dead Yet: New Linux Patches Improve 32-bit PAE
The Linux x86 32-bit kernel's PAE (Physical Address Extension) support allows addressing more than 4GB of memory for those still running 32-bit processors or otherwise opting to run a 32-bit OS.
25 Jan 2025 3:18pm GMT
Linux Today
AI Document Editing: Connect GPT4All to ONLYOFFICE on Ubuntu
In this article, you will learn how to enable AI-powered document editing on Ubuntu through the example of ONLYOFFICE Desktop Editors, an open-source office package for Linux, and GPT4All, an open-source platform designed to run local AI models.
The post AI Document Editing: Connect GPT4All to ONLYOFFICE on Ubuntu appeared first on Linux Today.
25 Jan 2025 2:38pm GMT
Planet Python
Real Python: Python and TOML: New Best Friends
TOML stands for Tom's Obvious Minimal Language. Its human-readable syntax makes TOML convenient to parse into data structures across various programming languages. In Python, you can use the built-in tomllib module to work with TOML files. TOML plays an essential role in the Python ecosystem. Many of your favorite tools rely on TOML for configuration, and you'll use pyproject.toml when you build and distribute your own packages.
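For instance, reading a TOML file with the standard library takes only a few lines. This is a minimal sketch (not from the article) that assumes a file named config.toml exists and that you're on Python 3.11 or later:

import tomllib

with open("config.toml", "rb") as file:  # tomllib requires a binary-mode file object
    config = tomllib.load(file)

print(config)  # the parsed document is a plain Python dictionary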
By the end of this tutorial, you'll understand that:
- TOML in Python refers to a minimal configuration file format that's convenient to read and parse.
- TOML files are primarily used for configuration, separating code from settings for flexibility.
- pyproject.toml is crucial for package configuration and specifies the build system and dependencies.
- Loading a TOML file in Python involves using tomli or tomllib to parse it into a dictionary.
- tomli and tomllib differ mainly in origin, with tomllib being a standard library module in modern Python.
If you want to know more about why tomllib was added to Python, then have a look at the companion tutorial, Python 3.11 Preview: TOML and tomllib.
Free Download: Get a sample chapter from Python Tricks: The Book that shows you Python's best practices with simple examples you can apply instantly to write more beautiful + Pythonic code.
Use TOML as a Configuration Format
TOML is short for Tom's Obvious Minimal Language and is humbly named after its creator, Tom Preston-Werner. It was designed expressly to be a configuration file format that should be "easy to parse into data structures in a wide variety of languages" (Source).
In this section, you'll start thinking about configuration files and look at what TOML brings to the table.
Configurations and Configuration Files
A configuration is an important part of almost any application or system. It'll allow you to change settings or behavior without changing the source code. Sometimes you'll use a configuration to specify information needed to connect to another service like a database or cloud storage. Other times you'll use configuration settings to allow your users to customize their experience with your project.
Using a configuration file for your project is a good way to separate your code from its settings. It also encourages you to be conscious about which parts of your system are genuinely configurable, giving you a tool to name magic values in your source code. For now, consider this configuration file for a hypothetical tic-tac-toe game:
player_x_color = blue
player_o_color = green
board_size = 3
server_url = https://tictactoe.example.com/
You could potentially code this directly in your source code. However, by moving the settings into a separate file, you achieve a few things:
- You give explicit names to values.
- You provide these values more visibility.
- You make it simpler to change the values.
Look more closely at your hypothetical configuration file. Those values are conceptually different. The colors are values that your framework probably supports changing. In other words, if you replaced blue with red, that would be honored without any special handling in your code. You could even consider if it's worth exposing this configuration to your end users through your front end.
However, the board size may or may not be configurable. A tic-tac-toe game is played on a three-by-three grid. It's not certain that your logic would still work for other board sizes. It may still make sense to keep the value in your configuration file, both to give a name to the value and to make it visible.
Finally, the project URL is usually essential when deploying your application. It's not something that a typical user will change, but a power user may want to redeploy your game to a different server.
To be more explicit about these different use cases, you may want to add some organization to your configuration. One popular option is to separate your configuration into additional files, each dealing with a different concern. Another option is to group your configuration values somehow. For example, you can organize your hypothetical configuration file as follows:
[user]
player_x_color = blue
player_o_color = green
[constant]
board_size = 3
[server]
url = https://tictactoe.example.com
The organization of the file makes the role of each configuration item clearer. You can also add comments to the configuration file with instructions to anyone thinking about making changes to it.
Note: The actual format of your configuration file isn't important for this discussion. The above principles hold independently of how you specify your configuration values. As it happens, the examples that you've seen so far can be parsed by Python's ConfigParser class.
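For instance, a short snippet (not from the article; the filename tic_tac_toe.ini is hypothetical) shows ConfigParser reading the grouped example above:

from configparser import ConfigParser

config = ConfigParser()
config.read("tic_tac_toe.ini")  # hypothetical file containing the grouped example

print(config["user"]["player_x_color"])         # blue
print(config.getint("constant", "board_size"))  # 3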
There are many ways for you to specify a configuration. Windows has traditionally used INI files, which resemble your configuration file from above. Unix systems have also relied on plain-text, human-readable configuration files, although the actual format varies between different services.
Over time, more and more applications have come to use well-defined formats like XML, JSON, or YAML for their configuration needs. These formats were designed as data interchange or serialization formats, usually meant for computer communication.
On the other hand, configuration files are often written or edited by humans. Many developers have gotten frustrated with JSON's strict comma rules when updating their Visual Studio Code settings or with YAML's nested indentations when setting up a cloud service. Despite their ubiquity, these file formats aren't the easiest to write by hand.
TOML: Tom's Obvious Minimal Language
Read the full article at https://realpython.com/python-toml/ »
25 Jan 2025 2:00pm GMT
Real Python: How to Flush the Output of the Python Print Function
Python's flush parameter in the print() function allows you to control when to empty the output data buffer, ensuring your output appears immediately. This is useful when building visual progress indicators or when piping logs to another application in real time.
By default, print() output is line-buffered in interactive environments and block-buffered otherwise. You can override this behavior by using the flush=True parameter to force the buffer to clear immediately.
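For example, a minimal sketch (not taken from the tutorial itself):

import time

print("Working", end="", flush=True)  # shows up immediately, even without a newline
time.sleep(2)
print(" done")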
By the end of this tutorial, you'll understand that:
- Flush in coding refers to emptying the data buffer to ensure immediate output.
- flush=True in print() forces the buffer to clear immediately.
- Clearing output involves managing buffer behavior, typically with newlines.
- Turning off print buffering can be achieved using the -u option or PYTHONUNBUFFERED.
By repeatedly running a short code snippet that you change only slightly, you'll see that if you run print() with its default arguments, then its execution is line-buffered in interactive mode, and block-buffered otherwise.
You'll get a feel for what all of that means by exploring the code practically. But before you dive into changing output stream buffering in Python, it's helpful to revisit how it happens by default, and understand why you might want to change it.
Free Sample Code: Click here to download the free sample code that you'll use to dive deep into flushing the output of the Python print function.
Understand How Python Buffers Output
When you make a write call to a file-like object, Python buffers the call by default, and that's a good idea! Disk write and read operations are slow in comparison to random-access memory (RAM) access. When your script makes fewer system calls for write operations by batching characters in a RAM data buffer and writing them all at once to disk with a single system call, then you can save a lot of time.
To put the use case for buffering into a real-world context, think of traffic lights as buffers for car traffic. If every car crossed an intersection immediately upon arrival, it would end in gridlock. That's why the traffic lights buffer traffic from one direction while the other direction flushes.
Note: Data buffers are generally size-based, not time-based, which is where the traffic analogy breaks down. In the context of a data buffer, the traffic lights would switch if a certain number of cars were queued up and waiting.
However, there are situations when you don't want to wait for a data buffer to fill up before it flushes. Imagine that there's an ambulance that needs to get past the crossroads as quickly as possible. You don't want it to wait at the traffic lights until there's a certain number of cars queued up.
In your program, you usually want to flush the data buffer right away when you need real-time feedback on code that has executed. Here are a couple of use cases for immediate flushing:
- Instant feedback: In an interactive environment, such as a Python REPL or a situation where your Python script writes to a terminal
- File monitoring: In a situation where you're writing to a file-like object, and the output of the write operation gets read by another program while your script is still executing, for example, when you're monitoring a log file
In both cases, you need to read the generated output as soon as it generates, and not only when enough output has assembled to flush the data buffer.
There are many situations where buffering is helpful, and there are some situations where too much buffering can be a disadvantage. Therefore, there are different types of data buffering that you can implement where they fit best:
- Unbuffered means that there's no data buffer. Every byte creates a new system call and gets written independently.
- Line-buffered means that there's a data buffer that collects information in memory, and once it encounters a newline character (\n), the data buffer flushes and writes the whole line in one system call.
- Fully-buffered (block-buffered) means that there's a data buffer of a specific size, which collects all the information that you want to write. Once it's full, it flushes and sends all its contents onward in a single system call.
Python uses block buffering as a default when writing to file-like objects. However, it executes line-buffered if you're writing to an interactive environment.
To better understand what that means, write a Python script that simulates a countdown:
countdown.py
from time import sleep

for second in range(3, 0, -1):
    print(second)
    sleep(1)

print("Go!")
By default, each number shows up right when print() is called in the script. But as you develop and tweak your countdown timer, you might run into a situation where all your output gets buffered. Buffering the whole countdown and printing it all at once when the script finishes would lead to a lot of confusion for the athletes waiting at the start line!
So how can you make sure that you won't run into data buffering issues as you develop your Python script?
Add a Newline for Python to Flush Print Output
If you're running a code snippet in a Python REPL or executing it as a script directly with your Python interpreter, then you won't run into any issues with the script shown above.
In an interactive environment, the standard output stream is line-buffered. This is the output stream that print() writes to by default. You're working with an interactive environment any time that your output will display in a terminal. In this case, the data buffer flushes automatically when it encounters a newline character ("\n"):
Read the full article at https://realpython.com/python-flush-print-output/ »
25 Jan 2025 2:00pm GMT
Real Python: How to Download Files From URLs With Python
Python makes it straightforward to download files from a URL with its robust set of libraries. For quick tasks, you can use the built-in urllib module or the requests library to fetch and save files. When working with large files, streaming data in chunks can help save memory and improve performance.
You can also perform parallel file downloads using ThreadPoolExecutor for multithreading or the aiohttp library for asynchronous tasks. These approaches allow you to handle multiple downloads concurrently, significantly reducing the total download time if you're handling many files.
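As a rough illustration of that idea, here is a sketch (not the article's code; the URLs and filenames are placeholders) of downloading several files concurrently with a thread pool:

from concurrent.futures import ThreadPoolExecutor

import requests

def download(url, filename):
    # Stream the response so large files aren't held in memory all at once.
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        with open(filename, "wb") as file:
            for chunk in response.iter_content(chunk_size=8192):
                file.write(chunk)
    return filename

downloads = {
    "https://example.com/first.zip": "first.zip",    # placeholder URLs
    "https://example.com/second.zip": "second.zip",
}

with ThreadPoolExecutor() as executor:
    for saved in executor.map(download, downloads.keys(), downloads.values()):
        print(f"Saved {saved}")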
By the end of this tutorial, you'll understand that:
- You can use Python to download files with libraries like urllib and requests.
- To download a file using a URL in Python, you can use urlretrieve() or requests.get().
- To extract data from a URL in Python, you use the response object from requests.
- To download a CSV file from a URL in Python, you may need to specify the format in the URL or query parameters.
In this tutorial, you'll be downloading a range of economic data from the World Bank Open Data platform. To get started on this example project, go ahead and grab the sample code below:
Free Bonus: Click here to download your sample code for downloading files from the Web with Python.
Facilitating File Downloads With Python
While it's possible to download files from URLs using traditional command-line tools, Python provides several libraries that facilitate file retrieval. Using Python to download files offers several advantages.
One advantage is flexibility, as Python has a rich ecosystem of libraries, including ones that offer efficient ways to handle different file formats, protocols, and authentication methods. You can choose the most suitable Python tools to accomplish the task at hand and fulfill your specific requirements, whether you're downloading from a plain-text CSV file or a complex binary file.
Another reason is portability. You may encounter situations where you're working on cross-platform applications. In such cases, using Python is a good choice because it's a cross-platform programming language. This means that Python code can run consistently across different operating systems, such as Windows, Linux, and macOS.
Using Python also offers the possibility of automating your processes, saving you time and effort. Some examples include automating retries if a download fails, retrieving and saving multiple files from URLs, and processing and storing your data in designated locations.
These are just a few reasons why downloading files using Python is better than using traditional command-line tools. Depending on your project requirements, you can choose the approach and library that best suits your needs. In this tutorial, you'll learn approaches to some common scenarios requiring file retrievals.
Downloading a File From a URL in Python
In this section, you'll learn the basics of downloading a ZIP file containing gross domestic product (GDP) data from the World Bank Open Data platform. You'll use two common tools in Python, urllib and requests, to download GDP by country.
While the urllib package comes with Python in its standard library, it has some limitations. So, you'll also learn to use a popular third-party library, requests, that offers more features for making HTTP requests. Later in the tutorial, you'll see additional functionalities and use cases.
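To give a flavor of the requests approach up front, here is a hedged sketch (the URL and filename match the example used later in this tutorial):

import requests

url = (
    "https://api.worldbank.org/v2/en/indicator/"
    "NY.GDP.MKTP.CD?downloadformat=csv"
)

response = requests.get(url)
response.raise_for_status()

with open("gdp_by_country.zip", "wb") as file:
    file.write(response.content)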
Using urllib From the Standard Library
Python ships with a package called urllib, which provides a convenient way to interact with web resources. It has a straightforward and user-friendly interface, making it suitable for quick prototyping and smaller projects. With urllib, you can perform different tasks dealing with network communication, such as parsing URLs, sending HTTP requests, downloading files, and handling errors related to network operations.
As a standard library package, urllib has no external dependencies and doesn't require installing additional packages, making it a convenient choice. For the same reason, it's readily accessible for development and deployment. It's also cross-platform compatible, meaning you can write and run code seamlessly using the urllib package across different operating systems without additional dependencies or configuration.
The urllib package is also very versatile. It integrates well with other modules in the Python standard library, such as re for building and manipulating regular expressions, as well as json for working with JSON data. The latter is particularly handy when you need to consume JSON APIs.
In addition, you can extend the urllib package and use it with other third-party libraries, like requests, BeautifulSoup, and Scrapy. This offers the possibility for more advanced operations in web scraping and interacting with web APIs.
To download a file from a URL using the urllib package, you can call urlretrieve() from the urllib.request module. This function fetches a web resource from the specified URL and then saves the response to a local file. To start, import urlretrieve() from urllib.request:
>>> from urllib.request import urlretrieve
Next, define the URL that you want to retrieve data from. If you don't specify a path to a local file where you want to save the data, then the function will create a temporary file for you. Since you know that you'll be downloading a ZIP file from that URL, go ahead and provide an optional path to the target file:
>>> url = (
... "https://api.worldbank.org/v2/en/indicator/"
... "NY.GDP.MKTP.CD?downloadformat=csv"
... )
>>> filename = "gdp_by_country.zip"
Because your URL is quite long, you rely on Python's implicit concatenation by splitting the string literal over multiple lines inside parentheses. The Python interpreter will automatically join the separate strings on different lines into a single string. You also define the location where you wish to save the file. When you only provide a filename without a path, Python will save the resulting file in your current working directory.
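The call itself, which the full article walks through next, then looks roughly like this:

>>> path, headers = urlretrieve(url, filename)
>>> path
'gdp_by_country.zip'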
Read the full article at https://realpython.com/python-download-file-from-url/ »
25 Jan 2025 2:00pm GMT
Linux Today
How to Install NeoVim on Ubuntu and Other Linux Distros
Discover a step-by-step guide to install the latest version of NeoVim on your Ubuntu and other Linux distributions with multiple methods.
The post How to Install NeoVim on Ubuntu and Other Linux Distros appeared first on Linux Today.
25 Jan 2025 1:38pm GMT
Django community aggregator: Community blog posts
Looking at Django task runners and queues
I use django-apscheduler to run a queue of scheduled tasks. Now I also need the ability to run one-off tasks and that turned out to not be so simple.
25 Jan 2025 5:59am GMT
24 Jan 2025
Linux Today
How To Change Directory And List Files In One Command In Fish Shell
In this guide, we will explain different methods to change directories and list files in one command using Fish Shell on Linux.
The post How To Change Directory And List Files In One Command In Fish Shell appeared first on Linux Today.
24 Jan 2025 6:23pm GMT
Django community aggregator: Community blog posts
Django News - Django 5.2 alpha 1 release - Jan 24th 2025
News
Django 5.2 alpha 1 released
Django 5.2 alpha 1 is now available. It represents the first stage in the 5.2 release cycle and is an opportunity for you to try out the changes coming in Django 5.2.
Hello from the new Steering Council; and a quick temporary voting process change
The Steering Council is officially in action, and we want to give you a heads-up on a change we're making for the short term.
Djangonaut Space - New session 2025
Djangonaut Space is holding a fourth session! This session will start on February 17th, 2025. Applications are open until January 29th, 2025, Anywhere on Earth.
Tailwind CSS v4.0 - Tailwind CSS
Tailwind CSS v4.0 is out - an all-new version of the framework optimized for performance and flexibility, with a reimagined configuration and customization experience, and taking full advantage of the latest advancements the web offers.
Django Software Foundation
President of the Django Software Foundation
Notes from Thibaud Colas, the new President of the Django Software Foundation.
Updates to Django
Today 'Updates to Django' is presented by Velda Kiara from Djangonaut Space!
Last week we had 26 pull requests merged into Django by 10 different contributors - including 1 first-time contributor! Congratulations to Thibaut Decombe for having their first commits merged into Django - welcome on board 🎊!
Big news! Django 5.2a1 is released! Please test the alpha version, try out the new features, and report any regressions on Trac.
The following enhancements were added to Django 5.2 last week:
- Built-in aggregate functions accepting only one argument (Avg, Count, etc.) now raise a TypeError when called with an incorrect number of arguments.
- The ability to override the BoundField class for fields, forms, and renderers, providing greater flexibility in form handling.
Django Newsletter
Wagtail CMS
Results of the 2024 Wagtail headless survey
The results of the 2024 Wagtail headless survey are here! There are plenty of expected results plus a few surprises.
Sponsored Link 1
Django & Wagtail Hosting for just $9/month
New year, new site! Finally put those domain names to use and launch that side-project, blog, or personal site in 2025. Lock in this limited time pricing by Feb 28.
Articles
Ideas how to support Tailwind CSS 4.x in django-tailwind-cli
A look at potential updates to align with Tailwind's new CSS-first configuration, simplified theme setup, and other breaking changes, while weighing user-friendly migration strategies versus manual transitions.
Monkeypatching Django
A deep dive into how the nanodjango project leverages monkeypatching and metaclass manipulation to enable full Django functionality within a single-file application, bridging the gap between lightweight prototypes and Django's powerful features.
Django Islands: Part 1 - Side Quests by Bo Lopker
On using SolidJS with Django to build a type-safe frontend that the whole team will love.
Show Django forms inside a modal using HTMX
Learn how to create and edit models using Django forms inside a modal using HTMX
Exploring Python Requirements
A guide to using the uv tool for managing Python project dependencies, showcasing features like tree views, listing outdated packages, and integrating interactive package selection with tools like wheat to streamline dependency updates.
Django admin tip: Adding links to related objects in change forms
This article explores two practical methods for enhancing the Django admin interface by linking related objects directly in change forms, improving navigation and usability.
Django Developer Salary Report 2025
An annual report from Foxley Talent on Django developer salaries.
urllib3 in 2024
Highlights from 2024 for the urllib3 team in terms of funding, features, and looking forward.
Tutorials
The definitive guide to using Django with SQLite in production
A comprehensive look at using SQLite for Django projects under real-world traffic, detailing its benefits and potential challenges and deployment strategies.
Sponsored Link 2
Buff your Monolith
Scout Monitoring delivers performance metrics, detailed traces and query stats for all your app routes with a five-minute setup. Add our Log Management and have everything you need to make your Django app shiny and fast. Get started for free at https://try.scoutapm.com/djangonews
Podcasts
Django Chat #174: Beyond Golden Pathways - Teaching Django with Sheena O'Connell
Sheena is a long-time Django developer and teacher currently focused on Prelude Tech, small workshops focused on topics like Django, HTMX, and Git.
Django News Jobs
Senior Backend Engineer at BactoBio 🆕
Senior Software Engineer at Spark
Front-end developer at cassandra.app
Lead Django Developer at cassandra.app
Développeur(se) back-end en CDI at Brief Media
Django Newsletter
Django Forum
Call for Project Ideas and Prospective Mentors for GSoC 2025
Google Summer of Code (GSoC) rolls around again. Historically this has been a real boon for Django, with successes again this year. We need project ideas and prospective mentors for 2025. Do you have one? Is that you?
Projects
Salaah01/django-action-triggers
A Django library for asynchronously triggering actions in response to database changes. It supports integration with webhooks, message brokers (e.g., Kafka, RabbitMQ), and can trigger other processes, including AWS Lambda functions.
OmenApps/django-templated-email-md
An extension for django-templated-email for creating emails with Markdown.
This RSS feed is published on https://django-news.com/. You can also subscribe via email.
24 Jan 2025 5:00pm GMT
Drupal.org aggregator
Droptica: How to Customize CMS for Your Business? A Guide Using Drupal as an Example
A CMS system is the foundation of any modern website or business platform. Choosing the right tool that adapts to a company's needs is crucial for effective content management. In this article, I'll provide a detailed guide on understanding and matching a CMS to an organization's requirements. I'll use the flexible and functional Drupal as an example. I invite you to read the article or watch an episode of the "Nowoczesny Drupal" series.
24 Jan 2025 2:05pm GMT
DZone Java Zone
Multi-Tenancy and Its Improved Support in Hibernate 6.3.0
Multi-tenancy has become an important feature for modern enterprise applications that need to serve multiple clients (tenants) from a single application instance. While an earlier version of Hibernate had support for multi-tenancy, its implementation required significant manual configuration and custom strategies to handle tenant isolation, which resulted in higher complexity and slower processes, especially for applications with a large number of tenants.
The latest version of Hibernate 6.3.0, which was released on December 15, 2024, addressed the above limitations with enhanced multi-tenancy support through better tools for tenant identification, schema resolution, and enhanced performance for handling tenant-specific operations. This article talks about how Hibernate 6.3.0 enhanced the traditional multi-tenancy implementation significantly.
24 Jan 2025 12:00pm GMT
23 Jan 2025
Drupal.org aggregator
ImageX: Drupal CMS 1.0 Released: Simplicity and Power at Your Fingertips
Authored by
23 Jan 2025 8:43pm GMT
DZone Java Zone
Multi-Tenant Data Isolation and Row Level Security
Over the past one and a half years, I was involved in designing and developing a multi-tenant treasury management system. In this article, I will share our approaches to the data isolation aspect of our multi-tenant solution and the learnings from it.
Background and Problem Regarding Data Isolation
Before going into the problem that I will focus on today, I must first give some background into our architecture for storage and data in our system. When it comes to data partitioning for SaaS systems, at the extreme far right end, we have the approach of using dedicated databases for each tenant (silo model), and on the other side of the spectrum is the shared database model (pool model).
23 Jan 2025 6:00pm GMT
Drupal.org aggregator
Tag1 Consulting: Migrating Your Data from D7 to D10: Paragraph migration. Creating custom process plugins.
Are you ready to migrate with confidence? Our newest migration guide shares proven techniques for transforming field collection data into paragraph entities, including proper entity reference handling, revision management, and custom process plugins for field transformations. Insights from our experts will help your team deliver successful results.
23 Jan 2025 3:49pm GMT
Django community aggregator: Community blog posts
Strip spaces
When I joined TriOptima back in 2010, a common pattern emerged where names of things were slightly off because of stray whitespaces. Sometimes we had duplicates like "foo", "foo " and " foo" in the database. Sometimes we couldn't find stuff in logs because you searched for "foo was deleted", when actually you had to search for "foo  was deleted" (notice the double space!). Sorting was "broken" because " foo" and "foobar" are not next to each other. And more issues that I can't remember…
It was everywhere, causing problems across the entire code base. Each individual issue was easily fixed by cleaning up the data, but it added up to an annoying support burden. My fix at the time was to make a function that took a Django Form instance and returned a new instance with space stripping on all fields. Something like:
form = auto_strip(Form(...))
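A hypothetical sketch of such a helper (not the original TriOptima code) might wrap each field's clean() so that string input is stripped before the field's normal validation runs:

def auto_strip(form):
    # Hypothetical helper: shadow each field's clean() on the instance so that
    # leading/trailing spaces are removed before normal validation runs.
    for field in form.fields.values():
        original_clean = field.clean

        def stripping_clean(value, _clean=original_clean):
            if isinstance(value, str):
                value = value.strip()
            return _clean(value)

        field.clean = stripping_clean
    return form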
After I added that to every single Django form in the entire code base that slow and steady trickle of bugs and annoyances just stopped. From seeing a few a month consistently to zero for the next ~9 years. Even better: I never got a complaint about it.
This was fixed in Django 1.9 after some fierce debating back and forth ("will never happen" was uttered). In iommi forms we've had this since the beginning, which turns out to be a few months ahead of when Django took this decision (although it was tri.form and not iommi back then).
Just when you think you're out
Unfortunately the story doesn't end here. I started getting this issue again, and it took me a while to realize it's because of SPA-like pages that use Pydantic serializers instead of a form library. To solve this I added this base class for my schemas:
from typing import Any

import ninja
from pydantic import validator


class Schema(ninja.Schema):
    @validator('*')
    def strip_spaces(cls, v: Any) -> Any:
        if isinstance(v, str) and '\n' not in v:
            return v.strip(' ')
        return v
The reason I'm not stripping spaces if the text contains a newline is to avoid situations where you have multiline text fields with indented code. Maybe it's just programmers who will care, but we tend to care a lot :P
Modifying data silently and by default like this sounds like a bad idea and I also get a little pit in my stomach when I think about it with that frame of mind, but this specific case seems like all win and no downside from my 14 years of experience doing it.
23 Jan 2025 6:00am GMT
20 Jan 2025
DZone Java Zone
Create a Custom Logger to Log Response Details With Playwright Java
While working on the series of tutorial blogs for GET, POST, PUT, PATCH, and DELETE requests for API automation using Playwright Java, I noticed that there is no logging method provided by the Playwright Java framework to log the requests and responses.
In the REST-assured framework, we have the log().all() method available that is used for logging the request as well as the response. Playwright does not provide any such method. However, it offers a text() method in the APIResponse interface that can be used to extract the response text.
20 Jan 2025 7:00pm GMT
18 Jan 2025
Planet Lisp
Joe Marshall: Valid Use Case for Copilot
Our company provides us with GitHub Copilot, which is yet another example of an "AI" engine. I've reviewed it before and I wasn't particularly impressed. It writes mediocre code.
However, I found a use case where it really shines. I was setting up the blank project repositories for our upcoming hackathon. This involved creating a bunch of otherwise empty repositories with a customized README.md file. The simple solution was to automate this with an ad hoc shell script.
I don't make a practice of writing shell scripts. I write a few every now and then, but I don't live and breathe bash. It takes me a while because I forget the flags. I have to look up the exact syntax for things like the find command. Then I have to fight with the shell to get the quoting and escaping right. It's a tedious chore. No doubt I would become adept at it if it were something I did daily, but it's not.
So I decided to ask GitHub Copilot to write some scripts for me. I gave it my problem statement: start with a file with a list of repository and description pairs. For each pair, look for the subdirectory with that name and descend into it. Use the sed command to modify the README.md file in the subdirectory to add the repository description. Then use git to commit and push the change.
Copilot gave this a couple of seconds thought and came up with this script:
while IFS= read -r line; do
  dir_name=$(echo "$line" | cut -d' ' -f1)
  description=$(echo "$line" | cut -d' ' -f2-)
  if [ -d "$dir_name" ]; then
    sed -i "s/\[\[github_repo_description\]\]/$description/g" "$dir_name/README.md"
  else
    echo "Directory $dir_name does not exist."
  fi
done < descriptions.txt

find . -type d -name "hack*" -exec sh -c '
  cd "$0"
  if [ -f README.md ]; then
    git add README.md
    git commit -m "Update description."
    git push
  fi
' {} \;
It isn't a script that I'd write home about, but it got the job done. I ran the script and had the repositories set up in a couple of minutes. I could have written the script myself, but it would have taken me longer. I had forgotten the options to the find command. I had forgotten the arguments to the cut command. I would have had to look them up. Copilot saved me that time.
A co-worker of mine questioned the engineering tradeoff of using a resource hog like generative AI to write crappy, throwaway shell scripts. From the standpoint of an individual developer, though, this is the first use case for Copilot that I've found where it actually saved me time and effort.
18 Jan 2025 5:38am GMT
17 Jan 2025
Planet Lisp
Joe Marshall: Iteration
Iteration is simply that special case of recursion that doesn't accumulate storage in the long term. It's a notable special case because computer storage is finite, and you want to be able to write algorithms that are bound by constant space.
There are two common strategies that computer languages use to approach iteration. Functional languages like Scheme and Haskell make sure that normal function calls do not accumulate storage per se. Function calls can be used to direct the control flow, and if they direct the control flow in a loop, you will iterate. Most other languages achieve iteration via special iteration constructs that you must use if you want to iterate. Each of these approaches has its own advantages and disadvantages.
The advantages of using special iteration constructs are these:
- It is clear that you are iterating.
- Special constructs are usually optimized for iteration and have particular compiler support to make them efficient.
- Special constructs are constrained so that you cannot accidentally write non-iterative code.
The disadvantages of using special iteration constructs are these:
- Special constructs are drawn from a fixed set of constructs that are built in to the language. If you want to iterate differently, you are out of luck.
- Special constructs usually do not cross function boundaries. Iteration must reside in a single function.
- You have to decide beforehand that you want to iterate and choose an iteration construct.
- Special constructs are usually imperative in nature and operate via side effects.
The alternative approach used by functional languages is to make the language implementation tail recursive. This has these advantages:
- Iteration is automatic. You don't have to decide that you want to iterate, it just happens when it can.
- Iteration can cross function boundaries.
- You can write your own iteration constructs and build them out of ordinary functions.
- Iteration can be done purely functionally, without side effects.
The disadvantages of using tail recursion for iteration are these:
- It is not obvious that you are iterating or intended to.
- You have to be careful to place all the iteration in tail position or you will blow the stack. Beginner programmers often have difficulty recognizing which calls are tail calls and can find it hard to avoid blowing the stack.
- Small, innocent looking changes in the code can change its behavior to be non tail recursive, again blowing the stack.
- The stack no longer contains a complete call history. If you rely on the stack as a call history buffer for debugging, you may find debugging more difficult.
The code in an iteration can be classified as being part of the machinery of iteration - the part that sets up the iteration, tests the ending conditional, and advances to the next iteration - or part of the logic of the iteration - the specific part that you are repeating. The machinery of the iteration is usually the same across many iterations, while the logic of the iteration is idiomatic to the specific instance of iteration. For example, all iterations over a list will have a null test, a call to CDR to walk down the list, and a call to CAR to fetch the current element, but each specific iteration over a list will do something different to the current element.
There are several goals in writing iterative code. One is to have efficient code that performs well. Another is to have clear code that is easy to understand, debug, and maintain. You choose how to iterate based on these considerations. For the highest performing code, you will want detailed control over what the code is doing. You may wish to resort to using individual assignments and GOTO
statements to squeeze the last clock cycles out of an inner loop. For the clearest code, you will want to use a high degree of abstraction. A clever compiler can generate efficient code from highly abstracted code, and experienced programmers know how to write abstract code that can be compiled to efficient code.
Here are some examples of iteration strategies in Lisp. To make these examples easy to compare I chose a simple problem to solve: given a list of numbers, return both a list of the squares of the numbers and the sum of the squares. This is a simple problem that can be solved in many ways.
Tagbody and Go
A tagbody is a block of code that is labeled with tags. You can jump to a tag with a go statement. This is a very low level form of iteration that is not used much in modern Lisp programming. Here is an example of a tagbody:
(defun iteration-example-with-tagbody (numbers)
  (let ((squares '())
        (total 0)
        (nums numbers))
    (tagbody
     start
       (if (null nums)
           (go end))
       (let ((square (* (car nums) (car nums))))
         (setq squares (cons square squares))
         (incf total square))
       (setq nums (cdr nums))
       (go start)
     end
       (values (nreverse squares) total))))
This is like programming in assembly code. The go instructions turn into jumps. This code is very efficient, but it is not particularly clear. The machinery of the iteration is mixed in with the logic of the iteration, making it hard to see what is going on. The code is not very abstract.
State Machine via Mutual Tail Recursion
Here we use tail recursion to iterate. The compiler will turn the tail recursive call into a jump and the variable rebinding into assignments, so this code will be about as efficient as the tagbody code above.
(defun iteration-example-tail-recursive (numbers &optional (squares '()) (total 0))
  (if (null numbers)
      (values (nreverse squares) total)
      (let ((square (* (car numbers) (car numbers))))
        (iteration-example-tail-recursive
         (cdr numbers)
         (cons square squares)
         (+ total square)))))
This state machine only has one state, so it is not a very interesting state machine. The ultimate in iteration control is to write an iterative state machine using mutually tail recursive functions. The compiler will generate very efficient code for this, and you can write the code in a very abstract way. Here is an example of a state machine that simulates the action of a turnstile:
(defun turnstile (actions)
  "State machine to simulate a turnstile with actions 'push', 'coin', and 'slug'."
  (locked-state actions '() '()))

(defun locked-state (actions collected return-bucket)
  (cond ((null actions) (list collected return-bucket))
        ((eql (car actions) 'coin) (unlocked-state (cdr actions) collected return-bucket))
        ((eql (car actions) 'push) (locked-state (cdr actions) collected return-bucket)) ;; Ignore push in locked state
        ((eql (car actions) 'slug) (locked-state (cdr actions) collected (append return-bucket '(slug)))) ;; Return slug
        (t (locked-state (cdr actions) collected return-bucket))))

(defun unlocked-state (actions collected return-bucket)
  (cond ((null actions) (list collected return-bucket))
        ((eql (car actions) 'push) (locked-state (cdr actions) (append collected '(coin)) return-bucket))
        ((eql (car actions) 'coin) (unlocked-state (cdr actions) collected (append return-bucket '(coin)))) ;; Return coin
        ((eql (car actions) 'slug) (unlocked-state (cdr actions) collected (append return-bucket '(slug)))) ;; Return slug
        (t (unlocked-state (cdr actions) collected return-bucket))))

;; Example usage:
(turnstile '(coin push coin push))      ;; => ((coin coin) ())
(turnstile '(push coin push))           ;; => ((coin) ())
(turnstile '(coin coin push push))      ;; => ((coin) (coin))
(turnstile '(push))                     ;; => (NIL NIL)
(turnstile '(coin push push))           ;; => ((coin) ())
(turnstile '(coin coin coin push))      ;; => ((coin) (coin coin))
(turnstile '(slug coin push))           ;; => ((coin) (slug))
(turnstile '(coin slug push))           ;; => ((coin) (slug))
(turnstile '(slug slug push coin push)) ;; => ((coin) (slug slug))
The iteration machinery is still interwoven with the logic of the code. We're still finding calls to null and cdr sprinkled around the code. Nonetheless, structuring iterative code this way is a big step up from using a tagbody and go. This is my go-to method for complex iterations that cannot easily be expressed as a map or reduce.
Loop Macro
Common Lisp's loop macro is a very powerful iteration construct that can be used to express a wide variety of iteration patterns.
(defun loop-iteration-example (numbers)
  (loop for num in numbers
        for square = (* num num)
        collect square into squares
        sum square into total
        finally (return (values squares total))))
Call me a knee-jerk anti-loopist, but this doesn't look like Lisp to me. It has some major problems:
- It is highly imperative. To understand what is going on, you have to follow the code in the order it is written. You need to have a mental model of the state of the loop at each point in the iteration. Running into a loop when reading functional code takes you out of the zen of functional programming.
- The bound variables are not lexical, they are scattered around the code. You have to carefully examine each clause to figure out what variables are being bound.
- You need a parser to walk the code. There is nothing that delimits the clauses of the loop; it is a flat list of random symbols and forms. You couldn't easily write a program that takes a loop form and transforms it in some way.
Do and Friends
The do macro, and its friends dolist, dotimes, and do*, etc., are the most common iteration constructs in Common Lisp.
(defun iteration-example-with-do (numbers)
  (let ((squares '())
        (total 0))
    (do ((nums numbers (cdr nums)))
        ((null nums) (values (nreverse squares) total))
      (let ((square (* (car nums) (car nums))))
        (setq squares (cons square squares))
        (incf total square)))))
The do macros have some drawbacks:
- They are imperative. The body of a do loop ultimately must have some sort of side effect or non-local exit to "get a value out". Notice how we bind accumulator variables in an outer scope and assign them in the inner one. This is a common pattern in a do loop.
- They do not compose. You can nest a dotimes inside a dolist, e.g., but you cannot run a dotimes in parallel with a dolist.
- They are incomplete. There is no do-array or do-string, for example.
But at least you can parse them and transform them. They are structured, and you can write a program that walks the clauses of a do loop and does something with them.
Map and Reduce
Map and reduce abstract the machinery of iteration away from the logic of the iteration through use of a monoid (a higher order function). The resulting code is clear and concise:
(defun iteration-example-with-map-reduce (numbers)
  (let* ((squares (map 'list (lambda (num) (* num num)) numbers))
         (total (reduce #'+ squares)))
    (values squares total)))
The looping is implicit in the mapcar and reduce functions. You can usually make the assumption that the language implementors have optimized these functions to be reasonably efficient.
I often see programmers writing looping code when a perfectly good library function exists that does the same thing. For example, it is common to want to count the number of items in a sequence, and Common Lisp supplies the count function just for this purpose. There is no need to write a loop.
Common Lisp provides a filter function, but it is called remove-if-not.
The drawback of using these functions is that large intermediate sequences can be created. In our example code, the entire list of squares is constructed prior to reducing it with #'+. Of course the entire list is one of the return values, so you need it anyway, but if you only needed the sum of the squares, you would prefer to sum it incrementally as you go along rather than constructing a list of squares and then summing it. For small sequences, it doesn't make a difference.
Series
The series macro suite attempts to bring you the best of both worlds. You write series expressions that look like sequence functions, but the macro recognizes that you are iterating and generates efficient incremental code.
(defun iteration-example-with-series (numbers)
  (let ((squares (map-fn 'integer (lambda (n) (* n n)) (scan 'list numbers))))
    (values (collect 'list squares)
            (collect-sum squares))))
This code is very similar to the sequence case, but the series macro will generate code that does not construct the entire list of squares before summing them. It will sum them incrementally as it goes along.
Series will expand into a tagbody. For example, the above code will expand into something like this:
(COMMON-LISP:LET* ((#:OUT-1015 NUMBERS))
  (COMMON-LISP:LET (#:ELEMENTS-1012
                    (#:LISTPTR-1013 #:OUT-1015)
                    (SQUARES 0)
                    #:SEQ-1018
                    (#:LIMIT-1019
                     (COMMON-LISP:MULTIPLE-VALUE-BIND (SERIES::X SERIES::Y)
                         (SERIES::DECODE-SEQ-TYPE (LIST 'QUOTE 'LISTS))
                       (DECLARE (IGNORE SERIES::X))
                       SERIES::Y))
                    (#:LST-1020 NIL)
                    (#:SUM-1023 0))
    (DECLARE (TYPE LIST #:LISTPTR-1013)
             (TYPE INTEGER SQUARES)
             (TYPE (SERIES::NULL-OR SERIES::NONNEGATIVE-INTEGER) #:LIMIT-1019)
             (TYPE LIST #:LST-1020)
             (TYPE NUMBER #:SUM-1023))
    (TAGBODY
     #:LL-1026
       (IF (ENDP #:LISTPTR-1013) (GO SERIES::END))
       (SETQ #:ELEMENTS-1012 (CAR #:LISTPTR-1013))
       (SETQ #:LISTPTR-1013 (CDR #:LISTPTR-1013))
       (SETQ SQUARES ((LAMBDA (N) (* N N)) #:ELEMENTS-1012))
       (SETQ #:LST-1020 (CONS SQUARES #:LST-1020))
       (SETQ #:SUM-1023 (+ #:SUM-1023 SQUARES))
       (GO #:LL-1026)
     SERIES::END)
    (COMMON-LISP:LET ((SERIES::NUM (LENGTH #:LST-1020)))
      (DECLARE (TYPE SERIES::NONNEGATIVE-INTEGER SERIES::NUM))
      (SETQ #:SEQ-1018 (MAKE-SEQUENCE 'LISTS (OR #:LIMIT-1019 SERIES::NUM)))
      (DO ((SERIES::I (1- SERIES::NUM) (1- SERIES::I)))
          ((MINUSP SERIES::I))
        (SETF (ELT #:SEQ-1018 SERIES::I) (POP #:LST-1020))))
    (VALUES #:SEQ-1018 #:SUM-1023)))
90% of the time, the series macro will produce very efficient code, but 10% of the time the macro loses its lunch. It takes a little practice to get used to when the series macro will work and to write code that the series macro can handle.
Conclusion
There are many ways to iterate in Lisp, some are more efficient than others, some are more abstract than others. You choose the way that suits your needs. I like the abstraction of the series macro, but I will also use a library function like count when it is appropriate. When I need tight control, I'll write a state machine.
17 Jan 2025 8:36pm GMT
15 Jan 2025
Planet Lisp
vindarel: New resource specialized on web development in Common Lisp
I just released a new documentation website specialized on web development in Common Lisp:
I'd be embarrassed to tell how long it took me to grasp all the building blocks and to assemble a resource that makes sense. I hope it serves you well, now don't hesitate to share what you are building, it creates emulation!
In the first tutorial we build a simple app that shows a web form that searches and displays a list of products.
We see many necessary building blocks to write web apps in Lisp:
- how to start a server
- how to create routes
- how to define and use path and URL parameters
- how to define HTML templates
- how to run and build the app, from our editor and from the terminal.
In doing so, we'll experience the interactive nature of Common Lisp.
In the user log-in section, we build a form that checks a user name and a password:
We also introduce databases, and more topics.
The sources are here: https://github.com/web-apps-in-lisp/web-apps-in-lisp.github.io/ and the GitHub Discussions are open.
15 Jan 2025 9:39am GMT
Planet Twisted
Glyph Lefkowitz: Small PINPal Update
Today on stream, I updated PINPal to fix the memorization algorithm.
If you haven't heard of PINPal before, it is a vault password memorization tool. For more detail on what that means, you can check out the README, and why not give it a ⭐ while you're at it.
As I started writing up an update post I realized that I wanted to contextualize it a bit more, because it's a tool I really wish were more popular. It solves one of those small security problems that you can mostly ignore, right up until the point where it's a huge problem and it's too late to do anything about it.
In brief, PINPal helps you memorize new secure passcodes for things you actually have to remember and can't simply put into your password manager, like the password to your password manager, your PC user account login, your email account1, or the PIN code to your phone or debit card.
Too often, even if you're properly using a good password manager for your passwords, you'll be protecting it with a password optimized for memorability, which is to say, one that isn't random and thus isn't secure. But I have also seen folks veer too far in the other direction, trying to make a really secure password that they then forget right after switching to a password manager. Forgetting your vault password can also be a really big deal, making you do password resets across every app you've loaded into it so far, so having an opportunity to practice it periodically is important.
PINPal uses spaced repetition to ensure that you remember the codes it generates.
While periodic forced password resets are a bad idea, if (and only if!) you can actually remember the new password, it is a good idea to get rid of old passwords eventually - like, let's say, when you get a new computer or phone. Doing so reduces the risk that a password stored somewhere on a very old hard drive or darkweb data dump is still floating around out there, forever haunting your current security posture. If you do a reset every 2 years or so, you know you've never got more than 2 years of history to worry about.
PINPal is also particularly secure in the way it incrementally generates your password; the computer you install it on only ever stores the entire password in memory when you type it in. It stores even the partial fragments that you are in the process of memorizing using the secure keyring module, avoiding plain-text whenever possible.
I've been using PINPal to generate and memorize new codes for a while, just in case2, and the change I made today was because I encountered a recurring problem. The problem was, I'd forget a token after it had been hidden, and there was never any going back. The moment that a token was hidden from the user, it was removed from storage, so you could never get a reminder. While I've successfully memorized about 10 different passwords with it so far, I've had to delete 3 or 4.
So, in the updated algorithm, the visual presentation now hides tokens in the prompt several memorizations before they're removed. Previously, if the password you were generating was 'hello world', you'd see hello world 5 times or so, then •••• world; if you ever got it wrong past that point, too bad, start over. Now, you'll see hello world, then °°°° world, then after you have gotten the prompt right without seeing the token a few times, you'll see •••• world after the backend has locked it in and it's properly erased from your computer.
If you get the prompt wrong, breaking your streak reveals the recently-hidden token until you get it right again. I also did a new release on that same livestream, so if this update sounds like it might make the memorization process more appealing, check it out via pip install pinpal today.
Right now this tool is still really only for a specific type of nerd - it's command-line only, and you probably need to hand-customize your shell prompt to invoke it periodically. But I'm working on making it more accessible to a broader audience. It's open source, of course, so you can feel free to contribute your own code!
Acknowledgments
Thank you to my patrons who are supporting my writing on this blog. If you like what you've read here and you'd like to read more things like it, or you'd like to support my various open-source endeavors, you can support my work as a sponsor!
1. Your email account password can be stored in your password manager, of course, but given that email is the root-of-trust reset factor for so many things, being able to remember that password is very helpful in certain situations. ↩
2. Funny story: at one point, Apple had an outage which made it briefly appear as if a lot of people needed to reset their iCloud passwords, myself included. Because I'd been testing PINPal a bunch, I actually had several highly secure random passwords already memorized. It was a strange feeling to just respond to the scary password reset prompt with a new, highly secure password and just continue on with my day secure in the knowledge I wouldn't forget it. ↩
15 Jan 2025 12:54am GMT
11 Jan 2025
Kernel Planet
Pete Zaitcev: Looking for a BSSID
I'm looking for a name for a new WiFi area.
The current one is called "Tokyo-Jupiter". It turns out hard to top, it meets all the requirements. It's a geographic area. It's weeb, but from old enough times: not Naruto Shippuuden, Attack On Titan, or Kimetsu no Yaiba. Classy and unique enough.
"Konoha" is too new, too washed-up, and too short.
"Kodena" and "Yokosuka" add a patriotic American tint nicely, but also too short.
"Minas-Tirith" is a place and outstanding in its reference, but not weeb.
"Big-Sight" is an opposite of the above: too much. I'm a weeb, not otaku.
Any ideas are appreciated.
UPDATE 2025-01-11: The provisional candidate is "Nishi-Teppelin". Don't google it, it's not canon. I remain open to better ideas.
11 Jan 2025 1:42am GMT
10 Jan 2025
Planet Twisted
Glyph Lefkowitz: The “Active Enum” Pattern
Have you ever written some Python code that looks like this?
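Something along these lines (a sketch reconstructed from the description below; the member names, printed strings, and return values are illustrative, not the author's exact listing):

import enum


class SomeNumber(enum.Enum):
    one = enum.auto()
    two = enum.auto()
    three = enum.auto()


def behavior(number: SomeNumber) -> int:
    # Reconstructed sketch: the behavior lives far away from the enum, in a free
    # function with a match/case over every member.
    match number:
        case SomeNumber.one:
            print("one!")
            return 1
        case SomeNumber.two:
            print("two!")
            return 2
        case SomeNumber.three:
            print("three!")
            return 3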
That is to say, have you written code that:
- defined an enum with several members
- associated custom behavior, or custom values, with each member of that enum,
- needed one or more match/case statements (or, if you've been programming in Python for more than a few weeks, probably a big if/elif/elif/else tree) to do that association?
In this post, I'd like to submit that this is an antipattern; let's call it the "passive enum" antipattern.
For those of you having a generally positive experience organizing your discrete values with enums, it may seem odd to call this an "antipattern", so let me first make something clear: the path to a passive enum is going in the correct direction.
Typically - particularly in legacy code that predates Python 3.4 - one begins with a value that is a bare int constant, or maybe a str with some associated values sitting beside it in a few global dicts.
Starting from there, collecting all of your values into an enum at all is a great first step. Having an explicit listing of all valid values and verifying against them is great.
But, it is a mistake to stop there. There are problems with passive enums, too:
- The behavior can be defined somewhere far away from the data, making it difficult to:
  - maintain an inventory of everywhere it's used,
  - update all the consumers of the data when the list of enum values changes, and
  - learn about the different usages as a consumer of the API
- Logic may be defined procedurally (via if/elif or match) or declaratively (via e.g. a dict whose keys are your enum and whose values are the required associated value); see the sketch after this list.
  - If it's defined procedurally, it can be difficult to build tools to interrogate it, because you need to parse the AST of your Python program. So it can be difficult to build interactive tools that look at the associated data without just calling the relevant functions.
  - If it's defined declaratively, it can be difficult for existing tools that do know how to interrogate ASTs (mypy, flake8, Pyright, ruff, et al.) to make meaningful assertions about it. Does your linter know how to check that a dict whose keys should be every value of your enum is complete?
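As a concrete illustration of the declarative flavor (a sketch, assuming the SomeNumber enum from the earlier example; the table and its name are invented for illustration):

```python
# A declarative association: keys are enum members, values are the associated data.
# Nothing checks this table for completeness.
RESULTS: dict[SomeNumber, int] = {
    SomeNumber.one: 1,
    SomeNumber.two: 2,
    # SomeNumber.three is missing - no linter complains, and the gap only
    # shows up as a KeyError at runtime.
}
```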
To refactor this, I would propose a further step towards organizing one's enum-oriented code: the active enum.
An active enum is one which contains all the logic associated with the first-party provider of the enum itself.
You may recognize this as a more generalized restatement of the object-oriented lens on the principle of "separation of concerns". The responsibilities of a class ought to be implemented as methods on that class, so that you can send messages to that class via method calls, and it's up to the class internally to implement things. Enums are no different.
More specifically, you might notice it as a riff on the Active Nothing pattern described in this excellent talk by Sandi Metz, and, yeah, it's the same thing.
The first refactoring that we can make is, thus, to mechanically move the method from an external function living anywhere, to a method on SomeNumber. At least like this, we present an API to consumers externally that shows that SomeNumber has a behavior method that can be invoked.
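A minimal sketch of this first refactoring, under the same assumptions as the earlier sketch: behavior becomes a method on SomeNumber, but the match statement remains.

```python
from enum import Enum, auto

class SomeNumber(Enum):
    one = auto()
    two = auto()
    three = auto()

    def behavior(self) -> int:
        # Discoverable as an API on the enum itself, but still a match
        # statement that repeats every member.
        match self:
            case SomeNumber.one:
                print("one!")
                return 1
            case SomeNumber.two:
                print("two!")
                return 2
            case SomeNumber.three:
                print("three!")
                return 3
```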
However, this still leaves us with a match statement that repeats all the values that we just defined, with no particular guarantee of completeness. To continue the refactoring, what we can do is change the value of the enum itself into a simple dataclass to structurally, by definition, contain all the fields we need:
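A sketch of that refactoring, assuming a NumberValue dataclass with the result and effect fields described just below (the concrete numbers and print calls are invented for illustration):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

@dataclass(frozen=True)
class NumberValue:
    result: int
    effect: Callable[[], None]

class SomeNumber(Enum):
    one = NumberValue(result=1, effect=lambda: print("one!"))
    two = NumberValue(result=2, effect=lambda: print("two!"))
    three = NumberValue(result=3, effect=lambda: print("three!"))

    def behavior(self) -> int:
        # Every member structurally carries its own data and behavior,
        # so no match statement is needed here.
        self.value.effect()
        return self.value.result
```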
Here, we give SomeNumber members a value of NumberValue, a dataclass that requires a result: int and an effect: Callable to be constructed. Mypy will properly notice that if x is a SomeNumber, then x.value will have the type NumberValue, and we will get proper type checking on its result (a static value) and effect (some associated behaviors)1.
Note that the implementation of the behavior method - still conveniently discoverable for callers, and with its signature unchanged - is now vastly simpler.
But what about...
Lookups?
You may be noticing that I have hand-waved over something important to many enum users, which is to say, by-value lookup. enum.auto will have generated int values for one, two, and three already, and by transforming those into NumberValue instances, I can no longer do SomeNumber(1).
For the simple, string-enum case, one where you might do class MyEnum: value = "value" so that you can do name lookups via MyEnum("value"), there's a simple solution: use square brackets instead of round ones. In this case, with no matching strings in sight, SomeNumber["one"] still works.
But, if we want to do integer lookups with our dataclass version here, there's a simple one-liner that will get them back for you; and, moreover, will let you do lookups on whatever attribute you want:
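One plausible shape of such a one-liner, given the NumberValue sketch above, is a dict comprehension keyed by whichever attribute you want to look up:

```python
# Rebuild an int-keyed lookup table; swap in any other attribute to key by it instead.
by_result = {each.value.result: each for each in SomeNumber}

assert by_result[1] is SomeNumber.one
assert SomeNumber["one"] is SomeNumber.one  # name lookup still works, with square brackets
```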
enum.Flag?
You can do this with Flag more or less unchanged, but in the same way that you can't expect all your list[T] behaviors to be defined on T, the lack of a 1-to-1 correspondence between Flag instances and their values makes it more complex and out of scope for this pattern specifically.
3rd-party usage?
Sometimes an enum is defined in library L and used in application A, where L provides the data and A provides the behavior. If this is the case, then some amount of version shear is unavoidable; this is a situation where the data and behavior have different vendors, and this means that other means of abstraction are required to keep them in sync. Object-oriented modeling methods are for consolidating the responsibility for maintenance within a single vendor's scope of responsibility. Once you're not responsible for the entire model, you can't do the modeling over all of it, and that is perfectly normal and to be expected.
The goal of the Active Enum pattern is to avoid creating the additional complexity of that shear when it does not serve a purpose, not to ban it entirely.
A Case Study
I was inspired to make this post by a recent refactoring I did from a more obscure and magical2 version of this pattern into the version that I am presenting here, but if I am going to call passive enums an "antipattern" I feel like it behooves me to point at an example outside of my own solo work.
So, for a more realistic example, let's consider a package that all Python developers will recognize from their day-to-day work, python-hearthstone, the Python library for parsing the data files associated with Blizzard's popular computerized collectible card game Hearthstone.
As I'm sure you already know, there are a lot of enums in this library, but for one small case study, let's look at a few of the methods in hearthstone.enums.GameType.
GameType has already taken the "step 1" in the direction of an active enum, as I described above: as_bnet is an instance method on GameType itself, making it at least easy to see by looking at the class definition what operations it supports. However, in the implementation of that method (among many others) we can see the worst of both worlds:
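The following is only an illustration of the mixed style being described - a lookup table interleaved with procedural special cases and a raised ValueError. The table names, the parameter name, and the specific branches are invented for illustration; this is not the actual python-hearthstone source.

```python
# Illustrative sketch only - not the real library code.
def as_bnet(self, format_type=None):
    if self == GameType.GT_RANKED:
        # special-cased procedurally, branching on a second enum
        if format_type is None:
            raise ValueError("ranked games need a FormatType")
        return _RANKED_LOOKUP[format_type]   # hypothetical table: FormatType -> BnetGameType
    try:
        return _SIMPLE_LOOKUP[self]          # hypothetical table: GameType -> BnetGameType
    except KeyError:
        raise ValueError(f"no BnetGameType for {self!r}")
```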
We have procedural code mixed with a data lookup table; raise ValueError mixed together with value returns. Overall, it looks like this might be hard to maintain going forward, or to see what's going on without a comprehensive understanding of the game being modeled. Of course for most Python programmers that understanding can be assumed, but, still.
If GameType were refactored in the manner above3, you'd be able to look at the member definition for GT_RANKED and see a mapping of FormatType to BnetGameType, or GT_BATTLEGROUNDS_DUO_FRIENDLY to see an unconditional value of BGT_BATTLEGROUNDS_DUO_FRIENDLY. Given that this enum has 40 elements, with several renamed or removed, it seems reasonable to expect that more will be added and removed as the game is developed.
Conclusion
If you have large enums that change over time, consider placing the responsibility for the behavior of the values alongside the values directly, and any logic for processing the values as methods of the enum. This will allow you to quickly validate that you have full coverage of any data that is required among all the different members of the enum, and it will allow API clients a convenient surface to discover the capabilities associated with that enum.
Acknowledgments
Thank you to my patrons who are supporting my writing on this blog. If you like what you've read here and you'd like to read more of it, or you'd like to support my various open-source endeavors, you can support my work as a sponsor!
- You can get even fancier than this, defining a typing.Protocol as your enum's value, but it's best to keep things simple and use a very simple dataclass container if you can. ↩
- derogatory ↩
- I did not submit such a refactoring as a PR before writing this post because I don't have full context for this library and I do not want to harass the maintainers or burden them with extra changes just to make a rhetorical point. If you do want to try that yourself, please file a bug first and clearly explain how you think it would benefit their project's maintainability, and make sure that such a PR would be welcome. ↩
10 Jan 2025 10:37pm GMT
02 Jan 2025
Kernel Planet
Matthew Garrett: The GPU, not the TPM, is the root of hardware DRM
As part of their "Defective by Design" anti-DRM campaign, the FSF recently made the following claim:
Today, most of the major streaming media platforms utilize the TPM to decrypt media streams, forcefully placing the decryption out of the user's control
(from here).
This is part of an overall argument that Microsoft's insistence that only hardware with a TPM can run Windows 11 is with the goal of aiding streaming companies in their attempt to ensure media can only be played in tightly constrained environments.
I'm going to be honest here and say that I don't know what Microsoft's actual motivation for requiring a TPM in Windows 11 is. I've been talking about TPM stuff for a long time. My job involves writing a lot of TPM code. I think having a TPM enables a number of worthwhile security features. Given the choice, I'd certainly pick a computer with a TPM. But in terms of whether it's of sufficient value to lock out Windows 11 on hardware with no TPM that would otherwise be able to run it? I'm not sure that's a worthwhile tradeoff.
What I can say is that the FSF's claim is just 100% wrong, and since this seems to be the sole basis of their overall claim about Microsoft's strategy here, the argument is pretty significantly undermined. I'm not aware of any streaming media platforms making use of TPMs in any way whatsoever. There is hardware DRM that the media companies use to restrict users, but it's not in the TPM - it's in the GPU.
Let's back up for a moment. There's multiple different DRM implementations, but the big three are Widevine (owned by Google, used on Android, Chromebooks, and some other embedded devices), Fairplay (Apple implementation, used for Mac and iOS), and Playready (Microsoft's implementation, used in Windows and some other hardware streaming devices and TVs). These generally implement several levels of functionality, depending on the capabilities of the device they're running on - this will range from all the DRM functionality being implemented in software up to the hardware path that will be discussed shortly. Streaming providers can choose what level of functionality and quality to provide based on the level implemented on the client device, and it's common for 4K and HDR content to be tied to hardware DRM. In any scenario, they stream encrypted content to the client and the DRM stack decrypts it before the compressed data can be decoded and played.
The "problem" with software DRM implementations is that the decrypted material is going to exist somewhere the OS can get at it at some point, making it possible for users to simply grab the decrypted stream, somewhat defeating the entire point. Vendors try to make this difficult by obfuscating their code as much as possible (and in some cases putting some of it in-kernel), but pretty much all software DRM is at least somewhat broken and copies of any new streaming media end up being available via Bittorrent pretty quickly after release. This is why higher quality media tends to be restricted to clients that implement hardware-based DRM.
The implementation of hardware-based DRM varies. On devices in the ARM world this is usually handled by performing the cryptography in a Trusted Execution Environment, or TEE. A TEE is an area where code can be executed without the OS having any insight into it at all, with ARM's TrustZone being an example of this. By putting the DRM code in TrustZone, the cryptography can be performed in RAM that the OS has no access to, making the scraping described earlier impossible. x86 has no well-specified TEE (Intel's SGX is an example, but is no longer implemented in consumer parts), so instead this tends to be handed off to the GPU. The exact details of this implementation are somewhat opaque - of the previously mentioned DRM implementations, only Playready does hardware DRM on x86, and I haven't found any public documentation of what drivers need to expose for this to work.
In any case, as part of the DRM handshake between the client and the streaming platform, encryption keys are negotiated with the key material being stored in the GPU or the TEE, inaccessible from the OS. Once decrypted, the material is decoded (again either on the GPU or in the TEE - even in implementations that use the TEE for the cryptography, the actual media decoding may happen on the GPU) and displayed. One key point is that the decoded video material is still stored in RAM that the OS has no access to, and the GPU composites it onto the outbound video stream (which is why if you take a screenshot of a browser playing a stream using hardware-based DRM you'll just see a black window - as far as the OS can see, there is only a black window there).
Now, TPMs are sometimes referred to as a TEE, and in a way they are. However, they're fixed function - you can't run arbitrary code on the TPM, you only have whatever functionality it provides. But TPMs do have the ability to decrypt data using keys that are tied to the TPM, so isn't this sufficient? Well, no. First, the TPM can't communicate with the GPU. The OS could push encrypted material to it, and it would get plaintext material back. But the entire point of this exercise was to avoid the decrypted version of the stream from ever being visible to the OS, so this would be pointless. And rather more fundamentally, TPMs are slow. I don't think there's a TPM on the market that could decrypt a 1080p stream in realtime, let alone a 4K one.
The FSF's focus on TPMs here is not only technically wrong, it's indicative of a failure to understand what's actually happening in the industry. While the FSF has been focusing on TPMs, GPU vendors have quietly deployed all of this technology without the FSF complaining at all. Microsoft has enthusiastically participated in making hardware DRM on Windows possible, and user freedoms have suffered as a result, but Playready hardware-based DRM works just fine on hardware that doesn't have a TPM and will continue to do so.
02 Jan 2025 1:14am GMT
16 Dec 2024
Planet Twisted
Glyph Lefkowitz: DANGIT
Over the last decade, it has become a common experience to be using a social media app, and to perceive that app as saying something specific to you. This manifests in statements like "Twitter thinks Rudy Giuliani has lost his mind", "Facebook is up in arms about DEI", "Instagram is going crazy for this new water bottle", "BlueSky loves this bigoted substack", or "Mastodon can't stop talking about Linux". Sometimes this will even be expressed with "the Internet" as a metonym for the speaker's preferred social media: "the Internet thinks that Kate Middleton is missing".
However, even the smallest of these networks comprises literal millions of human beings, speaking dozens of different languages, many of whom never interact with each other at all. The hot takes that you see from a certain excitable sub-community, on your particular timeline or "for you" page, are not necessarily representative of "the Internet" - at this point, a group that represents a significant majority of the entire human population.
If I may coin a phrase, I will refer to these as "Diffuse, Amorphous, Nebulous, Generalized Internet Takes", or DANGITs, which handily evokes the frustrating feeling of arguing against them.
A DANGIT is not really a new "internet" phenomenon: it is a specific expression of the availability heuristic.
If we look at our device and see a bunch of comments in our inbox, particularly if those comments have high salience via being recent, emotive, and repeated, we will naturally think that this is what The Internet thinks. However, just because we will naturally think this does not mean that we will accurately think it.
It is worth keeping this concept in mind when participating in public discourse because it leads to a specific type of communication breakdown. If you are arguing with a DANGIT, you will feel like you are arguing with someone with incredibly inconsistent, hypocritical, and sometimes even totally self-contradictory views. But to be self-contradictory, one needs to have a self. And if you are arguing with 9 different people from 3 different ideological factions, all making completely different points and not even taking time to agree on the facts beforehand, of course it's going to sound like cacophonous nonsense. You're arguing with the cacophony, it's just presented to you in a way that deceives you into thinking that it's one group.
There are subtle variations on this breakdown; for example, it can also make people's taste seem incoherent. If it seems like one week the Interior Designer internet loves stark Scandinavian minimalism, and the next week baroque Rococo styles are making a comeback, it might seem like The Internet has no coherent sense of taste, and these things don't go together. That's because it doesn't! Why would you expect it to?
Most likely, you are simply seeing some posts from minimalists, and then, separately, some posts from Rococo aficionados. Any particular person's feed may be dedicated to a specific, internally coherent viewpoint, aesthetic, or ideology, but if you dump them all into a blender to separate them from their context, of course they will look jumbled together.
This is what social media does. It is context collapse as a service. Even if you eliminate engagement-maximizing algorithms and view everything perfectly chronologically, even if you have the world's best trust & safety team making sure that there is nothing harmful and no disinformation, social media - like email - inherently remains that context-collapsing blender. There's no way for it not to be; if two people you follow, who do not follow and are not aware of each other, are both posting unrelated things at the same time, you're going to see them at around the same time.
Do not argue with a DANGIT. Discussions on the internet are famously Pyrrhic battles to begin with, but if you argue with a DANGIT it's not that you will achieve a Pyrrhic victory; you cannot possibly achieve any victory, because you are shadowboxing an imagined consensus where none exists.
You can't win against something that isn't there.
Acknowledgments
Thank you to my patrons who are supporting my writing on this blog. If you like what you've read here and you'd like to read more things like it, or you'd like to support my various open-source endeavors, you can support my work as a sponsor!
16 Dec 2024 10:58pm GMT
12 Dec 2024
Kernel Planet
Matthew Garrett: When should we require that firmware be free?
The distinction between hardware and software has historically been relatively easy to understand - hardware is the physical object that software runs on. This is made more complicated by the existence of programmable logic like FPGAs, but by and large things tend to fall into fairly neat categories if we're drawing that distinction.
Conversations usually become more complicated when we introduce firmware, but should they? According to Wikipedia, Firmware is software that provides low-level control of computing device hardware
, and basically anything that's generally described as firmware certainly fits into the "software" side of the above hardware/software binary. From a software freedom perspective, this seems like something where the obvious answer to "Should this be free" is "yes", but it's worth thinking about why the answer is yes - the goal of free software isn't freedom for freedom's sake, but because the freedoms embodied in the Free Software Definition (and by proxy the DFSG) are grounded in real world practicalities.
How do these line up for firmware? Firmware can fit into two main classes - it can be something that's responsible for initialisation of the hardware (such as, historically, BIOS, which is involved in initialisation and boot and then largely irrelevant for runtime[1]) or it can be something that makes the hardware work at runtime (wifi card firmware being an obvious example). The role of free software in the latter case feels fairly intuitive, since the interface and functionality the hardware offers to the operating system is frequently largely defined by the firmware running on it. Your wifi chipset is, these days, largely a software defined radio, and what you can do with it is determined by what the firmware it's running allows you to do. Sometimes those restrictions may be required by law, but other times they're simply because the people writing the firmware aren't interested in supporting a feature - they may see no reason to allow raw radio packets to be provided to the OS, for instance. We also shouldn't ignore the fact that sufficiently complicated firmware exposed to untrusted input (as is the case in most wifi scenarios) may contain exploitable vulnerabilities allowing attackers to gain arbitrary code execution on the wifi chipset - and potentially use that as a way to gain control of the host OS (see this writeup for an example). Vendors being in a unique position to update that firmware means users may never receive security updates, leaving them with a choice between discarding hardware that otherwise works perfectly or leaving themselves vulnerable to known security issues.
But even the cases where firmware does nothing other than initialise the hardware cause problems. A lot of hardware has functionality controlled by registers that can be locked during the boot process. Vendor firmware may choose to disable (or, rather, never to enable) functionality that may be beneficial to a user, and then lock out the ability to reconfigure the hardware later. Without any ability to modify that firmware, the user lacks the freedom to choose what functionality their hardware makes available to them. Again, the ability to inspect this firmware and modify it has a distinct benefit to the user.
So, from a practical perspective, I think there's a strong argument that users would benefit from most (if not all) firmware being free software, and I don't think that's an especially controversial argument. So I think this is less of a philosophical discussion, and more of a strategic one - is spending time focused on ensuring firmware is free worthwhile, and if so what's an appropriate way of achieving this?
I think there's two consistent ways to view this. One is to view free firmware as desirable but not necessary. This approach basically argues that code that's running on hardware that isn't the main CPU would benefit from being free, in the same way that code running on a remote network service would benefit from being free, but that this is much less important than ensuring that all the code running in the context of the OS on the primary CPU is free. The maximalist position is not to compromise at all - all software on a system, whether it's running at boot or during runtime, and whether it's running on the primary CPU or any other component on the board, should be free.
Personally, I lean towards the former and think there's a reasonably coherent argument here. I think users would benefit from the ability to modify the code running on hardware that their OS talks to, in the same way that I think users would benefit from the ability to modify the code running on hardware the other side of a network link that their browser talks to. I also think that there's enough that remains to be done in terms of what's running on the host CPU that it's not worth having that fight yet. But I think the latter is absolutely intellectually consistent, and while I don't agree with it from a pragmatic perspective I think things would undeniably be better if we lived in that world.
This feels like a thing you'd expect the Free Software Foundation to have opinions on, and it does! There are two primarily relevant things - the Respects your Freedoms campaign focused on ensuring that certified hardware meets certain requirements (including around firmware), and the Free System Distribution Guidelines, which define a baseline for an OS to be considered free by the FSF (including requirements around firmware).
RYF requires that all software on a piece of hardware be free other than under one specific set of circumstances. If software runs on (a) a secondary processor and (b) within which software installation is not intended after the user obtains the product
, then the software does not need to be free. (b) effectively means that the firmware has to be in ROM, since any runtime interface that allows the firmware to be loaded or updated is intended to allow software installation after the user obtains the product.
The Free System Distribution Guidelines require that all non-free firmware be removed from the OS before it can be considered free. The recommended mechanism to achieve this is via linux-libre, a project that produces tooling to remove anything that looks plausibly like a non-free firmware blob from the Linux source code, along with any incitement to the user to load firmware - including even removing suggestions to update CPU microcode in order to mitigate CPU vulnerabilities.
For hardware that requires non-free firmware to be loaded at runtime in order to work, linux-libre doesn't do anything to work around this - the hardware will simply not work. In this respect, linux-libre reduces the amount of non-free firmware running on a system in the same way that removing the hardware would. This presumably encourages users to purchase RYF compliant hardware.
But does that actually improve things? RYF doesn't require that a piece of hardware have no non-free firmware, it simply requires that any non-free firmware be hidden from the user. CPU microcode is an instructive example here. At the time of writing, every laptop listed here has an Intel CPU. Every Intel CPU has microcode in ROM, typically an early revision that is known to have many bugs. The expectation is that this microcode is updated in the field by either the firmware or the OS at boot time - the updated version is loaded into RAM on the CPU, and vanishes if power is cut. The combination of RYF and linux-libre doesn't reduce the amount of non-free code running inside the CPU, it just means that the user (a) is more likely to hit since-fixed bugs (including security ones!), and (b) has less guidance on how to avoid them.
As long as RYF permits hardware that makes use of non-free firmware I think it hurts more than it helps. In many cases users aren't guided away from non-free firmware - instead it's hidden away from them, leaving them less aware that their freedom is constrained. Linux-libre goes further, refusing to even inform the user that the non-free firmware that their hardware depends on can be upgraded to improve their security.
Out of sight shouldn't mean out of mind. If non-free firmware is a threat to user freedom then allowing it to exist in ROM doesn't do anything to solve that problem. And if it isn't a threat to user freedom, then what's the point of requiring linux-libre for a Linux distribution to be considered free by the FSF? We seem to have ended up in the worst case scenario, where nothing is being done to actually replace any of the non-free firmware running on people's systems and where users may even end up with a reduced awareness that the non-free firmware even exists.
[1] Yes yes SMM
12 Dec 2024 3:57pm GMT
29 Nov 2024
Planet Plone - Where Developers And Integrators Write
Maurits van Rees: Lightning talks Friday
Bonnie Tyler Sprint
On 12 August 2026 there is a total solar eclipse that can be seen from Valencia, Spain. So we organise a sprint there.
This conference
We had 291 participants, 234 in person and 57 online. 13 Brazilian states (that is all of them), 14 countries.
24.5 percent women, up from 13% in 2013, so that has gone up, but we are not there yet. Thank you to PyLadies and Django Girls for making this happen.
We had more than 80 presenters, about 30 lightning talks, and lots of talk in the hallways.
Thanks also to the team!
Ramiro Luz: Yoga time
Yoga exercise.
Rikupekka: University case student portal
We have a student portal at the university. But mostly:
Welcome to Jyväskylä University in Finland for the Plone conference 2025, October 13-19!
Jakob: Beethovensprint
26-30 May 2025 in Bonn, Germany.
Afterwards, on May 30 and June 1 there will be FedCon in Bonn, a SciFi convention.
Piero/Victor: BYOUI
Add-ons first development with @plone/registry. See https://plone-registry.readthedocs.io/
It allows for development that is framework agnostic, so it is not only for Plone. It is around configuration that can be extended and injected, which is tricky in most javascript frameworks.
Imagine it.
Ana Dulce: 3D printing
For a difficult model I had to trust the process; it took a week, but it worked.
Renan & Iza: Python Brasil
We organised the Python Brasil conference from 16 to 23 October this year in Rio de Janeiro.
Next year 21-27 October in São Paulo.
Erico: Python Cerrado
31 July to 2 August 2025 is the next Python Cerrado conference.
29 Nov 2024 10:25pm GMT
Maurits van Rees: Paul Roeland: The value of longevity
Link to talk information on Plone conference website.
I work for the Clean Clothes Campaign: https://cleanclothes.org/
After three large disasters in factories in 2012 and 2013, with over 1000 deaths, it took three years to get an agreement with clothes manufacturers for 30 million dollars in compensation. It does not bring lives back, but it helps the survivors.
See Open Supply Hub for open data that we collected, for checking which brands are produced in which factories.
Documenting history matters. Stories must be told.
The global clothing industry is worth around 1.8 trillion dollars; if it were a country, that would put it in 12th place in the world. 75 million workers.
Our strongest weapon: backlinks. We have links from the OECD, the UN, Wikipedia, school curricula, and books. Especially those last two never change, so you should never change URLs.
Plone: enable the sitemap, please - why not by default? Create a good robots.txt. I check Google Search Console weekly, looking for broken links. Tag early, tag often - a great tool, even if you have an AI do it.
Our website: started 1998 written in Notepad, 2004 Dreamweaver, 2006 Bluefish, 2010 Joomla, 2013 Plone 4, 2020 Castle CMS (opinionated distribution of Plone, but does not really exist anymore) 2024 Plone 6 with Volto Light Theme (work in progress). Thank you kitconcept for all the help, especially Jonas.
Migrations are painful. Over the years we used wget to csv to SQL to csv, a Python script, a "Franken-mogrifier", and collective.exportimport.
Lessons learned: stable urls are awesome, migrations are painful. Please don't try to salvage CSS from your old site, just start fresh in your new system. Do not try to migrate composite pages or listings.
What if your website does not provide an export? Use wget, still works and is better than httrack. sed/awk/regex are your friend. archivebox (WARC).
Document your steps for your own sanity.
To manage json, jq or jello can be used. sq is a Swiss knife for json/sql/csv. emuto is a hybrid between jq and GraphQL.
Normalize import/export. We have `plone.exportimport` in core now.
In the future I would like a plone exporter script that accepts a regex and exports only matching pages. Switch backends: ZODB, relstorage, nick, quantum-db. Sitewide search/replace/sed. Sneakernet is useful in difficult countries where you cannot send data over the internet: so export to a usb stick.
A backup is only a backup if it regularly gets restored so you know that it works.
- Keeping content and URL stability is a superpower.
- Assuming that export/import/backup/restore/migration are rare occurrences, is wrong.
- Quick export/import is very useful.
Do small migrations and treat them as maintenance. Don't be too far behind. A large migration once every five years will be costly; do a small migration every year. Do your part. Clients should also do their part, by budgeting this yearly. That is how budgeting works. Use every iteration to review custom code.
Make your sites live long and prosper.
29 Nov 2024 8:58pm GMT
Maurits van Rees: Fred van Dijk: Run Plone in containers on your own cluster with coolify.io
Link to talk information on Plone conference website.
Sorry, I ran out of time trying to set up https://coolify.io
So let's talk about another problem. Running applications (stacks) in containers is the future. Well: abstraction and isolation is the future, and containers is the current phase.
I am on the Plone A/I team, with Paul, Kim, Erico. All senior sysadmins, so we kept things running. In 2022 we worked on containerisation. Kubernetes was the kool kid then, but Docker Swarm was easier. Checkout Erico's training with new cookieplone templates.
Doing devops well is hard. You have a high workload, but still need to keep learning new stuff to keep up with what is changing.
I want to plug Coolify, which is a full open source product. "Self-hosting with super powers." The main developer, Andras Bacsal, believes in open source and 'hates' pay by usage cloud providers with a vengeance.
Coolify is still docker swarm. We also want Kubernetes support. But we still need sysadmins. Someone will still need to install coolify, and keep it updated.
I would like to run an online DevOps course somewhere January-March 2025. 4-6 meetings of 2 hours, maybe Friday afternoon. Talk through devops and sysadmin concepts, show docker swarm, try coolify, etc.
29 Nov 2024 7:58pm GMT