07 May 2025
Planet Python
The Python Coding Stack: "AI Coffee" Grand Opening This Monday • A Story About Parameters and Arguments in Python Functions
Alex had one last look around. You could almost see a faint smile emerge from the deep sigh - part exhaustion and part satisfaction. He was as ready as he could be. His new shop was as ready as it could be. There was nothing left to set up. He locked up and headed home. The grand opening was only seven hours away, and he'd better get some sleep.
Grand Opening sounds grand - too grand. Alex had regretted putting it on the sign outside the shop's window the previous week. This wasn't a vanity project. He didn't send out invitations to friends, journalists, or local politicians. He didn't hire musicians or waiters to serve customers. Grand Opening simply meant opening for the first time.
Alex didn't really know what to expect on the first day. Or maybe he did - he wasn't expecting too many customers. Another coffee shop on the high street? He might need some time to build a loyal client base.
• • •
He had arrived early on Monday. He'd been checking the lights, the machines, the labels, the chairs, the fridge. And then checking them all again. He glanced at the clock - ten minutes until opening time. But he saw two people standing outside. Surely they were just having a chat, only standing there by chance. He looked again. They weren't chatting. They were waiting.
Waiting for his coffee shop to open? Surely not?
But rather than check for the sixth time that the labels on the juice bottles were facing outwards, he decided to open the door a bit early. And those people outside walked in. They were AI Coffee's first customers.
Today's article is an overview of the parameters and arguments in Python's functions. It takes you through some of the key principles and discusses the various types of parameters you can define and arguments you can pass to a Python function. There are five numbered sections interspersed within the story in today's article:
1. Parameters and Arguments
2. Positional and Keyword Arguments
3. Args and Kwargs
4. Optional Arguments with Default Values
5. Positional-Only and Keyword-Only Arguments
Espressix ProtoSip v0.1 (AlphaBrew v0.1.3.7)
Introducing the Espressix ProtoSip, a revolutionary coffee-making machine designed to elevate the art of brewing for modern coffee shops. With its sleek, futuristic design and cutting-edge technology, this prototype blends precision engineering with intuitive controls to craft barista-quality espresso, cappuccino, and more. Tailored for innovators, the Espressix delivers unparalleled flavour extraction and consistency, setting a new standard for coffee excellence while hinting at the bold future of café culture.
Alex had taken a gamble with the choice of coffee machine for his shop. His cousin had set up a startup some time earlier that developed an innovative coffee machine for restaurants and coffee shops. The company had just released its first prototype, and they offered Alex one at a significantly reduced cost since it was still a work in progress - and he was the founder's cousin!
The paragraph you read above is the spiel the startup has on its website and on the front cover of the slim booklet that came with the machine. There was little else in the booklet. But an engineer from the startup company had spent some time explaining to Alex how to use the machine.
The Espressix didn't have a user interface yet - it was still a rather basic prototype. Alex connected the machine to a laptop. He was fine calling functions from the AlphaBrew Python API directly from a terminal window - AlphaBrew is the software that came with the Espressix.
What the Espressix did have, despite being an early prototype, was a sleek and futuristic look. One of the startup's cofounders was a product design graduate, so she went all out with style and looks.
1. Parameters and Arguments
"You're AI Coffee's first ever customer", Alex told the first person to walk in. "What can I get you?"
"Wow! I'm honoured. Could I have a strong Cappuccino, please, but with a bit less milk?"
"Sure", and Alex tapped at his laptop:

And the Espressix started whizzing. A few seconds later, the perfect brew poured into a cup.
Here's the signature for the brew_coffee() function Alex used:
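brew_coffee(coffee_type: str, strength: int, milk_amount: int)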
Alex was a programmer before deciding to open a coffee shop. He was comfortable using this rudimentary API to operate the machine, even though it wasn't ideal. But then, he wasn't paying much to lease the machine, so he couldn't complain!
The coffee_type parameter accepts a string, which must match one of the available coffee types. Alex is already planning to replace this with enums to prevent typos, but that's not a priority for now.

The strength parameter accepts integers between 1 and 5. And milk_amount also accepts integers up to 5, but the range starts from 0 to cater for coffees with no milk.
Terminology can be confusing, and functions come with plenty of terms. Parameter and argument are terms that many confuse. And it doesn't matter too much if you use one instead of the other. But, if you prefer to be precise, then:
- Use parameter for the name you choose to refer to values you pass into a function. The parameter is the name you place within parentheses when you define a function. This is the variable name you use within the function definition. The parameters in the above example are coffee_type, strength, and milk_amount.
- Use argument for the object you pass to the function when you call it. An argument is the value you pass to a function. In the example above, the arguments are "Cappuccino", 4, and 2.

When you call a function, you pass arguments. These arguments are assigned to the parameter names within the function.
To confuse matters further, some people use formal parameters to refer to parameters and actual parameters to refer to arguments. But the terms parameters and arguments as described in the bullet points above are more common in Python, and they're the ones I use here and in all my writing.
Alex's first day went better than he thought it would. He had a steady stream of customers throughout the day. And they all seemed to like the coffee.
But let's see what happened on Alex's second day!
2. Positional and Keyword Arguments
Chloezz @chloesipslife • 7m
Just visited the new AI Coffee shop on my high street, and OMG, it's like stepping into the future! The coffee machine is a total sci-fi vibe-sleek, futuristic, and honestly, I have no clue how it works, but it's powered by AI and makes a mean latte! The coffee? Absolutely delish. If this is what AI can do for my morning brew, I'm here for it! Who's tried it? #AICoffee #CoffeeLovers #TechMeetsTaste
- from social media
Alex hadn't been on social media after closing the coffee shop on the first day. Even if he had, he probably wouldn't have seen Chloezz's post. He didn't know who she was. But whoever she is, she has a massive following.
Alex was still unaware his coffee shop had been in the spotlight when he opened up on Tuesday. There was a queue outside. By mid-morning, he was no longer coping. Tables needed cleaning, fridge shelves needed replenishing, but there had been no gaps in the queue of customers waiting to be served.
And then Alex's sister popped in to have a look.
"Great timing. Here, I'll show you how this works." Alex didn't hesitate. His sister didn't have a choice. She was now serving coffees while Alex dealt with everything else.
• • •
But a few minutes later, she had a problem. A take-away customer came back in to complain about his coffee. He had asked for a strong Americano with a dash of milk. Instead, he got what seemed like the weakest latte in the world.
Alex's sister had typed the following code to serve this customer:
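brew_coffee("Americano", 1, 4)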
But the function's signature is:
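brew_coffee(coffee_type, strength, milk_amount)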
I dropped the type hints, and I won't use them further in this article to focus on other characteristics of the function signature.
Let's write a demo version of this function to identify what went wrong:
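def brew_coffee(coffee_type, strength, milk_amount):
    print(
        f"Coffee type: {coffee_type}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
    )

brew_coffee("Americano", 1, 4)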
The first argument, "Americano", is assigned to the first parameter, coffee_type. So far, so good…

But the second argument, 1, is assigned to strength, which is the second parameter. Python can only determine which argument is assigned to which parameter based on the position of the argument in the function call. Python is a great programming language, but it still can't read the user's mind!

And then, the final argument, 4, is assigned to the final parameter, milk_amount.
Alex's sister had swapped the two integers. An easy mistake to make. Instead of a strong coffee with a little milk, she had input the call for a cup of hot milk with just a touch of coffee. Oops!
Here's the output from our demo code to confirm this error:
Coffee type: Americano
Strength: 1
Milk Amount: 4
Alex apologised to the customer, and he made him a new coffee.
"You can do this instead to make sure you get the numbers right," he showed his sister as he prepared the customer's replacement drink:
Note how the second and third arguments now also include the names of the parameters.
"This way, it doesn't matter what order you input the numbers since you're naming them", he explained.
Here's the output now:
Coffee type: Americano
Strength: 4
Milk Amount: 1
Even though the integer 1 is still passed as the second of the three arguments, Python now knows it needs to assign this value to milk_amount since the parameter is named in the function call.
When you call a function such as brew_coffee(), you have the choice to use either positional arguments or keyword arguments.

Arguments are positional when you pass the values directly without using the parameter names, as you do in the following call:
brew_coffee("Americano", 1, 4)
You don't use the parameter names. You only include the values within the parentheses. These arguments are assigned to parameter names depending on their order.
Keyword arguments are the arguments you pass using the parameter names, such as the following call:
brew_coffee(coffee_type="Americano", milk_amount=1, strength=4)
In this example, all three arguments are keyword arguments. You pass each argument matched to its corresponding parameter name. The order in which you pass keyword arguments no longer matters.
Keyword arguments can also be called named arguments.
Positional and keyword arguments: Mixing and matching
But look again at the code Alex used when preparing the customer's replacement drink:
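brew_coffee("Americano", milk_amount=1, strength=4)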
The first argument doesn't have a parameter name. It's a positional argument and, therefore, it's assigned to the first parameter, coffee_type.
However, the remaining arguments are keyword arguments. The order of the second and third arguments no longer matters.
Therefore, you can mix and match positional and keyword arguments.
But there are some rules! Try the following call:
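brew_coffee("Americano", milk_amount=1, 4)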
You try to pass the first and third arguments as positional and the second as a keyword argument, but…
File "...", line 8
brew_coffee("Americano", milk_amount=1, 4)
^
SyntaxError: positional argument follows
keyword argument
Any keyword arguments must come after all the positional arguments. Once you include a keyword argument, all the remaining arguments must also be passed as keyword arguments.
And this rule makes sense. Python can figure out which argument goes to which parameter if they're in order. But the moment you include a keyword argument, Python can no longer assume the order of arguments. To avoid ambiguity - we don't like ambiguity in programming - Python doesn't allow any more positional arguments once you include a keyword argument.
3. Args and Kwargs
Last week, AI Coffee, a futuristic new coffee shop, opened its doors on High Street, drawing crowds with its sleek, Star Trek-esque coffee machine. This reporter visited to sample the buzzworthy brews and was wowed by the rich, perfectly crafted cappuccino, churned out by the shop's mysterious AI-powered machine. Eager to learn more about the technology behind the magic, I tried to chat with the owner, but the bustling shop kept him too busy for a moment to spare. While the AI's secrets remain under wraps for now, AI Coffee is already a local hit, blending cutting-edge tech with top-notch coffee.
- from The Herald, a local paper
Alex had started to catch up with the hype around his coffee shop - social media frenzy, articles in local newspapers, and lots of word-of-mouth. He wasn't complaining, but he was perplexed at why his humble coffee shop had gained so much attention and popularity within its first few days. Sure, his coffee was great, but was it so much better than others? And his prices weren't the highest on the high street, but they weren't too cheap, either.
However, with the increased popularity, Alex also started getting increasingly complex coffee requests. Vanilla syrup, cinnamon powder, caramel drizzle, and lots more.
Luckily, the Espressix ProtoSip was designed with the demanding urban coffee aficionado in mind.
Args
Alex made some tweaks to his brew_coffee() function:
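def brew_coffee(coffee_type, strength, milk_amount, *args):
    print(
        f"Coffee type: {coffee_type}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
        f"Add-ons: {', '.join(args)}\n"
    )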
There's a new parameter in brew_coffee(). This is the *args parameter, which has a leading * in front of the parameter name. This function can now accept any number of positional arguments following the first three. We'll explore what the variable name args refers to shortly. But first, let's test this new function:
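brew_coffee("Latte", 3, 2, "cinnamon", "hazelnut syrup")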
You call the function with five arguments. And here's the output from this function call:
Coffee type: Latte
Strength: 3
Milk Amount: 2
Add-ons: cinnamon, hazelnut syrup
- The first argument, "Latte", is assigned to the first parameter, coffee_type.
- The second argument, 3, is assigned to the second parameter, strength.
- The third argument, 2, is assigned to the third parameter, milk_amount.
- The remaining two arguments, "cinnamon" and "hazelnut syrup", are assigned to args, which is a tuple.
You can confirm that args is a tuple with a small addition to the function:
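def brew_coffee(coffee_type, strength, milk_amount, *args):
    print(f"{args=}")
    print(type(args))
    print(
        f"Coffee type: {coffee_type}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
        f"Add-ons: {', '.join(args)}\n"
    )

brew_coffee("Latte", 3, 2, "cinnamon", "hazelnut syrup")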
The first two lines of the output from this code are shown below:
args=('cinnamon', 'hazelnut syrup')
<class 'tuple'>
The args parameter is a tuple containing the remaining positional arguments in the function call once the function deals with the first three.
There's nothing special about the name args
What gives *args its features? It's not the name args. Instead, it's the leading asterisk, *, that makes this parameter one that can accept any number of positional arguments. The parameter name args is often used in this case, but you can also use a name that's more descriptive to make your code more readable:
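def brew_coffee(coffee_type, strength, milk_amount, *add_ons):
    print(f"{add_ons=}")
    print(type(add_ons))
    print(
        f"Coffee type: {coffee_type}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
        f"Add-ons: {', '.join(add_ons)}\n"
    )

brew_coffee("Latte", 3, 2, "cinnamon", "hazelnut syrup")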
Alex uses the name add_ons instead of args. This parameter name still has the leading * in the function signature. Colloquially, many Python programmers will still call a parameter with a leading * the args parameter, even though the parameter name is different.
Therefore, you can now call this function with three or more arguments. You can add as many arguments as you wish after the third one, including none at all:
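brew_coffee("Latte", 3, 2)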
The output confirms that add_ons is now an empty tuple:
add_ons=()
<class 'tuple'>
Coffee type: Latte
Strength: 3
Milk Amount: 2
Add-ons:
This coffee doesn't have any add-ons.
We have a problem
However, Alex's sister, who was now working in the coffee shop full time, could no longer use her preferred way of calling the brew_coffee() function:
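brew_coffee("Latte", strength=3, milk_amount=2, "vanilla syrup")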
This raises an error:
File "...", line 9
brew_coffee("Latte", strength=3,
milk_amount=2, "vanilla syrup")
^
SyntaxError: positional argument follows
keyword argument
This is a problem you've seen already. Positional arguments must come before keyword arguments in a function call. And *add_ons in the function signature indicates that Python will collect all remaining positional arguments from this point in the parameter list. Therefore, none of the parameters defined before *add_ons can be assigned a keyword argument if you also include args as arguments. They must all be assigned positional arguments.
All arguments preceding the args arguments in a function call must be positional arguments.
Alex refactored the code:
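def brew_coffee(coffee_type, *add_ons, strength, milk_amount):
    print(
        f"Coffee type: {coffee_type}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
        f"Add-ons: {', '.join(add_ons)}\n"
    )

brew_coffee("Latte", "vanilla syrup", strength=3, milk_amount=2)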
The *add_ons parameter is now right after coffee_type. The remaining parameters, strength and milk_amount, come next. Unfortunately, this affects how Alex and his growing team can use brew_coffee() in other situations, too. The strength and milk_amount arguments must now come after any add-ons, and they must be used as keyword arguments.
See what happens if you try to pass positional arguments for strength and milk_amount:
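brew_coffee("Latte", "vanilla syrup", 3, 2)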
This raises an error:
Traceback (most recent call last):
File "...", line 9, in <module>
brew_coffee("Latte", "vanilla syrup", 3, 2)
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: brew_coffee() missing
2 required keyword-only arguments:
'strength' and 'milk_amount'
The args parameter, which is *add_ons in this example, marks the end of the positional arguments in a function. Therefore, strength and milk_amount must be assigned arguments using keywords.
Alex instructed his team on these two changes:
- Any add-ons must go after the coffee type.
- They must use keyword arguments for strength and milk_amount.
It's a bit annoying that they have to change how they call the function, but they're all still learning, and Alex feels this is the safer option.
Kwargs
But Alex's customers also had other requests. Some wanted their coffee extra hot, others needed oat milk, and others wanted their small coffee served in a large cup.
Alex included this in brew_coffee() by adding another parameter:
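def brew_coffee(
    coffee_type,
    *add_ons,
    strength,
    milk_amount,
    **kwargs,
):
    print(f"{kwargs=}")
    print(type(kwargs))
    print(
        f"Coffee type: {coffee_type}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
        f"Add-ons: {', '.join(add_ons)}\n"
        f"Instructions:"
    )
    for key, value in kwargs.items():
        print(f"\t{key.replace('_', ' ')}: {value}")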
The new parameter Alex added at the end of the signature, **kwargs, has two leading asterisks, **. This parameter indicates that the function can accept any number of optional keyword arguments after all the other arguments.

Whereas *args creates a tuple called args within the function, the double asterisk in **kwargs creates a dictionary called kwargs. The best way to see this is to call this function with additional keyword arguments:
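brew_coffee(
    "Latte",
    "vanilla syrup",
    strength=3,
    milk_amount=2,
    milk_type="oat",
    temperature="extra hot",
)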
The final two arguments use the keywords milk_type and temperature. These are not parameters in the function definition.
Let's explore these six arguments:
- The first argument, "Latte", is the required argument assigned to coffee_type.
- The second argument, "vanilla syrup", is also a positional argument. Therefore, Python assigns this to add_ons. There's only one additional positional argument in this call but, in general, you can have more.
- Next, you have the two required keyword arguments, strength=3 and milk_amount=2.
- But there are also two more keyword arguments, milk_type="oat" and temperature="extra hot". These are the additional optional keyword arguments, and they're assigned to the dictionary kwargs.
Here is the first part of the output from this call:
kwargs={
'milk_type': 'oat',
'temperature': 'extra hot'
}
<class 'dict'>
This confirms that kwargs is a dictionary. The keywords are the keys, and the argument values are the dictionary values.
The rest of the output shows the additional special instructions in the printout:
Coffee type: Latte
Strength: 3
Milk Amount: 2
Add-ons: vanilla syrup
Instructions:
milk type: oat
temperature: extra hot
There's nothing special about the name kwargs
You've seen this when we talked about args. There's nothing special about the parameter name kwargs. It's the leading double asterisk that does the trick. So, you can use any descriptive name you wish in your code:
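def brew_coffee(
    coffee_type,
    *add_ons,
    strength,
    milk_amount,
    **instructions,
):
    print(f"{instructions=}")
    print(type(instructions))
    print(
        f"Coffee type: {coffee_type}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
        f"Add-ons: {', '.join(add_ons)}\n"
        f"Instructions:"
    )
    for key, value in instructions.items():
        print(f"\t{key.replace('_', ' ')}: {value}")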
Warning: the following paragraph is dense with terminology!
So, in its current form, this function needs a required argument assigned to coffee_type and two required keyword arguments assigned to strength and milk_amount. And you can also have any number of optional positional arguments, which you add after the first positional argument but before the required keyword arguments. These are the add-ons a customer wants in their coffee.
But you can also add any number of keyword arguments at the end of the function call. These are the special instructions from the customer.
Both args and kwargs are optional. So, you can still call the function with only the required arguments:
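brew_coffee("Espresso", strength=4, milk_amount=0)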
The output shows that this gives a strong espresso with no milk, no add-ons, and no special instructions:
instructions={}
<class 'dict'>
Coffee type: Espresso
Strength: 4
Milk Amount: 0
Add-ons:
Instructions:
Note that in this case, since there are no args, you can also pass the first argument as a keyword argument:
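brew_coffee(coffee_type="Espresso", strength=4, milk_amount=0)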
But this is only possible when there are no add-ons - no args. We'll revisit this case in a later section of this article.
A quick recap before we move on.
Args and kwargs are informal terms used for parameters with a leading single and double asterisk.
- The term args refers to a parameter with a leading asterisk in the function's signature, such as *args. This parameter indicates that the function can accept any number of optional positional arguments following any required positional arguments. The term args stands for arguments, but you've already figured that out!
- And kwargs refers to a parameter with two leading asterisks, such as **kwargs, which indicates that the function can accept any number of optional keyword arguments following any required keyword arguments. The 'kw' in kwargs stands for keyword.
Coffee features often when talking about programming. Here's another coffee-themed article, also about functions: What Can A Coffee Machine Teach You About Python's Functions?
4. Optional Arguments with Default Values
Alex's team grew rapidly. The coffee shop now had many regular customers and a constant stream of coffee lovers throughout the day.
Debra, one of the staff members, had some ideas to share in a team meeting:
"Alex, many customers don't care about the coffee strength. They just want a normal coffee. I usually type in 3
for the strength argument for these customers. But it's time-consuming to have to write strength=3
for all of them, especially when it's busy."
"We can easily fix that", Alex was quick to respond:
The parameter strength now has a default value. This makes the argument corresponding to strength an optional argument since it has a default value of 3. The default value is used by the function only if you don't pass the corresponding argument.
Alex's staff can now leave this argument out if they want to brew a "normal strength" coffee:
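brew_coffee("Espresso", milk_amount=0)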
This gives a medium strength espresso with no add-ons or special instructions:
Coffee type: Espresso
Strength: 3
Milk Amount: 0
Add-ons:
Instructions:
The output confirms that the coffee strength has a value of 3, which is the default value. And here's a coffee with some add-ons that also uses the default coffee strength:
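brew_coffee("Latte", "caramel drizzle", milk_amount=2)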
Here's the output confirming this normal-strength caramel-drizzle latte:
Coffee type: Latte
Strength: 3
Milk Amount: 2
Add-ons: caramel drizzle
Instructions:
Ambiguity, again
Let's look at the function's signature again:
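def brew_coffee(
    coffee_type,
    *add_ons,
    strength=3,
    milk_amount,
    **instructions,
):
    # ...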
The coffee_type parameter can accept a positional argument. Then, *add_ons collects all remaining positional arguments, if there are any, that the user passes when calling the function. Any argument after this must be a keyword argument. Therefore, when calling the function, there's no ambiguity about whether strength, which is optional, is included or not, since all the arguments after the add-ons are named.
Why am I mentioning this? Consider a version of this function that doesn't have the args parameter *add_ons:
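def brew_coffee_variant(
    coffee_type,
    # *add_ons,
    strength=3,
    milk_amount,
    **instructions,
):
    print(
        f"Coffee type: {coffee_type}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
        # f"Add-ons: {', '.join(add_ons)}\n"
        f"Instructions:"
    )
    for key, value in instructions.items():
        print(f"\t{key.replace('_', ' ')}: {value}")

brew_coffee_variant("Espresso", milk_amount=0)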
I commented out the lines with *add_ons to highlight they've been removed temporarily in this function version. When you run this code, Python raises an error. Note that the error is raised in the function definition before the function call itself:
File "...", line 5
milk_amount,
^^^^^^^^^^^
SyntaxError: parameter without a default follows
parameter with a default
Python doesn't allow this function signature since this format introduces ambiguity. To see this ambiguity, let's use a positional argument for the amount of milk, since this would now be possible as *add_ons is no longer there. Recall that in the main version of the function with the parameter *add_ons, all the arguments that follow the args must be named:
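brew_coffee_variant("Espresso", 0)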
As mentioned above, note that the error is raised by the function definition and not the function call. I'm showing these calls to help with the discussion.
Is the value 0 meant for strength, or is your intention to use the default value for strength and assign the value 0 to milk_amount? To avoid this ambiguity, Python function definitions don't allow parameters without a default value to follow a parameter with a default value. Once you add a default value, all the following parameters must also have a default value.
Of course, there would be no ambiguity if you use a keyword argument. However, this would lead to the situation where the function call is ambiguous with a positional argument, but not when using a keyword argument, even though both positional and keyword arguments are possible. Python doesn't allow this to be on the safe side!
This wasn't an issue when you had *add_ons as part of the signature. Let's put *add_ons back in:
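def brew_coffee(
    coffee_type,
    *add_ons,
    strength=3,
    milk_amount,
    **instructions,
):
    print(
        f"Coffee type: {coffee_type}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
        f"Add-ons: {', '.join(add_ons)}\n"
        f"Instructions:"
    )
    for key, value in instructions.items():
        print(f"\t{key.replace('_', ' ')}: {value}")

brew_coffee("Espresso", milk_amount=0)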
There's no ambiguity in this case since strength and milk_amount must both have keyword arguments.
However, even though this signature is permitted in Python, it's rather unconventional. Normally, you don't see many parameters without default values after ones with default values, even when you're already in the keyword-only region of the function (after the args).
In this case, Debra's follow-up suggestion fixes this unconventional function signature:
"And we also have to input milk_amount=0
for black coffees, which are quite common. Can we do a similar trick for coffees with no milk?"
"Sure we can"
Now, there's also a default value for milk_amount. The default is a black coffee.
In this version of the function, there's only one required argument - the first one that's assigned to coffee_type. All the other arguments are optional, either because they're not needed to make a coffee, such as the add-ons and special instructions, or because the function has default values for them, such as strength and milk_amount.
A parameter can have a default value defined in the function's signature. Therefore, the argument assigned to a parameter with a default value is an optional argument.
And let's confirm you can still include add-ons and special instructions:
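brew_coffee(
    "Cappuccino",
    "chocolate sprinkles",
    "vanilla syrup",
    milk_amount=2,
    temperature="extra hot",
    cup_size="large cup",
)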
Here's the output from this function call:
Coffee type: Cappuccino
Strength: 3
Milk Amount: 2
Add-ons: chocolate sprinkles, vanilla syrup
Instructions:
temperature: extra hot
cup size: large cup
Note that you rely on the default value for strength in this example since the argument assigned to strength is not included in the call.
A common pitfall with default values in function definitions is the mutable default value trap. You can read more about this in section 2, The Teleportation Trick, in this article: Python Quirks? Party Tricks? Peculiarities Revealed…
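To see why this trips people up, here's a minimal sketch of the trap (add_order is a hypothetical function, not from the article above): a mutable default value is created once, when the function is defined, and shared across every call that relies on it:

def add_order(order, orders=[]):
    # 'orders' refers to the same list object on every call
    orders.append(order)
    return orders

print(add_order("Latte"))     # ['Latte']
print(add_order("Espresso"))  # ['Latte', 'Espresso'] - the first order is still there!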
5. Positional-Only and Keyword-Only Arguments
Let's summarise the requirements for all the arguments in Alex's current version of the brew_coffee() function. Here's the current function signature:
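def brew_coffee(
    coffee_type,
    *add_ons,
    strength=3,
    milk_amount=0,
    **instructions,
):
    # ...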
1. The first parameter is coffee_type, and the argument you assign to this parameter can be either a positional argument or a keyword argument. But - and this is important - you can only use it as a keyword argument if you don't pass any arguments assigned to *add_ons. Remember that positional arguments must come before keyword arguments in function calls. Therefore, you can only use a keyword argument for the first parameter if you don't have args. We'll focus on this point soon.
2. As long as the first argument, the one assigned to coffee_type, is positional, any further positional arguments are assigned to the tuple add_ons.
3. Next, you can add named arguments (which is another term used for keyword arguments) for strength and milk_amount. Both of these arguments are optional, and the order in which you use them in a function call is not important.
4. Finally, you can add more keyword arguments using keywords that aren't parameters in the function definition. You can include as many keyword arguments as you wish.
Read point 1 above again. Alex thinks that allowing the first argument to be either positional or named is not a good idea, as it can lead to confusion. You can only use the first argument as a keyword argument if you don't have add-ons. Here's proof:
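brew_coffee(
    coffee_type="Cappuccino",
    "chocolate sprinkles",
    "vanilla syrup",
    milk_amount=2,
    temperature="extra hot",
    cup_size="large cup",
)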
The first argument is a keyword argument, coffee_type="Cappuccino". But then you attempt to pass two positional arguments, "chocolate sprinkles" and "vanilla syrup". This call raises an error:
File "...", line 25
)
^
SyntaxError: positional argument follows
keyword argument
You can't have positional arguments following keyword arguments.
Alex decides to remove this source of confusion by ensuring that the argument assigned to coffee_type is always a positional argument. He only needs to make a small addition to the function's signature:
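def brew_coffee(
    coffee_type,
    /,
    *add_ons,
    strength=3,
    milk_amount=0,
    **instructions,
):
    # ...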
The rogue forward slash, /, in place of a parameter is not a typo. It indicates that all parameters before the forward slash must be assigned positional arguments. Therefore, the object assigned to coffee_type can no longer be a keyword argument:
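brew_coffee(
    coffee_type="Cappuccino",
    milk_amount=2,
    temperature="extra hot",
    cup_size="large cup",
)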
The first argument is a keyword argument. But this call raises an error:
Traceback (most recent call last):
File "...", line 19, in <module>
brew_coffee(
~~~~~~~~~~~^
coffee_type="Cappuccino",
^^^^^^^^^^^^^^^^^^^^^^^^^
...<2 lines>...
cup_size="large cup",
^^^^^^^^^^^^^^^^^^^^^
)
^
TypeError: brew_coffee() missing 1 required
positional argument: 'coffee_type'
The function has a required positional argument, the one assigned to coffee_type. The forward slash, /, makes the first argument a positional-only argument. It can no longer be a keyword argument:
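brew_coffee(
    "Cappuccino",
    milk_amount=2,
    temperature="extra hot",
    cup_size="large cup",
)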
This version works fine since the first argument is positional:
Coffee type: Cappuccino
Strength: 3
Milk Amount: 2
Add-ons:
Instructions:
temperature: extra hot
cup size: large cup
Alex feels that this function's signature is neater and clearer now, avoiding ambiguity.
• • •
The R&D team at the startup that's developing the Espressix ProtoSip were keen to see how Alex was using the prototype and inspect the changes he made to suit his needs. They implemented many of Alex's changes.
However, they were planning to offer a more basic version of the Espressix that didn't have the option to include add-ons in the coffee.
The easiest option is to remove the *add_ons parameter from the function's signature:
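def brew_coffee(
    coffee_type,
    /,
    # *add_ons,
    strength=3,
    milk_amount=0,
    **instructions,
):
    # ...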
No *add_ons parameter, no add-ons in the coffee.
Sorted? Sort of.
The *add_ons parameter enabled you to pass optional positional arguments. However, *add_ons served a second purpose in the earlier version. All parameters after the args parameter, which is *add_ons in this example, must be assigned keyword arguments. The args parameter, *add_ons, forces all remaining parameters to be assigned keyword-only arguments.
Removing the *add_ons parameter changes the rules for the remaining arguments.
But you can still implement the same rules even when you're not using args. All you need to do is keep the leading asterisk but drop the parameter name:
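def brew_coffee(
    coffee_type,
    /,
    *,
    strength=3,
    milk_amount=0,
    **instructions,
):
    print(
        f"Coffee type: {coffee_type}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
        # f"Add-ons: {', '.join(add_ons)}\n"
        f"Instructions:"
    )
    for key, value in instructions.items():
        print(f"\t{key.replace('_', ' ')}: {value}")

brew_coffee(
    "Cappuccino",
    milk_amount=2,
    temperature="extra hot",
    cup_size="large cup",
)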
Remember to remove the line printing out the add-ons, too. That's the commented-out line in the print() call above.
Notice how there's a lone asterisk in one of the parameter slots in the function signature. You can confirm that strength and milk_amount still need to be assigned keyword arguments:
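brew_coffee(
    "Espresso",
    3,
    0,
)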
When you try to pass positional arguments to strength and milk_amount, the code raises an error:
Traceback (most recent call last):
brew_coffee(
~~~~~~~~~~~^
"Espresso",
^^^^^^^^^^^
3,
^^
0,
^^
)
^
TypeError: brew_coffee() takes 1 positional argument
but 3 were given
The error message tells you that brew_coffee() only takes one positional argument. All the arguments after the * are keyword-only. Therefore, only the arguments preceding it may be positional. And there's only one parameter before the rogue asterisk, *.
A lone forward slash, /, among the function's parameters indicates that all parameters before the forward slash must be assigned positional-only arguments.

A lone asterisk, *, among the function's parameters indicates that all parameters after the asterisk must be assigned keyword-only arguments.
If you re-read the statements above carefully, you'll conclude that when you use both / and * in a function definition, the / must come before the *. Recall that positional arguments must come before keyword arguments.
It's also possible to have parameters between the / and *:
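def brew_coffee(
    coffee_type,
    /,
    another_param,
    *,
    strength=3,
    milk_amount=0,
    **instructions,
):
    print(
        f"Coffee type: {coffee_type}\n"
        f"{another_param=}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
        # f"Add-ons: {', '.join(add_ons)}\n"
        f"Instructions:"
    )
    for key, value in instructions.items():
        print(f"\t{key.replace('_', ' ')}: {value}")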
You add a new parameter, another_param, in between / and * in the function's signature. Since this parameter is sandwiched between / and *, you can choose to assign either a positional or a keyword argument to it.
Here's a function call with the second argument as a positional argument:
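brew_coffee(
    "Espresso",
    "testing another parameter",
    strength=4,
)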
The second positional argument is assigned to another_param.
But you can also use a keyword argument:
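brew_coffee(
    "Espresso",
    another_param="testing another parameter",
    strength=4,
)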
Both of these versions give the same output:
Coffee type: Espresso
another_param='testing another parameter'
Strength: 4
Milk Amount: 0
Instructions:
Any parameter between / and * in the function definition can have either positional or keyword arguments. So, in summary:
- Arguments assigned to parameters before a forward slash, /, are positional-only.
- Arguments assigned to parameters between a forward slash, /, and an asterisk, *, can be either positional or keyword.
- Arguments assigned to parameters after an asterisk, *, are keyword-only.
Remember that the * serves a similar purpose as the asterisk in *args since both * and *args force any parameters that come after them to require keyword-only arguments. Remember this similarity if you find yourself struggling to remember what / and * do!
Why use positional-only or keyword-only arguments? Positional-only arguments (using /) ensure clarity and prevent misuse in APIs where parameter names are irrelevant to the user. Keyword-only arguments (using *) improve readability and avoid errors in functions with many parameters, as names make the intent clear. For Alex, making coffee_type positional-only and strength and milk_amount keyword-only simplifies the API by enforcing a consistent calling style, reducing confusion for his team.
Using positional-only arguments may also be beneficial in performance-critical code since the overhead to deal with keyword arguments is not negligible in these cases.
Do you want to join a forum to discuss Python further with other Pythonistas? Upgrade to a paid subscription here on The Python Coding Stack to get exclusive access to The Python Coding Place's members' forum. More Python. More discussions. More fun.
And you'll also be supporting this publication. I put plenty of time and effort into crafting each article. Your support will help me keep this content coming regularly and, importantly, will help keep it free for everyone.
Final Words
The reporter from The Herald did manage to chat to Alex eventually. She had become a regular at AI Coffee, and ever since Alex employed more staff, he's been able to chat to customers a bit more.
"There's a question I'm curious about", she asked. "How does the Artificial Intelligence software work to make the coffee just perfect for each customer?"
"I beg your pardon?" Alex looked confused.
"I get it. It's a trade secret, and you don't want to tell me. This Artificial Intelligence stuff is everywhere these days."
"What do you mean by Artificial Intelligence?" Alex asked, more perplexed.
"The machine uses AI to optimise the coffee it makes, right?"
"Er, no. It does not."
"But…But the name of the coffee shop, AI Coffee…?"
"Ah, that's silly, I know. I couldn't think of a name for the shop. So I just used my initials. I'm Alex Inverness."
• • •
Python functions offer lots of flexibility in how to define and use them. But function signatures can look cryptic with all the *args and **kwargs, rogue / and *, some parameters with default values and others without. And the rules on when and how to use arguments may not be intuitive at first.
Hopefully, Alex's story helped you grasp all the minutiae of the various types of parameters and arguments you can use in Python functions.
Now, I need to make myself a cup of coffee…
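brew_coffee(
    "Macchiato",
    strength=4,
    milk_amount=1,
    cup="Stephen's espresso cup",
)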
Photo by Viktoria Alipatova: https://www.pexels.com/photo/person-sitting-near-table-with-teacups-and-plates-2074130/
Code in this article uses Python 3.13
The code images used in this article are created using Snappify. [Affiliate link]
You can also support this publication by making a one-off contribution of any amount you wish.
For more Python resources, you can also visit Real Python - you may even stumble on one of my own articles or courses there!
Also, are you interested in technical writing? You'd like to make your own writing more narrative, more engaging, more memorable? Have a look at Breaking the Rules.
And you can find out more about me at stephengruppetta.com
Further reading related to this article's topic:
- Coffee features often when talking about programming. Here's another coffee-themed article, also about functions: What Can A Coffee Machine Teach You About Python's Functions?
- Python Quirks? Party Tricks? Peculiarities Revealed… [in particular see section 2, The Teleportation Trick]
Appendix: Code Blocks
Code Block #1
brew_coffee("Cappuccino", 4, 2)
Code Block #2
brew_coffee(coffee_type: str, strength: int, milk_amount: int)
Code Block #3
brew_coffee("Americano", 1, 4)
Code Block #4
brew_coffee(coffee_type, strength, milk_amount)
Code Block #5
def brew_coffee(coffee_type: str, strength: int, milk_amount: int):
    print(
        f"Coffee type: {coffee_type}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
    )

brew_coffee("Americano", 1, 4)
Code Block #6
brew_coffee("Americano", milk_amount=1, strength=4)
Code Block #7
brew_coffee("Americano", milk_amount=1, strength=4)
Code Block #8
brew_coffee("Americano", milk_amount=1, 4)
Code Block #9
def brew_coffee(coffee_type, strength, milk_amount, *args):
    print(
        f"Coffee type: {coffee_type}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
        f"Add-ons: {', '.join(args)}\n"
    )
Code Block #10
brew_coffee("Latte", 3, 2, "cinnamon", "hazelnut syrup")
Code Block #11
def brew_coffee(coffee_type, strength, milk_amount, *args):
    print(f"{args=}")
    print(type(args))
    print(
        f"Coffee type: {coffee_type}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
        f"Add-ons: {', '.join(args)}\n"
    )

brew_coffee("Latte", 3, 2, "cinnamon", "hazelnut syrup")
Code Block #12
def brew_coffee(coffee_type, strength, milk_amount, *add_ons):
    print(f"{add_ons=}")
    print(type(add_ons))
    print(
        f"Coffee type: {coffee_type}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
        f"Add-ons: {', '.join(add_ons)}\n"
    )

brew_coffee("Latte", 3, 2, "cinnamon", "hazelnut syrup")
Code Block #13
brew_coffee("Latte", 3, 2)
Code Block #14
def brew_coffee(coffee_type, strength, milk_amount, *add_ons):
    print(
        f"Coffee type: {coffee_type}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
        f"Add-ons: {', '.join(add_ons)}\n"
    )

brew_coffee("Latte", strength=3, milk_amount=2, "vanilla syrup")
Code Block #15
def brew_coffee(coffee_type, *add_ons, strength, milk_amount):
    print(
        f"Coffee type: {coffee_type}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
        f"Add-ons: {', '.join(add_ons)}\n"
    )

brew_coffee("Latte", "vanilla syrup", strength=3, milk_amount=2)
Code Block #16
brew_coffee("Latte", "vanilla syrup", 3, 2)
Code Block #17
def brew_coffee(
    coffee_type,
    *add_ons,
    strength,
    milk_amount,
    **kwargs,
):
    print(f"{kwargs=}")
    print(type(kwargs))
    print(
        f"Coffee type: {coffee_type}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
        f"Add-ons: {', '.join(add_ons)}\n"
        f"Instructions:"
    )
    for key, value in kwargs.items():
        print(f"\t{key.replace('_', ' ')}: {value}")
Code Block #18
brew_coffee(
    "Latte",
    "vanilla syrup",
    strength=3,
    milk_amount=2,
    milk_type="oat",
    temperature="extra hot",
)
Code Block #19
def brew_coffee(
    coffee_type,
    *add_ons,
    strength,
    milk_amount,
    **instructions,
):
    print(f"{instructions=}")
    print(type(instructions))
    print(
        f"Coffee type: {coffee_type}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
        f"Add-ons: {', '.join(add_ons)}\n"
        f"Instructions:"
    )
    for key, value in instructions.items():
        print(f"\t{key.replace('_', ' ')}: {value}")
Code Block #20
brew_coffee("Espresso", strength=4, milk_amount=0)
Code Block #21
brew_coffee(coffee_type="Espresso", strength=4, milk_amount=0)
Code Block #22
def brew_coffee(
    coffee_type,
    *add_ons,
    strength=3,
    milk_amount,
    **instructions,
):
    # ...
Code Block #23
brew_coffee("Espresso", milk_amount=0)
Code Block #24
brew_coffee("Latte", "caramel drizzle", milk_amount=2)
Code Block #25
def brew_coffee(
    coffee_type,
    *add_ons,
    strength=3,
    milk_amount,
    **instructions,
):
    # ...
Code Block #26
def brew_coffee_variant(
    coffee_type,
    # *add_ons,
    strength=3,
    milk_amount,
    **instructions,
):
    print(
        f"Coffee type: {coffee_type}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
        # f"Add-ons: {', '.join(add_ons)}\n"
        f"Instructions:"
    )
    for key, value in instructions.items():
        print(f"\t{key.replace('_', ' ')}: {value}")

brew_coffee_variant("Espresso", milk_amount=0)
Code Block #27
brew_coffee_variant("Espresso", 0)
Code Block #28
def brew_coffee(
    coffee_type,
    *add_ons,
    strength=3,
    milk_amount,
    **instructions,
):
    print(
        f"Coffee type: {coffee_type}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
        f"Add-ons: {', '.join(add_ons)}\n"
        f"Instructions:"
    )
    for key, value in instructions.items():
        print(f"\t{key.replace('_', ' ')}: {value}")

brew_coffee("Espresso", milk_amount=0)
Code Block #29
def brew_coffee(
    coffee_type,
    *add_ons,
    strength=3,
    milk_amount=0,
    **instructions,
):
    print(
        f"Coffee type: {coffee_type}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
        f"Add-ons: {', '.join(add_ons)}\n"
        f"Instructions:"
    )
    for key, value in instructions.items():
        print(f"\t{key.replace('_', ' ')}: {value}")

brew_coffee("Espresso")
Code Block #30
brew_coffee(
    "Cappuccino",
    "chocolate sprinkles",
    "vanilla syrup",
    milk_amount=2,
    temperature="extra hot",
    cup_size="large cup",
)
Code Block #31
def brew_coffee(
    coffee_type,
    *add_ons,
    strength=3,
    milk_amount=0,
    **instructions,
):
    # ...
Code Block #32
brew_coffee(
    coffee_type="Cappuccino",
    "chocolate sprinkles",
    "vanilla syrup",
    milk_amount=2,
    temperature="extra hot",
    cup_size="large cup",
)
Code Block #33
def brew_coffee(
    coffee_type,
    /,
    *add_ons,
    strength=3,
    milk_amount=0,
    **instructions,
):
    # ...
Code Block #34
brew_coffee(
    coffee_type="Cappuccino",
    milk_amount=2,
    temperature="extra hot",
    cup_size="large cup",
)
Code Block #35
brew_coffee(
    "Cappuccino",
    milk_amount=2,
    temperature="extra hot",
    cup_size="large cup",
)
Code Block #36
def brew_coffee(
    coffee_type,
    /,
    # *add_ons,
    strength=3,
    milk_amount=0,
    **instructions,
):
    # ...
Code Block #37
def brew_coffee(
    coffee_type,
    /,
    *,
    strength=3,
    milk_amount=0,
    **instructions,
):
    print(
        f"Coffee type: {coffee_type}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
        # f"Add-ons: {', '.join(add_ons)}\n"
        f"Instructions:"
    )
    for key, value in instructions.items():
        print(f"\t{key.replace('_', ' ')}: {value}")

brew_coffee(
    "Cappuccino",
    milk_amount=2,
    temperature="extra hot",
    cup_size="large cup",
)
Code Block #38
brew_coffee(
    "Espresso",
    3,
    0,
)
Code Block #39
def brew_coffee(
    coffee_type,
    /,
    another_param,
    *,
    strength=3,
    milk_amount=0,
    **instructions,
):
    print(
        f"Coffee type: {coffee_type}\n"
        f"{another_param=}\n"
        f"Strength: {strength}\n"
        f"Milk Amount: {milk_amount}\n"
        # f"Add-ons: {', '.join(add_ons)}\n"
        f"Instructions:"
    )
    for key, value in instructions.items():
        print(f"\t{key.replace('_', ' ')}: {value}")
Code Block #40
brew_coffee(
    "Espresso",
    "testing another parameter",
    strength=4,
)
Code Block #41
brew_coffee(
    "Espresso",
    another_param="testing another parameter",
    strength=4,
)
Code Block #42
brew_coffee(
    "Macchiato",
    strength=4,
    milk_amount=1,
    cup="Stephen's espresso cup",
)
07 May 2025 8:19pm GMT
death and gravity: ProcessThreadPoolExecutor: when I/O becomes CPU-bound
So, you're doing some I/O bound stuff, in parallel.
Maybe you're scraping some websites - a lot of websites.
Maybe you're updating or deleting millions of DynamoDB items.
You've got your ThreadPoolExecutor, you've increased the number of threads and tuned connection limits... but after some point, it's just not getting any faster. You look at your Python process, and you see CPU utilization hovers above 100%.
You could split the work into batches and have a ProcessPoolExecutor run your original code in separate processes. But that requires yet more code, and a bunch of changes, which is no fun. And maybe your input is not that easy to split into batches.
If only we had an executor that worked seamlessly across processes and threads.
Well, you're in luck, since that's exactly what we're building today!
And even better, in a couple years you won't even need it anymore.
Establishing a baseline #
To measure things, we'll use a mock that pretends to do mostly I/O, with a sprinkling of CPU-bound work thrown in - a stand-in for something like a database connection, a Requests session, or a DynamoDB client.
class Client:
    io_time = 0.02
    cpu_time = 0.0008

    def method(self, arg):
        # simulate I/O
        time.sleep(self.io_time)
        # simulate CPU-bound work
        start = time.perf_counter()
        while time.perf_counter() - start < self.cpu_time:
            for i in range(100): i ** i
        return arg
We sleep() for the I/O, and do some math in a loop for the CPU stuff; it doesn't matter exactly how long each takes, as long as I/O time dominates.
Real multi-threaded clients are usually backed by a shared connection pool, which allows for connection reuse (so you don't pay the cost of a new connection on each request) and multiplexing (so you can use the same connection for multiple concurrent requests, possible with protocols like HTTP/2 or newer). We could simulate this with a semaphore, but limiting connections is not relevant here - we're assuming the connection pool is effectively unbounded.
Since we'll use our client from multiple processes, we write an initializer function to set up a global, per-process client instance (remember, we want to share potential connection pools between threads); we can then pass the initializer to the executor constructor, along with any arguments we want to pass to the client. Similarly, we do the work through a function that uses this global client.
# this code runs in each worker process

client = None

def init_client(*args):
    global client
    client = Client(*args)

def do_stuff(*args):
    return client.method(*args)
Finally, we make a simple timing context manager:
@contextmanager
def timer():
    start = time.perf_counter()
    yield
    end = time.perf_counter()
    print(f"elapsed: {end-start:1.3f}")
...and put everything together in a function that measures how long it takes to do a bunch of work using a concurrent.futures executor:
def benchmark(executor, n=10_000, timer=timer, chunksize=10):
    with executor:
        # make sure all the workers are started,
        # so we don't measure their startup time
        list(executor.map(time.sleep, [0] * 200))

        with timer():
            values = list(executor.map(do_stuff, range(n), chunksize=chunksize))

        assert values == list(range(n)), values
Threads #
So, a ThreadPoolExecutor should suffice here, since we're mostly doing I/O, right?
>>> from concurrent.futures import *
>>> from bench import *
>>> init_client()
>>> benchmark(ThreadPoolExecutor(10))
elapsed: 24.693
More threads!
>>> benchmark(ThreadPoolExecutor(20))
elapsed: 12.405
Twice the threads, twice as fast. More!
>>> benchmark(ThreadPoolExecutor(30))
elapsed: 8.718
Good, it's still scaling linearly. MORE!
>>> benchmark(ThreadPoolExecutor(40))
elapsed: 8.638
...more?
>>> benchmark(ThreadPoolExecutor(50))
elapsed: 8.458
>>> benchmark(ThreadPoolExecutor(60))
elapsed: 8.430
>>> benchmark(ThreadPoolExecutor(70))
elapsed: 8.428
Problem: CPU becomes a bottleneck #
It's time we take a closer look at what our process is doing. I'd normally use the top command for this, but since the flags and output vary with the operating system, we'll implement our own using the excellent psutil library.
@contextmanager
def top():
    """Print information about current and child processes.

    RES is the resident set size. USS is the unique set size.
    %CPU is the CPU utilization. nTH is the number of threads.

    """
    process = psutil.Process()
    processes = [process] + process.children(True)
    for p in processes: p.cpu_percent()

    yield

    print(f"{'PID':>7} {'RES':>7} {'USS':>7} {'%CPU':>7} {'nTH':>7}")
    for p in processes:
        try:
            m = p.memory_full_info()
        except psutil.AccessDenied:
            m = p.memory_info()
        rss = m.rss / 2**20
        uss = getattr(m, 'uss', 0) / 2**20
        cpu = p.cpu_percent()
        nth = p.num_threads()
        print(f"{p.pid:>7} {rss:6.1f}m {uss:6.1f}m {cpu:7.1f} {nth:>7}")
And because it's a context manager, we can use it as a timer:
>>> init_client()
>>> benchmark(ThreadPoolExecutor(10), timer=top)
PID RES USS %CPU nTH
51395 35.2m 28.5m 38.7 11
So, what happens if we increase the number of threads?
>>> benchmark(ThreadPoolExecutor(20), timer=top)
PID RES USS %CPU nTH
13912 16.8m 13.2m 70.7 21
>>> benchmark(ThreadPoolExecutor(30), timer=top)
PID RES USS %CPU nTH
13912 17.0m 13.4m 99.1 31
>>> benchmark(ThreadPoolExecutor(40), timer=top)
PID RES USS %CPU nTH
13912 17.3m 13.7m 100.9 41
With more threads, the compute part of our I/O bound workload increases, eventually becoming high enough to saturate one CPU - and due to the global interpreter lock, one CPU is all we can use, regardless of the number of threads.1
Processes? #
I know, let's use a ProcessPoolExecutor instead!
>>> benchmark(ProcessPoolExecutor(20, initializer=init_client))
elapsed: 12.374
>>> benchmark(ProcessPoolExecutor(30, initializer=init_client))
elapsed: 8.330
>>> benchmark(ProcessPoolExecutor(40, initializer=init_client))
elapsed: 6.273
Hmmm... I guess it is a little bit better.
More? More!
>>> benchmark(ProcessPoolExecutor(60, initializer=init_client))
elapsed: 4.751
>>> benchmark(ProcessPoolExecutor(80, initializer=init_client))
elapsed: 3.785
>>> benchmark(ProcessPoolExecutor(100, initializer=init_client))
elapsed: 3.824
OK, it's better, but with diminishing returns - there's no improvement after 80 processes, and even then, it's only 2.2x faster than the best time with threads, when, in theory, it should be able to make full use of all 4 CPUs.
Also, we're not making best use of connection pools (since we now have 80 of them, one per process), nor multiplexing (since we now have 80 connections, one per pool).
Problem: more processes, more memory #
But it gets worse!
>>> benchmark(ProcessPoolExecutor(80, initializer=init_client), timer=top)
PID RES USS %CPU nTH
2479 21.2m 15.4m 15.0 3
2480 11.2m 6.3m 0.0 1
2481 13.8m 8.5m 3.4 1
... 78 more lines ...
2560 13.8m 8.5m 4.4 1
13.8 MiB * 80 ~= 1 GiB ... that is a lot of memory.
Now, there's some nuance to be had here.
First, on most operating systems that have virtual memory, code segment pages are shared between processes - there's no point in having 80 copies of libc or the Python interpreter in memory.
The unique set size is probably a better measurement than the resident set size, since it excludes memory shared between processes.2 So, for the macOS output above,3 the actual usage is more like 8.5 MiB * 80 = 680 MiB.
Second, if you use the fork or forkserver start methods, processes also share memory allocated before the fork() via copy-on-write; for Python, this includes module code and variables. On Linux, the actual usage is 1.7 MiB * 80 = 136 MiB:
>>> benchmark(ProcessPoolExecutor(80, initializer=init_client), timer=top)
PID RES USS %CPU nTH
329801 17.0m 6.6m 5.1 3
329802 13.3m 1.6m 2.1 1
... 78 more lines ...
329881 13.3m 1.7m 2.0 1
However, it's important to note that's just a lower bound; memory allocated after fork() is not shared, and most real work will unavoidably allocate more memory.
Why not both? #
One reasonable way of dealing with this would be to split the input into batches, one per CPU, and pass them to a ProcessPoolExecutor, which in turn runs the batch items using a ThreadPoolExecutor.4
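For illustration, here's a rough sketch of what that batching approach could look like - run_batch() and run_batched() are hypothetical helpers (reusing init_client() and do_stuff() from earlier), not code we'll actually end up using:

import concurrent.futures, math

def run_batch(batch):
    # runs in a worker process; one thread pool (and one client) per process
    with concurrent.futures.ThreadPoolExecutor(20) as threads:
        return list(threads.map(do_stuff, batch))

def run_batched(args, nprocs=4):
    # split the input into one contiguous batch per process
    size = math.ceil(len(args) / nprocs)
    batches = [args[i:i+size] for i in range(0, len(args), size)]
    with concurrent.futures.ProcessPoolExecutor(
        nprocs, initializer=init_client
    ) as processes:
        # flatten the per-batch results, preserving input order
        return [v for batch in processes.map(run_batch, batches) for v in batch]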
But that would mean we need to change our code, and that's no fun.
If only we had an executor that worked seamlessly across processes and threads.
A minimal plausible solution #
In keeping with what has become tradition by now, we'll take an iterative, problem-solution approach; since we're not sure what to do yet, we start with the simplest thing that could possibly work.
We know we want a process pool executor that starts one thread pool executor per process, so let's deal with that first.
class ProcessThreadPoolExecutor(concurrent.futures.ProcessPoolExecutor):

    def __init__(self, max_threads=None, initializer=None, initargs=()):
        super().__init__(
            initializer=_init_process,
            initargs=(max_threads, initializer, initargs)
        )
By subclassing ProcessPoolExecutor, we get the map() implementation for free, since the original is implemented in terms of submit().5 By going with the default max_workers, we get one process per CPU (which is what we want); we can add more arguments later if needed.
In a custom process initializer, we set up a global thread pool executor,6 and then call the process initializer provided by the user:
# this code runs in each worker process

_executor = None

def _init_process(max_threads, initializer, initargs):
    global _executor
    _executor = concurrent.futures.ThreadPoolExecutor(max_threads)
    if initializer:
        initializer(*initargs)
Likewise, submit() passes the work along to the thread pool executor:
class ProcessThreadPoolExecutor(concurrent.futures.ProcessPoolExecutor):
    # ...
    def submit(self, fn, *args, **kwargs):
        return super().submit(_submit, fn, *args, **kwargs)

# this code runs in each worker process
# ...

def _submit(fn, *args, **kwargs):
    return _executor.submit(fn, *args, **kwargs).result()
OK, that looks good enough; let's use it and see if it works:
def _do_stuff(n):
    print(f"doing: {n}")
    return n ** 2

if __name__ == '__main__':
    with ProcessThreadPoolExecutor() as e:
        print(list(e.map(_do_stuff, [0, 1, 2])))
$ python ptpe.py
doing: 0
doing: 1
doing: 2
[0, 1, 4]
Wait, we got it on the first try?!
Let's measure that:
>>> from bench import *
>>> from ptpe import *
>>> benchmark(ProcessThreadPoolExecutor(30, initializer=init_client), n=1000)
elapsed: 6.161
Hmmm... that's unexpectedly slow... almost as if:
>>> multiprocessing.cpu_count()
4
>>> benchmark(ProcessPoolExecutor(4, initializer=init_client), n=1000)
elapsed: 6.067
Ah, because _submit() waits for the result() in the main thread of the worker process, this is just a ProcessPoolExecutor with extra steps.
But what if we send back the future object instead?
    def submit(self, fn, *args, **kwargs):
        return super().submit(_submit, fn, *args, **kwargs).result()

def _submit(fn, *args, **kwargs):
    return _executor.submit(fn, *args, **kwargs)
Alas:
$ python ptpe.py
doing: 0
doing: 1
doing: 2
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "concurrent/futures/process.py", line 210, in _sendback_result
result_queue.put(_ResultItem(work_id, result=result,
File "multiprocessing/queues.py", line 391, in put
obj = _ForkingPickler.dumps(obj)
File "multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
TypeError: cannot pickle '_thread.RLock' object
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "ptpe.py", line 42, in <module>
print(list(e.map(_do_stuff, [0, 1, 2])))
...
TypeError: cannot pickle '_thread.RLock' object
The immediate cause of the error is that the future has a condition that has a lock that can't be pickled, because threading locks only make sense within the same process.
The deeper cause is that the future is not just data, but encapsulates state owned by the thread pool executor, and sharing state between processes requires extra work.
It may not seem like it, but this is a partial success: the work happens, we just can't get the results back. Not surprising, to be honest, it couldn't have been that easy.
Getting results #
If you look carefully at the traceback, you'll find a hint of how ProcessPoolExecutor gets its own results back from workers - a queue; the module docstring even has a neat data-flow diagram:
|======================= In-process =====================|== Out-of-process ==|

+----------+     +----------+       +--------+     +-----------+    +---------+
|          |  => | Work Ids |       |        |     | Call Q    |    | Process |
|          |     +----------+       |        |     +-----------+    |  Pool   |
|          |     | ...      |       |        |     | ...       |    +---------+
|          |     | 6        |    => |        |  => | 5, call() | => |         |
|          |     | 7        |       |        |     | ...       |    |         |
| Process  |     | ...      |       | Local  |     +-----------+    | Process |
|  Pool    |     +----------+       | Worker |                      |  #1..n  |
| Executor |                        | Thread |                      |         |
|          |     +----------- +     |        |     +-----------+    |         |
|          | <=> | Work Items | <=> |        | <=  | Result Q  | <= |         |
|          |     +------------+     |        |     +-----------+    |         |
|          |     | 6: call() |      |        |     | ...       |    |         |
|          |     |    future |      |        |     | 4, result |    |         |
|          |     | ...       |      |        |     | 3, except |    |         |
+----------+     +------------+     +--------+     +-----------+    +---------+
Now, we could probably use the same queue somehow, but it would involve touching a lot of (private) internals.7 Instead, let's use a separate queue:
def __init__(self, max_threads=None, initializer=None, initargs=()):
    self.__result_queue = multiprocessing.Queue()
    super().__init__(
        initializer=_init_process,
        initargs=(self.__result_queue, max_threads, initializer, initargs)
    )
On the worker side, we make it globally accessible:
# this code runs in each worker process
_executor = None
_result_queue = None

def _init_process(queue, max_threads, initializer, initargs):
    global _executor, _result_queue
    _executor = concurrent.futures.ThreadPoolExecutor(max_threads)
    _result_queue = queue
    if initializer:
        initializer(*initargs)
...so we can use it from a task callback registered by _submit():
def _submit(fn, *args, **kwargs):
    task = _executor.submit(fn, *args, **kwargs)
    task.add_done_callback(_put_result)

def _put_result(task):
    if exception := task.exception():
        _result_queue.put((False, exception))
    else:
        _result_queue.put((True, task.result()))
Back in the main process, we handle the results in a thread:
def __init__(self, max_threads=None, initializer=None, initargs=()):
    # ...
    self.__result_handler = threading.Thread(target=self.__handle_results)
    self.__result_handler.start()

def __handle_results(self):
    for ok, result in iter(self.__result_queue.get, None):
        print(f"{'ok' if ok else 'error'}: {result}")
Finally, to stop the handler, we use None as a sentinel on executor shutdown:
def shutdown(self, wait=True):
    super().shutdown(wait=wait)
    if self.__result_queue:
        self.__result_queue.put(None)
        if wait:
            self.__result_handler.join()
        self.__result_queue.close()
        self.__result_queue = None
Let's see if it works:
$ python ptpe.py
doing: 0
ok: [0]
doing: 1
ok: [1]
doing: 2
ok: [4]
Traceback (most recent call last):
File "concurrent/futures/_base.py", line 317, in _result_or_cancel
return fut.result(timeout)
AttributeError: 'NoneType' object has no attribute 'result'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
...
AttributeError: 'NoneType' object has no attribute 'cancel'
Yay, the results are making it to the handler!
The error happens because instead of returning a Future, our submit() returns the result of _submit(), which is always None.
Fine, we'll make our own futures #
But submit() must return a future, so we make our own:
def __init__(self, max_threads=None, initializer=None, initargs=()):
    # ...
    self.__tasks = {}
    # ...

def submit(self, fn, *args, **kwargs):
    outer = concurrent.futures.Future()
    task_id = id(outer)
    self.__tasks[task_id] = outer
    outer.set_running_or_notify_cancel()
    inner = super().submit(_submit, task_id, fn, *args, **kwargs)
    return outer
In order to map results to their futures, we can use a unique identifier; the id() of the outer future should do, since it is unique for the object's lifetime.
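(One subtlety worth spelling out: id() values can be reused once an object is garbage collected, so storing the outer future in self.__tasks is also what keeps its id unique until the result arrives. A quick CPython illustration:)
>>> a = object()
>>> old_id = id(a)
>>> del a
>>> id(object()) == old_id  # often True in CPython: the freed slot gets reused
True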
We pass the id to _submit(), then to _put_result() as an attribute on the future, and finally back in the queue with the result:
def _submit(task_id, fn, *args, **kwargs):
    task = _executor.submit(fn, *args, **kwargs)
    task.task_id = task_id
    task.add_done_callback(_put_result)

def _put_result(task):
    if exception := task.exception():
        _result_queue.put((task.task_id, False, exception))
    else:
        _result_queue.put((task.task_id, True, task.result()))
Back in the result handler, we find the matching future and set the result accordingly:
def __handle_results(self):
    for task_id, ok, result in iter(self.__result_queue.get, None):
        outer = self.__tasks.pop(task_id)
        if ok:
            outer.set_result(result)
        else:
            outer.set_exception(result)
And it works:
$ python ptpe.py
doing: 0
doing: 1
doing: 2
[0, 1, 4]
I mean, it really works:
>>> benchmark(ProcessThreadPoolExecutor(10, initializer=init_client))
elapsed: 6.220
>>> benchmark(ProcessThreadPoolExecutor(20, initializer=init_client))
elapsed: 3.397
>>> benchmark(ProcessThreadPoolExecutor(30, initializer=init_client))
elapsed: 2.575
>>> benchmark(ProcessThreadPoolExecutor(40, initializer=init_client))
elapsed: 2.664
3.3x is not quite the 4 CPUs my laptop has, but it's pretty close, and much better than the 2.2x we got from processes alone.
Death becomes a problem #
I wonder what happens when a worker process dies.
For example, the initializer can fail:
>>> executor = ProcessPoolExecutor(initializer=divmod, initargs=(0, 0))
>>> executor.submit(int).result()
Exception in initializer:
Traceback (most recent call last):
...
ZeroDivisionError: integer division or modulo by zero
Traceback (most recent call last):
...
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
...or a worker can die some time later, which we can help along with a custom timer:8
@contextmanager
def terminate_child(interval=1):
    threading.Timer(interval, psutil.Process().children()[-1].terminate).start()
    yield
>>> executor = ProcessPoolExecutor(initializer=init_client)
>>> benchmark(executor, timer=terminate_child)
[ one second later ]
Traceback (most recent call last):
...
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
Now let's see our executor:
>>> executor = ProcessThreadPoolExecutor(30, initializer=init_client)
>>> benchmark(executor, timer=terminate_child)
[ one second later ]
[ ... ]
[ still waiting ]
[ ... ]
[ hello? ]
If the dead worker is not around to send back results, its futures never get completed, and map() keeps waiting until the end of time. The expected behavior is to detect when this happens and fail all pending tasks with BrokenProcessPool.
Before we do that, though, let's address a more specific issue.
If map() hasn't finished submitting tasks when the worker dies, inner fails with BrokenProcessPool, which right now we're ignoring entirely. While we don't need to do anything about it in particular, because it gets covered by handling the general case, we should still propagate all errors to the outer task anyway.
def submit(self, fn, *args, **kwargs):
    # ...
    inner = super().submit(_submit, task_id, fn, *args, **kwargs)
    inner.task_id = task_id
    inner.add_done_callback(self.__handle_inner)
    return outer

def __handle_inner(self, inner):
    task_id = inner.task_id
    if exception := inner.exception():
        if outer := self.__tasks.pop(task_id, None):
            outer.set_exception(exception)
This fixes the case where a worker dies almost instantly:
>>> executor = ProcessThreadPoolExecutor(30, initializer=init_client)
>>> benchmark(executor, timer=lambda: terminate_child(0))
Traceback (most recent call last):
...
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
For the general case, we need to check if the executor is broken - but how? We've already decided we don't want to depend on internals, so we can't use ProcessPoolExecutor._broken. Maybe we can submit a dummy task and see if it fails instead:
def __check_broken(self):
    try:
        super().submit(int).cancel()
    except concurrent.futures.BrokenExecutor as e:
        return type(e)(str(e))
    except RuntimeError as e:
        if 'shutdown' not in str(e):
            raise
    return None
Using it is a bit involved, but not completely awful:
def __handle_results(self):
    last_broken_check = time.monotonic()
    while True:
        now = time.monotonic()
        if now - last_broken_check >= .1:
            if exc := self.__check_broken():
                break
            last_broken_check = now
        try:
            value = self.__result_queue.get(timeout=.1)
        except queue.Empty:
            continue
        if not value:
            return
        task_id, ok, result = value
        if outer := self.__tasks.pop(task_id, None):
            if ok:
                outer.set_result(result)
            else:
                outer.set_exception(result)
    while self.__tasks:
        try:
            _, outer = self.__tasks.popitem()
        except KeyError:
            break
        outer.set_exception(exc)
When there's a steady stream of results coming in, we don't want to check too often, so we enforce a minimum delay between checks. When there are no results coming in, we want to check regularly, so we use the Queue.get() timeout to avoid waiting forever. If the check fails, we break out of the loop and fail the pending tasks. Like so:
>>> executor = ProcessThreadPoolExecutor(30, initializer=init_client)
>>> benchmark(executor, timer=terminate_child)
Traceback (most recent call last):
...
concurrent.futures.process.BrokenProcessPool: A child process terminated abruptly, the process pool is not usable anymore
So, yeah, I think we're done. Here's the final executor and benchmark code.
Some features left as an exercise for the reader:
- providing a ThreadPoolExecutor initializer
- using other start methods
- shutdown()'s cancel_futures
Learned something new today? Share this with others, it really helps!
Want to know when new articles come out? Subscribe here to get new stuff straight to your inbox!
Bonus: free threading #
You may have heard people being excited about the experimental free threading support added in Python 3.13, which allows running Python code on multiple CPUs.
And for good reason:
$ python3.13t
Python 3.13.2 experimental free-threading build
>>> from concurrent.futures import *
>>> from bench import *
>>> init_client()
>>> benchmark(ThreadPoolExecutor(30))
elapsed: 8.224
>>> benchmark(ThreadPoolExecutor(40))
elapsed: 6.193
>>> benchmark(ThreadPoolExecutor(120))
elapsed: 2.323
3.6x over the GIL version, with none of the shenanigans in this article!
Alas, packages with extensions need to be updated to support it:
>>> import psutil
zsh: segmentation fault python3.13t
...but the ecosystem is slowly catching up.
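If you want to verify what you're running on, the build flag and a CPython-specific runtime check (note that sys._is_gil_enabled() is private) look like this:
$ python3.13t
>>> import sys, sysconfig
>>> sysconfig.get_config_var("Py_GIL_DISABLED")  # 1 on free-threading builds
1
>>> sys._is_gil_enabled()  # False when the GIL is actually off
False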
1. At least, all we can use for pure-Python code. I/O always releases the global interpreter lock, and so do some extension modules. [return]
2. The psutil documentation for memory_full_info() explains the difference quite nicely and links to further resources, because good libraries educate. [return]
3. You may have to run Python as root to get the USS of child processes. [return]
4. And no, asyncio is not a solution, since the event loop runs in a single thread, so you'd still need to run one event loop per CPU in dedicated processes. [return]
5. We could have used composition instead, but then we'd have to implement the full Executor interface, defining each method explicitly to delegate to the inner process pool executor, and keep things up to date when the interface gets new methods (and we'd have no way to trick the inner executor's map() to use our submit(), so we'd have to implement it from scratch). Yet another option would be to use both inheritance and composition - inherit the Executor base class directly for the common methods (assuming they're defined there and not in subclasses), and delegate to the inner executor only where needed (likely just map() and shutdown()). But, the only difference from the current code would be that it'd say self._inner instead of super() in a few places, so it's not really worth it, in my opinion. [return]
6. A previous version of this code attempted to shutdown() the thread pool executor using atexit, but since atexit functions run after non-daemon threads finish, it wasn't actually doing anything. Not shutting it down seems to work for now, but we may still need to do it to support shutdown(cancel_futures=True) properly. [return]
7. Check out nilp0inter/threadedprocess for an idea of what that looks like. [return]
8. pkill -fn '[Pp]ython' would've done it too, but it gets tedious if you do it a lot, and it's a different command on Windows. [return]
07 May 2025 6:00pm GMT
Real Python: How to Use Loguru for Simpler Python Logging
In Python, logging is a vital programming practice that helps you track, understand, and debug your application's behavior. Loguru is a Python library that provides simpler, more intuitive logging compared to Python's built-in logging module.
Good logging gives you insights into your program's execution, helps you diagnose issues, and provides valuable information about your application's health in production. Without proper logging, you risk missing critical errors, spending countless hours debugging blind spots, and potentially undermining your project's overall stability.
By the end of this tutorial, you'll understand that:
- Logging in Python can be simple and intuitive with the right tools.
- Using Loguru lets you start logging immediately without complex configuration.
- You can customize log formats and send logs to multiple destinations like files, the standard error stream, or external services.
- You can implement automatic log rotation and retention policies to manage log files effectively.
- Loguru provides powerful debugging capabilities that make troubleshooting easier.
- Loguru supports structured logging with JSON formatting for modern applications.
After reading this tutorial, you'll be able to quickly implement better logging in your Python applications. You'll spend less time wrestling with logging configuration and more time using logs effectively to debug issues. This will help you build production-ready applications that are easier to troubleshoot when problems occur.
To get the most from this tutorial, you should be familiar with Python concepts like functions, decorators, and context managers. You might also find it helpful to have some experience with Python's built-in logging module, though this isn't required.
Don't worry if you're new to logging in Python. This tutorial will guide you through everything you need to know to get started with Loguru and implement effective logging in your applications.
You'll do parts of the coding for this tutorial in the Python standard REPL, and some other parts with Python scripts. You'll find full script examples in the materials of this tutorial. You can download these scripts by clicking the link below:
Get Your Code: Click here to download the free sample code that shows you how to use Loguru for simpler Python logging.
Installing Loguru
Loguru is available on PyPI, and you can install it with pip. Open a terminal or command prompt, create a new virtual environment, and then install the library:
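(venv) $ python -m pip install loguru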
This command will install the latest version of Loguru from Python Package Index (PyPI) onto your machine.
Verifying the Installation
To verify that the installation was successful, start a Python REPL:
(venv) $ python
Next, import Loguru:
>>> import loguru
If the import runs without error, then you've successfully installed Loguru and can now use it to log messages in your Python programs and applications.
Understanding Basic Setup Considerations
Before diving into Loguru's features, there are a few key points to keep in mind:
- Single Logger Instance: Unlike Python's built-in logging module, Loguru uses a single logger instance. You don't need to create multiple loggers, just import the pre-configured logger object:
from loguru import logger
- Default Configuration: Out of the box, Loguru logs to stderr with a reasonable default format. This means you can start logging immediately without any setup.
- Python Version Compatibility: Loguru supports Python 3.5 and above.
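In other words, a first log message needs nothing but the import:
>>> from loguru import logger
>>> logger.info("Hello, Loguru!")
The record goes to stderr with a timestamped, colorized default format.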
Now that you understand these basic considerations, you're ready to start logging with Loguru. In the next section, you'll learn about basic logging operations and how to customize them to suit your needs.
Learning the Fundamentals of Logging With Loguru
Read the full article at https://realpython.com/python-loguru/ »
[ Improve Your Python With 🐍 Python Tricks 💌 - Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
07 May 2025 2:00pm GMT
Django Weblog: Django security releases issued: 5.2.1, 5.1.9 and 4.2.21
In accordance with our security release policy, the Django team is issuing releases for Django 5.2.1, Django 5.1.9 and Django 4.2.21. These releases address the security issues detailed below. We encourage all users of Django to upgrade as soon as possible.
CVE-2025-32873: Denial-of-service possibility in strip_tags()
django.utils.html.strip_tags() would be slow to evaluate certain inputs containing large sequences of incomplete HTML tags. This function is used to implement the striptags template filter, which was thus also vulnerable. django.utils.html.strip_tags() now raises a SuspiciousOperation exception if it encounters an unusually large number of unclosed opening tags.
Thanks to Elias Myllymäki for the report.
This issue has severity "moderate" according to the Django security policy.
Affected supported versions
- Django main
- Django 5.2
- Django 5.1
- Django 4.2
Resolution
Patches to resolve the issue have been applied to Django's main, 5.2, 5.1, and 4.2 branches. The patches may be obtained from the following changesets.
CVE-2025-32873: Denial-of-service possibility in strip_tags()
- On the main branch
- On the 5.2 branch
- On the 5.1 branch
- On the 4.2 branch
The following releases have been issued
- Django 5.2.1 (download Django 5.2.1 | 5.2.1 checksums)
- Django 5.1.9 (download Django 5.1.9 | 5.1.9 checksums)
- Django 4.2.21 (download Django 4.2.21 | 4.2.21 checksums)
The PGP key ID used for this release is Natalia Bidart: 2EE82A8D9470983E
General notes regarding security reporting
As always, we ask that potential security issues be reported via private email to security@djangoproject.com, and not via Django's Trac instance, nor via the Django Forum. Please see our security policies for further information.
07 May 2025 2:00pm GMT
John Cook: Converting between quaternions and rotation matrices
In the previous post I wrote about representing rotations with quaternions. This representation has several advantages, such as making it clear how rotations compose. Rotations are often represented as matrices, and so it's useful to be able to go between the two representations.
A unit-length quaternion (q0, q1, q2, q3) represents a rotation by an angle θ around an axis in the direction of (q1, q2, q3) where cos(θ/2) = q0. The corresponding rotation matrix is given below.
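(The matrix was an image in the original post; the version below is reconstructed from the Python code further down:)
$$R = \begin{pmatrix} 2(q_0^2+q_1^2)-1 & 2(q_1q_2-q_0q_3) & 2(q_1q_3+q_0q_2) \\ 2(q_1q_2+q_0q_3) & 2(q_0^2+q_2^2)-1 & 2(q_2q_3-q_0q_1) \\ 2(q_1q_3-q_0q_2) & 2(q_2q_3+q_0q_1) & 2(q_0^2+q_3^2)-1 \end{pmatrix}$$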
Going the other way around, inferring a quaternion representation from a rotation matrix, is harder. Here is a mathematically correct but numerically suboptimal method known [1] as the Chiaverini-Siciliano method.
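(Also an image originally; reconstructed from the code in the Python section below:)
$$\begin{aligned} q_0 &= \tfrac{1}{2}\sqrt{1+r_{11}+r_{22}+r_{33}} \\ q_1 &= \tfrac{1}{2}\sqrt{1+r_{11}-r_{22}-r_{33}}\,\operatorname{sgn}(r_{32}-r_{23}) \\ q_2 &= \tfrac{1}{2}\sqrt{1-r_{11}+r_{22}-r_{33}}\,\operatorname{sgn}(r_{13}-r_{31}) \\ q_3 &= \tfrac{1}{2}\sqrt{1-r_{11}-r_{22}+r_{33}}\,\operatorname{sgn}(r_{21}-r_{12}) \end{aligned}$$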
Here sgn is the sign function; sgn(x) equals 1 if x is positive and −1 if x is negative. Note that the components only depend on the diagonal of the rotation matrix, aside from the sign terms. Better numerical algorithms make more use of the off-diagonal elements.
Accounting for degrees of freedom
Something seems a little suspicious here. Quaternions contain four real numbers, and 3 by 3 matrices contain nine. How can four numbers determine nine numbers? And going the other way, out of the nine, we essentially choose three that determine the four components of a quaternion.
Quaternions have four degrees of freedom, but we're using unit quaternions, so there are basically three degrees of freedom. Likewise, orthogonal matrices have three degrees of freedom. An axis of rotation is a point on a sphere, so that has two degrees of freedom, and the degree of rotation is the third degree of freedom.
In topological terms, the unit quaternions and the set of 3 by 3 orthogonal matrices are both three dimensional manifolds, and the former is a double cover of the latter. It is a double cover because a unit quaternion q corresponds to the same rotation as −q.
Python code
Implementing the equations above is straightforward.
import numpy as np

def quaternion_to_rotation_matrix(q):
    q0, q1, q2, q3 = q
    return np.array([
        [2*(q0**2 + q1**2) - 1, 2*(q1*q2 - q0*q3), 2*(q1*q3 + q0*q2)],
        [2*(q1*q2 + q0*q3), 2*(q0**2 + q2**2) - 1, 2*(q2*q3 - q0*q1)],
        [2*(q1*q3 - q0*q2), 2*(q2*q3 + q0*q1), 2*(q0**2 + q3**2) - 1]
    ])

def rotation_matrix_to_quaternion(R):
    r11, r12, r13 = R[0, 0], R[0, 1], R[0, 2]
    r21, r22, r23 = R[1, 0], R[1, 1], R[1, 2]
    r31, r32, r33 = R[2, 0], R[2, 1], R[2, 2]

    # Calculate quaternion components
    q0 = 0.5 * np.sqrt(1 + r11 + r22 + r33)
    q1 = 0.5 * np.sqrt(1 + r11 - r22 - r33) * np.sign(r32 - r23)
    q2 = 0.5 * np.sqrt(1 - r11 + r22 - r33) * np.sign(r13 - r31)
    q3 = 0.5 * np.sqrt(1 - r11 - r22 + r33) * np.sign(r21 - r12)

    return np.array([q0, q1, q2, q3])
Random testing
We'd like to test the code above by generating random quaternions, converting the quaternions to rotation matrices, then back to quaternions to verify that the round trip puts us back essentially where we started. Then we'd like to go the other way around, starting with randomly generated rotation matrices.
To generate a random unit quaternion, we generate a vector of four independent normal random values, then normalize by dividing by its length. (See this recent post.)
To generate a random rotation matrix, we use a generator that is part of SciPy.
Here's the test code:
# norm and special_ortho_group come from SciPy (imports added here for completeness)
from scipy.stats import norm, special_ortho_group

def randomq():
    q = norm.rvs(size=4)
    return q/np.linalg.norm(q)

def randomR():
    return special_ortho_group.rvs(dim=3)

np.random.seed(20250507)
N = 10

for _ in range(N):
    q = randomq()
    R = quaternion_to_rotation_matrix(q)
    t = rotation_matrix_to_quaternion(R)
    print(np.linalg.norm(q - t))

for _ in range(N):
    R = randomR()
    q = rotation_matrix_to_quaternion(R)
    T = quaternion_to_rotation_matrix(q)
    print(np.linalg.norm(R - T))
The first test utterly fails, returning six 2s, i.e. the round trip vector is as far as possible from the vector we started with. How could that happen? It must be returning the negative of the original vector. Now go back to the discussion above about double covers: q and −q correspond to the same rotation.
If we go back and add the line
q *= np.sign(q[0])
then we standardize our random vectors to have a positive first component, just like the vectors returned by rotation_matrix_to_quaternion.
Now our tests all return norms on the order of 10⁻¹⁶ to 10⁻¹⁴. There's a little room to improve the accuracy, but the results are good.
Update: I did some more random testing, and found errors on the order of 10⁻¹⁰. Then I was able to create a test case where rotation_matrix_to_quaternion threw an exception because one of the square roots had a negative argument. In [1] the authors get around this problem by evaluating two theoretically equivalent expressions for each of the square root arguments. The expressions are complementary in the sense that both should not lead to numerical difficulties at the same time.
[1] See "Accurate Computation of Quaternions from Rotation Matrices" by Soheil Sarabandi and Federico Thomas for a better numerical algorithm. See also the article "A Survey on the Computation of Quaternions From Rotation Matrices" by the same authors.
The post Converting between quaternions and rotation matrices first appeared on John D. Cook.
07 May 2025 1:52pm GMT
Python Insider: Python 3.14.0 beta 1 is here!
Only one day late, welcome to the first beta!
https://www.python.org/downloads/release/python-3140b1/
This is a beta preview of Python 3.14
Python 3.14 is still in development. This release, 3.14.0b1, is the first of four planned beta releases.
Beta release previews are intended to give the wider community the opportunity to test new features and bug fixes and to prepare their projects to support the new feature release.
We strongly encourage maintainers of third-party Python projects to test with 3.14 during the beta phase and report issues found to the Python bug tracker as soon as possible. While the release is planned to be feature-complete entering the beta phase, it is possible that features may be modified or, in rare cases, deleted up until the start of the release candidate phase (Tuesday 2025-07-22). Our goal is to have no ABI changes after beta 4 and as few code changes as possible after the first release candidate. To achieve that, it will be extremely important to get as much exposure for 3.14 as possible during the beta phase.
Please keep in mind that this is a preview release and its use is not recommended for production environments.
Major new features of the 3.14 series, compared to 3.13
Some of the major new features and changes in Python 3.14 are:
New features
- PEP 649: The evaluation of type annotations is now deferred, improving the semantics of using annotations.
- PEP 750: Template string literals (t-strings) for custom string processing, using the familiar syntax of f-strings.
- PEP 784: A new module compression.zstd providing support for the Zstandard compression algorithm.
- PEP 758: except and except* expressions may now omit the brackets.
- Syntax highlighting in PyREPL, and support for color in unittest, argparse, json and calendar CLIs.
- PEP 768: A zero-overhead external debugger interface for CPython.
- UUID versions 6-8 are now supported by the uuid module, and generation of versions 3-5 and 8 are up to 40% faster.
- PEP 765: Disallow return/break/continue that exit a finally block.
- PEP 741: An improved C API for configuring Python.
- A new type of interpreter. For certain newer compilers, this interpreter provides significantly better performance. Opt-in for now, requires building from source.
- Improved error messages.
- Builtin implementation of HMAC with formally verified code from the HACL* project.
(Hey, fellow core developer, if a feature you find important is missing from this list, let Hugo know.)
For more details on the changes to Python 3.14, see What's new in Python 3.14. The next pre-release of Python 3.14 will be 3.14.0b2, scheduled for 2025-05-27.
Build changes
- PEP 761: Python 3.14 and onwards no longer provides PGP signatures for release artifacts. Instead, Sigstore is recommended for verifiers.
- Official macOS and Windows release binaries include an experimental JIT compiler.
Incompatible changes, removals and new deprecations
- Incompatible changes
- Python removals and deprecations
- C API removals and deprecations
- Overview of all pending deprecations
Python install manager
The installer we offer for Windows is being replaced by our new install manager, which can be installed from the Windows Store or our FTP page. See our documentation for more information. The JSON file available for download below contains the list of all the installable packages available as part of this release, including file URLs and hashes, but is not required to install the latest release. The traditional installer will remain available throughout the 3.14 and 3.15 releases.
More resources
- Online documentation
- PEP 745, 3.14 Release Schedule
- Report bugs at github.com/python/cpython/issues
- Help fund Python and its community
Note
During the release process, we discovered a test that only failed when run sequentially and only when run after a certain number of other tests. This appears to be a problem with the test itself, and we will make it more robust for beta 2. For details, see python/cpython#133532.
And now for something completely different
The mathematical constant pi is represented by the Greek letter π and represents the ratio of a circle's circumference to its diameter. The first person to use π as a symbol for this ratio was Welsh self-taught mathematician William Jones in 1706. He was a farmer's son born in Llanfihangel Tre'r Beirdd on Anglesey (Ynys Môn) in 1675 and only received a basic education at a local charity school. However, the owner of his parents' farm noticed his mathematical ability and arranged for him to move to London to work in a bank.
By age 20, he served at sea in the Royal Navy, teaching sailors mathematics and helping with the ship's navigation. On return to London seven years later, he became a maths teacher in coffee houses and a private tutor. In 1706, Jones published Synopsis Palmariorum Matheseos which used the symbol π for the ratio of a circle's circumference to diameter (hunt for it on pages 243 and 263 or here). Jones was also the first person to realise π is an irrational number, meaning it can be written as a decimal number that goes on forever without repeating, but cannot be written as a fraction of two integers.
But why π? It's thought Jones used the Greek letter π because it's the first letter in perimetron or perimeter. Jones was the first to use π for our familiar ratio but wasn't the first to use it as part of a ratio. William Oughtred, in his 1631 Clavis Mathematicae (The Key of Mathematics), used π/δ to represent what we now call pi. His π was the circumference, not the ratio of circumference to diameter. James Gregory, in his 1668 Geometriae Pars Universalis (The Universal Part of Geometry) used π/ρ instead, where ρ is the radius, making the ratio 6.28… or τ. After Jones, Leonhard Euler had used π for 6.28…, and also p for 3.14…, before settling on and popularising π for the famous ratio.
Enjoy the new release
Thanks to all of the many volunteers who help make Python Development and these releases possible! Please consider supporting our efforts by volunteering yourself or through organisation contributions to the Python Software Foundation.
Regards from Helsinki as the leaves begin to appear on the trees,
Your release team,
Hugo van Kemenade
Ned Deily
Steve Dower
Łukasz Langa
07 May 2025 1:43pm GMT
Daniel Roy Greenfeld: TIL: ^ bitwise XOR
How to mark a comparison of booleans as True or False using bitwise XOR.
07 May 2025 3:21am GMT
06 May 2025
Planet Python
PyCoder’s Weekly: Issue #680: Thread Safety, Pip 25.1, DjangoCon EU Wrap-Up, and More (May 6, 2025)
#680 - MAY 6, 2025
View in Browser »
Thread Safety in Python: Locks and Other Techniques
In this video course, you'll learn about the issues that can occur when your code is run in a multithreaded environment. Then you'll explore the various synchronization primitives available in Python's threading module, such as locks, which help you make your code safe.
REAL PYTHON course
What's New in Pip 25.1
pip 25.1 introduces support for Dependency Groups (PEP 735), resumable downloads, and an installation progress bar. Dependency resolution has also received a raft of bugfixes and improvements.
RICHARD SI
Deploy Your Streamlit, Dash, Bokeh Apps all in one Place
Posit Connect Cloud is a cloud environment for showcasing your Python apps, no matter the framework.
POSIT sponsor
Takeaways From DjangoCon EU 2025
A deep summary of concepts that Zach learned at DjangoCon EU. For more content, also see Sumit's post about his talk.
ZACH BELLAY
Articles & Tutorials
Modern Web Automation With Python and Selenium
Learn advanced Python web automation techniques with Selenium, such as headless browsing, interacting with web elements, and implementing the Page Object Model pattern.
REAL PYTHON
Quiz: Web Automation With Python and Selenium
In this quiz, you'll test your understanding of using Selenium with Python for web automation. You'll revisit concepts like launching browsers, interacting with web elements, handling dynamic content, and implementing the Page Object Model (POM) design pattern.
REAL PYTHON
Using JWTs in Python Flask REST Framework
"JSON Web Tokens (JWTs) secure communication between parties over the internet by authenticating users and transmitting information securely, without requiring a centralized storage system." This article shows you how they work using a to-do list API in Flask.
FEDERICO TROTTA • Shared by AppSignal
The PyArrow Revolution
Pandas is built on NumPy, but changes are coming to allow the optional use of PyArrow. Talk Python interviews Reuven Lerner and they talk about what this means and how it will improve performance.
KENNEDY & LERNER podcast
Quirks in Django's Template Language
Lily has been porting the Django template language into Rust and along the way has found some weird corner cases and some bugs. This post talks about those discoveries.
LILY F
PyXL: Python, on Hardware
PyXL is a custom chip that runs compiled Python ByteCode directly in hardware. Designed for real-time and embedded systems where Python was never fast enough-until now.
RUNPYXL.COM
Debugging Python f-string Errors
Brandon encountered a TypeError when using a variable inside an f-string, even though the variable converted with str() just fine. This post talks about what happened and why.
BRANDON CHINN
Managing Python Projects With uv
In this tutorial, you'll learn how to create and manage your Python projects using uv, an extremely fast Python package and project manager written in Rust.
REAL PYTHON
Top Python Code Quality Tools
This guide covers a list of tools that can help you produce higher quality Python code. It includes linters, code formatters, type checkers, and much more.
MEENAKSHI AGARWAL
Quiz: Managing Python Projects With uv
In this quiz, you'll test your understanding of the uv tool, a high-speed package and project manager for Python.
REAL PYTHON
An Introduction to Testing in Python Flask
Like with any other library, when writing with Flask you should be writing tests. This article shows you how.
FREDERICO TROTTA
PSF Names New Deputy Executive Director
Loren Crary has been promoted to Deputy Executive Director of the Python Software Foundation.
PYTHON SOFTWARE FOUNDATION
Projects & Code
pip-Dev: Interactive Tool for Testing Python Version Specifiers
NOK.GITHUB.IO • Shared by Darius Morawiec
Events
Python Atlanta
May 8 to May 9, 2025
MEETUP.COM
Python Communities
May 10 to May 11, 2025
NOKIDBEHIND.ORG
DFW Pythoneers 2nd Saturday Teaching Meeting
May 10, 2025
MEETUP.COM
PiterPy Meetup
May 13, 2025
PITERPY.COM
PyCon US 2025
May 14 to May 23, 2025
PYCON.ORG
Happy Pythoning!
This was PyCoder's Weekly Issue #680.
View in Browser »
[ Subscribe to 🐍 PyCoder's Weekly 💌 - Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]
06 May 2025 7:30pm GMT
Ari Lamstein: Course Review: Build an AI chatbot with Python
For a while now I've been wanting to learn more about LLMs. The problem has been that I wasn't sure where to start.
So when Kevin Markham launched his course Build an AI chatbot with Python I jumped at the chance to take it. I had previously taken Kevin's course on Pandas and enjoyed his teaching style. Build an AI chatbot with Python is short (Kevin says you can finish it in an hour, although I took longer) and cheap ($9).
The course starts with the very basics: creating an API key on OpenAI and installing the necessary packages. It ends with using LangChain and LangGraph to create a simple bot that has memory and can keep track of conversations with multiple users. Here's an example:
Here you can see that Chatbot #1 learned that my name is Ari. I then terminated that bot and created another one. That new bot (#2) did not know my name. I then terminated it and reloaded bot #1. Bot #1 still remembered my name.
Due to its length, the course doesn't teach you how to build anything more complex than that. But if you are just looking for a brief introduction to the field, then this might be exactly what you are looking for. It certainly was for me!
Kevin is currently working on a followup course ("Build AI agents with Python") which I am currently reviewing. If people are interested, I can post a review of that course when I finish it as well. You can use this form to contact me and let me know if you are interested in that.
06 May 2025 4:10pm GMT
Real Python: Using the Python subprocess Module
Python's subprocess module allows you to run shell commands and manage external processes directly from your Python code. By using subprocess, you can execute shell commands like ls or dir, launch applications, and handle both input and output streams. This module provides tools for error handling and process communication, making it a flexible choice for integrating command-line operations into your Python projects.
By the end of this video course, you'll understand that:
- The Python subprocess module is used to run shell commands and manage external processes.
- You run a shell command using subprocess by calling subprocess.run() with the command as a list of arguments.
- subprocess.call(), subprocess.run(), and subprocess.Popen() differ in how they execute commands and handle process output and return codes.
- multiprocessing is for parallel execution within Python, while subprocess manages external processes.
- To execute multiple commands in sequence using subprocess, you can chain them by using pipes or running them consecutively.
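For instance, a minimal run() call looks like this (a sketch; echo stands in for any command):
>>> import subprocess
>>> result = subprocess.run(["echo", "hello"], capture_output=True, text=True)
>>> result.stdout
'hello\n'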
[ Improve Your Python With 🐍 Python Tricks 💌 - Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
06 May 2025 2:00pm GMT
Python Software Foundation: Announcing Python Software Foundation Fellow Members for Q1 2025! 🎉
The PSF is pleased to announce its first batch of PSF Fellows for 2025! Let us welcome the new PSF Fellows for Q1! The following people continue to do amazing things for the Python community:
Aidis Stukas
Website, GitHub, LinkedIn, X(Twitter)
Baptiste Mispelon
Charlie Marsh
Felipe de Morais
Frank Wiles
Ivy Fung Oi Wei
Jon Banafato
Julia Duimovich
Leandro Enrique Colombo Viña
X(Twitter), GitHub, LinkedIn, Instagram
Mike Pirnat
Sage Sharp
Tereza Iofciu
Website, GitHub, Bluesky, Mastodon, LinkedIn
Velda Kiara
Website, LinkedIn, X(Twitter), Mastodon, Bluesky, GitHub
Thank you for your continued contributions. We have added you to our Fellows Roster.
The above members help support the Python ecosystem by being phenomenal leaders, sustaining the growth of the Python scientific community, maintaining virtual Python communities, maintaining Python libraries, creating educational material, organizing Python events and conferences, starting Python communities in local regions, and overall being great mentors in our community. Each of them continues to help make Python more accessible around the world. To learn more about the new Fellow members, check out their links above.
Let's continue recognizing Pythonistas all over the world for their impact on our community. The criteria for Fellow members is available on our PSF Fellow Membership page. If you would like to nominate someone to be a PSF Fellow, please send a description of their Python accomplishments and their email address to psf-fellow at python.org. Quarter 2 nominations will be in review soon. We are accepting nominations for Quarter 2 of 2025 through May 20th, 2025.
Are you a PSF Fellow and want to help the Work Group review nominations? Contact us at psf-fellow at python.org.
06 May 2025 12:13pm GMT
05 May 2025
Planet Python
PyCon: Asking the Key Questions: Q&A with the PyCon US 2025 keynote speakers
Get to know the all-star lineup of PyCon US 2025 keynote speakers. They've graciously answered our questions, and shared some conference advice plus tidbits of their backstories-from rubber ducks to paper towel printing to Pac-Man. Read along and get excited to see them live as we count down to the event!
How did you get started in tech/Python? Did you have a friend or a mentor that helped you?
CORY DOCTOROW: My father was a computer scientist so we grew up with computers in the house. Our first "computer" was a Cardiac cardboard computer (CARDboard Illustrative Aid to Computation) that required a human to move little tokens around in slots: https://en.wikipedia.org/wiki/CARDboard_Illustrative_Aid_to_Computation
Then in the late seventies, when I was 6-7, we got a teletype terminal and an acoustic coupler that we could use to connect to a PDP-11 at the University of Toronto. However, my computing was limited by how much printer-paper we had for the teletype. Luckily, my mother was a kindergarten teacher and she was able to bring home 1,000' rolls of paper towel from the kids' bathrooms. I'd print up one side of them, then reverse the roll and print down the other side, and then, finally, I'd re-roll-up the paper so my mom could take the paper into school for the kids to dry their hands on.
LYNN ROOT: I started in 2011, learning how to code through an online intro to CS course. It was awful - who thinks C is a good first language? I failed both midterms (failed as in, "here's a D, be thankful for the grading curve"), but somehow finished the course with an A- because I learned Python for my final project. After that experience, I had to learn more, but didn't want to go through a "proper" degree program. It's actually how PyLadies SF got started: I wanted friends to learn to program with, so I figured - why not invite other like-minded people to join me!
I did (and still do) have a mentor - I definitely wouldn't be where I am today without the guidance and patience of Hynek Schlawack, who also happens to be my best friend ( hi bestiee ). He's been there since the very beginning, and I hope someday I can repay him. I do try to pay it forward with mentoring women who are early in their careers. Everyone deserves a Hynek!
TOM MEAGHER: As a journalist, I've had no formal training in programming. Most of what I have learned - including Python and pandas and Django and other tools for data analysis and investigative reporting - has come through my connection to the organization Investigative Reporters and Editors. IRE is a wonderful community of really generous journalists from around the world who teach one another new techniques and support each other in our projects.
GEOFF HING: I studied computer science and engineering as an undergrad. Python was really emerging as a language at that point, but a few years later, it was fully the "get stuff done" language among a lot of people around me. I really benefited from people I worked with being generous with their time in explaining code bases I worked with.
DR. KARI L. JORDAN: I was introduced to tech/Python when I began working for Data Carpentry back in 2016. Before then, you didn't know what I was doing to analyze my data!
What do you think the most important work you've ever done is? Or if you think it might still be in the future, can you tell us something about your plans?
CORY DOCTOROW: I have no idea - I think this is something that can only be done in retrospect. For example, I worked on an obscure but very important fight over something called the "Broadcast Flag" that would have banned GNU Radio and all other free software defined radios outright, and would have required all PC hardware to be certified by an entertainment industry committee as "piracy proof." That may yet turn out to be very important, or it may be that the work I'm doing now on antitrust - which seems likely to result in the breakup of Google and Meta - will be more important.
Have you been to PyCon US before? What are you looking forward to?
Do you have any advice for first-time conference goers?
Can you tell us about an open source or open culture project that you think not enough people know about?
05 May 2025 5:19pm GMT
Real Python: Sets in Python
Python provides a built-in set data type. It differs from other built-in data types in that it's an unordered collection of unique elements. It also supports operations that differ from those of other data types. You might recall learning about sets and set theory in math class. Maybe you even remember Venn diagrams:

In mathematics, the definition of a set can be abstract and difficult to grasp. In practice, you can think of a set as a well-defined collection of unique objects, typically called elements or members. Grouping objects in a set can be pretty helpful in programming. That's why Python has sets built into the language.
By the end of this tutorial, you'll understand that:
- A set is an unordered collection of unique, hashable elements.
- The set() constructor works by converting any iterable into a set, removing duplicate elements in the process.
- You can initialize a set using literals, the set() constructor, or comprehensions.
In this tutorial, you'll dive deep into the features of Python sets and explore topics like set creation and initialization, common set operations, set manipulation, and more.
Get Your Code: Click here to download the free sample code that shows you how to work with sets in Python.
Take the Quiz: Test your knowledge with our interactive "Python Sets" quiz. You'll receive a score upon completion to help you track your learning progress:
Interactive Quiz
Python Sets: In this quiz, you'll assess your understanding of Python's built-in set data type. You'll revisit the definition of unordered, unique, hashable collections, how to create and initialize sets, and key set operations.
Getting Started With Python's set Data Type
Python's built-in set data type is a mutable and unordered collection of unique and hashable elements. In this definition, the qualifiers mean the following:
- Mutable: You can add or remove elements from an existing set.
- Unordered: A set doesn't maintain any particular order of its elements.
- Unique elements: Duplicate elements aren't allowed.
- Hashable elements: Each element must have a hash value that stays the same for its entire lifetime.
As with other mutable data types, you can modify sets by increasing or decreasing their size or number of elements. To this end, sets provide a series of handy methods that allow you to add and remove elements to and from an existing set.
The elements of a set must be unique. This feature makes sets especially useful in scenarios where you need to remove duplicate elements from an existing iterable, such as a list or tuple:
>>> numbers = [1, 2, 2, 2, 3, 4, 5, 5]
>>> set(numbers)
{1, 2, 3, 4, 5}
In practice, removing duplicate items from an iterable might be one of the most useful and commonly used features of sets.
Python implements sets as hash tables. A great feature of hash tables is that they make lookup operations almost instantaneous. Because of this, sets are exceptionally efficient in membership operations with the in and not in operators.
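For example, membership checks stay fast no matter how large the set grows:
>>> 3 in {1, 2, 3}
True
>>> 7 not in {1, 2, 3}
True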
Finally, Python sets support common set operations, such as union, intersection, difference, symmetric difference, and others. This feature makes them useful when you need to do some of the following tasks:
- Find common elements in two or more sets
- Find differences between two or more sets
- Combine multiple sets together while avoiding duplicates
As you can see, set is a powerful data type with characteristics that make it useful in many contexts and situations. Throughout the rest of this tutorial, you'll learn more about the features that make sets a worthwhile addition to your programming toolkit.
Building Sets in Python
To use a set, you first need to create it. You'll have different ways to build sets in Python. For example, you can create them using one of the following techniques:
- Set literals
- The set() constructor
- A set comprehension
In the following sections, you'll learn how to use the three approaches listed above to create new sets in Python. You'll start with set literals.
Creating Sets Through Literals
You can define a new set by providing a comma-separated series of hashable objects within curly braces {} as shown below:
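>>> digits = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}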
Read the full article at https://realpython.com/python-sets/ »
[ Improve Your Python With 🐍 Python Tricks 💌 - Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
05 May 2025 2:00pm GMT
Talk Python to Me: #504: Developer Trends in 2025
What trends and technologies should you be paying attention to today? Are there hot new database servers you should check out? Or will that just be a flash in the pan? I love these forward looking episodes and this one is super fun. I've put together an amazing panel: Gina Häußge, Ines Montani, Richard Campbell, and Calvin Hendryx-Parker. We dive into the recent Stack Overflow Developer survey results as a sounding board for our thoughts on rising and falling trends in the Python and broader developer space.
Episode sponsors: NordLayer (talkpython.fm/nordlayer) · Auth0 (talkpython.fm/auth0) · Talk Python Courses (talkpython.fm/training)
Links from the show:
- The Stack Overflow Survey Results: survey.stackoverflow.co/2024
- Panelists: Gina Häußge: chaos.social/@foosel · Ines Montani: ines.io · Richard Campbell: about.me/richard.campbell · Calvin Hendryx-Parker: github.com/calvinhp
- Explosion: explosion.ai · spaCy: spacy.io · OctoPrint: octoprint.org · .NET Rocks: dotnetrocks.com · Six Feet Up: sixfeetup.com · Stack Overflow: stackoverflow.com · Python.org: python.org
- GitHub Copilot: github.com/features/copilot · OpenAI ChatGPT: chat.openai.com · Claude: anthropic.com · LM Studio: lmstudio.ai · Hetzner: hetzner.com · Docker: docker.com
- Aider Chat: github.com/paul-gauthier/aider · Codename Goose AI: block.github.io/goose/ · IndyPy: indypy.org · OctoPrint Community Forum: community.octoprint.org · spaCy GitHub: github.com/explosion/spaCy · Hugging Face: huggingface.co
- Watch this episode on YouTube: youtube.com/watch?v=6VZEJ8FstEQ · Episode transcripts: talkpython.fm/episodes/transcript/504/developer-trends-in-2025
05 May 2025 8:00am GMT
Python Bytes: #431 Nerd Gas
<strong>Topics covered in this episode:</strong><br> <ul> <li><strong><a href="https://github.com/RafaelWO/pirel?featured_on=pythonbytes"> pirel: Python release cycle in your terminal</a></strong></li> <li><a href="https://fastapicloud.com?featured_on=pythonbytes"><strong>FastAPI Cloud</strong></a></li> <li><strong><a href="https://davepeck.org/2025/04/11/pythons-new-t-strings/?featured_on=pythonbytes">Python's new t-strings</a></strong></li> <li><strong>Extras</strong></li> <li><strong>Joke</strong></li> </ul><a href='https://www.youtube.com/watch?v=WaWjUlgWpBo' style='font-weight: bold;'data-umami-event="Livestream-Past" data-umami-event-episode="431">Watch on YouTube</a><br> <p><strong>About the show</strong></p> <p>Sponsored by <strong>NordLayer</strong>: <a href="https://pythonbytes.fm/nordlayer"><strong>pythonbytes.fm/nordlayer</strong></a></p> <p><strong>Connect with the hosts</strong></p> <ul> <li>Michael: <a href="https://fosstodon.org/@mkennedy"><strong>@mkennedy@fosstodon.org</strong></a> <strong>/</strong> <a href="https://bsky.app/profile/mkennedy.codes?featured_on=pythonbytes"><strong>@mkennedy.codes</strong></a> <strong>(bsky)</strong></li> <li>Brian: <a href="https://fosstodon.org/@brianokken"><strong>@brianokken@fosstodon.org</strong></a> <strong>/</strong> <a href="https://bsky.app/profile/brianokken.bsky.social?featured_on=pythonbytes"><strong>@brianokken.bsky.social</strong></a></li> <li>Show: <a href="https://fosstodon.org/@pythonbytes"><strong>@pythonbytes@fosstodon.org</strong></a> <strong>/</strong> <a href="https://bsky.app/profile/pythonbytes.fm"><strong>@pythonbytes.fm</strong></a> <strong>(bsky)</strong></li> </ul> <p>Join us on YouTube at <a href="https://pythonbytes.fm/stream/live"><strong>pythonbytes.fm/live</strong></a> to be part of the audience. Usually <strong>Monday</strong> at 10am PT. Older video versions available there too.</p> <p>Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form? Add your name and email to <a href="https://pythonbytes.fm/friends-of-the-show">our friends of the show list</a>, we'll never share it.</p> <p><strong>Michael #1:</strong><a href="https://github.com/RafaelWO/pirel?featured_on=pythonbytes"> pirel: Python release cycle in your terminal</a></p> <ul> <li>pirel check shows release information about your active Python interpreter.</li> <li>If the active version is end-of-life, the program exits with code 1. If no active Python interpreter is found, the program exits with code 2.</li> <li>pirel list lists all Python releases in a table. 
Your active Python interpreter is highlighted.</li> <li>A picture is worth many words</li> </ul> <p><img src="https://blobs.pythonbytes.fm/pirel-cli-demo.gif" alt="" /></p> <p><strong>Brian #2:</strong> <a href="https://fastapicloud.com?featured_on=pythonbytes"><strong>FastAPI Cloud</strong></a></p> <ul> <li>Sebastián Ramírez, creator of FastAPI, <a href="https://bsky.app/profile/tiangolo.com/post/3lognxjvw4225?featured_on=pythonbytes">announced today</a> the formation of a new Company, FastAPI Cloud.</li> <li>Here's the announcement blog post: <a href="https://fastapicloud.com/blog/fastapi-cloud-by-the-same-team-behind-fastapi?featured_on=pythonbytes">FastAPI Cloud - By The Same Team Behind FastAPI</a></li> <li>There's a wait list to try it out.</li> <li>Promises to turns deployment into fastapi login; fastapi deploy</li> <li>Side note: announcement includes quote from Daft Punk: Build Harder, Better, Faster, Stronger <ul> <li>I just included this in a talk I'm gave last week (and will again next week), where I modify this to "Build Easier, Better, Faster, Stronger"</li> <li>Sebastian and I are both fans of the rocket emoji.</li> </ul></li> <li>BTW, <a href="https://pythonbytes.fm/episodes/show/123/time-to-right-the-py-wrongs">we first covered FastAPI on episode 123 in 2019</a></li> </ul> <p><strong>Brian #3:</strong> <a href="https://davepeck.org/2025/04/11/pythons-new-t-strings/?featured_on=pythonbytes">Python's new t-strings</a></p> <ul> <li>Dave Peck, one of the authors of PEP 750, which will go into Python 3.14</li> <li>We covered t-strings in <a href="https://pythonbytes.fm/episodes/show/428/how-old-is-your-python">ep 428</a></li> <li>In article <ul> <li>t-strings security benefits over f-strings</li> <li>How to work with t-strings</li> <li>A Pig Latin example <ul> <li>Also, I think I have always done this wrong</li> <li>Is it the first consonant to the end? or the first consonant cluster?</li> <li>So… Brian → Rianbay? 
or Ianbray?</li> <li>BTW, this is an example of nerdgassing</li> </ul></li> <li>What's next once t-strings ship?</li> </ul></li> <li>One thing that's next (in Python 3.15, maybe) is using t-strings in shlex and subprocess <ul> <li><a href="https://peps.python.org/pep-0787/?featured_on=pythonbytes">PEP 787 - Safer subprocess usage using t-strings</a> deferred to 3.15</li> </ul></li> </ul> <p><strong>Michael #4</strong>: <a href="https://github.com/dtnewman/zev?featured_on=pythonbytes">zev</a></p> <ul> <li>A simple CLI tool to help you remember terminal commands.</li> <li><p>Examples:</p> <pre><code># Find running processes
zev 'show all running python processes'

# File operations
zev 'find all .py files modified in the last 24 hours'

# System information
zev 'show disk usage for current directory'

# Network commands
zev 'check if google.com is reachable'

# Git operations
zev 'show uncommitted changes in git'
</code></pre></li> <li><p>Again, picture worth many words:</p></li> </ul> <p><img src="https://blobs.pythonbytes.fm/zev-demo.gif" alt="" /></p> <p><strong>Extras</strong> </p> <p>Brian:</p> <ul> <li><a href="https://arstechnica.com/culture/2025/04/monty-python-and-the-holy-grail-turns-50/?featured_on=pythonbytes">Holy Grail turns 50</a></li> <li><a href="https://whatever.scalzi.com/2008/06/03/nerdgassing-i-coin-this-word-in-the-name-of-humanity/?featured_on=pythonbytes">nerdgassing</a></li> </ul> <p>Michael:</p> <ul> <li>Transcripts are a bit better now.</li> <li>Zen <a href="https://zen-browser.app/release-notes/#1.12.1b">is better now</a></li> </ul> <p><strong>Joke:</strong> <a href="https://x.com/PR0GRAMMERHUM0R/status/1915103409062978033?featured_on=pythonbytes">Can my friend come in?</a></p>
05 May 2025 8:00am GMT
Python GUIs: Build an Image Noise Reduction Tool with Streamlit and OpenCV — Clean up noisy images using OpenCV
Image noise is a random variation of brightness or color in images, which can make it harder to discern finer details in a photo. Noise is an artefact of how the image is captured. In digital photography, sensor electronic noise causes random fuzziness over the true image. It is more noticeable in low light, where the lower signal from the sensor is amplified, amplifying the noise with it. Similar noisy artefacts are also present in analog photos and film, but there it is caused by the film grain. Finally, you can also see noise-like artefacts introduced by lossy compression algorithms such as JPEG.
Noise reduction or denoising improves the visual appearance of a photo and can be an important step in a larger image analysis pipeline. Eliminating noise can make it easier to identify features algorithmically. However, we need to ensure that the denoised image is still an accurate representation of the original capture.
Denoising is a complex topic. Fortunately, several different algorithms are available. In this tutorial, we'll use algorithms from OpenCV and build them into a Streamlit app. The app will allow a user to upload images, choose from common noise reduction algorithms -- Gaussian Blur, Median Blur, Minimum Blur, Maximum Blur, and Non-local Means -- and adjust the strength of the noise reduction using a slider. The user can then download the resulting noise-reduced image.
By the end of this tutorial, you will --
- Learn how to build interactive web applications with Streamlit.
- Understand how to work with images using the Python libraries OpenCV and Pillow.
- Be able to apply noise reduction algorithms to images and allow users to download the processed images in different formats.
There's quite a lot to this example, so we'll break it down into small steps to make sure we understand how everything works.
- Setting Up the Working Environment
- Building the Application Outline
- Uploading an Image with Streamlit
- How Streamlit Works
- Loading and Displaying the Uploaded Image
- Converting the Image for Processing
- Configuring the Noise Reduction Algorithm
- Performing the Noise Reduction
- Non-Local Means Denoising
- Improving the Layout
- Downloading the Denoised Image
- Improving the Code Structure
- Conclusion
Setting Up the Working Environment
In this tutorial, we'll use the Streamlit library to build the noise reduction app's GUI.
To perform the denoising, we'll be using OpenCV. Don't worry if you're not familiar with this library; we'll include working examples you can copy for everything we do.
With that in mind, let's create a virtual environment and install our requirements into it. To do this, you can run the following commands:
macOS and Linux:

$ mkdir denoise/
$ cd denoise
$ python -m venv venv
$ source venv/bin/activate
(venv)$ pip install streamlit opencv-python pillow numpy

Windows:

> mkdir denoise/
> cd denoise
> python -m venv venv
> venv\Scripts\activate.bat
(venv)> pip install streamlit opencv-python pillow numpy
With these commands, you create a denoise/ folder for storing your project. Inside that folder, you create a new virtual environment, activate it, and install Streamlit, OpenCV, Pillow, and NumPy.
For platform-specific troubleshooting, check the Working With Python Virtual Environments tutorial.
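If you don't have a noisy photograph to hand for testing, you can synthesize one. Below is a minimal sketch (not part of the app itself) that adds zero-mean Gaussian noise to a clean photo using NumPy and Pillow; the input.jpg and noisy.jpg filenames are just placeholders.

import numpy as np
from PIL import Image

# Load a clean photo as a float RGB array ("input.jpg" is a placeholder name).
image = np.array(Image.open("input.jpg").convert("RGB")).astype(np.float32)

# Add zero-mean Gaussian noise; sigma controls how strong the noise is.
sigma = 25.0
noisy = image + np.random.normal(0.0, sigma, image.shape)

# Clip back into the valid 0-255 range and save the result.
noisy = np.clip(noisy, 0, 255).astype(np.uint8)
Image.fromarray(noisy).save("noisy.jpg")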
Building the Application Outline
We'll start by constructing a simple Streamlit application and then expand it from there.
import streamlit as st
# Set the title of our app.
st.title("Noise Reduction App")
Save this file as app.py and use the following command to run it:
streamlit run app.py
Streamlit will start up and launch the application in your default web browser.
The Streamlit application title displayed in the browser.
If it doesn't launch by itself, you can see the web address to open in the console.
The Streamlit application launch message showing the local server address where the app can be viewed.
Now that we have the app working, we can step through and build up our app.
Uploading an Image with Streamlit
First, we need a way to upload an image to denoise. Streamlit provides a simple st.file_uploader method which can be used to upload an image from your computer. This is a generic file upload handler, but you can provide both a message to display (to specify what to upload) and constrain the file types that are supported.
Below we define a file_uploader which shows the message "Choose an image..." and accepts JPEG and PNG images.
import streamlit as st
# Set the title of our app.
st.title("Noise Reduction App")
uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])
print(uploaded_file)
For historic reasons, JPEG images can have either .jpg or .jpeg extensions, so we include both in the list.
Run the code and you'll see the file upload box in the app. Try uploading a file.
Streamlit application with a file-upload widget.
The uploaded image is stored in the variable uploaded_file. Before a file is uploaded, the value of uploaded_file will be None. Once the user uploads an image, this variable will contain an UploadedFile object.
None
UploadedFile(file_id='73fd9a97-9939-4c02-b9e8-80bd2749ff76', name='headcake.jpg', type='image/jpeg', size=652805, _file_urls=file_id: "73fd9a97-9939-4c02-b9e8-80bd2749ff76"
upload_url: "/_stcore/upload_file/7c881339-82e4-4d64-ba20-a073a11f7b60/73fd9a97-9939-4c02-b9e8-80bd2749ff76"
delete_url: "/_stcore/upload_file/7c881339-82e4-4d64-ba20-a073a11f7b60/73fd9a97-9939-4c02-b9e8-80bd2749ff76"
)
We can use this UploadedFile object to load and display the image in the browser.
How Streamlit Works
If you're used to writing Python scripts, the behavior of the script and the file upload box might be confusing. Normally a script would execute from top to bottom, but here the value of uploaded_file is changing and the print statement is being re-run as the state changes.
There's a lot of clever stuff going on under the hood here, but in simple terms the Streamlit script is being re-evaluated in response to changes. On each change the script runs again, from top to bottom. But importantly, the state of widgets is not reset on each run.
When we upload a file, that file gets stored in the state of the file upload widget, and this triggers the script to re-start. When it gets to the st.file_uploader call, that UploadedFile object will be returned immediately from the stored state. It can then affect the flow of the code after it.
The following code allows you to see these re-runs more clearly, by displaying the current timestamp in the header. Every time the code is re-executed this number will update.
from time import time
import streamlit as st
# Set the title of our app.
st.title(f"Noise Reduction App {int(time())}")
uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])
Try uploading an image and then removing it. You'll see the timestamp in the title change each time. This is the script being re-evaluated in response to changes in the widget state.
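The same persistence applies to values you store yourself. As a minimal illustration (separate from the noise reduction app), Streamlit's st.session_state survives re-runs, so you can count events across script executions:

import streamlit as st

# st.session_state persists across re-runs, unlike ordinary local variables.
if "clicks" not in st.session_state:
    st.session_state.clicks = 0

# st.button returns True only on the re-run triggered by the click itself.
if st.button("Click me"):
    st.session_state.clicks += 1

st.write(f"The script has recorded {st.session_state.clicks} click(s) so far.")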
Loading and Displaying the Uploaded Image
While we can upload an image, we can't see it yet. Let's implement that now.
As mentioned, the uploaded file is available as an UploadedFile object in the uploaded_file variable. This object can be passed directly to st.image to display the image back in the browser. You can also add a caption and auto-resize the image to the width of the application.
import numpy as np
import streamlit as st
from PIL import Image

st.title("Noise Reduction App")

uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])

if uploaded_file is not None:
    # The UploadedFile object can be passed directly to st.image.
    st.image(uploaded_file, caption="Uploaded Image", use_container_width=True)
Run this and upload an image. You'll see the image appear under the file upload widget.
Streamlit application showing an uploaded image.
Converting the Image for Processing
While the above works fine for displaying the image in the browser, we want to process the image through the OpenCV noise reduction algorithms. For that we need to get the image into a format which OpenCV recognizes. We can do that using Pillow & NumPy.
The updated code to handle this conversion is shown below.
import numpy as np
import streamlit as st
from PIL import Image
st.title("Noise Reduction App")
uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])
if uploaded_file is not None:
    # Convert the uploaded file to a PIL image.
    image = Image.open(uploaded_file)
    # Convert the image to an RGB NumPy array for processing.
    image = image.convert("RGB")
    image = np.array(image)
    # Displaying the RGB image.
    st.image(image, caption="Uploaded Image", use_container_width=True)
In this code, the uploaded file is opened using Pillow's Image.open() method, which reads the image into a PIL image. The image is then converted into Pillow's RGB format, for consistency (discarding transparency, for example). This regular format is then converted into a NumPy array, which OpenCV requires for processing.
Helpfully, Streamlit's st.image method also understands the NumPy RGB image format, so we can pass the image array directly to it. This will be useful when we want to display the processed image, since we won't need to convert it before doing that.
If you run the above, it will work exactly as before. But now we have our uploaded image available as an RGB array in the image variable. We'll use that to do our processing next.
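If you want to sanity-check the conversion, you can display the array's shape and dtype; for an RGB image you should see (height, width, 3) and uint8:

# Optional sanity check, inside the `if uploaded_file is not None:` block.
st.write(f"Image array: shape={image.shape}, dtype={image.dtype}")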
Configuring the Noise Reduction Algorithm
The correct noise reduction strategy depends on the image and the type of noise present. For a given image, you may want to try different algorithms and adjust the extent of the noise reduction. To accommodate that, we're going to add two new controls to our application -- an algorithm drop-down and a kernel-size slider.
The first presents a select box from which the user can choose which algorithm to use. The second allows the user to configure the behavior of the given algorithm -- specifically the size of the area being considered by each algorithm when performing noise reduction.
import numpy as np
import streamlit as st
from PIL import Image
st.title("Noise Reduction App")
uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])
algorithm = st.selectbox(
    "Select noise reduction algorithm",
    (
        "Gaussian Blur Filter",
        "Median Blur Filter",
        "Minimum Blur Filter",
        "Maximum Blur Filter",
        "Non-local Means Filter",
    ),
)
kernel_size = st.slider("Select kernel size", 1, 10, step=2)

if uploaded_file is not None:
    # Convert the uploaded file to a PIL image.
    image = Image.open(uploaded_file)
    # Convert the image to an RGB NumPy array for processing.
    image = image.convert("RGB")
    image = np.array(image)
    # Displaying the RGB image.
    st.image(image, caption="Uploaded Image", use_container_width=True)
When you run this you'll see the new widgets in the UI. The uploaded image is displayed last since it is the last thing to be added.
The algorithm selection and configuration widgets shown in the app.
The slider for the kernel size allows the user to adjust the kernel size, which determines the strength of the noise reduction effect. The kernel is a small matrix used in convolution to blur or process the image for noise removal. The larger the kernel size, the stronger the effect will be but also the more blurring or distortion you will see in the image.
The removal of noise is always a balancing act between reducing noise and preserving the accuracy of the image.
The slider ranges from 1 to 10, with a step of 2 (i.e., possible kernel sizes are 1, 3, 5, 7, and 9).
The kernel size must be an odd number to maintain symmetry in the image processing algorithms.
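To see concretely what a kernel does, here is a small self-contained sketch (not part of the app) that convolves a 3x3 averaging kernel over a synthetic image containing a single bright pixel, using cv2.filter2D. The bright pixel gets spread over its 3x3 neighbourhood, which is exactly the blurring effect described above.

import cv2
import numpy as np

# A synthetic 8x8 grayscale image with one bright pixel (an "impulse").
image = np.zeros((8, 8), np.uint8)
image[4, 4] = 255

# A 3x3 averaging (box) kernel; the nine weights sum to 1.
kernel = np.ones((3, 3), np.float32) / 9

# Convolution replaces each pixel with the weighted sum of its neighbourhood,
# so the single impulse becomes a 3x3 patch of value 255/9 ≈ 28.
blurred = cv2.filter2D(image, -1, kernel)
print(blurred[3:6, 3:6])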
Performing the Noise Reduction
Now we have all the parts in place to actually perform noise reduction on the image. The final step is to add the calls to OpenCV's noise reduction algorithms and show the resulting, noise-reduced image back in the UI.
import cv2
import numpy as np
import streamlit as st
from PIL import Image
st.title("Noise Reduction App")
uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])
algorithm = st.selectbox(
    "Select noise reduction algorithm",
    (
        "Gaussian Blur Filter",
        "Median Blur Filter",
        "Minimum Blur Filter",
        "Maximum Blur Filter",
        "Non-local Means Filter",
    ),
)
kernel_size = st.slider("Select kernel size", 1, 10, step=2)

if uploaded_file is not None:
    # Convert the uploaded file to a PIL image.
    image = Image.open(uploaded_file)
    # Convert the image to an RGB NumPy array for processing.
    image = image.convert("RGB")
    image = np.array(image)
    # Displaying the RGB image.
    st.image(image, caption="Uploaded Image", use_container_width=True)
    # Applying the selected noise reduction algorithm based on user selection
    if algorithm == "Gaussian Blur Filter":
        denoised_image = cv2.GaussianBlur(image, (kernel_size, kernel_size), 0)
    elif algorithm == "Median Blur Filter":
        denoised_image = cv2.medianBlur(image, kernel_size)
    elif algorithm == "Minimum Blur Filter":
        kernel = np.ones((kernel_size, kernel_size), np.uint8)
        denoised_image = cv2.erode(image, kernel, iterations=1)
    elif algorithm == "Maximum Blur Filter":
        kernel = np.ones((kernel_size, kernel_size), np.uint8)
        denoised_image = cv2.dilate(image, kernel, iterations=1)
    elif algorithm == "Non-local Means Filter":
        denoised_image = cv2.fastNlMeansDenoisingColored(
            image, None, kernel_size, kernel_size, 7, 15
        )
    # Displaying the denoised image in RGB format
    st.image(denoised_image, caption="Denoised Image", use_container_width=True)
If you run this you can now upload your images and apply denoising to them. Try changing the algorithm and adjusting the kernel size parameter to see the effect it has on the noise reduction. The denoised image is displayed at the bottom with the caption "Denoised Image".
Each of the noise reduction strategies is described below. The median blur and non-local means methods are the most effective for normal images.
Gaussian Blur Filter
Gaussian blur smooths the image by applying a Gaussian-weighted average over each pixel's neighbors. The kernel size determines the area over which the blur is applied, with larger kernels leading to stronger blurs. This method preserves edges fairly well and is often used in preprocessing for tasks like object detection.
Gaussian blur filter applied to an image using a 3x3 kernel.
This is effective at removing light noise, at the expense of sharpness.
Median Blur Filter
Median blur reduces noise by replacing each pixel's value with the median value from the surrounding pixels, making it effective against salt-and-pepper noise. It preserves edges better than Gaussian blur but can still affect the sharpness of fine details.
Median blur filter applied to an image using a 3x3 kernel window.
Median blur noise reduction (kernel size = 7).
Median blur noise reduction (kernel size = 5).
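You can verify this behaviour with a small self-contained sketch (not part of the app): synthesize salt-and-pepper noise on a flat grayscale image and watch the median filter remove it almost entirely.

import cv2
import numpy as np

# A flat grayscale image with ~5% "pepper" (black) and ~5% "salt" (white) pixels.
rng = np.random.default_rng()
image = np.full((64, 64), 128, np.uint8)
mask = rng.random(image.shape)
image[mask < 0.05] = 0
image[mask > 0.95] = 255

# The median of each 3x3 neighbourhood ignores the isolated extremes,
# so the output is (almost) uniformly 128 again.
cleaned = cv2.medianBlur(image, 3)
print(np.unique(cleaned))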
Minimum Blur (Erosion)
This filter uses the concept of morphological erosion. It shrinks bright areas in the image by sliding a small kernel over it. This filter is effective for removing noise in bright areas but may distort the overall structure if applied too strongly.
Erosion algorithm applied to an image using a 3x3 kernel window.
This works well to remove light noise from dark regions.
Erosion noise reduction (kernel size = 5).
Maximum Blur (Dilation)
In contrast to erosion, dilation expands bright areas and is effective in eliminating dark noise spots. However, it can result in the expansion of bright regions, altering the shape of objects in the image.
Dilation algorithm applied to an image using a 3x3 kernel window.
This works well to remove dark noise from light regions.
Non-Local Means Denoising
This method identifies similar regions from across the image, then combines these together to average out the noise. This works particularly well in images with repeating regions, or flat areas of color, but less well when the image has too much noise to be able to identify the similar regions.
Non-local means noise reduction example.
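For reference, the positional arguments in the app's call map onto OpenCV's documented parameters for cv2.fastNlMeansDenoisingColored as shown below. This is a sketch with a stand-in random image; the parameter names follow the OpenCV documentation.

import cv2
import numpy as np

# A random color image stands in for the uploaded photo.
image = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
kernel_size = 5

denoised = cv2.fastNlMeansDenoisingColored(
    image,
    None,
    h=kernel_size,         # filter strength for the luminance component
    hColor=kernel_size,    # filter strength for the color components
    templateWindowSize=7,  # side length of the patches being compared
    searchWindowSize=15,   # side length of the area searched for similar patches
)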
Improving the Layout
It's not very user-friendly having the input and output images one above the other, as you need to scroll up and down to see the effect of the algorithm. Streamlit has support for arranging widgets in columns. We'll use that to put the two images next to one another.
To create columns in Streamlit, you use st.columns(), passing in the number of columns to create. This returns column objects (as many as you request), which can be used as context managers to wrap your widget calls. In code, this looks like the following:
# Display the uploaded and denoised images side by side.
col1, col2 = st.columns(2)
with col1:
    st.image(image, caption="Uploaded Image", use_container_width=True)
with col2:
    st.image(denoised_image, caption="Denoised Image", use_container_width=True)
Here we call st.columns(2), creating two columns, which are returned into col1 and col2. We then use each of these in a with block to wrap the two st.image calls. This puts them into two adjacent columns.
Run this and you'll see the two images next to one another. This makes it much easier to see the impact of changes in the algorithm or parameters.
The source and processed image arranged next to one another using columns.
Downloading the Denoised Image
Our application now allows users to upload images and process them to remove noise, with a configurable noise removal algorithm and kernel size. The final step is to allow users to download and save the processed image somewhere.
You can actually just right-click and use your browser's option to Save the image if you like. But adding this to the UI makes it more explicit and allows us to offer different image output formats.
First, we need to import the io module. In a normal image processing script, you could simply save the generated image to disk. But our Streamlit app could be running on a server somewhere, and saving the result to the server isn't useful: we want to be able to send it to the user. For that, we need to send it to the web browser. Web browsers don't understand Python objects, so we need to save our image data to a simple bytes object. The io module allows us to do that.
Add an import for Python's io module to the imports at the top of the code.
import io
import cv2
import numpy as np
import streamlit as st
from PIL import Image
Now under the rest of the code we can add the widgets and logic for saving and presenting the image as a download. First add a select box to choose the image format.
# ..snipped the rest of the code.
# Dropdown to select the file format for downloading
file_format = st.selectbox("Select output format", ("PNG", "JPEG"))
Next we need to take our denoised_image and convert this from a NumPy array back to a PIL image. Then we can use Pillow's native methods for saving the image to a simple bytestream, which can be sent to the web browser.
# Converting NumPy array to PIL image in RGB mode
denoised_image_pil = Image.fromarray(denoised_image)
# Creating a buffer to store the image data in the selected format
buf = io.BytesIO()
denoised_image_pil.save(buf, format=file_format)
byte_data = buf.getvalue()
Since OpenCV operations return a NumPy array (the same format we provide it with), it must be converted back to a PIL image before saving. The io.BytesIO() call creates an in-memory file buffer to write to. That way we don't need to actually save the image to disk. We write the image using the Image.save() method in the requested file format.
Note that this saved image is in an actual PNG/JPEG image format at this point, not just pure image data.
We can retrieve the bytes data from the buffer using .getvalue(). The resulting byte_data is a raw bytes object that can be passed to the web browser. This is handled by a Streamlit download button.
# Button to download the processed image
st.download_button(
    label="Download Image",
    data=byte_data,
    file_name=f"denoised_image.{file_format.lower()}",
    mime=f"image/{file_format.lower()}",
)
Notice we've also set the filename and mimetype, using the selected file_format variable.
If you're adding additional file formats, be aware that the mimetypes are not always 1:1 with the file extensions. In this case we've used .jpeg, since the mimetype is image/jpeg.
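If you do extend the format list, a small lookup table keeps the Pillow format name, file extension, and mimetype in sync. This is a hypothetical sketch; WEBP support, for example, depends on your Pillow build.

# Pillow format name -> (file extension, mimetype).
FORMATS = {
    "PNG": ("png", "image/png"),
    "JPEG": ("jpeg", "image/jpeg"),
    "WEBP": ("webp", "image/webp"),  # hypothetical extra format
}

file_format = "JPEG"
extension, mimetype = FORMATS[file_format]
file_name = f"denoised_image.{extension}"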
Improving the Code Structure
The complete code so far is shown below.
import io
import cv2
import numpy as np
import streamlit as st
from PIL import Image
st.title("Noise Reduction App")
uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])
algorithm = st.selectbox(
    "Select noise reduction algorithm",
    (
        "Gaussian Blur Filter",
        "Median Blur Filter",
        "Minimum Blur Filter",
        "Maximum Blur Filter",
        "Non-local Means Filter",
    ),
)
kernel_size = st.slider("Select kernel size", 1, 10, step=2)

if uploaded_file is not None:
    # Convert the uploaded file to a PIL image.
    image = Image.open(uploaded_file)
    # Convert the image to an RGB NumPy array for processing.
    image = image.convert("RGB")
    image = np.array(image)
    # Applying the selected noise reduction algorithm based on user selection
    if algorithm == "Gaussian Blur Filter":
        denoised_image = cv2.GaussianBlur(image, (kernel_size, kernel_size), 0)
    elif algorithm == "Median Blur Filter":
        denoised_image = cv2.medianBlur(image, kernel_size)
    elif algorithm == "Minimum Blur Filter":
        kernel = np.ones((kernel_size, kernel_size), np.uint8)
        denoised_image = cv2.erode(image, kernel, iterations=1)
    elif algorithm == "Maximum Blur Filter":
        kernel = np.ones((kernel_size, kernel_size), np.uint8)
        denoised_image = cv2.dilate(image, kernel, iterations=1)
    elif algorithm == "Non-local Means Filter":
        denoised_image = cv2.fastNlMeansDenoisingColored(
            image, None, kernel_size, kernel_size, 7, 15
        )
    # Display the uploaded and denoised images side by side.
    col1, col2 = st.columns(2)
    with col1:
        st.image(image, caption="Uploaded Image", use_container_width=True)
    with col2:
        st.image(denoised_image, caption="Denoised Image", use_container_width=True)
    # Dropdown to select the file format for downloading.
    # ("JPG" is not a format name Pillow recognizes, so we offer PNG and JPEG.)
    file_format = st.selectbox("Select output format", ("PNG", "JPEG"))
    # Converting NumPy array to PIL image in RGB mode
    denoised_image_pil = Image.fromarray(denoised_image)
    # Creating a buffer to store the image data in the selected format
    buf = io.BytesIO()
    denoised_image_pil.save(buf, format=file_format)
    byte_data = buf.getvalue()
    # Button to download the processed image
    st.download_button(
        label="Download Image",
        data=byte_data,
        file_name=f"denoised_image.{file_format.lower()}",
        mime=f"image/{file_format.lower()}",
    )
If you run the completed app, you can now upload images, denoise them using the different algorithms and kernel parameters, and then save them in PNG or JPEG format.
However, we can still improve this. There is a lot of code nested under the if uploaded_file is not None: branch, and the logic and processing steps aren't well organized -- everything runs together, mixed in with the UI. When developing UI applications, it's a good habit to separate UI and non-UI code where possible (logic vs. presentation). That keeps related code together in the same context, aiding readability and maintainability.
Below is the same code refactored to move the file opening, denoising and file exporting logic out into separate handler functions.
import io
import cv2
import numpy as np
import streamlit as st
from PIL import Image
def image_to_array(file_to_open):
    """Load a Streamlit image into an array."""
    # Convert the uploaded file to a PIL image.
    image = Image.open(file_to_open)
    # Convert the image to an RGB NumPy array for processing.
    image = image.convert("RGB")
    image = np.array(image)
    return image

def denoise_image(image, algorithm, kernel_size):
    """Apply a denoising algorithm to the provided image, with the given kernel size."""
    # Applying the selected noise reduction algorithm based on user selection
    if algorithm == "Gaussian Blur Filter":
        denoised_image = cv2.GaussianBlur(image, (kernel_size, kernel_size), 0)
    elif algorithm == "Median Blur Filter":
        denoised_image = cv2.medianBlur(image, kernel_size)
    elif algorithm == "Minimum Blur Filter":
        kernel = np.ones((kernel_size, kernel_size), np.uint8)
        denoised_image = cv2.erode(image, kernel, iterations=1)
    elif algorithm == "Maximum Blur Filter":
        kernel = np.ones((kernel_size, kernel_size), np.uint8)
        denoised_image = cv2.dilate(image, kernel, iterations=1)
    elif algorithm == "Non-local Means Filter":
        denoised_image = cv2.fastNlMeansDenoisingColored(
            image, None, kernel_size, kernel_size, 7, 15
        )
    return denoised_image

def image_array_to_bytes(image_to_convert, file_format):
    """Given an image array, convert it to a bytes object in the given format."""
    # Converting NumPy array to PIL image in RGB mode
    image_pil = Image.fromarray(image_to_convert)
    # Creating a buffer to store the image data in the selected format
    buf = io.BytesIO()
    image_pil.save(buf, format=file_format)
    byte_data = buf.getvalue()
    return byte_data

st.title("Noise Reduction App")

uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])

algorithm = st.selectbox(
    "Select noise reduction algorithm",
    (
        "Gaussian Blur Filter",
        "Median Blur Filter",
        "Minimum Blur Filter",
        "Maximum Blur Filter",
        "Non-local Means Filter",
    ),
)
kernel_size = st.slider("Select kernel size", 1, 10, step=2)

if uploaded_file is not None:
    image = image_to_array(uploaded_file)
    denoised_image = denoise_image(image, algorithm, kernel_size)
    # Display the uploaded and denoised images side by side.
    col1, col2 = st.columns(2)
    with col1:
        st.image(image, caption="Uploaded Image", use_container_width=True)
    with col2:
        st.image(denoised_image, caption="Denoised Image", use_container_width=True)
    # Dropdown to select the file format for downloading
    file_format = st.selectbox("Select output format", ("PNG", "JPEG"))
    byte_data = image_array_to_bytes(denoised_image, file_format)
    # Button to download the processed image
    st.download_button(
        label="Download Image",
        data=byte_data,
        file_name=f"denoised_image.{file_format.lower()}",
        mime=f"image/{file_format.lower()}",
    )
As you can see, the main flow of the code now consists entirely of Streamlit UI setup code and calls to the processing functions we have defined. Note that image_array_to_bytes now takes the file format as a parameter rather than reading a global variable. Both the UI and the processing code are now easier to read and maintain.
In larger projects, you may choose to move the functions out into separate files of related functions and import them instead, as sketched below.
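For example, the three handler functions above could live in a hypothetical processing.py module next to app.py, leaving the Streamlit script to handle only the UI:

# app.py -- the handler functions now live in processing.py (a hypothetical
# module name); the rest of the script stays exactly as before.
from processing import denoise_image, image_array_to_bytes, image_to_array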
Conclusion
In this tutorial, you created an image noise reduction application using Streamlit and OpenCV. The app allows users to upload images, apply different noise reduction algorithms, and download the denoised image.
It also allows the user to customize the kernel size, which controls the strength of the effect. This makes the app useful for a variety of noise types and image processing tasks.
Streamlit makes it simple to build powerful web applications, taking the power of Python's rich ecosystem and making it available through the browser.
05 May 2025 6:00am GMT