22 Jun 2018

Planet Python

Python Bytes: #83 from __future__ import braces

22 Jun 2018 8:00am GMT

qutebrowser development blog: qutebrowser v1.3.3 released (security update!)

I've just released qutebrowser v1.3.3, which fixes an XSS vulnerability on the qute://history page (:history).

qutebrowser is a keyboard-driven browser with a vim-like, minimalistic interface. It's written using PyQt and is cross-platform.

The vulnerability allowed websites to inject HTML into the page via a crafted title tag …

22 Jun 2018 12:04am GMT

21 Jun 2018

Planet Python

Trey Hunner: How to make an iterator in Python

I wrote an article some time ago on the iterator protocol that powers Python's for loops. One thing I left out of that article was how to make your own iterators.

In this article I'm going to discuss why you'd want to make your own iterators and then show you how to do so.

What is an iterator?

First let's quickly address what an iterator is. For a much more detailed explanation, consider watching my Loop Better talk or reading the article based on the talk.

An iterable is anything you're able to loop over.

An iterator is the object that does the actual iterating.

You can get an iterator from any iterable by calling the built-in iter function on the iterable.

>>> favorite_numbers = [6, 57, 4, 7, 68, 95]
>>> iter(favorite_numbers)
<list_iterator object at 0x7fe8e5623160>

You can use the built-in next function on an iterator to get the next item from it (you'll get a StopIteration exception if there are no more items).

>>> favorite_numbers = [6, 57, 4, 7, 68, 95]
>>> my_iterator = iter(favorite_numbers)
>>> next(my_iterator)
6
>>> next(my_iterator)
57

There's one more rule about iterators that makes everything interesting: iterators are also iterables and their iterator is themselves. I explain the consequences of that more fully in that Loop Better talk I mentioned above.
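
We can check that last rule at the REPL: calling iter on an iterator hands back the very same object.

>>> favorite_numbers = [6, 57, 4, 7, 68, 95]
>>> my_iterator = iter(favorite_numbers)
>>> iter(my_iterator) is my_iterator
True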

Why make an iterator?

Iterators allow you to make an iterable that computes its items as it goes, which means you can make iterables that are lazy: they don't determine what their next item is until you ask them for it.

Using an iterator instead of a list, set, or another iterable data structure can sometimes allow us to save memory. For example, we can use itertools.repeat to create an iterable that provides 100 million 4's to us:

>>> from itertools import repeat
>>> lots_of_fours = repeat(4, times=100_000_000)

This iterator takes up 56 bytes of memory on my machine:

>>> import sys
>>> sys.getsizeof(lots_of_fours)
56

An equivalent list of 100 million 4's takes up about 800 megabytes of memory:

>>> lots_of_fours = [4] * 100_000_000
>>> import sys
>>> sys.getsizeof(lots_of_fours)
800000064

While iterators can save memory, they can also save time. For example if you wanted to print out just the first line of a 10 gigabyte log file, you could do this:

>>> print(next(open('giant_log_file.txt')))
This is the first line in a giant file

File objects in Python are implemented as iterators. As you loop over a file, data is read into memory one line at a time. If we instead used the readlines method to store all lines in memory, we might run out of system memory.
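
As a quick sketch (reusing the hypothetical log file from above), we can scan an enormous file while only ever holding one line in memory:

with open('giant_log_file.txt') as log_file:
    for line in log_file:  # the file object yields one line at a time
        if 'ERROR' in line:
            print(line.rstrip('\n'))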

So iterators can save us memory, and they can sometimes save us time too.

Additionally, iterators have abilities that other iterables don't. For example, the laziness of iterators can be used to make iterables that have an unknown length. In fact, you can even make infinitely long iterators.

For example, the itertools.count utility will give us an iterator that will provide every number from 0 upward as we loop over it:

>>> from itertools import count
>>> for n in count():
...     print(n)
...
0
1
2
(this goes on forever)

That itertools.count object is essentially an infinitely long iterable. And it's implemented as an iterator.

Making an iterator: the object-oriented way

So we've seen that iterators can save us memory, save us CPU time, and unlock new abilities for us.

Let's make our own iterators. We'll start by re-inventing the itertools.count iterator object.

Here's an iterator implemented using a class:

class Count:

    """Iterator that counts upward forever."""

    def __init__(self, start=0):
        self.num = start

    def __iter__(self):
        return self

    def __next__(self):
        num = self.num
        self.num += 1
        return num

This class has an initializer that initializes our current number to 0 (or whatever is passed in as the start). The things that make this class usable as an iterator are the __iter__ and __next__ methods.

When an object is passed to the str built-in function, its __str__ method is called. When an object is passed to the len built-in function, its __len__ method is called.

>>> numbers = [1, 2, 3]
>>> str(numbers), numbers.__str__()
('[1, 2, 3]', '[1, 2, 3]')
>>> len(numbers), numbers.__len__()
(3, 3)

Calling the built-in iter function on an object will attempt to call its __iter__ method. Calling the built-in next function on an object will attempt to call its __next__ method.

The iter function is supposed to return an iterator. So our __iter__ method must return an iterator. But our object is an iterator itself, so our Count object returns self from its __iter__ method: it is its own iterator.

The next function is supposed to return the next item in our iterator or raise a StopIteration exception when there are no more items. We're returning the current number and incrementing the number so it'll be larger during the next __next__ call.
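
Our Count iterator never raises StopIteration because it counts upward forever. For contrast, here's a sketch of a hypothetical bounded variant that raises StopIteration once it reaches a stop value:

class CountTo:

    """Iterator that counts from start up to (but not including) stop."""

    def __init__(self, stop, start=0):
        self.num = start
        self.stop = stop

    def __iter__(self):
        return self

    def __next__(self):
        if self.num >= self.stop:
            raise StopIteration  # signal that there are no more items
        num = self.num
        self.num += 1
        return num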

We can manually loop over our Count iterator class like this:

>>> c = Count()
>>> next(c)
0
>>> next(c)
1

We could also loop over our Count object using a for loop, as with any other iterable:

>>> for n in Count():
...     print(n)
...
0
1
2
(this goes on forever)

This object-oriented approach to making an iterator is cool, but it's not the usual way that Python programmers make iterators. Usually when we want an iterator, we make a generator.

Generators: the easy way to make an iterator

The easiest way to make your own iterator in Python is to create a generator.

There are two ways to make generators in Python.

Given this list of numbers:

>>> favorite_numbers = [6, 57, 4, 7, 68, 95]

We can make a generator that will lazily provide us with all the squares of these numbers like this:

>>> def square_all(numbers):
...     for n in numbers:
...         yield n**2
...
>>> squares = square_all(favorite_numbers)

Or we can make the same generator like this:

>>> squares = (n**2 for n in favorite_numbers)

The first one is called a generator function and the second one is called a generator expression.

Both of these generator objects work the same way. They both have a type of generator and they're both iterators that provide squares of the numbers in our numbers list.

>>> type(squares)
<class 'generator'>
>>> next(squares)
36
>>> next(squares)
3249

We're going to talk about both of these approaches to making a generator, but first let's talk about terminology.

The word "generator" is used in quite a few ways in Python:

With that terminology out of the way, let's take a look at each one of these things individually. We'll look at generator functions first.

Generator functions

Generator functions are distinguished from plain old functions by the fact that they have one or more yield statements.

Normally when you call a function, its code is executed:

>>> def gimme4_please():
...     print("Let me go get that number for you.")
...     return 4
...
>>> num = gimme4_please()
Let me go get that number for you.
>>> num
4

But if the function has a yield statement in it, it isn't a typical function anymore. It's now a generator function, meaning it will return a generator object when called. That generator object can be looped over to execute it until a yield statement is hit:

>>> def gimme4_later_please():
...     print("Let me go get that number for you.")
...     yield 4
...
>>> get4 = gimme4_later_please()
>>> get4
<generator object gimme4_later_please at 0x7f78b2e7e2b0>
>>> num = next(get4)
Let me go get that number for you.
>>> num
4

The mere presence of a yield statement turns a function into a generator function. If you see a yield statement inside a function, you're working with a different animal. It's a bit odd, but that's the way generator functions work.
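
You can see just how odd this is with a function whose yield statement is never even reached (the function name here is made up for illustration). Calling it doesn't run the body; it just hands back a generator object:

>>> def gimme4_strangely():
...     return 4
...     yield  # unreachable, but its mere presence makes this a generator function
...
>>> gimme4_strangely()
<generator object gimme4_strangely at 0x...>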

Okay let's look at a real example of a generator function. We'll make a generator function that does the same thing as our Count iterator class we made earlier.

def count(start=0):
    num = start
    while True:
        yield num
        num += 1

Just like our Count iterator class, we can manually loop over the generator we get back from calling count:

>>> c = count()
>>> next(c)
0
>>> next(c)
1

And we can loop over this generator object using a for loop, just like before:

>>> for n in count():
...     print(n)
...
0
1
2
(this goes on forever)

But this function is considerably shorter than our Count class we created before.

Generator expressions

Generator expressions use a list comprehension-like syntax that allows us to make a generator object.

Let's say we have a list comprehension that filters empty lines from a file and strips newlines from the end:

lines = [
    line.rstrip('\n')
    for line in poem_file
    if line != '\n'
]

We could create a generator instead of a list by turning the square brackets of that comprehension into parentheses:

lines = (
    line.rstrip('\n')
    for line in poem_file
    if line != '\n'
)

Just as our list comprehension gave us a list back, our generator expression gives us a generator object back:

>>> type(lines)
<class 'generator'>
>>> next(lines)
' This little bag I hope will prove'
>>> next(lines)
'To be not vainly made--'

Generator expressions use a shorter inline syntax compared to generator functions. They're not as powerful though.

If you can write your generator function in this form:

def get_a_generator(some_iterable):
    for item in some_iterable:
        if some_condition(item):
            yield item

Then you can replace it with a generator expression:

def get_a_generator(some_iterable):
    return (
        item
        for item in some_iterable
        if some_condition(item)
    )

If you can't write your generator function in that form, then you can't create a generator expression to replace it.
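
For example, this hypothetical running-total generator function carries state from one item to the next, so there's no generator expression that directly replaces it:

def running_sum(numbers):
    total = 0
    for n in numbers:
        total += n
        yield total  # each item depends on all the items before it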

Generator expressions vs generator functions

You can think of generator expressions as the list comprehensions of the generator world.

If you're not familiar with list comprehensions, I recommend reading my article on list comprehensions in Python. I note in that article that you can copy-paste your way from a for loop to a list comprehension.

You can also copy-paste your way from a generator function to a function that returns a generator expression:
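
(The transformation is easiest to see side by side; here's a sketch reusing the square_all generator function from earlier.)

def square_all(numbers):
    for n in numbers:
        yield n**2

def square_all(numbers):
    return (
        n**2
        for n in numbers
    )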

Generator expressions are to generator functions as list comprehensions are to a simple for loop with an append and a condition.

Generator expressions are so similar to comprehensions that you might even be tempted to say generator comprehension instead of generator expression. That's not technically the correct name, but if you say it everyone will know what you're talking about. Ned Batchelder actually proposed that we should all start calling generator expressions generator comprehensions, and I tend to agree that this would be a clearer name.

So what's the best way to make an iterator?

To make an iterator you could create an iterator class, a generator function, or a generator expression. Which way is the best way though?

Generator expressions are very succinct, but they're not nearly as flexible as generator functions. Generator functions are flexible, but if you need to attach extra methods or attributes to your iterator object, you'll probably need to switch to using an iterator class.

I'd recommend reaching for generator expressions the same way you reach for list comprehensions. If you're doing a simple mapping or filtering operation, a generator expression is a great solution. If you're doing something a bit more sophisticated, you'll likely need a generator function.

I'd recommend using generator functions the same way you'd use for loops that append to a list. Where you'd otherwise see an append call, you'd see a yield statement instead.

And I'd say that you should almost never create an iterator class. If you find you need an iterator class, try to write a generator function that does what you need and see how it compares to your iterator class.

Generators can help when making iterables too

You'll see iterator classes in the wild, but there's rarely a good opportunity to write your own.

While it's rare to create your own iterator class, it's not unusual to make your own iterable class. And iterable classes require an __iter__ method which returns an iterator. Since generators are the easy way to make an iterator, we can use a generator function or a generator expression to create our __iter__ methods.

For example here's an iterable that provides x-y coordinates:

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __iter__(self):
        yield self.x
        yield self.y

Note that instances of our Point class here are iterables, not iterators. That means our __iter__ method must return an iterator. The easiest way to create an iterator is by making a generator function, so that's just what we did.

We stuck yield in our __iter__ to make it into a generator function, and now our Point class can be looped over, just like any other iterable.

>>> p = Point(1, 2)
>>> x, y = p
>>> print(x, y)
1 2
>>> list(p)
[1, 2]

Generator functions are a natural fit for creating __iter__ methods on your iterable classes.
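
When the mapping is simple enough, a generator expression works too. This sketch behaves the same as the yield-based Point above:

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __iter__(self):
        return (coord for coord in (self.x, self.y))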

Generators are the way to make iterators

Dictionaries are the typical way to make a mapping in Python. Functions are the typical way to make a callable object in Python. Likewise, generators are the typical way to make an iterator in Python.

So when you're thinking "it sure would be nice to implement an iterable that lazily computes things as it's looped over," think of iterators.

And when you're considering how to create your own iterator, think of generator functions and generator expressions.

21 Jun 2018 11:00pm GMT

Wallaroo Labs: Implementing Time Windowing in an Evented Streaming System

Hi there! Welcome to the second and final installment of my trending twitter hashtags example series. In part 1, we covered the basic dataflow and logic of the application. In part 2, we are going to take a look at how windowing for the "trending" aspect of our application is implemented. When implementing any sort of "trending" application, what we are really doing is implementing some kind of windowing. That is, for some duration of time, we want to know what was popular, what was "trending" during that period of time.

21 Jun 2018 4:49pm GMT

Andre Roberge: Javascript tools for Python hobbyists

I am just a hobbyist Python enthusiast who has been, over the course of many years, writing what is now a relatively big Javascript program (close to 20,000 lines of code so far). If you like Python and the Pythonic way of programming, and find yourself writing more JavaScript code than you'd like for a fun side-project meant as a hobby, you may find some merit in the approach I use. I wish I could have read something like this blog post when I started my project, or even just a few years ago, when I did a major rewrite and started using some of the tools described in this post.

If you are a professional programmer, you can just stop reading as you know much more than I do, and you surely have a better, more efficient and cutting edge way of doing things right now - and you will likely use yet a different way next year, if not next month. So, you would likely find my advice to look the same as my site: dated and not using the latest and coolest techniques - in short, for you, not worth looking at. ;-)

Summary:

- Use a well-supported library (in my case, jQuery and jQuery UI)
- Use npm for installing Javascript tools and for managing your workflow
- Use browserify to concatenate your Javascript files
- Use tape for unit testing, with faucet for formatting the results
- Use QUnit for integration testing
- Optionally, use madge, dependo, and jsdoc
- Use jshint instead of jslint

Warning

This blog post is long. I've attempted to provide enough details for you to determine in each case if my use-case corresponds to yours and thus if and when my recommendation might make sense for you and your project.

The context

I started working on Reeborg's World many years ago. The first version was created as a desktop program (rur-ple) in 2004. My first primitive attempt at a web version was done around 2007. During the years I have worked on it, tools and libraries have come, evolved, and gone, to be replaced by better ones. As programming is only a hobby for me which I work on when I have some free time, I cannot afford to change the set of tools I use every year to follow the latest trend.

I started working on the current version when using color gradients for buttons and menu bars was the latest and coolest thing - well before the current flat UI became the norm.


Admittedly, my site looks dated - but since I do not have enough time to add all the new ideas for functional improvements I want to make, investing time to modernize the look is not a priority.

The Javascript code I wrote is split over many files and has become a tangled mess - in spite of some occasional attempts at reorganizing the code, including a near-complete rewrite a few years ago.


Some of the complexity is required as I want to make it easier for would-be collaborators to add new programming languages or paradigms for learners [1] or additional human language support [2]. However, it is likely that some of this tangled mess could be simplified with a significant effort.

In addition, there is more to Reeborg's World than a single site; there is also a basic programming tutorial available in three languages [3] with additional languages in the works, a Teacher's Guide [4], API documentation for advanced features [5], and more [6]. Each of these acts almost like an independent project pulling me in different directions.

In order to preserve my sanity, as my project slowly evolves I need some constancy and simplicity in the tools I use.

Using a well-supported library

Unlike the situation with Python, which comes "batteries included", there is no standard library for Javascript. Using a library means choosing between various alternatives and communities.

When I started this project, the main problem facing people writing Javascript code was browser incompatibilities. There was one obvious solution: use jQuery. Nowadays, it is most likely no longer needed for that purpose, but that was not the case back then.

I also knew that I wanted the ability to have floating windows for additional menus and dialogs. After examining a few choices, I settled on jQuery UI, since there was good documentation for it and an active community ... and I was already using jQuery which meant a smaller footprint than some other alternatives.

Libraries like jQuery and jQuery UI can be included with a link to a CDN (Content Delivery Network), which can reduce the load on the server where my project lives. I can also link to a specific version of these libraries, which means that I do not have to update code that depends on them (unless security issues are discovered).

10 years later, both libraries are still alive and well and I haven't needed to make any significant changes to any code that uses them.

Use npm for installing Javascript tools

npm is described as both a package manager for Javascript and as the world's largest software repository. I use it to install the various Javascript tools I use (like tape, browserify, jsdoc, etc., which I describe below).

I do not use it to install javascript libraries (big or small) called by my own code. From what I can tell, the "best/most common" practice in the Javascript world is to make use of tons of modules found on the npm repository, some of which are simply one line of code. Requiring a single module can mean that, in reality, the project depends on dozens of other modules, none of them vetted - unlike the Python standard library. An upgrade to a single module can result in a bug affecting hundreds of other modules ... For example, one developer broke Node, Babel and thousands of projects in 11 lines of JavaScript.

When I resume working on my project after months of inactivity, I never have to worry about how any change to any such third party module could require updating my code. (Yes, there are most likely ways to mitigate such problems, but I prefer to avoid them in the first place.)

There are alternatives to npm (such as yarn, and others), but, from what I can tell, they do not offer any advantages when it comes to installing Javascript tools - a task that is performed very rarely for a given project.

Use npm to manage your workflow

When reading about Javascript, I most often saw either gulp or grunt mentioned as tools to automate tasks. From what I read, it seemed they were essential for any serious Javascript development. Each of them had its own way of doing things ... and it was not easy for me to see which would be the best fit. In the various posts I read about gulp vs grunt, npm was never mentioned as an alternative.

However, as I learned more about npm, I found that, together with a very simple batch file it could do all the automation that I needed in a very, very simple way, by defining "scripts" in a file named package.json. Chaining tasks with npm scripts is a simple matter of "piping" them (with the | character). Since I had already installed npm, it became an easy choice.
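
As a rough sketch of what this looks like (the script names and file paths here are made up for illustration), the "scripts" section of package.json might read:

{
  "scripts": {
    "build": "browserify src/main.js -o build/bundle.js",
    "test": "tape tests/*.js | faucet",
    "lint": "jshint src"
  }
}

Running "npm run build", "npm test" or "npm run lint" then executes the corresponding command; the "test" script pipes tape's output through faucet, exactly the kind of chaining described above.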

Use browserify to concatenate all your Javascript files

Once my Javascript code became much too long to fit into a single file, I broke it up into various files. With Python, I would have used an import statement in individual files to take care of dependencies. With Javascript, the only method that I knew of at the beginning of my project (10 years ago) was to add individual links in my html file. As the number of Javascript files increased, it became difficult to ensure that files were inserted in the proper order so that dependencies were taken care of ... In fact, it soon became almost impossible.

This required a major rewrite. Fortunately, when I had to do this, some standardized way of ensuring dependencies had emerged. The simplest was to use something like

require("module_a.js");

at the top of, say module_b.js, and use some tools to concatenate the javascript files, ensuring that proper dependencies were taken care of. The simplest tool I found for this purpose is browserify, originally created, as far as I can tell, by James Halliday.

browserify can be installed using npm.
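
A minimal sketch of the pattern (the module names and the greet function are hypothetical; note that browserify expects relative paths such as "./module_a.js" for local files):

// module_a.js -- exports one function
exports.greet = function (name) {
    return "Hello, " + name + "!";
};

// module_b.js -- requires module_a; browserify resolves this at bundle time
var module_a = require("./module_a.js");
console.log(module_a.greet("Reeborg"));

Running "browserify module_b.js -o bundle.js" then produces a single file with the dependencies concatenated in the proper order.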

Use tape for unit testing

Sigh ... I find testing boring ... But, as my project grew larger, it became necessary to write some tests.

When I did a search on testing tools/frameworks for Javascript, I most often saw mentions of Chai, Jasmine, Mocha, QUnit and Sinon. A recent search yields a few more potential candidates like Cucumber, Karma, etc.

The Javascript world seems to really, really like so-called Behaviour Driven Development, where writing tests can mean writing code like:

tea.should.have.property('flavors').with.lengthOf(3);

If.I.wanted.to.write.code.that.read.like.English.I.would.likely.use.Cobol.

It is only by accident that I came across tape as a testing framework that felt "right" to me. I like my tests to look like my code. With Python, I would use assert statements to ensure that a function produces the correct result. My favourite unit testing framework for Python is, not surprisingly, pytest.

From what I have seen, tape is the closest Javascript testing framework to pytest. Here's an actual example where I test some code which is expected to raise/throw an exception/error:

test('add_wall: invalid orientation', function (assert) {
    assert.plan(2);
    try {
        RUR.add_wall("n", 1, 2, true);
    } catch (e) {
        assert.ok(e.reeborg_shouts, "reeborg_shouts");
        assert.equal(e.name, "ReeborgError", "error name ok");
    }
    assert.end();
});

I make use of "assert.plan()" to ensure that the number of assertions tested matches my expectations.

It was only after I had used tape for a while that I found out that it was also written by James Halliday.

tape can be installed using npm.

Use faucet for formatting of unit test results

Tape's output is in the TAP format (Test Anything Protocol) which, by default, is extremely verbose. Most often, it is recommended to pipe the results into formatters which produce more readable results.

Depending on what I am doing, I use different formatters, some more verbose than others. After trying out about a dozen formatters, I now use faucet by default. faucet can be installed using npm and has been written by, ... you guessed it, James Halliday.

Use QUnit for integration testing

Unit tests are fine, but they miss problems arising from putting all the code together. I used different strategies to do integration testing, all of which seemed to create almost more problems than they solved, until I stumbled upon a very easy way that just works for me. Using a Python script, I take the single html file for my site, put all the code inside an html div with display set to none, insert some QUnit code and my own tests, and let everything run.

Optional: use Madge for identifying circular dependencies

To help identify potential problems with circular dependencies, I use madge, which can be installed with npm.

There is one remaining circular dependency in my code, which I silence by not inserting a require() call in one of my modules: when the site is initialized, I want to draw a default version of the world, which I do by calling functions in the drawing module when loading some images. Later, when calling the drawing module, I do need the definitions found in the module where I load the images. I could get rid of the dependency at the cost of duplicating some code ... but since the initializing of the site and the execution of user-entered code are done in separate phases, the circular dependency does not cause any problems.

Optional: use Dependo to identify any overlooked module


The image of the tangled mess of modules mentioned earlier was created using dependo. As I was refactoring code and adding various require() statements, dependo was helpful in identifying any module not included, either because it had been accidentally forgotten or because it had become irrelevant. dependo can also be installed using npm.

Optional: use JSDoc for creating an API

While I do not particularly like it, as I cannot figure out how to extend it to address my particular needs, I found jsdoc useful for producing an API for people wanting to use advanced features in creating unusual programming tasks (aka "worlds"). When I started using it, there did not seem to be any easy way to use Sphinx to create such an API. I gather that this might no longer be the case ... but it would likely require too much effort to make the change at this point.

jsdoc can also be installed using npm.

Use jshint instead of jslint

A linter can often be useful in identifying potential or real problems with some code. When I started working on this project, the only linter I knew was jslint. jshint is friendlier and more configurable, and is my preferred choice. And, you guessed it, jshint can be installed using npm.
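
For instance, a minimal .jshintrc might look like the sketch below; these are standard jshint options, though the particular choices are just for illustration.

{
    "browser": true,
    "undef": true,
    "unused": true,
    "esversion": 6
}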

Last thoughts

There might very well be other tools that would be better for your own projects but, if you love Python and find yourself not overly enthusiastic at the thought of adopting the Javascript way when working on a project that requires Javascript, you might find that the tools I use match more closely the way you do things with Python. Or not.



[1] Currently, programs can be written in Python, Javascript, using blockly, or in Python using a REPL.

[2] Language support can mean one of two things: either the programming library for users (like using "avance()" in French as equivalent to "move()" in English), or the UI, or both. Currently, French and English are implemented for both, while Korean and Polish are only available for the UI. Work is underway to provide Chinese support for both.

[3] The tutorial can be found here; you can change the default language using the side-bar on the right. The repository is at https://github.com/aroberge/reeborg-docs. The tutorial is currently available in French, English and Korean, with additional languages in the works.

[4] https://github.com/aroberge/reeborg-howto is a site aimed at creators of advanced tasks for Reeborg's World. It has very little content currently, but more will be migrated from https://github.com/aroberge/reeborg-world-creation, which was written as an online book (a format which I found to be unsatisfactory).

[5] https://github.com/aroberge/reeborg-api is a documentation site for the API that creators of advanced tasks can use.

[6]





21 Jun 2018 1:19pm GMT

Andre Roberge: Javascript tools for Python hobbyists

I am just a hobbyist, Python enthusiast who has been, over the course of many years, writing what is now a relatively big Javascript program (close to 20,000 lines of code so far). If you like Python and the Pythonic way of programming, and find yourself writing more JavaScript code than you'd like for a fun side-project meant as a hobby, you may find some merit in the approach I use. I wish I could have read something like this blog post when I started my project or even just a few years ago, when I did a major rewrite, and started using some of the tools described in this post.

If you are a professional programmer, you can just stop reading as you know much more than I do, and you surely have a better, more efficient and cutting edge way of doing things right now - and you will likely use yet a different way next year, if not next month. So, you would likely find my advice to look the same as my site: dated and not using the latest and coolest techniques - in short, for you, not worth looking at. ;-)

Summary:

Warning

This blog post is long. I've attempted to provide enough details for you to determine in each case if my use-case corresponds to yours and thus if and when my recommendation might make sense for you and your project.

The context

I started working on Reeborg's World many years ago. The first version was created as a desktop program (rur-ple) in 2004. My first primitive attempt at a web version was done around 2007. During the years I have worked on it, tools and libraries have come, evolved, and gone, to be replaced by better ones. As programming is only a hobby for me which I work on when I have some free time, I cannot afford to change the set of tools I use every year to follow the latest trend.

I've started working on the current version when using color gradients for buttons and menu bars was the latest and coolest thing - well before the current flat UI became the norm.


Admittedly, my site looks dated - but since I do not have enough time to add all the new ideas for functional improvements I want to make, investing time to modernize the look is not a priority.

The Javascript code I wrote is split over many files and has become a tangled mess - in spite of some occasional attempts at reorganizing the code, including a near-complete rewrite a few years ago.


Some of the complexity is required as I want to make it easier for would-be collaborator to add new programming language or paradigms for learners [1] or additional human language support [2]. However, it is likely that some of this tangled mess could be simplified with a significant effort.

In addition, there is more to Reeborg's World than a single site; there is also a basic programming tutorial available in three languages [3] with additional languages in the works, a Teacher's Guide [4], an API documentation for advanced features [5], and more [6]. Each of these act almost like an independent project pulling me in different directions.

In order to preserve my sanity, as my project slowly evolves I need some constancy and simplicity in the tools I use.

Using a well-supported library

Unlike the situation with Python, which comes "batteries included", there is no standard library for Javascript. Using a library means choosing between various alternatives, and communities.

When I started this project, the main problem facing people writing Javascript code was browser incompatibilities. There was one obvious solution: use jQuery. Nowadays, it is most likely no longer needed for that purpose, but that was not the case back then.

I also knew that I wanted the ability to have floating windows for additional menus and dialogs. After examining a few choices, I settled on jQuery UI, since there was good documentation for it and an active community ... and I was already using jQuery which meant a smaller footprint than some other alternatives.

Libraries like jQuery and jQuery UI can be included with a link to a CDN (Content delivery network) which can reduce the load on the server where my project lives. I can also link to a specific version of these libraries, which means that I do not have to update code that depend on them (except if security issues are discovered).

10 years later, both libraries are still alive and well and I haven't needed to make any significant changes to any code that uses them.

Use npm for installing Javascript tools

npm is described as both a package manager for Javascript and as the world's largest software respository. I use it to install various Javascript tools I use (like tape, browserify, jsdoc, etc. which I describe below).

I do not use it to install javascript libraries (big or small) called by my own code. From what I can tell, the "best/most common" practice in the Javascript world is to make use of tons of modules found on the npm repository, some of which are simply one line of code. Requiring a single module can mean in reality that the project can depend on dozens of other modules, none of them being vetted - unlike the Python standard library. Upgrade to a single modules can result in a bug affecting hundreds of other modules ... For example, one developer broke Node, Babel and thousands of projects in 11 lines of JavaScript.

When I resume working on my project after months of inactivity, I never have to worry about how any change to any such third party module could require updating my code. (Yes, there are most likely ways to mitigate such problems, but I prefer to avoid them in the first place.)

There are alternatives to npm (such as yarn, and others), but, from what I can tell, they do not offer any advantages when it comes to installing Javascript tools - a task that is performed very rarely for a given project.

Use npm to manage your workflow

When reading about Javascript, I most often saw either gulp or grunt mentioned mentioned as tools to automate tasks. From what I read, it seems that were essential to do any serious Javascript development. Each of them came had its own way to do things ... and it was not easy for me to see which would be the best fit. In the various posts I read about gulp vs grunt, npm was never mentioned as an alternative.

However, as I learned more about npm, I found that, together with a very simple batch file it could do all the automation that I needed in a very, very simple way, by defining "scripts" in a file named package.json. Chaining tasks with npm scripts is a simple matter of "piping" them (with the | character). Since I had already installed npm, it became an easy choice.

Use browserify to concatenate all your Javascript files

Once my Javascript code became much too long to fit into a single file, I broke it up into various files. With Python, I would have use an import statement in individual files to take care of dependencies. With Javascript, the only method that I knew of at the beginning of my project (10 years ago) was to add individual links in my html file. As the number of Javascript files increased, it became difficult to ensure that files were inserted in the proper order to ensure that dependencies were taken care of ... In fact, it soon became almost impossible.

This required a major rewrite. Fortunately, when I had to do this, some standardized way of ensuring dependencies had emerged. The simplest was to use something like

require("module_a.js");

at the top of, say module_b.js, and use some tools to concatenate the javascript files, ensuring that proper dependencies were taken care of. The simplest tool I found for this purpose is browserify, originally created, as far as I can tell, by James Halliday.

browserify can be installed using npm.

Use tape for unit testing

Sigh ... I find testing boring ... But, as my project grew larger, it became necessary to write some tests.

When I did a search on testing tools/framework for Javascript, I most often saw mentions of Chai, Jasmine, Mocha QUnit and Sinon. A recent search yields a few more potential candidates like Cucumber, Karma, etc.

The Javascript world seems to really, really like so-called Behaviour Driven Development, where writing tests can mean writing code like:

tea.should.have.property('flavors').with.lengthOf(3);

If.I.wanted.to.write.code.that.read.like.English.I.would.likely.use.Cobol.

It is only by accident that I came accross tape as a testing framework that felt "right" to me. I like my tests to look like my code. With Python, I would use assert statements to ensure that the a function produces the correct result. My favourite unit testing framework for Python is, not surprisingly, pytest.

From what I have seen, tape is the closest Javascript testing framework to pytest. Here's an actual example where I test some code which is expected to raise/throw an exception/error:

test('add_wall: invalid orientation', function (assert) {
assert.plan(2);
try {
RUR.add_wall("n", 1, 2, true);
} catch (e) {
assert.ok(e.reeborg_shouts, "reeborg_shouts");
assert.equal(e.name, "ReeborgError", "error name ok");
}
assert.end();
});

I make use of "assert.plan()" to ensure that the number of assertions tested matches my expectations.

It was only after I had used tape for a while that I found out that it was also written by James Halliday.

tape can be installed using npm.

Use faucet for formatting of unit test results

Tape's output is in the TAP format (Test Anything Protocol) which, by default, is extremely verbose. Most often, it is recommended to pipe the results into formatters which produce more readable results.

Depending on what I am doing, I use different formatters, some more verbose than others. After trying out about a dozen formatters, I now use faucet by default. faucet can be installed using npm and has been written by, ... you guessed it, James Halliday.

Use QUnit for integration testing

Unit tests are fine, but they miss problems arising from putting all the code together. I used different strategies to do integration testing, all of which seem to create almost more problems than they solved, until I stumbled upon a very easy way that just works for me. Using a Python script, I take the single html file for my site, put all the code inside an html div with display set to none, insert some qunit code and my own tests, and let everything run.

Optional: use Madge for identifying circular dependencies

To help identify potential problems with circular dependencies, I use madge, which can be installed with npm.

There is one remaining dependency in my code, which I silence by not inserting a require() call in one of my modules: when the site is initialized, I want to draw a default version of the world which I by calling functions in the drawing module when loading some images. Later, when calling the drawing module, I do need the definitions found in the module where I load the images. I could get rid of the dependencies at the cost of duplicating some code ... but since the initializing of the site and the execution of user-entered code are done in separate phases, the circular dependency does not cause any problems.

Optional: use Dependo to identify any overlooked module


The image of the tangled mess of modules shown above was created using dependo. As I was refactoring code and adding various require() statement, dependo was helpful in identifying any module not included, either because they had been accidently forgotten or because they had become irrelevant. dependo can also be installed using npm.

Optional: use JSDoc for creating an API

While I do not particularly like it, as I cannot figure out how to extend it to address my particular needs, I found that jsdoc useful to produce an API for people wanting to use advanced features in creating unusual programming tasks (aka "worlds"). When I started using it, there did not seem to be any easy way to use Sphinx to create such API. I gather that this might no longer be the case ... but it would likely require too much effort to make the change at this point.

jsdoc can also be installed using npm.

Use jshint instead of jslint

A linter can often be useful in identifying potential or real problems with some code. When I started working on this project, the only linter I knew was jslint. jshint is friendlier and more configurable to use, and is my preferred choice. And, you guessed it, jshint can be installed using npm.

Last thoughts

There might very well be other tools that would be better for your own projects but, if you love Python and find yourself not overly enthusiastic at the thought of adopting the Javascript way when working on a project that requires Javascript, you might find that the tools I use match more closely the way you do things with Python. Or not.



[1] Currently, programs can be written in Python, Javascript, using blockly, or in Python using a REPL.

[2] Language support can mean one of two things: the programming library for users (like using "avance()" in French as the equivalent of "move()" in English), or the UI, or both. Currently, French and English are implemented for both, while Korean and Polish are only available for the UI. Work is underway to provide Chinese support for both.

[3] The tutorial can be found here; you can change the default language using the side-bar on the right. The repository is at https://github.com/aroberge/reeborg-docs. The tutorial is currently available in French, English and Korean, with additional languages in the works.

[4] https://github.com/aroberge/reeborg-howto is a site aimed at creators of advanced tasks for Reeborg's World. It currently has very little content, but more will be migrated from https://github.com/aroberge/reeborg-world-creation, which was written as an online book (a format which I found to be unsatisfactory).

[5] https://github.com/aroberge/reeborg-api is a documentation site for the API that creators of advanced tasks can use.

[6]





21 Jun 2018 1:19pm GMT

Dan Crosta: Flask-PyMongo, Back from the Dead

Sprouting seeds homesteading.com

Long ago, when I worked at MongoDB I created Flask-PyMongo to make it easy for programmers using Flask to use the database. Fast forward almost 8 years, during which time I wasn't a consistent user of either Flask or MongoDB, and Flask-PyMongo has fallen into disrepair.

MongoDB, PyMongo, and Flask have moved on, and Flask-PyMongo hasn't been kept up to date. There are more than twice as many forks as pull requests, a GitHub ratio I'm not proud of. Fortunately, the future for Flask-PyMongo is bright.

False Starts and New Beginnings

At PyCon US 2017, I first had the idea to restore Flask-PyMongo. PyCon always has this effect on me, but sadly the effect is often short lived. In 2017, I got as far as mentioning plans for a 2.0 release, but did not go into any detail, nor begin to make any progress toward that goal.

This year at PyCon 2018, I once again had the urge to work on Flask-PyMongo. I had the same conversation with Jesse, decided, again, to jump from 0.x to 2.0, and even came up with the same technical plan as the previous year (about which more below). All without realizing that I had been down this road once before.

I can now confidently say that Flask-PyMongo 2.0 is (soon to be) a real thing, and it will set the stage for easier maintainability into the future and a better experience for users and contributors. Flask-PyMongo 2.0 will be released in early July, and pre-release versions are available today.

What's Changing

Flask-PyMongo 2.0 is not backwards compatible!

A lot of the historical problems with Flask-PyMongo have centered on the confusing and difficult configuration system. Originally, I envisioned that users would want configuration abstracted from PyMongo itself, and created a system where you could set Flask configurations for MONGO_HOST, MONGO_PORT, and MONGO_DBNAME, and be off to the races. For a while this worked, and many users seemed to like it. Unfortunately, there are quite a lot of configuration options for PyMongo, so the list of configurations grew. Worse, PyMongo and MongoDB are under active development, and grow and lose features over time. Attempts to make Flask-PyMongo version-agnostic added tremendous complexity to the configuration system, and evidently frustrated many users over the years.

In any event, it turns out that there's a better way to configure PyMongo -- with MongoDB URIs. Most hosted MongoDB services already provide connection information in exactly this format. Going forward in 2.0, MongoDB URIs are the preferred configuration method for Flask-PyMongo. Flask-PyMongo will only look for or respect a single Flask configuration variable, MONGO_URI.

If you prefer, you may also pass positional and keyword arguments directly to Flask-PyMongo, which will be passed through to the underlying PyMongo MongoClient object.
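
A minimal sketch of URI-based configuration (the URI and database name below are placeholders of mine, not from the project docs):

from flask import Flask
from flask_pymongo import PyMongo

app = Flask(__name__)

# MONGO_URI is the single Flask configuration variable consulted in 2.0;
# the URI here is a placeholder for your own connection string.
app.config["MONGO_URI"] = "mongodb://localhost:27017/example_db"
mongo = PyMongo(app)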

Flask-PyMongo no longer supports configuring multiple instances via Flask configuration. If you wish to use multiple Flask-PyMongo instances, you must configure at least some of them using a URI or direct argument passing.

Flask-PyMongo 2.0 also clarifies the support policy for versions of Flask, PyMongo, MongoDB, and Python that are supported. For Flask and PyMongo, it supports "recent" versions -- those versions with releases in the preceding 3 years (give or take). For MongoDB, we follow the MongoDB server support policy, and support versions that are not end-of-lifed. For Python, we support 2.7 for as long as it is supported by the CPython core maintainers; and the most recent 3 versions of the 3.x series. For an exact list of supported versions and combinations, see the build matrix.

What You Should Do

If you are a Flask-PyMongo user and you are using the 0.x series, you should immediately pin a particular version. Flask-PyMongo 2.0 is not backwards compatible, so you should take steps to ensure that you don't accidentally break your application.

If you are already using a URI for Flask-PyMongo configuration, or if that is an easy change for you, I would appreciate if you could upgrade, test compatibility, and report any issues on GitHub. You can install Flask-PyMongo 2.0 pre-releases with pip install --pre flask-pymongo. You may also want to follow the general discussion and release notices in issue #110.

I also hope for Flask-PyMongo to be a place that supports Flask and MongoDB with more than just connection assistance. Please suggest ideas and propose contributions!

21 Jun 2018 1:00pm GMT

py.CheckIO: Design Patterns. Part 1

design patterns

Well-structured code with a well-thought-out architecture is your goal, but ordinary books and articles seem too confusing? Then this article is for you! We've used the simplest analogies to describe two classic design patterns: Abstract Factory and Strategy. The article also includes Python code examples of the patterns' implementations and links to the coding challenges, so you can practice right away and understand how the patterns work once and for all.

21 Jun 2018 9:43am GMT

20 Jun 2018

feedPlanet Python

NumFOCUS: Ethical Algorithms — Notes from the DISC Unconference

The post Ethical Algorithms - Notes from the DISC Unconference appeared first on NumFOCUS.

20 Jun 2018 8:02pm GMT

Curtis Miller: Learn Basic Python and scikit-learn Machine Learning Hands-On with My Course: Training Your Systems with Python Statistical Modelling

In this course I cover statistics and machine learning topics. The course assumes little knowledge about what statistics or machine learning involves. I touch lightly on the theory of statistics and machine learning to motivate the tasks performed in the videos.

20 Jun 2018 5:10pm GMT

PyCharm: PyCharm 2018.2 EAP 4

We're now in our fourth installment of a pretty big 2018.2 Early Access Program cycle. There is lots to take a look at; download EAP 4 from our website.

New in PyCharm 2018.2 EAP 4

Pipenv support

We know many of you have been waiting for this for a long time, so here you go: Pipenv is supported in PyCharm 2018.2. There is still a lot of work before we finally release stable PyCharm 2018.2, so your input with bug reports or suggestions is very welcome in our issue tracker.

Currently supported Pipenv-related features in PyCharm:

pytest-bdd Support

In this EAP we introduce initial support for pytest-bdd. To enable pytest-bdd support, open the BDD settings dialog (File | Settings/Preferences | Languages & Frameworks | BDD) and from the Preferred BDD framework list select pytest-bdd. We're continuing to work on pytest-bdd support, so your input is much appreciated.

More details on pytest-bdd support in PyCharm

Type hints validation

Any time you're applying type hints, PyCharm checks if the type is used correctly. If there is a usage error, the corresponding warning is shown and the recommended action is suggested.
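
For instance, here is the kind of mismatch such a check is designed to catch (a made-up snippet, not one from the release notes):

def greet(name: str) -> str:
    return 'Hello, ' + name

greet(42)  # flagged: expected type 'str', got 'int' instead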

Learn more about type hints validation in PyCharm

New Front-End Development Functionality

As you might already know, PyCharm bundles all features available in WebStorm, a front-end development IDE by JetBrains. PyCharm EAP 4 adds several WebStorm EAP features:

PyCharm 2018.2 EAP 4 Release Notes

Interested?

Download this EAP from our website. Alternatively, you can use the JetBrains Toolbox App to stay up to date throughout the entire EAP.

If you're on Ubuntu 16.04 or later, you can use snap to get PyCharm EAP, and stay up to date. You can find the installation instructions on our website.

PyCharm 2018.2 is in development during the EAP phase, therefore not all new features are already available. More features will be added in the coming weeks. As PyCharm 2018.2 is pre-release software, it is not as stable as the release versions. Furthermore, we may decide to change and/or drop certain features as the EAP progresses.

All EAP versions will ship with a built-in EAP license, which means that these versions are free to use for 30 days after the day that they are built. As EAPs are released weekly, you'll be able to use PyCharm Professional Edition EAP for free for the duration of the EAP program, as long as you upgrade at least once every 30 days.

20 Jun 2018 4:12pm GMT

Real Python: Operators and Expressions in Python

After finishing our previous tutorial on Python variables in this series, you should now have a good grasp of creating and naming Python objects of different types. Let's do some work with them!

Here's what you'll learn in this tutorial: You'll see how calculations can be performed on objects in Python. By the end of this tutorial, you will be able to create complex expressions by combining objects and operators.

Get Notified: Don't miss the follow-up to this tutorial. Click here to join the Real Python Newsletter and you'll know when the next installment comes out.

In Python, operators are special symbols that designate that some sort of computation should be performed. The values that an operator acts on are called operands.

Here is an example:

>>> a = 10
>>> b = 20
>>> a + b
30

In this case, the + operator adds the operands a and b together. An operand can be either a literal value or a variable that references an object:

>>> a = 10
>>> b = 20
>>> a + b - 5
25

A sequence of operands and operators, like a + b - 5, is called an expression. Python supports many operators for combining data objects into expressions. These are explored below.

Arithmetic Operators

The following table lists the arithmetic operators supported by Python:

Operator    Example  Meaning         Result
+ (unary)   +a       Unary Positive  a (in other words, it doesn't really do anything; it mostly exists for the sake of completeness, to complement Unary Negation)
+ (binary)  a + b    Addition        Sum of a and b
- (unary)   -a       Unary Negation  Value equal to a but opposite in sign
- (binary)  a - b    Subtraction     b subtracted from a
*           a * b    Multiplication  Product of a and b
/           a / b    Division        Quotient when a is divided by b (the result always has type float)
%           a % b    Modulus         Remainder when a is divided by b
//          a // b   Floor Division (also called Integer Division)  Quotient when a is divided by b, rounded to the next smallest whole number
**          a ** b   Exponentiation  a raised to the power of b

Here are some examples of these operators in use:

>>> a = 4
>>> b = 3
>>> +a
4
>>> -b
-3
>>> a + b
7
>>> a - b
1
>>> a * b
12
>>> a / b
1.3333333333333333
>>> a % b
1
>>> a ** b
64

The result of standard division (/) is always a float, even if the dividend is evenly divisible by the divisor:

>>> 10 / 5
2.0
>>> type(10 / 5)
<class 'float'>

When the result of floor division (//) is positive, it is as though the fractional portion is truncated off, leaving only the integer portion. When the result is negative, the result is rounded down to the next smallest (greater negative) integer:

>>> 10 / 4
2.5
>>> 10 // 4
2
>>> 10 // -4
-3
>>> -10 // 4
-3
>>> -10 // -4
2

Note, by the way, that in a REPL session, you can display the value of an expression by just typing it in at the >>> prompt without print(), the same as you can with a literal value or variable:

>>> 25
25
>>> x = 4
>>> y = 6
>>> x
4
>>> y
6
>>> x * 25 + y
106

Comparison Operators

Operator  Example  Meaning                   Result
==        a == b   Equal to                  True if the value of a is equal to the value of b, False otherwise
!=        a != b   Not equal to              True if a is not equal to b, False otherwise
<         a < b    Less than                 True if a is less than b, False otherwise
<=        a <= b   Less than or equal to     True if a is less than or equal to b, False otherwise
>         a > b    Greater than              True if a is greater than b, False otherwise
>=        a >= b   Greater than or equal to  True if a is greater than or equal to b, False otherwise

Here are examples of the comparison operators in use:

>>> a = 10
>>> b = 20
>>> a == b
False
>>> a != b
True
>>> a <= b
True
>>> a >= b
False

>>> a = 30
>>> b = 30
>>> a == b
True
>>> a <= b
True
>>> a >= b
True

Comparison operators are typically used in Boolean contexts like conditional and loop statements to direct program flow, as you will see later.
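
For a quick preview of what that looks like, here is a comparison used in a conditional:

>>> a = 10
>>> if a < 20:
...     print('a is less than 20')
...
a is less than 20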

Equality Comparison on Floating-Point Values

Recall from the earlier discussion of floating-point numbers that the value stored internally for a float object may not be precisely what you'd think it would be. For that reason, it is poor practice to compare floating-point values for exact equality. Consider this example:

>>> x = 1.1 + 2.2
>>> x == 3.3
False

Yikes! The internal representations of the addition operands are not exactly equal to 1.1 and 2.2, so you cannot rely on x to compare exactly to 3.3.

The preferred way to determine whether two floating-point values are "equal" is to compute whether they are close to one another, given some tolerance. Take a look at this example:

>>> tolerance = 0.00001
>>> x = 1.1 + 2.2
>>> abs(x - 3.3) < tolerance
True

abs() returns absolute value. If the absolute value of the difference between the two numbers is less than the specified tolerance, they are close enough to one another to be considered equal.

Logical Operators

The logical operators not, or, and and modify and join together expressions evaluated in Boolean context to create more complex conditions.

Logical Expressions Involving Boolean Operands

As you have seen, some objects and expressions in Python actually are of Boolean type. That is, they are equal to one of the Python objects True or False. Consider these examples:

>>> x = 5
>>> x < 10
True
>>> type(x < 10)
<class 'bool'>

>>> t = x > 10
>>> t
False
>>> type(t)
<class 'bool'>

>>> callable(x)
False
>>> type(callable(x))
<class 'bool'>

>>> t = callable(len)
>>> t
True
>>> type(t)
<class 'bool'>

In the examples above, x < 10, callable(x), and t are all Boolean objects or expressions.

Interpretation of logical expressions involving not, or, and and is straightforward when the operands are Boolean:

Operator  Example  Meaning
not       not x    True if x is False, False if x is True (logically reverses the sense of x)
or        x or y   True if either x or y is True, False otherwise
and       x and y  True if both x and y are True, False otherwise

Take a look at how they work in practice below.

"not" and Boolean Operands

>>> x = 5
>>> not x < 10
False
>>> not callable(x)
True

Operand      Value  Logical Expression  Value
x < 10       True   not x < 10          False
callable(x)  False  not callable(x)     True

"or" and Boolean Operands

>>> x = 5
>>> x < 10 or callable(x)
True
>>> x < 0 or callable(x)
False

Operand  Value  Operand      Value  Logical Expression     Value
x < 10   True   callable(x)  False  x < 10 or callable(x)  True
x < 0    False  callable(x)  False  x < 0 or callable(x)   False

"and" and Boolean Operands

>>> x = 5
>>> x < 10 and callable(x)
False
>>> x < 10 and callable(len)
True

Operand  Value  Operand        Value  Logical Expression        Value
x < 10   True   callable(x)    False  x < 10 and callable(x)    False
x < 10   True   callable(len)  True   x < 10 and callable(len)  True

Evaluation of Non-Boolean Values in Boolean Context

Many objects and expressions are not equal to True or False. Nonetheless, they may still be evaluated in Boolean context and determined to be "truthy" or "falsy."

So what is true and what isn't? As a philosophical question, that is outside the scope of this tutorial!

But in Python, it is well-defined. All the following are considered false when evaluated in Boolean context: the Boolean value False, the special value None, any numeric zero value, an empty string, and an empty object of a built-in composite type (list, tuple, dict, or set).

Virtually any other object built into Python is regarded as true.

You can determine the "truthiness" of an object or expression with the built-in bool() function. bool() returns True if its argument is truthy and False if it is falsy.

Numeric Value

A zero value is false.
A non-zero value is true.

>>> print(bool(0), bool(0.0), bool(0.0+0j))
False False False

>>> print(bool(-3), bool(3.14159), bool(1.0+1j))
True True True

String

An empty string is false.
A non-empty string is true.

>>> print(bool(''), bool(""), bool(""""""))
False False False

>>> print(bool('foo'), bool(" "), bool(''' '''))
True True True

Built-In Composite Data Object

Python provides built-in composite data types called list, tuple, dict, and set. These are "container" types that contain other objects. An object of one of these types is considered false if it is empty and true if it is non-empty.

The examples below demonstrate this for the list type. (Lists are defined in Python with square brackets.)

For more information on the list, tuple, dict, and set types, see the upcoming tutorials.

>>> type([])
<class 'list'>
>>> bool([])
False

>>> type([1, 2, 3])
<class 'list'>
>>> bool([1, 2, 3])
True

The "None" Keyword

None is always false:

>>> bool(None)
False

Logical Expressions Involving Non-Boolean Operands

Non-Boolean values can also be modified and joined by not, or, and and. The result depends on the "truthiness" of the operands.

"not" and Non-Boolean Operands

Here is what happens for a non-Boolean value x:

If x is not x is
"truthy" False
"falsy" True

Here are some concrete examples:

>>> x = 3
>>> bool(x)
True
>>> not x
False

>>> x = 0.0
>>> bool(x)
False
>>> not x
True

"or" and Non-Boolean Operands

This is what happens for two non-Boolean values x and y:

If x is x or y is
truthy x
falsy y

Note that in this case, the expression x or y does not evaluate to either True or False, but instead to one of either x or y:

>>> x = 3
>>> y = 4
>>> x or y
3

>>> x = 0.0
>>> y = 4.4
>>> x or y
4.4

Even so, it is still the case that the expression x or y will be truthy if either x or y is truthy, and falsy if both x and y are falsy.

"and" and Non-Boolean Operands

Here's what you'll get for two non-Boolean values x and y:

If x is   x and y is
"truthy"  y
"falsy"   x

>>> x = 3
>>> y = 4
>>> x and y
4

>>> x = 0.0
>>> y = 4.4
>>> x and y
0.0

As with or, the expression x and y does not evaluate to either True or False, but instead to one of either x or y. x and y will be truthy if both x and y are truthy, and falsy otherwise.

Compound Logical Expressions and Short-Circuit Evaluation

So far, you have seen expressions with only a single or or and operator and two operands:

x or y
x and y

Multiple logical operators and operands can be strung together to form compound logical expressions.

Compound "or" Expressions

Consider the following expression:

x1 or x2 or x3 or … or xn

This expression is true if any of the xi are true.

In an expression like this, Python uses a methodology called short-circuit evaluation, also called McCarthy evaluation in honor of computer scientist John McCarthy. The xi operands are evaluated in order from left to right. As soon as one is found to be true, the entire expression is known to be true. At that point, Python stops and no more terms are evaluated. The value of the entire expression is that of the xi that terminated evaluation.

To help demonstrate short-circuit evaluation, suppose that you have a simple "identity" function f() that behaves as follows:

(You will see how to define such a function in the upcoming tutorial on Functions.)
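
A minimal definition consistent with that behavior (the exact implementation here is an illustrative guess):

def f(arg):
    # Display the argument to the console, then return it unchanged.
    print(f'-> f({arg}) = {arg}')
    return arg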

Several example calls to f() are shown below:

>>> f(0)
-> f(0) = 0
0

>>> f(False)
-> f(False) = False
False

>>> f(1.5)
-> f(1.5) = 1.5
1.5

Because f() simply returns the argument passed to it, we can make the expression f(arg) be truthy or falsy as needed by specifying a value for arg that is appropriately truthy or falsy. Additionally, f() displays its argument to the console, which visually confirms whether or not it was called.

Now, consider the following compound logical expression:

>>> f(0) or f(False) or f(1) or f(2) or f(3)
-> f(0) = 0
-> f(False) = False
-> f(1) = 1
1

The interpreter first evaluates f(0), which is 0. A numeric value of 0 is false. The expression is not true yet, so evaluation proceeds left to right. The next operand, f(False), returns False. That is also false, so evaluation continues.

Next up is f(1). That evaluates to 1, which is true. At that point, the interpreter stops because it now knows the entire expression to be true. 1 is returned as the value of the expression, and the remaining operands, f(2) and f(3), are never evaluated. You can see from the display that the f(2) and f(3) calls do not occur.

Compound "and" Expressions

A similar situation exists in an expression with multiple and operators:

x1 and x2 and x3 and … and xn

This expression is true if all the xi are true.

In this case, short-circuit evaluation dictates that the interpreter stop evaluating as soon as any operand is found to be false, because at that point the entire expression is known to be false. Once that is the case, no more operands are evaluated, and the falsy operand that terminated evaluation is returned as the value of the expression:

>>> f(1) and f(False) and f(2) and f(3)
-> f(1) = 1
-> f(False) = False
False

>>> f(1) and f(0.0) and f(2) and f(3)
-> f(1) = 1
-> f(0.0) = 0.0
0.0

In both examples above, evaluation stops at the first term that is false (f(False) in the first case, f(0.0) in the second case), and neither the f(2) nor f(3) call occurs. False and 0.0, respectively, are returned as the value of the expression.

If all the operands are truthy, they all get evaluated and the last (rightmost) one is returned as the value of the expression:

>>> f(1) and f(2.2) and f('bar')
-> f(1) = 1
-> f(2.2) = 2.2
-> f(bar) = bar
'bar'

Idioms That Exploit Short-Circuit Evaluation

There are some common idiomatic patterns that exploit short-circuit evaluation for conciseness of expression.

Avoiding an Exception

Suppose you have defined two variables a and b, and you want to know whether (b / a) > 0:

>>> a = 3
>>> b = 1
>>> (b / a) > 0
True

But you need to account for the possibility that a might be 0, in which case the interpreter will raise an exception:

>>> a = 0
>>> b = 1
>>> (b / a) > 0
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    (b / a) > 0
ZeroDivisionError: division by zero

You can avoid an error with an expression like this:

>>> a = 0
>>> b = 1
>>> a != 0 and (b / a) > 0
False

When a is 0, a != 0 is false. Short-circuit evaluation ensures that evaluation stops at that point. (b / a) is not evaluated, and no error is raised.

In fact, you can be even more concise than that. When a is 0, the expression a by itself is falsy. There is no need for the explicit comparison a != 0:

>>> a = 0
>>> b = 1
>>> a and (b / a) > 0
0

Selecting a Default Value

Another idiom involves selecting a default value when a specified value is zero or empty. For example, suppose you want to assign a variable s to the value contained in another variable called string. But if string is empty, you want to supply a default value.

Here is a concise way of expressing this using short-circuit evaluation:

s = string or '<default_value>'

If string is non-empty, it is truthy, and the expression string or '<default_value>' will be true at that point. Evaluation stops, and the value of string is returned and assigned to s:

>>> string = 'foo bar'
>>> s = string or '<default_value>'
>>> s
'foo bar'

On the other hand, if string is an empty string, it is falsy. Evaluation of string or '<default_value>' continues to the next operand, '<default_value>', which is returned and assigned to s:

>>> string = ''
>>> s = string or '<default_value>'
>>> s
'<default_value>'

Chained Comparisons

Comparison operators can be chained together to arbitrary length. For example, the following expressions are nearly equivalent:

x < y <= z
x < y and y <= z

They will both evaluate to the same Boolean value. The subtle difference between the two is that in the chained comparison x < y <= z, y is evaluated only once. The longer expression x < y and y <= z will cause y to be evaluated twice.
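
For example:

>>> x, y, z = 1, 2, 3
>>> x < y <= z
True
>>> x < y and y <= z
True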

Note: In cases where y is a static value, this will not be a significant distinction. But consider these expressions:

x < f() <= z
x < f() and f() <= z

If f() is a function that causes program data to be modified, the difference between its being called once in the first case and twice in the second case may be important.

More generally, if op1, op2, …, opn are comparison operators, then the following have the same Boolean value:

x1 op1 x2 op2 x3 … xn-1 opn xn

x1 op1 x2 and x2 op2 x3 and … xn-1 opn xn

In the former case, each xi is only evaluated once. In the latter case, each will be evaluated twice except the first and last, unless short-circuit evaluation causes premature termination.

Bitwise Operators

Bitwise operators treat operands as sequences of binary digits and operate on them bit by bit. The following operators are supported:

Operator Example Meaning Result
& a & b bitwise AND Each bit position in the result is the logical AND of the bits in the corresponding position of the operands. (1 if both are 1, otherwise 0.)
| a | b bitwise OR Each bit position in the result is the logical OR of the bits in the corresponding position of the operands. (1 if either is 1, otherwise 0.)
~ ~a bitwise negation Each bit position in the result is the logical negation of the bit in the corresponding position of the operand. (1 if 0, 0 if 1.)
^ a ^ b bitwise XOR (exclusive OR) Each bit position in the result is the logical XOR of the bits in the corresponding position of the operands. (1 if the bits in the operands are different, 0 if they are the same.)
>> a >> n Shift right n places Each bit is shifted right n places.
<< a << n Shift left n places Each bit is shifted left n places.

Here are some examples:

>>> '0b{:04b}'.format(0b1100 & 0b1010)
'0b1000'
>>> '0b{:04b}'.format(0b1100 | 0b1010)
'0b1110'
>>> '0b{:04b}'.format(0b1100 ^ 0b1010)
'0b0110'
>>> '0b{:04b}'.format(0b1100 >> 2)
'0b0011'
>>> '0b{:04b}'.format(0b0011 << 2)
'0b1100'

Note: The purpose of the '0b{:04b}'.format() is to format the numeric output of the bitwise operations, to make them easier to read. You will see the format() method in much more detail later. For now, just pay attention to the operands of the bitwise operations, and the results.

Identity Operators

Python provides two operators, is and is not, that determine whether the given operands have the same identity, that is, whether they refer to the same object. This is not the same thing as equality, which means the two operands refer to objects that contain the same data but are not necessarily the same object.

Here is an example of two objects that are equal but not identical:

>>> x = 1001
>>> y = 1000 + 1
>>> print(x, y)
1001 1001

>>> x == y
True
>>> x is y
False

Here, x and y both refer to objects whose value is 1001. They are equal. But they do not reference the same object, as you can verify:

>>> id(x)
60307920
>>> id(y)
60307936

x and y do not have the same identity, and x is y returns False.

You saw previously that when you make an assignment like x = y, Python merely creates a second reference to the same object, and that you could confirm that fact with the id() function. You can also confirm it using the is operator:

>>> a = 'I am a string'
>>> b = a
>>> id(a)
55993992
>>> id(b)
55993992

>>> a is b
True
>>> a == b
True

In this case, since a and b reference the same object, it stands to reason that a and b would be equal as well.

Unsurprisingly, the opposite of is is is not:

>>> x = 10
>>> y = 20
>>> x is not y
True

Operator Precedence

Consider this expression:

>>> 20 + 4 * 10
60

There is ambiguity here. Should Python perform the addition 20 + 4 first and then multiply the sum by 10? Or should the multiplication 4 * 10 be performed first, and the addition of 20 second?

Clearly, since the result is 60, Python has chosen the latter; if it had chosen the former, the result would be 240. This is standard algebraic procedure, found in virtually all programming languages.

All operators that the language supports are assigned a precedence. In an expression, all operators of highest precedence are performed first. Once those results are obtained, operators of the next highest precedence are performed. So it continues, until the expression is fully evaluated. Any operators of equal precedence are performed in left-to-right order.

Here is the order of precedence of the Python operators you have seen so far, from lowest to highest:

                    Operator                          Description
lowest precedence   or                                Boolean OR
                    and                               Boolean AND
                    not                               Boolean NOT
                    ==, !=, <, <=, >, >=, is, is not  comparisons, identity
                    |                                 bitwise OR
                    ^                                 bitwise XOR
                    &                                 bitwise AND
                    <<, >>                            bit shifts
                    +, -                              addition, subtraction
                    *, /, //, %                       multiplication, division, floor division, modulo
                    +x, -x, ~x                        unary positive, unary negation, bitwise negation
highest precedence  **                                exponentiation

Operators at the top of the table have the lowest precedence, and those at the bottom of the table have the highest. Any operators in the same row of the table have equal precedence.

It is clear why multiplication is performed first in the example above: multiplication has a higher precedence than addition.

Similarly, in the example below, 3 is raised to the power of 4 first, which equals 81, and then the multiplications are carried out in order from left to right (2 * 81 * 5 = 810):

>>> 2 * 3 ** 4 * 5
810

Operator precedence can be overridden using parentheses. Expressions in parentheses are always performed first, before expressions that are not parenthesized. Thus, the following happens:

>>> 20 + 4 * 10
60
>>> (20 + 4) * 10
240

>>> 2 * 3 ** 4 * 5
810
>>> 2 * 3 ** (4 * 5)
6973568802

In the first example, 20 + 4 is computed first, then the result is multiplied by 10. In the second example, 4 * 5 is calculated first, then 3 is raised to that power, then the result is multiplied by 2.

There is nothing wrong with making liberal use of parentheses, even when they aren't necessary to change the order of evaluation. In fact, it is considered good practice, because it can make the code more readable, and it relieves the reader of having to recall operator precedence from memory. Consider the following:

(a < 10) and (b > 30)

Here the parentheses are fully unnecessary, as the comparison operators have higher precedence than and does and would have been performed first anyhow. But some might consider the intent of the parenthesized version more immediately obvious than this version without parentheses:

a < 10 and b > 30

On the other hand, there are probably those who would prefer the latter; it's a matter of personal preference. The point is, you can always use parentheses if you feel it makes the code more readable, even if they aren't necessary to change the order of evaluation.

Augmented Assignment Operators

You have seen that a single equal sign (=) is used to assign a value to a variable. It is, of course, perfectly viable for the value to the right of the assignment to be an expression containing other variables:

>>> a = 10
>>> b = 20
>>> c = a * 5 + b
>>> c
70

In fact, the expression to the right of the assignment can include references to the variable that is being assigned to:

>>> a = 10
>>> a = a + 5
>>> a
15

>>> b = 20
>>> b = b * 3
>>> b
60

The first example is interpreted as "a is assigned the current value of a plus 5," effectively increasing the value of a by 5. The second reads "b is assigned the current value of b times 3," effectively increasing the value of b threefold.

Of course, this sort of assignment only makes sense if the variable in question has already previously been assigned a value:

>>> z = z / 12
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    z = z / 12
NameError: name 'z' is not defined

Python supports a shorthand augmented assignment notation for these arithmetic and bitwise operators:

Arithmetic: +  -  *  /  %  //  **
Bitwise:    &  |  ^  >>  <<

For these operators, the following are equivalent:

x <op>= y
x = x <op> y

Take a look at these examples:

Augmented Assignment                      Standard Assignment
a += 5                is equivalent to    a = a + 5
a /= 10               is equivalent to    a = a / 10
a ^= b                is equivalent to    a = a ^ b
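
For example:

>>> a = 10
>>> a += 5
>>> a
15
>>> a //= 4
>>> a
3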

Conclusion

In this tutorial, you learned about the diverse operators Python supports to combine objects into expressions.

Most of the examples you have seen so far have involved only simple atomic data, but you saw a brief introduction to the string data type. The next tutorial will explore string objects in much more detail.

Get Notified: Don't miss the follow-up to this tutorial. Click here to join the Real Python Newsletter and you'll know when the next installment comes out.


[ Improve Your Python With 🐍 Python Tricks 💌 - Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

20 Jun 2018 2:00pm GMT

Real Python: Operators and Expressions in Python

After finishing our previous tutorial on Python variables in this series, you should now have a good grasp of creating and naming Python objects of different types. Let's do some work with them!

Here's what you'll learn in this tutorial: You'll see how calculations can be performed on objects in Python. By the end of this tutorial, you will be able to create complex expressions by combining objects and operators.

Get Notified: Don't miss the follow up to this tutorial-Click here to join the Real Python Newsletter and you'll know when the next installment comes out.

In Python, operators are special symbols that designate that some sort of computation should be performed. The values that an operator acts on are called operands.

Here is an example:

>>> a = 10
>>> b = 20
>>> a + b
30

In this case, the + operator adds the operands a and b together. An operand can be either a literal value or a variable that references an object:

>>> a = 10
>>> b = 20
>>> a + b - 5
25

A sequence of operands and operators, like a + b - 5, is called an expression. Python supports many operators for combining data objects into expressions. These are explored below.

Arithmetic Operators

The following table lists the arithmetic operators supported by Python:

Operator Example Meaning Result
+ (unary) +a Unary Positive a
In other words, it doesn't really do anything. It mostly exists for the sake of completeness, to complement Unary Negation.
+ (binary) a + b Addition Sum of a and b
- (unary) -a Unary Negation Value equal to a but opposite in sign
- (binary) a - b Subtraction b subtracted from a
* a * b Multiplication Product of a and b
/ a / b Division Quotient when a is divided by b.
The result always has type float.
% a % b Modulus Remainder when a is divided by b
// a // b Floor Division (also called Integer Division) Quotient when a is divided by b, rounded to the next smallest whole number
** a ** b Exponentiation a raised to the power of b

Here are some examples of these operators in use:

>>> a = 4
>>> b = 3
>>> +a
4
>>> -b
-3
>>> a + b
7
>>> a - b
1
>>> a * b
12
>>> a / b
1.3333333333333333
>>> a % b
1
>>> a ** b
64

The result of standard division (/) is always a float, even if the dividend is evenly divisible by the divisor:

>>> 10 / 5
2.0
>>> type(10 / 5)
<class 'float'>

When the result of floor division (//) is positive, it is as though the fractional portion is truncated off, leaving only the integer portion. When the result is negative, the result is rounded down to the next smallest (greater negative) integer:

>>> 10 / 4
2.5
>>> 10 // 4
2
>>> 10 // -4
-3
>>> -10 // 4
-3
>>> -10 // -4
2

Note, by the way, that in a REPL session, you can display the value of an expression by just typing it in at the >>> prompt without print(), the same as you can with a literal value or variable:

>>> 25
25
>>> x = 4
>>> y = 6
>>> x
4
>>> y
6
>>> x * 25 + y
106

Comparison Operators

Operator Example Meaning Result
== a == b Equal to True if the value of a is equal to the value of b
False otherwise
!= a != b Not equal to True if a is not equal to b
False otherwise
< a < b Less than True if a is less than b
False otherwise
<= a <= b Less than or equal to True if a is less than or equal to b
False otherwise
> a > b Greater than True if a is greater than b
False otherwise
>= a >= b Greater than or equal to True if a is greater than or equal to b
False otherwise

Here are examples of the comparison operators in use:

>>> a = 10
>>> b = 20
>>> a == b
False
>>> a != b
True
>>> a <= b
True
>>> a >= b
False

>>> a = 30
>>> b = 30
>>> a == b
True
>>> a <= b
True
>>> a >= b
True

Comparison operators are typically used in Boolean contexts like conditional and loop statements to direct program flow, as you will see later.

Equality Comparison on Floating-Point Values

Recall from the earlier discussion of floating-point numbers that the value stored internally for a float object may not be precisely what you'd think it would be. For that reason, it is poor practice to compare floating-point values for exact equality. Consider this example:

>>> x = 1.1 + 2.2
>>> x == 3.3
False

Yikes! The internal representations of the addition operands are not exactly equal to 1.1 and 2.2, so you cannot rely on x to compare exactly to 3.3.

The preferred way to determine whether two floating-point values are "equal" is to compute whether they are close to one another, given some tolerance. Take a look at this example:

>>> tolerance = 0.00001
>>> x = 1.1 + 2.2
>>> abs(x - 3.3) < tolerance
True

abs() returns absolute value. If the absolute value of the difference between the two numbers is less than the specified tolerance, they are close enough to one another to be considered equal.

Logical Operators

The logical operators not, or, and and modify and join together expressions evaluated in Boolean context to create more complex conditions.

Logical Expressions Involving Boolean Operands

As you have seen, some objects and expressions in Python actually are of Boolean type. That is, they are equal to one of the Python objects True or False. Consider these examples:

>>> x = 5
>>> x < 10
True
>>> type(x < 10)
<class 'bool'>

>>> t = x > 10
>>> t
False
>>> type(t)
<class 'bool'>

>>> callable(x)
False
>>> type(callable(x))
<class 'bool'>

>>> t = callable(len)
>>> t
True
>>> type(t)
<class 'bool'>

In the examples above, x < 10, callable(x), and t are all Boolean objects or expressions.

Interpretation of logical expressions involving not, or, and and is straightforward when the operands are Boolean:

Operator Example Meaning
not not x True if x is False
False if x is True
(Logically reverses the sense of x)
or x or y True if either x or y is True
False otherwise
and x and y True if both x and y are True
False otherwise

Take a look at how they work in practice below.

"not" and Boolean Operands

x = 5
not x < 10
False
not callable(x)
True
Operand Value Logical Expression Value
x < 10 True not x < 10 False
callable(x) False not callable(x) True

"or" and Boolean Operands

x = 5
x < 10 or callable(x)
True
x < 0 or callable(x)
False
Operand Value Operand Value Logical Expression Value
x < 10 True callable(x) False x < 10 or callable(x) True
x < 0 False callable(x) False x < 0 or callable(x) False

"and" and Boolean Operands

x = 5
x < 10 and callable(x)
False
x < 10 and callable(len)
True
Operand Value Operand Value Logical Expression Value
x < 10 True callable(x) False x < 10 and callable(x) False
x < 10 True callable(len) True x < 10 or callable(len) True

Evaluation of Non-Boolean Values in Boolean Context

Many objects and expressions are not equal to True or False. Nonetheless, they may still be evaluated in Boolean context and determined to be "truthy" or "falsy."

So what is true and what isn't? As a philosophical question, that is outside the scope of this tutorial!

But in Python, it is well-defined. All the following are considered false when evaluated in Boolean context:

Virtually any other object built into Python is regarded as true.

You can determine the "truthiness" of an object or expression with the built-in bool() function. bool() returns True if its argument is truthy and False if it is falsy.

Numeric Value

A zero value is false.
A non-zero value is true.

>>> print(bool(0), bool(0.0), bool(0.0+0j))
False False False

>>> print(bool(-3), bool(3.14159), bool(1.0+1j))
True True True

String

An empty string is false.
A non-empty string is true.

>>> print(bool(''), bool(""), bool(""""""))
False False False

>>> print(bool('foo'), bool(" "), bool(''' '''))
True True True

Built-In Composite Data Object

Python provides built-in composite data types called list, tuple, dict, and set. These are "container" types that contain other objects. An object of one of these types is considered false if it is empty and true if it is non-empty.

The examples below demonstrate this for the list type. (Lists are defined in Python with square brackets.)

For more information on the list, tuple, dict, and set types, see the upcoming tutorials.

>>> type([])
<class 'list'>
>>> bool([])
False

>>> type([1, 2, 3])
<class 'list'>
>>> bool([1, 2, 3])
True

The "None" Keyword

None is always false:

>>> bool(None)
False

Logical Expressions Involving Non-Boolean Operands

Non-Boolean values can also be modified and joined by not, or and, and. The result depends on the "truthiness" of the operands.

"not" and Non-Boolean Operands

Here is what happens for a non-Boolean value x:

If x is not x is
"truthy" False
"falsy" True

Here are some concrete examples:

>>> x = 3
>>> bool(x)
True
>>> not x
False

>>> x = 0.0
>>> bool(x)
False
>>> not x
True

"or" and Non-Boolean Operands

This is what happens for two non-Boolean values x and y:

If x is x or y is
truthy x
falsy y

Note that in this case, the expression x or y does not evaluate to either True or False, but instead to one of either x or y:

>>> x = 3
>>> y = 4
>>> x or y
3

>>> x = 0.0
>>> y = 4.4
>>> x or y
4.4

Even so, it is still the case that the expression x or y will be truthy if either x or y is truthy, and falsy if both x and y are falsy.

"and" and Non-Boolean Operands

Here's what you'll get for two non-Boolean values x and y:

If x is x and y is
"truthy" y
"falsy" x
>>> x = 3
>>> y = 4
>>> x and y
4

>>> x = 0.0
>>> y = 4.4
>>> x and y
0.0

As with or, the expression x and y does not evaluate to either True or False, but instead to one of either x or y. x and y will be truthy if both x and y are truthy, and falsy otherwise.

Compound Logical Expressions and Short-Circuit Evaluation

So far, you have seen expressions with only a single or or and operator and two operands:

x or y
x and y

Multiple logical operators and operands can be strung together to form compound logical expressions.

Compound "or" Expressions

Consider the following expression:

x1 or x2 or x3 orxn

This expression is true if any of the xi are true.

In an expression like this, Python uses a methodology called short-circuit evaluation, also called McCarthy evaluation in honor of computer scientist John McCarthy. The xi operands are evaluated in order from left to right. As soon as one is found to be true, the entire expression is known to be true. At that point, Python stops and no more terms are evaluated. The value of the entire expression is that of the xi that terminated evaluation.

To help demonstrate short-circuit evaluation, suppose that you have a simple "identity" function f() that behaves as follows:

(You will see how to define such a function in the upcoming tutorial on Functions.)

Several example calls to f() are shown below:

>>> f(0)
-> f(0) = 0
0

>>> f(False)
-> f(False) = False
False

>>> f(1.5)
-> f(1.5) = 1.5
1.5

Because f() simply returns the argument passed to it, we can make the expression f(arg) be truthy or falsy as needed by specifying a value for arg that is appropriately truthy or falsy. Additionally, f() displays its argument to the console, which visually confirms whether or not it was called.

Now, consider the following compound logical expression:

>>> f(0) or f(False) or f(1) or f(2) or f(3)
-> f(0) = 0
-> f(False) = False
-> f(1) = 1
1

The interpreter first evaluates f(0), which is 0. A numeric value of 0 is false. The expression is not true yet, so evaluation proceeds left to right. The next operand, f(False), returns False. That is also false, so evaluation continues.

Next up is f(1). That evaluates to 1, which is true. At that point, the interpreter stops because it now knows the entire expression to be true. 1 is returned as the value of the expression, and the remaining operands, f(2) and f(3), are never evaluated. You can see from the display that the f(2) and f(3) calls do not occur.

Compound "and" Expressions

A similar situation exists in an expression with multiple and operators:

x1 and x2 and x3 andxn

This expression is true if all the xi are true.

In this case, short-circuit evaluation dictates that the interpreter stop evaluating as soon as any operand is found to be false, because at that point the entire expression is known to be false. Once that is the case, no more operands are evaluated, and the falsy operand that terminated evaluation is returned as the value of the expression:

>>> f(1) and f(False) and f(2) and f(3)
-> f(1) = 1
-> f(False) = False
False

>>> f(1) and f(0.0) and f(2) and f(3)
-> f(1) = 1
-> f(0.0) = 0.0
0.0

In both examples above, evaluation stops at the first term that is false-f(False) in the first case, f(0.0) in the second case-and neither the f(2) nor f(3) call occurs. False and 0.0, respectively, are returned as the value of the expression.

If all the operands are truthy, they all get evaluated and the last (rightmost) one is returned as the value of the expression:

>>> f(1) and f(2.2) and f('bar')
-> f(1) = 1
-> f(2.2) = 2.2
-> f(bar) = bar
'bar'

Idioms That Exploit Short-Circuit Evaluation

There are some common idiomatic patterns that exploit short-circuit evaluation for conciseness of expression.

Avoiding an Exception

Suppose you have defined two variables a and b, and you want to know whether (b / a) > 0:

>>> a = 3
>>> b = 1
>>> (b / a) > 0
True

But you need to account for the possibility that a might be 0, in which case the interpreter will raise an exception:

>>> a = 0
>>> b = 1
>>> (b / a) > 0
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    (b / a) > 0
ZeroDivisionError: division by zero

You can avoid an error with an expression like this:

>>> a = 0
>>> b = 1
>>> a != 0 and (b / a) > 0
False

When a is 0, a != 0 is false. Short-circuit evaluation ensures that evaluation stops at that point. (b / a) is not evaluated, and no error is raised.

If fact, you can be even more concise than that. When a is 0, the expression a by itself is falsy. There is no need for the explicit comparison a != 0:

>>> a = 0
>>> b = 1
>>> a and (b / a) > 0
0

Selecting a Default Value

Another idiom involves selecting a default value when a specified value is zero or empty. For example, suppose you want to assign a variable s to the value contained in another variable called string. But if string is empty, you want to supply a default value.

Here is a concise way of expressing this using short-circuit evaluation:

s = string or '<default_value>'

If string is non-empty, it is truthy, and the expression string or '<default_value>' will be true at that point. Evaluation stops, and the value of string is returned and assigned to s:

>>> string = 'foo bar'
>>> s = string or '<default_value>'
>>> s
'foo bar'

On the other hand, if string is an empty string, it is falsy. Evaluation of string or '<default_value>' continues to the next operand, '<default_value>', which is returned and assigned to s:

>>> string = ''
>>> s = string or '<default_value>'
>>> s
'<default_value>'
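
One caveat: this idiom substitutes the default for any falsy value, not just an empty string. If 0, an empty list, or None is a legitimate value in your program, the or idiom will silently replace it:

>>> count = 0
>>> n = count or 10
>>> n
10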

Chained Comparisons

Comparison operators can be chained together to arbitrary length. For example, the following expressions are nearly equivalent:

x < y <= z
x < y and y <= z

They will both evaluate to the same Boolean value. The subtle difference between the two is that in the chained comparison x < y <= z, y is evaluated only once. The longer expression x < y and y <= z will cause y to be evaluated twice.

Note: In cases where y is a static value, this will not be a significant distinction. But consider these expressions:

x < f() <= z
x < f() and f() <= z

If f() is a function that causes program data to be modified, the difference between its being called once in the first case and twice in the second case may be important.
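
One quick way to see the difference is to count the calls with a small stand-in function (g() here is hypothetical, used only for illustration):

>>> call_count = 0
>>> def g():
...     global call_count
...     call_count += 1
...     return 5
...
>>> 1 < g() <= 10          # chained comparison: g() is called once
True
>>> call_count
1
>>> 1 < g() and g() <= 10  # expanded form: g() is called twice more
True
>>> call_count
3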

More generally, if op1, op2, …, opn-1 are comparison operators, then the following have the same Boolean value:

x1 op1 x2 op2 x3 … xn-1 opn-1 xn

x1 op1 x2 and x2 op2 x3 and … xn-1 opn-1 xn

In the former case, each xi is evaluated only once. In the latter case, each is evaluated twice except the first and last, unless short-circuit evaluation causes premature termination.

Bitwise Operators

Bitwise operators treat operands as sequences of binary digits and operate on them bit by bit. The following operators are supported:

& (bitwise AND), as in a & b: each bit position in the result is the logical AND of the bits in the corresponding position of the operands (1 if both are 1, otherwise 0).

| (bitwise OR), as in a | b: each bit position in the result is the logical OR of the bits in the corresponding position of the operands (1 if either is 1, otherwise 0).

~ (bitwise negation), as in ~a: each bit position in the result is the logical negation of the bit in the corresponding position of the operand (1 if 0, 0 if 1).

^ (bitwise XOR, exclusive OR), as in a ^ b: each bit position in the result is the logical XOR of the bits in the corresponding position of the operands (1 if the bits in the operands are different, 0 if they are the same).

>> (shift right), as in a >> n: each bit is shifted right n places.

<< (shift left), as in a << n: each bit is shifted left n places.

Here are some examples:

>>> '0b{:04b}'.format(0b1100 & 0b1010)
'0b1000'
>>> '0b{:04b}'.format(0b1100 | 0b1010)
'0b1110'
>>> '0b{:04b}'.format(0b1100 ^ 0b1010)
'0b0110'
>>> '0b{:04b}'.format(0b1100 >> 2)
'0b0011'
>>> '0b{:04b}'.format(0b0011 << 2)
'0b1100'

Note: The purpose of the '0b{:04b}'.format() is to format the numeric output of the bitwise operations, to make them easier to read. You will see the format() method in much more detail later. For now, just pay attention to the operands of the bitwise operations, and the results.
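
The examples above leave out bitwise negation. Python integers have arbitrary precision rather than a fixed bit width, so ~a is defined by the two's complement identity ~a == -a - 1, and negating a positive number yields a negative result:

>>> ~0b1100
-13
>>> ~12 == -12 - 1
True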

Identity Operators

Python provides two operators, is and is not, that determine whether the given operands have the same identity, that is, whether they refer to the same object. This is not the same thing as equality, which means only that the operands refer to objects containing the same data, not necessarily that they are the same object.

Here is an example of two objects that are equal but not identical:

>>> x = 1001
>>> y = 1000 + 1
>>> print(x, y)
1001 1001

>>> x == y
True
>>> x is y
False

Here, x and y both refer to objects whose value is 1001. They are equal. But they do not reference the same object, as you can verify:

>>> id(x)
60307920
>>> id(y)
60307936

x and y do not have the same identity, and x is y returns False.
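
Note: whether two equal objects are also identical can depend on interpreter internals. CPython, for example, caches small integers (currently those from -5 to 256), so equal small integers are usually the very same object, which is why the example above uses 1001:

>>> m = 100
>>> n = 100
>>> m is n
True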

You saw previously that when you make an assignment like x = y, Python merely creates a second reference to the same object, and that you could confirm that fact with the id() function. You can also confirm it using the is operator:

>>> a = 'I am a string'
>>> b = a
>>> id(a)
55993992
>>> id(b)
55993992

>>> a is b
True
>>> a == b
True

In this case, since a and b reference the same object, it stands to reason that a and b would be equal as well.

Unsurprisingly, the opposite of is is is not:

>>> x = 10
>>> y = 20
>>> x is not y
True

Operator Precedence

Consider this expression:

>>> 20 + 4 * 10
60

There is ambiguity here. Should Python perform the addition 20 + 4 first and then multiply the sum by 10? Or should the multiplication 4 * 10 be performed first, and the addition of 20 second?

Clearly, since the result is 60, Python has chosen the latter; if it had chosen the former, the result would be 240. This follows standard algebraic convention, as virtually all programming languages do.

All operators that the language supports are assigned a precedence. In an expression, all operators of highest precedence are performed first. Once those results are obtained, operators of the next highest precedence are performed. So it continues, until the expression is fully evaluated. Any operators of equal precedence are performed in left-to-right order.

Here is the order of precedence of the Python operators you have seen so far, from lowest to highest:

Operator                            Description
or                                  Boolean OR (lowest precedence)
and                                 Boolean AND
not                                 Boolean NOT
==, !=, <, <=, >, >=, is, is not    comparisons, identity
|                                   bitwise OR
^                                   bitwise XOR
&                                   bitwise AND
<<, >>                              bit shifts
+, -                                addition, subtraction
*, /, //, %                         multiplication, division, floor division, modulo
+x, -x, ~x                          unary positive, unary negation, bitwise negation
**                                  exponentiation (highest precedence)

Operators at the top of the table have the lowest precedence, and those at the bottom of the table have the highest. Any operators in the same row of the table have equal precedence.

It is clear why multiplication is performed first in the example above: multiplication has a higher precedence than addition.

Similarly, in the example below, 3 is raised to the power of 4 first, which equals 81, and then the multiplications are carried out in order from left to right (2 * 81 * 5 = 810):

>>> 2 * 3 ** 4 * 5
810

Operator precedence can be overridden using parentheses. Expressions in parentheses are always performed first, before expressions that are not parenthesized. Thus, the following happens:

>>> 20 + 4 * 10
60
>>> (20 + 4) * 10
240

>>> 2 * 3 ** 4 * 5
810
>>> 2 * 3 ** (4 * 5)
6973568802

In the first example, 20 + 4 is computed first, then the result is multiplied by 10. In the second example, 4 * 5 is calculated first, then 3 is raised to that power, then the result is multiplied by 2.
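
One exception to the left-to-right rule for operators of equal precedence is exponentiation: ** groups from right to left, so a chain of exponentiations is evaluated from the rightmost pair outward:

>>> 2 ** 3 ** 2        # evaluated as 2 ** (3 ** 2)
512
>>> (2 ** 3) ** 2
64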

There is nothing wrong with making liberal use of parentheses, even when they aren't necessary to change the order of evaluation. In fact, it is considered good practice, because it can make the code more readable, and it relieves the reader of having to recall operator precedence from memory. Consider the following:

(a < 10) and (b > 30)

Here the parentheses are entirely unnecessary, as the comparison operators have higher precedence than and and would have been performed first anyway. But some might consider the intent of the parenthesized version more immediately obvious than this version without parentheses:

a < 10 and b > 30

On the other hand, there are probably those who would prefer the latter; it's a matter of personal preference. The point is, you can always use parentheses if you feel it makes the code more readable, even if they aren't necessary to change the order of evaluation.

Augmented Assignment Operators

You have seen that a single equal sign (=) is used to assign a value to a variable. It is, of course, perfectly viable for the value to the right of the assignment to be an expression containing other variables:

>>> a = 10
>>> b = 20
>>> c = a * 5 + b
>>> c
70

In fact, the expression to the right of the assignment can include references to the variable that is being assigned to:

>>> a = 10
>>> a = a + 5
>>> a
15

>>> b = 20
>>> b = b * 3
>>> b
60

The first example is interpreted as "a is assigned the current value of a plus 5," effectively increasing the value of a by 5. The second reads "b is assigned the current value of b times 3," effectively increasing the value of b threefold.

Of course, this sort of assignment only makes sense if the variable in question has already previously been assigned a value:

>>> z = z / 12
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    z = z / 12
NameError: name 'z' is not defined

Python supports a shorthand augmented assignment notation for these arithmetic and bitwise operators:

Arithmetic: +, -, *, /, %, //, **
Bitwise: &, |, ^, >>, <<

For these operators, the following are equivalent:

x <op>= y
x = x <op> y

Take a look at these examples:

Augmented assignment    Equivalent standard assignment
a += 5                  a = a + 5
a /= 10                 a = a / 10
a ^= b                  a = a ^ b
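
Here is a short interactive session showing a few of these in action:

>>> a = 10
>>> a += 5
>>> a
15
>>> a //= 4
>>> a
3
>>> a **= 2
>>> a
9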

Conclusion

In this tutorial, you learned about the diverse operators Python supports to combine objects into expressions.

Most of the examples you have seen so far have involved only simple atomic data, but you saw a brief introduction to the string data type. The next tutorial will explore string objects in much more detail.


20 Jun 2018 2:00pm GMT

Peter Bengtsson: A good Django view function cache decorator for http.JsonResponse

I use this a lot. It has served me very well. The code:

import hashlib
import functools

import markus  # optional
from django.core.cache import cache
from django import http
from django.utils.encoding import force_bytes, iri_to_uri

metrics = markus.get_metrics(__name__)  # optional


def json_response_cache_page_decorator(seconds):
    """Cache only when there's a healthy http.JsonResponse response."""

    def decorator(func):

        @functools.wraps(func)
        def inner(request, *args, **kwargs):
            cache_key = 'json_response_cache:{}:{}'.format(
                func.__name__,
                hashlib.md5(force_bytes(iri_to_uri(
                    request.build_absolute_uri()
                ))).hexdigest()
            )
            content = cache.get(cache_key)
            if content is not None:

                # metrics is optional
                metrics.incr(
                    'json_response_cache_hit',
                    tags=['view:{}'.format(func.__name__)]
                )

                return http.HttpResponse(
                    content,
                    content_type='application/json'
                )
            response = func(request, *args, **kwargs)
            if (
                isinstance(response, http.JsonResponse) and
                response.status_code in (200, 304)
            ):
                cache.set(cache_key, response.content, seconds)
            return response

        return inner

    return decorator

To use it simply add to Django view functions that might return a http.JsonResponse. For example, something like this:

@json_response_cache_page_decorator(60)
def search(request):
    q = request.GET.get('q')
    if not q:
        return http.HttpResponseBadRequest('no q')
    results = search_database(q)
    return http.JsonResponse({
        'results': results,
    })
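
Note that the cache key is an MD5 hash of request.build_absolute_uri(), so requests that differ only in their query string (for example /search?q=python versus /search?q=django) are cached as separate entries.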

There are a couple of reasons I use this instead of django.views.decorators.cache.cache_page().

Disclaimer: This snippet of code comes from a side-project that has a very specific set of requirements. They're rather unique to that project and I have a full picture of the needs. E.g. I know what specific headers matter and don't matter. Your project might be different. For example, perhaps you don't have markus to handle your metrics. Or perhaps you need to re-write the query string for something to normalize the cache key differently. Point being, take the snippet of code as inspiration when you too find that django.views.decorators.cache.cache_page() isn't good enough for your Django view functions.

20 Jun 2018 1:55pm GMT

PyPy Development: Repeating a Matrix Multiplication Benchmark

I watched Hennessy and Patterson's Turing Award lecture recently:

In it, there's a slide comparing the performance of various matrix multiplication implementations, using Python (presumably CPython) as a baseline and comparing that against various C implementations (I couldn't find the linked paper yet):

I expected the baseline speedup of switching from CPython to C to be higher and I also wanted to know what performance PyPy gets, so I did my own benchmarks. This is a problem that Python is completely unsuited for, so it should give very exaggerated results.

The usual disclaimers apply: All benchmarks are lies, benchmarking of synthetic workloads even more so. My implementation is really naive (though I did optimize it a little bit to help CPython), don't use any of this code for anything real. The benchmarks ran on my rather old Intel i5-3230M laptop under Ubuntu 17.10.

With that said, my results were as follows:

Implementation    Time                  Speedup over CPython    Speedup over PyPy
CPython           512.588 ± 2.362 s     1 ×
PyPy              8.167 ± 0.007 s       62.761 ± 0.295 ×        1 ×
'naive' C         2.164 ± 0.025 s       236.817 ± 2.918 ×       3.773 ± 0.044 ×
NumPy             0.171 ± 0.002 s       2992.286 ± 42.308 ×     47.678 ± 0.634 ×

This is running 1500x1500 matrix multiplications with (the same) random matrices. Every implementation is run 50 times in a fresh process. The results are averaged, the errors are bootstrapped 99% confidence intervals.
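
For reference, here is a minimal sketch of the kind of naive triple-loop multiplication being benchmarked. It is not the author's exact code; the hoisted row lookups are the sort of small optimization that helps CPython:

def matmul(a, b, n):
    # Multiply two n x n matrices given as lists of lists.
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        row_a = a[i]
        row_r = result[i]
        for k in range(n):
            aik = row_a[k]
            row_b = b[k]
            for j in range(n):
                row_r[j] += aik * row_b[j]
    return result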

So indeed the speedup that I got of switching from CPython to C is quite a bit higher than 47x! PyPy is much better than CPython, but of course can't really compete against GCC. And then the real professionals (numpy/OpenBLAS) are in a whole 'nother league. The speedup of the AVX numbers in the slide above is even higher than my NumPy numbers, which I assume is the result of my old CPU with two cores, vs. the 18 core CPU with AVX support. Lesson confirmed: leave matrix multiplication to people who actually know what they are doing.

20 Jun 2018 1:45pm GMT

Yasoob Khalid: An Intro to Web Scraping With lxml and Python

Hello everyone! I hope you are doing well. In this article, I'll teach you the basics of web scraping using lxml and Python. I also recorded this tutorial in a screencast so if you prefer to watch me do this step by step in a video please go ahead and watch it below. However, if for some reason you decide that you prefer text, just scroll a bit more and you will find the text of that same screencast.

First of all, why should you even bother learning how to web scrape? If your job doesn't require you to learn it, then let me give you some motivation. What if you want to create a website which curates the cheapest products from Amazon, Walmart and a couple of other online stores? A lot of these online stores don't provide you with an easy way to access their information through an API. In the absence of an API, your only choice is to create a web scraper which can extract information from these websites automatically and provide you with that information in an easy-to-use way.

Here is an example of a typical API response in JSON. This is the response from Reddit:

Typical API Response in JSON

There are a lot of Python libraries out there which can help you with web scraping. There is lxml, BeautifulSoup and a full-fledged framework called Scrapy. Most of the tutorials discuss BeautifulSoup and Scrapy, so I decided to go with lxml in this post. I will teach you the basics of XPaths and how you can use them to extract data from an HTML document. I will take you through a couple of different examples so that you can quickly get up-to-speed with lxml and XPaths.

If you are a gamer, you will already know of (and likely love) this website. We will be trying to extract data from Steam. More specifically, we will be extracting information from the "Popular New Releases" section. I am converting this into a two-part series. In this part, we will be creating a Python script which can extract the names of the games, the prices of the games, the different tags associated with each game and the target platforms. In the second part, we will turn this script into a Flask based API and then host it on Heroku.

Steam Popular New Releases

Step 1: Exploring Steam

First of all, open up the "popular new releases" page on Steam and scroll down until you see the Popular New Releases tab. At this point, I usually open up Chrome developer tools and see which HTML tags contain the required data. I extensively use the element inspector tool (The button in the top left of the developer tools). It allows you to see the HTML markup behind a specific element on the page with just one click. As a high-level overview, everything on a web page is encapsulated in an HTML tag and tags are usually nested. You need to figure out which tags you need to extract the data from and you are good to go. In our case, if we take a look, we can see that every separate list item is encapsulated in an anchor (a) tag.

The anchor tags themselves are encapsulated in the div with an id of tab_newreleases_content. I am mentioning the id because there are two tabs on this page. The second tab is the standard "New Releases" tab, and we don't want to extract information from that tab. Hence, we will first extract the "Popular New Releases" tab, and then we will extract the required information from this tag.

Step 2: Start writing a Python script

This is a perfect time to create a new Python file and start writing down our script. I am going to create a scrape.py file. Now let's go ahead and import the required libraries. The first one is the requests library and the second one is the lxml.html library.

import requests
import lxml.html

If you don't have requests installed, you can easily install it by running this command in the terminal:

$ pip install requests

The requests library is going to help us open the web page in Python. We could have used lxml to open the HTML page as well, but it doesn't work well with all web pages, so to be on the safe side I am going to use requests.

Now let's open up the web page using requests and pass that response to lxml.html.fromstring.

html = requests.get('https://store.steampowered.com/explore/new/')
doc = lxml.html.fromstring(html.content)

This gives us an object of type HtmlElement, which has an xpath method we can use to query the HTML document. This provides us with a structured way to extract information from an HTML document.

Step 3: Fire up the Python Interpreter

Now save this file and open up a terminal. Copy the code from the scrape.py file and paste it in a Python interpreter session.

Python Terminal

We are doing this so that we can quickly test our XPaths without continuously editing, saving and executing our scrape.py file.

Let's try writing an XPath for extracting the div which contains the 'Popular New Releases' tab. I will explain the code as we go along:

new_releases = doc.xpath('//div[@id="tab_newreleases_content"]')[0]

This statement will return a list of all the divs in the HTML page which have an id of tab_newreleases_content. Now because we know that only one div on the page has this id we can take out the first element from the list ([0]) and that would be our required div. Let's break down the xpath and try to understand it:
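
A quick breakdown of its parts: the leading // tells lxml to search the whole document rather than just below the current node, div matches <div> elements, and the predicate [@id="tab_newreleases_content"] keeps only those divs whose id attribute is exactly tab_newreleases_content.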

Cool! We have got the required div. Now let's go back to chrome and check which tag contains the titles of the releases.

Step 4: Extract the titles & prices

Extract title from steam releases

The title is contained in a div with a class of tab_item_name. Now that we have the "Popular New Releases" tab extracted we can run further XPath queries on that tab. Write down the following code in the same Python console which we previously ran our code in:

titles = new_releases.xpath('.//div[@class="tab_item_name"]/text()')

This gives us the titles of all of the games in the "Popular New Releases" tab. Here is the expected output:

title from steam releases in terminal

Let's break down this XPath a little bit because it is a bit different from the last one.
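
Two things are new here: the leading . makes the query relative to new_releases instead of to the whole document, and the trailing /text() returns the text content of each matched div rather than the div elements themselves.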

Now we need to extract the prices for the games. We can easily do that by running the following code:

prices = new_releases.xpath('.//div[@class="discount_final_price"]/text()')

I don't think I need to explain this code, as it is pretty similar to the title extraction code. The only change is the class name.

Extracting prices from steam

Step 5: Extracting tags

Now we need to extract the tags associated with the titles. Here is the HTML markup:

HTML markup

Write down the following code in the Python terminal to extract the tags:

tags = new_releases.xpath('.//div[@class="tab_item_top_tags"]')
total_tags = []
for tag in tags:
    total_tags.append(tag.text_content())

What we are doing here is extracting the divs containing the tags for the games. Then we loop over that list and extract the text from each div using the text_content() method, which returns the text contained within an HTML tag without the HTML markup.

Note: We could have also made use of a list comprehension to make that code shorter. I wrote it down in this way so that even those who don't know about list comprehensions can understand the code. Either way, here is the alternative code:

tags = [tag.text_content() for tag in new_releases.xpath('.//div[@class="tab_item_top_tags"]')]

Let's also split each tag string into a list so that each tag becomes a separate element:

tags = [tag.split(', ') for tag in tags]

Step 6: Extracting the platforms

Now the only thing remaining is to extract the platforms associated with each title. Here is the HTML markup:

HTML markup

The major difference here is that the platforms are not contained as texts within a specific tag. They are listed as the class name. Some titles only have one platform associated with them like this:

<span class="platform_img win"></span>

While some titles have 5 platforms associated with them like this:

<span class="platform_img win"></span>
<span class="platform_img mac"></span>
<span class="platform_img linux"></span>
<span class="platform_img hmd_separator"></span>
<span title="HTC Vive" class="platform_img htcvive"></span>
<span title="Oculus Rift" class="platform_img oculusrift"></span>

As we can see, these spans contain the platform type as the class name. The only common thing between these spans is that all of them contain the platform_img class. First of all, we will extract the divs with the tab_item_details class, then we will extract the spans containing the platform_img class and finally we will extract the second class name from those spans. Here is the code:

platforms_div = new_releases.xpath('.//div[@class="tab_item_details"]')
total_platforms = []

for game in platforms_div:
    temp = game.xpath('.//span[contains(@class, "platform_img")]')
    platforms = [t.get('class').split(' ')[-1] for t in temp]
    if 'hmd_separator' in platforms:
        platforms.remove('hmd_separator')
    total_platforms.append(platforms)

In line 1 we start with extracting the tab_item_details div. The XPath in line 5 is a bit different. Here we have [contains(@class, "platform_img")] instead of simply having [@class="platform_img"]. The reason is that [@class="platform_img"] returns those spans which only have the platform_img class associated with them. If the spans have an additional class, they won't be returned. Whereas [contains(@class, "platform_img")] filters all the spans which have the platform_img class. It doesn't matter whether it is the only class or if there are more classes associated with that tag.

In line 6 we are making use of a list comprehension to reduce the code size. The .get() method allows us to extract an attribute of a tag. Here we are using it to extract the class attribute of a span. We get a string back from the .get() method. In the case of the first game, the string being returned is platform_img win, so we split that string on the space character and store the last part of the split (which is the actual platform name) in the list.

In lines 7-8 we are removing the hmd_separator from the list if it exists. This is because hmd_separator is not a platform. It is just a vertical separator bar used to separate actual platforms from VR/AR hardware.

Step 7: Conclusion

This is the code we have so far:

import requests
import lxml.html

html = requests.get('https://store.steampowered.com/explore/new/')
doc = lxml.html.fromstring(html.content)

new_releases = doc.xpath('//div[@id="tab_newreleases_content"]')[0]

titles = new_releases.xpath('.//div[@class="tab_item_name"]/text()')
prices = new_releases.xpath('.//div[@class="discount_final_price"]/text()')

tags = [tag.text_content() for tag in new_releases.xpath('.//div[@class="tab_item_top_tags"]')]
tags = [tag.split(', ') for tag in tags]

platforms_div = new_releases.xpath('.//div[@class="tab_item_details"]')
total_platforms = []

for game in platforms_div:
    temp = game.xpath('.//span[contains(@class, "platform_img")]')
    platforms = [t.get('class').split(' ')[-1] for t in temp]
    if 'hmd_separator' in platforms:
        platforms.remove('hmd_separator')
    total_platforms.append(platforms)

Now we just need this to return a JSON response so that we can easily turn this into a Flask based API. Here is the code:

output = []
for info in zip(titles, prices, tags, total_platforms):
    resp = {}
    resp['title'] = info[0]
    resp['price'] = info[1]
    resp['tags'] = info[2]
    resp['platforms'] = info[3]
    output.append(resp)

This code is self-explanatory. We are using the zip function to loop over all of those lists in parallel. Then we create a dictionary for each game and assign the title, price, tags, and platforms as a separate key in that dictionary. Lastly, we append that dictionary to the output list.
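
If you want to inspect the result, or serve it outside of a web framework, the standard library's json module can serialize the list directly. A minimal sketch:

import json

# Pretty-print the collected game data as JSON.
print(json.dumps(output, indent=2))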

In a future post, we will take a look at how we can convert this into a Flask based API and host it on Heroku.

Have a great day!

Note: This article first appeared on Timber.io

20 Jun 2018 7:46am GMT

19 Jun 2018

feedPlanet Python

Artem Golubin: How many objects does Python allocate during its interpreter lifetime?

It can be very surprising to see how many objects the Python interpreter temporarily allocates while executing simple scripts. In fact, Python provides a way to check it.

To do so, we need to compile a standard CPython interpreter with additional debug flags:

./configure CFLAGS='-DCOUNT_ALLOCS' --with-pydebug 
make -s -j2

Let's open an empty interactive REPL and check allocation statistics:

>>> import sys
>>> sys.getcounts()
[('iterator', 7, 7, 4), ('functools._lru_cache_wrapper', 1, 0, 1), ('re.Match', 2, 2 ...
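
In builds compiled with COUNT_ALLOCS, each tuple appears to have the form (type_name, allocations, frees, max_alive): how many objects of that type were ever allocated, how many were freed, and the largest number alive at once. Assuming that layout, you could total the allocations like this:

>>> sum(allocs for _, allocs, _, _ in sys.getcounts())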

19 Jun 2018 4:25am GMT

Vladimir Iakolev: Filmstrip from subtitles and stock images

It's possible to find subtitles for almost every movie or TV series. And there are also stock images of anything imaginable. Wouldn't it be fun to connect these two things and make a sort of a filmstrip with a stock image for every caption from the subtitles?

TLDR: the result is silly:

As the subtitles to play with, I chose the subtitles for Bob's Burgers - The Deeping. First, we need to parse them with pycaption:

from pycaption.srt import SRTReader

lang = 'en-US'
path = 'burgers.srt'

def read_subtitles(path, lang):
    with open(path) as f:
        data = f.read()
        return SRTReader().read(data, lang=lang)
        
        
subtitles = read_subtitles(path, lang)
captions = subtitles.get_captions(lang)

>>> captions
['00:00:04.745 --> 00:00:06.746\nShh.', '00:00:10.166 --> 00:00:20.484\n...

As a lot of subtitles contain HTML, it's important to remove tags before further processing; that's very easy to do with lxml:

import lxml.html

def to_text(raw_text):
    return lxml.html.document_fromstring(raw_text).text_content()

to_text('<i>That shark is ruining</i>')
'That shark is ruining'

To find the most significant words in the text we need to tokenize it, lemmatize it (replace every different form of a word with a common form) and remove stop words. This is easy to do with NLTK:

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

def tokenize_lemmatize(text):
    tokens = word_tokenize(text)
    lemmatizer = WordNetLemmatizer()
    lemmatized = [lemmatizer.lemmatize(token.lower())
                  for token in tokens if token.isalpha()]
    stop_words = set(stopwords.words("english"))
    return [lemma for lemma in lemmatized if lemma not in stop_words]

>>> tokenize_lemmatize('That shark is ruining')
['shark', 'ruining']

And after that we can just combine the previous two functions and find the most frequently used words:

from collections import Counter

def get_most_popular(captions):
    full_text = '\n'.join(to_text(caption.get_text()) for caption in captions)
    tokens = tokenize_lemmatize(full_text)
    return Counter(tokens)
    
  
most_popular = get_most_popular(captions)

most_popular
Counter({'shark': 68, 'oh': 32, 'bob': 29, 'yeah': 25, 'right': 20,...

It's not the best way to find the most important words, but it kind of works.

After that it's straightforward to extract keywords from a single caption:

def get_keywords(most_popular, text, n=2):
    tokens = sorted(tokenize_lemmatize(text), key=lambda x: -most_popular[x])
    return tokens[:n]

>>> captions[127].get_text()
'Teddy, what is wrong with you?'
>>> get_keywords(most_popular, to_text(captions[127].get_text()))
['teddy', 'wrong']

The next step is to find a stock image for those keywords. There aren't that many properly working and well-documented stock image APIs, so I chose to use the Shutterstock API. It's limited to 250 requests per hour, but that's enough to play with.

From their API we only need to use /images/search. We will search for the most popular photo:

import requests

# Key and secret of your app
stock_key = ''
stock_secret = ''

def get_stock_image_url(query):
    response = requests.get(
        "https://api.shutterstock.com/v2/images/search",
        params={
            'query': query,
            'sort': 'popular',
            'view': 'minimal',
            'safe': 'false',
            'per_page': '1',
            'image_type': 'photo',
        },
        auth=(stock_key, stock_secret),
    )
    data = response.json()
    try:
        return data['data'][0]['assets']['preview']['url']
    except (IndexError, KeyError):
        return None

>>> get_stock_image_url('teddy wrong')
'https://image.shutterstock.com/display_pic_with_logo/2780032/635833889/stock-photo-guilty-boyfriend-asking-for-forgiveness-presenting-offended-girlfriend-a-teddy-bear-toy-lady-635833889.jpg'

The image looks relevant:

teddy wrong

Now we can create a proper card from a caption:

def make_slide(most_popular, caption):
    text = to_text(caption.get_text())
    if not text:
        return None

    keywords = get_keywords(most_popular, text)
    query = ' '.join(keywords)
    if not query:
        return None

    stock_image = get_stock_image_url(query)
    if not stock_image:
        return None

    return text, stock_image

make_slide(most_popular, captions[132])
('He really chewed it...\nwith his shark teeth.', 'https://image.shutterstock.com/display_pic_with_logo/181702384/710357305/stock-photo-scuba-diver-has-shark-swim-really-close-just-above-head-as-she-faces-camera-below-710357305.jpg')

The image is kind of relevant:

He really chewed it...with his shark teeth.

After that we can select the captions that we want to put in our filmstrip and generate HTML like the one in the TLDR section:

output_path = 'burgers.html'
start_slide = 98
end_slide = 200


def make_html_output(slides):
    html = '<html><head><link rel="stylesheet" href="./style.css"></head><body>'
    for (text, stock_image) in slides:
        html += f'''<div class="box">
            <img src="{stock_image}" />
            <span>{text}</span>
        </div>'''
    html += '</body></html>'
    return html


interesting_slides = [make_slide(most_popular, caption)
                      for caption in captions[start_slide:end_slide]]
interesting_slides = [slide for slide in interesting_slides if slide]

with open(output_path, 'w') as f:
    output = make_html_output(interesting_slides)
    f.write(output)

And the result - burgers.html.

Another example, even worse and a bit NSFW, It's Always Sunny in Philadelphia - Charlie Catches a Leprechaun.

Gist with the sources.

19 Jun 2018 12:23am GMT

10 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: King Willams Town Bahnhof

Yesterday morning I had to go to the station in KWT to pick up our reserved bus tickets for the Christmas holidays in Cape Town. The station itself has been without train service since December for cost reasons, but Translux and co, the long-distance bus companies, have their offices there.


Larger map view




© benste CC NC SA

10 Nov 2011 10:57am GMT

09 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein

Nobody is worried about something like this: you simply drive through by car, and in the city, near Gnobie, "no, that only gets dangerous once the fire brigade is there". 30 minutes later, on the way back, the fire brigade was there.




© benste CC NC SA

09 Nov 2011 8:25pm GMT

08 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Brai Party

Brai = Grillabend o.ä.

Die möchte gern Techniker beim Flicken ihrer SpeakOn / Klinke Stecker Verzweigungen...

Die Damen "Mamas" der Siedlung bei der offiziellen Eröffnungsrede

Even though fewer people came than expected: loud music and lots of people ...

And of course a fire with real wood for grilling.

© benste CC NC SA

08 Nov 2011 2:30pm GMT

07 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Lumanyano Primary

One of our missions was bringing Katja's Linux Server back to her room. While doing that we saw her new decoration.

Björn and Simphiwe carried the PC to Katja's school


© benste CC NC SA

07 Nov 2011 2:00pm GMT

06 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Nelisa Haircut

Today I went with Björn to Needs Camp to visit Katja's guest family for a special party. First of all we visited some friends of Nelisa - yeah, the one I'm working with in Quigney - Katja's guest father's sister - who gave her a haircut.

African women usually get their hair done by putting in extensions, not - like Europeans - by just cutting some hair.

In between she looked like this...

And then she was done - looks amazing considering the amount of hair she had last week - doesn't it?

© benste CC NC SA

06 Nov 2011 7:45pm GMT

05 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: My Saturday

Somehow it struck me today that I need to restructure my blog posts a bit - if I only ever report on new places, I would have to be on a constant round trip. So here are a few things from my everyday life today.

First of all: Saturday counts as a day off, at least for us volunteers.

This weekend only Rommel and I are on the farm - Katja and Björn are now at their placements, and my housemates Kyle and Jonathan are at home in Grahamstown - as is Sipho, who lives in Dimbaza.
Robin, Rommel's wife, has been in Woodie Cape since Thursday to take care of a few things there.
Anyway, this morning we first treated ourselves to a shared Weetbix/muesli breakfast and then set off for East London. Two things were on the checklist - Vodacom and Ethienne (the estate agent) - plus dropping off the missing things at NeedsCamp on the way back.

Just after setting off on the dirt road, we realised that we had not packed the things for NeedsCamp and Ethienne, but did have the pump for the water supply in the car.

So in East London we first drove to Farmerama - no, not the online game Farmville, but a shop with all sorts of things for a farm - in Berea, a northern part of town.

At Farmerama we got advice on a quick-release coupling that should make life with the pump easier, and we also dropped off a lighter pump for repair, so that it is not such a big effort every time the water runs out again.

Fego Caffé is in the Hemmingways Mall; there we had to get the PIN and PUK for one of our data SIM cards, since some digits had unfortunately been transposed when entering the PIN. In any case, shops in South Africa store data as sensitive as a PUK - which in principle gives access to a locked phone.

In the cafe Rommel then carried out a few online transactions with the 3G modem, which was working again - and which, by the way, now works perfectly in Ubuntu, my Linux system.

Meanwhile I went to 8ta to find out about their new deals, since we want to offer internet in some of Hilltop's centres. The picture shows the UMTS coverage in NeedsCamp, Katja's town. 8ta is a new phone provider from Telkom; after Vodafone bought Telkom's stake in Vodacom, they have to build their network up completely from scratch.
We decided to organise a free prepaid card to test, because who knows how accurate the coverage map above is ... Before signing even the cheapest 24-month deal, you should know whether it works.

After that we went to Checkers in Vincent, looking for two hotplates for WoodyCape - R 129.00 each, so about 12€ for a two-plate cooker.
As you can see in the background, the Christmas decorations are already out - at the beginning of November, and that in South Africa at a sunny and warm 25°C minimum.

For lunch we treated ourselves to a Pakistani curry takeaway - highly recommended!
Well, and after we got back an hour or so ago, I cleaned the fridge, which I had simply put outside this morning to defrost. Now it is clean again and free of its 3 m thick layer of ice...

Tomorrow ... I will report on that separately ... but probably not until Monday, because then I will be back in Quigney (East London) and have free internet.

© benste CC NC SA

05 Nov 2011 4:33pm GMT

31 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Sterkspruit Computer Center

Sterkspruit is one of Hilltop's Computer Centres in the far north of the Eastern Cape. On the trip to J'burg we used the opportunity to take a look at the centre.

Pupils in the big classroom


The Trainer


School in Countryside


Adult Class in the Afternoon


"Town"


© benste CC NC SA

31 Oct 2011 4:58pm GMT

Benedict Stein: Technical Issues

What do you do in an internet cafe if your ADSL and fax line have been discontinued before month's end? Well, my idea was sitting outside and eating some ice cream.
At least it's sunny and not as rainy as on the weekend.


© benste CC NC SA

31 Oct 2011 3:11pm GMT

30 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Nellis Restaurant

For those who are travelling through Zastron - there is a very nice restaurant serving delicious food at reasonable prices.
In addition they sell home-made juices, jams and honey.




interior


home made specialities - the shop in the shop


the Bar


© benste CC NC SA

30 Oct 2011 4:47pm GMT

29 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: The way back from J'burg

During the 10-12h trip from J'burg back to ELS I was able to take a lot of pictures, including these different roadsides

Plain Street


Orange River in its beginnings (near Lesotho)


Zastron Anglican Church


The bridge between the "Free State" and the Eastern Cape next to Zastron


my new Background ;)


If you listen to GoogleMaps you'll end up travelling 50 km of gravel road - as it had just been renewed, we didn't have that many problems and saved 1h compared to going the official way with all its construction sites




Freeway


getting dark


© benste CC NC SA

29 Oct 2011 4:23pm GMT

28 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: How does a construction site actually work?

Sure, some things may be different and much the same - but a road construction site is an everyday sight in Germany - so how does it actually work in South Africa?

First of all - NO, there are no natives digging with their hands - even though more manpower is used here, they are hard at work with technology.

A completely normal "federal highway" (Bundesstraße)


and how it is being widened


loooots of trucks


because here one side of the road is completely closed over a long stretch, so you end up with alternating traffic lights and, here, a 45-minute wait


But at least they seem to be having fun ;) - as did we, because luckily we never had to wait longer than 10 min.

© benste CC NC SA

28 Oct 2011 4:20pm GMT