02 Jun 2020

Django community aggregator: Community blog posts

Extracting values from environment variables in tox

[Tox](https://tox.readthedocs.io) is a great tool for automated testing. We use it not only to run matrix testing, but to run different types of tests in different environments, enabling us to parallelise our test runs and get better reporting about what types of tests failed.

Recently, we started using [Robot Framework](https://robotframework.org) for some automated UI testing. This needs to run a Django server, and almost certainly wants to run against a different database. This requires our `tox -e robot` to drop the database if it exists, and then create it.

Because we use [dj-database-url](https://pypi.org/project/dj-database-url/) to provide our database settings, our [Codeship](https://codeship.com/) configuration contains an environment variable named `DATABASE_URL`. This contains the host, port and database name, as well as the username/password if applicable. However, we don't have the database name (or port) directly available in their own environment variables. Instead, I wanted to extract these out of the `postgres://user:password@host:port/dbname` string. My tox environment also needed to ensure that a distinct database was used for robot:

{% highlight text %}
[testenv:robot]
setenv=
    CELERY_ALWAYS_EAGER=True
    DATABASE_URL={env:DATABASE_URL}_robot
    PORT=55002
    BROWSER=headlesschrome
whitelist_externals=
    /bin/sh
commands=
    sh -c 'dropdb --if-exists $(echo {env:DATABASE_URL} | cut -d "/" -f 4)'
    sh -c 'createdb $(echo {env:DATABASE_URL} | cut -d "/" -f 4)'
    coverage run --parallel-mode --branch manage.py robot --runserver={env:PORT}
{% endhighlight %}

And this was working great. I'm also using the `$PG_USER` environment variable, which is supplied by Codeship, but that just clutters things up. However, when merged to our main repo, which has its own Codeship environment, tests were failing: it would complain about the database not being present when attempting to run the robot tests.
It turned out that we were using a different version of postgres, and thus a different port. So, how can we extract the port from the `$DATABASE_URL`?

{% highlight text %}
commands=
    sh -c 'dropdb --if-exists \
        -p $(echo {env:DATABASE_URL} | cut -d "/" -f 3 | cut -d ":" -f 3) \
        $(echo {env:DATABASE_URL} | cut -d "/" -f 4)'
{% endhighlight %}

Which is all well and good, until you have a `$DATABASE_URL` that omits the port...

    dropdb: error: missing required argument database name

Ah, that would mean the command being executed was:

    $ dropdb --if-exists -p

Eventually, I came up with the following:

{% highlight text %}
    sh -c 'PG_PORT=$(echo {env:DATABASE_URL} | cut -d "/" -f 3 | cut -d ":" -f 3) \
        dropdb --if-exists -p $\{PG_PORT:-5432} \
        $(echo {env:DATABASE_URL} | cut -d "/" -f 4)'
{% endhighlight %}

Whew, that is a mouthful! We store the extracted value in a variable `PG_PORT`, and then use bash variable substitution (rather than tox variable substitution) to insert it, with a default value. But because of tox variable substitution, we need to escape the curly brace so it passes through to bash: `$\{PG_PORT:-5432}`. Also note that you'll need a space before a line continuation, because bash seems to strip leading spaces from the continued line.
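To see what those `cut` pipelines actually extract, here is a standalone sketch (the URLs are made-up examples, not our real configuration) showing the port extraction and the bash default-value substitution:

```shell
# Hypothetical DATABASE_URL with an explicit port
DATABASE_URL="postgres://user:password@db.example.com:5433/mydb"

# Field 3 (split on "/") is "user:password@db.example.com:5433";
# field 3 of that (split on ":") is the port.
PG_PORT=$(echo "$DATABASE_URL" | cut -d "/" -f 3 | cut -d ":" -f 3)
echo "${PG_PORT:-5432}"   # -> 5433

# The database name is field 4 (split on "/")
echo "$DATABASE_URL" | cut -d "/" -f 4   # -> mydb

# When the URL omits the port, PG_PORT is empty and the default kicks in
DATABASE_URL="postgres://user:password@db.example.com/mydb"
PG_PORT=$(echo "$DATABASE_URL" | cut -d "/" -f 3 | cut -d ":" -f 3)
echo "${PG_PORT:-5432}"   # -> 5432
```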

02 Jun 2020 6:03pm GMT

01 Jun 2020


Stop Using datetime.now!


One of my favorite job interview questions is this:

Write a function that returns tomorrow's date

This looks innocent enough for someone to suggest this as a solution:

import datetime

def tomorrow() -> datetime.date:
    return datetime.date.today() + datetime.timedelta(days=1)

This will work, but there is a followup question:

How would you test this function?

Before you move on.... take a second to think about your answer.

One of these pigeons is a mock. Photo by [Pedro Figueras](https://www.pexels.com/photo/two-pigeon-perched-on-white-track-light-681447/)



Naive Approach

The most naive approach to test a function that returns tomorrow's date is this:

# Bad
assert tomorrow() == datetime.date(2020, 4, 16)

This test will pass today, but it will fail on any other day.

Another way to test the function is this:

# Bad
assert tomorrow() == datetime.date.today() + datetime.timedelta(days=1)

This will also work, but there is an inherent problem with this approach. The same way you can't define a word in the dictionary using itself, you should not test a function by repeating its implementation.

Another problem with this approach is that it's only testing one scenario, for the day it is executed. What about getting the next day across a month or a year? What about the day after 2020-02-28?

The problem with both implementations is that today is set inside the function, and to simulate different test scenarios you need to control this value. One solution that comes to mind is to mock datetime.date, and try to set the value returned by today():

>>> from unittest import mock
>>> with mock.patch('datetime.date.today', return_value=datetime.date(2020, 1, 1)):
...     assert tomorrow() == datetime.date(2020, 1, 2)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/unittest/mock.py", line 1410, in __enter__
    setattr(self.target, self.attribute, new_attr)
TypeError: can't set attributes of built-in/extension type 'datetime.date'

As the exception suggests, built-in modules written in C cannot be mocked. The unittest.mock documentation specifically addresses this attempt to mock the datetime module. Apparently, this is a very common issue and the writers of the official documentation felt it's worth mentioning. They even go the extra mile and link to a blog post on this exact problem. The article is worth a read, and we are going to address the solution it presents later on.

As with many other problems in Python, there are libraries that provide a solution. Two libraries that stand out are freezegun and libfaketime. Both provide the ability to mock time at different levels. However, resorting to external libraries is a luxury only developers of legacy systems can afford. For new projects, or projects that are small enough to change, there are other alternatives that can keep the project free of these dependencies.


Dependency Injection

The problem we were trying to solve with mock can also be solved by changing the function's API:

import datetime

def tomorrow(asof: datetime.date) -> datetime.date:
    return asof + datetime.timedelta(days=1)

To control the reference time of the function, the time can be provided as an argument. This makes it easier to test the function in different scenarios:

import datetime
assert tomorrow(asof=datetime.date(2020, 5, 1))   == datetime.date(2020, 5, 2)
assert tomorrow(asof=datetime.date(2019, 12, 31)) == datetime.date(2020, 1, 1)
assert tomorrow(asof=datetime.date(2020, 2, 28))  == datetime.date(2020, 2, 29)
assert tomorrow(asof=datetime.date(2021, 2, 28))  == datetime.date(2021, 3, 1)

To remove the function's dependency on datetime.date.today, we provide today's date as an argument. This pattern of providing, or "injecting" dependencies into functions and objects is often called "dependency injection", or in short "DI".
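A common middle ground (a sketch of my own, not from the discussion above) is to make the injected value optional, so production callers can omit it while tests inject a fixed date:

```python
import datetime
from typing import Optional

def tomorrow(asof: Optional[datetime.date] = None) -> datetime.date:
    # Fall back to the real clock only when no date was injected
    if asof is None:
        asof = datetime.date.today()
    return asof + datetime.timedelta(days=1)

# Tests inject explicit dates to cover edge cases deterministically
assert tomorrow(asof=datetime.date(2020, 2, 28)) == datetime.date(2020, 2, 29)
```

Note the trade-off: when the argument is omitted the function still depends on the real clock, so tests should always pass `asof` explicitly.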

Dependency Injection in The Wild

Dependency injection is a way to decouple modules from each other. As our previous example shows, the function tomorrow no longer depends on today.

Using dependency injection is very common and often very intuitive. It's very likely that you already use it without even knowing. For example, this article suggests that providing an open file to json.load is a form of dependency injection:

import json

with open('path/to/file.json', 'r') as f:
  data = json.load(f)

The popular test framework pytest builds its entire fixture infrastructure around the concept of dependency injection:

import pytest

@pytest.fixture
def one() -> int:
  return 1

@pytest.fixture
def two() -> int:
  return 2

def test_one_is_less_than_two(one: int, two: int) -> None:
  assert one < two

The functions one and two are declared as fixtures. When pytest executes the test function test_one_is_less_than_two, it provides it with the values returned by the fixture functions matching its argument names. In pytest, the injection happens magically, simply by using the name of a known fixture as an argument.

Dependency injection is not limited just to Python. The popular JavaScript framework Angular is also built around dependency injection:

@Component({
  selector: 'order-list',
  template: `...`
})
export class OrderListComponent {
  orders: Order[];

  constructor(orderService: OrderService) {
    this.orders = orderService.getOrders();
  }
}

Notice how the orderService is provided, or injected, to the constructor. The component is using the order service, but is not instantiating it.
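The same constructor-injection shape translates directly to Python. A hypothetical sketch (OrderService and the order data are made up for illustration):

```python
from typing import List

class OrderService:
    def get_orders(self) -> List[str]:
        # Stand-in for a real data source (database, API, ...)
        return ["order-1", "order-2"]

class OrderListComponent:
    def __init__(self, order_service: OrderService) -> None:
        # The component uses the service but does not instantiate it
        self.orders = order_service.get_orders()
```

A test can construct the component with a stub service returning canned orders, without touching any real data source.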

Injecting Functions

Sometimes injecting a value is not enough. For example, what if we need to get the current date before and after some operation:

from typing import Tuple
import datetime

def go() -> Tuple[datetime.datetime, datetime.datetime]:
    started_at = datetime.datetime.now()
    # Do something ...
    ended_at = datetime.datetime.now()
    return started_at, ended_at

To test this function, we can provide the start time like we did before, but we can't provide the end time. One way to approach this is to make the calls to start and end outside the function. This is a valid solution, but for the sake of discussion we'll assume they need to be called inside.

Since we can't mock datetime.datetime itself, one way to make this function testable is to create a separate function that returns the current date:

from typing import Tuple
import datetime

def now() -> datetime.datetime:
  return datetime.datetime.now()

def go() -> Tuple[datetime.datetime, datetime.datetime]:
    started_at = now()
    # Do something ...
    ended_at = now()
    return started_at, ended_at

To control the values returned by the function now in tests, we can use a mock:

>>> from unittest import mock
>>> fake_start = datetime.datetime(2020, 1, 1, 15, 0, 0)
>>> fake_end = datetime.datetime(2020, 1, 1, 15, 1, 30)
>>> with mock.patch('__main__.now', side_effect=[fake_start, fake_end]):
...    go()
(datetime.datetime(2020, 1, 1, 15, 0),
 datetime.datetime(2020, 1, 1, 15, 1, 30))

Another way to approach this without mocking, is to rewrite the function once again:

from typing import Callable, Tuple
import datetime

def go(
    now: Callable[[], datetime.datetime],
) -> Tuple[datetime.datetime, datetime.datetime]:
    started_at = now()
    # Do something ...
    ended_at = now()
    return started_at, ended_at

This time we provide the function with another function that returns a datetime. This is very similar to the first solution we suggested, when we injected the datetime itself into the function.

The function can now be used like this:

>>> go(datetime.datetime.now)
(datetime.datetime(2020, 4, 18, 14, 14, 5, 687471),
 datetime.datetime(2020, 4, 18, 14, 14, 5, 687475))

To test it, we provide a different function that returns known datetimes:

>>> fake_start = datetime.datetime(2020, 1, 1, 15, 0, 0)
>>> fake_end = datetime.datetime(2020, 1, 1, 15, 1, 30)
>>> gen = iter([fake_start, fake_end])
>>> go(lambda: next(gen))
(datetime.datetime(2020, 1, 1, 15, 0),
 datetime.datetime(2020, 1, 1, 15, 1, 30))

This pattern can be generalized even more using a utility object:

from typing import Iterator
import datetime

def ticker(
    start: datetime.datetime,
    interval: datetime.timedelta,
) -> Iterator[datetime.datetime]:
    """Generate an unending stream of datetimes in fixed intervals.

    Useful to test processes which require datetime for each step.
    """
    current = start
    while True:
        yield current
        current += interval

Using ticker, the test will now look like this:

>>> gen = ticker(datetime.datetime(2020, 1, 1, 15, 0, 0), datetime.timedelta(seconds=90))
>>> go(lambda: next(gen))
(datetime.datetime(2020, 1, 1, 15, 0),
 datetime.datetime(2020, 1, 1, 15, 1, 30))

Fun fact: the name "ticker" was stolen from Go.

Injecting Values

The previous sections demonstrated injecting both values and functions. It's clear from the examples that injecting values is much simpler, which is why it's usually preferable to inject values rather than functions.

Another reason is consistency. Take this common pattern that is often used in Django models:

from django.db import models

class Order(models.Model):
    created = models.DateTimeField(auto_now_add=True)
    modified = models.DateTimeField(auto_now=True)

The model Order includes two datetime fields, created and modified. It uses Django's auto_now_add attribute to automatically set created when the object is saved for the first time, and auto_now to set modified every time the object is saved.

Say we create a new order and save it to the database:

>>> o = Order.objects.create()

Would you expect this test to fail:

>>> assert o.created == o.modified
False

This is very unexpected. How can an object that was just created have two different values for created and modified? Can you imagine what would happen if you relied on modified and created being equal for objects that were never changed, and actually used that to identify unchanged objects:

from django.db.models import F

# Wrong!
def get_unchanged_objects():
  return Order.objects.filter(created=F('modified'))

For the Order model above, this function will always return an empty queryset.

The reason for this unexpected behavior is that each individual DateTimeField uses django.utils.timezone.now internally during save() to get the current time. The tiny delay between Django populating the two fields causes the values to end up slightly different:

>>> o.created
datetime.datetime(2020, 4, 18, 11, 41, 35, 740909, tzinfo=<UTC>)

>>> o.modified
datetime.datetime(2020, 4, 18, 11, 41, 35, 741015, tzinfo=<UTC>)

If we treat timezone.now like an injected function, we understand the inconsistencies it may cause.

So, can this be avoided? Can created and modified be equal when the object is first created? I'm sure there are a lot of hacks, libraries and other such exotic solutions, but the truth is much simpler. If you want to make sure these two fields are equal when the object is first created, you are better off avoiding auto_now and auto_now_add:

from django.db import models

class Order(models.Model):
    created = models.DateTimeField()
    modified = models.DateTimeField()

Then, when you create a new instance, explicitly provide the values for both fields:

>>> from django.utils import timezone
>>> asof = timezone.now()
>>> o = Order.objects.create(created=asof, modified=asof)
>>> assert o.created == o.modified
>>> Order.objects.filter(created=F('modified'))
<QuerySet [<Order: Order object (2)>]>

To quote the "Zen of Python", explicit is better than implicit. Explicitly providing the values for the fields requires a bit more work, but this is a small price to pay for reliable and predictable data.

using auto_now and auto_now_add

When is it OK to use auto_now and auto_now_add? Usually when a date is used for audit purposes and not for business logic, it's fine to make this shortcut and use auto_now or auto_now_add.

When to Instantiate Injected Values

Injecting values poses another interesting question: at what point should the value be set? The answer is "it depends", but there is a rule of thumb that is usually correct: values should be instantiated at the topmost level.

For example, if asof represents when an order is created, a website backend serving a store front may set this value when the request is received. In a normal Django setup, this means that the value should be set by the view. Another common example is a scheduled job. If you have jobs that use management commands, asof should be set by the management command.

Setting the values at the topmost level guarantees that the lower levels remain decoupled and easier to test. The level at which injected values are set is the level you will usually need mock to test. In the example above, setting asof in the view makes the models easier to test.

Other than testing and correctness, another benefit of setting values explicitly rather than implicitly is that it gives you more control over your data. For example, in the website scenario, an order's creation date is set by the view immediately when the request is received. However, if you process a batch file from a large customer, the time the order was created may well be in the past, when the customer first created the files. By avoiding "auto-magically" generated dates, we can implement this by passing the past date as an argument.
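This rule can be sketched in plain Python (the names create_order and order_view are hypothetical stand-ins for a Django model manager and view):

```python
import datetime
from typing import Dict

def create_order(asof: datetime.datetime) -> Dict[str, datetime.datetime]:
    # Lower level: stand-in for Order.objects.create(created=asof, modified=asof).
    # It never calls now() itself, so it is trivial to test.
    return {"created": asof, "modified": asof}

def order_view() -> Dict[str, datetime.datetime]:
    # Topmost level: the single place where "now" is materialized
    asof = datetime.datetime.now()
    return create_order(asof)
```

Because create_order receives asof as an argument, a batch import can pass a date in the past with no special casing, and a test can pass any fixed datetime.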


Dependency Injection in Practice

The best way to understand the benefits of DI and the motivation for it is using a real life example.

IP Lookup

Say we want to try and guess where visitors to our Django site are coming from, and we decide to use the IP address from the request to do that. An initial implementation can look like this:

from typing import Optional
from django.http import HttpRequest
import requests

def get_country_from_request(request: HttpRequest) -> Optional[str]:
    ip = request.META.get('REMOTE_ADDR', request.META.get('HTTP_X_FORWARDED_FOR'))
    if ip is None or ip == '':
        return None

    response = requests.get(f'http://ip-api.com/json/{ip}')
    if not response.ok:
        return None

    data = response.json()
    if data['status'] != 'success':
        return None

    return data['countryCode']

This single function accepts an HttpRequest, tries to extract an IP address from the request headers, and then uses the requests library to call an external service to get the country code.

ip lookup

I'm using the free service https://ip-api.com to lookup a country from an IP. I'm using this service just for demonstration purposes. I'm not familiar with it, so don't see this as a recommendation to use it.

Let's try to use this function:

>>> from django.test import RequestFactory
>>> rf = RequestFactory()
>>> request = rf.get('/', REMOTE_ADDR='216.58.210.46')
>>> get_country_from_request(request)
'US'

OK, so it works. Notice that to use it, we created an HttpRequest object using Django's RequestFactory.

Let's try to write a test for a scenario when a country code is found:

import re
import json
import responses

from django.test import RequestFactory

rf = RequestFactory()

with responses.RequestsMock() as rsps:
    url_pattern = re.compile(r'http://ip-api.com/json/[0-9\.]+')
    rsps.add(responses.GET, url_pattern, status=200, content_type='application/json', body=json.dumps({
        'status': 'success',
        'countryCode': 'US'
    }))
    request = rf.get('/', REMOTE_ADDR='216.58.210.46')
    countryCode = get_country_from_request(request)
    assert countryCode == 'US'

The function is using the requests library internally to make a request to the external API. To mock the response, we used the responses library.

If you look at this test and feel like it's very complicated, then you are right. To test the function we had to do the following:

  1. Create an HttpRequest using Django's RequestFactory.
  2. Use the responses library to mock the HTTP call made by requests.
  3. Rely on knowledge of which endpoint the function calls and what its response looks like.

That last point is where it gets hairy. To test the function we used our knowledge of how the function is implemented: what endpoint it uses, how the URL is structured, what method it uses and what the response looks like. This creates an implicit dependency between the test and the implementation. In other words, the implementation of the function cannot change without changing the test as well. This type of unhealthy dependency is both unexpected, and prevents us from treating the function as a "black box".

Also, notice that we only tested one scenario. If you look at the coverage of this test, you'll find that it's very low. So next, we try to simplify this function.

Assigning Responsibility

One of the techniques to make functions easier to test is to remove dependencies. Our IP function currently depends on Django's HttpRequest, the requests library and implicitly on the external service. Let's start by moving the part of the function that handles the external service to a separate function:

def get_country_from_ip(ip: str) -> Optional[str]:
    response = requests.get(f'http://ip-api.com/json/{ip}')
    if not response.ok:
        return None

    data = response.json()
    if data['status'] != 'success':
        return None

    return data['countryCode']

def get_country_from_request(request: HttpRequest) -> Optional[str]:
    ip = request.META.get('REMOTE_ADDR', request.META.get('HTTP_X_FORWARDED_FOR'))
    if ip is None or ip == '':
        return None

    return get_country_from_ip(ip)

We now have two functions:

  1. get_country_from_ip: receives an IP address and returns its country code.
  2. get_country_from_request: extracts the IP address from a Django HttpRequest and uses get_country_from_ip to find its country code.

After splitting the function, we can now search an IP directly, without creating a request:

>>> get_country_from_ip('216.58.210.46')
'US'
>>> from django.test import RequestFactory
>>> request = RequestFactory().get('/', REMOTE_ADDR='216.58.210.46')
>>> get_country_from_request(request)
'US'

Now, let's write a test for this function:

import re
import json
import responses

with responses.RequestsMock() as rsps:
    url_pattern = re.compile(r'http://ip-api.com/json/[0-9\.]+')
    rsps.add(responses.GET, url_pattern, status=200, content_type='application/json', body=json.dumps({
        'status': 'success',
        'countryCode': 'US'
    }))
    country_code = get_country_from_ip('216.58.210.46')
    assert country_code == 'US'

This test looks similar to the previous one, but we no longer need to use RequestFactory. Because we have a separate function that retrieves the country code for an IP directly, we don't need to "fake" a Django HttpRequest.

Having said that, we still want to make sure the top level function works, and that the IP is being extracted from the request correctly:

# BAD EXAMPLE!
import re
import json
import responses

from django.test import RequestFactory

rf = RequestFactory()
request_with_no_ip = rf.get('/')
country_code = get_country_from_request(request_with_no_ip)
assert country_code is None

We created a request with no IP and the function returned None. With this outcome, can we really say for sure that the function works as expected? Can we tell that the function returned None because it couldn't extract the IP from the request, or because the country lookup returned nothing?

Someone once told me that if you need the words "and" or "or" to describe what a function does, you can probably benefit from splitting it. This is the layman's version of the single-responsibility principle, which dictates that every class or function should have just one reason to change.

The function get_country_from_request extracts the IP from a request and tries to find the country code for it. So, if the rule is correct, we need to split it up:

def get_ip_from_request(request: HttpRequest) -> Optional[str]:
    ip = request.META.get('REMOTE_ADDR', request.META.get('HTTP_X_FORWARDED_FOR'))
    if ip is None or ip == '':
        return None
    return ip


# Maintain backward compatibility
def get_country_from_request(request: HttpRequest) -> Optional[str]:
    ip = get_ip_from_request(request)
    if ip is None:
        return None
    return get_country_from_ip(ip)

To be able to test if we extract an IP from a request correctly, we yanked this part to a separate function. We can now test this function separately:

rf = RequestFactory()
assert get_ip_from_request(rf.get('/')) is None
assert get_ip_from_request(rf.get('/', REMOTE_ADDR='0.0.0.0')) == '0.0.0.0'
assert get_ip_from_request(rf.get('/', HTTP_X_FORWARDED_FOR='0.0.0.0')) == '0.0.0.0'
assert get_ip_from_request(rf.get('/', REMOTE_ADDR='0.0.0.0', HTTP_X_FORWARDED_FOR='1.1.1.1')) == '0.0.0.0'

With just these 5 lines of code we covered a lot more possible scenarios.

Using a Service

So far we've implemented unit tests for the function that extracts the IP from the request, and made it possible to do a country lookup using just an IP address. The tests for the top level function are still very messy, though. Because we use requests inside the function, we were forced to use responses as well to test it. There is nothing wrong with responses, but the fewer dependencies the better.

Invoking a request inside the function creates an implicit dependency between this function and the requests library. One way to eliminate this dependency is to extract the part making the request to a separate service:

import requests

class IpLookupService:

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        response = requests.get(f'{self.base_url}/json/{ip}')
        if not response.ok:
            return None

        data = response.json()
        if data['status'] != 'success':
            return None

        return data['countryCode']

The new IpLookupService is instantiated with the base url for the service, and provides a single function to get a country from an IP:

>>> ip_lookup_service = IpLookupService('http://ip-api.com')
>>> ip_lookup_service.get_country_from_ip('216.58.210.46')
'US'

Constructing services this way has many benefits:

  1. Configuration, such as the base URL, is provided once at instantiation, so the service can point at a different server for testing or staging.
  2. Everything related to the external API lives in one place, behind a small interface.
  3. The service can be swapped for a different implementation without changing the code that uses it.

The top level function should also change. Instead of making requests on its own, it uses the service:

def get_country_from_request(
    request: HttpRequest,
    ip_lookup_service: IpLookupService,
) -> Optional[str]:
    ip = get_ip_from_request(request)
    if ip is None:
        return None
    return ip_lookup_service.get_country_from_ip(ip)

To use the function, we pass an instance of the service to it:

>>> ip_lookup_service = IpLookupService('http://ip-api.com')
>>> request = RequestFactory().get('/', REMOTE_ADDR='216.58.210.46')
>>> get_country_from_request(request, ip_lookup_service)
'US'

Now that we have full control of the service, we can test the top level function without using responses:

from unittest import mock
from django.test import RequestFactory

fake_ip_lookup_service = mock.create_autospec(IpLookupService)
fake_ip_lookup_service.get_country_from_ip.return_value = 'US'

request = RequestFactory().get('/', REMOTE_ADDR='216.58.210.46')

country_code = get_country_from_request(request, fake_ip_lookup_service)
assert country_code == 'US'

To test the function without actually making http requests we created a mock of the service. We then set the return value of get_country_from_ip, and passed the mock service to the function.

Changing Implementations

Another often-mentioned benefit of DI is the ability to completely change the underlying implementation of an injected service. For example, one day you might discover that you don't have to use a remote service to look up an IP; instead, you can use a local IP database.

Because our IpLookupService does not leak its internal implementation, it's an easy switch:

from typing import Optional
import GeoIP

class LocalIpLookupService:
    def __init__(self, path_to_db_file: str) -> None:
        self.db = GeoIP.open(path_to_db_file, GeoIP.GEOIP_STANDARD)

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        return self.db.country_code_by_addr(ip)

The service API remained unchanged, so you can use it the same way as the old service:

>>> ip_lookup_service = LocalIpLookupService('/usr/share/GeoIP/GeoIP.dat')
>>> ip_lookup_service.get_country_from_ip('216.58.210.46')
'US'
>>> from django.test import RequestFactory
>>> request = RequestFactory().get('/', REMOTE_ADDR='216.58.210.46')
>>> get_country_from_request(request, ip_lookup_service)
'US'

The best part here is that the tests are unaffected. All the tests should pass without making any changes.

GeoIP

In the example I use the MaxMind GeoIP Legacy Python Extension API because it uses files I already have in my OS as part of geoiplookup. If you really need to lookup IP addresses check out GeoIP2 and make sure to check the license and usage restrictions.

Also, Django users might be delighted to know that Django provides a wrapper around geoip2.

Typing Services

In the last section we cheated a bit. We injected the new service LocalIpLookupService into a function that expects an instance of IpLookupService. We made sure the two expose the same interface, but the type annotations are now wrong. We also used a mock to test the function, which is also not of type IpLookupService. So, how can we use type annotations and still be able to inject different services?

from abc import ABCMeta
import GeoIP
import requests

class IpLookupService(metaclass=ABCMeta):
    def get_country_from_ip(self, ip: str) -> Optional[str]:
        raise NotImplementedError()


class RemoteIpLookupService(IpLookupService):
    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        response = requests.get(f'{self.base_url}/json/{ip}')
        if not response.ok:
            return None

        data = response.json()
        if data['status'] != 'success':
            return None

        return data['countryCode']


class LocalIpLookupService(IpLookupService):
    def __init__(self, path_to_db_file: str) -> None:
        self.db = GeoIP.open(path_to_db_file, GeoIP.GEOIP_STANDARD)

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        return self.db.country_code_by_addr(ip)

We defined a base class called IpLookupService that acts as an interface. The base class defines the public API for users of IpLookupService. Using the base class, we can provide two implementations:

  1. RemoteIpLookupService: uses the requests library to look up the IP at an external service.
  2. LocalIpLookupService: uses the local GeoIP database.

Now, any function that needs an instance of IpLookupService can use this type, and the function will be able to accept any subclass of it.

Before we wrap things up, we still need to handle the tests. Previously we removed the test's dependency on responses, now we can ditch mock as well. Instead, we subclass IpLookupService with a simple implementation for testing:

from typing import Iterable

class FakeIpLookupService(IpLookupService):
    def __init__(self, results: Iterable[Optional[str]]):
        self.results = iter(results)

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        return next(self.results)

The FakeIpLookupService implements IpLookupService, and produces results from a predefined list we provide to it:

from django.test import RequestFactory

fake_ip_lookup_service = FakeIpLookupService(results=['US'])
request = RequestFactory().get('/', REMOTE_ADDR='216.58.210.46')

country_code = get_country_from_request(request, fake_ip_lookup_service)
assert country_code == 'US'

The test no longer uses mock.

Using a Protocol

The form of class hierarchy demonstrated in the previous section is called "nominal subtyping". There is another way to utilize typing without classes, using Protocols:

from typing import Iterable, Optional
from typing_extensions import Protocol
import GeoIP
import requests


class IpLookupService(Protocol):
    def get_country_from_ip(self, ip: str) -> Optional[str]:
        pass


class RemoteIpLookupService:
    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        response = requests.get(f'{self.base_url}/json/{ip}')
        if not response.ok:
            return None

        data = response.json()
        if data['status'] != 'success':
            return None

        return data['countryCode']


class LocalIpLookupService:
    def __init__(self, path_to_db_file: str) -> None:
        self.db = GeoIP.open(path_to_db_file, GeoIP.GEOIP_STANDARD)

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        return self.db.country_code_by_addr(ip)


class FakeIpLookupService:
    def __init__(self, results: Iterable[Optional[str]]):
        self.results = iter(results)

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        return next(self.results)

The switch from classes to protocols is mild. Instead of creating IpLookupService as a base class, we declare it a Protocol. A protocol is used to define an interface and cannot be instantiated; it is used only for typing purposes. When a class implements the interface defined by the protocol, it means "structural subtyping" exists and the type check will pass.

In our case, we use a protocol to make sure an argument of type IpLookupService implements the functions we expect an IP service to provide.
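To see the structural check in action, here is a minimal, self-contained sketch (the StaticIpLookupService class and the country_or_unknown function are hypothetical, not from the article): any class with a matching get_country_from_ip method satisfies the protocol, without inheriting from it.

```python
from typing import Optional, Protocol  # on Python < 3.8, import Protocol from typing_extensions


class IpLookupService(Protocol):
    def get_country_from_ip(self, ip: str) -> Optional[str]:
        ...


class StaticIpLookupService:
    # Note: no base class. The matching method signature alone
    # makes this a structural subtype of IpLookupService.
    def __init__(self, country: Optional[str]) -> None:
        self.country = country

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        return self.country


def country_or_unknown(service: IpLookupService, ip: str) -> str:
    # mypy accepts any structural subtype here; at runtime it is plain duck typing.
    return service.get_country_from_ip(ip) or 'unknown'


assert country_or_unknown(StaticIpLookupService('US'), '216.58.210.46') == 'US'
assert country_or_unknown(StaticIpLookupService(None), '10.0.0.1') == 'unknown'
```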

Structural and Nominal Subtyping

I've written about protocols and structural and nominal subtyping in the past. Check out Modeling Polymorphism in Django With Python.

So which to use? Some languages, like Java, use nominal typing exclusively, while other languages, like Go, use structural typing for interfaces. There are advantages and disadvantages to both ways, but we won't get into that here. In Python, nominal typing is easier to use and understand, so my recommendation is to stick to it, unless you need the flexibility afforded by protocols.

Nondeterminism and Side-Effects

If you ever had a test that one day just started to fail, unprovoked, or a test that fails once in a blue moon for no apparent reason, it's possible your code relies on something that is not deterministic. In the datetime.date.today example, the result relies on the current time, which is always changing, so it's not deterministic.

There are many sources of nondeterminism. Common examples include the current time, random number generation, network calls, and filesystem or database access.

Dependency injection provides a good way to control nondeterminism in tests. The basic recipe is this:

  1. Identify the source of nondeterminism and encapsulate it in a service: For example, TimeService, RandomnessService, HttpService, FilesystemService and DatabaseService.
  2. Use dependency injection to access these services: Never bypass them by using datetime.now() and similar directly.
  3. Provide deterministic implementations of these services in tests: Use a mock, or a custom implementation suited for tests instead.

If you follow the recipe diligently, your tests will not be affected by external circumstances and you will not have flaky tests!
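As a sketch of the recipe with a hypothetical TimeService (the names below are mine, not from the article): the nondeterministic datetime.datetime.now() call is hidden behind an injected service, and the test swaps in a fixed-clock implementation.

```python
import datetime
from typing import Protocol  # on Python < 3.8, import Protocol from typing_extensions


class TimeService(Protocol):
    def now(self) -> datetime.datetime:
        ...


class RealTimeService:
    def now(self) -> datetime.datetime:
        return datetime.datetime.now()  # nondeterministic: changes on every call


class FakeTimeService:
    """Deterministic test double: always reports the same moment."""
    def __init__(self, fixed: datetime.datetime) -> None:
        self.fixed = fixed

    def now(self) -> datetime.datetime:
        return self.fixed


def greeting(time_service: TimeService) -> str:
    # Business logic depends on the injected clock, never on datetime directly.
    return 'Good morning' if time_service.now().hour < 12 else 'Good afternoon'


assert greeting(FakeTimeService(datetime.datetime(2020, 6, 1, 9, 0))) == 'Good morning'
assert greeting(FakeTimeService(datetime.datetime(2020, 6, 1, 15, 0))) == 'Good afternoon'
```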


Conclusion

Dependency injection is a design pattern just like any other. Developers can decide to what degree they want to take advantage of it. The main benefits of DI are decoupling components from their dependencies, making those dependencies explicit, and making code easier to test.

In the use-case above we took several twists and turns to illustrate a point, which might have made the implementation seem more complicated than it really is. In addition, searching for information about dependency injection in Python often turns up libraries and packages that seem to completely change the way you structure your application. This can be very intimidating.

In reality, DI can be used sparingly and in appropriate places to achieve the benefits listed above. When implemented correctly, DI can make your code easier to maintain and to test.

01 Jun 2020 2:00am GMT

29 May 2020

feedDjango community aggregator: Community blog posts

Django News - Issue 25 - May 29th 2020

News

Pipenv new release

A major new release for Pipenv, the first since 2018!

pypa.io

PSF on Twitter: "The deadline to sign up to vote in @ThePSF upcoming election is May 31 AOE. To vote you need to be a PSF fellow, contributing, supporting and/or managing member. Membership info can be found here: https://t.co/W2H2ShVTDr"

The PSF's annual elections are this June. If you are not a member and want to be one, sign up before May 31 AOE to vote this year.

twitter.com

Articles

A tour of Django server setups

See what Django in production visually looks like with diagrams.

mattsegal.dev

Abusing the Django Admin app

Ken Whitesell walks us through automatically registering all of the models in our app while providing a custom ModelAdmin class to make their display useful. This technique is useful when dealing with a large number of models or a legacy database when you don't want to create 200 ModelAdmins.

dev.to

Handling SEO with Wagtail CMS

Adonis Simo walks us through how to make Wagtail more SEO friendly.

adonissimo.com

How to Get Hired as a Django Developer

Practical advice on finding work as a professional Python/Django programmer.

learndjango.com

Waiting in asyncio

While not directly written for Django, this cheat sheet explains the various options available in asyncio for waiting for results.

hynek.me

Debugging a Containerized Django App in VS Code

How to debug a containerized Django App in Visual Studio Code (VS Code).

testdriven.io

Sponsored Link

Two Scoops of Django 3.x: Best Practices for the Django Web Framework

The long-awaited update covers various tips, tricks, patterns, code snippets, and techniques of Django best practices.

feldroy.com

Podcasts

Django Chat - JWTs and AI - David Sanders

David is the author of the popular django-rest-framework-simplejwt package. They discuss authentication, JWTs, and his current role at an AI startup.

djangochat.com

Tutorials

Django Celery Tutorial Series

An approachable two-part tutorial series about setting up Django and Celery.

accordbox.com

Projects

phuoc-ng/csslayout

A very useful collection of popular layouts and patterns made with CSS. While not specifically Django related, HTML is unavoidable as a Django developer.

github.com

jonathan-s/django-sockpuppet

Build reactive applications with the django tooling you already know and love

github.com

Flimm/django-fullurl

Adds three template tags: fullurl, fullstatic, and buildfullurl.

github.com

django-fast-pagination

Faster Queries For Large Databases

github.com

django-grpc-framework

Django gRPC framework is a toolkit for building gRPC services, inspired by djangorestframework.

github.com

jrief/django-entangled

If you use Django's JSONField, django-entangled is the missing tool you need to build custom forms for these fields.

github.com

Shameless Plugs

LearnDjango.com - Free tutorials and premium books

learndjango.com

Work From Home Setups

Jeff created this moment featuring photos of our community's work-from-home desk setups in March as many of us were starting to transition to working from our homes. Take a photo and tag @webology if you want your setup added.

twitter.com


This RSS feed is published on https://django-news.com/. You can also subscribe via email.

29 May 2020 4:00pm GMT

28 May 2020

feedDjango community aggregator: Community blog posts

Book Review: Speed Up Your Django Tests

28 May 2020 5:31pm GMT

Bread and Butter Django - Building SaaS #58

In this episode, I worked on views and templates. There are a number of core pages required to flesh out the minimal interface for the app, and we're building them. I began by showing the page we were going to work on and outlined the changes I planned to make, then we started. The first thing we added was data about the school year, the main model on display in the page.

28 May 2020 5:00am GMT

27 May 2020

feedDjango community aggregator: Community blog posts

Django and Robot Framework

One of my colleagues has spent a bunch of time investigating and then implementing some testing using [Robot Framework](https://robotframework.org). Whilst at times the command line feels like it was written by someone who hasn't used unix much, it's pretty powerful. There are also some nice tools, like several Google Chrome [plugins](https://chrome.google.com/webstore/detail/robotcorder/ifiilbfgcemdapeibjfohnfpfmfblmpd) [that](https://chrome.google.com/webstore/detail/chrome-robot/dihdbpkpgdkioobahfpnkondnekhbmlo) will record what you are doing and generate a script based upon that, as well as other tools to [help build testing scripts](https://chrome.google.com/webstore/detail/page-modeller-selenium-ro/ejgkdhekcepfgdghejpkmbfjgnioejak).

There is also an existing [DjangoLibrary](https://pypi.org/project/robotframework-djangolibrary/) for integrating with [Django](https://www.djangoproject.com/). It's an interesting approach: you install some extra middleware that allows you to perform requests directly to the server to create instances using [Factory Boy](https://factoryboy.readthedocs.io/), or fetch data from querysets. However, it requires that the data is serialised before sending to the django server, and the same the other way. This means, for instance, that you cannot follow object references to get a related object without a bunch of legwork: usually you end up doing another `QuerySet` query.

There are some things in it that I do not like:

* A new instance of the django `runserver` command is started for each Test Suite. In our case, this takes over 10 seconds to start as all imports are processed.
* The database is flushed between Test Suites. We have data that is added through migrations that is required for the system to operate correctly, and in some cases for tests to execute. This is the same problem I've seen with `TransactionTestCase`.
* Migrations are applied before running each Test Suite. This is unnecessary, and just takes more time.
* Migrations are created automatically before running each Test Suite. This is just the wrong approach: at worst you'd want to warn that migrations are not up to date. Otherwise you are testing migrations that may not have been committed: your CI would pass because the migrations were generated, but your system would fail in reality because those migrations do not really exist. Unless you are also making migrations directly on your production server and not committing them at all, in which case you really should stop that.

That's in addition to having to install extra middleware.

But, back onto the initial issue: interacting with Django models. What would be much nicer is if you could just call the python code directly. You'd get python objects back, which means you can follow references, and not have to deal with serialisation.

It's fairly easy to write a Library for Robot Framework, as it already runs under Python. The tricky bit is that to access Django models (or Factory Boy factories), you'll want to have the Django infrastructure all managed for you. Let's look at what the `DjangoLibrary` might look like if you are able to assume that `django` is already available and configured:

{% highlight python %}
import importlib

from django.apps import apps
from django.core.urlresolvers import reverse
from robot.libraries.BuiltIn import BuiltIn


class DjangoLibrary:
    """
    Tools for making interaction with Django easier.

    Installation: ensure that in your `resource.robot` or test file, you
    have the following in your "*** Settings ***" section:

        Library    djangobot.DjangoLibrary    ${HOSTNAME}    ${PORT}

    The following keywords are provided:

    Factory: execute the named factory with the args and kwargs. You may
    omit the 'factories' module from the path to reduce the amount of
    code required.

        ${obj}=    Factory    app_label.FactoryName    arg    kwarg=value
        ${obj}=    Factory    app_label.factories.FactoryName    arg    kwarg=value

    Queryset: return a queryset of the installed model, using the default
    manager and filtering according to any keyword arguments.

        ${qs}=    Queryset    auth.User    pk=1

    Method Call: Execute the callable with the args/kwargs provided. This
    differs from the Builtin "Call Method" in that it expects a callable,
    rather than an instance and a method name.

        ${x}=    Method Call    ${foo.bar}    arg    kwargs=value

    Relative Url: Resolve the named url and args/kwargs, and return the
    path. Not quite as useful as the "Url", since it has no hostname, but
    may be useful when dealing with `?next=/path/` values, for instance.

        ${url}=    Relative Url    foo:bar    baz=qux

    Url: Resolve the named url with args/kwargs, and return the fully
    qualified url.

        ${url}=    Url    foo:bar    baz=qux

    Fetch Url: Resolve the named url with args/kwargs, and then, using
    SeleniumLibrary, navigate to that URL. This should be used instead of
    the "Go To" command, as it allows using named urls instead of manually
    specifying urls.

        Fetch Url    foo:bar    baz=qux

    Url Should Match: Assert that the current page matches the named url
    with args/kwargs.

        Url Should Match    foo:bar    baz=qux
    """

    def __init__(self, hostname, port, **kwargs):
        self.hostname = hostname
        self.port = port
        self.protocol = kwargs.pop('protocol', 'http')

    @property
    def selenium(self):
        return BuiltIn().get_library_instance('SeleniumLibrary')

    def factory(self, factory, **kwargs):
        module, name = factory.rsplit('.', 1)
        factory = getattr(importlib.import_module(module), name)
        return factory(**kwargs)

    def queryset(self, dotted_path, **kwargs):
        return apps.get_model(dotted_path)._default_manager.filter(**kwargs)

    def method_call(self, method, *args, **kwargs):
        return method(*args, **kwargs)

    def fetch_url(self, name, *args, **kwargs):
        return self.selenium.go_to(self.url(name, *args, **kwargs))

    def relative_url(self, name, *args, **kwargs):
        return reverse(name, args=args, kwargs=kwargs)

    def url(self, name, *args, **kwargs):
        return '{}://{}:{}'.format(
            self.protocol,
            self.hostname,
            self.port,
        ) + reverse(name, args=args, kwargs=kwargs)

    def url_should_match(self, name, *args, **kwargs):
        self.selenium.location_should_be(self.url(name, *args, **kwargs))
{% endhighlight %}

You can write a management command: this allows you to hook into Django's existing infrastructure. Then, instead of calling robot directly, you use `./manage.py robot`.

What's even nicer about using a management command is that you can have it (optionally, because in development you will probably already have a devserver running) start `runserver`, and kill it when it's finished. This is the same philosophy as `robotframework-DjangoLibrary` already follows, but we can start the server once before running our tests, and kill it at the end.

So, what could our management command look like? Omitting the code for starting `runserver`, it's quite neat:

{% highlight python %}
from __future__ import absolute_import

from django.core.management import BaseCommand, CommandError

import robot


class Command(BaseCommand):
    def add_arguments(self, parser):
        parser.add_argument('tests', nargs='?', action='append')
        parser.add_argument('--variable', action='append')
        parser.add_argument('--include', action='append')

    def handle(self, **options):
        robot_options = {
            'outputdir': 'robot_results',
            'variable': options.get('variable') or [],
        }
        if options.get('include'):
            robot_options['include'] = options['include']

        args = [
            'robot_tests/{}_test.robot'.format(arg)
            for arg in options['tests'] or ()
            if arg
        ] or ['robot_tests']

        result = robot.run(*args, **robot_options)

        if result:
            raise CommandError('Robot tests failed: {}'.format(result))
{% endhighlight %}

I think I'd like to do a bit more work on finding tests, but this works as a starting point. We can call this like:

    ./manage.py robot foo --variable BROWSER:firefox --variable PORT:8000

This will find a test called `robot_tests/foo_test.robot`, and execute that. If you omit the `test` argument, it will run all tests in the `robot_tests/` directory. I've still got a bit to do on cleaning up the code that starts/stops the server, but I think this is useful even without that.

27 May 2020 10:34pm GMT

JWTs and AI - David Sanders

27 May 2020 10:00pm GMT

26 May 2020

feedDjango community aggregator: Community blog posts

Saving GeoPoints Using Django Form

A while ago I was working on a project which had a map as part of a simple form. The user can select a point on the map and submit. The form's responsibility was to get the submitted data, validate it, and save it into the database if everything is fine. I was using MySQL with GIS support. During development I faced a couple of issues, which I'll be addressing here along with how I fixed them. Let's begin!

Consider the example below:

from django.contrib.gis.db import models

class Location(models.Model):
    coordinate = models.PointField(blank=True, null=True)
    # many more fields

If you look at the Django-generated migration file for this model, you will notice that the default value of the srid parameter is 4326, although we never provided it explicitly in the model definition.
This is how the migration will look:

operations = [
    migrations.CreateModel(
        name='Location',
        fields=[
            ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
            ('coordinate', django.contrib.gis.db.models.fields.PointField(blank=True, null=True, srid=4326)),
        ],
    ),
]

The default value of srid is propagated from the base class BaseSpatialField, which PointField inherits from. We can always change this value as per our requirements, but in most cases the default is sufficient.

Let's try to save some geo coordinates through the shell. First, we need to import the Point class so that we can directly assign a value to the model field. Go ahead and run python manage.py shell:

>>> from django.contrib.gis.geos import Point
>>> Point(75.778885, 26.922070)  # Latitude=26.922070 & longitude=75.778885
<Point object at 0x11a282c70>
>>> # save into database
>>> Location.objects.create(coordinate=Point(75.778885, 26.922070))
<Location: Location object (1)>

Let's see how it's been stored in the database.

>>> Location.objects.last().coordinate.coords
(75.778885, 26.92207)

Looks good. That's what we saved.

Let's do the same exercise using a Django form. Create a forms.py file as below:

from django.contrib.gis import forms  
# Note: forms is being imported from gis module instead of: `from django import forms`

class LocationForm(forms.ModelForm):
    class Meta:
        model = Location
        fields = ('coordinate',)

Now, pass the same data to this form and see how it responds.

>>> data = {'coordinate': '75.778885, 26.92207'}
>>> form = LocationForm(data=data)
>>> form
<LocationForm bound=True, valid=Unknown, fields=(coordinate)>
>>>
>>> form.is_valid()  # check if the provided payload is valid 
Error creating geometry from value '75.778885, 26.92207' (String input unrecognized as WKT EWKT, and HEXEWKB.)

Oops! We got an error- Error creating geometry from value '75.778885, 26.92207' (String input unrecognized as WKT EWKT, and HEXEWKB.)

It seems the data we provided is not in one of the acceptable formats. After a bit of searching, I found that I need to provide a proper geometry type with the data.

>>> data = {'coordinate': 'POINT(75.778885 26.92207)'}  # Note that points are separated by a space
>>> form = LocationForm(data=data)
>>> form.is_valid()
True
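As the comment above notes, the WKT form is the geometry type followed by longitude and latitude separated by a space, not a comma. If you are assembling the string from floats, a tiny helper (hypothetical, not from the post) keeps the order straight:

```python
def to_wkt_point(longitude: float, latitude: float) -> str:
    # WKT puts longitude (x) first, then latitude (y), space-separated.
    return f'POINT({longitude} {latitude})'


assert to_wkt_point(75.778885, 26.92207) == 'POINT(75.778885 26.92207)'
```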

Nice, it worked! Wait… did it actually? Too soon to celebrate 😏. Let's save this form and verify the data in the database.

>>> form.save()
<Location: Location object (2)>
>>>
>>> # Now Lets see how it was stored in the database
>>> Location.objects.last().coordinate.coords
(0.0006807333060903553, 0.0002418450696118364)

Whaaaat?

This is not what we provided.

What went wrong? It turns out the Django form needs the srid value explicitly. Let's modify the data a little bit and follow the same steps.

>>> data = {'coordinate': 'SRID=4326;POINT(75.778885 26.92207)'}
>>> form = LocationForm(data=data)
>>> form.is_valid()
True
>>> form.save()
<Location: Location object (3)>

Verify the database.

>>> Location.objects.last().coordinate.coords
(75.778885, 26.92207)
>>>

Awesome, finally we can see the data that we inserted.

Now the question is how and where should we make this change in the codebase?

We have two options:

1) We can modify the payload before passing it to the form. But that's not a good place to do it; moreover, we might be using this form in multiple places, and then we would have to make the change in all of them. That leaves us with the 2nd option.

2) We can override the __init__ method inside the form class so that all the logic is in one place.

class LocationForm(forms.ModelForm):
    class Meta:
        model = Location
        fields = ('coordinate',)

    def __init__(self, *args, **kwargs):
        data = kwargs.get('data')
        coordinate = data.pop('coordinate', None) if data else None
        if coordinate:
            # Remove the comma; the two numbers must be separated by a single space.
            coordinate = coordinate.replace(',', '')
            kwargs['data']['coordinate'] = f'SRID=4326;POINT({coordinate})'

        super(LocationForm, self).__init__(*args, **kwargs)

Now we don't need to pass the geometry type in the data; we can simply pass the raw point data as we did in the very first step.

>>> data = {'coordinate': '75.778885, 26.92207'}
>>> form = LocationForm(data=data)
>>> form.save()
<Location: Location object (4)>

Verify the database.

>>> Location.objects.last().coordinate.coords
(75.778885, 26.92207)
>>>

πŸ‘πŸ‘πŸ‘ Sweet!

Additionally, if your business logic requires some other conditional checks, you can override the clean_<field_name> or/and clean methods, write all the logic there, and raise relevant exceptions/validation errors if needed. Also, if you have multiple Point fields in your model, it would make sense to create a method inside the class and reuse it in __init__.
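Such a shared helper might look like the following sketch (the normalize_point name and signature are my own, not from the post): it turns a raw 'lng, lat' string into the EWKT form the form field expects, and each Point field's value can be passed through it in __init__.

```python
def normalize_point(raw: str, srid: int = 4326) -> str:
    """Convert a raw 'lng, lat' string into EWKT, e.g. 'SRID=4326;POINT(lng lat)'."""
    longitude, latitude = (part.strip() for part in raw.split(','))
    return f'SRID={srid};POINT({longitude} {latitude})'


assert normalize_point('75.778885, 26.92207') == 'SRID=4326;POINT(75.778885 26.92207)'
```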

The post Saving GeoPoints Using Django Form appeared first on Gaurav Jain.

26 May 2020 3:30pm GMT

Debugging a Containerized Django App in VS Code

In this article, we'll show you how to configure Visual Studio Code (VS Code) to debug a Django app running inside of Docker.

26 May 2020 3:28am GMT

22 May 2020

feedDjango community aggregator: Community blog posts

Django News - Virtual Meetups Galore - May 22nd 2020

News

Test pip's alpha resolver and help us document dependency conflicts - Bernard Tyers

The pip team needs your help testing the dependency resolver. They specifically are looking for projects with complex dependencies which are prone to fail. If this sounds like you, please help them out.

ei8fdb.org

Admin accessibility - Google Groups

Via Adam Johnson, "Interested in helping increase Django's accessibility? Chime in on this discussion started by Tom Carrick."

google.com

Articles

Second-guessing the modern web

A critical look at the SPA + API backend pattern currently en vogue.

macwright.org

Leverage the InnoDB architecture to optimize Django model design

What every developer should know about InnoDB.

medium.com

Optimizing Django ORM Queries

An under-the-hood look at Django's ORM.

schegel.net

The Fast Way to Test Django transaction.on_commit() Callbacks

Performance tips on Django tests from Adam Johnson.

adamj.eu

Sponsored Link

Two Scoops of Django 3.x: Best Practices for the Django Web Framework

The long-awaited update covers various tips, tricks, patterns, code snippets, and techniques of Django best practices.

feldroy.com

Videos

PyCon 2020 - Finite State Machine (FSM) in Django

Calvin Hendryx-Parker on using django-fsm to build quick, lightweight business workflows for Django applications.

youtube.com

PyCon 2020 - Big O No

Chris Seto on Django ORM runtime complexity and how to avoid it using LATERAL JOINS.

youtube.com

PyCon 2020 - What is deployment, anyway?

A beyond-the-basics look at deploying Django projects.

youtu.be

Podcasts

Django Chat #64 - Python at Microsoft with Nina Zakharenko

Nina is a Cloud Developer Advocate at Microsoft, Twitch Streamer, Pythoniasta, PyCon 2019 Keynoter, long-time DjangoCon speaker, and now teacher at FrontendMasters.

djangochat.com

Tutorials

Caching and scaling for Django

An overview with sample Github code on how/why to cache a Django site.

eralpbayraktar.com

Projects

django-capture-on-commit-callbacks

Capture and make assertions on transaction.on_commit() callbacks.

github.com

EralpB/django-rotate-secret-key

Helps rotate your secret key config in your Django projects without losing sessions (i.e., without logging out users).

github.com

browniebroke/django-codemod

django-codemod helps auto-upgrade your Python/Django code from one version to another. It's new and very promising.

github.com

Events

Django London Meetup - May 26th

The 2nd London Virtual Django Meetup.

meetup.com

Shameless Plugs

LearnDjango.com - Free tutorials and premium books

learndjango.com


This RSS feed is published on https://django-news.com/. You can also subscribe via email.

22 May 2020 7:05pm GMT

Django Celery Tutorial Series

This Django Celery tutorial series teaches you how to use Celery with Django better.

22 May 2020 7:13am GMT

21 May 2020

feedDjango community aggregator: Community blog posts

Wagtail query for scheduled pages

Wagtail has "scheduled" pages that are not visible on the site. I think the interface is not ideal, as you need to click the Publish button after setting a publication date on the Settings tab. I'm not sure exactly how the data models work, but the actual publication is handled by a management command, and putting the publication date into the future after the post was published doesn't seem to unpublish it.

I wanted to get a list of pages that were scheduled for publication, the query below might not handle all edge cases but shows how I got what I needed.

MyPageModel.objects.filter(go_live_at__isnull=False).not_live()

21 May 2020 4:48pm GMT

Waiting in asyncio

One of the main appeals of using asyncio is being able to fire off many coroutines and run them concurrently. How many ways do you know for waiting for their results?

21 May 2020 5:00am GMT

Switch A Django Project To Use Pytest - Building SaaS #57

In this episode, I replaced the default Django test runner to use pytest. We walked through installation, configuration, how to change tests, and the benefits that come from using pytest. We started by looking at the current state of the test suite to provide a baseline to compare against. After that, I went to PyPI to find the version of pytest-django that we wanted to install. I added the package to my requirements-dev.

21 May 2020 5:00am GMT

20 May 2020

feedDjango community aggregator: Community blog posts

Python at Microsoft- Nina Zakharenko

20 May 2020 10:00pm GMT

How to auto-reload Celery worker on Code Change

In this Django Celery tutorial, I would talk about how to auto-reload Celery worker on code change.

20 May 2020 5:33pm GMT