26 Jul 2017


Moshe Zadka: Image Editing with Jupyter

With the news about MS Paint going away from the default MS install, it might be timely to look at other ways to edit images. The most common edit I need to do is to crop images -- and this is what we will use as an example.

My favorite image editing tool is Jupyter. Jupyter needs some encouragement to be an image editor -- and to easily open images. As is often the case, I have a non-pedagogical, but useful, preamble. The preamble turns Jupyter into an image editor.

from matplotlib.pyplot import imshow
import numpy
import PIL.Image  # import the submodule explicitly; a bare "import PIL" may not expose PIL.Image
import os

%matplotlib inline

def inline(some_image):
    # Display a PIL image inside the notebook.
    imshow(numpy.asarray(some_image))

def open(file_name):
    # Open an image, expanding ~ in the path.
    return PIL.Image.open(os.path.expanduser(file_name))

With the boring part done, it is time to edit some images! At the Shopkick birthday party, I had my caricature drawn. I love it -- but it carries a lot of birthday party context that is irrelevant when uploading to Facebook.

I have downloaded the image from the blog. I use Pillow (the packaging fork of PIL) to open the image.

a=open("~/Downloads/weeeee.jpg")

Then I want to visually inspect the image inline:

inline(a)

I use the crop method, and directly inline it:

inline(a.crop((0,0,1500,1600)))

If this were longer, and more realistic, this would involve playing with the numbers back and forth -- and maybe a resize, or combining it with other images.
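
For example, once the crop box looks right, the same inline-and-inspect loop works for resizing and for saving the final result. A minimal sketch -- the output file name here is just a placeholder:

inline(a.crop((0, 0, 1500, 1600)).resize((750, 800)))
a.crop((0, 0, 1500, 1600)).save(os.path.expanduser("~/Downloads/weeeee-cropped.jpg"))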

The Pillow library is great, and this way we can inspect the results as we are modifying the image, allowing iterative image editing. For people like me, without a strong steady artist's hand to perfectly select the right circle, this solution works just great!

26 Jul 2017 5:20am GMT

21 Jul 2017


Itamar Turner-Trauring: Incremental results, not incremental implementation

Update: Added section on iterative development.

You're working on a large project, bigger than you've ever worked on before: how do you ship it on time? How do you ensure it has all the functionality it needs? How do you design something that is too big to fit in your head?

My colleague Rafi Schloming, speaking in the context of the transition to microservices, suggests that focusing on incremental results is fundamentally better than focusing on incremental implementation. This advice will serve you well in most large projects, and to explain why I'd like to tell you the story of a software project I built the wrong way.

A real world example

The wrong way…

I once built a system for efficiently sending streams of data from one source to many servers; the resulting software was run by the company's ops team. Since I was even more foolish than I am now, I implemented it in the following order, based on the architecture I had come up with:

  1. First I implemented a C++ integration layer for the Python networking framework I was using, so I could write higher performance code.
  2. Then I implemented the messaging protocol and system, based on a research paper I'd found.
  3. Finally, I handed the code over to ops.

As you can see, I implemented my project based on its architecture: first the bottom layer, then the layers that built on top of it. Unfortunately, since I hadn't consulted ops enough about the design they then had to make some changes on their own. As a result, it took six months to a year until the code was actually being used in production.

…and the right way

How would I have built my tool to deliver incremental results?

  1. Build a working tool in pure Python. This would probably have been too slow for some of the higher-speed message streams.
  2. Hand initial tool over to ops. Ops could then start using it for slower streams, and provide feedback on the design.
  3. Next I would have fixed any problems reported by ops.
  4. Finally, I would rewrite the core networking in C++ for performance.

Notice that this is seemingly less efficient than my original plan, since it involves re-implementing some code. Nonetheless I believe it would have resulted in the project going live much sooner.

Why incremental results are better

Incremental results means you focus on getting results as quickly as possible, even if you can't get all the desired results with initial versions. That means:

Beyond iterative development

"Iterative development" is a common, and good, suggestion for software development, but it's not quite the same as focusing on incremental results. In iterative development you build your full application end-to-end, and then in each released iteration you make the functionality work better. In that sense, the better alternative I was suggesting above could be seen as simply suggesting iterative development. But incremental results is a more broadly applicable idea than iterative development.

Incremental results are the goal; iterative development is one possible technique to achieve that goal. Sometimes you can achieve incremental results without iterative development:

Whenever you can, aim for incremental results: it will reduce the risks, and make your project valuable much earlier. It may mean some wasted effort, yes, as you re-implement certain features, but that waste is usually outweighed by the reduced risk and faster feedback you'll get from incremental results.

PS: I've made lots of other mistakes in my career. If you'd like to learn how to avoid them, sign up for my newsletter, where every week I write up one of my mistakes and how you can avoid it.

21 Jul 2017 4:00am GMT

20 Jul 2017


Moshe Zadka: Anatomy of a Multi-Stage Docker Build

Docker, in recent versions, has introduced multi-stage builds. This allows separating the build environment from the runtime environment much more easily than before.

In order to demonstrate this, we will write a minimal Flask app and run it with Twisted using its WSGI support.

The Flask application itself is the smallest demo app, straight from any number of Flask tutorials:

# src/msbdemo/wsgi.py
from flask import Flask
app = Flask("msbdemo")
@app.route("/")
def hello():
    return "If you are seeing this, the multi-stage build succeeded"

The setup.py file, similarly, is the minimal one from any number of Python packaging tutorials:

import setuptools
setuptools.setup(
    name='msbdemo',
    version='0.0.1',
    url='https://github.com/moshez/msbdemo',
    author='Moshe Zadka',
    author_email='zadka.moshe@gmail.com',
    packages=setuptools.find_packages(),
    install_requires=['flask'],
)

The interesting stuff is in the Dockerfile. It is interesting enough that we will go through it line by line:

FROM python:2.7.13

We start from a "fat" Python docker image -- one with the Python headers installed, and the ability to compile extensions.

RUN virtualenv /buildenv

We create a custom virtual environment for the build process.

RUN /buildenv/bin/pip install pex wheel

We install the build tools -- in this case, wheel, which will let us build wheels, and pex, which will let us build single file executables.

RUN mkdir /wheels

We create a custom directory to put all of our wheels. Note that we will not install those wheels in this docker image.

COPY src /src

We copy our minimal Flask-based application's source code into the docker image.

RUN /buildenv/bin/pip wheel --no-binary :all: \
                            twisted /src \
                            --wheel-dir /wheels

We build the wheels. We take care to manually build wheels ourselves, since pex, right now, cannot handle manylinux binary wheels.

RUN /buildenv/bin/pex --find-links /wheels --no-index \
                      twisted msbdemo -o /mnt/src/twist.pex -m twisted

We build the twisted and msbdemo wheels, together with any recursive dependencies, into a Pex file -- a single file executable.

FROM python:2.7.13-slim

This is where the magic happens. A second FROM line starts a new docker image build. The previous images are available -- but only inside this Dockerfile -- for copying files from. Luckily, we have a file ready to copy: the output of the Pex build process.

COPY --from=0 /mnt/src/twist.pex /root

The --from=0 indicates copying from a previously built image, rather than the so-called "build context". In theory, any number of builds can take place in one Dockerfile. While only the last one will actually result in a permanent image, the others are all available as targets for --from copying. In practice, two stages are usually enough.

ENTRYPOINT ["/root/twist.pex", "web", "--wsgi", "msbdemo.wsgi.app", \
            "--port", "tcp:80"]

Finally, we use Twisted as our WSGI container. Since we bound the Pex file to the -m twisted package execution, all we need to do is run the web plugin, ask it to run a WSGI container, and give it the logical (module) path to our WSGI app.
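
Under the hood, the web plugin essentially wires the WSGI app into Twisted's WSGIResource. A rough sketch of the equivalent Python -- not what the plugin literally executes, and the port simply mirrors the ENTRYPOINT above:

from twisted.internet import reactor
from twisted.web.server import Site
from twisted.web.wsgi import WSGIResource

from msbdemo.wsgi import app

# Serve the Flask WSGI app from Twisted's thread pool on port 80.
resource = WSGIResource(reactor, reactor.getThreadPool(), app)
reactor.listenTCP(80, Site(resource))
reactor.run()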

Using Docker multi-stage builds has allowed us to create a Docker container for production with:

The biggest benefit is that it let us do so with one Dockerfile, with no extra machinery.

20 Jul 2017 4:30am GMT

18 Jul 2017


Glyph Lefkowitz: Beyond ThunderDock

This weekend I found myself pleased to receive a Kensington SD5000T Thunderbolt 3 Docking Station.

Some of its functionality was a bit of a weird surprise.

The Setup

Due to my ... accretive history with computer purchases, I have 3 things on my desk at home: a USB-C macbook pro, a 27" Thunderbolt iMac, and an older 27" Dell display, which is old enough at this point that I can't link it to you. Please do not take this to be some kind of totally sweet setup. It would just be somewhat pointlessly expensive to replace this jumble with something nicer. I purchased the dock because I want to have one cable to connect me to power & both displays.

For those not familiar, iMacs of a certain vintage [1] can be jury-rigged to behave as Thunderbolt displays with limited functionality (no access from the guest system to the iMac's ethernet port, for example), using Target Display Mode, which extends their useful lifespan somewhat. (This machine is still, relatively speaking, a powerhouse, so it's not quite dead yet; but it's nice to be able to swap in my laptop and use the big screen.)

The Link-up

On the back of the Thunderbolt dock, there are 2 Thunderbolt 3 ports. I plugged the first one into a Thunderbolt 3 to Thunderbolt 2 adapter which connects to the back of the iMac, and the second one into the Macbook directly. The Dell display plugs into the DisplayPort; I connected my network to the Ethernet port of the dock. My mouse, keyboard, and iPhone were plugged into the USB ports on the dock.

The Problem

I set it up and at first it seemed to be delivering on the "one cable" promise of thunderbolt 3. But then I switched WiFi off to test the speed of the wired network and was surprised to see that it didn't see the dock's ethernet port at all. Flipping wifi back on, I looked over at my router's control panel and noticed that a new device (with the expected manufacturer) was on my network. nmap seemed to indicate that it was... running exactly the network services I expected to see on my iMac. VNCing into the iMac to see what was going on, I popped open the Network system preference pane, and right there alongside all the other devices, was the thunderbolt dock's ethernet device.

The Punch Line

Despite the miasma of confusion surrounding USB-C and Thunderbolt 3 [2], the surprise here is that apparently Thunderbolt is Thunderbolt, and (for this device at least) Thunderbolt devices connected across the same bus can happily drive whatever they're plugged in to. The Thunderbolt 2 to 3 adapter isn't just a fancy way of plugging in hard drives and displays with the older connector; as far as I can tell all the functionality of the Thunderbolt interface remains intact as both "host" and "guest". It's like having an ethernet switch for your PCI bus.

What this meant is that when I unplugged everything and then carefully plugged in the iMac before the Macbook, it happily lit up the Dell display, and connected to all the USB devices plugged into the USB hub. When I plugged the laptop in, it happily started charging, but since it didn't "own" the other devices, nothing else connected to it.

Conclusion

This dock works a little bit too well; when I "dock" now I have to carefully plug in the laptop first, give it a moment to grab all the devices so that it "owns" them, then plug in the iMac, then use this handy app to tell the iMac to enter Target Display mode.

On the other hand, this does also mean that I can quickly toggle between "everything is plugged in to the iMac" and "everything is plugged in to the MacBook" just by disconnecting and reconnecting a single cable, which is pretty neat.


  1. Sadly, not the most recent fancy 5K ones.

  2. which are, simultaneously, both the same thing and not the same thing.

18 Jul 2017 7:11am GMT

Moshe Zadka: Bash is Unmaintainable Python

(Thanks to Aahz, Roy Williams, Yarko Tymciurak, and Naomi Ceder for feedback. Any mistakes that remain are mine alone.)

In the post about building Docker applications, I had the following Python script:

import datetime, subprocess
tag = datetime.datetime.utcnow().isoformat()
tag = tag.replace(':', '-').replace('.', '-')
for ext in ['', '-slim']:
    image = "moshez/python36{}:{}".format(ext, tag)
    orig = "python:3.6{}".format(ext)
    subprocess.check_call(["docker", "pull", orig])
    subprocess.check_call(["docker", "tag", orig, image])
    subprocess.check_call(["docker", "push", image])

I showed this script to two audiences, in two versions of the talk. One, a Python beginner audience, mostly new to Docker. Another, a Docker-centric audience, with varying levels of familiarity with Python. I gave excuses for why this script is in Python, rather than the obvious choice of shell scripting for automating command-line utilities.

None of the excuses were the true reason.

Note that in a talk, things are simplified. Typical scripts in the real world would not be 10 lines or so. They start out 10 lines, of course, but then have to account for edge cases, extra use cases, random bugs in the services that need to be worked around, and so on. I am more used to writing scripts for production than writing scripts for talks.

The true reason the script is in Python is that I have started doing all my "shell" scripting in Python recently, and I am never going back. Unix shell scripting is pretty much writing in unmaintainable Python. Before making the case for that, I am going to take a step in the other direction. The script above took care to only use the standard library. If it could take advantage of third party libraries, I would have written it this way:

import datetime, subprocess
import seashore
xctr = seashore.Executor(seashore.Shell())
tag = datetime.datetime.utcnow().isoformat()
tag = tag.replace(':', '-').replace('.', '-')
for ext in ['', '-slim']:
    image = "moshez/python36{}:{}".format(ext, tag)
    orig = "python:3.6{}".format(ext)
    xctr.docker.pull(orig)
    xctr.docker.tag(orig, image)
    xctr.docker.push(image)

But what if I went the other way?

import datetime, subprocess
tag = datetime.datetime.utcnow().isoformat()
tag = tag.replace(':', '-').replace('.', '-')
for ext in ['', '-slim']:
    image = "moshez/python36{}:{}".format(ext, tag)
    orig = "python:3.6{}".format(ext)
    subprocess.check_call("docker pull " + orig, shell=True)
    subprocess.check_call("docker tag " + orig + " " + image, shell=True)
    subprocess.check_call("docker push " + image, shell=True)

Note that using shell=True is discouraged, and is generally a bad idea. We will revisit why later. If I were using Python 3.6, I could even have the last three lines be:

subprocess.check_call(f"docker pull {orig}", shell=True)
subprocess.check_call(f"docker tag {orig} {image}", shell=True)
subprocess.check_call(f"docker push {image}", shell=True)

or I could even combine them:

subprocess.check_call(f"docker pull {orig} && "
                      f"docker tag {orig} {image} && "
                      f"docker push {image}", shell=True)

What about calculating the tag?

tag = subprocess.check_output("date --utc --rfc-3339=ns | "
                              "sed -e 's/ /T/' -e 's/:/-/g' "
                                  "-e 's/\./-/g' -e 's/\+.*//'",
                              shell=True)

Putting it all together, we would have

import subprocess
tag = subprocess.check_output("date --utc --rfc-3339=ns | "
                              "sed -e 's/ /T/' -e 's/:/-/g' "
                                  "-e 's/\./-/g' -e 's/\+.*//'",
                              shell=True)
for ext in ['', '-slim']:
    image = f"moshez/python36{ext}:{tag}"
    orig = f"python:3.6{ext}"
    subprocess.check_call(f"docker pull {orig} && "
                          f"docker tag {orig} {image} && "
                          f"docker push {image}", shell=True)

None of the changes we made were strictly improvements. They mostly made the code harder to read and more fragile. But now that we have done them, it is straightforward to convert it to a shell script:

#!/bin/sh
set -e
tag=$(date --utc --rfc-3339=ns |
      sed -e 's/ /T/' -e 's/:/-/g' \
          -e 's/\./-/g' -e 's/\+.*//')
for ext in '' '-slim'
do
    image="moshez/python36$ext:$tag"
    orig="python:3.6$ext"
    docker pull $orig
    docker tag $orig $image
    docker push $image
done

Making our script worse and worse turns a Python script into a shell script. Not just a shell script -- this is arguably idiomatic shell. It uses set -e, long options for legibility, and so on. Note that the shell does not even have a way to express a notion like shell=False. In a script without arguments, like this one, this merely means changes are dangerous. In a script with arguments, it means that handling input safely is difficult (and unlikely to happen). Indeed, this is why shell=False is the default, and recommended, approach in Python.
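
To make the shell=False point concrete, here is a small sketch (the image name is deliberately hostile, and purely illustrative) of how list arguments stay safe while shell=True turns the same input into an injection:

import subprocess

image = "moshez/python36:tag; rm -rf /"  # malformed or hostile input

# Safe: the whole string is passed to docker as a single argument;
# docker simply rejects the bogus image name (and check_call raises).
subprocess.check_call(["docker", "pull", image])

# Dangerous: the shell parses the string, and everything after ";"
# runs as a second command.
subprocess.check_call("docker pull " + image, shell=True)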

This script does little but automate unix commands -- the primary use-case of shell scripts. It stands to reason that the reverse process -- turning a shell script into Python -- would have the reverse effect: more maintainable, less fragile code.

As an exercise in "going the other way", we will start with a simplified version of a shell script:

set -e

if [ $# != 3 ]; then
    echo "Invalid arguments: $*";
    exit 1;
fi;

PR_NUMBER="$1"; shift;
TICKET_NUMBER="$1"; shift;
BRANCH_NAME="$1"; shift;


repo="git@github.com:twisted/twisted.git";
wc="$(dirname "$(dirname "$0")")/.git";

if [ ! -d "${wc}" ]; then
  wc="$(mktemp -d -t twisted.XXXX)";

  git clone --depth 1 --progress "${repo}" "${wc}";

  cloned=true;
else
  cloned=false;
fi;

cd "${wc}";

git fetch origin "refs/pull/${PR_NUMBER}/head";
git push origin "FETCH_HEAD:refs/heads/${TICKET_NUMBER}-${BRANCH_NAME}";

if ${cloned}; then
  rm -fr "${wc}";
fi;

What would it look like with Python and seashore?

import os
import shutil
import sys
import tempfile

import seashore

if len(sys.argv) != 4:
    sys.exit("Invalid arguments: " + ' '.join(sys.argv))

PR_NUMBER, TICKET_NUMBER, BRANCH_NAME = sys.argv[1:]

xctr = seashore.Executor(seashore.Shell())
repo = "git@github.com:twisted/twisted.git"
wc = os.path.dirname(os.path.dirname(sys.argv[0])) + '/.git'
if not os.path.isdir(wc):
    wc = tempfile.mkdtemp(prefix='twisted')
    xctr.git.clone(repo, wc, depth=1, progress=None)
    cloned = True
else:
    cloned = False

xctr = xctr.chdir(wc)
xctr.git.fetch("origin", f"refs/pull/{PR_NUMBER}/head")
xctr.git.push("origin",
              f"FETCH_HEAD:refs/heads/{TICKET_NUMBER}-{BRANCH_NAME}")
if cloned:
    shutil.rmtree(wc)

The code is no longer than the shell version, it is more explicit, and -- had we wanted to -- it would be easier to refactor into unit-testable functions.

If this is, indeed, the general case, we can skip that stage entirely: write the script in Python to begin with. When it inevitably increases in scope, it will already be in a language that supports modules and unit tests.

18 Jul 2017 5:20am GMT

16 Jul 2017


Itamar Turner-Trauring: Beyond fad frameworks: which programming skills are in demand?

Which programming skills should you spend your limited time and energy on? Which engineering skills are really in demand? There will always be another fad framework that will soon fade from memory; the time you spend learning it might turn out to be wasted. And job listings ask for ever-changing, not very reasonable combinations of skills: "We want 5 years experience with AngularJS, a deep knowledge of machine learning, and a passion for flyfishing!"

Which skills are really in demand, which will continue to be in demand, and which can safely be ignored? The truth is that the skills employers want are not the skills they actually need: the gap between the two can be a problem, but if you present yourself right it can also be an opportunity.

What employers want

What employers typically want is someone who will get going quickly, with as short a ramp-up time as possible and as little training as possible. While perhaps short-sighted, this certainly seems to be the default mindset. There are two failure modes:

  1. Over-focusing on implementation skills, rather than problem solving skills: "We use AngularJS, therefore we must hire someone who already knows AngularJS!" If it turns out AngularJS doesn't work when the requirements change, hiring only for AngularJS skills will prove problematic.
  2. Hiring based on a hypothetical solution: "I hear that well-known company succeeded using microservices, so we should switch to microservices. We must hire someone who already knows microservices!" If that turns out to be the wrong solution, hiring someone to implement it will not turn out well.

What employers need

What employers actually need is someone who will identify and solve their problems. An organization's goal is never really to use AngularJS or switch to microservices: it's to sell a product or service, help some group of people, promote some point of view, and so on. Employers need employees who will help further these goals.

That doesn't necessarily require knowing the employer's existing technology stack, or having a working knowledge of trendy technologies. Someone who can quickly learn the existing codebase and technologies, identify the big-picture problems, and then come up with and implement a good solution: that is what employers really need.

This can build on a broad range of skills, including:

What you should do

Given this gap between what employers want and what they need, what should you do?

  1. Learn the problem solving skills that employers will always need. That means gathering requirements, coming up with efficient solutions, project management, and so on.
  2. Learn some long-lasting popular technologies in-depth. Relational databases have been around since the 1980s and aren't going anywhere: if you really understand how to structure data, the concurrency model, the impact of disk storage and layout, and so on, learning some other database like MongoDB will be easy (although perhaps a little horrifying). Similarly, decades after their creation, languages like Python or Java are still going strong, and if you know one well you'll have an easy time learning many other languages.
  3. Dabble, minimally, in some trendy technologies. If you occasionally spend 30 minutes going through the tutorial for the web framework of the month, when it's time to interview you can say "I played with it a little." This will also help you with the next item.
  4. Learn how to learn new technologies quickly.

Then, when it's time to look for a job, ignore the list of technology requirements when applying, presuming you think you can do the job: it's what the company wants, not what they need.

Interviewing is about marketing, about how you present yourself. So in your cover letter, and when you interview, emphasize all the ways you can address what they want in other ways, and try to point out ways in which you can help do what they actually need:

Learning every new web framework isn't necessary to get a job. Yes, technologies do change over the years: that means you need to be able to learn new technologies quickly. But just as important as technology skills are those that will make you valuable -- the ability to identify and solve problems -- and the skill that will make your value clear: the ability to market yourself.

16 Jul 2017 4:00am GMT

10 Jul 2017


Itamar Turner-Trauring: Stop writing software, start solving problems

As software engineers we often suffer from an occupational hazard: we enjoy programming. Somewhere in college or high school we discovered that writing code is fun, and so we chose a profession that allowed us to both get paid and enjoy ourselves. And what could be wrong with that?

The problem is that our job as software engineers is not to write code: our job is to solve problems. And if we get distracted by the fun of programming we often do worse at solving those problems.

The siren call of programming

I've been coding since 1995, and I have to admit, I enjoy programming. My working life is therefore a constant struggle against the temptation to do so.

Recently, for example, I encountered a problem in the Softcover book publishing system, which converts Markdown into various e-book formats. I've been working on The Programmer's Guide to a Sane Workweek (intro email course here), and I've reached the point of needing to render the text into a nicely laid-out PDF.

Softcover renders Markdown blockquotes like this:

> This is my story.

into LaTex \quote{} environments like this:

\begin{quote}
This is my story.
\end{quote}

I wanted the output to be a custom LaTeX environment, so I could customize the PDF output to look a particular way:

\begin{mycustomquote}
This is my story.
\end{mycustomquote}

This is the point where programming began calling out to me: "Write code! Contribute to the open source community! Submit a patch upstream!" I would need to learn the Softcover code base just enough to find the relevant transformation, learn just enough more Ruby to modify the code, figure out how to make the output customizable, write a test or three, and then submit a patch. This probably would have taken me an afternoon. It would have been fun, and I would have felt good about myself.

But my goal is not to write software: my goal is to solve problems, and the problem in this case is spitting out the correct LaTeX so I can control my book's formatting. And so instead of spending an afternoon on it, I spent five minutes writing the following Makefile:

build-pdf:
        rm -rf generated_polytex/*.tex
        softcover build:pdf
        sed 's/{quote}/{mycustomquote}/g' -i generated_polytex/*.tex
        softcover build:pdf

This is a horrible hack: I'm relying on the fact that building a PDF generates TeX files if they don't already exist, but uses existing ones if they are there and newer than the source. So I build the PDF, modify the generated TeX files in place, and then rebuild the PDF with the modified files.

I would never do anything like this if I was building a production system used by customers. But this isn't a production system, and there are no customers: it's a script only I will ever run, and I run it manually. It's not elegant, but then it doesn't have to be.

I solved my problem, and I solved it efficiently.

Stop writing code, start solving problems

Don't write code just because it's fun -- instead, solve the problem the right way:

You can write software for fun, of course: programming makes a fine hobby. But when you're working, when you're trying to get a product shipped, when you're trying to get a bug fixed: be a professional, and focus on solving the problem.

PS: Coding for fun when I should've been solving problems is just one of the many programming mistakes I've made over the years. Sign up for my Software Clown newsletter and every week you'll hear the story of one of my engineering or career mistakes and how you can avoid it.

10 Jul 2017 4:00am GMT

07 Jul 2017


Itamar Turner-Trauring: Don't crank out code at 2AM, especially if you're the CTO

Dear HubSpot CTO,

Yesterday over on the social medias you wrote that there's "nothing quite as satisfying as cranking out code at 2am for a feature a customer requested earlier today." I'm guessing that as CTO you don't get to code as much these days, and I don't wish to diminish your personal satisfaction. But nonetheless cranking out code at 2AM is a bad idea: it's bad for your customers, it sets a bad example for your employees, and as a result it's bad for your company.

An invitation to disaster

Tired people make mistakes. This is not controversial: lack of sleep has been tied to everything from medical errors to the Exxon Valdez and Challenger disasters (see Evan Robinson on the evils of crunch mode for references).

If you're coding and deploying at 2AM:

And that's just the short term cost. When you do start work the next day you'll also be tired, and correspondingly less productive and more likely to make mistakes.

None of this is good for your customers.

Encouraging a culture of low productivity

If you're a random developer cranking out code at 2AM the worst you can do is harm your product or production environment. If you're the CTO, however, you're also harming your organization.

By touting a 2AM deploy you're encouraging your workers to work long hours, and to write and deploy code while exhausted. Which is to say, you're encouraging your workers to engage in behavior that's bad for the company. Tired workers are less productive. Tired workers make more mistakes. Is that really what you want from your developers?

Don't crank out code at 2AM: it's bad for you and your customers. And if you must, don't brag about it publicly. At best it's a guilty pleasure; bragging makes it a public vice.

Regards,

-Itamar

PS: While I've never cranked out code at 2AM, I've certainly made my own share of mistakes as a programmer. If you'd like to learn from my failings sign up for my newsletter where each week I cover one of my mistakes and how you can avoid it.

07 Jul 2017 4:00am GMT

27 Jun 2017


Itamar Turner-Trauring: It may not be your fault, but it's always your responsibility

If you're going to be a humble programmer, you need to start with the assumption that every reported bug is your fault. This is a good principle, but what if it turns out the user did something stupid? What if it really is a third-party library that is buggy, not your code?

Even if a bug isn't your fault it's still your responsibility, and in this post I'll explain how to apply that in practice.

First, discover the source of the problem

A user has reported a bug: they're using some code you wrote and something has gone horribly wrong. What can the source of the problem be?

A good starting assumption is that you are at fault, that your code is buggy. It's hard, I know: I often find myself assuming other people's code is the problem, only to find it was my own mistake. But precisely because it's so hard to blame oneself it's better to start with that as the presumed cause, to help overcome the bias against admitting a mistake.

If something is a bug in your code then you can go and fix it. But sometimes users will have problems that aren't caused by a bug in your code: sometimes users do silly things, or a library you depend on has a bug. What then?

Then, take responsibility

Even if the fault was elsewhere, you are still responsible, and you can take appropriate action.

User error

If the user made a mistake, or had a misunderstanding, that implies your design is at fault. Maybe your API encourages bad interaction patterns, maybe your error handling isn't informative enough, maybe your user interface doesn't ask users to confirm that yes, they want to delete all their data. Whatever the problem, user mistakes are something you can try to fix with a better design:

If a better design is impossible, the next best thing to do is write some documentation, and explain why the users shouldn't do that, or document a workaround. The worst thing to do is to dismiss user error as the user's problem: if one person made a mistake, probably others will as well.

Environmental problems

If your code doesn't work in a particular environment, well, that's your responsibility too:

If all else fails, write some documentation.

Third party bugs

Found a bug in someone else's code?

And again, if all else fails, write some documentation explaining a workaround.

It's always your responsibility

Releasing code into the world is a responsibility: you are telling people they can rely on you. When a user reports a problem, there's almost always something you can do. So take your responsibility seriously and fix the problem, regardless of whose fault it is.

Best of all is avoiding problems in the first place: I've made many mistakes you can avoid by signing up for my weekly newsletter. Every week I'll share an engineering or career mistake and how you can avoid it.

27 Jun 2017 4:00am GMT

26 Jun 2017


Moshe Zadka: Imports at a Distance

(Thanks to Mark Williams for feedback and research)

Imagine the following code:

## mymodule.py
import toplevel.nextlevel.lowmodule

def _func():
    toplevel.nextlevel.lowmodule.dosomething(1)

def main():
    _func()

Assuming toplevel.nextlevel.lowmodule does define a function dosomething, this code seems to work just fine.

However, imagine that later we decide to move _func to a different module:

# utilmodule.py
import toplevel

def _func():
    toplevel.nextlevel.lowmodule.dosomething(1)

This code will probably still work, as long as at some point, before calling _func, we import mymodule.

This introduces a subtle action-at-a-distance: the code will only stop working when we remove the import from mymodule -- or from any other module which imports lowmodule.

Even unit tests will not necessarily catch the problem, depending on the order of imports of the unit tests. Static analyzers, like pylint and pyflakes, also cannot catch it.

The only safe thing to do is to eschew this import style completely, and always do from toplevel.nextlevel import lowmodule.
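
Concretely, the moved helper stays correct no matter what other modules import when it is written like this:

# utilmodule.py
from toplevel.nextlevel import lowmodule

def _func():
    lowmodule.dosomething(1)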

Addendum

Why is this happening?

Python package imports are a little subtle.

import toplevel

Does three things:

  • (Once only) Creates a toplevel entry in sys.modules
  • (Once only) Executes toplevel/__init__.py inside the namespace of toplevel
  • Creates a variable called toplevel and assigns it the module.

The things marked "once only" will only happen the first time toplevel is imported.

import toplevel.nextlevel

Does the same three things (with toplevel) as well as:

  • (Once only) Creates a toplevel.nextlevel entry in sys.modules
  • (Once only) Executes toplevel/nextlevel/__init__.py inside the namespace of toplevel.nextlevel
  • (Once only) Creates a variable nextlevel in the namespace of toplevel, and binds the toplevel.nextlevel module to it.

The third one is the most interesting one -- note that the first time toplevel.nextlevel is imported, a nextlevel variable is created in the namespace of toplevel, so that every subsequent place that imports toplevel can access nextlevel for "free".
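
A small sketch of that effect, assuming a package layout toplevel/nextlevel/ exists on the path:

import sys

import toplevel.nextlevel   # registers both modules in sys.modules and binds
                            # "nextlevel" as an attribute of toplevel

import toplevel             # only rebinds the local name; everything else
                            # already happened above

print("toplevel.nextlevel" in sys.modules)   # True
print(toplevel.nextlevel)                    # accessible "for free"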

26 Jun 2017 5:20am GMT

25 Jun 2017


Moshe Zadka: X Why Zip

PEP 441 resulted in the creation of the zipapp module. The PEP says "Python has had the ability to execute directories or ZIP-format archives as scripts since version 2.6 [...] This feature is not as popular as it should be mainly because it was not promoted as part of Python 2.6." So far, so true -- the first time I saw the feature used in production, in Facebook, I was so shocked I had to take a Sweet Stop break.
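
For context, the zipapp module itself is a thin wrapper around that capability. A minimal sketch of its API -- the directory name and entry point here are made up:

import zipapp

# Bundle the ./myapp directory (the code plus any vendored dependencies --
# zipapp does not resolve dependencies for you) into a runnable archive.
zipapp.create_archive(
    "myapp",
    target="myapp.pyz",
    interpreter="/usr/bin/env python3",
    main="myapp.cli:main",
)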

However, more than a year before the PEP was created, and even longer before it was implemented, the PEX format was contributed to the Python community by Twitter. It was, indeed, not well promoted: the lightning talk by Brian Wickman (creator of PEX) wouldn't be given for two more years.

However, at this point in time, PEX is a superior solution to zipapp in every single way:

The only advantage zipapp has? It is in the standard library. This used to be a big advantage. However, Python packaging is good now, and the biggest difference is that a module in the standard library can change, even in backwards-compatible ways, only extremely slowly, and can evolve away from bad interfaces even slower. A module on PyPI can get regular updates and regular releases and, most importantly, if it is bad it can be supplanted by a new module, with users naturally moving to the new solution.

ZipApp is a non-solution for a non-problem. The solution for the problem of this feature not being well known is to talk about it more. I, and other people, have given multiple talks that involved the awesomeness of PEX (in SF Python, SF SRE, PyBay) and have written multiple posts and proofs of concept on my GitHub.

I have used PEX in production in three different companies, teaching my colleagues about it as a side-effect.

I wish more people would be giving talks, and writing posts. Reimplementing, in the standard library, a popular tool that can iterate faster because it is not bound to the Python release cycle does not help anyone.

25 Jun 2017 5:20am GMT

23 Jun 2017


Hynek Schlawack: Sharing Your Labor of Love: PyPI Quick and Dirty

A completely incomplete guide to packaging a Python module and sharing it with the world on PyPI.

23 Jun 2017 12:00am GMT

21 Jun 2017


Itamar Turner-Trauring: The bad reasons you're forced to work long hours

Working long hours is unproductive, unhealthy, and unfortunately common. I strongly believe that working less hours is good for you and your employer, yet many companies and managers force you to work long hours, even as it decreases worker productivity.

So why do they do it? Let's go over some of the reasons.

Leading by example

Some managers simply don't understand that working long hours is counter-productive. Consider the founders of a startup. They love their job: the startup is their baby, and they are happy to work long hours to ensure it succeeds. That may well be inefficient and counter-productive, but they won't necessarily realize this.

The employees that join afterwards take their cue from the founders: if the boss is working long hours, it's hard not to do so yourself. And since the founders love what they're building it never occurs to them that long hours might not be for everyone, or even might be an outright negative for the company. Similar situations can also happen in larger organizations, when a team lead or manager puts in long hours out of a sense of dedication.

A sense of entitlement

A less tractable problem is a manager who thinks they own your life. Jason Fried describes this as a Managerial Entitlement Complex: the idea that if someone is paying you a salary they are entitled to every minute of your time.

In this situation the problem isn't ignorance on the part of your manager. The problem is that your manager doesn't care about you as a human being or even as an employee. You're a resource provided by the human resources department, just like the office printer is provided by the IT department.

Control beats profits

Another problem is the fact that working hours are easy to measure, and therefore easy to control. When managers or companies see their employees as a cost center (and at least in the US the corporate culture is heavily biased against labor costs) the temptation to "control costs" by measuring and maximizing hours can be hard to resist.

Of course, this results in less output, and so it is not rational behavior if the goal is to maximize profits. Would companies actually choose labor control over productivity? Evidence from other industries suggests they would.

Up until the 1970s many farms in California forced their workers to use a short hoe, which involved bending over continuously. The result was a high rate of worker injuries. Employers liked the short hoe because they could easily control farm workers' labor: because of the way the workers bent over when using the short hoe it was easy to see whether or not they were working.

After a series of strikes and lawsuits by the United Farm Workers the short hoe was banned. The result? According to the CEO of a large lettuce grower, productivity actually went up.

(I learned this story from the book Solving the Climate Crisis through Social Change, by Gar W. Lipow. The book includes a number of other examples and further references.)

Bad incentives, or Cover Your Ass

Bad incentives in one part of the company can result in long hours in another. Consider this scenario: the sales team, which is paid on commission, has promised a customer to deliver a series of features in a month. Unfortunately implementing those features will take 6 months. The sales team doesn't care: they're only paid for sales, and delivering the product isn't their problem.

Now put yourself in the place of the tech lead or manager whose team has to implement those features. You can try to push back against the sales team's promises, but in many companies that will result in being seen as "not a team player." And when the project fails you and your team will be blamed by sales for not delivering on the company's promises.

When you've been set up to fail, your primary goal is to demonstrate that the inevitable failure was not your fault. The obvious and perhaps only way for you to do this is to have your team work long hours, a visible demonstration of commitment and effort. "We did everything we could! We worked 12 hour days, 6 days a week but we just couldn't do it."

Notice that in this scenario the manager may be good at their job; the issue is the organization as a whole.

Hero syndrome

Hero syndrome is another organizational failure that can cause long working hours. Imagine you're an engineer working for a startup that's going through a growth spurt. Servers keep going down under load, the architecture isn't keeping up, and there are lots of other growing pains. One evening the whole system goes down, and you stay up until 4AM bringing it back up. At the next company event you are lauded as a hero for saving the day… but no one devotes any resources to fixing the underlying problems.

The result is hero syndrome: the organization rewards those who save the day at the last minute, rather than work that prevents problems in the first place. And so they end up with a cycle of failure. Tired engineers making mistakes, lack of resources to build good infrastructure, and rewards for engineers who work long hours to try to duct tape a structure that is falling apart.

Avoiding bad companies

Working long hours is not productive. But since many companies don't understand this, when you're looking for a new job be on the lookout for the problems I described above. And if you'd like more tips to help you work a sane, productive workweek, check out my email course, the Programmer's Guide to a Sane Workweek.

21 Jun 2017 4:00am GMT

19 Jun 2017


Hynek Schlawack: Why Your Dockerized Application Isn’t Receiving Signals

Proper cleanup when terminating your application isn't less important when it's running inside of a Docker container. Although it only comes down to making sure signals reach your application and handling them, there's a bunch of things that can go wrong.

19 Jun 2017 12:00am GMT

14 Jun 2017


Itamar Turner-Trauring: Lawyers, bad jokes and typos: how not to name your software

When you're writing software you'll find yourself naming every beast of the field, and every fowl of the air: projects, classes, functions, and variables. There are many ways to fail at naming projects, and when you do, the costs of a bad name can haunt you for years.

To help you avoid these problems, let me share some of the bad naming schemes I have been responsible for, observed, or had inflicted on me. You can do better.

Five naming schemes to avoid

They're gonna try to take it

Rule #1: don't give your software the same name as a heavy metal band, or more broadly anyone who can afford to have a lawyer on retainer.

Long ago, the open source Twisted project had a sub-package for managing other subprocesses. Twisted's main package is called twisted, and the author decided to call this package twisted.sister. This was a mistake.

One day my company, which advertised Twisted consulting services, received a cease and desist letter from the lawyers of the band Twisted Sister. They indicated that Twisted's use of the name Twisted Sister was a violation of the band's intellectual property, demanded we stop immediately, after which they wanted to discuss damages. Since my company didn't actually own Twisted this was a little confusing, but I passed this on to the project.

The project wrote the lawyers explaining that Twisted was run by hobbyists, just so it was clear there was no money to be had. Twisted also changed the package name from twisted.sister to twisted.sibling: none of us believed the lawyers' claim had any validity, but no one wanted to deal with the hassle of fighting them.

A subject of ridicule

Rule #2: don't pick a name that will allow people to make bad jokes about you.

Continuing with the travails of the Twisted project, naming the project "Twisted" was a mistake. Python developers have, until recent years, not been very comfortable with asynchronous programming, and Twisted is an async framework. Unfortunately, having a name with negative connotations meant this discomfort was verbalized in a way that associated it with the project.

"Twisted" led people to say things like "Twisted is so twisted," over and over and over again. Other async libraries for Python, like asyncore or Tornado, had neutral names and didn't suffer from this problem.

Bad metaphors

Rule #3: if you're going to use an extended metaphor, pick one that makes sense.

Continuing to pick on Twisted yet again (sorry!), one of Twisted's packages is a remote method invocation library, similar to Java RMI. The package is called twisted.spread, the wire format is twisted.spread.banana, the serialization layer is twisted.spread.jelly, and the protocol itself is twisted.spread.pb.

This naming scheme, based on peanut butter and jelly sandwiches, has a number of problems. To begin with, PB&J is very American, and software is international. As a child and an American emigrant living in a different country, I was ridiculed by my friends for the peanut butter and banana sandwiches my mother made.

Minor personal traumas aside, this naming scheme has no relation to what the software actually does. Silliness is a fine thing, but names should also be informative. The Homebrew project almost falls into this trap, with formulas and taps and casks and whatnot. But while the metaphor is a little unstable on its feet, it's not quite drunk enough to completely fall over.

Typos

Rule #4: avoid names with common typos.

One of my software projects is named Crochet. Other people -- and I make this typo too, to be fair -- will mistakenly write "crotchet" instead, which the dictionary describes as "a perverse fancy; a whim which takes possession of the mind; a conceit."

Bonus advice: you may wish to avoid whims or conceits when naming your software projects.

I can't even

Rule #5: avoid facepalms.

I used to work for a company named ClusterHQ, and our initial product was named Flocker. When the company shut down the CEO wrote a blog post titled ClusterF**ed.

Why you shouldn't listen to my advice

Twisted has many users, from individuals to massive corporations. My own project, Crochet, has a decent number. ClusterHQ shut down for reasons that had nothing to do with its name. So it's not clear any of this makes a difference.

You should certainly avoid names that confuse your users, and you'll be happier if you can avoid lawsuits. But if you're going to be writing software all day, you should enjoy yourself while you do. If your programming language supports Unicode symbols, why not use emoji in your project name?

🙊🙉🙈 has a nice sound to it.

By the way, if you'd like to learn how to avoid my many mistakes, subscribe to my weekly newsletter. Every week I share one of my programming or career mistakes and how you can avoid it.

14 Jun 2017 4:00am GMT

12 Jun 2017


Hynek Schlawack: Hardening Your Web Server’s SSL Ciphers

There are many wordy articles on configuring your web server's TLS ciphers. This is not one of them. Instead I will share a configuration which is both compatible enough for today's needs and scores a straight "A" on Qualys's SSL Server Test.

12 Jun 2017 10:00am GMT