22 Oct 2016

Planet Twisted

Glyph Lefkowitz: docker run glyph/rproxy

Want to TLS-protect your co-located stack of vanity websites with Twisted and Let's Encrypt using HawkOwl's rproxy, but can't tolerate the bone-grinding tedium of a pip install? I've built a docker image for you, so now it's as simple as:

$ mkdir -p conf/certificates;
$ cat > conf/rproxy.ini << EOF;
> [rproxy]
> certificates=certificates
> http_ports=80
> https_ports=443
> [hosts]
> mysite.com_host=<other container host>
> mysite.com_port=8080
> EOF
$ docker run --restart=always -v "$(pwd)"/conf:/conf \
    -p 80:80 -p 443:443 \
    glyph/rproxy;

There are no docs to speak of, so if you're interested in the details, see the tree on GitHub that I built it from.

Modulo some handwaving about docker networking to get that <other container host> IP, that's pretty much it. Go forth and do likewise!
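If you have more than one vanity site, the [hosts] section presumably just repeats the name_host / name_port pair for each one. Here is a minimal Python sketch that writes such a file with the standard library's configparser, assuming the key layout in the example above is all rproxy needs. The build_rproxy_ini helper and the backend addresses are hypothetical, not part of rproxy.

```python
import configparser
import io


def build_rproxy_ini(sites, certificates="certificates",
                     http_ports="80", https_ports="443"):
    """Render an rproxy.ini in the layout of the example config.

    `sites` maps a public hostname to a (backend_host, backend_port) pair.
    The <hostname>_host / <hostname>_port key layout is copied from the
    example above; this helper itself is hypothetical.
    """
    config = configparser.ConfigParser()
    config["rproxy"] = {
        "certificates": certificates,
        "http_ports": http_ports,
        "https_ports": https_ports,
    }
    hosts = {}
    for hostname, (backend_host, backend_port) in sites.items():
        hosts[f"{hostname}_host"] = backend_host
        hosts[f"{hostname}_port"] = str(backend_port)
    config["hosts"] = hosts

    buf = io.StringIO()
    # space_around_delimiters=False writes "key=value", matching the example.
    config.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical container addresses; substitute your own.
    print(build_rproxy_ini({
        "mysite.com": ("172.17.0.2", 8080),
        "myblog.net": ("172.17.0.3", 8080),
    }))
```

Writing the output to conf/rproxy.ini takes the place of the heredoc; the docker run command stays the same.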

22 Oct 2016 8:12pm GMT

19 Oct 2016

Itamar Turner-Trauring: Why Pylint is both useful and unusable, and how you can actually use it

This is a story about a tool that caught a production-impacting bug the day before we released the code. This is also the story of a tool no one uses, and for good reason. By the time you're done reading you'll see why this tool is useful, why it's unusable, and how you can actually use it with your Python project.

(Not a Python programmer? The same problems and solutions are likely to apply to tools in your ecosystem as well.)

Pylint saves the day

If you're coding in Haskell the compiler's got your back. If you're coding in Java the compiler will usually lend a helping hand. But if you're coding in a dynamic language like Python or Ruby you're on your own: you don't have a compiler to catch bugs for you.

The next best thing is a lint tool that uses heuristics to catch bugs in your code. One such tool is Pylint, and here's how I started using it.

One day at work we realized our builds had been consistently failing for a few days, and it wasn't the usual intermittent failures. After a few days of investigating, my colleague Tom Prince discovered the problem. It was Python code that looked something like this:

for volume in get_volumes():
    do_something(volume)

for volme in get_other_volumes():
    do_something_else(volume)

Notice the typo in the second for loop. Combined with the fact that Python leaks variables from blocks, the last value of volume from the first for loop was used for every iteration of the second loop.
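To see the leak concretely, here is a runnable Python sketch of the same pattern. get_volumes and get_other_volumes are hypothetical stand-ins that return fixed lists, and the loop bodies just record which volume they saw. Using volume after its loop has ended is the pattern Pylint's undefined-loop-variable check targets, though the exact message you get may depend on your Pylint version.

```python
def get_volumes():
    # Hypothetical stand-in for the real function in the story.
    return ["vol-a", "vol-b"]


def get_other_volumes():
    # Hypothetical stand-in for the real function in the story.
    return ["vol-x", "vol-y", "vol-z"]


def buggy():
    handled = []
    for volume in get_volumes():
        handled.append(volume)
    for volme in get_other_volumes():  # typo: the loop variable is "volme"...
        handled.append(volume)         # ...so this reuses the first loop's last value
    return handled


def fixed():
    handled = []
    for volume in get_volumes():
        handled.append(volume)
    for volume in get_other_volumes():
        handled.append(volume)
    return handled


if __name__ == "__main__":
    print(buggy())  # ['vol-a', 'vol-b', 'vol-b', 'vol-b', 'vol-b']
    print(fixed())  # ['vol-a', 'vol-b', 'vol-x', 'vol-y', 'vol-z']
```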

To see if we could prevent these problems in the future I tried Pylint, re-introduced the bug... and indeed it caught the problem. I then looked at the rest of the output to see what else it had found.

What it had found was a serious bug. It was in code I had written a few days earlier, and the bug completely broke an important feature we were going to ship to users the very next day. Here's a heavily simplified minimal reproducer for the bug:

list_of_printers = []
for i in [1, 2, 3]:
    def printer():
        print(i)
    list_of_printers.append(printer)

for func in list_of_printers:
    func()

The intended result of this reproducer is to print:

1
2
3

But what will actually get printed with this code is:

3
3
3

When you define a nested function in Python that refers to a variable in the outside scope, it binds not the value of the variable but the variable itself. In this case that means the i inside printer() always ended up with the last value of the variable i from the for loop.
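Two common ways to capture the value at definition time are a default argument and functools.partial. Here is a hedged sketch of both next to the buggy version; the printers return their value instead of printing it so the results are easy to compare. As far as I know, the Pylint check that flags the buggy form is cell-var-from-loop.

```python
import functools


def make_printers_buggy():
    printers = []
    for i in [1, 2, 3]:
        def printer():
            return i  # i is looked up when printer() is called, not when defined
        printers.append(printer)
    return printers


def make_printers_default_arg():
    printers = []
    for i in [1, 2, 3]:
        def printer(i=i):  # default values are evaluated at definition time
            return i
        printers.append(printer)
    return printers


def make_printers_partial():
    def printer(value):
        return value
    # functools.partial stores the current value of i in each callable.
    return [functools.partial(printer, i) for i in [1, 2, 3]]


if __name__ == "__main__":
    print([f() for f in make_printers_buggy()])        # [3, 3, 3]
    print([f() for f in make_printers_default_arg()])  # [1, 2, 3]
    print([f() for f in make_printers_partial()])      # [1, 2, 3]
```

The default-argument trick is shorter, but it adds a parameter that callers could accidentally override; partial keeps the signature clean.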

And luckily Pylint caught that bug before it shipped; pretty great, right?

Why no one uses Pylint

Pylint is useful, but many projects don't use it. For example, I went and checked just now, and neither Twisted nor Django nor Flask nor Sphinx seem to use Pylint. Why wouldn't these large, sophisticated Python projects use a tool that would automatically catch bugs for them?

One problem is that it's slow, but that's not the real problem; you can always just run it on the CI system with the other slow tests. The real problem is the amount of output.

Here's what I mean: I ran pylint on a checkout of Twisted and it produced 28,000 lines of output (at which point pylint crashed, but I'll assume that's fixed in newer releases). Let me say that again: 28,000 errors or warnings.

That's insane.

And to be fair, Twisted has a coding standard that doesn't match the Python mainstream, but I've seen massive amounts of noise with other projects as well. Pylint has a lot of useful errors... but also a whole lot of utterly useless garbage assumptions about how your code should look. And fundamentally it treats them all the same; e.g. there is a distinction between warnings and errors, but in practice both useful and useless checks end up in the warning category.

For example:

W:675, 0: Class has no __init__ method (no-init)

That's not a useful warning. Now imagine a few thousand of those.

How you should use Pylint

So here we have a tool that is potentially useful, but unusable in practice. What to do? Luckily Pylint has some functionality that can help: you can configure it with a whitelist of lint checks.

First, set up Pylint to do nothing:

  1. Make a list of all the features you plausibly want to enable from the Pylint docs and configure .pylintrc to whitelist them.
  2. Comment them all out.

At this point Pylint will do no checks. Next:

  1. Uncomment a small batch of checks, and run pylint.
  2. If the resulting errors are real problems, fix them. If the errors are utter garbage, delete those checks from the configuration.

At this point you have a small number of probably useful checks that are passing: when you run pylint, you'll only be told about new problems. In other words, you have a useful tool.

Repeat this process a few times, or once a week, enabling a new batch of checks each time until you run out of patience or you run out of Pylint checks to enable.
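The whitelist maps onto Pylint's disable=all plus an explicit enable list in the [MESSAGES CONTROL] section of .pylintrc; the same options exist as --disable and --enable on the command line. Below is a hedged Python sketch that renders such a file for the current batch. The helper names and the candidate list are illustrative; the first two message names are, as far as I know, the checks aimed at the two bugs described earlier.

```python
import configparser
import io

# Illustrative candidate list, picked from the Pylint docs.
CANDIDATE_CHECKS = [
    "undefined-loop-variable",  # W0631: loop variable used after the loop
    "cell-var-from-loop",       # W0640: closure captures a loop variable
    "unused-variable",          # W0612
    "dangerous-default-value",  # W0102
]


def render_pylintrc(enabled):
    """Return .pylintrc text that disables everything, then whitelists `enabled`."""
    unknown = set(enabled) - set(CANDIDATE_CHECKS)
    if unknown:
        raise ValueError(f"not in candidate list: {sorted(unknown)}")
    config = configparser.ConfigParser()
    config["MESSAGES CONTROL"] = {
        "disable": "all",
        # With nothing enabled, Pylint should report essentially nothing.
        "enable": ",".join(enabled),
    }
    buf = io.StringIO()
    config.write(buf, space_around_delimiters=False)
    return buf.getvalue()


def pylint_command(enabled, paths):
    """Equivalent command line, handy for trying a batch before committing it."""
    return ["pylint", "--disable=all", "--enable=" + ",".join(enabled), *paths]


if __name__ == "__main__":
    batch = ["undefined-loop-variable", "cell-var-from-loop"]
    print(render_pylintrc(batch))
    print(" ".join(pylint_command(batch, ["src"])))
```

Each round, you move a few names into the batch, fix or discard what they report, and regenerate the file.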

The end result will be something like this configuration or this configuration; both projects are open source under the Apache 2.0 license, so you can use those as a starting point.

Go forth and lint

Here's my challenge to you: if you're a Python programmer, go set up Pylint on a project today. It'll take an hour to get some minimal checks going, and one day it will save you from a production-impacting bug. If you're not a Python programmer you can probably find some equivalent tool for your language; go set that up.

And if you're the author of a lint tool, please, try to come up with better defaults. It's better to catch 60% of bugs and have 10,000 software projects using your tool than to catch 70% of bugs and have almost no one use it.

19 Oct 2016 4:00am GMT

Glyph Lefkowitz: docker run glyph/rproxy

Want to TLS-protect your co-located stack of vanity websites with Twisted and Let's Encrypt using HawkOwl's rproxy, but can't tolerate the bone-grinding tedium of a pip install? I've built a docker image for you, so now it's as simple as:

$ mkdir -p conf/certificates;
$ cat > conf/rproxy.ini << EOF;
> [rproxy]
> certs=certificates
> http_ports=80
> https_ports=443
> [hosts]
> mysite.com_host=<other container host>
> mysite.com_port=8080
> EOF
$ docker run --restart=always -v "$(pwd)"/conf:/conf \
    -p 80:80 -p 443:443 \
    glyph/rproxy;

There are no docs to speak of, so if you're interested in the details, see the tree on GitHub that I built it from.

Modulo some handwaving about docker networking to get that <other container host> IP, that's pretty much it. Go forth and do likewise!

19 Oct 2016 12:32am GMT

18 Oct 2016

Glyph Lefkowitz

As some of you may have guessed from the unintentional recent flurry of activity on my Twitter account, twitterfeed, the service I used to post blog links automatically, is getting end-of-lifed. I've switched to dlvr.it for the time being, unless they send another unsolicited tweetstorm out on my behalf...

Sorry about the noise! In the interests of putting some actual content here, maybe you would be interested to know that I was recently interviewed for PyDev of the Week?

18 Oct 2016 8:37pm GMT

15 Oct 2016

Jonathan Lange: servant-template: production-ready Haskell web services in 5 minutes

If you want to write a web API in Haskell, then you should start by using my new cookiecutter template at https://github.com/jml/servant-template. It'll get you a production-ready web service in 5 minutes or less.

Whenever you start any new web service and you actually care about getting it working and available to users, it's very useful to have logs, monitoring, tests, continuous integration, and a way of shipping code to users.

These are largely boring, but nearly essential. Logs and monitoring give you visibility into the code's behaviour in production, tests and continuous integration help you make sure you don't break it, and, of course, you need some way of actually shipping code to users. As an engineer who cares deeply about running code in production, these are pretty much the bare minimum for me to be able to deploy something to my users.

The cookiecutter template at gh:jml/servant-template creates a simple Haskell web API service that does all of these things.

As the name suggests, all of this enables writing a servant server. Servant lets you declare web APIs at the type level and then use those API specifications to write servers. It's hard to overstate just how useful it is for writing RESTful APIs.

Get started with:

$ cookiecutter gh:jml/servant-template
project_name [awesome-service]: awesome-service
$ cd awesome-service
$ stack test
$ make image
$ docker run awesome-service:latest --help
awesome-service - TODO fill this in

Usage: awesome-service --port PORT [--access-logs ARG] [--log-level ARG]
  One line description of project

Available options:
  -h,--help                Show this help text
  --port PORT              Port to listen on
  --access-logs ARG        How to log HTTP access
  --log-level ARG          Minimum severity for log messages
  --ghc-metrics            Export GHC metrics. Requires running with +RTS.
$ docker run -p 8080:80 awesome-service --port 80
[2016-10-16T20:50:07.983292987000] [Informational] Listening on :80

For this to work, you'll need to have Docker installed on your system. I've tested it on my Mac with Docker Machine, but haven't yet tested it on Linux.

You might have to run stack docker pull before make image, if you haven't already used stack to build things from within Docker.

Once it's up and running, you can browse to http://localhost:8080/ (or http://$(docker-machine ip):8080/ if you're on a Mac), and you'll see a simple HTML page describing the API and giving you a link to the /metrics page, which is where all the Prometheus metrics are exported.

There you have it, a production-ready web service. At least for some values of "production-ready".

Of course, the API it offers is really simple. You can make it your own by editing the API definition and the server implementation. Note that these two are in separate libraries to make it easier to generate client code.

The template comes with a test suite that uses servant-quickcheck to guarantee that none of your endpoints return 500s, take longer than 100ms to serve, and that all the 201s include Location headers.
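servant-quickcheck itself is a Haskell library that, as I understand it, generates requests from the API type and checks each response against a set of predicates. To make the three properties above concrete, here is a rough Python sketch that states them as plain checks over a hypothetical Response record; it is not servant-quickcheck's API, just an illustration of what gets tested.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    """Hypothetical record of one observed HTTP response."""
    status: int
    elapsed_ms: float
    headers: dict = field(default_factory=dict)


def check_response(resp: Response, max_ms: float = 100.0) -> list:
    """Return the properties this response violates; an empty list means it passed."""
    problems = []
    if resp.status == 500:
        problems.append("returned 500")
    if resp.elapsed_ms > max_ms:
        problems.append(f"slower than {max_ms:g}ms")
    # HTTP header names are case-insensitive, so compare lowercased names.
    header_names = {name.lower() for name in resp.headers}
    if resp.status == 201 and "location" not in header_names:
        problems.append("201 without Location")
    return problems


if __name__ == "__main__":
    print(check_response(Response(201, 150.0)))
    print(check_response(Response(201, 12.0, {"Location": "/items/1"})))
```

The value of the real library is in the request generation: it explores endpoints you might not think to test by hand.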

If you're so inclined, you could push the created Docker image to a repository somewhere; it's around 25MB when built. Then people could use it and no one would have to know that it's Haskell; they'd just notice a fast web service that works.

As the README says, I've made a few questionable decisions when building this. If you disagree, or think I could have done anything better I'd love to know. If you use this to build something cool, or even something silly, please let me know on Twitter.

15 Oct 2016 11:00pm GMT

14 Oct 2016

Itamar Turner-Trauring: How to find a programming job you won't hate

Somewhere out there is a company that wants to hire you as a software engineer. Working for that company is a salesperson whose incentives were set by an incompetent yet highly compensated upper management. The salesperson has just made a sale, and in return for a large commission has promised the new customer twice the features in half the time.

The team that wants to hire you will spend the next three months working evenings and weekends. And then, with a job badly done, they'll move on to the next doomed project.

You don't want to work for this company, and you shouldn't waste your time applying there.

When you're looking for a new programming job you want to find it quickly.

Assuming you can afford to be choosy, you'll want to speed up the process by filtering out as many companies as possible in advance. There are many useful ways to filter your list down: your technical interests, the kinds of company you want to work for, location.

In this post, however, I'd like to talk about ways to filter out companies you'd hate. That is, companies with terrible work conditions.

Talk to your friends

Some companies have a bad reputation, some have a great reputation. But once a company is big enough, different teams can end up with very different work environments.

Talking to someone who actually works at a company will give you much better insight into how things work at the team level. They can tell you which groups to avoid, and which groups have great leadership.

For example, Amazon does not have a very good reputation as a workplace, but I know someone who enjoys his job there and his very reasonable working hours.


For companies where you don't have contacts, Glassdoor can be a great resource. Glassdoor is a site that lets employees anonymously post salaries and reviews of their company.

The information is anonymous, so you have to be a little skeptical, especially when there are only a few reviews. And you need to pay attention to the reviewer's role, location, and the year the review was posted. Once you take all that into account, the reviews can often be very informative.

During my last job search I found one company in the healthcare area with many complaints of long working hours. One of Glassdoor's features is a way for a company to reply to reviews. In this case the CEO himself answered, explaining that they work hard because "sick patients can't wait."

Personally I'd rather not work for someone who confuses working long hours with increased output or productivity.

Read company materials

After you've checked out Glassdoor the next thing to look at is the job posting itself, along with the company's website. These are often written by people other than the engineering team, but you can still learn a lot from them.

Sometimes you'll get the sense the company is actually a great place to work for. For example, Memrise has this to say in their Software Engineering postings:

If you aren't completely confident that you fit our exact criteria, please get in touch immediately. Humility is a wonderful thing and we're not interested in hiring 'rockstars' or 'ninjas'.

On the other hand, consider a job post I found for an Automation Test Engineer. First we learn:

Must be able to execute scripts during off hours if required.

This is peculiar; if they're automated why does a person need to run them manually? Later on we read:

This isn't the job for someone looking for a traditional 8-5 position, but it's a great role for someone who is hungry for a terrific opportunity in a fast-paced, state of the art environment.

Apparently they consider working 8-5 traditional, they will work their employees much longer hours, and they think they're "state of the art" even though they haven't heard of cron.

Notice, by the way, that it's worth reading all of a company's job postings: the company's other postings said much less about working conditions than the one I just quoted.


Finally, if a company has passed the previous filters and you've gotten an interview, make sure you ask about working conditions. Tactfully, of course, and once you've demonstrated your value, but if you don't ask you won't know until it's too late.

Depending on the question you might want to ask individual contributors rather than managers. But I've had managers tell me outright they want employees to work really long hours.


There are many bad software jobs out there. But you don't need to work evenings or weekends to succeed as a programmer.

If you want to find a programming job with a sane workweek, a job you'll actually enjoy, sign up for the free email course below for more tips and tricks.

14 Oct 2016 4:00am GMT

09 Oct 2016

Thomas Vander Stichele: Puppet/puppetdb/storeconfigs validation issues

Over the past year I've chipped away at setting up new servers for apestaart and managing the deployment in puppet, instead of relying on a years-old, manually configured single server that would be hard to replicate if its drives failed (one of which did recently, making this more urgent).

It's been a while since I was good enough at puppet to love and hate it in equal parts, while mostly managing to control a deployment of around ten servers at a previous job.

Progress came an hour or two at a time, here and there, and accelerated when a friend in our collective launched a new business for which I wanted to make sure he had a decent redundancy setup.

I was saving the hardest part for last: setting up Nagios monitoring with Matthias Saou's puppet-nagios module, which needs exported resources and storeconfigs working.

Even on the previous server setup based on CentOS 6, that was a pain to set up - needing MySQL and ruby's ActiveRecord. But it sorta worked.

It seems that for newer puppet setups, you're now supposed to use something called PuppetDB, which is not in fact a database on its own as the name suggests, but requires another database. Of course, it chose to need a different one - Postgres. Oh, and PuppetDB itself is in Java - now you get the cost of two runtimes when you use puppet!

So, to add useful Nagios monitoring to my puppet deploys, which without it are quite happy to be simple puppet apply runs from a local git checkout on each server, I now need storeconfigs, which needs puppetdb, which pulls in Java and Postgres. And that's just so a system that handles distributed configuration can actually be told about the results of that distributed configuration and create a useful feedback cycle allowing it to do useful things to the observed result.

Since I test these deployments on local vagrant/VirtualBox machines, I had to double their RAM because of this; the puppetdb java server alone reserves 192MB by default.

But enough complaining about these expensive changes - at least there was a working puppetdb module that managed to set things up well enough.

It was easy enough to get the first host monitored, and apart from some minor changes (like updating the default Nagios config template from 3.x to 4.x), I had a familiar Nagios view working showing results from the server running Nagios itself. Success!

But runs from the other VMs did not trigger adding any exported resources, and I couldn't find anything wrong in the logs. In fact, I could not find /var/log/puppetdb/puppetdb.log at all…

fun with utf-8

After a long night of experimenting and head scratching, I chased down a first clue in /var/log/messages saying puppet-master[17702]: Ignoring invalid UTF-8 byte sequences in data to be sent to PuppetDB

I traced that down to puppetdb/char_encoding.rb, and with my limited ruby skills, I got a dump of the offending byte sequence by adding this code:

Puppet.warning "Ignoring invalid UTF-8 byte sequences in data to be sent to PuppetDB"
File.open('/tmp/ruby', 'w') { |file| file.write(str) }
Puppet.warning "THOMAS: is here"

(I tend to use my name in debugging to have something easy to grep for, and I wanted some verification that the File dump wasn't triggering any errors.)

It took a little time at 3AM to remember where these /tmp files end up thanks to systemd, but once I found it, I saw it was a json blob with a command to "replace catalog". That could explain why my puppetdb didn't have any catalogs for other hosts. But file told me this was a plain ASCII file, so that didn't help me narrow it down.

I brute forced it by just checking my whole puppet tree:

find . -type f -exec file {} \; > /tmp/puppetfile
grep -v ASCII /tmp/puppetfile | grep -v git
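The same brute-force search can also report where the first non-ASCII byte sits, which is roughly the context Puppet's warning left out. Here is a hedged Python sketch (not what I actually ran); find_non_ascii is a hypothetical helper, it skips .git the way grep -v git does above, and bytes.hex with a separator needs Python 3.8 or later.

```python
import os


def find_non_ascii(root, skip_dirs=(".git",)):
    """Yield (path, byte_offset, line_number, hex_bytes) for the first
    non-ASCII byte found in each file under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in skip_dirs)
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                data = f.read()
            for offset, byte in enumerate(data):
                if byte > 0x7F:  # anything above 0x7F is outside ASCII
                    line = data.count(b"\n", 0, offset) + 1
                    yield path, offset, line, data[offset:offset + 2].hex(" ")
                    break
```

Running it on the root of the puppet tree and printing each tuple would point straight at the file, line, and bytes involved.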

This turned up a few UTF-8 candidates. Googling around, I was reminded of how terrible utf-8 handling was in ruby 1.8, and saw information that puppet recommended using only ASCII in most manifests and files to avoid issues.

It turned out to be a config from a webalizer module:

webalizer/templates/webalizer.conf.erb: UTF-8 Unicode text

While it was written by a Jesús with a unicode name, the file itself didn't have his name in it, and I couldn't obviously find where the UTF-8 chars were hiding. One StackOverflow post later, I had nailed it down - UTF-8 spaces!

00004ba0 2e 0a 23 c2 a0 4e 6f 74 65 20 66 6f 72 20 74 68 |..#..Note for th|
00004bb0 69 73 20 74 6f 20 77 6f 72 6b 20 79 6f 75 20 6e |is to work you n|

The offending bytes are c2 a0: the UTF-8 encoding of U+00A0, the non-breaking space.

I have no idea how that slipped into a comment in a config file, but I replaced those characters with regular spaces and the error went away.
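A hedged Python sketch of that fix: decode as UTF-8, confirm the character really is U+00A0, and swap it for a plain space. replace_nbsp is a hypothetical helper; encoding the result as ASCII makes it fail loudly if any other non-ASCII character is left over.

```python
import unicodedata


def replace_nbsp(data: bytes) -> bytes:
    """Swap UTF-8 non-breaking spaces (bytes c2 a0, U+00A0) for ASCII spaces.

    Encoding the result as ASCII raises UnicodeEncodeError if any other
    non-ASCII character is still present, instead of passing it on silently.
    """
    text = data.decode("utf-8")
    return text.replace("\u00a0", " ").encode("ascii")


if __name__ == "__main__":
    line = b"#\xc2\xa0Note for this to work you n"
    print(unicodedata.name(line.decode("utf-8")[1]))  # NO-BREAK SPACE
    print(replace_nbsp(line))  # b'# Note for this to work you n'
```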

Puppet's error was vague, did not provide any context whatsoever (Where do the bytes come from? Dump the part that is parseable? Dump the hex representation? Tell me the position in it where the problem is?), did not give any indication of the potential impact, and in a sea of spurious puppet warnings that you simply have to live with, is easy to miss. One down.

However, still no catalogs on the server, so still only one host being monitored. What next?

users, groups, and permissions

Chasing my next lead turned out to be my own fault. After turning off SELinux temporarily, checking all permissions on all puppetdb files to make sure that they were group-owned by puppetdb and writable for puppet, I took the last step of switching to that user role and trying to write the log file myself. And it failed. Huh? And then id told me why - while /var/log/puppetdb/ was group-writeable and owned by puppetdb group, my puppetdb user was actually in the www-data group.

It turns out that I had tried to move some uids and gids around after the automatic assignment puppet does gave different results on two hosts (a problem I still don't have a satisfying answer for, as I don't want to hard-code uids/gids for system accounts in other people's modules), and clearly I did one of them wrong.

I think a server that for whatever reason cannot log should simply not start, as this is a critical error if you want a defensive system.

After fixing that properly, I now had a puppetdb log file.

resource titles

Now I was staring at an actual exception:

2016-10-09 14:39:33,957 ERROR [c.p.p.command] [85bae55f-671c-43cf-9a54-c149cedec659] [replace catalog] Fatal error on attempt 0
java.lang.IllegalArgumentException: Resource '{:type "File", :title "/var/lib/puppet/concat/thomas_vimrc/fragments/75_thomas_vimrc-\" allow adding additional config through .vimrc.local_if filereadable(glob(\"~_.vimrc.local\"))_\tsource ~_.vimrc.local_endif_"}' has an invalid tag 'thomas:vimrc-" allow adding additional config through .vimrc.local
if filereadable(glob("~/.vimrc.local"))
source ~/.vimrc.local
'. Tags must match the pattern /\A[a-z0-9_][a-z0-9_:\-.]*\Z/.
    at com.puppetlabs.puppetdb.catalogs$validate_resources.invoke(catalogs.clj:331) ~[na:na]

Given the name of the command (replace catalog), I felt certain this was going to be the problem standing between me and multiple hosts being monitored.

The problem was a few levels deep, but essentially I had code creating fragments of vimrc files using the concat module, and was naming the resources with file content as part of the title. That's not a great idea, admittedly, but no other part of puppet had ever complained about it before. Even the fragment files on my file system, which get their filenames from these titles, were happily stored with a double quote in their names.
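Because nothing earlier in the stack enforces that pattern, it can help to check names yourself. Here is a hedged Python sketch that validates candidate tags against the regex from the PuppetDB error and derives short, tag-safe names instead of using file content. safe_fragment_name is hypothetical, not part of puppet or the concat module.

```python
import re

# Pattern copied from the PuppetDB error above. PuppetDB runs on the JVM, where
# \Z also allows a trailing line terminator; Python's \Z matches only at the
# very end of the string, so this check is, if anything, slightly stricter.
TAG_RE = re.compile(r"\A[a-z0-9_][a-z0-9_:\-.]*\Z")


def is_valid_tag(tag: str) -> bool:
    return TAG_RE.match(tag) is not None


def safe_fragment_name(text: str, max_len: int = 40) -> str:
    """Hypothetical helper: a short tag-safe slug instead of raw file content."""
    slug = re.sub(r"[^a-z0-9_]+", "_", text.lower()).strip("_")
    return slug[:max_len] or "fragment"


if __name__ == "__main__":
    bad = 'thomas:vimrc-" allow adding additional config through .vimrc.local'
    print(is_valid_tag(bad))  # False
    name = safe_fragment_name('" allow adding additional config')
    print(name, is_valid_tag(name))  # allow_adding_additional_config True
```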

So yet again, puppet's lax approach to specifying types of variables at any of its layers (hiera, puppet code, ruby code, ruby templates, puppetdb) in any of its data formats (yaml, json, bytes for strings without encoding information) triggers errors somewhere in the stack without informing whatever triggered that error (i.e., the agent run on the client didn't complain or fail).

Once again, puppet has given me plenty of reasons to hate it with a passion, tipping the balance.

I couldn't imagine doing server management without a tool like puppet. But you love it when you don't have to tweak it much, and you hate it when you're actually making extensive changes. Hopefully after today I can get back to the loving it part.

09 Oct 2016 8:31pm GMT

07 Oct 2016

Itamar Turner-Trauring: More learning, less time: how to quickly gather new tools and techniques

Update: Added newsletters to the list.

Have you ever worked hard to solve a problem, only to discover a few weeks later an existing design pattern that was even better than your solution? Or built an internal tool, only to discover an existing tool that already solved the problem?

To be a good software engineer you need a good toolbox. That means software tools you can use when appropriate, design patterns so you don't have to reinvent the wheel, testing techniques... the list goes on. Learning all existing tools and techniques is impossible, and just keeping up with every newly announced library would be a full time job.

How do you learn what you need to know to succeed at your work? And how can you do so without spending a huge amount of your free time reading and programming just to keep up?

A broad toolbox, the easy way

To understand how you can build your toolbox, consider the different levels of knowledge you can have. You can be an expert on a subject, or you can have some basic understanding, or you might just have a vague awareness that the subject exists.

For our purposes building awareness is the most important of the three. You will never be an expert in everything, and even basic understanding takes some time. But broad awareness takes much less effort: you just need to remember small amounts of information about each tool or technique.

You don't need to be an expert on a tool or technique, or even use it at all. As long as you know a tool exists you'll be able to learn more about it when you need to.

For example, there is a tool named Logstash that moves server logs around. That's pretty much all you have to remember about it, and it takes just 3 seconds to read that previous sentence. Maybe you'll never use that information... or maybe one day you'll need to get logs from a cluster of machines to a centralized location. At that point you'll remember the name "Logstash", look it up, and have the motivation to actually go read the documentation and play around with it.

Design patterns and other techniques take a bit more effort to gain useful awareness, but still, awareness is usually all you need. For example, property-based testing is hugely useful. But all it takes is a little reading to gain awareness, even if it will take more work to actually use it.

The more tools and techniques you are aware of the more potential solutions you will have to the problems you encounter while programming. Being aware of a broad range of tools and techniques is hugely valuable and easy to achieve.

Building your toolbox

How do you build your toolbox? How do you find the tools and techniques you need to be aware of? Here are four ways to do so quickly and efficiently.

Newsletters

Newsletters like Ruby Weekly are a great way to learn about new tools and techniques. There are newsletters on many languages and topics, from DevOps to PostgreSQL.

Newsletters typically include not just links but also short descriptions, so you can skim them and gain awareness even without reading all the articles. In contrast, sites like Reddit or Hacker News only include links, so you gain less information unless you spend more time reading.

The downside of newsletters is that they focus on the new. You won't hear about a classic design pattern or a standard tool unless someone happens to write a new blog post about it. You should therefore rely on additional sources as well.

Conference proceedings

Conferences are another, broader source of tools and techniques. Conference talks are chosen by a committee with some understanding of the conference subject, and selection can be quite competitive: I believe the main US Python conference accepts only a third of proposals. Good conferences also aim for a broad range of talks, within the limits of their target audience. As a result, conferences are a great way to discover relevant, useful tools and techniques, both new and old.

Of course, going to a conference can be expensive and time consuming. Luckily you don't have to go to the conference to benefit.

Just follow this quick procedure:

  1. Find a conference relevant to your interests. E.g. if you're a Ruby developer find a conference like RubyConf.
  2. Skim the talk descriptions; they're pretty much always online.
  3. If something sounds really interesting, there's a decent chance you can find a recording of the talk, or at least the slides.
  4. Mostly however you just need to see what people are talking about and make a mental note of things that sound useful or interesting.

For example, skimming the RubyConf 2016 program I see there's something called OpenStruct for dynamic data objects, FactoryGirl which is apparently a testing-related library, a library for writing video games, an explanation of hooks and so on. I'm not really a Ruby programmer, but if I ever want to write a video game in Ruby I'll go find that talk.

Meetups and user groups

Much like conferences, meetups are a great way to learn about a topic. And much like conferences, you don't actually have to go to the meetup to gain awareness.

For example, the Boston Python Meetup has had talks in recent months about CPython internals, microservices, BeeKeeper, which is something for REST APIs, the Plone content management system, etc.

I'd never heard of BeeKeeper before, but now I know its name and subject. That's very little information, gained very quickly... but next time I'm building a REST API with Python I can go look it up and see if it's useful.

If you don't know what a "REST API" is, well, that's another opportunity for growing your awareness: do a Google search and read a paragraph or two. If it's relevant to your job, keep reading. Otherwise, make a mental note and move on.

Book catalogs

Since your goal is awareness, not in-depth knowledge, you don't need to read a book to gain something: the title and description may be enough. Technical book publishers are in the business of publishing relevant books, so browsing their catalog can be very educational.

For example, the Packt book catalog will give you awareness of a long list of tools you might find useful one day. You can see that "Unity" is something you use for game development, "Spark" is something you use for data science, etc. Spend 20 seconds reading the Spark book description and you'll learn Spark does "statistical data analysis, data visualization, predictive modeling" for "Big Data". If you ever need to do that you now have a starting point for further reading.

Using your new toolbox

There are only so many hours in the day, so many days in a year. That means you need to work efficiently, spending your limited time in ways that have the most impact.

The techniques you've just read do exactly that: you can learn more in less time by spending the minimum necessary to gain awareness. You only need to spend the additional time to gain basic understanding or expertise for those tools and techniques you actually end up using. And having a broad range of tools and techniques means you can get more done at work, without reinventing the wheel every time.

You don't need to work evenings or weekends to be a successful programmer! This post covers just some of the techniques you can use to be more productive within the limits of a normal working week. To help you get there I'm working on a book, The Programmer's Guide to a Sane Workweek.

Sign up in the email subscription form below to learn more about the book, and to get notified as I post more tips and tricks on how you can become a better software engineer.

07 Oct 2016 4:00am GMT

24 Sep 2016

feedPlanet Twisted

Hynek Schlawack: Sharing Your Labor of Love: PyPI Quick and Dirty

A completely incomplete guide to packaging a Python module and sharing it with the world on PyPI.

24 Sep 2016 12:00pm GMT

17 Sep 2016

feedPlanet Twisted

Glyph Lefkowitz: Hitting The Wall

I'm an introvert.

I say that with a full-on appreciation of just how awful thinkpieces on "introverts" are.

However, I feel compelled to write about this today because of a certain type of social pressure that a certain type of introvert faces. Specifically, I am a high-energy introvert.

Cementing this piece's place in the hallowed halls of just awful thinkpieces, allow me to compare my mild cognitive fatigue with the plight of those suffering from chronic illness and disability[1]. There's a social phenomenon associated with many chronic illnesses, "but you don't LOOK sick", where well-meaning people will look at someone who is suffering, with no obvious symptoms, and imply that they really ought to be able to "be normal".

As a high-energy introvert, I frequently participate in social events. I go to meet-ups and conferences and I engage in plenty of public speaking. I am, in a sense, comfortable extemporizing in front of large groups of strangers.

This all sounds like extroverted behavior, I know. But there's a key difference.

Let me posit two axes for personality type: on the X axis, "introvert" to "extrovert", and on the Y, "low energy" up to "high energy".

The X axis describes what kinds of activities give you energy, and the Y axis describes how large your energy reserves are for the other type.

Notice that I didn't say which type of activity you enjoy.

Most people who would self-describe as "introverts" are in the low-energy/introvert quadrant. They have a small amount of energy available for social activities, which they need to frequently re-charge by doing solitary activities. As a result of frequently running out of energy for social activities, they don't enjoy social activities.

Most people who would self-describe as "extroverts" are also on the "low-energy" end of the spectrum. They have low levels of patience for solitary activity, and need to re-charge by spending time with friends, going to parties, etc., in order to have the mental fortitude to sit still for a while and focus. Since they can endlessly get more energy from the company of others, they tend to enjoy social activities quite a bit.

Therefore we have certain behaviors we expect to see from "introverts". We expect them to be shy, and quiet, and withdrawn. When someone who behaves this way has to bail on a social engagement, this is expected. There's a certain affordance for it. If you spend a few hours with them, they may be initially friendly but will visibly become uncomfortable and withdrawn.

This "energy" model of personality is of course an oversimplification - it's my personal belief that everyone needs some balance of privacy and socialization and solitude and eventually overdoing one or the other will be bad for anyone - but it's a useful one.

As a high-energy introvert, my behavior often confuses people. I'll show up at a week's worth of professional events, be the life of the party, go out to dinner at all of them, and then disappear for a month. I'm not visibly shy - quite the opposite, I'm a gregarious raconteur. In fact, I quite visibly enjoy the company of friends. So, usually, when I try to explain that I am quite introverted, this claim is met with (quite understandable) skepticism.

In fact, I am quite functionally what society expects of an "extrovert" - until I hit the wall.

In endurance sports, one is said to "hit the wall" at the point where all the short-term energy reserves in one's muscles are exhausted, and there is a sudden, dramatic loss of energy. Regardless, many people enjoy endurance sports; part of the challenge of them is properly managing your energy.

This is true for me and social situations. I do enjoy social situations quite a bit! But they are nevertheless quite taxing for me, and without prolonged intermissions of solitude, eventually I get to the point where I can no longer behave as a normal social creature without an excruciating level of effort and anxiety.

Several years ago, I attended a prolonged social event[2] where I hit the wall, hard. The event itself was several hours too long for me, involved meeting lots of strangers, and in the lead-up to it I hadn't had a weekend to myself for a few weeks due to work commitments and family stuff. Towards the end I noticed I was developing a completely flat affect, and had to start very consciously performing even basic body language, like looking at someone while they were talking or smiling. I'd never been so exhausted and numb in my life; at the time I thought I was just stressed from work.

Afterwards though, I started having a lot of weird nightmares, even during the daytime. This concerned me, since I'd never had such a severe reaction to a social situation, and I didn't have good language to describe it. It was also a little perplexing that what was effectively a nice party, the first half of which had even been fun for me, would cause such a persistent negative reaction after the fact. After some research, I eventually discovered that such involuntary thoughts are a hallmark of PTSD.

While I've managed to avoid this level of exhaustion before or since, this was a real learning experience for me that the consequences of incorrectly managing my level of social interaction can be quite severe.

I'd rather not do that again.

The reason I'm writing this, though[3], is not to avoid future anxiety. My social energy reserves are quite large enough, and I now have enough self-knowledge, that it is extremely unlikely I'd ever find myself in that situation again.

The reason I'm writing is to help people understand that I'm not blowing them off because I don't like them. Many times now, I've declined or bailed on an invitation from someone, and later heard that they felt hurt that I was passive-aggressively refusing to be friendly.

I certainly understand this reaction. After all, if you see someone at a party and they're clearly having a great time and chatting with everyone, but then when you invite them to do something, they say "sorry, too much social stuff", that seems like a pretty passive-aggressive way to respond.

You might even still be skeptical after reading this. "Glyph, if you were really an introvert, surely, I would have seen you looking a little shy and withdrawn. Surely I'd see some evidence of stage fright before your talks."

But that's exactly the problem here: no, you wouldn't.

At a social event, since I have lots of energy to begin with, I'll build up a head of steam on burning said energy that no low-energy introvert would ever risk. If I were to run out of social-interaction-juice, I'd be in the middle of a big crowd telling a long and elaborate story when I find myself exhausted. If I hit the wall in that situation, I can't feel a little awkward and make excuses and leave; I'll be stuck creepily faking a smile like a sociopath and frantically looking for a way out of the conversation for an hour, as the pressure from a large crowd of people rapidly builds up months' worth of nightmare fuel from my spiraling energy deficit.

Given that I know that's what's going to happen, you won't see me when I'm close to that line. You won't be at my desk when I silently sit and type for a whole day, or on my couch when I quietly read a book for ten hours at a time. My solitary side is, by definition, hidden.

But, if I don't show up to your party, I promise: it's not you, it's me.

  1. In all seriousness: this is a comparison of kind and not of degree. I absolutely do not have any illusions that my minor mental issues are a serious disability. They are - by definition, since I do not have a diagnosis - subclinical. I am describing a minor annoyance and frequent miscommunication in this post, not a personal tragedy.

  2. I'll try to keep this anonymous, so hopefully you can't guess - I don't want to make anyone feel bad about this, since it was my poor time-management and not their (lovely!) event which caused the problem.

  3. ... aside from the hope that maybe someone else has had trouble explaining the same thing, and this will be a useful resource for them ...

17 Sep 2016 9:18pm GMT

16 Sep 2016

feedPlanet Twisted

Itamar Turner-Trauring: Introducing the Programmer's Guide to a Sane Workweek

I'm working on a book: The Programmer's Guide to a Sane Workweek, a guide to how you can achieve a saner, shorter workweek. If you want a free course based on the book, sign up using the email subscription form at the end of the post. Meanwhile, here's the first excerpt from the book:

A sane workweek is achievable: for the past 4 years I've been working less than 40 hours a week.

Soon after my daughter was born I quit my job as a product manager at Google and became a part-time consultant, writing software for clients. I wrote code for 20-something hours each week while our child was in daycare, and I spent the rest of my time taking care of our kid.

Later I got a job with one of my clients, a startup, where I worked as an employee but had a 28-hour workweek. These days I work at another startup, with a 35-hour workweek.

I'm not the only software engineer who has chosen to work a saner, shorter workweek. There are contractors who work part-time, spending the rest of their time starting their own business. There are employees with specialized skills who only work two days a week. There are even entrepreneurs who have deliberately created a business that isn't all-consuming.

Would you like to join us?

If you're a software developer working crazy hours then this book can help you get to a saner schedule. Of course what makes a schedule sane or crazy won't be the same for me as it is for you. You should spend some time thinking about what exactly it is that you want.

How much time do you want to spend working each week?

Depending on what you want there are different paths you can pursue.

Some paths to a saner workweek

Here are some ways you can reduce your workweek; I'll cover them in far more detail in later chapters of the book:

Normalizing your workweek

If you're working a lot more than 40 hours a week you always have the option of unilaterally normalizing your hours. That is, reducing your hours down to 40 hours or 45 hours or whatever you think is fair. Chances are your productivity and output will actually increase. You might face problems, however, if your employer cares more about hours "worked" than about output.

Reducing overhead

Chances are that the hours your employer counts as your work are just part of the time you spend on your job. In particular, commuting can take another large bite out of your free time. Cut down on commuting and long lunch breaks and you've gotten some of that time back without any reduction in the hours your boss cares about.

Negotiating a shorter workweek at your current job

If you want a shorter-than-normal workweek you can try to negotiate that at your current job. Your manager doesn't want to replace a valued, trained employee: hiring new people is expensive and risky. That means you have an opening to negotiate shorter hours. This is one of the most common ways software engineers I know have reduced their hours.

Find a shorter workweek at a new job

If you're looking for a 40-hour workweek this is mostly about screening for a good company culture as part of your interview process. If you want a shorter-than-normal workweek you will need to negotiate a better job offer. That usually means negotiating salary, but you can sometimes negotiate shorter working hours instead. This path can be tricky; I've managed to do it, but have also been turned down, and I know of other people who have failed. It's easier if you've already worked for the company as a consultant, so they know what they're getting. Alternatively, if your previous (ideally, your current) job gave you a shorter workweek you'll have better negotiating leverage.

Long-term contracts

Instead of working as an employee you can take on long-term contract work, often through an agency. The contract can specify how many hours you will work, and shorter workweeks are sometimes possible. You can even get paid overtime!


Consulting

Instead of taking on long-term work, which is similar in many ways to being an employee, you go out and find project work for yourself. That means you need to spend something like half your time on marketing. By marketing well and providing high value to your clients you can charge high rates, allowing you to work reasonable hours.

Product business

All the paths so far involved exchanging money for time, in one form or another. As a software engineer you have another choice: you can make a product once and easily sell that same product multiple times. That means your income is no longer directly tied to how many hours you work. You'll need marketing and other business skills to do so, and you won't just be writing code.

Early retirement

Finally, if you don't want to work ever again there is the path of early retirement. That doesn't mean you can't make money afterwards; it means you no longer have to make a living, because you've earned enough that your time is your own. To get there you'll need very low living expenses, and a high savings rate while you're still working. Luckily programmers tend to get paid well.

Which path will you take?

Each of these paths has its own set of requirements and trade-offs, so it's worth considering which one fits your needs. At different times of your life you might prefer one path, and later you might prefer another. For example, I've worked as both a consultant and a part-time employee.

What kind of work environment do you want right now?

A later chapter will cover choosing your path in more detail. For now, take a little time to think it through and imagine what your ideal job would be like. Combine that with your weekly hours goal and you should get some sense of which path is best for you.

It won't be easy

Working a sane workweek is not something corporate culture encourages, at least in the US. That means you won't be following the default, easy path that most workers do: you're going to need to do some work to get to your destination. In later chapters I'll explain how you can acquire the prerequisites for your chosen path, but for now here's a summary:

How much do you really want to work a sane workweek? Do you care enough to make the necessary effort?

It won't be easy, but I think it's worth it.

Shall we get started? Sign up below to get a free course that will take you through the first steps of your journey.

16 Sep 2016 4:00am GMT

15 Sep 2016

feedPlanet Twisted

Moshe Zadka: Post-Object-Oriented Design

In the beginning came the so-called "procedural" style. Data was data, and behavior, implemented as procedures, was a separate thing. Object-oriented design is the idea of bundling data and behavior into a single thing, usually called a "class". In return for having to tie the two together, the thought went, we would get polymorphism.

Polymorphism is pretty neat. We send different objects the same message, for example, "turn yourself into a string", and they respond appropriately - each according to their uniquely defined behavior.

But what if we could separate the data and behavior, and still get polymorphism? This is the idea behind post-object-oriented design.

In Python, we achieve this with two external packages. One is the "attr" package, which provides a convenient way to define bundles of data that still exhibit the minimum amount of behavior we do want: initialization, string representation, hashing and more.

The other is the "singledispatch" package (available as functools.singledispatch in Python 3.4+).

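import attr

try:
    from functools import singledispatch  # Python 3.4+
except ImportError:
    from singledispatch import singledispatch  # the backport package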

In order to be specific, we imagine a simple protocol. The low-level details of the protocol do not concern us, but we assume some lower-level parsing allows us to communicate in dictionaries back and forth (perhaps serialized/deserialized using JSON).

Our protocol is one to send changes to a map. The only two messages are "set", to set a key to a given value, and "delete", to delete a key.

messages = [
    {'type': 'set', 'key': 'language', 'value': 'python'},
    {'type': 'delete', 'key': 'human'},
]

We want to represent those as attr-based classes.

@attr.s
class Set(object):
    key = attr.ib()
    value = attr.ib()

@attr.s
class Delete(object):
    key = attr.ib()

print(Set(key='language', value='python'))
# Set(key='language', value='python')

When incoming dictionaries arrive, we want to convert them to the logical classes. In this example, the code could not be simpler (mostly because the protocol itself is simple).

def from_dict(dct):
    dct = dict(dct)  # copy, so pop() does not mutate the caller's dictionary
    tp = dct.pop('type')
    name_to_klass = dict(set=Set, delete=Delete)
    try:
        klass = name_to_klass[tp]
    except KeyError:
        raise ValueError('unknown type', tp)
    return klass(**dct)

Note how we take advantage of the fact that attr-based classes accept correctly-named keyword arguments.

print((from_dict(dict(type='set', key='name', value='myname')),
       from_dict(dict(type='delete', key='data'))))
# (Set(key='name', value='myname'), Delete(key='data'))

But this was easy! There was no need for polymorphism: we always get one type in (dictionaries), and we consult a mapping to decide which type to produce.

However, for serialization, we do need polymorphism. Enter our second tool - the singledispatch package. The default function is equivalent to a method defined on "object": the ultimate super-class. Since we do not want to serialize generic objects, our default implementation errors out.

@singledispatch
def to_dict(obj):
    # Default implementation: the equivalent of a method defined on object.
    raise TypeError("cannot serialize", obj)

Now, we implement the actual serializers. The names of the functions are not important. To emphasize they should not be used directly, we make them "private" by prepending an underscore.

@to_dict.register(Set)
def _to_dict_set(st):
    return dict(type='set', key=st.key, value=st.value)

@to_dict.register(Delete)
def _to_dict_delete(dlt):
    return dict(type='delete', key=dlt.key)

Indeed, we do not call them directly.

print(to_dict(Set(key='k', value='v')))
print(to_dict(Delete(key='kk')))
# {'type': 'set', 'key': 'k', 'value': 'v'}
# {'type': 'delete', 'key': 'kk'}

However, arbitrary objects cannot be serialized.

try:
    to_dict(object())
except TypeError as e:
    print(e)
# ('cannot serialize', <object object at 0x7fbdb254ac60>)

Now that we have seen how to add such an "external method", we can give another example, "act on": applying the requested changes to an in-memory map.

@singledispatch
def act_on(command, d):
    raise TypeError("Cannot act on", command)

@act_on.register(Set)
def act_on_set(st, d):
    d[st.key] = st.value

@act_on.register(Delete)
def act_on_delete(dlt, d):
    del d[dlt.key]

d = {}
act_on(Set(key='name', value='woohoo'), d)
print("After setting")
print(d)
act_on(Delete(key='name'), d)
print("After deleting")
print(d)
# After setting
# {'name': 'woohoo'}
# After deleting
# {}

In this case, we kept the functionality "near" the code. However, note that the functionality could be implemented in a different module: these functions, even though they are polymorphic, follow Python namespace rules. This is useful: several different modules could implement "act_on": for example, an in-memory map (as we defined above), a module using Redis or a module using a SQL database.
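
As a minimal sketch of that idea, assuming SQLite (through Python's standard sqlite3 module) as the SQL database, a second module could register its own implementations without touching the message classes. The kv table and the act_on_sql function are hypothetical names, and dataclasses stand in for attr so the sketch needs only the standard library; the dispatch works the same way.

```python
import sqlite3
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class Set:
    key: str
    value: str


@dataclass
class Delete:
    key: str


# A second, independent "act on" for a SQL backend. It could live in its
# own module; Set and Delete do not need to know about it.
@singledispatch
def act_on_sql(command, conn):
    raise TypeError("Cannot act on", command)


@act_on_sql.register(Set)
def _act_on_sql_set(st, conn):
    # INSERT OR REPLACE gives "set key to value" semantics in SQLite.
    conn.execute("INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                 (st.key, st.value))


@act_on_sql.register(Delete)
def _act_on_sql_delete(dlt, conn):
    conn.execute("DELETE FROM kv WHERE key = ?", (dlt.key,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
act_on_sql(Set(key='name', value='woohoo'), conn)
print(conn.execute("SELECT key, value FROM kv").fetchall())
# [('name', 'woohoo')]
act_on_sql(Delete(key='name'), conn)
print(conn.execute("SELECT key, value FROM kv").fetchall())
# []
```

One difference worth noticing: deleting a missing key is a silent no-op here, whereas the in-memory del d[dlt.key] raises KeyError, so separate backends may need to agree on such edge cases explicitly.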

Actual methods are not completely obsolete: anything that requires access to private attributes is still best done in a method. In simple cases, as above, there is no difference between the public interface and the public implementation.

15 Sep 2016 6:03am GMT

09 Sep 2016

feedPlanet Twisted

Itamar Turner-Trauring: How to choose a side project

If you're a programmer just starting out you'll often get told to work on side projects, beyond what you do at school or work. But there are so many things you could be doing: what should you be working on? How do you choose a side project you will actually finish? How will you make sure you're learning something?

Keep in mind that you don't actually have to work on side projects to be a good programmer. I know many successful software engineers who code only at their job and spend their free time on other hobbies. But if you do want to work on software in your spare time there are two different approaches you can take.

To understand these approaches let's consider a real side project that managed to both succeed and fail at the same time.

Long ago, in an Internet far far away

Back in 2000 my friend Glyph started a small project called Twisted Reality. It was supposed to be a game engine, with the goal of implementing a particularly complex and sophisticated game.

Since the game had a chat system, a web server, and other means of communication, the game grew a networking engine. Glyph and his friends hung out on the Internet Relay Chat (IRC) Python channel and every time someone asked a networking question they'd tell them "use Twisted Reality!" Over time more people would show up needing a small feature added to the networking engine, so they'd submit a patch. That's how I became a Twisted Reality contributor.

Eventually the networking engine grew so big that Twisted Reality was split into two projects: the Twisted networking framework and the Reality game engine. These days Twisted is used by companies like Apple, Cisco and Yelp, and is still going strong. The game engine has been through multiple rewrites, but the game it was designed for has never been built.

Approach #1: solving a problem

The difference between Twisted, a successful side project, and the game that never got written is that Twisted solved a specific, limited problem. If you need to write some networking code in Python then Twisted will help you get it done quickly and well. The game, however, was so ambitious that it was never started: there was always another simulation feature to be added to the game engine first.

If you are building a side project choose one that solves a specific, limited problem. For example, let's say you feel you're wasting time playing on Facebook when you should be doing homework.

  1. "Build the best time tracking app ever" is neither limited nor specific, nor is it really a problem you're solving.
  2. "I want to keep track of how much time I spend actually working on homework vs. procrastinating" is better, but still not quite problem-driven.
  3. A good problem statement is "I want to prevent myself from visiting Facebook and other specific websites while I'm working on homework." At this point you have a clear sense of what software you're building.

Why a specific and limited problem?

Approach #2: artificial limits

How do you choose a side project if you don't have any specific problems in mind? The key is to still have constraints and limits so that your project is small, achievable and has a clear goal.

One great way to do that is to set a time limit. I'm not a fan of hackathons, since they promote the idea that sleeplessness and working crazy hours are a reasonable way to write software. But with a longer time frame, building something specific within a time limit can be a great way to create a side project.

The PyWeek project for example has you build a game in one week, using a theme chosen by the organizers. Building a game isn't solving a problem, but it can still be fun and educational. And the one week limit will ensure you focus your efforts and achieve something concrete.

Software has no value

Whether you decide to solve a problem or to set artificial time limits on your side project, the key is having constraints and a clear goal. Software is just a tool: there is no inherent value in producing more of it. Value comes from solving problems, or from the entertainment a game provides. A half-solved problem or a half-finished game is valueless, so you want your initial goal to be small and constrained.

I've learned this the hard way, focusing on the value of my code instead of on the problems it solved. If you want to avoid that and other mistakes I've made over 20 years of writing software check out my career as a Software Clown.

09 Sep 2016 4:00am GMT

28 Aug 2016

feedPlanet Twisted

Twisted Matrix Laboratories: Twisted 16.4.0 Released

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 16.4.0.

The highlights of this release are:

For more information, check the NEWS file (link provided below).

You can find the downloads on PyPI (or alternatively our website). The NEWS file is also available on GitHub.

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,
Amber Brown (HawkOwl)

PS: Twisted 16.4.1 will be coming soon after this with a patch mitigating SWEET32, by updating the acceptable cipher list.

28 Aug 2016 1:48am GMT

25 Aug 2016

feedPlanet Twisted

Itamar Turner-Trauring: From 10x programmer to 0.1x programmer: creating more with less

You've heard of the mythical 10x programmers, programmers who can produce ten times as much as us normal humans. If you want to become a better programmer this myth is demoralizing, but it's also not useful: how can you write ten times as much code? On the other hand, consider the 0.1x programmer, a much more useful concept: anyone can choose to write only 10% as much code as a normal programmer would. As they say in the business world, becoming a 0.1x programmer is actionable.

Of course writing less code might seem problematic, so let's refine our goal a little. Can you write 10% as much code as you do now and still do just as well at your job, still fixing the same amount of bugs, still implementing the same amount of features? The answer may still be "no", but at least this is a goal you can more easily work towards incrementally.

Doing more with less code

How do you achieve just as much while writing less code?

1. Use a higher level programming language

As it turns out many of us are 0.1x programmers without even trying, compared to previous generations of programmers that were stuck with lower-level programming languages. If you don't have to worry about manual memory management or creating a data structure from scratch you can write much less code to achieve the same goal.

2. Use existing code

Instead of coding from scratch, use an existing library that achieves the same thing. For example, earlier this week I was looking at the problem of incrementing version numbers in source code and documentation as part of a release. A little searching and I found an open source tool that did exactly what I needed. Because it's been used by many people and improved over time chances are it's better designed, better tested, and less buggy than my first attempt would have been.

3. Spend some time thinking

Surprisingly, spending more time planning up front can save you time in the long run. If you have 2 days to fix a bug it's worth spending 10% of that time, an hour and a half, to think about how to solve it. Chances are the first solution you come up with in the first 5 minutes won't be the best solution, especially if it's a hard problem. Spend an hour more thinking and you might come up with a solution that takes two hours instead of two days.
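
Assuming 8-hour working days (the post does not say), the arithmetic behind "an hour and a half" works out roughly as:

```latex
2\ \text{days} \times 8\ \tfrac{\text{h}}{\text{day}} = 16\ \text{h},
\qquad
0.10 \times 16\ \text{h} = 1.6\ \text{h} \approx 1.5\ \text{h}
```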

4. Good enough features

Most feature requests have three parts:

  1. The stuff the customer must have.
  2. The stuff that is nice to have but not strictly necessary.
  3. The stuff the customer is willing to admit is not necessary.

The last category is usually dropped in advance, but you're usually still asked to implement the middle category of things that the customer and product manager really really want but aren't actually strictly necessary. So figure out the real minimum path to implement a feature, deliver it, and much of the time it'll turn out that no one will miss those nice-to-have additions.

5. Drop the feature altogether

Some features don't need to be done at all. Some features are better done a completely different way than requested.

Instead of saying "yes, I'll do that" to every feature request, make sure you understand why someone needs the feature, and always consider alternatives. If you come up with a faster, superior idea the customer or product manager will usually be happy to go along with your suggestion.

6. Drop the product altogether

Sometimes your whole product is not worth doing: it will have no customers, will garner no interest. Spending months and months on a product no one will ever use is a waste of time, not to mention depressing.

Lean Startup is one methodology for dealing with this: before you spend any time developing the product you do the minimal work possible to figure out if it's worth doing in the first place.


Your goal as programmer is not to write code, your goal is to solve problems. From low-level programming decisions to high-level business decisions there are many ways you can solve problems with less code. So don't start with "how do I write this code?", start with "how do I solve this problem?" Sometimes you'll do better not solving the problem at all, or redefining it. As you get better at solving problems with less code you will find yourself becoming more productive, especially if you start looking at the big picture.

Being productive is a great help if you're tired of working crazy hours. Want a shorter workweek? Check out The Programmer's Guide to a Sane Workweek.

25 Aug 2016 4:00am GMT

Moshe Zadka: Time Series Data

When operating computers, we are often exposed to so-called "time series". Whether it is database latency, page fault rate or total memory used, these are all exposed as numbers that are usually sampled at frequent intervals.

However, computer engineers are not the only ones exposed to such data. It is worthwhile to know which other disciplines deal with it, and what they do with it. "Earth sciences" (geology, climate, etc.) have a lot of numbers, and often need to analyze trends and make predictions. Sometimes these predictions have, literally, billions of dollars' worth of decisions hinging on them. It is worthwhile to read some of the textbooks for students of those disciplines to see how they approach such series.

Medicine is another discipline that needs to visually inspect time series data. EKG data is often vital for analyzing a patient's health, especially when compared to their historical records. For that, the data needs to be stored. A lot of EKG research has been done on how to compress numerical data while keeping it "visually the same". While the research on that is not as rigorous, and not as settled, as the trend analysis in geology, it is still useful to look into. Indeed, even the basics are already better than so-called "roll-ups", which preserve none of the visual distinction of the data, flattening peaks and filling in valleys while keeping a "standard deviation" score that is not as helpful as is usually hoped for.
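
To make the roll-up criticism concrete, here is a minimal illustration. It is not one of the EKG compression techniques the post refers to; it just compares a mean roll-up with keeping each bucket's minimum and maximum, on a made-up latency series with one short spike. The function names and data are hypothetical.

```python
def rollup_mean(samples, bucket):
    """Average each bucket of `bucket` samples: a typical "roll-up"."""
    chunks = [samples[i:i + bucket] for i in range(0, len(samples), bucket)]
    return [sum(chunk) / len(chunk) for chunk in chunks]


def rollup_min_max(samples, bucket):
    """Keep (min, max) per bucket, so spikes and dips survive downsampling."""
    chunks = [samples[i:i + bucket] for i in range(0, len(samples), bucket)]
    return [(min(chunk), max(chunk)) for chunk in chunks]


# Hypothetical latency series (ms): flat, with one short spike.
latency = [10, 10, 10, 10, 10, 10, 10, 90, 10, 10, 10, 10]

print(rollup_mean(latency, 4))     # [10.0, 30.0, 10.0]
print(rollup_min_max(latency, 4))  # [(10, 10), (10, 90), (10, 10)]
```

The mean roll-up reports a 30 ms peak for what was a 90 ms spike, while the min/max version keeps the spike visible at the cost of storing two numbers per bucket.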

25 Aug 2016 3:50am GMT