22 Aug 2016

Planet Twisted

Hynek Schlawack: Better Python Object Serialization

The Python standard library is full of underappreciated gems. One of them allows for simple and elegant function dispatching based on argument types. This makes it perfect for serialization of arbitrary objects - for example to JSON in web APIs and structured logs.
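The standard-library gem being alluded to is functools.singledispatch; here is a minimal sketch of how it might drive serialization (the to_serializable name and the datetime handler are purely illustrative):

from datetime import datetime
from functools import singledispatch

@singledispatch
def to_serializable(val):
    """Fallback: represent unknown objects as strings."""
    return str(val)

@to_serializable.register(datetime)
def _(val):
    # ISO 8601 is a convenient JSON-friendly representation for timestamps.
    return val.isoformat()

# json.dumps(obj, default=to_serializable) then handles arbitrary objects.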

22 Aug 2016 12:30pm GMT

20 Aug 2016

Planet Twisted

Moshe Zadka: Extension API: An exercise in a negative case study

I was idly contemplating implementing a new Jupyter kernel. Luckily, they try to provide a facility to make it easier. Unfortunately, they made a number of suboptimal choices in their API. Fortunately, those mistakes are both common and easily avoidable.

Subclassing as API

They suggest subclassing IPython.kernel.zmq.kernelbase.Kernel. Errr…not "suggest". It is a "required step". The reason is probably that this class already implements 21 methods. When you subclass, make sure to not use any of these names, or things will break randomly. If you do not want to subclass, good luck figuring out what assumptions the system makes about these 21 methods, because there is no interface or even prose documentation.

The return statement in their example is particularly illuminating:

        return {'status': 'ok',
                # The base class increments the execution count
                'execution_count': self.execution_count,
                'payload': [],
                'user_expressions': {},
               }

Note the comment "base class increments the execution count". This is a classic code smell: it seems like this would be needed in every single overrider, which means it really belongs in the helper class, not in every kernel.
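As a purely hypothetical sketch (the class and method names below are illustrative, not Jupyter's actual API), the helper class could do that bookkeeping itself and merge in the boilerplate, so each kernel only returns what is specific to it:

class KernelBase(object):
    """Hypothetical helper: the bookkeeping lives here, not in every kernel."""

    def __init__(self):
        self.execution_count = 0

    def execute(self, code, **kwargs):
        self.execution_count += 1
        reply = self.do_execute(code, **kwargs) or {}
        # Merge the subclass's answer over sensible defaults.
        result = {'status': 'ok',
                  'execution_count': self.execution_count,
                  'payload': [],
                  'user_expressions': {}}
        result.update(reply)
        return result

    def do_execute(self, code, **kwargs):
        raise NotImplementedError("kernels override this")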

None

The signature for the example do_execute is:

    def do_execute(self, code, silent, store_history=True, 
                   user_expressions=None,
                   allow_stdin=False):

Of course, this means that user_expressions will sometimes be a dictionary and sometimes None. It is likely that the code will be written to anticipate one or the other, and will fail in interesting ways if None is actually sent.
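If the signature stays as it is, the defensive move is to normalize at the boundary; a minimal sketch, reusing the example signature above (the reply shape simply mirrors the earlier example):

def do_execute(self, code, silent, store_history=True,
               user_expressions=None, allow_stdin=False):
    # Normalize once, so the rest of the method can rely on a real dict.
    if user_expressions is None:
        user_expressions = {}
    return {'status': 'ok',
            'execution_count': self.execution_count,
            'payload': [],
            'user_expressions': user_expressions}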

Optional Overrides

As described in this section, there are also ways to make the kernel better with optional overrides. The convention used, which is nowhere explained, is that do_ methods are the ones you should override to make a better kernel. Nowhere is it explained why there is no default history implementation, or where to get one, or why a simple stupid implementation is wrong.

Dictionaries

All overrides return dictionaries, which get serialized directly into the underlying communication platform. This is a poor abstraction, especially when the documentation is direct links to the underlying protocol. When wrapping a protocol, it is much nicer to use an Interface as the documentation of what is assumed - and define an attr.s-based class to allow returning something which is automatically the correct type, and will fail in nice ways if a parameter is forgotten.
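A rough sketch of that suggestion, using zope.interface and attrs (the names here are invented for illustration, not part of Jupyter's API):

import attr
from zope.interface import Interface, implementer

class IExecuteReply(Interface):
    """Documents what the messaging layer assumes about an execute reply."""

@implementer(IExecuteReply)
@attr.s
class ExecuteReply(object):
    status = attr.ib()
    execution_count = attr.ib()
    payload = attr.ib(default=attr.Factory(list))
    user_expressions = attr.ib(default=attr.Factory(dict))

# Forgetting a required field fails loudly at construction time:
# ExecuteReply(status='ok') raises TypeError instead of sending a bad dict.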

Summary

If you are providing an API, here are a few positive lessons based on the issues above:

20 Aug 2016 6:56pm GMT

19 Aug 2016

Planet Twisted

Twisted Matrix Laboratories: Twisted 16.3.2 Released

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 16.3.2.

This is a bug fix & security fix release, and is recommended for all users of Twisted. The fixes are:

For more information, check the NEWS file (link provided below).

You can find the downloads on PyPI (or alternatively our website). The NEWS file is also available on GitHub.

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,
Amber Brown (HawkOwl)

19 Aug 2016 9:45am GMT

18 Aug 2016

Planet Twisted

Jonathan Lange: Patterns are half-formed code

If "technology is stuff that doesn't work yet"[1], then patterns are code we don't know how to write yet.

In the Go Programming Language, the authors show how to iterate over elements in a map, sorted by keys:

To enumerate the key/value pairs in order, we must sort the keys explicitly, for instance, using the Strings function from the sort package if the keys are strings. This is a common pattern.

- The Go Programming Language, Alan A. A. Donovan & Brian W. Kernighan, p. 94

The pattern is illustrated by the following code:

import "sort"

var names []string
for name := range ages {
    names = append(names, name)
}
sort.Strings(names)
for _, name := range names {
    fmt.Printf("%s\t%d\n", name, ages[name])
}

Peter Norvig calls this an informal design pattern: something referred to by name ("iterate through items in a map in order of keys") and re-implemented from scratch each time it's needed.

Informal patterns have their place but they are a larval form of knowledge, stuck halfway between intuition and formal understanding. When we recognize a pattern, our next step should always be to ask, "can we make it go away?"

Patterns are one way of expressing "how to" knowledge [2] but we have another, better way: code. Source code is a formal expression of "how to" knowledge that we can execute, test, manipulate, verify, compose, and re-use. Encoding "how to" knowledge is largely what programming is [3]. We talk about replacing people with programs precisely because we take the knowledge about how to do their job and encode it such that even a machine can understand it.

So how can we encode the knowledge of iterating through the items in a map in order of keys? How can we replace this pattern with code?

We can start by following Peter Norvig's example and reach for a dynamic programming language, such as Python:

names = []
for name in ages:
    names.append(name)
names.sort()
for name in names:
    print("{}\t{}".format(name, ages[name]))

This is a very literal translation of the first snippet. A more idiomatic approach would look like:

names = sorted(ages.keys())
for name in names:
    print("{}\t{}".format(name, ages[name])

To turn this into a formal pattern, we need to extract a function that takes a map and returns a list of pairs of (key, value) in sorted order, like so:

def sorted_items(d):
    result = []
    sorted_keys = sorted(d.keys())
    for k in sorted_keys:
        result.append((k, d[k]))
    return result

for name, age in sorted_items(ages):
    print("{}\t{}".format(name, age))

The pattern has become a function. Instead of a name or a description, it has an identifier, a True Name that gives us power over the thing. When we invoke it we don't need to comment our code to indicate that we are using a pattern because the name sorted_items makes it clear. If we choose, we can test it, optimize it, or perhaps even prove its correctness.

If we figure out a better way of doing it, such as:

def sorted_items(d):
    return [(k, d[k]) for k in sorted(d.keys())]

Then we only have to change one place.

And if we are willing to tolerate a slight change in behavior,

def sorted_items(d):
    return sorted(d.items())

Then we might not need the function at all.

It was being able to write code like this that drew me towards Python and away from Java, way back in 2001. It wasn't just that I could get more done in fewer lines (although that helped); it was that I could write what I meant.

Of course, these days I'd much rather write:

import Data.List (sort)
import qualified Data.HashMap as Map

sortedItems :: (Ord k, Ord v) => Map.Map k v -> [(k, v)]
sortedItems d = sort (Map.toList d)

But that's another story.

[1] Bran Ferren, via Douglas Adams
[2] Patterns can also contain "when to", "why to", "why not to", and "how much" knowledge, but they _always_ contain "how to" knowledge.
[3] The excellent SICP lectures open with the insight that what we call "computer science" might be the very beginning of a science of "how to" knowledge.

18 Aug 2016 5:00pm GMT

Itamar Turner-Trauring: Less stress, more productivity: why working fewer hours is better for you and your employer

There's always too much work to be done on software projects, too many features to implement, too many bugs to fix. Some days you're just not going through the backlog fast enough, you're not producing enough code, and it's taking too long to fix a seemingly-impossible bug. And to make things worse you're wasting time in pointless meetings instead of getting work done.

Once it gets bad enough you can find yourself always scrambling, working overtime just to keep up. Pretty soon it's just expected, and you need to be available to answer emails at all hours even when there are no emergencies. You're tired and burnt out and there's still just as much work as before.

The real solution is not working even harder or even longer, but rather the complete opposite: working fewer hours.

Some caveats first:

Fewer hours, more productivity

Why does working longer hours not improve the situation? Because working longer makes you less productive at the same time that it encourages bad practices by your boss. Working fewer hours does the opposite.

1. A shorter work-week improves your ability to focus

As I've discussed before, working while tired is counter-productive. It takes longer and longer to solve problems, and you very quickly hit the point of diminishing returns. And working consistently for long hours is even worse for your mental focus, since you will quickly burn out.

Long hours: "It's 5 o'clock and I should be done with work, but I just need to finish this problem, just one more try," you tell yourself. But being tired it actually takes you another three hours to solve. The next day you go to work tired and unfocused.

Shorter hours: "It's 5 o'clock and I wish I had this fixed, but I guess I'll try tomorrow morning." The next morning, refreshed, you solve the problem in 10 minutes.

2. A shorter work-week promotes smarter solutions

Working longer hours encourages bad programming habits: you start thinking that the way to solve problems is just forcing yourself to get through the work. But programming is all about automation, about building abstractions to reduce work. Often you can get huge reductions in effort by figuring out a better way to implement an API, or by realizing that a particular piece of functionality is not actually necessary.

Let's imagine your boss hands you a task that must ship to your customer in 2 weeks. And you estimate that optimistically it will take you 3 weeks to implement.

Long hours: "This needs to ship in two weeks, but I think it's 120 hours to complete... so I guess I'm working evenings and weekends again." You end up even more burnt out, and probably the feature will still ship late.

Shorter hours: "I've got two weeks, but this is way too much work. What can I do to reduce the scope? Guess I'll spend a couple hours thinking about it."

And soon: "Oh, if I do this restructuring I can get 80% of the feature done in one week, and that'll probably keep the customer happy until I finish the rest. And even if I underestimated I've still got the second week to get that part done."

3. A shorter work-week discourages bad management practices

If your response to any issue is to work longer hours you are encouraging bad management practices. You are effectively telling your manager that your time is not valuable, and that they need not prioritize accordingly.

Long hours: If your manager isn't sure whether you should go to a meeting, they might tell themselves that "it might waste an hour of time, but they'll just work an extra hour in the evening to make it up." If your manager can't decide between two features, they'll just hand you both instead of making a hard decision.

Shorter hours: With shorter hours your time becomes more scarce and valuable. If your manager is at all reasonable less important meetings will get skipped and more important features will be prioritized.

Getting to fewer hours

A short work-week means different things to different people. One programmer I know made it clear when she started a job at a startup that she worked 40-45 hours a week and that's it. Everyone else worked much longer hours, but that was her personal limit. Personally I have negotiated a 35-hour work week.

Whatever the number that makes sense to you, the key is to clearly explain your limits and then stick to them. Tell your manager "I am going to be working a 40-hour work week, unless it's a real emergency." Once you've explained your limits you need to stick to them: no answering emails after hours, no agreeing to do just one little thing on the weekend.

And then you need to prove yourself by still being productive, and making sure that when you are working you are working. Spending a couple of hours a day at work watching cat videos probably won't go over well when you're also working shorter hours.

There are companies where this won't fly, of course, where management is so bad or norms are so out of whack that even a 40-hour work week by a productive team member won't be acceptable. In those cases you need to look for a new job, and as part of the interview figure out the work culture and project management practices of prospective employers. Do people work short hours or long hours? Is everything always on fire or do projects get delivered on time?

Whether you're negotiating your hours at your existing job or at a new job, you'll do better the more experienced and skilled a programmer you are. These days I have enough provable skills that I can do OK at negotiating, but it's taken learning from a lot of mistakes to get there. If you want to improve your skills and learn faster than I did, head over to Software Clown where you can hear about my past mistakes and how you can avoid them.

18 Aug 2016 4:00am GMT

17 Aug 2016

Planet Twisted

Glyph Lefkowitz

Probably best to get this out of the way before this weekend:

If I meet you at a technical conference, you'll probably see me extend my elbow in your direction, rather than my hand. This is because I won't shake your hand at a conference.

People sometimes joke about "con crud", but the amount of lost productivity and human misery generated by conference-transmitted sickness is not funny. Personally, by the time the year is out, I will most likely have attended 5 conferences. This means that if I get sick at each one, I will spend more than a month out of the year out of commission being sick.

When I tell people this, they think I'm a germophobe. But, in all likelihood, I won't be the one getting sick. I already have 10 years of building up herd immunity to the set of minor ailments that afflict the international Python-conference-attending community. It's true that I don't particularly want to get sick myself, but I happily shake people's hands in more moderately-sized social gatherings. I've had a cold before and I'll have one again; I have no illusion that ritually dousing myself in Purell every day will make me immune to all disease.

I'm not shaking your hand because I don't want you to get sick. Please don't be weird about it!

17 Aug 2016 6:42pm GMT

14 Aug 2016

Planet Twisted

Glyph Lefkowitz: A Container Is A Function Call

It seems to me that the prevailing mental model among users of container technology1 right now is that a container is a tiny little virtual machine. It's like a machine in the sense that it is provisioned and deprovisioned by explicit decisions, and we talk about "booting" containers. We configure it sort of like we configure a machine; dropping a bunch of files into a volume, setting some environment variables.

In my mind though, a container is something fundamentally different than a VM. Rather than coming from the perspective of "let's take a VM and make it smaller so we can do cool stuff" - get rid of the kernel, get rid of fixed memory allocations, get rid of emulated memory access and instructions, so we can provision more of them at higher density... I'm coming at it from the opposite direction.

For me, containers are "let's take a program and make it bigger so we can do cool stuff". Let's add in the whole user-space filesystem so it's got all the same bits every time, so we don't need to worry about library management, so we can ship it around from computer to computer as a self-contained unit. Awesome!

Of course, there are other ecosystems that figured this out a really long time ago, but having it as a commodity within the most popular server deployment environment has changed things.

Of course, an individual container isn't a whole program. That's why we need tools like compose to put containers together into a functioning whole. This makes a container not just a program, but rather, a part of a program. And of course, we all know what the smaller parts of a program are called:

Functions.2

A container of course is not the function itself; the image is the function. A container itself is a function call.

Perceived through this lens, it becomes apparent that Docker is missing some pretty important information. As a tiny VM, it has all the parts you need: it has an operating system (in the docker build), the ability to boot and reboot (docker run), instrumentation (docker inspect), debugging (docker exec), and so on. As a really big function, it's strangely anemic.

Specifically: in every programming language worth its salt, we have a type system; some mechanism to identify what parameters a function will take, and what return value it will have.

You might find this weird coming from a Python person, a language where

def foo(a, b, c):
    return a.x(c.d(b))

is considered an acceptable level of type documentation by some3; there's no requirement to say what a, b, and c are. However, just because the type system is implicit, that doesn't mean it's not there, even in the text of the program. Let's consider, from reading this tiny example, what we can discover:

And so on, and so on. At runtime each of these parameters takes on a specific, concrete value, with a type, and if you set a breakpoint and single-step into it with a debugger, you can see each of those types very easily. Also at runtime you will get TypeError exceptions telling you exactly what was wrong with what you tried to do at a number of points, if you make a mistake.

The analogy to containers isn't exact; inputs and outputs aren't obviously in the shape of "arguments" and "return values", especially since containers tend to be long-running; but nevertheless, a container does have inputs and outputs in the form of env vars, network services, and volumes.

Let's consider the "foo" of docker, which would be the middle tier of a 3-tier web application (cribbed from a real live example):

FROM pypy:2
RUN apt-get update -ym
RUN apt-get upgrade -ym
RUN apt-get install -ym libssl-dev libffi-dev
RUN pip install virtualenv
RUN mkdir -p /code/env
RUN virtualenv /code/env
RUN pwd

COPY requirements.txt /code/requirements.txt
RUN /code/env/bin/pip install -r /code/requirements.txt
COPY main /code/main
RUN chmod a+x /code/main

VOLUME /clf
VOLUME /site
VOLUME /etc/ssl/private

ENTRYPOINT ["/code/main"]

In this file, we can only see three inputs, which are filesystem locations: /clf, /site, and /etc/ssl/private. How is this different than our Python example, a language with supposedly "no type information"?

Of course, the one way that this example is unrealistic is that I deleted all the comments explaining all of those things. Indeed, best practice these days would be to include comments in your Dockerfiles, and include example compose files in your repository, to give users some hint as to how these things all wire together.

This sort of state isn't entirely uncommon in programming languages. In fact, in this popular GitHub project you can see that large programs written in assembler in the 1960s included exactly this sort of documentation convention: huge front-matter comments in English prose.

That is the current state of the container ecosystem. We are at the "late '60s assembly language" stage of orchestration development. It would be a huge technological leap forward to be able to communicate our intent structurally.


When you're building an image, you're building it for a particular purpose. You already pretty much know what you're trying to do and what you're going to need to do it.

  1. When instantiated, the image is going to consume network services. This is not just a matter of hostnames and TCP ports; those services need to be providing a specific service, over a specific protocol. A generic reverse proxy might be able to handle an arbitrary HTTP endpoint, but an API client needs that specific API. A database admin tool might be OK with just "it's a database" but an application needs a particular schema.
  2. It's going to consume environment variables. But not just any variables; the variables have to be in a particular format.
  3. It's going to consume volumes. The volumes need to contain data in a particular format, readable and writable by a particular UID.
  4. It's also going to produce all of these things; it may listen on a network service port, provision a database schema, or emit some text that needs to be fed back into an environment variable elsewhere.

Here's a brief sketch of what I want to see in a Dockerfile to allow me to express this sort of thing:

FROM ...
RUN ...

LISTENS ON: TCP:80 FOR: org.ietf.http/com.example.my-application-api
CONNECTS TO: pgwritemaster.internal ON: TCP:5432 FOR: org.postgresql.db/com.example.my-app-schema
CONNECTS TO: {{ETCD_HOST}} ON: TCP:{{ETCD_PORT}} FOR: com.coreos.etcd/client-communication
ENVIRONMENT NEEDS: ETCD_HOST FORMAT: HOST(com.coreos.etcd/client-communication)
ENVIRONMENT NEEDS: ETCD_PORT FORMAT: PORT(com.coreos.etcd/client-communication)
VOLUME AT: /logs FORMAT: org.w3.clf REQUIRES: WRITE UID: 4321

An image thusly built would refuse to run unless:

There are probably a lot of flaws in the specific syntax here, but I hope you can see past that, to the broader point that the software inside a container has precise expectations of its environment, and that we presently have no way of communicating those expectations beyond writing a Melvilleian essay in each Dockerfile's comments, beseeching those who would run the image to give it what it needs.


Why bother with this sort of work, if all the image can do with it is "refuse to run"?

First and foremost, today, the image effectively won't run. Oh, it'll start up, and it'll consume some resources, but it will break when you try to do anything with it. What this metadata will allow the container runtime to do is to tell you why the image didn't run, and give you specific, actionable, fast feedback about what you need to do in order to fix the problem. You won't have to go groveling through logs, which is always especially hard if the back-end service you forgot to properly connect to was the log aggregation service. So this will be an order of magnitude speed improvement on initial deployments and development-environment setups for utility containers. Whole applications typically already come with a compose file, of course, but ideally applications would be built out of functioning self-contained pieces and not assembled one custom container at a time.

Secondly, if there were a strong tooling standard for providing this metadata within the image itself, it might become possible for infrastructure service providers (like, ahem, my employer) to automatically detect and satisfy service dependencies. Right now, if you have a database as a service that lives outside the container system in production, but within the container system in development and test, there's no way for the orchestration layer to say "good news, everyone! you can find the database you need here: ...".

My main interest is in allowing open source software developers to give service operators exactly what they need, so the upstream developers can get useful bug reports. There's a constant tension where volunteer software developers find themselves fielding bug reports where someone deployed their code in a weird way, hacked it up to support some strange environment, built a derived container that had all kinds of extra junk in it to support service discovery or logging or somesuch, and so they don't want to deal with the support load that that generates. Both people in that exchange are behaving reasonably. The developers gave the ops folks a container that runs their software to the best of their abilities. The service vendors made the minimal modifications they needed to have the container become a part of their service fabric. Yet we arrive at a scenario where nobody feels responsible for the resulting artifact.

If we could just say what it is that the container needs in order to really work, in a way which was precise and machine-readable, then it would be clear where the responsibility lies. Service providers could just run the container unmodified, and they'd know very clearly whether or not they'd satisfied its runtime requirements. Open source developers - or even commercial service vendors! - could say very clearly what they expected to be passed in, and when they got bug reports, they'd know exactly how their service should have behaved.


  1. which mostly but not entirely just means "docker"; it's weird, of course, because there are pieces that docker depends on and tools that build upon docker which are part of this, but docker remains the nexus.

  2. Yes yes, I know that they're not really functions, Tristan, they're subroutines, but that's the word people use for "subroutines" nowadays.

  3. Just to be clear: no it isn't. Write a damn docstring, or at least some type annotations.

14 Aug 2016 10:22pm GMT

Glyph Lefkowitz: Python Packaging Is Good Now

Okay folks. Time's up. It's too late to say that Python's packaging ecosystem is terrible any more. I'm calling it.

Python packaging is not bad any more. If you're a developer, and you're trying to create or consume Python libraries, it can be a tractable, even pleasant experience.

I need to say this, because for a long time, Python's packaging toolchain was … problematic. It isn't any more, but a lot of people still seem to think that it is, so it's time to set the record straight.

If you're not familiar with the history it went something like this:

The Dawn

Python first shipped in an era when adding a dependency meant a veritable Odyssey into cyberspace. First, you'd wait until nobody in your whole family was using the phone line. Then you'd dial your ISP. Once you'd finished fighting your SLIP or PPP client, you'd ask a netnews group if anyone knew of a good gopher site to find a library that could solve your problem. Once you were done with that task, you'd sign off the Internet for the night, and wait about 48 hours to see if anyone responded. If you were lucky enough to get a reply, you'd set up a download at the end of your night's web-surfing.

pip search it wasn't.

For the time, Python's approach to dependency-handling was incredibly forward-looking. The import statement, and the pluggable module import system, made it easy to get dependencies from wherever made sense.

In Python 2.0 [1], Distutils was introduced. This let Python developers describe their collections of modules abstractly, and added tool support for producing redistributable collections of modules and packages. Again, this was tremendously forward-looking, if somewhat primitive; there was very little to compare it to at the time.

Fast-forwarding to 2004, setuptools was created to address some of the increasingly-common tasks that open source software maintainers were facing with distributing their modules over the internet. In 2005, it added easy_install, in order to provide a tool to automate resolving dependencies and downloading them into the right locations.

The Dark Age

Unfortunately, in addition to providing basic utilities for expressing dependencies, setuptools also dragged in a tremendous amount of complexity. Its author felt that import should do something slightly different than what it does, so installing setuptools changed it. The main difference between normal import and setuptools import was that it facilitated having multiple different versions of the same library in the same program at the same time. It turns out that that's a dumb idea, but in fairness, it wasn't entirely clear at the time, and it is certainly useful (and necessary!) to be able to have multiple versions of a library installed onto a computer at the same time.

In addition to these idiosyncratic departures from standard Python semantics, setuptools suffered from being unmaintained. It became a critical part of the Python ecosystem at the same time as the author was moving on to other projects entirely outside of programming. No-one could agree on who the new maintainers should be for a long period of time. The project was forked, and many operating systems' packaging toolchains calcified around a buggy, ancient version.

From 2008 to 2012 or so, Python packaging was a total mess. It was painful to use. It was not clear which libraries or tools to use, which ones were worth investing in or learning. Doing things the simple way was too tedious, and doing things the automated way involved lots of poorly-documented workarounds and inscrutable failure modes.

This is to say nothing of the fact that there were critical security flaws in various parts of this toolchain. There was no practical way to package and upload Python packages in such a way that users didn't need a full compiler toolchain for their platform.

To make matters worse for the popular perception of Python's packaging prowess2, at this same time, newer languages and environments were getting a lot of buzz, ones that had packaging built in at the very beginning and had a much better binary distribution story. These environments learned lessons from the screw-ups of Python and Perl, and really got a lot of things right from the start.

Finally, the Python Package Index, the site which hosts all the open source packages uploaded by the Python community, was basically a proof-of-concept that went live way too early, had almost no operational resources, and was offline all the dang time.

Things were looking pretty bad for Python.


Intermission

Here is where we get to the point of this post - this is where popular opinion about Python packaging is stuck. Outdated information from this period abounds. Blog posts complaining about problems score high in web searches. Those who used Python during this time, but have now moved on to some other language, frequently scoff and dismiss Python as impossible to package, its packaging ecosystem as broken, PyPI as down all the time, and so on. Worst of all, bad advice about workarounds which are no longer necessary is still easy to find, which causes users to pre-emptively break their environments where they really don't need to.


From The Ashes

In the midst of all this brokenness, there were some who were heroically, quietly, slowly fixing the mess, one gnarly bug-report at a time. pip was started, and its various maintainers fixed much of easy_install's overcomplexity and many of its flaws. Donald Stufft stepped in both on Pip and PyPI and improved the availability of the systems it depended upon, as well as some pretty serious vulnerabilities in the tool itself. Daniel Holth wrote a PEP for the wheel format, which allows for binary redistribution of libraries. In other words, it lets authors of packages which need a C compiler to build give their users a way to not have one.

In 2013, setuptools and distribute un-forked, providing a path forward for operating system vendors to start updating their installations and allowing users to use something modern.

Python Core started distributing the ensurepip module along with both Python 2.7 and 3.3, allowing any user with a recent Python installed to quickly bootstrap into a sensible Python development environment with a one-liner.

A New Renaissance

I won't give you a full run-down of the state of the packaging art. There's already a website for that. I will, however, give you a précis of how much easier it is to get started nowadays. Today, if you want to get a sensible, up-to-date python development environment, without administrative privileges, all you have to do is:

$ python -m ensurepip --user
$ python -m pip install --user --upgrade pip
$ python -m pip install --user --upgrade virtualenv

Then, for each project you want to do, make a new virtualenv:

$ python -m virtualenv lets-go
$ . ./lets-go/bin/activate
(lets-go) $ _

From here on out, now the world is your oyster; you can pip install to your heart's content, and you probably won't even need to compile any C for most packages. These instructions don't depend on Python version, either: as long as it's up-to-date, the same steps work on Python 2, Python 3, PyPy and even Jython. In fact, often the ensurepip step isn't even necessary since pip comes preinstalled. Running it if it's unnecessary is harmless, even!

Other, more advanced packaging operations are much simpler than they used to be, too.

Okay that last one's not as obvious as it ought to be but they did at least make it freely available!

Importantly, PyPI will almost certainly be online. Not only that, but a new, revamped site will be "launching" any day now3.

Again, this isn't a comprehensive resource; I just want to give you an idea of what's possible. But, as a deeply experienced Python expert, I used to swear at these tools six times a day for years; the most serious Python packaging issue I've had this year to date was fixed by cleaning up my git repo to delete a cache file.

Work Still To Do

While the current situation is good, it's still not great.

Here are just a few of my desiderata:

I could go on. There are lots of ways that Python packaging could be better.

The Bottom Line

The real takeaway here though, is that although it's still not perfect, other languages are no longer doing appreciably better. Go is still working through a number of different options regarding dependency management and vendoring, and, like Python extensions that require C dependencies, CGo is sometimes necessary and always a problem. Node has had its own well-publicized problems with their dependency management culture and package manager. Hackage is cool and all but everything takes a literal geological epoch to compile.

As always, I'm sure none of this applies to Rust and Cargo is basically perfect, but that doesn't matter, because nobody reading this is actually using Rust.

My point is not that packaging in any of these languages is particularly bad. They're all actually doing pretty well, especially compared to the state of the general programming ecosystem a few years ago; many of them are making regular progress towards user-facing improvements.

My point is that any commentary suggesting they're meaningfully better than Python at this point is probably just out of date. Working with Python packaging is more or less fine right now. It could be better, but lots of people are working on improving it, and the structural problems that prevented those improvements from being adopted by the community in a timely manner have almost all been addressed.

Go! Make some virtualenvs! Hack some setup.pys! If it's been a while and your last experience was really miserable, I promise, it's better now.


Am I wrong? Did I screw up a detail of your favorite language? Did I forget to mention the one language environment that has a completely perfect, flawless packaging story? Do you feel the need to just yell at a stranger on the Internet about picayune details? Feel free to get in touch!


  1. released in October, 2000

  2. say that five times fast.

  3. although I'm not sure what it means to "launch" when the site is online, and running against the production data-store, and you can use it for pretty much everything...

  4. "app" meaning of course "docker container"

14 Aug 2016 9:17am GMT

Glyph Lefkowitz: What’s In A Name

Amber's excellent lightning talk on identity yesterday made me feel many feels, and reminded me of this excellent post by Patrick McKenzie about false assumptions regarding names.

While that list is helpful, it's very light on positively-framed advice, i.e. "you should" rather than "you shouldn't". So I feel like I want to give a little bit of specific, prescriptive advice to programmers who might need to deal with names.

First and foremost: stop asking for unnecessary information. If I'm just authenticating to your system to download a comic book, you do not need to know my name. Your payment provider might need a billing address, but you absolutely do not need to store my name.

Okay, okay. I understand that may make your system seem a little impersonal, and you want to be able to greet me, or maybe have a name to show to other users beyond my login ID or email address that has to be unique on the site. Fine. Here's what a good "name" field looks like:


You don't need to break my name down into parts. If you just need a way to refer to me, then let me tell you whatever the heck I want. Honorific? Maybe I have more than one; maybe I don't want you to use any.

And this brings me to "first name / last name".

In most cases, you should not use these terms. They are oversimplifications of how names work, appropriate only for children in English-speaking countries who might not understand the subtleties involved and only need to know that one name comes before the other.

The terms you're looking for are given name and surname, or perhaps family name. ("Middle name" might still be an appropriate term because that fills a more specific role.) But by using these more semantically useful terms, you include orders of magnitude more names in your normalization scheme. More importantly, by acknowledging the roles of the different parts of a name, you'll come to realize that there are many other types of name, such as:

If your application does have a legitimate need to normalize names, for example, to interoperate with third-party databases, or to fulfill some regulatory requirement:

If you're a programmer and you're balking at this complexity: good. Remember that for most systems, sticking with the first option - treating users' names as totally opaque text - is probably your best bet. You probably don't need to know the structure of the user's name for most purposes.
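As a minimal sketch of that first option (the field names are purely illustrative):

import attr

@attr.s
class User(object):
    login_id = attr.ib()      # unique on the site
    display_name = attr.ib()  # whatever the user typed; stored and shown as-is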

14 Aug 2016 12:48am GMT

12 Aug 2016

Planet Twisted

Glyph Lefkowitz: The One Python Library Everyone Needs

Do you write programs in Python? You should be using attrs.

Why, you ask? Don't ask. Just use it.

Okay, fine. Let me back up.

I love Python; it's been my primary programming language for 10+ years and despite a number of interesting developments in the interim I have no plans to switch to anything else.

But Python is not without its problems. In some cases it encourages you to do the wrong thing. Particularly, there is a deeply unfortunate proliferation of class inheritance and the God-object anti-pattern in many libraries.

One cause for this might be that Python is a highly accessible language, so less experienced programmers make mistakes that they then have to live with forever.

But I think that perhaps a more significant reason is the fact that Python sometimes punishes you for trying to do the right thing.

The "right thing" in the context of object design is to make lots of small, self-contained classes that do one thing and do it well. For example, if you notice your object is starting to accrue a lot of private methods, perhaps you should be making those "public"1 methods of a private attribute. But if it's tedious to do that, you probably won't bother.

Another place you probably should be defining an object is when you have a bag of related data that needs its relationships, invariants, and behavior explained. Python makes it soooo easy to just define a tuple or a list. The first couple of times you type host, port = ... instead of address = ... it doesn't seem like a big deal, but then soon enough you're typing [(family, socktype, proto, canonname, sockaddr)] = ... everywhere and your life is filled with regret. That is, if you're lucky. If you're not lucky, you're just maintaining code that does something like values[0][7][4][HOSTNAME]["canonical"] and your life is filled with garden-variety pain rather than the more complex and nuanced emotion of regret.


This raises the question: is it tedious to make a class in Python? Let's look at a simple data structure: a 3-dimensional cartesian coordinate. It starts off simply enough:

class Point3D(object):

So far so good. We've got a 3 dimensional point. What next?

class Point3D(object):
    def __init__(self, x, y, z):

Well, that's a bit unfortunate. I just want a holder for a little bit of data, and I've already had to override a special method from the Python runtime with an internal naming convention? Not too bad, I suppose; all programming is weird symbols after a fashion.

At least I see my attribute names in there, that makes sense.

class Point3D(object):
    def __init__(self, x, y, z):
        self.x

I already said I wanted an x, but now I have to assign it as an attribute...

class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x

... to x? Uh, obviously ...

class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

... and now I have to do that once for every attribute, so this actually scales poorly? I have to type every attribute name 3 times?!?

Oh well. At least I'm done now.

class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):

Wait what do you mean I'm not done.

class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):
        return (self.__class__.__name__ +
                ("(x={}, y={}, z={})".format(self.x, self.y, self.z)))

Oh come on. So I have to type every attribute name 5 times, if I want to be able to see what the heck this thing is when I'm debugging, which a tuple would have given me for free?!?!?

class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):
        return (self.__class__.__name__ +
                ("(x={}, y={}, z={})".format(self.x, self.y, self.z)))
    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) == (other.x, other.y, other.z)

7 times?!?!?!?

class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):
        return (self.__class__.__name__ +
                ("(x={}, y={}, z={})".format(self.x, self.y, self.z)))
    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) == (other.x, other.y, other.z)
    def __lt__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) < (other.x, other.y, other.z)

9 times?!?!?!?!?

from functools import total_ordering
@total_ordering
class Point3D(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):
        return (self.__class__.__name__ +
                ("(x={}, y={}, z={})".format(self.x, self.y, self.z)))
    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) == (other.x, other.y, other.z)
    def __lt__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.x, self.y, self.z) < (other.x, other.y, other.z)

Okay, whew - 2 more lines of code isn't great, but now at least we don't have to define all the other comparison methods. But now we're done, right?

from unittest import TestCase
class Point3DTests(TestCase):

You know what? I'm done. 20 lines of code so far and we don't even have a class that does anything; the hard part of this problem was supposed to be the quaternion solver, not "make a data structure which can be printed and compared". I'm all in on piles of undocumented garbage; tuples, lists, and dictionaries it is. Defining proper data structures well is way too hard in Python.2


namedtuple to the (not really) rescue

The standard library's answer to this conundrum is namedtuple. While a valiant first draft (it bears many similarities to my own somewhat embarrassing and antiquated entry in this genre) namedtuple is unfortunately unsalvageable. It exports a huge amount of undesirable public functionality which would be a huge compatibility nightmare to maintain, and it doesn't address half the problems that one runs into. A full enumeration of its shortcomings would be tedious, but a few of the highlights:

As to that last point, either you can use it like this:

Point3D = namedtuple('Point3D', ['x', 'y', 'z'])

in which case it doesn't look like a type in your code; simple syntax-analysis tools without special cases won't recognize it as one. You can't give it any other behaviors this way, since there's nowhere to put a method. Not to mention the fact that you had to type the class's name twice.

Alternately you can use inheritance and do this:

class Point3D(namedtuple('_Point3DBase', 'x y z'.split())):
    pass

This gives you a place you can put methods, and a docstring, and generally have it look like a class, which it is... but in return you now have a weird internal name (which, by the way, is what shows up in the repr, not the class's actual name). However, you've also silently made the attributes not listed here mutable, a strange side-effect of adding the class declaration; that is, unless you add __slots__ = 'x y z'.split() to the class body, and then we're just back to typing every attribute name twice.

And this doesn't even mention the fact that science has proven that you shouldn't use inheritance.

So, namedtuple can be an improvement if it's all you've got, but only in some cases, and it has its own weird baggage.


Enter The attr

So here's where my favorite mandatory Python library comes in.

Let's re-examine the problem above. How do I make Point3D with attrs?

import attr
@attr.s

Since this isn't built into the language, we do have to have 2 lines of boilerplate to get us started: the import and the decorator saying we're about to use it.

import attr
@attr.s
class Point3D(object):

Look, no inheritance! By using a class decorator, Point3D remains a Plain Old Python Class (albeit with some helpful double-underscore methods tacked on, as we'll see momentarily).

import attr
@attr.s
class Point3D(object):
    x = attr.ib()

It has an attribute called x.

import attr
@attr.s
class Point3D(object):
    x = attr.ib()
    y = attr.ib()
    z = attr.ib()

And one called y and one called z and we're done.

We're done? Wait. What about a nice string representation?

>>> Point3D(1, 2, 3)
Point3D(x=1, y=2, z=3)

Comparison?

>>> Point3D(1, 2, 3) == Point3D(1, 2, 3)
True
>>> Point3D(3, 2, 1) == Point3D(1, 2, 3)
False
>>> Point3D(3, 2, 3) > Point3D(1, 2, 3)
True

Okay sure but what if I want to extract the data defined in explicit attributes in a format appropriate for JSON serialization?

>>> attr.asdict(Point3D(1, 2, 3))
{'y': 2, 'x': 1, 'z': 3}

Maybe that last one was a little on the nose. But nevertheless, it's one of many things that becomes easier because attrs lets you declare the fields on your class, along with lots of potentially interesting metadata about them, and then get that metadata back out.

>>> import pprint
>>> pprint.pprint(attr.fields(Point3D))
(Attribute(name='x', default=NOTHING, validator=None, repr=True, cmp=True, hash=True, init=True, convert=None),
 Attribute(name='y', default=NOTHING, validator=None, repr=True, cmp=True, hash=True, init=True, convert=None),
 Attribute(name='z', default=NOTHING, validator=None, repr=True, cmp=True, hash=True, init=True, convert=None))

I am not going to dive into every interesting feature of attrs here; you can read the documentation for that. Plus, it's well-maintained, so new goodies show up every so often and I might miss something important. But attrs does a few key things that, once you have them, you realize that Python was sorely missing before:

  1. It lets you define types concisely, as opposed to the normally quite verbose manual def __init__.... Types without typing.
  2. It lets you say what you mean directly with a declaration rather than expressing it in a roundabout imperative recipe. Instead of "I have a type, it's called MyType, it has a constructor, in the constructor I assign the property 'A' to the parameter 'A' (and so on)", you say "I have a type, it's called MyType, it has an attribute called a", and behavior is derived from that fact, rather than having to later guess about the fact by reverse engineering it from behavior (for example, running dir on an instance, or looking at self.__class__.__dict__).
  3. It provides useful default behavior, as opposed to Python's sometimes-useful but often-backwards defaults.
  4. It adds a place for you to put a more rigorous implementation later, while starting out simple.

Let's explore that last point.

Progressive Enhancement

While I'm not going to talk about every feature, I'd be remiss if I didn't mention a few of them. As you can see from those mile-long repr()s for Attribute above, there are a number of interesting ones.

For example: you can validate attributes when they are passed into an @attr.s-ified class. Our Point3D, for example, should probably contain numbers. For simplicity's sake, we could say that that means instances of float, like so:

import attr
from attr.validators import instance_of
@attr.s
class Point3D(object):
    x = attr.ib(validator=instance_of(float))
    y = attr.ib(validator=instance_of(float))
    z = attr.ib(validator=instance_of(float))

The fact that we were using attrs means we have a place to put this extra validation: we can just add type information to each attribute as we need it. Some of these facilities let us avoid other common mistakes. For example, this is a popular "spot the bug" Python interview question:

class Bag:
    def __init__(self, contents=[]):
        self._contents = contents
    def add(self, something):
        self._contents.append(something)
    def get(self):
        return self._contents[:]

Fixing it, of course, becomes this:

class Bag:
    def __init__(self, contents=None):
        if contents is None:
            contents = []
        self._contents = contents

adding two extra lines of code.

In the original version, contents inadvertently becomes a global variable, making all Bag objects not provided with a different list share the same list. With attrs this instead becomes:

@attr.s
class Bag:
    _contents = attr.ib(default=attr.Factory(list))
    def add(self, something):
        self._contents.append(something)
    def get(self):
        return self._contents[:]

There are several other features of attrs that provide you with opportunities to make your classes both more convenient and more correct. Another great example? If you want to be strict about extraneous attributes on your objects (or more memory-efficient on CPython), you can just pass slots=True at the class level - e.g. @attr.s(slots=True) - to automatically turn your existing attrs declarations into a matching __slots__ attribute. All of these handy features allow you to make better and more powerful use of your attr.ib() declarations.
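For example, a quick sketch reusing Point3D from above:

import attr

@attr.s(slots=True)
class Point3D(object):
    x = attr.ib()
    y = attr.ib()
    z = attr.ib()

p = Point3D(1, 2, 3)
# p.w = 4 would now raise AttributeError instead of silently adding an attribute.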


The Python Of The Future

Some people are excited about eventually being able to program in Python 3 everywhere. What I'm looking forward to is being able to program in Python-with-attrs everywhere. It exerts a subtle, but positive, design influence in all the codebases I've seen it used in.

Give it a try: you may find yourself surprised at places where you'll now use a tidily explained class, where previously you might have used a sparsely-documented tuple, list, or dict, and endured the occasional confusion from co-maintainers. Now that it's so easy to have structured types that clearly point in the direction of their purpose (in their __repr__, in their __doc__, or even just in the names of their attributes), you might find you'll use a lot more of them. Your code will be better for it; I know mine has been.


  1. Scare quotes here because the attributes aren't meaningfully exposed to the caller, they're just named publicly. This pattern, getting rid of private methods entirely and having only private attributes, probably deserves its own post...

  2. And we hadn't even gotten to the really exciting stuff yet: type validation on construction, default mutable values...

12 Aug 2016 9:47am GMT

Glyph Lefkowitz

Remember that thing I said in my pycon talk about native packaging being the main thing to worry about, and single-file binaries being at best a stepping stone to that and at worst a bit of a red herring? You don't have to take it from me. From the authors of a widely-distributed command-line application that was rewritten from Python into Go specifically for easier distribution, and then rewritten back in Python:

... [the] majority of people prefer native packages so distributing precompiled binaries wasn't a big win for this type of project1 ...

I don't want to say "I told you so", but... no. Wait a second. That is exactly what I want to do. That is what I am doing.

I told you so.


  1. Marcin Kulik, '1.3 aka "And Now for Something Completely Different"', asciinema blog

12 Aug 2016 3:37am GMT

11 Aug 2016

Planet Twisted

Glyph Lefkowitz

Hello lazyweb,

I want to run some "legacy" software (Trac, specifically) on a Swarm cluster. The files that it needs to store are mostly effectively write-once (it's the attachments database) but may need to be deleted (spammers and purveyors of malware occasionally try to upload things for spamming or C&C) so while mutability is necessary, there's a very low risk of any write contention.

I can't use a networked filesystem, or any volume drivers, so no easy-mode solutions. Basically I want to be able to deploy this on any swarm cluster, so no cheating and fiddling with the host.

Is there any software that I can easily run as a daemon that runs in the background, synchronizing the directory of a data volume between N containers where N is the number of hosts in my cluster?

I found this but it strikes me as ... somehow wrong ... to use that as a critical part of my data-center infrastructure. Maybe it would actually be a good start? But in addition to not being really designed for this task, it's also not open source, which makes me a little nervous. This list, or even this smaller one is huge and bewildering. So I was hoping you could drop me a line if you've got an idea what I could use for this.

11 Aug 2016 5:28am GMT

08 Aug 2016

Planet Twisted

Glyph Lefkowitz

I like keeping a comprehensive and accurate addressbook that includes all past email addresses for my contacts, including those which are no longer valid. I do this because I want to be able to see conversations stretching back over the years as originating from that person.

Unfortunately this causes problems when sending mail sometimes. On macOS, at least as of El Capitan, neither the Mail application nor the Contacts application has any mechanism for indicating preference-order of email addresses that I've been able to find. Compounding this annoyance, when completing a recipient's address based on their name, it displays all email addresses for a contact without showing their label, which means even if I label one "preferred" or "USE THIS ONE NOW", or "zzz don't use this hasn't worked since 2005", I can't tell when I'm sending a message.

But it seems as though it defaults to sending messages to the most recent outgoing address for that contact that it can see in an email. For people I send email to regularly this is easy enough. For people who I'm aware have changed their email address, but where I don't actually want to send them a message, I think I figured out a little hack that makes it work: make a new folder called "Preferred Addresses Hack" (or something suitable), compose a new message addressed to the correct address, then drag the message out of drafts into the folder; since it has a recent date and is addressed to the right person, Mail.app will index it and auto-complete the correct address in the future.

However, since the previous behavior appeared somewhat non-deterministic, I might be tricking myself into believing that this hack worked. If you can confirm it, I'd appreciate it if you would let me know.

08 Aug 2016 11:24pm GMT

Itamar Turner-Trauring: Why living below your means can help you find a better job

Sooner or later you're going to have to find a new job. Maybe you'll decide you can't take another day of the crap your boss is putting you through. Maybe you'll get laid off, or maybe you'll just decide to move to a new city.

Whatever the reason, when you're looking for a new job one of the keys to success is the ability to be choosy. If you need a job right now to pay your bills, that means you've got no negotiating leverage. And there's no reason to think the first job offer you get will be the best job for you; maybe it'll be the second offer, or the tenth.

It's true that having in-demand skills or being great at interviews will help you get more and better offers. But interviewing for jobs always takes time... and if you can't afford to go without income for long then you won't be able to pick and choose between offers.

How do you make sure you don't need to find a new job immediately and that you can be choosy about which job you take? By living below your means.

Saving for a rainy day

You're going to have to find a new job someday, but you should start preparing right now. By the time you realize you need a new job it may be too late. How do you prepare? By saving money in an easily accessible form: an emergency stash that will pay for your expenses while you have no income. Cash in the bank is a fine way to do this, since the goal is not to maximize returns but to have money available when you need it.

Let's say you need to spend 100 Gold Pieces a month to sustain your current lifestyle. And you decide you want 4 months of expenses saved just in case you lose your job during a recession, when jobs will take longer to find. That means you need 400GP savings in the bank.

If your expenses are 100GP and you have no savings, that suggests your take-home pay is also 100GP (if you're spending more than you make, better fix that first!). Increasing pay is hard to do quickly, so you need to reduce your spending temporarily until you have those 400GP saved. At that point you can go back to spending your full paycheck with the knowledge that you have money saved for a rainy day.
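
For concreteness, here's the same arithmetic as a few lines of Python. The only numbers taken from the example are the 100GP paycheck and the 400GP target; the 75GP temporary spending level is invented purely for illustration:

    # Emergency fund target: 4 months of expenses at 100GP/month.
    monthly_expenses = 100            # GP
    target = 4 * monthly_expenses     # 400 GP

    # Suppose (just for this example) you temporarily cut spending to
    # 75GP a month on a 100GP paycheck: you save 25GP/month, so hitting
    # the target takes 400 / 25 = 16 months.
    take_home = 100                   # GP
    reduced_spending = 75             # GP, temporary
    months_to_target = target / (take_home - reduced_spending)
    print(months_to_target)           # 16.0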

But you can do better.

Living below your means

If you're always spending less than your income you get a double benefit: you're saving more, and it takes you longer to exhaust your savings. To see this we can look at two scenarios: one where you permanently reduce your spending to 80GP a month, and another where you permanently reduce it to 50GP a month.

Scenario 1: You have 100GP take-home pay, 80GP expenses. You're saving 20GP a month, so it will take you 20 months to save 400GP. Since your monthly expenses are 80GP this means you can now go 400/80 = 5 months without any pay.

Scenario 2: You have 100GP take-home pay, 50GP expenses. You're saving 50GP a month, so it will take you 8 months to save 400GP. Since your monthly expenses are 50GP this means you can now go 400/50 = 8 months without any pay.

As you can see, the lower your expenses the better off you are. At 80GP/month expenses it takes you 20 months to save 5 months' worth of expenses. At 50GP/month expenses it only takes you 8 months to save 8 months' worth of expenses. Reducing your expenses allows you to save faster and makes your money last longer!
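
The two scenarios are easy to check with a small Python sketch; the function names months_to_save and runway_months are made up here for illustration and don't come from any library:

    def months_to_save(take_home, expenses, target):
        # Months of saving (take_home - expenses) per month to reach target.
        return target / (take_home - expenses)

    def runway_months(savings, expenses):
        # Months the savings last with no income at all.
        return savings / expenses

    target = 400  # GP
    for expenses in (80, 50):
        print(expenses, months_to_save(100, expenses, target),
              runway_months(target, expenses))
    # 80 20.0 5.0
    # 50 8.0 8.0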

The longer your money lasts the more leverage you have during a job search: you can walk away from a bad job offer and have an easier time negotiating a better offer. And you also have the option of taking a lower-paid job if that job is attractive enough in other ways. Finally, a permanently reduced cost of living also means that over time you are less and less reliant on your job as a source of income.

Reduce your expenses today!

To prepare for the day when you need to look for a job you should reduce expenses temporarily until you've saved enough to pay for a few months' living expenses. Once you've done that you'll have a sense of whether that lower level of expenses works for you; there's a pretty good chance you'll be just as happy at 90GP or 80GP as at 100GP.

In that case you should permanently reduce your expenses rather than going back to 100GP a month. You'll have more money in the bank, you'll have money you can invest for the long term, and the money you have saved will last you longer when you eventually have to look for a new job.

08 Aug 2016 4:00am GMT

03 Aug 2016

feedPlanet Twisted

Itamar Turner-Trauring: Why lack of confidence can make you a better programmer

What if you're not a superstar programmer? What if other people work longer than you, or harder than you, or have built better projects than you? Can you succeed without self-confidence? I believe you can, and moreover I feel that lack of confidence can actually make you a better programmer.

The problem with self-confidence

It's easy to think that self-confidence is far more useful than lack of confidence. Self-confidence will get you to try new things and can convince others of your worth. Self-confidence seems self-evidently worthwhile: if it isn't worthwhile, why is that self-confident person so confident?

But in fact unrealistic self-confidence can be quite harmful. Software is usually far more complex than we believe it to be, far harder to get right than we think. And if we're self-confident we may think our software works even when it doesn't.

When I was younger I suffered from the problem of too much self-confidence. I wrote software that didn't work, or was built badly... and I only realized its flaws after the fact, when it was too late. Software that crashed at 4AM, patches to open source software that never worked in the first place, failed projects I learned nothing from (at the time)... it's a long list.

I finally became a better programmer when I learned to doubt myself, to doubt the software that I wrote.

Why good programmers lack confidence in themselves

Why does doubting yourself, lacking confidence in yourself, make you a better programmer?

When we write software we're pretty much always operating beyond the point of complexity where we can fit the whole program in our mind. That means you always have to doubt your ability to catch all the problems, to come up with the best solutions.

And so we get peer feedback, write tests, and get code reviews, all in the hopes of overcoming our inevitable failures:

  1. "My design isn't great," you think. So you talk it over with a colleague and together you come up with an even better idea.
  2. "My code might have bugs," you say. So you write tests, and catch any bugs and prevent future bugs.
  3. "My code might still have UI problems," you imagine. So you manually test your web application and check all the edge cases.
  4. "I might have forgotten something," you consider. So you get a code review, and get suggestions for improving your code.

These techniques also have the great benefit of teaching you to be a better programmer, increasing the complexity you can work with. You'll still need tests and code reviews and all the rest; our capacity for understanding is always finite and always stretched.

You can suffer from too much lack of confidence: you should not judge yourself harshly. But we're all human, and therefore all fallible. If you want your creations to succeed you should embrace lack of confidence: test your code, look for things you've missed, get help from others. And if you head over to Software Clown you can discover the many mistakes caused by my younger self's over-confidence, and the lessons you can learn from those mistakes.

03 Aug 2016 4:00am GMT
