20 Nov 2017

Planet Twisted

Hynek Schlawack: Python Hashes and Equality

Most Python programmers don't spend a lot of time thinking about how equality and hashing work. It usually just works. However, there are quite a few gotchas and edge cases that can lead to subtle and frustrating bugs once one starts to customize their behavior, especially if the rules on how they interact aren't understood.
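As a minimal illustration of one such gotcha (this Letter class is an invented example, not taken from the article): in Python 3, defining __eq__ without also defining __hash__ silently sets __hash__ to None, so instances can no longer go into sets or serve as dict keys, and any fix has to keep __hash__ consistent with __eq__.

class Letter:
    def __init__(self, char):
        self.char = char

    def __eq__(self, other):
        return isinstance(other, Letter) and self.char == other.char

# Defining __eq__ alone disables hashing in Python 3:
try:
    {Letter("a")}
except TypeError as error:
    print(error)  # unhashable type: 'Letter'

class HashableLetter(Letter):
    def __hash__(self):
        # Objects that compare equal must hash equal, so hash the same
        # attribute that __eq__ compares.
        return hash(self.char)

print(len({HashableLetter("a"), HashableLetter("a")}))  # 1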

20 Nov 2017 6:45am GMT

Itamar Turner-Trauring: Young programmers working long hours: a fun job or bad management?

If you're looking for a job with work/life balance, you'll obviously want to avoid a company where everyone works long hours. But what about companies where only some people work long hours? If you ask for an explanation of what's going on at these companies, one common answer you'll hear is something like "oh, they're young, they don't have families, they enjoy their work so they work longer hours."

As you venture into the unknown waters of a potential employer, is this something you should worry about? Those younger programmers, you're told, are having a wonderful time. They're waving their arms, true, and there's a tangle of tentacles wrapped around them, but that's just the company mascot, a convivial squid. Really, they're having fun.

Or are they?

To be fair, there really are companies where programmers stay later at the office because they enjoy hanging out with people they like, and working on interesting problems while they do it. More experienced programmers are older, and therefore more likely to have children or other responsibilities, and that's why they're working shorter hours. On the other hand, it may be that the pile of tentacles wrapped around these programmers is not so much convivial as voracious and hostile: the kraken of overwork.

The kraken of overwork

Another potential reason less experienced programmers are working longer hours is that they don't know how to get their work done in a reasonable amount of time. Why? Because no one taught them the necessary skills.

I'm not talking about technical skills here, but rather things like:

Many managers don't quite realize these skills exist, or can't articulate what they are, or don't know how to teach them. So even if the inexperienced programmers are given reasonable amounts of work, they are never taught the skills to finish their work on time. In this situation having children or a family is not causal, it's merely correlated with experience. Experienced programmers know how to get their work done in a reasonable amount of time, but inexperienced programmers don't.

In that case, the young programmers' lack of kids, and "oh, they just enjoy their work", is just a rationalization: an excuse for skills that aren't being taught, an excuse for pointless and wasted effort. Those inexperienced programmers waving their hands around aren't having fun, they're being eaten by a squid, and no one is helping save them.

Avoiding the kraken

So when you're interviewing for a job, how do you tell the difference between these two reasons for long hours? Make sure you ask about training and career development.

Hopefully the answers will demonstrate that the less-experienced programmers are taught the skills they need, and helped when they're floundering. If not, you may wish to avoid this company, especially if you are less experienced. Good luck interviewing, and watch out for krakens!

20 Nov 2017 5:00am GMT

15 Nov 2017

Planet Twisted

Moshe Zadka: Abstraction Cascade

(This is an adaptation of part of the talk Kurt Rose and I gave at PyBay 2017)

An abstraction cascade is a common anti-pattern in legacy systems. It is useful to understand how to recognize it, how it tends to come about, how to fix it -- and most importantly, what kinds of things will not fix it. The last one is important, in general, for anti-patterns in legacy systems: if the obvious fix worked, it would already have been dealt with, and would not be a common anti-pattern in legacy systems.

Recognition

The usual form of an abstraction cascade is a complicated, ad-hoc if/else sequence that decides which path to take. Here is an example of an abstraction cascade for finding the network address corresponding to a name:

def get_address(name):
    if name in services:
        if services[name].ip:
            return services[name].ip, services[name].port
        elif services[name].address:
            # Added for issue #2321
            if ':' in services[name].address:
                return services[name].address.split(':')
            else:
                # Fixes issues #6985
                # TODO: Hotfix, clean-up later
                return services[name].address, DEFAULT_PORT
    return dns_lookup(name), DEFAULT_PORT

History

At each step, it seems reasonable to make a specific change. Here is a typical way this kind of code comes about.

The initial version is reasonable: since DNS is a way to publish name to address mapping, why not use a standard?

def get_address(name):
    return dns_lookup(name), DEFAULT_PORT

Under load, an outage happened. There was no time to investigate how to configure DNS caching or TTL better -- so the "popular" services got added to a static list, with a "fast path" check. This decision also makes sense: when an outage is ongoing, the top priority is to relieve the symptoms.

def get_address(name):
    if name in services:
        # Fixes issues #6985
        # TODO: Hotfix, clean-up later
        return services[name].address, DEFAULT_PORT
    return dns_lookup(name), DEFAULT_PORT

However, now the door has been opened to adding more paths to the function. When the need arose to support multiple services on one host, it was easier to just add another path: after all, this was only for new services.

def get_address(name):
    if name in services:
        # Added for issue #2321
        if ':' in services[name].address:
            return services[name].address.split(':')
        else:
            # Fixes issues #6985
            # TODO: Hotfix, clean-up later
            return services[name].address, DEFAULT_PORT
    return dns_lookup(name), DEFAULT_PORT

When the change to IPv6 occurred, splitting on : was not a safe operation -- so a separate field was added. Again, the existing "new" services (by now, many -- and not so new!) did not need to be touched:

def get_address(name):
    if name in services:
        if services[name].ip:
            return services[name].ip, services[name].port
        elif services[name].address:
            # Added for issue #2321
            if ':' in services[name].address:
                return services[name].address.split(':')
            else:
                # Fixes issues #6985
                # TODO: Hotfix, clean-up later
                return services[name].address, DEFAULT_PORT
    return dns_lookup(name), DEFAULT_PORT

Of course, this is typically just chapter one in the real story: having to adapt to multiple data centers, or multiple providers of services, will lead to more and more of these paths -- with nothing thrown away, because "some legacy service depends on it -- maybe".

Non-fixes

Fancier dispatch

Sometimes the ad-hoc if/else pattern is obscured by more abstract dispatch logic: for example, something that loops through classes and finds out which one is the right one:

class AbstractNameFinder(object):
    def matches(self, name):
        raise NotImplementedError()
    def get_address(self, name):
        raise NotImplementedError()
class DNS(AbstractNameFinder):
    def matches(self, name):
        return True
    def get_address(self, name):
        return dns_lookup(name), DEFAULT_PORT
class Local(AbstractNameFinder):
    def matches(self, name):
        return hasattr(services.get(name), 'ip')
    def get_address(self, name):
        return services[name].ip, services[name].port
finders = [Local(), DNS()]
def get_address(name):
    for finder in finders:
        if finder.matches(name):
            return finder.get_address(name)

This is actually worse -- now the problem can be spread over multiple files, with no single place to fix it. While the code can be converted to this form semi-mechanically, this does not fix the underlying issue -- and will actually entrench the problem even further.

Pareto fix

The Pareto rule is that 80% of the problem is solved with 20% of the effort. It is often the case that a big percentage (in the stereotypical Pareto case, 80%) of the problem is not hard to fix.

For example, most services are actually listed in some file, and all we need to do is read this file in and look up based on that. The incentive to fix "80% of the problem" and leave the "20%" for later is strong.

However, usually the problem is that each of those "Pareto fixes" again makes the problem worse: since it is not a complete replacement, another dispatch layer needs to be built to support the "legacy solution". The new dispatch layer, the new solution, and the legacy solution all become part of the newest iteration of the legacy system, and cause the problem to be even worse.

Fixing 80% of the problem is useful for prototyping, when we are not sure we are solving the right problem, or when nothing better exists. Here, however, a complete solution is necessary, so neither of these conditions holds.

Escape strategy

The reason this happens is that no single case can be removed. The way forward is not to add more cases, but to try and remove a single case. The first question to ask is: why was no case removed? Often, the reason is that there is no way to test whether removal is safe.

It might take some work to build infrastructure that makes removal properly safe. Unit tests are often not enough, and even integration tests are sometimes not enough. It may take canary systems, feature flag systems, or, if worst comes to worst, a way to test and roll back quickly if a problem is found.

Once it is possible to remove just one case (in our example above, maybe check what it would take to remove the case where we split on a colon, since this is clearly worse than just having separate attributes), thought needs to be given to which case is best.
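For illustration, here is a sketch (not from the original post) of what get_address might look like once the colon-splitting case is removed, assuming the services that still published "host:port" strings have been migrated to the separate ip and port fields:

def get_address(name):
    if name in services:
        if services[name].ip:
            return services[name].ip, services[name].port
        elif services[name].address:
            # Fixes issues #6985
            # TODO: Hotfix, clean-up later
            return services[name].address, DEFAULT_PORT
    return dns_lookup(name), DEFAULT_PORT

Each removal like this leaves fewer paths to keep in mind, which in turn makes the next removal easier to evaluate.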

Sometimes, there is more than one case that is really needed: some inherent, deep, trade-off. However, it is rare to need more than two, and almost unheard of to need more than three. Start removing unneeded cases one by one.

Conclusion

When seeing an abstraction cascade, there is a temptation to "clean it up": but most obvious clean-ups end up making it worse. However, by understanding how it came to be, and finding a way to remove cases, it is possible to do away with it.

15 Nov 2017 2:23am GMT

14 Nov 2017

Planet Twisted

Moshe Zadka: Gather

Gather is a plugin framework -- and it now has its own blog.

Use it! If you like it, tell us about it, and if there is a problem, tell us about that.

14 Nov 2017 2:23am GMT

07 Nov 2017

Planet Twisted

Itamar Turner-Trauring: There's no such thing as bad code

Are you worried that you're writing bad code? Code that doesn't follow best practices, code without tests, code that violates coding standards, code you simply don't want to think about because it's so very very embarrassing?

In fact, there is no such thing as inherently bad code, or for that matter inherently good code. This doesn't mean you shouldn't be judging your code, it's just that if you're worrying about whether your code is "good" or "bad" then you're worrying about the wrong thing.

In this post I will:

"Bad" code, "good" code

Let's look at a couple of examples of "bad" code and see that, under some circumstances, this "badness" is irrelevant.

Hard-to-read code

As everyone knows, "good" code is easy to read and follow. You need to choose readable variables, clear function names, and so on and so forth. But then again-

Unit tests

As everyone knows, "good" code has unit tests, and "bad" code does not. But then again-

There's no such thing as "best practices"

At this point you might be getting a little annoyed. "Yes," you might say, "these are some exceptions, but for the most part there are standard best practices that everyone can and should follow." Consider:

Both NASA's techniques and formal verification lead to far fewer defects. So should they be best practices? It depends: if you're building the website for a local chain of restaurants, using these techniques is ridiculous. You'll ship a year late and 10× over budget. On the other hand, if you're writing the software for a heart monitor, or a spacecraft that's going to Pluto, "I wrote some unit tests!" is very definitely not best practices.

A sanity check for your code

Instead of feeling embarrassed about your "bad" code, or proud of your "good" code, you should judge your code by how well it succeeds at achieving your goals. Whether you're competing in the International Obfuscated C Code Contest or working on NASA's latest mission, you need to use techniques suitable to your particular situation and goal. Judging one's code can be a little tricky, of course: it's easy to miss the ways in which you've failed, easy to underestimate the ways in which you've succeeded.

So try this exercise to give yourself some perspective: every time you write some code figure out the tradeoffs you're making. That is, identify the circumstances and goals for which your current practices are insufficient, and those for which your current practices are overkill. If you can't come up with an answer, if your code seems suitable for all situations, that is a danger sign: there's always a tradeoff, even if you can't see it.

Finally, when you encounter someone else's code, be kind: don't tell them their code is "bad". Instead, go through the same exercise with them. Figure out their goals, and then walk through the tradeoffs involved in how they've written their code. This is a far more useful way of improving their code, and can help you understand why you make the decisions you do.

07 Nov 2017 5:00am GMT

23 Oct 2017

Planet Twisted

Glyph Lefkowitz: Careful With That PyPI

Too Many Secrets

A wise man once said, "you shouldn't use ENV variables for secret data". In large part, he was right, for all the reasons he gives (and you should read them). Filesystem locations are usually a better operating system interface to communicate secrets than environment variables; fewer things can intercept an open() than can read your process's command-line or calling environment.
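To make the contrast concrete, here is a minimal Python sketch (the variable name and the file path are made up for illustration); the point is simply which operating system mechanisms guard each channel:

import os
from pathlib import Path

# Environment variable: inherited by child processes, and visible to
# anything that can inspect this process's environment.
env_secret = os.environ.get("EXAMPLE_API_TOKEN")

# File: access is governed by filesystem permissions, and the value never
# appears in the process's environment or on its command line.
file_secret = Path("/run/secrets/example-api-token").read_text().strip()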

One might say that files are "more secure" than environment variables. To his credit, Diogo doesn't, for good reason: one shouldn't refer to the superiority of such a mechanism as being "more secure" in general, but rather, as better for a specific reason in some specific circumstance.

Supplying your PyPI password to tools you run on your personal machine is a very different case than providing a cryptographic key to a containerized application in a remote datacenter. In this case, based on the constraints of the software presently available, I believe an environment variable provides better security, if you use it correctly.

Popping A Shell By Any Other Name

If you upload packages to the python package index, and people use those packages, your PyPI password is an extremely high-privilege credential: effectively, it grants a time-delayed arbitrary code execution privilege on all of the systems where anyone might pip install your packages.

Unfortunately, the suggested mechanism to manage this crucial, potentially world-destroying credential is to just stick it in an unencrypted file.
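For reference, the unencrypted file in question is typically ~/.pypirc, which looks roughly like this (a from-memory sketch; exact option names vary a bit between tools and versions):

[distutils]
index-servers = pypi

[pypi]
username = example-user
password = correct-horse-battery-staple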

The authors of this documentation know this is a problem; the authors of the tooling know too (and, given that these tools are all open source and we all could have fixed them to be better about this, we should all feel bad).

Leaving the secret lying around on the filesystem is a form of ambient authority; a permission you always have, but only sometimes want. One of the worst things about this is that you can easily forget it's there if you don't use these credentials very often.

The keyring is a much better place, but even it can be a slightly scary place to put such a thing, because it's still easy to put it into a state where some random command could upload a PyPI release without prompting you. PyPI is forever, so we want to measure twice and cut once.

Luckily, even more secure places exist: password managers. If you use https://1password.com or https://www.lastpass.com, both offer command-line interfaces that integrate nicely with PyPI. If you use 1password, you'll really want https://stedolan.github.io/jq/ (apt-get install jq, brew install jq) to slice & dice its command-line.

The way that I manage my PyPI credentials is that I never put them on my filesystem, or even into my keyring; instead, I leave them in my password manager, and very briefly toss them into the tools that need them via an environment variable.

First, I have the following shell function, to prevent any mistakes:

function twine () {
    echo "Use dev.twine or prod.twine depending on where you want to upload.";
    return 1;
}

For dev.twine, I configure twine to always only talk to my local DevPI instance:

function dev.twine () {
    env TWINE_USERNAME=root \
        TWINE_PASSWORD= \
        TWINE_REPOSITORY_URL=http://127.0.0.1:3141/root/plus/ \
        twine "$@";
}

This way I can debug Twine, my setup.py, and various test-upload things without ever needing real credentials at all.

But, OK. Eventually, I need to actually get the credentials and do the thing. How does that work?

1Password

1password's command line is a little tricky to log in to (you have to eval its output, it's not just a command), so here's a handy shell function that will do it.

function opme () {
    # Log this shell in to 1password.
    if ! env | grep -q OP_SESSION; then
        eval "$(op signin "$(jq -r '.latest_signin' ~/.op/config)")";
    fi;
}

Then, I have this little helper for slicing out a particular field from the OP JSON structure:

function _op_field () {
    jq -r '.details.fields[] | select(.name == "'"${1}"'") | .value';
}

And finally, I use this to grab the item I want (named, memorably enough, "PyPI") and invoke Twine:

function prod.twine () {
    opme;
    local pypi_item="$(op get item PyPI)";
    env TWINE_USERNAME="$(echo ${pypi_item} | _op_field username)" \
        TWINE_PASSWORD="$(echo "${pypi_item}" | _op_field password)" \
        twine "$@";
}

LastPass

For lastpass, you can just log in (for all shells; it's a little less secure) via lpass login; if you've logged in before you often don't even have to do that, and it will just prompt you when running commands that require you to be logged in; so we don't need the preamble that 1password's command line did.

Its version of prod.twine looks quite similar, but its plaintext output obviates the need for jq:

function prod.twine () {
    env TWINE_USERNAME="$(lpass show PyPI --username)" \
        TWINE_PASSWORD="$(lpass show PyPI --password)" \
        twine "$@";
}

In Conclusion

"Keep secrets out of your environment" is generally a good idea, and you should always do it when you can. But, better a moment in your process environment than an eternity on your filesystem. Environment-based configuration can be a very useful stopgap for limiting the lifetimes of credentials when your tools don't support more sophisticated approaches to secret storage.1

Post Script

If you are interested in secure secret storage, my micro-project secretly might be of interest. Right now it doesn't do a whole lot; it's just a small wrapper around the excellent keyring module and the pinentry / pinentry-mac password prompt tools. secretly presents an interface both for prompting users for their credentials without requiring the command-line or env vars, and for saving them away in keychain, for tools that need to pull in an API key and don't want to make the user manually edit a config file first.
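For context, the underlying pattern looks roughly like the following sketch, using the keyring module directly (the service and user names are invented, and secretly's actual API differs):

import getpass

import keyring

SERVICE = "example-api"
USERNAME = "example-user"

# Try the platform keychain first; prompt without echoing only if the
# credential has not been stored yet, then remember it for next time.
secret = keyring.get_password(SERVICE, USERNAME)
if secret is None:
    secret = getpass.getpass("Secret for %s: " % (SERVICE,))
    keyring.set_password(SERVICE, USERNAME, secret)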


  1. Really, PyPI should have API keys that last for some short amount of time, that automatically expire so you don't have to freak out if you gave somebody a 5-year-old laptop and forgot to wipe it first. But again, if I wanted that so bad, I should have implemented it myself...

23 Oct 2017 5:10am GMT

Itamar Turner-Trauring: Your technical skills are obsolete: now what?

One day you go to work and discover your technical skills are obsolete:

You feel like your growth has been stunted: there are all these skills you should have been learning, but you never did because you didn't need them at work. Your coworkers seem to know all about the latest tools and you don't, and eventually, maybe soon, you'll just be left behind.

What should you do? How can you get out of this mess and salvage your career?

I'm not going to say "code in your spare time", because that's not possible for many people. And while I believe it's completely possible to keep your skills up-to-date as part of your job (e.g. I've written about having a broad grasp of available technology and practicing on the job), the assumption at this point is that you haven't done so.

Here are your goals, then:

  1. Get your technical skills up to speed, and quickly.
  2. Do it all during work hours.
  3. End up looking good to your manager.

In this post I'll explain one way to do so, which involves:

Why you're using old technology

Most programmers, probably including you, work on existing software projects. Existing software projects tend to use older technology. The result: you are likely to be using older, out-of-date technology, rather than the latest and (speculatively) greatest.

When a project gets started the programmers in charge pick the best technology they know of, and hope for the best. After that initial set of choices most projects stick to their current technology set, only slowly upgrading over time. Of course, there are always new technologies coming out that claim to be better than existing ones.

Updating an existing project to new technologies is difficult, which means changes are often put off until it's truly necessary. It takes effort, ranging from a small effort to upgrade to a newer version of a library, to a large effort for changing infrastructure like version control, to an enormous effort if you want to switch to a new programming language. So even when a clearly superior and popular technology becomes available, the cost of switching may not be worth it. Time spent switching technologies is time that could be spent shipping features, after all.

Two real-world examples of this dynamic:

The switch to new technology

Eventually the cost of sticking with an old technology becomes too high, and management starts putting the resources into upgrading. In a well-run company this happens on a regular, on-going basis, and management will have spent the resources to keep programmers' skills up-to-date. In these companies learning about new technologies, and then deciding which are worth the cost of adopting, will be an ongoing activity.

In many companies, however, the cost of keeping programmer skills up-to-date is dumped on to you. You are expected to spend your spare time researching new programming languages, tools, and techniques. If you enjoy doing that, great. If you don't, your technical knowledge will stagnate, at some cost to the company, but even more so to you.

Helping your project, upgrading your skills

If you find yourself in this situation then you can turn your project's out-of-date technology into a learning opportunity for you. Technology's purpose is to solve business problems: you need to identify business problems where your current technology isn't working well, and try to solve those problems. This will allow you to research and learn new technologies while helping your project improve.

Specifically, you should:

  1. Identify obsolete and problematic technologies.
  2. Identify potential replacements.
  3. Convince your manager that this is a problem that merits further resources. Your goal is to get the time to build a proof-of-concept or pilot project where you can expand your understanding of a relevant, useful new technology.

If all goes well you'll have both demonstrated your value to your manager, and been given the time to learn a new technology at work. But even if you fail to convince your manager, you'll have an easier time when it comes to interviewing at other jobs, and some sense of which technologies are worth learning.

Let's go through these steps one by one.

1. Identify obsolete and problematic technologies

Your project is likely using many out-of-date technologies: you want to find one that is both expensive to your project, and not too expensive to replace. Since you're going to have to convince your manager to put some resources into the project, you want to have some clear evidence that an obsolete technology is costing the company.

Look for things like:

You can do this while you work: just look for signs of trouble as you go about your normal business.

2. Identify potential replacements

Once you've identified a problem technology, you need to find a plausible replacement. It's likely that any problem you have is not a rare problem: someone else has had this issue, so someone else has probably come up with a solution. In fact, chances are there are multiple solutions. You just need to find a reasonable one.

You should:

  1. Figure out the keywords that describe this technology. For example, if you need to line up images automatically the term of art is "image registration". If you want to run a series of long-running processes in a row the terms of art are "workflow management", "batch processing", "data pipelines", and other terms. You can do this by reading the documentation for some known solution, talking to a colleague with broader technical knowledge, or some search engine iteration.
  2. Once you have the keywords, you can start finding solutions. The documentation for a tool will often provide more keywords with which to extend your search, and will often mention competitors.
  3. Search engine queries can also find more alternatives, e.g. search for "$SOMETECH alternatives" or look at the Google search auto-completes for "$SOMETECH vs".
  4. Once you have found a number of alternatives, get a sense of what the top contender or contenders are. Criteria might include complexity, maturity, risk (is it developed only by a startup?), features, popularity, and so on. At the end of this post you can sign up to get a PDF with my personal process for evaluating technologies.

The goal is to become aware of the range of technologies available, and get a superficial understanding of their strengths ("React has a bigger community, but skimming the Vue tutorial was easier", for example). This process can therefore be done over the course of a few hours, at most a day or two, during your work day in-between scheduled tasks.

Remember, you can always rope in a colleague with broader technical knowledge to help out: the goal is to improve things for your project, after all.

3. Getting management buy-in

At this point you should have:

  1. Identified a problem area.
  2. Identified a technology or three that might solve this problem.

Next you need to convince your manager that it's worth the cost of trying out a new technology. In particular you need to:

Your pitch might go something like this:

Demonstrate the problem: "Hey, you know how we have all these problems with Foobar, where we spend all our time redoing the formatting instead of working on actual features?"

Suggest a solution: "I've been looking into it and I think the problem is that we're using a pretty old library; it turns out there's some newer tools that could make things easier."

Evidence for solution: "For example the Format.js library, it's got a huge userbase, and from the docs it's pretty easy to use, and see this example, that's exactly what we waste all our time on doing manually!"

Next step: "So, how about for that small project we're doing next month I try this out instead of our usual Foobar setup, and see how it goes?"

If your manager agrees: success! You now have time to learn a new technology in depth, on the job.

If your manager doesn't agree, all is not lost. You've gained awareness of what newer technologies are available; you might spend a little spare time here and there at work learning it more in depth. And when you next interview for a job, you'll have some sense of technologies to either brush up on, or at least to mention during the interview: "Formatting? Oh, we used Foobar, which is pretty bad because X, Y and Z. But I did some research and found Format.js and it seemed a lot better because A, B and C. So that's what I'd use in the future."

Don't just be a problem solver

The process I describe above is just one approach; no doubt there are others. The key skill involved, however, can't be replaced: learning how to identify problems is critical to your success as a programmer.

As a junior programmer you get handed a solution, and then you go off and implement it. When you're more experienced you get handed problems, and come up with solutions on your own: you become a problem solver. In many ways this is an improvement, both in your skills and in your value as an employee, but it's also a dangerous place to be.

If you're a junior programmer no one expects much of you, but once you're past that point expectations rise. And if you're only a problem solver, then you're at the mercy of whoever has the job of identifying problems. If they fail to identify an important problem, like the use of an old technology, or decide your career isn't a problem worth worrying about, then you might find yourself in trouble: working on a failed project, or lacking the skills you need.

Don't just be a problem solver: learn how to identify problems on your own. Every day when you go to work, every time you look at some code, every time you see a bug report or a feature request, every time you feel bored, every time someone complains, ask yourself: "What is the problem here?" As you learn to identify problems, you'll start recognizing obsolete technology. As you learn to identify problems, you'll start noticing the limits of your own skills and your current career choices. You'll become a more valuable employee, and you'll become more effective at achieving your own goals.


23 Oct 2017 4:00am GMT

Hynek Schlawack: Sharing Your Labor of Love: PyPI Quick and Dirty

A completely incomplete guide to packaging a Python module and sharing it with the world on PyPI.

23 Oct 2017 12:00am GMT

20 Oct 2017

Planet Twisted

Jonathan Lange: Category theory in everyday life

I was going to write a post about how knowing some abstract algebra can help you write clearer programs.

Then I saw Eugenia Cheng's excellent talk, Category Theory in Everyday Life, which was a keynote at Haskell Exchange 2017.

It's excellent. She says what I wanted to say much better than I could, and says many more things that I wouldn't have thought to say at all. You should watch it.

The talk assumes very little technical or mathematical knowledge, and certainly no knowledge of Haskell.

20 Oct 2017 11:00pm GMT

12 Oct 2017

Planet Twisted

Jonathan Lange: SPAKE2 in Haskell: How Haskell Helped

Porting SPAKE2 from Python to Haskell helped me understand how SPAKE2 worked, and a large part of that is due to specific features of Haskell.

What's this again?

As a favour for Jean-Paul, I wrote a Haskell library implementing SPAKE2, so he could go about writing a magic-wormhole client. This turned out to be much more work than I expected. Although there was a perfectly decent Python implementation for me to crib from, my ignorance of cryptography and the lack of standards documentation for SPAKE2 made it difficult for me to be sure I was doing the right thing.

One of the things that made it easier was the target language: Haskell. Here's how.

Elliptic curves-how do they work?

The arithmetic around elliptic curves can be slow. There's a trick where you can do the operations in 4D space, rather than 2D space, which somehow makes the operations faster. Brian's code calls these "extended points". The 2D points are called "affine points".

However, there's a catch. Many of the routines can generate extended points that aren't in the group we're working in, which makes them useless (possibly dangerous) for our cryptography.

The Python code deals with this using runtime checks and documentation. There are many checks of isoncurve, and comments like extended->extended.

Because I have no idea what I'm doing, I wanted to make sure I got this right.

So when I defined ExtendedPoint, I put whether or not the point is on the curve (in the group) into the type.

e.g.

-- | Whether or not an extended point is a member of Ed25519.
data GroupMembership = Unknown | Member

-- | A point that might be a member of Ed25519.
data ExtendedPoint (groupMembership :: GroupMembership)
  = ExtendedPoint
  { x :: !Integer
  , y :: !Integer
  , z :: !Integer
  , t :: !Integer
  } deriving (Show)

This technique is called phantom types.

It means we can write functions with signatures like this:

isExtendedZero :: ExtendedPoint irrelevant -> Bool

Which figures out whether an extended point is zero, and we don't care whether it's in the group or not.

Or functions like this:

doubleExtendedPoint
  :: ExtendedPoint preserving
  -> ExtendedPoint preserving

Which says that whether or not the output is in the group is determined entirely by whether the input is in the group.

Or like this:

affineToExtended
  :: AffinePoint
  -> ExtendedPoint 'Unknown

Which means that we know that we don't know whether a point is on the curve after we've projected it from affine to extended.

And we can very carefully define functions that decide whether an extended point is in the group or not, which have signatures that look like this:

ensureInGroup
  :: ExtendedPoint 'Unknown
  -> Either Error (ExtendedPoint 'Member)

This pushes our documentation and runtime checks into the type system. It means the compiler will tell me when I accidentally pass an extended point that's not a member (or not proven to be a member) to something that assumes it is a member.

When you don't know what you are doing, this is hugely helpful. It can feel a bit like a small child trying to push a star-shaped thing through the square-shaped hole. The types are the holes that guide how you insert code and values.

What do we actually need?

Python famously uses "duck typing". If you have a function that uses a value, then any value that has the right methods and attributes will work, probably.

This is very useful, but it can mean that when you are trying to figure out whether your value can be used, you have to resort to experimentation.

inbound_elem = g.bytes_to_element(self.inbound_message)
if inbound_elem.to_bytes() == self.outbound_message:
   raise ReflectionThwarted
pw_unblinding = self.my_unblinding().scalarmult(-self.pw_scalar)
K_elem = inbound_elem.add(pw_unblinding).scalarmult(self.xy_scalar)

Here, g is a group. What does it need to support? What kinds of things are its elements? How are they related?

Here's what the type signature for the corresponding Haskell function looks like:

generateKeyMaterial
  :: AbelianGroup group
  => Spake2Exchange group  -- ^ An initiated SPAKE2 exchange
  -> Element group  -- ^ The outbound message from the other side (i.e. inbound to us)
  -> Element group -- ^ The final piece of key material to generate the session key.

This makes it explicit that we need something that implements AbelianGroup, which is an interface with defined methods.

If we start to rely on something more, the compiler will tell us. This allows for clear boundaries.

When reverse engineering the Python code, it was never exactly clear whether a function in a group implementation was meant to be public or private.

By having interfaces (type classes) enforced by the compiler, this is much more clear.

What comes first?

The Python SPAKE2 code has a bunch of assertions to make sure that one method isn't called before another.

In particular, you really shouldn't generate the key until you've generated your message and received one from the other side.

Using Haskell, I could put this into the type system, and get the compiler to take care of it for me.

We have a function that initiates the exchange, startSpake2:

-- | Initiate the SPAKE2 exchange. Generates a secret (@xy@) that will be held
-- by this side, and transmitted to the other side in "blinded" form.
startSpake2
  :: (AbelianGroup group, MonadRandom randomly)
  => Spake2 group
  -> randomly (Spake2Exchange group)

This takes a Spake2 object for a particular AbelianGroup, which has our password scalar and protocol parameters, and generates a Spake2Exchange for that group.

We have another function that computes the outbound message:

-- | Determine the element (either \(X^{\star}\) or \(Y^{\star}\)) to send to the other side.
computeOutboundMessage
  :: AbelianGroup group
  => Spake2Exchange group
  -> Element group

This takes a Spake2Exchange as its input. This means it is impossible for us to call it unless we have already called startSpake2.

We don't need to write tests for what happens if we try to call it before we call startSpake2, in fact, we cannot write such tests. They won't compile.

Psychologically, this helped me immensely. It's one less thing I have to worry about getting right, and that frees me up to explore other things.

It also meant I had to do less work to be satisfied with correctness. This one line type signature replaces two or three tests.

We can also see that startSpake2 is the only thing that generates random numbers. This means we know that computeOutboundMessage will always return the same element for the same initiated exchange.

Conclusion

Haskell helped me be more confident in the correctness of my code, and also gave me tools to explore the terrain further.

It's easy to think of static types as being a constraint that binds you and prevents you from doing wrong things, but an expressive type system can help you figure out what code to write.

12 Oct 2017 11:00pm GMT

10 Oct 2017

Planet Twisted

Itamar Turner-Trauring: The lone and level sands of software

There's that moment late at night when you can't sleep, and you're so tired you can't even muster the energy to check the time. So you stare blindly at the ceiling and look back over your life, and you think: "Did I really accomplish anything? Was my work worth anything at all?"

I live in a 140-year-old house, a house which has outlasted its architect and builders, and quite possibly will outlast me. But having spent the last twenty years of my life building software, I can't really hope to have my own work live on. In those late night moments I sometimes believe that my resume, like that of most programmers, should open with a quote from Shelley's mocking poem:

My name is Ozymandias, King of Kings;
Look on my Works, ye Mighty, and despair!
Nothing beside remains. Round the decay
Of that colossal Wreck, boundless and bare
The lone and level sands stretch far away.

Who among us has not had projects canceled, rewritten from scratch, obsoleted, abandoned or discarded? Was that code worth writing, or was all that effort just a futile waste?

Decay, boundless and bare

Consider some of the projects I've worked on. I've been writing software for 20+ years at this point, which means I've accumulated many decayed wrecks:

I could go on, but that would just make me sadder. This is not to say none of my software lives on: there are open source projects, mostly, that have survived quite a while, and will hopefully continue for many more years. But I've spent years of my life working on software that is dead and gone.

How about you? How much of your work has survived?

Which yet survive

So what do you have left, after all these years of effort? You get paid for your work, of course, and getting paid has its benefits. And if you're lucky your software proved valuable to someone, for a while at least, before it was replaced or shut down. For me at least that's worth even more than the money.

But there's something else you gain, something you get to take with you when the money is spent and your users have moved on: knowledge, skills, and mistakes you'll know how to avoid next time. Every failure I've listed above, every mistake I've made, every preventable rewrite, is something I hope to avoid the next time around.

And while software mostly dies quickly, the ideas live on, and if we pay attention it'll be the good ideas that survive. I've borrowed ideas for my own logging library from software that is now dead. If my library dies one day, and no doubt it will, I can only hope its own contributions will be revived by one of my users, or even someone who just half-remembers a better way of doing things.

Dead but not forgotten

Since the ultimate benefit of most software projects is what you learned from them, it's important to make sure you're actually learning. It's easy to just do your work and move on. If you're not careful you'll forget to look for the mistakes to avoid next time, and you won't notice the ideas that are the only thing that can truly survive in the long run.

As for me, I've been writing a weekly newsletter where I share my mistakes, some mentioned above, others in my current work: you can gain from my failures, without all the wasted effort.

10 Oct 2017 4:00am GMT

04 Oct 2017

Planet Twisted

Itamar Turner-Trauring: Technical skills alone won't make you productive

When you're just starting out in your career as a programmer, the variety and number of skills you think you need to learn can be overwhelming. And working with colleagues who can produce far more than you can be intimidating, demoralizing, and confusing: how do they do it? How can some programmers create so much more?

The obvious answer is that these productive programmers have technical skills. They know more programming languages, more design patterns, more architectural styles, more testing techniques. And all these do help: they'll help you find a bug faster, or implement a solution that is more elegant and efficient.

But the obvious answer is insufficient: technical skills are necessary, but they're not enough, and they often don't matter as much as you'd think. Productivity comes from avoiding unnecessary work, and unnecessary work is a temptation you'll encounter long before you reach the point of writing code.

In this post I'm going to cover some of the ways you can be unproductive, from most to least unproductive. As you'll see, technical programming skills do help, but only much further along in the process of software development.

How to be unproductive

1. Destructive work

The most wasteful and unproductive thing you can do is work on something that hurts others, or that you personally think is wrong. Instead of creating, you're actively destroying. Instead of making the world a better place, you're making the world a worse place. The better you are at your job, the less productive you are.

Being productive, then, starts with avoiding destructive work.

2. Work that doesn't further your goals

You go to work every day, and you're bored. You're not learning anything, you're not paid well, you don't care one way or another about the results of your work… why bother at all?

Productivity can only be defined against a goal: you're trying to produce some end result. If you're working on something that doesn't further your own goals (making money, learning, making the world a better place), then this work isn't productive for you.

To be productive, figure out your own goals, and then find work that will align your goals with those of your employer.

3. Building something no one wants

You're working for a startup, and it's exciting, hard work, churning out code like there's no tomorrow. Finally, the big day comes: you launch your product to great fanfare. And then no one shows up. Six months later the company shuts down, and you're looking for a new job.

This failure happens at big companies too, and it happens to individuals building their side project: building a product that the world doesn't need. It doesn't matter how good a programmer you are: if you're working on solving a problem that no one has, you're doing unnecessary work.

Personally, I've learned a lot from Stacking the Bricks about how to avoid this form of unproductive work.

4. Running out of time

Even if you're working on a real problem, on a problem you understand well, your work is for naught if you fail to solve the problem before you run out of time or money. Technical skills will help you come up with a faster, simpler solution, but they're not enough. You also need to avoid digressions, unnecessary work that will slow you down.

The additional skills you need here are project planning skills. For example:

5. Solving the symptoms of a problem, instead of the root cause

Finally, you've gotten to the point of solving a problem! Unfortunately, you haven't solved the root cause because you haven't figured out why you're doing your work. You've added a workaround, instead of discovering the bug, or you've made a codepath more efficient, when you could have just ripped out an unused feature altogether.

Whenever you're given a task, ask why you're doing it, what success means, and keep digging until you've found the real problem.

6. Solving a problem inefficiently

You've solved the right problem, on time and on budget! Unfortunately, your design wasn't as clean and efficient as it could have been. Here, finally, technical skills are the most important skills.

Beyond technical skills

If you learn the skills you need to be productive (starting with goals, prioritizing, avoiding digressions, and so on), your technical skills will also benefit. Learning technical skills is just another problem to solve: you need to learn the most important skills, with a limited amount of time. When you're thinking about which skills to learn next, take some time to consider which skills you're missing that aren't a programming language or a web framework.

Here's one suggestion: during my 20+ years as a programmer I've made all but the first of the mistakes I've listed above. You can hear these stories, and learn how to avoid my mistakes, by signing up for my weekly Software Clown email.

04 Oct 2017 4:00am GMT

28 Sep 2017

Planet Twisted

Moshe Zadka: Brute Forcing AES

Thanks to Paul Kehrer for reviewing! Any mistakes or oversights that are left are my responsibility.

AES's maximum key size is 256 bits (there are also 128 and 192 bit versions available). Is that enough? Well, if there is a cryptographic flaw in AES (i.e., a way to recover some bits of the key by some manipulation that takes less than 2**256 operations), then it depends on how big the flaw is. All algorithms come with the probabilistic "flaw" that, on average, only 50% of the keys need to be tested -- since the right key is just as easily in the first half as the second half. This means, on average, just 2**255 operations are needed to check "all" keys.

If there is an implementation flaw in your AES implementation, then it depends on the flaw -- most implementation flaws are "game over". For example, if the radio leakage from the CPU is enough to detect key bits, the entire key can be recovered -- but that would be true (with only minor additional hardship) if the key was 4K bit long. Another example is a related subkey attack, where many messages are encrypted with keys that have a certain relationship to each other (e.g., sharing a prefix). This implementation flaw (in a different encryption algorithm) defeated the WEP WiFi standard.

What if there is none? What if actually recovering a key requires checking all possibilities? Can someone do it, if they have a "really big" computer? Or a $10B data-center?

How much is 256-bit security really worth?

Let's see!

We'll be doing a lot of unit conversions, so we bring in the pint library, and create a new unit registry.

import pint
REGISTRY = pint.UnitRegistry()

Assume we have a really fast computer. How fast? As fast as theoretically possible, or so. The time it takes a photon to cross the nucleus of the hydrogen atom (a single proton) is called a "jiffy". (If someone tells you they'll be back in a jiffy, they're probably lying -- unless they're really fast, and going a very short distance!)

REGISTRY.define('jiffy = 5.4*10**-44 seconds')

Some secrets are temporary. Your birthday surprise party is no longer a secret after your friends yell "surprise!". Some secrets are long-lived. The British kept the secret of the broken Enigma until none were in use -- long after WWII was done.

Even the Long Now Foundation, though, does not have concrete plans post-dating the death of our sun. No worries, unless the Twisted gets more efficient, the cursed orb has got a few years on it.

sun_life = 10**10 * REGISTRY.years

With our super-fast computer, how many ticks do we get until the light of the sun shines no longer...

ticks = sun_life.to('jiffy').magnitude

...and how many do we need to brute-force AES?

brute_force_aes = 2**256

Luckily, brute-force parallelises really well: just have each computer check a different part of the key-space. We have fast computer technology, and quite a while, so how many do we need?

parallel = brute_force_aes / ticks

No worries! Let's just take over the US, and use its entire Federal budget to finance our computers.

US_budget = 4 * 10**12

Assume our technology is cheap -- maintaining each computer, for the entire lifetime of the sun, costs a mere $1.

Do we have enough money?

parallel/US_budget
4953.566155198452

Oh, we are only off by a factor of about 5000. We just need the budget of 5000 more countries, about as wealthy as the US, in order to fund our brute-force project.

Again, to be clear, none of this is a cryptographic analysis of AES -- but AES is the target of much analysis, and thus far, no theoretical flaw has been found that gives more than a bit or two. Assuming AES is secure, and assuming the implementation has no flaws, brute-forcing AES is impossible -- even with alien technology, plenty of time and access to quite a bit of the world's wealth.

28 Sep 2017 5:50am GMT

24 Sep 2017

Planet Twisted

Jp Calderone: Finishing the txflashair Dockerfile

Some while ago I got a new wifi-capable camera. Of course, it has some awful proprietary system for actually transferring images to a real computer. Fortunately, it's all based on a needlessly complex HTTP interface which can fairly easily be driven by any moderately capable HTTP client. I played around with FlashAero a bit first but it doesn't do quite what I want out of the box and the code is a country mile from anything I'd like to hack on. It did serve as a decent resource for the HTTP interface to go alongside the official reference which I didn't find until later.

Fast forward a bit and I've got txflashair doing basically what I want - essentially, synchronizing the contents of the camera to a local directory. Great. Now I just need to deploy this such that it will run all the time and I can stop thinking about this mundane task forever. Time to bust out Docker, right? It is 2017 after all.

This afternoon I took the Dockerfile I'd managed to cobble together in the last hack session:


FROM python:2-alpine

COPY . /src
RUN apk add --no-cache python-dev
RUN apk add --no-cache openssl-dev
RUN apk add --no-cache libffi-dev
RUN apk add --no-cache build-base

RUN pip install /src

VOLUME /data

ENTRYPOINT ["txflashair-sync"]
CMD ["--device-root", "/DCIM", "--local-root", "/data", "--include", "IMG_*.JPG"]

and turned it into something halfway decent that actually works, to boot:


FROM python:2-alpine

RUN apk add --no-cache python-dev
RUN apk add --no-cache openssl-dev
RUN apk add --no-cache libffi-dev
RUN apk add --no-cache build-base
RUN apk add --no-cache py-virtualenv
RUN apk add --no-cache linux-headers

RUN virtualenv /app/env

COPY requirements.txt /src/requirements.txt
RUN /app/env/bin/pip install -r /src/requirements.txt

COPY . /src

RUN /app/env/bin/pip install /src

FROM python:2-alpine

RUN apk add --no-cache py-virtualenv

COPY --from=0 /app/env /app/env

VOLUME /data

ENTRYPOINT ["/app/env/bin/txflashair-sync"]
CMD ["--device-root", "/DCIM", "--local-root", "/data", "--include", "IMG_*.JPG"]

So, what have I done exactly? The change to make the thing work is basically just to install the missing py-virtualenv. It took a few minutes to track this down. netifaces has this as a build dependency. I couldn't find an apk equivalent to apt-get build-dep but I did finally track down its APKBUILD file and found that linux-headers was probably what I was missing. Et voila, it was. Perhaps more interesting, though, are the changes to reduce the image size. I began using the new-ish Docker feature of multi-stage builds. From the beginning of the file down to the 2nd FROM line defines a Docker image as usual. However, the second FROM line starts a new image which is allowed to copy some of the contents of the first image. I merely copy the entire virtualenv that was created in the first image into the second one, leaving all of the overhead of the build environment behind to be discarded.

The result is an image that only has about 50MiB of deltas (compressed, I guess; Docker CLI presentation of image/layer sizes seems ambiguous and/or version dependent) from the stock Alpine Python 2 image. That's still pretty big for what's going on but it's not crazy big - like including all of gcc, etc.

The other changes involving virtualenv are in support of using the multi-stage build feature. Putting the software in a virtualenv is not a bad idea in general but in this case it also provides a directory containing all of the necessary txflashair bits that can easily be copied to the new image. Note that py-virtualenv is also installed in the second image because a virtualenv does not work without virtualenv itself being installed, strangely.

Like this kind of thing? Check out Supporting Open Source on the right.

24 Sep 2017 7:15pm GMT

23 Sep 2017

Planet Twisted

Glyph Lefkowitz: Photo Flow

Hello, the Internet. If you don't mind, I'd like to ask you a question about photographs.

My spouse and I both take pictures. We both anticipate taking more pictures in the near future. No reason, just a total coincidence.

We both have iPhones, and we both have medium-nice cameras that are still nicer than iPhones. We would both like to curate and touch up these photos and actually do something with them; ideally we would do this curation collaboratively, whenever either of us has time.

This means that there are three things we want to preserve:

  1. The raw, untouched photographs, in their original resolution,
  2. The edits that have been made to them, and
  3. The "workflow" categorization that has been done to them (minimally, "this photo has not been looked at", "this photo has been looked at and it's not good enough to bother sharing", "this photo has been looked at and it's good enough to be shared if it's touched up", and "this has been/should be shared in its current state"). Generally speaking this is a "which album is it in" categorization.

I like Photos. I have a huge photo library with years of various annotations in it, including faces (the only tool I know of that lets you do offline facial recognition so you can automatically identify pictures of your relatives without giving the police state a database to do the same thing).

However, iCloud Photo Sharing has a pretty major issue; it downscales photographs to "up to 2048 pixels on the long edge", which is far smaller even than the 12 megapixels that the iPhone 7 sports; more importantly it's lower resolution than our television, so the image degradation is visible. This is fine for sharing a pic or two on a small phone screen, but not good for a long-term archival solution.

To complicate matters, we also already have an enormous pile of disks in a home server that I have put way too much energy into making robust; a similarly-sized volume of storage would cost about $1300 a year with iCloud (and would not fit onto one account, anyway). I'm not totally averse to paying for some service if it's turnkey, but something that uses our existing pile of storage would definitely get bonus points.

Right now, my plan is to dump all of our photos into a shared photo library on a network drive, only ever edit them at home, try to communicate carefully about when one or the other of us is editing it so we don't run into weird filesystem concurrency issues, and hope for the best. This does not strike me as a maximally robust solution. Among other problems, it means the library isn't accessible to our mobile devices. But I can't think of anything better.

Can you? Email me. If I get a really great answer I'll post it in a followup.

23 Sep 2017 1:27am GMT

19 Sep 2017

Planet Twisted

Moshe Zadka: Announcing NColony 17.9.0

I have released NColony 17.9.0, available in a PyPI near you.

New this version:

Thanks to Mark Williams for reviewing many pull requests.

19 Sep 2017 10:00pm GMT