24 Sep 2016

feedPlanet Twisted

Hynek Schlawack: Sharing Your Labor of Love: PyPI Quick and Dirty

A completely incomplete guide to packaging a Python module and sharing it with the world on PyPI.

24 Sep 2016 12:00pm GMT

17 Sep 2016

Glyph Lefkowitz: Hitting The Wall

I'm an introvert.

I say that with a full-on appreciation of just how awful thinkpieces on "introverts" are.

However, I feel compelled to write about this today because of a certain type of social pressure that a certain type of introvert faces. Specifically, I am a high-energy introvert.

Cementing this piece's place in the hallowed halls of just awful thinkpieces, allow me to compare my mild cognitive fatigue with the plight of those suffering from chronic illness and disability[1]. There's a social phenomenon associated with many chronic illnesses, "but you don't LOOK sick", where well-meaning people will look at someone who is suffering, with no obvious symptoms, and imply that they really ought to be able to "be normal".

As a high-energy introvert, I frequently participate in social events. I go to meet-ups and conferences and I engage in plenty of public speaking. I am, in a sense, comfortable extemporizing in front of large groups of strangers.

This all sounds like extroverted behavior, I know. But there's a key difference.

Let me posit two axes for personality type: on the X axis, "introvert" to "extrovert", and on the Y, "low energy" up to "high energy".

The X axis describes what kinds of activities give you energy, and the Y axis describes how large your energy reserves are for the other type.

Notice that I didn't say which type of activity you enjoy.

Most people who would self-describe as "introverts" are in the low-energy/introvert quadrant. They have a small amount of energy available for social activities, which they need to frequently re-charge by doing solitary activities. As a result of frequently running out of energy for social activities, they don't enjoy social activities.

Most people who would self-describe as "extroverts" are also on the "low-energy" end of the spectrum. They have low levels of patience for solitary activity, and need to re-charge by spending time with friends, going to parties, etc, in order to have the mental fortitude to sit still for a while and focus. Since they can endlessly get more energy from the company of others, they tend to enjoy social activities quite a bit.

Therefore we have certain behaviors we expect to see from "introverts". We expect them to be shy, and quiet, and withdrawn. When someone who behaves this way has to bail on a social engagement, this is expected. There's a certain affordance for it. If you spend a few hours with them, they may be initially friendly but will visibly become uncomfortable and withdrawn.

This "energy" model of personality is of course an oversimplification - it's my personal belief that everyone needs some balance of privacy and socialization and solitude and eventually overdoing one or the other will be bad for anyone - but it's a useful one.

As a high-energy introvert, my behavior often confuses people. I'll show up at a week's worth of professional events, be the life of the party, go out to dinner at all of them, and then disappear for a month. I'm not visibly shy - quite the opposite, I'm a gregarious raconteur. In fact, I quite visibly enjoy the company of friends. So, usually, when I try to explain that I am quite introverted, this claim is met with (quite understandable) skepticism.

In fact, I am quite functionally what society expects of an "extrovert" - until I hit the wall.


In endurance sports, one is said to "hit the wall" at the point where all the short-term energy reserves in one's muscles are exhausted, and there is a sudden, dramatic loss of energy. Regardless, many people enjoy endurance sports; part of the challenge of them is properly managing your energy.

This is true for me and social situations. I do enjoy social situations quite a bit! But they are nevertheless quite taxing for me, and without prolonged intermissions of solitude, eventually I get to the point where I can no longer behave as a normal social creature without an excruciating level of effort and anxiety.

Several years ago, I attended a prolonged social event[2] where I hit the wall, hard. The event itself was several hours too long for me, involved meeting lots of strangers, and in the lead-up to it I hadn't had a weekend to myself for a few weeks due to work commitments and family stuff. Towards the end I noticed I was developing a completely flat affect, and had to start very consciously performing even basic body language, like looking at someone while they were talking or smiling. I'd never been so exhausted and numb in my life; at the time I thought I was just stressed from work.

Afterwards though, I started having a lot of weird nightmares, even during the daytime. This concerned me, since I'd never had such a severe reaction to a social situation, and I didn't have good language to describe it. It was also a little perplexing that what was effectively a nice party, the first half of which had even been fun for me, would cause such a persistent negative reaction after the fact. After some research, I eventually discovered that such involuntary thoughts are a hallmark of PTSD.

While I've managed to avoid this level of exhaustion before or since, this was a real learning experience for me that the consequences of incorrectly managing my level of social interaction can be quite severe.

I'd rather not do that again.


The reason I'm writing this, though[3], is not to avoid future anxiety. My social energy reserves are quite large enough, and I now have enough self-knowledge, that it is extremely unlikely I'd ever find myself in that situation again.

The reason I'm writing is to help people understand that I'm not blowing them off because I don't like them. Many times now, I've declined or bailed on an invitation from someone, and later heard that they felt hurt that I was passive-aggressively refusing to be friendly.

I certainly understand this reaction. After all, if you see someone at a party and they're clearly having a great time and chatting with everyone, but then when you invite them to do something, they say "sorry, too much social stuff", that seems like a pretty passive-aggressive way to respond.

You might even still be skeptical after reading this. "Glyph, if you were really an introvert, surely, I would have seen you looking a little shy and withdrawn. Surely I'd see some evidence of stage fright before your talks."

But that's exactly the problem here: no, you wouldn't.

At a social event, since I have lots of energy to begin with, I'll build up a head of steam on burning said energy that no low-energy introvert would ever risk. If I were to run out of social-interaction-juice, I'd be in the middle of a big crowd telling a long and elaborate story when I find myself exhausted. If I hit the wall in that situation, I can't feel a little awkward and make excuses and leave; I'll be stuck creepily faking a smile like a sociopath and frantically looking for a way out of the conversation for an hour, as the pressure from a large crowd of people rapidly builds up months' worth of nightmare fuel from my spiraling energy deficit.

Given that I know that's what's going to happen, you won't see me when I'm close to that line. You won't be at my desk when I silently sit and type for a whole day, or on my couch when I quietly read a book for ten hours at a time. My solitary side is, by definition, hidden.

But, if I don't show up to your party, I promise: it's not you, it's me.


  1. In all seriousness: this is a comparison of kind and not of degree. I absolutely do not have any illusions that my minor mental issues are a serious disability. They are - by definition, since I do not have a diagnosis - subclinical. I am describing a minor annoyance and frequent miscommunication in this post, not a personal tragedy.

  2. I'll try to keep this anonymous, so hopefully you can't guess - I don't want to make anyone feel bad about this, since it was my poor time-management and not their (lovely!) event which caused the problem.

  3. ... aside from the hope that maybe someone else has had trouble explaining the same thing, and this will be a useful resource for them ...

17 Sep 2016 9:18pm GMT

16 Sep 2016

Itamar Turner-Trauring: Introducing the Programmer's Guide to a Sane Workweek

I'm working on a book: The Programmer's Guide to a Sane Workweek, a guide to how you can achieve a saner, shorter workweek. If you want to hear more about the book, receive free excerpts and updates, and get a discount when the book is released, then sign up for the email subscription at the end of the post. Meanwhile, here's the first excerpt from the book:

A sane workweek is achievable: for the past 4 years I've been working less than 40 hours a week.

Soon after my daughter was born I quit my job as a product manager at Google and became a part-time consultant, writing software for clients. I wrote code for 20-something hours each week while our child was in daycare, and I spent the rest of my time taking care of our kid.

Later I got a job with one of my clients, a startup, where I worked as an employee but had a 28-hour workweek. These days I work at another startup, with a 35-hour workweek.

I'm not the only software engineer who has chosen to work a saner, shorter workweek. There are contractors who work part-time, spending the rest of their time starting their own business. There are employees with specialized skills who only work two days a week. There are even entrepreneurs who have deliberately created a business that isn't all-consuming.

Would you like to join us?

If you're a software developer working crazy hours then this book can help you get to a saner schedule. Of course what makes a schedule sane or crazy won't be the same for me as it is for you. You should spend some time thinking about what exactly it is that you want.

How much time do you want to spend working each week?

Depending on what you want there are different paths you can pursue.

Some paths to a saner workweek

Here are some ways you can reduce your workweek; I'll cover them in far more detail in later chapters of the book:

Normalizing your workweek

If you're working a lot more than 40 hours a week you always have the option of unilaterally normalizing your hours. That is, reducing your hours down to 40 hours or 45 hours or whatever you think is fair. Chances are your productivity and output will actually increase. You might face problems, however, if your employer cares more about hours "worked" than about output.

Reducing overhead

Chances are that the hours your employer counts as your work are just part of the time you spend on your job. In particular, commuting can take another large bite out of your free time. Cut down on commuting and long lunch breaks and you've gotten some of that time back without any reduction in the hours your boss cares about.

Negotiating a shorter workweek at your current job

If you want a shorter-than-normal workweek you can try to negotiate that at your current job. Your manager doesn't want to replace a valued, trained employee: hiring new people is expensive and risky. That means you have an opening to negotiate shorter hours. This is one of the most common ways software engineers I know have reduced their hours.

Find a shorter workweek at a new job

If you're looking for a 40-hour workweek this is mostly about screening for a good company culture as part of your interview process. If you want a shorter-than-normal workweek you will need to negotiate a better job offer. That usually means your salary but you can sometimes negotiate shorter working hours. This path can be tricky; I've managed to do it, but have also been turned down, and I know of other people who have failed. It's easier if you've already worked for the company as a consultant, so they know what they're getting. Alternatively if your previous (ideally, your current) job gave you a shorter workweek you'll have better negotiating leverage.

Long-term contracts

Instead of working as an employee you can take on long-term contract work, often through an agency. The contract can specify how many hours you will work, and shorter workweeks are sometimes possible. You can even get paid overtime!

Consulting

Instead of taking on long-term work, which is similar in many ways to being an employee, you go out and find project work for yourself. That means you need to spend something like half your time on marketing. By marketing well and providing high value to your clients you can charge high rates, allowing you to work reasonable hours.

Product business

All the paths so far involved exchanging money for time, in one form or another. As a software engineer you have another choice: you can make a product once and easily sell that same product multiple times. That means your income is no longer directly tied to how many hours you work. You'll need marketing and other business skills to do so, and you won't just be writing code.

Early retirement

Finally, if you don't want to work ever again there is the path of early retirement. That doesn't mean you can't make money afterwards; it means you no longer have to make a living, because you've earned enough that your time is your own. To get there you'll need very low living expenses, and a high saving rate while you're still working. Luckily programmers tend to get paid well.

Which path will you take?

Each of these paths has its own set of requirements and trade-offs, so it's worth considering which one fits your needs. At different times of your life you might prefer one path, and later you might prefer another. For example, I've worked as both a consultant and a part-time employee.

What kind of work environment do you want right now?

A later chapter will cover choosing your path in more detail. For now, take a little time to think it through and imagine what your ideal job would be like. Combine that with your weekly hours goal and you should get some sense of which path is best for you.

It won't be easy

Working a sane workweek is not something corporate culture encourages, at least in the US. That means you won't be following the default, easy path that most workers do: you're going to need to do some work to get to your destination. In later chapters I'll explain how you can acquire the prerequisites for your chosen path, but for now here's a summary:

How much do you really want to work a sane workweek? Do you care enough to make the necessary effort?

It won't be easy, but I think it's worth it.

Shall we get started?

16 Sep 2016 4:00am GMT

15 Sep 2016

Moshe Zadka: Post-Object-Oriented Design

In the beginning came the so-called "procedural" style. Data was data, and behavior, implemented as procedures, was a separate thing. Object-oriented design is the idea of bundling data and behavior into a single thing, usually called a "class". In return for having to tie the two together, the thought went, we would get polymorphism.

Polymorphism is pretty neat. We send different objects the same message, for example, "turn yourself into a string", and they respond appropriately - each according to their uniquely defined behavior.
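As a point of reference, here is a minimal sketch of the classic object-oriented version of "turn yourself into a string". The Celsius and Point classes are made up for illustration; the point is only that the behavior lives inside each class.

```python
class Celsius(object):
    """Hypothetical example class: a temperature reading."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __str__(self):
        return "%s C" % (self.degrees,)


class Point(object):
    """Hypothetical example class: a 2D point."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return "(%s, %s)" % (self.x, self.y)


# The caller sends the same "message" (str) to both objects, and each
# one answers according to the method bundled into its own class.
rendered = [str(thing) for thing in (Celsius(21), Point(1, 2))]
print(rendered)
```

Adding a new operation in this style means editing every class, which is the coupling the rest of the post tries to avoid.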

But what if we could separate the data and behavior, and still get polymorphism? This is the idea behind post-object-oriented design.

In Python, we achieve this with two external packages. One is the "attr" package, which provides a convenient way to define bundles of data that still exhibit the minimum amount of behavior we do want: initialization, string representation, hashing and more.

The other is the "singledispatch" package (available as functools.singledispatch in Python 3.4+).

import attr
import singledispatch
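On Python 3.4 and later the external package is not needed. A small sketch of an import that prefers the standard library and falls back to the backport might look like the following; note that it binds the decorator itself to the name singledispatch, whereas the examples in this post spell it singledispatch.singledispatch. The describe function is a made-up example.

```python
try:
    # Python 3.4 and later ship the decorator in the standard library.
    from functools import singledispatch
except ImportError:
    # Older interpreters need the "singledispatch" backport from PyPI.
    from singledispatch import singledispatch


@singledispatch
def describe(obj):
    # Default implementation, used when no more specific type matches.
    return "something else"


@describe.register(int)
def _describe_int(number):
    return "an integer"


print(describe(3), "/", describe("three"))
```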

In order to be specific, we imagine a simple protocol. The low-level details of the protocol do not concern us, but we assume some lower-level parsing allows us to communicate in dictionaries back and forth (perhaps serialized/deserialized using JSON).

Our protocol is one to send changes to a map. The only two messages are "set", to set a key to a given value, and "delete", to delete a key.

messages = (
{
    'type': 'set',
    'key': 'language',
    'value': 'python'
},
{
    'type': 'delete',
    'key': 'human'
}
)

We want to represent those as attr-based classes.

@attr.s
class Set(object):
    key = attr.ib()
    value = attr.ib()

@attr.s
class Delete(object):
    key = attr.ib()

print(Set(key='language', value='python'))
print(Delete(key='human'))
Set(key='language', value='python')
Delete(key='human')

When incoming dictionaries arrive, we want to convert them to the logical classes. In this example the code could hardly be simpler, mostly because the protocol itself is simple.

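def from_dict(dct):
    # Work on a copy so the caller's dictionary is not mutated.
    dct = dict(dct)
    tp = dct.pop('type')
    # Map each wire-level type name to the class that represents it.
    name_to_klass = dict(set=Set, delete=Delete)
    try:
        klass = name_to_klass[tp]
    except KeyError:
        raise ValueError('unknown type', tp)
    # The remaining keys match the attribute names of the class.
    return klass(**dct)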

Note how we take advantage of the fact that attr-based classes accept correctly-named keyword arguments.

from_dict(dict(type='set', key='name', value='myname')), from_dict(dict(type='delete', key='data'))
(Set(key='name', value='myname'), Delete(key='data'))

But this was easy! There was no need for polymorphism: we always get one type in (dictionaries), and we consult a mapping to decide which type to produce.

However, for serialization, we do need polymorphism. Enter our second tool - the singledispatch package. The default function is equivalent to a method defined on "object": the ultimate super-class. Since we do not want to serialize generic objects, our default implementation errors out.

@singledispatch.singledispatch
def to_dict(obj):
    raise TypeError("cannot serialize", obj)

Now, we implement the actual serializers. The names of the functions are not important. To emphasize they should not be used directly, we make them "private" by prepending an underscore.

@to_dict.register(Set)
def _to_dict_set(st):
    return dict(type='set', key=st.key, value=st.value)

@to_dict.register(Delete)
def _to_dict_delete(dlt):
    return dict(type='delete', key=dlt.key)

Indeed, we do not call them directly.

print(to_dict(Set(key='k', value='v')))
print(to_dict(Delete(key='kk')))
{'type': 'set', 'value': 'v', 'key': 'k'}
{'type': 'delete', 'key': 'kk'}

However, arbitrary objects cannot be serialized.

try:
    to_dict(object())
except TypeError as e:
    print(e)
('cannot serialize', <object object at 0x7fbdb254ac60>)
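One way to check that the two directions agree is a round trip. The sketch below is self-contained and makes two substitutions so that it runs with only the standard library (Python 3.7+): dataclasses stand in for the attr-based classes, and functools.singledispatch stands in for the external package. The structure otherwise follows the code above.

```python
from dataclasses import asdict, dataclass
from functools import singledispatch


@dataclass(frozen=True)
class Set:
    key: str
    value: str


@dataclass(frozen=True)
class Delete:
    key: str


NAME_TO_CLASS = {'set': Set, 'delete': Delete}


def from_dict(dct):
    fields = dict(dct)  # copy, so the caller's dictionary is untouched
    tp = fields.pop('type')
    try:
        klass = NAME_TO_CLASS[tp]
    except KeyError:
        raise ValueError('unknown type', tp)
    return klass(**fields)


@singledispatch
def to_dict(obj):
    raise TypeError("cannot serialize", obj)


@to_dict.register(Set)
def _to_dict_set(st):
    return dict(asdict(st), type='set')


@to_dict.register(Delete)
def _to_dict_delete(dlt):
    return dict(asdict(dlt), type='delete')


commands = [Set(key='language', value='python'), Delete(key='human')]
# Serializing and then parsing again should give back equal objects.
round_tripped = [from_dict(to_dict(command)) for command in commands]
print(round_tripped == commands)
```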

Now that the structure of adding such an "external method" has been shown, another example can be given: "act on": applying the changes requested to an in-memory map.

@singledispatch.singledispatch
def act_on(command, d):
    raise TypeError("Cannot act on", command)

@act_on.register(Set)
def act_on_set(st, d):
    d[st.key] = st.value

@act_on.register(Delete)
def act_on_delete(dlt, d):
    del d[dlt.key]

d = {}
act_on(Set(key='name', value='woohoo'), d)
print("After setting")
print(d)
act_on(Delete(key='name'), d)
print("After deleting")
print(d)
After setting
{'name': 'woohoo'}
After deleting
{}

In this case, we kept the functionality "near" the code. However, note that the functionality could be implemented in a different module: these functions, even though they are polymorphic, follow Python namespace rules. This is useful: several different modules could implement "act_on": for example, an in-memory map (as we defined above), a module using Redis or a module using a SQL database.
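To make the "different module" idea concrete, here is a sketch of a second act_on that records commands in a journal instead of applying them to a map. The journal backend is a hypothetical example, and the sketch again uses dataclasses and functools.singledispatch (Python 3.7+) in place of the external packages so that it is self-contained.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class Set:
    key: str
    value: str


@dataclass
class Delete:
    key: str


# Imagine everything below living in a hypothetical "journal_backend"
# module. Its act_on is a separate generic function from the in-memory
# one, and the two never collide because each lives in its own namespace.
@singledispatch
def act_on(command, journal):
    raise TypeError("Cannot act on", command)


@act_on.register(Set)
def _journal_set(st, journal):
    journal.append(('set', st.key, st.value))


@act_on.register(Delete)
def _journal_delete(dlt, journal):
    journal.append(('delete', dlt.key))


journal = []
for command in (Set(key='name', value='woohoo'), Delete(key='name')):
    act_on(command, journal)
print(journal)
```

The data classes did not have to change to gain this second behavior, which is the main payoff of keeping behavior outside the classes.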

Actual methods are not completely obsolete. It would still be best to make methods do anything that would require private attribute access. In simple cases, as above, there is no difference between the public interface and the public implementation.

15 Sep 2016 6:03am GMT

09 Sep 2016

Itamar Turner-Trauring: How to choose a side project

If you're a programmer just starting out you'll often get told to work on side projects, beyond what you do at school or work. But there are so many things you could be doing: what should you be working on? How do you choose a side project you will actually finish? How will you make sure you're learning something?

Keep in mind that you don't actually have to work on side projects to be a good programmer. I know many successful software engineers who code only at their job and spend their free time on other hobbies. But if you do want to work on software in your spare time there are two different approaches you can take.

To understand these approaches let's consider a real side project that managed to simultaneously both succeed and fail.

Long ago, in an Internet far far away

Back in 2000 my friend Glyph started a small project called Twisted Reality. It was supposed to be a game engine, with the goal of implementing a particularly complex and sophisticated game.

Since the game had a chat system, a web server, and other means of communication, the game grew a networking engine. Glyph and his friends hung out on the Internet Relay Chat (IRC) Python channel and every time someone asked a networking question they'd tell them "use Twisted Reality!" Over time more people would show up needing a small feature added to the networking engine, so they'd submit a patch. That's how I became a Twisted Reality contributor.

Eventually the networking engine grew so big that Twisted Reality was split into two projects: the Twisted networking framework and the Reality game engine. These days Twisted is used by companies like Apple, Cisco and Yelp, and is still going strong. The game engine has been through multiple rewrites, but the game it was designed for has never been built.

Approach #1: solving a problem

The difference between Twisted, a successful side project, and the game that never got written is that Twisted solved a specific, limited problem. If you need to write some networking code in Python then Twisted will help you get it done quickly and well. The game, however, was so ambitious that it was never started: there was always another simulation feature to be added to the game engine first.

If you are building a side project choose one that solves a specific, limited problem. For example, let's say you feel you're wasting time playing on Facebook when you should be doing homework.

  1. "Build the best time tracking app ever" is neither limited nor specific, nor is it really a problem you're solving.
  2. "I want to keep track of how much time I spend actually working on homework vs. procrastinating" is better, but still not quite problem-driven.
  3. A good problem statement is "I want to prevent myself from visiting Facebook and other specific websites while I'm working on homework." At this point you have a clear sense of what software you're building.

Why a specific and limited problem?

Approach #2: artificial limits

How do you choose a side project if you don't have any specific problems in mind? The key is to still have constraints and limits so that your project is small, achievable and has a clear goal.

One great way to do that is to set a time limit. I'm not a fan of hackathons, since they promote the idea that sleeplessness and working crazy hours is a reasonable way to write software. But with a longer time frame, building something specific under a time limit can be a great way to create a side project.

The PyWeek project for example has you build a game in one week, using a theme chosen by the organizers. Building a game isn't solving a problem, but it can still be fun and educational. And the one week limit will ensure you focus your efforts and achieve something concrete.

Software has no value

Whether you decide to solve a problem or to set artificial time limits on your side project, the key is having constraints and a clear goal. Software is just a tool: there is no inherent value in producing more of it. Value comes from solving problems, or from the entertainment a game provides. A half-solved problem or a half-finished game is valueless, so you want your initial goal to be small and constrained.

I've learned this the hard way, focusing on the value of my code instead of on the problems it solved. If you want to avoid that and other mistakes I've made over 20 years of writing software check out my career as a Software Clown.

09 Sep 2016 4:00am GMT

28 Aug 2016

Twisted Matrix Laboratories: Twisted 16.4.0 Released

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 16.4.0.

The highlights of this release are:

For more information, check the NEWS file (link provided below).

You can find the downloads on PyPI (or alternatively our website). The NEWS file is also available on GitHub.

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,
Amber Brown (HawkOwl)

PS: Twisted 16.4.1 will be coming soon after this with a patch mitigating SWEET32, by updating the acceptable cipher list.

28 Aug 2016 1:48am GMT

25 Aug 2016

Itamar Turner-Trauring: From 10x programmer to 0.1x programmer: creating more with less

You've heard of the mythical 10x programmers, programmers who can produce ten times as much as us normal humans. If you want to become a better programmer this myth is demoralizing, but it's also not useful: how can you write ten times as much code? On the other hand, consider the 0.1x programmer, a much more useful concept: anyone can choose to write only 10% as much code as a normal programmer would. As they say in the business world, becoming a 0.1x programmer is actionable.

Of course writing less code might seem problematic, so let's refine our goal a little. Can you write 10% as much code as you do now and still do just as well at your job, still fixing the same amount of bugs, still implementing the same amount of features? The answer may still be "no", but at least this is a goal you can more easily work towards incrementally.

Doing more with less code

How do you achieve just as much while writing less code?

1. Use a higher level programming language

As it turns out many of us are 0.1x programmers without even trying, compared to previous generations of programmers that were stuck with lower-level programming languages. If you don't have to worry about manual memory management or creating a data structure from scratch you can write much less code to achieve the same goal.

2. Use existing code

Instead of coding from scratch, use an existing library that achieves the same thing. For example, earlier this week I was looking at the problem of incrementing version numbers in source code and documentation as part of a release. A little searching and I found an open source tool that did exactly what I needed. Because it's been used by many people and improved over time chances are it's better designed, better tested, and less buggy than my first attempt would have been.

3. Spend some time thinking

Surprisingly, spending more time planning up front can save you time in the long run. If you have 2 days to fix a bug it's worth spending 10% of that time, an hour and a half, to think about how to solve it. Chances are the first solution you come up with in the first 5 minutes won't be the best solution, especially if it's a hard problem. Spend an hour more thinking and you might come up with a solution that takes two hours instead of two days.

4. Good enough features

Most feature requests have three parts:

  1. The stuff the customer must have.
  2. The stuff that is nice to have but not strictly necessary.
  3. The stuff the customer is willing to admit is not necessary.

The last category is usually dropped in advance, but you're usually still asked to implement the middle category of things that the customer and product manager really really want but aren't actually strictly necessary. So figure out the real minimum path to implement a feature, deliver it, and much of the time it'll turn out that no one will miss those nice-to-have additions.

5. Drop the feature altogether

Some features don't need to be done at all. Some features are better done a completely different way than requested.

Instead of saying "yes, I'll do that" to every feature request, make sure you understand why someone needs the feature, and always consider alternatives. If you come up with a faster, superior idea the customer or product manager will usually be happy to go along with your suggestion.

6. Drop the product altogether

Sometimes your whole product is not worth doing: it will have no customers, will garner no interest. Spending months and months on a product no one will ever use is a waste of time, not to mention depressing.

Lean Startup is one methodology for dealing with this: before you spend any time developing the product you do the minimal work possible to figure out if it's worth doing in the first place.

Conclusion

Your goal as a programmer is not to write code; your goal is to solve problems. From low-level programming decisions to high-level business decisions there are many ways you can solve problems with less code. So don't start with "how do I write this code?", start with "how do I solve this problem?" Sometimes you'll do better not solving the problem at all, or redefining it. As you get better at solving problems with less code you will find yourself becoming more productive, especially if you start looking at the big picture.

Being productive is a great help if you're tired of working crazy hours. Want a shorter workweek? Check out The Programmer's Guide to a Sane Workweek.

25 Aug 2016 4:00am GMT

Moshe Zadka: Time Series Data

When operating computers, we are often exposed to so-called "time series". Whether it is database latency, page fault rate or total memory used, these are all exposed as numbers that are usually sampled at frequent intervals.

However, computer engineers are not the only ones exposed to such data. It is worthwhile to know what other disciplines deal with it, and what they do with it. "Earth sciences" (geology, climate, etc.) have a lot of numbers, and often need to analyze trends and make predictions. Sometimes these predictions have, literally, billions of dollars' worth of decisions hinging on them. It is worthwhile to read some of the textbooks for students of those disciplines to see how to approach those series.

Another discipline that needs to visually inspect time series data is medicine. EKG data is often vital for assessing patients' health, especially when compared to their historical records. For that, the data needs to be saved. A lot of EKG research has been done on how to compress numerical data but still keep it "visually the same". While the research on that is not as rigorous, and not as settled, as the trend analysis in geology, it is still useful to look into. Indeed, even the basics are already better than so-called "roll-ups", which preserve none of the visual distinction of the data, flattening peaks and filling valleys while keeping a score of "standard deviation" that is not as helpful as is usually hoped for.
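The effect of a roll-up on a spike is easy to see on a toy series. The sketch below is not taken from the EKG literature; it simply contrasts an averaging roll-up with a per-bucket minimum and maximum, one simple way (chosen here for illustration) to keep a spike visible. The latency numbers are invented.

```python
def mean_rollup(samples, bucket_size):
    """Average each bucket: the classic roll-up."""
    result = []
    for start in range(0, len(samples), bucket_size):
        bucket = samples[start:start + bucket_size]
        result.append(sum(bucket) / float(len(bucket)))
    return result


def min_max_rollup(samples, bucket_size):
    """Keep the extremes of each bucket so spikes stay visible."""
    result = []
    for start in range(0, len(samples), bucket_size):
        bucket = samples[start:start + bucket_size]
        result.append((min(bucket), max(bucket)))
    return result


# Invented latency samples in milliseconds, with one spike to 90.
latency_ms = [10, 10, 10, 90, 10, 10, 10, 10]
print(mean_rollup(latency_ms, 4))     # the spike is smeared down to 30
print(min_max_rollup(latency_ms, 4))  # the spike survives as a max of 90
```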

25 Aug 2016 3:50am GMT

24 Aug 2016

Hynek Schlawack: Hardening Your Web Server’s SSL Ciphers

There are many wordy articles on configuring your web server's TLS ciphers. This is not one of them. Instead I will share a configuration which is both compatible enough for today's needs and scores a straight "A" on Qualys's SSL Server Test.

24 Aug 2016 3:40pm GMT

22 Aug 2016

Hynek Schlawack: Better Python Object Serialization

The Python standard library is full of underappreciated gems. One of them allows for simple and elegant function dispatching based on argument types. This makes it perfect for serialization of arbitrary objects - for example to JSON in web APIs and structured logs.
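Assuming the gem in question is functools.singledispatch, which matches the description of dispatching on argument types, a minimal sketch of using it as the default hook of json.dumps could look like this. The to_serializable name and the payload are made up for illustration.

```python
import json
from datetime import datetime
from functools import singledispatch


@singledispatch
def to_serializable(value):
    """Fallback for types without a registered serializer."""
    return str(value)


@to_serializable.register(datetime)
def _serialize_datetime(value):
    # Datetimes are rendered in ISO 8601 format.
    return value.isoformat()


payload = {"event": "release", "when": datetime(2016, 8, 22, 12, 30)}
# json.dumps calls "default" for every object it cannot encode itself.
encoded = json.dumps(payload, default=to_serializable, sort_keys=True)
print(encoded)
```

Supporting another type is then a matter of registering one more function, without touching the call to json.dumps.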

22 Aug 2016 12:30pm GMT

20 Aug 2016

feedPlanet Twisted

Moshe Zadka: Extension API: An exercise in a negative case study

I was idly contemplating implementing a new Jupyter kernel. Luckily, they try to provide facilities to make that easier. Unfortunately, they made a number of suboptimal choices in their API. Fortunately, those mistakes are both common and easily avoidable.

Subclassing as API

They suggest subclassing IPython.kernel.zmq.kernelbase.Kernel. Errr…not "suggest". It is a "required step". The reason is probably that this class already implements 21 methods. When you subclass, make sure not to use any of these names, or things will break randomly. If you do not want to subclass, good luck figuring out what assumptions the system makes about these 21 methods, because there is no interface or even prose documentation.

The return statement in their example is particularly illuminating:

        return {'status': 'ok',
                # The base class increments the execution count
                'execution_count': self.execution_count,
                'payload': [],
                'user_expressions': {},
               }

Note the comment "base class increments the execution count". This is a classic code smell: it seems this would be needed in every single override, which means it really belongs in the helper class, not in every kernel.
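One way to move that bookkeeping into the helper class is sketched below. The classes `KernelBase` and `EchoKernel` and the `execute` wrapper are hypothetical and are not the Jupyter API; they only illustrate the base class filling in the field that every subclass would otherwise repeat.

```python
class KernelBase:
    """Hypothetical base class that owns the execution-count bookkeeping."""

    def __init__(self):
        self.execution_count = 0

    def execute(self, code):
        # The base class increments the count and stamps it on the reply,
        # so subclasses never have to mention it.
        self.execution_count += 1
        reply = self.do_execute(code)
        reply["execution_count"] = self.execution_count
        return reply

    def do_execute(self, code):
        raise NotImplementedError


class EchoKernel(KernelBase):
    """Hypothetical kernel: it only describes its own result."""

    def do_execute(self, code):
        return {"status": "ok", "payload": [], "user_expressions": {}}


kernel = EchoKernel()
first = kernel.execute("1 + 1")
second = kernel.execute("2 + 2")
```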

None

The signature for the example do_execute is:

    def do_execute(self, code, silent, store_history=True, 
                   user_expressions=None,
                   allow_stdin=False):

Of course, this means that user_expressions will sometimes be a dictionary and sometimes None. It is likely that the code will be written to anticipate one or the other, and will fail in interesting ways if None is actually sent.
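A minimal sketch of one way to contain the problem: normalize the argument once, at the top of the function, so the rest of the body only ever sees a dictionary. The function below is a standalone stand-in with the same parameters, not the real method, and it only echoes the expressions back instead of evaluating them.

```python
def do_execute(code, silent, store_history=True,
               user_expressions=None, allow_stdin=False):
    # Normalize at the boundary: from here on it is always a dict.
    if user_expressions is None:
        user_expressions = {}
    # A real kernel would evaluate each expression; this sketch echoes them.
    evaluated = dict(user_expressions)
    return {"status": "ok", "user_expressions": evaluated}


without_expressions = do_execute("x = 1", silent=False)
with_expressions = do_execute("x = 1", silent=False,
                              user_expressions={"double": "x * 2"})
```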

Optional Overrides

As described in this section there are also ways to make the kernel better with optional overrides. The convention used, which is nowhere explained, is that do_ methods mean you should override to make a better kernel. Nowhere is it explained why there is no default history implementation, or where to get one, or why a simple stupid implementation is wrong.

Dictionaries

All overrides return dictionaries, which get serialized directly into the underlying communication platform. This is a poor abstraction, especially when the documentation consists of direct links to the underlying protocol. When wrapping a protocol, it is much nicer to use an Interface as the documentation of what is assumed - and define an attr.s-based class to allow returning something which is automatically the correct type, and will fail in nice ways if a parameter is forgotten.
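The idea can be sketched with the standard library's `dataclasses`, used here as a stand-in for the attr.s-based class the text recommends so that the example has no third-party dependency. The class name `ExecuteReply` and the `to_message` method are hypothetical; the field names come from the example reply above.

```python
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class ExecuteReply:
    """Hypothetical typed reply; required fields have no default."""

    status: str
    execution_count: int
    payload: list = field(default_factory=list)
    user_expressions: dict = field(default_factory=dict)

    def to_message(self):
        """Convert to the plain dictionary the wire protocol expects."""
        return asdict(self)


message = ExecuteReply(status="ok", execution_count=1).to_message()

# Forgetting a required parameter fails immediately, at construction time.
try:
    ExecuteReply(status="ok")
    forgot_parameter_error = None
except TypeError as error:
    forgot_parameter_error = error
```

The failure happens where the mistake was made, instead of surfacing later as a malformed message on the wire.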

Summary

If you are providing an API, here are a few positive lessons based on the issues above:

20 Aug 2016 6:56pm GMT

19 Aug 2016

feedPlanet Twisted

Twisted Matrix Laboratories: Twisted 16.3.2 Released

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 16.3.2.

This is a bug fix & security fix release, and is recommended for all users of Twisted. The fixes are:

For more information, check the NEWS file (link provided below).

You can find the downloads on PyPI (or alternatively our website). The NEWS file is also available on GitHub.

Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!

Twisted Regards,
Amber Brown (HawkOwl)

19 Aug 2016 9:45am GMT

18 Aug 2016

feedPlanet Twisted

Jonathan Lange: Patterns are half-formed code

If "technology is stuff that doesn't work yet"[1], then patterns are code we don't know how to write yet.

In the Go Programming Language, the authors show how to iterate over elements in a map, sorted by keys:

To enumerate the key/value pairs in order, we must sort the keys explicitly, for instance, using the Strings function from the sort package if the keys are strings. This is a common pattern.

-Go Programming Language, Alan A. A. Donovan & Brian W. Kernighan, p94

The pattern is illustrated by the following code:

import "sort"

var names []string
for name := range ages {
    names = append(names, name)
}
sort.Strings(names)
for _, name := range names {
    fmt.Printf("%s\t%d\n", name, ages[name])
}

Peter Norvig calls this an informal design pattern: something referred to by name ("iterate through items in a map in order of keys") and re-implemented from scratch each time it's needed.

Informal patterns have their place but they are a larval form of knowledge, stuck halfway between intuition and formal understanding. When we recognize a pattern, our next step should always be to ask, "can we make it go away?"

Patterns are one way of expressing "how to" knowledge [2] but we have another, better way: code. Source code is a formal expression of "how to" knowledge that we can execute, test, manipulate, verify, compose, and re-use. Encoding "how to" knowledge is largely what programming is [3]. We talk about replacing people with programs precisely because we take the knowledge about how to do their job and encode it such that even a machine can understand it.

So how can we encode the knowledge of iterating through the items in a map in order of keys? How can we replace this pattern with code?

We can start by following Peter Norvig's example and reach for a dynamic programming language, such as Python:

names = []
for name in ages:
    names.append(name)
names.sort()
for name in names:
    print("{}\t{}".format(name, ages[name]))

This is a very literal translation of the first snippet. A more idiomatic approach would look like:

names = sorted(ages.keys())
for name in names:
    print("{}\t{}".format(name, ages[name]))

To turn this into a formal pattern, we need to extract a function that takes a map and returns a list of pairs of (key, value) in sorted order, like so:

def sorted_items(d):
    result = []
    sorted_keys = sorted(d.keys())
    for k in sorted_keys:
        result.append((k, d[k]))
    return result

for name, age in sorted_items(ages):
    print("{}\t{}".format(name, age))

The pattern has become a function. Instead of a name or a description, it has an identifier, a True Name that gives us power over the thing. When we invoke it we don't need to comment our code to indicate that we are using a pattern because the name sorted_items makes it clear. If we choose, we can test it, optimize it, or perhaps even prove its correctness.

If we figure out a better way of doing it, such as:

def sorted_items(d):
    return [(k, d[k]) for k in sorted(d.keys())]

Then we only have to change one place.

And if we are willing to tolerate a slight change in behavior,

def sorted_items(d):
    return sorted(d.items())

Then we might not need the function at all.
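Because the pattern is now a function, the claim that these rewrites are interchangeable can be checked rather than assumed. A small sketch follows; the three function names and the sample `ages` dictionary are hypothetical.

```python
def sorted_items_loop(d):
    """The first, explicit version."""
    result = []
    for k in sorted(d.keys()):
        result.append((k, d[k]))
    return result


def sorted_items_comprehension(d):
    """The list-comprehension rewrite."""
    return [(k, d[k]) for k in sorted(d.keys())]


def sorted_items_builtin(d):
    """The version that sorts the (key, value) pairs directly."""
    return sorted(d.items())


# Hypothetical sample data.
ages = {"carol": 41, "alice": 30, "bob": 25}

results = [
    sorted_items_loop(ages),
    sorted_items_comprehension(ages),
    sorted_items_builtin(ages),
]
```

For ordinary string keys like these, all three return the same list of pairs.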

It was being able to write code like this that drew me towards Python and away from Java, way back in 2001. It wasn't just that I could get more done in fewer lines-although that helped-it was that I could write what I meant.

Of course, these days I'd much rather write:

import Data.List (sort)
import qualified Data.HashMap as Map

sortedItems :: (Ord k, Ord v) => Map.Map k v -> [(k, v)]
sortedItems d = sort (Map.toList d)

But that's another story.

[1] Bran Ferren, via Douglas Adams
[2] Patterns can also contain "when to", "why to", "why not to", and "how much" knowledge, but they _always_ contain "how to" knowledge.
[3] The excellent SICP lectures open with the insight that what we call "computer science" might be the very beginning of a science of "how to" knowledge.

18 Aug 2016 5:00pm GMT

Itamar Turner-Trauring: Less stress, more productivity: why working fewer hours is better for you and your employer

Update: This post got to #1 on Hacker News and the /r/programming subreddit, and had over 40,000 views. Given that level of interest in the subject I've decided to write The Programmer's Guide to a Sane Workweek.

There's always too much work to be done on software projects, too many features to implement, too many bugs to fix. Some days you're just not going through the backlog fast enough, you're not producing enough code, and it's taking too long to fix a seemingly-impossible bug. And to make things worse you're wasting time in pointless meetings instead of getting work done.

Once it gets bad enough you can find yourself always scrambling, working overtime just to keep up. Pretty soon it's just expected, and you need to be available to answer emails at all hours even when there are no emergencies. You're tired and burnt out and there's still just as much work as before.

The real solution is not working even harder or even longer, but rather the complete opposite: working fewer hours.

Some caveats first:

Fewer hours, more productivity

Why does working longer hours not improve the situation? Because working longer makes you less productive at the same time that it encourages bad practices by your boss. Working fewer hours does the opposite.

1. A shorter work-week improves your ability to focus

As I've discussed before, working while tired is counter-productive. It takes longer and longer to solve problems, and you very quickly hit the point of diminishing returns. And working consistently for long hours is even worse for your mental focus, since you will quickly burn out.

Long hours: "It's 5 o'clock and I should be done with work, but I just need to finish this problem, just one more try," you tell yourself. But being tired it actually takes you another three hours to solve. The next day you go to work tired and unfocused.

Shorter hours: "It's 5 o'clock and I wish I had this fixed, but I guess I'll try tomorrow morning." The next morning, refreshed, you solve the problem in 10 minutes.

2. A shorter work-week promotes smarter solutions

Working longer hours encourages bad programming habits: you start thinking that the way to solve problems is just forcing yourself to get through the work. But programming is all about automation, about building abstractions to reduce work. Often you can get huge reductions in effort by figuring out a better way to implement an API, or that a particular piece of functionality is not actually necessary.

Let's imagine your boss hands you a task that must ship to your customer in 2 weeks. And you estimate that optimistically it will take you 3 weeks to implement.

Long hours: "This needs to ship in two weeks, but I think it's 120 hours to complete... so I guess I'm working evenings and weekends again." You end up even more burnt out, and probably the feature will still ship late.

Shorter hours: "I've got two weeks, but this is way too much work. What can I do to reduce the scope? Guess I'll spend a couple hours thinking about it."

And soon: "Oh, if I do this restructuring I can get 80% of the feature done in one week, and that'll probably keep the customer happy until I finish the rest. And even if I underestimated I've still got the second week to get that part done."

3. A shorter work-week discourages bad management practices

If your response to any issue is to work longer hours you are encouraging bad management practices. You are effectively telling your manager that your time is not valuable, and that they need not prioritize accordingly.

Long hours: If your manager isn't sure whether you should go to a meeting, they might tell themselves that "it might waste an hour of time, but they'll just work an extra hour in the evening to make it up." If your manager can't decide between two features, they'll just hand you both instead of making a hard decision.

Shorter hours: With shorter hours your time becomes more scarce and valuable. If your manager is at all reasonable, less important meetings will get skipped and more important features will be prioritized.

Getting to fewer hours

A short work-week means different things to different people. One programmer I know made clear when she started a job at a startup that she worked 40-45 hours a week and that's it. Everyone else worked much longer hours, but that was her personal limit. Personally I have negotiated a 35-hour work week.

Whatever the number that makes sense to you, the key is to clearly explain your limits and then stick to them. Tell your manager "I am going to be working a 40-hour work week, unless it's a real emergency." Once you've explained your limits you need to stick to them: no answering emails after hours, no agreeing to do just one little thing on the weekend.

And then you need to prove yourself by still being productive, and making sure that when you are working you are working. Spending a couple hours a day at work watching cat videos probably won't go well with shorter hours.

There are companies where this won't fly, of course, where management is so bad or norms are so out of whack that even a 40-hour work week by a productive team member won't be acceptable. In those cases you need to look for a new job, and as part of the interview figure out the work culture and project management practices of prospective employers. Do people work short hours or long hours? Is everything always on fire or do projects get delivered on time?

Whether you're negotiating your hours at your existing job or at a new job, you'll do better the more experienced and skilled of a programmer you are. If you want to learn how to get there check out The Programmer's Guide to a Sane Workweek.

18 Aug 2016 4:00am GMT

17 Aug 2016

feedPlanet Twisted

Glyph Lefkowitz

Probably best to get this out of the way before this weekend:

If I meet you at a technical conference, you'll probably see me extend my elbow in your direction, rather than my hand. This is because I won't shake your hand at a conference.

People sometimes joke about "con crud", but the amount of lost productivity and human misery generated by conference-transmitted sickness is not funny. Personally, by the time the year is out, I will most likely have attended 5 conferences. This means that if I get sick at each one, I will spend more than a month out of the year out of commission being sick.

When I tell people this, they think I'm a germophobe. But, in all likelihood, I won't be the one getting sick. I already have 10 years of building up herd immunity to the set of minor ailments that afflict the international Python-conference-attending community. It's true that I don't particularly want to get sick myself, but I happily shake people's hands in more moderately-sized social gatherings. I've had a cold before and I've had one again; I have no illusion that ritually dousing myself in Purell every day will make me immune to all disease.

I'm not shaking your hand because I don't want you to get sick. Please don't be weird about it!

17 Aug 2016 6:42pm GMT

14 Aug 2016

feedPlanet Twisted

Glyph Lefkowitz: A Container Is A Function Call

It seems to me that the prevailing mental model among users of container technology [1] right now is that a container is a tiny little virtual machine. It's like a machine in the sense that it is provisioned and deprovisioned by explicit decisions, and we talk about "booting" containers. We configure it sort of like we configure a machine; dropping a bunch of files into a volume, setting some environment variables.

In my mind though, a container is something fundamentally different than a VM. Rather than coming from the perspective of "let's take a VM and make it smaller so we can do cool stuff" - get rid of the kernel, get rid of fixed memory allocations, get rid of emulated memory access and instructions, so we can provision more of them at higher density... I'm coming at it from the opposite direction.

For me, containers are "let's take a program and make it bigger so we can do cool stuff". Let's add in the whole user-space filesystem so it's got all the same bits every time, so we don't need to worry about library management, so we can ship it around from computer to computer as a self-contained unit. Awesome!

Of course, there are other ecosystems that figured this out a really long time ago, but having it as a commodity within the most popular server deployment environment has changed things.

Of course, an individual container isn't a whole program. That's why we need tools like compose to put containers together into a functioning whole. This makes a container not just a program, but rather, a part of a program. And of course, we all know what the smaller parts of a program are called:

Functions. [2]

A container of course is not the function itself; the image is the function. A container itself is a function call.

Perceived through this lens, it becomes apparent that Docker is missing some pretty important information. As a tiny VM, it has all the parts you need: it has an operating system (in the docker build), the ability to boot and reboot (docker run), instrumentation (docker inspect), debugging (docker exec), etc. As a really big function, it's strangely anemic.

Specifically: in every programming language worth its salt, we have a type system; some mechanism to identify what parameters a function will take, and what return value it will have.

You might find this weird coming from a Python person, a language where

def foo(a, b, c):
    return a.x(c.d(b))

is considered an acceptable level of type documentation by some [3]; there's no requirement to say what a, b, and c are. However, just because the type system is implicit, that doesn't mean it's not there, even in the text of the program. Let's consider, from reading this tiny example, what we can discover:

And so on, and so on. At runtime each of these types takes on a specific, concrete value, with a type, and if you set a breakpoint and single-step into it with a debugger, you can see each of those types very easily. Also at runtime you will get TypeError exceptions telling you exactly what was wrong with what you tried to do at a number of points, if you make a mistake.
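The implicit types in that example can be written down. The sketch below uses `typing.Protocol` to say only what the body of `foo` actually requires. The concrete `str` and `int` types, and the `Length` and `Stars` classes, are invented for illustration; the original example does not say what the real types are.

```python
from typing import Protocol


class HasD(Protocol):
    """Anything with a d() method that accepts b."""

    def d(self, b: str) -> int: ...


class HasX(Protocol):
    """Anything with an x() method that accepts d()'s result."""

    def x(self, value: int) -> str: ...


def foo(a: HasX, b: str, c: HasD) -> str:
    """Pass b through c.d, then feed the result to a.x."""
    return a.x(c.d(b))


class Length:
    """Hypothetical c: measures its argument."""

    def d(self, b: str) -> int:
        return len(b)


class Stars:
    """Hypothetical a: renders a count as asterisks."""

    def x(self, value: int) -> str:
        return "*" * value


result = foo(Stars(), "abc", Length())
```

Nothing about the runtime behavior changes; the annotations only state in the text of the program what was already true of it.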

The analogy to containers isn't exact; inputs and outputs aren't obviously in the shape of "arguments" and "return values", especially since containers tend to be long-running; but nevertheless, a container does have inputs and outputs in the form of env vars, network services, and volumes.

Let's consider the "foo" of docker, which would be the middle tier of a 3-tier web application (cribbed from a real live example):

FROM pypy:2
RUN apt-get update -ym
RUN apt-get upgrade -ym
RUN apt-get install -ym libssl-dev libffi-dev
RUN pip install virtualenv
RUN mkdir -p /code/env
RUN virtualenv /code/env
RUN pwd

COPY requirements.txt /code/requirements.txt
RUN /code/env/bin/pip install -r /code/requirements.txt
COPY main /code/main
RUN chmod a+x /code/main

VOLUME /clf
VOLUME /site
VOLUME /etc/ssl/private

ENTRYPOINT ["/code/main"]

In this file, we can only see three inputs, which are filesystem locations: /clf, /site, and /etc/ssl/private. How is this different than our Python example, a language with supposedly "no type information"?

Of course, the one way that this example is unrealistic is that I deleted all the comments explaining all of those things. Indeed, best practice these days would be to include comments in your Dockerfiles, and include example compose files in your repository, to give users some hint as to how these things all wire together.

This sort of state isn't entirely uncommon in programming languages. In fact, in this popular GitHub project you can see that large programs written in assembler in the 1960s included exactly this sort of documentation convention: huge front-matter comments in English prose.

That is the current state of the container ecosystem. We are at the "late '60s assembly language" stage of orchestration development. It would be a huge technological leap forward to be able to communicate our intent structurally.


When you're building an image, you're building it for a particular purpose. You already pretty much know what you're trying to do and what you're going to need to do it.

  1. When instantiated, the image is going to consume network services. This is not just a matter of hostnames and TCP ports; those services need to be providing a specific service, over a specific protocol. A generic reverse proxy might be able to handle an arbitrary HTTP endpoint, but an API client needs that specific API. A database admin tool might be OK with just "it's a database" but an application needs a particular schema.
  2. It's going to consume environment variables. But not just any variables; the variables have to be in a particular format.
  3. It's going to consume volumes. The volumes need to contain data in a particular format, readable and writable by a particular UID.
  4. It's also going to produce all of these things; it may listen on a network service port, provision a database schema, or emit some text that needs to be fed back into an environment variable elsewhere.

Here's a brief sketch of what I want to see in a Dockerfile to allow me to express this sort of thing:

FROM ...
RUN ...

LISTENS ON: TCP:80 FOR: org.ietf.http/com.example.my-application-api
CONNECTS TO: pgwritemaster.internal ON: TCP:5432 FOR: org.postgresql.db/com.example.my-app-schema
CONNECTS TO: {{ETCD_HOST}} ON: TCP:{{ETCD_PORT}} FOR: com.coreos.etcd/client-communication
ENVIRONMENT NEEDS: ETCD_HOST FORMAT: HOST(com.coreos.etcd/client-communication)
ENVIRONMENT NEEDS: ETCD_PORT FORMAT: PORT(com.coreos.etcd/client-communication)
VOLUME AT: /logs FORMAT: org.w3.clf REQUIRES: WRITE UID: 4321

An image thusly built would refuse to run unless:

There are probably a lot of flaws in the specific syntax here, but I hope you can see past that, to the broader point that the software inside a container has precise expectations of its environment, and that we presently have no way of communicating those expectations beyond writing a Melvilleian essay in each Dockerfile's comments, beseeching those who would run the image to give it what it needs.
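To illustrate what "refuse to run unless" could mean in practice, here is a minimal sketch of a pre-flight check for the two ENVIRONMENT NEEDS lines in the sketch above. Everything in it is hypothetical: no container runtime offers this today, and the names `EnvRequirement` and `unmet_requirements` are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvRequirement:
    """Hypothetical declaration: a variable name and its expected format."""

    name: str
    fmt: str  # "HOST" or "PORT" in this sketch


def _is_port(value):
    """True if value parses as a TCP port number."""
    try:
        return 0 < int(value) < 65536
    except ValueError:
        return False


def unmet_requirements(requirements, environ):
    """Return one human-readable message per unmet expectation."""
    problems = []
    for req in requirements:
        value = environ.get(req.name)
        if value is None:
            problems.append(f"{req.name} is not set")
        elif req.fmt == "PORT" and not _is_port(value):
            problems.append(f"{req.name}={value!r} is not a valid TCP port")
    return problems


needs = [EnvRequirement("ETCD_HOST", "HOST"), EnvRequirement("ETCD_PORT", "PORT")]
problems = unmet_requirements(needs, {"ETCD_PORT": "http"})
for problem in problems:
    print(problem)
```

A runtime could run a check like this against the real environment (for example `os.environ`) before starting the entrypoint, and print the list instead of letting the program fail later.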


Why bother with this sort of work, if all the image can do with it is "refuse to run"?

First and foremost, today, the image effectively won't run. Oh, it'll start up, and it'll consume some resources, but it will break when you try to do anything with it. What this metadata will allow the container runtime to do is to tell you why the image didn't run, and give you specific, actionable, fast feedback about what you need to do in order to fix the problem. You won't have to go groveling through logs, which is especially hard if the back-end service you forgot to properly connect to was the log aggregation service. So this will be an order of magnitude speed improvement on initial deployments and development-environment setups for utility containers. Whole applications typically already come with a compose file, of course, but ideally applications would be built out of functioning self-contained pieces and not assembled one custom container at a time.

Secondly, if there were a strong tooling standard for providing this metadata within the image itself, it might become possible for infrastructure service providers (like, ahem, my employer) to automatically detect and satisfy service dependencies. Right now, if you have a database as a service that lives outside the container system in production, but within the container system in development and test, there's no way for the orchestration layer to say "good news, everyone! you can find the database you need here: ...".

My main interest is in allowing open source software developers to give service operators exactly what they need, so the upstream developers can get useful bug reports. There's a constant tension where volunteer software developers find themselves fielding bug reports where someone deployed their code in a weird way, hacked it up to support some strange environment, built a derived container that had all kinds of extra junk in it to support service discovery or logging or somesuch, and so they don't want to deal with the support load that that generates. Both people in that exchange are behaving reasonably. The developers gave the ops folks a container that runs their software to the best of their abilities. The service vendors made the minimal modifications they needed to have the container become a part of their service fabric. Yet we arrive at a scenario where nobody feels responsible for the resulting artifact.

If we could just say what it is that the container needs in order to really work, in a way which was precise and machine-readable, then it would be clear where the responsibility lies. Service providers could just run the container unmodified, and they'd know very clearly whether or not they'd satisfied its runtime requirements. Open source developers - or even commercial service vendors! - could say very clearly what they expected to be passed in, and when they got bug reports, they'd know exactly how their service should have behaved.


  1. which mostly but not entirely just means "docker"; it's weird, of course, because there are pieces that docker depends on and tools that build upon docker which are part of this, but docker remains the nexus.

  2. Yes yes, I know that they're not really functions Tristan, they're subroutines, but that's the word people use for "subroutines" nowadays.

  3. Just to be clear: no it isn't. Write a damn docstring, or at least some type annotations.

14 Aug 2016 10:22pm GMT