05 Dec 2016
Hello, hello. It's been a long time since the last entry in the "Twisted Web in 60 Seconds" series. If you're new to the series and you like this post, I recommend going back and reading the older posts as well.
In this entry, I'll show you how to enable HTTP/2 for your Twisted Web-based site. HTTP/2 is the latest entry in the HTTP family of protocols. It builds on work from Google and others to address performance (and other) shortcomings of the older HTTP/1.x protocols in widespread use today.
Twisted implements HTTP/2 support by building on the general-purpose h2 Python library. In fact, all you have to do to get HTTP/2 for your Twisted Web-based site (starting in Twisted 16.3.0) is install the dependencies:
$ pip install twisted[http2]
Your TLS-based site is now available via HTTP/2! A future version of Twisted will likely extend this to non-TLS sites (which requires the Upgrade: h2c handshake) with no further effort on your part.
If you like this post or others in the Twisted Web in 60 Seconds series, let me know with a donation! I'll post another entry in the series when the counter hits zero. Topic suggestions welcome in the comment section.
05 Dec 2016 12:00pm GMT
02 Dec 2016
When writing unit tests, it is good to call functions with "mocks" or "fakes" - objects with an equivalent interface but a simple, "fake" implementation. For example, instead of a real socket object, use something that has a recv() method but returns "hello" the first time it is called, and an empty string the second time. This is great! Instead of testing the vagaries of the other side of a socket connection, you can focus on testing your code - and force your code to handle corner cases, like recv() returning partial messages, that happen rarely on the same host (but not so rarely in more complex network environments).
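A fake socket along those lines might look like this (a minimal sketch, using bytes as a real socket would; real code would fake more of the socket interface):

```python
class FakeSocket:
    """Just enough of the socket interface for the code under test."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def recv(self, bufsize):
        # Return the next canned chunk; empty bytes signals that the
        # peer closed the connection, just like a real socket.
        if self._chunks:
            return self._chunks.pop(0)
        return b""


def read_all(sock):
    """Read until the peer closes, handling partial messages."""
    data = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return data
        data += chunk


# The test can now force the partial-read corner case deterministically:
assert read_all(FakeSocket([b"hel", b"lo"])) == b"hello"
```

The fake lets the test script exactly which chunks arrive, and in what order, without any real network involved.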
There is one OS interface which it is wise not to mock - the venerable UNIX file system. Mocking the file system is the classic case of low-ROI effort:
- It is easy to isolate: if functions get a parameter of "which directory to work inside", tests can use a per-suite temporary directory. Directories are cheap to create and destroy.
- It is reliable: the file system rarely fails - and if it does, your code is likely to get weird crashes anyway.
- The surface area is enormous: open(), but also os.open, os.mkdir, os.mknod, os.rename, shutil.copytree and others, plus modules calling out to C functions which call out to C's fopen().
The first two items decrease the Return, since mocking the file system does not make the tests easier to write or the test run more reproducible, while the last one increases the Investment.
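Concretely, the per-suite temporary directory approach looks like this (a minimal sketch using only the standard library):

```python
import os
import tempfile


def save_report(directory, name, text):
    # Code under test: it takes "which directory to work inside" as a
    # parameter, so tests can point it anywhere.
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write(text)
    return path


# The test: a real, cheap, isolated directory instead of a mock.
with tempfile.TemporaryDirectory() as tmpdir:
    path = save_report(tmpdir, "report.txt", "all good")
    with open(path) as f:
        assert f.read() == "all good"
# The directory and everything in it are destroyed here.
```

The real file system does all the work, and the test stays isolated and reproducible without mocking a single call.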
Do not mock the file system, or it will mock you back.
02 Dec 2016 5:34am GMT
30 Nov 2016
Itamar Turner-Trauring: The Not-So-Passionate Programmer: finding a job when you're just a normal person
When reading programming job postings you'll find many companies that want to hire "passionate programmers". If you're just a normal programmer looking for a normal job this can be pretty discouraging.
What if you're not passionate? What if you don't work on side projects, or code in your spare time?
What if you have a sneaking suspicion that "passionate" is a euphemism for "we want you to work large amounts of unpaid overtime"? Can you really find a job where you don't have to be passionate, where you can just do your job and go home?
The truth is that many companies will hire you even if you don't have "passion". Not to mention that "passion" has very little impact on whether you do your job well.
But since companies do ask for "passion" in job postings and sometimes look for it during interviews, here's what you can do about your lack of "passion" when searching for a job.
Searching for a job
The first thing to do is not worry about it too much. Consider some real job postings for passionate programmers:
- "[Our company] is looking for Java Engineer who is passionate about solving real world business problems to join our team."
- "We're looking for a senior developer to play a major role in a team of smart, passionate and driven people."
- "This role is ideal for somebody who is passionate about building great online apps."
They all say "passionate", yes. But these are all posts from very different kinds of companies, with different customers, different technology stacks, and very different cultures (and they're in two different countries). Whoever wrote the job posting at each company probably didn't think very hard about their choice of words, and if pressed each would probably explain "passionate" differently.
It might be a euphemism for working long hours, but it might also just mean they want to hire competent engineers. If the job looks good otherwise, don't think about it too hard: apply and see how it goes.
Interviewing for a job
Eventually you'll get a job interview at a company that wants "passionate" programmers. A job interview has two aspects: the company is interviewing you, and you are interviewing the company.
When the company is interviewing you they want to find out if you're going to do your job. You need to make a good impression... even if insurance premiums, or content management systems, or internal training or whatever the company does won't be putting beating cartoon hearts in your eyes.
- First, that means you need to take an interest in the company. Before your interview do some research about the company, and then ask questions about the product during the interview.
- Second, since you can't muster that crazy love for insurance premiums, focus on what you can provide: emphasize your professional pride in your work, your willingness to get things done and do them right.
At the same time that you're trying to sell yourself to the company you should also be trying to figure out if you want to work for them. Among other things, you want to figure out if the word "passionate" is just a codeword for unpaid overtime.
Ask what a typical workday is like, and what a typical workweek is like. Ask how they do project planning, and how they ensure code ships on time.
Finally, you will sometimes discover that the employees who work at the company are passionate about what they do. If this rubs you the wrong way, you might want to find a different company to work for.
If you're OK with it you'll want to make sure you'll be able to fit in. So try to figure out if they're open to other ways of thinking: how they handle conflicts, how they handle diversity of opinion.
On the job
Eventually you will have a job. Mostly you'll just have a normal job, with normal co-workers who are just doing their job too.
But sometimes you will end up somewhere where everyone else is passionate and you are not. So long as your coworkers and management value a diversity of opinion, your lack of passion can actually be a positive.
For example, startups are often full of passion for what they're building. Most startups fail, of course, and so every startup has a story about why they are different, why they won't fail. Given the odds that story will usually turn out to be wrong, but passionate employees will keep on believing, or at least won't be willing to contradict the story in public.
As someone who isn't passionate you can provide the necessary sanity checks: "Sure, it's a great product... but we're not getting customers fast enough. Maybe we should figure out what we can change?"
Similarly, passionate programmers often love playing with new technology. But technology can be a distraction, and writing code is often the wrong solution. As someone who isn't passionate you can ensure the company's goals are actually being met... even if that means using existing, boring software instead of writing something new and exciting.
There's nothing wrong with wanting to go home at the end of the day and stop thinking about work. There are many successful software developers who don't work crazy hours and who don't spend their spare time coding.
Join the course: Getting to a Sane Workweek
Don't let your job take over your life. Join over 650 other programmers on the journey to a saner workweek by taking this free 6-part email course. You'll learn how you can work reasonable hours and still succeed in your career as a programmer.
If you would like a job that doesn't overwhelm your life, join my free 6-part email course to learn how you can get to a sane workweek.
30 Nov 2016 5:00am GMT
25 Nov 2016
On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 16.6!
The highlights of this release are:
- The ability to use "python -m twisted" to call the new twist runner,
- More reliable tests from a more reliable implementation of some things, like IOCP,
- Fixes for async/await & twisted.internet.defer.ensureDeferred, meaning it's getting closer to prime time!
- ECDSA support in Conch & ckeygen (which has also been ported to Python 3),
- Python 3 support for Words' IRC support and twisted.protocols.sip among some smaller modules,
- Some HTTP/2 server optimisations,
- and a few bugfixes to boot!
For more information, check the NEWS file (link provided below).
You can find the downloads on PyPI (or alternatively our website). The NEWS file is also available on GitHub.
Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!
Amber Brown (HawkOwl)
25 Nov 2016 8:06pm GMT
21 Nov 2016
The Changelog has just published an episode about Servo. It covers the motivations and goals of the project, some aspects of Servo performance and use of the Rust language, and even has a bit about our wonderful community. If you're curious about why Servo exists, how we plan to ship it to real users, or what it was like to use Rust before it was stable, I recommend giving it a listen.
21 Nov 2016 12:00am GMT
19 Nov 2016
Recently I have been talking about deploying Python, and some people had the reasonable question: if a .pex file is used for isolating dependencies, and a Docker container is used for isolating dependencies, why use both? Isn't it redundant?
Why use containers?
I really like glyph's explanation for containers: they isolate not just the filesystem stack but the processes and the network, giving a lot of the power that UNIX was supposed to give but missed out on. Containers isolate the file system, making it easier for code to write/read files from known locations. For example, its log files will be carefully segregated, and can be moved to arbitrary places by the operator without touching the code.
The other part is that none of the reasonable packaging options bundle Python itself, and this means that a pex file would still have to be tested with multiple Pythons, and perhaps do some checking at start-up that it is using the right interpreter. If PyPy is the right choice, it is a choice the operator would have to make and implement.
Why use pex?
Containers are an easy sell. They are right on the hype train. But if we use containers, what use is pex?
ADD wheelhouse /wheelhouse
RUN . /appenv/bin/activate; \
    pip install --no-index -f wheelhouse DeployMe
COPY twist.pex /
Note that in the first option, we are left with extra gunk in the /wheelhouse directory. Note also that we still have to have pip and virtualenv installed in the runtime container. Pex files bring the double-dutch philosophy to its logical conclusion: do even more of the build on the builder side, do even less of it on the runtime side.
19 Nov 2016 5:11am GMT
18 Nov 2016
If it hurts to type you'll have a much harder time working as a programmer. Yes, there's voice recognition, but it's just not the same. So when my wrist and arm pain returned soon after starting a new job I was starting to get a little scared.
The last two times this happened I'd had to take months and then years off from programming before the pain went away. Was my career as a programmer going to take another hit?
And then, while biking to work one day, I realized what was going on. I came up with a way to test my theory, tried it out... and the pain went away. It's quite possible the same solution would have worked all those years ago, too: instead of unhappily working as a product manager for a few years I could have been programming.
But before I tell you what I figured out, here's what I tried first.
Failed solution #1: better hardware, better ergonomics, more breaks
When I first got wrist pain bad enough that I couldn't type I started by getting a better keyboard, the Kinesis Advantage. It's expensive, built like a tank and amazingly well designed: because Ctrl, Alt, Space, and Enter are under your thumbs, you don't end up stretching your hands as much.
As an Emacs user this is important; I basically can't use regular keyboards for anything more than a few minutes these days. I own multiple Kinesis keyboards and would be very sad without them. They've definitely solved one particular kind of pain I used to have due to overstretching.
I reconfigured my desk setup to be more ergonomic (these days I do this via a standing desk). And I also started taking typing breaks: half a minute every few minutes, 10 minutes once an hour. That might have helped, or not.
The pain came and went, and eventually it came and stayed.
Failed solution #2: doctor's orders
I went to see a doctor, and she suggested it was some sort of "-itis", a fancy Latin word saying I was in pain and she wasn't quite sure why. She prescribed a non-steroidal anti-inflammatory (high doses of ibuprofen will do the trick) and occupational therapy.
That didn't help either, though the meds dulled the pain when I took them.
Failed solution #3: alternative physical therapy
Next I tried massage, Yoga, Alexander Technique, and Rolfing. I learned that my muscles were tight and sore, and ended up understanding some ways I could improve my posture. A couple of times during the Alexander Technique classes my whole back suddenly relaxed, an amazing experience: I was obviously doing something wrong with my muscles.
What I learned was useful. My hands are often cold, and all those classes helped me eventually discover that if I relaxed my muscles the right way my hands would warm up. Tense muscles were hurting my circulation.
At the time, however, none of it helped.
After six months at home not typing I was no better: I was still in pain.
So I went back to work and got a new role, as a Product Analyst, where I needed to type less and could use voice recognition for dictation. I did this for 2 or 3 years, but I was not happy: I missed programming.
Working part time
At some point during this period I read one of Dr. Sarno's books. His theory is that long periods of pain are not due to actual injury, but rather an emotional problem causing e.g. muscles to tense up or reduced blood flow. There are quite a few people who have had their pain go away by reading one of his books and doing some mental exercises.
I decided to give it a try: release emotional stress, go back to programming, and not worry about pain anymore. Since I wasn't sure I could work full time I took on consulting, and later a part time programming job.
It worked! I was able to type again, with no pain for four years.
The pain comes back
Earlier this year I started another job, with more hours but still slightly less than full time. And then the pain returned.
Why was I in pain again? I wasn't working that many more hours, I was still using a standing desk as I had for the past four years. What was going on?
An epiphany: environmental causes
Biking to work one day the epiphany hit me: Dr. Sarno's theory was that suppressed emotional stress caused the pain by tensing muscles or reducing blood flow. And that seemed to be the case for me at least. But emotional stress wasn't the only way I could end up with tense muscles or reduced blood flow.
The new office I was working in was crazy cold, and a couple of weeks earlier I'd moved my desk right under the air conditioning vent. Cold would definitely reduce blood flow. For that matter, caffeine shrinks blood vessels. And during the four years I'd worked part time and pain free I'd been working in a hackerspace with basically no air conditioning.
I started wearing a sweatshirt and hand warmers at work, and I avoided caffeine on the days I went to the office. The pain went away, and so far hasn't come back.
I spent three years unable to work as a programmer, and there's a good chance I could have solved the problem just by wearing warmer clothing.
If you're suffering from wrist or arm pain:
- Start by putting a sweatshirt on: getting warmer may be all you need to solve the problem.
- If Emacs key combos are bad for your wrist, consider vi, god-mode, Spacemacs... or the expensive option, a Kinesis Advantage keyboard.
- Next, consider improving your posture (standing desks are good for that).
- Finally, if you're still in pain after a month or two go read Dr. Sarno's book. (Update: After posting this blog I got a number of emails from people saying "I read that book and my pain quickly went away.")
This may not work for everyone, but I do believe most so-called repetitive strain injury is not actually an injury. If you're in pain, don't give up: you will be able to get back to typing.
By the way, taking so long to figure out why my arms were hurting isn't the only thing I've gotten wrong during my career. So if you want to become a better software engineer, learn how you can avoid my many mistakes as a programmer.
18 Nov 2016 5:00am GMT
16 Nov 2016
How many times have you seen software exhibiting completely impossible results? In theory software is completely deterministic, but in practice it often seems capriciously demonic. But all is not lost: the detection methods of Sherlock Holmes can help you discover the hidden order beneath the chaos.
Sherlock Holmes famously stated that "once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth." And what is true for (fictional) crimes is also true for software. The basic process you follow to find a problem is:
- Come up with a list of potential causes.
- Test each potential cause in isolation, ruling them out one by one.
- Whatever cause you can't rule out is the likely cause, even if it seems improbable.
To see how this might work in practice, here's a bug my colleague Phil and I encountered over at my day job, where we're building a microservices architecture.
The Case of the Missing Stats
I was working on a client library, Phil was working on the server. Phil was testing out a new feature where the client would send messages to the server, containing certain statistics. When he ran the client the server did get messages, but the messages only ever had empty lists of stats.
Someone had kidnapped the stats, and we had to find them.
Phil was using the following components, each of which was a potential suspect:
- A local server with in-development code.
- Python 3.4.
- The latest version of the client.
- The latest version of the test script.
Eliminating the impossible
Our next step was to isolate each possible cause and falsify it.
Theory #1: the client was broken
The client code had never been used with a real server; perhaps it was buggy? I checked to see if there were unit tests, and there were some checking for existence of stats. Maybe the unit tests were broken though.
We ran the client with Python 3.5 on my computer using the same test script Phil had used and recorded traffic to the production server. Python 3.5 and 3.4 are similar enough that it seemed OK to change that variable at the same time.
The messages sent to the server did include the expected stats. The client was apparently not the problem, nor was the test script.
Theory #2: Python version
We tried Python 2.7, just for kicks; stats were still there.
Theory #3: Phil's computer
Maybe Phil's computer was cursed? Phil gave me an SSH login to his computer, I set up a new environment and ran the client against the production server using the Python 3.4 on his computer.
Once again we saw stats being sent.
Theory #4: the server was broken
You may have noticed that so far we were testing against the production server, and Phil had been testing against his in-development server. The server seemed an unlikely cause, however: the client unilaterally sent messages to the server, so the server version shouldn't have mattered.
However, having eliminated all other causes, that was the next thing to check. We ran the client against Phil's in-development server... and suddenly the stats were missing from the client transmission logs.
We had found the kidnapper. Now we needed to figure out how the crime had been committed.
Recreating the crime
So far we'd assumed that when the client talked to the dev server the messages did not include stats. Now that we could reproduce the problem we noticed that it wasn't that the messages didn't include stats; rather, we were sending fewer messages.
Messages with stats were failing to be sent. A quick check of the logs indicated an encoding error: we were failing to encode messages that had stats, so they were never sent. (We should have checked the logs much much earlier in the process, as it turns out.)
Reading the code suggested the problem: the in-development server was feeding the client bogus data earlier on. When the client tried to send a message to the server that included stats it needed to use some of that bogus data, and it failed to encode and the message got dropped. If the client sent a message to the server with an empty list of stats the bogus data was not needed, so encoding and sending succeeded.
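That failure mode is easy to reproduce in miniature. Here is a sketch (hypothetical names, not the actual client code) of why bad data upstream surfaces as fewer messages rather than empty ones:

```python
import json


def send_messages(messages, wire):
    """Encode each message onto the wire; drop the ones that fail to encode."""
    dropped = 0
    for message in messages:
        try:
            payload = json.dumps(message)
        except TypeError:
            # The encoding error silently swallows the message - which is
            # why checking the logs early would have found this sooner.
            dropped += 1
            continue
        wire.append(payload)
    return dropped


wire = []
bogus = object()  # stands in for the bogus data the dev server fed the client
# Only messages that actually carry stats touch the bogus data, so only
# they fail to encode; messages with empty stats sail through.
assert send_messages([{"stats": []}, {"stats": [bogus]}], wire) == 1
assert len(wire) == 1
```

From the server's point of view, the surviving traffic looks exactly like a client that never has any stats to report.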
The server turned out to be the culprit after all, even though it seemed to be the most improbable cause at the outset. Or at least, the first order culprit; a root-cause analysis suggested that some problems in our protocol design were the real cause.
You too can be a scientific software detective
Our debugging process could have been better: we didn't really check only one change at a time, and we neglected the obvious step of checking the logs. But the basic process worked:
- Isolate a possible cause.
- Falsify it, demonstrating it can't be the real cause.
- Repeat until only one cause is left.
Got an impossible bug? Put on your imaginary detective hat, stick an imaginary detective pipe in your mouth, and catch that culprit.
16 Nov 2016 5:00am GMT
13 Nov 2016
Too Long: Didn't Read
The build script builds a Docker container:
$ ./build MY_VERSION
$ docker run --rm -it --publish 8080:8080 \
      moshez/sayhello:MY_VERSION --port 8080
There will be a simple application running on port 8080.
If you own the domain name hello.example.com, you can point it at the machine the container will run on and then run:
$ docker run --rm -it --publish 443:443 \
      moshez/sayhello:MY_VERSION \
      --port le:/srv/www/certs:tcp:443 \
      --empty-file /srv/www/certs/hello.example.com.pem
It will result in the same application running on a secure web site.
All source code is available on GitHub.
WSGI has been a successful standard. Very successful. It allows people to write Python applications using many frameworks (Django, Pyramid, Flask and Bottle, to name but a few) and deploy using many different servers (uwsgi, gunicorn and Apache).
Twisted makes a good WSGI container. Like Gunicorn, it is pure Python, simplifying deployment. Like Apache, it sports a production-grade web server that does not need a front end.
Container images allow us to package an application with all of its dependencies. This often creates a temptation to use them for configuration management as well. However, Dockerfile is a challenging language in which to write big parts of an application. People writing WSGI applications probably think Python is a good programming language, and the more of the application logic that is in Python, the easier it is for a WSGI-based team to master it.
Pex is a way to package several Python "distributions" (sometimes informally called "packages", the things that are hosted by PyPI) into one file, optionally with an entry point so that running the file will call a pre-defined function. It can take an explicit list of wheels but can also, as in our example here, take arguments compatible with the ones pip takes. The best practice is to give it a list of wheels, and build the wheels with pip wheel.
The pkg_resources module allows access to files packaged in a distribution in a way that is agnostic to how the distribution was deployed. Specifically, it is possible to install a distribution as a zipped directory, instead of unpacking it into site-packages. The pex format relies on this feature of Python, so adhering to pkg_resources for access to data files is important in order to not break pex compatibility.
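For instance, reading a packaged data file might look like this (a hypothetical sketch; the sayhello distribution and its data/index.html come from the example below):

```python
def index_page():
    # pkg_resources (from setuptools) resolves data files in a
    # deployment-agnostic way: this works whether the distribution is
    # unpacked into site-packages or zipped inside a pex file.
    # Building a path from __file__ would break in the zipped case.
    import pkg_resources
    return pkg_resources.resource_string("sayhello", "data/index.html")
```

The same call works unchanged in development, in a normal install, and inside a pex file.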
Let's Encrypt is a free, automated, and open Certificate Authority. It has invented the ACME protocol in order to make getting secure certificates a simple operation. txacme is an implementation of an ACME client, i.e., something that asks for certificates, for Twisted applications. It uses the server endpoint plugin mechanism in order to allow any application that builds a listening endpoint to support ACME.
The twist command-line tool allows running any Twisted service plugin. Service plugins allow us to configure a service using Python, a pretty nifty language, while still allowing specific customizations at the point of use via command-line parameters.
Putting it all together
The setup.py file defines a distribution called sayhello. In it, we have three parts:
src/sayhello/wsgi.py: A simple Flask-based WSGI application
src/sayhello/data/index.html: an HTML file meant to serve as the root
src/twisted/plugins/sayhello.py: A Twist plugin
There is also some build infrastructure:
build is a Python script to run the build.
build.docker is a Dockerfile designed to build pex files, but not to run as a production server.
run.docker is a Dockerfile designed for the production container.
Note that build does not push the resulting container to DockerHub.
Tristan Seligmann has written txacme.
Amber "Hawkowl" Brown has written "twist", which is much better at running Twisted-based services than the older "twistd".
Of course, all mistakes and problems here are completely my responsibility.
13 Nov 2016 3:38pm GMT
12 Nov 2016
I'm crying as I write this, and I want you to understand why.
Politics is the mind-killer. I hate talking about it; I hate driving a wedge between myself and someone I might be able to participate in a coalition with, however narrow. But, when you ignore politics for long enough, it doesn't just kill the mind; it goes on to kill the rest of the body, as well as anyone standing nearby. So, sometimes one is really obligated to talk about it.
Today, I am in despair. Donald Trump is an unprecedented catastrophe for American politics, in many ways. I find it likely that I will get into some nasty political arguments with his supporters in the years to come. But hopefully, this post is not one of those arguments. This post is for you, hypothetical Trump supporter. I want you to understand why we1 are not just sad, that we are not just defeated, but that we are in more emotional distress than any election has ever provoked for us. I want you to understand that we are afraid for our safety, and for good reason.
I do not believe I can change your views; don't @ me to argue, because you certainly can't change mine. My hope is simply that you can read this and at least understand why a higher level of care and compassion in political discourse than you are used to may now be required. At least soften your tone, and blunt your rhetoric. You already won, and if you rub it in too much, you may be driving people to literally kill themselves.
First let me list the arguments that I'm not making, so you can't write off my concerns as a repeat of some rhetoric you've heard before.
I won't tell you about how Trump has the support of the American Nazi Party and the Ku Klux Klan; I know that you'll tell me that he "can't control who supports him", and that he denounced2 their support. I won't tell you about the very real campaign of violence that has been carried out by his supporters in the mere days since his victory; a campaign that has even affected the behavior of children. I know you don't believe there's a connection there.
I think these are very real points to be made. But even if I agreed with you completely, that none of this was his fault, that none of this could have been prevented by his campaign, and that in his heart he's not a hateful racist, I would still be just as scared.
Bear Stearns estimates that there are approximately 20 million illegal immigrants in the United States. Donald Trump's official position on how to handle this population is mass deportation. He has promised that this will be done "warmly and humanely", which betrays his total ignorance of how mass resettlements have happened in the past.
By contrast, the total combined number of active and reserve personnel in the United States Armed Forces is a little over 2 million people.
What do you imagine happens when a person is deported? A person who, as an illegal immigrant, very likely gave up everything they have in their home country, and wants to be where they are so badly that they risk arrest every day, just by living where they live? What do you think happens when millions of them return to countries where they have no home, no food, and quite likely no money or access to the resources or support that they had while in the United States?
They die. They die of exposure because they are in poverty and all their possessions were just stripped away and they can no longer feed themselves, or because they were already refugees from political violence in their home country, or because their home country kills them at the border because it is a hostile action to suddenly burden an economy with the shock of millions of displaced (and therefore suddenly poor and unemployed, whether they were before or not) people.
A conflict between 20 million people on one side and 2 million (heavily armed) people on the other is not a "police action". It cannot be done "warmly and humanely". At best, such an action could be called a massacre. At worst (and more likely) it would be called a civil war. Individual deportees can be sent home without incident, and many have been, but the victims of a mass deportation will know what is waiting for them on the other side of that train ride. At least some of them won't go quietly.
It doesn't matter if this is technically enforcing "existing laws". It doesn't matter whether you think these people deserve to be in the country or not. This is just a reality of very, very large numbers.
Let's say, just for the sake of argument, that the population of immigrants has assimilated so poorly that each one knows only one citizen who will stand up to defend them, once it's obvious that they will be sent to their deaths. That's a hypothetical resistance army of 40 million people. Let's say they are so thoroughly overpowered by the military and police that there are zero casualties on the other side of this. Generously, let's say that the police and military are incredibly restrained, and do not use unnecessary overwhelming force, and the casualty rate is just 20%; 4 out of 5 people are captured without lethal force, and miraculously nobody else dies among the remaining 16 million who are sent back to their home countries.
That's 8 million casualties.
6 million Jews died in the Holocaust.
This is why we are afraid. Forget all the troubling things about Trump's character. Forget the coded racist language, the support of hate groups, and every detail and gaffe that we could quibble over as the usual chum of left/right political struggle in the USA. Forget his deeply concerning relationship with African-Americans, even.
We are afraid because of things that others have said about him, yes. But mainly, we are afraid because, in his own campaign, Trump promised to be 33% worse than Hitler.
I know that there are mechanisms in our democracy to prevent such an atrocity from occurring. But there are also mechanisms to prevent the kind of madman who would propose such a policy from becoming the President, and thus far they've all failed.
I'm not all that afraid for myself. I'm not a Muslim. I am a Jew, but despite all the swastikas painted on walls next to Trump's name and slogans, I don't think he's particularly anti-Semitic. Perhaps he will even make a show of punishing anti-Semites, since he has some Jews in his family3.
I don't even think he's trying to engineer a massacre; I just know that what he wants to do will cause one. Perhaps, when he sees what is happening as a result of his orders, he will stop. But his character has been so erratic, I honestly have no idea.
I'm not an immigrant, but many in my family are. One of those immigrants is intimately familiar with the use of the word "deportation" as a euphemism for extermination; there's even a museum about it where she comes from.
Her mother's name is written in a book there.
In closing, I'd like to share a quote.
The last thing that my great-grandmother said to my grandmother, before she was dragged off to be killed by the Nazis, was this:
Pleure pas, les gens sont bons.
or, in English:
Don't cry, people are good.
As it turns out, she was right, in a sense; thanks in large part to the help of anonymous strangers, my grandmother managed to escape, and, here I am.
My greatest hope for this upcoming regime change is that I am dramatically catastrophizing; that none of these plans will come to fruition, that the strange story4 I have been told by Trump supporters is in fact true.
But if my fears, if our fears, should come to pass - and the violence already in the streets is showing that at least some of those fears will - you, my dear conservative, may find yourself at a crossroads. You may see something happening in your state, or your city, or even in your own home. Your children might use a racial slur, or even just tell a joke that you find troubling. You may see someone, even a policeman, beating a Muslim to death. In that moment, you will have a choice: to say something, or not. To be one of the good people, or not.
Please, be one of the good ones.
In the meanwhile, I'm going to try to take great-grandma's advice.
When I say "we", I mean, the people that you would call "liberals", although our politics are often much more complicated than that; the people from "blue states" even though most states are closer to purple than pure blue or pure red; people of color, and immigrants, and yes, Jews. ↩
While tacitly allowing continued violence against Muslims, of course. ↩
"His campaign is really about campaign finance", "he just said that stuff to get votes, of course he won't do it", "they'll be better off in their home countries", and a million other justifications. ↩
12 Nov 2016 2:33am GMT
10 Nov 2016
It's tempting to believe that taking your work home will make you a better software engineer, and that work/life balance will limit your learning.
- For some software developers programming isn't just a job: it's something to do for fun, sometimes even a reason for being. If you love coding and coding is your job, why not keep working over the weekend? It's more practice of the skills you need.
- When you don't have the motivation or ability to take work home on the weekends you might feel you're never going to be as good a software engineer as those who do.
But the truth is that if you want to be a good software engineer you shouldn't take your work home.
What makes a good software engineer? The ability to build solutions for hard, complex problems. Here's why spending extra hours on your normal job won't help you do that.
New problems, new solutions
If you have the time and motivation to write software in your free time you could write more software for your job. But that restricts you to a particular kind of problem and limits the solution space you can consider.
If you take your work home you will end up solving the same kinds of problems that you work on during your normal workweek. You'll need to use technologies that meet your employer's business goals, and you'll need to use the same standards of quality your employer expects. But if you take on a personal programming project you'll have no such constraints.
- If your company has low quality standards, you can learn how to test really well.
- Or you can write complete hacks just to learn something new.
- You can use and learn completely different areas of technology.
I once wrote a Python Global Interpreter Lock profiler, using LD_PRELOAD to override the Python process' interactions with operating system locks and the gdb debugger to look at the live program's C stack. It never worked well enough to be truly useful... but building it was very educational.
The additional learning you'll get from working on different projects will make you a better software engineer. But even if you don't have the time or motivation to code at home, fear not: work/life balance can still make you a better software engineer.
Learning other skills
Being a good software engineer isn't just about churning out code. There are many other skills you need, and time spent doing things other than coding can still improve your abilities.
When I was younger and had more free time I spent my evenings studying at a university for a liberal arts degree. Among other things I learned how to write: how to find abstractions that mattered, how to marshal evidence, how to explain complex ideas, how to extract nuances from text I read. This has been immensely useful when working on harder problems, where good abstractions are critical and design documents are a must.
These days I'm spending more of my time with my child, and as a side-effect I'm learning other things. For example, explaining the world to a 4-year-old requires the ability to take complex concepts and simplify them to their essential core.
You need a hammock to solve hard problems
Though additional learning will help you, much of the benefit of work/life balance is that you're not working. Hard problems require downtime, time when you're explicitly not thinking about solutions, time for your brain to sort things out in the background. Rich Hickey, the creator of Clojure, has a great talk on the subject called Hammock Driven Development.
The gist is that hard problems require a lot of research, of alternatives and existing solutions and the problem definition, and then a lot of time letting your intuition sort things out on its own. And that takes time, time when you're not actively thinking about the problem.
At one point I was my kid's primary caregiver when I wasn't at work. I'm not up to Hickey's standard of hard problems, and taking care of an infant and toddler wasn't as restful as a hammock. But I still found that time spent not thinking about work was helpful in solving the hard problems I went back to the next day.
Learning to do more with less
The final benefit of work/life balance is attitude: the way you think about your job. If you work extra hours on your normal job you are training yourself to do your work with more time than necessary. To improve as a software engineer you want to learn how to do your work in less time, which is important if you want to take on bigger, harder projects.
Working a reasonable, limited work week will help focus you on becoming a more productive programmer rather than trying to solve problems the hard, slow way.
Given the choice you shouldn't take your work home with you. If you want to keep coding you should have no trouble finding interesting projects to work on, untrammeled by the requirements of your job. If you can't or won't code in your free time, that's fine too.
But what if that isn't a choice you can make? What if you don't have work/life balance as a software engineer because of pressure from your boss, or constant emergencies at work? In that case you should sign up for my free 6-part email course, which will show you how to get to a saner, shorter workweek.
10 Nov 2016 5:00am GMT
30 Oct 2016
When you've been writing Java for a while switching to Python can make you a little anxious. Not only are you learning a new language with new idioms and tools, you're also dealing with a language with far less built-in safety. No more type checks, no more clear separation between public and private.
It's much easier to learn Python than Java, it's true, but it's also much easier to write unmaintainable code. Can you really build large scale, robust and maintainable applications in Python? I think you can, if you do it right.
The suggestions below will help get you started on a new Python project, or improve an existing project that you're joining. You'll need to keep up the best practices you've used in Java, and learn new tools that will help you write better Python.
Tools and Best Practices
Python 2 and 3
Before you start a new Python project you have to choose which version of the language to support: Python 3 is not backwards-compatible with Python 2. Python 2 is only barely being maintained, and will be end-of-lifed in 2020, so that leaves you with only two options with long term viability:
- A hybrid language, the intersection of Python 2 and Python 3. This requires you to understand the subtleties of the differences between the two languages. The best guide I've seen to writing this hybrid language is on the Python Future website.
- Python 3 only.
Most popular Python libraries now support Python 3, as do most runtime environments. Unless you need to write a library that will be used by both new and legacy applications it's best to stick to Python 3 only.
However, on OS X you'll need to use Homebrew to install Python 3 (though using Homebrew's Python 2 is also recommended over using the system Python 2). And on Google App Engine you'll need to use the beta Flexible Environment to get Python 3 support.
Java enforces types on method parameters, on object attributes, and on variables. To get the equivalent in Python you can use a combination of runtime type checking and static analysis tools.
- To ensure your classes have the correct types on attributes you can use the attrs library, though it's very useful even if you don't care about type enforcement. This will only do runtime type checking, so you'll need to have decent test coverage.
- For method attributes and variables, the mypy static type checker, combined with the new Python 3 type annotation syntax, will catch many problems. For Python 2 there is a comment-based syntax as well. The clever folks at Zulip have a nice introductory article about mypy.
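To make the annotation syntax concrete, here is a minimal sketch (the function and its names are invented for illustration, not taken from any of the libraries above):

```python
def add_exclamation(message: str, count: int = 1) -> str:
    # mypy checks these annotations statically; at runtime Python just
    # stores them in __annotations__ without enforcing anything.
    return message + "!" * count
```

Running mypy over a call like add_exclamation(42) flags a type error immediately, whereas plain Python would only fail when the line actually executes.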
Public, private and interfaces
Python lets you do many things Java wouldn't, everything from metaclasses to replacing a method at runtime. But while these more dynamic capabilities can be quite useful, there's nothing wrong with using them sparingly. For example, while Python allows you to set random attributes on a passed in object, usually you shouldn't.
- As with Java, you typically want to interact with objects using a method-based interface (explicit or implicit), not by randomly mucking with its internals.
- As with Java code, you want to have a clear separation between public and private parts of your API.
- And as with Java, you want to be coding to an interface, not to implementation details.
Where Java has explicit and compiler enforced public/private separation, in Python you do this by convention:
- Private methods and attributes on a class are typically prefixed with an "_".
- The public interface of a module is declared using __all__ = ["MyClass", "AnotherClass"]. __all__ also controls what gets imported when you do from module import *, but wildcard imports are a bad idea. For more details see the relevant Python documentation.
As for interfaces, if you want to explicitly declare them you can use Python's built-in abstract base classes; not quite the same, but they can be used as pseudo-interfaces. Alternatively, the zope.interface package is more powerful and flexible (and the attrs library mentioned above understands it).
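For instance, a pseudo-interface sketched with the built-in abc module might look like this (class names are invented for illustration):

```python
from abc import ABC, abstractmethod

class Serializer(ABC):
    """A pseudo-interface: concrete classes must implement serialize()."""

    @abstractmethod
    def serialize(self, obj):
        """Return a string representation of obj."""

class JSONSerializer(Serializer):
    def serialize(self, obj):
        import json
        return json.dumps(obj)
```

Instantiating Serializer directly raises TypeError, which is about as close as Python gets to Java's "class Foo implements Bar" enforcement.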
Automated tests are important if you want some assurance your code works. Python has a built-in unittest library that is similar to JUnit, but at a minimum you'll want a more powerful test runner.
- nose is a test runner for the built-in unittest, with many plugins.
- pytest is a test runner and framework, supporting the built-in unittest library as well as a more succinct style of testing. It also has numerous plugins.
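As a taste of pytest's more succinct style, plain functions and bare assert statements are all you need (file and function names here are invented for illustration):

```python
# test_pricing.py (hypothetical) - run with: pytest test_pricing.py
def apply_discount(price, percent):
    return price * (100 - percent) / 100

def test_no_discount():
    assert apply_discount(50, 0) == 50

def test_full_discount():
    assert apply_discount(50, 100) == 0
```

No TestCase subclass or assertEqual boilerplate is required; pytest collects any function whose name starts with test_.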
Other useful tools:
- Hypothesis lets you write a single function that generates hundreds or thousands of test cases for maximal test coverage.
- To set up isolated test environments tox is useful; it builds on Python's built-in virtualenv.
- coverage lets you measure code coverage on your test runs. If you have multiple tox environments, here's a tutorial on combining the resulting code coverage.
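A sketch of what a Hypothesis property looks like (the property tested here is my own toy example, not from the tool's documentation):

```python
from hypothesis import given
from hypothesis import strategies as st

# Hypothesis generates many input lists, runs the test against each,
# and shrinks any failing example to a minimal reproduction.
@given(st.lists(st.integers()))
def test_sorting_is_idempotent(xs):
    assert sorted(sorted(xs)) == sorted(xs)
```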
More static analysis
In addition to mypy, two other lint tools may prove useful:
- flake8 is quick, catches a few important bugs, and checks for some standard coding style violations.
- pylint is much more powerful, slower, and generates massive numbers of false positives. As a result far fewer Python projects use it than flake8. I still recommend using it, but see my blog post on the subject for details on making it usable.
You should document your classes and public methods using docstrings. Unless you're using the new type signature syntax you should also document the types of function parameters and results.
Typically Python docstrings are written in reStructuredText format. It's surprisingly difficult to find an example of the standard style, but here's one.
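For what it's worth, here is one more, a plain function documented in the common reStructuredText field style (the function itself is invented for illustration):

```python
def divide(dividend, divisor):
    """
    Divide one number by another.

    :param dividend: the number to be divided
    :type dividend: float
    :param divisor: the number to divide by; must not be zero
    :type divisor: float
    :return: the quotient
    :rtype: float
    :raises ZeroDivisionError: if divisor is zero
    """
    return dividend / divisor
```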
A good Python editor or IDE won't be as powerful as the equivalent Java IDE, but it will make your life easier. All of these will do syntax highlighting, code completion, error highlighting, etc.:
- If you're used to IntelliJ you can use PyCharm.
- If you're used to Eclipse you can use PyDev.
- Elpy is a great Emacs mode for Python.
- Not certain what your best bet is for vim, but python-mode looks plausible.
Writing maintainable Python
In the end, writing maintainable Python is very much like writing maintainable Java. Python has more flexibility, but also more potential for abuse, so Python expects you to be a responsible adult.
You can choose to write bad code, but if you follow the best practices you learned from Java you won't have to. And the tools I've described above will help catch any mistakes you make along the way.
30 Oct 2016 4:00am GMT
29 Oct 2016
WSGI is a great standard. It has been amazingly successful. In order to describe how successful it is, let me describe life before WSGI. In the beginning, CGI existed. CGI was just a standard for how a web server can run a process - what environment variables to pass, and so forth. In order to write a web-based application, people would write programs that complied with CGI. At that time, Apache's only competition was commercial web servers, and CGI allowed you to write applications that ran on both. However, starting a process for each request was slow and wasteful.
For Python applications, people wrote mod_python for Apache. It allowed people to write Python programs that ran inside the Apache process, and directly used Apache's API to access the HTTP request details. Since Apache was the only server that mattered, that was fine. However, as more servers arrived, a standard was needed. mod_wsgi was originally a way to run the same Django application on many servers. However, as a side effect, it also allowed the second wave of Python web application frameworks - Paste, Flask and more - to have something to run on. In order to make life easier, Python included wsgiref, a module that implemented a single-thread single-process blocking web server with the WSGI protocol.
Some web frameworks come with their own development web servers that will run their WSGI apps. Some use wsgiref. Almost always those options are carefully documented as "just for development use, do not use in production." Wouldn't it be nice to use the same WSGI container in both development and production, eliminating one potential source of bugs that only appear in production?
For ease of use, it should probably be written in Python. Luckily, "twist web --wsgi" is just such a server. To showcase how easy it is to use, twist-wsgi shows commands to run Django, Flask, Pyramid and Bottle apps as easily as running the frameworks' built-in web servers.
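For context, a complete WSGI application is nothing more than a callable with the PEP 3333 signature; the sketch below (module and names are my own, not from the post) is the kind of thing any of these containers can serve:

```python
# app.py (hypothetical) - a minimal WSGI application per PEP 3333
def application(environ, start_response):
    body = b"Hello from WSGI\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

With Twisted installed, something along the lines of twist web --wsgi app.application should serve it, in development and production alike.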
In production, using the Twisted WSGI container comes with several advantages. Production-grade SSL support using pyOpenSSL and cryptography allows elimination of "SSL terminators", removing one moving piece from the equation. With third-party extensions like txsni and txacme, it allows modern support for "easy SSL". The built-in HTTP/2 support, starting with Twisted 16.3, allows better support for parallel requests from modern browsers.
The Twisted web server also has a built-in static file server, allowing the elimination of a "front-end" web server that deals with static files by itself, and passing dynamic requests to the application server.
Twisted is also not limited to web serving. As a full-stack network application framework, it has support for scheduling repeated tasks, running processes and supporting other protocols (for example, a side-channel for online control). Last but not least, in order to integrate all of that, the language used is Python. As an example of an integrated solution, the Frankensteinian monster plugin showcases a combo web application combining 4 frameworks, a static file server and a scheduled task updating a file.
While the goal is not to encourage using four web frameworks and a couple of side services in order to greet the user and tell them what time it is, it is nice that if the need strikes this can all be integrated into one process in one language, without the need to remember how to spell "every 4 seconds" in cron or how to quote a string in the nginx configuration file.
29 Oct 2016 3:03pm GMT
On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 16.5!
The highlights of this release are:
- Deferred.addTimeout, for timing out your Deferreds! (contributed by cyli, reviews by adiroiban, theisencouple, manishtomar, markrwilliams)
- yield from support for Deferreds, in functions wrapped with twisted.internet.defer.ensureDeferred. This will work in Python 3.4, unlike async/await which is 3.5+ (contributed by hawkowl, reviews by markrwilliams, lukasa).
- The new asyncio interop reactor, which allows Twisted to run on top of the asyncio event loop. This doesn't include any Deferred-Future interop, but stay tuned! (contributed by itamar and hawkowl, reviews by rodrigc, markrwilliams)
- twisted.internet.cfreactor is now supported on Python 2.7 and Python 3.5+! This is useful for writing pyobjc or Toga applications. (contributed by hawkowl, reviews by glyph, markrwilliams)
- twisted.python.constants has been split out into constantly on PyPI, and likewise with twisted.python.versions going into the PyPI package incremental. Twisted now uses these external packages, which will be shared with other projects (like Klein). (contributed by hawkowl, reviews by glyph, markrwilliams)
- Many new Python 3 modules, including twisted.pair, twisted.python.zippath, twisted.spread.pb, and more parts of Conch! (contributed by rodrigc, hawkowl, glyph, berdario, & others, reviews by acabhishek942, rodrigc, & others)
- Many bug fixes and cleanups!
- 260+ closed tickets overall.
For more information, check the NEWS file (link provided below).
You can find the downloads on PyPI (or alternatively our website). The NEWS file is also available on GitHub.
Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!
Amber Brown (HawkOwl)
PS: I wrote a blog post about Twisted's progress in 2016! https://atleastfornow.net/blog/marching-ever-forward/
29 Oct 2016 7:11am GMT
27 Oct 2016
Perhaps you are a software developer.
Perhaps, as a developer, you have recently become familiar with the term "containers".
Perhaps you have heard containers described as something like "LXC, but better", "an application-level interface to cgroups" or "like virtual machines, but lightweight", or perhaps (even less usefully), a function call. You've probably heard of "docker"; do you wonder whether a container is the same as, different from, or part of Docker?
Are you bewildered by the blisteringly fast-paced world of "containers"? Maybe you have no trouble understanding what they are - in fact you might be familiar with half a dozen orchestration systems and container runtimes already - but frustrated because this seems like a whole lot of work and you just don't see what the point of it all is?
If so, this article is for you.
I'd like to lay out what exactly the point of "containers" are, why people are so excited about them, what makes the ecosystem around them so confusing. Unlike my previous writing on the topic, I'm not going to assume you know anything about the ecosystem in general; just that you have a basic understanding of how UNIX-like operating systems separate processes, files, and networks.1
At the dawn of time, a computer was a single-tasking machine. Somehow, you'd load your program into main memory, and then you'd turn it on; it would run the program, and (if you're lucky) spit out some output onto paper tape.
When a program running on such a computer looked around itself, it could "see" the core memory of the computer it was running on, any attached devices, including consoles, printers, teletypes, or (later) networking equipment. This was of course very powerful - the program had full control of everything attached to the computer - but also somewhat limiting.
This mode of addressing hardware was limiting because it meant that programs would break the instant you moved them to a new computer. They had to be re-written to accommodate new amounts and types of memory, new sizes and brands of storage, and new types of networks. If a program had to contain within itself the full knowledge of every piece of hardware that it might ever interact with, it would be very expensive indeed.
Also, if all the resources of a computer were dedicated to one program, then you couldn't run a second program without stomping all over the first one - crashing it by mangling its structures in memory, deleting its data by overwriting its data on disk.
So, programmers cleverly devised a way of indirecting, or "virtualizing", access to hardware resources. Instead of a program simply addressing all the memory in the whole computer, it got its own little space where it could address its own memory - an address space, if you will. If a program wanted more memory, it would ask a supervising program - what we today call a "kernel" - to give it some more memory. This made programs much simpler: instead of memorizing the address offsets where a particular machine kept its memory, a program would simply begin by saying "hey operating system, give me some memory", and then it would access the memory in its own little virtual area.
In other words: memory allocation is just virtual RAM.
Virtualizing memory - i.e. ephemeral storage - wasn't enough; in order to save and transfer data, programs also had to virtualize disk - i.e. persistent storage. Whereas a whole-computer program would just seek to position 0 on the disk and start writing data to it however it pleased, a program writing to a virtualized disk - or, as we might call it today, a "file" - first needed to request a file from the operating system.
In other words: file systems are just virtual disks.
Networking was treated in a similar way. Rather than addressing the entire network connection at once, each program could allocate a little slice of the network - a "port". That way a program could, instead of consuming all network traffic destined for the entire machine, ask the operating system to just deliver it all the traffic for, say, port number seven.
In other words: listening ports are just virtual network cards.
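From a program's point of view, all three requests look the same: ask the kernel for a private slice of a shared resource. A small illustrative sketch (my own, not from the original post):

```python
import socket
import tempfile

# Virtual RAM: just ask for memory; the kernel hands this process
# space inside its own private address space.
buf = bytearray(1024)

# Virtual disk: ask for a file instead of seeking to raw disk offsets;
# the filesystem keeps it separate from every other program's data.
with tempfile.TemporaryFile() as f:
    f.write(b"private data")
    f.seek(0)
    data = f.read()

# Virtual network card: ask for a port (0 means "any free one") instead
# of consuming all traffic destined for the whole machine.
with socket.socket() as s:
    s.bind(("127.0.0.1", 0))
    port = s.getsockname()[1]
```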
Getting bored by all this obvious stuff yet? Good. One of the things that frustrates me the most about containers is that they are an incredibly obvious idea that is just a logical continuation of a trend that all programmers are intimately familiar with.
All of these different virtual resources exist for the same reason: as I said earlier, if two programs need the same resource to function properly, and they both try to use it without coordinating, they'll both break horribly.2
UNIX-like operating systems more or less virtualize RAM correctly. When one program grabs some RAM, nobody else - modulo super-powered administrative debugging tools - gets to use it without talking to that program. It's extremely clear which memory belongs to which process. If programs want to use shared memory, there is a very specific, opt-in protocol for doing so; it is basically impossible for it to happen by accident.
However, the abstractions we use for disks (filesystems) and network cards (listening ports and addresses) are significantly more limited. Every program on the computer sees the same file-system. The program itself, and the data the program stores, both live on the same file-system. Every program on the computer can see the same network information, can query everything about it, and can receive arbitrary connections. Permissions can remove certain parts of the filesystem from view (i.e. programs can opt-out) but it is far less clear which program "owns" certain parts of the filesystem; access must be carefully controlled, and sometimes mediated by administrators.
In particular, the way that UNIX manages filesystems creates an environment where "installing" a program requires manipulating state in the same place (the filesystem) where other programs might require different state. Popular package managers on UNIX-like systems (APT, RPM, and so on) rarely have a way to separate program installation even by convention, let alone by strict enforcement. If you want to do that, you have to re-compile the software with ./configure --prefix to hard-code a new location. And, fundamentally, this is why the package managers don't support installing to a different place: if the program can tell the difference between different installation locations, then it will, because its developers thought it should go in one place on the file system, and why not hard code it? It works on their machine.
In order to address this shortcoming of the UNIX process model, the concept of "virtualization" became popular. The idea of virtualization is simple: you write a program which emulates an entire computer, with its own storage media, network devices, and then you install an operating system on it. This completely resolves the over-sharing of resources: a process inside a virtual machine is in a very real sense running on a different computer than programs running on a different virtual machine on the same physical device.
However, virtualization is also an extremely heavyweight, blunt instrument. Since virtual machines are running operating systems designed for physical machines, they have tons of redundant hardware-management code and enormous amounts of operating system data which could be shared with the host, but since it's in the form of a disk image totally managed by the virtual machine's operating system, the host can't really peek inside to optimize anything. It also makes other kinds of intentional resource sharing very hard: any software to manage the host needs to be installed on the host, since if it is installed on the guest it won't have full access to the host's hardware.
I hate using the term "heavy-weight" when I'm talking about software - it's often bandied about as a content-free criticism - but the difference in overhead between running a virtual machine and a process is the difference between gigabytes and kilobytes; somewhere between 4-6 orders of magnitude. That's a huge difference.
This means that you need to treat virtual machines as multi-purpose, since one VM is too big to run just a single small program. Which means you often have to manage them almost as if they were physical hardware.
When we run a program on a UNIX-like operating system, and by so running it, grant it its very own address space, we call the entity that we just created a "process".
This is how to understand a "container".
A "container" is what we get when we run a program and give it not just its own memory, but its own whole virtual filesystem and its own whole virtual network card.
The metaphor to processes isn't perfect, because a container can contain multiple processes with different memory spaces that share a single filesystem. But this is also where some of the "container ecosystem" fervor begins to creep in - this is why people interested in containers will religiously exhort you to treat a container as a single application, not to run multiple things inside it, not to SSH into it, and so on. This is because the whole point of containers is that they are lightweight - far closer in overhead to the size of a process than that of a virtual machine.
A process inside a container, if it queries the operating system, will see a computer where only it is running, where it owns the entire filesystem, and where any mounted disks were explicitly put there by the administrator who ran the container. In other words, if it wants to share data with another application, it has to be given the shared data; opt-in, not opt-out, the same way that memory-sharing is opt-in in a UNIX-like system.
So why is this so exciting?
In a sense, it really is just a lower-overhead way to run a virtual machine, as long as it shares the same kernel. That's not super exciting, by itself.
The reason that containers are more exciting than processes is the same reason that using a filesystem is more exciting than having to use a whole disk: sharing state always, inevitably, leads to brokenness. Opt-in is better than opt-out.
When you give a program a whole filesystem to itself, sharing any data explicitly, you eliminate even the possibility that some other program scribbling on a shared area of the filesystem might break it. You don't need package managers any more, only package installers; by removing the other functions of package managers (inventory, removal) they can be radically simplified, and less complexity means less brokenness.
When you give a program an entire network address to itself, exposing any ports explicitly, you eliminate even the possibility that some rogue program will expose a security hole by listening on a port you weren't expecting. You eliminate the possibility that it might clash with other programs on the same host, hard-coding the same port numbers or auto-discovering the same addresses.
In addition to the exciting things on the run-time side, containers - or rather, the things you run to get containers, "images"3, present some compelling improvements to the build-time side.
On Linux and Windows, building a software artifact for distribution to end-users can be quite challenging. It's challenging because it's not clear how to specify that you depend on certain other software being installed; it's not clear what to do if you have conflicting versions of that software that may not be the same as the versions already available on the user's computer. It's not clear where to put things on the filesystem. On Linux, this often just means getting all of your software from your operating system distributor.
You'll notice I said "Linux and Windows"; not the usual (linux, windows, mac) big-3 desktop platforms, and I didn't say anything about mobile OSes. That's because on macOS, Android, iOS, and Windows Metro, applications already run in their own containers. The rules of macOS containers are a bit weird, and very different from Docker containers, but if you have a Mac you can check out ~/Library/Containers to see the view of the world that the applications you're running can see. iOS looks much the same.
This is something that doesn't get discussed a lot in the container ecosystem, partially because everyone is developing technology at such a breakneck pace, but in many ways Linux server-side containerization is just a continuation of a trend that started on mainframe operating systems in the 1970s and has already been picked up in full force by mobile operating systems.
When one builds an image, one is building a picture of the entire filesystem that the container will see, so an image is a complete artifact. By contrast, a package for a Linux package manager is just a fragment of a program, leaving out all of its dependencies, to be integrated later. If an image runs on your machine, it will (except in some extremely unusual circumstances) run on the target machine, because everything it needs to run is fully included.
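As a concrete (and hypothetical) illustration of an image being a complete artifact, here is a minimal Dockerfile for a Python service; names like `myapp` are placeholders, not from the original post. The resulting image contains the OS userland, the interpreter, the dependencies, and the application itself, so nothing is left to be "integrated later" on the target machine:

```dockerfile
# Everything below ends up inside the image: a complete filesystem.
FROM python:3
COPY requirements.txt /app/requirements.txt
# Dependencies are baked into the image at build time,
# not resolved against whatever the host happens to have.
RUN pip install -r /app/requirements.txt
COPY . /app
WORKDIR /app
CMD ["python", "-m", "myapp"]
```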
Because you build all the software an image requires into the image itself, there are some implications for server management. You no longer need to apply security updates to a machine - they get applied to one application at a time, and they get applied as a normal process of deploying new code. Since there's only one update process, which is "delete the old container, run a new one with a new image", updates can roll out much faster, because you can build an image, run tests for the image with the security updates applied, and be confident that it won't break anything. No more scheduling maintenance windows, or managing reboots (at least for security updates to applications and libraries; kernel updates are a different kettle of fish).
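Concretely, that single update process might look something like the following hypothetical session (image and container names are placeholders); note that it is the same handful of commands no matter which library inside the image received the security fix:

```shell
$ docker build -t myapp:2016-12-05 .          # rebuild: the base image pulls in the patched library
$ docker run --rm myapp:2016-12-05 run-tests  # test exactly what you are about to deploy
$ docker stop myapp && docker rm myapp        # delete the old container...
$ docker run -d --name myapp myapp:2016-12-05 # ...run a new one with the new image
```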
That's why it's exciting. So why's it all so confusing?5
Fundamentally the confusion is caused by there just being way too many tools. Why so many tools? Once you've accepted that your software should live in images, none of the old tools work any more. Almost every administrative, monitoring, or management tool for UNIX-like OSes depends intimately upon the ability to promiscuously share the entire filesystem with every other program running on it. Containers break these assumptions, and so new tools need to be built. Nobody really agrees on how those tools should work, and a wide variety of forces ranging from competitive pressure to personality conflicts make it difficult for the panoply of container vendors to collaborate perfectly4.
Many companies whose core business has nothing to do with infrastructure have gone through this reasoning process:
1. Containers are so much better than processes, we need to start using them right away, even if there's some tooling pain in adopting them.
2. The old tools don't work.
3. The new tools from the tool vendors aren't ready.
4. The new tools from the community don't work for our use-case.
5. Time to write our own tool, just for our use-case and nobody else's! (Which causes problem #4 for somebody else, of course...)
A less fundamental reason is too much focus on scale. If you're running a small-scale web application with a stable user-base and no expectation of significant growth, there are still many great reasons to adopt containers, even if you never automate your operations at scale; in fact, if you keep things simple, the very fact that your software runs in a container might obviate the need for a system-management solution like Chef, Ansible, Puppet, or Salt. You should totally adopt them, and try to ignore the more complex and involved parts of running an orchestration system.
However, containers are even more useful at significant scale, which means that companies which have significant scaling problems invest in containers heavily and write about them prolifically. Many guides and tutorials on containers assume that you expect to be running a multi-million-node cluster with fully automated continuous deployment, blue-green zero-downtime deploys, and a 1000-person operations team. It's great if you've got all that stuff, but building each of those components is a non-trivial investment.
So, where does that leave you, my dear reader?
You should absolutely be adopting "container technology", which is to say, you should probably at least be using Docker to build your software. But there are other, radically different container systems - like Sandstorm - which might make sense for you, depending on what kind of services you create. And of course there's a huge ecosystem of other tools you might want to use; too many to mention, although I will shout out to my own employer's docker-as-a-service Carina, which delivered this blog post, among other things, to you.
You shouldn't feel as though you need to do containers absolutely "the right way", or that the value of containerization is derived from adopting every single tool that you can, all at once. The value of containerization comes from four very simple things:
- It reduces the overhead and increases the performance of co-locating multiple applications on the same hardware,
- It forces you to explicitly call out any shared state or required resources,
- It creates a complete build pipeline that results in a software artifact that can be run without special installation or set-up instructions (at least, on the "software installation" side; you still might require configuration, of course), and
- It gives you a way to test exactly what you're deploying.
These benefits can combine and interact in surprising and interesting ways, and can be enhanced with a wide and growing variety of tools. But underneath all the hype and the buzz, the very real benefit of containerization is basically just that it is fixing a very old design flaw in UNIX.
Containers let you share less state, and shared mutable state is the root of all evil.
If you have a more sophisticated understanding of memory, disks, and networks, you'll notice that everything I'm saying here is patently false, and betrays an overly simplistic understanding of the development of UNIX and the complexities of physical hardware and driver software. Please believe that I know this; this is an alternate history of the version of UNIX that was developed on platonically ideal hardware. The messy co-evolution of UNIX, preemptive multitasking, hardware offload for networks, magnetic secondary storage, and so on, is far too large to fit into the margins of this post. ↩
One runs an "executable" to get a process; one runs an "image" to get a container. ↩
Although the container ecosystem is famously acrimonious, companies in it do actually collaborate better than the tech press sometimes give them credit for; the Open Container Project is a significant extraction of common technology from multiple vendors, many of whom are also competitors, to facilitate a technical substrate that is best for the community. ↩
27 Oct 2016 9:23am GMT
22 Oct 2016
Want to TLS-protect your co-located stack of vanity websites with Twisted and Let's Encrypt using HawkOwl's rproxy, but can't tolerate the bone-grinding tedium of a pip install? I've built a docker image for you, so it's now as simple as:
$ mkdir -p conf/certificates
$ cat > conf/rproxy.ini << EOF
> [rproxy]
> certificates=certificates
> http_ports=80
> https_ports=443
> [hosts]
> mysite.com_host=<other container host>
> mysite.com_port=8080
> EOF
$ docker run --restart=always -v "$(pwd)"/conf:/conf \
    -p 80:80 -p 443:443 \
    glyph/rproxy
There are no docs to speak of, so if you're interested in the details, see the tree on github I built it from.
Modulo some handwaving about docker networking to get that <other container host> IP, that's pretty much it. Go forth and do likewise!
22 Oct 2016 8:12pm GMT