03 Jun 2021


Thomas Vander Stichele: Amazing Marvin and KeyCombiner

I recently came across an excellent tool called KeyCombiner that helps you practice keyboard shortcuts (3 sets for free, $29/6 months for more sets). I spent some time creating a set for Amazing Marvin, my current todo manager of choice.

The shareable URL to use in KeyCombiner is https://keycombiner.com/collecting/collections/shared/f1f78977-0920-4888-a86d-d00a7201502e

I generated it from the printed PDF version of Marvin's keyboard guide, with a bunch of manual editing in a Google Sheet.

Keyboard shortcuts are great timesavers and help reduce friction, but it's getting harder to learn them properly. This tool has been a great help for some other apps, for figuring out common shortcuts across apps, and for picking custom shortcuts (in other apps) that don't conflict. If this is a problem you recognize, give KeyCombiner a try.


03 Jun 2021 2:27am GMT

01 Jun 2021


Glyph Lefkowitz: Detweeting

Twitter is horrible. Everyone already knows this. 1 2 3

But, Twitter can also be good, sometimes, after a fashion.

Throughout the pandemic, I have personally found Twitter to be a helpful tool for self-regulation. The little hits of dopamine on demand throughout the day have allowed me to suppress and modulate some truly unpleasant intrusive thoughts, during times when I have had neither the executive function nor sufficient continuous uninterrupted time allocated to focus on other, more useful things. Twitter has allowed me to anesthetize the internal doom-sayer during the absolutely most mind-shatteringly stressful period of my - and, presumably, most living humans' - entire life.

Like any anesthetic, however, there comes a point where administering additional doses is more harmful than beneficial, even if the pain it's suppressing is still there. It's time for me to take a break, and it seems like it would be wise to take one long enough for new habits to form.

To that end, I'll be taking the entirety of June off from Twitter; depending on how that goes, I might see you back there on 2021-07-01, or, should I find the fortitude in the meanwhile, never.

The "I'm taking a break from social media" genre of post is certainly a bit self-indulgent4, so it behooves me to say why I'm bothering to post about this rather than just, you know, doing it.

There are three reasons:

  1. Changing times: I'm naturally distractable so I tend to keep an eye on my social media usage. I periodically look at how much time I'm spending, the benefits I'm getting, and the problems it's causing. For most of the pandemic I could point to at least one or two useful actions per week that I'd taken because of something I'd learned on Twitter. Sometimes I'd learn about risk modeling or health precautions, emerging understanding of the impacts of isolation on mental health, and ways to help address the exhausting, non-stop political upheaval of 2020/2021. But now I'm mostly just agonizing over the lack of any useful guidance for parents with young children who cannot yet get vaccinated for COVID-19 at this late stage of the crisis, and getting directionlessly more angry about the state of the world. The benefits have slowly evaporated over the last few weeks but the costs remain.5

  2. Accountability: simply deleting the app, logging out of the website, etc., is clearly not enough to keep me away, so an audience who can notice me posting and say "stop posting" should hopefully be enough to keep me honest. Please do note that I will still be allowing certain automated systems to post on my behalf, though. This post, for example, and any other posts I put on my blog, will show up in my Twitter feed automatically; I don't post those manually.

  3. A gentle prompt for others: maybe you're having similar issues. Maybe you'd like to join me. During the pandemic I've found that many of the unpleasant mental states I've described are more relatable than usual. Some so much so that there are whole articles about the jargon to describe them, like "disenfranchised stress"6 and "vicarious trauma"7. Feel free to ignore this: I'm not saying you should join me. Just that if you've already been thinking you should, you can take this as a challenge to do the same.

In the meanwhile, I'll try to do some longer-form writing, particularly writing that isn't about social media.

If you'd like to get in touch, I won't be replying to DMs, so feel free to send me an email directly. If you want to interact in real time, I am still on IRC, as glyph on irc.libera.chat. Feel free to drop by #glyph and say hi.


  1. "I hope Twitter goes away", by Alex Gaynor

  2. "Why is Twitter so toxic?", by Thought Slime

  3. "Quit Social Media. Your Career May Depend on It.", by Cal Newport in the New York Times

  4. "I'm Taking a Social Media Break", by Caren Lissner for McSweeney's Internet Tendency

  5. This is a personal thing, everyone's timeline is different, so if you're having a great time, you don't have to email me. Mazel tov.

  6. "White-Collar Workers Hit the Pandemic Wall", by Suzanne Degges-White Ph.D. in Psychology Today

  7. "Vicarious Trauma, Mirror Neurons, and COVID-19", by Maureen O'Reilly-Landry Ph.D. in Psychology Today

01 Jun 2021 6:05am GMT

06 Apr 2021


Moshe Zadka: Portable Python Binary Wheels

It is possible to work with Python quite a bit and not be aware of some of the subtler details of package management. Since Python is a popular "glue" language, one of its core strengths is integrating with libraries written in other languages: from database drivers written in C, numerical algorithms written in Fortran, to cryptographic algorithms written in Rust. In all these cases, one way to avoid error-prone and frustrating installation errors in the target environment is to distribute pre-built code. However, while source code can be made portable, making the build output portable is a lot more complicated.

Note: This post focuses specifically on binary wheels for Linux. Binary wheels exist for other platforms, but those are beyond the current scope.

The Python manylinux project, composed of three PEPs, two software repositories, and support in pip, addresses how to accomplish that. These problems are hard, and few other ecosystems solve them as well as Python. The solution has many moving parts, developed over the course of ten years. Unfortunately, this means that understanding all of those is not easy.

While this post cannot make it easy, it can at least make it easier, by making sure all the details are in one place.

Wheels

Python packages come in two main forms:

  • Source
  • Wheels

Wheels are "pre-built" packages that are easier and faster to install. The name comes originally from a bad joke: the Monty Python Cheese Shop sketch, since PyPI used to be called "Cheese Shop" and cheese is sometimes sold in wheels. The name has been retconned for another bad joke, as a reference to the phrase "reinventing the wheel", allowing Python packaging talks to make cheap puns. For the kind of people who give packaging talks, or write explainers about packaging formats, these cheap jokes fill the void in what would otherwise be their soul.

Even for packages that include no native code, only pure Python, wheels have some advantages. They do not execute any potentially-fragile code on installation, and querying their dependencies can be done without a Python interpreter.
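
For example, the dependency list lives in a static METADATA file inside the archive, so nothing from the package has to be imported or executed to read it. Here is a minimal sketch (the wheel file name is hypothetical):

import zipfile

# A wheel is just a zip archive; its dist-info directory contains METADATA.
with zipfile.ZipFile("example-1.0-py3-none-any.whl") as wheel:
    metadata = wheel.read("example-1.0.dist-info/METADATA").decode("utf-8")

# Each "Requires-Dist:" line declares one dependency.
requirements = [
    line.split(":", 1)[1].strip()
    for line in metadata.splitlines()
    if line.startswith("Requires-Dist:")
]
print(requirements)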

However, when packages do include native code the story is more complicated.

C library

Let's start with the relatively straightforward part: portable binary wheels for Linux are called manylinux, not alllinux. This is because they rely on the GNU C library, and on specific features of it. There is another popular libc for Linux: musl. There is absolutely no attempt to be compatible with musl-based Linux distributions[#], the most famous of which is Alpine Linux.

[#] For now. See PEP-656

However, most other distributions derive from either Debian (for example, Ubuntu) or from Fedora (CentOS, RHEL, and more). Those all use the GNU C library.

GNU C library

GNU libc has an official "infinite backwards compatibility" policy: libc6 version X.Y is compatible with W.Z if X>W, or if X=W and Y>=Z.

Aside: the 6 in libc6 does not refer to the version of the GNU C Library: Linux only moved to adopt the GNU C Library in libc6. The libc4 library was written from scratch, while libc5 combined code from GNU C Library version 1 and some bits from BSD C library. In libc6, Linux moved to rely on GNU C Library version 2.x, first released in January 1997. The GNU C Library is still, over twenty years later, on major version 2. We will ignore some nuances, and just treat all GNU C Library versions as 2.X.

The infinite compatibility policy means that binaries built against libc6 version 2.17, for example, are compatible with libc6 version 2.32.
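
To see which glibc a specific machine offers, and therefore roughly which wheels it can accept, the standard library has a helper. A minimal sketch:

import platform

# On a typical Linux distribution this prints something like "glibc 2.31",
# meaning binaries built against glibc 2.31 or older should load here.
libc, version = platform.libc_ver()
print(libc, version)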

Manylinux history

The relevant PEP is dense but worth reading. "Portable" is a loaded word, and unpacking it is important. The specific meaning of "portable" is encoded in the auditwheel policy file. This file concedes the main point: portability is a spectrum.

When the manylinux project started, in 2016, the oldest security-supported open source distribution was CentOS: specifically, CentOS 5.11. It was released in 2014. However, because CentOS tracks RHEL, and RHEL is conservative, the GNU C library (glibc, from now on) it used was 2.5: a version released in 2006.

Even then, it was clear that the "minimum" compatibility level would be a moving target. Because of that, that compatibility level was named manylinux1.

In 2018, the manylinux project moved to a more transparent naming scheme: the year in which the relevant CentOS release was first released. Thus, instead of manylinux1, the next compatibility target (defined in 2018) was called manylinux2010, referencing CentOS 6.

In April 2019, manylinux2014 was defined as a compatibility tag, referencing CentOS 7.

In the beginning of 2021, Red Hat, in a controversial move, changed the way CentOS works, effectively nullifying the value any future releases have as a way of specifying a minimum glibc version support.

The Python community decided to switch to a new scheme: directly naming the version of glibc supported. The first such tag, manylinux_2_24, was added in November 2020. The next release of auditwheel, 4.0, moves all releases to glibc-based tags, while keeping the original names as "aliases". It also adds a compatibility level manylinux_2_27.

Libc compatibility and beyond

The compatibility level of a manylinux wheel is defined by the glibc symbols it links against. However, this is not the only compatibility manylinux wheels care about: this just orders them on a single axis from "most compatible" to "least compatible".

Each compatibility level also includes:

  • A list of allowed libraries to dynamically link against.
  • Specific symbol versions and ABI flags that depend on both glibc and gcc.

However, many Python extensions include native code precisely because they need to link against a C library. As a concrete example, the mysqlclient wheel would not compile if the libmysql headers are not installed, and would not run if the libmysql shared library (of a version that matches the one the package was compiled against) is not installed.

It would seem that portable binary wheels are only of limited utility if they do not support the main use case. However, the auditwheel tool includes one more twist: patching ELF.

Elves

Elves predate Tolkien's Middle-Earth. They appear in many Germanic and Nordic mythologies: sometimes as do-gooders, sometimes as evil-doers, but always associated with having powerful magic.

Our context is no less magical, but more modern. ELF ("Executable and Linkable Format") is the format of executables and shared libraries on Linux, since libc5 (before that, Linux used the so-called a.out format).

When auditwheel is asked to repair a wheel for a specific platform version, it checks for any shared libraries it links against that are not part of the pre-approved list. If it finds any, it copies ("vendors") them into the wheel, and patches the extension modules so that they load the vendored copies. This means that post-repair, the new ("repaired") wheel will not depend on any libraries outside the approved list.
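
To get a feel for what auditwheel looks at, the sketch below lists the DT_NEEDED entries (the shared libraries a module declares it needs) of one extension module. This is not auditwheel's actual implementation; it assumes the third-party pyelftools package, and the file name is hypothetical:

from elftools.elf.dynamic import DynamicSection
from elftools.elf.elffile import ELFFile

with open("_example.cpython-39-x86_64-linux-gnu.so", "rb") as f:
    elf = ELFFile(f)
    for section in elf.iter_sections():
        if isinstance(section, DynamicSection):
            for tag in section.iter_tags():
                if tag.entry.d_tag == "DT_NEEDED":
                    # e.g. libc.so.6, libpthread.so.0, libmysqlclient.so.21
                    print(tag.needed)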

These repaired binary wheels will include the requested manylinux tag and the patched modules. They can be uploaded to PyPI or other Python packaging repositories (such as DevPI).

For pip to install the correct wheel, it needs to be up-to-date enough to inspect the OS and decide which manylinux tags are compatible.

Installing Binary Wheels

Because wheels tagged as linux_<cpu architecture> (for example, linux_x86_64) cannot be assumed to work on any platform other than the one they were compiled for, PyPI rejects them. In order to upload a binary wheel for Linux to PyPI, it has to be tagged with a manylinux tag. It is possible to upload multiple manylinux wheels for a single package, each with a different compatibility target.

When installing packages, pip will prefer to use a wheel, if available, instead of a source distribution. When pip checks the availability of a wheel, it will introspect the platform it is running on, and map it to the list of compatible manylinux tags. Since the list is changing, it is possible that a newer pip will recognize more compatibilities than an older pip.
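
The same logic is available as a library. A minimal sketch, assuming the packaging package (which pip itself vendors) is installed:

from packaging.tags import sys_tags

# sys_tags() yields the tags the current interpreter and platform accept,
# most preferred (most specific) first.
for tag in list(sys_tags())[:5]:
    print(tag)  # e.g. cp39-cp39-manylinux_2_31_x86_64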

Once pip finds the list of manylinux tags compatible with its platform, it will install the least broadly compatible wheel that still matches the platform: for example, it will prefer manylinux2014 to manylinux2010 if both are compatible. If there are no binary wheels available, pip will fall back to installing from the source distribution (sdist). As mentioned before, installing from an sdist, at the very least, requires a functional compiler and Python header files. It might also have specific build-time dependencies, depending on the package.

Thanks

Thanks to SurveyMonkey for encouraging me to do the research this post is based on.

Thanks to Nathaniel J. Smith and Glyph for their feedback on this blog, post-publication. I have added some corrections and clarifications based on their feedback.

All mistakes that remain are my responsibility.

06 Apr 2021 3:00am GMT

17 Mar 2021


Glyph Lefkowitz: Interfaces and Protocols

Some of you read my previous post on typing.Protocols and probably wondered: "what about zope.interface?" I've advocated strongly for it in the past - but now that we have Mypy and Protocols, is it simply a relic of an earlier time? Can we entirely replace it with Protocol?

Let's have a look.

Typing in 2 dimensions

In the previous post I discussed structural versus nominal typing. In Mypy's type system, most classes are checked nominally whereas Protocol is checked structurally. However, there's another way that Protocol is distinct from a normal class: normal classes are concrete types, and Protocols are abstract.

Abstract types:

  1. cannot be instantiated: every instance of an abstract type is an instance of some concrete sub-type, and
  2. do not include (complete) implementation logic.

Concrete types:

  1. can be instantiated: they are complete descriptions of a type, and
  2. must include all their own implementation logic.

Protocols and Interfaces are both abstract, but Interfaces are nominal. The highest level distinction between the two is that when you have a problem that requires an abstract type, but nominal checking is preferable to structural, Interfaces are a better solution.

Python's built-in Abstract Base Classes are technically abstract-and-nominal as well, but they're in a strange halfway space; they're formally "abstract" because they can't be instantiated, but they're partially concrete in that they can contain any amount of implementation logic themselves, so making an object which is a subtype of multiple ABCs drags in all the usual problems of conflicting namespaces within multiple inheritance.

Theoretically, there's a way to treat ABCs as purely abstract - which is to use ABCMeta.register - but as of this writing (March 2021) it doesn't work with Mypy, so within the context of "static typing in Python" we presently have to ignore it.

Practicalities

The first major advantage that Protocol has is that since it is now built in to Python itself, there's no reason not to use it. When Protocol didn't even exist, regardless of all the advantages of adding explicit abstract types to your project with zope.interface, it did still have the small down-side of requiring a new dependency, with all the minor headaches that might imply.

Beyond the theoretical distinctions, there's a question of how well tooling supports zope.interface. There are some clear gaps; there is not a ton of great built-in IDE support for zope.interface; less-sophisticated linters will sometimes still complain that Interfaces don't take self as their first argument. Indeed, Mypy itself does this by default - although more on that in a moment. Less mainstream performance-focused type-checkers like Pyre and Pyright don't support zope.interface, either, although their lack of support for zope.interface is just a part of a broader problem of their lack of extensibility; they also can't support SQLAlchemy or the Django ORM without special-casing in the tools themselves.

But what about Mypy itself? If we have to discount ABCMeta.register due to practical tooling deficiencies, even though it provides a built-in way to declare a nominal-but-abstract type in principle, then we need to be able to use zope.interface within Mypy as well for a fair comparison with Protocol. Can we?

Luckily, yes! Thanks to Shoobx, there's a fairly actively maintained Mypy plugin that supports zope.interface which you can use to statically check your Interfaces.

However, this plugin does have a few key limitations as of this writing (Again, March 2021), which makes its safety guarantees a bit lower-quality than Protocol.

The net result of this is that Protocols have the "home-field advantage" in most cases; out of the box, they'll work more smoothly with your existing editor / linter setup, and as long as your project supports Python 3.6+, at worst (if you can't use Python 3.8, where Protocol is built in to typing) you have to take a type-check-time dependency on the typing_extensions package, whereas with zope.interface you'll need both the run-time dependency of zope.interface itself and the Mypy plugin at type-checking time.

So in a situation where both are roughly equivalent, Protocol tends to win by default. There are undeniably big areas where Interfaces and Protocols overlap, and in plenty of them, using Protocol is a fine idea. But there are still some clear places that zope.interface shines.

First, let's look at a case which Interfaces handle more gracefully than Protocols: opting out of matching a simple shape, where the shape doesn't fully describe its own meaning.

Where Interfaces work best: hidden and complex meanings

The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information.

Alan Perlis, "Epigrams in Programming", Epigram 34.

The place where structural typing has the biggest advantage is when the type system is expressive enough to fully encode the meaning of the desired behavior within the structure of the type itself. Consider a Protocol which describes an object that can add some integers together:

class Math(Protocol):
    def add_integers(self, addend1: int, addend2: int) -> int:
        ...

It's fairly unambiguous what adherents to this Protocol should do, and anyone implementing such a thing should be able to clearly tell that the method is supposed to add a couple of integers together; there's nothing hidden about the structure of the integers, no constraints the type system won't let us specify. It would be quite surprising if anything that didn't have the intended behavior would match this Protocol.

At the other end of the spectrum, we might have a plugin Interface that has a lot of hidden structure. For this example, we have an Interface called IPlugin containing a method with an easy-to-conflict-with name ("name") overloaded with very specific constraints on its return type: the string must contain the dotted-path name of a Python object in an import-able module (like, for example, "os.path.join").

class IPlugin(Interface):
    def name() -> str:
        "Return the fully-qualified Python identifier of the thing to load."

With Protocols, you can work around these limitations, by manually making it harder to match; adding elements to the structure that embed names relevant to its semantics and thereby making the type behave more as if it were nominally typed.

You could make the method's name long and ugly instead (plugin_name_to_load, let's say) or add unused additional attributes (yep_i_am_a_plugin = Literal[True]) in order to reduce the risk of accidental matches, but these workarounds look hacky, and they have to be manually namespaced; if you want to mark it as having semantics associated with your specific plugin system, you have to embed the name of that system in your attributes themselves; here we're just saying "plugin" but if we want to be truly careful, we have to embed the whole name of our project in there.
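
Here is a sketch of what such a workaround ends up looking like (the class name is invented; Literal and Protocol are in typing as of Python 3.8, or in typing_extensions before that):

from typing import Literal, Protocol

class MyFrameworkPlugin(Protocol):
    # An otherwise-unused marker attribute, only there to make accidental
    # structural matches unlikely.
    yep_i_am_a_plugin: Literal[True]

    def plugin_name_to_load(self) -> str:
        ...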

With Interfaces, the maintainer of each implementation must explicitly opt in, by choosing whether to specify that they are an @implementer(IPlugin). Since they had to import IPlugin from somewhere, this annotation carries with it a specific, namespaced declaration of semantic intent: "I know what the Interface IPlugin means, and I promise that I can provide it".
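
For illustration, using the IPlugin example from above (the class name and return value are invented; implementer is the real zope.interface decorator):

from zope.interface import implementer

@implementer(IPlugin)
class DotPathPlugin:
    def name(self) -> str:
        return "os.path.join"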

This is the most salient distinction between Protocols and Interfaces: if you have strong reasons to want adherents to the abstract type to opt in, you want an Interface; if you want them to match automatically, you want a Protocol.

Runtime support

Interfaces also provide a more nuanced set of runtime checks.

You can say that an object directlyProvides an interface, allowing for some level of (at least runtime) type safety, and ask if IPlugin is .providedBy some object.
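
A minimal sketch of those runtime checks, again using IPlugin (the object being marked is invented):

from zope.interface import directlyProvides

class Loader:
    def name(self) -> str:
        return "os.path.join"

loader = Loader()
directlyProvides(loader, IPlugin)    # mark this one instance, not the class
assert IPlugin.providedBy(loader)    # True for this instance only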

You can do most of this with Protocol, but it's awkward. The @runtime_checkable decorator allows your Protocol to make isinstance(x, MyProtocol) work like IMyInterface.providedBy(x), but:

  1. you're still missing directlyProvides; the runtime checking is all by type, not by the individual properties of the instance;
  2. it's not the default, so if you're not the one defining the Protocol, there's no guarantee you'll be able to use it.

With Interfaces, there's also no mandatory relationship between the implementer (i.e. the type whose instances fit the specified shape) and the provider (the specific object which can fit the specified shape). This means you get features like classProvides and moduleProvides "for free".

Interfaces work particularly well for communication between frameworks and application code. For example, let's say you're evolving the meaning of an Interface implemented by applications over time - EventHandler, EventHandler2, EventHandler3 - which have similarly named and typed methods, but subtly different expectations on their lifecycle or when precisely the methods will be called. A framework facing this problem can use a series of Interfaces, and check at runtime to see which of these the application implements, and be secure in the knowledge that the application has intentionally adopted the new interface, and doesn't just happen to have a matching method name against an older version.

Finally, zope.interface gives you adaptation and adapter registries, which can be a useful mechanism for doing things like templating; think of it as a much more powerful version of singledispatch from the standard library.

Adapter registries are nuanced, complex tools and unfortunately an example that captures the full utility of their power would itself be commensurately complex. However, the core of adaptation is the idea that if you have an arbitrary object x, and you want a provider of the interface IY, you can do the following:

y = IY(x, None)

This performs a multi-stage check:

  1. If x already provides IY (either via implementer, provider, directlyProvides, classProvides, or moduleProvides), it's simply returned; so you don't need to special-case the case where you've already got what you want.
  2. If x has a __conform__(interface) method, it'll be called with IY as the interface, and if __conform__ returns anything non-None that result will be returned from the call to IY.
  3. If IY has a specially-defined __adapt__ method, it can implement its own logic for this hook directly.
  4. Each globally-registered function in zope.interface's adapter_hooks will be invoked to find a function that can transform x into an IY provider. Twisted has its own global registry in this list, which is what registerAdapter manipulates.

But from the perspective of the caller, you can just say "I want an IY".

With Protocols, you can emulate this with functools.singledispatch by making a function which returns your Protocol type and registers various types to do conversion. The place that adapter registries have an advantage is their central nature and consistent idiom for converting to the target type; you can use adaptation for any Interface in the same way, and any type can participate in adaptation in the ways listed above via flexible mechanisms depending on where it makes sense to put your implementation, whereas any singledispatch function to convert to a Protocol needs to be bespoke per-Protocol.
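
Here is a sketch of that singledispatch-based emulation (IYish, as_iyish, and StringRenderer are all invented names); note that it is bespoke to this one target type, which is exactly the limitation described above:

from functools import singledispatch
from typing import Protocol

class IYish(Protocol):
    def render(self) -> str:
        ...

@singledispatch
def as_iyish(value: object) -> IYish:
    raise TypeError(f"don't know how to adapt {value!r}")

@as_iyish.register
def _(value: str) -> IYish:
    # Wrap a plain string in something that structurally matches IYish.
    class StringRenderer:
        def render(self) -> str:
            return value
    return StringRenderer()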

Describing and restricting existing shapes

There are still several scenarios where Protocol's semantics apply more cleanly.

Unlike Interfaces, Protocols can describe the types of things that already exist. To see when that's an advantage, consider a sprawling application that uses tons of libraries and manipulates 3D spatial data points.

There's a convention among these disparate libraries where they all represent a "point" as an object with .x, .y, and .z attributes which are all floats. This is a natural enough shape, given the domain, that lots of your libraries just fit it by accident. You want to write functions that can work with data output by any of these libraries as long as it plausibly looks like your own concept of a Point:

class Point(Protocol):
    x: float
    y: float
    z: float

In this case, the thing defining the Protocol is your application; the thing implementing the Protocol is your collection of libraries. Since the libraries don't and can't know about the application - the dependency arrow points the other way - they can't reference the Protocol to note that they implement it.

Using Protocol, you can also restrict an existing type to preserve future flexibility.

For example, let's say we're implementing a "mailbox" type pattern, where some systems deliver messages and other systems retrieve them later. To avoid mix-ups, the system that sends the messages shouldn't retrieve them and vice versa - receivers only receive, and senders only send. With Protocols, we can describe this without having any new custom concrete types, like so:

from typing import Protocol, TypeVar

T_co = TypeVar("T_co", covariant=True)
T_con = TypeVar("T_con", contravariant=True)

class Sender(Protocol[T_con]):
    def add(self, item: T_con) -> None:
        "Put an item in the slot."

class Receiver(Protocol[T_co]):
    def pop(self) -> T_co:
        "Retrieve an item from the PO box."

All of that code is just telling Mypy our intentions; there's no behavior here yet.

The actual implementation is even shorter:

from typing import Set

mailbox: Set[int] = set()

Literally no code of our own - set already does the job we described. And how do we use this?

def send(sender: Sender[int]) -> None:
    sender.add(3)

def receive(receiver: Receiver[int]) -> None:
    receiver.pop()
    receiver.add(3)
    # Mypy stops us making this mistake:
    # "Receiver[int]" has no attribute "add"

send(mailbox)
receive(mailbox)

For its initial implementation, this system requires nothing beyond types available in the standard library; just a set. However, by treating their parameter as a Sender and a Receiver respectively rather than a Set, send and receive prevent themselves from using any functionality from the set passed in aside from the one method that their respective roles are supposed to "see". As a result, Mypy will now tell us if any code which receives the sender object tries to remove objects.

This allows us to use existing data structures in libraries without the usual attendant problem of advertising to all clients that every tiny implementation detail of those existing structures is an intended part of the public interface. Python has always tried to make these sort of distinctions by leaving certain things undocumented or saying narratively which things you should rely on, but it's always hit-or-miss (usually miss) whether library consumers will see those admonitions or not; by making it a feature of the programming environment, Mypy makes it harder to ignore.

Conclusions

In modern Python code, when you have an abstract collection of behavior, you should probably consider using a Protocol to describe it by default. However, Interface is also staying up to date with modern Python tooling via its Mypy support, and it can be worthwhile for more sophisticated consumers that want support for nominal typing, or that want to draw on its rich adaptation and component registration feature-set.

17 Mar 2021 7:22am GMT

12 Mar 2021


Moshe Zadka: So you want to create a universe

A story about looking for a universe, and finding a pi(e)

This is fine. You need not feel shame. Many want to create a universe. But it is good you are being careful. A universe with sentient beings is a big moral responsibility.

It is good to start with something small. The smallest. Move up from there. So there is one point, that can move in one dimension. Just interesting enough to be exciting, but little chances of messing anything serious up.

It is good if the way the point moves can be described by a function. At each point in time, \(t\), the point is at a place, \(x\). This mapping is a function.

$$ f: \mathbb{R} \to \mathbb{R} $$

Right now there are no sentient beings in the universe. But inevitably, you will want to create a universe with such beings. Beings that want the universe to be predictable. Better start practicing now!

This means you want \(f\) to have a simple mathematical description. The concept of death is morally complicated, but you want to allow for the potential of beings with a limited lifespan. This means that figuring out how \(f\) changes should not rely on \(t\).

One way to have a predictable function that does not depend on \(t\) is to define \(f\) with a position-independent differential equation (PIDE): an equation that involves \(f\) and its derivatives.

You are just starting out, so why not have the simplest PIDE?

$$ f' = f $$

Any simpler and your universe will be constant! Solving differential equations is hard. A solution probably exists, right? Hopefully, one that is more interesting than the constant zero function.

$$ f = 0 $$

Yes, it definitely solves it, but that sounds like a really boring universe. If a solution \(f\) that is not \(0\) at \(0\) exists, then

$$ f/f(0) $$

is also a solution, since differentiation is linear. The function \(f/f(0)\) is an interesting solution. Call it \(e\).

For a constant \(c\), \(e(x + c)/e(c)\) solves the equation and is \(1\) at \(0\). Differential equations like this one have a unique solution for a given starting condition, so

$$ e(x + c)/e(c) = e(x) $$

or, equivalently

$$ e(x + c) = e(x)e(c) $$

As a result, with a little induction,

$$ e(n/m) ^ m = e(1)^n $$

or

$$ e(n/m) = \sqrt[m]{e(1) ^ n} $$

If you use the convention that

$$ \sqrt[m]{a} = a^{1/m} $$

, you get

$$ e(n/m) = e(1)^{n/m} $$

Your universe is one dimensional, but you cannot help thinking about two dimensions. Is there a way to get a second dimension for "free"? You decide to leave the details for later, and for now, just see if \(e\) can be extended to the complex numbers. This is sure to come in handy later.

Since \(e' = e\), \(e'' = e\) and so on. In particular

$$ 1 = e(0) = e'(0) = e''(0) ... $$

This means that the Taylor series for \(e\) looks like

$$ e(x) = \Sigma_{n=0}^{\infty} x^n / n! $$

This converges for every \(x\). Because it converges absolutely everywhere, it can be extended to the complex numbers with the same formula, and \(e' = e\) over the complex numbers as well.

If \(t\) is a real number,

$$ e(-it) = \overline {e (it) } $$

and so

$$ 1 = e(0) = e(it + (-it)) = e(it)e(-it)=e(it)\overline{(e(it))} = || e(it) || $$

Nifty, for real \(t\), you get that \(e(it)\) is on the unit circle. But where on the unit circle?

$$ \operatorname{Re} e(2i) = 1 - 2^2/2! + 2^4/4! - 2^6 / 6! + d $$
$$ = 1 - 2 + 16/24 - 64/720 + d = -1 + 2/3 - 4/45 + d = -0.4\bar{2} + d $$

Where \(d\) represents the rest of the series.

We can estimate \(|d|\) as follows:

$$ |d| \leq \Sigma_{n=0}^{\infty} 2^{8 + n}/(8 + n)! \leq \Sigma_{n=0}^{\infty} (2/315) \cdot 2^n / 4^n $$
$$ = (2/315) \cdot 2 = 4/315 < 1/10 = 0.1 $$

and

$$ \operatorname{Re} e(2i) = -0.4\bar{2} + d \leq -0.4\bar{2} + |d| < -0.4\bar{2} + 0.1 < -0.3 < 0 $$

Now

$$ \operatorname{Re} e(0i) = \operatorname{Re} e(0) = \operatorname{Re} 1 = 1 > 0 $$

and \(t \to \operatorname{Re} e(ti)\) is a continuous function. This means there must be a minimal \(t\) such that

$$ \operatorname{Re} e(ti) = 0 $$

This is fun! It is not obvious how this ties into the original goal of building universes, but it definitely cannot hurt! This is an interesting number. You decide to give it a name.

The minimal \(t\) such that \(\operatorname{Re} e(ti) = 0\) will be \(\rho\). Since \(||e(\rho i)|| = 1\), this means that

$$ e(\rho i) = \pm i $$

and so

$$ e(4 \rho i) = (\pm 1)^4 i^4 = 1 $$

Looks like \(4 \rho\) is even more interesting than \(\rho\), maybe it should have its own name. How about \(\tau\)? With this new symbol, we get

$$ e (\tau i + x) = e(\tau i)e(x) = 1e(x) = e(x) $$

So \(e\), the tentative universe-evolution function, has a period of \(\tau i\). You did not expect it. It is good you started with a simple universe, there are many things to learn before creating an interesting one.

You had a special name for \(\rho\) and for \(4 \rho\), it seems almost rude not to have a name for their geometric mean. All this universe creation is hungry work, though. It would be so much easier to think if you had a piece of...

Back to the topic at hand, you decide to call the geometric mean of \(\rho\) and \(\tau\), \(\tau / 2\), \(\pi\):

$$ \pi = \tau / 2 $$

Time to relax and eat a nice piece of pie. You definitely deserved it. Whether it is savory or sweet, a pie is delicious. Enjoy it. Savor it. The universe will be waiting for you, right here.

Satisfied and with a full tummy, you get back to the universe. You gave \(\rho\), \(\pi\), and \(\tau\) names.

Any idea what their approximate value is?

You know that \(0 < \rho < 2\), but this is a pretty wide gap of ignorance. Calculating \(e(it)\), for \(t\) in that range, seems to converge quickly.

Just a handful of terms gave you something accurate to within \(4/315\). Time to leave abstract thinking, and crank up your universe simulation machine. You want to have a sense of the values involved here.

import math

def approximate_re_e_i(t):
    return sum((-1) ** n * t ** (2*n) / math.factorial(2 * n) for n in range(0, 10))
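
As a quick sanity check (a small sketch, not part of the original program), the approximation reproduces the two facts the argument above relied on: the value is 1 at 0, and negative at 2.

print(approximate_re_e_i(0))  # 1.0
print(approximate_re_e_i(2))  # about -0.416: negative, as argued above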

With a decent approximation of $ \operatorname{Re} e(it) $, you look for where the function is zero using binary search. It might not be the fastest way, but your universe simulator can handle it.

def find_zero():
    # Binary search on [0, 2]: the approximation is positive at 0 and negative at 2.
    low, high = 0, 2
    while (high - low) > 0.001:
        midpoint = (high + low) / 2
        value = approximate_re_e_i(midpoint)
        if value < 0:
            high = midpoint
        else:
            low = midpoint
    return (high + low) / 2

Now it is time to activate the simulator, and find the values.

rho = find_zero()
tau = 4 * rho
pi = tau / 2

Wonderful progress on the universe for today. A great day. A wonderful day. A day you want to celebrate.

But when?

With \(\rho <2\), you know that \(\pi <8\). Sounds like the integer part of it could be the month, maybe?

month = int(pi)

All that is left is to choose a day. There are thirty days in a month, so hopefully two digits will do here.

rest = pi - month
day = int(rest * 100)

import datetime

year = datetime.date.today().year

celebration = datetime.date(year=year, month=month, day=day)

print("Celebrate on", celebration)

Celebrate on 2021-03-14

Enjoy your pie!

12 Mar 2021 5:00am GMT

02 Mar 2021


Hynek Schlawack: Semantic Versioning Will Not Save You

The widely used Python package cryptography changed its build system to use Rust for low-level code, which caused an emotional GitHub thread. Enthusiasts of 32-bit hardware from the 1990s aside, there was a vocal faction demanding adherence to Semantic Versioning from the maintainers - claiming it would've prevented all grief. I will show you not only why this is wrong, but also how relying on Semantic Versioning hurts you.

02 Mar 2021 12:00am GMT

19 Feb 2021


Moshe Zadka: Virtual Buffet Line

Many people have written about the logistical challenges of food in a conference. You trade off not just, as Chris points out, expensive food versus terrible food, but also the challenges of serving the food to everyone at once.

One natural method of crowd control is the buffet line. People shuffling slowly through the line, picking food items, allows for a natural choke-point that avoids overwhelming table and staff availability. It is unpleasant to have to walk slowly, at the pace of the slowest decision maker, while hungry.

As humans do, one tries to make the best of a bad situation. All of the people in the conference share some common interests, and many of them have interesting tales besides. A common way of entertaining yourself in the line is to strike up a conversation with the random person behind or ahead of you. Indeed, this has led me to hear some fascinating things: tales of incidents, new libraries, or just interesting perspectives.

With a global pandemic looming, responsible folks have either cancelled conferences or led virtual conferences. Virtual conferences, especially while a global pandemic ravages the world, are nowhere near as good as the real thing.

One of my favorite things in conferences is the so-called hallway track, where we stand and chat about common interests. Friendly and inclusive people stand in the "pac-man" shape, so that people can join the conversation. I have learned a lot from these random conversations.

As humans do, one tries to make the best of a bad situation. While we are stuck at home, at least lunch time is easy. When you want to eat, order a delivery or step into the kitchen and food, chosen by you, is available. No shuffling. No waiting.

So far, no conference has tried to have a virtual buffet line, where people are forced to virtually wait in a line before eating. True, the random conversations are gone, but they have always been a coping mechanism, not the intent. If the pandemic continues, however, I am not sure this will remain true.

Conferences have already tried to "recreate" many of the constraints foisted upon physical conferences by the uncaring laws of physics in order to make them feel more "real". This rarely helps the "realism" but often creates new, unexpected problems.

One conference platform allows for "virtual coffee tables" where 2-10 people (depending on the table) can sit. Once the table is "full", nobody else can join the conversation. Table-mates can speak via text, video, or audio.

Real hallway tables seat 2-10 people because of physical constraints and the availability of furniture. There was no careful design of which combination of 2-10-sized tables makes for an "optimal" experience.

Further, this is not even a good recreation. With real tables, space is somewhat negotiable. An extra person can fit in if the seated people will let them. People can see the conversation. People can make a subtle trade-off about how "eavesdroppy" they want to be. You can stand next to the table for a long time, though you might be perceived as weird. You can pass by quickly and catch a whiff of the conversation. You can hear from afar, but only distorted highlights.

These things mean, for example, that someone seated at a table trying to harass a table-mate risks being seen and caught by random people. While we hope that this is not the only thing preventing people from harassing, this is a useful social enforcement tool. However, the "virtual tables" are more like "virtual isolation rooms". Being stuck inside one with an unpleasant person means they can say and do what they will with no fear of witnesses.

How does Code of Conduct enforcement happen? How do vulnerable demographics feel about that?

Attempting to recreate a physical experience in a virtual world is doomed to failure, unless you have sophisticated science-fictional-level virtual reality and physics simulation. However, as a culture, we have adapted to video chats, video webinars, text chats and more. We figured out social conventions and norms, and how to enforce them.

When designing a virtual conference, concentrating on "physical fidelity" is a fool's errand. Instead, figure out what kind of pleasant virtual experiences you want to supply, how to enforce those norms you want to enforce, and how to communicate expected standards to the attendees.

Just like physical conferences can be different, virtual conferences can be different. Pre-recorded or live talks, video distribution platforms, chatting platforms, and more, need to be chosen carefully. Optimize for a good conference, not a conference that feels like an in-person conference.

19 Feb 2021 3:00am GMT

11 Feb 2021


Hynek Schlawack: Hardening Your Web Server’s SSL Ciphers

There are many wordy articles on configuring your web server's TLS ciphers. This is not one of them. Instead, I will share a configuration that scores a straight "A" on Qualys's SSL Server Test in 2020.

11 Feb 2021 12:00am GMT

04 Jan 2021


Hynek Schlawack: Testing & Packaging

How to ensure that your tests run code that you think they are running, and how to measure your coverage over multiple tox runs (in parallel!).

04 Jan 2021 12:00am GMT

12 Dec 2020


Moshe Zadka: DRY is a Trade-Off

DRY, or Don't Repeat Yourself, is frequently touted as a principle of software development. "Copy-pasta" is the derisive term applied to a violation of it, tying together the concept of copying code and pasta as a description of bad software development practices (see also spaghetti code).

It is so uniformly reviled that some people call DRY a "principle" that you should never violate. Indeed, some linters even detect copy-paste so that it can never sneak into the code. But copy-paste is not a comic-book villain, and DRY does not come bedecked in primary colors to defeat it.

It is worthwhile to know why DRY started out as a principle. In particular, for some modern software development practices, violating DRY is the right thing to do.

The main problem with repeating a code chunk is that if a bug is found, there is more than one place where it needs to be fixed. On the surface of it, this seems like a reasonable criticism. All code has bugs, those bugs will be fixed, why not minimize the cost of fixing them?

As with all engineering decisions, following DRY is a trade-off. DRY leads to the following issues:

Loss of locality

The alternative to copy-pasting the code is usually to put it in a function (or procedure, or a subroutine, depending on the language), and call it. This means that when reading through the original caller, it is less clear what the code does.

When you are debugging, this means you need to "step into" the function. While stepping into it, it is non-trivial to check the original variables. If you are doing "print debugging", this means finding the original source for the function and adding relevant print statements there.

Especially when DRY is pointed out and the reaction is instinctive, the resulting function might have some surprising semantics. For example, mutating the contents of local variables is sensible in inline code. When you move this code to a function as part of a straightforward DRY refactoring, it means that the new function now mutates its parameters.

Overgeneralized code

Even if the code initially was the same in both places, there is no a-priori guarantee that it will stay this way. For example, one of those places might be called frequently, and so would want to avoid logging too many details. The other place is seldom called, and those details are essential to troubleshooting frequent problems.

The function that was refactored now has to support an extra parameter: whether to log those details or not. (This parameter might be a boolean, a logging level, or even a logging "object" that has correct levels set up.)

Since usually there is no institutional memory to undo the DRY refactoring, the function might add more and more cases, eventually almost being two functions in one. If the "copy-pasta" was more extensive, it might lead to extensive over-generalization: each place needs a slightly different variation of the functionality.
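
A sketch of that failure mode (every name here is invented for illustration): a helper extracted for DRY slowly accumulates a flag per call site, until it is really several functions sharing one body.

def save_record(record, *, log_details=True, validate=False, legacy_format=False):
    if validate:          # only the batch importer needs this
        record.check()
    if legacy_format:     # only the nightly export needs this
        record = record.to_v1()
    if log_details:       # the hot path turns this off to avoid log spam
        print("saving", record)
    record.store()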

Coordination issues

Each modification of the "common" function now requires testing all of its callers. In some situations, this can be subtly non-trivial.

For example, if the repetition was across different repositories, now updates means updating library versions. The person making the change might not even be aware of all the callers. The callers only find out when a new library version is used in their code.

Ownership issues

When each of those code segments were repeated, ownership and responsibility were trivial. Whoever owned the surrounding code also owned the repeated segment.

Now that the code has been moved elsewhere, to a "shared" location, ownership can often be muddled. When a bug is found, who is supposed to fix it? What happens if that "bug" is already relied on by another use?

Especially in the case of reactive DRY refactoring, there is little effort given to specifying the expected semantics of the common code. There might be some tests, but the behavior that is not captured by tests might still vary.

Summary

Having a common library which different code bases can rely on is good. However, adding functions to such a library or libraries should be done mindfully. A reviewer comment like "this code duplicates the functionality already implemented here" or, even worse, something like the pylint code duplication detector, does not have that context or mindfulness.

It is better to acknowledge the duplication, perhaps track it via a ticket, and let the actual "DRY" application take place later. This allows gathering more examples, thinking carefully about API design, and make sure that ownership and backwards compatibility issues have been thought of.

Deduplicating code by putting common lines into functions, without careful thought about abstractions, is never a good idea. Understanding how to abstract correctly is essentially API design. API design is subtle, and difficult to do well. There are no easy short-cuts, and developing expertise in it takes a long time.

Because API design is such a complex skill, it is not easy to give general guidelines except one: wait. Rushing into an API design does not make a good API, even if the person rushing is an expert.

12 Dec 2020 4:00am GMT

30 Nov 2020


Glyph Lefkowitz: Faster

I've often heard Henry Ford quoted as saying:

"If I had asked people what they wanted, they would have said faster horses."

Despite the fact that he probably didn't actually say that, it does neatly encapsulate a certain approach to product development. And it's one that the modern technology industry loves to lionize.

There's a genre of mythologized product development whereby wholly unique and novel products spring, fully-formed, Athena-like, from the foreheads of Zeusian industrialists like Jobs, or Musk, or Bezos. This act of creation requires no input from customers. Indeed, the myths constructed about the iconic products associated with these industrialists often gloss over or outright ignore the work of their hundreds of thousands of employees, not to mention long years of iteration along with legions of early-adopter customers.

Ford's other major area of contribution to public discourse was, of course, being a big ol' Nazi, just writing so much Nazi stuff that he was one of Hitler's heroes.1

This could be a coincidence, of course; lots of prominent thinkers in the past were absolutely hideous racists, anti-semites, slave owners and worse; these terrible ideas were often products of the time, and the people who held them sometimes nevertheless had other ideas worth examining.

But I think that this sentiment reflects a wider underlying infatuation with authoritarian ideology. At its core, the idea is that the uniquely gifted engineer is just better than their users, fundamentally smarter, more able to discern their true needs, more aware of the capabilities of the technology that we alone are familiar with. Why ask the little people? They can't possibly know what they really need.

While we may blithely quote this sort of thing, when you look at the nuts and bolts of the technology industry, the actual practice of the industry has matured past it. Focus groups and user research are now cornerstones of interaction design. We know that it's pure hubris to think that we can predict the way that users will react; you can't just wing it.

But, I hadn't heard a similarly pithy encapsulation of an empathetic approach that keeps the user in the loop and doesn't condescend to them, until today. The quote came up, and my good friend Tristan Seligmann responded with this:

If you ask your users questions that they don't have the skills to answer - like "how can we improve your horse?" - they will give you bad answers; but the solution to this is to ask better questions, not to ask no questions.

Tristan Seligmann

That, fundamentally, is the work-product of a really good engineer. Not faster horses or faster cars:

Better questions.


  1. Pro tip: don't base your design ethos on Nazi ideas.

30 Nov 2020 7:03am GMT

20 Sep 2020


Moshe Zadka: Fifty Shades of Ver

Computers work on binary code. If statements take one path: true, or false. For computers, bright lines and clear borders make sense.

Humans are more complicated. What's an adult? When are you happy? How mature are you? Humans have fuzzy feelings with no clear delineation.

I was more responsible as a ten year old than as a three year old. At 13, I reached the age when I was responsible for following Jewish law myself. At 18, the legal system trusted me to drink alcohol and to drive, and trusted me to keep the two activities distinct. In the US, you cannot become a senator before you are 30.

At what age are you responsible "enough"?

Software is written by humans, not computers. Humans with feelings, hopes, and dreams. We cry, we strive, we feel accomplished at times, and disappointed at others.

If you were designing a version system for computers, SemVer, or "Semantic Versioning", would make perfect sense. Each part number in a three-part version number is given a specific, distinct, definition:

  • MAJOR: incremented for backwards-incompatible API changes
  • MINOR: incremented for added functionality that is backwards-compatible
  • PATCH: incremented for backwards-compatible bug fixes

But software is not made by computers. It is made by humans.

Start small

A journey of a thousand miles begins with a single step. The first version of the Linux kernel printed As and Bs to the screen. The first version of Python didn't have modules. SemVer, to its credit, acknowledges that.

For versions like 0.x.y, SemVer defines the semantics:

Anything MAY change at any time. The public API SHOULD NOT be considered stable.

When something is small and fragile, it should be able to change. Every UNIX programmer knows the story of why Makefile treats tabs and spaces differently. In retrospect, causing a dozen people a small amount of pain would probably have been better than staying with the problem.

Once the software is mature enough, the SemVer reasoning goes, just release 1.0.0 and commit to API stability.

Grow slowly

Given the number of projects that have stayed on ZeroVer for a long time (or forever!), the assumption that committing to API stability once the project matures is easy does not seem to pan out.

Remember: software is written by humans. Fuzzy humans, in complicated social structures, who work together as best they can, using brains evolved to figure out politics in a fifty-person tribe in order to stay alive.

Once a social structure is in place and working, changing it is hard. In the ZeroVer days, there was no reason to figure out which changes broke API compatibility. There was no reason to clearly delineate which parts are "public" API and which are not. After all, there was no need.

Switching out of ZeroVer requires building all of this. Not switching out of ZeroVer does not require complicated social engineering. It is not surprising that it is hard. It's almost as if humans work better with slow changes, and not sudden revolutions.

Small commitments

Lately, I have been frustrated with some aspects of my life. COVID-19 did not cause them, but helped bring them into sharp focus. As the least embarrassing example to admit in a public forum, I realized that while my book shelves are so full of books that shoving another one in requires my 80s-kids Tetris skills, I have not read a single fiction book in the last three years. I used to be a voracious reader!

How do you change habits? I used to read, easily, 200 pages of fiction a day in my 20s. I have not gotten worse at reading. I could commit to reading 200 pages a day, and track my progress. If you have ever done that, you know what the outcome is. Every day, you look at the task, and you decide it is too big. You never begin.

Instead, I decided I would read 20 pages a day, and feel good about it. Feel good? I even decided to reward myself for every week where I hit this goal five out of seven days.

The result? For the last few weeks, I have consistently been reading 20 pages a day, missing only one or two days.

When you are not good at something, as a person or as a group, and you want to get better, small commitments frequently achieved are the way to go.

SemVer does not work that way. It is all or nothing. SemWay or the Highway. Perhaps it is better to have a versioning system for humans, not a fictional alien race, if we assume software will keep being written by humans.

Deprecation policy

It is an easy change to say that no single change can "just" break an API. One change to deprecate, and one change to break. This is straightforward to verify. It is reasonable to have a policy for exceptions, but document the exceptions carefully.
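
In Python, for example, the "deprecate" half of that pair can be as small as this sketch (the function names and the date are made up):

import warnings

def old_name(*args, **kwargs):
    # A later release, after the promised waiting period, actually removes this.
    warnings.warn(
        "old_name() is deprecated; use new_name() instead. "
        "It will not be removed before 2021-06-01.",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_name(*args, **kwargs)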

Note that this change does not help potential users all that much by itself. After all, two PRs in close succession can land, and there is no reasonable upgrade path.

Should we feel good about making a small change that does not help anyone? Absolutely. Because it is small, and it is on the right path.

Once this change becomes ingrained in the developer group, we can start mandating a minimum time between deprecation and breakage. At first, we can have a 0-day policy: you can break, as long as a release containing the deprecation has already happened. This causes more releases to happen, making the team better at releasing. It helps users only minimally. However, at least with careful version pinning, there is an upgrade path.

Now, we can start making the number 0 a bit bigger. First, a week. Then, a month. Eventually, a quarter or a year. If the project is big, the number might be different for different parts.

But at that point, the project has a clear deprecation policy. A deprecation policy that can slowly grow the more mature the project is. Not a binary, true/false, mature/new. Shades of maturity. Levels of reliability.

The calendar

A minute has 60 seconds. An hour has 60 minutes. A day has 24 hours. Our time measurement system is still based on the Babylonian base-60 system, though the actual digits used by the Babylonians are studied only by specialists.

Humans organize their lives by their calendar. Kids learn that their birthday happens when they are a year older. Every seven days, we have a weekend. Every month, utility bills need to be paid.

Humans make plans that depend on time. They wait for their tax refund on April 15th to make purchases.

A time-based deprecation policy takes advantage of those skills. If the time between deprecation and breakage is one week, then the policy is clear: better make sure to upgrade weekly. If it is one year, do it when returning from the end-of-year holidays. If the policy demands more upgrade effort than the maintenance budget allows, that can be known in advance; it might mean the dependency is not yet mature enough to use.
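
One way to make such a time-based policy mechanically checkable is to record the date each deprecation shipped and let a test fail once the grace period has elapsed. This is only a sketch, with a made-up registry and grace period rather than any particular project's tooling:

import datetime

GRACE_PERIOD = datetime.timedelta(days=90)  # "one quarter"

# name -> date the deprecation first shipped in a release
DEPRECATIONS = {
    "mylib.frobnicate": datetime.date(2020, 6, 1),
}

def test_expired_deprecations_are_removed():
    today = datetime.date.today()
    expired = [
        name
        for name, deprecated_on in DEPRECATIONS.items()
        if today - deprecated_on > GRACE_PERIOD
    ]
    assert not expired, "grace period over, time to remove: {}".format(expired)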

Versioning for adults

A versioning scheme needs to remember two things:

  • The people writing the software are humans
  • The people using the software are humans

If there is one thing that humans are good at, it is communicating with other humans. Humans can communicate feelings, fuzzy boundaries, and plans for the future.

Calendar-based versioning and a clear deprecation policy give them the ability to communicate those things. Not in a way that is suitable for computers. Not in a way that will help your dependency resolver decide which version is "compatible". But in a way that lets you communicate with people about your needs in a mature way, and figure out whether you can work together or part on friendly terms.

Summary

If the main consumer of version numbers were the dependency resolver, and the producer of version numbers were a top-down military structure used to following orders, SemVer would work well.

For real software projects, which are used by humans, depend on documents written by humans for humans, and are often "managed" in extremely loose ways (even commercial projects, let alone volunteer-led ones), a versioning system that helps adults work with adults is best.

Credits

20 Sep 2020 5:00am GMT

18 Sep 2020

feedPlanet Twisted

Itamar Turner-Trauring: From YAGNI to YDNIY

How do you ship a product on schedule? One useful approach is applying the You Ain't Gonna Need It principle, or YAGNI for short: leave out all the things that seem nice-to-have, but that you have no proof you actually need.

But beyond the things you don't need, there are still plenty of features you pretty clearly do need… but are not blockers on releasing your product. So beyond YAGNI, there's also YDNIY: You Don't Need It Yet.

Let's see an example of this principle in practice, visualize the principle as a flowchart, and then compare it to another popular acronymed concept, the Minimum Viable Product.

A real world example: shipping a new memory profiler

In March 2020 I shipped the initial release of a new memory profiler for Python, Fil.

Here's how it changed over time in terms of features, from May to August 2020:

All of the features I added in later releases were clearly necessary from the start; YAGNI did not apply. Lots of people use macOS, the target audience of data scientists and scientists often use Conda and Jupyter, all those memory allocation APIs are used in the real world, and so on.

But even a tool that only runs complete programs on Linux, and only tracks the most popular memory allocation APIs, is still useful to some people.

If I had waited until all those features were implemented to ship an initial release, all the people who used the profiler during the first four months of its existence would have had to keep using worse tools. And with every release, the number of people for whom the tool is useful has grown.

Unlike YAGNI, YDNIY doesn't mean you never implement a feature; you just delay it so that you can release something now.

The YAGNI and YDNIY algorithm

Features that are not clearly necessary can be dropped based on the YAGNI principle. And if the product is still useful without the feature, you can delay that feature based on the YDNIY principle.

In flowchart form:

Is the feature clearly needed? No → YAGNI: leave it out. Yes → Is the product usable without it? No → implement it now. Yes → YDNIY: implement it later.

YAGNI and YDNIY vs. MVP

The Minimum Viable Product, or MVP, is another acronym that might seem like it means the same thing as YDNIY. But as defined by its originator, Eric Ries, an MVP has a different goal, and actually adds on more work.

Specifically, Ries defines an MVP as "that version of a new product which allows a team to collect the maximum amount of validated learning about customers with the least effort." He goes on to explain that a "MVP is quite annoying, because it imposes extra overhead. We have to manage to learn something from our first product iteration. In a lot of cases, this requires a lot of energy invested in talking to customers or metrics and analytics."

To put it another way, the goal of the MVP is to learn about users or customers, whereas the goal of YAGNI and YDNIY is to get something useful into users' hands as quickly as possible.




18 Sep 2020 4:00am GMT

24 Aug 2020

feedPlanet Twisted

Glyph Lefkowitz: Nice Animations with Twisted and PyGame

SNEKS

One of my favorite features within Twisted - but also one of the least known - is LoopingCall.withCount, which can be used in applications where you have some real-time thing happening, which needs to keep happening at a smooth rate regardless of any concurrent activity or pauses in the main loop. Originally designed for playing audio samples from a softphone without introducing a desync delay over time, it can also be used to play animations while keeping track of their appropriate frame.

LoopingCall is all around a fun tool to build little game features with. I've built a quick little demo to showcase some discoveries I've made over a few years of small hobby projects (none of which are ready for an open-source release) over here: DrawSnek.

This little demo responds to 3 key-presses:

  1. q quits. Always a useful thing for full-screen apps which don't always play nice with C-c :).
  2. s spawns an additional snek. Have fun, make many sneks.
  3. h introduces a random "hiccup" of up to 1 full second so you can see what happens visually when the loop is overburdened or stuck.

Unfortunately a fully-functioning demo is a bit lengthy to go over line by line in a blog post, so I'll just focus on a couple of important features for stutter- and tearing-resistant animation & drawing with PyGame & Twisted.

For starters, you'll want to use a very recent prerelease of PyGame 2, which recently added support for vertical sync even without OpenGL mode; then, pass the vsync=1 argument to set_mode:

screen = pygame.display.set_mode(
    (640 * 2, 480 * 2),
    pygame.locals.SCALED | pygame.locals.FULLSCREEN,
    vsync=1
)

To allow for as much wall-clock time as possible to handle non-drawing work, such as AI and input handling, I also use this trick:

def drawScene():
    screen.fill((0, 0, 0))
    for drawable in self.drawables:
        drawable.draw(screen)
    return deferToThread(pygame.display.flip)

LoopingCall(drawScene).start(1 / 62.0)

By deferring pygame.display.flip to a thread1, the main loop can continue processing AI timers, animation, network input, and user input while blocking and waiting for the vertical blank. Since the time-to-vblank can easily be up to 1/120th of a second, this is a significant amount of time! We know that the draw won't overlap with flip, because LoopingCall respects Deferreds returned from its callable and won't re-invoke you until the Deferred fires.

Drawing doesn't use withCount, because it just needs to repeat about once every refresh interval (on most displays, about 1/60th of a second); the vblank timing is what makes sure it lines up.

However, animation looks like this:

def animate(self, frameCount):
    self.index += frameCount
    self.index %= len(self.images)

We move the index forward by however many frames have elapsed, then make sure it wraps around by taking it modulo the number of frames in the animation.

Similarly, the core2 of movement looks like this:

def move(self, frameCount):
    self.sprite.x += frameCount * self.dx
    self.sprite.y += frameCount * self.dy

Rather than moving based on the number of times we've been called, which would result in slowed-down movement when the framerate isn't keeping up, we jump forward by however many frames we should have advanced by this point in time.
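
For completeness, here is roughly how such frame-count-aware callables get hooked up. This is only a sketch: Snek is a made-up stand-in for whatever object carries the animate and move methods shown above, and the interval is a nominal 60 updates per second.

from twisted.internet import reactor
from twisted.internet.task import LoopingCall

class Snek:
    """Minimal stand-in for the demo's sprite objects."""
    def __init__(self):
        self.index, self.images = 0, list(range(8))
        self.x = self.y = 0
        self.dx = self.dy = 2

    def animate(self, frameCount):
        self.index = (self.index + frameCount) % len(self.images)

    def move(self, frameCount):
        self.x += frameCount * self.dx
        self.y += frameCount * self.dy

snek = Snek()
FPS = 60.0
# withCount passes how many intervals have elapsed since the last call, so a
# stalled loop produces one call with a bigger count rather than a burst of
# catch-up calls.
LoopingCall.withCount(snek.animate).start(1 / FPS)
LoopingCall.withCount(snek.move).start(1 / FPS)
reactor.run()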

One of these days, maybe I'll make an actual game, but in the meanwhile I hope you all enjoy playing with these fun little basic techniques for using Twisted in your game engine.


  1. I'm mostly sure that this is safe, but, it's definitely the dodgiest thing here. If you're going to do this, make sure that you never do any drawing outside of the draw() method.

  2. Hand-waving over a ton of tedious logic to change direction before we go out of bounds...

24 Aug 2020 2:50am GMT

23 Aug 2020

feedPlanet Twisted

Glyph Lefkowitz: Never Run ‘python’ In Your Downloads Folder

One of the wonderful things about Python is the ease with which you can start writing a script - just drop some code into a .py file, and run python my_file.py. Similarly it's easy to get started with modularity: split my_file.py into my_app.py and my_lib.py, and you can import my_lib from my_app.py and start organizing your code into modules.

However, the details of the machinery that makes this work have some surprising, and sometimes very security-critical consequences: the more convenient it is for you to execute code from different locations, the more opportunities an attacker has to execute it as well...

Python needs a safe space to load code from

Here are three critical assumptions embedded in Python's security model:

  1. Every entry on sys.path is assumed to be a secure location from which it is safe to execute arbitrary code.
  2. The directory where the "main script" is located is always on sys.path.
  3. When invoking python directly, the current directory is treated as the "main script" location, even when passing the -c or -m options.
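
A quick way to see assumptions 2 and 3 for yourself is to print the search path; the exact entries vary by installation, and this reflects the defaults of the Python versions current as of this writing:

# show_path.py -- print the directories Python will search for imports, in
# order.  Run "python show_path.py" and the first entry is this file's
# directory; run the same interpreter with -c or -m from some other
# directory and the first entry refers to the current directory instead.
import sys

for entry in sys.path:
    print(repr(entry))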

If you're running a Python application that's been installed properly on your computer, the only location outside of your Python install or virtualenv that will be automatically added to your sys.path (by default) is the location where the main executable, or script, is installed.

For example, if you have pip installed in /usr/bin, and you run /usr/bin/pip, then only /usr/bin will be added to sys.path by this feature. Anything that can write files to /usr/bin can already make you, or your system, run stuff, so it's a pretty safe place. (Consider what would happen if your ls executable got replaced with something nasty.)

However, one emerging convention is to prefer calling /path/to/python -m pip in order to avoid the complexities of setting up $PATH properly, and to avoid dealing with divergent documentation of how scripts are installed on Windows (usually as .exe files these days, rather than .py files).

This is fine - as long as you trust that you're the only one putting files into the places you can import from - including your working directory.

Your "Downloads" folder isn't safe

As the category of attacks with the name "DLL Planting" indicates, there are many ways that browsers (and sometimes other software) can be tricked into putting files with arbitrary filenames into the Downloads folder, without user interaction.

Browsers are starting to take this class of vulnerability more seriously, and adding various mitigations to avoid allowing sites to surreptitiously drop files in your downloads folder when you visit them.1

Even with mitigations though, it will be hard to stamp this out entirely: for example, the Content-Disposition HTTP header's filename* parameter exists entirely to allow the site to choose the filename that it downloads to.

Composing the attack

You've made a habit of python -m pip to install stuff. You download a Python package from a totally trustworthy website that, for whatever reason, has a Python wheel by direct download instead of on PyPI. Maybe it's internal, maybe it's a pre-release; whatever. So you download totally-legit-package.whl, and then:

~$ cd Downloads
~/Downloads$ python -m pip install ./totally-legit-package.whl

This seems like a reasonable thing to do, but unbeknownst to you, two weeks ago, a completely different site you visited had some XSS JavaScript on it that downloaded a pip.py with some malware in it into your downloads folder.

Boom.

Demonstrating it

Here's a quick demonstration of the attack:

~$ mkdir attacker_dir
~$ cd attacker_dir
~/attacker_dir$ echo 'print("lol ur pwnt")' > pip.py
~/attacker_dir$ python -m pip install requests
lol ur pwnt

PYTHONPATH surprises

Just a few paragraphs ago, I said:

If you're running a Python application that's been installed properly on your computer, the only location outside of your Python install or virtualenv that will be automatically added to your sys.path (by default) is the location where the main executable, or script, is installed.

So what is that parenthetical "by default" doing there? What other directories might be added?

Any entries on your $PYTHONPATH environment variable. You wouldn't put your current directory on $PYTHONPATH, would you?

Unfortunately, there's one common way that you might have done so by accident.

Let's simulate a "vulnerable" Python application:

# tool.py
try:
    import optional_extra
except ImportError:
    print("extra not found, that's fine")

Make 2 directories: install_dir and attacker_dir. Drop this in install_dir. Then, cd attacker_dir and put our sophisticated malware there, under the name used by tool.py:

# optional_extra.py
print("lol ur pwnt")

Finally, let's run it:

~/attacker_dir$ python ../install_dir/tool.py
extra not found, that's fine

So far, so good.

But, here's the common mistake. Most places that still recommend PYTHONPATH recommend adding things to it like so:

export PYTHONPATH="/new/useful/stuff:$PYTHONPATH";

Intuitively, this makes sense; if you're adding project X to your $PYTHONPATH, maybe project Y had already added something, maybe not; you never want to blow it away and replace what other parts of your shell startup might have done with it, especially if you're writing documentation that lots of different people will use.

But this idiom has a critical flaw: the first time it's invoked, if $PYTHONPATH was previously either empty or unset, the resulting value includes an empty entry, which resolves to the current directory. Let's try it:

~/attacker_dir$ export PYTHONPATH="/a/perfectly/safe/place:$PYTHONPATH";
~/attacker_dir$ python ../install_dir/tool.py
lol ur pwnt

Oh no! Well, just to be safe, let's empty out $PYTHONPATH and try it again:

~/attacker_dir$ export PYTHONPATH="";
~/attacker_dir$ python ../install_dir/tool.py
lol ur pwnt

Still not safe!

What's happening here is that if PYTHONPATH is empty, that is not the same thing as it being unset. From within Python, this is the difference between os.environ.get("PYTHONPATH") == "" and os.environ.get("PYTHONPATH") == None.
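
If you want to see that difference from Python itself, this purely illustrative check distinguishes the cases; run it with PYTHONPATH unset, set to an empty string, and set to a real path to compare:

import os

value = os.environ.get("PYTHONPATH")
if value is None:
    print("PYTHONPATH is not set at all")
elif value == "":
    print("PYTHONPATH is set, but to the empty string")
else:
    print("PYTHONPATH is set to {!r}".format(value))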

If you want to be sure you've cleared $PYTHONPATH from a shell (or somewhere in a shell startup), you need to use the unset command:

~/attacker_dir$ unset PYTHONPATH
~/attacker_dir$ python ../install_dir/tool.py
extra not found, that's fine

Setting PYTHONPATH used to be the most common way to set up a Python development environment; hopefully it's mostly fallen out of favor, with virtualenvs serving this need better. If you've got an old shell configuration that still sets a $PYTHONPATH that you don't need any more, this is a good opportunity to go ahead and delete it.

However, if you do need an idiom for "appending to" PYTHONPATH in a shell startup, use this technique:

export PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}new_entry_1"
export PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}new_entry_2"

The ${PYTHONPATH:+...} expansion only inserts the existing value (followed by a colon) when PYTHONPATH is already set and non-empty, so no blank entry can sneak in. In both bash and zsh, this results in

$ echo "${PYTHONPATH}"
new_entry_1:new_entry_2

with no extra colons or blank entries on your $PYTHONPATH variable now.

Finally: if you're still using $PYTHONPATH, be sure to always use absolute paths!

Related risks

There are a bunch of variant unsafe behaviors related to inspecting files in your Downloads folder by doing anything interactive with Python. Other risky activities:

Get those scripts and notebooks out of your downloads folder before you run 'em!

But cd Downloads and then doing anything interactive remains a problem too:

Remember that ~/Downloads/ isn't special; it's just one place where unexpected files with attacker-chosen filenames might sneak in. Be on the lookout for other locations where this is true. For example, if you're administering a server where the public can upload files, make extra sure that neither your application nor any administrator who might run python ever does cd public_uploads.

Maybe consider changing the code that handles uploads to mangle file names to put a .uploaded at the end, avoiding the risk of a .py file getting uploaded and executed accidentally.
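
A sketch of that idea, using a hypothetical helper rather than any particular framework's API:

import os
import secrets

def safe_upload_name(original_name):
    """Return a storage name that keeps a hint of the original filename but
    can never be imported as a module or run as a script by accident."""
    base = os.path.basename(original_name)  # drop any sneaky path components
    base = base.replace(".", "_")           # pip.py -> pip_py
    return "{}.{}.uploaded".format(base, secrets.token_hex(4))

# e.g. safe_upload_name("pip.py") -> something like "pip_py.3f9c1a2b.uploaded"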

Mitigations

If you have tools written in Python that you want to use while in your downloads folder, make a habit of preferring typing the path to the script (/path/to/venv/bin/pip) rather than the module (/path/to/venv/bin/python -m pip).

In general, just avoid ever having ~/Downloads as your current working directory, and move any software you want to use to a more appropriate location before launching it.

It's important to understand where Python gets the code that it's going to be executing. Giving someone the ability to execute even one line of arbitrary Python is equivalent to giving them full control over your computer!

Why I wrote this article

When writing a "tips and tricks" article like this about security, it's very easy to imply that I, the author, am very clever for knowing this weird bunch of trivia, and the only way for you, the reader, to stay safe, is to memorize a huge pile of equally esoteric stuff and constantly be thinking about it. Indeed, a previous draft of this post inadvertently did just that. But that's a really terrible idea and not one that I want to have any part in propagating.

So if I'm not trying to say that, then why post about it? I'll explain.

Over many years of using Python, I've infrequently, but regularly, seen users confused about the locations that Python loads code from. One variety of this confusion is when people put their first program that uses Twisted into a file called twisted.py, which shadows the import of the library and breaks everything. Another manifestation of this confusion is a slow trickle of confused security reports where a researcher drops a module into a location where Python is documented to load code from - like the current directory in the scenarios described above - and then loads it, thinking that this reflects an exploit because it's executing arbitrary code.

Any confusion like this - even if the system in question is "behaving as intended", and can't readily be changed - is a vulnerability that an attacker can exploit.

System administrators and developers are high-value targets in the world of cybercrime. If you hack a user, you get that user's data; but if you hack an admin or a dev, and you do it right, you could get access to thousands of users whose systems are under the administrator's control or even millions of users who use the developers' software.

Therefore, while "just be more careful all the time" is not a sustainable recipe for safety, to some extent, those of us acting on our users' behalf do have a greater obligation to be more careful. At least, we should be informed about the behavior of our tools. Developer tools, like Python, are inevitably power tools which may require more care and precision than the average application.

Nothing I've described above is a "bug" or an "exploit", exactly; I don't think that the developers of Python or Jupyter have done anything wrong; the system works the way it's designed and the way it's designed makes sense. I personally do not have any great ideas for how things could be changed without removing a ton of power from Python.

One of my favorite safety inventions is the SawStop. Nothing was wrong with the way table saws worked before its invention; they were extremely dangerous tools that performed an important industrial function. A lot of very useful and important things were made with table saws. Yet, it was also true that table saws were responsible for a disproportionate share of wood-shop accidents, and, in particular, lost fingers. Despite plenty of care taken by experienced and safety-conscious carpenters, the SawStop still saves many fingers every year.

So by highlighting this potential danger I also hope to provoke some thinking among some enterprising security engineers out there. What might be the SawStop of arbitrary code execution for interactive interpreters? What invention might be able to prevent some of the scenarios I describe above without significantly diminishing the power of tools like Python?

Stay safe out there, friends.


Acknowledgments

Thanks very much to Paul Ganssle, Nathaniel J. Smith, Itamar Turner-Trauring and Nelson Elhage for substantial feedback on earlier drafts of this post.

Any errors remain my own.


  1. Restricting which sites can drive-by drop files into your downloads folder is a great security feature, except the main consequence of adding it is that everybody seems to be annoyed by it, not understand it, and want to turn it off.

23 Aug 2020 6:47am GMT

21 Aug 2020

feedPlanet Twisted

Moshe Zadka: Universal Binary

I have written before about my Inbox Zero methodology. This is still what I practice, but there is a lot more that helps me.

The concept behind "Universal Binary" is that the only numbers that make sense asymptotically are zero, one, and infinity. Therefore, in order to prevent things from going off into infinity, there need to be processes that keep everything at either zero or one.

One:

Zero:

One TODO List

I have a single list that tracks everything I need to do. Be it a reminder to put a garbage bin in the car or work on upgrading a dependency in production, everything goes in the same place.

Sometimes this will not be where all the information is. Many of the things I need to do for work, for example, require a link to our internal issue tracking system. For open source tickets, I have a GitHub link.

But the important thing is that I don't go to GitHub or our internal ticketing system to figure out what I need to do. I have a single TODO list.

Since I have one TODO list, it gets a lot of things. If my wife asks me to run an errand, it becomes a task. In my one-on-one meeting with my manager, if I make a commitment, it becomes a task. If a conference e-mails me to suggest I participate in the CFP, it becomes a task. The tasks accumulate fast.

Currently, I feel like I'm on top of things and not behind on anything. Even in this calm, smooth-sailing situation, I have around 200 tasks in my list. If, every time I opened the list, I had to look through 200 items to figure out what to do next, I would get frustrated.

Instead, I have appropriate filters on it. "Today and not related to work" when I am at home relaxing. "Overdue and related to work" when I get to the office in the morning, to see if I dropped anything on the floor. "Things that are either not related to work or need to be done at home and are due soon" for when I'm at home catching up in the evening.

As I mentioned, I use TODOist. I think it's perfectly reasonable. However, there are a lot of equally reasonable alternatives. What's not reasonable is anything that does not let you tag and filter.

One Calendar

I have gotten all my calendars feeding into a single pane of glass, which color-codes the source. My calendars include:

  • Work calendar (the feeding removes sensitive information)
  • A Trello board with the Calendar power-up for co-ordinating events with my family.
  • TODOist's due date/time calendar.
  • Personal calendar invites.
  • My "Daily schedule", which is where I try to document my plans for each hour I am awake, by day of the week.

I have a daily morning task to review the calendar for the day.

One Time Tracker

I use Toggl. The coolest feature in Toggl is that its Firefox button integrates with TODOist and GitHub, so I have a button that says "start working on this". Some of the things I do are not tracked as tasks. As a horrible work-around, I have a Microsoft To-Do pinned tab. This does not violate the "One TODO list" motto, because these are not tasks I ever plan to accomplish. These are simply things I can activate as a "thing I am doing" with one click: for example, "dinner" or "figuring out next task".

Since as long as I am alive I am doing "something", my time tracker is always supposed to be ticking. I also have a daily task to go over the tracked items and fix spelling and add appropriate metadata so that I have less pressure to do so when I start tasks.

Zero Notes

A note is just some information that has no place. Everything should have a proper place. If I want to write down some information and am not sure what its proper place is, it goes in the TODO list. The action item is "figure out where to put this."

I have a links file, where I put links. I have a recipes GitHub repo, where I put recipes. I have a "Notes" folder in Dropbox, but the only notes that go there are things that I need to be able to see immediately on my phone. This means that every note should have an expiry date, after which it can be archived.

During the beforetimes, I would have flight information for trips there, and the like. In these times, sadly, this folder is mostly empty until the world is right again.

Zero Unpinned Tabs

Firefox has an amazing feature called "pinned tabs". Pinned tabs are always left-most, and show only their icon. My pinned tabs include my e-mail, my calendar, WhatsApp, TODOist, and the Microsoft To-Do hack.

Other tabs get closed. Since many of my tasks generate a lot of open tabs while cross-referencing documentation, I have an easy rule: whenever I can't see all the tab titles, I close everything unpinned. Anything that I hesitate to close gets converted to a task with the TODOist browser extension and then closed.

Zero E-mails

I have daily tasks to empty out my personal and work inbox. Anything that can't be emptied in the time I allocated to doing that gets converted to a task.

Zero Floating Tasks

Tasks get created in "Inbox" with no "due date". There is a daily task to bring that list down to zero items. An item becomes non-floating by being assigned to a project and having one of three things be true:

  • It's marked @when:time_permitting, which is effectively equivalent to "Archived".
  • It's marked @status:subtask, which means it is part of a bigger task where it is managed.
  • It's marked with a specific day I plan to do it ("due date", but it does not actually mean "due date").

(Work in Progress) One Report

I am working on having one report that pulls via API integration from Toggl, TODOist, and FastMail Calendar (CalDav) details for the past week and summarizes them. For example, how many things did I do without a task? Did they make sense? What was my calendar saying I did at the time?

I have a rough prototype, so now it is mostly debugging the logic to make sure I am getting everything I expect and cleaning up the Look And Feel so I can see as much as possible on one screen. I am using Jupyter for that.
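
For the curious, the skeleton of such a report is not much code. The sketch below only shows the shape of it: the endpoint URLs and authentication styles are written from memory of the services' public REST APIs around this time, so treat them as assumptions to verify against the current documentation rather than as gospel.

import os

import requests

TODOIST_TOKEN = os.environ["TODOIST_TOKEN"]
TOGGL_TOKEN = os.environ["TOGGL_TOKEN"]

# Todoist Sync API: recently completed tasks (endpoint assumed).
completed = requests.get(
    "https://api.todoist.com/sync/v8/completed/get_all",
    headers={"Authorization": "Bearer " + TODOIST_TOKEN},
).json()

# Toggl API v8: time entries, defaulting to roughly the last nine days
# (endpoint and auth style assumed).
entries = requests.get(
    "https://api.track.toggl.com/api/v8/time_entries",
    auth=(TOGGL_TOKEN, "api_token"),
).json()

untracked = [entry for entry in entries if not entry.get("description")]
print(len(completed.get("items", [])), "tasks completed")
print(len(entries), "time entries,", len(untracked), "with no description")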

Summary

For some people, "small amounts of chaos" is a reasonable goal. But for me, it's zero or infinity. Funneling everything into one TODO box allows emptying all the others. Funneling everything into one calendar means checking only one place for "what am I supposed to be doing".

21 Aug 2020 4:45am GMT