18 Oct 2019

Planet Twisted

Moshe Zadka: An introduction to zope.interface

This has previously been published on opensource.com.

The Zen of Python is loose enough and contradicts itself enough that you can prove anything from it. Let's meditate upon one of its most famous principles: "Explicit is better than implicit."

One thing that traditionally has been implicit in Python is the expected interface. Functions have been documented to expect a "file-like object" or a "sequence." But what is a file-like object? Does it support .writelines? What about .seek? What is a "sequence"? Does it support step-slicing, such as a[1:10:2]?

Originally, Python's answer was the so-called "duck-typing," taken from the phrase "if it walks like a duck and quacks like a duck, it's probably a duck." In other words, "try it and see," which is about as implicit as you can get.

In order to make those things explicit, you need a way to express expected interfaces. One of the first big systems written in Python was the Zope web framework, and it needed those things desperately to make it obvious what rendering code, for example, expected from a "user-like object."

Enter zope.interface, which was part of Zope but published as a separate Python package. The zope.interface package helps declare what interfaces exist, which objects provide them, and how to query for that information.

Imagine writing a simple 2D game that needs various things to support a "sprite" interface: for example, to expose a bounding box, but also to indicate when the object intersects with a box. Unlike in some other languages, in Python it is common practice to make attribute access part of the public interface, instead of implementing getters and setters. The bounding box should therefore be an attribute, not a method.

A method that renders the list of sprites might look like:

def render_sprites(render_surface, sprites):
    """
    sprites should be a list of objects complying with the Sprite interface:
    * An attribute "bounding_box", containing the bounding box.
    * A method called "intersects", that accepts a box and returns
      True or False
    """
    pass # some code that would actually render

The game will have many functions that deal with sprites. In each of them, you would have to specify the expected contract in a docstring.

Additionally, some functions might expect a more sophisticated sprite object, maybe one that has a Z-order. We would have to keep track of which methods expect a Sprite object, and which expect a SpriteWithZ object.

Wouldn't it be nice to be able to make what a sprite is explicit and obvious so that methods could declare "I need a sprite" and have that interface strictly defined? Enter zope.interface.

from zope import interface

class ISprite(interface.Interface):

    bounding_box = interface.Attribute(
        "The bounding box"
    )

    def intersects(box):
        "Does this intersect with a box"

This code looks a bit strange at first glance. The methods do not include self, even though including it is standard practice in Python classes, and there is an Attribute declaration. This is the way to declare interfaces in zope.interface. It looks strange because most people are not used to strictly declaring interfaces.

The reason for this practice is that the interface shows how the method will be called, not how it is defined. Because interfaces are not superclasses, they can be used to declare data attributes.
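
Coming back to the SpriteWithZ idea from above: interfaces can extend other interfaces, so a more demanding contract can be declared in terms of ISprite. A minimal sketch (ISpriteWithZ and its z_order attribute are illustrative names, not part of the original article):

class ISpriteWithZ(ISprite):

    z_order = interface.Attribute(
        "The Z-order used to decide which sprite is drawn on top"
    )

A function that needs Z-ordering can then ask for ISpriteWithZ, while functions that only care about bounding boxes keep asking for ISprite.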

One possible implementation of the interface can be with a circular sprite:

import attr
from zope.interface import implementer

@implementer(ISprite)
@attr.s(auto_attribs=True)
class CircleSprite:
    x: float
    y: float
    radius: float

    @property
    def bounding_box(self):
        return (
            self.x - self.radius,
            self.y - self.radius,
            self.x + self.radius,
            self.y + self.radius,
        )

    def intersects(self, box):
        # Simplified check: treat the box as intersecting
        # if at least one of its corners is inside the circle.
        top_left, bottom_right = box[:2], box[2:]
        for choose_x_from in (top_left, bottom_right):
            for choose_y_from in (top_left, bottom_right):
                x = choose_x_from[0]
                y = choose_y_from[1]
                if (((x - self.x) ** 2 + (y - self.y) ** 2) <=
                        self.radius ** 2):
                    return True
        return False

This explicitly declares that the CircleSprite class implements the interface. It even enables us to verify that the class implements it properly:

from zope.interface import verify

def test_implementation():
    sprite = CircleSprite(x=0, y=0, radius=1)
    verify.verifyObject(ISprite, sprite)

This is something that can be run by pytest, nose, or another test runner, and it will verify that the sprite created complies with the interface. The test is often partial: it will not test anything only mentioned in the documentation, and it will not even test that the methods can be called without exceptions! However, it does check that the right methods and attributes exist. This is a nice addition to the unit test suite and -- at a minimum -- prevents simple misspellings from passing the tests.
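
For example (an illustrative test, not from the original article), a class with a misspelled method name fails the verification:

import pytest
from zope.interface import implementer, verify
from zope.interface.exceptions import Invalid

@implementer(ISprite)
class TypoSprite:
    bounding_box = (0, 0, 1, 1)

    def intersect(self, box):  # misspelled: the interface requires "intersects"
        return False

def test_misspelled_method():
    # verifyObject raises an Invalid subclass for the missing method.
    with pytest.raises(Invalid):
        verify.verifyObject(ISprite, TypoSprite())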

If you have some implicit interfaces in your code, why not document them clearly with zope.interface?

18 Oct 2019 3:00am GMT

16 Oct 2019

Planet Twisted

Hynek Schlawack: Sharing Your Labor of Love: PyPI Quick and Dirty

A completely incomplete guide to packaging a Python module and sharing it with the world on PyPI.

16 Oct 2019 12:00am GMT

13 Oct 2019

Planet Twisted

Glyph Lefkowitz: Mac Python Distribution Post Updated for Catalina and Notarization

I previously wrote a post about shipping a PyGame app to users on macOS. It's now substantially updated for the new Notarization requirements in Catalina. I hope it's useful to somebody!

13 Oct 2019 9:10pm GMT

07 Oct 2019

Planet Twisted

Glyph Lefkowitz: The Numbers, They Lie

It's October, and we're all getting ready for Halloween, so allow me to tell you a horror story, in Python:

>>> 0.1 + 0.2 - 0.3
5.551115123125783e-17

[image: some scary branches]

Some of you might already be familiar with this chilling tale, but for those who might not have experienced it directly, let me briefly recap.

In Python, the default representation of a number with a decimal point in it is something called an "IEEE 754 double precision binary floating-point number". This standard achieves a generally useful trade-off between performance and correctness, and is widely implemented in hardware, making it a popular choice for numbers in many programming languages.

However, as our spooky story above indicates, it's not perfect. 0.1 + 0.2 is not exactly equal to 0.3 in this representation (it comes out very slightly larger), because it is a floating-point representation in base 2.

If you've worked professionally with software that manipulates money[1], you typically learn this lesson early; it's quite easy to smash head-first into the problem with binary floating-point the first time you have an item that costs 30 cents and for some reason three dimes doesn't suffice to cover it.

There are a few different approaches to the problem; one is using integers for everything, and denominating your transactions in cents rather than dollars. A strategy that requires less weird unit conversion[2] is to use the built-in decimal module, which provides a base-10 floating-point representation rather than the standard base 2, and which doesn't have any of these weird glitches surrounding numbers like 0.1.
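
For instance (a quick illustrative check, not from the original post), the spooky sum from the top of the story comes out exact in base 10:

>>> from decimal import Decimal
>>> Decimal("0.1") + Decimal("0.2") - Decimal("0.3")
Decimal('0.0')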

This is often where a working programmer's numerical education ends; don't use floats, they're bad, use decimals, they're good. Indeed, this advice will work well up to a pretty high degree of application complexity. But the story doesn't end there. Once division gets involved, things can still get weird really fast:

>>> from decimal import Decimal
>>> (Decimal("1") / 7) * 14
Decimal('2.000000000000000000000000001')

The problem is the same: before, we were working with 1/10, a value that doesn't have a finite (non-repeating) representation in base 2; now we're working with 1/7, which has the same problem in base 10.

Any time you have a representation of a number which uses digits and a decimal point, no matter the base, you're going to run in to some rational values which do not have an exact representation with a finite number of digits; thus, you'll drop some digits off the (necessarily finite) end, and end up with a slightly inaccurate representation.

But Python does have a way to maintain symbolic accuracy for arbitrary rational numbers -- the fractions module!

>>> from fractions import Fraction
>>> Fraction(1)/3 + Fraction(2)/3 == 1
True
>>> (Fraction(1)/7) * 14 == 2
True

You can multiply and divide and add and subtract to your heart's content, and still compare against zero and it'll always work exactly, giving you the right answers.

So if Python has a "correct" representation, which doesn't screw up our results under a basic arithmetic operation such as division, why isn't it the default? We don't care all that much about performance, right? Python certainly trades away performance for correctness and safety in plenty of other areas.

First of all, while Python's willing to trade off some storage or CPU efficiency for correctness, precise fractions rapidly consume huge amounts of storage even under very basic algorithms, like consuming gigabytes while just trying to maintain a simple running average over a stream of incoming numbers.
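
To get a feel for that growth (an illustrative sketch, not from the original post): even summing the first hundred unit fractions exactly already produces a denominator dozens of digits long, and it keeps growing with every additional term.

>>> from fractions import Fraction
>>> total = Fraction(0)
>>> for n in range(1, 101):
...     total += Fraction(1, n)
...
>>> len(str(total.denominator)) > 30
True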

But even more importantly, you'll notice that I said we could maintain symbolic accuracy for arbitrary rational numbers; but, as it turns out, a whole lot of interesting math you might want to do with a computer involves numbers which are irrational: like π. If you want to use a computer to do it, pretty much all trigonometry[3] involves a slightly inaccurate approximation unless you have a literally infinite amount of storage.

As Morpheus put it, "welcome to the desert of the real."


  1. or any proxy for it, like video-game virtual currency

  2. and less time saying weird words like "nanodollars" to your co-workers

  3. or, for that matter, geometry, or anything involving a square root

07 Oct 2019 6:25am GMT

05 Oct 2019

Planet Twisted

Glyph Lefkowitz: A Few Bad Apples

I'm a little annoyed at my Apple devices right now.

Time to complain.

"Trust us!" says Apple.

"We're not like the big, bad Google! We don't just want to advertise to you all the time! We're not like Amazon, just trying to sell you stuff! We care about your experience. Magical. Revolutionary. Courageous!"

But I can't hear them over the sound of my freshly-updated Apple TV - the appliance which exists solely to play Daniel Tiger for our toddler - playing the John Wick 3 trailer at full volume automatically as soon as it turns on.

For the aforementioned toddler.

I should mention that it is playing this trailer while specifically logged in to a profile that knows their birth date[1] and also their play history[2].


I'm aware of the preferences which control autoplay on the home screen; it's disabled now. I'm aware that I can put an app other than "TV" in the default spot, so that I can see ads for other stuff, instead of the stuff "TV" shows me ads for.

But the whole point of all this video-on-demand junk was supposed to be that I can watch what I want, when I want - and buying stuff on the iTunes store included the implicit promise of no advertisements.

At least Google lets me search the web without any full-screen magazine-style ads popping up.

Launch the app store to check for new versions?

[image: Apple Arcade ad]

I can't install my software updates without accidentally seeing HUGE ads for new apps.

Launch iTunes to play my own music?

[image: Apple Music ad]

I can't play my own, purchased music without accidentally seeing ads for other music - and also Apple's increasingly thirsty, desperate plea for me to remember that they have a streaming service now. I don't want it! I know where Spotify is if I wanted such a thing, the whole reason I'm launching iTunes is that I want to buy and own the music!

On my iPhone, I can't even launch the Settings app to turn off my WiFi without seeing an ad for AppleCare+, right there at the top of the UI, above everything but my iCloud account. I already have AppleCare+; I bought it with the phone! Worse, at some point the ad glitched itself out, and now it's blank, and when I tap the blank spot where the ad used to be, it just shows me this:

[image: "undefined" is not an insurance plan]

I just want to use my device, I don't need ad detritus littering every blank pixel of screen real estate.

Knock it off, Apple.


  1. less than 3 years ago

  2. Daniel Tiger, Doctor McStuffins, Word World; none of which have super significant audience overlap with the John Wick franchise

05 Oct 2019 6:32pm GMT

24 Sep 2019

Planet Twisted

Jp Calderone: Tahoe-LAFS on Python 3 - Call for Porters

Hello Pythonistas,

Earlier this year a number of Tahoe-LAFS community members began an effort to port Tahoe-LAFS from Python 2 to Python 3. Around five people are currently involved in a part-time capacity. We wish to accelerate the effort to ensure a Python 3-compatible release of Tahoe-LAFS can be made before the end of upstream support for CPython 2.x.

Tahoe-LAFS is a Free and Open system for private, secure, decentralized storage. It encrypts and distributes your data across multiple servers. If some of the servers fail or are taken over by an attacker, the entire file store continues to function correctly, preserving your privacy and security.

Foolscap, a dependency of Tahoe-LAFS, is also being ported. Foolscap is an object-capability-based RPC protocol with flexible serialization.

Some details of the porting effort are available in a milestone on the Tahoe-LAFS trac instance.

To help with this, we are hoping to find a person or people with significant prior Python 3 porting experience and, preferably, some familiarity with Twisted, though in general the Tahoe-LAFS project welcomes contributors of all backgrounds and skill levels.

We would prefer someone to start with us as soon as possible and no later than October 15th. If you are interested in this opportunity, please send us any questions you have, as well as details of your availability and any related work you have done previously (GitHub, LinkedIn links, etc). If you would like to find out more about this opportunity, please contact us at jessielisbetfrance at gmail (dot) com or on IRC in #tahoe-lafs on Freenode.


24 Sep 2019 4:59pm GMT

17 Sep 2019

Planet Twisted

Moshe Zadka: Adding Methods Retroactively

The following post was originally published on OpenSource.com as part of a series on seven libraries that help solve common problems.

Imagine you have a "shapes" library. We have a Circle class, a Square class, etc.

A Circle has a radius, a Square has a side, and maybe Rectangle has height and width. The library already exists: we do not want to change it.

However, we do want to add an area calculation. If this was our library, we would just add an area method, so that we can call shape.area(), and not worry about what the shape is.

While it is possible to reach into a class and add a method, this is a bad idea: nobody expects their class to grow new methods, and things might break in weird ways.

Instead, the singledispatch function in functools can come to our rescue:

from functools import singledispatch

@singledispatch
def get_area(shape):
    raise NotImplementedError("cannot calculate area for unknown shape",
                              shape)

The "base" implementation for the get_area function just fails. This makes sure that if we get a new shape, we will cleanly fail instead of returning a nonsense result.

import math

@get_area.register(Square)
def _get_area_square(shape):
    return shape.side ** 2

@get_area.register(Circle)
def _get_area_circle(shape):
    return math.pi * (shape.radius ** 2)

One nice thing about doing things this way is that if someone else writes a new shape that is intended to play well with our code, they can implement the get_area themselves:

import math

import attr

from area_calculator import get_area

@attr.s(auto_attribs=True, frozen=True)
class Ellipse:
    horizontal_axis: float
    vertical_axis: float

@get_area.register(Ellipse)
def _get_area_ellipse(shape):
    return math.pi * shape.horizontal_axis * shape.vertical_axis

Calling get_area is straightforward:

print(get_area(shape))

This means we can change a function that has a long if isinstance()/elif isinstance() chain to work this way, without changing the interface. The next time you are tempted to check if isinstance, try using singledispatch!
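
For contrast, here is a rough sketch of the kind of isinstance chain that singledispatch replaces (illustrative only, reusing the Square and Circle classes from the hypothetical shapes library). Supporting Ellipse here would require editing this central function, rather than registering a new implementation next to the Ellipse class:

import math

def get_area_with_isinstance(shape):
    # Every new shape means modifying this one function.
    if isinstance(shape, Square):
        return shape.side ** 2
    elif isinstance(shape, Circle):
        return math.pi * (shape.radius ** 2)
    raise NotImplementedError("cannot calculate area for unknown shape", shape)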

17 Sep 2019 1:00am GMT

10 Sep 2019

Planet Twisted

Itamar Turner-Trauring: What can a software developer do about climate change?

Pines and firs are dying across the Pacific Northwest, fires rage across the Amazon, it's the hottest it's ever been in Paris: climate change is impacting the whole planet, and things are not getting any better. You want to do something about climate change, but you're not sure what.

If you do some research you might encounter an essay by Bret Victor, "What can a technologist do about climate change?" There's a whole pile of good ideas in there, and it's worth reading, but the short version is that you can use technology to "create options for policy-makers."

Thing is, policy-makers aren't doing very much.

So this essay isn't about technology, because technology isn't the bottleneck right now; it's about policy and politics, and what you can do about them. It's still written for software developers, because that's who I write for, but also because software developers often have access to two critical catalysts for political change. And it's written for software developers in the US, because that's where I live, and because the US is a big part of the problem.

But before I go into what you can do, let me tell you the story of a small success I happened to be involved in, a small step towards a better future.

Infrastructure and the status quo

About a year ago I spent some of my mornings handing out pamphlets to bicycle riders. I looked like an idiot: in order to show I was one of them I wore my bike helmet, which is weirdly shaped and the color of fluorescent yellow snot.

After finding an intersection with plenty of bicycle riders and a long red light that forces them to stop, I would do the following:

  1. When the light turns red, step into the street and hand out the pamphlet.
  2. Keep an eye out for the light changing to green so that I didn't get run over by moving cars.
  3. Twiddle my thumbs waiting for the next light cycle.

It was boring, and not very glamorous.

I was just one of many volunteers, and besides gathering signatures we also held rallies, had conversations with city councilors and staff, wrote emails, and talked at city council meetings; it was a process. The total effort took a couple of years (and I only joined in towards the end), but in the end we succeeded.

We succeeded in having the council pass a short ordinance, a city-level law in the city of Cambridge, Massachusetts. The ordinance states that whenever a road that was supposed to have protected bike lanes (per the city's Bike Plan) was rebuilt from scratch, it would have those lanes built by default.

Now, clearly this ordinance isn't going to solve climate change. In fact, nothing Cambridge does as a city will solve climate change, because there's only so much impact 100,000 people can have on greenhouse gas emissions.

But while in some ways this ordinance was a tiny victory in a massive war, if we take a step back it's actually more important than it seems. In particular, this ordinance has three effects:

  1. Locally, safer bike infrastructure means more bicycle riders, and fewer car drivers. That reduces emissions, a little.
  2. Over time, more bicycle riders can kick off a positive feedback cycle, reducing emissions even more.
  3. Most significantly, local initiatives spread to other cities, kicking off these three effects in those other cities.

Let's examine these effects one by one.

Effect #1: Fewer cars, less emissions

About 43% of the greenhouse gas emissions in Massachusetts are due to transportation; for the US overall it's 29% (ref). And that means cars.

The reason people in the US mostly drive cars is that all the transportation infrastructure is built for cars. No bike lanes; infrequent, slow, or non-existent buses; no trains… Even in cities, where other means of transportation are feasible, the whole built infrastructure sends the very strong message that cars are the only reasonable way to get around.

If we focus on bicycles, our example at hand, the problem is that riding a bicycle can be dangerous, mostly because of all those cars! But if you get rid of the danger and build good infrastructure (dedicated protected bike lanes that separate bicycle riders from those dangerous cars), then bicycle use goes up.

Consider what Copenhagen achieved between 2008 and 2017 (ref):

                                          2008  2018
# of seriously injured cyclists            121    81
% of residents who feel secure cycling      51    77
% who cycle to work/school                  37    49

With safer infrastructure for bicycles, perception of safety goes up, and people bike more and drive less. Similarly, if you have frequent, fast, and reliable buses and trains, people drive less. And that means lower carbon emissions.

In Copenhagen the number of kilometers driven by cars was flat or slightly down over those 10 years, whereas in the US it's up 6-7% (ref).

Effect #2: A positive feedback loop

The changes in Copenhagen are a result of a plan the city government there adopted in 2011 (ref): they're the result of a policy action. And the political will was there in part because there were already a huge number of bicycle riders. So it's a positive feedback loop, and a good one.

Let's see how this is happening in Cambridge:

If Copenhagen can reach 50% of residents with a bicycle commute, so can Cambridge, and the ordinance is a good step in that direction.

Effect #3: The idea spreads

The Cambridge ordinance passed in April 2019, and the idea is spreading elsewhere:

All of this is the result of local advocacy-but I've no doubt Cambridge's example helped. It's always easier to be the second adopter. And the examples from these larger localities will no doubt inspire other groups and cities, spreading the idea even more.

Change requires politics

Bike infrastructure is just an example, not a solution, but there are three takeaways from this story that I'd like to emphasize:

By politics I don't just mean having an opinion or voting for a candidate, but rather engaging in the process of how policy decisions are made.

Merely having an opinion doesn't change anything. For example, two-thirds of Cambridge residents support building more protected bike lanes (ref). But that doesn't mean that many protected lanes are getting built: the neighboring, much smaller city of Somerville is building far more than Cambridge.

The only reason the city polled residents about bike lanes, one suspects, is that all the fuss we'd been making (emails, rallies, meetings, city council policy orders) made the city staff wonder whether bike infrastructure really had a lot of public support or not.

Voting results in some change, but not enough. Elected officials and government staff have lots and lots of things to worry about; if they're not being pressured to focus on a particular issue, it's likely to fall behind.

What's more, the candidates you get to vote for have to get on the ballot, and to do that they need money (for advertising, hiring staff, buying supplies). Lacking money, they need volunteer time.

And it's much easier for a small group of rich people to provide that support to the candidates they want, so by the time you're voting, you only get to choose between candidates that have been pre-vetted (I highly recommend reading The Golden Rule to understand how this works on a national level).

What you can do: Become an activist

In the end power is social. Power comes from people showing up to meetings, people showing up for rallies, people going door-to-door convincing other people to vote for the right person or support the right initiative, people blocking roads and making a fuss.

And that takes time and money.

So if you want to change policy, you need to engage in politics, with time and money:

Here are some policies you might be interested in:

Where you should do it: Start local

If you are going to become an activist, the local level is a good starting point.

Of course, local organizing is just the starting point for creating change on the global level. But you have to start somewhere. And global change is a lot easier if you have thousands of local organizations supporting it.

It's good to be a software developer

Let's get back to our starting point: you're paid to write software, and you want to do something about climate change. As a software developer you likely have access to the inputs needed to make political campaigns succeed, both candidate-based and issue-based:

If you don't have children or other responsibilities, you can work a 40-hour workweek, leaving you time for other things. Before I got married I worked full-time and went to a local adult education college half-time in the evenings: it was a lot of work, but it was totally doable. Set boundaries at your job, and you'll have at least some free time for activism.

You can also negotiate a shorter workweek, which is possible in part because software developers are in such demand. I've done this, I've interviewed people who have done it, I've found many random people on the Internet who have done it; it is possible.

If you need help doing it yourself, I've written a book to help you negotiate a shorter workweek. If you want to negotiate a shorter workweek so you have time for political activism, you can use the code FIGHTCLIMATECHANGE to get the book for 60% off.

Some common responses

"There will never be the political will to make this happen"

Things do change, for better and for worse, and sometimes unexpectedly. To give a couple of examples:

The timelines for gay marriage and cannabis legalization in the US are illuminating: these things didn't just happen, it was the result of long, sustained activist efforts, much of it at the local level.

Local changes do make a difference.

"Politics is awful and broken"

So are all our software tools, and somehow we manage to get things done!

"I don't like your policy suggestions, we should do X instead"

No problem, find the local groups that promote your favorite policies and join them.

"The necessary policies will never work because of problem Y"

Same answer: join and help the local groups working on Y.

"It's too late, the planet is doomed no matter what we do"

Perhaps, but it's very hard to say. So we're in Pascal's Wager territory here: given even a tiny chance there is something we can do, we had better do our best to make it happen.

And even if humanity really is doomed, there's always the hope that someday a hyperintelligent species of cockroach will inherit the Earth. And when cockroach archaeologists try to reconstruct our history, I would like them to be able to say, loosely translated from their complex pheromone-and-dancing system of communication: "These meatsacks may not have been as good at surviving as us cockroaches, but at least they tried!"

Time to get started

If you find this argument compelling (that policy is driven by power, and that power requires social mobilization), then it's up to you to take the next step. Find a local group or candidate pushing for a policy you care about, and show up for the next meeting.

And the meeting after that.

And then go to the rally.

And knock on doors.

And make some friends, and make some changes happen.

Some of the work is fun, some of it is boring, but there's plenty to do-time to get started!



Struggling with a 40-hour workweek? Too tired by the end of the day to do anything but collapse on the sofa and watch TV?

Learn how you can get a 3-day weekend, every single week.

10 Sep 2019 4:00am GMT

09 Sep 2019

Planet Twisted

Ralph Meijer: XMPP Message Attaching, Fastening, References

Services like Twitter and Slack have functionality that attempts to interpret parts of the plain text of tweets or messages as entered by the user. Pieces of the text that look like links, mentions of another user, hash tags, or stock symbols cause additional meta data to be added to the object representing the message, so that receiving clients can mark up those pieces of text in a special way. Twitter calls this meta data Tweet Entities, and for each piece of interpreted text it includes indices for the start and end of the entity, along with additional information depending on the type of entity. A client can then do in-line replacements at the exact character indices, e.g. by turning a link into a hyperlink. Twitter Entities served as inspiration for XEP-0372: References.

References can be used in two ways. The first is to include a reference element as a sibling of the body element of a message; the begin and end attributes then point to the indices of the plain text in the body. This would typically be used if the interpretation of the message is done by the sending client.

Alternatively, a service (e.g. a MUC service) could parse incoming messages and send a separate stanza to mark up the original stanza. In this case you need a mechanism for pointing to that other message. There have been two proposals for this, with slightly differing approaches, and in the examples below, I'll use the proto-XEP Message Fastening. While pointing to the stanza ID of the other message, it embeds a reference element in the apply-to element.

Mentioning another user

Let's start out with the example of mentioning another user.

<message from="room@muc.this.example/Kev" type="groupchat">
  <stanza-id id="2019-09-02-1" by="room@muc.this.example"
             xmlns="urn:xmpp:sid:0"/>
  <body>Some rubbish @ralphm</body>
</message>

A client might render this as:

Kev 13:04

Some rubbish @ralphm

The MUC service then parses the plain-text message, and finds a reference to my nickname prefixed with an @-sign, and sends a stanza to the room that marks up the message Kev sent to me.

<message from="room@muc.this.example"
         type="groupchat">
  <stanza-id xmlns="urn:xmpp:sid:0"
             id="2019-09-02-2" by="room@muc.this.example"/>
  <apply-to xmlns="urn:xmpp:fasten:0"
            id="2019-09-02-1">
    <reference begin="13" end="19" xmlns="urn:example:reference:0">
      <mention jid="room@muc.this.example/ralphm"/>
    </reference>
  </apply-to>
</message>

This stanza declares that it is attached to the previous message by the stanza ID that was included with the original stanza. In its payload, it includes a reference, referring to the characters 13 through 19. It has a mention child pointing to my occupant JID. Alternatively, the room might have linked to my real JID. A client can then alter the presentation of the original message to use the attached mention reference:

Kev 13:04

Some rubbish @ralphm

The characters referencing @ralphm are now highlighted, hovering the mention shows a tooltip with my full name, and clicking on it brings you to a page describing me. This information was not present in the stanza, but a client can use the XMPP URI as a key to present additional information. E.g. from the user's contact list, by doing a vCard lookup, etc.


Note:

The current specification for References does not have defined child elements, but instead uses a type attribute and URIs. However, Jonas Wielicki Schäfer provided some valuable feedback, suggesting this idea. By using a dedicated element for the target of the reference, each can have their own attributes, making it more explicit. Also, it is a natural extension point, by including a differently namespaced element instead.


Referring to previous messages

<message from="room@muc.this.example/Ge0rG"
         type="groupchat">
  <stanza-id xmlns="urn:xmpp:sid:0"
             id="2019-09-02-3" by="room@muc.this.example"/>
  <reference begin="0" end="6" xmlns="urn:example:reference:0">
    <mention jid="room@muc.this.example/ralphm"/>
  </reference>
  <reference begin="26" end="32" xmlns="urn:example:reference:0">
    <message id="2019-09-02-1"/>
  </reference>
  <body>@ralphm did you see Kev's message earlier?</body>
</message>

Unlike before, this example does not point to another stanza with apply-to. Instead, Ge0rG's client added references to go along with the plain-text body: one for the mention of me, and one for a reference to an earlier message.

Ge0rG 13:16

@ralphm did you see Kev's message earlier?

Emoji Reactions

Instead of reacting with a full message, Slack, like online forum software long before it, has the ability to attach emoji reactions to messages.

<message from="room@muc.this.example/Kev"
         type="groupchat">
  <stanza-id xmlns="urn:xmpp:sid:0"
            id="2019-09-02-4" by="room@muc.this.example"/>
  <apply-to xmlns="urn:xmpp:fasten:0"
            id="2019-09-02-3">
    <reactions xmlns="urn:example:reactions:0">
      <reaction label=":+1:">👍</reaction>
    </reactions>
  </apply-to>
</message>
<message from="room@muc.this.example/ralphm"
         type="groupchat">
  <stanza-id xmlns="urn:xmpp:sid:0"
             id="2019-09-02-6" by="room@muc.this.example"/>
  <apply-to xmlns="urn:xmpp:fasten:0"
            id="2019-09-02-3">
    <reactions xmlns="urn:example:reactions:0">
      <reaction label=":parrot:"
                img="cid:b729aec3f521694a35c3fc94d7477b32bc6444ca@bob.xmpp.org"/>
    </reactions>
  </apply-to>
</message>

These two examples show two separate instances of a person reacting to the previous message by Ge0rG. They use the protocol from Message Reactions, another proto-XEP. However, I expanded on it by introducing two new attributes. The label attribute allows for a textual shorthand that might be typed by a user. Custom emoji can be represented with the img attribute, which points to a XEP-0231: Bits of Binary object.

Ge0rG 13:16

@ralphm did you see Kev's message earlier?

👍 2   :parrot: 1

The attached emoji are rendered below the original message, and hovering over them reveals who were the respondents. Here my own reaction is highlighted by a squircle border.

Including a link

<message from="room@muc.this.example/ralphm" type="groupchat">
  <stanza-id id="2019-09-02-7" by="room@muc.this.example"
             xmlns="urn:xmpp:sid:0"/>
  <body>Have you seen https://ralphm.net/blog/2013/10/10/logitech_t630?</body>
</message>
<message from="room@muc.this.example"
         type="groupchat">
  <stanza-id xmlns="urn:xmpp:sid:0"
             id="2019-09-02-8" by="room@muc.this.example"/>
  <apply-to xmlns="urn:xmpp:fasten:0"
            id="2019-09-02-7">
    <reference begin="14" end="61" xmlns="urn:example:reference:0">
      <link url="https://ralphm.net/blog/2013/10/10/logitech_t630"/>
    </reference>
  </apply-to>
</message>

Here the MUC service marks up the original message with an explicit link reference. Possibly, the protocol might be extended so that a service can include shortened versions of the URL for display purposes.

ralphm 13:16

Have you seen https://ralphm.net/blog/2013/10/10/logitech_t630?

Logitech Ultrathin Touch Mouse

Logitech input devices are my favorite. This tiny bluetooth mouse is a nice portable device for every day use or while traveling.

The client has used the markup to fetch meta data on the URL and presents a summary card below the original message. Alternatively, the MUC service could have done this using XEP-0385: Stateless Inline Media Sharing (SIMS):

<message from="room@muc.this.example"
         type="groupchat">
  <stanza-id xmlns="urn:xmpp:sid:0"
             id="2019-09-02-8" by="room@muc.this.example"/>
  <apply-to xmlns="urn:xmpp:fasten:0"
            id="2019-09-02-7">
    <reference begin="14" end="61" xmlns="urn:example:reference:0">
      <link url="https://ralphm.net/blog/2013/10/10/logitech_t630"/>
      <card xmlns="urn:example:card:0">
        <title>Logitech Ultrathin Touch Mouse</title>
        <description>Logitech input devices are my favorite. This tiny bluetooth mouse is a nice portable device for every day use or while traveling.</description>
      </card>
      <media-sharing xmlns='urn:xmpp:sims:1'>
        <file xmlns='urn:xmpp:jingle:apps:file-transfer:5'>
          <media-type>image/jpeg</media-type>
          <name>ultrathin-touch-mouse-t630.jpg</name>
          <size>23458</size>
          <hash xmlns='urn:xmpp:hashes:2' algo='sha3-256'>5TOeoNI9z6rN5f+cQagnCgxitQE0VUgzCMeQ9JqbhWJT/FzPpDTTFCbbo1jWwOsIoo9u0hQk6CPxH4t/dvTN0Q==</hash>
          <thumbnail xmlns='urn:xmpp:thumbs:1' uri='cid:sha1+21ed723481c24efed81f256c8ed11854a8d47eff@bob.xmpp.org' media-type='image/jpeg' width='116' height='128'/>
        </file>
        <sources>
          <reference xmlns='urn:xmpp:reference:0' type='data' uri='https://test.ralphm.net/images/blog/ultrathin-touch-mouse-t630.jpg' />
        </sources>
      </media-sharing>
    </reference>
  </apply-to>
</message>

Editing a previous message

<message from="room@muc.this.example/ralphm" type="groupchat">
  <stanza-id id="2019-09-02-9" by="room@muc.this.example"
             xmlns="urn:xmpp:sid:0"/>
  <body>Some thoughtful reply</body>
</message>
ralphm 13:19

Some thoughtful reply

After sending that message, I want to add a bit more information:

<message from="room@muc.this.example/ralphm" type="groupchat">
  <stanza-id id="2019-09-02-10" by="room@muc.this.example"
             xmlns="urn:xmpp:sid:0"/>
  <apply-to xmlns="urn:xmpp:fasten:0"
            id="2019-09-02-9">
    <external name='body'/>
    <replace xmlns='urn:example:message-correct:1'/>
  </apply-to>
  <body>Some more thoughtful reply</body>
</message>

Unlike XEP-0308: Last Message Correction, this example uses Fastening to refer to the original message. I would also lift the restriction on correcting just the last message, but allow any previous message to be edited.

ralphm 13:19

Some more thoughtful reply

Upon receiving the correction, the client indicates that the message has been edited. Hovering over the marker reveals when the message was changed.

Editing a previous message that had fastened references

<message from="room@muc.this.example/Kev" type="groupchat">
  <stanza-id id="2019-09-02-11" by="room@muc.this.example"
             xmlns="urn:xmpp:sid:0"/>
  <body>A witty response mentioning @ralphm</body>
</message>
<message from="room@muc.this.example"
         type="groupchat">
  <stanza-id xmlns="urn:xmpp:sid:0"
             id="2019-09-02-12" by="room@muc.this.example"/>
  <apply-to xmlns="urn:xmpp:fasten:0"
            id="2019-09-02-11">
    <reference begin="28" end="34" xmlns="urn:example:reference:0">
      <mention jid="room@muc.this.example/ralphm"/>
    </reference>
  </apply-to>
</message>
Kev 13:26

A witty response mentioning @ralphm

After a bit of consideration, Kev edits his response:

<message from="room@muc.this.example/Kev" type="groupchat">
  <stanza-id id="2019-09-02-13" by="room@muc.this.example"
             xmlns="urn:xmpp:sid:0"/>
  <apply-to xmlns="urn:xmpp:fasten:0"
            id="2019-09-02-11">
    <external name='body'/>
    <replace xmlns='urn:example:message-correct:1'/>
  </apply-to>
  <body>A slightly wittier response mentioning @ralphm</body>
</message>
Kev 13:26

A slightly wittier response mentioning @ralphm

Upon receiving the correction, the client discards all fastened references. The body text was changed, so the reference indices are stale. The room can then send a new stanza marking up the new text:

<message from="room@muc.this.example"
         type="groupchat">
  <stanza-id xmlns="urn:xmpp:sid:0"
             id="2019-09-02-14" by="room@muc.this.example"/>
  <apply-to xmlns="urn:xmpp:fasten:0"
            id="2019-09-02-11">
    <reference begin="40" end="46" xmlns="urn:example:reference:0">
      <mention jid="room@muc.this.example/ralphm"/>
    </reference>
  </apply-to>
</message>
Kev 13:26

A slightly wittier response mentioning @ralphm

Closing notes

09 Sep 2019 2:37pm GMT

16 Aug 2019

Planet Twisted

Twisted Matrix Laboratories: Twisted 19.7.0 Released

On behalf of Twisted Matrix Laboratories and our long-suffering release manager Amber Brown, I am honored to announce[1] the release of Twisted 19.7.0!


The highlights of this release include:
  • A full description on the PyPI page! Check it out here: https://pypi.org/project/Twisted/19.7.0/ (and compare to the slightly sad previous version, here: https://pypi.org/project/Twisted/19.2.1/)
  • twisted.test.proto_helpers has been renamed to "twisted.internet.testing"
    • This removes the gross special-case carve-out where it was the only "public" API in a test module, and now the rule is that all test modules are private once again.
  • Conch's SSH server now supports hmac-sha2-512.
  • The XMPP server in Twisted Words will now validate certificates!
  • A nasty data-corruption bug in the IOCP reactor was fixed. If you're doing high-volume I/O on Windows you'll want to upgrade!
  • Twisted Web no longer gives clients a traceback by default, both when you instantiate Site and when you use twist web on the command line. You can turn this behavior back on for local development with twist web --display-tracebacks.
  • Several bugfixes and documentation fixes resolving bytes/unicode type confusion in twisted.web.
  • Python 3.4 is no longer supported.
pip install -U twisted[tls] and enjoy all these enhancements today!

Thanks for using Twisted,

-glyph

[1]: somewhat belatedly: it came out 10 days ago. Oops!

16 Aug 2019 6:38am GMT

08 Aug 2019

Planet Twisted

Moshe Zadka: Designing Interfaces

One of the items of feedback I got from the article about interface immutability is that it did not give any concrete feedback for how to design interfaces. Given that they are forever, it would be good to have some sort of guidance.

The first item is that you want something that uses the implementation, as well as several distinct implementations. However, this item is too obvious: in almost all cases of a bad interface I have seen in the wild, this guideline was followed.

It was also followed in all cases of a good interface.

I think this guideline is covered well enough that by the time anyone designs a real interface, they understand that. Why am I mentioning this guideline at all, then?

Because I think it is important for the context of the guideline that I do think actually distinguishes good interfaces from bad interfaces. It is almost identical to the non-criterion above!

The real guideline is: something that uses the implementation, as well as several distinct implementations that do not share a superclass (other than object, or whatever is at the top of the hierarchy).

This simple addition, preventing the implementations from sharing a superclass, is surprisingly powerful. It means each implementation has to implement the "boring" parts by hand. This will immediately cause pressure to avoid "boring" parts, and instead put them in a wrapper, or in the interface user.
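
As a hypothetical illustration (the interface and names here are made up, not from any real library), this is what that exercise looks like: an interface, one piece of code that uses it, and two implementations whose only common ancestor is object. Anything "boring" they would otherwise share has to be written out in each, which quickly pushes it into a wrapper or into the interface's user.

import json

from zope.interface import Interface, implementer

class IKeyValueStore(Interface):
    def get(key):
        "Return the stored value for key, or raise KeyError."

    def put(key, value):
        "Store value under key."

def double_or_cached(store, key):
    # Uses implementations only through the interface.
    try:
        return store.get(key)
    except KeyError:
        store.put(key, key * 2)
        return store.get(key)

@implementer(IKeyValueStore)
class MemoryStore:
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value

@implementer(IKeyValueStore)
class JSONFileStore:
    def __init__(self, path):
        self._path = path

    def _load(self):
        try:
            with open(self._path) as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def get(self, key):
        return self._load()[key]

    def put(self, key, value):
        data = self._load()
        data[key] = value
        with open(self._path, "w") as f:
            json.dump(data, f)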

Otherwise, the most common failure mode is that the implementations are all basic variants on what is mostly the "big superclass".

In my experience, just the constraint on not having a "helper superclass" puts appropriate pressure on interfaces to be good.

(Thanks to Tom Most for his encouragement to write this, and the feedback on an earlier draft. Any mistakes that remain are my responsibility.)

08 Aug 2019 5:20am GMT

13 Jul 2019

Planet Twisted

Moshe Zadka: Interfaces are forever

(The following talks about zope.interface interfaces, but applies equally well to Java interfaces, Go interfaces, and probably other similar constructs.)

When we write a function, we can sometimes change it in backwards-compatible ways. For example, we can loosen the type of a variable. We can restrict the type of the return value. We can add an optional argument.

We can even have a backwards compatible path to make an argument required. We add an optional argument, and encourage people to change it. Then, in the next version, we make the default value be one that causes a warning. In a version after that, we make the value required. At each point, someone could write a library that worked with at least two consecutive versions.

In a similar way, we can have a path to remove an argument. First make it optional. Then warn when it is passed in. Finally, remove it and make it an error to pass it in.
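
A minimal sketch of the intermediate "warn" stage (the function and argument names are made up for illustration); the sentinel distinguishes "the caller passed the argument" from "the caller left it out":

import warnings

_UNSET = object()

def fancify(value, style=_UNSET):
    if style is not _UNSET:
        # Still accepted for now, but passing it triggers a warning.
        warnings.warn(
            "the 'style' argument is deprecated and will be removed",
            DeprecationWarning,
            stacklevel=2,
        )
    return f"*{value}*"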

As long as we do not intend to support inheritance, making backwards compatible changes to classes also works. For example, to remove a method we first have a version that warns when you call it, and then remove it in a succeeding version.

However, what changes can we make to an interface?

Assume we have an interface like:

from zope.interface import Interface, implementer

class IFancyFormat(Interface):

    def fancify_int(value: int) -> str:
        pass

It is a perfectly reasonable, if thin, interface. Implementing it seems like fun:

import attr

@implementer(IFancyFormat)
@attr.s(auto_attribs=True)
class FancySuffixer:
    suffix: str

    def fancify_int(self, value: int) -> str:
        return str(value) + self.suffix

Using it also seems like fun:

def dashify_fancy_five(fancifier: IFancyFormat) -> str:
    return f"---{fancifier.fancify_int(5)}---"

These are very different kinds of fun, though! Probably the kind of fun that appeals to different people. The first implementation is in the superfancy open-source library. The second one is in the dash_five open-source library. Such is the beauty of open source: it takes all kinds of people.

We cannot add a method to IFancyFormat: the superfancy library has a unit test that uses verifyObject, which will fail if we add a method. We cannot remove the method fancify_int, since this will break dash_five: the mypy check will fail, since IFancyFormat will no longer have that method.

Similarly, we cannot make the parameter optional without breaking superfancy, or loosen the return type without breaking dash_five. Once we have published IFancyFormat as an API, it cannot change.

The only way to recover from a bad interface is to create a new interface, IAwesomeFancyFormat. Then write conversion functions from and to IFancyFormat and IAwesomeFancyFormat. Then deprecate using the IFancyFormat interface. Finally, we can remove the interface. Then we can alias IFancyFormat = IAwesomeFancyFormat, and eventually, maybe even deprecate the name IAwesomeFancyFormat.
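
A hypothetical sketch of that recovery path (the exact shape of IAwesomeFancyFormat is invented here; the point is the conversion wrapper, not the specific methods):

import attr
from zope.interface import Interface, implementer

class IAwesomeFancyFormat(Interface):
    def fancify_int(value: int) -> str:
        pass

    def fancify_float(value: float) -> str:
        pass

@implementer(IAwesomeFancyFormat)
@attr.s(auto_attribs=True)
class AwesomeFromFancy:
    # Wraps an object providing IFancyFormat as an IAwesomeFancyFormat.
    fancy: object

    def fancify_int(self, value: int) -> str:
        return self.fancy.fancify_int(value)

    def fancify_float(self, value: float) -> str:
        return self.fancy.fancify_int(round(value))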

When publishing interfaces, one must be careful: to a first approximation, they are forever.

(Thanks to Glyph Lefkowitz for his helpful suggestions. Any mistakes or issues that are left are my responsibility.)

13 Jul 2019 5:00am GMT

14 Jun 2019

Planet Twisted

Glyph Lefkowitz: Toward a “Kernel Python”

Prompted by Amber Brown's presentation at the Python Language Summit last month, Christian Heimes has followed up on his own earlier work on slimming down the Python standard library, and created a proper Python Enhancement Proposal PEP 594 for removing obviously obsolete and unmaintained detritus from the standard library.

PEP 594 is great news for Python, and in particular for the maintainers of its standard library, who can now address a reduced surface area. A brief trip through the PEP's rogues gallery of modules to deprecate or remove[1] is illuminating. The Python standard library contains plenty of useful modules, but it also hides a veritable necropolis of code, a towering monument to obsolescence, threatening to topple over on its maintainers at any point.

However, I believe the PEP may be approaching the problem from the wrong direction. Currently, the standard library is maintained in tandem with, and by the maintainers of, the CPython python runtime. Large portions of it are simply included in the hope that it might be useful to somebody. In the aforementioned PEP, you can see this logic at work in defense of the colorsys module: why not remove it? "The module is useful to convert CSS colors between coordinate systems. [It] does not impose maintenance overhead on core development."

There was a time when Internet access was scarce, and maybe it was helpful to pre-load Python with lots of stuff so it could be pre-packaged with the Python binaries on the CD-ROM when you first started learning.

Today, however, the modules you need to convert colors between coordinate systems are only a pip install away. The bigger core interpreter is just more to download before you can get started.

Why Didn't You Review My PR?

So let's examine that claim: does a tiny module like colorsys "impose maintenance overhead on core development"?

The core maintainers have enough going on just trying to maintain the huge and ancient C codebase that is CPython itself. As Mariatta put it in her North Bay Python keynote, the most common question that core developers get is "Why haven't you looked at my PR?" And the answer? It's easier to not look at PRs when you don't care about them. This from a talk about what it means to be a core developer!

One might ask whether Twisted has the same problem. Twisted is a big collection of loosely-connected modules too; a sort of standard library for networking. Are clients and servers for SSH, IMAP, HTTP, TLS, et al. all a bit much to try to cram into one package?

I'm compelled to reply: yes. Twisted is monolithic because it dates back to a similar historical period as CPython, where installing stuff was really complicated. So I am both sympathetic and empathetic towards CPython's plight.

At some point, each sub-project within Twisted should ideally become a separate project with its own repository, CI, website, and of course its own more focused maintainers. We've been slowly splitting out projects already, where we can find a natural boundary. Some things that started in Twisted like constantly and incremental have been split out; deferred and filepath are in the process of getting that treatment as well. Other projects absorbed into the org continue to live separately, like klein and treq. As we figure out how to reduce the overhead of setting up and maintaining the CI and release infrastructure for each of them, we'll do more of this.


But is our monolithic nature the most pressing problem, or even a serious problem, for the project? Let's quantify it.

As of this writing, Twisted has 5 outstanding un-reviewed pull requests in our review queue. The median time a ticket spends in review is roughly four and a half days.[2] The oldest ticket in our queue dates from April 22, which means it's been less than 2 months since our oldest un-reviewed PR was submitted.

It's always a struggle to find enough maintainers and enough time to respond to pull requests. Subjectively, it does sometimes feel like "Why won't you review my pull request?" is a question we do still get all too often. We aren't always doing this well, but all in all, we're managing; the queue hovers between 0 at its lowest and 25 or so during a bad month.

By comparison to those numbers, how is core CPython doing?

Looking at CPython's keyword-based review queue, we can see that there are 429 tickets currently awaiting review. The oldest PR awaiting review hasn't been touched since February 2, 2018, almost 500 days ago.

How many are interpreter issues and how many are stdlib issues? Clearly review latency is a problem, but would removing the stdlib even help?

For a quick and highly unscientific estimate, I scanned the first (oldest) page of PRs in the query above. By my subjective assessment, on this page of 25 PRs, 14 were about the standard library, 10 were about the core language or interpreter code; one was a minor documentation issue that didn't really apply to either. If I can hazard a very rough estimate based on this proportion, somewhere around half of the unreviewed PRs might be in standard library code.


So the first reason the CPython core team needs to stop maintaining the standard library is that they literally don't have the capacity to maintain the standard library. Or to put it differently: they aren't maintaining it, and what remains is to admit that and start splitting it out.

It's true that none of the open PRs on CPython are in colorsys[3]. It does not, in fact, impose maintenance overhead on core development. Core development imposes maintenance overhead on it. If I wanted to update the colorsys module to be more modern - perhaps to have a Color object rather than a collection of free functions, perhaps to support integer color models - I'd likely have to wait 500 days, or more, for a review.

As a result, code in the standard library is harder to change, which means its users are less motivated to contribute to it. CPython's unusually infrequent releases also slow down the development of library code and decrease the usefulness of feedback from users. It's no accident that almost all of the modules in the standard library have actively maintained alternatives outside of it: it's not a failure on the part of the stdlib's maintainers. The whole process is set up to produce stagnation in all but the most frequently used parts of the stdlib, and that's exactly what it does.

New Environments, New Requirements

Perhaps even more important is that bundling CPython together with the definition of the standard library privileges CPython itself, and the use-cases that it supports, above every other implementation of the language.

Podcast after podcast after podcast after keynote tells us that in order to keep succeeding and expanding, Python needs to grow into new areas: particularly web frontends, but also mobile clients, embedded systems, and console games.

These environments require one or both of:

In all of these cases, determining which modules have been removed from the standard library is a sticking point. They have to be discovered by a process of trial and error; notably, a process completely different from the standard process for determining dependencies within a Python application. There's no install_requires declaration you can put in your setup.py that indicates that your library uses a stdlib module that your target Python runtime might leave out due to space constraints.
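
For contrast, a minimal hypothetical setup.py: third-party requirements can be declared and installed, but nothing analogous exists for standard-library modules that a slimmed-down runtime might omit.

from setuptools import setup

setup(
    name="example-colortool",            # hypothetical project name
    version="1.0",
    py_modules=["colortool"],
    install_requires=["requests>=2.0"],  # a declarable third-party dependency
    # There is no equivalent way to declare "this also needs the stdlib
    # 'colorsys' module" for runtimes that leave parts of the stdlib out.
)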

You can have this problem even if all you ever use is the standard Python on your Linux installation. Even server- and desktop-class Linux distributions have the same need for a more minimal core Python package, and so they already chop up the standard library somewhat arbitrarily. This can break the expectations of many Python codebases, and result in bugs where even pip install won't work.

Take It All Out

How about the suggestion that we should do only a little a day? Although it sounds convincing, don't be fooled. The reason you never seem to finish is precisely because you tidy a little at a time. [...] The ultimate secret of success is this: If you tidy up in one shot, rather than little by little, you can dramatically change your mind-set.

- Kondō, Marie.
"The Life-Changing Magic of Tidying Up"
(p. 15-16)

While incremental slimming of the standard library is a step in the right direction, incremental change can only get us so far. As Marie Kondō says, when you really want to tidy up, the first step is to take everything out so that you can really see everything, and put back only what you need.

It's time to thank those modules which do not spark joy and send them on their way.

We need a "kernel" version of Python that contains only the most absolutely minimal library, so that all implementations can agree on a core baseline that gives you a "python", and applications, even those that want to run on web browsers or microcontrollers, can simply state their additional requirements in terms of requirements.txt.

Now, there are some business environments where adding things to your requirements.txt is a fraught, bureaucratic process, and in those places, a large standard library might seem appealing. But "standard library" is a purely arbitrary boundary that the procurement processes in such places have drawn, and an equally arbitrary line may be easily drawn around a binary distribution.

So it may indeed be useful for some CPython binary distributions - perhaps even the official ones - to still ship with a broader selection of modules from PyPI. Even for the average user, in order to use it for development, at the very least, you'd need enough stdlib stuff that pip can bootstrap itself, to install the other modules you need!

It's already the case, today, that pip is distributed with Python, but isn't maintained in the CPython repository. What the default Python binary installer ships with is already a separate question from what is developed in the CPython repo, or what ships in the individual source tarball for the interpreter.

In order to use Linux, you need bootable media with a huge array of additional programs. That doesn't mean the Linux kernel itself is in one giant repository, where the hundreds of applications you need for a functioning Linux server are all maintained by one team. The Linux kernel project is immensely valuable, but functioning operating systems which use it are built from the combination of the Linux kernel and a wide variety of separately maintained libraries and programs.

Conclusion

The "batteries included" philosophy was a great fit for the time when it was created: a booster rocket to sneak Python into the imagination of the programming public. As the open source and Python packaging ecosystems have matured, however, this strategy has not aged well, and like any booster, we must let it fall back to earth, lest it drag us back down with it.

New Python runtimes, new deployment targets, and new developer audiences all present tremendous opportunities for the Python community to soar ever higher.

But to do it, we need a newer, leaner, unburdened "kernel" Python. We need to dump the whole standard library out on the floor, adding back only the smallest bits that we need, so that we can tell what is truly necessary and what's just nice to have.

I hope I've convinced at least a few of you that we need a kernel Python.

Now: who wants to write the PEP?

🚀

Acknowledgments

Thanks to Jean-Paul Calderone, Donald Stufft, Alex Gaynor, Amber Brown, Ian Cordasco, Jonathan Lange, Augie Fackler, Hynek Schlawack, Pete Fein, Mark Williams, Tom Most, Jeremy Thurgood, and Aaron Gallagher for feedback and corrections on earlier drafts of this post. Any errors of course remain my own.


  1. sunau, xdrlib, and chunk are my personal favorites.

  2. Yeah, yeah, you got me, the mean is 102 days.

  3. Well, as it turns out, one is on colorsys, but it's a documentation fix that Alex Gaynor filed after reviewing a draft of this post so I don't think it really counts.

14 Jun 2019 4:51am GMT

06 Jun 2019

feedPlanet Twisted

Twisted Matrix Laboratories: Twisted 19.2.1 Released

On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 19.2.1!

This is a security release, and contains the following changes:
  • All HTTP clients in twisted.web.client now raise a ValueError when called with a method and/or URL that contain invalid characters. This mitigates CVE-2019-12387. Thanks to Alex Brasetvik for reporting this vulnerability.

It is recommended you update to this release as soon as is practical.
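As a minimal sketch of the new behaviour (the URL below is made up), a request whose URL smuggles CR/LF characters is now rejected with ValueError instead of being written to the wire:

from twisted.internet import reactor
from twisted.web.client import Agent

agent = Agent(reactor)
try:
    # With Twisted 19.2.1 and later this raises ValueError rather than
    # sending a request containing injected header lines.
    agent.request(b"GET", b"http://example.com/\r\nX-Injected: oops")
except ValueError:
    pass  # reject or sanitize attacker-controlled URLs before retrying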

Additional mitigation may be required if Twisted is not your only HTTP client library.

You can find the downloads at <https://pypi.python.org/pypi/Twisted> (or alternatively <http://twistedmatrix.com/trac/wiki/Downloads>). The NEWS file is also available at <https://github.com/twisted/twisted/blob/twisted-19.2.1/NEWS.rst>.

Twisted Regards,
Amber Brown (HawkOwl)

06 Jun 2019 2:49pm GMT

03 Jun 2019

feedPlanet Twisted

Hynek Schlawack: Python in Azure Pipelines, Step by Step

Since the acquisition of Travis CI, the future of their free offering is unclear. Azure Pipelines has a generous free tier, but the examples I found are discouragingly complex and take advantage of features like templating that most projects don't need. To close that gap, this article shows you how to move a Python project with simple CI needs from Travis CI to Azure Pipelines.

03 Jun 2019 9:14am GMT

28 May 2019

feedPlanet Twisted

Moshe Zadka: Analyzing the Stack Overflow Survey

The Stack Overflow Survey Results for 2019 are in! There is some official analysis that mentions some things that mattered to me, and some that did not. I decided to dig into the data and see if I can find some things that would potentially interest my readership.

import csv, collections, itertools
with open("survey_results_public.csv") as fpin:
    reader = csv.DictReader(fpin)
    responses = list(reader)
len(responses)
88883

Wow, almost 90K respondents! This is the sweet spot of "enough to make meaningful generalizations" while still being small enough to analyze with rudimentary tools, not big-data-ware.

pythonistas = [x for x in responses if 'Python' in x['LanguageWorkedWith']]
len(pythonistas)/len(responses)
0.41001091322300104

About 40% of the respondents use Python in some capacity. That is pretty cool! This is one of the things where I wonder if there is bias in the source data. Are people who use Stack Overflow, or respond to surveys for SO, more likely to be the kind of person who uses Python? Or less?

In any case, I am excited! This means my favorite language, for all its issues, is doing well. This is also a good reminder that we need to think about the consequences of our decisions on a big swath of developers we will never ever meet.

opensource = collections.Counter(x['OpenSourcer'] for x in pythonistas)
sorted(opensource.items(), key=lambda x:x[1], reverse=True)
[('Never', 11310),
 ('Less than once per year', 10374),
 ('Less than once a month but more than once per year', 9572),
 ('Once a month or more often', 5187)]
opensource['Once a month or more often']/len(pythonistas)
0.1423318607139917

Python is open source. Almost all important libraries (Django, Pandas, PyTorch, requests) are open source. Many important tools (Jupyter) are open source. Yet the share of people who contribute to them with any kind of regular cadence is less than 15%.

general_opensource = collections.Counter(x['OpenSourcer'] for x in responses)
sorted(general_opensource.items(), key=lambda x:x[1], reverse=True)
[('Never', 32295),
 ('Less than once per year', 24972),
 ('Less than once a month but more than once per year', 20561),
 ('Once a month or more often', 11055)]

The Python community does compare well to the general populace, though!
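To put a rough number on that comparison, derived from the counts above: Python users contribute at least monthly at a rate of about 14% (computed earlier), versus roughly 12% of respondents overall.

round(general_opensource['Once a month or more often']/len(responses), 4)
0.1244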

devtype = collections.Counter(itertools.chain.from_iterable(x["DevType"].split(";") for x in pythonistas))
devtype['DevOps specialist']/len(responses)
0.052282213696657406

About 5% of total respondents are my peers: using Python for DevOps. That is pretty exciting! My interest in that is not merely theoretical: my upcoming book targets that crowd.

general_devtype = collections.Counter(itertools.chain.from_iterable(x["DevType"].split(";") for x in responses))
general_devtype['DevOps specialist']/len(responses), devtype['DevOps specialist']/len(pythonistas)
(0.09970410539698255, 0.12751420025793705)

In general, DevOps specialists are 10% of respondents.

devtype['DevOps specialist']/general_devtype['DevOps specialist']
0.524373730534868

Over 50% of DevOps specialists use Python!

def safe_int(x):
    try:
        return int(x)
    except ValueError:
        return -1

intermediate = sum(1 for x in pythonistas if 1<=safe_int(x['YearsCode'])<=5)

My next hush-hush (for now!) project is going to be targeting intermediate Python developers. I wish I could slice by "number of years writing in Python", but this is the best I could do. (I treat "NA" responses as "not intermediate". This is OK, since I prefer to underestimate rather than overestimate.)

intermediate/len(responses)
0.11346376697456206

11%! Not bad.

general_intermediate = sum(1 for x in responses if 1<=safe_int(x['YearsCode'])<=5)
intermediate/len(pythonistas), general_intermediate/len(responses)
(0.27673352907279863, 0.2671264471271222)

It seems that using Python does not much change the chances of someone being intermediate.

Summary

  • 40% of respondents use Python. Python is kind of a big deal.
  • 5% of respondents use Python for DevOps. This is a lot! DevOps as a profession is less than 10 years old.
  • 11% of respondents are intermediate Python users. My previous book targets this crowd.

(Thanks to Robert Collins and Matthew Broberg for their comments on an earlier draft. Any remaining issues are purely my responsibility.)

28 May 2019 5:20am GMT