17 Apr 2014

Planet Python

Martijn Faassen: Morepath Python 3 support

Thanks to an awesome contribution by Alec Munro, Morepath, your friendly neighborhood Python micro framework with super powers, has just gained Python 3 support!

Developing something new while juggling the complexities of Python 2 and Python 3 in my head at the same time was not something I wanted to do -- I wanted to focus on my actual goal, which was to create a great web framework.

So I had to pick one version of Python or the other. Since my direct customer use cases involve integrating it with Python 2 code, picking Python 2 was the obvious choice.

But now that Morepath has taken shape, taking on the extra complexity of supporting Python 3 is doable. The Morepath test coverage is quite comprehensive, and I had already configured tox (so I could test it with PyPy). Adding Python 3.4 meant patiently going through all the code and adjusting it, which is what Alec did. Thank you Alec, this is great!
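
For context, the tox side of this is small. A hypothetical tox.ini along these lines (my sketch, not Morepath's actual configuration) would be enough to add Python 3.4 to the test matrix:

[tox]
envlist = py27, py34, pypy

[testenv]
deps = pytest
commands = py.test morepath

Running plain "tox" then exercises the test suite under all three interpreters.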

Morepath's dependencies (such as WebOb) already had Python 3 support, so credit goes to their maintainers too (thanks Chris McDonough in particular!). This includes the Reg library, which I polyglotted to support Python 3 myself a few months ago.

All this doesn't take away from my opinion that we need to do more to support the large Python 2 application codebases. They are much harder to transition to Python 3 than well-tested libraries and frameworks, for which the path was cleared in the last 5 years or so.

[update: this is still in git; the Morepath 0.1 release is Python 2 only. But it will be included in the upcoming Morepath 0.2 release]

17 Apr 2014 2:25pm GMT

Continuum Analytics Blog: Bokeh 0.4.4 Released!

We are pleased to announce the release of version 0.4.4 of Bokeh, an interactive web plotting library for Python!

This release includes improved Matplotlib, ggplot, and Seaborn support, PyPy compatibility, continuous integration testing, downsampling of remote data, and initial work on Bokeh "apps".

17 Apr 2014 11:30am GMT

16 Apr 2014

Planet Python

Ian Ozsvald: 2nd Early Release of High Performance Python (we added a chapter)

Here's a quick book update - we just released a second Early Release of High Performance Python which adds a chapter on lists, tuples, dictionaries and sets. This is available to anyone who has bought it already (log in to O'Reilly to get the update). Shortly we'll follow with chapters on Matrices and the Multiprocessing module.

One bit of feedback we've had is that the images needed to be clearer for small-screen devices - we've increased the font sizes and removed the grey backgrounds, and the updates will follow soon. If you're curious about how much paper is involved in writing a book, here's a clue:

We announce each update, along with requests for feedback, via our mailing list.

I'm also planning on running some private training in London later in the year - please contact me if this is of interest. Both High Performance and Data Science topics are possible.

In related news - the PyDataLondon conference videos have just been released and you can see me talking on the High Performance Python landscape here.


Ian applies Data Science as an AI/Data Scientist for companies in Mor Consulting, founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

16 Apr 2014 8:11pm GMT

Python Sweetness: Portable 9-22x compression of animated GIFs with JPEG+Javascript

Warning: not Python

A friend recently built out a site which, amongst other things, in some cases features large pages of animated GIFs. There is perhaps nothing more wasteful of an Internet connection than such a page, especially when the "animation" is actually continuous-tone, real-colour video converted from some other format.

[Whoops, removed utterly wrong explanation of GIF compression. GIFs aren't run-length encoded, they use LZW coding, so the description and example that previously appeared here were completely incorrect]

Continuous-tone photos and real-world video are exactly the kind of data that GIF's LZW compression handles poorly, and so using GIF to encode files like these is a horrible choice.

So why is it popular, then? Well, compatibility of course. GIF has been around since 1987 and has been supported by all browsers for over a decade.

Web Video

Unless you've been living under a rock, you know that in recent years modern web browsers grew a <video> tag. Great, portable standardized containers for video!

Except it doesn't work like that at all, because politics and money, of course. As can be seen from Video Formats and Browser Support, there is no single video codec that satisfies all popular browsers.

So unless we encode our videos at least twice (doubling at least storage costs), we can't portably support the HTML <video> element. Even if a single encoding was supported by all modern browsers, that still leaves those less fortunate people stuck with ancient browsers out in the cold.

JPEGs

Still, each time I click one of these GIF-heavy pages and wait 30 seconds for all 50MiB of it to load, I'm left wondering if there is a better way. And so comes a little head scratching, and an even littler proof of concept…

There is another format supported by almost every browser, one that excels at encoding continuously toned images: I am, of course, talking about JPEG. So how could we reuse JPEG compression to encode video? With horrible, nasty Javascript/CSS hacks, of course!

PoC

My little proof of concept doesn't quite work well for all GIFs yet, though that's not surprising since I only spent an hour or so on it. The general idea is as follows (a rough sketch of the server-side steps appears after the list):

* Figure out the maximum size of any GIF frame (since individual frames may vary in size)

* Politely ask ImageMagick to render each GIF frame in a tiled composition as a single new JPEG image (example source - 8.4MiB, result - 377KiB)

* Politely ask gifparse to give us the inter-frame delays, then stuff this information alongside (width, height, filename, column count) into a new JSON file (example) for JavaScript to read.

* In JavaScript, create a new <DIV> element with absolute height+width set to the animation's size. Set the DIV's background-image CSS property to point to the JPEG file.

* Instantiate a class that uses the information stored in the JSON file to modify the DIV's background-position CSS property at timed intervals, such that everything except the image for the current frame is clipped by the DIV's dimensions

* Success! 8.4MiB GIF is now a 377KiB "animated JPEG". You can try out a final rendering here (and full page here). Note that many of the GIFs don't quite render properly yet, and their timing is way off, but I'm certain the output size is representative.
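
To make that concrete, here is a rough Python sketch of the server-side half. It shells out to ImageMagick's montage and identify tools instead of gifparse, and the exact flags and JSON field names are my assumptions rather than the original PoC's:

import json
import subprocess

def gif_to_sprite(gif_path, jpeg_path, json_path, columns=5):
    # Coalesce frames to the full canvas size, then tile them into a
    # single JPEG sprite sheet (montage accepts convert-style operators).
    subprocess.check_call([
        'montage', gif_path, '-coalesce',
        '-tile', '%dx' % columns, '-geometry', '+0+0', jpeg_path])
    # Canvas width/height and per-frame delay (in centiseconds).
    out = subprocess.check_output(
        ['identify', '-format', '%W %H %T\n', gif_path])
    frames = [line.split() for line in out.decode().splitlines() if line]
    meta = {
        'width': int(frames[0][0]),
        'height': int(frames[0][1]),
        'columns': columns,
        'image': jpeg_path,
        'delays_ms': [int(f[2]) * 10 for f in frames],
    }
    with open(json_path, 'w') as fh:
        json.dump(meta, fh)

The JavaScript class then only has to step background-position from tile to tile using the delays list.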

Note also the browser's CPU usage. It seems at least comparable to the same page full of GIFs, which I was quite surprised by. With Firefox, when the page is running in a background tab, CPU time is minimal.

Problems?

No doubt there are issues with doing this in some browsers - for example, at the very least, the produced JPEGs are huge when they are decompressed. For our example GIF, this requires at least 40MiB of RAM in the browser to decompress (and possibly 56MiB if the browser stores alpha information too).

In any case, I think there is room to improve on this technique and maybe produce something suitable for a live web site.

The original web page that caused me to think about this had 50MiB of GIF files. Recompressed, they come out as just 6.4MiB of JPEGs.

16 Apr 2014 5:05pm GMT

Alexandre Conrad: The painful process of submitting your first patch to OpenStack

Recently, I built a Python tool for SurveyMonkey that hooks up our Github projects to our Jenkins instance by asking a few command-line questions (a wizard) and generates the Jenkins job automatically. It literally takes less than a minute to get a project hooked up, with the code built on every change / pull request, tests and coverage run, etc. - and you don't even have to visit Jenkins' irritating UI.

A lot of the heavy lifting is actually done by Jenkins Job Builder (JJB), a great tool created by the OpenStack Infra team on which I rely. During the development process I made small improvements to JJB, and submitting a patch back to OpenStack as a way to say thank you sounded like a no-brainer. Little did I know.

The 27 steps to OpenStack contribution

If I had an OpenStack instructor, this is what I would have been told:

Protip

The following steps illustrate the process by which I eventually succeeded at submitting a patch, and I'm confident this is how most wannabe contributors would go about it.

  1. Fork the Github project.
  2. Hack a patch and submit a pull request.
  3. See the pull request being automatically closed with the message:

    openstack-infra/jenkins-job-builder uses Gerrit for code review.

    Please visit http://wiki.openstack.org/GerritWorkflow and follow the instructions there to upload your change to Gerrit.


  4. Visit the GerritWorkflow page.
  5. Convince yourself that you don't want to read the novel entirely, CTRL+F for smart keywords.
  6. Run out of ideas, give up.
  7. Regret, grab a Red Bull, get back to it, read the novel.
  8. Create a Launchpad.net account (I had one, password recovered).
  9. Join the OpenStack foundation (wat?).
  10. Locate the free membership button and click Join Now!
  11. Skip the legal stuff and find the form fields, name, email...
  12. Wonder what "Add New Affiliation" means. Skip and submit your application.
  13. Oops, you need to add an affiliation. Add an affiliation.
  14. You also need to add an address. Address of what? Run the following Python code to find out:
    python -c 'import random
    print random.choice(["address-of-yourself", "address-of-affiliation"])'
  15. Finally submit your application form and wonder what they could possibly do with your address; it should work.
  16. Return to the GerritWorkflow page.
  17. Ah, upload your SSH key to Gerrit.
  18. pip install git-review
  19. Skip the instructions that don't apply to you but don't skip too much.
  20. Try something. Didn't work? That's because you didn't skip enough.
  21. Understand that you must run git commit --amend because your commit message needs a Change-Id line, which is generated by git-review's hook.
  22. Finally, run git review! (like git push, but it pushes to Gerrit - a condensed command sequence follows this list)
  23. Oh wait, now you have to figure out how Gerrit works. It's a code review tool whose UI seems to have been inspired by the Jenkins one. Curse.
  24. <squash the numerous understanding-Gerrit-steps into one>
  25. Tweet your experience.
  26. Hope that someone sees your patch.
  27. Iterate with friendly OpenStack developers.
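
Condensed, the happy path from step 18 onwards looks roughly like this (the branch name and annotations are mine, not from the official workflow document):

$ pip install git-review
$ git checkout -b my-fix   # hack on a topic branch
$ git review -s            # one-time setup: gerrit remote + Change-Id hook
$ git commit -a            # the commit-msg hook appends a Change-Id line
$ git review               # pushes the change to Gerrit for review
$ git commit -a --amend    # after feedback: same Change-Id, so Gerrit...
$ git review               # ...updates the existing review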

Actually, step 27 is a lot of fun.

Dear OpenStack

Many contributors probably just gave up, and you may have lost valuable contributions. This is not a rant about Gerrit itself (ahem); I do understand that it is a code-review tool that you prefer over Github pull requests, and that each tool has its learning curve. But you must smooth out the first-commit-to-Gerrit process for newcomers, one way or another. Please consider these improvements:

Please make your on-boarding pleasant - everyone will be rewarded.

Love,
Alex

16 Apr 2014 9:59am GMT

15 Apr 2014

Planet Python

PyPy Development: NumPy on PyPy - Status Update

Work on NumPy on PyPy continued in March, though at a lighter pace than the previous few months. Progress was made on both compatibility and speed fronts. Several behavioral issues reported to the bug tracker were resolved. The most significant of these was probably the correction of casting to built-in Python types. Previously, int/long conversions of numpy scalars such as inf/nan/1e100 would return bogus results. Now, they raise or return values, as appropriate.

On the speed front, enhancements to the PyPy JIT were made to support virtualizing the raw_store/raw_load memory operations used in numpy arrays. Further work remains here in virtualizing the alloc_raw_storage when possible. This will allow scalars to have storages but still be virtualized when possible in loops.

Aside from continued work on compatibility/speed of existing code, we also hope to begin implementing the C-level components of other numpy modules such as mtrand, nditer, linalg, and so on. Several approaches could be taken to get C-level code in these modules working, ranging from reimplementing in RPython to interfacing with existing code with CFFI, if possible. The appropriate approach depends on many factors and will probably vary from module to module.

To try out PyPy + NumPy, grab a nightly PyPy and install our NumPy fork. Feel free to report comments/issues to IRC, our mailing list, or bug tracker. Thanks to the contributors to the NumPy on PyPy proposal for supporting this work.

15 Apr 2014 10:08pm GMT

Europython: Code of Conduct

EuroPython 2014 is a community conference intended for networking and collaboration in the developer community.

We value the participation of each member of the Python community and want all attendees to have an enjoyable and fulfilling experience. Accordingly, all attendees are expected to show respect and courtesy to other attendees throughout the conference and at all conference events, whether officially sponsored by EuroPython 2014 or not.

To make clear what is expected, all delegates/attendees, speakers, exhibitors, organisers and volunteers at any EuroPython 2014 event are required to conform to the following Code of Conduct. Organisers will enforce this code throughout the event.

The Short Version

EuroPython 2014 is dedicated to providing a harassment-free conference experience for everyone, regardless of gender, sexual orientation, disability, physical appearance, body size, race, or religion. We do not tolerate harassment of conference participants in any form.

All communication should be appropriate for a professional audience including people of many different backgrounds. Sexual language and imagery is not appropriate for any conference venue, including talks.

Be kind to others. Do not insult or put down other attendees. Behave professionally. Remember that harassment and sexist, racist, or exclusionary jokes are not appropriate for EuroPython 2014.

Attendees violating these rules may be asked to leave the conference without a refund at the sole discretion of the conference organisers.

Thank you for helping make this a welcoming, friendly event for all.

The Long Version

Harassment includes offensive verbal comments related to gender, sexual orientation, disability, physical appearance, body size, race, religion, sexual images in public spaces, deliberate intimidation, stalking, following, harassing photography or recording, sustained disruption of talks or other events, inappropriate physical contact, and unwelcome sexual attention.

Participants asked to stop any harassing behavior are expected to comply immediately.

Exhibitors in the expo hall, sponsor or vendor booths, or similar activities are also subject to the anti-harassment policy. In particular, exhibitors should not use sexualized images, activities, or other material. Booth staff (including volunteers) should not use sexualized clothing/uniforms/costumes, or otherwise create a sexualized environment.

Be careful in the words that you choose. Remember that sexist, racist, and other exclusionary jokes can be offensive to those around you. Excessive swearing and offensive jokes are not appropriate for EuroPython 2014.

If a participant engages in behavior that violates this code of conduct, the conference organisers may take any action they deem appropriate, including warning the offender or expulsion from the conference with no refund.

The full Code of Conduct text including contact information can be found here.

This text is based on the Code Of Conduct text by PyCon IE which is based on the original PSF Code of Conduct.

15 Apr 2014 7:55pm GMT

Python Diary: Many great talks and swag to be had

PyCon 2014 was a great experience. There were many fascinating talks and so much to see and do at the convention. I also gave a Lightning Talk on server security with some tips and tricks I use on a daily basis, based on the Debian Diary article I wrote a couple of months back.

Some of the larger swag items I nabbed were a book titled Hacking: The Art of Exploitation, and another called Core Python Application Programming, which was signed by the author. I am really excited about reading both of these books.

I went to a couple of very interesting talks as well.

15 Apr 2014 5:47pm GMT

Jeff Knupp: How 'DevOps' is Killing the Developer

There are two recent trends I really hate: DevOps and the notion of the "full-stack" developer. The DevOps movement is so popular that I may as well say I hate the x86 architecture or monolithic kernels. But it's true: I can't stand it. The underlying cause of my pain? This fact: not every company is a start-up, though it appears that every company must act as though they were.

DevOps

"DevOps" is meant to denote a close collaboration and cross-pollination between what were previously purely development roles, purely operations roles, and purely QA roles. Because software needs to be released at an ever-increasing rate, the old "waterfall" develop-test-release cycle is seen as broken. Developers must also take responsibility for the quality of the testing and release environments.

The increasing scope of responsibility of the "developer" (whether or not that term is even appropriate anymore is debatable) has given rise to a chimera-like job candidate: the "full-stack" developer. Such a developer is capable of doing the job of developer, QA team member, operations analyst, sysadmin, and DBA. Before you accuse me of hyperbole, go back and read that list again. Is there any role in the list whose duties you wouldn't expect a "full-stack" developer to be well versed in?

Where did these concepts come from? Start-ups, of course (and the Agile methodology). Start-ups are a peculiar beast and need to function in a very lean way to survive their first few years. I don't deny this. Unfortunately, we've turned the multiple technical roles that engineers at start-ups were forced to play due to lack of resources into a set of minimum qualifications for the role of "developer".

Many Hats

Imagine you're at a start-up with a development team of seven. You're one year into development of a web application that X's all the Y's and things are going well, though it's always a frantic scramble to keep everything going. If there's a particularly nasty issue that seems to require deep database knowledge, you don't have the liberty of saying "that's not my specialty," and handing it off to a DBA team to investigate. Due to constrained resources, you're forced to take on the role of DBA and fix the issue yourself.

Now expand that scenario across all the roles listed earlier. At any one time, a developer at a start-up may be acting as a developer, QA tester, deployment/operations analyst, sysadmin, or DBA. That's just the nature of the business, and some people thrive in that type of environment. Somewhere along the way, however, we tricked ourselves into thinking that because, at any one time, a start-up developer had to take on different roles he or she should actually be all those things at once.

If such people even existed, "full-stack" developers still wouldn't be used as they should. Rather than temporarily taking on a single role for a short period of time, then transitioning into the next role, they are meant to be performing all the roles, all the time. And here's what really sucks: most good developers can almost pull this off.

The Totem Pole

Good developers are smart people. I know I'm going to get a ton of hate mail, but there is a hierarchy of usefulness of technology roles in an organization. Developer is at the top, followed by sysadmin and DBA. QA teams, "operations" people, release coordinators and the like are at the bottom of the totem pole. Why is it arranged like this?

Because each role can do the job of all roles below it if necessary.

Start-ups taught us this. Good developers can be passable DBAs if need be. They make decent testers, "deployment engineers", and whatever other ridiculous term you'd like to use. Their job requires them to know much of the domain of "lower" roles. There's one big problem with this, and hopefully by now you see it:

It doesn't work in the opposite direction.

A QA person can't just do the job of a developer in a pinch, nor can a build-engineer do the job of a DBA. They never acquired the specialized knowledge required to perform the role. And that's fine. Like it or not, there are hierarchies in every organization, and people have different skill sets and levels of ability. However, when you make developers take on other roles, you don't have anyone to take on the role of development!

An example will make this more clear. My dad is a dentist running his own practice. He employs a secretary, hygienist, and dental assistant. Under some sort of "DentOps" movement, my dad would be making appointments and cleaning people's teeth while trying to find time to drill cavities, perform root canals, etc. My dad can do all of the other jobs in his office, because he has all the specialized knowledge required to do so.

But no one, not even all of his employees combined, can do his job.

Such a movement does a disservice to everyone involved, except (of course) employers. What began as an experiment aimed at increasing software quality has become a farce, where the most talented employees are overworked (while doing less, less useful work) and lower-level positions simply don't exist.

And this is the crux of the issue. All of the positions previously held by people of various levels of ability are made redundant by the "full-stack" engineer. Large companies love this, as it means they can hire far fewer people to do the same amount of work. In the process, though, actual development becomes a vanishingly small part of a developer's job. This is why we see so many developers who can't pass FizzBuzz: they never really had to write any code. Can you imagine interviewing a chef and asking him what portion of the day he actually devotes to cooking?

Jack of All Trades, Master of None

If you are a developer of moderately sized software, you need a deployment system in place. Quick, what are the benefits and drawbacks of the following such systems: Puppet, Chef, Salt, Ansible, Vagrant, Docker. Now implement your deployment solution! Did you even realize which systems had no business being in that list?

We specialize for a reason: human beings are only capable of retaining so much knowledge. Task-switching is cognitively expensive. Forcing developers to take on additional roles traditionally performed by specialists means that they:

What's more, by forcing developers to take on "full-stack" responsibilities, companies are paying far more than the market average for most of those tasks. If a developer makes 100K a year, you can pay four developers 100K each to spend half their time on development and half on release management - that's 400K for two full-time-equivalents of each role. Or you can simply hire a release manager at, say, 75K and two developers who develop full-time - 275K for the same development capacity. And notice the time wasted by developers who are part-time release managers but don't always have releases to manage.

Don't Kill the Developer

The effect of all of this is to destroy the role of "developer" and replace it with a sort of "technology utility-player". Every developer I know got into programming because they actually enjoyed doing it (at one point). You do a disservice to everyone involved when you force your brightest people to take on additional roles.

Not every company is a start-up. Start-ups don't make developers wear multiple hats by choice, they do so out of necessity. Your company likely has enough resource constraints without you inventing some. Please, don't confuse "being lean" with "running with the fewest possible employees". And for God's sake, let developers write code!

15 Apr 2014 12:21pm GMT

Andy Todd: Generating Reasonable Passwords with Python

Thanks to a certain recent OpenSSL bug, there's been a lot of attention paid to passwords in the media. I've been using KeePassX to manage my passwords for the last few years, so it's easy for me to find accounts that I should update. It's also a good opportunity to use stronger passwords than 'banana'.

My problem is that I have always resisted the generation function in KeePassX because the resulting strings are very hard to remember and transcribe. This isn't an issue if you always use one machine, but I tend to chop and change and don't always have my password database on the machine I'm using. I usually have a copy on my phone, but successfully typing 'Gh46^f27EEGR1p{' is a hit-and-miss affair for me. So I prefer passwords that are long but easy to remember, not unlike the advice from XKCD.

Which leaves a problem. Given that I now have to change quite a lot of passwords, how can I create suitably random passwords that aren't too difficult to remember or transcribe? Quite coincidentally, I read an article titled "Using Vim as a password manager". The advice within it is quite sound, and at the bottom there is a Python function to generate a password from word lists (in this case the system dictionary). This does a nice job, with the caveat that, as I understand it, the passwords it creates are not that strong from a cryptographic standpoint. But they are useful enough for sites which aren't my bank or primary email; for those I'm using stupidly long values generated from KeePassX. When I tried the Python function on my machine there was one drawback: it doesn't work in Python 3. This is because map returns a lazy iterator in Python 3, which random.choice can't index. But that's alright, because I can replace it with one of my favourite Python constructs - the list comprehension. Here is an updated version of invert's function that works in Python 3. Use at your own risk.


import random


def get_password():
    # Make a list of all of the words in our system dictionary
    with open('/usr/share/dict/words') as f:
        words = [word.strip() for word in f]
    # Pick 2 random words from the list and join them with a hyphen
    password = '-'.join(random.choice(words) for i in range(2)).capitalize()
    # Remove any apostrophes (the dictionary contains possessives)
    password = password.replace("'", "")
    # Add a random number to the end of our password
    password += str(random.randint(1, 9999))
    return password
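
If you use something like this, one cheap improvement on the cryptographic caveat above is to draw the randomness from the operating system's entropy pool via random.SystemRandom rather than the default Mersenne Twister. A sketch of that variation (mine, not part of the original article):

import random

_rng = random.SystemRandom()  # backed by os.urandom()

def get_password(word_count=2):
    with open('/usr/share/dict/words') as f:
        words = [word.strip() for word in f]
    password = '-'.join(_rng.choice(words)
                        for _ in range(word_count)).capitalize()
    return password.replace("'", "") + str(_rng.randint(1, 9999))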

15 Apr 2014 6:33am GMT

14 Apr 2014

Planet Python

Andre Roberge: Reeborg knows multiple programming languages

I wish I were in Montreal to visit my daughter, eat some delicious Saint-Sauveur bagels for breakfast, a good La Banquise poutine and some Montreal Smoked Meat for lunch... and, of course, attend Pycon. Alas....

In the meantime, a quick update: Reeborg now knows Python, Javascript and CoffeeScript. The old tutorials are gone, as Reeborg's World has seen too many changes. I am now in the process of writing the following tutorials, all using Reeborg's World as the test environment:

  1. A quick introduction to Python (for people that know programming in another language)
  2. A quick introduction to Javascript (same as above)
  3. A quick introduction to CoffeeScript (same as above)
  4. An introduction to programming using Python, for absolute beginners
  5. An introduction to programming using Javascript, for absolute beginners
  6. An introduction to Object-Oriented Programming concepts using Python
  7. An introduction to Object-Oriented Programming concepts using Javascript
Note that I have two "versions" of Javascript: one that uses JSHint to enforce good programming practices (and runs the code with the "use strict"; option) and one that is normal, permissive Javascript.

If anyone knows of any other transpilers written in Javascript that can convert code client-side from language X into Javascript (like Brython does for Python, or CoffeeScript does naturally), I would be interested in adding them as additional options.

14 Apr 2014 11:31pm GMT

Mike Driscoll: Miss PyCon 2014? Watch the Replay!

If you're like me, you missed PyCon North America 2014 this year. It happened last weekend. While the main conference days are over, the code sprints are still running. Anyway, for those of you who missed PyCon, they have released a bunch of videos on pyvideo! Every year, they seem to get the videos out faster than the last. I think that's pretty awesome myself. I'm looking forward to watching a few of these so I can see what I missed.

14 Apr 2014 4:45pm GMT

Machinalis: Migrating data into your Django project

There are times when we have an existing legacy DB and need to migrate its data into our Django application. In this post I'll share a technique that we applied successfully for this.

On a big project, our client had an existing application backed by a MySQL DB. Our objective was to develop a new, more modern, feature-rich version of the tool based on Django 1.5. At a certain stage of development, the client asked us to migrate some of the current users' data into the new system so we could move to a beta-testing phase.

The method we applied not only let us migrate dozens of users to the new system effectively, but also let us keep running migrations as the application continued to evolve.

General description

We based our work on two very powerful Django features:

  1. Multiple databases and
  2. Integrating Django with a legacy database

So, the general procedure would be:

  1. Add the legacy database to your project.

  2. Create a legacy app:
    • Automatically generate the models.
    • Set up a DB router.
  3. Write your migration script.

Let's describe each step in a little more detail:

1. A legacy database

We assume here that you have access to the legacy DB. In our particular case, before each migration our client would give us a MySQL dump of the legacy DB, so each time we would create a fresh legacydb on our own DB server and import the dump.

However, it doesn't matter how you access the legacy DB as long as you can do it from Django. So, following the Multiple databases approach, you must edit the project's settings.py and add the legacy database. For example:

DATABASES = {
    'default': {
        'NAME': 'projectdb',
        'ENGINE': 'django.db.backends.mysql',
        'USER': 'some_user',
        'PASSWORD': '123',
    },
    'legacy': {
        'NAME': 'legacydb',
        'ENGINE': 'django.db.backends.mysql',
        'USER': 'other_user',
        'PASSWORD': '456',
    },
}

Depending on your objectives for the migration, these settings can live either in your standard project's settings.py file or in a separate, special settings file used only during extraordinary migrations.

2. A legacy app

The general idea here is that you start a new app that will represent your legacy data. All the work (other than the settings) will be done within this app. Thus, you can keep it on a different branch (keeping the migration feature isolated) and continue the development process normally.

inspectdb

Now, the key to this step is to follow the Integrating Django with a legacy database document. By using the inspectdb management command, the models.py file can be generated automatically!

$ mkdir apps/legacy
$ python manage.py startapp legacy apps/legacy/
$ python manage.py inspectdb --database=legacy > apps/legacy/models.py

Anyway, as the documentation says:

This feature is meant as a shortcut, not as definitive model generation. After you run it, you'll want to look over the generated models yourself to make customizations.

In our particular case, it worked like a charm and only cosmetic modifications were needed!
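
To give an idea of the result, here is a sketch of the kind of unmanaged model you might end up with after reviewing the inspectdb output; the table and field names below are hypothetical, not from our client's schema:

# Hypothetical model for a legacy `customer` table (names invented
# for illustration):
from django.db import models


class Customer(models.Model):
    id = models.IntegerField(primary_key=True, db_column='CUSTOMER_ID')
    name = models.CharField(max_length=120, db_column='FULL_NAME')
    email = models.CharField(max_length=80, db_column='EMAIL')

    class Meta:
        managed = False        # Django must never create or alter this table
        db_table = 'customer'  # map the model to the existing legacy table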

Database router

Next, a database router must be provided. This is Django's mechanism for matching objects to the database they come from.

Django's default routing scheme ensures that if a database isn't specified, all queries fall back to the default database. In our case, we will make sure that objects from the legacy app are read from their corresponding DB, and that that DB stays read-only. An example router would be:

# Specific router to point all read-operations on legacy models to the
# 'legacy' DB.
# Forbid write-operations and syncdb.


class LegacyRouter(object):

    def db_for_read(self, model, **hints):
        """Point all operations on legacy models to the 'legacy' DB."""
        if model._meta.app_label == 'legacy':
            return 'legacy'
        return 'default'

    def db_for_write(self, model, **hints):
        """Our 'legacy' DB is read-only."""
        return False

    def allow_relation(self, obj1, obj2, **hints):
        """Forbid relations from/to Legacy to/from other app."""
        obj1_is_legacy = (obj1._meta.app_label == 'legacy')
        obj2_is_legacy = (obj2._meta.app_label == 'legacy')
        return obj1_is_legacy == obj2_is_legacy

    def allow_syncdb(self, db, model):
        return db != 'legacy' and model._meta.app_label != 'legacy'

Finally, to use the router you'll need to add it to your settings.py file.

DATABASE_ROUTERS = ['apps.legacy.router.LegacyRouter']

Now you are ready to access your legacy data using Django's ORM. Open the shell, import your legacy models and play around!
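
For instance, in a manage.py shell session it might look like this (the model name is the hypothetical one from the sketch above):

# Queries on legacy models are transparently routed to the 'legacy' DB:
from apps.legacy.models import Customer

Customer.objects.count()                    # router picks 'legacy' automatically
Customer.objects.using('legacy').count()    # explicit routing also works
Customer.objects.filter(name__icontains='smith')[:5]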

For a more detailed example of this technique applied, check this other blog post. It is based on Django 1.3 but still useful.

3. Your migration script

At this point you have access to the legacy data using Django's ORM. Now it is time to write the actual migration script. There is no magic nor much automation here: you know your data model and (hopefully) the legacy DB structure. It is in your hands to create your system's model instances and their relations.

In our case, we wrote an export.py script that we run manually from the command line whenever needed.

It's a really good idea to perform the migration inside a single transaction. Otherwise, any error while running the migration script will leave you with a partial (and possibly inconsistent) migration, and will force you to write complex logic to be able to resume it. The @transaction.commit_on_success decorator is a good way to achieve this. As a helpful side effect, a single commit is also faster. A minimal sketch of such a script follows; the models and field mapping are the hypothetical ones from above, and it should be run from a management command or a manage.py shell so Django settings are loaded.
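
# Sketch of an export.py under the assumptions above; the field
# mapping is invented for illustration.
from django.db import transaction

from apps.legacy.models import Customer as LegacyCustomer
from apps.accounts.models import Customer  # hypothetical new-system model


@transaction.commit_on_success  # one transaction: all-or-nothing, single commit
def migrate_customers():
    for old in LegacyCustomer.objects.all():
        Customer.objects.create(name=old.name, email=old.email)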

Conclusions

As a general data-migration technique for Django applications, this approach has several advantages:

  • it allows you to migrate lots of data,
  • it can be used with immature or changing data models,
  • it relies on standard Django features (the ORM, multiple databases),
  • the project's testing infrastructure can be used normally,
  • it works for one-time-only migration scripts as well as for continuous-migration features,
  • it can be applied to multiple, heterogeneous data sources.

On the other hand, as usual, it is no silver bullet. The main problem is that the complexity of the task is directly proportional to the difference between the DB models. Since the actual data manipulation must be programmed manually, very different data models potentially mean a lot of work.

So, as stated at the beginning of the post: the method allowed us to migrate a considerable amount of data into our system successfully, while accommodating changes as the application continued its development.

14 Apr 2014 2:52pm GMT

Martijn Faassen: The Call of Python 2.8

Introduction

Guido recently felt he needed to re-emphasize that there will be no Python 2.8. The Python developers have been very clear for years that there will never be a Python 2.8.

http://legacy.python.org/dev/peps/pep-0404/

At the Python language summit there were calls for a Python 2.8. Guido reports:

We (I) still don't want to do a 2.8 release, and I don't want to accelerate 3.5, but I do think we should make things better for people who have to straddle Python 2 and 3 in a single codebase, by developing more tools, and by security and possibly installer updates to 2.7 (PEP 466).

At his keynote at PyCon, he said it again:

[image: guido_no.jpg]

A very good thing happened in recognition of the reality that Python 2.7 is still massively popular: Guido changed the end-of-life date for Python 2.7 to 2020 (it was 2015). In the same change he felt he should repeat that there will be no Python 2.8:

+There will be no Python 2.8.

The call for Python 2.8 is strong. Even Guido feels it!

People talk about a Python 2.8, and are for it, or, like Guido, against it, but rarely talk about what it should be. So let's actually have that conversation.

Why talk about something that will never be? Because we can't call for something, nor reject something if we don't know what it is.

What is Python 2.8 for?

Python 2.8 could be different things. It could be a Python 2.x release that reduces some pain points and adds features for Python 2 developers, independent of what's going on in Python 3. It makes sense, really: we haven't had a new Python 2 feature release since 2010. Those of us with existing large Python 2 codebases haven't benefited from the work the language developers have done in those years. Even polyglot libraries that support both Python 2 and 3 can't use the new features, so they too are stuck with a 2010 Python. Before Python 2.7, Python's release cycle saw a new compatible release every two years or less. The reality of Python for many of its users is that there has been no feature update of the language for years now.

But I don't want to talk about that. I want to talk about Python 2.8 as an incremental upgrade path to Python 3. If we are going to add features to Python 2, let's take them from Python 3. I want to talk about bringing Python 2.x closer to Python 3. Python 2 might never quite reach Python 3 parity, but it could still help a lot if it can get closer incrementally.

Why an incremental upgrade?

In the discussion about Python 3, much is said about the need to port Python libraries to Python 3. This is indeed important if you want the ability to start new projects on Python 3. But many of us in the trenches are working on large Python 2 code bases. This isn't just maintenance. A large code base is alive, so we're building new features in Python 2.

Such a large Python codebase:

  • is important to some organization, important enough that people actually pay developers money to work on the code;
  • cannot be easily ported to Python 3 in one giant step, even if all external open source libraries are ported;
  • would gain nothing functionally from porting, so the organization won't see porting as a worthwhile investment;
  • would risk bugs and breakages from porting, which is exactly what the organization wants to avoid.

You can argue that I'm overstating the risks of porting. But we need to face it: many codebases written in Python 2 have low automated test coverage. We don't like to talk about it because we think everybody else is better at automated testing than we are, but it's the reality in the field.

We could say, fine, they can stay on Python 2 forever then! Well, at least until 2020. I think this would be unwise, as these organizations are paying a lot of developers money to work on Python code. This has an effect on the community as a whole. It contributes to the gravity of Python 2.

Those organizations, and thus the wider Python community, would be helped if there was an incremental way to upgrade their code bases to Python 3, with easy steps to follow. I think we can do much more to support such incremental upgrades than Python 2.7 offers right now.

Python 2.8 for polyglot developers

Besides helping Python 2 code bases go further step by step, Python 2.8 can also help those of us who are maintaining polyglot libraries, which work in both Python 2 and Python 3.

If a Python 2.8 backported Python 3 features, polyglot authors could start using those features as soon as they dropped Python 2.7 support in their libraries, without giving up Python 2 compatibility altogether. Python 2.8 would actually encourage those on Python 2.7 codebases to move towards Python 3, so they could use the library upgrades.

Of course dropping Python 2.x support entirely for a polyglot library will also make that possible. But I think it'll be feasible to drop Python 2.7 support in favor of Python 2.8 much faster than it is possible to drop Python 2 support entirely.

But what do we want?

I've seen Python 3 developers say: but we've done all we could with Python 2.7 already! What do you want from a Python 2.8?

And that's a great question. It's gone unanswered for far too long. We should get a lot more concrete.

What follows are just ideas. I want to get them out there, so other people can start thinking about them. I don't intend to implement any of it myself; just blogging about it is already breaking my stress-reducing policy of not worrying about Python 3.

Anyway, I might have it all wrong. But at least I'm trying.

Breaking code

Here's a paradox: I think that in order to make an incremental upgrade possible for Python 2.x we should actually break existing Python 2.x code in Python 2.8! Some libraries will need minor adjustments to work in Python 2.8.

I want to do what the from __future__ pattern was introduced for in the first place: introduce a new incompatible feature in a release while keeping it optional, and then later make the incompatible feature the default.

The Future is Required

Python 2.7 lets you do from __future__ import something to get the interpreter to behave a bit more like Python 3. In Python 2.8, these behaviors should be the default.

In order to encourage this and make it really obvious, we may want to consider requiring these imports in Python 2.8. That means the interpreter raises an error unless a module has such from __future__ imports at the top.

If we go for that, it means you have to have this on the top of all your Python modules in Python 2.8:

  • from __future__ import division
  • from __future__ import absolute_import
  • from __future__ import print_function
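
Concretely, a module header under this rule might look like the sketch below; these imports already work in today's Python 2.7, so you can try it now:

from __future__ import division
from __future__ import absolute_import
from __future__ import print_function

# Python 3 semantics in a Python 2 module:
print(7 / 2)    # 3.5, true division, as in Python 3
print(7 // 2)   # 3, floor division remains available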

absolute_import appears to be uncontroversial, but I've seen people complain about both division and print_function. If people reject Python 3 for those reasons, I want to make clear I'm not in the same camp. I believe that confuses at most a minor inconvenience with a dealbreaker. I think discussion about these is pretty pointless, and I'm not going to engage in it.

I've left out unicode_literals. This is because I've seen both Nick Coghlan and Armin Ronacher argue against them. I have a different proposal. More below.

What do we gain by this measure? It's ugly! Yes, but we've made the upgrade path a lot more obvious. If an organisation wants to upgrade to Python 2.8, they have to review their imports and divisions and change their print statements to function calls. That should be doable enough, even in large code bases, and is an upgrade path a developer can do incrementally, maybe even without having to convince their bosses first. Compare that to an upgrade to Python 3.

from __future3__ import new_classes

We can't do everything with the old future imports. We want to allow more incremental upgrading. So let's introduce a new future import.

New-style classes, that is, classes that derive from object, were introduced in Python 2 many years ago, but old-style classes are still supported. Python 3 only has new-style classes. Python 2.8 can help here by making new-style classes the default. If you write from __future3__ import new_classes at the top of your module, any class definition in that module that looks like this:

class Foo:
    pass

is interpreted as a new-style class.

This might break the contract of the module, as people may subclass from this class and expect an old-style class, and in some (rare) cases this can break code. But at least those problems can be dealt with incrementally. And the upgrade path is really obvious.
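
For reference, this is the difference the import would paper over; the snippet below runs in today's Python 2.7:

class Old:              # old-style (classic) class in Python 2
    pass

class New(object):      # new-style class, the only kind Python 3 has
    pass

print(type(Old()))      # <type 'instance'>, a classic instance
print(type(New()))      # <class '__main__.New'>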

__future3__?

Why did I write __future3__ and not __future__? Because otherwise we can't write polyglot code that is compatible in Python 2 and Python 3.

Python 3.4 doesn't support from __future__ import new_classes. We don't want to wait for a Python 3.5 or Python 3.6 to support this, if there is even any interest among the Python language developers in supporting it at all. Because after all, there won't be a Python 2.8.

That problem doesn't exist for __future3__. We can easily fake a __future3__ module in Python 3 without being dependent on the language developers. So polyglot code can safely use this.

from __future3__ import explicit_literals

Back to the magic moment of Nick Coghlan and Armin Ronacher agreeing.

Let's have a from __future3__ import explicit_literals.

This forces the author to be entirely explicit with string literals in the module that imports it. "foo" and 'foo' are now errors; the module won't import. Instead the module has to be explicit and use b'foo' and u'foo' everywhere.

What does that get us? It forces a developer to think about string literals everywhere, and that helps the codebase become incrementally more compatible with Python 3.
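
A sketch of a module under this hypothetical import; the first line works in no existing Python, it is precisely the behavior being proposed:

from __future3__ import explicit_literals  # hypothetical, does not exist today

greeting = u'hello'   # explicitly text
header = b'\x89PNG'   # explicitly bytes
# title = 'oops'      # under the proposal this bare literal is an error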

from __future3__ import str

This import line does two things:

  • you get a str function that creates a Python 3 str. This string has unicode text in it and cannot be combined with Python 2 style bytes and Python 3 style bytes without error (which I'll discuss later).
  • if from __future3__ import explicit_literals is in effect, a bare literal now creates a Python 3 str. Or maybe explicit_literals is a prerequisite, and from __future3__ import str should error if it isn't there.

I took this idea from the Python future module, which makes Python 3 style str and bytes (and much more) available in Python 2.7. I've modified the idea as I have the imaginary power to change the interpreter in Python 2.8. Of course anything I got wrong is my own fault, not the fault of Ed Schofield, the author of the future module.
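
For comparison, here is roughly what the future package already offers on Python 2.7 today (pip install future); this is the existing library's behavior as I understand it, not the proposed interpreter change:

from builtins import str, bytes  # Python 3-style str and bytes backports

s = str(u'caf\xe9')    # behaves like a Python 3 str
b = bytes(b'data')     # behaves like a Python 3 bytes
b + s                  # raises TypeError, matching Python 3 semantics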

from __past__ import bytes

To ensure you still have access to Python 2 bytes (really str) just in case you still need it, we need an additional import:

from __past__ import bytes as oldbytes

oldbytes can be called with a Python 2 str, Python 2 bytes and Python 3 bytes. It rejects a Python 3 str. I'll talk about why it may be needed in a bit.

Yes, __past__ is another new namespace we can safely support in Python 3. It would get more involved in Python 3: it contains a forward port of the Python 2 bytes object. Python 3 bytes have fewer features than Python 2 bytes, and this has been a pain point for some developers who need to work with bytes a lot. Having a more capable bytes object in Python 3 would not hurt existing Python 3 code, as combining it with a Python 3 string would still result in an error. It's just an alternative implementation of bytes with more methods on it.

from __future3__ import bytes

This is the equivalent import for getting the Python 3 bytes object.

Combining Python 3 str/bytes with Python 2 unicode/str

So what happens when we somehow combine a Python 3 str/bytes with a Python 2 str/bytes/unicode? Let's think about it.

The future module by Ed Schofield forbids py3bytes + py2unicode, but supports other combinations and upcasts them to their Python 3 version. So, for instance, py3str + py2unicode -> py3str. This is a consequence of the way it tries to make Python 2 string literals work a bit like they're Python 3 unicode literals. There is a big drawback to this approach: a Python 3 bytes is not fully compatible with APIs that expect a Python 2 str, and a library that tried to use this approach would suffer API breakage. See this issue for more information on that.

I think since we have the magical power to change the interpreter, we can do better. We can make real Python 3 string literals exist in Python 2 using __future3__.

I think we need these rules:

  • py3str + py2unicode -> py3str
  • py3str + py2str: UnicodeError
  • py3bytes + py2unicode: TypeError
  • py3bytes + py2str: TypeError

So while we upcast existing Python 2 unicode strings to Python 3 str, we refuse any other combination.
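
For contrast, this is the implicit coercion in today's Python 2.7 that these stricter rules would replace:

# Python 2.7 today: bytes are silently ASCII-decoded when mixed with unicode.
u'caf\xe9' + 'bar'        # works: u'caf\xe9bar'
u'caf\xe9' + '\xc3\xa9'   # UnicodeDecodeError at runtime, far from the cause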

Why not let people combine Python 2 str/bytes with Python 3 bytes? Because the Python 3 bytes object is not compatible with the Python 2 bytes object, and we should refuse to guess and immediately bail out when someone tries to mix the two. We require an explicit Python 2 str call to convert a Python 3 bytes to a str.

This is assuming that the Python 3 str is compatible with Python 2 unicode. I think we should aim for making a Python 3 string behave like a subclass of a Python 2 unicode.

What have we gained?

We can now start using Python 3 str and Python 3 bytes in our Python 2 codebases, incrementally upgrading, module by module.

Libraries could upgrade their internals to use Python 3 str and bytes entirely, and start using Python 3 str objects in any public API that returns Python 2 unicode strings now. If you're wrong and the users of your API actually do expect str-as-bytes instead of unicode strings, you can go deal with these issues one by one, in an incremental fashion.

For compatibility you can't return Python 3 bytes where Python 2 str-as-bytes is used, so judicious use of __past__.bytes would be needed at the boundaries in these cases.

After Python 2.8

People who have ported their code to Python 2.8 and have turned on all the __future3__ imports incrementally will be in a better place to port their code to Python 3. But to offer a more incremental step, we can have a Python 2.9 that requires the __future3__ imports introduced by Python 2.8. And by then we might have thought of some other ways to smooth the upgrade path.

Summary

  • There will be no Python 2.8. There will be no Python 2.8! Really, there will be no Python 2.8.
  • Large code bases in Python need incremental upgrades.
  • The upgrade from Python 2 to Python 3 is not incremental enough.
  • A Python 2.8 could help smooth the way.
  • A Python 2.8 could help polyglot libraries.
  • A Python 2.8 could let us drop support for Python 2.7 with an obvious upgrade path in place that brings everybody closer to Python 3.
  • The old __future__ imports are mandatory in Python 2.8 (except unicode_literals).
  • We introduce a new __future3__ in Python 2.8. __future3__ because we can support it in Python 3 today.
  • We introduce from __future3__ import new_classes, mandating new-style classes for plain class statements.
  • We introduce from __future3__ import explicit_literals, str, bytes to support a migration to use Python 3 style str and bytes.
  • We introduce from __past__ import bytes to be able to access the old-style bytes object.
  • A forward port of the Python 2 bytes object to Python 3 would be useful. It would error if combined with a Python 3 str, just like the Python 3 bytes does.
  • A future Python 2.9 could introduce more incremental upgrade steps. But there will be no Python 2.9.
  • I'm not going to do the work, but at least now we have something to talk about.

14 Apr 2014 11:52am GMT

Future Foundries: Crochet 1.2.0, now with a better API!

Crochet is a library for using Twisted more easily from blocking programs and libraries. The latest version, released here at PyCon 2014, includes a much improved API for calling into Twisted from threads. In particular, a timeout is passed in: if it is hit, the underlying operation is cancelled and an exception is raised. Not all APIs in Twisted support cancellation, but for those that do (or APIs you implement) this is a really nice feature. You get high-level timeouts (instead of blocking sockets' timeout-per-socket-operation) and automatic cleanup of resources if something takes too long.

#!/usr/bin/python
"""
Do a DNS lookup using Twisted's APIs.
"""
from __future__ import print_function

# The Twisted code we'll be using:
from twisted.names import client

from crochet import setup, wait_for
setup()


# Crochet layer, wrapping Twisted's DNS library in a blocking call.
@wait_for(timeout=5.0)
def gethostbyname(name):
    """Lookup the IP of a given hostname.

    Unlike socket.gethostbyname() which can take an arbitrary amount
    of time to finish, this function will raise crochet.TimeoutError
    if more than 5 seconds elapse without an answer being received.
    """
    d = client.lookupAddress(name)
    d.addCallback(lambda result: result[0][0].payload.dottedQuad())
    return d


if __name__ == '__main__':
    # Application code using the public API - notice it works in a normal
    # blocking manner, with no event loop visible:
    import sys
    name = sys.argv[1]
    ip = gethostbyname(name)
    print(name, "->", ip)

14 Apr 2014 8:52am GMT

Ned Batchelder: PyCon 2014

PyCon 2014 is over, and as usual, I loved every minute. There are a huge number of people that I know there, and about 5 different sub-communities that I feel an irrationally strong attachment to.

Some highlights:

My head is still spinning from the high-energy four days I've had; I'm sure I'm leaving out an important high point. I just love every minute!

On the downside, I did not see as much of Montreal as I would have liked, but we'll be back for PyCon 2015, so I have a second chance!

14 Apr 2014 3:36am GMT

11 Oct 2013

feedPython Software Foundation | GSoC'11 Students

Yeswanth Swami: How I kicked off GSoC

Zero to hero

What Prompted me??

I started my third year thinking I should do something that would set me apart from the rest, and one of my professors suggested that I apply for GSoC. I don't know why, but I took the suggestion rather seriously, thanks to a bet I had with one of my friends (who is about to complete his MBBS) that whoever earned money first would buy the other a pair of "RayBan" shades. Well, that was it. I was determined. I started my research early, probably at the start of February (I knew I wanted to buy my friend his shades, and buy mine too in the process).

What experience did I have before??

I started looking at previous years' GSoC projects (having had little experience with Open Source) and began learning how to contribute. I was also fascinated by the amount of knowledge one could gain just by googling and browsing web pages. I soon discovered what an immensely great tool email is: through it I could chat with anyone in the open source world, ask seemingly stupid questions, and always expect a gentle reply back with an answer. Well, that held me spellbound, and I knew I wanted to contribute to Open Source.

How did I begin??

Around the middle of March, my passion for Python as a programming language grew after I understood how easy the language is. On top of that, my popularity among my classmates increased when I started evangelizing Python (thanks to my seniors for introducing it; I guess I did a decent job popularizing the language). I started contributing to the PSF (Python Software Foundation), beginning with a simple documentation fix, and slowly my activity on IRC increased until I started liking a project that one of the community members proposed.

A twist in the story??

There I was, still a noob, not knowing how to convince my probable mentor that I could complete the project, given direction. At about this juncture, a fellow student (from a university in France) mailed this particular mentor saying that he was interested in the project. Remember, I was part of the mailing list and followed its happenings. So I was furious to learn that I had competition (having put in so much effort), and I was not willing to give up my project (knowing that this was the one project I actually understood and had started researching a little bit); the other projects required domain knowledge I didn't have. I went back to my teachers, seniors, friends and Google, and started asking the question, "how would I solve the problem the mentor posted?". I framed a couple of answers, though very noobish, but at least I could reply to the email thread posting my understanding of the problem and how I would solve it, and ask the various questions I had in mind. Well, the mentor replied, immediately to my surprise, with comments as well as answers to the questions I posed. Again, my nemesis/competitor replied back (he had good knowledge of the problem domain). I knew it was not going to be easy. So I went back again, through all my sources, improved my understanding of the problem and posted again. I guess there were about 20 mails in the thread before we (all three of us) decided we should catch up on IRC and discuss more.

The conclusion:

Well, on IRC, most of the senior members of the community were present, and they suggested extending the scope of the project (since two students were interested in one project and showed immense passion). Unsurprisingly, over multiple meetings, the project scope was expanded, both students were given equally important but independent tasks, and both got the opportunity to say they were Google Summer of Code students. Thank goodness we decided to build the project from scratch, giving us more than enough work on our plates.

Movie titles:

1) In the open source world, there is no competition, only "COLLABORATION".

2) Why give up, when you can win??

3) From Zero to Hero!!

4) A prodigy in making

p.s. I still owe my friend his shades. *sshole, I am still waiting for him to visit me so that I can buy him his shades and buy mine too. Also, I know it's been two years since the story happened, but it is never too late to share, don't you agree??


11 Oct 2013 5:39am GMT