26 Feb 2015

Planet Python

EuroPython Society: Farewell to John Pinner


John Pinner passed away this morning after fighting cancer for the past few months.

John Pinner was a long-time Python supporter and community builder. He ran his own company, Clockwork Software Systems, created one of the first open source payroll systems built on Python, called PayThyme, helped initiate the EuroPython conference in 2002, and started PyCon UK in 2007, serving as its benevolent chair ever since.

John chaired EuroPython 2009 and 2010 in Birmingham. In 2010, he received the Python Software Foundation Community Award.

But most of all he was a kind and modest man with a great talent to create a pleasant and open community atmosphere.

We will miss you, John, and your kind and comforting smile.

Thank you for all the wonderful moments,
-
Your friends from the EuroPython community

26 Feb 2015 10:55pm GMT

Hybrid Cluster: Powerstrip-flocker: Portable volumes using just the Docker CLI

Migration of data between hosts in a Docker cluster is a serious challenge. Until now a Docker data volume running on a server has been stuck there without powerful tooling for moving it along with its container to a new host. Despite how easy it is to spin up a Dockerized database, the difficulty of managing where the data resides has meant that most of us haven't yet containerized our data services.

The good news is that today, using nothing more than the Docker client, we can migrate our data volumes between servers, starting the process of containerizing our database, queue and key-value store microservices. Before seeing exactly how we can do this, let's look at a simple example.

Quick fire example of Docker data volume migration

Let's say we have 2 hosts, node1 and node2. We run the following Docker command on node1:

node1$ sudo docker run -v /flocker/test001:/data \
       ubuntu sh -c "echo powerflock > /data/file.txt"

This runs a container which writes some data to the volume. Ordinarily, if we then ran the container on any other host, we'd be foobared, because the data would not be there.

Here comes the magic - on node2, without any other commands in between - we run this Docker command:

node2$ sudo docker run -v /flocker/test001:/data \
       ubuntu sh -c "cat /data/file.txt"
powerflock

Now the container running on node2 has access to the data that was written on node1. Flocker moved the data between the nodes in order to make node2 the active master for that dataset.

You can use this to run any database or stateful application, not just cat and echo!

Portable volumes from directly within the Docker CLI

The secret behind this easy, fast data volume migration is powerstrip-flocker.

Powerstrip-flocker combines two technologies:

Flocker gives us portable volumes, and Powerstrip gives us extensions without wrapping Docker. Combine the two and we get portable volumes without wrapping Docker!

Step-by-step data volume migration using powerstrip-flocker

Here is a demo of powerstrip-flocker that you can try for yourself.

Note: this is a very early technology preview of what we're working on. The following description shows you how to spin up a development snapshot of Flocker, and a pre-alpha version of powerstrip and powerstrip-flocker. You have been warned!

Step 1

Clone this repository to your machine and change to the directory:

$ git clone https://github.com/ClusterHQ/powerstrip-flocker
$ cd powerstrip-flocker/vagrant

Step 2

Run vagrant up, which brings up 2 VMs (node1 and node2) - we will migrate data from node1 to node2:

$ vagrant up

Step 3

Connect to node1 and run a container that writes to a volume:

$ vagrant ssh node1
$ sudo docker run -v /flocker/test001:/data \
    ubuntu sh -c "echo powerflock > /data/file.txt"
$ exit

Step 4

Connect to node2 and run another container that reads the data we just wrote on node1:

$ vagrant ssh node2
$ sudo docker run -v /flocker/test001:/data \
    ubuntu sh -c "cat /data/file.txt"

This will print powerflock to the console, which means the data has been migrated!

You can also experiment with powerstrip-flocker on AWS with this README, which gives you docker + powerstrip + powerstrip-flocker + flocker on AWS!

A bit more detail on Powerstrip

A few weeks ago, we launched Powerstrip, a tool that lets you build prototypes of Docker extensions, called Powerstrip adapters. Powerstrip-flocker, which you've just seen, is a Powerstrip adapter providing portable data volumes. The rest of this post will provide a little more detail about the Powerstrip project itself. We hope you'll get involved, either by writing your own Powerstrip adapter or by getting involved in the official Docker extensions project itself.

The main benefit of Powerstrip is that multiple Powerstrip adapters can seamlessly combine behind the familiar Docker CLI and API. For example, you can have a storage adapter (e.g. powerstrip-flocker) running alongside a networking adapter (e.g. powerstrip-weave), all playing nice with your choice of orchestration framework.

Crucially for the community, this immediately enables composition and experimentation with prototypes of Docker extensions.

"There is a huge demand for customizing and extending Docker without breaking its standard API. Hundreds of Docker contributors are collaborating to make it happen. Powerstrip will help them experiment faster, without having to patch and rebuild Docker every time. It's a huge time saver and will help us find the right design faster. A must have if you're into hacking and customizing Docker"
-Solomon Hykes, CTO at Docker

Powerstrip is implemented as a configurable, pluggable HTTP proxy for the Docker API which lets you plug multiple Docker extension prototypes into the same Docker daemon. It is intended to allow quick prototyping, to figure out which integration points are needed to turn such prototype adapters into real Docker extensions. Inspired by this GitHub issue, Powerstrip allows you to build Docker extensions by implementing chained blocking webhooks triggered by arbitrary Docker API calls.

The following diagram shows how multiple adapters can inject blocking pre- and post-hooks on a Docker API request:

[Diagram: Powerstrip architecture]
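
To make the webhook mechanism concrete, here is a minimal, hypothetical sketch of a pre-hook adapter using only the Python standard library. The JSON field names (PowerstripProtocolVersion, Type, ClientRequest, ModifiedClientRequest) follow the protocol as described in the Powerstrip README at the time; treat them as assumptions and check the project documentation before relying on them.

# A minimal, hypothetical Powerstrip pre-hook adapter (Python 3, stdlib only).
# Field names such as "ClientRequest" and "ModifiedClientRequest" are
# assumptions based on the Powerstrip README of the time -- verify before use.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class AdapterHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers['Content-Length'])
        payload = json.loads(self.rfile.read(length).decode('utf-8'))

        # A pre-hook receives the client's Docker API request and may return
        # a modified version of it; here we just log it and pass it through.
        request = payload.get('ClientRequest', {})
        print('pre-hook for', request.get('Method'), request.get('Request'))

        response = json.dumps({
            'PowerstripProtocolVersion': 1,
            'ModifiedClientRequest': request,
        }).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(response)))
        self.end_headers()
        self.wfile.write(response)

if __name__ == '__main__':
    HTTPServer(('', 8080), AdapterHandler).serve_forever()

Powerstrip would then be configured to call this endpoint before forwarding the request to the Docker daemon; post-hooks work the same way on the response side.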

Why did we create Powerstrip?

At ClusterHQ we are participating in the ongoing effort in the Docker community to add an extensions API to Docker. The goal of this project is to make it possible for a rich ecosystem of tool builders to offer specialized services without forcing end-users to choose between different Docker experiences. Currently, the only way to build Docker extensions for something like storage or networking is to wrap the Docker API. This means that multiple extensions can't be used together, and the tool builders must reimplement the Docker API to expose a familiar interface to the user.

While the important work of building this extensions API happens in the open, there is interest from the community to start prototyping extensions today. So in order to enable the whole community to prototype and experiment faster, we open sourced Powerstrip, a tool for prototyping Docker extensions. If you like what you've read here, get involved at #clusterhq or the official Docker Extensions discussion at #docker-extensions, both on Freenode.

Get involved in the discussion over on HN!

The post Powerstrip-flocker: Portable volumes using just the Docker CLI appeared first on ClusterHQ.

26 Feb 2015 5:07pm GMT

Python Software Foundation: John Pinner

I am very sad to report that John Pinner has passed away. The Python Community has lost a great friend. John received a PSF Community Service Award in 2010 for his many contributions. He was a PSF fellow and an organizer of PyCon UK from 2007 to 2014 and of EuroPython from 2004 to 2011. He was also a frequent speaker at PyCons, and at workshops and users' groups, as well as an enthusiastic and effective advocate of Python and Open Source.


John was an original contributor to Free and Libre Open Source Software UK (FLOSS), which started out as the UK Unix Users Group (UKUUG). After working 21 years as Principal Engineer for The Rover Company Limited, he decided to found his own company, Clockwork Software Systems.
His dedication to and interest in Python are best expressed in his own words:
Thanks to Linux Journal I 'discovered' Python in 2000 and have been using it ever since; together with an occasional regression to C, it covers all my programming needs. I find that it gets in the way least of all the languages I have used, and brings back the joy to programming. I am proud to be an elected member of the Python Software Foundation, and am something of a Python evangelist, through running training courses and promoting such events such as PyCon UK.
I had the great honor and pleasure of meeting and spending time with John at PyCon UK in 2013. He was a delightful host, full of energy, knowledge about the locale (history, good beers and the best pubs, landmarks, neighborhoods, cathedrals), and enthusiastic good will. To give a sense of his warm-hearted and jovial personality to those who didn't know him, his intermediate-level Python tutorial included such topics as "WTF is Pythonic" and "it's not C, C++ or Java, don't try and make it so." Heartfelt condolences to his family and to all whose lives he touched. He will be sorely missed.

26 Feb 2015 4:58pm GMT

Fabio Zadrozny: Design for client-side applications in Python (but applicable to other languages too)


Ok, so, this is a post I wanted to write for some time already and never really got the time to do it...

First off, this is a design I've worked with several times over the years on client-side applications (Python and JavaScript). It revolves around some simple concepts which can be used to develop small to large applications, and it's suited to places where there are many long-lived instances -- which is usually the case in client-side applications (and usually not in server-side web applications).

These are also the concepts I used when implementing PyVmMonitor (http://pyvmmonitor.com).

For those interested, although PyVmMonitor is closed source, the non-domain specific bits are actually open source and may be found at the links below (so, they may be cloned in git and I'll reference them in this post):

https://github.com/fabioz/pyvmmonitor-framework
https://github.com/fabioz/pyvmmonitor-core

1. Plugin system based on interfaces (interfaces in this case being the Java concept of interfaces, or ABCs in Python) -- usually I like to be explicit about the interfaces provided and to register the implementors for those interfaces (if you want your programs to be extensible, you really should be programming based on interfaces -- for me it helps to think about how a client would consume it, instead of thinking about the issues of how to actually implement it).

This is also the mechanism that provides dependency-injection (so, you can swap out some implementor -- although it's not the classic dependency-injection, because you still ask for things instead of them appearing magically in some variable).

The structure would be something like:


pm = PluginManager()
pm.register(EPView, 'my.View') # As a note, EP is a short for 'Extension Point'.
pm.register(EPMenu, 'my.Menu1') # Implementors are registered as strings to avoid
pm.register(EPMenu, 'my.Menu2') # having to import the classes (to cut on startup time).


An example of use would be:


# Keeps the EPView instance alive inside the PluginManager
# (more details ahead on item #2) and starts the view main loop.
view = pm.get_instance(EPView).main_loop()

# The EPView implementation could create its menus by asking the EPMenus registered.
menus = pm.get_implementations(EPMenu)
for menu in menus:
    view.create_menu(menu)


The actual implementation that PyVmMonitor is using can be found at https://github.com/fabioz/pyvmmonitor-core/blob/master/pyvmmonitor_core/plugins.py

You can see that we're not worried about discoverability as most plugin frameworks are, as we're being explicit about registering things (and it should be easy to add that to the structure if we do need the extensibility for clients).

Also, in this particular implementation, I actually skipped the plugins as things are self-contained and I didn't want the added complexity, but usually you'd register plugins and then the extension points and implementations would be registered only through plugins (and would specify the dependencies to other plugins).

2. A place to hold your instances:

I like to keep track of where instances I created are... usually it's difficult to track things in long-lived applications (which is usually not a problem in web-based applications because the model is really the database and objects are short-lived).

In the Eclipse SDK, for instance, it's easy to see many singleton-like structures spread across many places, and there's no unified approach to it. Every place has an API, so there's a platform which has a workbench which has a part which has an editor... (I know, that's plain object orientation, but I find it lacking when extending things, and there's always a different API for accessing anything).

So, instead of using many different APIs, there are 2 main places where instances live -- and those are also the places to query for instances:

- the PluginManager, which keeps the long-lived extension-point implementations, and
- the EPModelsContainer, which keeps the domain model instances.

Note that this means that the 'ownership' of the items lies in one of these 2 places -- and they're quite different: when it's in the PluginManager, it'll be kept alive until the application finishes (although it's lazily started). For items in the EPModelsContainer, when any instance leaves the container it should be readily deleted, as well as anything it holds (so, other places should only keep a weak reference, or should monitor the removal of the item from the models container to do the proper cleanup, so that the object can be garbage collected at that point).

The actual extension point that PyVmMonitor is using can be found at https://github.com/fabioz/pyvmmonitor-framework/blob/master/pyvmmonitor_framework/extensions/ep_models_container.py (note that it provides class-based filtering and tree iteration, and could be extended to provide a jQuery-like API to query it).

Also, test-cases should check that before/after each test, all the instances created in the PluginManager and in the EPModelsContainer are garbage-collected! Note that on non-deterministic garbage-collection implementations such as PyPy/Jython this is not feasible, because objects aren't immediately collected, but on CPython, with reference counting, this works great. So, at least test on CPython -- then, if you don't have binary dependencies, run on PyPy :)
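
As an illustration of that kind of test, here is a minimal, hypothetical sketch (the ModelsContainer below is a stand-in I made up, not the real EPModelsContainer API) which passes on CPython thanks to reference counting:

# Sketch: assert that a model instance is collected once it leaves the
# container. On CPython, dropping the last strong reference collects it
# immediately; on PyPy/Jython the ref may linger until the next GC cycle.
import weakref

class MyModel(object):
    pass

class ModelsContainer(object):
    # Hypothetical stand-in for EPModelsContainer: it owns its instances.
    def __init__(self):
        self._items = {}
        self._next_id = 0

    def add(self, obj):
        self._next_id += 1
        self._items[self._next_id] = obj
        return self._next_id

    def get(self, obj_id):
        return self._items[obj_id]

    def remove(self, obj_id):
        del self._items[obj_id]

def test_instance_is_collected():
    container = ModelsContainer()
    obj_id = container.add(MyModel())
    ref = weakref.ref(container.get(obj_id))
    container.remove(obj_id)  # drop the only strong reference
    assert ref() is None, "instance leaked: something still holds a strong reference"

test_instance_is_collected()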

3. Callbacks:

As we've just defined that the instances' ownership is really in the EPModelsContainer, there's room for callbacks which only keep weak references to bound methods when you're interested in something -- in which case https://github.com/fabioz/pyvmmonitor-core/blob/master/pyvmmonitor_core/callback.py is a pretty good implementation for that... note that for top-level functions, strong references are kept.

Why, you may ask? Well, the main reason is that this use-case is usually for closures, so it may be hard to find a place to keep the function alive -- and if it's a top-level function, it will be alive until the end of times anyway (i.e.: process shutdown).

Anyways, this is probably a case that should only be used with care as unregistering must be explicit and things in the function scope will be kept alive!
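
The gist of the mechanism, as a simplified sketch (the real implementation in pyvmmonitor_core/callback.py handles more cases), is to store weakref.WeakMethod objects for bound methods and plain strong references for functions:

# Simplified sketch of a callback that holds only weak references to bound
# methods, so registering a listener does not keep its owner alive.
import weakref

class WeakCallback(object):
    def __init__(self):
        self._refs = []

    def register(self, func):
        if hasattr(func, '__self__'):
            # Bound method: WeakMethod (Python 3.4+) re-creates the method
            # only while its instance is still alive.
            self._refs.append(weakref.WeakMethod(func))
        else:
            # Plain function: strong reference (see the caveat above).
            self._refs.append(lambda f=func: f)

    def __call__(self, *args, **kwargs):
        for ref in self._refs[:]:
            func = ref()
            if func is None:
                self._refs.remove(ref)  # owner was collected: drop the entry
            else:
                func(*args, **kwargs)

With this, callback.register(view.on_changed) doesn't keep the view alive, which matches the ownership rules from item #2.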

4. A standard selection mechanism.

Again, as we just defined that all our model lives inside the EPModelsContainer, handling selection usually amounts to simply keeping the id(s) of the selected object(s).

In PyVmMonitor, the base extension for this is the EPSelectionService. The concept is pretty simple: clients can listen to changes in the selection (through a Callback), can trigger selection changes and can get the current selection. As PyVmMonitor accepts multiple selections to inspect multiple processes at once, it always deals with a list(obj_id), and clients react to the selection to show the proper UI.

The extension interface used in PyVmMonitor lives at https://github.com/fabioz/pyvmmonitor-framework/blob/master/pyvmmonitor_framework/extensions/ep_selection_service.py
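
As a sketch of the concept (not the actual interface, which lives at the URL above; names here are illustrative):

# Sketch: the selection is just a list of obj_ids; clients subscribe to
# changes (a real implementation would use the weak callback from item #3).
class SelectionService(object):
    def __init__(self):
        self._listeners = []
        self._selection = []

    def register_listener(self, listener):
        self._listeners.append(listener)

    def get_selection(self):
        return list(self._selection)

    def set_selection(self, obj_ids, source=None):
        obj_ids = list(obj_ids)
        if obj_ids != self._selection:
            self._selection = obj_ids
            # Listeners receive who triggered the change and the new selection.
            for listener in self._listeners:
                listener(source, list(obj_ids))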

5. Undo/redo: Well, in PyVmMonitor there's really no undo/redo functionality because it's not really required for what it does. But on other applications I worked on that had undo/redo, the basis was actually the model entities that entered the EPModelsContainer: if they implemented an interface saying that they provided changes in them, they'd be tracked, and commands would automatically be added to a command list for undo/redo purposes when the model changed (simply by recording the id of the object and the attribute's new/previous values -- as well as providing a memento for the specific object when it entered/left the EPModelsContainer)... This is a bit different from using the command pattern because it's all done from the outside: we're actually listening to changes in the model instead of changing how we code to add the command pattern (which IMHO makes code much more verbose than it needs to be).
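
A hypothetical sketch of that outside-in approach: whenever a tracked model object reports an attribute change, record a small command with the object's id and the old/new values, instead of routing every mutation through command classes. All names below are made up for illustration:

# Sketch: commands are recorded by listening to model changes, not by
# wrapping every mutation in a command class. 'models' is any container
# exposing get(obj_id), such as the models container from item #2.
class AttributeChange(object):
    def __init__(self, models, obj_id, attr, old, new):
        self.models, self.obj_id, self.attr = models, obj_id, attr
        self.old, self.new = old, new

    def undo(self):
        setattr(self.models.get(self.obj_id), self.attr, self.old)

    def redo(self):
        setattr(self.models.get(self.obj_id), self.attr, self.new)

class UndoStack(object):
    def __init__(self):
        self._done, self._undone = [], []

    def on_attribute_changed(self, models, obj_id, attr, old, new):
        # Hook this to the model's "changed" callback when it enters the
        # container. A real implementation would suppress recording while
        # an undo/redo is being applied (otherwise undoing records again).
        self._done.append(AttributeChange(models, obj_id, attr, old, new))
        self._undone = []  # a new change invalidates the redo list

    def undo(self):
        if self._done:
            change = self._done.pop()
            change.undo()
            self._undone.append(change)

    def redo(self):
        if self._undone:
            change = self._undone.pop()
            change.redo()
            self._done.append(change)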

6. The actual UI... Well, for PyVmMonitor I didn't go through the effort of actually creating a reusable UI, and the other projects I worked on which did have that concern aren't actually mine to open source. But the idea here would be making a view which would query for EPMenu, EPToolbar, EPAction, EPCentralWidget, EPDock, etc. and would create the actual UI from that. For PyVmMonitor I just have a simple non-extensible UI which listens to the EPSelectionService and updates its central editor accordingly, and the UI has a 'def set_model' which receives the actual model or models it should show.

Well, that's it, hope you liked the approach. As I said, I've used it several times over the last years and it works great for me (but I'm interested if there are better approaches out there which could be reused, or which improve on those aspects).

26 Feb 2015 3:46pm GMT

EuroPython Society: EuroPython 2015: Launch preparations are underway

The EuroPython Workgroups are busy preparing the launch of the website. Launched just in mid-January, all workgroups (WGs) are fully under steam by now, working hard to make EuroPython 2015 a fabulous event.

Community building the conference

The On-site Team WG is doing a wonderful job getting us the best possible deals in Bilbao, the Web WG is knee-deep in code and Docker containers setting up the website, the Marketing & Design WG is working with the designers to create wonderful logos and brochures, the Program WG is contacting keynote speakers and creating the call for proposals, the Finance WG is building the budget and making sure the conference stays affordable for everyone, the Support WG is setting up the online help desk to answer your questions, the Communications WG is preparing to create a constant stream of exciting news updates, and the Administration WG is managing the many accounts, contracts and services needed to run the organization.

The Financial Aid WG and Media WG are preparing to start their part of the conference organization later in March.

The WGs are all staffed with members from the ACPySS on-site team, the EuroPython Society and volunteers from the EuroPython community to drive the organization forward and we're getting a lot done in a very short time frame.

More help needed

We are very happy with the help we are getting from the community, but there still is a lot more to be done. If you want to help us build a great EuroPython conference, please consider joining one of the above workgroups.

Stay tuned and be sure to follow the EuroPython Blog for updates on the conference.

Enjoy,
-
EuroPython Society

26 Feb 2015 3:08pm GMT

Geert Vanderkelen: Connector/Python 2.1.1 Alpha released with C Extension

MySQL Connector/Python 2.1.1 took a while to release because we had to add some more packages which contain the optional C Extension. Note that this is still Alpha and we want you guys to report any problems and requests.

The Connector/Python C Extension was added because certain operations, for example reading a huge result set, can take a long time with pure Python. That's why we chose to interface with Connector/C (libmysqlclient).

Note: Pure Python is still the default and it will be kept that way!

Installing Connector/Python 2.1 didn't change much:

shell> python setup.py install

If you'd like the C Extension, you first have to install MySQL Connector/C or have the MySQL Server development packages available. Be careful when mixing 32-bit and 64-bit: make sure Python matches your MySQL libraries. Connector/Python will try to detect a mismatch and notify you.

For example, on OS X with development tools installed, I would do the following:

shell> virtualenv CPYENV
shell> source CPYENV/bin/activate
shell> tar xzf ~/Downloads/mysql-connector-c-6.1.5-osx10.7-x86_64.tar.gz
shell> tar xzf ~/Downloads/mysql-connector-python-2.1.1.tar.gz
shell> cd mysql-connector-python-2.1.1
shell> python setup.py install --with-mysql-capi=../mysql-connector-c-6.1.5-osx10.7-x86_64

If all goes well, the above will have compiled and installed the C Extension together with the pure Python code inside a virtual environment. Here is how you can check whether the C Extension is available:

>>> import mysql.connector
>>> mysql.connector.HAVE_CEXT
True

If you want to see the speed improvements, you can load up the employees sample database and do the following in the Python interpreter:

shell> python
>>> from time import time
>>> import mysql.connector
>>> cnx = mysql.connector.connect(user='root', database='employees')
>>> cnxc = mysql.connector.connect(use_pure=False, user='root', database='employees')
>>> cur = cnx.cursor()
>>> q = "SELECT * FROM salaries"
>>> s=time(); cur.execute(q); r=cur.fetchall(); print("%.2f" % (time()-s))
65.57
>>> cur = cnxc.cursor()
>>> s=time(); cur.execute(q); r=cur.fetchall(); print("%.2f" % (time()-s))
13.09

That's 66 seconds vs. 13 seconds using the C Extension.

If that is not fast enough (and it is not), you can directly load the C Extension and use the wrapper around the MySQL C API (see the manual). Here's an example:

>>> import _mysql_connector
>>> cnx = _mysql_connector.MySQL()
>>> cnx.connect(user='root', database='employees')
>>> cnx.query("SELECT emp_no, last_name, hire_date FROM employees")
True
>>> cnx.fetch_row()
(10001, 'Facello', datetime.date(1986, 6, 26))
>>> cnx.free_result()
>>> cnx.close()

It is a bit different from using mysql.connector, but notice that the result coming from the C Extension is also converted to Python data types.

How fast is using _mysql_connector? Let's say we want the raw data; save the following to a Python script file and execute it:

from time import time
import _mysql_connector
cnx = _mysql_connector.MySQL(raw=True)
cnx.connect(user='root', database='employees')
cnx.query("SELECT * FROM salaries")
s = time()
row = cnx.fetch_row()
while row:
  row = cnx.fetch_row()
cnx.free_result()

print("All fetched in %.2fs" % (time() - s))

cnx.close()

The output would be something like this:

All fetched in 2.25s

If you put it all together (and this is not scientific, just on my OS X MacBook), SELECT * FROM salaries comes down to roughly: 66 seconds with pure Python, 13 seconds with the C Extension through mysql.connector, and about 2.3 seconds fetching raw rows with _mysql_connector directly.

If you want to dump big sets of data, and you want to do it the Python way, you can use the C Extension to get it faster.
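
For instance, here's an illustrative sketch of a CSV dump built only on the calls demonstrated above; treat it as an assumption-laden example (in raw mode I'd expect values to come back undecoded, and the exact raw types may vary between versions):

# Illustrative sketch: dump a table to CSV using the raw C Extension API.
# Only the calls shown above are used; in raw mode values arrive undecoded,
# so we convert bytes to text ourselves.
import csv
import _mysql_connector

cnx = _mysql_connector.MySQL(raw=True)
cnx.connect(user='root', database='employees')
cnx.query("SELECT * FROM salaries")

with open('salaries.csv', 'w') as fp:
    writer = csv.writer(fp)
    row = cnx.fetch_row()
    while row:
        writer.writerow([v.decode('utf-8') if isinstance(v, (bytes, bytearray))
                         else v for v in row])
        row = cnx.fetch_row()

cnx.free_result()
cnx.close()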

Yes, the C Extension works and compiles on Windows!

26 Feb 2015 11:32am GMT

Montreal Python User Group: Django Carrots Workshop – PyCon Edition!


Django Carrots - PyCon Edition is a one-day, intensive, free programming workshop for anyone who wants to learn how to code.

Organized by Geek Girls Carrots (GGC), the Django Carrots curriculum emphasizes individual contact between the student and mentor (before, during and after the workshop), a horizontal structure (students learn from mentors but also from each other), and sharing general knowledge about communities, organization and internet resources to help our participants after the workshop.


How can I sign up?

Please fill out this registration form. Registration is open from February 2nd to March 1st. Everyone who registers should get a response before March 5th. You do not need a PyCon ticket to attend the Django Carrots workshop during PyCon, but we strongly encourage you to attend the conference as well.

What do I need during the workshop?

Remember: you don't need any prior knowledge of programming to attend (we mean that!). The most important thing is your motivation and readiness to learn!

Who are the mentors?

Our mentors are active programmers working in leading tech companies and universities in Europe and North America. In addition to knowledge and job experience, they are skilled educators who care about diversity and cultural changes in the tech industry.

We look forward to coding with you in Montreal!


Geek Girls Carrots is a global community and social enterprise focused on connecting, teaching and inspiring women in Tech and IT. We create community by organizing meetings, workshops, hackathons and other events, gathering people to share their knowledge and experience. We gather in 22 cities, including Warsaw, NYC, London, Berlin, Luxembourg and Sydney.

26 Feb 2015 5:00am GMT

25 Feb 2015

Planet Python

Ludovic Gasc: Open letter for the sync world

These days, I've seen more and more hate directed at the async community in Python, especially around AsyncIO.
I think this is sad and counter-productive.
I feel that for some people, frustrations or misunderstandings about the place of this new tool might be the cause, so I'd like to share some of my thoughts about it.


Just a proven pattern, not a "who has the biggest d*" contest


Some micro-benchmarks have been published to try to show that AsyncIO isn't really efficient.
We all know that benchmarks can be made to prove almost anything, and that the world isn't black or white.
So just for the sake of completeness, here are some macro-benchmarks based on Web applications examples: http://blog.gmludo.eu/2015/02/macro-benchmark-with-django-flask-and-asyncio.html


Now, before starting a ping-pong match to try to determine who has the biggest, please read on:

The asynchronous/coroutine pattern isn't some fancy new thing that hurts developer productivity and performance.
In fact, the idea of asynchronous, non-blocking IO has been around in many OSes and programming languages for years.
In Linux, for example, Asynchronous I/O support was added in kernel 2.5, back in 2003, and you can even find specifications dating back to 1997 (http://pubs.opengroup.org/onlinepubs/007908799/xsh/aio.h.html).
It started to gain more visibility with (amongst others) NodeJS a couple of years ago.
This pattern is now included in most new languages (Go...) and is being made available in older languages (Python, C#...).

Async isn't a silver bullet, especially for intensive calculations, but for I/O, at least from my experience, it seems to be much more efficient.
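
As a minimal illustration of the I/O case (a sketch in the Python 3.4 style of the day, not a benchmark): three one-second waits that would take 3 seconds sequentially finish in about 1 second when multiplexed on the event loop.

# Minimal sketch (Python 3.4 style): three 1-second "I/O waits" run
# concurrently on one thread and finish in roughly 1 second overall.
import asyncio

@asyncio.coroutine
def fake_io(name):
    # Stands in for a socket or database wait; while one coroutine sleeps,
    # the event loop runs the others.
    yield from asyncio.sleep(1)
    return name

loop = asyncio.get_event_loop()
results = loop.run_until_complete(
    asyncio.gather(fake_io('a'), fake_io('b'), fake_io('c')))
print(results)  # ['a', 'b', 'c'], after about 1 second instead of 3
loop.close()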


The lengthy but successful maturation process of a new standard


In the Python world, a number of alternatives were available (Gevent, Twisted, Eventlet, libevent, Stackless...), each with their own strengths and weaknesses.
Each of them went through a maturation process and could eventually be used in real production environments.

It was really clever of Guido to take all the good ideas from these async frameworks to create AsyncIO.
Instead of having a number of different frameworks, each of them reinventing the wheel on an island,
AsyncIO should help us have a "lingua franca" for doing async in Python.
This is pretty important because once you enter the async world, all your usual tools and libs (like your favourite DB lib) should also be async compliant.
Because AsyncIO isn't just a library: it will become the "standard" way to write async code with Python.


If Async means rewriting my perfectly working code, why should I bother?


To integrate AsyncIO cleanly into your library or your application, you have to rethink the internal architecture.
When you start a new project in "async mode", you can't keep part of it sync: to get all the async benefits, everything should be async.

But, this isn't mandatory from day 1: you can start simple, and port your code to the async pattern step-by-step.

I can understand some haters' reactions: the Internet is a big swarm where you have a lot of trends and hype.
In the end, few tools and patterns will really survive the fire of production.
Meanwhile, you've already written a lot of perfectly working code, and obviously you really don't want to rewrite it just for the promises of the latest buzzword.

It's like object-oriented programming years ago: it suddenly became the new "proper" way of writing your code (some said),
and you couldn't be object-oriented and procedural at the same time.
Years later, procedural isn't completely dead, because in fact OO sometimes brings unnecessary overhead.
It really depends on what sort of things you are writing (size matters!).
On the other hand, in 2015, who writes a full-Monty application with procedural code only?

I think one day, it will be the same for the async pattern.
It is always better to drive the change than to endure it.
Think organic: in the long term, it is not the strongest that survives, nor the most intelligent.
It is usually the one most open and adaptive to change.


Buzzword, or real paradigm change?


We don't know for sure if the async pattern is only a temporary fashion buzzword or a real paradigm shift in IT, just like virtualization has become a de-facto standard over the last few years.

But my feeling is that it is here to stay, even if it won't be relevant for all Python projects.
I think it will become the right way to build efficient and scalable I/O-bound projects.

For example, in an Internet (network) driven world, I see more and more projects centred around piping between cloud-based services.
For this type of development, I'm personally convinced a paradigm shift has become unavoidable, and for Pythonistas AsyncIO is probably the right horse to bet on.



Does anyone really care, or "will I be paid more"?


Let's face it: besides your fellow geeks, nobody cares about the tools you are using.
Your users just want features for yesterday, as few bugs as possible, and they want their application to be fast and responsive.
Who cares if you use async, or some other hoodoo-voodoo black magic, to reach that goal?

I think that, by starting a "religious war" between sync and async Python developers, we would all waste our (precious) time.
Instead, we should cultivate emulation between Pythonistas and build solutions that improve real-world performance and stability.
Then let Darwin show us the long-term path, and adapt to it.

In the end, the whole Python community will benefit if Python is considered a great language for writing business logic with ease AND with brute performance.
We are all tired of hearing people in other communities say that Python is slow; we are all convinced this is simply not true.

This is a communication war that the Python community has to win as a team.

PS: Special thanks to Nicolas Stein, aka Nike, for the review of this text and for his precious advice in general on taking a scientific approach to problems.

25 Feb 2015 4:48pm GMT

Ludovic Gasc: Macro-benchmark with Django, Flask and AsyncIO (aiohttp.web+API-Hour)

Disclaimer: If you have a bias against and/or dislike AsyncIO, please read my previous blog post before starting a war.

Tip: If you don't have time to read the text, scroll down to see the graphics.


Context of this macro-benchmark

Today, I propose benchmarking an HTTP daemon based on AsyncIO, and comparing the results with Flask and Django versions.

For those who haven't followed AsyncIO news, aiohttp.web is a light Web framework based on aiohttp; it's like Flask, but with fewer internal layers.
aiohttp is an implementation of HTTP on top of AsyncIO.
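To give a feel for the code, here is a minimal aiohttp.web handler (a sketch of my own, written in the pre-async/await coroutine style of early 2015; the route and payload are invented, and the real daemons are in the benchmark repository linked below):

import asyncio
from aiohttp import web

@asyncio.coroutine
def agents(request):
    # Reply with a small JSON document, as in the "Simple JSON document" test.
    return web.Response(body=b'{"agents": []}',
                        content_type='application/json')

app = web.Application()
app.router.add_route('GET', '/agents', agents)

# Serve with the stock event loop; API-Hour adds multiprocessing on top.
loop = asyncio.get_event_loop()
loop.run_until_complete(
    loop.create_server(app.make_handler(), '0.0.0.0', 8008))
loop.run_forever()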

Moreover, API-Hour helps you run multiprocess daemons with AsyncIO.
With this tool, we can compare Flask, Django and aiohttp.web under the same conditions.
This benchmark is based on a concrete need of one of our customers: they wanted a REST/JSON API to interact with their telephony server, based on Asterisk.
One of the WebServices returns the list of agents with their status. This WebService is heavily used because it powers their public Website (which itself has serious traffic) to show who is available.

First, I made an HTTP daemon based on Flask and Gunicorn, which gave honorable results. Later on, I replaced the HTTP part and pushed into production a daemon based on aiohttp.web and API-Hour.
A subset of these daemons is used for this benchmark.
I've added a Django version because, with Django and Flask, I certainly cover 90% of the tools used by Python Web developers.

I've tried to keep the same parameters for each daemon: for example, I obviously use the same number of workers, 16 in this benchmark.

I don't benchmark Django's manage.py server or Flask's dev HTTP server; I use Gunicorn, as most people do in production, to compare apples with apples.
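For reference, the kind of Gunicorn settings involved can be expressed in a small config file like this (a sketch only; the bind address is invented, and the exact settings are in the benchmark repository):

# gunicorn.conf.py -- passed with: gunicorn -c gunicorn.conf.py myapp:app
bind = '0.0.0.0:8000'  # illustrative address
workers = 16           # the same worker count used for every daemon here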

Hardware

Network benchmark

I get almost 1 Gbit/s on this network:

On Server:

$ iperf -c 192.168.2.101 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 28.6 MByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 192.168.2.101, TCP port 5001
TCP window size: 28.6 MByte (default)
------------------------------------------------------------
[ 5] local 192.168.2.100 port 24831 connected with 192.168.2.101 port 5001
[ 4] local 192.168.2.100 port 5001 connected with 192.168.2.101 port 16316
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.1 sec 1.06 GBytes 903 Mbits/sec
[ 5] 0.0-10.1 sec 1.11 GBytes 943 Mbits/sec


On Client:

$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 28.6 MByte (default)
------------------------------------------------------------
[ 4] local 192.168.2.101 port 5001 connected with 192.168.2.100 port 24831
------------------------------------------------------------
Client connecting to 192.168.2.100, TCP port 5001
TCP window size: 28.6 MByte (default)
------------------------------------------------------------
[ 6] local 192.168.2.101 port 16316 connected with 192.168.2.100 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0-10.0 sec 1.06 GBytes 908 Mbits/sec
[ 4] 0.0-10.2 sec 1.11 GBytes 927 Mbits/sec



System configuration

It's important to configure your PostgreSQL as a production server.
You also need to configure your Linux kernel to handle a lot of open sockets, plus some TCP tricks.
Everything is in the benchmark repository.

Client benchmark tool

From my experience with AsyncIO, Apache Benchmark (ab), Siege, Funkload and other old-fashioned HTTP benchmark tools can't hit an API-Hour daemon hard enough.
For now, I use wrk and wrk2 for benchmarking:
wrk hits as fast as possible, whereas wrk2 hits at a constant rate.

Metrics observed

I record three metrics:

  1. Requests/sec: the least interesting of the metrics (see below).
  2. Error rate: the sum of all errors (socket timeouts, socket read/write errors, 5XX errors...).
  3. Reactivity: certainly the most interesting of the three; it measures the time that our client will actually wait.


WebServices daemons

You can find all the source code in the API-Hour repository: https://github.com/Eyepea/API-Hour/tree/master/benchmarks
Each daemon has at least two WebServices:

On the Flask daemon, I added an /agents_with_pool endpoint to use a database connection pool with Flask; it doesn't work out well, as you'll see later (a sketch of such an endpoint follows below).
On the Django daemon, I added an /agents_with_orm endpoint to measure the overhead of using the Django ORM instead of raw SQL. Warning: I didn't find a way to generate exactly the same query.
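As an illustration, such a pooled Flask endpoint could look roughly like this (a sketch of mine: the DSN, table and query are invented, and the real code is in the repository):

from flask import Flask, jsonify
from psycopg2.pool import ThreadedConnectionPool

app = Flask(__name__)
# One pool per worker process.
pool = ThreadedConnectionPool(1, 16, 'dbname=benchmark user=benchmark')

@app.route('/agents_with_pool')
def agents_with_pool():
    conn = pool.getconn()
    try:
        with conn.cursor() as cur:
            cur.execute('SELECT name, status FROM agents')
            agents = [{'name': name, 'status': status}
                      for name, status in cur.fetchall()]
    finally:
        pool.putconn(conn)
    return jsonify(agents=agents)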

Methodology

Each daemon runs alone, to preserve resources.
Between runs, the daemon is restarted to make sure a previous test doesn't pollute the next one.

First round

At the beginning, to get an idea of the maximum number of HTTP queries each daemon can support, I attack quickly (for 30 seconds) on localhost.

Warning! This benchmark doesn't represent the reality you will have in production, because there is no network limitation or latency; it's only for calibration.

Simple JSON document

In each daemon folder of the benchmarks repository, you can read the output of each wrk run.
To simplify reading, I summarize the captured values in a table and graphs:


                       Requests/s    Errors   Avg Latency (s)
Django+Gunicorn             70598      4489              7.7
Flask+Gunicorn              79598      4433             13.16
aiohttp.web+API-Hour       395847         0              0.03


[Graphs: Requests per second (higher is better); Errors (lower is better); Latency in seconds (lower is better)]

Agents list from database


                          Requests/s    Errors   Avg Latency (s)
Django+Gunicorn                  583      2518            0.324
Django ORM+Gunicorn              572      2798            0.572
Flask+Gunicorn                   634      2985           13.16
Flask (connection pool)         2535     79704           12.09
aiohttp.web+API-Hour            4179         0            0.098

[Graphs: Requests per second (higher is better); Errors (lower is better); Latency in seconds (lower is better)]


Conclusions for the next round

Under high load, Django doesn't behave the same way as Flask: both handle more or less the same request rate, but Django penalizes the overall latency of HTTP queries less. The drawback is that its slow HTTP queries are very slow (26.43s for Django compared to 13.31s for Flask).
I removed the Django ORM test from the next round because the generated SQL query isn't exactly the same, and the performance difference with a raw SQL query is negligible.
I also removed the Flask DB connection pool because its error rate is too high compared to the other tests.

Second round

Here, I use wrk2, and changed the run time to 5 minutes.
A longer run time is very important because resource availability can change over time.
There are at least two reasons for this:

1. Your test environment runs on top of an OS which continues its activity during the test.
You therefore need a long run to be insensitive to transient use of your test machine's resources by other things,
like an OS daemon or a cron job triggering meanwhile.

2. The ramp-up of your test will gradually consume more resources at different levels: at the level of your Python scripts and libs,
as well as at the level of your OS / (virtual) machine.
This decrease in available resources will not necessarily be instantaneous, nor linear.
This is a typical source of after-deployment bad surprises in production.
Here too, to be as close as possible to a production scenario, you need to give your test time to reach a steady state, possibly saturating some resources.
Ideally you'd saturate the network first (which in this case is like winning the jackpot).

Here, I'm testing at a constant 4000 queries per second, this time over the network.

Simple JSON document


                       Requests/s    Errors   Avg Latency (s)
Django+Gunicorn              1799     26883            97
Flask+Gunicorn               2714     26742            52
aiohttp.web+API-Hour         3995         0             0.002

[Graphs: Requests per second (higher is better); Errors (lower is better); Latency in seconds (lower is better)]

Agents list from database


                       Requests/s    Errors   Avg Latency (s)
Django+Gunicorn               278     37480           141.6
Flask+Gunicorn                304     40951           136.8
aiohttp.web+API-Hour         3698         0             7.84


[Graphs: Requests per second (higher is better); Errors (lower is better); Latency in seconds (lower is better)]

(Extra) Third round

For fun, I used the same setup as the second round, but with only 10 requests per second for 30 seconds, to see whether under low load the sync daemons could be quicker, given the AsyncIO overhead.

Agents list from database


                       Requests/s    Errors   Avg Latency (s)
Django+Gunicorn                10         0          0.01936
Flask+Gunicorn                 10         0          0.01874
aiohttp.web+API-Hour           10         0          0.00642


[Graph: Latency in seconds (lower is better)]

Conclusion

AsyncIO with aiohttp.web and API-Hour increases the number of requests per second, but more importantly, you get no socket or 5XX errors, and the waiting time for each user is much better, even under low load. This benchmark uses an ideal network setup, so it doesn't cover the much worse scenario where your clients reach your Website over a slow network (think smartphone users).

It has been said often: if your webapp is your business, reducing waiting time is a key win for you.

Some clues to improve AsyncIO performance

Even if this looks like good performance, we shouldn't rest on our laurels; we can certainly find more optimizations:

  1. Use an alternative event loop: I've tested replacing the AsyncIO event loop and network layer with aiouv and quamash. For now, this doesn't have a huge impact; maybe it will in the future.
  2. Use multiplexed protocols from frontend to backend: HTTP/2 is a multiplexed protocol, meaning you can stack several HTTP queries without waiting for the first response. This pattern should increase AsyncIO performance, but that must be validated by a benchmark.
  3. If you have another idea, don't hesitate to post it in the comments.

Don't take architectural decisions based on micro-benchmarks

It's important to be very cautious with benchmarks, especially micro-benchmarks. Check several different benchmarks, using different scenarios, before settling on an architecture for your application.

Don't forget this is all about I/O-bound workloads

If I were working for an organisation with a lot of CPU-bound projects (a scientific organisation, for example), my speech would be totally different.
But my day-to-day challenges are more about I/O than about CPU, probably like those of most Web developers.

Don't simply take me as a mentor: the needs and problems of one person or organisation are not necessarily the same as yours, even if that person is considered a "guru" in one open source community or another.

We should all try to keep a rational, scientific approach, rather than a religious one, when selecting our tools.
I hope this post gives you some ideas to experiment with. Feel free to share your tips for increasing performance; I'd be glad to include them in my benchmarks!

I hope that these benchmarks will be an eye-opener for you.

25 Feb 2015 4:47pm GMT

Kushal Das: FUDCON Pune 2015 CFP is open

FUDCON, the Fedora Users and Developers Conference, will next be held in Pune, India, from 26th to 28th June at the Maharashtra Institute of Technology College of Engineering (MIT COE). The call for proposals (CFP) is already out, and 9th March is the last date to submit a talk or workshop proposal. If you are working on any upstream project, you may want to talk about your work to a technical crowd at the conference. If you are a student and want to talk about the latest patches you have submitted to an upstream project, this is the right place to do so. Maybe you have never spoken in front of a crowd like this before; you can start by submitting a talk to FUDCON.

A few tips for your talk/workshop proposal

In case you need help with your proposal, you can show it to other community members before submitting it. You can always find a few of us in the #fedora-india IRC channel.

So don't waste time: go ahead and submit a talk or workshop proposal.

25 Feb 2015 4:45pm GMT

Europython: Our first keynote speaker: Guido van Rossum

We are pleased to announce our first keynote speaker for EuroPython 2015:


Guido van Rossum

Python's creator: Guido van Rossum

Guido will give a keynote and a more technical talk about the new type hinting proposal for Python 3.5 that's currently being discussed as PEP 483 (The Theory of Type Hints) and PEP 484 (Type Hints).

Enjoy,
-
EuroPython Society

25 Feb 2015 2:32pm GMT

Django Weblog: Django 1.8 beta 1 and 1.7.5 released

Today the Django team has released Django 1.8 beta 1, a preview/testing package that represents the second stage in the 1.8 release cycle and an opportunity for you to try out some of the changes coming in Django 1.8.

Django 1.8 has been designated as Django's second "Long-Term Support" (LTS) release. It will receive security updates for at least three years after its release. Support for the previous LTS, Django 1.4, will end 6 months from the release date of Django 1.8.

For full details, see the in-development 1.8 release notes.

Only bugs in new features and regressions from earlier versions of Django will be fixed between now and 1.8 final (also, translations will be updated following the "string freeze" when the release candidate is issued). While the beta release was delayed a week and a half from the originally planned date, we've fixed several issues that should result in a better experience with the beta. The current release schedule calls for the release candidate in two and a half weeks from now. We'll judge from the number of bug reports we get between now and then whether or not to delay the release candidate. Watch the django-developers mailing list thread for updates.

As with all alpha and beta packages, this is not for production use. But if you'd like to take some of the new features for a spin, or to help find and fix bugs (which should be reported to the issue tracker), you can grab a copy of the package from our downloads page. And as always, signed MD5, SHA1, and SHA256 checksums of the 1.8 beta package are available.

In addition to the beta release, we've issued a bug fix release for the 1.7 series, 1.7.5. See the release notes for details.

The PGP key ID used for these releases is Tim Graham: 1E8ABDC773EDE252.

25 Feb 2015 2:02pm GMT

Machinalis: Reading TechCrunch

When we discussed Information Extraction and IEPY among professional peers, we noticed that the approach was often unknown to those who could benefit from it the most. Its main beneficiaries are those with large volumes of unstructured or poorly structured text, where it is very costly to go through the text manually to extract relationships (e.g., in the VC industry: funding, acquisitions, creation or opening of offices, etc.) between entities (companies, investment funds, people, and so on).

To create an example aimed at those with perhaps less of a technical background, we processed the news articles from TechCrunch, the main technology blog in the United States, looking for funding relationships involving U.S. companies. We published the result and found some interesting things:

VC Industry and Specialized Press

The publication of news about funding may result from investigation by specialized journalists, or may be pushed by the companies themselves, who manage to place their news within mainstream media content.

Checking the funding-related content in TechCrunch News posts and comparing it to more complete databases can therefore show us the editorial policies these journalists follow, or how efficient companies are at placing their own content.

So, for example, in the funded-companies vs. average-funding chart (currently one of the main discussion topics) you can see a growing gap between the events covered in the more general database (CrunchBase) and those covered by TechCrunch News.

Since last year there has been a tendency to cover events where the funding amount was greater than the average in the CrunchBase database. Based on this data, higher-value funding events seem to attract more attention from journalists than below-average ones.

Considering the geographical distribution of event coverage




Some of the highlights we can see include:

And so on.

In summary, what was the advantage of this approach?

If you wanted an overall view, you could include content from other blogs like Gigaom, VentureBeat, TWSJ, Forbes Tech, Mashable, Wired, The Verge, etc. without extra effort, once the tool has learned to identify and predict the relationships (e.g. funding to companies).

And of course, as the demo outlines, we were able to read several thousand news articles, extract the information to build a database, and make the demo, without arousing the deep murderous rage that reading ~100k articles looking for that relationship can awaken.




Disclaimer: this post and the demo don't pretend to be a comprehensive analysis of the VC industry, but to show what information extraction can be used for.

25 Feb 2015 1:14pm GMT

PyPy Development: Experiments in Pyrlang with RPython

Pyrlang is an Erlang BEAM bytecode interpreter written in RPython.

It implements approximately 25% of the BEAM instructions. It supports integer calculations (but not bigints), closures, exception handling, some operators on atoms, lists and tuples, user modules, and multiple processes on a single core. Pyrlang is still in development.

There are some differences between BEAM and the VM of PyPy:

Regarding the bytecode dispatch loop, Pyrlang uses a while loop to fetch instructions and operands, call the function corresponding to each instruction, and jump back to the head of the while loop. Due to the differences between the RPython call stack and BEAM's Y register, we decided to implement and manage the Y register by hand. PyPy, on the other hand, uses RPython's call stack to implement Python's call stack; as a result, the function for the dispatch loop in PyPy calls itself recursively. This does not happen in Pyrlang.

The Erlang compiler (erlc) usually compiles the bytecode instructions for function invocation into CALL (for normal invocation) and CALL_ONLY (for tail-recursive invocation). You can use trampoline semantics to implement them.

The current implementation only inserts the can_enter_jit JIT hint after the CALL_ONLY instruction. This means that the JIT only traces the tail-recursive invocations in Erlang code, which have a semantic very similar to loops in imperative programming languages like Python.
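Schematically, the dispatch loop with its JIT hints looks something like the sketch below (simplified and hypothetical, not Pyrlang's actual code; the opcode constant and helper functions are placeholders):

from rpython.rlib.jit import JitDriver

# CALL_ONLY, do_tail_call and execute_one are placeholders for this sketch.
jitdriver = JitDriver(greens=['pc', 'code'], reds=['frame'])

def dispatch_loop(code, frame):
    # pc: program counter; code: BEAM bytecode; frame: Y registers etc.
    pc = 0
    while True:
        jitdriver.jit_merge_point(pc=pc, code=code, frame=frame)
        instr = code[pc]
        if instr == CALL_ONLY:
            # Tail-recursive invocation: jump to the callee's entry point.
            pc = do_tail_call(code, pc, frame)
            # A tail call closes an app-level loop, so only here do we
            # give the tracing JIT a chance to start a trace.
            jitdriver.can_enter_jit(pc=pc, code=code, frame=frame)
        else:
            pc = execute_one(instr, code, pc, frame)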

We have also written a scheduler to implement language-level processes on a single core. The scheduler keeps a runnable queue. On each iteration, it pops one element (a process object with a dispatch loop) from the queue and executes that process's dispatch loop. Inside the dispatch loop there is a counter called the "reduction": it is decremented as the loop executes, and when it reaches 0, the dispatch loop terminates. The scheduler then pushes the process onto the runnable queue again, pops the next element from the queue, and so on.
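In shape, that scheduler looks something like this (a simplified sketch; process.dispatch is a stand-in for the per-process dispatch loop described above, returning whether the process still has work to do):

from collections import deque

DEFAULT_REDUCTION = 2000  # the default budget, as in the official Erlang VM

class Scheduler(object):
    def __init__(self):
        self.runnable = deque()

    def run(self):
        while self.runnable:
            process = self.runnable.popleft()
            # The process runs its dispatch loop until its reduction
            # budget reaches zero...
            alive = process.dispatch(reductions=DEFAULT_REDUCTION)
            if alive:
                # ...then goes to the back of the runnable queue.
                self.runnable.append(process)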

We are planning to implement a multi-process scheduler for multi-core CPUs, which will require multiple schedulers and possibly one runnable queue per core, but that will be another story. :-)

Methods

We wrote two benchmark programs of Erlang:

  • FACT: A benchmark that calculates the factorial in a tail-recursive style; because we haven't implemented bigints, we take a remainder of the argument at each iteration, so the number never overflows (see the sketch after this list).
  • REVERSE: This benchmark creates a reversed list of numbers, such as [20000, 19999, 19998, …], and applies a bubble sort to it.
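For readers who don't read Erlang, the core of FACT behaves roughly like the following Python loop (an illustration only; the modulus constant is invented, and the real benchmark is written as a tail-recursive Erlang function):

M = 2 ** 31 - 1  # illustrative: keep values inside the smallint range

def fact(n, acc=1):
    while n > 1:              # the tail recursion becomes a loop
        acc = (acc * n) % M   # the remainder keeps the number from overflowing
        n -= 1
    return acc

print(fact(100000))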

Results

The Value of Reduction

We used REVERSE to evaluate the JIT with different reduction values:

The X axis is the reduction value, and the Y axis is the execution time (in seconds).

It seems that when the reduction value is small, it influences performance significantly, but once it becomes larger, it only increases the speed very slightly. We use 2000 as the default reduction value (the same default as the official Erlang interpreter).

Surprisingly, the trace is always generated, even when the reduction value is very small, such as 0, which means the dispatch loop can only run for a very limited number of iterations and the language-level process executes fewer instructions than an entire loop in one scheduler switch. The generated trace is almost the same, regardless of the reduction value.

Actually, the RPython JIT only cares about what code it meets, not who executes it, so the JIT always generates the results above. The trace can even be shared among different threads if they execute the same code.

The overhead at low reduction values may come from the scheduler, which switches between processes too frequently, or from too-frequent switching between the bytecode interpreter and native code, but not from the JIT itself.

Here is more explanation from Armin Rigo:

"The JIT works well because you're using a scheme where some counter is decremented (and the soft-thread interrupted when it reaches zero) only once in each app-level loop. The soft-thread switch is done by returning to some scheduler, which will resume a different soft-thread by calling it. It means the JIT can still compile each of the loops as usual, with the generated machine code containing the decrease-and-check-for-zero operation which, when true, exits the assembler."

Fair Process Switching vs. Unfair Process Switching

We were also concerned about when to decrease the reduction value. In our initial version of Pyrlang, we decreased it at every local function invocation, module function invocation, and BIF (built-in function) invocation, since this is what the official Erlang interpreter does. However, since the JIT in RPython basically traces the target-language loop (the tail-recursive invocation in Pyrlang), it is typically better to keep the loop whole across a switch of the language-level process. We modified Pyrlang so that the reduction decrement only occurs after CALL_ONLY, which is the loop boundary of the target language.

Of course, this strategy may cause "unfair" execution among language-level processes. For example, if one process consists of a single long code sequence, it executes until the end of that code, while a process with a very short loop may execute only a limited number of steps before being switched out by the scheduler. However, in the real world this "unfairness" is usually considered acceptable, and it is used in many VM implementations, including PyPy, to improve overall performance.

We compared these two versions of Pyrlang on the FACT benchmark. The reduction decrement behaviour is quite different because there are some BIF invocations inside the loop: in the old version, the process can be suspended at loop boundaries or at any other function invocation, but in the new version, it can be suspended only at loop boundaries.

We found that the strategy is effective, removing around 7% of the overhead. We also compared it on REVERSE, but since there are no extra invocations inside the trace there, it cannot provide any performance improvement. In the real world, we believe there is usually more than one extra invocation inside a single loop, so this strategy should be effective in most cases.

Comparison with Default Erlang and HiPE

We compared the performance of Pyrlang with the default Erlang interpreter and with the HiPE (High Performance Erlang) compiler. HiPE is an official Erlang compiler that compiles Erlang source code to native code; the speed of Erlang programs obviously improves, at the cost of generality.

Please note that Pyrlang is still in development, so in some situations it does less work than the default Erlang interpreter, such as not checking for integer overflow when dealing with big integers, and not taking locks when accessing message queues in language-level processes; it is therefore faster. The final version of Pyrlang may be slower.

We used the two benchmark programs above, and made sure both of them run for more than five seconds to cover the JIT warm-up time of RPython. The experiment environment is an OS X 10.10 machine with a 3.5GHz 6-core Intel Xeon E5 CPU and 14GB of 1866MHz DDR3 ECC memory.

Let's look at the FACT results. The graph shows that Pyrlang runs 177.41% faster on average than Erlang, and at almost the same speed as HiPE. However, since we haven't implemented big integers in Pyrlang, the arithmetic operators do not do any overflow checking; it is likely that the final version of Pyrlang will be slower than both the current version and HiPE.

As for REVERSE, the graph shows that Pyrlang runs 45.09% faster than Erlang, but 63.45% slower than HiPE on average. We think this is reasonable because there are only a few arithmetic operators in this benchmark, so the speeds of the three implementations are closer. However, we observed that at the scale of 40,000 elements, Pyrlang slows down significantly (111.35% slower than HiPE) compared with the other two scales (56.38% and 22.63% slower than HiPE).

So far, we can only hypothesize about why Pyrlang slows down at that scale. We suspect the overhead comes from GC. The BEAM bytecode provides GC hints to help the default Erlang VM perform some GC operations immediately: for example, using GC_BIF instead of a plain BIF instruction tells the VM that there may be a GC opportunity, and how many live variables are around that instruction. Pyrlang does not use these hints and relies entirely on RPython's GC. When there is a huge number of objects at runtime (for REVERSE, the Erlang list objects), the speed therefore drops.

Ruochen Huang

25 Feb 2015 11:13am GMT

PyPy Development: Experiments in Pyrlang with RPython

Pyrlang is an Erlang BEAM bytecode interpreter written in RPython.

It implements approximately 25% of BEAM instructions. It can support integer calculations (but not bigint), closures, exception handling, some operators to atom, list and tuple, user modules, and multi-process in single core. Pyrlang is still in development.

There are some differences between BEAM and the VM of PyPy:

Regarding bytecode dispatch loop, Pyrlang uses a while loop to fetch instructions and operands, call the function corresponding to every instruction, and jump back to the head of the while loop. Due to the differences between the RPython call-stack and BEAM's Y register, we decided to implement and manage the Y register by hand. On the other hand, PyPy uses RPython's call stack to implement Python's call stack. As a result, the function for the dispatch loop in PyPy calls itself recursively. This does not happen in Pyrlang.

The Erlang compiler (erlc) usually compiles the bytecode instructions for function invocation into CALL (for normal invocation) and CALL_ONLY (for tail recursive invocation). You can use a trampoline semantic to implement it:

The current implementation only inserts the JIT hint of can_enter_jit following the CALL_ONLY instruction. This means that the JIT only traces the tail-recursive invocation in Erlang code, which has a very similar semantic to the loop in imperative programming languages like Python.

We have also written a single scheduler to implement the language level process in a single core. There is a runable queue in the scheduler. On each iteration, the scheduler pops one element (which is a process object with dispatch loop) from the queue, and executes the dispatch loop of the process object. In the dispatch loop, however, there is a counter-call "reduction" inside the dispatch loop. The reduction decrements during the execution of the loop, and when the reduction becomes 0, the dispatch loop terminates. Then the scheduler pushes that element into the runable queue again, and pops the next element for the queue, and so on.

We are planning to implement a multi-process scheduler for multi-core CPUs, which will require multiple schedulers and even multiple runable queues for each core, but that will be another story. :-)

Methods

We wrote two benchmark programs of Erlang:

  • FACT: A benchmark to calculate the factorial in a tail-recursive style, but because we haven't implemented big int, we do a remainder calculation to the argument for the next iteration, so the number never overflows.
  • REVERSE: The benchmark creates a reversed list of numbers, such as [20000, 19999, 19998, …], and applies a bubble sort to it.

Results

The Value of Reduction

We used REVERSE to evaluate the JIT with different values of reduction:

The X axis is the value of reduction, and the Y axis is the execution time (by second).

It seems that when the value of reduction is small, the reduction influences the performance significantly, but when reduction becomes larger, it only increases the speed very slightly. In fact, we use 2000 as the default reduction value (as well as the reduction value in the official Erlang interpreter).

Surprisingly, the trace is always generated even when the reduction is very small, such as 0, which means the dispatch loop can only run for a very limited number of iterations, and the language level process executes fewer instructions than an entire loop in one switch of the scheduler). The generated trace is almost the same, regardless of different reduction values.

Actually, the RPython JIT only cares what code it meets, but does not care who executes it, thus the JIT always generates the results above. The trace even can be shared among different threads if they execute the same code.

The overhead at low reduction value may be due to the scheduler, which switches from different processes too frequently, or from the too-frequent switching between bytecode interpreter and native code, but not from JIT itself.

Here is more explanation from Armin Rigo:

"The JIT works well because you're using a scheme where some counter is decremented (and the soft-thread interrupted when it reaches zero) only once in each app-level loop. The soft-thread switch is done by returning to some scheduler, which will resume a different soft-thread by calling it. It means the JIT can still compile each of the loops as usual, with the generated machine code containing the decrease-and-check-for-zero operation which, when true, exits the assembler."

Fair Process Switching vs. Unfair Process Switching

We are also concerned about the timing for decreasing reduction value. In our initial version of Pyrlang, we decrease reduction value at every local function invocation, module function invocation, and BIF (built-in function) invocation, since this is what the official Erlang interpreter does. However, since the JIT in RPython basically traces the target language loop (which is the tail recursive invocation in Pyrlang) it is typically better to keep the loop whole during a switch of the language level process. We modified Pyrlang, and made the reduction decrement only occur after CALL_ONLY, which is actually the loop boundary of the target language.

Of course, this strategy may cause an "unfair" execution among language level processes. For example, if one process has only a single long-sequence code, it executes until the end of the code. On the other hand, if a process has a very short loop, it may be executed by very limited steps then be switched out by the scheduler. However, in the real world, this "unfairness" is usually considered acceptable, and is used in many VM implementations including PyPy for improving the overall performance.

We compared these two versions of Pyrlang in the FACT benchmark. The reduction decrement is quite different because there are some BIF invocations inside the loop. In the old version the process can be suspended at loop boundaries or other function invocation, but in the new version, it can be suspended only at loop boundaries.

We show that the strategy is effective, removing around 7% of the overhead. We have also compared it in REVERSE, but since there are no extra invocations inside the trace, it cannot provide any performance improvement. In the real world, we believe there is usually more than one extra invocation inside a single loop, so this strategy is effective for most cases.

Comparison with Default Erlang and HiPE

We compared the performance of Pyrlang with the default Erlang interpreter and the HiPE (High Performance Erlang) complier. HiPE is an official Erlang compiler that can compile Erlang source code to native code. The speed of Erlang programs obviously improves but loses its generality instead.

Please note that Pyrlang is still in development, so in some situations it does less work than the default Erlang interpreter, such as not checking integer overflow when dealing with big integer, and not checking and adding locks when accessing message queues in the language-level process, so is therefore faster. The final version of Pyrlang may be slower.

We used the two benchmark programs above, and made sure both of them are executed for more than five seconds to cover the JIT warm-up time for RPython. The experiment environment is a OS X 10.10 machine with 3.5GHZ 6-core Intel Xeon E5 CPU and 14GB 1866 MHz DDR3 ECC memory.

Let's look at the result of FACT. The graph shows that Pyrlang runs 177.41% faster on average than Erlang, and runs at almost the same speed as HiPE. However, since we haven't implemented big integer in Pyrlang, the arithmetical operators do not do any extra overflow checking. It is reasonable that the final version for Pyrlang will be slower than the current version and HiPE.

As for REVERSE, the graph shows that Pyrlang runs 45.09% faster than Erlang, but 63.45% slower than HiPE on average. We think this is reasonable because there are only few arithmetical operators in this benchmark so the speeds of these three implementations are closer. However, we observed that at the scale of 40,000, the speed of Pyrlang slowed down significantly (111.35% slower than HiPE) compared with the other two scales (56.38% and 22.63% slower than HiPE).

So far we can only hypothesize about why Pyrlang slows down at that scale. We suspect the overhead comes from GC, because BEAM bytecode provides GC hints that help the default Erlang VM perform some GC operations immediately. For example, using a GC_BIF instruction instead of a plain BIF instruction tells the VM that there may be a GC opportunity, and how many live variables exist around the instruction. Pyrlang does not use these hints and relies entirely on RPython's GC. When there are a huge number of objects at runtime (for REVERSE, the Erlang list objects), performance therefore degrades.
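As a toy illustration of what such a hint buys, the sketch below contrasts the two instruction forms. The ToyVM class and its methods are invented for this example and are not Pyrlang's or BEAM's real internals:

# With a GC_BIF, the bytecode tells the VM how many x-registers are
# live, so it can collect eagerly and treat only those registers as
# roots. A plain BIF carries no such hint. Illustrative names only.
class ToyVM(object):
    def __init__(self):
        self.heap_used, self.heap_size = 0, 1024

    def heap_nearly_full(self):
        return self.heap_used > 0.9 * self.heap_size

    def minor_collect(self, roots):
        # Keep only objects reachable from the declared live registers.
        self.heap_used = len(roots)

    def exec_gc_bif(self, bif, args, x_registers, live):
        if self.heap_nearly_full():
            self.minor_collect(roots=x_registers[:live])  # collect right now
        return bif(*args)

    def exec_bif(self, bif, args):
        # No liveness hint: collection must wait for a generic GC pass.
        return bif(*args)

vm = ToyVM()
vm.heap_used = 1000  # pretend the heap is nearly full
print(vm.exec_gc_bif(len, (["a", "b"],), ["x0", None, None], live=1))  # -> 2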

Ruochen Huang

25 Feb 2015 11:13am GMT

Kushal Das: My talk in MSF, India

Last week I gave a talk on Free and Open Source Software at the Metal and Steel Factory, Indian Ordnance Factories, Ishapore, India. I had met Mr. Amartya Talukdar, a well known activist and blogger from Kolkata, at a bloggers' meet. He currently manages the I.T. team there, and he arranged the talk to spread more awareness of FOSS.

I reached the main gate an hour before the talk. The security guards came around to ask why I was standing there on the road; I was sure this was going to happen again. Going into the factory along with Mr. Talukdar, I was stopped by armed guards at least three times. They also took my mobile phone; I had left my camera at home for the same reason.

Before the talk I met the I.T. department and a few developers who work there. Around 9:40 am we moved to the big conference room, and the talk started with Mr. Talukdar giving a small introduction. I was not sure how many technical people would attend, so the talk was less technical and more demo-oriented. The room was almost full within a few minutes, and I hope my introductions to FOSS, Fedora, and Python went well. I was carrying a few Python docs and some Fedora stickers. I spent most of the time demoing tools that can increase the productivity of the management: we saw reStructuredText, rst2pdf and Sphinx for managing documents, and we also looked into version control systems and how to use them. We talked a bit about ownCloud, but without network access I could not demo it. I also demoed various small Python scripts I use to keep my life simple. I learned about the FOSS tools they are already using: they run Linux on their servers, and my biggest suggestion was to use Linux on the desktops too, since viruses are a common problem that can easily be eliminated that way.

My talk ended around 12 pm. After lunch, while walking back to the factory, Mr. Talukdar showed me various historical places and items from the Dutch and British colonial days. Of course there were the security checks again on the way out and back in.

We spent the next few hours discussing various technology and workflow related queries with the Jt. General Manager, Mr. Neeraj Agrawal. It was very nice to see that he keeps up with all the latest news from the FOSS and technology world; we need more people like him, open to new ideas and capable of managing both worlds. In the future we will run a few workshops targeting the needs of the factory's developers.

25 Feb 2015 9:06am GMT


Vasudev Ram: Publish SQLite data to PDF using named tuples

By Vasudev Ram


Some time ago I had written this post:

Publishing SQLite data to PDF is easy with xtopdf.

It showed how to get data from an SQLite (Wikipedia) database and write it to PDF, using xtopdf, my open source PDF creation library for Python.

Today I was browsing the Python standard library docs and thought of modifying that program to use the namedtuple data type from the collections module, which the docs describe as implementing "high-performance container datatypes". The collections module was introduced in Python 2.4, though namedtuple itself was only added in Python 2.6.
Here is a modified version of that program, SQLiteToPDF.py, called SQLiteToPDFWithNamedTuples.py, that uses named tuples:

# SQLiteToPDFWithNamedTuples.py
# Author: Vasudev Ram - http://www.dancingbison.com
# SQLiteToPDFWithNamedTuples.py is a program to demonstrate how to read
# SQLite database data and convert it to PDF. It uses the Python
# data structure called namedtuple from the collections module of
# the Python standard library.

from __future__ import print_function
import sys
from collections import namedtuple
import sqlite3
from PDFWriter import PDFWriter

# Helper function to output a string to both screen and PDF.
def print_and_write(pw, strng):
    print(strng)
    pw.writeLine(strng)

# Define these up front so the finally clause can test them safely.
conn = None
pw = None

try:
    # Create the stocks database.
    conn = sqlite3.connect('stocks.db')
    # Get a cursor to it.
    curs = conn.cursor()

    # Create the stocks table.
    curs.execute('''DROP TABLE IF EXISTS stocks''')
    curs.execute('''CREATE TABLE stocks
        (date text, trans text, symbol text, qty real, price real)''')

    # Insert a few rows of data into the stocks table.
    curs.execute("INSERT INTO stocks VALUES ('2006-01-05', 'BUY', 'RHAT', 100, 25.1)")
    curs.execute("INSERT INTO stocks VALUES ('2007-02-06', 'SELL', 'ORCL', 200, 35.2)")
    curs.execute("INSERT INTO stocks VALUES ('2008-03-07', 'HOLD', 'IBM', 300, 45.3)")
    conn.commit()

    # Create a namedtuple to represent stock rows.
    StockRecord = namedtuple('StockRecord', 'date, trans, symbol, qty, price')

    # Run the query to get the stocks data.
    curs.execute("SELECT date, trans, symbol, qty, price FROM stocks")

    # Create a PDFWriter and set some of its fields.
    pw = PDFWriter("stocks.pdf")
    pw.setFont("Courier", 12)
    pw.setHeader("SQLite data to PDF with named tuples")
    pw.setFooter("Generated by xtopdf - https://bitbucket.org/vasudevram/xtopdf")

    # Write header info.
    hdr_flds = [str(hdr_fld).rjust(10) + " " for hdr_fld in StockRecord._fields]
    hdr_fld_str = ''.join(hdr_flds)
    print_and_write(pw, '=' * len(hdr_fld_str))
    print_and_write(pw, hdr_fld_str)
    print_and_write(pw, '-' * len(hdr_fld_str))

    # Now loop over the fetched data and write it to PDF.
    # Map the StockRecord namedtuple's _make class method
    # (that creates a new instance) to all the rows fetched.
    for stock in map(StockRecord._make, curs.fetchall()):
        row = [str(col).rjust(10) + " " for col in (stock.date,
               stock.trans, stock.symbol, stock.qty, stock.price)]
        # Above line can instead be written more simply as:
        # row = [ str(col).rjust(10) + " " for col in stock ]
        row_str = ''.join(row)
        print_and_write(pw, row_str)

    print_and_write(pw, '=' * len(hdr_fld_str))

except Exception as e:
    print("ERROR: Caught exception: " + str(e))
    sys.exit(1)

finally:
    # Only close what was actually created before any exception.
    if pw is not None:
        pw.close()
    if conn is not None:
        conn.close()

This time I've imported print_function so that I can use print as a function instead of as a statement.
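For readers new to named tuples, here is a tiny standalone illustration of the two underscore-prefixed class members the program relies on; this is plain standard-library behaviour, independent of xtopdf:

from collections import namedtuple

# _fields lists the field names; _make builds an instance from any
# iterable, which is what lets us map it over rows from the cursor.
StockRecord = namedtuple('StockRecord', 'date, trans, symbol, qty, price')
print(StockRecord._fields)  # ('date', 'trans', 'symbol', 'qty', 'price')
rec = StockRecord._make(('2006-01-05', 'BUY', 'RHAT', 100, 25.1))
print(rec.symbol, rec.qty)  # RHAT 100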

Here's a screenshot of the PDF output in Foxit PDF Reader:


- Vasudev Ram - Online Python training and programming - Dancing Bison Enterprises
25 Feb 2015 3:50am GMT


10 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: King William's Town Station

Yesterday morning I had to go to the station in KWT to pick up the bus tickets we had reserved for the Christmas holidays in Cape Town. The station itself has had no train service since December for cost reasons, but Translux and co. - the long-distance bus companies - have their offices there.






© benste CC NC SA

10 Nov 2011 10:57am GMT

09 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein

Nobody is worried about something like this - you simply drive through by car, and in the city - near Gnobie - "no, it only gets dangerous once the fire brigade is there" - 30 minutes later, on the way back, the fire brigade was there.




© benste CC NC SA

09 Nov 2011 8:25pm GMT

08 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Brai Party

Braai = a barbecue evening or the like.

They'd like a technician to patch up their SpeakOn / jack plug splitters...

The ladies - the "mamas" of the settlement - at the official opening speech

Even though fewer people came than expected: loud music and lots of people ...

And of course a fire with real wood for the barbecue.

© benste CC NC SA

08 Nov 2011 2:30pm GMT

07 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Lumanyano Primary

One of our missions was bringing Katja's Linux Server back to her room. While doing that we saw her new decoration.

Björn and Simphiwe carried the PC to Katja's school


© benste CC NC SA

07 Nov 2011 2:00pm GMT

06 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Nelisa Haircut

Today I went with Björn to Needs Camp to visit Katja's guest family for a special party. First of all we visited some friends of Nelisa - yes, the one I'm working with in Quigney - Katja's guest father's sister, who gave her a haircut.

African women usually get their hair done by adding extensions, not, like Europeans, by just cutting some hair.

In between she looked like this...

And then she was done - it looks amazing considering the amount of hair she had last week, doesn't it?

© benste CC NC SA

06 Nov 2011 7:45pm GMT

05 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: My Saturday

Somehow it occurred to me today that I need to restructure my blog posts a bit - if I only ever report on new places, I would have to be on a permanent round trip. So here are a few things from my everyday life today.

First of all: Saturday counts as a day off, at least for us volunteers.

This weekend only Rommel and I are on the farm - Katja and Björn are now at their placements, and my housemates Kyle and Jonathan are at home in Grahamstown, as is Sipho, who lives in Dimbaza. Robin, Rommel's wife, has been in Woodie Cape since Thursday to take care of a few things there.
Anyway, this morning we treated ourselves to a shared Weetbix/muesli breakfast and then set off for East London. Two things were on the checklist - Vodacom and Ethienne (the estate agent) - plus dropping off the missing things at NeedsCamp on the way back.

Just after setting off on the dirt road, we discovered that we had not packed the things for NeedsCamp and Ethienne, but did have the pump for the water supply in the car.

So in East London we first drove to Farmerama - no, not the online game Farmville, but a shop with all sorts of things for a farm - in Berea, a northern part of town.

At Farmerama we got advice on a quick-release coupling that should make life with the pump easier, and we also dropped off a lighter pump for repair, so that it is not such a big effort every time the water runs out again.

Fego Caffé is in the Hemmingways Mall; there we had to get the PIN and PUK of one of our data SIM cards, because a transposed digit had unfortunately crept into the PIN entry. In any case, shops in South Africa store data as sensitive as a PUK - which in principle gives access to a locked phone.

In the café Rommel carried out a few online transactions with the 3G modem, which was working again - and which, by the way, now works perfectly in Ubuntu, my Linux system.

Meanwhile I went to 8ta to find out about their new deals, since we want to offer internet in some of Hilltop's centres. The picture shows the UMTS coverage in NeedsCamp, Katja's place. 8ta is a new phone provider from Telkom; after Vodafone bought Telkom's stake in Vodacom, they had to build up a network from scratch.
We decided to organise a free prepaid card for testing, because who knows how accurate the coverage map above is ... Before signing even the cheapest 24-month deal, you should know whether it works.

Then we went to Checkers in Vincent, looking for two hotplates for Woodie Cape - R 129.00 each, i.e. about 12€ for a two-ring hotplate.
As you can see in the background, the Christmas decorations are already up - at the beginning of November, and that in South Africa at a sunny, warm 25°C minimum.

For lunch we treated ourselves to a Pakistani curry takeaway - highly recommended!
Well, and after we got back an hour or so ago, I cleaned the fridge, which I had simply put outside to defrost this morning. Now it is clean again, and free of its 3 m thick layer of ice...

Tomorrow ... I will report on that separately ... but probably not until Monday, because then I will be back in Quigney (East London) with free internet.

© benste CC NC SA

05 Nov 2011 4:33pm GMT

31 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Sterkspruit Computer Center

Sterkspruit is one of Hilltop's computer centres in the far north of the Eastern Cape. On the trip to J'burg we used the opportunity to take a look at the centre.

Pupils in the big classroom


The Trainer


School in Countryside


Adult Class in the Afternoon


"Town"


© benste CC NC SA

31 Oct 2011 4:58pm GMT

Benedict Stein: Technical Issues

What do you do in an internet cafe when your ADSL and fax line were cut off before the month's end? Well, my idea was to sit outside and eat some ice cream.
At least it's sunny and not as rainy as on the weekend.


© benste CC NC SA

31 Oct 2011 3:11pm GMT

30 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Nellis Restaurant

For those traveling through Zastron - there is a very nice restaurant serving delicious food at reasonable prices.
In addition they sell home-made juices, jams and honey.




interior


home made specialities - the shop in the shop


the Bar


© benste CC NC SA

30 Oct 2011 4:47pm GMT

29 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: The way back from J'burg

On the 10-12h trip from J'burg back to ELS I was able to take a lot of pictures, including these different roadsides.

Plain Street


Orange River in its beginnings (near Lesotho)


Zastron Anglican Church


The Bridge in Between "Free State" and Eastern Cape next to Zastron


my new Background ;)


If you listen to Google Maps you'll end up traveling 50 km of gravel road - as it had just been renewed, we didn't have many problems and saved an hour compared to the official route with all its construction sites.




Freeway


getting dark


© benste CC NC SA

29 Oct 2011 4:23pm GMT

28 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: How does a construction site actually work?

Sure, some things may be different and much the same - but how does a road construction site, an everyday sight in Germany, actually work in South Africa?

First of all - NO, not indigenous people digging with their hands - even though more manpower is used here, they are busily working with machinery.

A perfectly normal "national road"

and how it is being widened

lots and lots of trucks

because here one side is completely closed over a long stretch, resulting in traffic-light control with, in this case, a 45-minute wait

But at least they seem to be having fun ;) - as did we, since luckily we never had to wait longer than 10 minutes.

© benste CC NC SA

28 Oct 2011 4:20pm GMT