23 May 2015

Planet Python

Davide Moro: Kotti CMS - ElasticSearch integration

Announcing a new Kotti CMS (Python web framework based on Pylons/Pyramid and SQLAlchemy) plugin that provides ElasticSearch integration for fulltext search and indexing:

Development status? It should be considered experimental because this is the very first implementation. So any kind of help will be very much appreciated: beer, testing, pull requests, feedback, improving test coverage and so on.

Acknowledgements

kotti_es is based on a pyramid_es fork (https://github.com/truelab/pyramid_es/tree/feature-wrapper, there is a PR in progress). The pyramid_es author is Scott Torborg (https://github.com/storborg).

Configuration

The configuration is very simple.

To enable it, just add the kotti_es plugin to your configuration, choose the index name and the ElasticSearch server addresses.

From the kotti_es README file:

kotti.configurators =
    kotti_es.kotti_configure

elastic.index = your_project
elastic.servers = localhost:9200
elastic.ensure_index_on_start = 1
kotti_es.blacklist =
    Image
    ...

kotti.search_content = kotti_es.util.es_search_content

Index already existing contents

With kotti_es you can reindex all of your existing content, without any change to the original Kotti code base, with just one command:

$ reindex_es -c app.ini

So kotti_es plays well with models defined by third party plugins that are not ElasticSearch aware. You can install kotti_es on an already existing Kotti instance.

Custom behaviours

If you want, you can override or extend the default indexing policy simply by registering your own custom adapter. See the kotti_es tests for more info.

So no need to change existing models, no need to inherit from mixin classes and so on.

Video

kotti_es in action:

Wanna know more about Kotti CMS?

If you want to know more about Kotti CMS have a look at:

All Kotti posts published by @davidemoro:

23 May 2015 12:51am GMT

Django Weblog: DjangoCon US Registration is open!

Registration is open! You may book your tickets and reserve your hotel accommodations for DjangoCon US!

We've worked hard to lower ticket prices this year, so there are low prices for students and individuals. Financial assistance is also available to those with limited budgets.

Register for DjangoCon 2015 here.

Reserve your hotel accommodation to take advantage of our discounted room rates. This year the daily room rate for single or double occupancy is $139. Please use the reservation code DJANGO0915.

Applications for Django Girls at DjangoCon (as a coach or as a participant) will be handled separately; please visit Django Girls Austin for information about attending or coaching at the workshop.

For more information see our registration page.

23 May 2015 12:16am GMT

PyTexas: Call For Proposals

The PyTexas 2015 Call for Proposals is now open! Submit your talk proposals today. Hey, you have some extra time this Memorial Weekend, so eat an extra hamburger and before you fall into a food coma, submit an idea for a lightning talk (5 min), short talk (20 min), long talk (50 min), or tutorial (3 hrs). See our Call for Proposals for all the details about speaking at PyTexas. We love both experienced and first time speakers, so no matter what your age or background, summon your inner honey badger and submit your talk ideas.

Helpful links:

23 May 2015 12:00am GMT

22 May 2015

Planet Python

PyPy Development: CFFI 1.0.1 released

CFFI 1.0.1 final has now been released for CPython! CFFI is a (CPython and PyPy) module to interact with C code from Python.

The main news from CFFI 0.9 is the new way to build extension modules: the "out-of-line" mode, where you have a separate build script. When this script is executed, it produces the extension module. This comes with associated Setuptools support that fixes the headache of distributing your own CFFI-using packages. It also massively cuts down the import times.
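
For context, a build script in the new "out-of-line" API mode looks roughly like the sketch below. This is only an illustration; the module name _example and the sqrt() declaration are mine, not from the announcement.

# build_example.py - a minimal out-of-line ("API mode") build script sketch.
from cffi import FFI

ffi = FFI()

# C declarations we want to be able to call from Python.
ffi.cdef("double sqrt(double x);")

# Name of the extension module to generate, plus the C source compiled into it.
ffi.set_source("_example", "#include <math.h>", libraries=["m"])

if __name__ == "__main__":
    ffi.compile()   # writes and compiles the _example extension module

Running the script once produces the _example module; application code then simply does "from _example import ffi, lib" and calls lib.sqrt(2.0), without paying the parsing cost at import time.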

Although this is a major new version, it should be fully backward-compatible: existing projects should continue to work, in what is now called the "in-line mode".

The documentation has been reorganized and split into a few pages. For more information about this new "out-of-line" mode, as well as more general information about what CFFI is and how to use it, read the Goals and proceed to the Overview.

Unlike the 1.0 beta 1 version (see its announcement for a motivated introduction), the final version also supports an out-of-line mode for projects using ffi.dlopen(), instead of only ffi.verify().
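
A sketch of what that dlopen() variant looks like (again with illustrative names; no C compiler is involved because the source argument is None):

# build_example_abi.py - out-of-line ABI mode: passing None instead of C
# source means ffi.compile() only writes a module holding the parsed cdef()
# declarations; no compiler is needed.
from cffi import FFI

ffi = FFI()
ffi.cdef("double sqrt(double x);")
ffi.set_source("_example_abi", None)

if __name__ == "__main__":
    ffi.compile()

At runtime you import the generated module with "from _example_abi import ffi", call lib = ffi.dlopen("libm.so.6") (or whatever your platform's math library is called), and then lib.sqrt(2.0) works as in the API mode example.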

PyPy support: PyPy needs integrated support for efficient JITting, so you cannot install a different version of CFFI on top of an existing PyPy. You need to wait for the upcoming PyPy 2.6 to use CFFI 1.0---or get a nightly build.

My thanks again to the PSF (Python Software Foundation) for their financial support!

UPDATE:

Bug with the first example "ABI out-of-line": variadic functions (like printf, ending in a "..." argument) crash. Fixed in CFFI 1.0.2.

22 May 2015 7:03pm GMT

Reinout van Rees: Pygrunn: ZeroMQ - Pieter Hintjens

(One of the summaries of the 2015 Pygrunn conference)

Pieter Hintjens has quite some experience with distributed systems. Distributed systems are, to him, about making our systems look more like the real world. The real world is distributed.

Writing distributed systems is hard. You need a big stack. The reason we use HTTP so much is that it was one of the first protocols that was pretty simple and that we could understand. Almost everything seems to be HTTP now.

Three comments:

  • So: the costs of such a system must be low. He really likes ZeroMQ, especially because it makes it cheap (see the small pyzmq sketch after this list).

  • We lack a lot of knowledge. The people that can do it well are few. Ideally, the community should be bigger. We have to build the culture, build the knowledge. ZeroMQ is one of the first bigger open source projects that succeeded.

  • Conway's law: an organization will build software that looks like itself. A centralized power-hungry organization will probably build centralized power-hungry software.

    So: if you want to write distributed systems stuff, your organization has to be distributed!

    Who has meetings in his company? They are bad bad bad. They're blocking. You have to "synchronize state" and wait for agreement. A conference like pygrunn is fine: meeting people is fine. At pygrunn, there's no state synchronization. Imagine that it were a meeting to agree on a standard editor...
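
To give an idea of how cheap the building blocks are, here is a tiny request/reply sketch with pyzmq (my illustration, not something from the talk):

import zmq

def server(endpoint="tcp://127.0.0.1:5555"):
    # Reply side: answer every request it receives.
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.REP)
    sock.bind(endpoint)
    while True:
        msg = sock.recv_string()
        sock.send_string("echo: " + msg)

def client(endpoint="tcp://127.0.0.1:5555"):
    # Request side: send one message and wait for the answer.
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.REQ)
    sock.connect(endpoint)
    sock.send_string("hello")
    return sock.recv_string()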

In a distributed system, what you really want is participation. Open source development needs pull requests, so to say.

A question about making money from open source resulted in a rant (I don't mean the term very negatively, here) about open source software being the only way to produce valuable software. "You might as well ask about how you can make money from a free school system". "It is idiotic to ask the question". And some things about people believing things because someone says it is so (like "you can only make money with ...") without thinking for themselves.

Something to emulate: our food system. Nobody owns the complete food system, but it works! Lots of smaller and bigger actors. And everyone had breakfast and lunch today. The system works. This kind of distributed system is an example to emulate in our open source software.

A nice comparison when asked about successful commercial software: Gmail is a successful example, but that's something that grew pretty organically. Compare that with Google Wave or Google Plus: who even remembers them? Those were vision-driven software, made based on money. A failure.

22 May 2015 6:49pm GMT

PyCharm: Announcing the PyCharm 4.5.1 release update

Just one week after the PyCharm 4.5 release, we are pleased to announce the general availability of the PyCharm 4.5.1 bug-fix update. It has been uploaded and is now available from the download page. It will also be available shortly as a patch update from within the IDE (from PyCharm 4.5 and 4.5.1 RC only).

As a recap, some notable highlights of this release include a fix for incorrect encoding and numerous fixes for the brand-new manage.py tool.

For further details on the bug fixes and changes, please consult the Release Notes.
As usual, please report any problems you find in the issue tracker.

If you would like to discuss your experiences with PyCharm, we look forward to your feedback in the comments to this blog post and on Twitter.

Develop with Pleasure!
-PyCharm team

22 May 2015 4:05pm GMT

Reinout van Rees: Pygrunn: Orchestrating Python projects using CoreOS - Oscar Vilaplana

(One of the summaries of the 2015 Pygrunn conference)

(Note: Oscar Vilaplana had a lot of info in his presentation and also a lot on his slides, so this summary is not as elaborate as what he told us. Wait for the video for the full version.)

"Orchestrating python": why? He cares about reliability. You need a static application environment. Reliable deployments. Easy and reliable continuous integration. And self-healing. Nice is if it is also portable.

A common way to make scalable systems is to use microservices. You compose, mix and extend them into bigger wholes. Ideally it is "cluster-first": also locally you test with a couple of instances. A "microservices architecture".

Wouldn't it be nice to take the "blue pill" and move to a different reality? One where you have small services, each running in a separate container without a care for what occurs around it? No sysadmin stuff? And similarly, the smart infrastructure people only have to deal with generic containers that can't break anything.

He did a little demo with rethinkdb and flask.

For the demo he uses CoreOS: kernel + docker + etcd. CoreOS uses a read-only root filesystem and by design it doesn't have a package manager. Journald for logging (it automatically captures stdout). Systemd for managing processes.

etcd? It is a distributed configuration store with an HTTP API.
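
As a rough sketch of that HTTP API (this is the v2 "keys" API; the local address and port are assumptions for a default etcd, not something from the talk):

import requests

base = "http://127.0.0.1:2379/v2/keys"

# Set a key...
requests.put(base + "/message", data={"value": "hello from flask"})

# ...and read it back; etcd answers with JSON.
print(requests.get(base + "/message").json()["node"]["value"])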

Also: "fleet". "systemd for services". It starts up the containers. It coordinates accross the cluster. It will re-start containers if they die.

How do we get containers to talk to each other? They're containerized... For that there's "flannel": "dhcp for containers". A specific subnet per cluster, with a smaller subnet per machine. The best system to run all this is Kubernetes.

Kubernetes uses "replication controllers". The basis is a "pod", from which multiple replicas are made, depending on the amount of instances you need.

He then showed a demo, including a rolling update. Nice. Similarly for a rethinkdb cluster where he increased the number of nodes halfway through the demo. Nice, too.

In development, it might be easier to use "nspawn" instead of docker. It is mostly the same, only less isolated (which is handy for development).

22 May 2015 1:50pm GMT

Reinout van Rees: Pygrunn: Laurence de Jong - Towards a web framework for distributed apps

(One of the summaries of the 2015 Pygrunn conference)

Laurence de Jong is a graduate student.

Everyone uses the internet. Many of the most-used sites are centralized. Centralization means control. It also gives scale advantages, like with gmail's great spam filter.

It also has drawbacks. If the site goes down, it is really down. Another drawback is the control they have over our data and what they do with it. If you're not paying for it, you're the product being sold. Also: eavesdropping. Centralized data makes it easy for agencies to collect the data. And: censorship!

A better way would be decentralized websites. There are existing decentralized things like Freenet, but they're a pain to install and the content on there is not the content you want to see... And part of it is stored on your hard disk...

See also Maelstrom, which distributes websites as torrents. A problem there is the lack of proper decentralized DNS: you end up with unreadable hashes.

A solution could be the blockchain technology behind bitcoin, in the form of namecoin. This way, you could store secure DNS records pointing to torrent hashes in a decentralized way.

https://github.com/HelloZeroNet/ZeroNet uses namecoin to have proper DNS addresses and to download the website via bittorrent. Not many people use it right now.

And.... the websites you download right now are all static. We want dynamic content! You can even do that with blockchains. An example is the decentralized Twitter alternative http://twister.net.co/, mostly used by Chinese people because Twitter is largely unavailable there.

There are problems, of course. Where do you store your data? Agencies can still do traffic analysis. How do you manage your private keys? Aren't we getting browser wars all over again? And can your mom install it? (Answer: no, it is too hard.)

An extra problem is more technical: distributed hash tables are considered unsafe.

And... in the end, if you use hashes for everything (like every individual tweet, email and webpage), that's a lot of hashes to store, partially locally. So it isn't the solution, but at least it is a solution.
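
What "hashes for everything" boils down to is content addressing: the identifier of a tweet, email or page is a hash of its bytes. A generic illustration (not tied to ZeroNet or Twister specifically):

import hashlib

def content_address(data: bytes) -> str:
    # The address is derived from the content itself, so anyone holding the
    # data can verify that it matches the address.
    return hashlib.sha256(data).hexdigest()

print(content_address(b"hello, decentralized web"))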

22 May 2015 11:58am GMT

Reinout van Rees: Pygrunn: Data acquisition with the Vlermv database - Thomas Levine

(One of the summaries of the 2015 Pygrunn conference)

Thomas Levine wrote vlermv: a simple "kind of database" that uses folders and files. Python is always a bit verbose when dealing with files, so that's why he wrote vlermv.

Usage:

from vlermv import Vlermv
vlermv = Vlermv('/tmp/a-directory')

vlermv['filename'] = 'something'
# ^^^ This saves a python pickle with 'something' to /tmp/a-directory/filename

The advantage is that the results are always readable, even if you lose the original program.

You can choose a different serializer, for instance json instead of pickle.

You can also choose your own key_transformer. A key_transformer translates a key to a filename. Handy if you want to use a datetime or tuple as a key, for instance.

The two hard things in computer science are:

  • Cache invalidation.
  • Naming things.

Cache invalidation? Well, vlermv doesn't do cache invalidation, so that's easy. Naming things? Well, the name 'vlermv' comes from typing randomly on his (dvorak) keyboard... :-)

Testing an app that uses vlermv is easy: you can mock the entire database with a simple python dictionary.
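
A sketch of what that looks like in a test (the helper and names are hypothetical): anything that only uses item access works the same with a Vlermv store or a plain dict.

def save_result(store, key, value):
    # Works with a Vlermv instance or any other mapping.
    store[key] = value

def test_save_result():
    fake_db = {}                      # stands in for Vlermv('/some/dir')
    save_result(fake_db, 'result', 'something')
    assert fake_db['result'] == 'something'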

What if vlermv is too new for you? You can use the standard library shelve module that does mostly the same, only it stores everything in one file.
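
The shelve equivalent of the earlier example looks like this; everything ends up in a single file (plus dbm side files) instead of a directory of pickles.

import shelve

db = shelve.open('/tmp/a-shelf')
db['filename'] = 'something'
print(db['filename'])
db.close()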

A drawback of vlermv: it is quite slow.

Fancy full-featured databases are fast and nice, but do you really need all those features? If not, wouldn't you be better served by a simple vlermv database? You might even use it as a replacement for mongodb! That one is often used only because it is so easy to start with and so easy to create a database. If you don't have a lot of data, vlermv might be a much better fit.

22 May 2015 11:07am GMT

Reinout van Rees: Pygrunn: Reliable distributed task scheduling - Niels Hageman

(One of the summaries of the 2015 Pygrunn conference)

Note: see Niels Hageman's somewhat-related talk from 2012. Niels works at Paylogic. Wow, the room was packed.

They discovered the normal problem of operations that took too long for the regular request/response cycle. The normal solution is to use a task queue. Some requirements:

  • Support python, as most of their code is in python.
  • It has to be super-reliable. It also needs to allow running in multiple data centers (for redundancy).
  • Ideally, a low-maintenance solution as they already have enough other work.

Option 1: celery + rabbitMQ. It is widely used and relatively easy to use. But rabbitMQ was unreliable. With alarming frequency, the two queues in the two datacenters lost sync. They also got clogged from time to time.

Option 2: celery + mysql. They already use mysql, which is an advantage. But... the combination was buggy and not production-ready.

Option 3: gearman with mysql. The Python bindings were buggy and unmaintained. And you could only run one gearman bundle, so multiple datacenters were out of the window.

Option 4: do it yourself. They did this and ended up with "Taskman" (which I couldn't find online, they're planning on making it open source later on: they still need to add installation documentation).

The backend? They started with mysql. It is a great relational database, but it isn't a great queue. There is a saying on the internet: Thou shalt not use thine database as a task queue. With some adjustments, like autocommit, they got it working nicely anyway.
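The general trick for using a relational database as a queue, sketched below, is to claim a row with a single atomic UPDATE. This is not Taskman's actual code (which isn't public yet); it assumes a DB-API connection with autocommit enabled and a tasks table with id, status, worker_id and payload columns.

def claim_next_task(conn, worker_id):
    # With autocommit on, this single UPDATE is atomic: at most one worker
    # can flip a given row from 'pending' to 'claimed'.
    cur = conn.cursor()
    cur.execute(
        "UPDATE tasks SET status = 'claimed', worker_id = %s "
        "WHERE status = 'pending' ORDER BY id LIMIT 1",
        (worker_id,),
    )
    if cur.rowcount == 0:
        return None   # nothing to do right now
    # Assuming the worker finishes (or fails) a task before claiming the
    # next one, this fetches the row it just claimed.
    cur.execute(
        "SELECT id, payload FROM tasks "
        "WHERE status = 'claimed' AND worker_id = %s",
        (worker_id,),
    )
    return cur.fetchone()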

The task server consists of a python daemon (running under supervisor) and a separate task runner. The runner lives in its own process to provide isolation and resource control.

Of course, the task server needs to be integrated in the main server. The task server is written as an independent application, so how does the task finder find the python functions it needs to run? They do this via "server plugins" that define which environment variables are needed, which python path you need and which function and which version you need. All this gets applied by the task runner and subsequently it can import and run the function.

Some additional features of their task runner:

  • Tasks can report progress.
  • Tasks can be aborted.
  • Task start time can be constrained.
  • There's exception handling.

Some of the properties of taskman: it is optimized for long running tasks. And: it is designed for reliability. Very necessary, as Paylogic is a payment processor.

It also means it is less suited when you have lots of little tasks. Running everything as a separate process is fine for longer-running processes, but it is too heavy-weight for lots of small tasks. Oh, and there's no admin UI yet: he uses phpMyAdmin :-)

22 May 2015 10:28am GMT

Reinout van Rees: Pygrunn: Python, WebRTC and You - Saúl Ibarra Corretgé

(One of the summaries of the 2015 Pygrunn conference)

Saúl Ibarra Corretgé does telecom and VoIP stuff for his work, which is what webRTC calls "legacy" :-)

webRTC is Real-Time Communication for the web via simple APIs. So: voice calling, video chat, P2P file sharing without needing internal or external plugins.

Basically it is a big pile of C++ that sits in your browser. One of the implementations is http://www.webrtc.org/. Some people say that webRTC stands for Well, Everybody Better Restart Their Chrome, because the browser support is mostly limited to Chrome. There's a plugin for IE/Safari, though.

There are several javascript libraries for webRTC. They help you set up a secure connection to another person (an "RTCPeerConnection"). The connection is direct, if possible. If not, due to firewalls for instance, you can use an external server. It uses ICE, which means Interactive Connectivity Establishment (see also trickle ICE, which he apparently used): a way to set up the connection.

Once you have a connection, you have an RTCDataChannel. Which you can use, for instance, to send a file from one browser to another.

As a testcase, he wrote Call Roulette. The app is in python, but in the browser javascript is used, as that is more-or-less the native way to do it. The "call roulette" app connects a user to a random other user. Users send simple json requests to the app. Once the app finds two candidates, both get the other's data to set up a subsequent webRTC connection.
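
A rough sketch of that pairing logic (my illustration, not the real Call Roulette code, and using modern async/await syntax rather than the Python 3.3 "yield from" style the app would have used):

import asyncio

waiting = asyncio.Queue()

async def join(offer_data):
    # Each user hands in their WebRTC offer data and waits for a partner's.
    me = {"offer": offer_data, "peer": asyncio.Future()}
    await waiting.put(me)
    return await me["peer"]

async def matchmaker():
    # Pair up waiting users two at a time and exchange their offer data;
    # in the real app this goes back to each browser over its websocket.
    while True:
        a = await waiting.get()
        b = await waiting.get()
        a["peer"].set_result(b["offer"])
        b["peer"].set_result(a["offer"])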

He made the toy app in python 3.3 because it is new. It has websockets, and async via asyncio "because async is modern :-)". All nice, new and shiny.

So: users connect from their browser with a websocket connection to the app. They are paired up and the webRTC connection data is sent back. Very fast.

Fun: light-weight django-models-like models via https://pypi.python.org/pypi/jsonmodels/! Look it up.

He did a live demo with web video with someone from the audience. Worked basically like a charm.

22 May 2015 9:15am GMT

Reinout van Rees: Pygrunn: IPython and MongoDB as big data scratchpads - Jens de Smit

(One of the summaries of the 2015 Pygrunn conference)

A show of hands: about half the people in the room have used mongodb and half have used ipython notebooks. There's not a lot of overlap.

Jens de Smit works for optiver, a financial company. A "high-frequency trader", so they use a lot of data and they do a lot of calculations. They do a lot of financial transactions and they need to monitor if they made the right trades.

Trading is now almost exclusively done electronically. Waving hands and shouting on the trading floor at a stock exchange is mostly a thing of the past. Match-making between supply and demand is done centrally. It started 15 years ago. The volume of transactions really exploded. Interesting fact: the response time has gone from 300ms to just 1ms!

So... being fast is important in electronic trading. If you're slow, you trade at the wrong prices. Trading at the wrong prices means losing money. So speed is important. Just as making the right choices.

What he had to do was figure out how fast an order was made and whether it was a good order. Non-intrusively. So: what market event did we react to? What was the automatic trade decision (made by an algorithm)? Was it a good one? How long did it all take?

So he monitors data going in and out of their system. He couldn't change the base system, so: log files, network data and an accounting database. Most of the data is poorly indexed. And a very low signal-to-noise ratio. And of course the logfiles aren't all consistent. And documentation is bad.

Oh, and the data size is of course also too big to fit in memory :-)

He used mongodb. A schemaless json (well, bson, binary version of json) store. Great for messy data. Easy to use. Just put in a python dictionary, basically. The data is persisted to disk, but as long as you have enough RAM, it'll keep it in memory. Very fast that way. You get indexes and speedups by default.

After he managed to get everything into mongodb, he had to make sense of things. So: correlate decision logs to network data. This is easy for humans to spot, but hard for computers. Computers are good at exact matches, humans are better at inexact pattern matches.

He used ipython notebook, a nice interactive python shell with a browser interface. Including matplotlib integration for easy graphs. Syntax highlighting; you can render html inside the shell; you can save your work at the end of the day (which you can't with a regular python shell!); inline editing.

Nice: since last week, rendering such notebooks is supported by github. (I guess he means this announcement.)

Now mongodb. It is very simple to create a directory and start mongodb. If you stop mongo and delete the directory, it is gone as if it was never there. Easy. And with pymongo it is just a few lines of python code and you're set. Including a handy query language.

He showed a couple of code examples. Looked pretty handy.

Creating an index is a oneliner. If you know beforehand what kinds of queries you want to do, you can quickly create an index for it, which speeds up your queries a lot. You can make complex indexes, but in his experience, simple single-field indexes are often enough.
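
For a feel of how little code that is, here is a small pymongo sketch (the collection and field names are mine, not from the talk; this uses the pymongo 3 API):

from pymongo import MongoClient

client = MongoClient()          # a local mongod on the default port
db = client.trading             # databases and collections appear on first use

# Storing a messy record is just inserting a python dictionary.
db.events.insert_one({"order_id": 42, "latency_us": 950, "source": "network"})

# Creating an index really is a one-liner...
db.events.create_index("order_id")

# ...and querying uses a handy dict-based query language.
slow = db.events.find({"latency_us": {"$gt": 1000}})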

Something to watch out for: mongo never returns disk space to the OS. If you delete lots of objects, the OS doesn't get the space back unless you shut mongodb down and "repair" the database. What he does is simply delete the database at the end of the day!

He showed one of the outputs: a graph with response times which immediately showed that several responses were too slow. Good, useful information. One year ago he wouldn't have dreamt of being able to do this sort of analysis.

Mongo is very useful for this kind of work. You use mongodb's strengths and you aren't bothered by many of the drawbacks, like missing transactions.

22 May 2015 8:34am GMT

Reinout van Rees: Pygrunn: Leveraging procedural knowledge - K Rain Leander

(One of the summaries of the 2015 Pygrunn conference)

K Rain Leander works at Red Hat and yes, she wore a bright red hat :-) She's a python and django newbie. She knows how it is to be a newbie: there is so much in linux that there are always areas where you're a complete newbie. So everyone is helpful there.

"Amsterdam is the capital of the netherlands" is declarative knowledge. Procedural knowledge is things like learning to ride a bike or a knew language. So: What versus How. You might know declaratively how to swim, but procedurally you might still drown: you need to practice and try.

Some background: she was a dancer in the USA. Unless you're famous, you barely scrape by financially. So she started teaching herself new languages. Both real-life languages and computer languages. Css, html for starters. And she kept learning.

She got a job at Red Hat. You have to pass an RHCE certification test within 90 days of starting work there - or you're fired. She made it.

She has a military background. In bootcamp, the purpose is not the pushups and the long runs. The goal is to break you down so that you jump when they say "jump".

In the Red Hat bootcamp, the goal is not making the test. The goal is to figure out if you're able to drink from the firehose. Which means if you get a support request, you say "I'll figure it out for you" and you just dive in and try to figure it out. You have to be able to dive into a whole lot of new information without panicking. That's drinking from the firehose.

She re-used existing knowledge and previous skills to learn everything. The important part was not being afraid to dive in.

She moved towards programming. Python, django. She was new to it. One of the first steps? "Set up a virtualenv and....". It can frighten you, but it is just a question of RTFM. Just read the manual. Just read it and then start doing it.

She went to a Django Girls workshop. (One of the results: http://leanderthalblog.herokuapp.com/). Django Girls does a really good job of providing material and documentation. She had some problems installing it, but continued (and succeeded) anyway.

... and then someone challenged her to deploy it on OpenShift: http://django-leanderthal.rhcloud.com/. It hasn't succeeded completely yet, but she'll persevere and get it working.

She recommends http://learnpythonthehardway.org/ to learn python.

What's next: she'll practice, practice, practice. And she'll contribute to the community. Probably build one or two apps. And she'll be a coach at the upcoming Groningen django girls workshop ("as a coach. No, I'm not worried....")

So: re-use your existing knowledge and build from there. Don't be afraid. Just do it.

22 May 2015 7:45am GMT

Chris Mitchell: Minimizing render times of shared Django forms

A common situation with Django sites is the need to render a given form across all pages, such as a login-form that is embedded in the header. There is a recipe I came upon, probably from stackoverflow, that has some derivation of the following pattern:


# as a context_processor
from .forms import SomeLoginForm

def loginFormProcessor(request):
    ctx = {}
    if not request.user.is_authenticated():
        ctx['login_form'] = SomeLoginForm
    return ctx

# your template
{% if not request.user.is_authenticated %}
    {% crispy login_form %}
{% endif %}


I was using this pattern for a rather complicated form without thinking about the overhead incurred. However, when New Relic revealed this was taking ~600 ms per render, I knew it had to be fixed.

The simplest solution is template caching, making our template look like so:

# your template
{% load cache %}
{% if not request.user.is_authenticated %}
    {% cache 99999 login_form_cache %}
        {% crispy login_form %}
    {% endcache %}
{% endif %}


The problem with this is we still incur the overhead in our context processor. We can avoid this by doing all our work within the cache tag. First, we need to move the logic of generating the form out of the context processor and into a template_tag.

# our template_tag.py file
from django import template
from django.template import Context

register = template.Library()

@register.assignment_tag
def get_login_forms():
    from ..forms import StepOne, StepTwo, StepThree
    ctx = {}
    ctx['first'] = StepOne
    ctx['second'] = StepTwo
    ctx['third'] = StepThree
    return Context(ctx)

Now we need to integrate this tag into our template, so our final template looks like the following (this is also specific to my particular example, where I have a multi-step form):

# our template file
{% load cache our_tags %}
{% if not request.user.is_authenticated %}
    {% cache 99999 login_form_cache %}
        {% get_login_forms as modal_login_forms %}
        {% crispy modal_login_forms.first %}
        {% crispy modal_login_forms.second %}
        {% crispy modal_login_forms.third %}
    {% endcache %}
{% endif %}

This alone brought the server response time down from ~2-3 seconds to 0.69 seconds. Not too shabby.

Note: This code should run but I didn't test it as it isn't exactly my code copy & pasted, but an example.

22 May 2015 7:24am GMT

Python Sweetness: Block Range (BRIN) Indexes in PostgreSQL 9.5

After reading that PostgreSQL 9.5 will support BRIN indexes, it took me a couple of re-reads of the attached documentation to understand exactly what this index technique is about. Actually, it's really simple, but for people like me who prefer things to be spelled out, here are some hopefully useful (and at least somewhat accurate) notes.

Tables

As a quick recap, table rows in PostgreSQL are stored into an on-disk structure known as the heap. The heap is an array that is logically partitioned into 8kb "pages", with each page containing one or more "tuples" (rows). To ease management, as the heap grows it is additionally split into 1GB-sized files on disk, however the overall structure is still essentially just one big logical array.

When you ask PostgreSQL to insert a row into a table, it uses an auxiliary structure known as the free space map to locate the first available heap page for your relation ("table") that has sufficient space to store the data for your row. If your row is larger than a pre-set limit (2kb), large columns are split out of the row's data and stored in a series of rows in an internal table (the so-called TOAST tables).

The net result is that each data row exists entirely within one page, and that row lives at a particular logical index (the "item ID") within its page. If PostgreSQL must refer to a row, it can uniquely identify it using just its page number, and its index within the page. The combination of this pair of numbers is known as the row's ctid, or its tuple ID. Tuple IDs can thus be used as a small, efficient, unique locator for every row in a database, and they exist regardless of your schema design.
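
You can inspect these tuple IDs yourself by selecting the hidden ctid column. A quick sketch using psycopg2 (the connection details and the person table are assumptions for illustration; in psql a plain SELECT ctid, * FROM person does the same):

# List each row's ctid alongside its data (sketch; dbname and the
# 'person' table are placeholders for illustration).
import psycopg2

conn = psycopg2.connect(dbname='testdb')
cur = conn.cursor()
cur.execute("SELECT ctid, name, age FROM person LIMIT 8")
for ctid, name, age in cur.fetchall():
    print(ctid, name, age)   # each ctid is a '(page,item)' pair
cur.close()
conn.close()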

[Side note: that's not entirely true! If a row has been updated since the database was last VACUUMed, multiple versions will exist, chained together using some special fields in each version's on-disk data. For simplicity let's just assume only one version exists.]

In the current PostgreSQL implementation, 32 bits are used for the page number and 16 bits for the item number (placing an absolute upper bound on a single database table of roughly 32 TiB: 2^32 pages of 8 KiB each), allowing the ctid to fit comfortably in 64 bits.

Using just the name of a relation and a ctid, PG can first split the page number from the ctid and use that to efficiently locate the physical database file and offset where the page lives:

page_size = 8 * 1024                          # 8 KiB pages
pages_per_segment = (1024 ** 3) // page_size  # 1 GiB segment files
segment, index = divmod(page_number, pages_per_segment)
page_offset = page_size * index               # byte offset of the page within its segment

Finally to locate the tuple within the page, a small, constant-sized lookup table exists at the start of each page that maps its item IDs to byte offsets within the page:

item_offset = page.lookup_table[item_id]

Indexes

Without further help, answering a query such as SELECT * FROM person WHERE age BETWEEN 18 AND 23 would require PG to visit every page in the heap, decoding each row in turn, and comparing its age column to the WHERE predicate. Naturally for larger tables, we prefer to avoid that, and an index is necessary to allow PostgreSQL to avoid scanning the full table.

Btree Indexes

The most common index type in PG is the btree, which maintains an efficient map from column value to ctid. Given the imaginary table:

Person table heap layout
Page Number  Item ID  ctid    Name     Age  Creation Date
1            1        (1, 1)  John     10   1998-01
1            2        (1, 2)  Jack     99   1998-02
1            3        (1, 3)  Jill     70   1998-03
1            4        (1, 4)  Jemma    19   1998-04
2            1        (2, 1)  George   60   1998-05
2            2        (2, 2)  James    44   1998-05
2            3        (2, 3)  Jocelyn  55   1998-06
2            4        (2, 4)  Jemima   22   1998-07
3            1        (3, 1)  Jerry    60   1999-01
3            2        (3, 2)  Jarvis   44   1999-02
3            3        (3, 3)  Jasper   55   1999-03
3            4        (3, 4)  Josh     24   1999-04
4            1        (4, 1)  Jacob    60   2000-01
4            2        (4, 2)  Jesse    44   2000-02
4            3        (4, 3)  Janet    55   2000-03
4            4        (4, 4)  Justine  24   2000-04

A btree index created using CREATE INDEX person_age ON person(age) might resemble:

person(age) btree index layout
Age  ctid
10   (1, 1)
19   (1, 4)
22   (2, 4)
24   (3, 4)
24   (4, 4)
44   (3, 2)
44   (4, 2)
44   (2, 2)
55   (3, 3)
55   (4, 3)
55   (2, 3)
60   (3, 1)
60   (4, 1)
60   (2, 1)
70   (1, 3)
99   (1, 2)

This is getting too long already, so cutting to the chase: PG can now efficiently locate an exact row given the value of its indexed column, and that value is in turn stored in a data structure that permits fast lookup.

For our SELECT query from above, PG can jump to btree key 18 and scan forward, collecting ctids, until it reaches a key larger than 23. In the demo table this means PG must visit only 2 of our 16 rows, and before it even touches the row data it already knows that each row definitely matches the predicate.
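
To make that concrete, here is a toy simulation of the same range scan in Python, using the sorted (age, ctid) pairs from the index above and the standard bisect module (purely illustrative; this is not how PostgreSQL's btree is actually implemented):

import bisect

# Sorted (age, ctid) pairs from the person(age) btree example.
index = [
    (10, (1, 1)), (19, (1, 4)), (22, (2, 4)), (24, (3, 4)), (24, (4, 4)),
    (44, (2, 2)), (44, (3, 2)), (44, (4, 2)), (55, (2, 3)), (55, (3, 3)),
    (55, (4, 3)), (60, (2, 1)), (60, (3, 1)), (60, (4, 1)), (70, (1, 3)),
    (99, (1, 2)),
]

def range_scan(index, low, high):
    """Jump to the first key >= low, then scan until a key exceeds high."""
    pos = bisect.bisect_left(index, (low,))
    matches = []
    while pos < len(index) and index[pos][0] <= high:
        matches.append(index[pos][1])
        pos += 1
    return matches

print(range_scan(index, 18, 23))   # -> [(1, 4), (2, 4)], i.e. Jemma and Jemima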

For some other queries, such as SELECT COUNT(*) FROM person WHERE age = 22, PG may not even need to visit the row data itself, since it can infer from index entries how many data rows exist. [Another MVCC caveat! This is not entirely true, since index entries may exist pointing to deleted rows, or rows created in later transactions]

The crucial point to note, though, is that one exact index entry is produced for every row. Usually that doesn't amount to much, perhaps 5-15% overhead relative to the source table, but for a large table that overhead may be the difference between a dataset that fits in RAM and one where common queries end up hitting disk, or where IO is doubled by index access because the dataset was already vastly larger than available RAM. It's easy to imagine indexes quickly adding up, such that perhaps half of an application's storage is spent on them.

BRIN Indexes

Finally enough verbiage is spilled so that we can reach the point: BRIN indexes introduce a cool tradeoff where instead of covering individual rows, index entries cover one or more heap pages:

person(age) BRIN index with group size 1
Page Number  Has NULL values?  Lowest Age  Highest Age
1            No                10          99
2            No                22          60
3            No                24          60
4            No                24          60

The structure is used like so: given a query such as SELECT * FROM person WHERE age BETWEEN 10 AND 15, PG will visit every index entry in turn, comparing its minimum/maximum values against the query predicate. If the index entry indicates that a range of pages contains at least one record matching the query, those pages will be scanned for matching rows. For this query, only one page contains rows whose age fields overlap the desired region, and so PG can avoid visiting 75% of the table.
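
As another purely illustrative sketch, the BRIN scan itself amounts to a filter over the per-page summaries (the values below are the ones from the group-size-1 table above):

# Per-page (lowest, highest) age summaries, as in the BRIN table above.
summaries = {1: (10, 99), 2: (22, 60), 3: (24, 60), 4: (24, 60)}

def candidate_pages(summaries, low, high):
    """Return the pages whose [min, max] range overlaps [low, high];
    only these pages need to be fetched and scanned row by row."""
    return [page for page, (lo, hi) in summaries.items()
            if lo <= high and hi >= low]

print(candidate_pages(summaries, 10, 15))   # -> [1]: 75% of the table is skipped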

Notice that in order to find just one row, PG must now scan a full page and compare each of its 4 rows against the query predicates. While index size is reduced, query time has increased! There is also little pattern in our age column: in fact, it is quite lucky that our index described only a single page covering the range 10..15. Had users signed up in a slightly different order, the distribution of ages across physical storage pages may have resulted in PG having to scan many more pages.

[Another side note: unlike our dummy table above, a typical PG heap page may contain over 200 rows, depending on how many columns are present, and how many of those are strings. Our dummy BRIN index above looks as if it contains just as much information as the original btree index, but that's just because my example only has 16 rows instead of 800].

BRIN also permits configuring how many heap pages contribute to an index entry. For example, we can halve the size of our first index while also halving its precision:

person(age) BRIN index with group size 2
Page Numbers  Has NULL values?  Lowest Age  Highest Age
1-2           No                10          99
3-4           No                24          60
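
In SQL the group size corresponds to BRIN's pages_per_range storage parameter. A sketch of creating such an index from Python, assuming psycopg2 and the person table from the examples (connection details are placeholders):

# Create a BRIN index on person(age), summarising 2 heap pages per index entry.
import psycopg2

conn = psycopg2.connect(dbname='testdb')
cur = conn.cursor()
cur.execute(
    "CREATE INDEX person_age_brin ON person "
    "USING brin (age) WITH (pages_per_range = 2)"
)
conn.commit()
cur.close()
conn.close()

The default pages_per_range is 128 pages, i.e. each index entry summarises 1 MiB of heap.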

Due to this work-increasing factor, and because every index entry must be visited (giving every query a potentially high fixed cost), BRIN is probably never useful for "hot" queries against a table, or even of much use at all in a typical "hot" database. For auxiliary workloads, however, such as once-per-month reports or bulk queries against archival data, where reduced runtime or IO is desired without the storage cost of an exact index, BRIN may be just the tool.

Finally, notice in the original table how as new records were inserted, their creation date roughly tracked which database page they ended up on. This is quite a natural outcome since as the table grows, newer items will occupy later pages in the array, and so there is quite a reliable correlation between page number and the creation date column value. A BRIN index over this column would work very well.


This was supposed to be a 5 minute "hey, that's cool!" post, but somehow I suck at keeping these things short. I found the source code documentation the best explanation of how this stuff works; the public wiki is pretty vague. If you have corrections, use the Ask Me Anything link to the right of this page.

22 May 2015 1:56am GMT

Vasudev Ram: Talk Python To Me podcast; Australia mandates text-based programming

By Vasudev Ram


Today, for a change, a different kind of Python post, but one that I think will be interesting to my readers:

From the horse's, er, snake's mouth :)

Talk Python to Me "is a weekly podcast hosted by Michael Kennedy. The show covers a wide array of Python topics as well as many related topics (e.g. MongoDB, AngularJS, DevOps)."

The format is a casual 30 minute conversation with industry experts.

I just came across it today, and am checking out the site a bit. The site itself looks good, visually, I mean.

Some of the podcasts also have text transcripts.

I'm reading one of the transcripts, Transcript for Episode #8, of the conversation with Dr. James Curran:

Teaching Python at Grok Learning and Classrooms

Excerpt:

"James Curran is an associate professor in computer science at the University of Sidney and co-founder of Grok Learning, which you can find at groklearning.com. James has been teaching computer science to students and teachers for over a decade. In 2010 he was named one of Sidney magazines top 100 influential people for his work in computer science education."

(I guess the 'Sidney' spelling is due to an error in the automated or manual transcription of the podcast.)

Anyway, the transcript is interesting, since it is about Python and education/training, which are both among my interests.

An interesting point mentioned in the talk is that Australia is mandating text-based computer programming in its schools, as opposed to only visual programming with tools like Scratch. I think that is a good idea: being able to work only with GUIs, with no skill in text-based tools (such as text editors and the command line), is not a good thing, IMO. I've come across, and once interviewed (for a client), some Java "programmers" who could not write a simple Java program without reaching for Eclipse. What the hell. Not knocking Java; I'm sure there must be people like that for other languages too. [ Dons latest flame shield made of modern composites in advance ... :) ]

Dr. Curran on the topic:

"So we have done a lot of work with teaching students in seventh and eighth grade, and I think that that is the ideal zone in fact the Australian curriculum that I was just involved in writing has mandated that the kids will learn a text based programming language as opposed to something like visual language, like Scratch. So text based programming is mandated for year seventh and eighth, and Python I think is the ideal language to be teaching there."

- Vasudev Ram - Online Python training and programming - Dancing Bison Enterprises. Sign up to hear about new products or services that I create. Posts about Python | Posts about xtopdf | Contact Page

Vasudev Ram

22 May 2015 12:52am GMT

10 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: King Willams Town Station

Yesterday morning I had to go to the station in KWT to pick up our reserved bus tickets for the Christmas holidays in Cape Town. The station itself has had no train service since December for cost reasons - but Translux and co., the long-distance bus companies, have their offices there.


Larger map view




© benste CC NC SA

10 Nov 2011 10:57am GMT

09 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein

Nobody is worried about something like this - you just drive past it by car, and in the city - near Gnobie - "no, it only gets dangerous once the fire brigade is there" - 30 minutes later, on the way back, the fire brigade was there.




© benste CC NC SA

09 Nov 2011 8:25pm GMT

08 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Brai Party

Brai = barbecue evening or the like.

The would-be technicians mending their SpeakOn / jack plug splitters...

Die Damen "Mamas" der Siedlung bei der offiziellen Eröffnungsrede

Even though fewer people turned up than expected - loud music and lots of people ...

And of course a fire with real wood for grilling.

© benste CC NC SA

08 Nov 2011 2:30pm GMT

07 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Lumanyano Primary

One of our missions was bringing Katja's Linux Server back to her room. While doing that we saw her new decoration.

Björn and Simphiwe carried the PC to Katja's school


© benste CC NC SA

07 Nov 2011 2:00pm GMT

06 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Nelisa Haircut

Today I went with Björn to Needs Camp to visit Katja's guest family for a special party. First of all we visited some friends of Nelisa - yeah, the one I'm working with in Quigney - Katja's guest father's sister - who gave her a haircut.

African women usually get their hair done by arranging extensions, not, like Europeans, just by cutting some hair.

In between she looked like this...

And then she was done - looks amazing considering the amount of hair she had last week - doesn't it ?

© benste CC NC SA

06 Nov 2011 7:45pm GMT

05 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: My Saturday

Somehow it struck me today that I need to restructure my blog posts a bit - if I only ever report on new places, I would basically have to be on a permanent round trip. So here are a few things from my everyday life today.

First of all, Saturday counts as a day off, at least for us volunteers.

This weekend only Rommel and I are on the farm - Katja and Björn are now at their placements, and my housemates Kyle and Jonathan are at home in Grahamstown - as is Sipho, who lives in Dimbaza.
Robin, Rommel's wife, has been in Woodie Cape since Thursday to take care of a few things there.
Anyway, this morning we first treated ourselves to a shared Weetbix/muesli breakfast and then set off for East London. Two things were on the checklist: Vodacom and Ethienne (the estate agent), plus dropping off the missing items at NeedsCamp on the way back.

Just after setting off on the dirt road we realised that we had not packed the things for NeedsCamp and Ethienne, but did have the pump for the water supply in the car.

So in East London we first drove to Farmerama - no, not the online game Farmville, but a shop with all sorts of things for a farm - in Berea, a northern suburb.

At Farmerama we got advice on a quick-release coupling that should make life with the pump easier, and we also dropped off a lighter pump for repair, so that it isn't such a big effort every time the water runs out again.

Fego Caffé is in the Hemmingways Mall; there we had to get the PIN and PUK for one of our data SIM cards, because some digits were unfortunately transposed when entering the PIN. Well, in any case the shops in South Africa store data as sensitive as a PUK - which in principle gives access to a locked phone.

In the café Rommel then did a few online transactions with the 3G modem, which was working again - and which, by the way, now works perfectly in Ubuntu, my Linux system.

On the side I went over to 8ta to find out about their new deals, since we want to offer internet in some of Hilltop's centres. The picture shows the UMTS coverage in NeedsCamp, Katja's place. 8ta is a new phone provider from Telkom; after Vodafone bought Telkom's share of Vodacom, they have to build everything up from scratch.
We decided to organise a free prepaid card to test, because who knows how accurate the coverage map above is ... Before signing even the cheapest 24-month deal, you should know whether it works.

After that we went to Checkers in Vincent, looking for two hotplates for WoodyCape - R 129.00 each - so about 12€ for a two-ring hotplate.
As you can see in the background, there are already Christmas decorations - at the beginning of November, and that in South Africa at a sunny, warm 25°C or more.

We treated ourselves to lunch at a Pakistani curry takeaway - highly recommended!
Well, and after we got back an hour or so ago, I cleaned the fridge, which I had simply put outside this morning to defrost. Now it is clean again and free of its 3 m thick layer of ice...

Tomorrow ... well, I will report on that separately ... but probably not until Monday, because then I will be back in Quigney (East London) and have free internet.

© benste CC NC SA

05 Nov 2011 4:33pm GMT

31 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Sterkspruit Computer Center

Sterkspruit is one of Hilltop's computer centres in the far north of the Eastern Cape. On the trip to J'burg we used the opportunity to take a look at the centre.

Pupils in the big classroom


The Trainer


School in Countryside


Adult Class in the Afternoon


"Town"


© benste CC NC SA

31 Oct 2011 4:58pm GMT

Benedict Stein: Technical Issues

What are you doing in an internet cafe if your ADSL and fax line have been discontinued before month's end? Well, my idea was sitting outside and eating some ice cream.
At least it's sunny and not as rainy as on the weekend.


© benste CC NC SA

31 Oct 2011 3:11pm GMT

30 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Nellis Restaurant

For those who are travelling through Zastron - there is a very nice restaurant which serves delicious food at reasonable prices.
In addition, they sell home-made juices, jams and honey.




interior


home made specialities - the shop in the shop


the Bar


© benste CC NC SA

30 Oct 2011 4:47pm GMT

29 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: The way back from J'burg

Having the 10-12h trip from J'burg back to ELS, I was able to take a lot of pictures, including these different roadsides

Plain Street


Orange River in its beginnings (near Lesotho)


Zastron Anglican Church


The Bridge in Between "Free State" and Eastern Cape next to Zastron


my new Background ;)


If you listen to Google Maps you'll end up travelling 50 km of gravel road - as it was just renewed we didn't have that many problems and saved 1h compared to going the official way with all its construction sites




Freeway


getting dark


© benste CC NC SA

29 Oct 2011 4:23pm GMT

28 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: How does a road construction site actually work?

Sure, some things may be different and a lot the same - but a road construction site is an everyday sight in Germany - how does that actually work in South Africa?

First of all - NO, there are no natives digging with their hands - even though more manpower is used here, they are busily working with technology.

A perfectly normal "Bundesstraße" (federal road)


and how it is being widened


loooots of trucks


because here one side is completely closed over a long stretch, resulting in a traffic-light setup with - in this case - a 45-minute wait


But at least they seem to be having fun ;) - as did we, since luckily we never had to wait longer than 10 minutes.

© benste CC NC SA

28 Oct 2011 4:20pm GMT