24 Oct 2014

Planet Python

Peter Bengtsson: Go vs. Python

tl;dr; It's not a competition! I'm just comparing Go and Python. So I can learn Go.

So recently I've been trying to learn Go. It's a modern programming language that started at Google but has very little to do with Google except that some of its core contributors are staff at Google.

The true strength of Go is that it's succinct and minimalistic and fast. It's not a scripting language like Python or Ruby but lots of people write scripts with it. It's growing in popularity with systems people but web developers like me have started to pay attention too.

The best way to learn a language is to do something with it. Build something. I don't disagree with that, but I felt I needed to cover the basics first, and instead of taking notes I decided to learn by comparing Go to something I know well: Python. I did this a zillion years ago when I tried to learn ZPT by comparing it to DTML, which I already knew well.

My free time is very limited so I'm taking things in small, careful baby steps. I read through An Introduction to Programming in Go by Caleb Doxsey in a couple of afternoons, and then I decided to spend a couple of minutes every day on each chapter, implementing something from the book and comparing it to how you'd do it in Python.

I also added some slightly fuller examples, such as Markdownserver, which was fun because it showed that a simple Go HTTP server that does something can be 10 times faster than the Python equivalent.

What I've learned

24 Oct 2014 10:03pm GMT

Brett Cannon: Telephony in the modern age

Making voice calls over the telephone network is now antiquated. There are practically no good financial reasons to have a landline anymore if you can afford Internet at home and a smartphone. Consider this blog post an open letter to my parents and in-laws in which I attempt to convince them that their landlines are unnecessary and that there are better, cheaper options for talking to me and everyone else who isn't already a free phone call on their mobile phones.

In this post I'm working from the assumption that everyone has at least one smartphone with a non-data phone plan in the house and everyone has wireless internet throughout their home. I'm also assuming their phone is used to make calls and receive calls and nothing else, e.g. no fax machine. I'm also going to use pricing and such from Canada because that's what I'm most familiar with, but since we have some of the highest telecom rates in the world it means I'm being somewhat liberal for my US family members when making price comparisons (but as you will see, essentially anything above free is wasted money when it comes to a landline).

Making calls

At home

When making a call from a landline you need to consider local, long distance, and international calls. Let's assume you're living in Vancouver, putting you under the purview of Telus. Local calls are unlimited and free with your monthly charge of $30/month (past the first six months, with no bundling discount; otherwise it's $25/month in a bundle). You're going to call people outside of your local calling area, which means long distance, so we will toss in the $6/month for 300 minutes of Canada/US calling ($0.02/minute if you manage to use all the minutes). Internationally, it's $0.05/minute to call the UK, for example. So the cost ranges from $25/month if you have another Telus service and never call outside your local area, to let's say $38/month for no bundling discount, 300 long-distance minutes, and a 60-minute call to the UK.

Now let's look at what you can do with your mobile phone. From the outset I assumed people had a smartphone with the most basic plan available; that's the WIND 25 plan from WIND Mobile. That gives you unlimited Canada-wide calling but no data. So right there, for $25/month, you cover everything except US calls (the $6/month Canada+US add-on) and the 60-minute call to the UK. Now WIND charges $0.15/minute for any Canada/US calls and $0.75/minute to UK numbers (landline or mobile).

But really none of these extra costs matter. If you install Google+ Hangouts you can make any of these calls cheaper than a landline or through WIND using nothing more than the WiFi in your house and your Google+ account. And I really want to get the point across that since this uses the WiFi in the house you are not gaining or losing anything compared to a landline in terms of accessibility. This means lacking a data plan is not a problem, nor is lacking good cell service in the house since what I'm about to suggest is over the home internet connection.

If you're on an iPhone or iPad you need Hangouts for iOS, and if you're on Android you need both Hangouts and the Hangouts Dialer. With that software installed you can use Hangouts to make phone calls to any phone number in the world. All calls to Canada and the US are free, with calls to the UK being $0.01 - $0.03/minute depending on whether you're calling a landline or a mobile. That makes Hangouts either equivalent to or cheaper than either the landline or WIND for Canada or US calls, depending on distance, and the rate to the UK is less than either a landline or mobile phone call no matter what.

Now there is one drawback to making calls this way: the caller ID will show up as "Anonymous" for the person you are calling. If you live in the US, though, you can get Google Voice, and that will make the call show up as your Google Voice number. And since you can have Google Voice ring your landline and Hangouts on your mobile phone simultaneously, you can help transition people over to that number before you cancel your landline and have people not notice the switch (or you can port your landline number to Google Voice and not have to tell anyone you changed numbers). And if the people you are calling happen to have Hangouts installed, then you can make a Hangouts-to-Hangouts call, which will show up as you. The call will also be much clearer, since the internet can carry better audio than the phone network can.

The only true loss of convenience for placing a call is having to carry your mobile phone with you in the house, compared to having multiple phones strewn about. But since Hangouts also works on tablets and computers, the only way to lose convenient access to a phone is to not be logged into Hangouts on any device in the room that has a mic and speakers. Or you simply go to where your mobile phone is. =)

Away from home

Now mobile phones are mobile, so what about when you are out and about? Can the benefit of using Hangouts for phone calls be extended to when you are away from home? The answer is yes.

When I have used Hangouts to call a landline, I have noticed the bandwidth used is roughly 750KB/minute. WIND charges $0.05/MB when roaming. That means making a call over Hangouts while roaming costs less than $0.04/minute (750KB x $0.05/MB is about $0.0375/minute). That is less than a call to the UK on the landline. It's also cheaper than a roaming call on WIND. So even in a pinch it's still relatively cheap.

But let's say you bumped up to the WIND 35 plan that gives you unlimited provincial calls but also unlimited data. In the end it really isn't going to matter that your free calls go from national to provincial since you will use Hangouts for any non-free calls anyway. So for an extra $10/month you can now make the same free calls anywhere you happen to be with your mobile phone along with mobile data to check your emails, etc. And if you compare the worst-case costs for the landline, it's $37/month vs. $38/month for the 300 minutes of Canada + US calls and an hour call to the UK, or $35/month vs. $36/month if you leave out the UK call. The only way to shave $4 off your monthly bill compared to a mobile phone is to have a bundle with Telus ($25/month + $6 for 300 long-distance minutes).

But this is assuming it's mobile vs. landline. In fact it's usually mobile and landline, which makes this comparison borderline pointless and simply shows there is literally no reason financially to keep a landline anymore when looking at it from the perspective of making a phone call (and if anyone says "911" then I will point out that by law, phone companies have to service a 911 call even if you don't pay for a phone line so you can keep a phone plugged in for emergencies if you want).

Receiving calls

If you ditch your landline, how do you receive calls? Well, presumably people have your cell number, so you can just answer your mobile phone. As mentioned above, if you have multiple phones in your house, this is the one inconvenience you take on in exchange for saving some money every month by ditching your landline. Now if someone calls your Google Voice number, you actually mitigate this inconvenience, as Hangouts will ring on your mobile phone thanks to the WiFi in the house, on your tablets, and on any computer where you have the Chrome extension or Chrome app installed or are logged into Inbox or Gmail (the app is Chrome OS and Windows only; the extension works on any OS).

The only other inconvenience of ditching the landline at this point is what to do when the power goes out. That requires actually weighing how often you make or receive phone calls during a power outage in a year against the couple hundred dollars a year you save (and do realize that's enough to buy a new Chromebook every year, so it's not a small amount of money).

Alternatives

Now my entire family has Google+ accounts, so pointing them at Hangouts is no issue. But if for some reason you are not in such a position, you do have some alternatives for making voice calls. You can use something like Facebook Messenger or Skype, but the former only supports Messenger-to-Messenger calls and Skype charges to call a phone number. Talky lets you do video calls using nothing but your browser, as long as it's Chrome, Firefox, or Opera (which, for the sake of whoever provides your tech support, I hope it is from a security perspective). Lastly, you can always switch to a VoIP-based solution like Shaw, Vonage, or Ooma if you truly cannot let go of your landline, but that's still throwing money away compared to just using your smartphone and Hangouts.

Conclusion

Using a mobile phone and Hangouts to make calls will cost you caller ID, unless you live in the States in which case it will simply come from your Google Voice number. It will also require you to use a device with Hangouts installed on it to place a call (mobile phone, tablet, or laptop). For receiving calls you only use your mobile phone, unless you have Google Voice in which case any device you can place a call with will also be able to receive a call to your Google Voice number (complete with transcribed voicemail).

For all of those "drawbacks", you get to drop your landline and save a couple hundred dollars a year. You also simplify what is required for people to call you by having one less phone number for them to try. And by actively using Hangouts to make calls, you will also be actively using Hangouts itself, which means you can use it for messaging too. Anyone with children will enjoy that, as Hangouts makes for a much easier, more prompt reply than trying to get hold of your children with a phone call. And with Google Voice you can even send SMS messages if your children are "old school" and have not transitioned over to messaging apps yet. And finally, it means when you travel there won't be any issues placing or receiving calls, since almost everyone has WiFi where they stay.

24 Oct 2014 7:29pm GMT

Mike Driscoll: Using Python to Log Data to Loggly

One of my readers suggested that I should try logging my data to a web service called Loggly. As I understand it, Loggly is a way to share log data with everyone in a business so that you no longer need to log in to individual machines. They also provide graphs, filters, and searches of the logs. They don't have a Python API, but it's still pretty easy to send data to Loggly via Python's urllib2 module and simplejson. Also note that you can use Loggly for a 30-day trial period.

Let's take a look at some code. This code is based on the code from my article about logging currently running processes. You will need the following modules installed for this example to work: psutil and simplejson.

I just used pip to install them both. Now that we have those, let's see how to use them to connect to Loggly:

import psutil
import simplejson
import time
import urllib2

#----------------------------------------------------------------------
def log():
    """Send CPU and memory usage of running processes to Loggly."""
    token = "YOUR-LOGGLY-TOKEN"
    url = "https://logs-01.loggly.com/inputs/%s/tag/python/" % token

    proc_dict = {}

    while True:
        for proc in psutil.get_process_list():
            try:
                name = proc.name()
                cpu_percent = proc.get_cpu_percent()
                mem_percent = proc.get_memory_percent()
            except psutil.Error:
                # the process ended (or denied us access) before we
                # could query it, so skip it
                continue

            data = {"cpu_percent": str(cpu_percent),
                    "mem_percent": str(mem_percent),
                    }
            proc_dict[name] = data

        log_data = simplejson.dumps(proc_dict)
        urllib2.urlopen(url, log_data)
        time.sleep(60)

if __name__ == "__main__":
    log()

This is a pretty simple function, but let's break it down anyway. First off, we set our Loggly token and create a Loggly URL to send our data to. Then we create an infinite loop that grabs a list of currently running processes every 60 seconds. Next we extract the bits that we want to log and put those pieces into a dictionary of dictionaries. Finally we use simplejson's dumps method to turn our nested dictionary into a JSON-formatted string and pass that to our URL. This sends the log data to Loggly, where it can get parsed.
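One optional refinement (a sketch on my part, not from the original article): you can attach an explicit JSON content type by building a urllib2.Request instead of passing the URL straight to urlopen, which should let Loggly treat the event as JSON.

import simplejson
import urllib2

def send_to_loggly(url, payload_dict):
    # urllib2 issues a POST whenever a data argument is supplied
    request = urllib2.Request(url, simplejson.dumps(payload_dict),
                              {"Content-Type": "application/json"})
    return urllib2.urlopen(request)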

Once you've sent enough data to Loggly for it to analyze, you can login to your account and see some bar charts that are created automatically based on your data. My data in this example didn't translate very well into graphs or trends, so those ended up looking pretty boring. I would recommend sending a system log or something else that contains more variety in it to get a better understanding of how useful this service would be for you.


Related Code

  • One of my readers came up with a modified version of my example

24 Oct 2014 5:15pm GMT

David Winterbottom: Bootstrapped virtualenvs

The excellent virtualenvwrapper supports a postmkvirtualenv script to bootstrap your virtual environments. Here's a useful implementation:

#!/usr/bin/env bash

# Grab project name from virtualenv name
NAME=$(basename $VIRTUAL_ENV)

# Set terminal title on postactivate
echo "title $NAME" > $VIRTUAL_ENV/bin/postactivate

# Change directory to root of project on postactivate. We assume
# mkvirtualenv is being run from the root of the project. This line
# will need to be edited later if not.
echo "cd $PWD" >> $VIRTUAL_ENV/bin/postactivate

# Run postactivate now to get the title set
source $VIRTUAL_ENV/bin/postactivate

This ensures that a new virtualenv has a postactivate script which:

  1. Sets the terminal title to that of the virtualenv
  2. Changes directory to the root of the project

By convention, such a script ...

24 Oct 2014 4:38pm GMT

PyTennessee: Keynote: Jesse Noller

Jesse Noller (Twitter)

Jesse Noller is a long-time Python community member and developer who has contributed to everything from distributed systems to front-end interfaces. He's passionate about community, developer experience, and empowering developers everywhere, in any language, to build amazing applications. He currently works for Rackspace as a Developer Advocate and open source contributor.

Only 21 more days to get your proposal in!

More info at https://www.pytennessee.org/speaking/cfp/

24 Oct 2014 1:22pm GMT

eGenix.com: eGenix pyOpenSSL Distribution 0.13.5 GA

Introduction

The eGenix.com pyOpenSSL Distribution includes everything you need to get started with SSL in Python. It comes with an easy-to-use installer that includes the most recent OpenSSL library versions in pre-compiled form, making your application independent of OS-provided OpenSSL libraries:

>>> eGenix pyOpenSSL Distribution Page

pyOpenSSL is an open-source Python add-on that allows writing SSL-aware networking applications as well as certificate management tools. It uses the OpenSSL library as a performant and robust SSL engine.

OpenSSL is an open-source implementation of the SSL/TLS protocol.

News

This new release of the eGenix.com pyOpenSSL Distribution updates the included OpenSSL version to the latest OpenSSL 1.0.1h version and adds a few more context options:

New in OpenSSL

New in eGenix pyOpenSSL

pyOpenSSL / OpenSSL Binaries Included

In addition to providing sources, we make binaries available that include both pyOpenSSL and the necessary OpenSSL libraries for all supported platforms: Windows x86 and x64, Linux x86 and x64, Mac OS X PPC, x86 and x64.

We have also added .egg-file distribution versions of our eGenix.com pyOpenSSL Distribution for Windows, Linux and Mac OS X to the available download options. These make setups using e.g. zc.buildout and other egg-file based installers a lot easier.

Downloads

Please visit the eGenix pyOpenSSL Distribution page for downloads, instructions on installation and documentation of the package.

Upgrading

Before installing this version of pyOpenSSL, please make sure that you uninstall any previously installed pyOpenSSL version. Otherwise, you could end up not using the included OpenSSL libs.

More Information

For more information on the eGenix pyOpenSSL Distribution, licensing and download instructions, please write to sales@egenix.com.

Enjoy !

Marc-Andre Lemburg, eGenix.com

24 Oct 2014 8:00am GMT

Yasoob Khalid: How to become a programmer, or the art of Googling well

Yasoob:

Not particularly related to Python but still a good read for every programmer :)

Originally posted on okepi:

*Note: Please read all italicized technical words as if they were in a foreign language.

The fall semester of my senior year, I was having some serious self-confidence issues. I had slowly come to realize that I did not, in fact, want to become a researcher. Statistics pained me, and the seemingly endless and fruitless nature of research bored me. I was someone who was driven by results - tangible products with deadlines that, upon completion, had a binary state: success, or failure. Going into my senior year, this revelation was followed by another. All of my skills thus far had been cultivated for research. If I wasn't going into research, I had… nothing.

At a liberal arts college, being a computer science major does not mean you are a "hacker". It can mean something as simple as, you were shopping around different departments, saw a command line for the…

24 Oct 2014 6:46am GMT

Vasudev Ram: Print selected text pages to PDF with Python, selpg and xtopdf on Linux

By Vasudev Ram



In a recent blog post, titled My IBM developerWorks article, I talked about a tutorial that I had written for IBM developerWorks a while ago. The tutorial showed some of the recommended techniques and practices to follow when writing a Linux command-line utility that is intended for production use, and how to write it in such a way that it can easily cooperate with existing UNIX command-line tools, when used in a UNIX command pipeline.

This ability of properly written command-line tools to cooperate with each other when used in a pipeline, is, as I said in that IBM article, one of the keys to the power of Linux (and UNIX) as a development environment. (See the classic book The UNIX Programming Environment, for much more on this topic.)

The utility I wrote and discussed (in that IBM article), called selpg (for SELect PaGes), allows the user to select a specified range of pages from a text file. At the end of the aforementioned blog post, I had said that I would show some practical uses of the selpg utility later. I describe one such use case below, involving a combination of selpg and my xtopdf toolkit, which is a Python library for PDF creation.

(The xtopdf toolkit contains a PDF creation library, and also includes some sample applications that show how to use the library to create PDF output in various ways, and from various input sources, which is why I tend to call xtopdf a toolkit instead of just a library.)

I had written one such application of xtopdf a while ago, called StdinToPDF(.py) (for standard input to PDF). I blogged about it at the time, here:

[xtopdf] PDFWriter can create PDF from standard input. (PDFWriter is a module of xtopdf, which provides the core PDF creation functionality.)

The selpg utility can be used with StdinToPDF, in a pipeline, to select a range of pages (by starting and ending page numbers) from a (possibly large) text file, and write only those selected pages to a PDF file. Here is an example of how to do that:

First, build the selpg utility from source, for your Linux OS. selpg is only meant to work on Linux, since it uses some Linux C standard library functions, such as from stdio.h, and popen(); but you can try to run it on Windows (at your own risk), since Windows does have (had?) a POSIX subsystem, from Windows NT onward. I have used it in the past. (Update: I checked - according to this section of the Wikipedia article about POSIX, Windows may have had POSIX support only from Windows NT up to Windows 2000.) Anyway, to build selpg on Linux, follow the steps below (the $ sign is the shell prompt and not to be typed):

1. Download the source code from the sources section of the selpg project repository on Bitbucket.

Download all of these files: makefile, mk, selpg.c and showsyserr.c .

2. Make the (shell script) file mk executable, with the command:

$ chmod u+x mk

3. Then run the file mk, with:

$ ./mk

That will run the makefile that builds the selpg executable using the C compiler on your Linux box. The C compiler (invoked as cc or gcc) is installed on most mainstream Linux distributions. If it is not, you will need to install it from the repository for your Linux distribution. Sometimes only a minimal version of a C compiler is installed, which is only enough to (re)compile the kernel after making kernel parameter changes, such as for performance tuning. Consult your local Linux expert for help if such is the case.

4. Now make the file selpg executable, with the command:

$ chmod u+x selpg

5. (Optional) You can check the usage of selpg by reading the IBM tutorial article and/or running selpg without any command-line arguments:

$ ./selpg

which will show a usage message.

6. (Optional) You can run selpg a few times with some text file(s) as input, and different values for the -s and -e command-line options, to get a feel for how it works.

Now download xtopdf (which includes StdinToPDF) from here:

xtopdf on Bitbucket.

To install it, follow the steps given in this post:

Guide to installing and using xtopdf, including creating simple PDF e-books

That post was written a while ago, when xtopdf was hosted on SourceForge. So you need to make one change to the instructions given in that guide: instead of downloading xtopdf from SourceForge, as stated in Step 5 of the guide, get it from the xtopdf Bitbucket link I gave above.

(To make xtopdf work, you also have to install ReportLab, which xtopdf uses internally; the steps for that are given in my xtopdf installation guide linked above, or you can also look at the instructions in the ReportLab distribution. It is easy, just a couple of steps - download, unzip, configure a setting or two.)

Once you have both selpg and xtopdf installed, you can use selpg and StdinToPDF together. Here is an example run, to select only pages 2 through 4 from an input text file:

I wrote a simple Python program, gen_selpg_test_file.py, to create a text file that can be used to test the selpg and StdinToPDF programs together.

Here is an excerpt of the core logic of gen_selpg_test_file.py, omitting argument and error handling for brevity (I have those in the actual code):

# Generate the test file with the given filename and number of lines of text.
try:
    out_fil = open(out_filename, "w")
except IOError as ioe:
    sys.stderr.write("Error: Could not open output file {}.\n".format(out_filename))
    sys.exit(1)
for line_num in range(1, num_lines + 1):
    line = "Line #" + str(line_num).zfill(10) + "\n"
    out_fil.write(line)
out_fil.close()

I ran it like this:

$ python gen_selpg_test_file.py selpg_test_file_1000.txt 1000

to generate a text file with 1000 lines, in the file selpg_test_file_1000.txt .

Then I could run the pipeline using selpg and StdinToPDF, as described above:

$ ./selpg -s2 -e4 selpg_test_file_1000.txt | python StdinToPDF.py p2-p4.pdf

This command extracts only the specified pages (2 to 4) from the input file, and pipes them to StdinToPDF, which converts only those pages to PDF, in the filename specified at the end of the command.

After doing the above, you can open the file p2-p4.pdf in your favorite PDF reader (Evince is one PDF reader for Linux), to confirm that it contains all (and only) the lines from pages 2 to 4 of the input file selpg_test_file_1000.txt (considering 72 lines per page, which is the default that selpg uses).

Read the IBM article to see how that default can be changed - to either another number of lines per page, e.g. 66 or 80 or whatever, or to specify form feeds (ASCII code 12) as the page delimiter. Form feeds are often used as a page delimiter in text file reports generated by programs, when the reports are destined for a printer, since the form feed character causes the printer to advance the print head to the top of the next page/form (that's how the character got its name).
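To make the two page-delimiting modes concrete, here is a small illustrative Python sketch (my own, not code from selpg or xtopdf) that splits text into pages either by a fixed line count or at form feed characters:

def pages_by_line_count(text, lines_per_page=72):
    # selpg's default mode: every 72 lines is one page.
    lines = text.splitlines(True)
    for start in range(0, len(lines), lines_per_page):
        yield "".join(lines[start:start + lines_per_page])

def pages_by_form_feed(text):
    # Alternative mode: a form feed character (ASCII code 12, "\f")
    # ends each page.
    return text.split("\f")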

Though this post seemed long, note that a lot of it was either background information or instructions on how to build selpg and install xtopdf. Those are both one-time jobs. Once those are done, you can select the needed pages from any text file and print them to PDF with a single command line, as shown in the last command above.

This is useful when you printed the entire file earlier, and some pages didn't print properly because the printer jammed. Just use selpg with xtopdf to print only the needed pages again.



The image above is from the Wikipedia article on Printing, and titled:

Jikji, "Selected Teachings of Buddhist Sages and Son Masters" from Korea, the earliest known book printed with movable metal type, 1377. Bibliothèque Nationale de France, Paris

- Enjoy.

- Vasudev Ram - Dancing Bison Enterprises. Click here to get email about new products from Vasudev Ram. Contact Page

Vasudev Ram

24 Oct 2014 5:20am GMT

23 Oct 2014

Planet Python

Python Sweetness: Guerrilla optimization for PyPy: lazy protocol buffer decoding

I've been hugely distracted this year with commercial and $other work, so for-fun projects have been taking a back seat. Still, that doesn't mean my interests have changed, it's just that my energy is a bit low, and stringing out cohesive blog posts is even harder than usual. Consequently I'm trying to keep this post concise, and somewhat lighter on narrative and heavier on useful information compared to my usual rambling.

It only took 8 months, but finally I've made some fresh commits to Acid, this time progressing its internal Protocol Buffers implementation toward usefulness. Per ticket #41, the selling point of this implementation will be its ability to operate without requiring any copies (at least in the CPython extension implementation), and its ability to avoid decoding values unless explicitly requested by the user.

The ultimate goal is to let Python scan collections at close to the speed of the storage engine (6m+ rows/sec) without having to write tricksy code, import insane 3rd party code (*cough* SQL) or specialized stores (Neo4j?).

I've only prototyped the pure-Python module that will eventually be used on PyPy, trying to get a feel for what the internals of the CPython extension will need to look like, and figuring out the kinds of primitive types (and their corresponding pitfalls) the module might want to provide/avoid.

The design is pretty straightforward, although special care must be paid, e.g. when handling repeating elements, which will be represented by lists, or list-like things that know how to share memory and lazily decode.

The road to a 55x speedup

In the course of the past few days' experimentation, quite a fun story has emerged around optimizing bit-twiddling code like this to run on PyPy.

My initial implementation, based on some old code from an abandoned project, was sufficient to implement immediate decoding of the Protocol Buffer into a dict, where the dict-like Struct class would then proxy __getitem__ calls and suchlike directly on to the dict.

The downside, though, per the design requirement, is that in order to implement a row scan where only one or two fields are selected from each row during the scan, a huge penalty is paid in decoding and then discarding every other unused field.

For testing decoding/encoding, I began with a "representative" Struct/StructType containing the fields:

Field #1: varint bool, 1 byte, True
Field #2: varint list, 5x 1 byte elements, [1, 2, 3, 4, 5]
Field #3: inet4, 1x fixed size 32bit element '255.0.255.0'
Field #4: string, 1979 byte /etc/passwd file
Field #5: bool, 1 byte, True
Field #6: string, 12 bytes, my full name

Tests are done using the timeit module to either measure encoding the entire struct, or instantiating the struct from its encoded form, and accessing only field #6. We're only interested in field #6 since in a lazy implementation, it requires the most effort to locate, since the decoder must first skip over all the previous elements.
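As a rough illustration of the benchmark shape (the module, constant, and field names here are hypothetical, not the actual test script), a timeit run for the decode-plus-field-access case might look like:

import timeit

setup = "from mymodule import Struct, STRUCT_TYPE, ENCODED"
stmt = "Struct(ENCODED, STRUCT_TYPE)['full_name']"  # access field #6
# repeat() returns total seconds per batch of 100,000 runs; convert the
# best batch into microseconds per operation.
best = min(timeit.repeat(stmt, setup, repeat=3, number=100000))
print("%.3f usec per decode + field access" % (best * 10))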

On PyPy, the initial implementation was sufficient to net around 9.1usec to decode and 5.1usec to encode, corresponding to a throughput of around 100k rows/sec. Not bad for a starting point, but we can definitely do much better than that.

StringIO/BytesIO is slowww

From previous experience I knew the first place to start looking was the use of file-like objects for buffer management. Both on CPython and PyPy, use of StringIO to coordinate read-only buffer access is horrendously slow. I'm not sure I know why exactly, but I do know how to avoid it.

So first up came replace StringIO with direct access. Instead of passing a file-like object between all the parsing functions simply to track the current read offset, we pass (buf, pos) in the parameter list, and all parsing functions return (pos2, value) as their return value. The caller resumes parsing at pos2. For free, we now get IndexError thrown any time a bounds check fails for a single element access, where previously we had to check the length of the string returned by fp.read(1). The fastest code is nonexistent code.
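
In sketch form (illustrative, not the repo's actual code), the convention looks like this, assuming buf is a byte string:

def read_byte(buf, pos):
    # buf[pos] raises IndexError past the end of the buffer, which
    # replaces the old len(fp.read(1)) bounds check for free
    return pos + 1, ord(buf[pos])

def read_varint(buf, pos):
    # clean loop version; later commits unroll this
    result = shift = 0
    while True:
        pos, b = read_byte(buf, pos)
        result |= (b & 0x7f) << shift
        if not (b & 0x80):
            return pos, result
        shift += 7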

I'm not even going to attempt guessing at why this is so effective, but clearly it is: parsing time on PyPy 2.4.0 amd64 already dropped from 9.1usec to 1.69usec, all for a very simple, systematic modification to each function signature. Now we're up from 100k rows/sec to almost 600k/sec.

Not only that, but now the parser can operate on any sequence-like object whose elements are 1-character strings and which supports slicing, including e.g. mmap.mmap, which you could call the ultimate form of lazy decoding ;)

Lazy decoding take #1

Next up is Implement lazy decoding. This modifies the Struct type to simply stash the encoded buffer passed to it during initialization, and ask StructType during each __getitem__ to find and decode only the single requested element. Once the element is fetched, Struct stores it in its local dict to avoid having to decode it again.

With lazy decoding, work has shifted from heap allocating lists of integers and duplicating large 2kb strings, to simply scanning for field #6, never even having to touch a byte of that 2kb /etc/passwd file embedded in the record. Our parsing time drops from 1.69usec to 0.494usec. Now we're getting warmer - 2m Struct instantiations + field accesses/sec.
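
A toy version of the idea (simplified; the read_field helper name is an assumption, not the project's API, and encoding/mutation handling is omitted):

class Struct(object):
    def __init__(self, buf, struct_type):
        self.buf = buf                  # stash the encoded buffer as-is
        self.struct_type = struct_type
        self.dct = {}                   # fields decoded so far

    def __getitem__(self, key):
        if key not in self.dct:
            # scan the buffer for just this one field, skipping the rest
            self.dct[key] = self.struct_type.read_field(self.buf, key)
        return self.dct[key]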

inline read_key() call for read_value()

At this point I thought it was already time to break out the micro-optimizations, and so I tried inlining the read_key() function, responsible for splitting the protocol buffer's field tag varint into its component parts, moving its implementation to its only call site.

Not much juice here, but a small win - 0.44usec.

precalculate varint to avoid some shift/masking

Now we're really in micro-optimization territory. For a savings of 5nsec, precalculate some trivial math. Barely worth the effort.

specialize encode for PyPy

At this point the code was still using StringIO for the encode path, since the convenience was too hard to give up. PyPy provides a magical StringBuilder type, which knows how to incrementally build a string while avoiding (at least) the final copy when it is finalized.

As you can see from the commit message, this switch to more efficient buffering brought encode time on PyPy down considerably.

unroll write_varint()

You'll probably notice by now that the benchmark script used in the commit messages was getting edited as I went along. The numbers in the messages are a fair indication of the level of speedup occurring, but due to horrendous bugs in the initial unrolled write_varint(), I can't easily reproduce the reference runtime from this post using the current version of that script.

Usually loop unrolling is a technique reserved for extremely tricky C code, but that doesn't mean it doesn't have a place in Python land. This commit takes a while loop that can only execute up to 10 iterations and manually unfolds it, replacing all the state variables with immediate constants and very direct code.

Due to the ubiquitous use of variable-length integers in the protocol buffers scheme, in return we see a huge speed increase: encoding is now 2.5x faster than the clean "Pythonic" implementation. In some ways, this code is vastly easier to follow than the old loop, although I bet if I tried to run flake8 on the resulting file, a black hole would spontaneously form and swallow the entire universe.
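
For illustration, a partially unrolled encoder might look like this (a sketch writing to a bytearray and assuming a non-negative integer; the actual commit unrolls all ten possible bytes of a 64-bit varint):

def write_varint(ba, i):
    # ba is a bytearray output buffer; i is assumed >= 0
    if i < 0x80:
        ba.append(i)
    elif i < 0x4000:
        ba.append(0x80 | (i & 0x7f))
        ba.append(i >> 7)
    elif i < 0x200000:
        ba.append(0x80 | (i & 0x7f))
        ba.append(0x80 | ((i >> 7) & 0x7f))
        ba.append(i >> 14)
    else:
        # the fully unrolled version continues the pattern; fall back
        # to the generic loop here to keep the sketch short
        while i >= 0x80:
            ba.append(0x80 | (i & 0x7f))
            i >>= 7
        ba.append(i)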

use bytearray() instead of StringIO.

While we're targeting PyPy, that's not to say we can't also look after old CPython. Even though a C extension will exist for CPython, having the fallback implementation work well there is also beneficial.

Here we exploit the fact that bytearray.extend is generally faster on CPython than the equivalent StringIO dance, and so in this commit we bid farewell to our final use of StringIO.

Notice how the use of "if StringBuilder:" effectively avoids performing runtime checks in the hot path: instead of wiring the test into the _to_raw() function, we simply substitute the entire function with one written specifically for whatever string builder implementation is available.

For a small amount of effort, encoding on CPython is now almost 25% faster.
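
In outline, the substitution trick looks something like this (the _to_raw bodies shown are illustrative, not the project's actual code):

try:
    from __pypy__.builders import StringBuilder  # PyPy only
except ImportError:
    StringBuilder = None

if StringBuilder:
    def _to_raw(chunks):
        sb = StringBuilder()
        for chunk in chunks:    # hypothetical: iterate encoded byte strings
            sb.append(chunk)
        return sb.build()
else:
    def _to_raw(chunks):
        ba = bytearray()
        for chunk in chunks:
            ba.extend(chunk)
        return str(ba)

The interpreter check runs once at import time; the hot path only ever sees a single, already-specialized _to_raw().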

partially unroll read_varint()

We're not really interested in encoding - writes are generally always going to be a slow path in the average web app. We care mainly about decoding, and so back to looking for quick wins in decoder land.

Anyone familiar with what this code does may be noticing some rather extraordinarily obvious bugs in these commits. My only excuse is that this is experimental code, and I've already done a full working day before sitting down to it. ;)

Loop unrolling need not go whole hog. By noticing that most variable-length integers actually fit in 7 bits, here we avoid the slow ugly loop by testing explicitly for a 7-bit varint and exiting quickly in that case.

In return, decoding time on PyPy drops by another 33%.
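
Sketched, the fast-path version might look like:

def read_varint(buf, pos):
    b = ord(buf[pos])
    if b < 0x80:
        # the common case: a single-byte varint, no loop at all
        return pos + 1, b
    # otherwise fall back to the generic loop for longer varints
    result = b & 0x7f
    shift = 7
    while True:
        pos += 1
        b = ord(buf[pos])
        result |= (b & 0x7f) << shift
        if b < 0x80:
            return pos + 1, result
        shift += 7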

fully unroll read_varint and fix 64bit bugs.

Aware of all the bit-twiddling terrors of the past few days, tonight I rewrote write_varint/read_varint to ensure that these functions do what they claim.

In the process I unrolled the read_varint loop. I don't have benchmarks here, since by this point my benchmarking script would crash due to the aforementioned bugs.

The remainder of the loop unrolling probably isn't so helpful, since most varints are quite small, but at least it is very easy to understand the varint format from reading the code.

only cache mutable values in Struct dict.

We're already down to 0.209usec/field access on PyPy, corresponding to somewhere in the region of 4.9m scanned rows/sec.

At this point I re-read ticket #41, realized introducing the "avoid work" cache of decoded elements was not part of the original design, and probably also wasn't helping performance.

For mutable elements we always track the value returned to the user, since if they modify it and later re-serialize the Struct, they expect their edits to be reflected. So I removed the cache for everything else, preserving it for mutable elements only.

In return, PyPy rewards us with a delicious special case, relating to its collection strategies feature. On PyPy, a dict is not really ever a dict. It is one of about 5 different implementations, depending on the contents of the dict.

In the case of a dict that has never been populated with a single element, only 3 machine words are allocated for it (type, strategy, storage), and it's configured to use the initial EmptyDictStrategy.

By removing a performance misfeature, the runtime has a better chance to do its job, and single field decoding becomes 25% faster, yielding our final instantiate + field access time of 0.168usec.

constant time skip function selection [late addition]

This change splits up the _skip() function, responsible for skipping over unknown or unwanted fields, into a set of functions stored in a map keyed by wire type.

This has no discernible effect on my PyPy microbenchmark, but it wins nearly 11% on CPython.
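
In miniature, the dispatch looks something like this (wire type numbers per the protocol buffers spec: 0 = varint, 1 = 64-bit, 2 = length-delimited, 5 = 32-bit; function names are illustrative):

def _skip_varint(buf, pos):
    while ord(buf[pos]) & 0x80:
        pos += 1
    return pos + 1

def _skip_64(buf, pos):
    return pos + 8

def _skip_delimited(buf, pos):
    pos, n = read_varint(buf, pos)   # length prefix, then n payload bytes
    return pos + n

def _skip_32(buf, pos):
    return pos + 4

SKIP = {0: _skip_varint, 1: _skip_64, 2: _skip_delimited, 5: _skip_32}

# selecting the skip function is now a single dict lookup:
# pos = SKIP[wire_type](buf, pos)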

remove method call & reorder branch to avoid jump in the usual case [late addition]

Another micro-optimization with no benefit on PyPy. It swaps the order of the branches in an if: statement to avoid a jump on CPython, and additionally replaces a method call with a primitive operation (thus avoiding e.g. building the argument tuple).

Yields another 3% on CPython.

Futures

I notice that 15% of time is spent in Struct.__getitem__, mostly trying to figure out whether the key is a valid field or not, and whether the result value is mutable.

We can get some of this dispatch 'for free' by folding the lookup into the type system by producing proper Struct subclasses for each StructType, and introducing a BoundField type that implements the descriptor protocol.

But that means abandoning the dict interface, and significantly complicating the CPython extension implementation when it comes time to write that, and I'm not sure just how much of the 15% can really be recovered by taking this approach.


So there we have it: in a few hours we've gone from 100k rows/sec to upwards of 6 million/sec, all through just a little mechanical sympathy. Now imagine this code scaled up on the average software project, and the hardware cost savings involved should the code see any heavy traffic. But of course, there is never any real business benefit to wasting time on optimization like this! And of course, let's not forget how ugly and un-pythonic the resulting mess is.

Ok, I'm all done for now. If you know any more tricks relating to this code, please drop me a line. Email address is available in git logs, or press the Ask Me Anything link to the right of this text. Remember to include an email address!

23 Oct 2014 9:46pm GMT

Lennart Regebro: 59% of maintained packages support Python 3

I ran some statistics on PyPI:

Of the maintained packages:

This means:

So: The 59% of maintained packages that declare what version they support, support Python 3.

And if you wonder: "Maintained" means at least one version released this year (with files uploaded to PyPI) *or* at least 3 versions released in the last three years.


Filed under: python, python3 Tagged: python, python 3

23 Oct 2014 7:14pm GMT


Mike Driscoll: PyWin32: How to Get an Application’s Version Number

Occasionally you will need to know what version of software you are using. The normal way to find this information out is usually done by opening the program, going to its Help menu and clicking the About menu item. But this is a Python blog and we want to do it programmatically! To do that on a Windows machine, we need PyWin32. In this article, we'll look at two different methods of getting the version number of an application.


Getting the Version with win32api

First off, we'll get the version number using PyWin32's win32api module. It's actually quite easy to use. Let's take a look:

from win32api import GetFileVersionInfo, LOWORD, HIWORD

def get_version_number(filename):
    try:
        info = GetFileVersionInfo(filename, "\\")
        # the four version components are packed into two 32-bit values
        ms = info['FileVersionMS']
        ls = info['FileVersionLS']
        return HIWORD(ms), LOWORD(ms), HIWORD(ls), LOWORD(ls)
    except:
        return "Unknown version"

if __name__ == "__main__":
    version = ".".join([str(i) for i in get_version_number(
        r'C:\Program Files\Internet Explorer\iexplore.exe')])
    print version

Here we call GetFileVersionInfo with a path and then attempt to parse the result. If we cannot parse it, then that means that the method didn't return us anything useful and that will cause an exception to be raised. We catch the exception and just return a string that tells us we couldn't find a version number. For this example, we check to see what version of Internet Explorer is installed.


Getting the Version with win32com

To make things more interesting, in the following example we check Google Chrome's version number using PyWin32's win32com module. Let's take a look:

# based on http://stackoverflow.com/questions/580924/python-windows-file-version-attribute
from win32com.client import Dispatch
 
def get_version_via_com(filename):
    parser = Dispatch("Scripting.FileSystemObject")
    version = parser.GetFileVersion(filename)
    return version
 
if __name__ == "__main__":
    path = r"C:\Program Files\Google\Chrome\Application\chrome.exe"
    print get_version_via_com(path)

All we do here is import win32com's Dispatch class and create an instance of that class. Next we call its GetFileVersion method and pass it the path to our executable. Finally we return the result which will be either the number or a message saying that no version information was available. I like this second method a bit more in that it automatically returns a message when no version information was found.


Wrapping Up

Now you know how to check an application version number on Windows. This can be helpful if you need to check if key software needs to be upgraded or perhaps you need to make sure it hasn't been upgraded because some other application requires the older version.


Related Reading

23 Oct 2014 12:30pm GMT


Kushal Das: More Fedora in life

I have been using Fedora since the very first release, and started contributing to the project around 2005. I worked on Fedora in my free time: before I joined Red Hat in 2008, while I worked at Red Hat, and after I left Red Hat last year.

But for the last two weeks I have been working on Fedora not only in my free time but also as my day job. I am the Fedora Cloud Engineer, part of the Fedora Engineering team and of the amazing community of long-time Fedora Friends.

23 Oct 2014 9:46am GMT


Kushal Das: Using docker in Fedora for your development work

Last week I worked on DNF for the first time. In this post I am going to explain how I used Docker and a Fedora cloud instance for the same.

I was using a CentOS VM as my primary work system for the last two weeks, and I had access to a cloud. I created a Fedora 20 instance there.

The first step was to install docker in it and update the system; I also had to upgrade the selinux-policy package and reboot the instance.

# yum upgrade selinux-policy -y; yum update -y
# reboot
# yum install docker-io
# systemctl start docker
# systemctl enable docker

Then pull in the Fedora 21 Docker image.

# docker pull fedora:21

The above command will take time as it will download the image. After this we will start a Fedora 21 container.

# docker run -t -i fedora:21 /bin/bash

We will install all the required dependencies in the image, using yum as you normally would, and then get out by pressing Ctrl+d.

[root@3e5de622ac00 /]# yum install dnf python-nose python-mock cmake -y

Now we can commit this as a new image so that we can reuse it in the future. We do this with the docker commit command.

#  docker commit -m "with dnf" -a "Kushal Das" 3e5de622ac00 kushaldas/dnfimage

After this, the only thing left is to start a container from this newly created image, with a directory mounted from the host machine.

# docker run -t -i -v /opt/dnf:/opt/dnf kushaldas/dnfimage /bin/bash

This command assumes the code is already in the /opt/dnf of the host system. Even if I managed to do something bad in that container, my actual host is safe. I just have to get out of the container and start a new one.

23 Oct 2014 9:30am GMT


Python Piedmont Triad User Group: PYPTUG Meeting - October 27th

PYthon Piedmont Triad User Group meeting

Come join PYPTUG at our next meeting (October 27th 2014) to learn more about the Python programming language, modules and tools. Python is the perfect language to learn if you've never programmed before, and at the other end, it is also the perfect tool that no expert would do without.


What

Meeting will start at 5:30pm.

We will open with an intro to PYPTUG and how to get started with Python, PYPTUG activities and members' projects, then move on to news from the community.

This month we will have a tutorial review followed by a main talk.

Internet Tutorial Review

Continuing on the review last month of Gizeh (Cairo for Tourists), this month we will review an Internet tutorial on creating digital coupons: "Branded MMS coupon generation", and a few ways that this could be made better. This should be of interest to many: mobile devs, devops, infrastructure architects, web app devs, marketers, CIO/CTOs.

Main Talk

by Francois Dion
Title: "Mystery Python Theater 3K: What should be your next step"

Bio: Francois Dion is the founder of PYPTUG. In the few words of his blog's profile he is an "Entrepreneur, Hacker, Mentor, Polyglot, Polymath, Musician, Photographer"

Abstract: Francois will talk about Python 3, why it should be on your radar, what some of the differences are, and how you should prepare for a transition. He will also review some specific cases in different fields, and what kind of changes had to be done, such as the case of the MMA software, a "Band in a Box" style Python program that lets you create full MIDI scores from basic chords, with Python 2 or 3.

Lightning talks!


We will have some time for extemporaneous "lightning talks" of 5-10 minute duration. If you'd like to do one, some suggestions for talks were provided here if you are looking for inspiration. Or talk about a project you are working on.






When

Monday, October 27th 2014
Meeting starts at 5:30PM

Where

Wake Forest University,

close to Polo Rd and University Parkway:

Manchester Hall
room: Manchester 241
Wake Forest University, Winston-Salem, NC 27109

Map this

See also this campus map (PDF) and also the Parking Map (PDF) (Manchester hall is #20A on the parking map)

And speaking of parking: parking after 5pm is on a first-come, first-served basis. The official parking policy is:
"Visitors can park in any general parking lot on campus. Visitors should avoid reserved spaces, faculty/staff lots, fire lanes or other restricted area on campus. Frequent visitors should contact Parking and Transportation to register for a parking permit."

Mailing List


Don't forget to sign up to our user group mailing list:

https://groups.google.com/d/forum/pyptug?hl=en

It is the only step required to become a PYPTUG member.

Meetup Group


In order to get a feel for how much food we'll need, we ask that you register your attendance to this meeting on meetup:

http://www.meetup.com/PYthon-Piedmont-Triad-User-Group-PYPTUG/events/213427092/

23 Oct 2014 7:46am GMT


Montreal Python User Group: Mercurial: An easy and powerful alternative to git!

You have probably heard about git and you understand that source control is a good thing. Who made this change? When? Why? How do I go back to before this change happened? Which change broke the code? How do I combine two different streams of the same code? How do I collaborate with others? How do I collaborate with my past self, who knew things that my present self has forgotten?

Mercurial answers all of these questions! Like git and others, Mercurial is a distributed version control system (DVCS). Big players like Python, Mozilla, Facebook and others use it to keep track of their source code. Many hosting services exist for it, such as Mozdev, Google Code, or Bitbucket.

During our workshop, we will introduce the basics of using DVCS and how to configure and use Mercurial to suit your needs. The presentation will be in English, but we encourage questions and discussions in French.

Just bring your laptop, we'll have power and wifi!

Where

Room A-3230
École de Technologie Supérieure
1100 Rue Notre-Dame Ouest
Montréal, QC H3C 1K3 (Canada)
(https://goo.gl/maps/9auBB)

When

November 6th, from 6pm until 9pm

Subscription

http://mercurial-workshop.eventbrite.ca

23 Oct 2014 4:00am GMT


22 Oct 2014

feedPlanet Python

Obey the Testing Goat: Decorators!

Someone recently wrote to me asking about decorators, and saying they found them a bit confusing. Here's a post based on the email I replied to them with.

The best way to understand decorators is to build a couple of them, so here are two examples for you to try out. The first is in the Django world, the second is actually a simpler, pure-python one.

Challenge: build a decorator in a simple Django app

We've built a very basic todo lists app using Django. It has views to deal with viewing lists, creating new lists, and adding to existing lists. Two of these views end up doing some similar work, which is to retrieve a list object from the database based on its list ID:

def add_item(request, list_id):
    list_ = List.objects.get(id=list_id)
    Item.objects.create(text=request.POST['item_text'], list=list_)
    return redirect('/lists/%d/' % (list_.id,))


def view_list(request, list_id):
    list_ = List.objects.get(id=list_id)
    return render(request, 'list.html', {'list': list_})

(Full code here)

This is a good use case for a decorator.

A decorator can be used to extract duplicated work, and also to change the arguments to a function. So we should be able to build a decorator that does the list-getting for us. Here's the target:

@get_list
def add_item(request, list_):
    Item.objects.create(text=request.POST['item_text'], list=list_)
    return redirect('/lists/%d/' % (list_.id,))


@get_list
def view_list(request, list_):
    return render(request, 'list.html', {'list': list_})

So how do we build a decorator that does that? A decorator is a function that takes a function, and returns another function that does a slightly modified version of the work the original function was doing. We want our decorator to transform the simplified view functions we have above, into something that looks like the original functions.

(you end up saying "function" a lot in any explanation of decorators...)

Here's a template:

def get_list(view_fn):

    def decorated_view(...?):
        ???
        return view_fn(...?)

    return decorated_view

Can you get it working? Thankfully, our code has tests, so they'll tell you when you get it right...

git clone -b chapter_06 https://github.com/hjwp/book-example
python3 manage.py test lists # dependencies: django 1.7
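
(And if you get completely stuck, one possible shape for the answer looks like this - a sketch, not the only way to do it:)

from functools import wraps

def get_list(view_fn):
    @wraps(view_fn)  # optional, but keeps the original view's name around
    def decorated_view(request, list_id):
        # do the shared work once, here...
        list_ = List.objects.get(id=list_id)
        # ...then call the original view with the list instead of the id
        return view_fn(request, list_)
    return decorated_view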

Some rules of thumb for decorators:

Decorators definitely are a bit brain-melting, so it may take a bit of effort to wrap your head around them. Once you get the hang of them, they're dead useful though.

A simpler decorator challenge:

If you're finding it impossible, you could start with a simpler challenge... say, building a decorator to make functions return an absolute value:

def absolute(fn):
    # this decorator currently does nothing
    def modified_fn(x):
        return fn(x)
    return modified_fn


def foo(x):
    return 1 - x

assert foo(3) == -2


@absolute
def foo(x):
    return 1 - x

assert foo(3) == 2  # this will fail; get it passing!

Try it out:

git clone https://gist.github.com/2cc523b66d9c0fe41c4b.git deccy
python3 deccy/deccy.py
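
(If you want to check your answer to this one, the fix is a one-liner in the inner function:)

def absolute(fn):
    def modified_fn(x):
        return abs(fn(x))  # wrap the original result in abs()
    return modified_fn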

Enjoy!

[update 2014-10-23 at 3pm, see also @baroque, the decorating decorator decorator]

22 Oct 2014 11:00pm GMT


10 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: King Willams Town Bahnhof

Yesterday morning I had to go to the station in KWT to pick up the bus tickets we had reserved for the Christmas holidays in Cape Town. The station itself has had no train service since December for cost reasons - but Translux and co., the long-distance bus companies, have their offices there.


Larger map view




© benste CC NC SA

10 Nov 2011 10:57am GMT

09 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein

Nobody is worried about things like this - you simply drive through by car, and in the city - near Gnobie - "no, that only gets dangerous once the fire brigade is there" - 30 minutes later, on the way back, the fire brigade was there.




© benste CC NC SA

09 Nov 2011 8:25pm GMT

08 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Brai Party

Brai = barbecue evening or similar.

They would love a technician to patch up their SpeakOn / jack plug splitters...

The ladies, the "Mamas" of the settlement, at the official opening speech

Even if fewer people came than expected - loud music and lots of people ...

And of course a fire with real wood for grilling.

© benste CC NC SA

08 Nov 2011 2:30pm GMT

07 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Lumanyano Primary

One of our missions was bringing Katja's Linux Server back to her room. While doing that we saw her new decoration.

Björn, Simphiwe carried the PC to Katja's school


© benste CC NC SA

07 Nov 2011 2:00pm GMT

06 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Nelisa Haircut

Today I went with Björn to Needs Camp to visit Katja's guest family for a special party. First of all we visited some friends of Nelisa - yeah, the one I'm working with in Quigney - Katja's guest father's sister - who gave her a haircut.

African women usually get their hair done by arranging extensions, not just cutting some hair like Europeans do.

In between she looked like this...

And then she was done - looks amazing considering the amount of hair she had last week, doesn't it?

© benste CC NC SA

06 Nov 2011 7:45pm GMT

05 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Mein Samstag

Somehow it occurred to me today that I need to restructure my blog posts a bit - if I only ever reported on new places, I would have to be on a permanent round trip. So here are a few things from my everyday life today.

First of all: Saturday counts as a day off, at least for us volunteers.

This weekend only Rommel and I are on the farm - Katja and Björn are now at their placements, and my housemates Kyle and Jonathan are at home in Grahamstown, as is Sipho, who lives in Dimbaza.
Robin, Rommel's wife, has been in Woodie Cape since Thursday to take care of a few things there.
Anyway, this morning we treated ourselves to a shared Weetbix/muesli breakfast and then set off for East London. Two things were on the checklist - Vodacom and Ethienne (the estate agent) - plus dropping off the missing items at NeedsCamp on the way back.

Just after setting off on the dirt road we realized that we hadn't packed the things for NeedsCamp and Ethienne, but did have the pump for the water supply in the car.

So in East London we first drove to Farmerama - no, not the online game Farmville, but a shop with all kinds of things for a farm - in Berea, a northern part of town.

At Farmerama we got advice on a quick-release coupling that should make life with the pump easier, and also dropped off a lighter pump for repair, so that it isn't such a big effort every time the water runs out again.

Fego Caffé is in the Hemmingways Mall; there we had to get the PIN and PUK for one of our data SIM cards, since a couple of digits had unfortunately been swapped when the PIN was entered. Anyway, shops in South Africa store data as sensitive as a PUK - which in principle gives access to a locked phone.

In the cafe Rommel then carried out a few online transactions with the 3G modem, which was now working again - and which, by the way, now works perfectly in Ubuntu, my Linux system.

On the side I went to 8ta to find out about their new deals, since we want to offer internet in some of Hilltops' centres. The picture shows the UMTS coverage in NeedsCamp, Katja's place. 8ta is a new phone provider from Telkom; after Vodafone bought Telkom's shares in Vodacom, they have to build their network up completely from scratch.
We decided to organize a free prepaid card to test, because who knows how accurate the coverage map above is ... Before signing even the cheapest 24-month deal, you should know whether it works.

After that we went to Checkers in Vincent, looking for two hotplates for Woody Cape - R 129.00 each, so about 12€ for a two-ring hotplate.
As you can see in the background, the Christmas decorations are already up - at the beginning of November, and that in South Africa at a sunny, warm 25°C or more.

For lunch we treated ourselves to a Pakistani curry takeaway - highly recommended!
Well, and after we got back an hour or so ago, I cleaned the fridge, which I had simply put outside to defrost this morning. Now it's clean again and free of its 3m-thick layer of ice...

Tomorrow ... I'll report on that separately ... but probably not until Monday, because then I'll be back in Quigney (East London) and have free internet.

© benste CC NC SA

05 Nov 2011 4:33pm GMT

31 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Sterkspruit Computer Center

Sterkspruit is one of Hilltops' Computer Centres in the far north of the Eastern Cape. On the trip to J'burg we used the opportunity to take a look at the centre.

Pupils in the big classroom


The Trainer


School in Countryside


Adult Class in the Afternoon


"Town"


© benste CC NC SA

31 Oct 2011 4:58pm GMT

Benedict Stein: Technical Issues

What do you do in an internet cafe when your ADSL and fax line have been cut off before month's end? Well, my idea was to sit outside and eat some ice cream.
At least it's sunny and not as rainy as on the weekend.


© benste CC NC SA

31 Oct 2011 3:11pm GMT

30 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Nellis Restaurant

For those traveling through Zastron - there is a very nice restaurant serving delicious food at reasonable prices.
In addition they sell home-made juices, jams and honey.




interior


home made specialities - the shop in the shop


the Bar


© benste CC NC SA

30 Oct 2011 4:47pm GMT

29 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: The way back from J'burg

On the 10-12h trip from J'burg back to ELS I was able to take a lot of pictures, including these different roadsides:

Plain Street


Orange River in its beginnings (near Lesotho)


Zastron Anglican Church


The Bridge in Between "Free State" and Eastern Cape next to Zastron


my new Background ;)


If you listen to GoogleMaps you'll end up traveling 50km of gravel road - as it was just renewed we didn't have that many problems, and saved 1h compared to going the official way with all its construction sites




Freeway


getting dark


© benste CC NC SA

29 Oct 2011 4:23pm GMT

28 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: How does a construction site actually work?

Sure, some things may be different and many the same - but the everyday sight of a road construction site in Germany - how does that actually work in South Africa?

First of all - NO, not natives digging with their hands - even though more manpower is used here, they are busily working with technology.

A perfectly normal "federal highway"


and how it is being widened


looooots of trucks


because here one side is completely closed over a long stretch, creating a traffic-light arrangement with, in this case, a 45-minute wait


But at least they seem to be having fun ;) - as did we, since luckily we never had to wait longer than 10 minutes.

© benste CC NC SA

28 Oct 2011 4:20pm GMT