21 Feb 2017

Planet Python

Chris Moffitt: Populating MS Word Templates with Python

Introduction

In a previous post, I covered one approach for generating documents using HTML templates to create a PDF. While PDF is great, the world still relies on Microsoft Word for document creation. In reality, it will be much simpler for a business user to create the desired template with all the custom formatting they need in Word than to attempt the same thing with HTML+CSS. Fortunately, there is a package that supports doing an MS Word mailmerge purely within python. This approach has the advantage of running on any system - even if Word is not installed. The benefit of using python for the merge (vs. an Excel sheet) is that you are not limited in how you retrieve or process the data. The full flexibility and power of the python ecosystem is at your fingertips. This should be a useful tool to keep in mind any time you need to automate document creation.

Background

The package that makes all of this possible is fittingly called docx-mailmerge. It is a mature package that can parse an MS Word docx file, find the merge fields and populate them with whatever values you need. The package also supports some helper functions for populating tables and generating single files with multiple page breaks.

The one comment I have about this package is that the term "mailmerge" evokes a very simple use case - populating multiple documents with mailing addresses. I know that the standard Word approach is to call this process a mailmerge, but this "mailmerge" can be a useful templating system capable of far more sophisticated solutions than just populating names and addresses in a document.

Installation

The package requires lxml, which has platform-specific binary installs. I recommend using conda to install lxml and its dependencies, then using pip for the mailmerge package itself. I tested this on Linux and Windows and it seems to work fine on both platforms.

conda install lxml
pip install docx-mailmerge

That's it. Before we show how to populate the Word fields, let's walk through creating the Word document.

Word Merge Fields

In order for docx-mailmerge to work correctly, you need to create a standard Word document and define the appropriate merge fields. The examples below are for Word 2010. Other versions of Word should be similar. It actually took me a while to figure out this process but once you do it a couple of times, it is pretty simple.

Start Word and create the basic document structure. Then place the cursor in the location where the merged data should be inserted and choose Insert -> Quick Parts -> Field...:

Word Quick Parts

From the Field dialog box, select the "MergeField" option from the Field Names list. In the Field Name box, enter the name you want for the field. In this case, we are using Business Name.

Word Add Field

Once you click OK, you should see something like this: <<Business Name>> in the Word document. You can go ahead and create the document with all the needed fields.

Simple Merge

Once you have the Word document created, merging the values is a simple operation. The code below contains the standard imports and defines the name of the Word file. In most cases, you will need to include the full path to the template but for simplicity, I am assuming it is in the same directory as your python scripts:

from __future__ import print_function
from mailmerge import MailMerge
from datetime import date

template = "Practical-Business-Python.docx"

To create a mailmerge document and look at all of the fields:

document = MailMerge(template)
print(document.get_merge_fields())
{'purchases', 'Business', 'address', 'discount', 'recipient', 'date', 'zip', 'status', 'phone_number', 'city', 'shipping_limit', 'state'}

To merge in the values and save the results, use document.merge with all of the variables assigned a value and document.write to save the output:

document.merge(
    status='Gold',
    city='Springfield',
    phone_number='800-555-5555',
    Business='Cool Shoes',
    zip='55555',
    purchases='$500,000',
    shipping_limit='$500',
    state='MO',
    address='1234 Main Street',
    date='{:%d-%b-%Y}'.format(date.today()),
    discount='5%',
    recipient='Mr. Jones')

document.write('test-output.docx')

Here is a sample of what the final document will look like:

Final Document

This is a simple document but pretty much anything you can do in Word can be turned into a template and populated in this manner.

Complex Merge

If you would like to replicate the results onto multiple pages, there is a shortcut called merge_pages which takes a list of dictionaries of key/value pairs and creates multiple pages in a single file.

In a real-world scenario you would pull the data from your master source (e.g. a database, Excel, CSV, etc.) and transform it into the required dictionary format; a sketch of that step follows at the end of this section. For the purposes of keeping this example simple, here are three customer dictionaries containing our output data:

cust_1 = {
    'status': 'Gold',
    'city': 'Springfield',
    'phone_number': '800-555-5555',
    'Business': 'Cool Shoes',
    'zip': '55555',
    'purchases': '$500,000',
    'shipping_limit': '$500',
    'state': 'MO',
    'address': '1234 Main Street',
    'date': '{:%d-%b-%Y}'.format(date.today()),
    'discount': '5%',
    'recipient': 'Mr. Jones'
}

cust_2 = {
    'status': 'Silver',
    'city': 'Columbus',
    'phone_number': '800-555-5551',
    'Business': 'Fancy Pants',
    'zip': '55551',
    'purchases': '$250,000',
    'shipping_limit': '$2000',
    'state': 'OH',
    'address': '1234 Elm St',
    'date': '{:%d-%b-%Y}'.format(date.today()),
    'discount': '2%',
    'recipient': 'Mrs. Smith'
}

cust_3 = {
    'status': 'Bronze',
    'city': 'Franklin',
    'phone_number': '800-555-5511',
    'Business': 'Tango Tops',
    'zip': '55511',
    'purchases': '$100,000',
    'shipping_limit': '$2500',
    'state': 'KY',
    'address': '1234 Adams St',
    'date': '{:%d-%b-%Y}'.format(date.today()),
    'discount': '2%',
    'recipient': 'Mr. Lincoln'
}

Creating a 3-page document is done by passing a list of dictionaries to the merge_pages function:

document.merge_pages([cust_1, cust_2, cust_3])
document.write('test-output-mult-custs.docx')

The output file is formatted and ready for printing or further editing.
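
In practice, the list of dictionaries would come from your data source rather than being typed by hand. Here is a minimal sketch of that step, assuming a hypothetical customers.csv file whose column headers match the merge field names and using pandas (not required by docx-mailmerge, just convenient):

import pandas as pd
from mailmerge import MailMerge

# Hypothetical input file - the column names are assumed to match the
# merge fields defined in the Word template.
df = pd.read_csv('customers.csv', dtype=str)

# Convert the DataFrame into the list of dictionaries that merge_pages expects.
customer_records = df.to_dict(orient='records')

document = MailMerge(template)
document.merge_pages(customer_records)
document.write('all-customers.docx')

The same idea applies to any other source - as long as you end up with a list of dictionaries keyed by the merge field names, merge_pages can consume it.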

Populating Tables

Another frequent need when generating templates is efficiently populating a table of values. In our example, we could attach an exhibit to the letter that includes the customer's purchase history. When completing the template, we do not know how many rows to include and the challenge of naming each field would get overwhelming very quickly. Using merge_rows makes table population much easier.

To build out the template, create a standard Word table with 1 row and insert the fields in the appropriate columns. There is no special formatting required. It should look something like this:

Word Table Template

Next, we need to define a list of dictionaries, one for each row in the table.

sales_history = [{
    'prod_desc': 'Red Shoes',
    'price': '$10.00',
    'quantity': '2500',
    'total_purchases': '$25,000.00'
}, {
    'prod_desc': 'Green Shirt',
    'price': '$20.00',
    'quantity': '10000',
    'total_purchases': '$200,000.00'
}, {
    'prod_desc': 'Purple belt',
    'price': '$5.00',
    'quantity': '5000',
    'total_purchases': '$25,000.00'
}]

The keys in each dictionary correspond to the merge fields in the document. To build out the rows in the table:

document.merge(**cust_2)
document.merge_rows('prod_desc', sales_history)
document.write('test-output-table.docx')

In this example, we pass the dictionary to merge using the ** operator, which unpacks it into the key=value arguments that the function needs. The final step is to call merge_rows to build out the rows of the table.
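
If the ** syntax is new to you, here is a small standalone illustration of how unpacking a dictionary turns it into keyword arguments. The greet function is made up for this example and is not part of docx-mailmerge:

def greet(recipient, city):
    # The dictionary entries arrive as ordinary keyword arguments.
    print('Dear {} of {}'.format(recipient, city))

info = {'recipient': 'Mr. Jones', 'city': 'Springfield'}

# These two calls are equivalent.
greet(recipient='Mr. Jones', city='Springfield')
greet(**info)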

The final result has each row populated with the values we need and preserves the default table formatting we defined in the template document:

Word Table

Full Code Example

In case the process was a little confusing, here is a full example showing all of the various approaches presented in this article. In addition, the template files can be downloaded from the github repo.

from __future__ import print_function
from mailmerge import MailMerge
from datetime import date

# Define the templates - assumes they are in the same directory as the code
template_1 = "Practical-Business-Python.docx"
template_2 = "Practical-Business-Python-History.docx"

# Show a simple example
document_1 = MailMerge(template_1)
print("Fields included in {}: {}".format(template_1,
                                         document_1.get_merge_fields()))

# Merge in the values
document_1.merge(
    status='Gold',
    city='Springfield',
    phone_number='800-555-5555',
    Business='Cool Shoes',
    zip='55555',
    purchases='$500,000',
    shipping_limit='$500',
    state='MO',
    address='1234 Main Street',
    date='{:%d-%b-%Y}'.format(date.today()),
    discount='5%',
    recipient='Mr. Jones')

# Save the document as example 1
document_1.write('example1.docx')

# Try example number two where we create multiple pages
# Define a dictionary for 3 customers
cust_1 = {
    'status': 'Gold',
    'city': 'Springfield',
    'phone_number': '800-555-5555',
    'Business': 'Cool Shoes',
    'zip': '55555',
    'purchases': '$500,000',
    'shipping_limit': '$500',
    'state': 'MO',
    'address': '1234 Main Street',
    'date': '{:%d-%b-%Y}'.format(date.today()),
    'discount': '5%',
    'recipient': 'Mr. Jones'
}

cust_2 = {
    'status': 'Silver',
    'city': 'Columbus',
    'phone_number': '800-555-5551',
    'Business': 'Fancy Pants',
    'zip': '55551',
    'purchases': '$250,000',
    'shipping_limit': '$2000',
    'state': 'OH',
    'address': '1234 Elm St',
    'date': '{:%d-%b-%Y}'.format(date.today()),
    'discount': '2%',
    'recipient': 'Mrs. Smith'
}

cust_3 = {
    'status': 'Bronze',
    'city': 'Franklin',
    'phone_number': '800-555-5511',
    'Business': 'Tango Tops',
    'zip': '55511',
    'purchases': '$100,000',
    'shipping_limit': '$2500',
    'state': 'KY',
    'address': '1234 Adams St',
    'date': '{:%d-%b-%Y}'.format(date.today()),
    'discount': '2%',
    'recipient': 'Mr. Lincoln'
}

document_2 = MailMerge(template_1)
document_2.merge_pages([cust_1, cust_2, cust_3])
document_2.write('example2.docx')

# Final Example includes a table with the sales history

sales_history = [{
    'prod_desc': 'Red Shoes',
    'price': '$10.00',
    'quantity': '2500',
    'total_purchases': '$25,000.00'
}, {
    'prod_desc': 'Green Shirt',
    'price': '$20.00',
    'quantity': '10000',
    'total_purchases': '$200,000.00'
}, {
    'prod_desc': 'Purple belt',
    'price': '$5.00',
    'quantity': '5000',
    'total_purchases': '$25,000.00'
}]

document_3 = MailMerge(template_2)
document_3.merge(**cust_2)
document_3.merge_rows('prod_desc', sales_history)
document_3.write('example3.docx')

Conclusion

I am always happy to find python-based solutions that will help me get away from using MS Office automation. I am generally more proficient with python and feel that the solutions are more portable. The docx-mailmerge library is one of those simple but powerful tools that I am sure I will use on many occasions in the future.

21 Feb 2017 1:25pm GMT

S. Lott: Intro to Python CSV Processing for Actual Beginners

I've written a lot about CSV processing. Here are some examples http://slott-softwarearchitect.blogspot.com/search/label/csv.

It crops up in my books. A lot.

In all cases, though, I make the implicit assumption that my readers already know a lot of Python. This is a disservice to anyone who's getting started.

Getting Started

You'll need Python 3.6. Nothing else will do if you're starting out.

Go to https://www.continuum.io/downloads and get Python 3.6. You can get the small "miniconda" version to start with. It has some of what you'll need to hack around with CSV files. The full Anaconda version contains a mountain of cool stuff, but it's a big download.

Once you have Python installed, what next? To be sure things are running do this:

  1. Find a command line prompt (terminal window, cmd.exe, whatever it's called on your OS.)
  2. Enter python3.6 (or just python on Windows).
  3. If Anaconda installed everything properly, you'll have an interaction that looks like this:


MacBookPro-SLott:Python2v3 slott$ python3.5
Python 3.5.1 (v3.5.1:37a07cee5969, Dec 5 2015, 21:12:44)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

More-or-less. (Yes, the example shows 3.5.1 even though I said you should get 3.6. As soon as the Lynda.com course drops, I'll upgrade. The differences between 3.5 and 3.6 are almost invisible.)

Here's your first interaction.

>>> 355/113
3.1415929203539825

Yep. Python did math. Stuff is happening.

Here's some more.

>>> exit
Use exit() or Ctrl-D (i.e. EOF) to exit
>>> exit()

Okay. That was fun. But it's not data wrangling. When do we get to the good stuff?

To Script or Not To Script

We have two paths when it comes to scripting. You can write script files and run them. This is pretty normal application development stuff. It works well.

Or.

You can use a Jupyter Notebook. This isn't exactly a script. But. You can use it like a script. It's a good place to start building some code that's useful. You can rerun some (or all) of the notebook to make it script-like.

If you downloaded Anaconda, you have Jupyter. Done. Skip over the next part on installing Jupyter.

Installing Jupyter

If you did not download the full Anaconda -- perhaps because you used the miniconda -- you'll need to add Jupyter. You can use the command conda install jupyter for this.

Another choice is to use the PIP program to install jupyter. The net effect is the same. It starts like this:


MacBookPro-SLott:Python2v3 slott$ pip3 install jupyter
Collecting jupyter
Downloading jupyter-1.0.0-py2.py3-none-any.whl
Collecting ipykernel (from jupyter)
Downloading ipykernel-4.5.2-py2.py3-none-any.whl (98kB)

100% |████████████████████████████████| 102kB 1.3MB/s

It ends like this.

Downloading pyparsing-2.1.10-py2.py3-none-any.whl (56kB)
100% |████████████████████████████████| 61kB 2.1MB/s
Installing collected packages: ipython-genutils, decorator, traitlets, appnope, appdirs, pyparsing, packaging, setuptools, ptyprocess, pexpect, simplegeneric, wcwidth, prompt-toolkit, pickleshare, ipython, jupyter-core, pyzmq, jupyter-client, tornado, ipykernel, qtconsole, terminado, nbformat, entrypoints, mistune, pandocfilters, testpath, bleach, nbconvert, notebook, widgetsnbextension, ipywidgets, jupyter-console, jupyter
Found existing installation: setuptools 18.2
Uninstalling setuptools-18.2:
Successfully uninstalled setuptools-18.2
Running setup.py install for simplegeneric ... done
Running setup.py install for tornado ... done
Running setup.py install for terminado ... done
Running setup.py install for pandocfilters ... done
Successfully installed appdirs-1.4.0 appnope-0.1.0 bleach-1.5.0 decorator-4.0.11 entrypoints-0.2.2 ipykernel-4.5.2 ipython-5.2.2 ipython-genutils-0.1.0 ipywidgets-5.2.2 jupyter-1.0.0 jupyter-client-4.4.0 jupyter-console-5.1.0 jupyter-core-4.2.1 mistune-0.7.3 nbconvert-5.1.1 nbformat-4.2.0 notebook-4.4.1 packaging-16.8 pandocfilters-1.4.1 pexpect-4.2.1 pickleshare-0.7.4 prompt-toolkit-1.0.13 ptyprocess-0.5.1 pyparsing-2.1.10 pyzmq-16.0.2 qtconsole-4.2.1 setuptools-34.1.1 simplegeneric-0.8.1 terminado-0.6 testpath-0.3 tornado-4.4.2 traitlets-4.3.1 wcwidth-0.1.7 widgetsnbextension-1.2.6



Now you have Jupyter.

What just happened? You installed a large number of Python packages. All of those packages were required to run Jupyter. You can see jupyter-1.0.0 hidden in the list of packages that were installed.

Starting Jupyter

The Jupyter tool does a number of things. We're going to use the notebook feature to save some code that we can rerun. We can also save notes and do other things in the notebook. When you start the notebook, two things will happen.
  1. The terminal window will start displaying the Jupyter console log.
  2. A browser will pop open showing the local Jupyter notebook home page.
Here's what the console log looks like:

MacBookPro-SLott:Python2v3 slott$ jupyter notebook
[I 08:51:56.746 NotebookApp] Writing notebook server cookie secret to /Users/slott/Library/Jupyter/runtime/notebook_cookie_secret
[I 08:51:56.778 NotebookApp] Serving notebooks from local directory: /Users/slott/Documents/Writing/Python/Python2v3
[I 08:51:56.778 NotebookApp] 0 active kernels
[I 08:51:56.778 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/?token=2eb40fbb96d7788dd05a49600b1fca4e07cd9c8fe931f9af
[I 08:51:56.778 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

You can glance at it to see that things are still working. The "Use Control-C to stop this server" is a reminder of how to stop things when you're done.

Your Jupyter home page will have this logo in the corner. Things are working.


You can pick files from this list and edit them. And -- important for what we're going to do -- you can create new notebooks.

On the right side of the web page, you'll see this:


You can create files and folders. That's cool. You can create an interactive terminal session. That's also cool. More important, though, is that you can create a new Python 3 notebook. That's where we'll wrangle with CSV files.

"But Wait," you say. "What directory is it using for this?"

The Jupyter server uses whatever the current working directory was when you started it.

If you don't like this choice, you have two alternatives.
  • Stop Jupyter. Change directory to your preferred place to keep files. Restart Jupyter.
  • Stop Jupyter. Include the --notebook-dir=your_working_directory option.
The second choice looks like this:

MacBookPro-SLott:Python2v3 slott$ jupyter notebook --notebook-dir=~/Documents/Writing/Python
[I 11:15:42.964 NotebookApp] Serving notebooks from local directory: /Users/slott/Documents/Writing/Python

Now you know where your files are going to be. You can make sure that your .CSV files are here. You will have your ".ipynb" files here also. Lots of goodness in the right place.

Using Jupyter

Here's what a notebook looks like. Here's a screen shot.


First. The notebook was originally called "untitled" which seemed less than ideal. So I clicked on the name and changed it to "csv_wrestling".

Second. There was a box labeled In [ ]:. I entered some Python code to the right of this label. Then I clicked the run cell icon. (It's similar to this emoji -- ⏯ -- but not exactly.)

The In [ ]: changed to In [1]:. A second box appeared labeled Out [1]:. This annotates our dialog with Python: each input and Python's response is tracked. It's pretty nice. We can change our input and rerun the cell. We can add new cells with different things to run. We can run all of the cells. Lots of things are possible based on this idea of a cell with our command. When we run a cell, Python processes the command and we see the output.

For many expressions, a value is displayed. For some expressions, however, nothing is displayed. For complete statements, nothing is displayed. This means we'll often have to put the name of a variable on the last line of a cell to see that variable's value.
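
As a quick illustration (the variable name here is arbitrary), a cell like this one only displays a value because the bare name appears on the last line:

ratio = 355 / 113   # an assignment is a statement - nothing is displayed
ratio               # a bare expression on the last line - Jupyter displays its value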


The rest of the notebook is published separately. It's awkward to work in Blogger when describing a Jupyter notebook. It's much easier to simply post the notebook in GitHub.

The notebook is published here: slott56/introduction-python-csv. You can follow the notebook to build your own copy which reads and writes CSV files.
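
To give a flavour of what the notebook builds toward, here is a minimal sketch of reading and writing a CSV file with the standard-library csv module. The file names and column names below are made up for the sketch, not taken from the notebook:

import csv

# Read every row of a hypothetical input file into a list of
# dictionaries keyed by the column names in the header row.
with open('input.csv', newline='') as source:
    rows = list(csv.DictReader(source))

# Write a subset of the columns to a new file.
with open('output.csv', 'w', newline='') as target:
    writer = csv.DictWriter(target, fieldnames=['name', 'amount'])
    writer.writeheader()
    for row in rows:
        writer.writerow({'name': row['name'], 'amount': row['amount']})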


21 Feb 2017 7:37am GMT

Gocept Weblog: Zope at the turnpike of the Python 3 wonderland

A little tale

Once upon a time there was an earl named Zope II. He lived happily in a land called Python 2. For some years there had been rumours that a huge disaster would hit the country. The people ironically used to call it "sunset". Prophets arose and said that 2020 would be the year when this disaster would finally happen.

Zope II got anxious about his future and the future of his descendants. But there were some brave people who liked Zope II and told him of the Python 3 wonderland. A land of eternal joy and happiness without "sunset" disasters and with no problems at all. It seemed like a dream to Zope II - too nice to be true?

After some research it became clear that the Python 3 wonderland was real - not completely as advertised by the people but nice to live in. So Zope II set the goal to settle down in the Python 3 wonderland before the "sunset" disaster would happen.

But this was not as easy as it seemed to be. The immigrant authority told Zope II that he was "not compatible" with the Python 3 wonderland and that he needed "to be ported" to be able to breathe the special Python 3 air.

As if this was not enough: as an earl, Zope II was not able to migrate without his staff, the many people he relied on for his daily procedures. Some of them had already been "ported" and were thus "compatible" with the new country. But there was one old but very important servant of Zope II named RestrictedPython. Zope II could not live without him - yet he was told that this servant would never be "compatible". The authority even required a "complete rewrite".

The Python 3 wonderland seemed so near, but it was so difficult to reach. Yet there were so many people who helped Zope II and encouraged him not to give up. Eventually it seemed possible for Zope II to get beyond the turnpike into the wonderful land called Python 3.

Back in reality

We are the people who like Zope II and help him get past the turnpike into Python 3. Since the Alpine City Sprint earlier this month, RestrictedPython is no longer a blocker for porting the other packages that depend on it.

Come, join us porting the remaining dependencies of Zope and Zope itself to Python 3. There will be a Zope 2 Resurrection Sprint in May this year in Halle (Saale), Germany at gocept. Join us on site or remote.


21 Feb 2017 6:46am GMT

Daniel Bader: Sublime Text Settings for Writing Clean Python

Sublime Text Settings for Writing Clean Python

How to write beautiful and clean Python by tweaking your Sublime Text settings so that they make it easier to adhere to the PEP 8 style guide recommendations.

There are a few settings you can change to make it easier for you to write PEP 8 compliant Python with Sublime Text 3. PEP 8 is the most common Python style guide and widely used in the Python community.

The tweaks I describe in this article mainly deal with getting the placement of whitespace correct so that you don't have to manage this (boring) aspect yourself.

I'll also show you how to get visual indicators for the maximum allowed line lengths in your editor window so that your lines can be concise and beautifully PEP 8 compliant, just like Guido wants them to be 🙂

Optional: Opening Sublime's Syntax-Specific Settings for Python

The settings we're changing now are specific to Python. Feel free to place them in your User settings; that will work just fine. However, if you'd like to apply some or all of the settings in this chapter only to Python code, then here's how you can do that:

  1. Open a Python file in Sublime Text (or create a new file, open the Command Palette and execute the "Set Syntax: Python" command)
  2. Click on Sublime Text → Preferences → Settings - More → Syntax Specific - User to open your Python-specific user settings. Make sure this opens a new editor tab called Python.sublime-settings. That's the one you want!

If you'd like to learn more about how Sublime Text's preferences system works, then check out this tutorial I wrote.

Better Whitespace Handling

The following changes you can make to your (Syntax Specific) User Settings will help you keep the whitespace in your Python code clean and consistent:

"tab_size": 4,
"translate_tabs_to_spaces": true,
"trim_trailing_white_space_on_save": true,
"ensure_newline_at_eof_on_save": true

A tab_size of 4 is the general recommendation for writing Python. You'll also want to enable translate_tabs_to_spaces to ensure that you don't have a mixture of tabs and spaces in your Python files, which should be avoided.

The trim_trailing_white_space_on_save option will remove superfluous whitespace at the end of lines or on empty lines. I highly recommend enabling this because it can save headaches and merge conflicts when working with Git and other forms of source control.

PEP 8 recommends that Python files should end with a blank line to ensure that POSIX tools can process the file correctly. If you want to never have to worry about this again then turn on the ensure_newline_at_eof_on_save setting as this will make sure that your Python files end with a newline automatically.

Enable PEP 8 Line-Length Indicators

Another setting that's really handy for writing PEP 8 compliant code is the "rulers" feature. It enables visual indicators in the editor area that show you the preferred maximum line length.

You can enable several rulers with different line lengths at the same time. This helps you follow the PEP 8 recommendations of limiting your docstrings to 72 characters and limiting all other lines to 79 characters.

Here's how to set up the rulers feature for Python development. Open your (Syntax Specific) User Settings and add the following setting:

"rulers": [
    72,
    79
]

This will add two line-length indicators: one at 72 characters for docstrings, and one at 79 characters for regular lines. You can see them in the screenshot as vertical lines on the right-hand side of the editor area.

Turn On Word Wrapping

I like enabling Sublime's word-wrapping feature when I'm writing Python. Most of my projects follow the PEP 8 style guide and therefore use a maximum line length of 79 characters.

I don't want to get into an argument about whether that's a good idea or not, but one benefit I found from limiting the lengths of my lines is that I can comfortably fit several files on my screen at once using Sublime's "split layouts" feature.

This is especially useful if you're following a test-heavy development process because you can see and edit the test and the production code at the same time.

Of course, sometimes you'll encounter a file that uses line lengths above the 79 characters recommended by PEP 8. If I'm using split layouts with multiple editor panes at the same time, having to scroll around horizontally hurts my productivity.

The idea is to see all of the code at once. So, how can we fix that?

The best way I found to handle this is to enable Sublime's word-wrap feature. This will visually break apart lines that are longer than the maximum line length. It might look a little odd sometimes but it's still light years better than having to scroll around horizontally.

Here's how you enable word wrapping. Open your (Syntax Specific) User Settings and add (or modify) the following options:

"word_wrap": true,
"wrap_width": 80

I'm setting the wrap_width to 80 which is one character past the 79 characters recommended by PEP 8. Therefore any line that goes beyond the PEP 8 recommendations will get wrapped.

21 Feb 2017 12:00am GMT

20 Feb 2017

Django Weblog: Django 1.11 beta 1 released

Django 1.11 beta 1 is an opportunity for you to try out the medley of new features in Django 1.11.

Only bugs in new features and regressions from earlier versions of Django will be fixed between now and 1.11 final (also, translations will be updated following the "string freeze" when the release candidate is issued). The current release schedule calls for a release candidate about a month from now, with the final release to follow about two weeks after that, around April 1. We'll only be able to keep this schedule if we get early and frequent testing from the community. Updates on the release schedule are available on the django-developers mailing list.

As with all alpha and beta packages, this is not for production use. But if you'd like to take some of the new features for a spin, or to help find and fix bugs (which should be reported to the issue tracker), you can grab a copy of the beta package from our downloads page or on PyPI.

The PGP key ID used for this release is Tim Graham: 1E8ABDC773EDE252.

20 Feb 2017 11:27pm GMT

Coding Diet: Flask and Pytest coverage

I have written before about Flask and obtaining test coverage results here and with an update here. This is pretty trivial if you're writing unit tests that directly call the application, but if you actually want to write tests which animate a browser, for example with selenium, then it's a little more complicated, because the browser/test code has to run concurrently with the server code.

Previously I would have the Flask server run in a separate process and run 'coverage' over that process. This was slightly unsatisfying, partly because you sometimes want coverage analysis of your actual tests. Test suites, just like application code, can grow in size with many utility functions and imports etc. which may eventually end up not actually being used. So it is good to know that you're not needlessly maintaining some test code which is not actually invoked.

We could probably get around this restriction by running coverage in both the server process and the test-runner's process and combining the results (or simply viewing them separately). However, this was unsatisfying simply because it felt like something that should not be necessary. Today I spent a bit of time setting up a scheme to test a Flask application without the need for a separate process.

I solved this by not using Flask's bundled Werkzeug server and instead using the WSGI server included in the standard-library wsgiref.simple_server module. Here is a minimal example:

import flask

class Configuration(object):
    TEST_SERVER_PORT = 5001

application = flask.Flask(__name__)
application.config.from_object(Configuration)


@application.route("/")
def frontpage():
    if False:
        pass # Should not be covered
    else:
        return 'I am the lizard queen!' # Should be in coverage.



# Now for some testing.
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
import pytest
# Currently just used for the temporary hack to quit the phantomjs process,
# see below in finalise.
import signal

import threading

import wsgiref.simple_server

class ServerThread(threading.Thread):
    def setup(self):
        application.config['TESTING'] = True
        self.port = application.config['TEST_SERVER_PORT']

    def run(self):
        self.httpd = wsgiref.simple_server.make_server('localhost', self.port, application)
        self.httpd.serve_forever()

    def stop(self):
        self.httpd.shutdown()

class BrowserClient(object):
    """Interacts with a running instance of the application via animating a
    browser."""
    def __init__(self, browser="phantom"):
        driver_class = {
            'phantom': webdriver.PhantomJS,
            'chrome': webdriver.Chrome,
            'firefox': webdriver.Firefox
            }.get(browser)
        self.driver = driver_class()
        self.driver.set_window_size(1200, 760)


    def finalise(self):
        self.driver.close()
        # A bit of hack this but currently there is some bug I believe in
        # the phantomjs code rather than selenium, but in any case it means that
        # the phantomjs process is not being killed so we do so explicitly here
        # for the time being. Obviously we can remove this when that bug is
        # fixed. See: https://github.com/SeleniumHQ/selenium/issues/767
        self.driver.service.process.send_signal(signal.SIGTERM)
        self.driver.quit()


    def log_current_page(self, message=None, output_basename=None):
        content = self.driver.page_source
        # This is frequently what we really care about so I also output it
        # here as well to make it convenient to inspect (with highlighting).
        basename = output_basename or 'log-current-page'
        file_name = basename + '.html'
        with open(file_name, 'w') as outfile:
            if message:
                outfile.write("<!-- {} --> ".format(message))
            outfile.write(content)
        filename = basename + '.png'
        self.driver.save_screenshot(filename)

def make_url(endpoint, **kwargs):
    with application.app_context():
        return flask.url_for(endpoint, **kwargs)

# TODO: Ultimately we'll need a fixture so that we can have multiple
# test functions that all use the same server thread and possibly the same
# browser client.
def test_server():
    server_thread = ServerThread()
    server_thread.setup()
    server_thread.start()

    client = BrowserClient()
    driver = client.driver

    try:
        port = application.config['TEST_SERVER_PORT']
        application.config['SERVER_NAME'] = 'localhost:{}'.format(port)

        driver.get(make_url('frontpage'))
        assert 'I am the lizard queen!' in driver.page_source

    finally:
        client.finalise()
        server_thread.stop()
        server_thread.join()

To run this you will of course need flask as well as pytest, pytest-cov, and selenium:

$ pip install flask pytest pytest-cov selenium

In addition you will need phantomjs in order to run it:

$ npm install phantomjs
$ export PATH=$PATH:./node_modules/.bin/

Then to run it, the command is:

$ py.test --cov=./ app.py
$ coverage html

The coverage html step is of course optional, and only needed if you wish to view the results in a friendly HTML format.

Notes

I've not used this extensively myself yet, so there may be some problems when using a more interesting flask application.

Don't put your virtual environment directory in the same directory as app.py because in that case it will perform coverage analysis over the standard library and dependencies.

In a real application you will probably want to make pytest fixtures out of the server thread and browser client, so that you can reuse each across multiple separate test functions. Essentially your test function should then just be the part inside the try clause.
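
As a rough sketch (not from the original post), such fixtures might look something like this, assuming the ServerThread, BrowserClient, application and make_url names from the example above:

import pytest

@pytest.fixture(scope='module')
def server():
    # Start the WSGI server once for all tests in this module.
    thread = ServerThread()
    thread.setup()
    thread.start()
    port = application.config['TEST_SERVER_PORT']
    application.config['SERVER_NAME'] = 'localhost:{}'.format(port)
    yield thread
    thread.stop()
    thread.join()

@pytest.fixture(scope='module')
def browser(server):
    # Depends on the server fixture so the app is up before the browser starts.
    client = BrowserClient()
    yield client
    client.finalise()

def test_frontpage(browser):
    browser.driver.get(make_url('frontpage'))
    assert 'I am the lizard queen!' in browser.driver.page_source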

I have not used the log_current_page method in this example, but I frequently find it quite useful, so I have included it here nonetheless.

20 Feb 2017 4:27pm GMT

GoDjango: Why You Should Pin Your Dependencies by My Mistakes

Have you ever been bitten by not pinning your dependencies in your django project? If not, be glad, and come learn from my problems.

Pinning your dependencies is important for avoiding unknown future issues; better the devil you know, and all that.

In this week's video I talk about three times I had issues: not pinning my dependencies, a weird edge case with pinning and python, and not really understanding what I was doing with pinned dependencies.
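
As a concrete illustration (my example, not from the video), pinning usually means recording exact, tested versions in requirements.txt rather than open-ended names; the version numbers below are only illustrative:

# Unpinned: installs whatever version pip resolves today.
Django
requests

# Pinned: the exact versions you tested against.
Django==1.10.5
requests==2.13.0

A quick way to capture the versions in your current environment is:

$ pip freeze > requirements.txt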

Why You Should Pin Your Dependencies

20 Feb 2017 4:00pm GMT

Senthil Kumaran: CPython moved to Github

The CPython project moved its source code hosting from a self-hosted mercurial repository at hg.python.org to the Git version control system hosted at Github. The new location of the python project is http://www.github.com/python/cpython

This is the second big version control migration that has happened since I got involved. The first one was when we moved from svn to mercurial. Branches were sub-optimal in svn and we used svn-merge.py to merge across branches. Mercurial helped there, and everyone got used to a distributed version control system written in python. It was interesting for me personally to compare mercurial with the other popular DVCS, git.

Over the years, Github has become a popular place for developers to host their projects. They have constantly improved their service offering. Many python developers got used to the git version control system and found its utility too.

Two years ago, it was decided that Python would move to Git and Github. The effort was led by Brett Cannon, assisted by a number of other developers, and the migration happened on Feb 10, 2017.

I helped with the migration too, providing tooling around converting the hg repository to git, using the facilities available from the hg-git mercurial plugin.

We made use of hg-git and wrote some conversion scripts that could get us to the converted repo as we wanted.

  1. https://github.com/orsenthil/cpython-hg-to-git
  2. https://bitbucket.org/orsenthil/hg-git

Now that the migration is done, we are getting ourselves familiar with the new workflow.

20 Feb 2017 3:09pm GMT

Rene Dudfield: Is Type Tracing for Python useful? Some experiments.

Type Tracing - as a program runs you trace it and record the types of variables coming in and out of functions, and being assigned to variables.
Is Type Tracing useful for providing quality benefits, documentation benefits, porting benefits, and also speed benefits to real python programs?

Python is now a gradually typed language, meaning that you can gradually apply types and, along with type inference, statically check that your code is correct. Once you have added types to everything, you can catch quite a lot of errors. For several years I've been using the new type checking tools that have been popping up in the python ecosystem. I've given talks to user groups about them, and also trained people to use them. I think a lot of people are using these tools without even realizing it. They see warnings about type issues in their IDE, and methods are automatically completed for them.

But I've always had some thoughts in the back of my head about recording types at runtime of a program in order to help the type inference out (and to avoid having to annotate them manually yourself).

Note, that this technique is a different, but related thing to what is done in a tracing jit compiler.
Some days ago I decided to try Type Tracing out... and I was quite surprised by the results.

I asked myself these questions.

  • Can I store the types coming in and out of python functions, and the types assigned to variables in order to be useful for other things based on tracing the running of a program? (Yes)
  • Can I "Type Trace" a complex program? (Yes, a flask+sqlalchemy app test suite runs)
  • Is porting python 2 code quicker by Type Tracing combined with static type checking, documentation generation, and test generation? (Yes, refactoring is safer with a type checker and no manually written tests)
  • Can I generate better documentation automatically with Type Tracing? (Yes, return and parameter types and example values helps understanding greatly)
  • Can I use the types for automatic property testing? (Yes, hypothesis does useful testing just knowing some types and a few examples... which we recorded with the tracer)
  • Can I use example capture for tests and docs, as well as the types? (Yes)
  • Can I generate faster compiled code automatically just using the recorded types and Cython? (Yes)

Benefits from Type Tracing.

Below I try to show that the following benefits can be obtained by combining Type Tracing with other existing python tools.
  • Automate documentation generation, by providing types to the documentation tool, and by collecting some example inputs and outputs.
  • Automate some type annotation.
  • Automatically find bugs static type checking can not. Without full type inference, existing python static type checkers can not find many issues until the types are fully annotated. Type Tracing can provide those types.
  • Speed up Python2 porting process, by finding issues other tools can't. It can also speed things up by showing people types and example inputs. This can greatly help people understand large programs when documentation is limited.
  • Use for Ahead Of Time (AOT) compilation with Cython.
  • Help property testing tools to find simple bugs without manually setting properties.

Tools used to hack something together.

  • coverage (extended the coverage checker to record types as it goes)
  • mypy (static type checker for python)
  • Hypothesis (property testing... automated test generator)
  • Cython (a compiler for python code, and code with type annotations)
  • jedi (another python static type checker)
  • Sphinx (automatic documentation generator).
  • Cpython (the original C implementation of python)
More details below on the experiments.

Type Tracing using 'coverage'.

Originally I hacked up a set_trace script... and started going. But there really are so many corner cases. Also, I already run the "coverage" tool over the code base I'm working on.

I started with coverage.pytracer.PyTracer, since it's python. Coverage also comes with a faster tracer written in C. So far I'm just using the python one.

The plan later would be to perhaps use CoverageData, which uses JSON, which means storing the type will sometimes be hard (e.g., when types are dynamically generated). However, I think I'm happy to start with easy types. To start simple, I'll just record object types as strings with something like `repr(type(o)) if type(o) is not type else repr(o)`. Well, I'm not sure. So far, I'm happy with hacking everything into my fork of coverage, but to move it into production there is more work to be done. Things like multiprocessing and multithreading all need to be handled.
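
To make the idea concrete, here is a minimal sketch of recording call argument types with sys.settrace directly; it is not the coverage-based implementation described above, and the names are mine:

import sys
from collections import defaultdict

# Maps function name -> set of (argument name, type name) pairs seen at runtime.
recorded_types = defaultdict(set)

def type_tracer(frame, event, arg):
    # The global trace function is called whenever a new frame is entered.
    if event == 'call':
        code = frame.f_code
        for name in code.co_varnames[:code.co_argcount]:
            if name in frame.f_locals:
                recorded_types[code.co_name].add(
                    (name, type(frame.f_locals[name]).__name__))
    # Returning None means we don't trace individual lines inside the frame.
    return None

sys.settrace(type_tracer)
# ... run the program or test suite here ...
sys.settrace(None)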

Porting python 2 code with type tracing.

I first started porting code to python 3 in the betas... around 2007. Including some C API modules. I think I worked on one of the first single code base packages. Since then the tooling has gotten a lot better. Compatibility libraries exist (six), lots of people have figured out the dangerous points and documented them. Forward compatibility features were added into the python2.6 and 2.7, and 3.5 releases to make porting easier. However, it can still be hard.

This is especially true because Python 2 code bases often don't have many tests - often zero tests. Also, there may be very little documentation, and the original developers have moved on.

But the code works, and it's been in production for a long time, and gets updates occasionally. Maybe it's not updated as often as it's needed because people are afraid of breaking things.

Steps to port to python 3 are usually these:

  1. Understand the code.
  2. Run the code in production (or on a copy of production data).
  3. With a debugger, look at what is coming in and out of functions.
  4. Write tests for everything.
  5. Write documentation.
  6. Run 2to3.
  7. Do lots of manual QA.
  8. Start refactoring.
  9. Repeat. Repeat manually writing tests, docs, and testing manually. Many times.
Remember that writing tests is usually harder than writing the code in the first place.

With type tracing helping to generate docs, types for the type checker, examples for human reading plus for the hypothesis property checker we get a lot more tools to help ensure quality.

A new way to port python2 code could be something like...
  1. Run program under Type Tracing, line/branch coverage, and example capture.
  2. Look at generated types, example inputs and outputs.
  3. Look at generated documentation.
  4. Gradually add type checking info with help of Type Tracing recorded types.
  5. Generate tests automatically with Type Tracing types, examples, and hypothesis automated property testing. Generate empty test stubs for things you still need to test.
  6. Once each module is fully typed, you can statically type check it.
  7. You can cross validate your type checked python code against your original code. Under the Type Tracer.
  8. Refactoring is easier with better docs, static type checks, tests, types for arguments and return values, and example inputs and outputs.
  9. Everything should be ported to work with the new forwards compatibility functionality in python2.7.
  10. Now with your various quality checks in place, you can start porting to python3. Note, you might not have needed to change any of the original code - only add types.
I would suggest the effort is about 1/5th of the normal time it takes to port things. Especially if you want to make sure the chance of introducing errors is very low.

Below are a couple of issues where Type Tracing can help over existing tools.

Integer divide issue.

Here I will show a case where the 2to3 conversion tool introduces a bug, and where mypy does not detect a problem with the code.

# int_issue.py
def int_problem(x):
    return x / 4
print(int_problem(3))

$ python2 int_issue.py
0

$ python3 int_issue.py
0.75

$ mypy --py2 int_issue.py
$ mypy int_issue.py

$ 2to3 int_issue.py
RefactoringTool: Skipping optional fixer: buffer
RefactoringTool: Skipping optional fixer: idioms
RefactoringTool: Skipping optional fixer: set_literal
RefactoringTool: Skipping optional fixer: ws_comma
RefactoringTool: Refactored int_issue.py
--- int_issue.py (original)
+++ int_issue.py (refactored)
@@ -3,4 +3,4 @@
def int_problem(x):
return x / 4

-print(int_problem(3))
+print((int_problem(3)))
RefactoringTool: Files that need to be modified:
RefactoringTool: int_issue.py


See how when run under python3 it gives a different result?

Can we fix it when Type Tracing adds types? (Yes)

So, how about if we run the program under type tracing, and record the input types coming in and out? See how it adds a python3 compatible comment about taking an int, and returning an int. This is so that mypy (and other type checkers) can see what it is supposed to take in.

def int_problem(x):
    # type: (int) -> int
    return x / 4
print(int_problem(3))

$ mypy int_issue.py
int_issue.py:5: error: Incompatible return value type (got "float", expected "int")

I'm happy that, yes, Type Tracing combined with mypy can detect this issue, whereas mypy cannot by itself.


Binary or Text file issue?

Another porting issue not caught by existing tools is doing the right thing depending on whether a file is opened in binary mode or in text mode. If it is in binary mode, read() will return bytes; otherwise it might return text.

In theory this could be made to work, however at the time of writing, there is an open issue with "dependent types" or "Factory Pattern" functions in mypy. More information on this, and also a work around I wrote see this issue: https://github.com/python/mypy/issues/2337#issuecomment-280850128

In there I show that you can create your own io.open replacement that always returns one type, e.g. open_rw(fname) instead of open(fname, 'rw').
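
As a minimal sketch of that idea (the helper names here are hypothetical, not from the mypy issue), you can wrap io.open so each helper has a single, known return type:

import io
from typing import IO, BinaryIO, Text

def open_binary(fname):
    # type: (str) -> BinaryIO
    # Always binary mode, so read() always returns bytes.
    return io.open(fname, 'rb')

def open_text(fname):
    # type: (str) -> IO[Text]
    # Always text mode, so read() always returns text.
    return io.open(fname, 'r', encoding='utf-8')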

Once you know that .read() will return bytes, then you also know that it can't call .format() in python 3. The solution is to use % string formatting on bytes, which is supported from python3.5 upwards.

x = f.read() # type: bytes

So the answer here is that mypy could likely solve this issue by itself in the future (once things are fully type annotated). But for now, it's good to see combining type tracing with mypy could help detect binary and text encoding issues much faster.

Generating Cython code with recorded types.

I wanted to see if this was possible. So I took the simple example from the cython documentation.
http://cython.readthedocs.io/en/latest/src/quickstart/cythonize.html

I used my type tracer to transform this python:
def f(x):
    return x**2-x

def integrate_f(a, b, N):
    s = 0
    dx = (b-a)/N
    for i in range(N):
        s += f(a+i*dx)
    return s * dx

Before you look below, take a guess: what are the parameters a, b, and N? Note how there are no comments, how the variable names are single letters, how there are no tests, and how there are no examples.

In [2]: %timeit integrate_f(10.4, 2.3, 17)
100000 loops, best of 3: 5.12 µs per loop



Into this Cython code with annotated types after running it through Type Tracing:

In [1]: %load_ext Cython

In [2]: %%cython
   ...: cdef double f(double x):
   ...:     return x**2-x
   ...:
   ...: def integrate_f_c(double a, double b, int N):
   ...:     """
   ...:     :Example:
   ...:     >>> integrate_f_c(10.4, 2.3, 17)
   ...:     -342.34804152249137
   ...:     """
   ...:     cdef int i
   ...:     cdef double s, dx
   ...:     s = 0
   ...:     dx = (b-a)/N
   ...:     for i in range(N):
   ...:         s += f(a+i*dx)
   ...:     return s * dx
   ...:

In [3]: %timeit integrate_f_c(10.4, 2.3, 17)

10000000 loops, best of 3: 117 ns per loop

Normal python took about 5120 nanoseconds per loop (5.12 µs). The cython compiled version takes 117 nanoseconds. The result is roughly 44x faster code, and we have all the types annotated, with an example. This helps you understand it a little better than before, too.

This was a great result for me. It shows that combining Type Tracing with Cython can give improvements over Cython just by itself. Note that Cython is not only for speeding up simple numeric code; it has also been used to speed up string-based code, database access, network access, and game code.

So far I've made a simple mapping of python types to cython types. To make the code more useful would require quite a bit more effort. However, if you use it as a tool to help you write cython code yourself, then it's very useful to speed up that process.
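
For illustration, a simple mapping (my own rough sketch, not the author's code) might look like this:

# Hypothetical mapping from recorded Python type names to Cython declarations.
PY_TO_CYTHON = {
    'int': 'int',
    'float': 'double',
    'bool': 'bint',
    'str': 'str',
    'bytes': 'bytes',
}

def cython_signature(func_name, arg_types):
    # arg_types is a list of (parameter name, recorded type name) pairs.
    params = ', '.join(
        '{} {}'.format(PY_TO_CYTHON.get(t, 'object'), name)
        for name, t in arg_types)
    return 'def {}({}):'.format(func_name, params)

print(cython_signature('integrate_f_c', [('a', 'float'), ('b', 'float'), ('N', 'int')]))
# -> def integrate_f_c(double a, double b, int N):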

The best cases so far are when it knows all of the types, all of the types have direct cython mappings, and it avoids calling python functions inside the function. In other words, 'pure' functions.

Cross validation for Cython and python versions?

In a video processing project I worked on there were implementations in C, and other assembly implementations of the same functions. A very simple way of testing is to run all the implementations and compare the results. If the C implementation gives the same results as the assembly implementations, then there's a pretty good chance they are correct.

In [1]: assert integrate_f_c(10.4, 2.3, 17) == integrate_f(10.4, 2.3, 17)

If we have a test runner, we can check if the inputs and outputs are the same between the compiled code and the non compiled code. That is, cross validate implementations against each other for correctness.
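
A sketch of such a cross-validation check (my example, assuming both integrate_f and integrate_f_c are importable) could be as simple as:

import random

def test_cross_validate_integrate():
    # Compare the pure Python and Cython implementations on random inputs.
    for _ in range(1000):
        a = random.uniform(-100.0, 100.0)
        b = random.uniform(-100.0, 100.0)
        N = random.randint(1, 1000)
        expected = integrate_f(a, b, N)
        got = integrate_f_c(a, b, N)
        assert abs(got - expected) <= 1e-9 * max(1.0, abs(expected))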

Property testing.

The most popular property testing framework is QuickCheck, from the Haskell world. However, python also has an implementation: Hypothesis. Rather than supplying examples, as is usual with unit testing, you tell it about properties which hold true.

Can we generate a hypothesis test automatically using just types collected with Type Tracing?


Below we can see some unit tests (example based testing), as well as some Hypothesis tests (property testing). They are for a function "always_add_something(x)", which is supposed to always add something to the number passed in. As a property, we would say that "always_add_something(x) > x". That property should hold true for every value of x, given that x is an int.

Note, that the program is fully typed, and passes type checking with mypy. Also note that there is 100% test coverage if I remove the divide by zero error I inserted.

from hypothesis import given
import hypothesis.strategies

from bad_logic_issue import always_add_something, always_add_something_good

def test_always_add_something():
    # type: () -> None
    assert always_add_something(5) >= 5
    assert always_add_something(200) >= 200

def test_always_add_something_good():
    # type: () -> None
    assert always_add_something_good(5) >= 5
    assert always_add_something_good(200) >= 200

# Property: the result should always be greater than the input.
@given(hypothesis.strategies.integers())
def test_always_add_something_property(x):
    assert always_add_something(x) > x


# Here we test the good one.
@given(hypothesis.strategies.integers())
def test_always_add_something_good_property(x):
    assert always_add_something_good(x) > x

Here are two implementations of the function. The first one is a contrived example intended to show two types of logic error that are quite common. Even 30-year-old code used by billions of people has been shown to have these errors. They're sort of hard to find with normal testing methods.

def always_add_something(x):
    # type: (int) -> int
    '''Silly function that is supposed to always add something to x.

    But it doesn't always... even though we have
    - 'complete' test coverage.
    - fully typed
    '''
    r = x  # type: int
    if x > 0 and x < 10:
        r += 20
    elif x > 15 and x < 30:
        r //= 0
    elif x > 100:
        r += 30

    return r


def always_add_something_good(x):
    # type: (int) -> int
    '''This one always does add something.
    '''
    return x + 1


Now, hypothesis can find the errors when you write the property that the return value needs to be greater than the input. What about if we just use the types we record with Type Tracing to give hypothesis a chance to test? Hypothesis comes with a number of test strategies which generate many variations of a type. Eg, there is an "integers" strategy.

# Will it find an error just telling hypothesis that it takes an int as input?
@given(hypothesis.strategies.integers())
def test_always_add_something(x):
    always_add_something(x)


It finds the divide by zero issue (when x is 16). However it does not find the other issue, because it still does not know that there is a problem. We haven't told it anything about the result always needing to be greater than the input.

bad_logic_issue.py:13: ZeroDivisionError
-------------------------------------------------------- Hypothesis --------------------------------------------------------
Falsifying example: test_always_add_something(x=16)

The result is that yes, it could find one issue automatically, without having to write any extra test code, just from Type Tracing.

For pure functions, it would be also useful to record some examples for unit test generation.

In conclusion.

I'm happy with the experiment overall. I think it shows it can be a fairly useful technique for making python programs more understandable, faster, and more correct. It can also help speed up porting old python2 code dramatically (especially when that code has limited documentation and tests).

I think the experiment also shows that combining existing python tools (coverage, mypy, Cython, and hypothesis) can give some interesting extra abilities without too much extra effort. E.g., I didn't need to write a robust tracing module, a static type checker, or a python compiler. However, it would take some effort to turn these into robust general purpose tools. Currently what I have is a collection of fragile hacks, without support for many corner cases :)

For now I don't plan to work on this any more in the short term. (Unless of course someone wants to hire me to port some python2 code. Then I'll work on these tools again since it speeds things up quite a lot).

Any corrections or suggestions? Please leave a comment, or see you on twitter @renedudfield

20 Feb 2017 3:01pm GMT

Weekly Python Chat: Django Forms

Special guest Kenneth Love is going to answer your questions about how to use Django's forms.

20 Feb 2017 3:00pm GMT

Doug Hellmann: uuid — Universally Unique Identifiers — PyMOTW 3

RFC 4122 defines a system for creating universally unique identifiers for resources in a way that does not require a central registrar. UUID values are 128 bits long and, as the reference guide says, "can guarantee uniqueness across space and time." They are useful for generating identifiers for documents, hosts, application clients, and other situations … Continue reading uuid - Universally Unique Identifiers - PyMOTW 3
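
For a quick taste of the module covered in the article (standard-library behaviour, not text from the post):

import uuid

# uuid1() mixes a host identifier with a timestamp; uuid4() is random.
print(uuid.uuid1())
print(uuid.uuid4())

# uuid3()/uuid5() build deterministic, name-based UUIDs (MD5 and SHA-1).
print(uuid.uuid5(uuid.NAMESPACE_DNS, 'www.example.com'))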

20 Feb 2017 2:00pm GMT

Mike Driscoll: PyDev of the Week: Petr Viktorin

This week our PyDev of the Week is Petr Viktorin (@EnCuKou). Petr is the author of PEP 489 - Multi-phase extension module initialization and teaches Python for the local PyLadies in Czech Republic. You can see some of what he's up to via his Github page or on his website. Let's take some time to get to know Petr better!

Can you tell us a little about yourself (hobbies, education, etc):

Sure!

I'm a Python programmer from Brno, Czech Republic. I studied at the Brno University of Technology, and for my master's I switched to the University of Eastern Finland.

When I'm not programming, I enjoy playing board games with my friends, and sometimes go to an orienteering race (without much success).

Why did you start using Python?

At the university, I did coursework in languages like C, Java, and Lisp, but then I found Python and got hooked. It fit the way I think about programs, abstracted away most of the boring stuff, and makes it easy to keep the code understandable.

After I returned home from the university, I found a community that was starting to form around the language, and that's probably what keeps me around the language now.

What other programming languages do you know and which is your favorite?

Since I work with CPython a lot, I code in C - or at least I *read* C regularly. And I'd say C's my favorite, after Python - they complement each other quite nicely. I can also throw something together in JavaScript. And C++, Java or PHP, though I don't find much reason to code in those languages any more. Since I finished school, I sadly haven't made much time to learn new languages. Someday, I'd like to explore Rust more seriously, but I haven't found a good project for starting that yet.

What projects are you working on now?

I work at Red Hat, and the main job of our team is to package Python for Fedora and RHEL. The mission is to make sure everything works really great together, so when we succeed, the results of the work are somewhat invisible.

My other project is teaching Python. A few years back, and without much teaching experience, I started a beginners' Python course for the local PyLadies. I've spent a lot of time on making the content online and accessible to everyone, and over the years it got picked up in two more cities, and sometimes I find people going through the course from home. Now people are refining the course, and even building new workshops and other courses on top of it. Like any open-source project, it needs some maintenance, and I'm lucky to be able to spend some paid time both teaching and coordinating and improving Czech Python teaching materials.

When I find some spare time, I hack on crazy side projects like a minimalistic 3D-printed MicroPython-powered game console.

Which Python libraries are your favorite (core or 3rd party)?

I'm sure Requests appeared on these interviews before: it's a great example of how a library should be designed.

I also like the pyglet library. It's an easy way to draw graphics on the screen, and I also use it to introduce people to event-driven programming.

Where do you see Python going as a programming language?

Strictly as a language, I don't think Python will evolve too much. It's already a good way to structure code and express algorithms. There will of course be improvements - especially the async parts are quite new and still have some rough corners - but I'm skeptical about any revolutionary additions.

I think most improvements will come to the CPython implementation, not the language itself. I'm hopeful for projects like Pyjion and Gilectomy. I'm involved in a similar effort, making CPython's subinterpreters more useful. Sadly, it's currently stalled, but maybe I'll be able to mentor it as a student project.

What is your take on the current market for Python programmers?

When I finished school, I had no idea I could actually get a job using Python. But it turns out there's always demand for Python programmers. And I see new projects started in Python all the time. It doesn't look like the demand is going away.

Is there anything else you'd like to say?

If you visit Czech Republic, look at http://pyvo.cz/en and visit one of our meetups!

Thanks so much for doing the interview!

20 Feb 2017 1:30pm GMT

Full Stack Python: Creating SSH Keys on macOS Sierra

Deploying Python applications typically requires SSH keys. An SSH key has both a public and a private key file. You can use the private key to authenticate when syncing remote Git repositories, to connect to remote servers, and to automate your application's deployments via configuration management tools like Ansible. Let's learn how to generate SSH key pairs on macOS Sierra.

Generating New Keys

Bring up a new terminal window on macOS by going into Applications/Utilities and opening "Terminal".

New macOS terminal window.

The ssh-keygen command provides an interactive command line interface for generating both the public and private keys. Invoke ssh-keygen with the following -t and -b arguments to ensure we get a 4096 bit RSA key. Note that you must use a key with 2048 or more bits in macOS Sierra or the system will not allow you to connect to servers with it.

Optionally, you can also specify your email address with -C (otherwise one will be generated off your current macOS account):

ssh-keygen -t rsa -b 4096 -C my.email.address@company.com

The first prompt you will see asks where to save the key. However, there are actually two files that will be generated: the public key and the private key.

Generating public/private rsa key pair.
Enter file in which to save the key (/Users/matt/.ssh/id_rsa):

This prompt refers to the private key and whatever you enter will also generate a second file for the public key that has the same name and .pub appended.

If you already have a key then specify a new filename. I use many SSH keys so I often name them "test-deploy", "prod-deploy", "ci-server" along with a unique project name. Naming is one of those hard computer science problems, so take some time to come up with a system that works for you!

Next you will see a prompt for an optional passphrase:

Enter passphrase (empty for no passphrase):

Whether or not you want a passphrase depends on how you will use the key. The system will ask you for the passphrase whenever you use the SSH key, although macOS can store the passphrase in your system Keychain after the first time you enter it. However, if you are automating deployments with a continuous integration server like Jenkins then you will not want a passphrase.
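If you do need to script key generation, for example on a CI server, a minimal sketch along these lines should work (the file name and email address below are placeholders, not values from this post):

import subprocess
from pathlib import Path

key_path = Path.home() / ".ssh" / "ci-server"  # placeholder filename

# Non-interactive ssh-keygen: -f sets the private key file (.pub is added
# automatically for the public key) and -N "" requests an empty passphrase.
subprocess.run([
    "ssh-keygen",
    "-t", "rsa",
    "-b", "4096",
    "-C", "ci@example.com",
    "-f", str(key_path),
    "-N", "",
], check=True)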

Note that it is impossible to recover a passphrase if it is lost. Keep that passphrase safe and secure because otherwise a completely new key would have to be generated.

Enter the passphrase (or just press enter to not have a passphrase) twice. You'll see some output like the following:

Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /Users/matt/.ssh/deploy_prod.
Your public key has been saved in /Users/matt/.ssh/deploy_prod.pub.
The key fingerprint is:
SHA256:UnRGH/nzYzxUFS9jjd0wOl1ScFGKgW3pU60sSxGnyHo matthew.makai@gmail.com
The key's randomart image is:
+---[RSA 4096]----+
|        ..+o++**@|
|       . +.o*O.@=|
|        . oo*=B.*|
|       . .  =o=+ |
|      . S E. +oo |
|       . .  .  =.|
|              . o|
|                 |
|                 |
+----[SHA256]-----+

Your SSH key is ready to use!

What now?

Now that you have your public and private keys, I recommend building and deploying some Python web apps such as:

Additional ssh-keygen command resources:

Questions? Contact me via Twitter @fullstackpython or @mattmakai. I'm also on GitHub with the username mattmakai.

Something wrong with this post? Fork this page's source on GitHub.

20 Feb 2017 5:00am GMT

Carl Trachte: Filling in Missing Grouping Columns of MSSQL SSRS Report Dumped to Excel

This is another simple but common problem in certain business environments:

1) Data are presented via a Microsoft SQL Server Reporting Services report, BUT

2) The user wants the data in Excel, and, further, wants to play with it (pivot, etc.) there. The problem is that the grouping column labels are not in every record, only in the one row that begins the list of records for that group (sanitized screenshot below):

But I don't WANT to copy and paste all those groupings for 30,000 records :*-(

I had this assignment recently from a remote request. It took about four rounds of an e-mail exchange to figure out that it really wasn't a data problem, but a formatting one that needed solving.

It is possible to do the whole thing in Python. I did the Excel part by hand in order to get a handle on the data:

1) In Excel, delete the extra rows on top of the report leaving just the headers and the data.

2) In Excel, select everything on the data page, format the cells correctly by unselecting the Merge Cells and Wraparound options.

3) In Excel, at this point you should be able to see if there are extra empty columns as space fillers; delete them. Save the worksheet as a csv file.

4) In a text editor, open your csv file, identify any empty rows, and delete them. Change column header names as desired.

Now the Python part:

#!python36

"""
Doctor csv dump from unmerged cell
dump of SSRS dump from MSSQL database.

Fill in cell gaps where merged
cells had only one grouping value
so that all rows are complete records.
"""

import pprint

COMMA = ','
EMPTY = ''

INFILE = 'rawdata.csv'
OUTFILE = 'canneddumpfixed.csv'

ERRORFLAG = 'ERROR!'

f = open(INFILE, 'r')
headerline = next(f)
numbercolumns = len(headerline.split(COMMA))

f2 = open(OUTFILE, 'w')

# Assume at least one data column on far right.
missingvalues = (numbercolumns - 1) * [ERRORFLAG]

for linex in f:
    print('Processing line {:s} . . .'.format(linex))
    splitrecord = linex.split(COMMA)
    # Carry the last non-empty value forward for each grouping column
    # so that every row becomes a complete record.
    for slotx in range(0, numbercolumns - 1):
        if splitrecord[slotx] != EMPTY:
            missingvalues[slotx] = splitrecord[slotx]
        else:
            splitrecord[slotx] = missingvalues[slotx]
    f2.write(COMMA.join(splitrecord))

f.close()
f2.close()

print('Finished')


At this point you've got your data in csv format - you can open it in Excel and go to work.
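For what it's worth, if pandas is available, a sketch like the one below could replace most of the manual carry-forward logic (the grouping column names here are made up; only the file names come from the script above):

import pandas as pd

df = pd.read_csv('rawdata.csv')
# Forward-fill the grouping columns so every row becomes a complete record;
# 'Region' and 'Site' are placeholders for the real grouping headers.
groupcols = ['Region', 'Site']
df[groupcols] = df[groupcols].ffill()
df.to_csv('canneddumpfixed.csv', index=False)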

There may be a free or COTS (commercial off the shelf) utility that does all this somewhere in the Microsoft "ecosystem" (I think that's their fancy enviro-friendly word for vendor-user community) but I don't know of one.


Thanks for stopping by.





20 Feb 2017 12:34am GMT

Matthew Rocklin: Dask Development Log

This work is supported by Continuum Analytics, the XDATA Program, and the Data Driven Discovery Initiative from the Moore Foundation

To increase transparency I'm blogging weekly(ish) about the work done on Dask and related projects during the previous week. This log covers work done between 2017-02-01 and 2017-02-20. Nothing here is ready for production. This blogpost is written in haste, so refined polish should not be expected.

Themes of the last couple of weeks:

  1. Profiling experiments with Dask-GLM
  2. Subsequent graph optimizations, both non-linear fusion and avoiding repeatedly creating new graphs
  3. Tensorflow and Keras experiments
  4. XGBoost experiments
  5. Dask tutorial refactor
  6. Google Cloud Storage support
  7. Cleanup of Dask + SKLearn project

Dask-GLM and iterative algorithms

Dask-GLM is currently just a bunch of solvers like Newton, Gradient Descent, BFGS, Proximal Gradient Descent, and ADMM. These are useful in solving problems like logistic regression, but also several others. The mathematical side of this work is mostly done by Chris White and Hussain Sultan at Capital One.

We've been using this project also to see how Dask can scale out machine learning algorithms. To this end we ran a few benchmarks here: https://github.com/dask/dask-glm/issues/26 . This just generates and solves some random problems, but at larger scales.

What we found is that some algorithms, like ADMM perform beautifully, while for others, like gradient descent, scheduler overhead can become a substantial bottleneck at scale. This is mostly just because the actual in-memory NumPy operations are so fast; any sluggishness on Dask's part becomes very apparent. Here is a profile of gradient descent:

Notice all the white space. This is Dask figuring out what to do during different iterations. We're now working to bring this down to make all of the colored parts of this graph squeeze together better. This will result in general overhead improvements throughout the project.

Graph Optimizations - Aggressive Fusion

We're approaching this in two ways:

  1. More aggressively fuse tasks together so that there are fewer blocks for the scheduler to think about
  2. Avoid repeated work when generating very similar graphs

In the first case, Dask already does standard task fusion. For example, if you have the following two tasks:

x = f(w)
y = g(x)
z = h(y)

Dask (along with every other compiler-like project since the 1980s) already turns this into the following:

z = h(g(f(w)))
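In dask.delayed terms, a linear chain like that looks something like this (f, g, and h here are stand-in functions, not anything from the Dask codebase):

from dask import delayed

@delayed
def f(w):
    return w + 1

@delayed
def g(x):
    return x * 2

@delayed
def h(y):
    return y - 3

z = h(g(f(10)))
# A purely linear chain like this can be fused into a single task
# before the scheduler ever sees it.
print(z.compute())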

What's tricky with a lot of these mathematical or optimization algorithms though is that they are mostly, but not entirely linear. Consider the following example:

y = exp(x) - 1/x

Visualized as a node-link diagram, this graph looks like a diamond like the following:

         o  exp(x) - 1/x
        / \
exp(x) o   o   1/x
        \ /
         o  x

Graphs like this generally don't get fused together because we could compute both exp(x) and 1/x in parallel. However when we're bound by scheduling overhead and when we have plenty of parallel work to do, we'd prefer to fuse these into a single task, even though we lose some potential parallelism. There is a tradeoff here and we'd like to be able to exchange some parallelism (of which we have a lot) for less overhead.
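The diamond above is easy to reproduce with dask.array (the array and chunk sizes below are just illustrative):

import dask.array as da

x = da.random.random((1000000,), chunks=(100000,))
y = da.exp(x) - 1 / x   # exp(x) and 1/x form the two sides of the diamond
# Whether the two branches stay separate tasks or get fused is exactly
# the overhead-versus-parallelism tradeoff described above.
result = y.sum().compute()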

PR here dask/dask #1979 by Erik Welch (Erik has written and maintained most of Dask's graph optimizations).

Graph Optimizations - Structural Sharing

Additionally, we no longer make copies of graphs in dask.array. Every collection like a dask.array or dask.dataframe holds onto a Python dictionary holding all of the tasks that are needed to construct that array. When we perform an operation on a dask.array we get a new dask.array with a new dictionary pointing to a new graph. The new graph generally has all of the tasks of the old graph, plus a few more. As a result, we frequently make copies of the underlying task graph.

y = (x + 1)
assert set(y.dask).issuperset(x.dask)

Normally this doesn't matter (copying graphs is usually cheap) but it can become very expensive for large arrays when you're doing many mathematical operations.

Now we keep dask graphs in a custom mapping (dict-like object) that shares subgraphs with other arrays. As a result, we rarely make unnecessary copies and some algorithms incur far less overhead. Work done in dask/dask #1985.

TensorFlow and Keras experiments

Two weeks ago I gave a talk with Stan Seibert (Numba developer) on Deep Learning (Stan's bit) and Dask (my bit). As part of that talk I decided to launch tensorflow from Dask and feed Tensorflow from a distributed Dask array. See this blogpost for more information.

That experiment was nice in that it showed how easy it is to deploy and interact with other distributed services from Dask. However from a deep learning perspective it was immature. Fortunately, it succeeded in attracting the attention of other potential developers (the true goal of all blogposts) and now Brett Naul is using Dask to manage his GPU workloads with Keras. Brett contributed code to help Dask move around Keras models. He seems to particularly value Dask's ability to manage resources to help him fully saturate the GPUs on his workstation.

XGBoost experiments

After deploying Tensorflow we asked what would it take to do the same for XGBoost, another very popular (though very different) machine learning library. The conversation for that is here: dmlc/xgboost #2032 with prototype code here mrocklin/dask-xgboost. As with TensorFlow, the integration is relatively straightforward (if perhaps a bit simpler in this case). The challenge for me is that I have little concrete experience with the applications that these libraries were designed to solve. Feedback and collaboration from open source developers who use these libraries in production is welcome.

Dask tutorial refactor

The dask/dask-tutorial project on Github was originally written for PyData Seattle in July 2015 (roughly 19 months ago). Dask has evolved substantially since then but this is still our only educational material. Fortunately Martin Durant is doing a pretty serious rewrite, both correcting parts that are no longer modern API, and also adding in new material around distributed computing and debugging.

Google Cloud Storage

Dask developers (mostly Martin) maintain libraries to help Python users connect to distributed file systems like HDFS (with hdfs3), S3 (with s3fs), and Azure Data Lake (with adlfs), which subsequently become usable from Dask. Martin has been working on support for Google Cloud Storage (with gcsfs) with another small project that uses the same API.
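As a rough sketch of what that plumbing enables (the bucket name and file pattern are hypothetical, and gcsfs support was still in progress at the time of writing):

import dask.dataframe as dd

# With s3fs installed, dask collections can read straight from object storage;
# once gcsfs lands, a gcs:// URL would work the same way.
df = dd.read_csv('s3://some-example-bucket/data-*.csv')
print(df.head())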

Cleanup of Dask+SKLearn project

Last year Jim Crist published three great blogposts about using Dask with SKLearn. The result was a small library dask-learn that had a variety of features, some incredibly useful, like a cluster-ready Pipeline and GridSearchCV, others less so. Because of the experimental nature of this work we had labeled the library "not ready for use", which drew some curious responses from potential users.

Jim is now busy dusting off the project, removing less-useful parts and generally reducing scope to strictly model-parallel algorithms.

20 Feb 2017 12:00am GMT

19 Feb 2017

feedPlanet Python

Bhishan Bhandari: Raising and Handling Exceptions in Python – Python Programming Essentials

Brief Introduction: Any unexpected event that occurs during the execution of a program is known as an exception. Like everything else, exceptions are objects in Python, either an instance of the Exception class or an instance of a class derived from the base class Exception. Exceptions may occur due to logical errors in […]
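The excerpt is cut off, but the idea is easy to show with a small, self-contained example (the class and function here are illustrative, not from the original post):

class InsufficientFunds(Exception):
    """A custom exception: an instance of a class derived from Exception."""

def withdraw(balance, amount):
    if amount > balance:
        # Raising signals the unexpected event to the caller.
        raise InsufficientFunds('cannot withdraw %s from %s' % (amount, balance))
    return balance - amount

try:
    withdraw(100, 250)
except InsufficientFunds as exc:
    # Handling: catch the specific exception type and recover.
    print('handled:', exc)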

19 Feb 2017 1:31pm GMT

10 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: King William's Town Station

Yesterday morning I had to go to the station in KWT to pick up our reserved bus tickets for the Christmas holidays in Cape Town. The station itself has had no train service since December for cost reasons - but Translux and co, the long-distance bus companies, have their offices there.






© benste CC NC SA

10 Nov 2011 10:57am GMT

09 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein

Nobody is worried about something like that - you just drive through it by car, and in the city - near Gnobie - "nah, it only gets dangerous once the fire brigade is there" - 30 minutes later, on the way back, the fire brigade was there.




© benste CC NC SA

09 Nov 2011 8:25pm GMT

08 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Braai Party

Braai = a barbecue evening or similar.

Would-be technicians patching their SpeakOn / jack plug splitters...

The ladies - the "mamas" of the settlement - during the official opening speech

Even though fewer people came than expected: loud music and lots of people ...

And of course a fire with real wood for the braai.

© benste CC NC SA

08 Nov 2011 2:30pm GMT

07 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Lumanyano Primary

One of our missions was bringing Katja's Linux Server back to her room. While doing that we saw her new decoration.

Björn and Simphiwe carried the PC to Katja's school


© benste CC NC SA

07 Nov 2011 2:00pm GMT

06 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Nelisa Haircut

Today I went with Björn to Needs Camp to visit Katja's guest family for a special party. First of all we visited some friends of Nelisa - yeah, the one I'm working with in Quigney - Katja's guest father's sister - who gave her a haircut.

African Women usually get their hair done by arranging extensions and not like Europeans just cutting some hair.

In between she looked like this...

And then she was done - looks amazing considering the amount of hair she had last week - doesn't it ?

© benste CC NC SA

06 Nov 2011 7:45pm GMT

05 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: My Saturday

Somehow it occurred to me today that I need to restructure my blog posts a bit - if I only ever report on new places, I would have to be on a permanent round trip. So here are a few things from my everyday life today.

First of all, Saturday counts as a day off, at least for us volunteers.

This weekend only Rommel and I are on the farm - Katja and Björn are now at their placements, and my housemates Kyle and Jonathan are at home in Grahamstown, as is Sipho, who lives in Dimbaza.
Robin, Rommel's wife, has been in Woodie Cape since Thursday to take care of a few things there.
Anyway, this morning we treated ourselves to a shared Weetbix/muesli breakfast and then set off for East London. Two things were on the checklist: Vodacom and Ethienne (the estate agent), plus dropping off the missing items at NeedsCamp on the way back.

Just after setting off on the dirt road we realized that we had not packed the things for NeedsCamp and Ethienne, but did have the pump for the water supply in the car.

So in East London we first drove to Farmerama - no, not the online game Farmville, but a shop with all kinds of things for a farm - in Berea, a northern part of town.

At Farmerama we got advice on a quick-release coupling that should make life with the pump easier, and we also dropped off a lighter pump for repair, so that it is not always such a big effort whenever the water has run out again.

Fego Caffé is in the Hemmingways Mall; there we had to get the PIN and PUK for one of our data SIM cards, because two digits unfortunately got swapped when the PIN was entered. In any case, shops in South Africa store data as sensitive as a PUK - which in principle gives access to a locked phone.

In the café Rommel then carried out a few online transactions with the 3G modem, which was working again - and which, by the way, now works perfectly in Ubuntu, my Linux system.

On the side I went to 8ta to find out about their new deals, since we want to offer internet in some of Hilltop's centres. The picture shows the UMTS coverage in NeedsCamp, Katja's place. 8ta is a new phone provider from Telkom; after Vodafone bought Telkom's shares in Vodacom, they have to build their network up again from scratch.
We decided to organize a free prepaid card to test, because who knows how accurate the coverage map above really is ... Before signing even the cheapest 24-month deal you should know whether it works.

After that we went to Checkers in Vincent, looking for two hotplates for Woody Cape - R 129.00 each, so about 12€ for a two-ring hotplate.
As you can see in the background, there are already Christmas decorations - at the beginning of November, and that in South Africa at a sunny, warm 25°C and up.

We treated ourselves to lunch at a Pakistani curry takeaway - highly recommended!
Well, and after we got back an hour or so ago, I cleaned the fridge, which I had simply put outside this morning to defrost. Now it is clean again, without its 3 m thick layer of ice...

Tomorrow ... I will report on that separately ... but probably not until Monday, because then I will be back in Quigney (East London) and will have free internet.

© benste CC NC SA

05 Nov 2011 4:33pm GMT

31 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Sterkspruit Computer Center

Sterkspruit is one of Hilltop's Computer Centres in the far north of the Eastern Cape. On the trip to J'burg we used the opportunity to take a look at the centre.

Pupils in the big classroom


The Trainer


School in Countryside


Adult Class in the Afternoon


"Town"


© benste CC NC SA

31 Oct 2011 4:58pm GMT

Benedict Stein: Technical Issues

What do you do in an internet cafe if your ADSL and fax line have been discontinued before month's end? Well, my idea was sitting outside and eating some ice cream.
At least it's sunny and not as rainy as on the weekend.


© benste CC NC SA

31 Oct 2011 3:11pm GMT

30 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Nellis Restaurant

For those who are travelling through Zastron - there is a very nice restaurant serving delicious food at reasonable prices.
In addition they're selling home-made juices, jams and honey.




interior


home made specialities - the shop in the shop


the Bar


© benste CC NC SA

30 Oct 2011 4:47pm GMT

29 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: The way back from J'burg

On the 10-12h trip from J'burg back to ELS I was able to take a lot of pictures, including these different roadsides

Plain Street


Orange River in its beginnings (near Lesotho)


Zastron Anglican Church


The Bridge in Between "Free State" and Eastern Cape next to Zastron


my new Background ;)


If you listen to GoogleMaps you'll end up travelling 50km of gravel road - as it was just renewed we didn't have that many problems and saved 1h compared to going the official way with all its construction sites




Freeway


getting dark


© benste CC NC SA

29 Oct 2011 4:23pm GMT

28 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: How does a construction site actually work?

Sure, some things may be different and a lot is the same - but the road construction site that is an everyday sight in Germany - how does that actually work in South Africa?

First of all - NO, no natives digging with their hands - even though more manpower is used here, they are busy working with technology.

A completely normal "Bundesstraße" (main road)


and how it is being widened


lots and lots of trucks


because here one side is completely closed over a long stretch, so you end up with temporary traffic lights and, in this case, a 45-minute wait


But at least they seem to be having fun ;) - and so did we, because luckily we never had to wait longer than 10 minutes.

© benste CC NC SA

28 Oct 2011 4:20pm GMT