25 Apr 2017

feedPlanet Python

Philip Semanchuk: Thanks, Science!

I took part in the Raleigh March for Science last Saturday. For the opportunity to learn about it, participate in it, photograph it, share it with you - oh, and also for, you know, being alive today - thanks, science!







25 Apr 2017 9:20pm GMT

Philip Semanchuk: Thanks, Science!

I took part in the Raleigh March for Science last Saturday. For the opportunity to learn about it, participate in it, photograph it, share it with you - oh, and also for, you know, being alive today - thanks, science!







25 Apr 2017 9:20pm GMT

DataCamp: Keras Cheat Sheet: Neural Networks in Python

Keras is an easy-to-use and powerful library for Theano and TensorFlow that provides a high-level neural networks API to develop and evaluate deep learning models.

We recently launched one of the first online interactive deep learning course using Keras 2.0, called "Deep Learning in Python.

Now, DataCamp has created a Keras cheat sheet for those who have already taken the course and that still want a handy one-page reference or for those who need an extra push to get started.

In short, you'll see that this cheat sheet not only presents you with the six steps that you can go through to make neural networks in Python with the Keras library.

keras cheat sheet

In no time, this Keras cheat sheet will make you familiar with how you can load datasets from the library itself, preprocess the data, build up a model architecture, and compile, train, and evaluate it. As there is a considerable amount of freedom in how you build up your models, you'll see that the cheat sheet uses some of the simple key code examples of the Keras library that you need to know to get started with building your own neural networks in Python.

Furthermore, you'll also see some examples of how to inspect your model, and how you can save and reload it. Lastly, you'll also find examples of how you can predict values for test data and how you can fine tune your models by adjusting the optimization parameters and early stopping.

Everything you need to make your first neural networks in Python with Keras!

Also, don't miss out on our scikit-learn cheat sheet, NumPy cheat sheet and Pandas cheat sheet!

25 Apr 2017 6:10pm GMT

DataCamp: Keras Cheat Sheet: Neural Networks in Python

Keras is an easy-to-use and powerful library for Theano and TensorFlow that provides a high-level neural networks API to develop and evaluate deep learning models.

We recently launched one of the first online interactive deep learning course using Keras 2.0, called "Deep Learning in Python.

Now, DataCamp has created a Keras cheat sheet for those who have already taken the course and that still want a handy one-page reference or for those who need an extra push to get started.

In short, you'll see that this cheat sheet not only presents you with the six steps that you can go through to make neural networks in Python with the Keras library.

keras cheat sheet

In no time, this Keras cheat sheet will make you familiar with how you can load datasets from the library itself, preprocess the data, build up a model architecture, and compile, train, and evaluate it. As there is a considerable amount of freedom in how you build up your models, you'll see that the cheat sheet uses some of the simple key code examples of the Keras library that you need to know to get started with building your own neural networks in Python.

Furthermore, you'll also see some examples of how to inspect your model, and how you can save and reload it. Lastly, you'll also find examples of how you can predict values for test data and how you can fine tune your models by adjusting the optimization parameters and early stopping.

Everything you need to make your first neural networks in Python with Keras!

Also, don't miss out on our scikit-learn cheat sheet, NumPy cheat sheet and Pandas cheat sheet!

25 Apr 2017 6:10pm GMT

Mike Driscoll: Getting Started with pywebview

I stumbled across the pywebview project a couple of weeks ago. The pywebview package "is a lightweight cross-platform wrapper around a webview component that allows to display HTML content in its own native GUI window." It uses WebKit on OSX and Linux and Trident (MSHTML) on Windows, which is actually what wxPython's webview widget also does. The idea behind pywebview is that it provides you the ability to load a website in a desktop application, kind of Electron.

While pywebview claims it "has no dependencies on an external GUI framework", on Windows it requires pythonnet, PyWin32 and comtypes installed. OSX requires "pyobjc", although that is included with the default Python installed in OSX. For Linux, it's a bit more complicated. On GTK3 based systems you will need PyGObject whereas on Debian based systems, you'll need to install PyGObject + gir1.2-webkit-3.0. Finally, you can also use PyQt 4 or 5.

You can use Python micro-web frameworks, such as Flask or bottle, with pywebview to create cool applications using HTML5 instead of Python.

To install pywebview itself, just use pip:


pip install pywebview

Once installed and assuming you also have the prerequisites, you can do something like this:

import webview
 
webview.create_window('My Web App', 'http://www.mousevspython.com')

This will load the specified URL in a window with the specified title (i.e. the first argument). Your new application should end up looking something like this:

The API for pywebview is quite short and sweet and can be found here:

There are only a handful of methods that you can use, which makes them easy to remember. But since you can't create any other controls for your pywebview application, you will need to do all your user interface logic in your web application.

The pywebview package supports being frozen using PyInstaller for Windows and py2app for OSX. It also works with virtualenv, although there are known issues that you will want to read about before using virtualenv.

Wrapping Up

The pywebview package is actually pretty neat and I personally think it's worth a look. If you want something that's a bit more integrated to your desktop, then you might want to give wxPython or PyQt a try. But if all you need to do is distribute an HTML5-based web app, then this package might be just the one for you.

25 Apr 2017 12:30pm GMT

Mike Driscoll: Getting Started with pywebview

I stumbled across the pywebview project a couple of weeks ago. The pywebview package "is a lightweight cross-platform wrapper around a webview component that allows to display HTML content in its own native GUI window." It uses WebKit on OSX and Linux and Trident (MSHTML) on Windows, which is actually what wxPython's webview widget also does. The idea behind pywebview is that it provides you the ability to load a website in a desktop application, kind of Electron.

While pywebview claims it "has no dependencies on an external GUI framework", on Windows it requires pythonnet, PyWin32 and comtypes installed. OSX requires "pyobjc", although that is included with the default Python installed in OSX. For Linux, it's a bit more complicated. On GTK3 based systems you will need PyGObject whereas on Debian based systems, you'll need to install PyGObject + gir1.2-webkit-3.0. Finally, you can also use PyQt 4 or 5.

You can use Python micro-web frameworks, such as Flask or bottle, with pywebview to create cool applications using HTML5 instead of Python.

To install pywebview itself, just use pip:


pip install pywebview

Once installed and assuming you also have the prerequisites, you can do something like this:

import webview
 
webview.create_window('My Web App', 'http://www.mousevspython.com')

This will load the specified URL in a window with the specified title (i.e. the first argument). Your new application should end up looking something like this:

The API for pywebview is quite short and sweet and can be found here:

There are only a handful of methods that you can use, which makes them easy to remember. But since you can't create any other controls for your pywebview application, you will need to do all your user interface logic in your web application.

The pywebview package supports being frozen using PyInstaller for Windows and py2app for OSX. It also works with virtualenv, although there are known issues that you will want to read about before using virtualenv.

Wrapping Up

The pywebview package is actually pretty neat and I personally think it's worth a look. If you want something that's a bit more integrated to your desktop, then you might want to give wxPython or PyQt a try. But if all you need to do is distribute an HTML5-based web app, then this package might be just the one for you.

25 Apr 2017 12:30pm GMT

Chris Moffitt: Effectively Using Matplotlib

Introduction

The python visualization world can be a frustrating place for a new user. There are many different options and choosing the right one is a challenge. For example, even after 2 years, this article is one of the top posts that lead people to this site. In that article, I threw some shade at matplotlib and dismissed it during the analysis. However, after using tools such as pandas, scikit-learn, seaborn and the rest of the data science stack in python - I think I was a little premature in dismissing matplotlib. To be honest, I did not quite understand it and how to use it effectively in my workflow.

Now that I have taken the time to learn some of these tools and how to use them with matplotlib, I have started to see matplotlib as an indispensable tool. This post will show how I use matplotlib and provide some recommendations for users getting started or users who have not taken the time to learn matplotlib. I do firmly believe matplotlib is an essential part of the python data science stack and hope this article will help people understand how to use it for their own visualizations.

Why all the negativity towards matplotlib?

In my opinion, there are a couple of reasons why matplotlib is challenging for the new user to learn.

First, matplotlib has two interfaces. The first is based on MATLAB and uses a state-based interface. The second option is an an object-oriented interface. The why's of this dual approach are outside the scope of this post but knowing that there are two approaches is vitally important when plotting with matplotlib.

The reason two interfaces cause confusion is that in the world of stack overflow and tons of information available via google searches, new users will stumble across multiple solutions to problems that look somewhat similar but are not the same. I can speak from experience. Looking back on some of my old code, I can tell that there is a mishmash of matplotlib code - which is confusing to me (even if I wrote it).

Key Point
New matplotlib users should learn and use the object oriented interface.

Another historic challenge with matplotlib is that some of the default style choices were rather unattractive. In a world where R could generate some really cool plots with ggplot, the matplotlib options tended to look a bit ugly in comparison. The good news is that matplotlib 2.0 has much nicer styling capabilities and ability to theme your visualizations with minimal effort.

The third challenge I see with matplotlib is that there is confusion as to when you should use pure matplotlib to plot something vs. a tool like pandas or seaborn that is built on top of matplotlib. Anytime there can be more than one way to do something, it is challenging for the new or infrequent user to follow the right path. Couple this confusion with the two different API's and it is a recipe for frustration.

Why stick with matplotlib?

Despite some of these issues, I have come to appreciate matplotlib because it is extremely powerful. The library allows you to create almost any visualization you could imagine. Additionally, there is a rich ecosystem of python tools built around it and many of the more advanced visualization tools use matplotlib as the base library. If you do any work in the python data science stack, you will need to develop some basic familiarity with how to use matplotlib. That is the focus of the rest of this post - developing a basic approach for effectively using matplotlib.

Basic Premises

If you take nothing else away from this post, I recommend the following steps for learning how to use matplotlib:

  1. Learn the basic matplotlib terminology, specifically what is a Figure and an Axes .
  2. Always use the object-oriented interface. Get in the habit of using it from the start of your analysis.
  3. Start your visualizations with basic pandas plotting.
  4. Use seaborn for the more complex statistical visualizations.
  5. Use matplotlib to customize the pandas or seaborn visualization.

This graphic from the matplotlib faq is gold. Keep it handy to understand the different terminology of a plot.

Matplotlib parts

Most of the terms are straightforward but the main thing to remember is that the Figure is the final image that may contain 1 or more axes. The Axes represent an individual plot. Once you understand what these are and how to access them through the object oriented API, the rest of the process starts to fall into place.

The other benefit of this knowledge is that you have a starting point when you see things on the web. If you take the time to understand this point, the rest of the matplotlib API will start to make sense. Also, many of the advanced python packages like seaborn and ggplot rely on matplotlib so understanding the basics will make those more powerful frameworks much easier to learn.

Finally, I am not saying that you should avoid the other good options like ggplot (aka ggpy), bokeh, plotly or altair. I just think you'll need a basic understanding of matplotlib + pandas + seaborn to start. Once you understand the basic visualization stack, you can explore the other options and make informed choices based on your needs.

Getting Started

The rest of this post will be a primer on how to do the basic visualization creation in pandas and customize the most common items using matplotlib. Once you understand the basic process, further customizations are relatively straightforward.

I have focused on the most common plotting tasks I encounter such as labeling axes, adjusting limits, updating plot titles, saving figures and adjusting legends. If you would like to follow along, the notebook includes additional detail that should be helpful.

To get started, I am going to setup my imports and read in some data:

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

df = pd.read_excel("https://github.com/chris1610/pbpython/blob/master/data/sample-salesv3.xlsx?raw=true")
df.head()
account number name sku quantity unit price ext price date
0 740150 Barton LLC B1-20000 39 86.69 3380.91 2014-01-01 07:21:51
1 714466 Trantow-Barrows S2-77896 -1 63.16 -63.16 2014-01-01 10:00:47
2 218895 Kulas Inc B1-69924 23 90.70 2086.10 2014-01-01 13:24:58
3 307599 Kassulke, Ondricka and Metz S1-65481 41 21.05 863.05 2014-01-01 15:05:22
4 412290 Jerde-Hilpert S2-34077 6 83.21 499.26 2014-01-01 23:26:55

The data consists of sales transactions for 2014. In order to make this post a little shorter, I'm going to summarize the data so we can see the total number of purchases and total sales for the top 10 customers. I am also going to rename columns for clarity during plots.

top_10 = (df.groupby('name')['ext price', 'quantity'].agg({'ext price': 'sum', 'quantity': 'count'})
          .sort_values(by='ext price', ascending=False))[:10].reset_index()
top_10.rename(columns={'name': 'Name', 'ext price': 'Sales', 'quantity': 'Purchases'}, inplace=True)

Here is what the data looks like.

Name Purchases Sales
0 Kulas Inc 94 137351.96
1 White-Trantow 86 135841.99
2 Trantow-Barrows 94 123381.38
3 Jerde-Hilpert 89 112591.43
4 Fritsch, Russel and Anderson 81 112214.71
5 Barton LLC 82 109438.50
6 Will LLC 74 104437.60
7 Koepp Ltd 82 103660.54
8 Frami, Hills and Schmidt 72 103569.59
9 Keeling LLC 74 100934.30

Now that the data is formatted in a simple table, let's talk about plotting these results as a bar chart.

As I mentioned earlier, matplotlib has many different styles available for rendering plots. You can see which ones are available on your system using plt.style.available .

plt.style.available
['seaborn-dark',
 'seaborn-dark-palette',
 'fivethirtyeight',
 'seaborn-whitegrid',
 'seaborn-darkgrid',
 'seaborn',
 'bmh',
 'classic',
 'seaborn-colorblind',
 'seaborn-muted',
 'seaborn-white',
 'seaborn-talk',
 'grayscale',
 'dark_background',
 'seaborn-deep',
 'seaborn-bright',
 'ggplot',
 'seaborn-paper',
 'seaborn-notebook',
 'seaborn-poster',
 'seaborn-ticks',
 'seaborn-pastel']

Using a style is as simple as:

plt.style.use('ggplot')

I encourage you to play around with different styles and see which ones you like.

Now that we have a nicer style in place, the first step is to plot the data using the standard pandas plotting function:

top_10.plot(kind='barh', y="Sales", x="Name")
Pandas plot 1

The reason I recommend using pandas plotting first is that it is a quick and easy way to prototype your visualization. Since most people are probably already doing some level of data manipulation/analysis in pandas as a first step, go ahead and use the basic plots to get started.

Customizing the Plot

Assuming you are comfortable with the gist of this plot, the next step is to customize it. Some of the customizations (like adding titles and labels) are very simple to use with the pandas plot function. However, you will probably find yourself needing to move outside of that functionality at some point. That's why I recommend getting in the habit of doing this:

fig, ax = plt.subplots()
top_10.plot(kind='barh', y="Sales", x="Name", ax=ax)

The resulting plot looks exactly the same as the original but we added an additional call to plt.subplots() and passed the ax to the plotting function. Why should you do this? Remember when I said it is critical to get access to the axes and figures in matplotlib? That's what we have accomplished here. Any future customization will be done via the ax or fig objects.

We have the benefit of a quick plot from pandas but access to all the power from matplotlib now. An example should show what we can do now. Also, by using this naming convention, it is fairly straightforward to adapt others' solutions to your unique needs.

Suppose we want to tweak the x limits and change some axis labels? Now that we have the axes in the ax variable, we have a lot of control:

fig, ax = plt.subplots()
top_10.plot(kind='barh', y="Sales", x="Name", ax=ax)
ax.set_xlim([-10000, 140000])
ax.set_xlabel('Total Revenue')
ax.set_ylabel('Customer');
Pandas plot 2

Here's another shortcut we can use to change the title and both labels:

fig, ax = plt.subplots()
top_10.plot(kind='barh', y="Sales", x="Name", ax=ax)
ax.set_xlim([-10000, 140000])
ax.set(title='2014 Revenue', xlabel='Total Revenue', ylabel='Customer')
Pandas plot 3

To further demonstrate this approach, we can also adjust the size of this image. By using the plt.subplots() function, we can define the figsize in inches. We can also remove the legend using ax.legend().set_visible(False)

fig, ax = plt.subplots(figsize=(5, 6))
top_10.plot(kind='barh', y="Sales", x="Name", ax=ax)
ax.set_xlim([-10000, 140000])
ax.set(title='2014 Revenue', xlabel='Total Revenue')
ax.legend().set_visible(False)
Pandas plot 4

There are plenty of things you probably want to do to clean up this plot. One of the biggest eye sores is the formatting of the Total Revenue numbers. Matplotlib can help us with this through the use of the FuncFormatter . This versatile function can apply a user defined function to a value and return a nicely formatted string to place on the axis.

Here is a currency formatting function to gracefully handle US dollars in the several hundred thousand dollar range:

def currency(x, pos):
    'The two args are the value and tick position'
    if x >= 1000000:
        return '${:1.1f}M'.format(x*1e-6)
    return '${:1.0f}K'.format(x*1e-3)

Now that we have a formatter function, we need to define it and apply it to the x axis. Here is the full code:

fig, ax = plt.subplots()
top_10.plot(kind='barh', y="Sales", x="Name", ax=ax)
ax.set_xlim([-10000, 140000])
ax.set(title='2014 Revenue', xlabel='Total Revenue', ylabel='Customer')
formatter = FuncFormatter(currency)
ax.xaxis.set_major_formatter(formatter)
ax.legend().set_visible(False)
Pandas plot 5

That's much nicer and shows a good example of the flexibility to define your own solution to the problem.

The final customization feature I will go through is the ability to add annotations to the plot. In order to draw a vertical line, you can use ax.axvline() and to add custom text, you can use ax.text() .

For this example, we'll draw a line showing an average and include labels showing three new customers. Here is the full code with comments to pull it all together.

# Create the figure and the axes
fig, ax = plt.subplots()

# Plot the data and get the averaged
top_10.plot(kind='barh', y="Sales", x="Name", ax=ax)
avg = top_10['Sales'].mean()

# Set limits and labels
ax.set_xlim([-10000, 140000])
ax.set(title='2014 Revenue', xlabel='Total Revenue', ylabel='Customer')

# Add a line for the average
ax.axvline(x=avg, color='b', label='Average', linestyle='--', linewidth=1)

# Annotate the new customers
for cust in [3, 5, 8]:
    ax.text(115000, cust, "New Customer")

# Format the currency
formatter = FuncFormatter(currency)
ax.xaxis.set_major_formatter(formatter)

# Hide the legend
ax.legend().set_visible(False)
Pandas plot 6

While this may not be the most exciting plot it does show how much power you have when following this approach.

Figures and Plots

Up until now, all the changes we have made have been with the indivudual plot. Fortunately, we also have the ability to add multiple plots on a figure as well as save the entire figure using various options.

If we decided that we wanted to put two plots on the same figure, we should have a basic understanding of how to do it. First, create the figure, then the axes, then plot it all together. We can accomplish this using plt.subplots() :

fig, (ax0, ax1) = plt.subplots(nrows=1, ncols=2, sharey=True, figsize=(7, 4))

In this example, I'm using nrows and ncols to specify the size because this is very clear to the new user. In sample code you will frequently just see variables like 1,2. I think using the named parameters is a little easier to interpret later on when you're looking at your code.

I am also using sharey=True so that the yaxis will share the same labels.

This example is also kind of nifty because the various axes get unpacked to ax0 and ax1 . Now that we have these axes, you can plot them like the examples above but put one plot on ax0 and the other on ax1 .

# Get the figure and the axes
fig, (ax0, ax1) = plt.subplots(nrows=1,ncols=2, sharey=True, figsize=(7, 4))
top_10.plot(kind='barh', y="Sales", x="Name", ax=ax0)
ax0.set_xlim([-10000, 140000])
ax0.set(title='Revenue', xlabel='Total Revenue', ylabel='Customers')

# Plot the average as a vertical line
avg = top_10['Sales'].mean()
ax0.axvline(x=avg, color='b', label='Average', linestyle='--', linewidth=1)

# Repeat for the unit plot
top_10.plot(kind='barh', y="Purchases", x="Name", ax=ax1)
avg = top_10['Purchases'].mean()
ax1.set(title='Units', xlabel='Total Units', ylabel='')
ax1.axvline(x=avg, color='b', label='Average', linestyle='--', linewidth=1)

# Title the figure
fig.suptitle('2014 Sales Analysis', fontsize=14, fontweight='bold');

# Hide the legends
ax1.legend().set_visible(False)
ax0.legend().set_visible(False)
Pandas plot 7

Up until now, I have been relying on the jupyter notebook to display the figures by virtue of the %matplotlib inline directive. However, there are going to be plenty of times where you have the need to save a figure in a specific format and integrate it with some other presentation.

Matplotlib supports many different formats for saving files. You can use fig.canvas.get_supported_filetypes() to see what your system supports:

fig.canvas.get_supported_filetypes()
{'eps': 'Encapsulated Postscript',
 'jpeg': 'Joint Photographic Experts Group',
 'jpg': 'Joint Photographic Experts Group',
 'pdf': 'Portable Document Format',
 'pgf': 'PGF code for LaTeX',
 'png': 'Portable Network Graphics',
 'ps': 'Postscript',
 'raw': 'Raw RGBA bitmap',
 'rgba': 'Raw RGBA bitmap',
 'svg': 'Scalable Vector Graphics',
 'svgz': 'Scalable Vector Graphics',
 'tif': 'Tagged Image File Format',
 'tiff': 'Tagged Image File Format'}

Since we have the fig object, we can save the figure using multiple options:

fig.savefig('sales.png', transparent=False, dpi=80, bbox_inches="tight")

This version saves the plot as a png with opaque background. I have also specified the dpi and bbox_inches="tight" in order to minimize excess white space.

Conclusion

Hopefully this process has helped you understand how to more effectively use matplotlib in your daily data analysis. If you get in the habit of using this approach when doing your analysis, you should be able to quickly find out how to do whatever you need to do to customize your plot.

As a final bonus, I am including a quick guide to unify all the concepts. I hope this helps bring this post together and proves a handy reference for future use.

Matplotlib example

25 Apr 2017 12:15pm GMT

Chris Moffitt: Effectively Using Matplotlib

Introduction

The python visualization world can be a frustrating place for a new user. There are many different options and choosing the right one is a challenge. For example, even after 2 years, this article is one of the top posts that lead people to this site. In that article, I threw some shade at matplotlib and dismissed it during the analysis. However, after using tools such as pandas, scikit-learn, seaborn and the rest of the data science stack in python - I think I was a little premature in dismissing matplotlib. To be honest, I did not quite understand it and how to use it effectively in my workflow.

Now that I have taken the time to learn some of these tools and how to use them with matplotlib, I have started to see matplotlib as an indispensable tool. This post will show how I use matplotlib and provide some recommendations for users getting started or users who have not taken the time to learn matplotlib. I do firmly believe matplotlib is an essential part of the python data science stack and hope this article will help people understand how to use it for their own visualizations.

Why all the negativity towards matplotlib?

In my opinion, there are a couple of reasons why matplotlib is challenging for the new user to learn.

First, matplotlib has two interfaces. The first is based on MATLAB and uses a state-based interface. The second option is an an object-oriented interface. The why's of this dual approach are outside the scope of this post but knowing that there are two approaches is vitally important when plotting with matplotlib.

The reason two interfaces cause confusion is that in the world of stack overflow and tons of information available via google searches, new users will stumble across multiple solutions to problems that look somewhat similar but are not the same. I can speak from experience. Looking back on some of my old code, I can tell that there is a mishmash of matplotlib code - which is confusing to me (even if I wrote it).

Key Point
New matplotlib users should learn and use the object oriented interface.

Another historic challenge with matplotlib is that some of the default style choices were rather unattractive. In a world where R could generate some really cool plots with ggplot, the matplotlib options tended to look a bit ugly in comparison. The good news is that matplotlib 2.0 has much nicer styling capabilities and ability to theme your visualizations with minimal effort.

The third challenge I see with matplotlib is that there is confusion as to when you should use pure matplotlib to plot something vs. a tool like pandas or seaborn that is built on top of matplotlib. Anytime there can be more than one way to do something, it is challenging for the new or infrequent user to follow the right path. Couple this confusion with the two different API's and it is a recipe for frustration.

Why stick with matplotlib?

Despite some of these issues, I have come to appreciate matplotlib because it is extremely powerful. The library allows you to create almost any visualization you could imagine. Additionally, there is a rich ecosystem of python tools built around it and many of the more advanced visualization tools use matplotlib as the base library. If you do any work in the python data science stack, you will need to develop some basic familiarity with how to use matplotlib. That is the focus of the rest of this post - developing a basic approach for effectively using matplotlib.

Basic Premises

If you take nothing else away from this post, I recommend the following steps for learning how to use matplotlib:

  1. Learn the basic matplotlib terminology, specifically what is a Figure and an Axes .
  2. Always use the object-oriented interface. Get in the habit of using it from the start of your analysis.
  3. Start your visualizations with basic pandas plotting.
  4. Use seaborn for the more complex statistical visualizations.
  5. Use matplotlib to customize the pandas or seaborn visualization.

This graphic from the matplotlib faq is gold. Keep it handy to understand the different terminology of a plot.

Matplotlib parts

Most of the terms are straightforward but the main thing to remember is that the Figure is the final image that may contain 1 or more axes. The Axes represent an individual plot. Once you understand what these are and how to access them through the object oriented API, the rest of the process starts to fall into place.

The other benefit of this knowledge is that you have a starting point when you see things on the web. If you take the time to understand this point, the rest of the matplotlib API will start to make sense. Also, many of the advanced python packages like seaborn and ggplot rely on matplotlib so understanding the basics will make those more powerful frameworks much easier to learn.

Finally, I am not saying that you should avoid the other good options like ggplot (aka ggpy), bokeh, plotly or altair. I just think you'll need a basic understanding of matplotlib + pandas + seaborn to start. Once you understand the basic visualization stack, you can explore the other options and make informed choices based on your needs.

Getting Started

The rest of this post will be a primer on how to do the basic visualization creation in pandas and customize the most common items using matplotlib. Once you understand the basic process, further customizations are relatively straightforward.

I have focused on the most common plotting tasks I encounter such as labeling axes, adjusting limits, updating plot titles, saving figures and adjusting legends. If you would like to follow along, the notebook includes additional detail that should be helpful.

To get started, I am going to setup my imports and read in some data:

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

df = pd.read_excel("https://github.com/chris1610/pbpython/blob/master/data/sample-salesv3.xlsx?raw=true")
df.head()
account number name sku quantity unit price ext price date
0 740150 Barton LLC B1-20000 39 86.69 3380.91 2014-01-01 07:21:51
1 714466 Trantow-Barrows S2-77896 -1 63.16 -63.16 2014-01-01 10:00:47
2 218895 Kulas Inc B1-69924 23 90.70 2086.10 2014-01-01 13:24:58
3 307599 Kassulke, Ondricka and Metz S1-65481 41 21.05 863.05 2014-01-01 15:05:22
4 412290 Jerde-Hilpert S2-34077 6 83.21 499.26 2014-01-01 23:26:55

The data consists of sales transactions for 2014. In order to make this post a little shorter, I'm going to summarize the data so we can see the total number of purchases and total sales for the top 10 customers. I am also going to rename columns for clarity during plots.

top_10 = (df.groupby('name')['ext price', 'quantity'].agg({'ext price': 'sum', 'quantity': 'count'})
          .sort_values(by='ext price', ascending=False))[:10].reset_index()
top_10.rename(columns={'name': 'Name', 'ext price': 'Sales', 'quantity': 'Purchases'}, inplace=True)

Here is what the data looks like.

Name Purchases Sales
0 Kulas Inc 94 137351.96
1 White-Trantow 86 135841.99
2 Trantow-Barrows 94 123381.38
3 Jerde-Hilpert 89 112591.43
4 Fritsch, Russel and Anderson 81 112214.71
5 Barton LLC 82 109438.50
6 Will LLC 74 104437.60
7 Koepp Ltd 82 103660.54
8 Frami, Hills and Schmidt 72 103569.59
9 Keeling LLC 74 100934.30

Now that the data is formatted in a simple table, let's talk about plotting these results as a bar chart.

As I mentioned earlier, matplotlib has many different styles available for rendering plots. You can see which ones are available on your system using plt.style.available .

plt.style.available
['seaborn-dark',
 'seaborn-dark-palette',
 'fivethirtyeight',
 'seaborn-whitegrid',
 'seaborn-darkgrid',
 'seaborn',
 'bmh',
 'classic',
 'seaborn-colorblind',
 'seaborn-muted',
 'seaborn-white',
 'seaborn-talk',
 'grayscale',
 'dark_background',
 'seaborn-deep',
 'seaborn-bright',
 'ggplot',
 'seaborn-paper',
 'seaborn-notebook',
 'seaborn-poster',
 'seaborn-ticks',
 'seaborn-pastel']

Using a style is as simple as:

plt.style.use('ggplot')

I encourage you to play around with different styles and see which ones you like.

Now that we have a nicer style in place, the first step is to plot the data using the standard pandas plotting function:

top_10.plot(kind='barh', y="Sales", x="Name")
Pandas plot 1

The reason I recommend using pandas plotting first is that it is a quick and easy way to prototype your visualization. Since most people are probably already doing some level of data manipulation/analysis in pandas as a first step, go ahead and use the basic plots to get started.

Customizing the Plot

Assuming you are comfortable with the gist of this plot, the next step is to customize it. Some of the customizations (like adding titles and labels) are very simple to use with the pandas plot function. However, you will probably find yourself needing to move outside of that functionality at some point. That's why I recommend getting in the habit of doing this:

fig, ax = plt.subplots()
top_10.plot(kind='barh', y="Sales", x="Name", ax=ax)

The resulting plot looks exactly the same as the original but we added an additional call to plt.subplots() and passed the ax to the plotting function. Why should you do this? Remember when I said it is critical to get access to the axes and figures in matplotlib? That's what we have accomplished here. Any future customization will be done via the ax or fig objects.

We have the benefit of a quick plot from pandas but access to all the power from matplotlib now. An example should show what we can do now. Also, by using this naming convention, it is fairly straightforward to adapt others' solutions to your unique needs.

Suppose we want to tweak the x limits and change some axis labels? Now that we have the axes in the ax variable, we have a lot of control:

fig, ax = plt.subplots()
top_10.plot(kind='barh', y="Sales", x="Name", ax=ax)
ax.set_xlim([-10000, 140000])
ax.set_xlabel('Total Revenue')
ax.set_ylabel('Customer');
Pandas plot 2

Here's another shortcut we can use to change the title and both labels:

fig, ax = plt.subplots()
top_10.plot(kind='barh', y="Sales", x="Name", ax=ax)
ax.set_xlim([-10000, 140000])
ax.set(title='2014 Revenue', xlabel='Total Revenue', ylabel='Customer')
Pandas plot 3

To further demonstrate this approach, we can also adjust the size of this image. By using the plt.subplots() function, we can define the figsize in inches. We can also remove the legend using ax.legend().set_visible(False)

fig, ax = plt.subplots(figsize=(5, 6))
top_10.plot(kind='barh', y="Sales", x="Name", ax=ax)
ax.set_xlim([-10000, 140000])
ax.set(title='2014 Revenue', xlabel='Total Revenue')
ax.legend().set_visible(False)
Pandas plot 4

There are plenty of things you probably want to do to clean up this plot. One of the biggest eye sores is the formatting of the Total Revenue numbers. Matplotlib can help us with this through the use of the FuncFormatter . This versatile function can apply a user defined function to a value and return a nicely formatted string to place on the axis.

Here is a currency formatting function to gracefully handle US dollars in the several hundred thousand dollar range:

def currency(x, pos):
    'The two args are the value and tick position'
    if x >= 1000000:
        return '${:1.1f}M'.format(x*1e-6)
    return '${:1.0f}K'.format(x*1e-3)

Now that we have a formatter function, we need to define it and apply it to the x axis. Here is the full code:

fig, ax = plt.subplots()
top_10.plot(kind='barh', y="Sales", x="Name", ax=ax)
ax.set_xlim([-10000, 140000])
ax.set(title='2014 Revenue', xlabel='Total Revenue', ylabel='Customer')
formatter = FuncFormatter(currency)
ax.xaxis.set_major_formatter(formatter)
ax.legend().set_visible(False)
Pandas plot 5

That's much nicer and shows a good example of the flexibility to define your own solution to the problem.

The final customization feature I will go through is the ability to add annotations to the plot. In order to draw a vertical line, you can use ax.axvline() and to add custom text, you can use ax.text() .

For this example, we'll draw a line showing an average and include labels showing three new customers. Here is the full code with comments to pull it all together.

# Create the figure and the axes
fig, ax = plt.subplots()

# Plot the data and get the averaged
top_10.plot(kind='barh', y="Sales", x="Name", ax=ax)
avg = top_10['Sales'].mean()

# Set limits and labels
ax.set_xlim([-10000, 140000])
ax.set(title='2014 Revenue', xlabel='Total Revenue', ylabel='Customer')

# Add a line for the average
ax.axvline(x=avg, color='b', label='Average', linestyle='--', linewidth=1)

# Annotate the new customers
for cust in [3, 5, 8]:
    ax.text(115000, cust, "New Customer")

# Format the currency
formatter = FuncFormatter(currency)
ax.xaxis.set_major_formatter(formatter)

# Hide the legend
ax.legend().set_visible(False)
Pandas plot 6

While this may not be the most exciting plot it does show how much power you have when following this approach.

Figures and Plots

Up until now, all the changes we have made have been with the indivudual plot. Fortunately, we also have the ability to add multiple plots on a figure as well as save the entire figure using various options.

If we decided that we wanted to put two plots on the same figure, we should have a basic understanding of how to do it. First, create the figure, then the axes, then plot it all together. We can accomplish this using plt.subplots() :

fig, (ax0, ax1) = plt.subplots(nrows=1, ncols=2, sharey=True, figsize=(7, 4))

In this example, I'm using nrows and ncols to specify the size because this is very clear to the new user. In sample code you will frequently just see variables like 1,2. I think using the named parameters is a little easier to interpret later on when you're looking at your code.

I am also using sharey=True so that the yaxis will share the same labels.

This example is also kind of nifty because the various axes get unpacked to ax0 and ax1 . Now that we have these axes, you can plot them like the examples above but put one plot on ax0 and the other on ax1 .

# Get the figure and the axes
fig, (ax0, ax1) = plt.subplots(nrows=1,ncols=2, sharey=True, figsize=(7, 4))
top_10.plot(kind='barh', y="Sales", x="Name", ax=ax0)
ax0.set_xlim([-10000, 140000])
ax0.set(title='Revenue', xlabel='Total Revenue', ylabel='Customers')

# Plot the average as a vertical line
avg = top_10['Sales'].mean()
ax0.axvline(x=avg, color='b', label='Average', linestyle='--', linewidth=1)

# Repeat for the unit plot
top_10.plot(kind='barh', y="Purchases", x="Name", ax=ax1)
avg = top_10['Purchases'].mean()
ax1.set(title='Units', xlabel='Total Units', ylabel='')
ax1.axvline(x=avg, color='b', label='Average', linestyle='--', linewidth=1)

# Title the figure
fig.suptitle('2014 Sales Analysis', fontsize=14, fontweight='bold');

# Hide the legends
ax1.legend().set_visible(False)
ax0.legend().set_visible(False)
Pandas plot 7

Up until now, I have been relying on the jupyter notebook to display the figures by virtue of the %matplotlib inline directive. However, there are going to be plenty of times where you have the need to save a figure in a specific format and integrate it with some other presentation.

Matplotlib supports many different formats for saving files. You can use fig.canvas.get_supported_filetypes() to see what your system supports:

fig.canvas.get_supported_filetypes()
{'eps': 'Encapsulated Postscript',
 'jpeg': 'Joint Photographic Experts Group',
 'jpg': 'Joint Photographic Experts Group',
 'pdf': 'Portable Document Format',
 'pgf': 'PGF code for LaTeX',
 'png': 'Portable Network Graphics',
 'ps': 'Postscript',
 'raw': 'Raw RGBA bitmap',
 'rgba': 'Raw RGBA bitmap',
 'svg': 'Scalable Vector Graphics',
 'svgz': 'Scalable Vector Graphics',
 'tif': 'Tagged Image File Format',
 'tiff': 'Tagged Image File Format'}

Since we have the fig object, we can save the figure using multiple options:

fig.savefig('sales.png', transparent=False, dpi=80, bbox_inches="tight")

This version saves the plot as a png with opaque background. I have also specified the dpi and bbox_inches="tight" in order to minimize excess white space.

Conclusion

Hopefully this process has helped you understand how to more effectively use matplotlib in your daily data analysis. If you get in the habit of using this approach when doing your analysis, you should be able to quickly find out how to do whatever you need to do to customize your plot.

As a final bonus, I am including a quick guide to unify all the concepts. I hope this helps bring this post together and proves a handy reference for future use.

Matplotlib example

25 Apr 2017 12:15pm GMT

Django Weblog: DjangoCon Europe 2017 in retrospect

DjangoCon Europe 2017 upheld all the traditions established by previous editions: a volunteer-run event, speakers from all sections of the community and a commitment to stage a memorable, enjoyable conference for all attendees.

Held in a stunning Art Deco cinema in the centre of the city, this year's edition was host to over 350 Djangonauts.

The team of always-smiling and willing volunteers, led by Emanuela Dal Mas and Iacopo Spalletti under the auspices of the Fuzzy Brains association, created a stellar success on behalf of all the community.

Of note in this year's conference was an emphasis on inclusion, as expressed in the conference's manifesto. The organisers' efforts to expand the notion of inclusion was visible in the number of attendees from Africa and south Asia, nearly all of whom were also given a platform at the event. This was made possible not only by the financial assistance programme but also through the considerable logistical help the organisers were able to offer.

The conference's opening keynote talk by Anna Makarudze and Humphrey Butau on the growing Python community in Zimbabwe, and an all-woman panel discussing their journeys in technology, were just two examples of a commitment to making more space for voices and stories that are less often heard.

DjangoCon Europe continues to thrive and sparkle in the hands of the people who care about it most, and who step forward each year as volunteers who commit hundreds of hours of their time to make the best possible success of it. Once again, this care has shone through.

On behalf of the whole Django community, the Django Software Foundation would like to thank the entire organising team and all the other volunteers of this year's DjangoCon Europe, for putting on a superb and memorable production.

The next DjangoCons in Europe

The DSF Board is considering bids for DjangoCon Europe 2018-2020. If you're interested in hosting the event in one of these years, we'd like to hear from you as soon as possible.

25 Apr 2017 9:05am GMT

Django Weblog: DjangoCon Europe 2017 in retrospect

DjangoCon Europe 2017 upheld all the traditions established by previous editions: a volunteer-run event, speakers from all sections of the community and a commitment to stage a memorable, enjoyable conference for all attendees.

Held in a stunning Art Deco cinema in the centre of the city, this year's edition was host to over 350 Djangonauts.

The team of always-smiling and willing volunteers, led by Emanuela Dal Mas and Iacopo Spalletti under the auspices of the Fuzzy Brains association, created a stellar success on behalf of all the community.

Of note in this year's conference was an emphasis on inclusion, as expressed in the conference's manifesto. The organisers' efforts to expand the notion of inclusion was visible in the number of attendees from Africa and south Asia, nearly all of whom were also given a platform at the event. This was made possible not only by the financial assistance programme but also through the considerable logistical help the organisers were able to offer.

The conference's opening keynote talk by Anna Makarudze and Humphrey Butau on the growing Python community in Zimbabwe, and an all-woman panel discussing their journeys in technology, were just two examples of a commitment to making more space for voices and stories that are less often heard.

DjangoCon Europe continues to thrive and sparkle in the hands of the people who care about it most, and who step forward each year as volunteers who commit hundreds of hours of their time to make the best possible success of it. Once again, this care has shone through.

On behalf of the whole Django community, the Django Software Foundation would like to thank the entire organising team and all the other volunteers of this year's DjangoCon Europe, for putting on a superb and memorable production.

The next DjangoCons in Europe

The DSF Board is considering bids for DjangoCon Europe 2018-2020. If you're interested in hosting the event in one of these years, we'd like to hear from you as soon as possible.

25 Apr 2017 9:05am GMT

PyBites: How to Write a Simple Slack Bot to Monitor Your Brand on Twitter

In this article I show you how to monitor Twitter and post alerts to a Slack channel. We built a nice tool to monitor whenever our domain gets mentioned on Twitter. The slacker and twython modules made this pretty easy. We also use configparser and logging.

25 Apr 2017 9:00am GMT

PyBites: How to Write a Simple Slack Bot to Monitor Your Brand on Twitter

In this article I show you how to monitor Twitter and post alerts to a Slack channel. We built a nice tool to monitor whenever our domain gets mentioned on Twitter. The slacker and twython modules made this pretty easy. We also use configparser and logging.

25 Apr 2017 9:00am GMT

S. Lott: Modules vs. Monoliths vs. Microservices:

Dan Bader (@dbader_org)
Worth a read: "Modules vs. Microservices" (and how to find a middle ground) oreilly.com/ideas/modules-…


"don't trick yourself into a microservices-only mindset"

Thanks for sharing.

The referenced post gives you the freedom to have a "big-ish" microservice. My current example has four very closely-related resources. There's agony in decomposing these into separate services. So we have several distinct Python modules bound into a single Flask container.

Yes. We lack the advertised static type checking for module boundaries. The kind of static type checking that doesn't actually solve any actual problems, since the issues are always semantic and can only be found with unit tests and integration tests and Gherkin-based acceptance testing (see Python BDD: https://pypi.python.org/pypi/pytest-bdd and https://pypi.python.org/pypi/behave/1.2.5).

We walk a fine line. How tightly coupled are these resources? Can they actually be used in isolation? What do the possible future changes look like? Where is the swagger.json going to change?

It's helpful to have both options on the table.

25 Apr 2017 8:00am GMT

S. Lott: Modules vs. Monoliths vs. Microservices:

Dan Bader (@dbader_org)
Worth a read: "Modules vs. Microservices" (and how to find a middle ground) oreilly.com/ideas/modules-…


"don't trick yourself into a microservices-only mindset"

Thanks for sharing.

The referenced post gives you the freedom to have a "big-ish" microservice. My current example has four very closely-related resources. There's agony in decomposing these into separate services. So we have several distinct Python modules bound into a single Flask container.

Yes. We lack the advertised static type checking for module boundaries. The kind of static type checking that doesn't actually solve any actual problems, since the issues are always semantic and can only be found with unit tests and integration tests and Gherkin-based acceptance testing (see Python BDD: https://pypi.python.org/pypi/pytest-bdd and https://pypi.python.org/pypi/behave/1.2.5).

We walk a fine line. How tightly coupled are these resources? Can they actually be used in isolation? What do the possible future changes look like? Where is the swagger.json going to change?

It's helpful to have both options on the table.

25 Apr 2017 8:00am GMT

Daniel Bader: Sets and Multisets in Python

Sets and Multisets in Python

How to implement mutable and immutable set and multiset (bag) data structures in Python using built-in data types and classes from the standard library.

A set is an unordered collection of objects that does not allow duplicate elements. Typically sets are used to quickly test a value for membership in the set, to insert or delete new values from a set, and to compute the union or intersection of two sets.

In a "proper" set implementation, membership tests are expected to run in O(1) time. Union, intersection, difference, and subset operations should take O(n) time on average. The set implementations included in Python's standard library follow these performance characteristics.

Just like dictionaries, sets get special treatment in Python and have some syntactic sugar that makes it easier to create sets. For example, the curly-braces set expression syntax and set comprehensions allow you to conveniently define new set instances:

vowels = {'a', 'e', 'i', 'o', 'u'}
squares = {x * x for x in range(10)}

Careful: To create an empty set you'll need to call the set() constructor, as using empty curly-braces ({}) is ambiguous and will create a dictionary instead.

Python and its standard library provide the following set implementations:

✅ The set Built-in

The built-in set implementation in Python. The set type in Python is mutable and allows the dynamic insertion and deletion of elements. Python's sets are backed by the dict data type and share the same performance characteristics. Any hashable object can be stored in a set.

>>> vowels = {'a', 'e', 'i', 'o', 'u'}
>>> 'e' in vowels
True

>>> letters = set('alice')
>>> letters.intersection(vowels)
{'a', 'e', 'i'}

>>> vowels.add('x')
>>> vowels
{'i', 'a', 'u', 'o', 'x', 'e'}

✅ The frozenset Built-in

An immutable version of set that cannot be changed after it was constructed. Frozensets are static and only allow query operations on their elements (no inserts or deletions.) Because frozensets are static and hashable they can be used as dictionary keys or as elements of another set.

>>> vowels = frozenset({'a', 'e', 'i', 'o', 'u'})
>>> vowels.add('p')
AttributeError: "'frozenset' object has no attribute 'add'"

✅ The collections.Counter Class

The collections.Counter class in the Python standard library implements a multiset (or bag) type that allows elements in the set to have more than one occurrence. This is useful if you need to keep track not only if an element is part of a set but also how many times it is included in the set.

>>> from collections import Counter
>>> inventory = Counter()

>>> loot = {'sword': 1, 'bread': 3}
>>> inventory.update(loot)
>>> inventory
Counter({'bread': 3, 'sword': 1})

>>> more_loot = {'sword': 1, 'apple': 1}
>>> inventory.update(more_loot)
>>> inventory
Counter({'bread': 3, 'sword': 2, 'apple': 1})

📺🐍 Learn More With This Video Tutorial

I recorded a step-by-step video tutorial to go along with the article. See how sets work in general and how to use them in Python. Watch the video embedded below or on my YouTube channel:

» Subscribe to the dbader.org YouTube Channel for more Python tutorials.

Read the full "Fundamental Data Structures in Python" article series here. This article is missing something or you found an error? Help a brother out and leave a comment below.

25 Apr 2017 12:00am GMT

Daniel Bader: Sets and Multisets in Python

Sets and Multisets in Python

How to implement mutable and immutable set and multiset (bag) data structures in Python using built-in data types and classes from the standard library.

A set is an unordered collection of objects that does not allow duplicate elements. Typically sets are used to quickly test a value for membership in the set, to insert or delete new values from a set, and to compute the union or intersection of two sets.

In a "proper" set implementation, membership tests are expected to run in O(1) time. Union, intersection, difference, and subset operations should take O(n) time on average. The set implementations included in Python's standard library follow these performance characteristics.

Just like dictionaries, sets get special treatment in Python and have some syntactic sugar that makes it easier to create sets. For example, the curly-braces set expression syntax and set comprehensions allow you to conveniently define new set instances:

vowels = {'a', 'e', 'i', 'o', 'u'}
squares = {x * x for x in range(10)}

Careful: To create an empty set you'll need to call the set() constructor, as using empty curly-braces ({}) is ambiguous and will create a dictionary instead.

Python and its standard library provide the following set implementations:

✅ The set Built-in

The built-in set implementation in Python. The set type in Python is mutable and allows the dynamic insertion and deletion of elements. Python's sets are backed by the dict data type and share the same performance characteristics. Any hashable object can be stored in a set.

>>> vowels = {'a', 'e', 'i', 'o', 'u'}
>>> 'e' in vowels
True

>>> letters = set('alice')
>>> letters.intersection(vowels)
{'a', 'e', 'i'}

>>> vowels.add('x')
>>> vowels
{'i', 'a', 'u', 'o', 'x', 'e'}

✅ The frozenset Built-in

An immutable version of set that cannot be changed after it was constructed. Frozensets are static and only allow query operations on their elements (no inserts or deletions.) Because frozensets are static and hashable they can be used as dictionary keys or as elements of another set.

>>> vowels = frozenset({'a', 'e', 'i', 'o', 'u'})
>>> vowels.add('p')
AttributeError: "'frozenset' object has no attribute 'add'"

✅ The collections.Counter Class

The collections.Counter class in the Python standard library implements a multiset (or bag) type that allows elements in the set to have more than one occurrence. This is useful if you need to keep track not only if an element is part of a set but also how many times it is included in the set.

>>> from collections import Counter
>>> inventory = Counter()

>>> loot = {'sword': 1, 'bread': 3}
>>> inventory.update(loot)
>>> inventory
Counter({'bread': 3, 'sword': 1})

>>> more_loot = {'sword': 1, 'apple': 1}
>>> inventory.update(more_loot)
>>> inventory
Counter({'bread': 3, 'sword': 2, 'apple': 1})

📺🐍 Learn More With This Video Tutorial

I recorded a step-by-step video tutorial to go along with the article. See how sets work in general and how to use them in Python. Watch the video embedded below or on my YouTube channel:

» Subscribe to the dbader.org YouTube Channel for more Python tutorials.

Read the full "Fundamental Data Structures in Python" article series here. This article is missing something or you found an error? Help a brother out and leave a comment below.

25 Apr 2017 12:00am GMT

24 Apr 2017

feedPlanet Python

Andre Roberge: Easily modifiable Python

You want to try out some new syntactic construction in Python but do NOT want to have to go through the trouble of 1. modifying Python's grammar file 2. modifying Python's lexer 3. modifying Python's parser 4. modifying Python's compiler 5. re-compile all the sources so as to create a new Python interpreter? (These steps are described in these two blog posts.) It can be done much more easily

24 Apr 2017 10:47pm GMT

Andre Roberge: Easily modifiable Python

You want to try out some new syntactic construction in Python but do NOT want to have to go through the trouble of 1. modifying Python's grammar file 2. modifying Python's lexer 3. modifying Python's parser 4. modifying Python's compiler 5. re-compile all the sources so as to create a new Python interpreter? (These steps are described in these two blog posts.) It can be done much more easily

24 Apr 2017 10:47pm GMT

François Dion: Meet Eliza #AI



I will be presenting and directing a discussion on artificial intelligence, from various angles including the arts, Tuesday April 25th at Wake Forest in Winston Salem, NC.

Details here:
http://www.pyptug.org/2017/04/pyptug-monthly-meeting-meet-eliza-april.html

Francois Dion
@f_dion

24 Apr 2017 9:48pm GMT

François Dion: Meet Eliza #AI



I will be presenting and directing a discussion on artificial intelligence, from various angles including the arts, Tuesday April 25th at Wake Forest in Winston Salem, NC.

Details here:
http://www.pyptug.org/2017/04/pyptug-monthly-meeting-meet-eliza-april.html

Francois Dion
@f_dion

24 Apr 2017 9:48pm GMT

NumFOCUS: Moore Foundation gives grant to support NumFOCUS Diversity & Inclusion in Scientific Computing initiatives

As part of our mission to support and promote better science through support of the open source scientific software community, NumFOCUS champions technical progress through diversity. NumFOCUS recognizes that the open source data science community is currently highly homogenous. We believe that diverse contributors and community members produce better science and better projects. NumFOCUS strives […]

24 Apr 2017 7:23pm GMT

NumFOCUS: Moore Foundation gives grant to support NumFOCUS Diversity & Inclusion in Scientific Computing initiatives

As part of our mission to support and promote better science through support of the open source scientific software community, NumFOCUS champions technical progress through diversity. NumFOCUS recognizes that the open source data science community is currently highly homogenous. We believe that diverse contributors and community members produce better science and better projects. NumFOCUS strives […]

24 Apr 2017 7:23pm GMT

Weekly Python Chat: Python Oddities

I tweeted one Python oddity each week in 2016. Let's chat about them.

24 Apr 2017 5:15pm GMT

Weekly Python Chat: Python Oddities

I tweeted one Python oddity each week in 2016. Let's chat about them.

24 Apr 2017 5:15pm GMT

Ned Batchelder: Shell = Maybe

A common help Python question: how do I get Python to run this complicated command line program? Often, the answer involves details of how shells work. I tried my hand at explaining it what a shell does, why you want to avoid them, how to avoid them from Python, and why you might want to use one: Shell = Maybe.

24 Apr 2017 3:38pm GMT

Ned Batchelder: Shell = Maybe

A common help Python question: how do I get Python to run this complicated command line program? Often, the answer involves details of how shells work. I tried my hand at explaining it what a shell does, why you want to avoid them, how to avoid them from Python, and why you might want to use one: Shell = Maybe.

24 Apr 2017 3:38pm GMT

NumFOCUS: Anyone Can Do Astronomy with Python and Open Data

Ole Moeller-Nilsson, CTO at Pivigo, was kind enough to share his insights on how a beginner can easily get started exploring astronomy using Python. This blog post grew out of a presentation he gave at PyData London meetup on March 7th. Python is a great language for science, and specifically for astronomy. The various packages […]

24 Apr 2017 3:00pm GMT

NumFOCUS: Anyone Can Do Astronomy with Python and Open Data

Ole Moeller-Nilsson, CTO at Pivigo, was kind enough to share his insights on how a beginner can easily get started exploring astronomy using Python. This blog post grew out of a presentation he gave at PyData London meetup on March 7th. Python is a great language for science, and specifically for astronomy. The various packages […]

24 Apr 2017 3:00pm GMT

Doug Hellmann: zlib — GNU zlib Compression — PyMOTW 3

The zlib module provides a lower-level interface to many of the functions in the zlib compression library from the GNU project. Read more… This post is part of the Python Module of the Week series for Python 3. See PyMOTW.com for more articles from the series.

24 Apr 2017 1:00pm GMT

Doug Hellmann: zlib — GNU zlib Compression — PyMOTW 3

The zlib module provides a lower-level interface to many of the functions in the zlib compression library from the GNU project. Read more… This post is part of the Python Module of the Week series for Python 3. See PyMOTW.com for more articles from the series.

24 Apr 2017 1:00pm GMT

Mike Driscoll: PyDev of the Week: Honza Král

This week we welcome Honza Král (@HonzaKral) as our PyDev of the Week! Honza is one of the core developers of the Django web framework. He is also the maintainer of the official Python client to Elasticsearch. You can see some of the projects he is interested in or working on over at Github. Let's spend some time getting to know Honza better!

Can you tell us a little about yourself (hobbies, education, etc):

I grew up in the Old Town of Prague, Czech Republic where I also went to school, including University where I studied computer science (which I didn't finish). During my studies I discovered Python and immediately fell in love, first with the language and later, after going to my first Pycon in 2008, with the community.

I became a part of the Django community which was (and still is to this day) very welcoming. I became a part of it to learn and, hopefully, contribute something back. For my part it worked amazingly well - I got all my recent jobs through the community and even met my fiancee at a EuroPython conference!

Nowadays I work for Elastic, the company behind Elasticsearch where I do consulting - traveling around the world helping people be successful with open source - and also maintain the official Python client.

Why did you start using Python / Django?

I started in university for a project. I had my eye on Python for some time then and a perfect opportunity presented itself to use it for one of our team projects - we decided it would be a web-based application and Django was the shiny new thing that caught my attention. I introduced it to the rest of the team and have been using it ever since.

What other programming languages do you know and which is your favorite?

My other favorite, besides Python of course :), would be shell - I still enjoy writing both simple (or a bit more complex) one-liners as well as longer scripts in shell. I like the variety of it where you have so many possibilities on how to accomplish things with a seemingly limited language. I like how it's very unlike other languages I am used to working in as it forces you to think more in terms of pipelines and "just" passing off problems to another app/language. It definitely helped my Python skills as well I think - thinking of what I need to keep in a list/variable vs where a simple generator would suffice (as would a pipeline in shell) etc.

Right now I am looking at rust and hoping that a suitable project will come along for me to give it a serious try.

What projects are you working on now?

Currently I am working on elasticsearch-dsl which is an ORM-like abstraction on top of elasticsearch. It is an ambitious project since I wanted to expose all the power of elasticsearch's query DSL while also trying to make it more accessible to non-experts and generally friendlier. While I think that in some areas I might have achieved this, I still have a long way to go in others; and for some I don't even see the clear path of what needs to happen which makes it interesting - gives me something to ponder 🙂

Which Python libraries are your favorite (core or 3rd party)?

As a 3rd party library I really like click because of how well it fulfills its purpose. From the standard library I am a huge fan of itertools, collections and other "helper" libraries that don't just offer some functionality but also, and often more importantly, show how a real pythonic code should look like and makes your code more pythonic when you use them properly - the same quality I always liked about Django where it guides you in the right direction and makes the right-way to things so obvious that it trains you to recognize it and use it also outside of your Django projects.

Another library I find really interesting is py.test - I love the feature set and how easy and flexible it makes testing but I am also a bit wary of the "magic" that happens there. To me that is a perfect example of the pragmatism in the python community - I think we can all accept a little magic as long as it is clearly defined and serves its purpose.

Is there anything else you'd like to say?

I want to say a huge Thank you to the global Python community out there! I have been privileged enough to see a huge part of it and it still manages to catch me off guard regularly with the amount of awesomeness and caring that make for a great environment. I interact with developers all over the world as part of my job and really appreciate the amount effort and thought the people in the Python community are putting into making sure it is a thriving and welcoming one.

Thanks so much for doing the interview!

24 Apr 2017 12:30pm GMT

Mike Driscoll: PyDev of the Week: Honza Král

This week we welcome Honza Král (@HonzaKral) as our PyDev of the Week! Honza is one of the core developers of the Django web framework. He is also the maintainer of the official Python client to Elasticsearch. You can see some of the projects he is interested in or working on over at Github. Let's spend some time getting to know Honza better!

Can you tell us a little about yourself (hobbies, education, etc):

I grew up in the Old Town of Prague, Czech Republic where I also went to school, including University where I studied computer science (which I didn't finish). During my studies I discovered Python and immediately fell in love, first with the language and later, after going to my first Pycon in 2008, with the community.

I became a part of the Django community which was (and still is to this day) very welcoming. I became a part of it to learn and, hopefully, contribute something back. For my part it worked amazingly well - I got all my recent jobs through the community and even met my fiancee at a EuroPython conference!

Nowadays I work for Elastic, the company behind Elasticsearch where I do consulting - traveling around the world helping people be successful with open source - and also maintain the official Python client.

Why did you start using Python / Django?

I started in university for a project. I had my eye on Python for some time then and a perfect opportunity presented itself to use it for one of our team projects - we decided it would be a web-based application and Django was the shiny new thing that caught my attention. I introduced it to the rest of the team and have been using it ever since.

What other programming languages do you know and which is your favorite?

My other favorite, besides Python of course :), would be shell - I still enjoy writing both simple (or a bit more complex) one-liners as well as longer scripts in shell. I like the variety of it where you have so many possibilities on how to accomplish things with a seemingly limited language. I like how it's very unlike other languages I am used to working in as it forces you to think more in terms of pipelines and "just" passing off problems to another app/language. It definitely helped my Python skills as well I think - thinking of what I need to keep in a list/variable vs where a simple generator would suffice (as would a pipeline in shell) etc.

Right now I am looking at rust and hoping that a suitable project will come along for me to give it a serious try.

What projects are you working on now?

Currently I am working on elasticsearch-dsl which is an ORM-like abstraction on top of elasticsearch. It is an ambitious project since I wanted to expose all the power of elasticsearch's query DSL while also trying to make it more accessible to non-experts and generally friendlier. While I think that in some areas I might have achieved this, I still have a long way to go in others; and for some I don't even see the clear path of what needs to happen which makes it interesting - gives me something to ponder 🙂

Which Python libraries are your favorite (core or 3rd party)?

As a 3rd party library I really like click because of how well it fulfills its purpose. From the standard library I am a huge fan of itertools, collections and other "helper" libraries that don't just offer some functionality but also, and often more importantly, show how a real pythonic code should look like and makes your code more pythonic when you use them properly - the same quality I always liked about Django where it guides you in the right direction and makes the right-way to things so obvious that it trains you to recognize it and use it also outside of your Django projects.

Another library I find really interesting is py.test - I love the feature set and how easy and flexible it makes testing but I am also a bit wary of the "magic" that happens there. To me that is a perfect example of the pragmatism in the python community - I think we can all accept a little magic as long as it is clearly defined and serves its purpose.

Is there anything else you'd like to say?

I want to say a huge Thank you to the global Python community out there! I have been privileged enough to see a huge part of it and it still manages to catch me off guard regularly with the amount of awesomeness and caring that make for a great environment. I interact with developers all over the world as part of my job and really appreciate the amount effort and thought the people in the Python community are putting into making sure it is a thriving and welcoming one.

Thanks so much for doing the interview!

24 Apr 2017 12:30pm GMT

10 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: King Willams Town Bahnhof

Gestern musste ich morgens zur Station nach KWT um unsere Rerservierten Bustickets für die Weihnachtsferien in Capetown abzuholen. Der Bahnhof selber ist seit Dezember aus kostengründen ohne Zugverbindung - aber Translux und co - die langdistanzbusse haben dort ihre Büros.


Größere Kartenansicht




© benste CC NC SA

10 Nov 2011 10:57am GMT

09 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein

Niemand ist besorgt um so was - mit dem Auto fährt man einfach durch, und in der City - nahe Gnobie- "ne das ist erst gefährlich wenn die Feuerwehr da ist" - 30min später auf dem Rückweg war die Feuerwehr da.




© benste CC NC SA

09 Nov 2011 8:25pm GMT

08 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Brai Party

Brai = Grillabend o.ä.

Die möchte gern Techniker beim Flicken ihrer SpeakOn / Klinke Stecker Verzweigungen...

Die Damen "Mamas" der Siedlung bei der offiziellen Eröffnungsrede

Auch wenn weniger Leute da waren als erwartet, Laute Musik und viele Leute ...

Und natürlich ein Feuer mit echtem Holz zum Grillen.

© benste CC NC SA

08 Nov 2011 2:30pm GMT

07 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Lumanyano Primary

One of our missions was bringing Katja's Linux Server back to her room. While doing that we saw her new decoration.

Björn, Simphiwe carried the PC to Katja's school


© benste CC NC SA

07 Nov 2011 2:00pm GMT

06 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Nelisa Haircut

Today I went with Björn to Needs Camp to Visit Katja's guest family for a special Party. First of all we visited some friends of Nelisa - yeah the one I'm working with in Quigney - Katja's guest fathers sister - who did her a haircut.

African Women usually get their hair done by arranging extensions and not like Europeans just cutting some hair.

In between she looked like this...

And then she was done - looks amazing considering the amount of hair she had last week - doesn't it ?

© benste CC NC SA

06 Nov 2011 7:45pm GMT

05 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Mein Samstag

Irgendwie viel mir heute auf das ich meine Blogposts mal ein bischen umstrukturieren muss - wenn ich immer nur von neuen Plätzen berichte, dann müsste ich ja eine Rundreise machen. Hier also mal ein paar Sachen aus meinem heutigen Alltag.

Erst einmal vorweg, Samstag zählt zumindest für uns Voluntäre zu den freien Tagen.

Dieses Wochenende sind nur Rommel und ich auf der Farm - Katja und Björn sind ja mittlerweile in ihren Einsatzstellen, und meine Mitbewohner Kyle und Jonathan sind zu Hause in Grahamstown - sowie auch Sipho der in Dimbaza wohnt.
Robin, die Frau von Rommel ist in Woodie Cape - schon seit Donnerstag um da ein paar Sachen zur erledigen.
Naja wie dem auch sei heute morgen haben wir uns erstmal ein gemeinsames Weetbix/Müsli Frühstück gegönnt und haben uns dann auf den Weg nach East London gemacht. 2 Sachen waren auf der Checkliste Vodacom, Ethienne (Imobilienmakler) außerdem auf dem Rückweg die fehlenden Dinge nach NeedsCamp bringen.

Nachdem wir gerade auf der Dirtroad losgefahren sind mussten wir feststellen das wir die Sachen für Needscamp und Ethienne nicht eingepackt hatten aber die Pumpe für die Wasserversorgung im Auto hatten.

Also sind wir in EastLondon ersteinmal nach Farmerama - nein nicht das onlinespiel farmville - sondern einen Laden mit ganz vielen Sachen für eine Farm - in Berea einem nördlichen Stadteil gefahren.

In Farmerama haben wir uns dann beraten lassen für einen Schnellverschluss der uns das leben mit der Pumpe leichter machen soll und außerdem eine leichtere Pumpe zur Reperatur gebracht, damit es nicht immer so ein großer Aufwand ist, wenn mal wieder das Wasser ausgegangen ist.

Fego Caffé ist in der Hemmingways Mall, dort mussten wir und PIN und PUK einer unserer Datensimcards geben lassen, da bei der PIN Abfrage leider ein zahlendreher unterlaufen ist. Naja auf jeden Fall speichern die Shops in Südafrika so sensible Daten wie eine PUK - die im Prinzip zugang zu einem gesperrten Phone verschafft.

Im Cafe hat Rommel dann ein paar online Transaktionen mit dem 3G Modem durchgeführt, welches ja jetzt wieder funktionierte - und übrigens mittlerweile in Ubuntu meinem Linuxsystem perfekt klappt.

Nebenbei bin ich nach 8ta gegangen um dort etwas über deren neue Deals zu erfahren, da wir in einigen von Hilltops Centern Internet anbieten wollen. Das Bild zeigt die Abdeckung UMTS in NeedsCamp Katjas Ort. 8ta ist ein neuer Telefonanbieter von Telkom, nachdem Vodafone sich Telkoms anteile an Vodacom gekauft hat müssen die komplett neu aufbauen.
Wir haben uns dazu entschieden mal eine kostenlose Prepaidkarte zu testen zu organisieren, denn wer weis wie genau die Karte oben ist ... Bevor man einen noch so billigen Deal für 24 Monate signed sollte man wissen obs geht.

Danach gings nach Checkers in Vincent, gesucht wurden zwei Hotplates für WoodyCape - R 129.00 eine - also ca. 12€ für eine zweigeteilte Kochplatte.
Wie man sieht im Hintergrund gibts schon Weihnachtsdeko - Anfang November und das in Südafrika bei sonnig warmen min- 25°C

Mittagessen haben wir uns bei einem Pakistanischen Curry Imbiss gegönnt - sehr empfehlenswert !
Naja und nachdem wir dann vor ner Stunde oder so zurück gekommen sind habe ich noch den Kühlschrank geputzt den ich heute morgen zum defrosten einfach nach draußen gestellt hatte. Jetzt ist der auch mal wieder sauber und ohne 3m dicke Eisschicht...

Morgen ... ja darüber werde ich gesondert berichten ... aber vermutlich erst am Montag, denn dann bin ich nochmal wieder in Quigney(East London) und habe kostenloses Internet.

© benste CC NC SA

05 Nov 2011 4:33pm GMT

31 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Sterkspruit Computer Center

Sterkspruit is one of Hilltops Computer Centres in the far north of Eastern Cape. On the trip to J'burg we've used the opportunity to take a look at the centre.

Pupils in the big classroom


The Trainer


School in Countryside


Adult Class in the Afternoon


"Town"


© benste CC NC SA

31 Oct 2011 4:58pm GMT

Benedict Stein: Technical Issues

What are you doing in an internet cafe if your ADSL and Faxline has been discontinued before months end. Well my idea was sitting outside and eating some ice cream.
At least it's sunny and not as rainy as on the weekend.


© benste CC NC SA

31 Oct 2011 3:11pm GMT

30 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Nellis Restaurant

For those who are traveling through Zastron - there is a very nice Restaurant which is serving delicious food at reasanable prices.
In addition they're selling home made juices jams and honey.




interior


home made specialities - the shop in the shop


the Bar


© benste CC NC SA

30 Oct 2011 4:47pm GMT

29 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: The way back from J'burg

Having the 10 - 12h trip from J'burg back to ELS I was able to take a lot of pcitures including these different roadsides

Plain Street


Orange River in its beginngings (near Lesotho)


Zastron Anglican Church


The Bridge in Between "Free State" and Eastern Cape next to Zastron


my new Background ;)


If you listen to GoogleMaps you'll end up traveling 50km of gravel road - as it was just renewed we didn't have that many problems and saved 1h compared to going the official way with all it's constructions sites




Freeway


getting dark


© benste CC NC SA

29 Oct 2011 4:23pm GMT

28 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Wie funktioniert eigentlich eine Baustelle ?

Klar einiges mag anders sein, vieles aber gleich - aber ein in Deutschland täglich übliches Bild einer Straßenbaustelle - wie läuft das eigentlich in Südafrika ?

Ersteinmal vorweg - NEIN keine Ureinwohner die mit den Händen graben - auch wenn hier mehr Manpower genutzt wird - sind sie fleißig mit Technologie am arbeiten.

Eine ganz normale "Bundesstraße"


und wie sie erweitert wird


gaaaanz viele LKWs


denn hier wird eine Seite über einen langen Abschnitt komplett gesperrt, so das eine Ampelschaltung mit hier 45 Minuten Wartezeit entsteht


Aber wenigstens scheinen die ihren Spaß zu haben ;) - Wie auch wir denn gücklicher Weise mussten wir nie länger als 10 min. warten.

© benste CC NC SA

28 Oct 2011 4:20pm GMT