24 Nov 2020

Planet Python

PyCoder’s Weekly: Issue #448 (Nov. 24, 2020)

#448 - NOVEMBER 24, 2020
View in Browser »


Synthetic Data Vault (SDV): A Python Library for Dataset Modeling

Creating realistic data for testing applications can be difficult, especially when you have complex data requirements and privacy concerns make using real data problematic. Enter Synthetic Data Vault, a tool for modeling datasets that closely preserves important statistics, like mean and standard deviation.
ESMAEIL ALIZADEH

Python enumerate(): Simplify Looping With Counters

Once you learn about for loops in Python, you know that using an index to access items in a sequence isn't very Pythonic. So what do you do when you need that index value? In this tutorial, you'll learn all about Python's built-in enumerate(), where it's used, and how you can emulate its behavior.
REAL PYTHON
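
For a quick taste (a minimal sketch, not code from the tutorial itself), enumerate() hands you the index and the item together, so you never have to manage a counter by hand:

fruits = ['apple', 'lemon', 'pear']

# enumerate() yields (index, item) pairs; start=1 begins counting at 1
for position, fruit in enumerate(fruits, start=1):
    print(position, fruit)
# 1 apple
# 2 lemon
# 3 pear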

Profile, Understand & Optimize Code Performance

You can't improve what you can't measure. Profile and understand code behavior and performance (Wall-time, I/O, CPU, HTTP requests, SQL queries). Install in minutes. Browse through appealing graphs. Supports all Python versions. Works in dev, test/staging & production →
BLACKFIRE sponsor

Reproducible and Upgradable Conda Environments: Dependency Management With conda-lock

If your application uses Conda to manage dependencies, you face a dilemma. On the one hand, you want to pin all your dependencies to specific versions, so you get reproducible builds. On the other hand, once you've pinned everything, upgrades become difficult. Enter conda-lock.
ITAMAR TURNER-TRAURING

Regular Expressions and Building Regexes in Python

In this course, you'll learn how to perform more complex string pattern matching using regular expressions, or regexes, in Python. You'll also explore more advanced regex tools and techniques that are available in Python.
REAL PYTHON course
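
For context, here's a minimal sketch of the kind of pattern matching the course covers, using the standard library's re module (the pattern and sample string are made up for illustration):

import re

# Compile a pattern once, then reuse it; groups capture the parts we care about
pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")  # an ISO-style date

match = pattern.search("Released on 2020-11-24, give or take.")
if match:
    year, month, day = match.groups()
    print(year, month, day)  # 2020 11 24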

PyInstaller 4.1 Supports Python 3.8 and 3.9

PYINSTALLER.READTHEDOCS.IO

Discussions

My Students Challenged Me to Write the Smallest Graphical User Interface That Includes Actual User Interaction

REDDIT

Advantages of Pattern Matching: A Simple Comparative Analysis

PYTHON.ORG

Python Jobs

Advanced Python Engineer (Newport Beach, CA, USA)

Research Affiliates

Python Developer / Software Engineer (Berlin, Germany)

Thermondo GmbH

Senior Full Stack Developer (Chicago, IL, USA)

Panopta

Senior Software Engineer, Platform (Remote)

Silicon Therapeutics

More Python Jobs >>>

Articles & Tutorials

10 Python Skills They Don't Teach in Bootcamp

Here are ten practical and little-known pandas tips to help you take your skills to the next level.
NICOLE JANEWAY BILLS

Using Python's bisect module

Python's bisect module has tools for searching and inserting values into sorted lists. It's one of Python's "batteries-included" features that often gets overlooked, but it can be a great tool for optimizing certain kinds of code.
JOHN LEKBERG
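
As a hedged, minimal sketch of what the module offers (not code from the article), bisect_left() finds an insertion point in O(log n) and insort() inserts a value while keeping the list sorted:

import bisect

scores = [10, 20, 30, 40]

# bisect_left returns the index where a value would go to keep the list sorted
index = bisect.bisect_left(scores, 25)
print(index)  # 2

# insort inserts the value at that position, preserving the order
bisect.insort(scores, 25)
print(scores)  # [10, 20, 25, 30, 40]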

Python Developers Are in Demand on Vettery

Get discovered by top companies using Vettery to actively grow their tech teams with Python developers (like you). Here's how it works: create a profile, name your salary, and connect with hiring managers at startups to Fortune 500 companies. Sign up today - it's completely free for job-seekers →
VETTERY sponsor

How to Use Serializers in the Django Python Web Framework

Serialization transforms data into a format that can be stored or transmitted and then reconstructs it for use. There are some quick-and-dirty ways to serialize data in pure Python, but you often need to perform more complex actions during the serialization process, like validating data. The Django REST Framework has some particularly robust and full-featured serializers.
RENATO OLIVEIRA
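
To give a flavor of what a Django REST Framework serializer looks like, here's a minimal sketch with a made-up Book shape (not code from the article); it assumes djangorestframework is installed and is meant to be run inside a Django project, for example from a Django shell:

from rest_framework import serializers

class BookSerializer(serializers.Serializer):
    # Field declarations double as validation rules
    title = serializers.CharField(max_length=200)
    published = serializers.DateField()

    def validate_title(self, value):
        # Custom per-field validation hook
        if not value.strip():
            raise serializers.ValidationError("Title must not be blank.")
        return value

serializer = BookSerializer(data={"title": "Two Scoops", "published": "2020-11-24"})
print(serializer.is_valid())      # True
print(serializer.validated_data)  # parsed, validated values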

Sentiment Analysis, Fourier Transforms, and More Python Data Science

Are you interested in learning more about Natural Language Processing? Have you heard of sentiment analysis? This week on the show, Kyle Stratis returns to talk about his new article titled, Use Sentiment Analysis With Python to Classify Movie Reviews. David Amos is also here, and all of us cover another batch of PyCoder's Weekly articles and projects.
REAL PYTHON podcast

Formatting Python Strings

In this course, you'll see two items to add to your Python string formatting toolkit: Python's string .format() method and the formatted string literal, or f-string. You'll learn about both formatting techniques in detail.
REAL PYTHON course

Spend Less Time Debugging, and More Time Building with Scout APM

Scout APM uses tracing logic that ties bottlenecks to source code to give you the performance insights you need in less than 4 minutes! Start your free 14-day trial today and Scout will donate $5 to the OSS of your choice when you deploy.
SCOUT APM sponsor

When You Import a Python Package and It Is Empty

Did you know Python has two different kinds of packages: regular packages and namespace packages? It turns out that trying to import a regular package when you don't have the right permissions causes Python to import it as a namespace package, and some unexpected things happen.
PETR ZEMEK

Python Extensions with Rust and Go

Python extensions are a great way to leverage performance from another language while keeping a friendly Python API. How viable are Rust and Go for writing Python extensions? Are there reasons to use one over the other?
BRUCE ECKEL

Split Your Dataset With scikit-learn's train_test_split()

In this tutorial, you'll learn why it's important to split your dataset in supervised machine learning and how to do that with train_test_split() from scikit-learn.
REAL PYTHON
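
As a minimal, hedged sketch of the call (toy data, not from the tutorial), train_test_split() shuffles the data and returns matching train/test splits for the features and the target:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
y = np.arange(10)                 # matching labels

# Hold out 30% of the rows for testing; fix random_state for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)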

IPython for Web Devs

This free, open-source book will help you learn more about IPython, a rich toolkit that helps you make the most out of using Python interactively.
ERIC HAMITER

Add a New Dimension to Your Photos Using Python

Learn how to add some motion and a third dimension to a photo using depth estimation and inpainting.
DYLAN ROY

Projects & Code

klio: Smarter Data Pipelines for Audio

GITHUB.COM/SPOTIFY

nbdev: Create Delightful Python Projects Using Jupyter Notebooks

GITHUB.COM/FASTAI

pyo3: Rust Bindings for the Python Interpreter

GITHUB.COM/PYO3

yappi: Yet Another Python Profiler

GITHUB.COM/SUMERC

topalias: Linux Bash/ZSH Aliases Generator

GITHUB.COM/CSREDRAT

eff: Library for Working With Algebraic Effects

GITHUB.COM/ORSINIUM-LABS

SDV: Synthetic Data Generation for Tabular, Relational, Time Series Data

GITHUB.COM/SDV-DEV

Events

Real Python Office Hours (Virtual)

November 25, 2020
REALPYTHON.COM

Pyjamas 2020 (Virtual)

December 5, 2020
PYJAMAS.LIVE

BelPy 2021 (Virtual)

January 30 - 31, 2021
BELPY.IN • Shared by Gajendra Deshpande


Happy Pythoning!
This was PyCoder's Weekly Issue #448.
View in Browser »


[ Subscribe to 🐍 PyCoder's Weekly 💌 - Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

24 Nov 2020 7:30pm GMT

Stack Abuse: Rotate Axis Labels in Matplotlib

Introduction

Matplotlib is one of the most widely used data visualization libraries in Python. Much of Matplotlib's popularity comes from its customization options - you can tweak just about any element from its hierarchy of objects.

In this tutorial, we'll take a look at how to rotate axis text/labels in a Matplotlib plot.

Creating a Plot

Let's create a simple plot first:

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 0.1)
y = np.sin(x)

plt.plot(x, y)
plt.show()

simple matplotlib plot

Rotate X-Axis Labels in Matplotlib

Now, let's take a look at how we can rotate the X-Axis labels here. There are a couple of ways to go about it: change it at the Figure level using plt.xticks(), or change it at the Axes level by calling tick.set_rotation() on each label individually, or by using ax.set_xticklabels() or ax.tick_params().

Let's start off with the first option:

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 0.1)
y = np.sin(x)

plt.plot(x, y)
plt.xticks(rotation = 45) # Rotates X-Axis Ticks by 45-degrees
plt.show()

Here, we've set the rotation of xticks to 45, signifying a 45-degree tilt, counterclockwise:

rotate x-axis label with xticks

Note: This function, like all others here, should be called after plt.plot(); otherwise, the ticks may end up cropped or misplaced.

Another option would be to get the current Axes object and call ax.set_xticklabels() on it. Here we can set the labels, as well as their rotation:

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 0.1)
y = np.sin(x)

plt.plot(x, y)

ax = plt.gca()
plt.draw()

ax.set_xticklabels(ax.get_xticks(), rotation = 45)

plt.show()

Note: For this approach to work, you'll need to call plt.draw() before accessing or setting the X tick labels. This is because the labels are only populated once the plot is drawn; otherwise, they'll return empty text values.

rotate x axis labels with xticklabels

Alternatively, we can iterate over the ticks in the ax.get_xticklabels() list and call tick.set_rotation() on each of them:

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 0.1)
y = np.sin(x)

plt.plot(x, y)

ax = plt.gca()
plt.draw()

for tick in ax.get_xticklabels():
    tick.set_rotation(45)
plt.show()

This also results in:

rotate x axis labels with set_rotation

And finally, you can use the ax.tick_params() function and set the label rotation there:

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 0.1)
y = np.sin(x)

plt.plot(x, y)

ax = plt.gca()
ax.tick_params(axis='x', labelrotation = 45)
plt.show()

This also results in:

rotate x axis labels with tick_params

Rotate Y-Axis Labels in Matplotlib

The exact same steps can be applied for the Y-Axis labels.

Firstly, you can change it at the Figure level with plt.yticks(), or at the Axes level by using tick.set_rotation(), or by using ax.set_yticklabels() or ax.tick_params().

Let's start off with the first option:

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 0.1)
y = np.sin(x)

plt.plot(x, y)
plt.yticks(rotation = 45)
plt.show()

Same as last time, this sets the rotation of the yticks to 45 degrees:

rotate y axis labels yticks

Now, let's work directly with the Axes object:

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 0.1)
y = np.sin(x)

plt.plot(x, y)

ax = plt.gca()
plt.draw()

ax.set_yticklabels(ax.get_yticks(), rotation = 45)

plt.show()

The same note applies here: you have to call plt.draw() before this call for it to work correctly.

rotate y axis labels with yticklabels

Now, let's iterate over the list of ticks and call set_rotation() on each of them:

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 0.1)
y = np.sin(x)

plt.plot(x, y)

ax = plt.gca()
plt.draw()

for tick in ax.get_yticklabels():
    tick.set_rotation(45)
plt.show()

This also results in:

rotate y axis labels with set_rotation

And finally, you can use the ax.tick_params() function and set the label rotation there:

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 0.1)
y = np.sin(x)

plt.plot(x, y)

ax = plt.gca()
ax.tick_params(axis='y', labelrotation = 45)
plt.show()

This also results in:

rotate y axis labels with tick_params

Rotate Dates to Fit in Matplotlib

Most often, the reason people rotate ticks in their plots is that the ticks contain dates. Dates can get long, and even with a small dataset, they'll start overlapping and will quickly become unreadable.

Of course, you can rotate them like we did before. Usually, a 45-degree tilt will solve most of the problem, while a 90-degree tilt will free up even more space.

Though, there's another option for rotating and fitting dates in Matplotlib, which is even easier than the previous methods: fig.autofmt_xdate().

This function rotates and right-aligns the date labels on the X-Axis, which is where dates usually end up.

Let's take a look at how we can use it on the Seattle Weather Dataset:

import pandas as pd
import matplotlib.pyplot as plt

weather_data = pd.read_csv("seattleWeather.csv")

fig = plt.figure()
plt.plot(weather_data['DATE'], weather_data['PRCP'])
fig.autofmt_xdate()
plt.show()

This results in:

auto format dates to fit in matplotlib

Conclusion

In this tutorial, we've gone over several ways to rotate Axis text/labels in a Matplotlib plot, including a specific way to format and fit dates.

If you're interested in Data Visualization and don't know where to start, make sure to check out our book on Data Visualization in Python.

Data Visualization in Python, a book for beginner to intermediate Python developers, will guide you through simple data manipulation with Pandas, cover core plotting libraries like Matplotlib and Seaborn, and show you how to take advantage of declarative and experimental libraries like Altair.

Data Visualization in Python

Understand your data better with visualizations! With over 275 pages, you'll learn the ins and outs of visualizing data in Python with popular libraries like Matplotlib, Seaborn, Bokeh, and more.

24 Nov 2020 3:29pm GMT

Real Python: Formatting Python Strings

In this course, you'll see two items to add to your Python string formatting toolkit: Python's string .format() method and the formatted string literal, or f-string. You'll learn about both formatting techniques in detail.

In this course, you'll learn about:

  1. The string .format() method
  2. The formatted string literal, or f-string
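
As a tiny illustration of the two techniques (a sketch, not part of the course material), both .format() and f-strings substitute values into placeholder fields:

name = "world"
count = 3

# str.format() fills the replacement fields from its arguments
print("Hello, {}! You have {} new messages.".format(name, count))

# An f-string evaluates the expressions in the braces directly
print(f"Hello, {name}! You have {count} new messages.")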

[ Improve Your Python With 🐍 Python Tricks 💌 - Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

24 Nov 2020 2:00pm GMT

Mike Driscoll: Free Python Videos from Manning Publications

Manning Publications recently contacted me to let me know that they had some new Python videos going up on their YouTube channel.

Carl Osipov, author of Cloud Native Machine Learning, created a session on automatic differentiation as used by PyTorch's autograd for deep learning.

Jonathan Rioux, author of Data Analysis with Python and PySpark, did a session on PySpark covering how to reason about your code before you write it, keep your data manipulation code tidy, and reason about your code's performance.

Note: Manning Publications is not a sponsor of this post. I just thought it was neat that they are putting out new, free Python content!

For more Python videos, check out the MouseVsPython YouTube channel!

The post Free Python Videos from Manning Publications appeared first on The Mouse Vs. The Python.

24 Nov 2020 6:05am GMT

Matt Layman: Episode 10 - User Auth

On this episode, we're going to look at working with users in a Django project. We'll see Django's tools for identifying users and checking what those users are permitted to do on your website. Listen at djangoriffs.com.

Last Episode

On the last episode, I explained the structure of a Django application. We also talked about why this structure is significant and how Django apps benefit the Django ecosystem as a tool for sharing code.

24 Nov 2020 12:00am GMT

23 Nov 2020

Planet Python

Podcast.__init__: Pants Has Got Your Python Monorepo Covered - Episode 290

In a software project writing code is just one step of the overall lifecycle. There are many repetitive steps such as linting, running tests, and packaging that need to be run for each project that you maintain. In order to reduce the overhead of these repeat tasks, and to simplify the process of integrating code across multiple systems the use of monorepos has been growing in popularity. The Pants build tool is purpose built for addressing all of the drudgery and for working with monorepos of all sizes. In this episode core maintainers Eric Arellano and Stu Hood explain how the Pants project works, the benefits of automatic dependency inference, and how you can start using it in your own projects today. They also share useful tips for how to organize your projects, and how the plugin oriented architecture adds flexibility for you to customize Pants to your specific needs.

Summary

In a software project writing code is just one step of the overall lifecycle. There are many repetitive steps such as linting, running tests, and packaging that need to be run for each project that you maintain. In order to reduce the overhead of these repeat tasks, and to simplify the process of integrating code across multiple systems the use of monorepos has been growing in popularity. The Pants build tool is purpose built for addressing all of the drudgery and for working with monorepos of all sizes. In this episode core maintainers Eric Arellano and Stu Hood explain how the Pants project works, the benefits of automatic dependency inference, and how you can start using it in your own projects today. They also share useful tips for how to organize your projects, and how the plugin oriented architecture adds flexibility for you to customize Pants to your specific needs.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it's easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show!
  • Python has become the default language for working with data, whether as a data scientist, data engineer, data analyst, or machine learning engineer. Springboard has launched their School of Data to help you get a career in the field through a comprehensive set of programs that are 100% online and tailored to fit your busy schedule. With a network of expert mentors who are available to coach you during weekly 1:1 video calls, a tuition-back guarantee that means you don't pay until you get a job, resume preparation, and interview assistance there's no reason to wait. Springboard is offering up to 20 scholarships of $500 towards the tuition cost, exclusively to listeners of this show. Go to pythonpodcast.com/springboard today to learn more and give your career a boost to the next level.
  • Feature flagging is a simple concept that enables you to ship faster, test in production, and do easy rollbacks without redeploying code. Teams using feature flags release new software with less risk, and release more often. ConfigCat is a feature flag service that lets you easily add flags to your Python code, and 9 other platforms. By adopting ConfigCat you and your manager can track and toggle your feature flags from their visual dashboard without redeploying any code or configuration, including granular targeting rules. You can roll out new features to a subset of your users for beta testing or canary deployments. With their simple API, clear documentation, and pricing that is independent of your team size you can get your first feature flags added in minutes without breaking the bank. Go to pythonpodcast.com/configcat today to get 35% off any paid plan with code PYTHONPODCAST or try out their free forever plan.
  • Your host as usual is Tobias Macey and today I'm interviewing Eric Arellano and Stu Hood about Pants, a flexible build system that works well with monorepos.

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing what Pants is and how it got started?
    • What's the story behind the name?
  • What is a monorepo and why might I want one?
    • What are the challenges caused by working with a monorepo?
    • Why are monorepos so uncommon in Python projects?
  • What is the workflow for a developer or team who is managing a project with Pants?
  • How does Pants integrate with the broader ecosystem of Python tools for dependency management and packaging (e.g. Poetry, Pip, pip-tools, Flit, Twine, Pex, Shiv, etc.)?
  • What is involved in setting up Pants for working with a new Python project?
    • What complications might developers encounter when trying to implement Pants in an existing project?
  • How is Pants itself implemented?
    • How have the design, goals, or architecture evolved since Pants was first created?
    • What are the major changes in the v2 release?
      • What was the motivation for the major overhaul of the project?
  • How do you recommend developers lay out their projects to work well with Python?
  • How can I handle code shared between different modules or packages, and reducing the third party dependencies that are built into the respective packages?
  • What are some of the most interesting, unexpected, or innovative ways that you have seen Pants used?
  • What have you found to be the most interesting, unexpected, or challenging aspects of working on Pants?
  • What are the cases where Pants is the wrong choice?
  • What do you have planned for the future of the pants project?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA

23 Nov 2020 11:41pm GMT

Python Morsels: What is a sequence?

Related Article:

Transcript

Sequences are a special type of iterable that can be indexed using square brackets ([...]) to get items by their position. You can also ask sequences for their length to see how many things are inside them.

A sequence is an ordered collection. They maintain the order of the things in them.

Which ones are sequences?

Here we have a list, a tuple, a string, a set, and a dictionary:

>>> fruits = ['apple', 'lemon', 'pear', 'watermelon']
>>> coordinates = (1, 8, 2)
>>> greeting = "Hi y'all!"
>>> colors = {'red', 'blue', 'yellow'}
>>> item_counts = {'computers': 1, 'headphones': 2, 'ducks': 3}

We can write a for loop to loop over these (they're all iterables).

>>> for fruit in fruits:
...     print(fruit)
...
apple
lemon
pear
watermelon

And we can get the length of any of these (using the built-in len function).

>>> len(fruits)
4

But not all of these are indexable.

All of these are iterables, but not all of them are sequences. Only the list, tuple, and string above are sequences.

The properties of a sequence

Sequences can be indexed from 0 up until index len(my_sequence)-1.

Index 0 represents the first item in a sequence:

>>> fruits[0]
'apple'

Sequences can also usually be negative indexed to get items from the end of the sequence.

Index -1 represents the last thing in a sequence:

>>> coordinates[-1]
2

You can usually slice sequences also. This will give us everything up to (but not including) the last character in the greeting string (so we get everything but the exclamation mark):

>>> greeting[:-1]
"Hi y'all"

Iterables that aren't sequences

Sets are iterables, but they're not sequences:

>>> colors[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'set' object is not subscriptable

So, if we try to index a set, it's not going to work.

If we try to index a dictionary, it might seem like it works, depending on what the keys of the dictionary are.

>>> item_counts[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 0

But we're not actually indexing the dictionary here, we're doing a key-value lookup.

If you try to slice a dictionary, that fails as well:

>>> item_counts[:-1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'slice'

Lists, tuples, and strings are sequences, but sets and dictionaries are not.

Summary

Sequences are iterables that have a length and can be indexed.

You can usually slice sequences, and you can usually use negative indexes as well.

The most common sequences built-in to Python are strings, tuples, and lists (though range objects are also sequences, which is interesting). You'll also see other sequences floating around in Python, but strings, tuples, and lists are the most common ones.
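
As a quick aside (a hedged sketch, not part of the transcript), range objects really do support the sequence operations described above:

>>> r = range(0, 10, 2)
>>> len(r)
5
>>> r[1]
2
>>> r[-1]
8
>>> r[1:3]
range(2, 6, 2)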

23 Nov 2020 4:00pm GMT

Stack Abuse: How to Change Plot Background in Matplotlib

Introduction

Matplotlib is one of the most widely used data visualization libraries in Python. From simple to complex visualizations, it's the go-to library for most.

In this tutorial, we'll take a look at how to change the background of a plot in Matplotlib.

Importing Data and Libraries

Let's import the required libraries first. We'll obviously need Matplotlib, and we'll use Pandas to read the data:

import matplotlib.pyplot as plt
import pandas as pd

Specifically, we'll be using the Seattle Weather Dataset:

weather_data = pd.read_csv("seattleWeather.csv")
print(weather_data.head())

         DATE  PRCP  TMAX  TMIN  RAIN
0  1948-01-01  0.47    51    42  True
1  1948-01-02  0.59    45    36  True
2  1948-01-03  0.42    45    35  True
3  1948-01-04  0.31    45    34  True
4  1948-01-05  0.17    45    32  True

Creating a Plot

Now, let's create a simple Matplotlib Scatterplot, with a few different variables we want to visualize:

PRCP = weather_data['PRCP']
TMAX = weather_data['TMAX']
TMIN = weather_data['TMIN']

Now, we'll construct a scatter plot between the minimum temperature and precipitation and show() it using Matplotlib's PyPlot:

plt.scatter(TMIN, PRCP)
plt.show()

matplotlib scatter plot

The graph that we've produced is interpretable, but it looks a little plain. Let's try customizing it by changing the background of our plot using a couple of different methods.

Change Plot Background in Matplotlib

Now, let's go ahead and change the background of this plot. We can do this with two different approaches. We can change the color of the face, which is currently set to white. Or, we can add an image using imshow().

Change Axes Background in Matplotlib

Let's first change the color of the face. This can either be done with the set() function, passing in the facecolor argument and its new value, or via the dedicated set_facecolor() function:

ax = plt.axes()
ax.set_facecolor("orange")
# OR
ax.set(facecolor = "orange")

plt.scatter(TMIN, PRCP)
plt.show()

change axes background color matplotlib

Either of these approaches produces the same result, as they both call on the same function under the hood.

Change Figure Background in Matplotlib

If you would like to set the background for the figure and need an axes to be transparent, this can be done by calling set_alpha() on the figure's and axes' patch objects. Let's create a figure and an axes object. Of course, you can also use the set() function and pass the alpha attribute instead.

The color of the entire figure will be blue and we will initially set the alpha of the axes object to 1.0, meaning fully opaque. We color the axes object orange, giving us an orange background within the blue figure:

fig = plt.figure()
fig.patch.set_facecolor('blue')
fig.patch.set_alpha(0.6)

ax = fig.add_subplot(111)
ax.patch.set_facecolor('orange')
ax.patch.set_alpha(1.0)

plt.scatter(TMIN, PRCP)
plt.show()

change figure background matplotlib

Now let's see what happens when we adjust the alpha of the axes subplot down to 0.0:

fig = plt.figure()
fig.patch.set_facecolor('blue')
fig.patch.set_alpha(0.6)

ax = fig.add_subplot(111)
ax.patch.set_facecolor('orange')
ax.patch.set_alpha(0.0)

plt.scatter(TMIN, PRCP)
plt.show()

change axes background matplotlib within figure

Notice that the background of the plot itself is transparent now.

Add Image to Plot Background in Matplotlib

If you would like to use an image as the background for a plot, this can be done by using PyPlot's imread() function. This function loads an image into Matplotlib, which can be displayed with the function imshow().

In order to plot on top of the image, the extent of the image has to be specified. By default, Matplotlib uses the upper left corner of the image as the image's origin. We can give a list of points to the imshow() function, specifying what region of the image should be displayed. When combined with subplots, another plot can be inserted on top of the image.

Let's use an image of rain as the background for our plot:

img = plt.imread("rain.jpg")
fig, ax = plt.subplots()
ax.imshow(img, extent=[-5, 80, -5, 30])
ax.scatter(TMIN, PRCP, color="#ebb734")
plt.show()

adding image to background matplotlib

The extent argument takes the bounds in this order: [horizontal_min, horizontal_max, vertical_min, vertical_max].

Here, we've read the image, cropped it and showed it on the axes using imshow(). Then, we've plotted the scatter plot with a different color and shown the plot.

Conclusion

In this tutorial, we've gone over several ways to change a background of a plot using Python and Matplotlib.

If you're interested in Data Visualization and don't know where to start, make sure to check out our book on Data Visualization in Python.

Data Visualization in Python, a book for beginner to intermediate Python developers, will guide you through simple data manipulation with Pandas, cover core plotting libraries like Matplotlib and Seaborn, and show you how to take advantage of declarative and experimental libraries like Altair.

Data Visualization in Python

Understand your data better with visualizations! With over 275 pages, you'll learn the ins and outs of visualizing data in Python with popular libraries like Matplotlib, Seaborn, Bokeh, and more.

23 Nov 2020 3:25pm GMT

NumFOCUS: Anaconda Announces Multi-Year Partnership with NumFOCUS

A key stakeholder in the open source scientific computing ecosystem has further formalized their long-standing partnership with NumFOCUS. Anaconda, the Austin, Texas-based software development and consulting company which provides global distribution of Python and R software packages, last month introduced their Anaconda Dividend Program. Through this initiative, Anaconda plans to direct a portion of their […]

The post Anaconda Announces Multi-Year Partnership with NumFOCUS appeared first on NumFOCUS.

23 Nov 2020 2:44pm GMT

Real Python: Split Your Dataset With scikit-learn's train_test_split()

One of the key aspects of supervised machine learning is model evaluation and validation. When you evaluate the predictive performance of your model, it's essential that the process be unbiased. Using train_test_split() from the data science library scikit-learn, you can split your dataset into subsets that minimize the potential for bias in your evaluation and validation process.

In this tutorial, you'll learn:

  • Why you need to split your dataset in supervised machine learning
  • Which subsets of the dataset you need for an unbiased evaluation of your model
  • How to use train_test_split() to split your data
  • How to combine train_test_split() with prediction methods

In addition, you'll get information on related tools from sklearn.model_selection.

Free Bonus: Click here to get access to a free NumPy Resources Guide that points you to the best tutorials, videos, and books for improving your NumPy skills.

The Importance of Data Splitting

Supervised machine learning is about creating models that precisely map the given inputs (independent variables, or predictors) to the given outputs (dependent variables, or responses).

How you measure the precision of your model depends on the type of problem you're trying to solve. In regression analysis, you typically use the coefficient of determination, root-mean-square error, mean absolute error, or similar quantities. For classification problems, you often apply accuracy, precision, recall, F1 score, and related indicators.

The acceptable numeric values that measure precision vary from field to field. You can find detailed explanations from Statistics By Jim, Quora, and many other resources.

What's most important to understand is that you usually need unbiased evaluation to properly use these measures, assess the predictive performance of your model, and validate the model.

This means that you can't evaluate the predictive performance of a model with the same data you used for training. You need to evaluate the model with fresh data that the model hasn't seen before. You can accomplish that by splitting your dataset before you use it.

Training, Validation, and Test Sets

Splitting your dataset is essential for an unbiased evaluation of prediction performance. In most cases, it's enough to split your dataset randomly into three subsets:

  1. The training set is applied to train, or fit, your model. For example, you use the training set to find the optimal weights, or coefficients, for linear regression, logistic regression, or neural networks.

  2. The validation set is used for unbiased model evaluation during hyperparameter tuning. For example, when you want to find the optimal number of neurons in a neural network or the best kernel for a support vector machine, you experiment with different values. For each considered setting of hyperparameters, you fit the model with the training set and assess its performance with the validation set.

  3. The test set is needed for an unbiased evaluation of the final model. You shouldn't use it for fitting or validation.

In less complex cases, when you don't have to tune hyperparameters, it's okay to work with only the training and test sets.
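
As a quick preview of that simpler, two-way split, here is a minimal sketch using train_test_split(); the toy arrays and parameter values are placeholders, not taken from the original article:

import numpy as np
from sklearn.model_selection import train_test_split

# A toy dataset: 12 samples with one feature each, plus a response
x = np.arange(12).reshape(-1, 1)
y = np.arange(12)

# Hold out 25% of the samples for testing; random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42
)

print(x_train.shape, x_test.shape)  # (9, 1) (3, 1)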

Underfitting and Overfitting

Splitting a dataset might also be important for detecting if your model suffers from one of two very common problems, called underfitting and overfitting:

  1. Underfitting is usually the consequence of a model being unable to encapsulate the relations among data. For example, this can happen when trying to represent nonlinear relations with a linear model. Underfitted models will likely have poor performance with both training and test sets.

  2. Overfitting usually takes place when a model has an excessively complex structure and learns both the existing relations among data and noise. Such models often have bad generalization capabilities. Although they work well with training data, they usually yield poor performance with unseen (test) data.

You can find a more detailed explanation of underfitting and overfitting in Linear Regression in Python.

Prerequisites for Using train_test_split()

Now that you understand the need to split a dataset in order to perform unbiased model evaluation and identify underfitting or overfitting, you're ready to learn how to split your own datasets.

You'll use version 0.23.1 of scikit-learn, or sklearn. It has many packages for data science and machine learning, but for this tutorial you'll focus on the model_selection package, specifically on the function train_test_split().

You can install sklearn with pip install:

$ python -m pip install -U "scikit-learn==0.23.1"

If you use Anaconda, then you probably already have it installed. However, if you want to use a fresh environment, ensure that you have the specified version, or use Miniconda; in that case, you can install sklearn from Anaconda Cloud with conda install:
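
The feed cuts the article off before that command, but the conda equivalent would look roughly like the line below; the channel and version pin are my assumption, not taken from the original article:

$ conda install -c anaconda scikit-learn=0.23.1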

Read the full article at https://realpython.com/train-test-split-python-data/ »


[ Improve Your Python With 🐍 Python Tricks 💌 - Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

23 Nov 2020 2:00pm GMT

Codementor: AutoScraper and Flask: Create an API From Any Website in Less Than 5 Minutes And with Fewer Than 20 Lines of Python

In this tutorial, we are going to create our own e-commerce search API with support for both eBay and Etsy without using any external APIs. With the power of AutoScraper...

23 Nov 2020 1:00pm GMT

Zato Blog: Understanding API rate-limiting techniques

Enabling rate-limiting in Zato means that access to Zato-based APIs can be throttled per endpoint, user or service - including options to make limits apply to specific IP addresses only - and if limits are exceeded within a selected period of time, the invocation will fail. Let's check how to use it all.

Where and when limits apply

Rate-limiting aware objects in Zato

API rate limiting works on several levels and the configuration is always checked in the order below, which follows from the narrowest, most specific parts of the system (endpoints), through users which may apply to multiple endpoints, up to services which in turn may be used by both multiple endpoints and users.

When a request arrives through an endpoint, that endpoint's rate limiting configuration is checked. If the limit is already reached for the IP address or network of the calling application, the request is rejected.

Next, if there is any user associated with the endpoint, that account's rate limits are checked in the same manner and, similarly, if they are reached, the request is rejected.

Finally, if the endpoint's underlying service is configured to do so, it also checks if its invocation limits are not exceeded, rejecting the message accordingly if they are.
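
The order of these checks can be pictured roughly as in the sketch below. This is only a conceptual illustration of the sequence described above, not Zato's actual code; every name in it is made up:

def endpoint_limit_reached(channel, ip):
    return False  # stub: driven by the channel's rate-limit definition

def user_limit_reached(user, ip):
    return False  # stub: driven by the user's (security definition's) limits

def service_limit_reached(service, ip):
    return False  # stub: driven by the service's own limits

def is_request_allowed(channel, user, service, ip):
    # 1. Endpoint (channel) limits are checked first, optionally per caller IP or network
    if endpoint_limit_reached(channel, ip):
        return False  # the caller would receive HTTP 429
    # 2. Next, limits of the user associated with the endpoint, if there is one
    if user is not None and user_limit_reached(user, ip):
        return False
    # 3. Finally, limits configured on the underlying service
    if service_limit_reached(service, ip):
        return False
    return True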

Note that the three levels are distinct yet they overlap in what they allow one to achieve.

For instance, it is possible to have the same user credentials be used in multiple endpoints and express ideas such as "Allow this and that user to invoke my APIs 1,000 requests/day but limit each endpoint to at most 5 requests/minute no matter which user".

Moreover, because limits can be set on services, it is possible to make it even more flexible, e.g. "Let this service be invoked at most 10,000 requests/hour, no matter which user, with particular users being able to invoke at most 500 requests/minute, no matter which service, topping it off with separate limits for REST vs. SOAP vs. JSON-RPC endpoints, depending on which application is invoking them". That lets one conveniently express advanced scenarios that often occur in practical situations.

Also, observe that API rate limiting applies to REST, SOAP and JSON-RPC endpoints only; it is not used with other kinds of endpoints, such as AMQP, IBM MQ, SAP or the task scheduler. However, per-service limits work no matter which endpoint the service is invoked with, so they will work with endpoints such as WebSockets, ZeroMQ or any other.

Lastly, limits pertain to incoming requests only; outgoing ones, from Zato to external resources, are not covered.

Per-IP restrictions

The architecture is made even more versatile thanks to the fact that for each object - endpoint, user or service - different limits can be configured depending on the caller's IP address.

This adds yet another dimension and makes it possible to express ideas commonly seen in API-based projects, such as:

IP-based limits work hand in hand with, and are an integral part of, the mechanism - they do not rule out per-endpoint, user or service limits. In fact, for each such object, multiple IP-based limits can be set independently, allowing for the highest degree of flexibility.

Exact or approximate

Rate limits come in two types:

Exact rate limits are just that, exact - they ensure that a limit is not exceeded at all, not even by a single request.

Approximate limits may let a very small number of requests exceed the limit, the benefit being that they are faster to check than exact ones.

When to use which type depends on a particular project:

Python code and web-admin

Alright, let's check how to define the limits in Zato web-admin. We will use the sample service below:

# -*- coding: utf-8 -*-

# Zato
from zato.server.service import Service

class Sample(Service):
    name = 'api.sample'

    def handle(self):

        # Return a simple string on response
        self.response.payload = 'Hello there!\n'

Now, in web-admin, we will configure limits - separately for the service, a new user and a new REST API channel (endpoint).

Configuring rate limits for service
Configuring rate limits for user
Configuring rate limits for user

Points of interest:

Testing it out

Now, all that is left is to invoke the service with curl.

As long as limits are not reached, a business response is returned:

$ curl http://my.user:password@localhost:11223/api/sample
Hello there!
$

But if a limit is reached, the caller receives an error message with the 429 HTTP status:

$ curl -v http://my.user:password@localhost:11223/api/sample
*   Trying 127.0.0.1...

...

< HTTP/1.1 429 Too Many Requests
< Server: Zato
< X-Zato-CID: b8053d68612d626d338b02

...

{"zato_env":{"result":"ZATO_ERROR","cid":"b8053d68612d626d338b02eb",
 "details":"Error 429 Too Many Requests"}}
$

Note that the caller never knows what the limit was - that information is saved in Zato server logs along with other details so that API authors can correlate what callers get with the very rate limiting definition that prevented them from accessing the service.

zato.common.rate_limiting.common.RateLimitReached: Max. rate limit of 100/m reached;
from:`10.74.199.53`, network:`*`; last_from:`127.0.0.1`;
last_request_time_utc:`2020-11-22T15:30:41.943794`;
last_cid:`5f4f1ef65490a23e5c37eda1`; (cid:b8053d68612d626d338b02)

And this is it - we have created a new API rate limiting definition in Zato and tested it out successfully!

23 Nov 2020 12:53pm GMT

PyCharm: PyCharm 2020.3 Release Candidate

We're now in the final stages of our preparations for the PyCharm 2020.3 release. This week's build brings a variety of bug fixes that will help ensure the new version runs smoothly. Please try this version out and let us know how we're doing. And if you run into any issues, don't forget to submit a ticket on YouTrack.

Improvements in this version

Read the release notes to learn more.

How to download

Download the RC from our website. Alternatively, you can use the JetBrains Toolbox App to stay up to date.

If you're on Ubuntu 16.04 or later, you can use a snap package to get PyCharm RC versions, and stay up to date. You can find installation instructions on our website.

The release candidate (RC) is not an early access program (EAP) build and does not bundle an EAP license. To use PyCharm Professional Edition RC, you will either need a currently active PyCharm subscription or to start a 30-day free trial.

23 Nov 2020 12:05pm GMT

Mike Driscoll: PyDev of the Week: Reuven Lerner

This week we welcome Reuven Lerner (@reuvenmlerner) as our PyDev of the Week. Reuven is a trainer who teaches Python and data science all over the world. You can find out more on his website. Reuven also has a newsletter on becoming a better developer that you might enjoy.

Reuven also has the following resources freely available:

Let's take some time getting to know Reuven better!

Can you tell us a little about yourself (hobbies, education, etc):

I grew up in the Northeastern United States, and studied computer science at MIT, graduating in 1992. After working for Hewlett Packard and Time Warner, I moved to Israel in December 1995, opening my own consulting company. I had neither consulted nor run a business at that point, but I was single and optimistic, so I gave it a shot.

I've been in business for myself since then, pausing along the way to get a PhD in learning sciences from Northwestern University. My dissertation involved the creation of the Modeling Commons, which allows people to collaborate in the creation of agent-based models.

For years, I did a little bit of everything: I wrote software, did system administration, tuned databases, consulted with companies, and did training. About a decade ago, I realized that training was more fun and more lucrative than development - and that it was a good business practice to specialize in one thing. I've been a full-time Python trainer since then. Most days, I teach between 4-10 hours for companies around the world, teaching everything from "Python for non-programmers" all the way up to advanced Python workshops.

I'm married, with three children (20, 18, and 15), and live in Modi'in, a small city halfway between Tel Aviv and Jerusalem.

As for hobbies, my big obsession over the last few years has been studying Chinese. I find it fun and interesting, and also practical, given that I normally travel to China a few times each year to do corporate training there. (That has obviously been put on hold, thanks to the pandemic.)

Aside from Chinese, I read a lot, especially about current events. I also enjoy doing crosswords, and am steadily getting better at them. Everyone in my family, including me, also enjoys cooking, although I don't often have a chance to do it as much as I'd like. And as of the start of the pandemic, I've been taking very long, very early walks - about 15 km/day, starting at 4 a.m. I have found it a nice, refreshing way to get out in this time of staying at home.

Why did you start using Python?

I was introduced to Python back in early 1993, when the Web was young and we were looking for languages with which we could write server-side scripts, aka "Web applications." (I actually objected to having the term "application developer" on my business card, because I thought it was laughable that you could call what we wrote "applications." Whoops.)

At the time, I did some Perl and some Python. At the time, Perl was more popular and had a much larger library of third-party modules. So while I knew Python and recommended it to anyone I knew who wanted to start programming, I personally used Perl for a while, continuing to use Python here and there, but not doing much with it.

I saw that Perl wasn't doing well as a language or community, and tried to figure out in which direction I could move. I tried Python, but the Web application frameworks at the time were too weak or too weird. (I even did a big project using Zope, with its object database.) That's when Ruby on Rails was released; because Ruby is basically Perl with Smalltalk objects, I was delighted to use the language.

But I couldn't escape noticing that Ruby was largely trapped in the Web world, whereas Python was growing in scope and mindshare. The number of third-party packages on PyPI was growing rapidly, and when I decided to exclusively do training (rather than doing it alongside development and consulting), I found that there was far more demand for Python than for anything else.

I've been deeply steeped in the Python world ever since, and I couldn't be happier.

What other programming languages do you know and which is your favorite?

I learned Lisp back in college, and I still use Emacs for editing - so I continue to have affection for Lisp as a language, and often refer to the concepts I learned in it when working with Python.

As I wrote above, I loved working with Ruby. Everything is an object in Python, but that's even more the case in Ruby. I loved the freedom and creativity of the Ruby world, but the object model is hard for people to grasp - and in Ruby, if you don't eat objects for breakfast, you'll have a hard time with it.

My research group in graduate school developed NetLogo, an agent-based modeling language. That's a completely different way of writing code and expressing ideas, one which more developers should try.

I'm not sure if any of these would count as my favorite; I've now been using Python for long enough that I find that I can most easily express myself using its idioms.

I keep hoping to find time to learn Rust, because the idea of a systems language that doesn't require me to use pointers seems really attractive, and I've heard such great things about it. But I keep struggling to find time to learn it.

What projects are you working on now?

I'm always doing far too many things! Here are a few of them:

Which Python libraries are your favorite (core or 3rd party)?

I use Jupyter a ton in my training, and even in my day-to-day work, if I want to experiment with things. I continue to be impressed by the functionality that the core developers have put in there.

But for sheer depth, I have to say that Pandas continues to amaze me. Every time I use it, I discover not just a little bit of new functionality, but dozens of methods and options that I didn't previously know existed. You could spend your entire career just working with Pandas, and you still probably wouldn't know all it has to offer.

I've become a big fan of pytest, also. All of my "Weekly Python Exercise" courses now use pytest for checking code, and I'd say that the courses have improved dramatically as a result.

Finally, the team that develops PyPI and pip deserves everyone's thanks and credit. I did a tiny bit of work on it at the sprints during PyCon 2019, and I discovered that it's a tiny group of smart and hard-working people who are supporting an infrastructure rivaling many enterprise operations.

What do you enjoy the most as a Python trainer?

I love helping people to overcome conceptual hurdles. So many people have been using Python for months or years without truly understanding the logic of the language. If I can help them break down these ideas so that they can then understand how they work and are applied across the board, it really makes me happy.

It can be something as simple as how dictionaries work, or how a "for" loop works behind the scenes. Or it can be something more complex, such as inner functions, variable scoping, or decorators. Once people understand the simple rules that we can apply in Python, consistently and repeatedly, their work is easier, and I'm satisfied that they'll be able to do much more, in less time and with less code, than before.

Which concepts in Python are the hardest for your students to learn?

Comprehensions are extremely hard for people to understand. The syntax is already hard enough, but then understanding *when* you want to use a comprehension vs. a traditional "for" loop is really difficult. People often ask me if they should use comprehensions because they're more efficient, but

Decorators are a topic that people don't get at first, but which I try to break down and explain. (I gave a talk on the subject, "Practical Decorators," at PyCon 2019.) At first, people don't understand how to write decorators. Then they don't understand what to do with them. And then they're turned on, and find all sorts of uses - and discover that they have been seeing and using decorators already, just without knowing it or understanding what they were doing.

In my advanced courses, I talk about descriptors - and it's really hard for people to grasp. We think that we know what's going on when we access a.b, but if b is a class attribute *and* is an instance of a class that has a __get__ method, then all sorts of magic things happen. People are confused by why we even have them, until I show them that you need descriptors in order for methods to work.

Finally, it's hard for people - especially those coming from backgrounds in C and C++ - to understand that in Python, efficiency isn't the most important thing. Rather, readability and maintainability are. I always tell people that Python is perfect for an age in which people are expensive, but computers are cheap. This doesn't mean that efficiency is bad, but it's not the thing to worry about before anything else. We put people first!

Is there anything else you'd like to say?

I'm generally impressed with the Python community, in that it's welcoming to newcomers and patient with their questions. There are so many people learning Python, and for them it's not a passion or the latest language on a long list - it's something they have to do for work, and they're a bit confused by the terminology, the ecosystem, and even the syntax. I love working with newcomers to the language, and I encourage everyone to do what they can to help the huge influx of programming immigrants (for lack of a better term), to help this all make sense to them.

Thanks for doing the interview, Reuven!

The post PyDev of the Week: Reuven Lerner appeared first on The Mouse Vs. The Python.

23 Nov 2020 6:05am GMT

22 Nov 2020

feedPlanet Python

Erik Marsja: How to use Python to Perform a Paired Sample T-test

The post How to use Python to Perform a Paired Sample T-test appeared first on Erik Marsja.

In this Python data analysis tutorial, you will learn how to perform a paired sample t-test in Python. First, you will learn about this type of t-test (e.g. when to use it, the assumptions of the test). Second, you will learn how to check whether your data follow the assumptions and what you can do if your data violates some of the assumptions.

Third, you will learn how to perform a paired sample t-test using the following Python packages:

In the final sections of this tutorial, you will also learn how to:

In the first section, you will learn about what is required to follow this post.

Prerequisites

In this tutorial, we are going to use both SciPy and Pingouin, two great Python packages, to carry out the dependent sample t-test. Furthermore, to read the dataset we are going to use Pandas. Finally, we are also going to use Seaborn to visualize the data. In the next four subsections, you will find a brief description of each of these packages.

SciPy

SciPy is one of the essential data science packages. This package is, furthermore, a dependency of all the other packages that we are going to use in this tutorial. In this tutorial, we are going to use it to test the assumption of normality as well as carry out the paired sample t-test. This means, of course, that if you are going to carry out the data analysis using Pingouin you will get SciPy installed anyway.

Pandas

Pandas is also a great Python package for anyone carrying out data analysis with Python, whether a data scientist or a psychologist. In this post, we will use Pandas to import data into a dataframe and to calculate summary statistics.

Seaborn

In this tutorial, we are going to use data visualization to guide our interpretation of the paired sample t-test. Seaborn is a great package for carrying out data visualization (see for example these 9 examples of how to use Seaborn for data visualization in Python).

Pingouin

In this tutorial, Pingouin is the second package that we are going to use to do a paired sample t-test in Python. One great thing with the ttest function is that it returns a lot of information we need when reporting the results from the test. For instance, when using Pingouin we also get the degrees of freedom, Bayes Factor, power, effect size (Cohen's d), and confidence interval.

How to Install the Needed Packages

In Python, we can install packages with pip. To install all the required packages run the following code:

pip install scipy pandas seaborn pingouin

In the next section, we are going to learn about the paired t-test and its assumptions.

Paired Sample T-test

The paired sample t-test is also known as the dependent sample t-test or paired t-test. Furthermore, this type of t-test compares two averages (means) and tells you whether the difference between these two averages is zero. In a paired sample t-test, each participant is measured twice, which results in pairs of observations (the next section will give you an example).

Example on When to Use this Test

For example, if clinical psychologists want to test whether a treatment for depression will change the quality of life, they might set up an experiment. In this experiment, they will collect information about the participants' quality of life before the intervention (i.e., the treatment) and after it. That is, they are conducting a pre- and post-test study. In the pre-test the average quality of life might be 3, while in the post-test the average quality of life might be 5. Numerically, we could think that the treatment is working. However, it could be due to a fluke and, in order to test this, the clinical researchers can use the paired sample t-test.

Hypotheses

Now, when performing dependent sample t-tests you typically have the following two hypotheses:

  1. Null hypothesis: the true mean difference is equal to zero (between the observations)
  2. Alternative hypothesis: the true mean difference is not equal to zero (two-tailed)

Note, in some cases we also may have a specific idea, based on theory, about the direction of the measured effect. For example, we may strongly believe (due to previous research and/or theory) that a specific intervention should have a positive effect. In such a case, the alternative hypothesis will be something like: the true mean difference is greater than zero (one-tailed). Note, it can also be smaller than zero, of course.

Assumptions

Before we continue and import data, we will briefly have a look at the assumptions of this paired t-test. Now, besides the dependent variable being continuous and measured on an interval/ratio scale, there are three assumptions that need to be met.

If your data does not follow a normal distribution, you can transform your dependent variable using a square root, log, or Box-Cox transformation in Python. In the next section, we will import data.
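
As a rough illustration, the sketch below shows what such transformations might look like; the array of scores is made up for the example, and Box-Cox assumes strictly positive values:

import numpy as np
from scipy.stats import boxcox

# Hypothetical, strictly positive scores standing in for your dependent variable
scores = np.array([3.1, 4.5, 2.8, 5.0, 3.9])

sqrt_scores = np.sqrt(scores)         # square-root transformation
log_scores = np.log(scores)           # log transformation
boxcox_scores, lam = boxcox(scores)   # Box-Cox; also returns the fitted lambda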

Example Data

Before we check the normality assumption of the paired t-test in Python, we need some data to even do so. In this tutorial post, we are going to work with a dataset that can be found here. Here we will use Pandas and the read_csv method to import the dataset (stored in a .csv file):

import pandas as pd

df = pd.read_csv('./SimData/paired_samples_data.csv', index_col=0)

Structure of the dataset to carry out the paired t-test on

In the image above, we can see the structure of the dataframe. Our dataset contains 100 observations and three variables (columns). Furthermore, there are three different datatypes in the dataframe. First, we have an integer column (i.e., "ids"). This column contains the identifier for each individual in the study. Second, we have the column "test", which is of object data type and contains the information about the test time point. Finally, we have the "score" column, which holds the dependent variable. We can check the pairs by grouping the Pandas dataframe and calculating descriptive statistics:

Descriptive statistics
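
The image above showed both the code and its output; the code itself is not included in this feed, but based on the description below it would look roughly like this (my reconstruction, assuming the column names shown earlier):

df.groupby('test')['score'].describe()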

In the code chunk above, we grouped the data by "test" and selected the dependent variable, and got some descriptive statistics using the describe() method. If we want, we can use Pandas to count unique values in a column:

df['test'].value_counts()

This way, we got the information that we have as many observations in the post-test as in the pre-test. A quick note before we continue to the next subsection, in which we subset the data: you should check whether the dependent variable is normally distributed or not. This can be done by creating a histogram (e.g., with Pandas) and/or carrying out the Shapiro-Wilk test.
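
For instance, a quick way to run both checks could look like the sketch below; the exact calls are mine, not the author's:

from scipy.stats import shapiro

# One histogram of the dependent variable per test occasion
df['score'].hist(by=df['test'])

# Shapiro-Wilk test; a p-value above 0.05 is usually taken as consistent with normality
print(shapiro(df.query('test == "Pre"')['score']))
print(shapiro(df.query('test == "Post"')['score']))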

Subset the Data

Both methods, whether using SciPy or Pingouin, require that we have our dependent variable in two Python variables. Therefore, we are going to subset the data and select only the dependent variable. For this, we use the query() method and select the column using brackets ([]):

b = df.query('test == "Pre"')['score']
a = df.query('test == "Post"')['score']

Now that we have the variables a and b containing the paired scores, we can use SciPy to do a paired sample t-test.

Python Paired Sample T-test using SciPy

Here's how to carry out a paired sample t-test in Python using SciPy:

from scipy.stats import ttest_rel

# Python paired sample t-test
ttest_rel(a, b)

In the code chunk above, we started by importing ttest_rel(), the method we then used to carry out the dependent sample t-test. The two arguments we passed were the variables containing the paired scores (a and b). Now, we can see from the results (image below) that the difference between the pre- and post-test is statistically significant.

Result from the paired sample t-test with SciPy

In the next section, we will use Pingouin to carry out the paired t-test.

How to Perform Paired Sample T-test in Python with Pingouin

Here's how to carry out the dependent samples t-test using the Python package Pingouin:

import pingouin as pt

# Python paired sample t-test:
pt.ttest(a, b, paired=True)

There's not that much to explain about the code chunk above: we started by importing pingouin, and then used the ttest() method on our data. Notice how we used the paired parameter and set it to True. We did this because it is a paired sample t-test we wanted to carry out. Here's the output:

Table with result from python paired sample t-test

As you can see, we get more information when using Pingouin to do the paired t-test. In fact, here we basically get all we need to continue and interpret the results. In the next section, before learning how to interpret the results, you can also watch a YouTube video explaining all the above (with some exceptions, of course):

Python Paired Sample T-test YouTube Video:

Here's the majority of the current blog post explained in a YouTube video:

How to Interpret the Results from a paired T-test

In this section, you will be given a short explanation on how to interpret the results from a paired t-test carried out with Python. Note, we will focus on the results that we got from Pingouin as they give us more information (e.g., degrees of freedom, effect size).

Interpreting the P-value

Now, the p-value of the test is smaller than 0.001, which is less than the significance level alpha (e.g., 0.05). This means that we can draw the conclusion that quality of life increased from the pre-test to the post-test. Note, this can, of course, be due to other things than the intervention, but that's another story.

Note that the p-value is the probability of getting an effect at least as extreme as the one in our data, assuming that the null hypothesis is true. P-values address only one question: how likely is your collected data, assuming a true null hypothesis? Notice that the p-value can never be used as support for the alternative hypothesis.

Interpreting Cohen's D

Normally, we interpret Cohen's d in terms of the relative strength of, e.g., the treatment. Cohen (1988) suggested that d = 0.2 is a 'small' effect size, 0.5 is a 'medium' effect size, and 0.8 is a 'large' effect size. You can interpret this as follows: if two groups' means don't differ by 0.2 standard deviations or more, the difference is trivial, even if it is statistically significant.

Interpreting the Bayes Factor from Pingouin

When using Pingouin to carry out the paired t-test we also get the Bayes Factor. See this post for more information on how to interpret BF10.

How to Report the Results

In this section, you will learn how to report the results according to the APA guidelines. In our case, we can report the results from the t-test like this:

The results from the pre-test (M = 39.77, SD = 6.758) and post-test (M = 45.737, SD = 6.77) quality of life test suggest that the treatment resulted in an improvement in quality of life, t(49) = 115.4384, p < .01. Note that the "quality of life test" is something made up for this post (or there might be such a test, of course, that I don't know of!).
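
As a convenience, here is a hedged sketch of how such a sentence could be assembled from the objects created earlier in this post (a, b, and the res DataFrame from the Pingouin sketch above); the wording and rounding are, of course, up to you:

# b holds the pre-test scores, a the post-test scores
pre, post = b, a

report = (
    f"pre-test (M = {pre.mean():.2f}, SD = {pre.std():.2f}) and "
    f"post-test (M = {post.mean():.2f}, SD = {post.std():.2f}), "
    f"t({int(res['dof'].iloc[0])}) = {res['T'].iloc[0]:.2f}, "
    f"p = {res['p-val'].iloc[0]:.3f}"
)
print(report)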

In the final section, before the conclusion, you will learn how to visualize the data in two different ways: creating boxplots and violin plots.

How to Visualize the Data using Boxplots:

Here's how we can guide the interpretation of the paired t-test using boxplots:

import seaborn as sns

sns.boxplot(x='test', y='score', data=df)

In the code chunk above, we imported seaborn (as sns) and used the boxplot method. We put the column that we want separate plots for ("test") on the x-axis. Here's the resulting plot:

Paired sample boxplot

How to Visualize the Data using Violin Plots:

Here's another way to report the results from the t-test by creating a violin plot:

import seaborn as sns

sns.violinplot(x='test', y='score', data=df)

Much like when creating the box plot, we import seaborn and add the columns/variables we want on the x- and y-axes. Here's the resulting plot:

Alternative Data Analysis Methods in Python

As you may already be aware, there are other ways to analyze data. For example, you can use Analysis of Variance (ANOVA) if there are more than two levels of the factor in the data (e.g., tests during the treatment as well as pre- and post-tests). See the following posts about how to carry out ANOVA:

Recently, machine learning methods have grown popular. See the following posts for more information:

Summary

In this post, you have learned two methods to perform a paired sample t-test. Specifically, you have installed and used three Python packages for data analysis (Pandas, SciPy, and Pingouin). Furthermore, you have learned how to interpret and report the results from this statistical test, including data visualization using Seaborn. In the Resources and References section, you will find useful resources and references to learn more. As a final word: the Python package Pingouin will give you the most comprehensive result, and that's the package I'd choose to carry out many statistical methods in Python.

If you liked the post, please share it on your social media accounts and/or leave a comment below. Commenting is also a great way to give me suggestions. However, if you are looking for any help please use other means of contact (see e.g., the About or Contact pages).

Finally, support me and my content (much appreciated, especially if you use an AdBlocker): become a patron. Becoming a patron will give you access to a Discord channel in which you can ask questions and may get interactive feedback.

Additional Resources and References

Here are some useful peer-reviewed articles, blog posts, and books. Refer to these if you want to learn more about the t-test, p-value, effect size, and Bayes Factors.

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.

Interpreting P-values.

It's the Effect Size, Stupid - What effect size is and why it is important

Using Effect Size-or Why the P Value Is Not Enough.

Beyond Cohen's d: Alternative Effect Size Measures for Between-Subject Designs (Paywalled).

A tutorial on testing hypotheses using the Bayes factor.

The post How to use Python to Perform a Paired Sample T-test appeared first on Erik Marsja.

22 Nov 2020 10:46pm GMT


Codementor: Building a basic HTTP Server from scratch in Python

Build a web server from scratch using Python sockets.
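
The linked tutorial's own code is not reproduced here, but for readers skimming the feed, here is an independent, minimal sketch of the idea: a single-threaded HTTP server written directly on top of Python's socket module (the host, port, and response body are made up for illustration).

import socket

HOST, PORT = "127.0.0.1", 8080

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((HOST, PORT))
    server.listen(1)
    print(f"Listening on http://{HOST}:{PORT}")
    while True:
        conn, addr = server.accept()
        with conn:
            # Read the raw request and echo its request line to the console
            request = conn.recv(1024).decode("utf-8", errors="replace")
            print(request.splitlines()[0] if request else "(empty request)")
            # Hand-craft a minimal HTTP/1.1 response
            body = "Hello from a hand-rolled HTTP server!"
            response = (
                "HTTP/1.1 200 OK\r\n"
                "Content-Type: text/plain; charset=utf-8\r\n"
                f"Content-Length: {len(body.encode())}\r\n"
                "Connection: close\r\n"
                "\r\n" + body
            )
            conn.sendall(response.encode("utf-8"))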

22 Nov 2020 10:27am GMT


10 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: King William's Town Railway Station

Yesterday morning I had to go to the station in KWT to pick up the bus tickets we had reserved for the Christmas holidays in Cape Town. The station itself has had no train service since December for cost reasons - but Translux and co., the long-distance bus companies, have their offices there.






© benste CC NC SA

10 Nov 2011 10:57am GMT

09 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein

Nobody is worried about something like this - by car you simply drive through, and in the city - near Gnobie - "nah, it only gets dangerous once the fire brigade is there" - 30 minutes later, on the way back, the fire brigade was there.




© benste CC NC SA

09 Nov 2011 8:25pm GMT

08 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Brai Party

Brai = barbecue evening or similar.

She would love to play technician, patching her SpeakOn / jack plug splitters...

The ladies, the "mamas" of the settlement, during the official opening speech

Even though fewer people came than expected: loud music and lots of people ...

And of course a fire with real wood for the barbecue.

© benste CC NC SA

08 Nov 2011 2:30pm GMT

07 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Lumanyano Primary

One of our missions was bringing Katja's Linux Server back to her room. While doing that we saw her new decoration.

Björn and Simphiwe carried the PC to Katja's school


© benste CC NC SA

07 Nov 2011 2:00pm GMT

06 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Nelisa Haircut

Today I went with Björn to Needs Camp to visit Katja's guest family for a special party. First of all we visited some friends of Nelisa - yeah, the one I'm working with in Quigney, Katja's guest father's sister - who gave her a haircut.

African women usually get their hair done by having extensions put in, not, like Europeans, by just cutting some hair.

In between she looked like this...

And then she was done - looks amazing considering the amount of hair she had last week - doesn't it ?

© benste CC NC SA

06 Nov 2011 7:45pm GMT

05 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: My Saturday

Somehow it struck me today that I need to restructure my blog posts a bit - if I only ever report on new places, I would have to be on a round trip. So here are a few things from my everyday life today.

First of all, Saturday counts as a day off, at least for us volunteers.

This weekend only Rommel and I are on the farm - Katja and Björn are now at their placements, and my housemates Kyle and Jonathan are at home in Grahamstown, as is Sipho, who lives in Dimbaza.
Robin, Rommel's wife, has been in Woodie Cape since Thursday to take care of a few things there.
Well, anyway, this morning we first treated ourselves to a shared Weetbix/muesli breakfast and then set off for East London. Two things were on the checklist: Vodacom and Ethienne (the estate agent), plus dropping off the missing items at NeedsCamp on the way back.

Just after setting off on the dirt road, we realised that we had not packed the things for NeedsCamp and Ethienne, but did have the pump for the water supply in the car.

So in East London we first drove to Farmerama - no, not the online game Farmville, but a shop with all sorts of things for a farm - in Berea, a northern part of town.

At Farmerama we got advice on a quick-release coupling that should make life with the pump easier, and we also dropped off a lighter pump for repair, so that it is not always such a big effort when the water has run out again.

Fego Caffé is in the Hemmingways Mall; there we had to get the PIN and PUK of one of our data SIM cards, because unfortunately some digits got transposed when the PIN was entered. Well, in any case, shops in South Africa store data as sensitive as a PUK - which in principle gives access to a locked phone.

In the café Rommel then carried out a few online transactions with the 3G modem, which was working again - and which, by the way, now works perfectly in Ubuntu, my Linux system.

On the side I went to 8ta to find out about their new deals, since we want to offer internet in some of Hilltop's centres. The picture shows the UMTS coverage in NeedsCamp, Katja's place. 8ta is a new phone provider from Telkom; after Vodafone bought Telkom's shares in Vodacom, they have to build everything up from scratch.
We decided to organise a free prepaid card to test, because who knows how accurate the coverage map above is ... Before signing even the cheapest 24-month deal, you should know whether it works.

After that we went to Checkers in Vincent, looking for two hotplates for Woodie Cape - R 129.00 each, i.e. about 12 EUR for a two-plate hotplate.
As you can see in the background, there are already Christmas decorations - at the beginning of November, and that in South Africa at a sunny, warm 25°C or more.

We treated ourselves to lunch at a Pakistani curry takeaway - highly recommended!
Well, and after we got back an hour or so ago, I cleaned the fridge, which I had simply put outside to defrost this morning. Now it is clean again and free of its 3 m thick layer of ice...

Tomorrow ... well, I will report on that separately ... but probably not until Monday, because then I will be back in Quigney (East London) and have free internet.

© benste CC NC SA

05 Nov 2011 4:33pm GMT

31 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Sterkspruit Computer Center

Sterkspruit is one of Hilltop's Computer Centres in the far north of the Eastern Cape. On the trip to J'burg we used the opportunity to take a look at the centre.

Pupils in the big classroom


The Trainer


School in Countryside


Adult Class in the Afternoon


"Town"


© benste CC NC SA

31 Oct 2011 4:58pm GMT

Benedict Stein: Technical Issues

What do you do in an internet cafe if your ADSL and fax line have been discontinued before month's end? Well, my idea was sitting outside and eating some ice cream.
At least it's sunny and not as rainy as on the weekend.


© benste CC NC SA

31 Oct 2011 3:11pm GMT

30 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Nellis Restaurant

For those who are traveling through Zastron - there is a very nice restaurant which serves delicious food at reasonable prices.
In addition, they sell home-made juices, jams, and honey.




interior


home made specialities - the shop in the shop


the Bar


© benste CC NC SA

30 Oct 2011 4:47pm GMT

29 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: The way back from J'burg

During the 10-12 h trip from J'burg back to ELS I was able to take a lot of pictures, including these different roadsides

Plain Street


Orange River in its beginnings (near Lesotho)


Zastron Anglican Church


The Bridge in Between "Free State" and Eastern Cape next to Zastron


my new Background ;)


If you listen to GoogleMaps you'll end up traveling 50 km of gravel road - as it had just been renewed we didn't have that many problems and saved 1 h compared to going the official way with all its construction sites




Freeway


getting dark


© benste CC NC SA

29 Oct 2011 4:23pm GMT

28 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: How Does a Road Construction Site Actually Work?

Sure, some things may be different and much the same - but a road construction site is an everyday sight in Germany - so how does it actually work in South Africa?

First of all - NO, no natives digging with their hands - even though more manpower is used here, they are busily working with technology.

A perfectly normal "national road"


and how it is being widened


looooads of trucks


because here one side is completely closed over a long stretch, so a traffic-light arrangement arises with, in this case, a 45-minute wait


But at least they seem to be having fun ;) - as did we, since luckily we never had to wait longer than 10 min.

© benste CC NC SA

28 Oct 2011 4:20pm GMT