02 Jun 2020

Planet Python

PyCoder’s Weekly: Issue #423 (June 2, 2020)

#423 - JUNE 2, 2020



The Many Ways to Pass Code to Python From the Terminal

You might know about pointing Python to a file path, or using -m to execute a module. But did you know that Python can execute a directory? Or a .zip file?
BRETT CANNON
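
As a quick illustration of the directory and zip-file tricks the article covers (directory and file names here are invented for the example, not taken from the article):

import pathlib, zipfile

# a directory with a top-level __main__.py is directly executable
app = pathlib.Path("myapp")
app.mkdir(exist_ok=True)
(app / "__main__.py").write_text('print("hello from __main__.py")\n')

# the same works for a zip archive with __main__.py at its root
with zipfile.ZipFile("myapp.zip", "w") as zf:
    zf.write(app / "__main__.py", arcname="__main__.py")

# from a shell:
#   python myapp        # runs myapp/__main__.py
#   python myapp.zip    # runs __main__.py inside the archive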

The PEPs of Python 3.9

The first Python 3.9 beta release is upon us! Learn what to expect in the final October release by taking a tour of the Python Enhancement Proposals (PEPs) that were accepted for Python 3.9.
JAKE EDGE

Python Developers Are in Demand on Vettery


Vettery is an online hiring marketplace that's changing the way people hire and get hired. Ready for a bold career move? Make a free profile, name your salary, and connect with hiring managers from top employers today →
VETTERY sponsor

Overview of Python Dependency Management Tools

While pip is often considered the de facto Python package manager, the dependency management ecosystem has really grown over the last few years. Learn about the different tools available and how they fit into this ecosystem.
MARIO KOSTELAC

Build Physical Projects With Python on the Raspberry Pi

In this tutorial, you'll learn to use Python on the Raspberry Pi. The Raspberry Pi is one of the leading physical computing boards on the market and a great way to get started using Python to interact with the physical world.
REAL PYTHON

CPython Internals Book (Early Access Discount - Save 60%)


Unlock the inner workings of the Python language, compile the Python interpreter from source code, and participate in the development of CPython. The new "CPython Internals" book shows you exactly how. Download the sample chapters and claim your Early Access discount →
REAL PYTHON sponsor

PSF Board Nomination Period Extended to June 3rd

PYTHON.ORG

Qt for Python 5.15.0 Is Out!

QT.IO

Discussions

How Often Do You Use Python's pdb Debugger?

RAYMOND HETTINGER

Python Jobs

Senior Python Engineer (Remote)

Gorgias

Python Developer (Remote)

Authentechit Llc

Python Software Engineer (Charleston, SC, Partially Remote)

Netizen Corporation

More Python Jobs >>>

Articles & Tutorials

Advice on Getting Started With Testing in Python

Have you wanted to get started with testing in Python? Maybe you feel a little nervous about diving in deeper than just confirming your code runs. What are the tools needed and what would be the next steps to level up your Python testing?
REAL PYTHON podcast

Stop Using datetime.now! (With Dependency Injection)

How do you test a function that relies on datetime.now() or date.today()? You could use libraries like FreezeGun or libfaketime, but not every project can afford the luxury of reaching for third-party solutions. Learn how dependency injection can help you write code that is more testable, maintainable, and practical.
HAKI BENITA
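
The gist of the dependency-injection approach (a minimal sketch of the idea, not the article's exact code):

from datetime import datetime

def greeting(now=None):
    """Return a greeting; `now` can be injected in tests."""
    now = now or datetime.now()
    return "Good morning" if now.hour < 12 else "Good afternoon"

# in a test, pass a fixed timestamp instead of patching the clock
assert greeting(datetime(2020, 6, 2, 9, 0)) == "Good morning"
assert greeting(datetime(2020, 6, 2, 15, 0)) == "Good afternoon"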

Become a Python Guru With PyCharm


PyCharm is the Python IDE for Professional Developers by JetBrains providing a complete set of tools for productive Python, Web and scientific development. Be more productive and save time while PyCharm takes care of the routine →
JETBRAINS sponsor

How to Write an Installable Django App

In this step-by-step tutorial, you'll learn how to create an installable Django app. You'll cover everything you need to know, from extracting your app from a Django project to turning it into a package that's available on PyPI and installable through pip.
REAL PYTHON

Some Sessions From the Python Language Summit

The Python Language Summit is an annual gathering for the developers of various Python implementations. This year, the gathering was conducted via videoconference. Here are summaries of some of the sessions from this year's summit.
JAKE EDGE

Parallel Iteration With Python's zip() Function

How to use the Python zip() function to solve common programming problems. You'll learn how to traverse multiple iterables in parallel and create dictionaries with just a few lines of code.
REAL PYTHON video

Building an OpenCV Social Distancing Detector

In this tutorial, you will learn how to implement a COVID-19 social distancing detector using OpenCV, Deep Learning, and Computer Vision.
ADRIAN ROSEBROCK

RSVP for the Python Web Conference (Virtual, June 17-19, 2020)

International experts share best practices for hard web production problems. 40+ talks on Django, Plone, CI/CD, Containers, Serverless, REST APIs, web security, microservices, etc. Join JetBrains and Six Feet Up to discuss what the future holds.
SIX FEET UP sponsor

How to Set Up RSS/Atom Feeds for Your Blog

FLORIAN DAHLITZ • Shared by Florian Dahlitz

A Pythonic Guide to SOLID Design

DEREK DRUMMOND • Shared by Derek Drummond

Projects & Code

httpx: A Next Generation HTTP Client for Python

GITHUB.COM/ENCODE

python-skyfield: Elegant Astronomy for Python

GITHUB.COM/SKYFIELDERS

PySyft: A Library for Encrypted, Privacy Preserving Machine Learning

GITHUB.COM/OPENMINED

pybridge-ios: Reuse Python Code in Native iOS Applications

GITHUB.COM/JOAOVENTURA

returns: Make Your Functions Return Something Meaningful, Typed, and Safe!

GITHUB.COM/DRY-PYTHON

snakeware: A Free Linux Distro With a Fully Python Userspace

GITHUB.COM/JOSHIEMOORE

python-testing-crawler: Crawler for Automated Functional Testing of a Web Application

GITHUB.COM/PYTHON-TESTING-CRAWLER

Events

Python Web Conf 2020 (Virtual)

June 17-19, 2020
PYTHONWEBCONF.COM

FlaskCon 2020 (Virtual)

July 4-5, 2020. The call for speakers is open until June 7th, 2020.
FLASKCON.COM


Happy Pythoning!
This was PyCoder's Weekly Issue #423.




02 Jun 2020 7:30pm GMT

Real Python: Parallel Iteration With Python's zip() Function

Python's zip() function creates an iterator that will aggregate elements from two or more iterables. You can use the resulting iterator to quickly and consistently solve common programming problems, like creating dictionaries. In this course, you'll discover the logic behind the Python zip() function and how you can use it to solve real-world problems.
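
For instance, the two use cases mentioned above look roughly like this (a quick sketch, not material from the course itself):

names = ["Alice", "Bob", "Carol"]
scores = [88, 92, 79]

# traverse two iterables in parallel
for name, score in zip(names, scores):
    print(f"{name}: {score}")

# build a dictionary from paired iterables
grades = dict(zip(names, scores))
print(grades)  # {'Alice': 88, 'Bob': 92, 'Carol': 79}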

By the end of this course, you'll learn:



02 Jun 2020 2:00pm GMT

Chris Moffitt: sidetable - Create Simple Summary Tables in Pandas

Introduction

Today I am happy to announce the release of a new pandas utility library called sidetable. This library makes it easy to build a frequency table and simple summary of missing values in a DataFrame. I have found it to be a useful tool when starting data exploration on a new data set and I hope others find it useful as well.

This project is also an opportunity to illustrate how to use pandas' new API for registering custom DataFrame accessors. This API allows you to build custom functions for working with pandas DataFrames and Series and could be really useful for building out your own library of custom pandas accessor functions.

sidetable

At its core, sidetable is a super-charged version of pandas value_counts with a little bit of crosstab mixed in. For instance, let's look at some data on School Improvement Grants so we can see how sidetable can help us explore a new data set and figure out approaches for more complex analysis.

The only external dependency is pandas version >= 1.0. Make sure it is installed, then install sidetable:

python -m pip install sidetable

Once sidetable is installed, you need to import it to get the pandas accessor registered.

import pandas as pd
import sidetable

df = pd.read_csv('https://github.com/chris1610/pbpython/blob/master/data/school_transform.csv?raw=True', index_col=0)
Data from file

Now that sidetable is imported, you have a new accessor, stb, on all your DataFrames that you can use to build summary tables. For instance, we can use .stb.freq() to build a frequency table showing how many schools were included by state, with cumulative totals and percentages:

df.stb.freq(['State'])
State frequency table

This example shows that CA occurs 92 times and represents 12.15% of the total number of schools. If you include FL in the counts, you now have 163 total schools that represent 21.5% of the total.

For comparison, here's value_counts(normalize=True) next to sidetable's output:

df['State'].value_counts(normalize=True)

I think you'll agree sidetable provides a lot more insight with not much more effort.

But wait, there's more!

What if we want a quick view of the states that contribute around 50% of the total? Use the thresh argument to group all of the rest into an "Others" category:

df.stb.freq(['State'], thresh=.5)
Top 50 percent

This is handy. Now we can see that 8 states contributed almost 50% of the total and all the other states account for the remainder.

If we want, we can rename the catch-all category using the other_label argument:

df.stb.freq(['State'], thresh=.5, other_label='Rest of states')

One of the useful features of sidetable is that it can group columns together to further understand the distribution. For instance, what if we want to see how the various "Transformation Models" are applied across Regions?

df.stb.freq(['Region', 'Model Selected'])
Region and Model Selected

This view is a quick way to understand the interaction and distribution of the various data elements. I find that this is an easy way to explore data and get some insights that might warrant further analysis. A table like this is also easy to share with others since it is relatively simple to understand.

You could definitely perform this analysis with standard pandas (that's all that is behind the scenes, after all). It is cumbersome, though, to remember the code, and my experience is that if it is tough to remember, you are less likely to do it. sidetable tries to make this type of summary very easy to do.
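
For a sense of what that standard-pandas code looks like, here is a rough hand-rolled equivalent of df.stb.freq(['State']) (a sketch only; the column names are approximate and this is not sidetable's actual implementation):

freq = (df['State']
        .value_counts()
        .rename_axis('State')
        .reset_index(name='Count'))
freq['Percent'] = freq['Count'] / freq['Count'].sum()
freq['Cumulative Count'] = freq['Count'].cumsum()
freq['Cumulative Percent'] = freq['Percent'].cumsum()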

Up until now, we have been counting the number of instances. What might be much more interesting is looking at the total breakdown by Award Amount. sidetable allows you to pass a value column that can be summed (instead of counting occurrences).

df.stb.freq(['Region'], value='Award_Amount')
Award distribution

This view shows that the Northeast has the smallest amount of dollars spent on these projects and that 37% of the total spend went to schools in the South region.

Finally, we can look at the types of models selected and determine the 80/20 breakdown of the allocated dollars:

df.stb.freq(['Region', 'Model Selected'],
             value='Award_Amount', thresh=.82,
             other_label='Remaining')
Award distribution

If you're familiar with pandas crosstab, then one way to look at sidetable is that it is an expanded version of a crosstab with some convenience functions to view the data more easily:

Cross tab vs. sidetable
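
For reference, a plain crosstab over the same columns might look something like this (an approximation for comparison, not what sidetable does internally):

pd.crosstab(df['Region'], df['Model Selected'],
            values=df['Award_Amount'], aggfunc='sum',
            margins=True, margins_name='Total')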

One of sidetable's goals is output that is easy to interpret. If you would like to leverage pandas style functions for improved readability, sidetable can format the percentage and amount columns for you. This is off by default but can be enabled by passing style=True to the function:

df.stb.freq(['Region'], value='Award_Amount', style=True)
Formatted tables

So far I have only shown the freq function but in the interest of showing how to add other functions to the library, here's an example of building a simple missing values table:

df.stb.missing()
Missing values

In this table, there are 10 missing values in the Region column that represent a little less than 1.3% of the total values in that column.

You can get similar information using df.info() but I find this table easier to interpret when it comes to quickly identifying missing values:

df.info()

The documentation shows more information on usage and other options. Please check it out and let me know if it is useful to you.

One thing I do want to do is thank three people whose contributions made sidetable work.

  • Peter Baumgartner - For the original inspiration in this tweet thread
  • Steve Miller - For an article that illustrates the value of looking at frequency distributions
  • Ted Petrou - For this post showing how to count null values in a DataFrame

Each of these references was leveraged very heavily to make sidetable. Thank you!

Finally, the functionality in missing is not meant to be a replacement for the excellent missingno module. The implementation included in sidetable is a quick summary version and does not include any of the useful visualizations in missingno.

Introducing the pandas accessor API

If you would like to learn how to build your own accessor, it's actually relatively straightforward. As a reference, you can view the file that does all the work here.

Here's a short summary of how to get started. At the top of your file, import pandas to get access to the decorator:

import pandas as pd

@pd.api.extensions.register_dataframe_accessor("stb")
class SideTableAccessor:

    def __init__(self, pandas_obj):
        self._validate(pandas_obj)
        self._obj = pandas_obj

This portion of code creates the accessor class and defines the accessor name, which I have chosen as stb. Once this is in place, any time you import the Python module containing this code, the accessor is registered and available on all DataFrames.

When the class is instantiated, the current pandas DataFrame is validated through the _validate() method, and then the DataFrame is referenced in subsequent functions using self._obj.

In this case, I don't really do much with the validate method but you could choose to add more logic:

@staticmethod
def _validate(obj):
    # verify this is a DataFrame
    if not isinstance(obj, pd.DataFrame):
        raise AttributeError("Must be a pandas DataFrame")

All of the work is done in the freq and missing functions. For the most part, it is all standard pandas code. You just need to make sure you return a valid DataFrame.

For example, here is the full version of the missing function at the time of this article:

def missing(self, clip_0=False, style=False):
    """ Build table of missing data in each column.

        clip_0 (bool):     In cases where 0 counts are generated, remove them from the list
        style (bool):     Apply a pandas style to format percentages

    Returns:
        DataFrame with each Column including total Missing Values, Percent Missing
        and Total rows
    """
    missing = pd.concat([self._obj.isna().sum(),
                         self._obj.isna().mean()],
                        axis='columns').rename(columns={
                            0: 'Missing',
                            1: 'Percent'
                        })
    missing['Total'] = len(self._obj)
    if clip_0:
        missing = missing[missing['Missing'] > 0]

    results = missing[['Missing', 'Total',
                       'Percent']].sort_values(by=['Missing'],
                                               ascending=False)
    if style:
        format_dict = {'Percent': '{:.2%}', 'Total': '{0:,.0f}'}
        return results.style.format(format_dict)
    else:
        return results

In your "normal" pandas code, you would reference the DataFrame using df; here, you use self._obj as the DataFrame to perform the concatenation and sorting.

I can see this as a very useful approach for building your own custom flavor of pandas functions. If you regularly perform certain transformation, cleaning, or summarizing steps on your data, then this might be an approach to consider instead of copying and pasting the code from file to file.
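
Here is a minimal sketch of what such a custom accessor could look like (the accessor name and method are hypothetical, purely for illustration):

import pandas as pd

@pd.api.extensions.register_dataframe_accessor("clean")
class CleanAccessor:

    def __init__(self, pandas_obj):
        if not isinstance(pandas_obj, pd.DataFrame):
            raise AttributeError("Must be a pandas DataFrame")
        self._obj = pandas_obj

    def snake_case_columns(self):
        """Return a copy with lower_snake_case column names."""
        renamed = {c: c.strip().lower().replace(' ', '_') for c in self._obj.columns}
        return self._obj.rename(columns=renamed)

# usage: df.clean.snake_case_columns()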

Summary

Pandas has a very rich API, but sometimes it can take a lot of typing and wrangling to get the data into a format that is easy to understand. sidetable can make some of those summary tasks a lot easier by building frequency tables on combinations of your data and identifying gaps in your data.

sidetable does not replace any of the sophisticated analysis you will likely need to do to answer complex questions. However, it is a handy tool for quickly analyzing your data and identifying patterns you may want to investigate further.

In addition, I want sidetable to serve as an example of how to build your own pandas accessor that streamlines your normal analysis process.

I hope you find sidetable useful. If you have ideas for improvements or bug reports, head on over to GitHub and let me know. I hope this can grow over time and become a useful tool that helps many others. I am curious to see what the community does with it.

02 Jun 2020 12:45pm GMT

10 Nov 2011

Planet Java

OSDir.com - Java: Oracle Introduces New Java Specification Requests to Evolve Java Community Process

From the Yet Another dept.:

To further its commitment to the Java Community Process (JCP), Oracle has submitted the first of two Java Specification Requests (JSRs) to update and revitalize the JCP.

10 Nov 2011 6:01am GMT

OSDir.com - Java: No copied Java code or weapons of mass destruction found in Android

From the Fact Checking dept.:

ZDNET: Sometimes the sheer wrongness of what is posted on the web leaves us speechless. Especially when it's picked up and repeated as gospel by otherwise reputable sites like Engadget. "Google copied Oracle's Java code, pasted in a new license, and shipped it," they reported this morning.



Sorry, but that just isn't true.

10 Nov 2011 6:01am GMT

OSDir.com - Java: Java SE 7 Released

From the Grande dept.:

Oracle today announced the availability of Java Platform, Standard Edition 7 (Java SE 7), the first release of the Java platform under Oracle stewardship.

10 Nov 2011 6:01am GMT

28 Oct 2011

Planet Ruby

O'Reilly Ruby: MacRuby: The Definitive Guide

Ruby and Cocoa on OS X, the iPhone, and the Device That Shall Not Be Named

28 Oct 2011 8:00pm GMT

14 Oct 2011

Planet Ruby

Charles Oliver Nutter: Why Clojure Doesn't Need Invokedynamic (Unless You Want It to be More Awesome)

This was originally posted as a comment on @fogus's blog post "Why Clojure doesn't need invokedynamic, but it might be nice". I figured it's worth a top-level post here.

Ok, there's some good points here and a few misguided/misinformed positions. I'll try to cover everything.

First, I need to point out a key detail of invokedynamic that may have escaped notice: any case where you must bounce through a generic piece of code to do dispatch -- regardless of how fast that bounce may be -- prevents a whole slew of optimizations from happening. This might affect Java dispatch, if there's any argument-twiddling logic shared between call sites. It would definitely affect multimethods, which are using a hand-implemented PIC. Any case where there's intervening code between the call site and the target would benefit from invokedynamic, since invokedynamic could be used to plumb that logic and let it inline straight through. This is, indeed, the primary benefit of using invokedynamic: arbitrarily complex dispatch logic folds away allowing the dispatch to optimize as if it were direct.

Your point about inference in Java dispatch is a fair one...if Clojure is able to infer all cases, then there's no need to use invokedynamic at all. But unless Clojure is able to infer all cases, then you've got this little performance time bomb just waiting to happen. Tweak some code path and obscure the inference, and kablam, you're back on a slow reflective impl. Invokedynamic would provide a measure of consistency; the only unforeseen perf impact would be when the dispatch turns out to *actually* be polymorphic, in which case even a direct call wouldn't do much better.

For multimethods, the benefit should be clear: the MM selection logic would be mostly implemented using method handles and "leaf" logic, allowing hotspot to inline it everywhere it is used. That means for small-morphic MM call sites, all targets could potentially inline too. That's impossible without invokedynamic unless you generate every MM path immediately around the eventual call.

Now, on to defs and Var lookup. Depending on the cost of Var lookup, using a SwitchPoint-based invalidation plus invokedynamic could be a big win. In Java 7u2, SwitchPoint-based invalidation is essentially free until invalidated, and as you point out that's a rare case. There would essentially be *no* cost in indirecting through a var until that var changes...and then it would settle back into no cost until it changes again. Frequently-changing vars could gracefully degrade to a PIC.

It's also dangerous to understate the impact code size has on JVM optimization. The usual recommendation on the JVM is to move code into many small methods, possibly using call-through logic as in multimethods to reuse the same logic in many places. As I've mentioned, that defeats many optimizations, so the next approach is often to hand-inline logic everywhere it's used, to let the JVM have a more optimizable view of the system. But now we're stepping on our own feet...by adding more bytecode, we're almost certainly impacting the JVM's optimization and inlining budgets.

OpenJDK (and probably the other VMs too) has various limits on how far it will go to optimize code. A large number of these limits are based on the bytecoded size of the target methods. Methods that get too big won't inline, and sometimes won't compile. Methods that inline a lot of code might not get inlined into other methods. Methods that inline one path and eat up too much budget might push out more important calls later on. The only way around this is to reduce bytecode size, which is where invokedynamic comes in.

As of OpenJDK 7u2, MethodHandle logic is not included when calculating inlining budgets. In other words, if you push all the Java dispatch logic or multimethod dispatch logic or var lookup into mostly MethodHandles, you're getting that logic *for free*. That has had a tremendous impact on JRuby performance; I had previous versions of our compiler that did indeed infer static target methods from the interpreter, but they were often *slower* than call site caching solely because the code was considerably larger. With invokedynamic, a call is a call is a call, and the intervening plumbing is not counted against you.

Now, what about negative impacts to Clojure itself...

#0 is a red herring. JRuby supports Java 5, 6, and 7 with only a few hundred lines of changes in the compiler. Basically, the compiler has abstract interfaces for doing things like constant lookup, literal loading, and dispatch that we simply reimplement to use invokedynamic (extending the old non-indy logic for non-indified paths). In order to compile our uses of invokedynamic, we use Rémi Forax's JSR-292 backport, which includes a "mock" jar with all the invokedynamic APIs stubbed out. In our release, we just leave that library out, reflectively load the invokedynamic-based compiler impls, and we're off to the races.

#1 would be fair if the Oracle Java 7u2 early-access drops did not already include the optimizations that gave JRuby those awesome numbers. The biggest of those optimizations was making SwitchPoint free, but also important are the inlining discounting and MutableCallSite improvements. The perf you see for JRuby there can apply to any indirected behavior in Clojure, with the same perf benefits as of 7u2.

For #2, to address the apparent vagueness in my blog post...the big perf gain was largely from using SwitchPoint to invalidate constants rather than pinging a global serial number. Again, indirection folds away if you can shove it into MethodHandles. And it's pretty easy to do it.

#3 is just plain FUD. Oracle has committed to making invokedynamic work well for Java too. The current thinking is that "lambda", the support for closures in Java 7, will use invokedynamic under the covers to implement "function-like" constructs. Oracle has also committed to Nashorn, a fully invokedynamic-based JavaScript implementation, which has many of the same challenges as languages like Ruby or Python. I talked with Adam Messinger at Oracle, who explained to me that Oracle chose JavaScript in part because it's so far away from Java...as I put it (and he agreed) it's going to "keep Oracle honest" about optimizing for non-Java languages. Invokedynamic is driving the future of the JVM, and Oracle knows it all too well.

As for #4...well, all good things take a little effort :) I think the effort required is far lower than you suspect, though.

14 Oct 2011 2:40pm GMT

07 Oct 2011

Planet Ruby

Ruby on Rails: Rails 3.1.1 has been released!

Hi everyone,

Rails 3.1.1 has been released. This release requires at least sass-rails 3.1.4

CHANGES

ActionMailer

ActionPack

ActiveModel

ActiveRecord

ActiveResource

ActiveSupport

Railties

SHA-1

You can find an exhaustive list of changes on GitHub, along with the closed issues marked for v3.1.1.

Thanks to everyone!

07 Oct 2011 5:26pm GMT

21 Mar 2011

Planet Perl

Planet Perl is going dormant

Planet Perl is going dormant. This will be the last post there for a while.

image from planet.perl.org

Why? There are better ways to get your Perl blog fix these days.

You might enjoy some of the following:

Will Planet Perl awaken again in the future? It might! The universe is a big place, filled with interesting places, people and things. You never know what might happen, so keep your towel handy.

21 Mar 2011 2:04am GMT

improving on my little wooden "miniatures"

A few years ago, I wrote about cheap wooden discs as D&D minis, and I've been using them ever since. They do a great job, and cost nearly nothing. For the most part, we've used a few for the PCs, marked with the characters' initials, and the rest for NPCs and enemies, usually marked with numbers.

With D&D 4E, we've tended to have combats with more and more varied enemies. (Minions are wonderful things.) Numbering has become insufficient. It's too hard to remember what numbers are what monster, and to keep initiative order separate from token numbers. In the past, I've colored a few tokens in with the red or green whiteboard markers, and that has been useful. So, this afternoon I found my old paints and painted six sets of five colors. (The black ones I'd already made with sharpies.)

D&D tokens: now in color

I'm not sure what I'll want next: either I'll want five more of each color or I'll want five more colors. More colors will require that I pick up some white paint, while more of those colors will only require that I re-match the secondary colors when mixing. I think I'll wait to see which I end up wanting during real combats.

These colored tokens should work together well with my previous post about using a whiteboard for combat overview. Like-type monsters will get one color, and will all get grouped to one slot on initiative. Last night, for example, the two halfling warriors were red and acted in the same initiative slot. The three halfling minions were unpainted, and acted in another, later slot. Only PCs get their own initiative.

I think that it did a good amount to speed up combat, and that's even when I totally forgot to bring the combat whiteboard (and the character sheets!) with me. Next time, we'll see how it works when it's all brought together.

21 Mar 2011 12:47am GMT

20 Mar 2011

Planet Perl

Perl Vogue T-Shirts

Is Plack the new Black? In Pisa I gave a lightning talk about Perl Vogue. People enjoyed it and for a while I thought that it might actually turn into a project.

I won't, though. It would just take far too much effort. And, besides, a couple of people have pointed out to me that the real Vogue are rather protective of their brand.

So it's not going to happen, I'm afraid. But as a subtle reminder of the ideas behind Perl Vogue I've created some t-shirts containing the article titles from the talk. You can get them from my Spreadshirt shop.

20 Mar 2011 12:02pm GMT