24 Apr 2019

Planet Python

tryexceptpass: Designing Continuous Build Systems

Continuous integration and delivery is finally becoming a common goal for teams of all sizes. After building a couple of these systems at small and medium scales, I wanted to write down ideas, design choices and lessons learned. This post is the first in a series that explores the design of a custom build system created around common development workflows, using off-the-shelf components where possible. You'll get an understanding of the basic components, how they interact, and maybe an open source project with example code from which to start your own.

24 Apr 2019 4:00am GMT

Python Engineering at Microsoft: Python in Visual Studio Code – April 2019 Release

We are pleased to announce that the April 2019 release of the Python Extension for Visual Studio Code is now available. You can download the Python extension from the Marketplace, or install it directly from the extension gallery in Visual Studio Code. You can learn more about Python support in Visual Studio Code in the documentation.

In this release we made a series of improvements that are listed in our changelog, closing a total of 84 issues including:

Keep on reading to learn more!

Variable Explorer and Data Viewer

The Python Interactive experience now comes with a built-in variable explorer along with a data viewer, a highly requested feature from our users. Now you can easily view, inspect and filter the variables in your application, including lists, NumPy arrays, pandas data frames, and more!
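
For instance, here is a minimal sketch (assuming NumPy and pandas are installed, with purely illustrative names) of the kind of code you might run in the Python Interactive window to populate the variable explorer:

import numpy as np
import pandas as pd

scores = [88, 92, 79, 95]                               # appears as a list
samples = np.random.normal(size=(100, 3))               # appears as a NumPy array
frame = pd.DataFrame(samples, columns=["a", "b", "c"])  # inspectable in the Data Viewer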

A variables section will now be shown when running code and cells in the Python Interactive window. Once you expand it, you'll see the list of the variables in the current Jupyter session. More variables will show up automatically as they get used in the code. Clicking on each column header will sort the variables in the table.

You can also double-click on each row or use the "Show variable in data viewer" button to view full data of each variable in the newly-added Data Viewer, as well as perform simple search over its values:

The Data Viewer requires the pandas package, version 0.20 or later, and you will get a message to install or upgrade it if it's not available.

The Variable Explorer is enabled by default. You can turn it off through File > Preferences > Settings by looking for the Python > Data Science: Show Jupyter Variable Explorer setting.

Enhancements to debug configuration

We simplified the process of configuring the debugger for your workspace. When you start debugging (through the Debug Panel, F5, or Debug > Start Debugging) and no debug configuration exists, you will now be prompted to create a debug configuration for your application. Creating a debug configuration is accomplished through a set of menus, instead of manually configuring the launch.json file.

This prompt will also be displayed when adding another debug configuration through the launch.json file:

Additional improvements to the Python Language Server

This release includes several fixes and improvements to the Python Language Server. We added back features that were removed in the 0.2 release: "Rename Symbol", "Go to Definition" and "Find All References", and made improvements to loading time and memory usage when importing scientific libraries such as pandas, Plotly, and PyQt5, especially when running in full Anaconda environments.

To opt in to the Language Server, change the python.jediEnabled setting to false in File > Preferences > User Settings. We are working towards making the Language Server the default in the next few releases, so if you run into problems, please file an issue on the Python Language Server GitHub page.

Other Changes and Enhancements

We have also added small enhancements and fixed issues requested by users that should improve your experience working with Python in Visual Studio Code. Some notable changes include:

Be sure to download the Python extension for Visual Studio Code now to try out the above improvements. If you run into any problems, please file an issue on the Python VS Code GitHub page.

The post Python in Visual Studio Code - April 2019 Release appeared first on Python.

24 Apr 2019 12:41am GMT

23 Apr 2019

Planet Python

PyCoder’s Weekly: Issue #365 (April 23, 2019)

#365 - APRIL 23, 2019

Requests III: HTTP for Humans and Machines

Requests is getting a makeover slated for release in 2020: asyncio, HTTP/2, connection pooling, timeouts, Python 3.6+, and more.
PYTHON-REQUESTS.ORG

Python Learning Paths

Step-by-Step Python learning paths and study plans for beginner, intermediate, and advanced Python developers.
REAL PYTHON

Packaging Python Inside Your Organization

What do you do when your organization uses Python for in-house development and you can't (or don't want to) make everything Open Source? Where do you store and manage your code? How do you distribute your packages? Stefan lays out his approach in this detailed article.
STEFAN SCHERFKE

Find a Python Job Through Vettery


Vettery specializes in developer roles and is completely free for job seekers. Interested? Submit your profile, and if accepted, you can receive interview requests directly from top companies seeking Python devs. Get started →
VETTERY sponsor

Flask vs Django: Choose Your Python Web Framework

What are the main differences between Django and Flask? What are their respective strengths and weaknesses? Read Damian's article to find out.
DAMIAN HITES

Make a Python GUI App for NASA's Image Search API

Learn how to build a Python GUI app for browsing the NASA image library from scratch using the wxPython toolkit.
MIKE DRISCOLL

Positional-Only Parameters for Python

Inside baseball about PEP 570.
JAKE EDGE

Discussions

Why Use Anaconda?

REDDIT

Suitable Tattoo for a Job Interview

Now that's some dedication!
REDDIT

Python Jobs

Senior Python Developer (Copenhagen, Denmark)

GameAnalytics Ltd.

Senior Python Engineer (Remote)

ReCharge Payments

Python Engineer in Healthcare (Burlington, MA)

Nuance Communications

Machine Learning and Data Science Developer (Austin, TX)

Protection Engineering Consultants LLC

More Python Jobs >>>

Articles & Tutorials

How to Work With a PDF in Python

In this step-by-step tutorial, you'll learn how to work with PDF files in Python. You'll see how to extract metadata from preexisting PDFs. You'll also learn how to merge, split, watermark, and rotate pages in PDFs using Python and PyPDF2.
REAL PYTHON

The 10 Most Common Mistakes That Python Developers Make

A list of harmful patterns & pitfalls you can avoid in your own Python code. This is an older post but it still applies as of 2019. Worth a read!
MARTIN CHIKILIAN

Python Opportunities Come to You on Indeed Prime


Indeed prime is a hiring platform exclusively for tech talent like you. If you're accepted, we'll match you with companies and roles that line up with your skills, career goals and salary expectations. Apply for free today.
INDEED sponsor

Sending Emails With Python

In this course, you'll learn how to send emails using Python. Find out how to send plain-text and HTML messages, add files as attachments, and send personalized emails to multiple people. Later on you'll build a CSV-powered email sending script from scratch.
REAL PYTHON video

Getting Started With Google Coral's TPU USB Accelerator

Learn how to get started with your Google Coral TPU Accelerator on Raspberry Pi and Ubuntu. You'll then learn how to perform classification and object detection using Google Coral's USB Accelerator.
ADRIAN ROSEBROCK

What Do Companies Expect From Python Devs in 2019?

"We took 300 job specs for Python developers, scraped from StackOverflow, AngelList, LinkedIn, and the websites of some fast-growing tech companies worldwide. From all these descriptions, we extracted the skills which were mentioned most frequently"
ANDREW STETSENKO • Shared by Andrew Stetsenko

Raspberry Pi for Computer Vision and Deep Learning

You can teach your Raspberry Pi to "see" using Computer Vision, Deep Learning, and OpenCV. Let Adrian Rosebrock show you how →
PYIMAGESEARCH sponsor

Comparison of Top Data Science Libraries for Python, R, and Scala

This is an infographic comparing commit frequency and other metrics for the most popular data science libraries in Python, R, and Scala.
CORIERS.COM

Guide to the Python time Module

Learn how to use Python's time module to represent dates and times in your application, manage code execution, and measure performance.
REAL PYTHON

Creating a Heatmap From Scratch in Python

GEODOSE.COM

Extending Mypy With Plugins

MYPY-LANG.BLOGSPOT.COM • Shared by Ricky White

Projects & Code

puppeteer: An Opinionated Way to Manage Ansible Projects

GITHUB.COM/HAANI-NIYAZ

datatest: Tools for Test Driven Data-Wrangling and Data Validation

GITHUB.COM/SHAWNBROWN • Shared by Shawn Brown

markplates: Templated Line-Based File Inclusions in Markdown Texts

GITHUB.COM/JIMA80525

totext.py: Convert URL or RSS Feed to Plain Text

RAYMII.ORG

PySnooper: Never Use Print for Debugging Again

GITHUB.COM/COOL-RR

dephell: Python Project Management (Packages, Venvs, and More)

GITHUB.COM/DEPHELL

flipper-client: Simple but Powerful Feature Flagging Tool

GITHUB.COM/CARTA

inline_python: Write Python Inline in Your Rust Code

DOCS.RS

Events

SciPy Japan 2019

April 23 to April 25, 2019
SCIPY.ORG

Python Sudeste 2019

April 26 to April 29, 2019
PYTHONSUDESTE.ORG

PythOnRio Meetup

April 27, 2019
PYTHON.ORG.BR

PyDelhi User Group Meetup

April 27, 2019
MEETUP.COM

Dominican Republic Python User Group

April 30, 2019
PYTHON.DO

PyCon US 2019

May 1 to May 10, 2019
PYCON.ORG

PyDays Vienna

May 3 to May 5, 2019
PYDAYS.AT


Happy Pythoning!
This was PyCoder's Weekly Issue #365.



[ Subscribe to 🐍 PyCoder's Weekly 💌 - Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

23 Apr 2019 7:30pm GMT

Codementor: What do companies expect from Python devs in 2019?

What skills does a Python dev need in 2019? Here is what data-driven research shows.

23 Apr 2019 3:13pm GMT

Neckbeard Republic: Sending Emails With Python

In this course, you'll learn how to send emails using Python. Find out how to send plain-text and HTML messages, add files as attachments, and send personalized emails to multiple people. Later on you'll build a CSV-powered email sending script from scratch.

23 Apr 2019 2:00pm GMT

PyCon: Welcome Capital One: Python Software Foundation Principal Sponsor



A big welcome and thank you to Capital One for joining the PSF as a Principal sponsor!

Capital One is also a PyCon 2019 Principal sponsor and is excited to share a few things with attendees, including a deeper look at their intelligent virtual assistant, Eno. Eno's NLP models were built in-house with Python. Eno is a key component of the customer experience at Capital One, proactively looking out for customers and their money. Eno notifies customers about unusual transactions or duplicate charges, helping to stop fraud in its tracks. It also sends bill reminders and makes paying your bill as easy as sending a text or emoji; plus, its new virtual card capabilities let customers shop online without using their real credit card number.

The benefits they've seen by developing Eno with Python are numerous: fast time to market, the ability to prototype and iterate quickly, ease of integration with machine learning frameworks, and extensive support for everything they need (like Kafka and Redis). Plus, they see faster performance using Python's asynchronous I/O.

For Capital One, sponsoring important industry conferences like PyCon brings a lot of benefits, like recruiting and brand awareness, but they're here first and foremost for the community. By sponsoring PyCon, they feel they're helping support, strengthen, and engage with the Python community.

Capital One sees the future of banking as real-time, data-driven, and enabled by machine learning and data science -- and Python plays a big role in that. They have embedded machine learning across the entire enterprise, from call center operations to back-office processes, fraud, internal operations, the customer experience, and much more. To them, machine learning not only creates efficiency and scale on a level not possible before, but it also helps give their customers greater protection, security, confidence, and control of their finances.

Python has been and will continue to be critical to advances in machine learning and data science, so they see a lot of exciting innovation, growth, and potential for the Python community. They hope to share back with the community some of their own insights, best practices, and broader work with Python.

As an open source first organization, Capital One has been working in the open source space for several years -- consuming and contributing code, as well as releasing their own projects. One example of an open source project they'll be showcasing at PyCon is Cloud Custodian. Cloud Custodian is a tool built with Python to allow users to easily define rules to enable a well-managed cloud infrastructure in the enterprise. It's both secure and cost-optimized and consolidates many of the ad-hoc scripts organizations have into a lightweight and flexible tool, with unified metrics and reporting.

They also developed a JavaScript project called Hygieia, a single, configurable dashboard that visualizes the health of an entire software delivery pipeline. All their open source projects are on GitHub and their Python projects can be found here.

According to the Python Software Foundation and JetBrains' 2018 Python Developers Survey, using Python for machine learning grew 7 percentage points since 2017, which is incredible. Machine learning experienced faster growth than Web development, which has only increased by 2 percentage points when compared to the previous year. Capital One is increasingly focused on using machine learning across the enterprise. One recent Python-based project is work they've done in Explainable AI. Their team created a technique called Global Attribution Mapping (GAM), which is capable of explaining neural network predictions across subpopulations. This approach surfaces subpopulations with their most representative explanations, allowing them to inspect global model behavior and essentially make it easier to generate global explanations based on local attributions. You can learn more about the open source tool they developed for GAM along with a recent whitepaper with more details.

Be sure to stop by their booth, #303, and get even more details about how they're using Python.



23 Apr 2019 9:26am GMT

The Code Bits: Printing star patterns in Python: One line tricks!

In this post, we will see how to print some of the common star patterns using Python 3, each with a single line of code!

How to print a half-pyramid pattern in Python?

>>> n = 5
>>> print('\n'.join('*' * i for i in range(1, n+1)))
*
**
***
****
*****
>>>
>>> print('\n'.join('* ' * i for i in range(1, n+1)))
*
* *
* * *
* * * *
* * * * *

How to print a rotated half-pyramid pattern in Python?

>>> n = 5
>>> print('\n'.join(' ' * (n-i) + '*' * (i) for i in range(1, n+1)))
    *
   **
  ***
 ****
*****
>>>
>>> print('\n'.join('  ' * (n-i) + '* ' * (i) for i in range(1, n+1)))
        *
      * *
    * * *
  * * * *
* * * * *

How to print an inverted half-pyramid pattern in Python?

>>> n = 5
>>> print('\n'.join('*' * (n-i) for i in range(n)))
*****
****
***
**
*
>>>
>>> print('\n'.join('* ' * (n-i) for i in range(n)))
* * * * *
* * * *
* * *
* *
*

How to print an inverted and rotated half-pyramid pattern in Python?

>>> n = 5
>>> print('\n'.join(' ' * i + '*' * (n-i) for i in range(n)))
*****
 ****
  ***
   **
    *
>>>
>>> print('\n'.join('  ' * i + '* ' * (n-i) for i in range(n)))
* * * * *
  * * * *
    * * *
      * *
        *

How to print a full triangle pyramid pattern in Python?

>>> n = 5
>>> print('\n'.join(' ' * (n-i) + '* ' * i for i in range(1, n+1)))
    *
   * *
  * * *
 * * * *
* * * * *
>>>
>>> print('\n'.join(' ' * (n-1-i) + '*' * ((i*2)+1) for i in range(n)))
    *
   ***
  *****
 *******
*********

How to print an inverted full triangle pyramid pattern in Python?

>>> n = 5
>>> print('\n'.join(' ' * (n-i) + '* ' * i for i in range(n, 0, -1)))
* * * * *
 * * * *
  * * *
   * *
    *
>>>
>>> print('\n'.join(' ' * (n-i) + '*' * ((i*2)-1) for i in range(n, 0, -1)))
*********
 *******
  *****
   ***
    *

23 Apr 2019 7:06am GMT

Catalin George Festila: Testing firebase with Python 3.7.3 .

Today's tutorial covers using the Firebase service with Python 3.7.3. As you know, Firebase offers multiple free and paid services. In order to use it from the Python programming language, we need to use the pip utility to install the required modules. If your installation requires other Python modules then you will need to install them in the same way. C:\Python373>pip install

23 Apr 2019 4:56am GMT

22 Apr 2019

Planet Python

NumFOCUS: NumFOCUS Projects to Apply for Inaugural Google Season of Docs

The post NumFOCUS Projects to Apply for Inaugural Google Season of Docs appeared first on NumFOCUS.

22 Apr 2019 9:41pm GMT

Podcast.__init__: Exploring Indico: A Full Featured Event Management Platform

Managing an event is rife with inherent complexity that scales as you move from scheduling a meeting to organizing a conference. Indico is a platform built at CERN to handle their efforts to organize events such as the Computing in High Energy Physics (CHEP) conference, and now it has grown to manage booking of meeting rooms. In this episode Adrian Mönnich, core developer on the Indico project, explains how it is architected to facilitate this use case, how it has evolved since its first incarnation two decades ago, and what he has learned while working on it. The Indico platform is definitely a feature rich and mature platform that is worth considering if you are responsible for organizing a conference or need a room booking system for your office.

Summary

Managing an event is rife with inherent complexity that scales as you move from scheduling a meeting to organizing a conference. Indico is a platform built at CERN to handle their efforts to organize events such as the Computing in High Energy Physics (CHEP) conference, and now it has grown to manage booking of meeting rooms. In this episode Adrian Mönnich, core developer on the Indico project, explains how it is architected to facilitate this use case, how it has evolved since its first incarnation two decades ago, and what he has learned while working on it. The Indico platform is definitely a feature rich and mature platform that is worth considering if you are responsible for organizing a conference or need a room booking system for your office.

Announcements

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you've got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show!
  • Bots and automation are taking over whole categories of online interaction. Discover.bot is an online community designed to serve as a platform-agnostic digital space for bot developers and enthusiasts of all skill levels to learn from one another, share their stories, and move the conversation forward together. They regularly publish guides and resources to help you learn about topics such as bot development, using them for business, and the latest in chatbot news. For newcomers to the space they have the Beginners Guide To Bots that will teach you the basics of how bots work, what they can do, and where they are developed and published. To help you choose the right framework and avoid the confusion about which NLU features and platform APIs you will need they have compiled a list of the major options and how they compare. Go to pythonpodcast.com/discoverbot today to get started and thank them for their support of the show.
  • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don't want to miss out on this year's conference season. We have partnered with organizations such as O'Reilly Media, Dataversity, and the Open Data Science Conference. Go to pythonpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
  • Your host as usual is Tobias Macey and today I'm interviewing Adrian Mönnich about Indico, the effortless open-source tool for event organisation, archival and collaboration

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing what Indico is and how the project got started?
    • What are some other projects which target a similar use case and what were they lacking that led to Indico being necessary?
  • Can you talk through an example workflow for setting up and managing an event in Indico?
    • How does the lifecycle change when working with larger events, such as PyCon?
  • Can you describe how Indico is architected and how its design has evolved since it was first built?
    • What are some of the most complex or challenging portions of Indico to implement and maintain?
  • There are a lot of areas for exercising constraint resolution algorithms. Can you talk through some of the business logic of how that operates?
  • Most of Indico is highly configurable and flexible. How do you approach managing sane defaults to prevent users getting overwhelmed when onboarding?
    • What is your approach to testing given how complex the project is?
  • What are some of the most interesting or unexpected ways that you have seen Indico used?
  • What are some of the most interesting/unexpected lessons that you have learned in the process of building Indico?
  • What do you have planned for the future of the project?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

22 Apr 2019 6:18pm GMT

Codementor: Variable references in Python

Variable references in Python.

22 Apr 2019 3:58pm GMT

ListenData: Loops in Python explained with examples

This tutorial covers various ways to execute loops in Python. A loop is an important concept in any programming language: it performs iterations, i.e. runs specific code repeatedly until a certain condition is reached.

Real Time Examples of Loop

  1. The software of an ATM machine runs in a loop, processing transaction after transaction until you confirm that you have nothing more to do.
  2. A software program on a mobile device gives the user 5 attempts to enter the password; after that, it resets the device.
  3. Putting your favorite song on repeat mode is also a loop.
  4. You want to run a particular analysis on each column of your data set.

1. For Loop

Like R and the C programming language, you can use a for loop in Python. It is one of the most commonly used loop constructs for automating repetitive tasks.

How for loop works?

Suppose you are asked to print the sequence of numbers from 1 to 9, incrementing by 2.
for i in range(1,10,2):
    print(i)
Output
1
3
5
7
9
range(1,10,2) starts at 1 and ends at 9 (10 is excluded), incrementing by 2.

Iteration over list
This section covers how to run a for loop over a list.
mylist = [30,21,33,42,53,64,71,86,97,10]
for i in mylist:
    print(i)
Output
30
21
33
42
53
64
71
86
97
10

Suppose you need to select every 3rd value of the list.
for i in mylist[::3]:
    print(i)
Output
30
42
71
10
mylist[::3] is equivalent to mylist[0::3], which follows the slicing syntax list[start:stop:step].
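
For instance, the same start:stop:step syntax also works with explicit bounds; a small illustration using the list defined above:
print(mylist[1:8:2])
Output
[21, 42, 64, 86]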

Python Loop Explained with Examples

Example 1: Create a new list containing only the items from a list that are between 0 and 10
l1 = [100, 1, 10, 2, 3, 5, 8, 13, 21, 34, 55, 98]

new = []  # blank list
for i in l1:
    if i > 0 and i <= 10:
        new.append(i)

new
Output: [1, 10, 2, 3, 5, 8]
It can also be done with the NumPy package by creating the list as a NumPy array. See the code below.

import numpy as np
k = np.array(l1)
new = k[np.where(k <= 10)]


Example 2: Check which letters (a-z) appear in a string

Suppose you have a string named k and you want to check which letters of the alphabet exist in the string k.
k = "deepanshu"

import string
for n in string.ascii_lowercase:
    if n in k:
        print(n + ' exists in ' + k)
    else:
        print(n + ' does not exist in ' + k)
string.ascii_lowercase returns 'abcdefghijklmnopqrstuvwxyz'.

Practical Examples : for in loop in Python

Create a sample pandas data frame for illustrative purposes.
import pandas as pd
import numpy as np

np.random.seed(234)
df = pd.DataFrame({"x1" : np.random.randint(low=1, high=100, size=10),
                   "Month1" : np.random.normal(size=10),
                   "Month2" : np.random.normal(size=10),
                   "Month3" : np.random.normal(size=10),
                   "price" : range(10)
                   })

df
1. Multiply each month column by 1.2
for i in range(1,4):
    print(df["Month"+str(i)]*1.2)

range(1,4) returns 1, 2 and 3. The str() function is used to convert a number to a string, so "Month" + str(1) means Month1.

2. Store computed columns in new data frame
import pandas as pd
newDF = pd.DataFrame()
for i in range(1,4):
    data = pd.DataFrame(df["Month"+str(i)]*1.2)
    newDF = pd.concat([newDF, data], axis=1)

pd.DataFrame() is used to create a blank data frame. The concat() function from the pandas package concatenates two data frames.


3. If the value of x1 >= 50, multiply each month column by price; otherwise keep the month value as is.
import pandas as pd
import numpy as np
for i in range(1,4):
    df['newcol'+str(i)] = np.where(df['x1'] >= 50,
                                   df['Month'+str(i)] * df['price'],
                                   df['Month'+str(i)])

In this example, we add new columns named newcol1, newcol2 and newcol3. np.where(condition, value_if_true, value_if_false) is used to construct an IF-ELSE statement.
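
As a standalone illustration of np.where, independent of the data frame above (a small sketch with made-up numbers):
import numpy as np
x = np.array([10, 60, 30, 80])
print(np.where(x >= 50, x * 2, x))
Output
[ 10 120  30 160]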


4. Filter the data frame by each unique value of a column and store the result in a separate data frame
mydata = pd.DataFrame({"X1" : ["A","A","B","B","C"]})

for name in mydata.X1.unique():
    temp = pd.DataFrame(mydata[mydata.X1 == name])
    exec('{} = temp'.format(name))

The unique() function returns the distinct values of a column. The exec() function is used for dynamic execution of a Python statement. See the usage of the string format() function below:

s= "Your Input"
"i am {}".format(s)

Output: 'i am Your Input'

Loop Control Statements

Loop control statements change the flow of execution from its normal sequence.

Python supports the following control statements.
  1. Continue statement
  2. Break statement

Continue Statement
When a continue statement is executed, it skips the rest of the code in the current iteration and continues with the next iteration.
In the code below, we skip printing the letters a and d.
for n in "abcdef":
    if n == "a" or n == "d":
        continue
    print("letter :", n)
letter : b
letter : c
letter : e
letter : f
Break Statement
When a break statement runs, it breaks or stops the loop.
In this program, when n is either c or d, the loop stops executing.
for n in "abcdef":
    if n == "c" or n == "d":
        break
    print("letter :", n)
letter : a
letter : b

for loop with else clause

Using an else clause with a for loop is not common in the Python developer community.

The else clause executes after the loop completes normally, that is, when the loop did not encounter a break statement.

The program below finds factors for the numbers from 2 to 9. The else clause reports numbers that have no factors (other than 1 and themselves) and are therefore prime:

for k in range(2, 10):
    for y in range(2, k):
        if k % y == 0:
            print(k, '=', y, '*', round(k/y))
            break
    else:
        print(k, 'is a prime number')
2 is a prime number
3 is a prime number
4 = 2 * 2
5 is a prime number
6 = 2 * 3
7 is a prime number
8 = 2 * 4
9 = 3 * 3

While Loop

A while loop executes code repeatedly as long as a condition holds true. When the condition becomes false, the line immediately after the loop is executed.
i = 1
while i < 10:
    print(i)
    i += 2  # means i = i + 2
    print("new i :", i)
Output:
1
new i : 3
3
new i : 5
5
new i : 7
7
new i : 9
9
new i : 11

While Loop with If-Else Statement

An if-else statement can be used along with a while loop. See the program below:

counter = 1
while counter <= 5:
    if counter < 2:
        print("Less than 2")
    elif counter > 4:
        print("Greater than 4")
    else:
        print(">= 2 and <=4")
    counter += 1

About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains.

Let's Get Connected: LinkedIn

22 Apr 2019 2:10pm GMT

ListenData: Loops in Python explained with examples

This tutorial covers various ways to execute loops in python. Loops is an important concept of any programming language which performs iterations i.e. run specific code repeatedly until a certain condition is reached.

Real Time Examples of Loop

  1. Software of the ATM machine is in a loop to process transaction after transaction until you acknowledge that you have no more to do.
  2. Software program in a mobile device allows user to unlock the mobile with 5 password attempts. After that it resets mobile device.
  3. You put your favorite song on a repeat mode. It is also a loop.
  4. You want to run a particular analysis on each column of your data set.

1. For Loop

Like R and C programming language, you can use for loop in Python. It is one of the most commonly used loop method to automate the repetitive tasks.

How for loop works?

Suppose you are asked to print sequence of numbers from 1 to 9, increment by 2.
for i in range(1,10,2):
print(i)
Output
1
3
5
7
9
range(1,10,2) means starts from 1 and ends with 9 (excluding 10), increment by 2.

Iteration over list
This section covers how to run for in loop on a list.
mylist = [30,21,33,42,53,64,71,86,97,10]
for i in mylist:
print(i)
Output
30
21
33
42
53
64
71
86
97
10

Suppose you need to select every 3rd value of list.
for i in mylist[::3]:
print(i)
Output
30
42
71
10
mylist[::3] is equivalent to mylist[0::3] which follows this syntax style list[start:stop:step]

Python Loop Explained with Examples

Example 1 : Create a new list with only items from list that is between 0 and 10
l1 = [100, 1, 10, 2, 3, 5, 8, 13, 21, 34, 55, 98]

new = [] #Blank list
for i in l1:
if i > 0 and i <= 10:
new.append(i)

new
Output: [1, 10, 2, 3, 5, 8]
It can also be done via numpy package by creating list as numpy array. See the code below.

import numpy as np
k=np.array(l1)
new=k[np.where(k<=10)]


Example 2 : Check which alphabet (a-z) is mentioned in string

Suppose you have a string named k and you want to check which alphabet exists in the string k.
k = "deepanshu"

import string
for n in string.ascii_lowercase:
if n in k:
print(n + ' exists in ' + k)
else:
print(n + ' does not exist in ' + k)
string.ascii_lowercase returns 'abcdefghijklmnopqrstuvwxyz'.

Practical Examples : for in loop in Python

Create sample pandas data frame for illustrative purpose.
import pandas as pd
np.random.seed(234)
df = pd.DataFrame({"x1" : np.random.randint(low=1, high=100, size=10),
"Month1" : np.random.normal(size=10),
"Month2" : np.random.normal(size=10),
"Month3" : np.random.normal(size=10),
"price" : range(10)
})

df
1. Multiple each month column by 1.2
for i in range(1,4):
print(df["Month"+str(i)]*1.2)

range(1,4) returns 1, 2 and 3. str( ) function is used to covert to string. "Month" + str(1) means Month1.

2. Store computed columns in new data frame
import pandas as pd
newDF = pd.DataFrame()
for i in range(1,4):
data = pd.DataFrame(df["Month"+str(i)]*1.2)
newDF=pd.concat([newDF,data], axis=1)

pd.DataFrame( ) is used to create blank data frame. The concat() function from pandas package is used to concatenate two data frames.


3. Check if value of x1 >= 50, multiply each month cost by price. Otherwise same as month.
import pandas as pd
import numpy as np
for i in range(1,4):
df['newcol'+str(i)] = np.where(df['x1'] >= 50,
df['Month'+str(i)] * df['price'],
df['Month'+str(i)])

In this example, we are adding new columns named newcol1, newcol2 and newcol3. np.where(condition, value_if condition meets, value_if condition does not meet) is used to construct IF ELSE statement.


4. Filter data frame by each unique value of a column and store it in a separate data frame
mydata = pd.DataFrame({"X1" : ["A", "A", "B", "B", "C"]})

for name in mydata.X1.unique():
    temp = pd.DataFrame(mydata[mydata.X1 == name])
    exec('{} = temp'.format(name))

The unique() function returns the distinct values of a variable. The exec() function dynamically executes a Python statement. See the usage of the string format() function below -

s= "Your Input"
"i am {}".format(s)

Output: 'i am Your Input'

Loop Control Statements

Loop control statements change execution from its normal sequence of iterations: they let you skip the rest of an iteration or exit the loop early.

Python supports the following control statements.
  1. Continue statement
  2. Break statement

Continue Statement
When the continue statement is executed, it skips the rest of the code in the current iteration and continues with the next one.
In the code below, we prevent the letters a and d from being printed.
for n in "abcdef":
if n =="a" or n =="d":
continue
print("letter :", n)
letter : b
letter : c
letter : e
letter : f
Break Statement
When the break statement runs, it stops the loop immediately.
In this program, when n is either c or d, the loop stops executing.
for n in "abcdef":
if n =="c" or n =="d":
break
print("letter :", n)
letter : a
letter : b

for loop with else clause

Using an else clause with a for loop is not common in the Python developer community.

The else clause executes after the loop completes normally, that is, when the loop did not encounter a break statement.

The program below calculates factors for numbers between 2 and 10. The else clause prints the numbers that have no factors and are therefore prime:

for k in range(2, 10):
    for y in range(2, k):
        if k % y == 0:
            print(k, '=', y, '*', round(k / y))
            break
    else:
        print(k, 'is a prime number')
2 is a prime number
3 is a prime number
4 = 2 * 2
5 is a prime number
6 = 2 * 3
7 is a prime number
8 = 2 * 4
9 = 3 * 3

While Loop

A while loop executes code repeatedly as long as a condition holds true. When the condition becomes false, the line immediately after the loop is executed.
i = 1
while i < 10:
    print(i)
    i += 2  # means i = i + 2
    print("new i :", i)
Output:
1
new i : 3
3
new i : 5
5
new i : 7
7
new i : 9
9
new i : 11

While Loop with If-Else Statement

An if-else statement can be used inside a while loop. See the program below -

counter = 1
while counter <= 5:
    if counter < 2:
        print("Less than 2")
    elif counter > 4:
        print("Greater than 4")
    else:
        print(">= 2 and <= 4")
    counter += 1

About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains.

Let's Get Connected: LinkedIn

22 Apr 2019 2:10pm GMT

Real Python: A Beginner’s Guide to the Python time Module

The Python time module provides many ways of representing time in code, such as objects, numbers, and strings. It also provides functionality other than representing time, like waiting during code execution and measuring the efficiency of your code.

This article will walk you through the most commonly used functions and objects in time.

By the end of this article, you'll be able to:

You'll start by learning how you can use a floating point number to represent time.

Free Bonus: Click here to get our free Python Cheat Sheet that shows you the basics of Python 3, like working with data types, dictionaries, lists, and Python functions.

Dealing With Python Time Using Seconds

One of the ways you can manage the concept of Python time in your application is by using a floating point number that represents the number of seconds that have passed since the beginning of an era, that is, since a certain starting point.

Let's dive deeper into what that means, why it's useful, and how you can use it to implement logic, based on Python time, in your application.

The Epoch

You learned in the previous section that you can manage Python time with a floating point number representing elapsed time since the beginning of an era.

Merriam-Webster defines an era as:

The important concept to grasp here is that, when dealing with Python time, you're considering a period of time identified by a starting point. In computing, you call this starting point the epoch.

The epoch, then, is the starting point against which you can measure the passage of time.

For example, if you define the epoch to be midnight on January 1, 1970 UTC (the epoch as defined on Windows and most UNIX systems), then you can represent midnight on January 2, 1970 UTC as 86400 seconds since the epoch.

This is because there are 60 seconds in a minute, 60 minutes in an hour, and 24 hours in a day. January 2, 1970 UTC is only one day after the epoch, so you can apply basic math to arrive at that result:

>>>
>>> 60 * 60 * 24
86400

It is also important to note that you can still represent time before the epoch. The number of seconds would just be negative.

For example, you would represent midnight on December 31, 1969 UTC (using an epoch of January 1, 1970) as -86400 seconds.
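As a minimal sketch, on systems whose gmtime() accepts negative values (it can raise an OSError on some platforms, notably Windows), you can check this directly:

>>>
>>> import time
>>> time.gmtime(-86400)
time.struct_time(tm_year=1969, tm_mon=12, tm_mday=31, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=365, tm_isdst=0)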

While January 1, 1970 UTC is a common epoch, it is not the only epoch used in computing. In fact, different operating systems, filesystems, and APIs sometimes use different epochs.

As you saw before, UNIX systems define the epoch as January 1, 1970. The Win32 API, on the other hand, defines the epoch as January 1, 1601.

You can use time.gmtime() to determine your system's epoch:

>>>
>>> import time
>>> time.gmtime(0)
time.struct_time(tm_year=1970, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=3, tm_yday=1, tm_isdst=0)

You'll learn about gmtime() and struct_time throughout the course of this article. For now, just know that you can use time to discover the epoch using this function.

Now that you understand more about how to measure time in seconds using an epoch, let's take a look at Python's time module to see what functions it offers that help you do so.

Python Time in Seconds as a Floating Point Number

First, time.time() returns the number of seconds that have passed since the epoch. The return value is a floating point number to account for fractional seconds:

>>>
>>> from time import time
>>> time()
1551143536.9323719

The number you get on your machine may be very different because the reference point considered to be the epoch may be very different.

Further Reading: Python 3.7 introduced time_ns(), which returns an integer value representing the same elapsed time since the epoch, but in nanoseconds rather than seconds.
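As a quick illustrative sketch on Python 3.7 or later (the exact value will differ on your machine; here it roughly matches the time() call shown above):

>>>
>>> from time import time_ns
>>> time_ns()
1551143536932371968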

Measuring time in seconds is useful for a number of reasons:

Sometimes, however, you may want to see the current time represented as a string. To do so, you can pass the number of seconds you get from time() into time.ctime().

Python Time in Seconds as a String Representing Local Time

As you saw before, you may want to convert the Python time, represented as the number of elapsed seconds since the epoch, to a string. You can do so using ctime():

>>>
>>> from time import time, ctime
>>> t = time()
>>> ctime(t)
'Mon Feb 25 19:11:59 2019'

Here, you've recorded the current time in seconds into the variable t, then passed t as an argument to ctime(), which returns a string representation of that same time.

Technical Detail: The argument, representing seconds since the epoch, is optional according to the ctime() definition. If you don't pass an argument, then ctime() uses the return value of time() by default. So, you could simplify the example above:

>>>
>>> from time import ctime
>>> ctime()
'Mon Feb 25 19:11:59 2019'

The string representation of time, also known as a timestamp, returned by ctime() is formatted with the following structure:

  1. Day of the week: Mon (Monday)
  2. Month of the year: Feb (February)
  3. Day of the month: 25
  4. Hours, minutes, and seconds using the 24-hour clock notation: 19:11:59
  5. Year: 2019

The previous example displays the timestamp of a particular moment captured from a computer in the South Central region of the United States. But, let's say you live in Sydney, Australia, and you executed the same command at the same instant.

Instead of the above output, you'd see the following:

>>>
>>> from time import time, ctime
>>> t = time()
>>> ctime(t)
'Tue Feb 26 12:11:59 2019'

Notice that the day of week, day of month, and hour portions of the timestamp are different than the first example.

These outputs are different because the timestamp returned by ctime() depends on your geographical location.

Note: While the concept of time zones is relative to your physical location, you can modify this in your computer's settings without actually relocating.

The representation of time dependent on your physical location is called local time and makes use of a concept called time zones.

Note: Since local time is related to your locale, timestamps often account for locale-specific details such as the order of the elements in the string and translations of the day and month abbreviations. ctime() ignores these details.

Let's dig a little deeper into the notion of time zones so that you can better understand Python time representations.

Understanding Time Zones

A time zone is a region of the world that conforms to a standardized time. Time zones are defined by their offset from Coordinated Universal Time (UTC) and, potentially, the inclusion of daylight savings time (which we'll cover in more detail later in this article).

Fun Fact: If you're a native English speaker, you might be wondering why the abbreviation for "Coordinated Universal Time" is UTC rather than the more obvious CUT. However, if you're a native French speaker, you would call it "Temps Universel Coordonné," which suggests a different abbreviation: TUC.

Ultimately, the International Telecommunication Union and the International Astronomical Union compromised on UTC as the official abbreviation so that, regardless of language, the abbreviation would be the same.

UTC and Time Zones

UTC is the time standard against which all the world's timekeeping is synchronized (or coordinated). It is not, itself, a time zone but rather a transcendent standard that defines what time zones are.

UTC time is precisely measured using astronomical time, referring to the Earth's rotation, and atomic clocks.

Time zones are then defined by their offset from UTC. For example, in North and South America, the Central Time Zone (CT) is behind UTC by five or six hours and, therefore, uses the notation UTC-5:00 or UTC-6:00.

Sydney, Australia, on the other hand, belongs to the Australian Eastern Time Zone (AET), which is ten or eleven hours ahead of UTC (UTC+10:00 or UTC+11:00).

This difference (UTC-6:00 to UTC+10:00) is the reason for the variance you observed in the two outputs from ctime() in the previous examples:

These times are exactly sixteen hours apart, which is consistent with the time zone offsets mentioned above.

You may be wondering why CT can be either five or six hours behind UTC or why AET can be ten or eleven hours ahead. The reason for this is that some areas around the world, including parts of these time zones, observe daylight savings time.

Daylight Savings Time

Summer months generally experience more daylight hours than winter months. Because of this, some areas observe daylight savings time (DST) during the spring and summer to make better use of those daylight hours.

For places that observe DST, their clocks will jump ahead one hour at the beginning of spring (effectively losing an hour). Then, in the fall, the clocks will be reset to standard time.

The letters S and D represent standard time and daylight savings time in time zone notation:

When you represent times as timestamps in local time, it is always important to consider whether DST is applicable or not.

ctime() accounts for daylight savings time. So, the output difference listed previously would be more accurate as the following:

Dealing With Python Time Using Data Structures

Now that you have a firm grasp on many fundamental concepts of time including epochs, time zones, and UTC, let's take a look at more ways to represent time using the Python time module.

Python Time as a Tuple

Instead of using a number to represent Python time, you can use another primitive data structure: a tuple.

The tuple allows you to manage time a little more easily by abstracting some of the data and making it more readable.

When you represent time as a tuple, each element in your tuple corresponds to a specific element of time:

  1. Year
  2. Month as an integer, ranging between 1 (January) and 12 (December)
  3. Day of the month
  4. Hour as an integer, ranging between 0 (12 A.M.) and 23 (11 P.M.)
  5. Minute
  6. Second
  7. Day of the week as an integer, ranging between 0 (Monday) and 6 (Sunday)
  8. Day of the year
  9. Daylight savings time as an integer with the following values:
    • 1 is daylight savings time.
    • 0 is standard time.
    • -1 is unknown.

Using the methods you've already learned, you can represent the same Python time in two different ways:

>>>
>>> from time import time, ctime
>>> t = time()
>>> t
1551186415.360564
>>> ctime(t)
'Tue Feb 26 07:06:55 2019'

>>> time_tuple = (2019, 2, 26, 7, 6, 55, 1, 57, 0)

In this case, both t and time_tuple represent the same time, but the tuple provides a more readable interface for working with time components.

Technical Detail: Actually, if you look at the Python time represented by time_tuple in seconds (which you'll see how to do later in this article), you'll see that it resolves to 1551186415.0 rather than 1551186415.360564.

This is because the tuple doesn't have a way to represent fractional seconds.

While the tuple provides a more manageable interface for working with Python time, there is an even better object: struct_time.

Python Time as an Object

The problem with the tuple construct is that it still looks like a bunch of numbers, even though it's better organized than a single, seconds-based number.

struct_time provides a solution to this by utilizing namedtuple, from Python's collections module, to associate the tuple's sequence of numbers with useful identifiers:

>>>
>>> from time import struct_time
>>> time_tuple = (2019, 2, 26, 7, 6, 55, 1, 57, 0)
>>> time_obj = struct_time(time_tuple)
>>> time_obj
time.struct_time(tm_year=2019, tm_mon=2, tm_mday=26, tm_hour=7, tm_min=6, tm_sec=55, tm_wday=1, tm_yday=57, tm_isdst=0)

Technical Detail: If you're coming from another language, the terms struct and object might be in opposition to one another.

In Python, there is no data type called struct. Instead, everything is an object.

However, the name struct_time is derived from the C-based time library where the data type is actually a struct.

In fact, Python's time module, which is implemented in C, uses this struct directly by including the header file time.h.

Now, you can access specific elements of time_obj using the attribute's name rather than an index:

>>>
>>> day_of_year = time_obj.tm_yday
>>> day_of_year
57
>>> day_of_month = time_obj.tm_mday
>>> day_of_month
26

Beyond the readability and usability of struct_time, it is also important to know because it is the return type of many of the functions in the Python time module.

Converting Python Time in Seconds to an Object

Now that you've seen the three primary ways of working with Python time, you'll learn how to convert between the different time data types.

Converting between time data types is dependent on whether the time is in UTC or local time.

Coordinated Universal Time (UTC)

The epoch uses UTC for its definition rather than a time zone. Therefore, the seconds elapsed since the epoch is not variable depending on your geographical location.

However, the same cannot be said of struct_time. The object representation of Python time may or may not take your time zone into account.

There are two ways to convert a float representing seconds to a struct_time:

  1. UTC
  2. Local time

To convert a Python time float to a UTC-based struct_time, the Python time module provides a function called gmtime().

You've seen gmtime() used once before in this article:

>>>
>>> import time
>>> time.gmtime(0)
time.struct_time(tm_year=1970, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=3, tm_yday=1, tm_isdst=0)

You used this call to discover your system's epoch. Now, you have a better foundation for understanding what's actually happening here.

gmtime() converts the number of elapsed seconds since the epoch to a struct_time in UTC. In this case, you've passed 0 as the number of seconds, meaning you're trying to find the epoch, itself, in UTC.

Note: Notice the attribute tm_isdst is set to 0. This attribute represents whether the time zone is using daylight savings time. UTC never subscribes to DST, so that flag will always be 0 when using gmtime().

As you saw before, struct_time cannot represent fractional seconds, so gmtime() ignores the fractional seconds in the argument:

>>>
>>> import time
>>> time.gmtime(1.99)
time.struct_time(tm_year=1970, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=1, tm_wday=3, tm_yday=1, tm_isdst=0)

Notice that even though the number of seconds you passed was very close to 2, the .99 fractional seconds were simply ignored, as shown by tm_sec=1.

The secs parameter for gmtime() is optional, meaning you can call gmtime() with no arguments. Doing so will provide the current time in UTC:

>>>
>>> import time
>>> time.gmtime()
time.struct_time(tm_year=2019, tm_mon=2, tm_mday=28, tm_hour=12, tm_min=57, tm_sec=24, tm_wday=3, tm_yday=59, tm_isdst=0)

Interestingly, there is no inverse for this function within time. Instead, you'll have to look in Python's calendar module for a function named timegm():

>>>
>>> import calendar
>>> import time
>>> time.gmtime()
time.struct_time(tm_year=2019, tm_mon=2, tm_mday=28, tm_hour=13, tm_min=23, tm_sec=12, tm_wday=3, tm_yday=59, tm_isdst=0)
>>> calendar.timegm(time.gmtime())
1551360204

timegm() takes a tuple (or struct_time, since it is a subclass of tuple) and returns the corresponding number of seconds since the epoch.

Historical Context: If you're interested in why timegm() is not in time, you can view the discussion in Python Issue 6280.

In short, it was originally added to calendar because time closely follows C's time library (defined in time.h), which contains no matching function. The above-mentioned issue proposed the idea of moving or copying timegm() into time.

However, with advances to the datetime library, inconsistencies in the patched implementation of time.timegm(), and a question of how to then handle calendar.timegm(), the maintainers declined the patch, encouraging the use of datetime instead.

Working with UTC is valuable in programming because it's a standard. You don't have to worry about DST, time zone, or locale information.

That said, there are plenty of cases when you'd want to use local time. Next, you'll see how to convert from seconds to local time so that you can do just that.

Local Time

In your application, you may need to work with local time rather than UTC. Python's time module provides a function for getting local time from the number of seconds elapsed since the epoch called localtime().

The signature of localtime() is similar to gmtime() in that it takes an optional secs argument, which it uses to build a struct_time using your local time zone:

>>>
>>> import time
>>> time.time()
1551448206.86196
>>> time.localtime(1551448206.86196)
time.struct_time(tm_year=2019, tm_mon=3, tm_mday=1, tm_hour=7, tm_min=50, tm_sec=6, tm_wday=4, tm_yday=60, tm_isdst=0)

Notice that tm_isdst=0. Since DST matters with local time, tm_isdst will change between 0 and 1 depending on whether or not DST is applicable for the given time. Since tm_isdst=0, DST is not applicable for March 1, 2019.

In the United States in 2019, daylight savings time begins on March 10. So, to test if the DST flag will change correctly, you need to add 9 days' worth of seconds to the secs argument.

To compute this, you take the number of seconds in a day (86,400) and multiply that by 9 days:

>>>
>>> new_secs = 1551448206.86196 + (86400 * 9)
>>> time.localtime(new_secs)
time.struct_time(tm_year=2019, tm_mon=3, tm_mday=10, tm_hour=8, tm_min=50, tm_sec=6, tm_wday=6, tm_yday=69, tm_isdst=1)

Now, you'll see that the struct_time shows the date to be March 10, 2019 with tm_isdst=1. Also, notice that tm_hour has also jumped ahead, to 8 instead of 7 in the previous example, because of daylight savings time.

Since Python 3.3, struct_time has also included two attributes that are useful in determining the time zone of the struct_time:

  1. tm_zone
  2. tm_gmtoff

At first, these attributes were platform dependent, but they have been available on all platforms since Python 3.6.

First, tm_zone stores the local time zone:

>>>
>>> import time
>>> current_local = time.localtime()
>>> current_local.tm_zone
'CST'

Here, you can see that localtime() returns a struct_time with the time zone set to CST (Central Standard Time).

As you saw before, you can also tell the time zone based on two pieces of information, the UTC offset and DST (if applicable):

>>>
>>> import time
>>> current_local = time.localtime()
>>> current_local.tm_gmtoff
-21600
>>> current_local.tm_isdst
0

In this case, you can see that current_local is 21600 seconds behind GMT, which stands for Greenwich Mean Time. GMT is the time zone with no UTC offset: UTC±00:00.

21600 seconds divided by seconds per hour (3,600) means that current_local time is GMT-06:00 (or UTC-06:00).

You can use the GMT offset plus the DST status to deduce that current_local is UTC-06:00 at standard time, which corresponds to the Central standard time zone.
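As a small sanity check, assuming the same Central Standard Time session used in these examples, you can convert the offset to hours directly:

>>>
>>> current_local.tm_gmtoff / 3600
-6.0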

Like gmtime(), you can ignore the secs argument when calling localtime(), and it will return the current local time in a struct_time:

>>>
>>> import time
>>> time.localtime()
time.struct_time(tm_year=2019, tm_mon=3, tm_mday=1, tm_hour=8, tm_min=34, tm_sec=28, tm_wday=4, tm_yday=60, tm_isdst=0)

Unlike gmtime(), the inverse function of localtime() does exist in the Python time module. Let's take a look at how that works.

Converting a Local Time Object to Seconds

You've already seen how to convert a UTC time object to seconds using calendar.timegm(). To convert local time to seconds, you'll use mktime().

mktime() requires you to pass a parameter called t that takes the form of either a normal 9-tuple or a struct_time object representing local time:

>>>
>>> import time

>>> time_tuple = (2019, 3, 10, 8, 50, 6, 6, 69, 1)
>>> time.mktime(time_tuple)
1552225806.0

>>> time_struct = time.struct_time(time_tuple)
>>> time.mktime(time_struct)
1552225806.0

It's important to keep in mind that t must be a tuple representing local time, not UTC:

>>>
>>> from time import gmtime, mktime

>>> # 1
>>> current_utc = gmtime()
>>> current_utc
time.struct_time(tm_year=2019, tm_mon=3, tm_mday=1, tm_hour=14, tm_min=51, tm_sec=19, tm_wday=4, tm_yday=60, tm_isdst=0)

>>> # 2
>>> current_utc_secs = mktime(current_utc)
>>> current_utc_secs
1551473479.0

>>> # 3
>>> gmtime(current_utc_secs)
time.struct_time(tm_year=2019, tm_mon=3, tm_mday=1, tm_hour=20, tm_min=51, tm_sec=19, tm_wday=4, tm_yday=60, tm_isdst=0)

Note: For this example, assume that the local time is March 1, 2019 08:51:19 CST.

This example shows why it's important to use mktime() with local time, rather than UTC:

  1. gmtime() with no argument returns a struct_time using UTC. current_utc shows March 1, 2019 14:51:19 UTC. This is accurate because CST is UTC-06:00, so UTC should be 6 hours ahead of local time.

  2. mktime() tries to return the number of seconds, expecting local time, but you passed current_utc instead. So, instead of understanding that current_utc is UTC time, it assumes you meant March 1, 2019 14:51:19 CST.

  3. gmtime() is then used to convert those seconds back into UTC, which results in an inconsistency. The time is now March 1, 2019 20:51:19 UTC. The reason for this discrepancy is the fact that mktime() expected local time. So, the conversion back to UTC adds another 6 hours to local time.

Working with time zones is notoriously difficult, so it's important to set yourself up for success by understanding the differences between UTC and local time and the Python time functions that deal with each.

Converting a Python Time Object to a String

While working with tuples is fun and all, sometimes it's best to work with strings.

String representations of time, also known as timestamps, help make times more readable and can be especially useful for building intuitive user interfaces.

There are two Python time functions that you use for converting a time.struct_time object to a string:

  1. asctime()
  2. strftime()

You'll begin by learning about asctime().

asctime()

You use asctime() for converting a time tuple or struct_time to a timestamp:

>>>
>>> import time
>>> time.asctime(time.gmtime())
'Fri Mar  1 18:42:08 2019'
>>> time.asctime(time.localtime())
'Fri Mar  1 12:42:15 2019'

Both gmtime() and localtime() return struct_time instances, for UTC and local time respectively.

You can use asctime() to convert either struct_time to a timestamp. asctime() works similarly to ctime(), which you learned about earlier in this article, except instead of passing a floating point number, you pass a tuple. Even the timestamp format is the same between the two functions.

As with ctime(), the parameter for asctime() is optional. If you do not pass a time object to asctime(), then it will use the current local time:

>>>
>>> import time
>>> time.asctime()
'Fri Mar  1 12:56:07 2019'

As with ctime(), it also ignores locale information.

One of the biggest drawbacks of asctime() is its format inflexibility. strftime() solves this problem by allowing you to format your timestamps.

strftime()

You may find yourself in a position where the string format from ctime() and asctime() isn't satisfactory for your application. Instead, you may want to format your strings in a way that's more meaningful to your users.

One example of this is if you would like to display your time in a string that takes locale information into account.

To format strings, given a struct_time or Python time tuple, you use strftime(), which stands for "string format time."

strftime() takes two arguments:

  1. format specifies the order and form of the time elements in your string.
  2. t is an optional time tuple.

To format a string, you use directives. Directives are character sequences that begin with a % that specify a particular time element, such as:

For example, you can output the date in your local time using the ISO 8601 standard like this:

>>>
>>> import time
>>> time.strftime('%Y-%m-%d', time.localtime())
'2019-03-01'

Further Reading: While representing dates using Python time is completely valid and acceptable, you should also consider using Python's datetime module, which provides shortcuts and a more robust framework for working with dates and times together.

For example, you can simplify outputting a date in the ISO 8601 format using datetime:

>>>
>>> from datetime import date
>>> date(year=2019, month=3, day=1).isoformat()
'2019-03-01'

As you saw before, a great benefit of using strftime() over asctime() is its ability to render timestamps that make use of locale-specific information.

For example, if you want to represent the date and time in a locale-sensitive way, you can't use asctime():

>>>
>>> from time import asctime
>>> asctime()
'Sat Mar  2 15:21:14 2019'

>>> import locale
>>> locale.setlocale(locale.LC_TIME, 'zh_HK')  # Chinese - Hong Kong
'zh_HK'
>>> asctime()
'Sat Mar  2 15:58:49 2019'

Notice that even after programmatically changing your locale, asctime() still returns the date and time in the same format as before.

Technical Detail: LC_TIME is the locale category for date and time formatting. The locale argument 'zh_HK' may be different, depending on your system.

When you use strftime(), however, you'll see that it accounts for locale:

>>>
>>> from time import strftime, localtime
>>> strftime('%c', localtime())
'Sat Mar  2 15:23:20 2019'

>>> import locale
>>> locale.setlocale(locale.LC_TIME, 'zh_HK')  # Chinese - Hong Kong
'zh_HK'
>>> strftime('%c', localtime())
'六  3/ 2 15:58:12 2019'

Here, you have successfully utilized the locale information because you used strftime().

Note: %c is the directive for locale-appropriate date and time.

If the time tuple is not passed to the parameter t, then strftime() will use the result of localtime() by default. So, you could simplify the examples above by removing the optional second argument:

>>>
>>> from time import strftime
>>> strftime('The current local datetime is: %c')
'The current local datetime is: Fri Mar  1 23:18:32 2019'

Here, you've used the default time instead of passing your own as an argument. Also, notice that the format argument can consist of text other than formatting directives.

Further Reading: Check out this thorough list of directives available to strftime().

The Python time module also includes the inverse operation of converting a timestamp back into a struct_time object.

Converting a Python Time String to an Object

When you're working with date and time related strings, it can be very valuable to convert the timestamp to a time object.

To convert a time string to a struct_time, you use strptime(), which stands for "string parse time":

>>>
>>> from time import strptime
>>> strptime('2019-03-01', '%Y-%m-%d')
time.struct_time(tm_year=2019, tm_mon=3, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=4, tm_yday=60, tm_isdst=-1)

The first argument to strptime() must be the timestamp you wish to convert. The second argument is the format that the timestamp is in.

The format parameter is optional and defaults to '%a %b %d %H:%M:%S %Y'. Therefore, if you have a timestamp in that format, you don't need to pass it as an argument:

>>>
>>> strptime('Fri Mar 01 23:38:40 2019')
time.struct_time(tm_year=2019, tm_mon=3, tm_mday=1, tm_hour=23, tm_min=38, tm_sec=40, tm_wday=4, tm_yday=60, tm_isdst=-1)

Since a struct_time has 9 key date and time components, strptime() must provide reasonable default values for those components it can't parse from the string.

In the previous examples, tm_isdst=-1. This means that strptime() can't determine by the timestamp whether it represents daylight savings time or not.

Now you know how to work with Python times and dates using the time module in a variety of ways. However, there are other uses for time outside of simply creating time objects, getting Python time strings, and using seconds elapsed since the epoch.

Suspending Execution

One really useful Python time function is sleep(), which suspends the thread's execution for a specified amount of time.

For example, you can suspend your program's execution for 10 seconds like this:

>>>
>>> from time import sleep, strftime
>>> strftime('%c')
'Fri Mar  1 23:49:26 2019'
>>> sleep(10)
>>> strftime('%c')
'Fri Mar  1 23:49:36 2019'

Your program will print the first formatted datetime string, then pause for 10 seconds, and finally print the second formatted datetime string.

You can also pass fractional seconds to sleep():

>>>
>>> from time import sleep
>>> sleep(0.5)

sleep() is useful for testing or making your program wait for any reason, but you must be careful not to halt your production code unless you have good reason to do so.

Before Python 3.5, a signal sent to your process could interrupt sleep(). However, in 3.5 and later, sleep() will always suspend execution for at least the amount of specified time, even if the process receives a signal.

sleep() is just one Python time function that can help you test your programs and make them more robust.

Measuring Performance

You can use time to measure the performance of your program.

The way you do this is to use perf_counter(), which, as the name suggests, provides a high-resolution performance counter for measuring short durations of time.

To use perf_counter(), you place a counter before your code begins execution as well as after your code's execution completes:

>>>
>>> from time import perf_counter, sleep
>>> def longrunning_function():
...     for i in range(1, 11):
...         sleep(i / i ** 2)
...
>>> start = perf_counter()
>>> longrunning_function()
>>> end = perf_counter()
>>> execution_time = (end - start)
>>> execution_time
8.201258441999926

First, start captures the moment before you call the function. end captures the moment after the function returns. The function's total execution time took (end - start) seconds.

Technical Detail: Python 3.7 introduced perf_counter_ns(), which works the same as perf_counter(), but uses nanoseconds instead of seconds.

perf_counter() (or perf_counter_ns()) is the most precise way to measure the performance of your code using one execution. However, if you're trying to accurately gauge the performance of a code snippet, I recommend using the Python timeit module.

timeit specializes in running code many times to get a more accurate performance analysis and helps you to avoid oversimplifying your time measurement as well as other common pitfalls.
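As a minimal sketch of what that looks like (the statement being timed and the reported duration are purely illustrative):

>>>
>>> import timeit
>>> timeit.timeit('sum(range(100))', number=10_000)  # total seconds for 10,000 runs
0.0102238479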

Conclusion

Congratulations! You now have a great foundation for working with dates and times in Python.

Now, you're able to:

On top of all that, you've learned some fundamental concepts surrounding date and time, such as:

Now, it's time for you to apply your newfound knowledge of Python time in your real world applications!

Further Reading

If you want to continue learning more about using dates and times in Python, take a look at the following modules:



22 Apr 2019 2:00pm GMT

Real Python: A Beginner’s Guide to the Python time Module

The Python time module provides many ways of representing time in code, such as objects, numbers, and strings. It also provides functionality other than representing time, like waiting during code execution and measuring the efficiency of your code.

This article will walk you through the most commonly used functions and objects in time.

By the end of this article, you'll be able to:

You'll start by learning how you can use a floating point number to represent time.

Free Bonus: Click here to get our free Python Cheat Sheet that shows you the basics of Python 3, like working with data types, dictionaries, lists, and Python functions.

Dealing With Python Time Using Seconds

One of the ways you can manage the concept of Python time in your application is by using a floating point number that represents the number of seconds that have passed since the beginning of an era-that is, since a certain starting point.

Let's dive deeper into what that means, why it's useful, and how you can use it to implement logic, based on Python time, in your application.

The Epoch

You learned in the previous section that you can manage Python time with a floating point number representing elapsed time since the beginning of an era.

Merriam-Webster defines an era as:

The important concept to grasp here is that, when dealing with Python time, you're considering a period of time identified by a starting point. In computing, you call this starting point the epoch.

The epoch, then, is the starting point against which you can measure the passage of time.

For example, if you define the epoch to be midnight on January 1, 1970 UTC-the epoch as defined on Windows and most UNIX systems-then you can represent midnight on January 2, 1970 UTC as 86400 seconds since the epoch.

This is because there are 60 seconds in a minute, 60 minutes in an hour, and 24 hours in a day. January 2, 1970 UTC is only one day after the epoch, so you can apply basic math to arrive at that result:

>>>
>>> 60 * 60 * 24
86400

It is also important to note that you can still represent time before the epoch. The number of seconds would just be negative.

For example, you would represent midnight on December 31, 1969 UTC (using an epoch of January 1, 1970) as -86400 seconds.

While January 1, 1970 UTC is a common epoch, it is not the only epoch used in computing. In fact, different operating systems, filesystems, and APIs sometimes use different epochs.

As you saw before, UNIX systems define the epoch as January 1, 1970. The Win32 API, on the other hand, defines the epoch as January 1, 1601.

You can use time.gmtime() to determine your system's epoch:

>>>
>>> import time
>>> time.gmtime(0)
time.struct_time(tm_year=1970, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=3, tm_yday=1, tm_isdst=0)

You'll learn about gmtime() and struct_time throughout the course of this article. For now, just know that you can use time to discover the epoch using this function.

Now that you understand more about how to measure time in seconds using an epoch, let's take a look at Python's time module to see what functions it offers that help you do so.

Python Time in Seconds as a Floating Point Number

First, time.time() returns the number of seconds that have passed since the epoch. The return value is a floating point number to account for fractional seconds:

>>>
>>> from time import time
>>> time()
1551143536.9323719

The number you get on your machine may be very different because the reference point considered to be the epoch may be very different.

Further Reading: Python 3.7 introduced time_ns(), which returns an integer value representing the same elapsed time since the epoch, but in nanoseconds rather than seconds.

Measuring time in seconds is useful for a number of reasons:

Sometimes, however, you may want to see the current time represented as a string. To do so, you can pass the number of seconds you get from time() into time.ctime().

Python Time in Seconds as a String Representing Local Time

As you saw before, you may want to convert the Python time, represented as the number of elapsed seconds since the epoch, to a string. You can do so using ctime():

>>>
>>> from time import time, ctime
>>> t = time()
>>> ctime(t)
'Mon Feb 25 19:11:59 2019'

Here, you've recorded the current time in seconds into the variable t, then passed t as an argument to ctime(), which returns a string representation of that same time.

Technical Detail: The argument, representing seconds since the epoch, is optional according to the ctime() definition. If you don't pass an argument, then ctime() uses the return value of time() by default. So, you could simplify the example above:

>>>
>>> from time import ctime
>>> ctime()
'Mon Feb 25 19:11:59 2019'

The string representation of time, also known as a timestamp, returned by ctime() is formatted with the following structure:

  1. Day of the week: Mon (Monday)
  2. Month of the year: Feb (February)
  3. Day of the month: 25
  4. Hours, minutes, and seconds using the 24-hour clock notation: 19:11:59
  5. Year: 2019

The previous example displays the timestamp of a particular moment captured from a computer in the South Central region of the United States. But, let's say you live in Sydney, Australia, and you executed the same command at the same instant.

Instead of the above output, you'd see the following:

>>>
>>> from time import time, ctime
>>> t = time()
>>> ctime(t)
'Tue Feb 26 12:11:59 2019'

Notice that the day of week, day of month, and hour portions of the timestamp are different than the first example.

These outputs are different because the timestamp returned by ctime() depends on your geographical location.

Note: While the concept of time zones is relative to your physical location, you can modify this in your computer's settings without actually relocating.

The representation of time dependent on your physical location is called local time and makes use of a concept called time zones.

Note: Since local time is related to your locale, timestamps often account for locale-specific details such as the order of the elements in the string and translations of the day and month abbreviations. ctime() ignores these details.

Let's dig a little deeper into the notion of time zones so that you can better understand Python time representations.

Understanding Time Zones

A time zone is a region of the world that conforms to a standardized time. Time zones are defined by their offset from Coordinated Universal Time (UTC) and, potentially, the inclusion of daylight savings time (which we'll cover in more detail later in this article).

Fun Fact: If you're a native English speaker, you might be wondering why the abbreviation for "Coordinated Universal Time" is UTC rather than the more obvious CUT. However, if you're a native French speaker, you would call it "Temps Universel Coordonné," which suggests a different abbreviation: TUC.

Ultimately, the International Telecommunication Union and the International Astronomical Union compromised on UTC as the official abbreviation so that, regardless of language, the abbreviation would be the same.

UTC and Time Zones

UTC is the time standard against which all the world's timekeeping is synchronized (or coordinated). It is not, itself, a time zone but rather a transcendent standard that defines what time zones are.

UTC time is precisely measured using astronomical time, referring to the Earth's rotation, and atomic clocks.

Time zones are then defined by their offset from UTC. For example, in North and South America, the Central Time Zone (CT) is behind UTC by five or six hours and, therefore, uses the notation UTC-5:00 or UTC-6:00.

Sydney, Australia, on the other hand, belongs to the Australian Eastern Time Zone (AET), which is ten or eleven hours ahead of UTC (UTC+10:00 or UTC+11:00).

This difference (UTC-6:00 to UTC+10:00) is the reason for the variance you observed in the two outputs from ctime() in the previous examples:

These times are exactly sixteen hours apart, which is consistent with the time zone offsets mentioned above.

You may be wondering why CT can be either five or six hours behind UTC or why AET can be ten or eleven hours ahead. The reason for this is that some areas around the world, including parts of these time zones, observe daylight savings time.

Daylight Savings Time

Summer months generally experience more daylight hours than winter months. Because of this, some areas observe daylight savings time (DST) during the spring and summer to make better use of those daylight hours.

For places that observe DST, their clocks will jump ahead one hour at the beginning of spring (effectively losing an hour). Then, in the fall, the clocks will be reset to standard time.

The letters S and D represent standard time and daylight savings time in time zone notation:

When you represent times as timestamps in local time, it is always important to consider whether DST is applicable or not.

ctime() accounts for daylight savings time. So, the output difference listed previously would be more accurate as the following:

Dealing With Python Time Using Data Structures

Now that you have a firm grasp on many fundamental concepts of time including epochs, time zones, and UTC, let's take a look at more ways to represent time using the Python time module.

Python Time as a Tuple

Instead of using a number to represent Python time, you can use another primitive data structure: a tuple.

The tuple allows you to manage time a little more easily by abstracting some of the data and making it more readable.

When you represent time as a tuple, each element in your tuple corresponds to a specific element of time:

  1. Year
  2. Month as an integer, ranging between 1 (January) and 12 (December)
  3. Day of the month
  4. Hour as an integer, ranging between 0 (12 A.M.) and 23 (11 P.M.)
  5. Minute
  6. Second
  7. Day of the week as an integer, ranging between 0 (Monday) and 6 (Sunday)
  8. Day of the year
  9. Daylight savings time as an integer with the following values:
    • 1 is daylight savings time.
    • 0 is standard time.
    • -1 is unknown.

Using the methods you've already learned, you can represent the same Python time in two different ways:

>>>
>>> from time import time, ctime
>>> t = time()
>>> t
1551186415.360564
>>> ctime(t)
'Tue Feb 26 07:06:55 2019'

>>> time_tuple = (2019, 2, 26, 7, 6, 55, 1, 57, 0)

In this case, both t and time_tuple represent the same time, but the tuple provides a more readable interface for working with time components.

Technical Detail: Actually, if you look at the Python time represented by time_tuple in seconds (which you'll see how to do later in this article), you'll see that it resolves to 1551186415.0 rather than 1551186415.360564.

This is because the tuple doesn't have a way to represent fractional seconds.

While the tuple provides a more manageable interface for working with Python time, there is an even better object: struct_time.

Python Time as an Object

The problem with the tuple construct is that it still looks like a bunch of numbers, even though it's better organized than a single, seconds-based number.

struct_time provides a solution to this by utilizing NamedTuple, from Python's collections module, to associate the tuple's sequence of numbers with useful identifiers:

>>>
>>> from time import struct_time
>>> time_tuple = (2019, 2, 26, 7, 6, 55, 1, 57, 0)
>>> time_obj = struct_time(time_tuple)
>>> time_obj
time.struct_time(tm_year=2019, tm_mon=2, tm_mday=26, tm_hour=7, tm_min=6, tm_sec=55, tm_wday=1, tm_yday=57, tm_isdst=0)

Technical Detail: If you're coming from another language, the terms struct and object might be in opposition to one another.

In Python, there is no data type called struct. Instead, everything is an object.

However, the name struct_time is derived from the C-based time library where the data type is actually a struct.

In fact, Python's time module, which is implemented in C, uses this struct directly by including the header file times.h.

Now, you can access specific elements of time_obj using the attribute's name rather than an index:

>>>
>>> day_of_year = time_obj.tm_yday
>>> day_of_year
57
>>> day_of_month = time_obj.tm_mday
>>> day_of_month
26

Beyond the readability and usability of struct_time, it is also important to know because it is the return type of many of the functions in the Python time module.

Converting Python Time in Seconds to an Object

Now that you've seen the three primary ways of working with Python time, you'll learn how to convert between the different time data types.

Converting between time data types is dependent on whether the time is in UTC or local time.

Coordinated Universal Time (UTC)

The epoch uses UTC for its definition rather than a time zone. Therefore, the seconds elapsed since the epoch is not variable depending on your geographical location.

However, the same cannot be said of struct_time. The object representation of Python time may or may not take your time zone into account.

There are two ways to convert a float representing seconds to a struct_time:

  1. UTC
  2. Local time

To convert a Python time float to a UTC-based struct_time, the Python time module provides a function called gmtime().

You've seen gmtime() used once before in this article:

>>>
>>> import time
>>> time.gmtime(0)
time.struct_time(tm_year=1970, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=3, tm_yday=1, tm_isdst=0)

You used this call to discover your system's epoch. Now, you have a better foundation for understanding what's actually happening here.

gmtime() converts the number of elapsed seconds since the epoch to a struct_time in UTC. In this case, you've passed 0 as the number of seconds, meaning you're trying to find the epoch, itself, in UTC.

Note: Notice the attribute tm_isdst is set to 0. This attribute represents whether the time zone is using daylight savings time. UTC never subscribes to DST, so that flag will always be 0 when using gmtime().

As you saw before, struct_time cannot represent fractional seconds, so gmtime() ignores the fractional seconds in the argument:

>>>
>>> import time
>>> time.gmtime(1.99)
time.struct_time(tm_year=1970, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=1, tm_wday=3, tm_yday=1, tm_isdst=0)

Notice that even though the number of seconds you passed was very close to 2, the .99 fractional seconds were simply ignored, as shown by tm_sec=1.

The secs parameter for gmtime() is optional, meaning you can call gmtime() with no arguments. Doing so will provide the current time in UTC:

>>>
>>> import time
>>> time.gmtime()
time.struct_time(tm_year=2019, tm_mon=2, tm_mday=28, tm_hour=12, tm_min=57, tm_sec=24, tm_wday=3, tm_yday=59, tm_isdst=0)

Interestingly, there is no inverse for this function within time. Instead, you'll have to look in Python's calendar module for a function named timegm():

>>>
>>> import calendar
>>> import time
>>> time.gmtime()
time.struct_time(tm_year=2019, tm_mon=2, tm_mday=28, tm_hour=13, tm_min=23, tm_sec=12, tm_wday=3, tm_yday=59, tm_isdst=0)
>>> calendar.timegm(time.gmtime())
1551360204

timegm() takes a tuple (or struct_time, since it is a subclass of tuple) and returns the corresponding number of seconds since the epoch.

Historical Context: If you're interested in why timegm() is not in time, you can view the discussion in Python Issue 6280.

In short, it was originally added to calendar because time closely follows C's time library (defined in time.h), which contains no matching function. The above-mentioned issue proposed the idea of moving or copying timegm() into time.

However, with advances to the datetime library, inconsistencies in the patched implementation of time.timegm(), and a question of how to then handle calendar.timegm(), the maintainers declined the patch, encouraging the use of datetime instead.

Working with UTC is valuable in programming because it's a standard. You don't have to worry about DST, time zone, or locale information.

That said, there are plenty of cases when you'd want to use local time. Next, you'll see how to convert from seconds to local time so that you can do just that.

Local Time

In your application, you may need to work with local time rather than UTC. Python's time module provides a function for getting local time from the number of seconds elapsed since the epoch called localtime().

The signature of localtime() is similar to gmtime() in that it takes an optional secs argument, which it uses to build a struct_time using your local time zone:

>>>
>>> import time
>>> time.time()
1551448206.86196
>>> time.localtime(1551448206.86196)
time.struct_time(tm_year=2019, tm_mon=3, tm_mday=1, tm_hour=7, tm_min=50, tm_sec=6, tm_wday=4, tm_yday=60, tm_isdst=0)

Notice that tm_isdst=0. Since DST matters with local time, tm_isdst will change between 0 and 1 depending on whether or not DST is applicable for the given time. Since tm_isdst=0, DST is not applicable for March 1, 2019.

In the United States in 2019, daylight savings time begins on March 10. So, to test if the DST flag will change correctly, you need to add 9 days' worth of seconds to the secs argument.

To compute this, you take the number of seconds in a day (86,400) and multiply that by 9 days:

>>>
>>> new_secs = 1551448206.86196 + (86400 * 9)
>>> time.localtime(new_secs)
time.struct_time(tm_year=2019, tm_mon=3, tm_mday=10, tm_hour=8, tm_min=50, tm_sec=6, tm_wday=6, tm_yday=69, tm_isdst=1)

Now, you'll see that the struct_time shows the date to be March 10, 2019 with tm_isdst=1. Also, notice that tm_hour has also jumped ahead, to 8 instead of 7 in the previous example, because of daylight savings time.

Since Python 3.3, struct_time has also included two attributes that are useful in determining the time zone of the struct_time:

  1. tm_zone
  2. tm_gmtoff

At first, these attributes were platform dependent, but they have been available on all platforms since Python 3.6.

First, tm_zone stores the local time zone:

>>>
>>> import time
>>> current_local = time.localtime()
>>> current_local.tm_zone
'CST'

Here, you can see that localtime() returns a struct_time with the time zone set to CST (Central Standard Time).

As you saw before, you can also tell the time zone based on two pieces of information, the UTC offset and DST (if applicable):

>>>
>>> import time
>>> current_local = time.localtime()
>>> current_local.tm_gmtoff
-21600
>>> current_local.tm_isdst
0

In this case, you can see that current_local is 21600 seconds behind GMT, which stands for Greenwich Mean Time. GMT is the time zone with no UTC offset: UTC±00:00.

21600 seconds divided by seconds per hour (3,600) means that current_local time is GMT-06:00 (or UTC-06:00).

You can use the GMT offset plus the DST status to deduce that current_local is UTC-06:00 at standard time, which corresponds to the Central standard time zone.

Like gmtime(), you can ignore the secs argument when calling localtime(), and it will return the current local time in a struct_time:

>>>
>>> import time
>>> time.localtime()
time.struct_time(tm_year=2019, tm_mon=3, tm_mday=1, tm_hour=8, tm_min=34, tm_sec=28, tm_wday=4, tm_yday=60, tm_isdst=0)

Unlike gmtime(), the inverse function of localtime() does exist in the Python time module. Let's take a look at how that works.

Converting a Local Time Object to Seconds

You've already seen how to convert a UTC time object to seconds using calendar.timegm(). To convert local time to seconds, you'll use mktime().

mktime() requires you to pass a parameter called t that takes the form of either a normal 9-tuple or a struct_time object representing local time:

>>>
>>> import time

>>> time_tuple = (2019, 3, 10, 8, 50, 6, 6, 69, 1)
>>> time.mktime(time_tuple)
1552225806.0

>>> time_struct = time.struct_time(time_tuple)
>>> time.mktime(time_struct)
1552225806.0

It's important to keep in mind that t must be a tuple representing local time, not UTC:

>>>
>>> from time import gmtime, mktime

>>> # 1
>>> current_utc = time.gmtime()
>>> current_utc
time.struct_time(tm_year=2019, tm_mon=3, tm_mday=1, tm_hour=14, tm_min=51, tm_sec=19, tm_wday=4, tm_yday=60, tm_isdst=0)

>>> # 2
>>> current_utc_secs = mktime(current_utc)
>>> current_utc_secs
1551473479.0

>>> # 3
>>> time.gmtime(current_utc_secs)
time.struct_time(tm_year=2019, tm_mon=3, tm_mday=1, tm_hour=20, tm_min=51, tm_sec=19, tm_wday=4, tm_yday=60, tm_isdst=0)

Note: For this example, assume that the local time is March 1, 2019 08:51:19 CST.

This example shows why it's important to use mktime() with local time, rather than UTC:

  1. gmtime() with no argument returns a struct_time using UTC. current_utc shows March 1, 2019 14:51:19 UTC. This is accurate because CST is UTC-06:00, so UTC should be 6 hours ahead of local time.

  2. mktime() tries to return the number of seconds, expecting local time, but you passed current_utc instead. So, instead of understanding that current_utc is UTC time, it assumes you meant March 1, 2019 14:51:19 CST.

  3. gmtime() is then used to convert those seconds back into UTC, which results in an inconsistency. The time is now March 1, 2019 20:51:19 UTC. The reason for this discrepancy is the fact that mktime() expected local time. So, the conversion back to UTC adds another 6 hours to local time.

Working with time zones is notoriously difficult, so it's important to set yourself up for success by understanding the differences between UTC and local time and the Python time functions that deal with each.

Converting a Python Time Object to a String

While working with tuples is fun and all, sometimes it's best to work with strings.

String representations of time, also known as timestamps, help make times more readable and can be especially useful for building intuitive user interfaces.

There are two Python time functions that you use for converting a time.struct_time object to a string:

  1. asctime()
  2. strftime()

You'll begin by learning aboutasctime().

asctime()

You use asctime() for converting a time tuple or struct_time to a timestamp:

>>>
>>> import time
>>> time.asctime(time.gmtime())
'Fri Mar  1 18:42:08 2019'
>>> time.asctime(time.localtime())
'Fri Mar  1 12:42:15 2019'

Both gmtime() and localtime() return struct_time instances, for UTC and local time respectively.

You can use asctime() to convert either struct_time to a timestamp. asctime() works similarly to ctime(), which you learned about earlier in this article, except instead of passing a floating point number, you pass a tuple. Even the timestamp format is the same between the two functions.

As with ctime(), the parameter for asctime() is optional. If you do not pass a time object to asctime(), then it will use the current local time:

>>>
>>> import time
>>> time.asctime()
'Fri Mar  1 12:56:07 2019'

As with ctime(), it also ignores locale information.

One of the biggest drawbacks of asctime() is its format inflexibility. strftime() solves this problem by allowing you to format your timestamps.

strftime()

You may find yourself in a position where the string format from ctime() and asctime() isn't satisfactory for your application. Instead, you may want to format your strings in a way that's more meaningful to your users.

One example of this is if you would like to display your time in a string that takes locale information into account.

To format strings, given a struct_time or Python time tuple, you use strftime(), which stands for "string format time."

strftime() takes two arguments:

  1. format specifies the order and form of the time elements in your string.
  2. t is an optional time tuple.

To format a string, you use directives. Directives are character sequences that begin with a % and specify a particular time element, such as:

  1. %Y: the year with century (for example, 2019)
  2. %m: the month as a zero-padded number (for example, 03)
  3. %d: the day of the month as a zero-padded number (for example, 01)

For example, you can output the date in your local time using the ISO 8601 standard like this:

>>>
>>> import time
>>> time.strftime('%Y-%m-%d', time.localtime())
'2019-03-01'

Further Reading: While representing dates using Python time is completely valid and acceptable, you should also consider using Python's datetime module, which provides shortcuts and a more robust framework for working with dates and times together.

For example, you can simplify outputting a date in the ISO 8601 format using datetime:

>>>
>>> from datetime import date
>>> date(year=2019, month=3, day=1).isoformat()
'2019-03-01'

As you saw before, a great benefit of using strftime() over asctime() is its ability to render timestamps that make use of locale-specific information.

For example, if you want to represent the date and time in a locale-sensitive way, you can't use asctime():

>>>
>>> from time import asctime
>>> asctime()
'Sat Mar  2 15:21:14 2019'

>>> import locale
>>> locale.setlocale(locale.LC_TIME, 'zh_HK')  # Chinese - Hong Kong
'zh_HK'
>>> asctime()
'Sat Mar  2 15:58:49 2019'

Notice that even after programmatically changing your locale, asctime() still returns the date and time in the same format as before.

Technical Detail: LC_TIME is the locale category for date and time formatting. The locale argument 'zh_HK' may be different, depending on your system.

When you use strftime(), however, you'll see that it accounts for locale:

>>>
>>> from time import strftime, localtime
>>> strftime('%c', localtime())
'Sat Mar  2 15:23:20 2019'

>>> import locale
>>> locale.setlocale(locale.LC_TIME, 'zh_HK')  # Chinese - Hong Kong
'zh_HK'
>>> strftime('%c', localtime())
'六  3/ 2 15:58:12 2019'

Here, you have successfully utilized the locale information because you used strftime().

Note: %c is the directive for locale-appropriate date and time.

If the time tuple is not passed to the parameter t, then strftime() will use the result of localtime() by default. So, you could simplify the examples above by removing the optional second argument:

>>>
>>> from time import strftime
>>> strftime('The current local datetime is: %c')
'The current local datetime is: Fri Mar  1 23:18:32 2019'

Here, you've used the default time instead of passing your own as an argument. Also, notice that the format argument can consist of text other than formatting directives.

Further Reading: Check out this thorough list of directives available to strftime().

The Python time module also includes the inverse operation of converting a timestamp back into a struct_time object.

Converting a Python Time String to an Object

When you're working with date and time related strings, it can be very valuable to convert the timestamp to a time object.

To convert a time string to a struct_time, you use strptime(), which stands for "string parse time":

>>>
>>> from time import strptime
>>> strptime('2019-03-01', '%Y-%m-%d')
time.struct_time(tm_year=2019, tm_mon=3, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=4, tm_yday=60, tm_isdst=-1)

The first argument to strptime() must be the timestamp you wish to convert. The second argument is the format that the timestamp is in.

The format parameter is optional and defaults to '%a %b %d %H:%M:%S %Y'. Therefore, if you have a timestamp in that format, you don't need to pass it as an argument:

>>>
>>> strptime('Fri Mar 01 23:38:40 2019')
time.struct_time(tm_year=2019, tm_mon=3, tm_mday=1, tm_hour=23, tm_min=38, tm_sec=40, tm_wday=4, tm_yday=60, tm_isdst=-1)

Since a struct_time has 9 key date and time components, strptime() must provide reasonable default values for the components it can't parse from the string.

In the previous examples, tm_isdst=-1. This means that strptime() can't determine from the timestamp alone whether it represents daylight saving time.
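Here's a minimal sketch of those defaults in action: when only the year is supplied, the other fields fall back to their default values:

>>>
>>> from time import strptime

>>> parsed = strptime('2019', '%Y')
>>> parsed.tm_year, parsed.tm_mon, parsed.tm_mday, parsed.tm_hour
(2019, 1, 1, 0)
>>> parsed.tm_isdst
-1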

Now you know how to work with Python times and dates using the time module in a variety of ways. However, there are other uses for time outside of simply creating time objects, getting Python time strings, and using seconds elapsed since the epoch.

Suspending Execution

One really useful Python time function is sleep(), which suspends the thread's execution for a specified amount of time.

For example, you can suspend your program's execution for 10 seconds like this:

>>>
>>> from time import sleep, strftime
>>> strftime('%c')
'Fri Mar  1 23:49:26 2019'
>>> sleep(10)
>>> strftime('%c')
'Fri Mar  1 23:49:36 2019'

Your program will print the first formatted datetime string, then pause for 10 seconds, and finally print the second formatted datetime string.

You can also pass fractional seconds to sleep():

>>>
>>> from time import sleep
>>> sleep(0.5)

sleep() is useful for testing or making your program wait for any reason, but you must be careful not to halt your production code unless you have good reason to do so.

Before Python 3.5, a signal sent to your process could interrupt sleep(). However, in 3.5 and later, sleep() will always suspend execution for at least the amount of specified time, even if the process receives a signal.

sleep() is just one Python time function that can help you test your programs and make them more robust.
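For instance, here is a minimal sketch of a polling helper built on sleep(). The wait_for() name and its parameters are invented for illustration:

from time import sleep

def wait_for(condition, attempts=5, delay=0.5):
    # Poll condition() until it returns True or the attempts run out.
    for _ in range(attempts):
        if condition():
            return True
        sleep(delay)  # back off before checking again
    return False

# The condition below never becomes True, so wait_for() gives up after 3 tries.
print(wait_for(lambda: False, attempts=3, delay=0.1))  # False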

Measuring Performance

You can use time to measure the performance of your program.

The way you do this is to use perf_counter() which, as the name suggests, provides a high-resolution performance counter for measuring short durations of time.

To use perf_counter(), you place a counter before your code begins execution as well as after your code's execution completes:

>>>
>>> from time import perf_counter, sleep
>>> def longrunning_function():
...     for i in range(1, 11):
...         sleep(i / i ** 2)
...
>>> start = perf_counter()
>>> longrunning_function()
>>> end = perf_counter()
>>> execution_time = (end - start)
>>> execution_time
8.201258441999926

First, start captures the moment before you call the function. end captures the moment after the function returns. The function's total execution time took (end - start) seconds.

Technical Detail: Python 3.7 introduced perf_counter_ns(), which works the same as perf_counter(), but uses nanoseconds instead of seconds.

perf_counter() (or perf_counter_ns()) is the most precise way to measure the performance of your code using one execution. However, if you're trying to accurately gauge the performance of a code snippet, I recommend using the Python timeit module.

timeit specializes in running code many times to get a more accurate performance analysis and helps you to avoid oversimplifying your time measurement as well as other common pitfalls.
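As a rough sketch of what that looks like, timeit.timeit() runs a statement many times and returns the total elapsed time in seconds. The exact number is machine-dependent, so only a sanity check is shown here:

>>>
>>> from timeit import timeit

>>> elapsed = timeit('sum(range(100))', number=10_000)  # run the snippet 10,000 times
>>> elapsed < 5  # total seconds across all runs; the exact value depends on your machine
True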

Conclusion

Congratulations! You now have a great foundation for working with dates and times in Python.

Now, you're able to:

  1. Work with seconds elapsed since the Python epoch
  2. Convert between seconds, struct_time objects, and timestamp strings using gmtime(), localtime(), mktime(), asctime(), strftime(), and strptime()
  3. Suspend execution with sleep()
  4. Measure performance with perf_counter()

On top of all that, you've learned some fundamental concepts surrounding date and time, such as:

  1. The epoch
  2. UTC versus local time
  3. Time zones and daylight saving time

Now, it's time for you to apply your newfound knowledge of Python time in your real world applications!

Further Reading

If you want to continue learning more about using dates and times in Python, take a look at the following modules:

  1. datetime: a more robust way to work with dates and times together
  2. calendar: utilities for working with calendars, including calendar.timegm()
  3. timeit: more accurate performance measurement of small code snippets

22 Apr 2019 2:00pm GMT

PyCharm: Interview: Dan Tofan for this week’s data science webinar

In the past few years, Python has made a big push into data science and PyCharm has as well. Years ago we added Jupyter Notebook integration, then 2017.3 introduced Scientific Mode for workflows that felt more like an IDE. In 2019.1 we re-invented our Jupyter support to also be more like a professional tool.

PyCharm and data science are thus a hot topic. Dan Tofan very recently published a Pluralsight course on using PyCharm for data science and we invited him for a webinar next week.

To help set the stage, below is an interview with Dan.

Let's start with the key point: what does PyCharm bring to data scientists?

PyCharm brings a productivity boost to data scientists, by helping them explore data, debug Python code, write better Python code, and understand Python code faster. As a PyCharm user, I experienced and benefited from these productivity boosters, which I distilled into my first Pluralsight course, so that data scientists can make the most out of PyCharm in their activities.

For the webinar: who is it for and what can people expect you to cover?

If you are a data scientist who dabbled with PyCharm, then this webinar is for you. I will cover PyCharm's most relevant features to data science: the scientific mode and the completely rewritten Jupyter support. I will show how these features interplay with other PyCharm features, such as refactoring code from Jupyter cells. I will use easy-to-understand code examples with popular data science libraries.

Now, back to the start: tell us a little about yourself.

Currently, I am a senior backend developer for Dimensions - a research data platform that uses data science, and links data on a total of over 140 million publications, grants, patents and clinical trials. I've always been curious, which led me to do my PhD studies at the University of Groningen (Netherlands) and learn more about statistics and data analysis.

Do Python data scientists feel like programmers first and data scientists second, or the reverse?

In my opinion, data science is a melting pot of skills from three complementing backgrounds: programmers, statisticians and business analysts. At the start of your data science journey, you are going to rely on the skills from your main background, and - as your skills expand - you are going to feel more and more like a data scientist.

Your course has a bunch of sections on software development practices and IDE tips. How important are these practices to "professional" data science?

As part of the melting pot, programmers bring a lot of value with their experiences ranging from software development practices to IDE tips. Data scientists from a programming background are already familiar with most of these, and those from other backgrounds benefit immensely.

Think of a code base that starts to grow: how do you write better code? How do you refactor the code? How can a new team member understand that code faster? These are some of the questions that my course helps with.

The course also covers three major facilities in PyCharm Professional: Scientific Mode, Jupyter support, and the Database tool. How do these fit in?

All of them are data centric, so they are very relevant to data scientists. These facilities are integrated nicely with other PyCharm capabilities such as debugging and refactoring. Overall, after watching the course and getting familiar with these capabilities, data scientists get a nice productivity boost.

This webinar is good timing. You just released the course and we just re-invented our Jupyter support. What do you think of the new, IDE-centric Jupyter integration?

I think the new Jupyter integration is an excellent step in the right direction, because you can use both Jupyter and PyCharm features such as debugging and code completion. Joel Grus gave an insightful and entertaining talk about Jupyter limitations at JupyterCon 2018. I think the new Jupyter integration in PyCharm can eventually help solve some Jupyter pain points raised by Joel, such as hidden state.

What's one big problem or pain point in Jupyter that could benefit from new ideas or tooling?

Reproducibility is problematic with Jupyter and it is important for data science. For example, it's easy to share a notebook on GitHub, then someone else tries to run it and gets different results. Perhaps the solution is a mix of discipline and better tools.

22 Apr 2019 11:47am GMT

Ram Rachum: PySnooper: Never use print for debugging again

I just released a new open-source project!

https://github.com/cool-RR/PySnooper/.
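For context, here is a minimal sketch of the usage the project advertises: decorate a function with @pysnooper.snoop() and a line-by-line trace of its execution, including variable changes, is logged as the function runs. The number_to_bits example is adapted from the project's README:

import pysnooper

@pysnooper.snoop()
def number_to_bits(number):
    # The decorator logs each executed line and every variable change.
    if number:
        bits = []
        while number:
            number, remainder = divmod(number, 2)
            bits.insert(0, remainder)
        return bits
    return [0]

number_to_bits(6)  # returns [1, 1, 0] and prints the trace to stderr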

22 Apr 2019 10:54am GMT

ListenData: Pandas Python Tutorial - Learn by Examples

Pandas, one of the most popular packages in Python, is widely used for data manipulation. It is a very powerful and versatile package that makes data cleaning and wrangling much easier and more pleasant.

The pandas library is a major contribution to the Python community, and it helps make Python one of the top programming languages for data science and analytics. It has become the first choice of data analysts and scientists for data analysis and manipulation.

Data Analysis with Python : Pandas Step by Step Guide

Why pandas?
It has many functions that are essential for data handling. In short, it can perform the following tasks for you -
  1. Create a structured data set similar to R's data frame and Excel spreadsheet.
  2. Reading data from various sources such as CSV, TXT, XLSX, SQL database, R etc.
  3. Selecting particular rows or columns from data set
  4. Arranging data in ascending or descending order
  5. Filtering data based on some conditions
  6. Summarizing data by classification variable
  7. Reshape data into wide or long format
  8. Time series analysis
  9. Merging and concatenating two datasets
  10. Iterate over the rows of dataset
  11. Writing or Exporting data in CSV or Excel format

Datasets:

In this tutorial we will use two datasets: 'income' and 'iris'.
  1. 'income' data : This data contains the income of various states from 2002 to 2015. The dataset contains 51 observations and 16 variables. Download link
  2. 'iris' data: It comprises 150 observations with 5 variables. We have 3 species of flowers (50 flowers for each species), and for all of them the sepal length and width and petal length and width are given. Download link


Important pandas functions to remember

The following is a list of common tasks along with the pandas functions that perform them.

Utility                                    Function
Extract Column Names                       df.columns
Select first 2 rows                        df.iloc[:2]
Select first 2 columns                     df.iloc[:,:2]
Select columns by name                     df.loc[:,["col1","col2"]]
Select random no. of rows                  df.sample(n = 10)
Select fraction of random rows             df.sample(frac = 0.2)
Rename the variables                       df.rename( )
Selecting a column as index                df.set_index( )
Removing rows or columns                   df.drop( )
Sorting values                             df.sort_values( )
Grouping variables                         df.groupby( )
Filtering                                  df.query( )
Finding the missing values                 df.isnull( )
Dropping the missing values                df.dropna( )
Removing the duplicates                    df.drop_duplicates( )
Creating dummies                           pd.get_dummies( )
Ranking                                    df.rank( )
Cumulative sum                             df.cumsum( )
Quantiles                                  df.quantile( )
Selecting numeric variables                df.select_dtypes( )
Concatenating two dataframes               pd.concat()
Merging on basis of common variable        pd.merge( )

Importing pandas library

You need to import or load the Pandas library first in order to use it. By "Importing a library", it means loading it into the memory and then you can use it. Run the following code to import pandas library:

import pandas as pd

The "pd" is an alias or abbreviation which will be used as a shortcut to access or call pandas functions. To access the functions from pandas library, you just need to type pd.function instead of pandas.function every time you need to apply it.

Importing Dataset

To read or import data from CSV file, you can use read_csv() function. In the function, you need to specify the file location of your CSV file.

income = pd.read_csv("C:\\Users\\Hp\\Python\\Basics\\income.csv")

 Index       State    Y2002    Y2003    Y2004    Y2005    Y2006    Y2007  \
0 A Alabama 1296530 1317711 1118631 1492583 1107408 1440134
1 A Alaska 1170302 1960378 1818085 1447852 1861639 1465841
2 A Arizona 1742027 1968140 1377583 1782199 1102568 1109382
3 A Arkansas 1485531 1994927 1119299 1947979 1669191 1801213
4 C California 1685349 1675807 1889570 1480280 1735069 1812546

Y2008 Y2009 Y2010 Y2011 Y2012 Y2013 Y2014 Y2015
0 1945229 1944173 1237582 1440756 1186741 1852841 1558906 1916661
1 1551826 1436541 1629616 1230866 1512804 1985302 1580394 1979143
2 1752886 1554330 1300521 1130709 1907284 1363279 1525866 1647724
3 1188104 1628980 1669295 1928238 1216675 1591896 1360959 1329341
4 1487315 1663809 1624509 1639670 1921845 1156536 1388461 1644607

Get Variable Names

By using the income.columns command, you can fetch the names of the variables of a data frame.
Index(['Index', 'State', 'Y2002', 'Y2003', 'Y2004', 'Y2005', 'Y2006', 'Y2007',
'Y2008', 'Y2009', 'Y2010', 'Y2011', 'Y2012', 'Y2013', 'Y2014', 'Y2015'],
dtype='object')
income.columns[0:2] returns the first two column names, 'Index' and 'State'. In Python, indexing starts from 0.

Knowing the Variable types

You can use the dataFrameName.dtypes command to extract the information of types of variables stored in the data frame.

income.dtypes

Index    object
State object
Y2002 int64
Y2003 int64
Y2004 int64
Y2005 int64
Y2006 int64
Y2007 int64
Y2008 int64
Y2009 int64
Y2010 int64
Y2011 int64
Y2012 int64
Y2013 int64
Y2014 int64
Y2015 int64
dtype: object

Here 'object' means strings or character variables. 'int64' refers to numeric variables (without decimals).

To see the variable type of one variable (let's say "State") instead of all the variables, you can use the command below -

income['State'].dtypes

It returns dtype('O'). In this case, 'O' refers to object i.e. type of variable as character.

Changing the data types

Y2008 is an integer. Suppose we want to convert it to float (numeric variable with decimals) we can write:

income.Y2008 = income.Y2008.astype(float)
income.dtypes

Index     object
State object
Y2002 int64
Y2003 int64
Y2004 int64
Y2005 int64
Y2006 int64
Y2007 int64
Y2008 float64
Y2009 int64
Y2010 int64
Y2011 int64
Y2012 int64
Y2013 int64
Y2014 int64
Y2015 int64
dtype: object

To view the dimensions or shape of the data

income.shape

 (51, 16)

51 is the number of rows and 16 is the number of columns.

You can also use shape[0] to see the number of rows (similar to nrow() in R) and shape[1] for number of columns (similar to ncol() in R).

income.shape[0]
income.shape[1]


To view only some of the rows

By default head( ) shows first 5 rows. If we want to see a specific number of rows we can mention it in the parenthesis. Similarly tail( ) function shows last 5 rows by default.

income.head()
income.head(2) #shows first 2 rows.
income.tail()
income.tail(2) #shows last 2 rows


Alternatively, any of the following commands can be used to fetch first five rows.
income[0:5]
income.iloc[0:5]

Define Categorical Variable

Like the factor() function in R, we can create a categorical variable in Python using the "category" dtype.

s = pd.Series([1,2,3,1,2], dtype="category")
s

0    1
1 2
2 3
3 1
4 2
dtype: category
Categories (3, int64): [1, 2, 3]

Extract Unique Values

The unique() function shows the unique levels or categories in the dataset.

income.Index.unique()

array(['A', 'C', 'D', ..., 'U', 'V', 'W'], dtype=object)


The nunique( ) shows the number of unique values.

income.Index.nunique()

It returns 19 because the Index column contains 19 distinct values.

Generate Cross Tab

pd.crosstab( ) is used to create a bivariate frequency distribution. Here the bivariate frequency distribution is between Index and State columns.

pd.crosstab(income.Index,income.State)


Creating a frequency distribution

income.Index selects the 'Index' column of 'income' dataset and value_counts( ) creates a frequency distribution. By default ascending = False i.e. it will show the 'Index' having the maximum frequency on the top.

income.Index.value_counts(ascending = True)

F    1
G 1
U 1
L 1
H 1
P 1
R 1
D 2
T 2
S 2
V 2
K 2
O 3
C 3
I 4
W 4
A 4
M 8
N 8
Name: Index, dtype: int64

To draw the samples
income.sample( ) is used to draw random samples (rows) from the dataset, keeping all the columns. Here n = 5 means we want 5 rows, and frac = 0.1 means we want 10 percent of the rows as the sample.

income.sample(n = 5)
income.sample(frac = 0.1)

Selecting only a few of the columns
To select only specific columns we use either the loc[ ] or iloc[ ] functions. The rows or columns to be selected are passed as lists. "Index":"Y2008" denotes that all the columns from Index to Y2008 are to be selected.

Syntax of df.loc[ ]
df.loc[row_index , column_index]

income.loc[:,["Index","State","Y2008"]]
income.loc[0:2,["Index","State","Y2008"]] #Selecting rows with Index label 0 to 2 & columns
income.loc[:,"Index":"Y2008"] #Selecting consecutive columns
#In the above command both Index and Y2008 are included.
income.iloc[:,0:5] #Selects the first five columns (positions 0 to 4); the sixth column is not included

Difference between loc and iloc

loc selects rows (or columns) by their labels in the index, whereas iloc selects rows (or columns) by their integer positions in the index, so it only accepts integers.

import numpy as np
x = pd.DataFrame({"var1" : np.arange(1,20,2)}, index=[9,8,7,6,10, 1, 2, 3, 4, 5])

    var1
9 1
8 3
7 5
6 7
10 9
1 11
2 13
3 15
4 17
5 19
iloc Code
x.iloc[:3]

Output:
var1
9 1
8 3
7 5
loc code
x.loc[:3]

Output:
var1
9 1
8 3
7 5
6 7
10 9
1 11
2 13
3 15
You can also use the following syntax to select specific variables.

income[["Index","State","Y2008"]]


Renaming the variables
We create a dataframe 'data' for information of people and their respective zodiac signs.

data = pd.DataFrame({"A" : ["John","Mary","Julia","Kenny","Henry"], "B" : ["Libra","Capricorn","Aries","Scorpio","Aquarius"]})
data

       A          B
0 John Libra
1 Mary Capricorn
2 Julia Aries
3 Kenny Scorpio
4 Henry Aquarius
If all the columns are to be renamed then we can use data.columns and assign the list of new column names.

#Renaming all the variables.
data.columns = ['Names','Zodiac Signs']


   Names Zodiac Signs
0 John Libra
1 Mary Capricorn
2 Julia Aries
3 Kenny Scorpio
4 Henry Aquarius
If only some of the variables are to be renamed then we can use rename( ) function where the new names are passed in the form of a dictionary.

#Renaming only some of the variables.
data.rename(columns = {"Names":"Cust_Name"},inplace = True)

  Cust_Name Zodiac Signs
0 John Libra
1 Mary Capricorn
2 Julia Aries
3 Kenny Scorpio
4 Henry Aquarius
By default in pandas inplace = False which means that no changes are made in the original dataset. Thus if we wish to alter the original dataset we need to define inplace = True.
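Here is a small sketch, using a throwaway dataframe, that makes the difference concrete:

import pandas as pd

df = pd.DataFrame({"Names": ["John", "Mary"], "Zodiac Signs": ["Libra", "Capricorn"]})

# inplace = False (the default): a renamed copy is returned, the original is untouched
renamed = df.rename(columns = {"Names": "Cust_Name"})
print(list(df.columns))       # ['Names', 'Zodiac Signs']
print(list(renamed.columns))  # ['Cust_Name', 'Zodiac Signs']

# inplace = True: the original dataframe itself is modified (and None is returned)
df.rename(columns = {"Names": "Cust_Name"}, inplace = True)
print(list(df.columns))       # ['Cust_Name', 'Zodiac Signs']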

Suppose we want to replace only a particular character in the column names; then we can use the str.replace( ) function. For example, to rename the variables that contain "Y" to "Year ":

income.columns = income.columns.str.replace('Y' , 'Year ')
income.columns

Index(['Index', 'State', 'Year 2002', 'Year 2003', 'Year 2004', 'Year 2005',
'Year 2006', 'Year 2007', 'Year 2008', 'Year 2009', 'Year 2010',
'Year 2011', 'Year 2012', 'Year 2013', 'Year 2014', 'Year 2015'],
dtype='object')

Setting one column in the data frame as the index
Using set_index("column name") we can make that column the index of the data frame, and the column is removed from the regular columns.

income.set_index("Index",inplace = True)
income.head()
#Note that the indices have changed and Index column is now no more a column
income.columns
income.reset_index(inplace = True)
income.head()

reset_index( ) restores the default integer index and turns the existing index back into a regular column.

Removing the columns and rows
To drop a column we use drop( ), where the first argument is the label (or list of labels) to be removed.

By default axis = 0, which means rows are dropped by their index labels. To remove a column we need to set axis = 1.

income.drop('Index',axis = 1)

#Alternatively
income.drop("Index",axis = "columns")
income.drop(['Index','State'],axis = 1)
income.drop(0,axis = 0)
income.drop(0,axis = "index")
income.drop([0,1,2,3],axis = 0)

Also inplace = False by default thus no alterations are made in the original dataset. axis = "columns" and axis = "index" means the column and row(index) should be removed respectively.

Sorting the data
To sort the data, the sort_values( ) function is used. By default inplace = False and ascending = True.

income.sort_values("State",ascending = False)
income.sort_values("State",ascending = False,inplace = True)
income.Y2006.sort_values()

The Index column contains duplicated values, so we first sort the dataframe by Index and then, within each Index value, sort by Y2002:

income.sort_values(["Index","Y2002"])


Create new variables
Using eval( ), arithmetic operations on various columns of a dataset can be carried out.

income["difference"] = income.Y2008-income.Y2009

#Alternatively
income["difference2"] = income.eval("Y2008 - Y2009")
income.head()

  Index       State    Y2002    Y2003    Y2004    Y2005    Y2006    Y2007  \
0 A Alabama 1296530 1317711 1118631 1492583 1107408 1440134
1 A Alaska 1170302 1960378 1818085 1447852 1861639 1465841
2 A Arizona 1742027 1968140 1377583 1782199 1102568 1109382
3 A Arkansas 1485531 1994927 1119299 1947979 1669191 1801213
4 C California 1685349 1675807 1889570 1480280 1735069 1812546

Y2008 Y2009 Y2010 Y2011 Y2012 Y2013 Y2014 Y2015 \
0 1945229.0 1944173 1237582 1440756 1186741 1852841 1558906 1916661
1 1551826.0 1436541 1629616 1230866 1512804 1985302 1580394 1979143
2 1752886.0 1554330 1300521 1130709 1907284 1363279 1525866 1647724
3 1188104.0 1628980 1669295 1928238 1216675 1591896 1360959 1329341
4 1487315.0 1663809 1624509 1639670 1921845 1156536 1388461 1644607

difference difference2
0 1056.0 1056.0
1 115285.0 115285.0
2 198556.0 198556.0
3 -440876.0 -440876.0
4 -176494.0 -176494.0

income.ratio = income.Y2008/income.Y2009

The above command does not create a new column, because attribute-style assignment only works for existing columns; to create new columns we need to use square brackets.
We can also use the assign( ) function, but it does not change the original data since there is no inplace parameter, so we need to save the result in a new dataset.

data = income.assign(ratio = (income.Y2008 / income.Y2009))
data.head()


Finding Descriptive Statistics
describe( ) is used to find summary statistics such as the mean, minimum, and quartiles for numeric variables.

income.describe() #for numeric variables

              Y2002         Y2003         Y2004         Y2005         Y2006  \
count 5.100000e+01 5.100000e+01 5.100000e+01 5.100000e+01 5.100000e+01
mean 1.566034e+06 1.509193e+06 1.540555e+06 1.522064e+06 1.530969e+06
std 2.464425e+05 2.641092e+05 2.813872e+05 2.671748e+05 2.505603e+05
min 1.111437e+06 1.110625e+06 1.118631e+06 1.122030e+06 1.102568e+06
25% 1.374180e+06 1.292390e+06 1.268292e+06 1.267340e+06 1.337236e+06
50% 1.584734e+06 1.485909e+06 1.522230e+06 1.480280e+06 1.531641e+06
75% 1.776054e+06 1.686698e+06 1.808109e+06 1.778170e+06 1.732259e+06
max 1.983285e+06 1.994927e+06 1.979395e+06 1.990062e+06 1.985692e+06

Y2007 Y2008 Y2009 Y2010 Y2011 \
count 5.100000e+01 5.100000e+01 5.100000e+01 5.100000e+01 5.100000e+01
mean 1.553219e+06 1.538398e+06 1.658519e+06 1.504108e+06 1.574968e+06
std 2.539575e+05 2.958132e+05 2.361854e+05 2.400771e+05 2.657216e+05
min 1.109382e+06 1.112765e+06 1.116168e+06 1.103794e+06 1.116203e+06
25% 1.322419e+06 1.254244e+06 1.553958e+06 1.328439e+06 1.371730e+06
50% 1.563062e+06 1.545621e+06 1.658551e+06 1.498662e+06 1.575533e+06
75% 1.780589e+06 1.779538e+06 1.857746e+06 1.639186e+06 1.807766e+06
max 1.983568e+06 1.990431e+06 1.993136e+06 1.999102e+06 1.992996e+06

Y2012 Y2013 Y2014 Y2015
count 5.100000e+01 5.100000e+01 5.100000e+01 5.100000e+01
mean 1.591135e+06 1.530078e+06 1.583360e+06 1.588297e+06
std 2.837675e+05 2.827299e+05 2.601554e+05 2.743807e+05
min 1.108281e+06 1.100990e+06 1.110394e+06 1.110655e+06
25% 1.360654e+06 1.285738e+06 1.385703e+06 1.372523e+06
50% 1.643855e+06 1.531212e+06 1.580394e+06 1.627508e+06
75% 1.866322e+06 1.725377e+06 1.791594e+06 1.848316e+06
max 1.988270e+06 1.994022e+06 1.990412e+06 1.996005e+06
For character or string variables, you can write include = ['object']. It will return the total count, the most frequently occurring string, and its frequency.

income.describe(include = ['object']) #Only for strings / objects

To find out specific descriptive statistics of each column of data frame

income.mean()
income.median()
income.agg(["mean","median"])


Mean, median, maximum and minimum can be obtained for a particular column(s) as:

income.Y2008.mean()
income.Y2008.median()
income.Y2008.min()
income.loc[:,["Y2002","Y2008"]].max()


GroupBy function

To group the data by a categorical variable we use the groupby( ) function, which lets us perform operations on each category. The agg( ) function is used to aggregate the data.

The following command finds minimum and maximum values for Y2002 and only mean for Y2003

income.groupby("Index").agg({"Y2002": ["min","max"],"Y2003" : "mean"})

          Y2002                 Y2003
min max mean
Index
A 1170302 1742027 1810289.000
C 1343824 1685349 1595708.000
D 1111437 1330403 1631207.000
F 1964626 1964626 1468852.000
G 1929009 1929009 1541565.000
H 1461570 1461570 1200280.000
I 1353210 1776918 1536164.500
K 1509054 1813878 1369773.000
L 1584734 1584734 1110625.000
M 1221316 1983285 1535717.625
N 1395149 1885081 1382499.625
O 1173918 1802132 1569934.000
P 1320191 1320191 1446723.000
R 1501744 1501744 1942942.000
S 1159037 1631522 1477072.000
T 1520591 1811867 1398343.000
U 1771096 1771096 1195861.000
V 1134317 1146902 1498122.500
W 1677347 1977749 1521118.500
In order to rename the columns after groupby, you can use tuple. See the code below.

income.groupby("Index").agg({"Y2002" : [("Y2002_min","min"),("Y2002_max","max")],
"Y2003" : [("Y2003_mean","mean")]})
Renaming columns can also be done via the method below.

dt = income.groupby("Index").agg({"Y2002": ["min","max"],"Y2003" : "mean"})
dt.columns = ['Y2002_min', 'Y2002_max', 'Y2003_mean']
Groupby more than 1 column

income.groupby(["Index", "State"]).agg({"Y2002": ["min","max"],"Y2003" : "mean"})

Filtering
To filter only those rows which have Index as "A" we write:

income[income.Index == "A"]

#Alternatively
income.loc[income.Index == "A",:]

  Index     State    Y2002    Y2003    Y2004    Y2005    Y2006    Y2007  \
0 A Alabama 1296530 1317711 1118631 1492583 1107408 1440134
1 A Alaska 1170302 1960378 1818085 1447852 1861639 1465841
2 A Arizona 1742027 1968140 1377583 1782199 1102568 1109382
3 A Arkansas 1485531 1994927 1119299 1947979 1669191 1801213

Y2008 Y2009 Y2010 Y2011 Y2012 Y2013 Y2014 Y2015
0 1945229 1944173 1237582 1440756 1186741 1852841 1558906 1916661
1 1551826 1436541 1629616 1230866 1512804 1985302 1580394 1979143
2 1752886 1554330 1300521 1130709 1907284 1363279 1525866 1647724
3 1188104 1628980 1669295 1928238 1216675 1591896 1360959 1329341
To select the States having Index as "A":

income.loc[income.Index == "A","State"]
income.loc[income.Index == "A",:].State

To filter the rows with Index as "A" and income for 2002 greater than 1500000:

income.loc[(income.Index == "A") & (income.Y2002 > 1500000),:]

To filter the rows with index either "A" or "W", we can use isin( ) function:

income.loc[(income.Index == "A") | (income.Index == "W"),:]

#Alternatively.
income.loc[income.Index.isin(["A","W"]),:]

   Index          State    Y2002    Y2003    Y2004    Y2005    Y2006    Y2007  \
0 A Alabama 1296530 1317711 1118631 1492583 1107408 1440134
1 A Alaska 1170302 1960378 1818085 1447852 1861639 1465841
2 A Arizona 1742027 1968140 1377583 1782199 1102568 1109382
3 A Arkansas 1485531 1994927 1119299 1947979 1669191 1801213
47 W Washington 1977749 1687136 1199490 1163092 1334864 1621989
48 W West Virginia 1677347 1380662 1176100 1888948 1922085 1740826
49 W Wisconsin 1788920 1518578 1289663 1436888 1251678 1721874
50 W Wyoming 1775190 1498098 1198212 1881688 1750527 1523124

Y2008 Y2009 Y2010 Y2011 Y2012 Y2013 Y2014 Y2015
0 1945229 1944173 1237582 1440756 1186741 1852841 1558906 1916661
1 1551826 1436541 1629616 1230866 1512804 1985302 1580394 1979143
2 1752886 1554330 1300521 1130709 1907284 1363279 1525866 1647724
3 1188104 1628980 1669295 1928238 1216675 1591896 1360959 1329341
47 1545621 1555554 1179331 1150089 1775787 1273834 1387428 1377341
48 1238174 1539322 1539603 1872519 1462137 1683127 1204344 1198791
49 1980167 1901394 1648755 1940943 1729177 1510119 1701650 1846238
50 1587602 1504455 1282142 1881814 1673668 1994022 1204029 1853858
Alternatively we can use query( ) function and write our filtering criteria:

income.query('Y2002>1700000 & Y2003 > 1500000')


Dealing with missing values
We create a new dataframe named 'crops'. To create a NaN value we use np.nan after importing numpy.

import numpy as np
mydata = {'Crop': ['Rice', 'Wheat', 'Barley', 'Maize'],
'Yield': [1010, 1025.2, 1404.2, 1251.7],
'cost' : [102, np.nan, 20, 68]}
crops = pd.DataFrame(mydata)
crops

isnull( ) returns True and notnull( ) returns False if the value is NaN.

crops.isnull() #same as is.na in R
crops.notnull() #opposite of previous command.
crops.isnull().sum() #No. of missing values.

crops.cost.isnull() first selects the 'cost' column from the dataframe and then returns a logical vector using isnull().

crops[crops.cost.isnull()] #shows the rows with NAs.
crops[crops.cost.isnull()].Crop #shows the rows with NAs in crops.Crop
crops[crops.cost.notnull()].Crop #shows the rows without NAs in crops.Crop

To drop all the rows that have a missing value in any column we use dropna(how = "any"). By default inplace = False. how = "all" means a row is dropped only if all the elements in that row are missing.

crops.dropna(how = "any").shape
crops.dropna(how = "all").shape

To remove NaNs if any of 'Yield' or'cost' are missing we use the subset parameter and pass a list:

crops.dropna(subset = ['Yield',"cost"],how = 'any').shape
crops.dropna(subset = ['Yield',"cost"],how = 'all').shape

Replacing the missing values in the 'cost' column with the string "UNKNOWN":

crops['cost'].fillna(value = "UNKNOWN",inplace = True)
crops


Dealing with duplicates
We create a new dataframe comprising of items and their respective prices.

data = pd.DataFrame({"Items" : ["TV","Washing Machine","Mobile","TV","TV","Washing Machine"], "Price" : [10000,50000,20000,10000,10000,40000]})
data

             Items  Price
0 TV 10000
1 Washing Machine 50000
2 Mobile 20000
3 TV 10000
4 TV 10000
5 Washing Machine 40000
duplicated() returns a logical vector that is True whenever a duplicated row is encountered.

data.loc[data.duplicated(),:]
data.loc[data.duplicated(keep = "first"),:]

By default keep = 'first', i.e. the first occurrence is considered a unique value and its repetitions are considered duplicates.
If keep = "last", the last occurrence is considered a unique value and all its earlier repetitions are considered duplicates.

data.loc[data.duplicated(keep = "last"),:] #last entries are not there,indices have changed.

If keep = "False" then it considers all the occurences of the repeated observations as duplicates.

data.loc[data.duplicated(keep = False),:] #all the duplicates, including unique are shown.

To drop the duplicates, drop_duplicates( ) is used with inplace = False by default; keep = 'first', 'last', or False has the same meaning as in duplicated( ).

data.drop_duplicates(keep = "first")
data.drop_duplicates(keep = "last")
data.drop_duplicates(keep = False,inplace = True) #by default inplace = False
data


Creating dummies
Now we will consider the iris dataset.

iris = pd.read_csv("C:\\Users\\Hp\\Desktop\\work\\Python\\Basics\\pandas\\iris.csv")
iris.head()

   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
The map( ) function matches the values of the Species column against the dictionary and replaces them, returning a new series.

iris["setosa"] = iris.Species.map({"setosa" : 1,"versicolor":0, "virginica" : 0})
iris.head()

To create dummies, get_dummies( ) is used. prefix = "Species" adds the prefix 'Species_' to the names of the newly created dummy columns.

pd.get_dummies(iris.Species,prefix = "Species")
pd.get_dummies(iris.Species,prefix = "Species").iloc[:,0:1] #1 is not included
species_dummies = pd.get_dummies(iris.Species,prefix = "Species").iloc[:,0:]

With concat( ) function we can join multiple series or dataframes. axis = 1 denotes that they should be joined columnwise.

iris = pd.concat([iris,species_dummies],axis = 1)
iris.head()

   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species  \
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa

Species_setosa Species_versicolor Species_virginica
0 1 0 0
1 1 0 0
2 1 0 0
3 1 0 0
4 1 0 0
It is usual that for a variable with 'n' categories we create 'n-1' dummies, so to drop the first dummy column we write drop_first = True

pd.get_dummies(iris,columns = ["Species"],drop_first = True).head()


Ranking
To create a dataframe of all the ranks we use rank( )

iris.rank()

Ranking by a specific variable
Suppose we want to rank the Sepal.Length for different species in ascending order:

iris['Rank2'] = iris['Sepal.Length'].groupby(iris["Species"]).rank(ascending=1)
iris.head()


Calculating the Cumulative sum
Using cumsum( ) function we can obtain the cumulative sum

iris['cum_sum'] = iris["Sepal.Length"].cumsum()
iris.head()

Cumulative sum by a variable
To find the cumulative sum of sepal lengths for different species we use groupby( ) and then use cumsum( )

iris["cumsum2"] = iris.groupby(["Species"])["Sepal.Length"].cumsum()
iris.head()


Calculating the percentiles.
Various quantiles can be obtained by using quantile( )

iris.quantile(0.5)
iris.quantile([0.1,0.2,0.5])
iris.quantile(0.55)


if else in Python
We create a new dataframe of students' names and their respective zodiac signs.

students = pd.DataFrame({'Names': ['John','Mary','Henry','Augustus','Kenny'],
'Zodiac Signs': ['Aquarius','Libra','Gemini','Pisces','Virgo']})

def name(row):
    if row["Names"] in ["John","Henry"]:
        return "yes"
    else:
        return "no"

students['flag'] = students.apply(name, axis=1)
students
Functions in Python are defined using the keyword def, followed by the function's name. The apply( ) function applies a function along the rows or columns of a dataframe.

Note: When using a simple 'if else' we need to take care of the indentation, since Python does not use curly braces for loops or if-else blocks.

Output
      Names Zodiac Signs flag
0 John Aquarius yes
1 Mary Libra no
2 Henry Gemini yes
3 Augustus Pisces no
4 Kenny Virgo no

Alternatively, by importing numpy we can use np.where. The first argument is the condition to be evaluated, the second argument is the value if the condition is True, and the last argument is the value if the condition is False.

import numpy as np
students['flag'] = np.where(students['Names'].isin(['John','Henry']), 'yes', 'no')
students


Multiple Conditions : If Else-if Else
def mname(row):
    if row["Names"] == "John" and row["Zodiac Signs"] == "Aquarius":
        return "yellow"
    elif row["Names"] == "Mary" and row["Zodiac Signs"] == "Libra":
        return "blue"
    elif row["Zodiac Signs"] == "Pisces":
        return "blue"
    else:
        return "black"

students['color'] = students.apply(mname, axis=1)
students

We create a list of conditions and their respective values when True, and use np.select, where default is the value used when none of the conditions holds.

conditions = [
(students['Names'] == 'John') & (students['Zodiac Signs'] == 'Aquarius'),
(students['Names'] == 'Mary') & (students['Zodiac Signs'] == 'Libra'),
(students['Zodiac Signs'] == 'Pisces')]
choices = ['yellow', 'blue', 'purple']
students['color'] = np.select(conditions, choices, default='black')
students

      Names Zodiac Signs flag   color
0 John Aquarius yes yellow
1 Mary Libra no blue
2 Henry Gemini yes black
3 Augustus Pisces no purple
4 Kenny Virgo no black

Select numeric or categorical columns only
To include numeric columns we use select_dtypes( )

data1 = iris.select_dtypes(include=[np.number])
data1.head()

_get_numeric_data also provides utility to select the numeric columns only.

data3 = iris._get_numeric_data()
data3.head(3)

   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  cum_sum  cumsum2
0 5.1 3.5 1.4 0.2 5.1 5.1
1 4.9 3.0 1.4 0.2 10.0 10.0
2 4.7 3.2 1.3 0.2 14.7 14.7
For selecting categorical variables

data4 = iris.select_dtypes(include = ['object'])
data4.head(2)

 Species
0 setosa
1 setosa

Concatenating
We create 2 dataframes containing the details of the students:

students = pd.DataFrame({'Names': ['John','Mary','Henry','Augustus','Kenny'],
'Zodiac Signs': ['Aquarius','Libra','Gemini','Pisces','Virgo']})
students2 = pd.DataFrame({'Names': ['John','Mary','Henry','Augustus','Kenny'],
'Marks' : [50,81,98,25,35]})

using pd.concat( ) function we can join the 2 dataframes:

data = pd.concat([students,students2]) #by default axis = 0

   Marks     Names Zodiac Signs
0 NaN John Aquarius
1 NaN Mary Libra
2 NaN Henry Gemini
3 NaN Augustus Pisces
4 NaN Kenny Virgo
0 50.0 John NaN
1 81.0 Mary NaN
2 98.0 Henry NaN
3 25.0 Augustus NaN
4 35.0 Kenny NaN
By default axis = 0, so the dataframes are stacked row-wise. If a column is present in one dataframe but not the other, NaNs are created for the missing values. To join column-wise we set axis = 1.

data = pd.concat([students,students2],axis = 1)
data

      Names Zodiac Signs  Marks     Names
0 John Aquarius 50 John
1 Mary Libra 81 Mary
2 Henry Gemini 98 Henry
3 Augustus Pisces 25 Augustus
4 Kenny Virgo 35 Kenny
Using append function we can join the dataframes row-wise

students.append(students2) #for rows

Alternatively we can create a dictionary of the two data frames and can use pd.concat to join the dataframes row wise

classes = {'x': students, 'y': students2}
result = pd.concat(classes)
result

     Marks     Names Zodiac Signs
x 0 NaN John Aquarius
1 NaN Mary Libra
2 NaN Henry Gemini
3 NaN Augustus Pisces
4 NaN Kenny Virgo
y 0 50.0 John NaN
1 81.0 Mary NaN
2 98.0 Henry NaN
3 25.0 Augustus NaN
4 35.0 Kenny NaN

Merging or joining on the basis of common variable.
We take 2 dataframes with different number of observations:

students = pd.DataFrame({'Names': ['John','Mary','Henry','Maria'],
'Zodiac Signs': ['Aquarius','Libra','Gemini','Capricorn']})
students2 = pd.DataFrame({'Names': ['John','Mary','Henry','Augustus','Kenny'],
'Marks' : [50,81,98,25,35]})

Using pd.merge we can join the two dataframes. on = 'Names' specifies that 'Names' is the common variable on the basis of which the dataframes are to be combined.

result = pd.merge(students, students2, on='Names') #it only takes intersections
result

   Names Zodiac Signs  Marks
0 John Aquarius 50
1 Mary Libra 81
2 Henry Gemini 98
By default how = "inner" thus it takes only the common elements in both the dataframes. If you want all the elements in both the dataframes set how = "outer"

result = pd.merge(students, students2, on='Names',how = "outer") #it takes the union of both dataframes
result

      Names Zodiac Signs  Marks
0 John Aquarius 50.0
1 Mary Libra 81.0
2 Henry Gemini 98.0
3 Maria Capricorn NaN
4 Augustus NaN 25.0
5 Kenny NaN 35.0
To keep all the rows from the left dataframe (plus only the matching rows from the right) set how = 'left':

result = pd.merge(students, students2, on='Names',how = "left")
result

   Names Zodiac Signs  Marks
0 John Aquarius 50.0
1 Mary Libra 81.0
2 Henry Gemini 98.0
3 Maria Capricorn NaN
Similarly, how = 'right' keeps all the rows from the right dataframe plus only the matching rows from the left.

result = pd.merge(students, students2, on='Names',how = "right",indicator = True)
result

      Names Zodiac Signs  Marks      _merge
0 John Aquarius 50 both
1 Mary Libra 81 both
2 Henry Gemini 98 both
3 Augustus NaN 25 right_only
4 Kenny NaN 35 right_only

indicator = True creates a _merge column indicating whether each row is present in both dataframes or only in the left or right one.

About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains.

Let's Get Connected: LinkedIn

22 Apr 2019 5:07am GMT

ListenData: Pandas Python Tutorial - Learn by Examples

Pandas being one of the most popular package in Python is widely used for data manipulation. It is a very powerful and versatile package which makes data cleaning and wrangling much easier and pleasant.

The Pandas library has a great contribution to the python community and it makes python as one of the top programming language for data science and analytics. It has become first choice of data analysts and scientists for data analysis and manipulation.

Data Analysis with Python : Pandas Step by Step Guide

Why pandas?
It has many functions which are the essence for data handling. In short, it can perform the following tasks for you -
  1. Create a structured data set similar to R's data frame and Excel spreadsheet.
  2. Reading data from various sources such as CSV, TXT, XLSX, SQL database, R etc.
  3. Selecting particular rows or columns from data set
  4. Arranging data in ascending or descending order
  5. Filtering data based on some conditions
  6. Summarizing data by classification variable
  7. Reshape data into wide or long format
  8. Time series analysis
  9. Merging and concatenating two datasets
  10. Iterate over the rows of dataset
  11. Writing or Exporting data in CSV or Excel format

Datasets:

In this tutorial we will use two datasets: 'income' and 'iris'.
  1. 'income' data : This data contains the income of various states from 2002 to 2015. The dataset contains 51 observations and 16 variables. Download link
  2. 'iris' data: It comprises of 150 observations with 5 variables. We have 3 species of flowers(50 flowers for each specie) and for all of them the sepal length and width and petal length and width are given. Download link


Important pandas functions to remember

The following is a list of common tasks along with pandas functions.
Utility Functions
Extract Column Names df.columns
Select first 2 rows df.iloc[:2]
Select first 2 columns df.iloc[:,:2]
Select columns by name df.loc[:,["col1","col2"]]
Select random no. of rows df.sample(n = 10)
Select fraction of random rows df.sample(frac = 0.2)
Rename the variables df.rename( )
Selecting a column as index df.set_index( )
Removing rows or columns df.drop( )
Sorting values df.sort_values( )
Grouping variables df.groupby( )
Filtering df.query( )
Finding the missing values df.isnull( )
Dropping the missing values df.dropna( )
Removing the duplicates df.drop_duplicates( )
Creating dummies pd.get_dummies( )
Ranking df.rank( )
Cumulative sum df.cumsum( )
Quantiles df.quantile( )
Selecting numeric variables df.select_dtypes( )
Concatenating two dataframes pd.concat()
Merging on basis of common variable pd.merge( )

Importing pandas library

You need to import or load the Pandas library first in order to use it. By "Importing a library", it means loading it into the memory and then you can use it. Run the following code to import pandas library:

import pandas as pd

The "pd" is an alias or abbreviation which will be used as a shortcut to access or call pandas functions. To access the functions from pandas library, you just need to type pd.function instead of pandas.function every time you need to apply it.

Importing Dataset

To read or import data from CSV file, you can use read_csv() function. In the function, you need to specify the file location of your CSV file.

income = pd.read_csv("C:\\Users\\Hp\\Python\\Basics\\income.csv")

 Index       State    Y2002    Y2003    Y2004    Y2005    Y2006    Y2007  \
0 A Alabama 1296530 1317711 1118631 1492583 1107408 1440134
1 A Alaska 1170302 1960378 1818085 1447852 1861639 1465841
2 A Arizona 1742027 1968140 1377583 1782199 1102568 1109382
3 A Arkansas 1485531 1994927 1119299 1947979 1669191 1801213
4 C California 1685349 1675807 1889570 1480280 1735069 1812546

Y2008 Y2009 Y2010 Y2011 Y2012 Y2013 Y2014 Y2015
0 1945229 1944173 1237582 1440756 1186741 1852841 1558906 1916661
1 1551826 1436541 1629616 1230866 1512804 1985302 1580394 1979143
2 1752886 1554330 1300521 1130709 1907284 1363279 1525866 1647724
3 1188104 1628980 1669295 1928238 1216675 1591896 1360959 1329341
4 1487315 1663809 1624509 1639670 1921845 1156536 1388461 1644607

Get Variable Names

By using income.columnscommand, you can fetch the names of variables of a data frame.
Index(['Index', 'State', 'Y2002', 'Y2003', 'Y2004', 'Y2005', 'Y2006', 'Y2007',
'Y2008', 'Y2009', 'Y2010', 'Y2011', 'Y2012', 'Y2013', 'Y2014', 'Y2015'],
dtype='object')
income.columns[0:2] returns first two column names 'Index', 'State'. In python, indexing starts from 0.

Knowing the Variable types

You can use the dataFrameName.dtypes command to extract the information of types of variables stored in the data frame.

income.dtypes

Index    object
State object
Y2002 int64
Y2003 int64
Y2004 int64
Y2005 int64
Y2006 int64
Y2007 int64
Y2008 int64
Y2009 int64
Y2010 int64
Y2011 int64
Y2012 int64
Y2013 int64
Y2014 int64
Y2015 int64
dtype: object

Here 'object' means strings or character variables. 'int64' refers to numeric variables (without decimals).

To see the variable type of one variable (let's say "State") instead of all the variables, you can use the command below -

income['State'].dtypes

It returns dtype('O'). In this case, 'O' refers to object i.e. type of variable as character.

Changing the data types

Y2008 is an integer. Suppose we want to convert it to float (numeric variable with decimals) we can write:

income.Y2008 = income.Y2008.astype(float)
income.dtypes

Index     object
State object
Y2002 int64
Y2003 int64
Y2004 int64
Y2005 int64
Y2006 int64
Y2007 int64
Y2008 float64
Y2009 int64
Y2010 int64
Y2011 int64
Y2012 int64
Y2013 int64
Y2014 int64
Y2015 int64
dtype: object

To view the dimensions or shape of the data

income.shape

 (51, 16)

51 is the number of rows and 16 is the number of columns.

You can also use shape[0] to see the number of rows (similar to nrow() in R) and shape[1] for number of columns (similar to ncol() in R).

income.shape[0]
income.shape[1]


To view only some of the rows

By default head( ) shows first 5 rows. If we want to see a specific number of rows we can mention it in the parenthesis. Similarly tail( ) function shows last 5 rows by default.

income.head()
income.head(2) #shows first 2 rows.
income.tail()
income.tail(2) #shows last 2 rows


Alternatively, any of the following commands can be used to fetch first five rows.
income[0:5]
income.iloc[0:5]

Define Categorical Variable

Like factors() function in R, we can include categorical variable in python using "category" dtype.

s = pd.Series([1,2,3,1,2], dtype="category")
s

0    1
1 2
2 3
3 1
4 2
dtype: category
Categories (3, int64): [1, 2, 3]

Extract Unique Values

The unique() function shows the unique levels or categories in the dataset.

income.Index.unique()

array(['A', 'C', 'D', ..., 'U', 'V', 'W'], dtype=object)


The nunique( ) shows the number of unique values.

income.Index.nunique()

It returns 19 as index column contains distinct 19 values.

Generate Cross Tab

pd.crosstab( ) is used to create a bivariate frequency distribution. Here the bivariate frequency distribution is between Index and State columns.

pd.crosstab(income.Index,income.State)


Creating a frequency distribution

income.Index selects the 'Index' column of 'income' dataset and value_counts( ) creates a frequency distribution. By default ascending = False i.e. it will show the 'Index' having the maximum frequency on the top.

income.Index.value_counts(ascending = True)

F    1
G 1
U 1
L 1
H 1
P 1
R 1
D 2
T 2
S 2
V 2
K 2
O 3
C 3
I 4
W 4
A 4
M 8
N 8
Name: Index, dtype: int64

To draw the samples
income.sample( ) is used to draw random samples from the dataset, containing all the columns. Here n = 5 means we need 5 rows, and frac = 0.1 means we need 10 percent of the rows as the sample.

income.sample(n = 5)
income.sample(frac = 0.1)
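The rows returned by sample( ) change on every call. To make the draw reproducible you can pass random_state; a minimal sketch:

income.sample(n = 5, random_state = 42)   #returns the same 5 rows each time it is run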

Selecting only a few of the columns
To select only specific columns we use either the loc[ ] or iloc[ ] indexer. The rows or columns to be selected are passed as lists. "Index":"Y2008" denotes that all the columns from Index to Y2008 are to be selected.

Syntax of df.loc[ ]
df.loc[row_index , column_index]

income.loc[:,["Index","State","Y2008"]]
income.loc[0:2,["Index","State","Y2008"]] #Selecting rows with Index label 0 to 2 & columns
income.loc[:,"Index":"Y2008"] #Selecting consecutive columns
#In the above command both Index and Y2008 are included.
income.iloc[:,0:5] #Columns from 1 to 5 are included. 6th column not included

Difference between loc and iloc

loc selects rows (or columns) by their labels in the index, whereas iloc selects rows (or columns) by their positions in the index, so it only takes integers.

x = pd.DataFrame({"var1" : np.arange(1,20,2)}, index=[9,8,7,6,10, 1, 2, 3, 4, 5])

    var1
9 1
8 3
7 5
6 7
10 9
1 11
2 13
3 15
4 17
5 19
iloc Code
x.iloc[:3]

Output:
var1
9 1
8 3
7 5
loc code
x.loc[:3]

Output:
var1
9 1
8 3
7 5
6 7
10 9
1 11
2 13
3 15
Note that loc[:3] returns every row up to and including the row labeled 3 (8 rows here), whereas iloc[:3] returns only the first three rows by position. You can also use the following syntax to select specific variables.

income[["Index","State","Y2008"]]


Renaming the variables
We create a dataframe 'data' for information of people and their respective zodiac signs.

data = pd.DataFrame({"A" : ["John","Mary","Julia","Kenny","Henry"], "B" : ["Libra","Capricorn","Aries","Scorpio","Aquarius"]})
data

       A          B
0 John Libra
1 Mary Capricorn
2 Julia Aries
3 Kenny Scorpio
4 Henry Aquarius
If all the columns are to be renamed then we can use data.columns and assign the list of new column names.

#Renaming all the variables.
data.columns = ['Names','Zodiac Signs']


   Names Zodiac Signs
0 John Libra
1 Mary Capricorn
2 Julia Aries
3 Kenny Scorpio
4 Henry Aquarius
If only some of the variables are to be renamed then we can use rename( ) function where the new names are passed in the form of a dictionary.

#Renaming only some of the variables.
data.rename(columns = {"Names":"Cust_Name"},inplace = True)

  Cust_Name Zodiac Signs
0 John Libra
1 Mary Capricorn
2 Julia Aries
3 Kenny Scorpio
4 Henry Aquarius
By default in pandas inplace = False which means that no changes are made in the original dataset. Thus if we wish to alter the original dataset we need to define inplace = True.

Suppose we want to replace only a particular character in the list of the column names then we can use str.replace( ) function. For example, renaming the variables which contain "Y" as "Year"

income.columns = income.columns.str.replace('Y' , 'Year ')
income.columns

Index(['Index', 'State', 'Year 2002', 'Year 2003', 'Year 2004', 'Year 2005',
'Year 2006', 'Year 2007', 'Year 2008', 'Year 2009', 'Year 2010',
'Year 2011', 'Year 2012', 'Year 2013', 'Year 2014', 'Year 2015'],
dtype='object')
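The rest of this tutorial keeps using the short names (Y2002, Y2008, ...), so if you ran the command above you can revert it with the same function; a minimal sketch:

income.columns = income.columns.str.replace('Year ' , 'Y')   #restores the original column names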

Setting one column in the data frame as the index
Using set_index("column name") we can make that column the index of the data frame; it is then removed from the columns.

income.set_index("Index",inplace = True)
income.head()
#Note that the indices have changed and Index column is now no more a column
income.columns
income.reset_index(inplace = True)
income.head()

reset_index( ) restores the default integer index and turns the current index back into a column.

Removing the columns and rows
To drop a column we use drop( ), where the first argument is the column (or a list of columns) to be removed.

By default axis = 0, which means the labels are looked up in the index and rows are dropped. To remove a column we need to set axis = 1.

income.drop('Index',axis = 1)

#Alternatively
income.drop("Index",axis = "columns")
income.drop(['Index','State'],axis = 1)
income.drop(0,axis = 0)
income.drop(0,axis = "index")
income.drop([0,1,2,3],axis = 0)

Also inplace = False by default, thus no alterations are made in the original dataset. axis = "columns" and axis = "index" mean that a column and a row (index label) should be removed, respectively.

Sorting the data
To sort the data, the sort_values( ) function is used. By default inplace = False and ascending = True.

income.sort_values("State",ascending = False)
income.sort_values("State",ascending = False,inplace = True)
income.Y2006.sort_values()

The Index column has duplicate values, so we first sort the dataframe by Index and then, within each Index, sort the values by Y2002:

income.sort_values(["Index","Y2002"])

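ascending can also be a list, so each sort key gets its own direction; a minimal sketch:

income.sort_values(["Index","Y2002"], ascending = [True, False])  #Index ascending, Y2002 descending within each Index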

Create new variables
Using eval( ), arithmetic operations on various columns of a dataset can be carried out.

income["difference"] = income.Y2008-income.Y2009

#Alternatively
income["difference2"] = income.eval("Y2008 - Y2009")
income.head()

  Index       State    Y2002    Y2003    Y2004    Y2005    Y2006    Y2007  \
0 A Alabama 1296530 1317711 1118631 1492583 1107408 1440134
1 A Alaska 1170302 1960378 1818085 1447852 1861639 1465841
2 A Arizona 1742027 1968140 1377583 1782199 1102568 1109382
3 A Arkansas 1485531 1994927 1119299 1947979 1669191 1801213
4 C California 1685349 1675807 1889570 1480280 1735069 1812546

Y2008 Y2009 Y2010 Y2011 Y2012 Y2013 Y2014 Y2015 \
0 1945229.0 1944173 1237582 1440756 1186741 1852841 1558906 1916661
1 1551826.0 1436541 1629616 1230866 1512804 1985302 1580394 1979143
2 1752886.0 1554330 1300521 1130709 1907284 1363279 1525866 1647724
3 1188104.0 1628980 1669295 1928238 1216675 1591896 1360959 1329341
4 1487315.0 1663809 1624509 1639670 1921845 1156536 1388461 1644607

difference difference2
0 1056.0 1056.0
1 115285.0 115285.0
2 198556.0 198556.0
3 -440876.0 -440876.0
4 -176494.0 -176494.0

income.ratio = income.Y2008/income.Y2009

The above command does not create a new column (it only sets an attribute on the dataframe), thus to create new columns we need to use square brackets.
We can also use the assign( ) function, but it does not make changes in the original data as there is no inplace parameter. Hence we need to save the result in a new dataset.

data = income.assign(ratio = (income.Y2008 / income.Y2009))
data.head()

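assign( ) can also take a function (usually a lambda), which is handy inside a method chain because the function receives the dataframe built so far; a minimal sketch:

data = income.assign(ratio = lambda df: df.Y2008 / df.Y2009)   #same result as above
data.head()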

Finding Descriptive Statistics
describe( ) is used to find summary statistics like the mean, minimum, quartiles etc. for numeric variables.

income.describe() #for numeric variables

              Y2002         Y2003         Y2004         Y2005         Y2006  \
count 5.100000e+01 5.100000e+01 5.100000e+01 5.100000e+01 5.100000e+01
mean 1.566034e+06 1.509193e+06 1.540555e+06 1.522064e+06 1.530969e+06
std 2.464425e+05 2.641092e+05 2.813872e+05 2.671748e+05 2.505603e+05
min 1.111437e+06 1.110625e+06 1.118631e+06 1.122030e+06 1.102568e+06
25% 1.374180e+06 1.292390e+06 1.268292e+06 1.267340e+06 1.337236e+06
50% 1.584734e+06 1.485909e+06 1.522230e+06 1.480280e+06 1.531641e+06
75% 1.776054e+06 1.686698e+06 1.808109e+06 1.778170e+06 1.732259e+06
max 1.983285e+06 1.994927e+06 1.979395e+06 1.990062e+06 1.985692e+06

Y2007 Y2008 Y2009 Y2010 Y2011 \
count 5.100000e+01 5.100000e+01 5.100000e+01 5.100000e+01 5.100000e+01
mean 1.553219e+06 1.538398e+06 1.658519e+06 1.504108e+06 1.574968e+06
std 2.539575e+05 2.958132e+05 2.361854e+05 2.400771e+05 2.657216e+05
min 1.109382e+06 1.112765e+06 1.116168e+06 1.103794e+06 1.116203e+06
25% 1.322419e+06 1.254244e+06 1.553958e+06 1.328439e+06 1.371730e+06
50% 1.563062e+06 1.545621e+06 1.658551e+06 1.498662e+06 1.575533e+06
75% 1.780589e+06 1.779538e+06 1.857746e+06 1.639186e+06 1.807766e+06
max 1.983568e+06 1.990431e+06 1.993136e+06 1.999102e+06 1.992996e+06

Y2012 Y2013 Y2014 Y2015
count 5.100000e+01 5.100000e+01 5.100000e+01 5.100000e+01
mean 1.591135e+06 1.530078e+06 1.583360e+06 1.588297e+06
std 2.837675e+05 2.827299e+05 2.601554e+05 2.743807e+05
min 1.108281e+06 1.100990e+06 1.110394e+06 1.110655e+06
25% 1.360654e+06 1.285738e+06 1.385703e+06 1.372523e+06
50% 1.643855e+06 1.531212e+06 1.580394e+06 1.627508e+06
75% 1.866322e+06 1.725377e+06 1.791594e+06 1.848316e+06
max 1.988270e+06 1.994022e+06 1.990412e+06 1.996005e+06
For character or string variables, you can write include = ['object']. It will return the total count, the number of unique values, the most frequently occurring string and its frequency.

income.describe(include = ['object']) #Only for strings / objects

To find out specific descriptive statistics of each column of data frame

income.mean()
income.median()
income.agg(["mean","median"])


Mean, median, maximum and minimum can be obtained for a particular column(s) as:

income.Y2008.mean()
income.Y2008.median()
income.Y2008.min()
income.loc[:,["Y2002","Y2008"]].max()


GroupBy function

To group the data by a categorical variable we use the groupby( ) function, which lets us run operations on each category. The agg( ) function is used to aggregate the data.

The following command finds minimum and maximum values for Y2002 and only mean for Y2003

income.groupby("Index").agg({"Y2002": ["min","max"],"Y2003" : "mean"})

          Y2002                 Y2003
min max mean
Index
A 1170302 1742027 1810289.000
C 1343824 1685349 1595708.000
D 1111437 1330403 1631207.000
F 1964626 1964626 1468852.000
G 1929009 1929009 1541565.000
H 1461570 1461570 1200280.000
I 1353210 1776918 1536164.500
K 1509054 1813878 1369773.000
L 1584734 1584734 1110625.000
M 1221316 1983285 1535717.625
N 1395149 1885081 1382499.625
O 1173918 1802132 1569934.000
P 1320191 1320191 1446723.000
R 1501744 1501744 1942942.000
S 1159037 1631522 1477072.000
T 1520591 1811867 1398343.000
U 1771096 1771096 1195861.000
V 1134317 1146902 1498122.500
W 1677347 1977749 1521118.500
In order to rename the columns after groupby, you can use tuples. See the code below.

income.groupby("Index").agg({"Y2002" : [("Y2002_min","min"),("Y2002_max","max")],
"Y2003" : [("Y2003_mean","mean")]})
Renaming columns can also be done via the method below.

dt = income.groupby("Index").agg({"Y2002": ["min","max"],"Y2003" : "mean"})
dt.columns = ['Y2002_min', 'Y2002_max', 'Y2003_mean']
Groupby more than 1 column

income.groupby(["Index", "State"]).agg({"Y2002": ["min","max"],"Y2003" : "mean"})

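The grouping variables end up in the index of the result. If you prefer them as ordinary columns, you can call reset_index( ) on the grouped result; a minimal sketch:

income.groupby(["Index", "State"]).agg({"Y2002": ["min","max"],"Y2003" : "mean"}).reset_index()  #Index and State become columns again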
Filtering
To filter only those rows which have Index as "A" we write:

income[income.Index == "A"]

#Alternatively
income.loc[income.Index == "A",:]

  Index     State    Y2002    Y2003    Y2004    Y2005    Y2006    Y2007  \
0 A Alabama 1296530 1317711 1118631 1492583 1107408 1440134
1 A Alaska 1170302 1960378 1818085 1447852 1861639 1465841
2 A Arizona 1742027 1968140 1377583 1782199 1102568 1109382
3 A Arkansas 1485531 1994927 1119299 1947979 1669191 1801213

Y2008 Y2009 Y2010 Y2011 Y2012 Y2013 Y2014 Y2015
0 1945229 1944173 1237582 1440756 1186741 1852841 1558906 1916661
1 1551826 1436541 1629616 1230866 1512804 1985302 1580394 1979143
2 1752886 1554330 1300521 1130709 1907284 1363279 1525866 1647724
3 1188104 1628980 1669295 1928238 1216675 1591896 1360959 1329341
To select the States having Index as "A":

income.loc[income.Index == "A","State"]
income.loc[income.Index == "A",:].State

To filter the rows with Index as "A" and income for 2002 greater than 1500000:

income.loc[(income.Index == "A") & (income.Y2002 > 1500000),:]

To filter the rows with index either "A" or "W", we can use isin( ) function:

income.loc[(income.Index == "A") | (income.Index == "W"),:]

#Alternatively.
income.loc[income.Index.isin(["A","W"]),:]

   Index          State    Y2002    Y2003    Y2004    Y2005    Y2006    Y2007  \
0 A Alabama 1296530 1317711 1118631 1492583 1107408 1440134
1 A Alaska 1170302 1960378 1818085 1447852 1861639 1465841
2 A Arizona 1742027 1968140 1377583 1782199 1102568 1109382
3 A Arkansas 1485531 1994927 1119299 1947979 1669191 1801213
47 W Washington 1977749 1687136 1199490 1163092 1334864 1621989
48 W West Virginia 1677347 1380662 1176100 1888948 1922085 1740826
49 W Wisconsin 1788920 1518578 1289663 1436888 1251678 1721874
50 W Wyoming 1775190 1498098 1198212 1881688 1750527 1523124

Y2008 Y2009 Y2010 Y2011 Y2012 Y2013 Y2014 Y2015
0 1945229 1944173 1237582 1440756 1186741 1852841 1558906 1916661
1 1551826 1436541 1629616 1230866 1512804 1985302 1580394 1979143
2 1752886 1554330 1300521 1130709 1907284 1363279 1525866 1647724
3 1188104 1628980 1669295 1928238 1216675 1591896 1360959 1329341
47 1545621 1555554 1179331 1150089 1775787 1273834 1387428 1377341
48 1238174 1539322 1539603 1872519 1462137 1683127 1204344 1198791
49 1980167 1901394 1648755 1940943 1729177 1510119 1701650 1846238
50 1587602 1504455 1282142 1881814 1673668 1994022 1204029 1853858
Alternatively we can use query( ) function and write our filtering criteria:

income.query('Y2002>1700000 & Y2003 > 1500000')

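query( ) also understands the in operator, so the earlier isin( ) filter can be written as a query string; a minimal sketch:

income.query('Index in ["A", "W"]')   #same rows as income.loc[income.Index.isin(["A","W"]),:]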

Dealing with missing values
We create a new dataframe named 'crops'; to create a NaN value we use np.nan after importing numpy.

import numpy as np
mydata = {'Crop': ['Rice', 'Wheat', 'Barley', 'Maize'],
'Yield': [1010, 1025.2, 1404.2, 1251.7],
'cost' : [102, np.nan, 20, 68]}
crops = pd.DataFrame(mydata)
crops

isnull( ) returns True and notnull( ) returns False if the value is NaN.

crops.isnull() #same as is.na in R
crops.notnull() #opposite of previous command.
crops.isnull().sum() #No. of missing values.

crops.cost.isnull() first subsets the 'cost' column from the dataframe and then returns a logical vector with isnull().

crops[crops.cost.isnull()] #shows the rows with NAs.
crops[crops.cost.isnull()].Crop #shows the rows with NAs in crops.Crop
crops[crops.cost.notnull()].Crop #shows the rows without NAs in crops.Crop

To drop all the rows which have missing values in any column we use dropna(how = "any"). By default inplace = False. how = "all" means a row is dropped only if all the elements in that row are missing.

crops.dropna(how = "any").shape
crops.dropna(how = "all").shape

To remove NaNs if any of 'Yield' or'cost' are missing we use the subset parameter and pass a list:

crops.dropna(subset = ['Yield',"cost"],how = 'any').shape
crops.dropna(subset = ['Yield',"cost"],how = 'all').shape

Replacing the missing values in the 'cost' column with "UNKNOWN":

crops['cost'].fillna(value = "UNKNOWN",inplace = True)
crops

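Filling with a string turns the column into an object type. For a numeric column, a common alternative is to impute a statistic such as the mean; a minimal sketch (applied before the replacement above):

crops['cost'].fillna(value = crops['cost'].mean())   #returns a Series with NaN replaced by the mean cost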

Dealing with duplicates
We create a new dataframe comprising of items and their respective prices.

data = pd.DataFrame({"Items" : ["TV","Washing Machine","Mobile","TV","TV","Washing Machine"], "Price" : [10000,50000,20000,10000,10000,40000]})
data

             Items  Price
0 TV 10000
1 Washing Machine 50000
2 Mobile 20000
3 TV 10000
4 TV 10000
5 Washing Machine 40000
duplicated() returns a logical vector that is True whenever a duplicated row is encountered.

data.loc[data.duplicated(),:]
data.loc[data.duplicated(keep = "first"),:]

By default keep = 'first', i.e. the first occurrence is considered a unique value and its repetitions are considered as duplicates.
If keep = "last", the last occurrence is considered a unique value and all its earlier repetitions are considered as duplicates.

data.loc[data.duplicated(keep = "last"),:] #the last entries are not shown; the indices have changed.

If keep = False (the boolean, not a string) then it considers all the occurrences of the repeated observations as duplicates.

data.loc[data.duplicated(keep = False),:] #all the duplicates, including unique are shown.

To drop the duplicates drop_duplicates( ) is used, with inplace = False by default; keep = 'first', 'last' or False have the same meanings as in duplicated( ).

data.drop_duplicates(keep = "first")
data.drop_duplicates(keep = "last")
data.drop_duplicates(keep = False,inplace = True) #by default inplace = False
data

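drop_duplicates( ) can also look at a subset of columns when deciding what counts as a duplicate; a minimal sketch (applied to the original items data, before the inplace drop above):

data.drop_duplicates(subset = ["Items"], keep = "first")   #one row per distinct item, ignoring Price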

Creating dummies
Now we will consider the iris dataset.

iris = pd.read_csv("C:\\Users\\Hp\\Desktop\\work\\Python\\Basics\\pandas\\iris.csv")
iris.head()

   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa
The map( ) function is used to match the values and replace them, automatically creating a new series.

iris["setosa"] = iris.Species.map({"setosa" : 1,"versicolor":0, "virginica" : 0})
iris.head()

To create dummies, get_dummies( ) is used. The argument prefix = "Species" adds the prefix 'Species_' to the names of the new columns.

pd.get_dummies(iris.Species,prefix = "Species")
pd.get_dummies(iris.Species,prefix = "Species").iloc[:,0:1] #1 is not included
species_dummies = pd.get_dummies(iris.Species,prefix = "Species").iloc[:,0:]

With concat( ) function we can join multiple series or dataframes. axis = 1 denotes that they should be joined columnwise.

iris = pd.concat([iris,species_dummies],axis = 1)
iris.head()

   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species  \
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa

Species_setosa Species_versicolor Species_virginica
0 1 0 0
1 1 0 0
2 1 0 0
3 1 0 0
4 1 0 0
It is usual that for a variable with 'n' categories we create 'n-1' dummies, thus to drop the first dummy column we write drop_first = True.

pd.get_dummies(iris,columns = ["Species"],drop_first = True).head()


Ranking
To create a dataframe of all the ranks we use rank( )

iris.rank()

Ranking by a specific variable
Suppose we want to rank the Sepal.Length for different species in ascending order:

iris['Rank2'] = iris['Sepal.Length'].groupby(iris["Species"]).rank(ascending=1)
iris.head()


Calculating the Cumulative sum
Using cumsum( ) function we can obtain the cumulative sum

iris['cum_sum'] = iris["Sepal.Length"].cumsum()
iris.head()

Cumulative sum by a variable
To find the cumulative sum of sepal lengths for different species we use groupby( ) and then use cumsum( )

iris["cumsum2"] = iris.groupby(["Species"])["Sepal.Length"].cumsum()
iris.head()


Calculating the percentiles.
Various quantiles can be obtained by using quantile( )

iris.quantile(0.5)
iris.quantile([0.1,0.2,0.5])
iris.quantile(0.55)

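Quantiles can also be computed per group by combining groupby( ) with quantile( ); a minimal sketch:

iris.groupby("Species")["Sepal.Length"].quantile(0.5)   #median sepal length for each species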

if else in Python
We create a new dataframe of students' names and their respective zodiac signs.

students = pd.DataFrame({'Names': ['John','Mary','Henry','Augustus','Kenny'],
'Zodiac Signs': ['Aquarius','Libra','Gemini','Pisces','Virgo']})

def name(row):
    if row["Names"] in ["John","Henry"]:
        return "yes"
    else:
        return "no"

students['flag'] = students.apply(name, axis=1)
students
Functions in Python are defined using the keyword def, followed by the function's name. The apply( ) function applies a function along the rows or columns of a dataframe.

Note: when using a plain 'if else' we need to take care of the indentation; Python does not use curly braces for loops or if/else blocks.

Output
      Names Zodiac Signs flag
0 John Aquarius yes
1 Mary Libra no
2 Henry Gemini yes
3 Augustus Pisces no
4 Kenny Virgo no

Alternatively, by importing numpy we can use np.where. The first argument is the condition to be evaluated, the second argument is the value if the condition is True, and the last argument is the value if the condition is False.

import numpy as np
students['flag'] = np.where(students['Names'].isin(['John','Henry']), 'yes', 'no')
students


Multiple Conditions : If Else-if Else
def mname(row):
    if row["Names"] == "John" and row["Zodiac Signs"] == "Aquarius":
        return "yellow"
    elif row["Names"] == "Mary" and row["Zodiac Signs"] == "Libra":
        return "blue"
    elif row["Zodiac Signs"] == "Pisces":
        return "blue"
    else:
        return "black"

students['color'] = students.apply(mname, axis=1)
students

We create a list of conditions and a list of their respective values when True, and use np.select, whose default argument is the value used when all the conditions are False.

conditions = [
(students['Names'] == 'John') & (students['Zodiac Signs'] == 'Aquarius'),
(students['Names'] == 'Mary') & (students['Zodiac Signs'] == 'Libra'),
(students['Zodiac Signs'] == 'Pisces')]
choices = ['yellow', 'blue', 'purple']
students['color'] = np.select(conditions, choices, default='black')
students

      Names Zodiac Signs flag   color
0 John Aquarius yes yellow
1 Mary Libra no blue
2 Henry Gemini yes black
3 Augustus Pisces no purple
4 Kenny Virgo no black

Select numeric or categorical columns only
To select only the numeric columns we use select_dtypes( ).

data1 = iris.select_dtypes(include=[np.number])
data1.head()

_get_numeric_data (an internal pandas method) also provides a utility to select the numeric columns only.

data3 = iris._get_numeric_data()
data3.head(3)

   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  cum_sum  cumsum2
0 5.1 3.5 1.4 0.2 5.1 5.1
1 4.9 3.0 1.4 0.2 10.0 10.0
2 4.7 3.2 1.3 0.2 14.7 14.7
For selecting categorical variables

data4 = iris.select_dtypes(include = ['object'])
data4.head(2)

 Species
0 setosa
1 setosa

Concatenating
We create 2 dataframes containing the details of the students:

students = pd.DataFrame({'Names': ['John','Mary','Henry','Augustus','Kenny'],
'Zodiac Signs': ['Aquarius','Libra','Gemini','Pisces','Virgo']})
students2 = pd.DataFrame({'Names': ['John','Mary','Henry','Augustus','Kenny'],
'Marks' : [50,81,98,25,35]})

Using the pd.concat( ) function we can join the 2 dataframes:

data = pd.concat([students,students2]) #by default axis = 0

   Marks     Names Zodiac Signs
0 NaN John Aquarius
1 NaN Mary Libra
2 NaN Henry Gemini
3 NaN Augustus Pisces
4 NaN Kenny Virgo
0 50.0 John NaN
1 81.0 Mary NaN
2 98.0 Henry NaN
3 25.0 Augustus NaN
4 35.0 Kenny NaN
By default axis = 0, thus the second dataframe is appended row-wise. If a column is not present in one of the dataframes, NaNs are created. To join column-wise we set axis = 1.

data = pd.concat([students,students2],axis = 1)
data

      Names Zodiac Signs  Marks     Names
0 John Aquarius 50 John
1 Mary Libra 81 Mary
2 Henry Gemini 98 Henry
3 Augustus Pisces 25 Augustus
4 Kenny Virgo 35 Kenny
Using the append function we can also join the dataframes row-wise:

students.append(students2) #for rows

Alternatively, we can create a dictionary of the two data frames and use pd.concat to join them row-wise, which labels each block with the dictionary key:

classes = {'x': students, 'y': students2}
result = pd.concat(classes)
result

     Marks     Names Zodiac Signs
x 0 NaN John Aquarius
1 NaN Mary Libra
2 NaN Henry Gemini
3 NaN Augustus Pisces
4 NaN Kenny Virgo
y 0 50.0 John NaN
1 81.0 Mary NaN
2 98.0 Henry NaN
3 25.0 Augustus NaN
4 35.0 Kenny NaN
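The same labelled result can be obtained directly with the keys argument of pd.concat; a minimal sketch:

result = pd.concat([students, students2], keys = ['x', 'y'])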

Merging or joining on the basis of common variable.
We take 2 dataframes with different number of observations:

students = pd.DataFrame({'Names': ['John','Mary','Henry','Maria'],
'Zodiac Signs': ['Aquarius','Libra','Gemini','Capricorn']})
students2 = pd.DataFrame({'Names': ['John','Mary','Henry','Augustus','Kenny'],
'Marks' : [50,81,98,25,35]})

Using pd.merge we can join the two dataframes. on = 'Names' specifies that 'Names' is the common variable on the basis of which the dataframes are to be combined.

result = pd.merge(students, students2, on='Names') #it only takes intersections
result

   Names Zodiac Signs  Marks
0 John Aquarius 50
1 Mary Libra 81
2 Henry Gemini 98
By default how = "inner", thus it keeps only the elements common to both dataframes. If you want all the elements from both dataframes, set how = "outer".

result = pd.merge(students, students2, on='Names',how = "outer") #it takes the union of both dataframes
result

      Names Zodiac Signs  Marks
0 John Aquarius 50.0
1 Mary Libra 81.0
2 Henry Gemini 98.0
3 Maria Capricorn NaN
4 Augustus NaN 25.0
5 Kenny NaN 35.0
To keep all the rows of the left dataframe (and only the matching rows from the right) set how = 'left':

result = pd.merge(students, students2, on='Names',how = "left")
result

   Names Zodiac Signs  Marks
0 John Aquarius 50.0
1 Mary Libra 81.0
2 Henry Gemini 98.0
3 Maria Capricorn NaN
Similarly, how = 'right' keeps all the rows of the right dataframe and only the matching rows from the left.

result = pd.merge(students, students2, on='Names',how = "right",indicator = True)
result

      Names Zodiac Signs  Marks      _merge
0 John Aquarius 50 both
1 Mary Libra 81 both
2 Henry Gemini 98 both
3 Augustus NaN 25 right_only
4 Kenny NaN 35 right_only

indicator = True creates a column (_merge) indicating whether each row is present in both dataframes or only in the left or right one.

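If the key column is named differently in the two dataframes, pd.merge accepts left_on and right_on instead of on. A minimal sketch with a hypothetical column name 'Student':

students3 = students2.rename(columns = {"Names": "Student"})      #hypothetical frame whose key column is 'Student'
pd.merge(students, students3, left_on = "Names", right_on = "Student", how = "inner")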
About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains.

Let's Get Connected: LinkedIn

22 Apr 2019 5:07am GMT

10 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: King Willams Town Station

Yesterday morning I had to go to the station in KWT to pick up our reserved bus tickets for the Christmas holidays in Cape Town. The station itself has had no train service since December for cost reasons, but Translux and co, the long-distance bus companies, have their offices there.


Larger map view




© benste CC NC SA

10 Nov 2011 10:57am GMT

09 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein

Nobody is worried about this sort of thing - you simply drive through by car, and in the city, near Gnobie: "no, it only gets dangerous once the fire brigade is there" - 30 minutes later, on the way back, the fire brigade was there.




© benste CC NC SA

09 Nov 2011 8:25pm GMT

08 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Brai Party

Brai = a barbecue evening or the like.

They would like a technician to patch their SpeakOn / jack plug splitters...

The ladies, the "Mamas" of the settlement, during the official opening speech

Even though fewer people showed up than expected - loud music and lots of people ...

And of course a fire with real wood for the barbecue.

© benste CC NC SA

08 Nov 2011 2:30pm GMT

07 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Lumanyano Primary

One of our missions was bringing Katja's Linux Server back to her room. While doing that we saw her new decoration.

Björn and Simphiwe carried the PC to Katja's school


© benste CC NC SA

07 Nov 2011 2:00pm GMT

06 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Nelisa Haircut

Today I went with Björn to Needs Camp to visit Katja's guest family for a special party. First of all we visited some friends of Nelisa - yeah, the one I'm working with in Quigney - Katja's guest father's sister - who gave her a haircut.

African women usually get their hair done by arranging extensions, not like Europeans, who just cut some hair.

In between she looked like this...

And then she was done - looks amazing considering the amount of hair she had last week - doesn't it ?

© benste CC NC SA

06 Nov 2011 7:45pm GMT

05 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: My Saturday

Somehow it occurred to me today that I need to restructure my blog posts a bit - if I only ever report on new places, I would have to be on a round trip. So here are a few things from my everyday life today.

First of all, Saturday counts as a day off, at least for us volunteers.

This weekend only Rommel and I are on the farm - Katja and Björn are now at their placements, and my housemates Kyle and Jonathan are at home in Grahamstown, as is Sipho, who lives in Dimbaza.
Robin, Rommel's wife, has been in Woodie Cape since Thursday to take care of a few things there.
Anyway, this morning we first treated ourselves to a shared Weetbix/muesli breakfast and then set off for East London. Two things were on the checklist: Vodacom and Ethienne (the estate agent), plus dropping off the missing items at NeedsCamp on the way back.

Just after setting off on the dirt road we realised that we had not packed the things for NeedsCamp and Ethienne, but did have the pump for the water supply in the car.

So in East London we first drove to Farmerama - no, not the online game Farmville, but a shop with all kinds of things for a farm - in Berea, a northern suburb.

At Farmerama we got advice on a quick-release coupling that should make life with the pump easier, and we also dropped off a lighter pump for repair, so that it is not always such a big effort whenever the water runs out again.

Fego Caffé is in the Hemmingways Mall; there we had to get the PIN and PUK of one of our data SIM cards, because unfortunately some digits got mixed up when entering the PIN. In any case, the shops in South Africa store data as sensitive as a PUK - which in principle gives access to a locked phone.

In the café Rommel then carried out a few online transactions with the 3G modem, which was working again now - and which, by the way, now works perfectly in Ubuntu, my Linux system.

In the meantime I went to 8ta to find out about their new deals, since we want to offer internet in some of Hilltop's centres. The picture shows the UMTS coverage in NeedsCamp, Katja's place. 8ta is a new phone provider from Telkom; after Vodafone bought Telkom's share of Vodacom, they have to build everything up from scratch.
We decided to organise a free prepaid card to test, because who knows how accurate the coverage map above is ... Before you sign even the cheapest 24-month deal, you should know whether it works.

After that we went to Checkers in Vincent, looking for two hotplates for Woodie Cape - R 129.00 each, so about 12 € for a two-ring hotplate.
As you can see in the background, there is already Christmas decoration - at the beginning of November, and that in South Africa at a sunny, warm 25°C or more.

For lunch we treated ourselves to a Pakistani curry takeaway - highly recommended!
Well, and after we got back an hour or so ago I also cleaned the fridge, which I had simply put outside this morning to defrost. Now it is clean again and without a 3 m thick layer of ice...

Tomorrow ... well, I will report on that separately ... but probably not until Monday, because then I will be in Quigney (East London) again and have free internet.

© benste CC NC SA

05 Nov 2011 4:33pm GMT

31 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Sterkspruit Computer Center

Sterkspruit is one of Hilltops Computer Centres in the far north of Eastern Cape. On the trip to J'burg we've used the opportunity to take a look at the centre.

Pupils in the big classroom


The Trainer


School in Countryside


Adult Class in the Afternoon


"Town"


© benste CC NC SA

31 Oct 2011 4:58pm GMT

Benedict Stein: Technical Issues

What do you do in an internet cafe when your ADSL and fax line have been discontinued before month's end? Well, my idea was sitting outside and eating some ice cream.
At least it's sunny and not as rainy as on the weekend.


© benste CC NC SA

31 Oct 2011 3:11pm GMT

30 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Nellis Restaurant

For those who are traveling through Zastron - there is a very nice restaurant which serves delicious food at reasonable prices.
In addition they're selling home-made juices, jams and honey.




interior


home made specialities - the shop in the shop


the Bar


© benste CC NC SA

30 Oct 2011 4:47pm GMT

29 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: The way back from J'burg

On the 10 - 12h trip from J'burg back to ELS I was able to take a lot of pictures, including these different roadsides.

Plain Street


Orange River in its beginnings (near Lesotho)


Zastron Anglican Church


The Bridge in Between "Free State" and Eastern Cape next to Zastron


my new Background ;)


If you listen to GoogleMaps you'll end up traveling 50km of gravel road - as it had just been renewed we didn't have that many problems and saved 1h compared to going the official way with all its construction sites.




Freeway


getting dark


© benste CC NC SA

29 Oct 2011 4:23pm GMT

28 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: How does a road construction site actually work?

Sure, some things may be different, but a lot is the same - still, the sight of a road construction site is an everyday one in Germany - how does that actually work in South Africa?

First of all - NO, no locals digging with their hands - even though more manpower is used here, they are busy working with technology.

A completely normal "national road"


and how it is being widened


loooots of trucks


because here one side is completely closed over a long stretch, so that a traffic-light arrangement with a waiting time of 45 minutes results


But at least they seem to be having fun ;) - as did we, because luckily we never had to wait longer than 10 minutes.

© benste CC NC SA

28 Oct 2011 4:20pm GMT