21 Oct 2019

feedPlanet Python

Stack Abuse: Introduction to PyTorch for Classification

PyTorch and TensorFlow libraries are two of the most commonly used Python libraries for deep learning. PyTorch is developed by Facebook, while TensorFlow is a Google project. In this article, you will see how the PyTorch library can be used to solve classification problems.

Classification problems belong to the category of machine learning problems where given a set of features, the task is to predict a discrete value. Predicting whether a tumour is cancerous or not, or whether a student is likely to pass or fail in the exam, are some of the common examples of classification problems.

In this article, given certain characteristics of a bank customer, we will predict whether or not the customer is likely to leave the bank after 6 months. The phenomena where a customer leaves an organization is also called customer churn. Therefore, our task is to predict customer churn based on various customer characteristics.

Before you proceed, it is assumed that you have intermediate level proficiency with the Python programming language and you have installed the PyTorch library. Also, know-how of basic machine learning concepts may help. If you have not installed PyTorch, you can do so with the following pip command:

$ pip install pytorch

The Dataset

The dataset that we are going to use in this article is freely available at this Kaggle link. Let's import the required libraries, and the dataset into our Python application:

import torch
import torch.nn as nn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

We can use the read_csv() method of the pandas library to import the CSV file that contains our dataset.

dataset = pd.read_csv(r'E:Datasets\customer_data.csv')

Let's print the shape of our dataset:

dataset.shape

Output:

(10000, 14)

The output shows that the dataset has 10 thousand records and 14 columns.

We can use the head() method of the pandas dataframe to print the first five rows of our dataset.

dataset.head()

Output:

alt

You can see the 14 columns in our dataset. Based on the first 13 columns, our task is to predict the value for the 14th column i.e. Exited. It is important to mention that the values for the first 13 columns are recorded 6 months before the value for the Exited column was obtained since the task is to predict customer churn after 6 months from the time when the customer information is recorded.

Exploratory Data Analysis

Let's perform some exploratory data analysis on our dataset. We'll first predict the ratio of the customer who actually left the bank after 6 months and will use a pie plot to visualize.

Let's first increase the default plot size for the graphs:

fig_size = plt.rcParams["figure.figsize"]
fig_size[0] = 10
fig_size[1] = 8
plt.rcParams["figure.figsize"] = fig_size

The following script draws the pie plot for the Exited column.

dataset.Exited.value_counts().plot(kind='pie', autopct='%1.0f%%', colors=['skyblue', 'orange'], explode=(0.05, 0.05))

Output:

alt

The output shows that in our dataset, 20% of the customers left the bank. Here 1 belongs to the case where the customer left the bank, where 0 refers to the scenario where a customer didn't leave the bank.

Let's plot the number of customers from all the geographical locations in the dataset:

sns.countplot(x='Geography', data=dataset)

Output:

alt

The output shows that almost half of the customers belong to France, while the ratio of customers belonging to Spain and Germany is 25% each.

Let's now plot number of customers from each unique geographical location along with customer churn information. We can use the countplot() function from the seaborn library to do so.

sns.countplot(x='Exited', hue='Geography', data=dataset)

Output:

alt

The output shows that though the overall number of French customers is twice that of the number of Spanish and German customers, the ratio of customers who left the bank is the same for French and German customers. Similarly, the overall number of German and Spanish customers is the same, but the number of German customers who left the bank is twice that of the Spanish customers, which shows that German customers are more likely to leave the bank after 6 months.

In this article, we will not visually plot the information related to the rest of the columns in our dataset, but if you want to do so, you check my article on how to perform exploratory data analysis with Python Seaborn Library.

Data Preprocessing

Before we train our PyTorch model, we need to preprocess our data. If you look at the dataset, you will see that it has two types of columns: Numerical and Categorical. The numerical columns contains numerical information. CreditScore, Balance, Age, etc. Similarly, Geography and Gender are categorical columns since they contain categorical information such as the locations and genders of the customers. There are a few columns that can be treated as numeric as well as categorical. For instance, the HasCrCard column can have 1 or 0 as its values. However, the HasCrCard columns contains information about whether or not a customer has credit card. It is advised that the column that can be treated as both categorical and numerical, are treated as categorical. However, it totally depends upon the domain knowledge of the dataset.

Let's again print all the columns in our dataset and find out which of the columns can be treated as numerical and which columns should be treated as categorical. The columns attribute of a dataframe prints all the column names:

dataset.columns

Output:

Index(['RowNumber', 'CustomerId', 'Surname', 'CreditScore', 'Geography',
       'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard',
       'IsActiveMember', 'EstimatedSalary', 'Exited'],
      dtype='object')

From the columns in our dataset, we will not use the RowNumber, CustomerId, and Surname columns since the values for these columns are totally random and have no relation with the output. For instance, a customer's surname has no impact on whether or not the customer will leave the bank. Among the rest of the columns, Geography, Gender, HasCrCard, and IsActiveMember columns can be treated as categorical columns. Let's create a list of these columns:

categorical_columns = ['Geography', 'Gender', 'HasCrCard', 'IsActiveMember']

All of the remaining columns except the Exited column can be treated as numerical columns.

numerical_columns = ['CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'EstimatedSalary']

Finally, the output (the values from the Exited column) are stored in the outputs variable.

outputs = ['Exited']

We have created lists of categorical, numeric, and output columns. However, at the moment the type of the categorical columns is not categorical. You can check the type of all the columns in the dataset with the following script:

dataset.dtypes

Output:

RowNumber            int64
CustomerId           int64
Surname             object
CreditScore          int64
Geography           object
Gender              object
Age                  int64
Tenure               int64
Balance            float64
NumOfProducts        int64
HasCrCard            int64
IsActiveMember       int64
EstimatedSalary    float64
Exited               int64
dtype: object

You can see that the type for Geography and Gender columns is object and the type for HasCrCard and IsActive columns is int64. We need to convert the types for categorical columns to category. We can do so using the astype() function, as shown below:

for category in categorical_columns:
    dataset[category] = dataset[category].astype('category')

Now if you again plot the types for the columns in our dataset, you should see the following results:

dataset.dtypes

Output

RowNumber             int64
CustomerId            int64
Surname              object
CreditScore           int64
Geography          category
Gender             category
Age                   int64
Tenure                int64
Balance             float64
NumOfProducts         int64
HasCrCard          category
IsActiveMember     category
EstimatedSalary     float64
Exited                int64
dtype: object

Let's now see all the categories in the Geography column:

dataset['Geography'].cat.categories

Output:

Index(['France', 'Germany', 'Spain'], dtype='object')

When you change a column's data type to category, each category in the column is assigned a unique code. For instance, let's plot the first five rows of the Geography column and print the code values for the first five rows:

dataset['Geography'].head()

Output:

0    France
1     Spain
2    France
3    France
4     Spain
Name: Geography, dtype: category
Categories (3, object): [France, Germany, Spain]

The following script plots the codes for the values in the first five rows of the Geography column:

dataset['Geography'].head().cat.codes

Output:

0    0
1    2
2    0
3    0
4    2
dtype: int8

The output shows that France has been coded as 0, and Spain has been coded as 2.

The basic purpose of separating categorical columns from the numerical columns is that values in the numerical column can be directly fed into neural networks. However, the values for the categorical columns first have to be converted into numeric types. The coding of the values in the categorical column partially solves the task of numerical conversion of the categorical columns.

Since we will be using PyTorch for model training, we need to convert our categorical and numerical columns to tensors.

Let's first convert the categorical columns to tensors. In PyTorch, tensors can be created via the numpy arrays. We will first convert data in the four categorical columns into numpy arrays and then stack all the columns horizontally, as shown in the following script:

geo = dataset['Geography'].cat.codes.values
gen = dataset['Gender'].cat.codes.values
hcc = dataset['HasCrCard'].cat.codes.values
iam = dataset['IsActiveMember'].cat.codes.values

categorical_data = np.stack([geo, gen, hcc, iam], 1)

categorical_data[:10]

The above script prints the first five records from the categorical columns, stacked horizontally. The output is as follows:

Output:

array([[0, 0, 1, 1],
       [2, 0, 0, 1],
       [0, 0, 1, 0],
       [0, 0, 0, 0],
       [2, 0, 1, 1],
       [2, 1, 1, 0],
       [0, 1, 1, 1],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 1, 1, 1]], dtype=int8)

Now to create a tensor from the aforementioned numpy array, you can simply pass the array to the tensor class of the torch module. Remember, for the categorical columns the data type should be torch.int64.

categorical_data = torch.tensor(categorical_data, dtype=torch.int64)
categorical_data[:10]

Output:

tensor([[0, 0, 1, 1],
        [2, 0, 0, 1],
        [0, 0, 1, 0],
        [0, 0, 0, 0],
        [2, 0, 1, 1],
        [2, 1, 1, 0],
        [0, 1, 1, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 1, 1, 1]])

In the output, you can see that the numpy array of categorical data has now been converted into a tensor object.

In the same way, we can convert our numerical columns to tensors:

numerical_data = np.stack([dataset[col].values for col in numerical_columns], 1)
numerical_data = torch.tensor(numerical_data, dtype=torch.float)
numerical_data[:5]

Output:

tensor([[6.1900e+02, 4.2000e+01, 2.0000e+00, 0.0000e+00, 1.0000e+00, 1.0135e+05],
        [6.0800e+02, 4.1000e+01, 1.0000e+00, 8.3808e+04, 1.0000e+00, 1.1254e+05],
        [5.0200e+02, 4.2000e+01, 8.0000e+00, 1.5966e+05, 3.0000e+00, 1.1393e+05],
        [6.9900e+02, 3.9000e+01, 1.0000e+00, 0.0000e+00, 2.0000e+00, 9.3827e+04],
        [8.5000e+02, 4.3000e+01, 2.0000e+00, 1.2551e+05, 1.0000e+00, 7.9084e+04]])

In the output, you can see the first five rows containing the values for the six numerical columns in our dataset.

The final step is to convert the output numpy array into a tensor object.

outputs = torch.tensor(dataset[outputs].values).flatten()
outputs[:5]

Output:

tensor([1, 0, 1, 0, 0])

Let now plot the shape of our categorial data, numerical data, and the corresponding output:

print(categorical_data.shape)
print(numerical_data.shape)
print(outputs.shape)

Output:

torch.Size([10000, 4])
torch.Size([10000, 6])
torch.Size([10000])

There is a one very important step before we can train our model. We converted our categorical columns to numerical where a unique value is represented by a single integer. For instance, in the Geography column, we saw that France is represented by 0 and Germany is represented by 1. We can use these values to train our model. However, a better way is to represent values in a categorical column is in the form of an N-dimensional vector, instead of a single integer. A vector is capable of capturing more information and can find relationships between different categorical values in a more appropriate way. Therefore, we will represent values in the categorical columns in the form of N-dimensional vectors. This process is called embedding.

We need to define the embedding size (vector dimensions) for all the categorical columns. There is no hard and fast rule regarding the number of dimensions. A good rule of thumb to define the embedding size for a column is to divide the number of unique values in the column by 2 (but not exceeding 50). For instance, for the Geography column, the number of unique values is 3. The corresponding embedding size for the Geography column will be 3/2 = 1.5 = 2 (round off).

The following script creates a tuple that contains the number of unique values and the dimension sizes for all the categorical columns:

categorical_column_sizes = [len(dataset[column].cat.categories) for column in categorical_columns]
categorical_embedding_sizes = [(col_size, min(50, (col_size+1)//2)) for col_size in categorical_column_sizes]
print(categorical_embedding_sizes)

Output:

[(3, 2), (2, 1), (2, 1), (2, 1)]

A supervised deep learning model, such as the one we are developing in this article, is trained using training data and the model performance is evaluated on the test dataset. Therefore, we need to divide our dataset into training and test sets as shown in the following script:

total_records = 10000
test_records = int(total_records * .2)

categorical_train_data = categorical_data[:total_records-test_records]
categorical_test_data = categorical_data[total_records-test_records:total_records]
numerical_train_data = numerical_data[:total_records-test_records]
numerical_test_data = numerical_data[total_records-test_records:total_records]
train_outputs = outputs[:total_records-test_records]
test_outputs = outputs[total_records-test_records:total_records]

We have 10 thousand records in our dataset, of which 80% records, i.e. 8000 records, will be used to train the model while the remaining 20% records will be used to evaluate the performance of our model. Notice, in the script above, the categorical and numerical data, as well as the outputs have been divided into the training and test sets.

To verify that we have correctly divided data into training and test sets, let's print the lengths of the training and test records:

print(len(categorical_train_data))
print(len(numerical_train_data))
print(len(train_outputs))

print(len(categorical_test_data))
print(len(numerical_test_data))
print(len(test_outputs))

Output:

8000
8000
8000
2000
2000
2000

Creating a Model for Prediction

We have divided the data into training and test sets, now is the time to define our model for training. To do so, we can define a class named Model, which will be used to train the model. Look at the following script:

class Model(nn.Module):

    def __init__(self, embedding_size, num_numerical_cols, output_size, layers, p=0.4):
        super().__init__()
        self.all_embeddings = nn.ModuleList([nn.Embedding(ni, nf) for ni, nf in embedding_size])
        self.embedding_dropout = nn.Dropout(p)
        self.batch_norm_num = nn.BatchNorm1d(num_numerical_cols)

        all_layers = []
        num_categorical_cols = sum((nf for ni, nf in embedding_size))
        input_size = num_categorical_cols + num_numerical_cols

        for i in layers:
            all_layers.append(nn.Linear(input_size, i))
            all_layers.append(nn.ReLU(inplace=True))
            all_layers.append(nn.BatchNorm1d(i))
            all_layers.append(nn.Dropout(p))
            input_size = i

        all_layers.append(nn.Linear(layers[-1], output_size))

        self.layers = nn.Sequential(*all_layers)

    def forward(self, x_categorical, x_numerical):
        embeddings = []
        for i,e in enumerate(self.all_embeddings):
            embeddings.append(e(x_categorical[:,i]))
        x = torch.cat(embeddings, 1)
        x = self.embedding_dropout(x)

        x_numerical = self.batch_norm_num(x_numerical)
        x = torch.cat([x, x_numerical], 1)
        x = self.layers(x)
        return x

If you have never worked with PyTorch before, the above code may look daunting, however I will try to break it down into for you.

In the first line, we declare a Model class that inherits from the Module class from PyTorch's nn module. In the constructor of the class (the __init__() method) the following parameters are passed:

  1. embedding_size: Contains the embedding size for the categorical columns
  2. num_numerical_cols: Stores the total number of numerical columns
  3. output_size: The size of the output layer or the number of possible outputs.
  4. layers: List which contains number of neurons for all the layers.
  5. p: Dropout with the default value of 0.5

Inside the constructor, a few variables are initialized. Firstly, the all_embeddings variable contains a list of ModuleList objects for all the categorical columns. The embedding_dropout stores the dropout value for all the layers. Finally, the batch_norm_num stores a list of BatchNorm1d objects for all the numerical columns.

Next, to find the size of the input layer, the number of categorical and numerical columns are added together and stored in the input_size variable. After that, a for loop iterates and the corresponding layers are added into the all_layers list. The layers added are:

After the for loop, the output layer is appended to the list of layers. Since we want all of the layers in the neural networks to execute sequentially, the list of layers is passed to the nn.Sequential class.

Next, in the forward method, both the categorical and numerical columns are passed as inputs. The embedding of the categorical columns takes place in the following lines.

embeddings = []
for i, e in enumerate(self.all_embeddings):
    embeddings.append(e(x_categorical[:,i]))
x = torch.cat(embeddings, 1)
x = self.embedding_dropout(x)

The batch normalization of the numerical columns is applied with the following script:

x_numerical = self.batch_norm_num(x_numerical)

Finally, the embedded categorical columns x and the numeric columns x_numerical are concatenated together and passed to the sequential layers.

Training the Model

To train the model, first we have to create an object of the Model class that we defined in the last section.

model = Model(categorical_embedding_sizes, numerical_data.shape[1], 2, [200,100,50], p=0.4)

You can see that we pass the embedding size of the categorical columns, the number of numerical columns, the output size (2 in our case) and the neurons in the hidden layers. You can see that we have three hidden layers with 200, 100, and 50 neurons, respectively. You can choose any other size if you want.

Let's print our model and see how it looks:

print(model)

Output:

Model(
  (all_embeddings): ModuleList(
    (0): Embedding(3, 2)
    (1): Embedding(2, 1)
    (2): Embedding(2, 1)
    (3): Embedding(2, 1)
  )
  (embedding_dropout): Dropout(p=0.4)
  (batch_norm_num): BatchNorm1d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layers): Sequential(
    (0): Linear(in_features=11, out_features=200, bias=True)
    (1): ReLU(inplace)
    (2): BatchNorm1d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Dropout(p=0.4)
    (4): Linear(in_features=200, out_features=100, bias=True)
    (5): ReLU(inplace)
    (6): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): Dropout(p=0.4)
    (8): Linear(in_features=100, out_features=50, bias=True)
    (9): ReLU(inplace)
    (10): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (11): Dropout(p=0.4)
    (12): Linear(in_features=50, out_features=2, bias=True)
  )
)

You can see that in the first linear layer the value of the in_features variable is 11 since we have 6 numerical columns and the sum of embedding dimensions for the categorical columns is 5, hence 6+5 = 11. Similarly, in the last layer, the out_features has a value of 2 since we have only 2 possible outputs.

Before we can actually train our model, we need to define the loss function and the optimizer that will be used to train the model. Since, we are solving a classification problem, we will use the cross entropy loss. For the optimizer function, we will use the adam optimizer.

The following script defines the loss function and the optimizer:

loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

Now we have everything that is needed to train the model. The following script trains the model:

epochs = 300
aggregated_losses = []

for i in range(epochs):
    i += 1
    y_pred = model(categorical_train_data, numerical_train_data)
    single_loss = loss_function(y_pred, train_outputs)
    aggregated_losses.append(single_loss)

    if i%25 == 1:
        print(f'epoch: {i:3} loss: {single_loss.item():10.8f}')

    optimizer.zero_grad()
    single_loss.backward()
    optimizer.step()

print(f'epoch: {i:3} loss: {single_loss.item():10.10f}')

The number of epochs is set to 300, which means that to train the model, the complete dataset will be used 300 times. A for loop executes for 300 times and during each iteration, the loss is calculated using the loss function. The loss during each iteration is appended to the aggregated_loss list. To update the weights, the backward() function of the single_loss object is called. Finally, the step() method of the optimizer function updates the gradient. The loss is printed after every 25 epochs.

The output of the script above is as follows:

epoch:   1 loss: 0.71847951
epoch:  26 loss: 0.57145703
epoch:  51 loss: 0.48110831
epoch:  76 loss: 0.42529839
epoch: 101 loss: 0.39972275
epoch: 126 loss: 0.37837571
epoch: 151 loss: 0.37133673
epoch: 176 loss: 0.36773482
epoch: 201 loss: 0.36305946
epoch: 226 loss: 0.36079505
epoch: 251 loss: 0.35350436
epoch: 276 loss: 0.35540250
epoch: 300 loss: 0.3465710580

The following script plots the losses against epochs:

plt.plot(range(epochs), aggregated_losses)
plt.ylabel('Loss')
plt.xlabel('epoch');

Output:

alt

The output shows that initially the loss decreases rapidly. After around the 250th epoch, there is a very little decrease in the loss.

Making Predictions

The last step is to make predictions on the test data. To do so, we simply need to pass the categorical_test_data and numerical_test_data to the model class. The values returned can then be compared with the actual test output values. The following script makes predictions on the test class and prints the cross entropy loss for the test data.

with torch.no_grad():
    y_val = model(categorical_test_data, numerical_test_data)
    loss = loss_function(y_val, test_outputs)
print(f'Loss: {loss:.8f}')

Output:

Loss: 0.36855841

The loss on the test set is 0.3685, which is slightly more than 0.3465 achieved on the training set which shows that our model is slightly overfitting.

It is important to note that since we specified that our output layer will contain 2 neurons, each prediction will contain 2 values. For instance, the first 5 predicted values look like this:

print(y_val[:5])

Output:

tensor([[ 1.2045, -1.3857],
        [ 1.3911, -1.5957],
        [ 1.2781, -1.3598],
        [ 0.6261, -0.5429],
        [ 2.5430, -1.9991]])

The idea behind such predictions is that if the actual output is 0, the value at the index 0 should be higher than the value at index 1, and vice versa. We can retrieve the index of the largest value in the list with the following script:

y_val = np.argmax(y_val, axis=1)

Output:

Let's now again print the first five values for the y_val list:

print(y_val[:5])

Output:

tensor([0, 0, 0, 0, 0])

Since in the list of originally predicted outputs, for the first five records, the the values at zero indexes are greater than the values at first indexes, we can see 0 in the first five rows of the processed outputs.

Finally, we can use the confusion_matrix, accuracy_score, and classification_report classes from the sklearn.metrics module to find the accuracy, precision, and recall values for the test set, along with the confusion matrix.

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

print(confusion_matrix(test_outputs,y_val))
print(classification_report(test_outputs,y_val))
print(accuracy_score(test_outputs, y_val))

Output:

[[1527   83]
 [ 224  166]]
              precision    recall  f1-score   support

           0       0.87      0.95      0.91      1610
           1       0.67      0.43      0.52       390

   micro avg       0.85      0.85      0.85      2000
   macro avg       0.77      0.69      0.71      2000
weighted avg       0.83      0.85      0.83      2000

0.8465

The output shows that our model achieves an accuracy of 84.65% which is pretty impressive given the fact that we randomly selected all the parameters for our neural network model. I would suggest that you try to change the model parameters i.e. train/test splits, number and size of hidden layers, etc. to see if you can get better results.

Conclusion

PyTorch is a commonly used deep learning library developed by Facebook which can be used for a variety of tasks such as classification, regression, and clustering. This article explains how to use PyTorch library for the classification of tabular data.

21 Oct 2019 2:33pm GMT

Stack Abuse: Introduction to PyTorch for Classification

PyTorch and TensorFlow libraries are two of the most commonly used Python libraries for deep learning. PyTorch is developed by Facebook, while TensorFlow is a Google project. In this article, you will see how the PyTorch library can be used to solve classification problems.

Classification problems belong to the category of machine learning problems where given a set of features, the task is to predict a discrete value. Predicting whether a tumour is cancerous or not, or whether a student is likely to pass or fail in the exam, are some of the common examples of classification problems.

In this article, given certain characteristics of a bank customer, we will predict whether or not the customer is likely to leave the bank after 6 months. The phenomena where a customer leaves an organization is also called customer churn. Therefore, our task is to predict customer churn based on various customer characteristics.

Before you proceed, it is assumed that you have intermediate level proficiency with the Python programming language and you have installed the PyTorch library. Also, know-how of basic machine learning concepts may help. If you have not installed PyTorch, you can do so with the following pip command:

$ pip install pytorch

The Dataset

The dataset that we are going to use in this article is freely available at this Kaggle link. Let's import the required libraries, and the dataset into our Python application:

import torch
import torch.nn as nn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

We can use the read_csv() method of the pandas library to import the CSV file that contains our dataset.

dataset = pd.read_csv(r'E:Datasets\customer_data.csv')

Let's print the shape of our dataset:

dataset.shape

Output:

(10000, 14)

The output shows that the dataset has 10 thousand records and 14 columns.

We can use the head() method of the pandas dataframe to print the first five rows of our dataset.

dataset.head()

Output:

alt

You can see the 14 columns in our dataset. Based on the first 13 columns, our task is to predict the value for the 14th column i.e. Exited. It is important to mention that the values for the first 13 columns are recorded 6 months before the value for the Exited column was obtained since the task is to predict customer churn after 6 months from the time when the customer information is recorded.

Exploratory Data Analysis

Let's perform some exploratory data analysis on our dataset. We'll first predict the ratio of the customer who actually left the bank after 6 months and will use a pie plot to visualize.

Let's first increase the default plot size for the graphs:

fig_size = plt.rcParams["figure.figsize"]
fig_size[0] = 10
fig_size[1] = 8
plt.rcParams["figure.figsize"] = fig_size

The following script draws the pie plot for the Exited column.

dataset.Exited.value_counts().plot(kind='pie', autopct='%1.0f%%', colors=['skyblue', 'orange'], explode=(0.05, 0.05))

Output:

alt

The output shows that in our dataset, 20% of the customers left the bank. Here 1 belongs to the case where the customer left the bank, where 0 refers to the scenario where a customer didn't leave the bank.

Let's plot the number of customers from all the geographical locations in the dataset:

sns.countplot(x='Geography', data=dataset)

Output:

alt

The output shows that almost half of the customers belong to France, while the ratio of customers belonging to Spain and Germany is 25% each.

Let's now plot number of customers from each unique geographical location along with customer churn information. We can use the countplot() function from the seaborn library to do so.

sns.countplot(x='Exited', hue='Geography', data=dataset)

Output:

alt

The output shows that though the overall number of French customers is twice that of the number of Spanish and German customers, the ratio of customers who left the bank is the same for French and German customers. Similarly, the overall number of German and Spanish customers is the same, but the number of German customers who left the bank is twice that of the Spanish customers, which shows that German customers are more likely to leave the bank after 6 months.

In this article, we will not visually plot the information related to the rest of the columns in our dataset, but if you want to do so, you check my article on how to perform exploratory data analysis with Python Seaborn Library.

Data Preprocessing

Before we train our PyTorch model, we need to preprocess our data. If you look at the dataset, you will see that it has two types of columns: Numerical and Categorical. The numerical columns contains numerical information. CreditScore, Balance, Age, etc. Similarly, Geography and Gender are categorical columns since they contain categorical information such as the locations and genders of the customers. There are a few columns that can be treated as numeric as well as categorical. For instance, the HasCrCard column can have 1 or 0 as its values. However, the HasCrCard columns contains information about whether or not a customer has credit card. It is advised that the column that can be treated as both categorical and numerical, are treated as categorical. However, it totally depends upon the domain knowledge of the dataset.

Let's again print all the columns in our dataset and find out which of the columns can be treated as numerical and which columns should be treated as categorical. The columns attribute of a dataframe prints all the column names:

dataset.columns

Output:

Index(['RowNumber', 'CustomerId', 'Surname', 'CreditScore', 'Geography',
       'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard',
       'IsActiveMember', 'EstimatedSalary', 'Exited'],
      dtype='object')

From the columns in our dataset, we will not use the RowNumber, CustomerId, and Surname columns since the values for these columns are totally random and have no relation with the output. For instance, a customer's surname has no impact on whether or not the customer will leave the bank. Among the rest of the columns, Geography, Gender, HasCrCard, and IsActiveMember columns can be treated as categorical columns. Let's create a list of these columns:

categorical_columns = ['Geography', 'Gender', 'HasCrCard', 'IsActiveMember']

All of the remaining columns except the Exited column can be treated as numerical columns.

numerical_columns = ['CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'EstimatedSalary']

Finally, the output (the values from the Exited column) are stored in the outputs variable.

outputs = ['Exited']

We have created lists of categorical, numeric, and output columns. However, at the moment the type of the categorical columns is not categorical. You can check the type of all the columns in the dataset with the following script:

dataset.dtypes

Output:

RowNumber            int64
CustomerId           int64
Surname             object
CreditScore          int64
Geography           object
Gender              object
Age                  int64
Tenure               int64
Balance            float64
NumOfProducts        int64
HasCrCard            int64
IsActiveMember       int64
EstimatedSalary    float64
Exited               int64
dtype: object

You can see that the type for Geography and Gender columns is object and the type for HasCrCard and IsActive columns is int64. We need to convert the types for categorical columns to category. We can do so using the astype() function, as shown below:

for category in categorical_columns:
    dataset[category] = dataset[category].astype('category')

Now if you again plot the types for the columns in our dataset, you should see the following results:

dataset.dtypes

Output

RowNumber             int64
CustomerId            int64
Surname              object
CreditScore           int64
Geography          category
Gender             category
Age                   int64
Tenure                int64
Balance             float64
NumOfProducts         int64
HasCrCard          category
IsActiveMember     category
EstimatedSalary     float64
Exited                int64
dtype: object

Let's now see all the categories in the Geography column:

dataset['Geography'].cat.categories

Output:

Index(['France', 'Germany', 'Spain'], dtype='object')

When you change a column's data type to category, each category in the column is assigned a unique code. For instance, let's plot the first five rows of the Geography column and print the code values for the first five rows:

dataset['Geography'].head()

Output:

0    France
1     Spain
2    France
3    France
4     Spain
Name: Geography, dtype: category
Categories (3, object): [France, Germany, Spain]

The following script plots the codes for the values in the first five rows of the Geography column:

dataset['Geography'].head().cat.codes

Output:

0    0
1    2
2    0
3    0
4    2
dtype: int8

The output shows that France has been coded as 0, and Spain has been coded as 2.

The basic purpose of separating categorical columns from the numerical columns is that values in the numerical column can be directly fed into neural networks. However, the values for the categorical columns first have to be converted into numeric types. The coding of the values in the categorical column partially solves the task of numerical conversion of the categorical columns.

Since we will be using PyTorch for model training, we need to convert our categorical and numerical columns to tensors.

Let's first convert the categorical columns to tensors. In PyTorch, tensors can be created via the numpy arrays. We will first convert data in the four categorical columns into numpy arrays and then stack all the columns horizontally, as shown in the following script:

geo = dataset['Geography'].cat.codes.values
gen = dataset['Gender'].cat.codes.values
hcc = dataset['HasCrCard'].cat.codes.values
iam = dataset['IsActiveMember'].cat.codes.values

categorical_data = np.stack([geo, gen, hcc, iam], 1)

categorical_data[:10]

The above script prints the first five records from the categorical columns, stacked horizontally. The output is as follows:

Output:

array([[0, 0, 1, 1],
       [2, 0, 0, 1],
       [0, 0, 1, 0],
       [0, 0, 0, 0],
       [2, 0, 1, 1],
       [2, 1, 1, 0],
       [0, 1, 1, 1],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 1, 1, 1]], dtype=int8)

Now to create a tensor from the aforementioned numpy array, you can simply pass the array to the tensor class of the torch module. Remember, for the categorical columns the data type should be torch.int64.

categorical_data = torch.tensor(categorical_data, dtype=torch.int64)
categorical_data[:10]

Output:

tensor([[0, 0, 1, 1],
        [2, 0, 0, 1],
        [0, 0, 1, 0],
        [0, 0, 0, 0],
        [2, 0, 1, 1],
        [2, 1, 1, 0],
        [0, 1, 1, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 1, 1, 1]])

In the output, you can see that the numpy array of categorical data has now been converted into a tensor object.

In the same way, we can convert our numerical columns to tensors:

numerical_data = np.stack([dataset[col].values for col in numerical_columns], 1)
numerical_data = torch.tensor(numerical_data, dtype=torch.float)
numerical_data[:5]

Output:

tensor([[6.1900e+02, 4.2000e+01, 2.0000e+00, 0.0000e+00, 1.0000e+00, 1.0135e+05],
        [6.0800e+02, 4.1000e+01, 1.0000e+00, 8.3808e+04, 1.0000e+00, 1.1254e+05],
        [5.0200e+02, 4.2000e+01, 8.0000e+00, 1.5966e+05, 3.0000e+00, 1.1393e+05],
        [6.9900e+02, 3.9000e+01, 1.0000e+00, 0.0000e+00, 2.0000e+00, 9.3827e+04],
        [8.5000e+02, 4.3000e+01, 2.0000e+00, 1.2551e+05, 1.0000e+00, 7.9084e+04]])

In the output, you can see the first five rows containing the values for the six numerical columns in our dataset.

The final step is to convert the output numpy array into a tensor object.

outputs = torch.tensor(dataset[outputs].values).flatten()
outputs[:5]

Output:

tensor([1, 0, 1, 0, 0])

Let now plot the shape of our categorial data, numerical data, and the corresponding output:

print(categorical_data.shape)
print(numerical_data.shape)
print(outputs.shape)

Output:

torch.Size([10000, 4])
torch.Size([10000, 6])
torch.Size([10000])

There is a one very important step before we can train our model. We converted our categorical columns to numerical where a unique value is represented by a single integer. For instance, in the Geography column, we saw that France is represented by 0 and Germany is represented by 1. We can use these values to train our model. However, a better way is to represent values in a categorical column is in the form of an N-dimensional vector, instead of a single integer. A vector is capable of capturing more information and can find relationships between different categorical values in a more appropriate way. Therefore, we will represent values in the categorical columns in the form of N-dimensional vectors. This process is called embedding.

We need to define the embedding size (vector dimensions) for all the categorical columns. There is no hard and fast rule regarding the number of dimensions. A good rule of thumb to define the embedding size for a column is to divide the number of unique values in the column by 2 (but not exceeding 50). For instance, for the Geography column, the number of unique values is 3. The corresponding embedding size for the Geography column will be 3/2 = 1.5 = 2 (round off).

The following script creates a tuple that contains the number of unique values and the dimension sizes for all the categorical columns:

categorical_column_sizes = [len(dataset[column].cat.categories) for column in categorical_columns]
categorical_embedding_sizes = [(col_size, min(50, (col_size+1)//2)) for col_size in categorical_column_sizes]
print(categorical_embedding_sizes)

Output:

[(3, 2), (2, 1), (2, 1), (2, 1)]

A supervised deep learning model, such as the one we are developing in this article, is trained using training data and the model performance is evaluated on the test dataset. Therefore, we need to divide our dataset into training and test sets as shown in the following script:

total_records = 10000
test_records = int(total_records * .2)

categorical_train_data = categorical_data[:total_records-test_records]
categorical_test_data = categorical_data[total_records-test_records:total_records]
numerical_train_data = numerical_data[:total_records-test_records]
numerical_test_data = numerical_data[total_records-test_records:total_records]
train_outputs = outputs[:total_records-test_records]
test_outputs = outputs[total_records-test_records:total_records]

We have 10 thousand records in our dataset, of which 80% records, i.e. 8000 records, will be used to train the model while the remaining 20% records will be used to evaluate the performance of our model. Notice, in the script above, the categorical and numerical data, as well as the outputs have been divided into the training and test sets.

To verify that we have correctly divided data into training and test sets, let's print the lengths of the training and test records:

print(len(categorical_train_data))
print(len(numerical_train_data))
print(len(train_outputs))

print(len(categorical_test_data))
print(len(numerical_test_data))
print(len(test_outputs))

Output:

8000
8000
8000
2000
2000
2000

Creating a Model for Prediction

We have divided the data into training and test sets, now is the time to define our model for training. To do so, we can define a class named Model, which will be used to train the model. Look at the following script:

class Model(nn.Module):

    def __init__(self, embedding_size, num_numerical_cols, output_size, layers, p=0.4):
        super().__init__()
        self.all_embeddings = nn.ModuleList([nn.Embedding(ni, nf) for ni, nf in embedding_size])
        self.embedding_dropout = nn.Dropout(p)
        self.batch_norm_num = nn.BatchNorm1d(num_numerical_cols)

        all_layers = []
        num_categorical_cols = sum((nf for ni, nf in embedding_size))
        input_size = num_categorical_cols + num_numerical_cols

        for i in layers:
            all_layers.append(nn.Linear(input_size, i))
            all_layers.append(nn.ReLU(inplace=True))
            all_layers.append(nn.BatchNorm1d(i))
            all_layers.append(nn.Dropout(p))
            input_size = i

        all_layers.append(nn.Linear(layers[-1], output_size))

        self.layers = nn.Sequential(*all_layers)

    def forward(self, x_categorical, x_numerical):
        embeddings = []
        for i,e in enumerate(self.all_embeddings):
            embeddings.append(e(x_categorical[:,i]))
        x = torch.cat(embeddings, 1)
        x = self.embedding_dropout(x)

        x_numerical = self.batch_norm_num(x_numerical)
        x = torch.cat([x, x_numerical], 1)
        x = self.layers(x)
        return x

If you have never worked with PyTorch before, the above code may look daunting, however I will try to break it down into for you.

In the first line, we declare a Model class that inherits from the Module class from PyTorch's nn module. In the constructor of the class (the __init__() method) the following parameters are passed:

  1. embedding_size: Contains the embedding size for the categorical columns
  2. num_numerical_cols: Stores the total number of numerical columns
  3. output_size: The size of the output layer or the number of possible outputs.
  4. layers: List which contains number of neurons for all the layers.
  5. p: Dropout with the default value of 0.5

Inside the constructor, a few variables are initialized. Firstly, the all_embeddings variable contains a list of ModuleList objects for all the categorical columns. The embedding_dropout stores the dropout value for all the layers. Finally, the batch_norm_num stores a list of BatchNorm1d objects for all the numerical columns.

Next, to find the size of the input layer, the number of categorical and numerical columns are added together and stored in the input_size variable. After that, a for loop iterates and the corresponding layers are added into the all_layers list. The layers added are:

After the for loop, the output layer is appended to the list of layers. Since we want all of the layers in the neural networks to execute sequentially, the list of layers is passed to the nn.Sequential class.

Next, in the forward method, both the categorical and numerical columns are passed as inputs. The embedding of the categorical columns takes place in the following lines.

embeddings = []
for i, e in enumerate(self.all_embeddings):
    embeddings.append(e(x_categorical[:,i]))
x = torch.cat(embeddings, 1)
x = self.embedding_dropout(x)

The batch normalization of the numerical columns is applied with the following script:

x_numerical = self.batch_norm_num(x_numerical)

Finally, the embedded categorical columns x and the numeric columns x_numerical are concatenated together and passed to the sequential layers.

Training the Model

To train the model, first we have to create an object of the Model class that we defined in the last section.

model = Model(categorical_embedding_sizes, numerical_data.shape[1], 2, [200,100,50], p=0.4)

You can see that we pass the embedding size of the categorical columns, the number of numerical columns, the output size (2 in our case) and the neurons in the hidden layers. You can see that we have three hidden layers with 200, 100, and 50 neurons, respectively. You can choose any other size if you want.

Let's print our model and see how it looks:

print(model)

Output:

Model(
  (all_embeddings): ModuleList(
    (0): Embedding(3, 2)
    (1): Embedding(2, 1)
    (2): Embedding(2, 1)
    (3): Embedding(2, 1)
  )
  (embedding_dropout): Dropout(p=0.4)
  (batch_norm_num): BatchNorm1d(6, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layers): Sequential(
    (0): Linear(in_features=11, out_features=200, bias=True)
    (1): ReLU(inplace)
    (2): BatchNorm1d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Dropout(p=0.4)
    (4): Linear(in_features=200, out_features=100, bias=True)
    (5): ReLU(inplace)
    (6): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): Dropout(p=0.4)
    (8): Linear(in_features=100, out_features=50, bias=True)
    (9): ReLU(inplace)
    (10): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (11): Dropout(p=0.4)
    (12): Linear(in_features=50, out_features=2, bias=True)
  )
)

You can see that in the first linear layer the value of the in_features variable is 11 since we have 6 numerical columns and the sum of embedding dimensions for the categorical columns is 5, hence 6+5 = 11. Similarly, in the last layer, the out_features has a value of 2 since we have only 2 possible outputs.

Before we can actually train our model, we need to define the loss function and the optimizer that will be used to train the model. Since, we are solving a classification problem, we will use the cross entropy loss. For the optimizer function, we will use the adam optimizer.

The following script defines the loss function and the optimizer:

loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

Now we have everything that is needed to train the model. The following script trains the model:

epochs = 300
aggregated_losses = []

for i in range(epochs):
    i += 1
    y_pred = model(categorical_train_data, numerical_train_data)
    single_loss = loss_function(y_pred, train_outputs)
    aggregated_losses.append(single_loss)

    if i%25 == 1:
        print(f'epoch: {i:3} loss: {single_loss.item():10.8f}')

    optimizer.zero_grad()
    single_loss.backward()
    optimizer.step()

print(f'epoch: {i:3} loss: {single_loss.item():10.10f}')

The number of epochs is set to 300, which means that to train the model, the complete dataset will be used 300 times. A for loop executes for 300 times and during each iteration, the loss is calculated using the loss function. The loss during each iteration is appended to the aggregated_loss list. To update the weights, the backward() function of the single_loss object is called. Finally, the step() method of the optimizer function updates the gradient. The loss is printed after every 25 epochs.

The output of the script above is as follows:

epoch:   1 loss: 0.71847951
epoch:  26 loss: 0.57145703
epoch:  51 loss: 0.48110831
epoch:  76 loss: 0.42529839
epoch: 101 loss: 0.39972275
epoch: 126 loss: 0.37837571
epoch: 151 loss: 0.37133673
epoch: 176 loss: 0.36773482
epoch: 201 loss: 0.36305946
epoch: 226 loss: 0.36079505
epoch: 251 loss: 0.35350436
epoch: 276 loss: 0.35540250
epoch: 300 loss: 0.3465710580

The following script plots the losses against epochs:

plt.plot(range(epochs), aggregated_losses)
plt.ylabel('Loss')
plt.xlabel('epoch');

Output:

alt

The output shows that initially the loss decreases rapidly. After around the 250th epoch, there is a very little decrease in the loss.

Making Predictions

The last step is to make predictions on the test data. To do so, we simply need to pass the categorical_test_data and numerical_test_data to the model class. The values returned can then be compared with the actual test output values. The following script makes predictions on the test class and prints the cross entropy loss for the test data.

with torch.no_grad():
    y_val = model(categorical_test_data, numerical_test_data)
    loss = loss_function(y_val, test_outputs)
print(f'Loss: {loss:.8f}')

Output:

Loss: 0.36855841

The loss on the test set is 0.3685, which is slightly more than 0.3465 achieved on the training set which shows that our model is slightly overfitting.

It is important to note that since we specified that our output layer will contain 2 neurons, each prediction will contain 2 values. For instance, the first 5 predicted values look like this:

print(y_val[:5])

Output:

tensor([[ 1.2045, -1.3857],
        [ 1.3911, -1.5957],
        [ 1.2781, -1.3598],
        [ 0.6261, -0.5429],
        [ 2.5430, -1.9991]])

The idea behind such predictions is that if the actual output is 0, the value at the index 0 should be higher than the value at index 1, and vice versa. We can retrieve the index of the largest value in the list with the following script:

y_val = np.argmax(y_val, axis=1)

Output:

Let's now again print the first five values for the y_val list:

print(y_val[:5])

Output:

tensor([0, 0, 0, 0, 0])

Since in the list of originally predicted outputs, for the first five records, the the values at zero indexes are greater than the values at first indexes, we can see 0 in the first five rows of the processed outputs.

Finally, we can use the confusion_matrix, accuracy_score, and classification_report classes from the sklearn.metrics module to find the accuracy, precision, and recall values for the test set, along with the confusion matrix.

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

print(confusion_matrix(test_outputs,y_val))
print(classification_report(test_outputs,y_val))
print(accuracy_score(test_outputs, y_val))

Output:

[[1527   83]
 [ 224  166]]
              precision    recall  f1-score   support

           0       0.87      0.95      0.91      1610
           1       0.67      0.43      0.52       390

   micro avg       0.85      0.85      0.85      2000
   macro avg       0.77      0.69      0.71      2000
weighted avg       0.83      0.85      0.83      2000

0.8465

The output shows that our model achieves an accuracy of 84.65% which is pretty impressive given the fact that we randomly selected all the parameters for our neural network model. I would suggest that you try to change the model parameters i.e. train/test splits, number and size of hidden layers, etc. to see if you can get better results.

Conclusion

PyTorch is a commonly used deep learning library developed by Facebook which can be used for a variety of tasks such as classification, regression, and clustering. This article explains how to use PyTorch library for the classification of tabular data.

21 Oct 2019 2:33pm GMT

Real Python: Arduino With Python: How to Get Started

Microcontrollers have been around for a long time, and they're used in everything from complex machinery to common household appliances. However, working with them has traditionally been reserved for those with formal technical training, such as technicians and electrical engineers. The emergence of Arduino has made electronic application design much more accessible to all developers. In this tutorial, you'll discover how to use Arduino with Python to develop your own electronic projects.

You'll cover the basics of Arduino with Python and learn how to:

Free Bonus: 5 Thoughts On Python Mastery, a free course for Python developers that shows you the roadmap and the mindset you'll need to take your Python skills to the next level.

The Arduino Platform

Arduino is an open-source platform composed of hardware and software that allows for the rapid development of interactive electronics projects. The emergence of Arduino drew the attention of professionals from many different industries, contributing to the start of the Maker Movement.

With the growing popularity of the Maker Movement and the concept of the Internet of Things, Arduino has become one of the main platforms for electronic prototyping and the development of MVPs.

Arduino uses its own programming language, which is similar to C++. However, it's possible to use Arduino with Python or another high-level programming language. In fact, platforms like Arduino work well with Python, especially for applications that require integration with sensors and other physical devices.

All in all, Arduino and Python can facilitate an effective learning environment that encourages developers to get into electronics design. If you already know the basics of Python, then you'll be able to get started with Arduino by using Python to control it.

The Arduino platform includes both hardware and software products. In this tutorial, you'll use Arduino hardware and Python software to learn about basic circuits, as well as digital and analog inputs and outputs.

Arduino Hardware

To run the examples, you'll need to assemble the circuits by hooking up electronic components. You can generally find these items at electronic component stores or in good Arduino starter kits. You'll need:

  1. An Arduino Uno or other compatible board
  2. A standard LED of any color
  3. A push button
  4. A 10 KOhm potentiometer
  5. A 470 Ohm resistor
  6. A 10 KOhm resistor
  7. A breadboard
  8. Jumper wires of various colors and sizes

Let's take a closer look at a few of these components.

Component 1 is an Arduino Uno or other compatible board. Arduino is a project that includes many boards and modules for different purposes, and Arduino Uno is the most basic among these. It's also the most used and most documented board of the whole Arduino family, so it's a great choice for developers who are just getting started with electronics.

Note: Arduino is an open hardware platform, so there are many other vendors who sell compatible boards that could be used to run the examples you see here. In this tutorial, you'll learn how to use the Arduino Uno.

Components 5 and 6 are resistors. Most resistors are identified by colored stripes according to a color code. In general, the first three colors represent the value of a resistor, while the fourth color represents its tolerance. For a 470 Ohm resistor, the first three colors are yellow, violet, and brown. For a 10 KOhm resistor, the first three colors are brown, black, and orange.

Component 7 is a breadboard, which you use to hook up all the other components and assemble the circuits. While a breadboard is not required, it's recommended that you get one if you intend to begin working with Arduino.

Arduino Software

In addition to these hardware components, you'll need to install some software. The platform includes the Arduino IDE, an Integrated Development Environment for programming Arduino devices, among other online tools.

Arduino was designed to allow you to program the boards with little difficulty. In general, you'll follow these steps:

  1. Connect the board to your PC
  2. Install and open the Arduino IDE
  3. Configure the board settings
  4. Write the code
  5. Press a button on the IDE to upload the program to the board

To install the Arduino IDE on your computer, download the appropriate version for your operating system from the Arduino website. Check the documentation for installation instructions:

Note: You'll be using the Arduino IDE in this tutorial, but Arduino also provides a web editor that will let you program Arduino boards using the browser.

Now that you've installed the Arduino IDE and gathered all the necessary components, you're ready to get started with Arduino! Next, you'll upload a "Hello, World!" program to your board.

"Hello, World!" With Arduino

The Arduino IDE comes with several example sketches you can use to learn the basics of Arduino. A sketch is the term you use for a program that you can upload to a board. Since the Arduino Uno doesn't have an attached display, you'll need a way to see the physical output from your program. You'll use the Blink example sketch to make a built-in LED on the Arduino board blink.

Uploading the Blink Example Sketch

To get started, connect the Arduino board to your PC using a USB cable and start the Arduino IDE. To open the Blink example sketch, access the File menu and select Examples, then 01.Basics and, finally, Blink:

Blink example sketch on Arduino IDE

The Blink example code will be loaded into a new IDE window. But before you can upload the sketch to the board, you'll need to configure the IDE by selecting your board and its connected port.

To configure the board, access the Tools menu and then Board. For Arduino Uno, you should select Arduino/Genuino Uno:

Selecting board on Arduino IDE

After you select the board, you have to set the appropriate port. Access the Tools menu again, and this time select Port:

Selecting port on Arduino IDE

The names of the ports may be different, depending on your operating system. In Windows, the ports will be named COM4, COM5, or something similar. In macOS or Linux, you may see something like /dev/ttyACM0 or /dev/ttyUSB0. If you have any problems setting the port, then take a look at the Arduino Troubleshooting Page.

After you've configured the board and port, you're all set to upload the sketch to your Arduino. To do that, you just have to press the Upload button in the IDE toolbar:

Buttons for verify and upload on Arduino IDE

When you press Upload, the IDE compiles the sketch and uploads it to your board. If you want to check for errors, then you can press Verify before Upload, which will only compile your sketch.

The USB cable provides a serial connection to both upload the program and power the Arduino board. During the upload, you'll see LEDs flashing on the board. After a few seconds, the uploaded program will run, and you'll see an LED light blink once every second:

Arduino buit-in LED blinking

After the upload is finished, the USB cable will continue to power the Arduino board. The program is stored in flash memory on the Arduino microcontroller. You can also use a battery or other external power supply to run the application without a USB cable.

Connecting External Components

In the previous section, you used an LED that was already present on the Arduino board. However, in most practical projects you'll need to connect external components to the board. To make these connections, Arduino has several pins of different types:

Arduino Uno Ports

Although these connections are commonly called pins, you can see that they're not exactly physical pins. Rather, the pins are holes in a socket to which you can connect jumper wires. In the figure above, you can see different groups of pins:

To get started using external components, you'll connect an external LED to run the Blink example sketch. The built-in LED is connected to digital pin #13. So, let's connect an external LED to that pin and check if it blinks. (A standard LED is one of the components you saw listed earlier.)

Before you connect anything to the Arduino board, it's good practice to disconnect it from the computer. With the USB cable unplugged, you'll be able to connect the LED to your board:

Circuit for blink sketch

Note that the figure shows the board with the digital pins now facing you.

Using a Breadboard

Electronic circuit projects usually involve testing several ideas, with you adding new components and making adjustments as you go. However, it can be tricky to connect components directly, especially if the circuit is large.

To facilitate prototyping, you can use a breadboard to connect the components. This is a device with several holes that are connected in a particular way so that you can easily connect components using jumper wires:

Breadboard

You can see which holes are interconnected by looking at the colored lines. You'll use the holes on the sides of the breadboard to power the circuit:

Then, you can easily connect components to the power source or the ground by simply using the other holes on the red and blue lines. The holes in the middle of the breadboard are connected as indicated by the colors. You'll use these to make connections between the components of the circuit. These two internal sections are separated by a small depression, over which you can connect integrated circuits (ICs).

You can use a breadboard to assemble the circuit used in the Blink example sketch:

Circuit for blink sketch on breadboard

For this circuit, it's important to note that the LED must be connected according to its polarity or it won't work. The positive terminal of the LED is called the anode and is generally the longer one. The negative terminal is called the cathode and is shorter. If you're using a recovered component, then you can also identify the terminals by looking for a flat side on the LED itself. This will indicate the position of the negative terminal.

When you connect an LED to an Arduino pin, you'll always need a resistor to limit its current and avoid burning out the LED prematurely. Here, you use a 470 Ohm resistor to do this. You can follow the connections and check that the circuit is the same:

For a more detailed explanation, check out How to Use a Breadboard.

After you finish the connection, plug the Arduino back into the PC and re-run the Blink sketch:

Arduino built-in and external LEDs blinking

As both LEDs are connected to digital pin 13, they blink together when the sketch is running.

"Hello, World!" With Arduino and Python

In the previous section, you uploaded the Blink sketch to your Arduino board. Arduino sketches are written in a language similar to C++ and are compiled and recorded on the flash memory of the microcontroller when you press Upload. While you can use another language to directly program the Arduino microcontroller, it's not a trivial task!

However, there are some approaches you can take to use Arduino with Python or other languages. One idea is to run the main program on a PC and use the serial connection to communicate with Arduino through the USB cable. The sketch would be responsible for reading the inputs, sending the information to the PC, and getting updates from the PC to update the Arduino outputs.

To control Arduino from the PC, you'd have to design a protocol for the communication between the PC and Arduino. For example, you could consider a protocol with messages like the following:

With the protocol defined, you could write an Arduino sketch to send messages to the PC and update the states of the pins according to the protocol. On the PC, you could write a program to control the Arduino through a serial connection, based on the protocol you've designed. For this, you can use whatever language and libraries you prefer, such as Python and the PySerial library.

Fortunately, there are standard protocols to do all this! Firmata is one of them. This protocol establishes a serial communication format that allows you to read digital and analog inputs, as well as send information to digital and analog outputs.

The Arduino IDE includes ready-made sketches that will drive Arduino through Python with the Firmata protocol. On the PC side, there are implementations of the protocol in several languages, including Python. To get started with Firmata, let's use it to implement a "Hello, World!" program.

Uploading the Firmata Sketch

Before you write your Python program to drive Arduino, you have to upload the Firmata sketch so that you can use that protocol to control the board. The sketch is available in the Arduino IDE's built-in examples. To open it, access the File menu, then Examples, followed by Firmata, and finally StandardFirmata:

Firmata example sketch on Arduino IDE

The sketch will be loaded into a new IDE window. To upload it to the Arduino, you can follow the same steps you did before:

  1. Plug the USB cable into the PC.
  2. Select the appropriate board and port on the IDE.
  3. Press Upload.

After the upload is finished, you won't notice any activity on the Arduino. To control it, you still need a program that can communicate with the board through the serial connection. To work with the Firmata protocol in Python, you'll need the pyFirmata package, which you can install with pip:

$ pip install pyfirmata

After the installation finishes, you can run an equivalent Blink application using Python and Firmata:

 1 import pyfirmata
 2 import time
 3 
 4 board = pyfirmata.Arduino('/dev/ttyACM0')
 5 
 6 while True:
 7     board.digital[13].write(1)
 8     time.sleep(1)
 9     board.digital[13].write(0)
10     time.sleep(1)

Here's how this program works. You import pyfirmata and use it to establish a serial connection with the Arduino board, which is represented by the board object in line 4. You also configure the port in this line by passing an argument to pyfirmata.Arduino(). You can use the Arduino IDE to find the port.

board.digital is a list whose elements represent the digital pins of the Arduino. These elements have the methods read() and write(), which will read and write the state of the pins. Like most embedded device programs, this program mainly consists of an infinite loop:

Now that you know the basics of how to control an Arduino with Python, let's go through some applications to interact with its inputs and outputs.

Reading Digital Inputs

Digital inputs can have only two possible values. In a circuit, each of these values is represented by a different voltage. The table below shows the digital input representation for a standard Arduino Uno board:

Value Level Voltage
0 Low 0V
1 High 5V

To control the LED, you'll use a push button to send digital input values to the Arduino. The button should send 0V to the board when it's released and 5V to the board when it's pressed. The figure below shows how to connect the button to the Arduino board:

Circuit for digital input

You may notice that the LED is connected to the Arduino on digital pin 13, just like before. Digital pin 10 is used as a digital input. To connect the push button, you have to use the 10 KOhm resistor, which acts as a pull down in this circuit. A pull down resistor ensures that the digital input gets 0V when the button is released.

When you release the button, you open the connection between the two wires on the button. Since there's no current flowing through the resistor, pin 10 just connects to the ground (GND). The digital input gets 0V, which represents the 0 (or low) state. When you press the button, you apply 5V to both the resistor and the digital input. A current flows through the resistor and the digital input gets 5V, which represents the 1 (or high) state.

You can use a breadboard to assemble the above circuit as well:

Circuit for digital input on breadboard

Now that you've assembled the circuit, you have to run a program on the PC to control it using Firmata. This program will turn on the LED, based on the state of the push button:

 1 import pyfirmata
 2 import time
 3 
 4 board = pyfirmata.Arduino('/dev/ttyACM0')
 5 
 6 it = pyfirmata.util.Iterator(board)
 7 it.start()
 8 
 9 board.digital[10].mode = pyfirmata.INPUT
10 
11 while True:
12     sw = board.digital[10].read()
13     if sw is True:
14         board.digital[13].write(1)
15     else:
16         board.digital[13].write(0)
17     time.sleep(0.1)

Let's walk through this program:

pyfirmata also offers a more compact syntax to work with input and output pins. This may be a good option for when you're working with several pins. You can rewrite the previous program to have more compact syntax:

 1 import pyfirmata
 2 import time
 3 
 4 board = pyfirmata.Arduino('/dev/ttyACM0')
 5 
 6 it = pyfirmata.util.Iterator(board)
 7 it.start()
 8 
 9 digital_input = board.get_pin('d:10:i')
10 led = board.get_pin('d:13:o')
11 
12 while True:
13     sw = digital_input.read()
14     if sw is True:
15         led.write(1)
16     else:
17         led.write(0)
18     time.sleep(0.1)

In this version, you use board.get_pin() to create two objects. digital_input represents the digital input state, and led represents the LED state. When you run this method, you have to pass a string argument composed of three elements separated by colons:

  1. The type of the pin (a for analog or d for digital)
  2. The number of the pin
  3. The mode of the pin (i for input or o for output)

Since digital_input is a digital input using pin 10, you pass the argument 'd:10:i'. The LED state is set to a digital output using pin 13, so the led argument is 'd:13:o'.

When you use board.get_pin(), there's no need to explicitly set up pin 10 as an input like you did before with pyfirmata.INPUT. After the pins are set, you can access the status of a digital input pin using read(), and set the status of a digital output pin with write().

Digital inputs are widely used in electronics projects. Several sensors provide digital signals, like presence or door sensors, that can be used as inputs to your circuits. However, there are some cases where you'll need to measure analog values, such as distance or physical quantities. In the next section, you'll see how to read analog inputs using Arduino with Python.

Reading Analog Inputs

In contrast to digital inputs, which can only be on or off, analog inputs are used to read values in some range. On the Arduino Uno, the voltage to an analog input ranges from 0V to 5V. Appropriate sensors are used to measure physical quantities, such as distances. These sensors are responsible for encoding these physical quantities in the proper voltage range so they can be read by the Arduino.

To read an analog voltage, the Arduino uses an analog-to-digital converter (ADC), which converts the input voltage to a digital number with a fixed number of bits. This determines the resolution of the conversion. The Arduino Uno uses a 10-bit ADC and can determine 1024 different voltage levels.

The voltage range for an analog input is encoded to numbers ranging from 0 to 1023. When 0V is applied, the Arduino encodes it to the number 0. When 5V is applied, the encoded number is 1023. All intermediate voltage values are proportionally encoded.

A potentiometer is a variable resistor that you can use to set the voltage applied to an Arduino analog input. You'll connect it to an analog input to control the frequency of a blinking LED:

Circuit for analog input

In this circuit, the LED is set up just as before. The end terminals of the potentiometer are connected to ground (GND) and 5V pins. This way, the central terminal (the cursor) can have any voltage in the 0V to 5V range depending on its position, which is connected to the Arduino on analog pin A0.

Using a breadboard, you can assemble this circuit as follows:

Circuit for analog input on breadboard

Before you control the LED, you can use the circuit to check the different values the Arduino reads, based on the position of the potentiometer. To do this, run the following program on your PC:

 1 import pyfirmata
 2 import time
 3 
 4 board = pyfirmata.Arduino('/dev/ttyACM0')
 5 it = pyfirmata.util.Iterator(board)
 6 it.start()
 7 
 8 analog_input = board.get_pin('a:0:i')
 9 
10 while True:
11     analog_value = analog_input.read()
12     print(analog_value)
13     time.sleep(0.1)

In line 8, you set up analog_input as the analog A0 input pin with the argument 'a:0:i'. Inside the infinite while loop, you read this value, store it in analog_value, and display the output to the console with print(). When you move the potentiometer while the program runs, you should output similar to this:

0.0
0.0293
0.1056
0.1838
0.2717
0.3705
0.4428
0.5064
0.5797
0.6315
0.6764
0.7243
0.7859
0.8446
0.9042
0.9677
1.0
1.0

The printed values change, ranging from 0 when the position of the potentiometer is on one end to 1 when it's on the other end. Note that these are float values, which may require conversion depending on the application.

To change the frequency of the blinking LED, you can use the analog_value to control how long the LED will be kept on or off:

 1 import pyfirmata
 2 import time
 3 
 4 board = pyfirmata.Arduino('/dev/ttyACM0')
 5 it = pyfirmata.util.Iterator(board)
 6 it.start()
 7 
 8 analog_input = board.get_pin('a:0:i')
 9 led = board.get_pin('d:13:o')
10 
11 while True:
12     analog_value = analog_input.read()
13     if analog_value is not None:
14         delay = analog_value + 0.01
15         led.write(1)
16         time.sleep(delay)
17         led.write(0)
18         time.sleep(delay)
19     else:
20         time.sleep(0.1)

Here, you calculate delay as analog_value + 0.01 to avoid having delay equal to zero. Otherwise, it's common to get an analog_value of None during the first few iterations. To avoid getting an error when running the program, you use a conditional in line 13 to test whether analog_value is None. Then you control the period of the blinking LED.

Try running the program and changing the position of the potentiometer. You'll notice the frequency of the blinking LED changes:

Led controlled by an analog input

By now, you've seen how to use digital inputs, digital outputs, and analog inputs on your circuits. In the next section, you'll see how to use analog outputs.

Using Analog Outputs

In some cases, it's necessary to have an analog output to drive a device that requires an analog signal. Arduino doesn't include a real analog output, one where the voltage could be set to any value in a certain range. However, Arduino does include several Pulse Width Modulation (PWM) outputs.

PWM is a modulation technique in which a digital output is used to generate a signal with variable power. To do this, it uses a digital signal of constant frequency, in which the duty cycle is changed according to the desired power. The duty cycle represents the fraction of the period in which the signal is set to high.

Not all Arduino digital pins can be used as PWM outputs. The ones that can be are identified by a tilde (~):

Arduino Uno PWM ports

Several devices are designed to be driven by PWM signals, including some motors. It's even possible to obtain a real analog signal from the PWM signal if you use analog filters. In the previous example, you used a digital output to turn an LED light on or off. In this section, you'll use PWM to control the brightness of an LED, according to the value of an analog input given by a potentiometer.

When a PWM signal is applied to an LED, its brightness varies according to the duty cycle of the PWM signal. You're going to use the following circuit:

Circuit for analog output

This circuit is identical to the one used in the previous section to test the analog input, except for one difference. Since it's not possible to use PWM with pin 13, the digital output pin used for the LED is pin 11.

You can use a breadboard to assemble the circuit as follows:

Circuit for analog output on breadboard

With the circuit assembled, you can control the LED using PWM with the following program:

 1 import pyfirmata
 2 import time
 3 
 4 board = pyfirmata.Arduino('/dev/ttyACM0')
 5 
 6 it = pyfirmata.util.Iterator(board)
 7 it.start()
 8 
 9 analog_input = board.get_pin('a:0:i')
10 led = board.get_pin('d:11:p')
11 
12 while True:
13     analog_value = analog_input.read()
14     if analog_value is not None:
15         led.write(analog_value)
16     time.sleep(0.1)

There are a few differences from the programs you've used previously:

  1. In line 10, you set led to PWM mode by passing the argument 'd:11:p'.
  2. In line 15, you call led.write() with analog_value as an argument. This is a value between 0 and 1, read from the analog input.

Here you can see the LED behavior when the potentiometer is moved:

PWM output on oscilloscope

To show the changes in the duty cycle, an oscilloscope is plugged into pin 11. When the potentiometer is in its zero position, you can see the LED is turned off, as pin 11 has 0V on its output. As you turn the potentiometer, the LED gets brighter as the PWM duty cycle increases. When you turn the potentiometer all the way, the duty cycle reaches 100%. The LED is turned on continuously at maximum brightness.

With this example, you've covered the basics of using an Arduino and its digital and analog inputs and outputs. In the next section, you'll see an application for using Arduino with Python to drive events on the PC.

Using a Sensor to Trigger a Notification

Firmata is a nice way to get started with Arduino with Python, but the need for a PC or other device to run the application can be costly, and this approach may not be practical in some cases. However, when it's necessary to collect data and send it to a PC using external sensors, Arduino and Firmata make a good combination.

In this section, you'll use a push button connected to your Arduino to mimic a digital sensor and trigger a notification on your machine. For a more practical application, you can think of the push button as a door sensor that will trigger an alarm notification, for example.

To display the notification on the PC, you're going to use Tkinter, the standard Python GUI toolkit. This will show a message box when you press the button. For an in-depth intro to Tkinter, check out the library's documentation.

You'll need to assemble the same circuit that you used in the digital input example:

Circuit for digital input

After you assemble the circuit, use the following program to trigger the notifications:

 1 import pyfirmata
 2 import time
 3 import tkinter
 4 from tkinter import messagebox
 5 
 6 root = tkinter.Tk()
 7 root.withdraw()
 8 
 9 board = pyfirmata.Arduino('/dev/ttyACM0')
10 
11 it = pyfirmata.util.Iterator(board)
12 it.start()
13 
14 digital_input = board.get_pin('d:10:i')
15 led = board.get_pin('d:13:o')
16 
17 while True:
18     sw = digital_input.read()
19     if sw is True:
20         led.write(1)
21         messagebox.showinfo("Notification", "Button was pressed")
22         root.update()
23         led.write(0)
24     time.sleep(0.1)

This program is similar to the one used in the digital input example, with a few changes:

To extend the notification example, you could even use the push button to send an email when pressed:

 1 import pyfirmata
 2 import time
 3 import smtplib
 4 import ssl
 5 
 6 def send_email():
 7     port = 465  # For SSL
 8     smtp_server = "smtp.gmail.com"
 9     sender_email = "<your email address>"
10     receiver_email = "<destination email address>"
11     password = "<password>"
12     message = """Subject: Arduino Notification\n The switch was turned on."""
13 
14     context = ssl.create_default_context()
15     with smtplib.SMTP_SSL(smtp_server, port, context=context) as server:
16         print("Sending email")
17         server.login(sender_email, password)
18         server.sendmail(sender_email, receiver_email, message)
19 
20 board = pyfirmata.Arduino('/dev/ttyACM0')
21 
22 it = pyfirmata.util.Iterator(board)
23 it.start()
24 
25 digital_input = board.get_pin('d:10:i')
26 
27 while True:
28     sw = digital_input.read()
29     if sw is True:
30         send_email()
31         time.sleep(0.1)

You can learn more about send_email() in Sending Emails With Python. Here, you configure the function with email server credentials, which will be used to send the email.

Note: If you use a Gmail account to send the emails, then you need to enable the Allow less secure apps option. For more information on how to do this, check out Sending Emails With Python.

With these example applications, you've seen how to use Firmata to interact with more complex Python applications. Firmata lets you use any sensor attached to the Arduino to obtain data for your application. Then you can process the data and make decisions within the main application. You can even use Firmata to send data to Arduino outputs, controlling switches or PWM devices.

If you're interested in using Firmata to interact with more complex applications, then try out some of these projects:

Conclusion

Microcontroller platforms are on the rise, thanks to the growing popularity of the Maker Movement and the Internet of Things. Platforms like Arduino are receiving a lot of attention in particular, as they allow developers just like you to use their skills and dive into electronic projects.

You learned how to:

You also saw how Firmata may be a very interesting alternative for projects that demand a PC and depend on sensor data. Plus, it's an easy way to get started with Arduino if you already know Python!

Further Reading

Now that you know the basics of controlling Arduino with Python, you can start working on more complex applications. There are several tutorials that can help you develop integrated projects. Here are a few ideas:

Lastly, there are other ways of using Python in microcontrollers besides Firmata and Arduino:


[ Improve Your Python With 🐍 Python Tricks 💌 - Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

21 Oct 2019 2:00pm GMT

Real Python: Arduino With Python: How to Get Started

Microcontrollers have been around for a long time, and they're used in everything from complex machinery to common household appliances. However, working with them has traditionally been reserved for those with formal technical training, such as technicians and electrical engineers. The emergence of Arduino has made electronic application design much more accessible to all developers. In this tutorial, you'll discover how to use Arduino with Python to develop your own electronic projects.

You'll cover the basics of Arduino with Python and learn how to:

Free Bonus: 5 Thoughts On Python Mastery, a free course for Python developers that shows you the roadmap and the mindset you'll need to take your Python skills to the next level.

The Arduino Platform

Arduino is an open-source platform composed of hardware and software that allows for the rapid development of interactive electronics projects. The emergence of Arduino drew the attention of professionals from many different industries, contributing to the start of the Maker Movement.

With the growing popularity of the Maker Movement and the concept of the Internet of Things, Arduino has become one of the main platforms for electronic prototyping and the development of MVPs.

Arduino uses its own programming language, which is similar to C++. However, it's possible to use Arduino with Python or another high-level programming language. In fact, platforms like Arduino work well with Python, especially for applications that require integration with sensors and other physical devices.

All in all, Arduino and Python can facilitate an effective learning environment that encourages developers to get into electronics design. If you already know the basics of Python, then you'll be able to get started with Arduino by using Python to control it.

The Arduino platform includes both hardware and software products. In this tutorial, you'll use Arduino hardware and Python software to learn about basic circuits, as well as digital and analog inputs and outputs.

Arduino Hardware

To run the examples, you'll need to assemble the circuits by hooking up electronic components. You can generally find these items at electronic component stores or in good Arduino starter kits. You'll need:

  1. An Arduino Uno or other compatible board
  2. A standard LED of any color
  3. A push button
  4. A 10 KOhm potentiometer
  5. A 470 Ohm resistor
  6. A 10 KOhm resistor
  7. A breadboard
  8. Jumper wires of various colors and sizes

Let's take a closer look at a few of these components.

Component 1 is an Arduino Uno or other compatible board. Arduino is a project that includes many boards and modules for different purposes, and Arduino Uno is the most basic among these. It's also the most used and most documented board of the whole Arduino family, so it's a great choice for developers who are just getting started with electronics.

Note: Arduino is an open hardware platform, so there are many other vendors who sell compatible boards that could be used to run the examples you see here. In this tutorial, you'll learn how to use the Arduino Uno.

Components 5 and 6 are resistors. Most resistors are identified by colored stripes according to a color code. In general, the first three colors represent the value of a resistor, while the fourth color represents its tolerance. For a 470 Ohm resistor, the first three colors are yellow, violet, and brown. For a 10 KOhm resistor, the first three colors are brown, black, and orange.

Component 7 is a breadboard, which you use to hook up all the other components and assemble the circuits. While a breadboard is not required, it's recommended that you get one if you intend to begin working with Arduino.

Arduino Software

In addition to these hardware components, you'll need to install some software. The platform includes the Arduino IDE, an Integrated Development Environment for programming Arduino devices, among other online tools.

Arduino was designed to allow you to program the boards with little difficulty. In general, you'll follow these steps:

  1. Connect the board to your PC
  2. Install and open the Arduino IDE
  3. Configure the board settings
  4. Write the code
  5. Press a button on the IDE to upload the program to the board

To install the Arduino IDE on your computer, download the appropriate version for your operating system from the Arduino website. Check the documentation for installation instructions:

Note: You'll be using the Arduino IDE in this tutorial, but Arduino also provides a web editor that will let you program Arduino boards using the browser.

Now that you've installed the Arduino IDE and gathered all the necessary components, you're ready to get started with Arduino! Next, you'll upload a "Hello, World!" program to your board.

"Hello, World!" With Arduino

The Arduino IDE comes with several example sketches you can use to learn the basics of Arduino. A sketch is the term you use for a program that you can upload to a board. Since the Arduino Uno doesn't have an attached display, you'll need a way to see the physical output from your program. You'll use the Blink example sketch to make a built-in LED on the Arduino board blink.

Uploading the Blink Example Sketch

To get started, connect the Arduino board to your PC using a USB cable and start the Arduino IDE. To open the Blink example sketch, access the File menu and select Examples, then 01.Basics and, finally, Blink:

Blink example sketch on Arduino IDE

The Blink example code will be loaded into a new IDE window. But before you can upload the sketch to the board, you'll need to configure the IDE by selecting your board and its connected port.

To configure the board, access the Tools menu and then Board. For Arduino Uno, you should select Arduino/Genuino Uno:

Selecting board on Arduino IDE

After you select the board, you have to set the appropriate port. Access the Tools menu again, and this time select Port:

Selecting port on Arduino IDE

The names of the ports may be different, depending on your operating system. In Windows, the ports will be named COM4, COM5, or something similar. In macOS or Linux, you may see something like /dev/ttyACM0 or /dev/ttyUSB0. If you have any problems setting the port, then take a look at the Arduino Troubleshooting Page.

After you've configured the board and port, you're all set to upload the sketch to your Arduino. To do that, you just have to press the Upload button in the IDE toolbar:

Buttons for verify and upload on Arduino IDE

When you press Upload, the IDE compiles the sketch and uploads it to your board. If you want to check for errors, then you can press Verify before Upload, which will only compile your sketch.

The USB cable provides a serial connection to both upload the program and power the Arduino board. During the upload, you'll see LEDs flashing on the board. After a few seconds, the uploaded program will run, and you'll see an LED light blink once every second:

Arduino buit-in LED blinking

After the upload is finished, the USB cable will continue to power the Arduino board. The program is stored in flash memory on the Arduino microcontroller. You can also use a battery or other external power supply to run the application without a USB cable.

Connecting External Components

In the previous section, you used an LED that was already present on the Arduino board. However, in most practical projects you'll need to connect external components to the board. To make these connections, Arduino has several pins of different types:

Arduino Uno Ports

Although these connections are commonly called pins, you can see that they're not exactly physical pins. Rather, the pins are holes in a socket to which you can connect jumper wires. In the figure above, you can see different groups of pins:

To get started using external components, you'll connect an external LED to run the Blink example sketch. The built-in LED is connected to digital pin #13. So, let's connect an external LED to that pin and check if it blinks. (A standard LED is one of the components you saw listed earlier.)

Before you connect anything to the Arduino board, it's good practice to disconnect it from the computer. With the USB cable unplugged, you'll be able to connect the LED to your board:

Circuit for blink sketch

Note that the figure shows the board with the digital pins now facing you.

Using a Breadboard

Electronic circuit projects usually involve testing several ideas, with you adding new components and making adjustments as you go. However, it can be tricky to connect components directly, especially if the circuit is large.

To facilitate prototyping, you can use a breadboard to connect the components. This is a device with several holes that are connected in a particular way so that you can easily connect components using jumper wires:

Breadboard

You can see which holes are interconnected by looking at the colored lines. You'll use the holes on the sides of the breadboard to power the circuit:

Then, you can easily connect components to the power source or the ground by simply using the other holes on the red and blue lines. The holes in the middle of the breadboard are connected as indicated by the colors. You'll use these to make connections between the components of the circuit. These two internal sections are separated by a small depression, over which you can connect integrated circuits (ICs).

You can use a breadboard to assemble the circuit used in the Blink example sketch:

Circuit for blink sketch on breadboard

For this circuit, it's important to note that the LED must be connected according to its polarity or it won't work. The positive terminal of the LED is called the anode and is generally the longer one. The negative terminal is called the cathode and is shorter. If you're using a recovered component, then you can also identify the terminals by looking for a flat side on the LED itself. This will indicate the position of the negative terminal.

When you connect an LED to an Arduino pin, you'll always need a resistor to limit its current and avoid burning out the LED prematurely. Here, you use a 470 Ohm resistor to do this. You can follow the connections and check that the circuit is the same:

For a more detailed explanation, check out How to Use a Breadboard.

After you finish the connection, plug the Arduino back into the PC and re-run the Blink sketch:

Arduino built-in and external LEDs blinking

As both LEDs are connected to digital pin 13, they blink together when the sketch is running.

"Hello, World!" With Arduino and Python

In the previous section, you uploaded the Blink sketch to your Arduino board. Arduino sketches are written in a language similar to C++ and are compiled and recorded on the flash memory of the microcontroller when you press Upload. While you can use another language to directly program the Arduino microcontroller, it's not a trivial task!

However, there are some approaches you can take to use Arduino with Python or other languages. One idea is to run the main program on a PC and use the serial connection to communicate with Arduino through the USB cable. The sketch would be responsible for reading the inputs, sending the information to the PC, and getting updates from the PC to update the Arduino outputs.

To control Arduino from the PC, you'd have to design a protocol for the communication between the PC and Arduino. For example, you could consider a protocol with messages like the following:

With the protocol defined, you could write an Arduino sketch to send messages to the PC and update the states of the pins according to the protocol. On the PC, you could write a program to control the Arduino through a serial connection, based on the protocol you've designed. For this, you can use whatever language and libraries you prefer, such as Python and the PySerial library.

Fortunately, there are standard protocols to do all this! Firmata is one of them. This protocol establishes a serial communication format that allows you to read digital and analog inputs, as well as send information to digital and analog outputs.

The Arduino IDE includes ready-made sketches that will drive Arduino through Python with the Firmata protocol. On the PC side, there are implementations of the protocol in several languages, including Python. To get started with Firmata, let's use it to implement a "Hello, World!" program.

Uploading the Firmata Sketch

Before you write your Python program to drive Arduino, you have to upload the Firmata sketch so that you can use that protocol to control the board. The sketch is available in the Arduino IDE's built-in examples. To open it, access the File menu, then Examples, followed by Firmata, and finally StandardFirmata:

Firmata example sketch on Arduino IDE

The sketch will be loaded into a new IDE window. To upload it to the Arduino, you can follow the same steps you did before:

  1. Plug the USB cable into the PC.
  2. Select the appropriate board and port on the IDE.
  3. Press Upload.

After the upload is finished, you won't notice any activity on the Arduino. To control it, you still need a program that can communicate with the board through the serial connection. To work with the Firmata protocol in Python, you'll need the pyFirmata package, which you can install with pip:

$ pip install pyfirmata

After the installation finishes, you can run an equivalent Blink application using Python and Firmata:

 1 import pyfirmata
 2 import time
 3 
 4 board = pyfirmata.Arduino('/dev/ttyACM0')
 5 
 6 while True:
 7     board.digital[13].write(1)
 8     time.sleep(1)
 9     board.digital[13].write(0)
10     time.sleep(1)

Here's how this program works. You import pyfirmata and use it to establish a serial connection with the Arduino board, which is represented by the board object in line 4. You also configure the port in this line by passing an argument to pyfirmata.Arduino(). You can use the Arduino IDE to find the port.

board.digital is a list whose elements represent the digital pins of the Arduino. These elements have the methods read() and write(), which will read and write the state of the pins. Like most embedded device programs, this program mainly consists of an infinite loop:

Now that you know the basics of how to control an Arduino with Python, let's go through some applications to interact with its inputs and outputs.

Reading Digital Inputs

Digital inputs can have only two possible values. In a circuit, each of these values is represented by a different voltage. The table below shows the digital input representation for a standard Arduino Uno board:

Value Level Voltage
0 Low 0V
1 High 5V

To control the LED, you'll use a push button to send digital input values to the Arduino. The button should send 0V to the board when it's released and 5V to the board when it's pressed. The figure below shows how to connect the button to the Arduino board:

Circuit for digital input

You may notice that the LED is connected to the Arduino on digital pin 13, just like before. Digital pin 10 is used as a digital input. To connect the push button, you have to use the 10 KOhm resistor, which acts as a pull down in this circuit. A pull down resistor ensures that the digital input gets 0V when the button is released.

When you release the button, you open the connection between the two wires on the button. Since there's no current flowing through the resistor, pin 10 just connects to the ground (GND). The digital input gets 0V, which represents the 0 (or low) state. When you press the button, you apply 5V to both the resistor and the digital input. A current flows through the resistor and the digital input gets 5V, which represents the 1 (or high) state.

You can use a breadboard to assemble the above circuit as well:

Circuit for digital input on breadboard

Now that you've assembled the circuit, you have to run a program on the PC to control it using Firmata. This program will turn on the LED, based on the state of the push button:

 1 import pyfirmata
 2 import time
 3 
 4 board = pyfirmata.Arduino('/dev/ttyACM0')
 5 
 6 it = pyfirmata.util.Iterator(board)
 7 it.start()
 8 
 9 board.digital[10].mode = pyfirmata.INPUT
10 
11 while True:
12     sw = board.digital[10].read()
13     if sw is True:
14         board.digital[13].write(1)
15     else:
16         board.digital[13].write(0)
17     time.sleep(0.1)

Let's walk through this program:

pyfirmata also offers a more compact syntax to work with input and output pins. This may be a good option for when you're working with several pins. You can rewrite the previous program to have more compact syntax:

 1 import pyfirmata
 2 import time
 3 
 4 board = pyfirmata.Arduino('/dev/ttyACM0')
 5 
 6 it = pyfirmata.util.Iterator(board)
 7 it.start()
 8 
 9 digital_input = board.get_pin('d:10:i')
10 led = board.get_pin('d:13:o')
11 
12 while True:
13     sw = digital_input.read()
14     if sw is True:
15         led.write(1)
16     else:
17         led.write(0)
18     time.sleep(0.1)

In this version, you use board.get_pin() to create two objects. digital_input represents the digital input state, and led represents the LED state. When you run this method, you have to pass a string argument composed of three elements separated by colons:

  1. The type of the pin (a for analog or d for digital)
  2. The number of the pin
  3. The mode of the pin (i for input or o for output)

Since digital_input is a digital input using pin 10, you pass the argument 'd:10:i'. The LED state is set to a digital output using pin 13, so the led argument is 'd:13:o'.

When you use board.get_pin(), there's no need to explicitly set up pin 10 as an input like you did before with pyfirmata.INPUT. After the pins are set, you can access the status of a digital input pin using read(), and set the status of a digital output pin with write().

Digital inputs are widely used in electronics projects. Several sensors provide digital signals, like presence or door sensors, that can be used as inputs to your circuits. However, there are some cases where you'll need to measure analog values, such as distance or physical quantities. In the next section, you'll see how to read analog inputs using Arduino with Python.

Reading Analog Inputs

In contrast to digital inputs, which can only be on or off, analog inputs are used to read values in some range. On the Arduino Uno, the voltage to an analog input ranges from 0V to 5V. Appropriate sensors are used to measure physical quantities, such as distances. These sensors are responsible for encoding these physical quantities in the proper voltage range so they can be read by the Arduino.

To read an analog voltage, the Arduino uses an analog-to-digital converter (ADC), which converts the input voltage to a digital number with a fixed number of bits. This determines the resolution of the conversion. The Arduino Uno uses a 10-bit ADC and can determine 1024 different voltage levels.

The voltage range for an analog input is encoded to numbers ranging from 0 to 1023. When 0V is applied, the Arduino encodes it to the number 0. When 5V is applied, the encoded number is 1023. All intermediate voltage values are proportionally encoded.

A potentiometer is a variable resistor that you can use to set the voltage applied to an Arduino analog input. You'll connect it to an analog input to control the frequency of a blinking LED:

Circuit for analog input

In this circuit, the LED is set up just as before. The end terminals of the potentiometer are connected to ground (GND) and 5V pins. This way, the central terminal (the cursor) can have any voltage in the 0V to 5V range depending on its position, which is connected to the Arduino on analog pin A0.

Using a breadboard, you can assemble this circuit as follows:

Circuit for analog input on breadboard

Before you control the LED, you can use the circuit to check the different values the Arduino reads, based on the position of the potentiometer. To do this, run the following program on your PC:

 1 import pyfirmata
 2 import time
 3 
 4 board = pyfirmata.Arduino('/dev/ttyACM0')
 5 it = pyfirmata.util.Iterator(board)
 6 it.start()
 7 
 8 analog_input = board.get_pin('a:0:i')
 9 
10 while True:
11     analog_value = analog_input.read()
12     print(analog_value)
13     time.sleep(0.1)

In line 8, you set up analog_input as the analog A0 input pin with the argument 'a:0:i'. Inside the infinite while loop, you read this value, store it in analog_value, and display the output to the console with print(). When you move the potentiometer while the program runs, you should output similar to this:

0.0
0.0293
0.1056
0.1838
0.2717
0.3705
0.4428
0.5064
0.5797
0.6315
0.6764
0.7243
0.7859
0.8446
0.9042
0.9677
1.0
1.0

The printed values change, ranging from 0 when the position of the potentiometer is on one end to 1 when it's on the other end. Note that these are float values, which may require conversion depending on the application.

To change the frequency of the blinking LED, you can use the analog_value to control how long the LED will be kept on or off:

 1 import pyfirmata
 2 import time
 3 
 4 board = pyfirmata.Arduino('/dev/ttyACM0')
 5 it = pyfirmata.util.Iterator(board)
 6 it.start()
 7 
 8 analog_input = board.get_pin('a:0:i')
 9 led = board.get_pin('d:13:o')
10 
11 while True:
12     analog_value = analog_input.read()
13     if analog_value is not None:
14         delay = analog_value + 0.01
15         led.write(1)
16         time.sleep(delay)
17         led.write(0)
18         time.sleep(delay)
19     else:
20         time.sleep(0.1)

Here, you calculate delay as analog_value + 0.01 to avoid having delay equal to zero. Otherwise, it's common to get an analog_value of None during the first few iterations. To avoid getting an error when running the program, you use a conditional in line 13 to test whether analog_value is None. Then you control the period of the blinking LED.

Try running the program and changing the position of the potentiometer. You'll notice the frequency of the blinking LED changes:

Led controlled by an analog input

By now, you've seen how to use digital inputs, digital outputs, and analog inputs on your circuits. In the next section, you'll see how to use analog outputs.

Using Analog Outputs

In some cases, it's necessary to have an analog output to drive a device that requires an analog signal. Arduino doesn't include a real analog output, one where the voltage could be set to any value in a certain range. However, Arduino does include several Pulse Width Modulation (PWM) outputs.

PWM is a modulation technique in which a digital output is used to generate a signal with variable power. To do this, it uses a digital signal of constant frequency, in which the duty cycle is changed according to the desired power. The duty cycle represents the fraction of the period in which the signal is set to high.

Not all Arduino digital pins can be used as PWM outputs. The ones that can be are identified by a tilde (~):

Arduino Uno PWM ports

Several devices are designed to be driven by PWM signals, including some motors. It's even possible to obtain a real analog signal from the PWM signal if you use analog filters. In the previous example, you used a digital output to turn an LED light on or off. In this section, you'll use PWM to control the brightness of an LED, according to the value of an analog input given by a potentiometer.

When a PWM signal is applied to an LED, its brightness varies according to the duty cycle of the PWM signal. You're going to use the following circuit:

Circuit for analog output

This circuit is identical to the one used in the previous section to test the analog input, except for one difference. Since it's not possible to use PWM with pin 13, the digital output pin used for the LED is pin 11.

You can use a breadboard to assemble the circuit as follows:

Circuit for analog output on breadboard

With the circuit assembled, you can control the LED using PWM with the following program:

 1 import pyfirmata
 2 import time
 3 
 4 board = pyfirmata.Arduino('/dev/ttyACM0')
 5 
 6 it = pyfirmata.util.Iterator(board)
 7 it.start()
 8 
 9 analog_input = board.get_pin('a:0:i')
10 led = board.get_pin('d:11:p')
11 
12 while True:
13     analog_value = analog_input.read()
14     if analog_value is not None:
15         led.write(analog_value)
16     time.sleep(0.1)

There are a few differences from the programs you've used previously:

  1. In line 10, you set led to PWM mode by passing the argument 'd:11:p'.
  2. In line 15, you call led.write() with analog_value as an argument. This is a value between 0 and 1, read from the analog input.

Here you can see the LED behavior when the potentiometer is moved:

PWM output on oscilloscope

To show the changes in the duty cycle, an oscilloscope is plugged into pin 11. When the potentiometer is in its zero position, you can see the LED is turned off, as pin 11 has 0V on its output. As you turn the potentiometer, the LED gets brighter as the PWM duty cycle increases. When you turn the potentiometer all the way, the duty cycle reaches 100%. The LED is turned on continuously at maximum brightness.

With this example, you've covered the basics of using an Arduino and its digital and analog inputs and outputs. In the next section, you'll see an application for using Arduino with Python to drive events on the PC.

Using a Sensor to Trigger a Notification

Firmata is a nice way to get started with Arduino with Python, but the need for a PC or other device to run the application can be costly, and this approach may not be practical in some cases. However, when it's necessary to collect data and send it to a PC using external sensors, Arduino and Firmata make a good combination.

In this section, you'll use a push button connected to your Arduino to mimic a digital sensor and trigger a notification on your machine. For a more practical application, you can think of the push button as a door sensor that will trigger an alarm notification, for example.

To display the notification on the PC, you're going to use Tkinter, the standard Python GUI toolkit. This will show a message box when you press the button. For an in-depth intro to Tkinter, check out the library's documentation.

You'll need to assemble the same circuit that you used in the digital input example:

Circuit for digital input

After you assemble the circuit, use the following program to trigger the notifications:

 1 import pyfirmata
 2 import time
 3 import tkinter
 4 from tkinter import messagebox
 5 
 6 root = tkinter.Tk()
 7 root.withdraw()
 8 
 9 board = pyfirmata.Arduino('/dev/ttyACM0')
10 
11 it = pyfirmata.util.Iterator(board)
12 it.start()
13 
14 digital_input = board.get_pin('d:10:i')
15 led = board.get_pin('d:13:o')
16 
17 while True:
18     sw = digital_input.read()
19     if sw is True:
20         led.write(1)
21         messagebox.showinfo("Notification", "Button was pressed")
22         root.update()
23         led.write(0)
24     time.sleep(0.1)

This program is similar to the one used in the digital input example, with a few changes:

To extend the notification example, you could even use the push button to send an email when pressed:

 1 import pyfirmata
 2 import time
 3 import smtplib
 4 import ssl
 5 
 6 def send_email():
 7     port = 465  # For SSL
 8     smtp_server = "smtp.gmail.com"
 9     sender_email = "<your email address>"
10     receiver_email = "<destination email address>"
11     password = "<password>"
12     message = """Subject: Arduino Notification\n The switch was turned on."""
13 
14     context = ssl.create_default_context()
15     with smtplib.SMTP_SSL(smtp_server, port, context=context) as server:
16         print("Sending email")
17         server.login(sender_email, password)
18         server.sendmail(sender_email, receiver_email, message)
19 
20 board = pyfirmata.Arduino('/dev/ttyACM0')
21 
22 it = pyfirmata.util.Iterator(board)
23 it.start()
24 
25 digital_input = board.get_pin('d:10:i')
26 
27 while True:
28     sw = digital_input.read()
29     if sw is True:
30         send_email()
31         time.sleep(0.1)

You can learn more about send_email() in Sending Emails With Python. Here, you configure the function with email server credentials, which will be used to send the email.

Note: If you use a Gmail account to send the emails, then you need to enable the Allow less secure apps option. For more information on how to do this, check out Sending Emails With Python.

With these example applications, you've seen how to use Firmata to interact with more complex Python applications. Firmata lets you use any sensor attached to the Arduino to obtain data for your application. Then you can process the data and make decisions within the main application. You can even use Firmata to send data to Arduino outputs, controlling switches or PWM devices.

If you're interested in using Firmata to interact with more complex applications, then try out some of these projects:

Conclusion

Microcontroller platforms are on the rise, thanks to the growing popularity of the Maker Movement and the Internet of Things. Platforms like Arduino are receiving a lot of attention in particular, as they allow developers just like you to use their skills and dive into electronic projects.

You learned how to:

You also saw how Firmata may be a very interesting alternative for projects that demand a PC and depend on sensor data. Plus, it's an easy way to get started with Arduino if you already know Python!

Further Reading

Now that you know the basics of controlling Arduino with Python, you can start working on more complex applications. There are several tutorials that can help you develop integrated projects. Here are a few ideas:

Lastly, there are other ways of using Python in microcontrollers besides Firmata and Arduino:


[ Improve Your Python With 🐍 Python Tricks 💌 - Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

21 Oct 2019 2:00pm GMT

qutebrowser development blog: Current qutebrowser roadmap and next crowdfunding

More than half a year ago, I posted a qutebrowser roadmap - I thought it's about time for an update on how things are looking at the moment!

Upcoming crowdfunding

I finished my Bachelor of Science in September at the University of Applied Sciences in Rapperswil.

Now I'm employed around 16h …

21 Oct 2019 1:50pm GMT

qutebrowser development blog: Current qutebrowser roadmap and next crowdfunding

More than half a year ago, I posted a qutebrowser roadmap - I thought it's about time for an update on how things are looking at the moment!

Upcoming crowdfunding

I finished my Bachelor of Science in September at the University of Applied Sciences in Rapperswil.

Now I'm employed around 16h …

21 Oct 2019 1:50pm GMT

Erik Marsja: Converting HTML to a Jupyter Notebook

The post Converting HTML to a Jupyter Notebook appeared first on Erik Marsja.

In this short post, we are going to learn how to turn the code from blog posts to Jupyter notebooks.

In this post, we are going to use the Python packages BeautifulSoup4, json, and urllib. We are going to use these packages to scrape the code from webpages putting their code within <code></code>.

Note, this code is not intended to steal other people's code. I created this script to scrape my code and save it to Jupyter notebooks because I noticed that my code, sometimes did not work as intended.

Install the Needed Packages

Now, we need to install BeautifulSoup4 before we continue converting html
to jupyter notebooks. Furthermore, we need to install lxml.

How to Install Python Packages using conda

In this section, we are going to learn how to install the needed packages using the packages manager conda. First, open up the Anaconda Powershell Prompt

Now, we are ready to install BeautifulSoup4.

conda -c install anaconda beautifulsoup4 lxml

How to Install Python Packages using Pip

It is, of course, possible to install the packages using pip as well:

install beautifulsoup4 lxml

How to Convert HTML to a Jupyter Notebook

Now, when we have installed the Python packages, we can continue with scraping the code from a web page. In the example, below, we will start by importing BeautifulSoup from bs4, json, and urllib. Next, we have the URL to the webpage that we want to convert to a Jupyter notebooks (this).

from bs4 import BeautifulSoup
  import json
  import urllib

  url = 'https://www.marsja.se/python-manova-made-easy-using-statsmodels/'

Setting a Custom User-Agent

In the next line of code, we create the dictionary headers.

This is because many websites (including the one you are reading now) will block web scrapers and this will prevent that from happening.


  headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11'\
             '(KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
         'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
         'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
         'Accept-Encoding': 'none',
         'Accept-Language': 'en-US,en;q=0.8',
         'Connection': 'keep-alive'}
  

In the next code chunk, we are going to create a Request object. This object represents the HTTP request we are making.

Simply put, we create a Request object that specifies the URL we want to retrieve. Furthermore, we are calling urlopen using the Request object. This will, in turn, a response object for the requested URL.

Finally, we call .read() on the response:

req = urllib.request.Request(url,
    headers=headers)
  page = urllib.request.urlopen(req)
  text = page.read()

We are now going to use BeautifulSoup4 to get make it easier to scrape the html:

soup = BeautifulSoup(text, 'lxml')
soup

Jupyter Notebook Metadata

Now we're ready to convert HTML to a Jupyter Notebook (this code was inspired by this code example). First, we start by creating some metadata for the Jupyter notebook.

Jupyter notebook metadata

In the code below, we start by creating a dictionary in which we will, later, store our scraped code elements. This is going to be the metadata for the Jupyter notebook we will create. Note, .ipynb are simple JSON files, containing text, code, rich media output, and metadata. The metadata is not required but here we will add what language we are using (i.e., Python 3).

create_nb = {'nbformat': 4, 'nbformat_minor': 2, 
              'cells': [], 'metadata': 
             {"kernelspec": 
              {"display_name": "Python 3", 
               "language": "python", "name": "python3"
  }}}
Example code cell from a Jupyter Notebook

More information bout the format of Jupyter notebooks can be found here.

Getting the Code Elements from the HTML

Second, we are creating a Python function called get_code. This function will take two arguments. First, the beautifulsoup object, we earlier created, and the content_class to search for content in. In the case, of this particular WordPress, blog this will be post-content

Next, we are looping through all div tags in the soup object. Here, we only look for the post content. Next, we get all the code chunks searching for all code tags

In the final loop, we are going through each code chunk and creating a new dictionary (cell) in which we are going to store the code. The important part is where we add the text, using the get_text method. Here we are getting our code from the code chunk and add it to the dictionary.

Finally, we add this to the dictionary, nb_data, that will contain the data that we are going to save as a jupyter notebook (i.e., the blog post we have scraped).

def get_data(soup, content_class):
    for div in soup.find_all('div', 
                             attrs={'class': content_class}):
        
        code_chunks = div.find_all('code')
        
        for chunk in code_chunks:
            cell_text = ' '
            cell = {}
            cell['metadata'] = {}
            cell['outputs'] = []
            cell['source'] = [chunk.get_text()]
            cell['execution_count'] = None
            cell['cell_type'] = 'code'
            create_nb['cells'].append(cell)

get_data(soup, 'post-content')

with open('Python_MANOVA.ipynb', 'w') as jynotebook:
    jynotebook.write(json.dumps(create_nb))

Note, we get the nb_data which is a dictionary from which will create our notebook from. In the final two rows, of the code chunk, we will open a file (i.e., test.ipynb) and write to this file using json dump method.

Here's a Jupyter notebook containing all code above.

The post Converting HTML to a Jupyter Notebook appeared first on Erik Marsja.

21 Oct 2019 11:00am GMT

Erik Marsja: Converting HTML to a Jupyter Notebook

The post Converting HTML to a Jupyter Notebook appeared first on Erik Marsja.

In this short post, we are going to learn how to turn the code from blog posts to Jupyter notebooks.

In this post, we are going to use the Python packages BeautifulSoup4, json, and urllib. We are going to use these packages to scrape the code from webpages putting their code within <code></code>.

Note, this code is not intended to steal other people's code. I created this script to scrape my code and save it to Jupyter notebooks because I noticed that my code, sometimes did not work as intended.

Install the Needed Packages

Now, we need to install BeautifulSoup4 before we continue converting html
to jupyter notebooks. Furthermore, we need to install lxml.

How to Install Python Packages using conda

In this section, we are going to learn how to install the needed packages using the packages manager conda. First, open up the Anaconda Powershell Prompt

Now, we are ready to install BeautifulSoup4.

conda -c install anaconda beautifulsoup4 lxml

How to Install Python Packages using Pip

It is, of course, possible to install the packages using pip as well:

install beautifulsoup4 lxml

How to Convert HTML to a Jupyter Notebook

Now, when we have installed the Python packages, we can continue with scraping the code from a web page. In the example, below, we will start by importing BeautifulSoup from bs4, json, and urllib. Next, we have the URL to the webpage that we want to convert to a Jupyter notebooks (this).

from bs4 import BeautifulSoup
  import json
  import urllib

  url = 'https://www.marsja.se/python-manova-made-easy-using-statsmodels/'

Setting a Custom User-Agent

In the next line of code, we create the dictionary headers.

This is because many websites (including the one you are reading now) will block web scrapers and this will prevent that from happening.


  headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11'\
             '(KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
         'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
         'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
         'Accept-Encoding': 'none',
         'Accept-Language': 'en-US,en;q=0.8',
         'Connection': 'keep-alive'}
  

In the next code chunk, we are going to create a Request object. This object represents the HTTP request we are making.

Simply put, we create a Request object that specifies the URL we want to retrieve. Furthermore, we are calling urlopen using the Request object. This will, in turn, a response object for the requested URL.

Finally, we call .read() on the response:

req = urllib.request.Request(url,
    headers=headers)
  page = urllib.request.urlopen(req)
  text = page.read()

We are now going to use BeautifulSoup4 to get make it easier to scrape the html:

soup = BeautifulSoup(text, 'lxml')
soup

Jupyter Notebook Metadata

Now we're ready to convert HTML to a Jupyter Notebook (this code was inspired by this code example). First, we start by creating some metadata for the Jupyter notebook.

Jupyter notebook metadata

In the code below, we start by creating a dictionary in which we will, later, store our scraped code elements. This is going to be the metadata for the Jupyter notebook we will create. Note, .ipynb are simple JSON files, containing text, code, rich media output, and metadata. The metadata is not required but here we will add what language we are using (i.e., Python 3).

create_nb = {'nbformat': 4, 'nbformat_minor': 2, 
              'cells': [], 'metadata': 
             {"kernelspec": 
              {"display_name": "Python 3", 
               "language": "python", "name": "python3"
  }}}
Example code cell from a Jupyter Notebook

More information bout the format of Jupyter notebooks can be found here.

Getting the Code Elements from the HTML

Second, we are creating a Python function called get_code. This function will take two arguments. First, the beautifulsoup object, we earlier created, and the content_class to search for content in. In the case, of this particular WordPress, blog this will be post-content

Next, we are looping through all div tags in the soup object. Here, we only look for the post content. Next, we get all the code chunks searching for all code tags

In the final loop, we are going through each code chunk and creating a new dictionary (cell) in which we are going to store the code. The important part is where we add the text, using the get_text method. Here we are getting our code from the code chunk and add it to the dictionary.

Finally, we add this to the dictionary, nb_data, that will contain the data that we are going to save as a jupyter notebook (i.e., the blog post we have scraped).

def get_data(soup, content_class):
    for div in soup.find_all('div', 
                             attrs={'class': content_class}):
        
        code_chunks = div.find_all('code')
        
        for chunk in code_chunks:
            cell_text = ' '
            cell = {}
            cell['metadata'] = {}
            cell['outputs'] = []
            cell['source'] = [chunk.get_text()]
            cell['execution_count'] = None
            cell['cell_type'] = 'code'
            create_nb['cells'].append(cell)

get_data(soup, 'post-content')

with open('Python_MANOVA.ipynb', 'w') as jynotebook:
    jynotebook.write(json.dumps(create_nb))

Note, we get the nb_data which is a dictionary from which will create our notebook from. In the final two rows, of the code chunk, we will open a file (i.e., test.ipynb) and write to this file using json dump method.

Here's a Jupyter notebook containing all code above.

The post Converting HTML to a Jupyter Notebook appeared first on Erik Marsja.

21 Oct 2019 11:00am GMT

Mike Driscoll: PyDev of the Week: Sophy Wong

This week we welcome Sophy Wong (@sophywong) as our PyDev of the Week! Sophy is a maker who uses Circuit Python for creating wearables. She is also a writer and speaker at Maker events. You can see some of her creations on her Youtube Channel or her website. Let's take a few moments to get to know her better!

Sophy's LED Manicure

Can you tell us a little about yourself (hobbies, education, etc):

I am a designer and maker currently working mostly with wearable electronics projects. My background is in graphic design, and I have also worked in fashion and costumes on my way to wearable electronics. I like to explore the different ways people interact with technology, and much of my work is inspired by sci-fi and pop culture. My projects often combine technology, like microcontrollers and 3D printing, with hand crafts like sculpting, painting, and sewing.

Why did you start using Python?

I discovered Python through Adafruit's development of Circuit Python. Adafruit's thorough documentation and huge library of tutorial projects make it easy for me to learn and write code for my projects. I'm primarily a designer, and code is a tool I use to bring my ideas to life. Circuit Python helps me learn programming basics, and is also powerful enough to support more complex projects as I gain more skills.

What other programming languages do you know and which is your favorite?

I also use Arduino for some projects, which lets me use the many fantastic Arduino libraries out there, like FastLED. I often use MakeCode when creating a project for a tutorial or educational workshop. As a visual programming tool, MakeCode is intuitive to use and easy to explain with screenshots. It's still robust enough to support fairly complex projects, and is a great first step before going further with Circuit Python or Arduino.

What projects are you working on now?

I recently completed a project that involved adding Adafruit's NeoPixel RGB LEDs to a jacket, using 3D printed diffusers printed directly on fabric. I'm working on a project now that expands the technique to a larger, more elaborate garment. I'm also starting to work on another space suit concept, learning how to use a desktop PCB mill, and of course, writing more wearable tech project tutorials!

Which Python libraries are your favorite (core or 3rd party)?

Adafruit makes great Circuit Python libraries for all of their components, and I use their NeoPixel library for Circuit Python in almost every project, because I love making things light up with NeoPixels.

Do you have any advice for people who want to become makers?

Pick a project you're really excited about and just start. Don't wait for the perfect materials, or the fanciest tools, get scrappy and figure it out as you go. Sometimes too much planning and preparation can steal all your energy before you ever get going, and make your project feel overwhelming. So fall in love with your idea, and jump in while you're excited. Trust yourself, and have fun. You can't fail if you never give up!

What new wearable tech are you excited about?

I'm really excited about VR and the potential for wearables to add to the immersive experience. It's a technology I remember being amazed by when I was a kid. I really wanted to try it, but the technology was so far out of reach for me, I thought I'd never be able to experience it myself. Now that it's available as consumer technology, I'm very interested to see makers create their own VR experiences, as well as wearable devices and peripherals for VR.

I'm also very interested in space exploration and space suits. I've made my own space suit costume, a conceptual design that is based on sci-fi renderings. With upcoming missions to the Moon and Mars, I'm excited to see innovations in space suit design, and how designers and engineers create new suits for both astronauts and space tourists.

Is there anything else you'd like to say?

Thank you to everyone who documents their projects and shares their work for others to learn from! Thanks to tutorials and libraries written and published by others, I'm able to bring my ideas to life with programmable electronics. I never thought that studying design would lead me to learning how to code, and writing tutorials to help others get started with programming. Now, writing Circuit Python code for my projects is one of my favorite parts of the process!

Thanks for doing the interview, Sophy!

The post PyDev of the Week: Sophy Wong appeared first on The Mouse Vs. The Python.

21 Oct 2019 5:05am GMT

Mike Driscoll: PyDev of the Week: Sophy Wong

This week we welcome Sophy Wong (@sophywong) as our PyDev of the Week! Sophy is a maker who uses Circuit Python for creating wearables. She is also a writer and speaker at Maker events. You can see some of her creations on her Youtube Channel or her website. Let's take a few moments to get to know her better!

Sophy's LED Manicure

Can you tell us a little about yourself (hobbies, education, etc):

I am a designer and maker currently working mostly with wearable electronics projects. My background is in graphic design, and I have also worked in fashion and costumes on my way to wearable electronics. I like to explore the different ways people interact with technology, and much of my work is inspired by sci-fi and pop culture. My projects often combine technology, like microcontrollers and 3D printing, with hand crafts like sculpting, painting, and sewing.

Why did you start using Python?

I discovered Python through Adafruit's development of Circuit Python. Adafruit's thorough documentation and huge library of tutorial projects make it easy for me to learn and write code for my projects. I'm primarily a designer, and code is a tool I use to bring my ideas to life. Circuit Python helps me learn programming basics, and is also powerful enough to support more complex projects as I gain more skills.

What other programming languages do you know and which is your favorite?

I also use Arduino for some projects, which lets me use the many fantastic Arduino libraries out there, like FastLED. I often use MakeCode when creating a project for a tutorial or educational workshop. As a visual programming tool, MakeCode is intuitive to use and easy to explain with screenshots. It's still robust enough to support fairly complex projects, and is a great first step before going further with Circuit Python or Arduino.

What projects are you working on now?

I recently completed a project that involved adding Adafruit's NeoPixel RGB LEDs to a jacket, using 3D printed diffusers printed directly on fabric. I'm working on a project now that expands the technique to a larger, more elaborate garment. I'm also starting to work on another space suit concept, learning how to use a desktop PCB mill, and of course, writing more wearable tech project tutorials!

Which Python libraries are your favorite (core or 3rd party)?

Adafruit makes great Circuit Python libraries for all of their components, and I use their NeoPixel library for Circuit Python in almost every project, because I love making things light up with NeoPixels.

Do you have any advice for people who want to become makers?

Pick a project you're really excited about and just start. Don't wait for the perfect materials, or the fanciest tools, get scrappy and figure it out as you go. Sometimes too much planning and preparation can steal all your energy before you ever get going, and make your project feel overwhelming. So fall in love with your idea, and jump in while you're excited. Trust yourself, and have fun. You can't fail if you never give up!

What new wearable tech are you excited about?

I'm really excited about VR and the potential for wearables to add to the immersive experience. It's a technology I remember being amazed by when I was a kid. I really wanted to try it, but the technology was so far out of reach for me, I thought I'd never be able to experience it myself. Now that it's available as consumer technology, I'm very interested to see makers create their own VR experiences, as well as wearable devices and peripherals for VR.

I'm also very interested in space exploration and space suits. I've made my own space suit costume, a conceptual design that is based on sci-fi renderings. With upcoming missions to the Moon and Mars, I'm excited to see innovations in space suit design, and how designers and engineers create new suits for both astronauts and space tourists.

Is there anything else you'd like to say?

Thank you to everyone who documents their projects and shares their work for others to learn from! Thanks to tutorials and libraries written and published by others, I'm able to bring my ideas to life with programmable electronics. I never thought that studying design would lead me to learning how to code, and writing tutorials to help others get started with programming. Now, writing Circuit Python code for my projects is one of my favorite parts of the process!

Thanks for doing the interview, Sophy!

The post PyDev of the Week: Sophy Wong appeared first on The Mouse Vs. The Python.

21 Oct 2019 5:05am GMT

20 Oct 2019

feedPlanet Python

Go Deh: Indent datastructure for trees

I was browsing StackOverflow and came across a question that mentioned a new-to-me format for a datatructure for holding a tree of data.I am well used to the (name, list_of_children) set of interconnected node datastructures way of doing things, but this mentioned where you

  1. Create an empty list
  2. then for each node starting at the root which has a depth of zero:
    1. add the (depth, name) tuple of the node to the list
    2. visit all this nodes child node.

It is a preorder traversal of the conceptual tree, aggregating (depth, name) tuples into a list to form what I am calling the indent tree datastructure as it captures all the information of the tree but in a different datastructure than normal, and can be extended to allow data at each node and might be a useful alternative for DB storage of trees.

An example tree:





Its indent format:

[(0, 'R'),
(1, 'a'),
(2, 'c'),
(3, 'd'),
(3, 'h'),
(1, 'b'),
(2, 'e'),
(3, 'f'),
(4, 'g'),
(2, 'i')]


Indented representation:

If you print out successive names from the indent format list above, one per line, with indent from the left of the indent value, then you get a nice textual regpresentation of the tree; expanded left-to-right rather than the top-down representation of the graphic:


R
a
c
d
h
b
e
f
g
i



Code

I wrote some code to manipulate and traverse this kind of tree datastructure, as well as to use graphviz to draw graphical representations.


  1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
import graphviz as gv
from collections import defaultdict
import re
from pprint import pprint as pp


#%%
node_attr = {at: val for at, val in
(line.strip().split() for line in '''
shape record
fontsize 12
height 0.1
width 0.1
rankdir LR
ranksep 0.25
nodesep 0.25
'''.strip().split('\n'))}
edge_attr = {at: val for at, val in
(line.strip().split() for line in '''
arrowhead none
minlen 1
'''.strip().split('\n'))}
root_attr = {at: val for at, val in
(line.strip().split() for line in '''
fontcolor green
color brown
fontname bold
'''.strip().split('\n'))}
#%%


def _pc2indent(pc, root=None, indent=None, children=None):
"parent-child dependency dict to indent format conversion"
if root is None:
parents = set(pc)
kids = set(sum(pc.values(), []))
root = parents - kids
assert len(root) == 1, f"Need exactly one root: {root}"
root = root.pop()
if indent is None:
indent = 0
if children is None:
children = []
children.append((indent, root))
if root in pc:
for child in pc[root]:
pc2indent(pc, child, indent+1, children)
return children

def dot2indent(tree):
"simple dot format to indent format translator"
depends = defaultdict(list)
for matchobj in re.finditer(r'\s+(\S+)\s*->\s*(\S+)\s', tree.source):
parent, child = matchobj.groups()
depends[parent].append(child)
parent2child = dict(depends)
return _pc2indent(parent2child)

#%%
def _highlight(txt):
return f'#{txt}#'

def pp_indent(data, hlight=None, indent=' '):
"simple printout of indent format tree with optional node highlighting"
if hlight is None:
hlight = set()
for level, name in data:
print(indent * level
+ (_highlight(name) if name in hlight else name))

#%%
def indent2dot(lst):
tree = gv.Digraph(name='indent2dot', node_attr=node_attr)
tree.edge_attr.update(**edge_attr)
levelparent = {}
for level, name in lst:
levelparent[level] = name
if level - 1 in levelparent:
tree.edge(levelparent[level-1], name)
else:
tree.node(name, _attributes=root_attr)
return tree

#%%
def printwithspace(i):
print(i, end=' ')

def preorder(tree, visitor=printwithspace):
for indent, name in tree:
visitor(name)

def levelorder(tree, reverse_depth=False, reverse_in_level=False,
visitor=printwithspace):
if not tree:
return
indent2name = defaultdict(list)
mx = -1
for indent, name in tree:
if indent > mx:
mx = indent
indent2name[indent].append(name)
if reverse_in_level:
for names in indent2name.values():
names.reverse()
if not reverse_depth:
for indent in range(0, mx + 1):
for name in indent2name[indent]:
visitor(name)
else:
for indent in range(mx, -1, -1):
for name in indent2name[indent]:
visitor(name)


#%%
# Example tree
ex1 = [(0, '1'),
(1, '2'),
(2, '4'),
(3, '7'),
(2, '5'),
(1, '3'),
(2, '6'),
(3, '8'),
(3, '9'),
(3, '10'),
]

#%%
if __name__ == '__main__':
print('A tree in indent datastructure format:')
pp(ex1)
print('\nSame tree, printed as indented list:')
pp_indent(ex1)
print('\nSame tree, drawn by graphviz:')
display(indent2dot(ex1)) # display works in spyder/Jupyter

print('\nSame tree, preorder traversal:')
preorder(ex1)
print()
print('Same tree, levelorder traversal:')
levelorder(ex1)
print()
print('Same tree, reverse_depth levelorder traversal:')
levelorder(ex1, True)
print()
print('Same tree, reverse_depth, reverse_in_level levelorder traversal:')
levelorder(ex1, True, True)
print()
print('Same tree, depth_first, reverse_in_level levelorder traversal:')
levelorder(ex1, False, True)
print()


Output:



A tree in indent datastructure format:
[(0, '1'),
(1, '2'),
(2, '4'),
(3, '7'),
(2, '5'),
(1, '3'),
(2, '6'),
(3, '8'),
(3, '9'),
(3, '10')]

Same tree, printed as indented list:

1
2
4
7
5
3
6
8
9
10

Same tree, drawn by graphviz:


Same tree, preorder traversal:
1 2 4 7 5 3 6 8 9 10
Same tree, levelorder traversal:
1 2 3 4 5 6 7 8 9 10
Same tree, reverse_depth levelorder traversal:
7 8 9 10 4 5 6 2 3 1
Same tree, reverse_depth, reverse_in_level levelorder traversal:
10 9 8 7 6 5 4 3 2 1
Same tree, depth_first, reverse_in_level levelorder traversal:
1 3 2 6 5 4 10 9 8 7

In [47]:



20 Oct 2019 5:46pm GMT

Go Deh: Indent datastructure for trees

I was browsing StackOverflow and came across a question that mentioned a new-to-me format for a datatructure for holding a tree of data.I am well used to the (name, list_of_children) set of interconnected node datastructures way of doing things, but this mentioned where you

  1. Create an empty list
  2. then for each node starting at the root which has a depth of zero:
    1. add the (depth, name) tuple of the node to the list
    2. visit all this nodes child node.

It is a preorder traversal of the conceptual tree, aggregating (depth, name) tuples into a list to form what I am calling the indent tree datastructure as it captures all the information of the tree but in a different datastructure than normal, and can be extended to allow data at each node and might be a useful alternative for DB storage of trees.

An example tree:





Its indent format:

[(0, 'R'),
(1, 'a'),
(2, 'c'),
(3, 'd'),
(3, 'h'),
(1, 'b'),
(2, 'e'),
(3, 'f'),
(4, 'g'),
(2, 'i')]


Indented representation:

If you print out successive names from the indent format list above, one per line, with indent from the left of the indent value, then you get a nice textual regpresentation of the tree; expanded left-to-right rather than the top-down representation of the graphic:


R
a
c
d
h
b
e
f
g
i



Code

I wrote some code to manipulate and traverse this kind of tree datastructure, as well as to use graphviz to draw graphical representations.


  1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
import graphviz as gv
from collections import defaultdict
import re
from pprint import pprint as pp


#%%
node_attr = {at: val for at, val in
(line.strip().split() for line in '''
shape record
fontsize 12
height 0.1
width 0.1
rankdir LR
ranksep 0.25
nodesep 0.25
'''.strip().split('\n'))}
edge_attr = {at: val for at, val in
(line.strip().split() for line in '''
arrowhead none
minlen 1
'''.strip().split('\n'))}
root_attr = {at: val for at, val in
(line.strip().split() for line in '''
fontcolor green
color brown
fontname bold
'''.strip().split('\n'))}
#%%


def _pc2indent(pc, root=None, indent=None, children=None):
"parent-child dependency dict to indent format conversion"
if root is None:
parents = set(pc)
kids = set(sum(pc.values(), []))
root = parents - kids
assert len(root) == 1, f"Need exactly one root: {root}"
root = root.pop()
if indent is None:
indent = 0
if children is None:
children = []
children.append((indent, root))
if root in pc:
for child in pc[root]:
pc2indent(pc, child, indent+1, children)
return children

def dot2indent(tree):
"simple dot format to indent format translator"
depends = defaultdict(list)
for matchobj in re.finditer(r'\s+(\S+)\s*->\s*(\S+)\s', tree.source):
parent, child = matchobj.groups()
depends[parent].append(child)
parent2child = dict(depends)
return _pc2indent(parent2child)

#%%
def _highlight(txt):
return f'#{txt}#'

def pp_indent(data, hlight=None, indent=' '):
"simple printout of indent format tree with optional node highlighting"
if hlight is None:
hlight = set()
for level, name in data:
print(indent * level
+ (_highlight(name) if name in hlight else name))

#%%
def indent2dot(lst):
tree = gv.Digraph(name='indent2dot', node_attr=node_attr)
tree.edge_attr.update(**edge_attr)
levelparent = {}
for level, name in lst:
levelparent[level] = name
if level - 1 in levelparent:
tree.edge(levelparent[level-1], name)
else:
tree.node(name, _attributes=root_attr)
return tree

#%%
def printwithspace(i):
print(i, end=' ')

def preorder(tree, visitor=printwithspace):
for indent, name in tree:
visitor(name)

def levelorder(tree, reverse_depth=False, reverse_in_level=False,
visitor=printwithspace):
if not tree:
return
indent2name = defaultdict(list)
mx = -1
for indent, name in tree:
if indent > mx:
mx = indent
indent2name[indent].append(name)
if reverse_in_level:
for names in indent2name.values():
names.reverse()
if not reverse_depth:
for indent in range(0, mx + 1):
for name in indent2name[indent]:
visitor(name)
else:
for indent in range(mx, -1, -1):
for name in indent2name[indent]:
visitor(name)


#%%
# Example tree
ex1 = [(0, '1'),
(1, '2'),
(2, '4'),
(3, '7'),
(2, '5'),
(1, '3'),
(2, '6'),
(3, '8'),
(3, '9'),
(3, '10'),
]

#%%
if __name__ == '__main__':
print('A tree in indent datastructure format:')
pp(ex1)
print('\nSame tree, printed as indented list:')
pp_indent(ex1)
print('\nSame tree, drawn by graphviz:')
display(indent2dot(ex1)) # display works in spyder/Jupyter

print('\nSame tree, preorder traversal:')
preorder(ex1)
print()
print('Same tree, levelorder traversal:')
levelorder(ex1)
print()
print('Same tree, reverse_depth levelorder traversal:')
levelorder(ex1, True)
print()
print('Same tree, reverse_depth, reverse_in_level levelorder traversal:')
levelorder(ex1, True, True)
print()
print('Same tree, depth_first, reverse_in_level levelorder traversal:')
levelorder(ex1, False, True)
print()


Output:



A tree in indent datastructure format:
[(0, '1'),
(1, '2'),
(2, '4'),
(3, '7'),
(2, '5'),
(1, '3'),
(2, '6'),
(3, '8'),
(3, '9'),
(3, '10')]

Same tree, printed as indented list:

1
2
4
7
5
3
6
8
9
10

Same tree, drawn by graphviz:


Same tree, preorder traversal:
1 2 4 7 5 3 6 8 9 10
Same tree, levelorder traversal:
1 2 3 4 5 6 7 8 9 10
Same tree, reverse_depth levelorder traversal:
7 8 9 10 4 5 6 2 3 1
Same tree, reverse_depth, reverse_in_level levelorder traversal:
10 9 8 7 6 5 4 3 2 1
Same tree, depth_first, reverse_in_level levelorder traversal:
1 3 2 6 5 4 10 9 8 7

In [47]:



20 Oct 2019 5:46pm GMT

Kushal Das: Started a newsletter

I started a newsletter, focusing on different stories I read about privacy, security, programming in general. Following the advice from Martijn Grooten, I am storing all the interesting links I read (for many months). I used to share these only over Twitter, but, as I retweet many things, it was not easy to share a selected few.

I also did not want to push them in my regular blog. I wanted a proper newsletter over email service. But, keeping the reader's privacy was a significant point to choose the service. I finally decided to go with Write.as Letters service. I am already using their open source project WriteFreely. This is an excellent excuse to use their tool more and also pay them for the fantastic tools + service.

Feel free to subscribe to the newsletter and share the link with your friends.

20 Oct 2019 11:31am GMT

Kushal Das: Started a newsletter

I started a newsletter, focusing on different stories I read about privacy, security, programming in general. Following the advice from Martijn Grooten, I am storing all the interesting links I read (for many months). I used to share these only over Twitter, but, as I retweet many things, it was not easy to share a selected few.

I also did not want to push them in my regular blog. I wanted a proper newsletter over email service. But, keeping the reader's privacy was a significant point to choose the service. I finally decided to go with Write.as Letters service. I am already using their open source project WriteFreely. This is an excellent excuse to use their tool more and also pay them for the fantastic tools + service.

Feel free to subscribe to the newsletter and share the link with your friends.

20 Oct 2019 11:31am GMT

Test and Code: 92: 9 Steps to Crater Quality & Destroy Customer Satisfaction - Cristian Medina

Cristian Medina wrote an article recently called "Test Engineering Anti-Patterns: Destroy Your Customer Satisfaction and Crater Your Quality By Using These 9 Easy Organizational Practices"

Of course, it's sarcastic, and aims to highlight many problems with organizational practices that reduce software quality.

The article doesn't go out of character, and only promotes the anti-patterns.
However, in this interview, we discuss each point, and the corollary of what you really should do. At least, our perspectives.

Here's the list of all the points discussed in the article and in this episode:

  1. Make the Test teams solely responsible for quality
  2. Require all tests to be automated before releasing
  3. Require 100% code coverage
  4. Isolate the Test organization from Development
  5. Measure the success of the process, not the product.
    • Metrics, if rewarded, will always be gamed.
  6. Require granular projections from engineers
  7. Reward quick patching instead of solving
  8. Plan for today instead of tomorrow

Special Guest: Cristian Medina.

Sponsored By:

Support Test & Code

Links:

<p>Cristian Medina wrote an article recently called &quot;Test Engineering Anti-Patterns: Destroy Your Customer Satisfaction and Crater Your Quality By Using These 9 Easy Organizational Practices&quot;</p> <p>Of course, it&#39;s sarcastic, and aims to highlight many problems with organizational practices that reduce software quality.</p> <p>The article doesn&#39;t go out of character, and only promotes the anti-patterns.<br> However, in this interview, we discuss each point, and the corollary of what you really should do. At least, our perspectives.</p> <p>Here&#39;s the list of all the points discussed in the article and in this episode:</p> <ol> <li>Make the Test teams solely responsible for quality</li> <li>Require all tests to be automated before releasing</li> <li>Require 100% code coverage</li> <li>Isolate the Test organization from Development</li> <li>Measure the success of the process, not the product. <ul> <li>Metrics, if rewarded, will always be gamed.</li> </ul></li> <li>Require granular projections from engineers</li> <li>Reward quick patching instead of solving</li> <li>Plan for today instead of tomorrow</li> </ol><p>Special Guest: Cristian Medina.</p><p>Sponsored By:</p><ul><li><a href="https://azure.com/pipelines" rel="nofollow">Azure Pipelines</a>: <a href="https://azure.com/pipelines" rel="nofollow">Automate your builds and deployments with pipelines so you spend less time with the nuts and bolts and more time being creative. Many organizations and open source projects are using Azure Pipelines already. Get started for free at azure.com/pipelines</a></li></ul><p><a href="https://www.patreon.com/testpodcast" rel="payment">Support Test & Code</a></p><p>Links:</p><ul><li><a href="https://www.freecodecamp.org/news/organizational-test-practices-guaranteed-to-lower-quality/" title="Test Engineering Anti-Patterns: Destroy Your Customer Satisfaction and Crater Your Quality By Using These 9 Easy Organizational Practices" rel="nofollow">Test Engineering Anti-Patterns: Destroy Your Customer Satisfaction and Crater Your Quality By Using These 9 Easy Organizational Practices</a> &mdash; The article we discuss in the show.</li><li><a href="https://tryexceptpass.org/" title="tryexceptpass" rel="nofollow">tryexceptpass</a> &mdash; Cris's blog</li></ul>

20 Oct 2019 7:00am GMT

Test and Code: 92: 9 Steps to Crater Quality & Destroy Customer Satisfaction - Cristian Medina

Cristian Medina wrote an article recently called "Test Engineering Anti-Patterns: Destroy Your Customer Satisfaction and Crater Your Quality By Using These 9 Easy Organizational Practices"

Of course, it's sarcastic, and aims to highlight many problems with organizational practices that reduce software quality.

The article doesn't go out of character, and only promotes the anti-patterns.
However, in this interview, we discuss each point, and the corollary of what you really should do. At least, our perspectives.

Here's the list of all the points discussed in the article and in this episode:

  1. Make the Test teams solely responsible for quality
  2. Require all tests to be automated before releasing
  3. Require 100% code coverage
  4. Isolate the Test organization from Development
  5. Measure the success of the process, not the product.
    • Metrics, if rewarded, will always be gamed.
  6. Require granular projections from engineers
  7. Reward quick patching instead of solving
  8. Plan for today instead of tomorrow

Special Guest: Cristian Medina.

Sponsored By:

Support Test & Code

Links:

<p>Cristian Medina wrote an article recently called &quot;Test Engineering Anti-Patterns: Destroy Your Customer Satisfaction and Crater Your Quality By Using These 9 Easy Organizational Practices&quot;</p> <p>Of course, it&#39;s sarcastic, and aims to highlight many problems with organizational practices that reduce software quality.</p> <p>The article doesn&#39;t go out of character, and only promotes the anti-patterns.<br> However, in this interview, we discuss each point, and the corollary of what you really should do. At least, our perspectives.</p> <p>Here&#39;s the list of all the points discussed in the article and in this episode:</p> <ol> <li>Make the Test teams solely responsible for quality</li> <li>Require all tests to be automated before releasing</li> <li>Require 100% code coverage</li> <li>Isolate the Test organization from Development</li> <li>Measure the success of the process, not the product. <ul> <li>Metrics, if rewarded, will always be gamed.</li> </ul></li> <li>Require granular projections from engineers</li> <li>Reward quick patching instead of solving</li> <li>Plan for today instead of tomorrow</li> </ol><p>Special Guest: Cristian Medina.</p><p>Sponsored By:</p><ul><li><a href="https://azure.com/pipelines" rel="nofollow">Azure Pipelines</a>: <a href="https://azure.com/pipelines" rel="nofollow">Automate your builds and deployments with pipelines so you spend less time with the nuts and bolts and more time being creative. Many organizations and open source projects are using Azure Pipelines already. Get started for free at azure.com/pipelines</a></li></ul><p><a href="https://www.patreon.com/testpodcast" rel="payment">Support Test & Code</a></p><p>Links:</p><ul><li><a href="https://www.freecodecamp.org/news/organizational-test-practices-guaranteed-to-lower-quality/" title="Test Engineering Anti-Patterns: Destroy Your Customer Satisfaction and Crater Your Quality By Using These 9 Easy Organizational Practices" rel="nofollow">Test Engineering Anti-Patterns: Destroy Your Customer Satisfaction and Crater Your Quality By Using These 9 Easy Organizational Practices</a> &mdash; The article we discuss in the show.</li><li><a href="https://tryexceptpass.org/" title="tryexceptpass" rel="nofollow">tryexceptpass</a> &mdash; Cris's blog</li></ul>

20 Oct 2019 7:00am GMT

Catalin George Festila: Python 3.7.4 : Usinge pytesseract for text recognition.

About this python module named tesseract, you can read here. I tested with the tesseract tool install on my Fedora 30 distro and python module pytesseract version 0.3.0. [root@desk mythcat]# dnf install tesseract Last metadata expiration check: 0:24:18 ago on Sun 20 Oct 2019 10:56:23 AM EEST. Package tesseract-4.1.0-1.fc30.x86_64 is already installed. Dependencies resolved. Nothing to do.

20 Oct 2019 1:47am GMT

Catalin George Festila: Python 3.7.4 : Usinge pytesseract for text recognition.

About this python module named tesseract, you can read here. I tested with the tesseract tool install on my Fedora 30 distro and python module pytesseract version 0.3.0. [root@desk mythcat]# dnf install tesseract Last metadata expiration check: 0:24:18 ago on Sun 20 Oct 2019 10:56:23 AM EEST. Package tesseract-4.1.0-1.fc30.x86_64 is already installed. Dependencies resolved. Nothing to do.

20 Oct 2019 1:47am GMT

19 Oct 2019

feedPlanet Python

Weekly Python StackOverflow Report: (cxcix) stackoverflow python report

These are the ten most rated questions at Stack Overflow last week.
Between brackets: [question score / answers count]
Build date: 2019-10-19 19:07:13 GMT


  1. Is there a more elegant way to express ((x == a and y == b) or (x == b and y == a))? - [59/8]
  2. How to write 2**n - 1 as a recursive function? - [33/7]
  3. Why does keras model predict slower after compile? - [14/2]
  4. What is the meaning of __total__ dunder attribute in Python 3? - [12/1]
  5. Identifying root parents and all their children in trees - [9/2]
  6. How to turn %s into {0}, {1} ... less clunky? - [7/6]
  7. What is the use for control characters in string.printable? - [7/5]
  8. Is it an error to return a value in a finally clause - [7/3]
  9. Why does setting a descriptor on a class overwrite the descriptor? - [7/2]
  10. Getting a numpy array view with integer or boolean indexing - [6/2]

19 Oct 2019 7:07pm GMT

Weekly Python StackOverflow Report: (cxcix) stackoverflow python report

These are the ten most rated questions at Stack Overflow last week.
Between brackets: [question score / answers count]
Build date: 2019-10-19 19:07:13 GMT


  1. Is there a more elegant way to express ((x == a and y == b) or (x == b and y == a))? - [59/8]
  2. How to write 2**n - 1 as a recursive function? - [33/7]
  3. Why does keras model predict slower after compile? - [14/2]
  4. What is the meaning of __total__ dunder attribute in Python 3? - [12/1]
  5. Identifying root parents and all their children in trees - [9/2]
  6. How to turn %s into {0}, {1} ... less clunky? - [7/6]
  7. What is the use for control characters in string.printable? - [7/5]
  8. Is it an error to return a value in a finally clause - [7/3]
  9. Why does setting a descriptor on a class overwrite the descriptor? - [7/2]
  10. Getting a numpy array view with integer or boolean indexing - [6/2]

19 Oct 2019 7:07pm GMT

Python Insider: Python 2.7.17 released

Python 2.7.17 is now available for download. Note Python 2.7.17 is the penultimate release in the Python 2.7 series.

19 Oct 2019 6:38pm GMT

Python Insider: Python 2.7.17 released

Python 2.7.17 is now available for download. Note Python 2.7.17 is the penultimate release in the Python 2.7 series.

19 Oct 2019 6:38pm GMT

TechBeamers Python: Python Add Lists

This tutorial covers the following topic - Python Add lists. It describes various ways to join/concatenate/add lists in Python. For example - simply appending elements of one list to the tail of the other in a for loop, or using +/* operators, list comprehension, extend(), and itertools.chain() methods. Most of these techniques use built-in constructs in Python. However, the one, itertools.chain() is a method defined in the itertools module. You must also see which of these ways is more suitable in your scenario. After going through this post, you can evaluate their performance in case of large lists. By the

The post Python Add Lists appeared first on Learn Programming and Software Testing.

19 Oct 2019 4:10pm GMT

TechBeamers Python: Python Add Lists

This tutorial covers the following topic - Python Add lists. It describes various ways to join/concatenate/add lists in Python. For example - simply appending elements of one list to the tail of the other in a for loop, or using +/* operators, list comprehension, extend(), and itertools.chain() methods. Most of these techniques use built-in constructs in Python. However, the one, itertools.chain() is a method defined in the itertools module. You must also see which of these ways is more suitable in your scenario. After going through this post, you can evaluate their performance in case of large lists. By the

The post Python Add Lists appeared first on Learn Programming and Software Testing.

19 Oct 2019 4:10pm GMT

18 Oct 2019

feedPlanet Python

Roberto Alsina: Episodio 13: Python 3.8

¿Qué trae de nuevo Python 3.8? Un montón de cosas, pero hay 2 en particular que afectan / mejoran / resuelven temas que me molestan desde hace ¡Más de 10 años!

18 Oct 2019 6:00pm GMT

Roberto Alsina: Episodio 13: Python 3.8

¿Qué trae de nuevo Python 3.8? Un montón de cosas, pero hay 2 en particular que afectan / mejoran / resuelven temas que me molestan desde hace ¡Más de 10 años!

18 Oct 2019 6:00pm GMT

Dataquest: Python Projects for Beginners: The Best Way to Learn

Learning Python can be difficult. You can spend time reading a textbook or watching videos, but then struggle to actually put what you've learned into practice. Or you might spend a ton of time learning syntax and get bored or lose motivation. How can you increase your chances of success? By building Python projects. That way […]

The post Python Projects for Beginners: The Best Way to Learn appeared first on Dataquest.

18 Oct 2019 5:13pm GMT

Dataquest: Python Projects for Beginners: The Best Way to Learn

Learning Python can be difficult. You can spend time reading a textbook or watching videos, but then struggle to actually put what you've learned into practice. Or you might spend a ton of time learning syntax and get bored or lose motivation. How can you increase your chances of success? By building Python projects. That way […]

The post Python Projects for Beginners: The Best Way to Learn appeared first on Dataquest.

18 Oct 2019 5:13pm GMT

Paolo Amoroso: Hello, Planet Python!

Planet Python is now syndicating the posts about Python of my blog. Thanks to the project maintainer Bruno Rocha for letting me join.

You may think of Planet Python as the all-you-can-eat source of Python content.


It's an aggregator of dozens of blogs, podcasts, and other resources on the Python programming language. It syndicates posts that cover Python or are of interest to the Python community. Planet Python is a terrific resource for learning the language and keeping up with what's going on in its ecosystem. I highly recommend that you subscribe to the RSS feed of Planet Python.

If you're reading this on Planet Python, hi there!

About me

I'm Paolo Amoroso, an Italian astronomy and space popularizer, a Google expert, and a podcaster. I'm a Python beginner as I started learning the language in December 2018. But I have been a hobby programmer since the home computer revolution of the early 1980s. And I have always had a soft spot for programming languages, paradigms, and compilers.

My programming background

After reading extensively about and experimenting with as many languages as I could, in the 1990s I settled with the ones of the Lisp family, particularly Common Lisp, Scheme, and Emacs Lisp. I was also a Computer Science student (but didn't graduate) and contributed to a few open source Lisp projects.

Although I first got in touch with Python in the early 1990s, I started learning it in some depth only in December 2018. My main motivation was to leverage its massive "batteries included" ecosystem of libraries, tools, and resources for my hobby and personal projects. And, of course, to have fun.

What I post about

In the Python posts of my blog, the ones you'll get via Planet Python, I will cover my experiences with learning and using the language, the books I read, the tools I use, and the projects I do. But, if you're curious, feel free to look around to see what else I post about on my blog.

I plan to use Python for coding projects related to my interests, especially astronomy and space, or to make simple personal tools and apps.

Thanks again for having me here!

This post by Paolo Amoroso was published on Moonshots Beyond the Cloud.

18 Oct 2019 3:22pm GMT

Paolo Amoroso: Hello, Planet Python!

Planet Python is now syndicating the posts about Python of my blog. Thanks to the project maintainer Bruno Rocha for letting me join.

You may think of Planet Python as the all-you-can-eat source of Python content.


It's an aggregator of dozens of blogs, podcasts, and other resources on the Python programming language. It syndicates posts that cover Python or are of interest to the Python community. Planet Python is a terrific resource for learning the language and keeping up with what's going on in its ecosystem. I highly recommend that you subscribe to the RSS feed of Planet Python.

If you're reading this on Planet Python, hi there!

About me

I'm Paolo Amoroso, an Italian astronomy and space popularizer, a Google expert, and a podcaster. I'm a Python beginner as I started learning the language in December 2018. But I have been a hobby programmer since the home computer revolution of the early 1980s. And I have always had a soft spot for programming languages, paradigms, and compilers.

My programming background

After reading extensively about and experimenting with as many languages as I could, in the 1990s I settled with the ones of the Lisp family, particularly Common Lisp, Scheme, and Emacs Lisp. I was also a Computer Science student (but didn't graduate) and contributed to a few open source Lisp projects.

Although I first got in touch with Python in the early 1990s, I started learning it in some depth only in December 2018. My main motivation was to leverage its massive "batteries included" ecosystem of libraries, tools, and resources for my hobby and personal projects. And, of course, to have fun.

What I post about

In the Python posts of my blog, the ones you'll get via Planet Python, I will cover my experiences with learning and using the language, the books I read, the tools I use, and the projects I do. But, if you're curious, feel free to look around to see what else I post about on my blog.

I plan to use Python for coding projects related to my interests, especially astronomy and space, or to make simple personal tools and apps.

Thanks again for having me here!

This post by Paolo Amoroso was published on Moonshots Beyond the Cloud.

18 Oct 2019 3:22pm GMT

Stories in My Pocket: PyCon 2019: The People of PyCon

There have been a few thoughts about PyCon US 2019 that have been bouncing around my head, wanting to come out. Today, I want to talk about the opportunity to meet your Python heros, and why you should start planning on attending next year.


Read more...

18 Oct 2019 1:48pm GMT

Stories in My Pocket: PyCon 2019: The People of PyCon

There have been a few thoughts about PyCon US 2019 that have been bouncing around my head, wanting to come out. Today, I want to talk about the opportunity to meet your Python heros, and why you should start planning on attending next year.


Read more...

18 Oct 2019 1:48pm GMT

10 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: King Willams Town Bahnhof

Gestern musste ich morgens zur Station nach KWT um unsere Rerservierten Bustickets für die Weihnachtsferien in Capetown abzuholen. Der Bahnhof selber ist seit Dezember aus kostengründen ohne Zugverbindung - aber Translux und co - die langdistanzbusse haben dort ihre Büros.


Größere Kartenansicht




© benste CC NC SA

10 Nov 2011 10:57am GMT

09 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein

Niemand ist besorgt um so was - mit dem Auto fährt man einfach durch, und in der City - nahe Gnobie- "ne das ist erst gefährlich wenn die Feuerwehr da ist" - 30min später auf dem Rückweg war die Feuerwehr da.




© benste CC NC SA

09 Nov 2011 8:25pm GMT

08 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Brai Party

Brai = Grillabend o.ä.

Die möchte gern Techniker beim Flicken ihrer SpeakOn / Klinke Stecker Verzweigungen...

Die Damen "Mamas" der Siedlung bei der offiziellen Eröffnungsrede

Auch wenn weniger Leute da waren als erwartet, Laute Musik und viele Leute ...

Und natürlich ein Feuer mit echtem Holz zum Grillen.

© benste CC NC SA

08 Nov 2011 2:30pm GMT

07 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Lumanyano Primary

One of our missions was bringing Katja's Linux Server back to her room. While doing that we saw her new decoration.

Björn, Simphiwe carried the PC to Katja's school


© benste CC NC SA

07 Nov 2011 2:00pm GMT

06 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Nelisa Haircut

Today I went with Björn to Needs Camp to Visit Katja's guest family for a special Party. First of all we visited some friends of Nelisa - yeah the one I'm working with in Quigney - Katja's guest fathers sister - who did her a haircut.

African Women usually get their hair done by arranging extensions and not like Europeans just cutting some hair.

In between she looked like this...

And then she was done - looks amazing considering the amount of hair she had last week - doesn't it ?

© benste CC NC SA

06 Nov 2011 7:45pm GMT

05 Nov 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Mein Samstag

Irgendwie viel mir heute auf das ich meine Blogposts mal ein bischen umstrukturieren muss - wenn ich immer nur von neuen Plätzen berichte, dann müsste ich ja eine Rundreise machen. Hier also mal ein paar Sachen aus meinem heutigen Alltag.

Erst einmal vorweg, Samstag zählt zumindest für uns Voluntäre zu den freien Tagen.

Dieses Wochenende sind nur Rommel und ich auf der Farm - Katja und Björn sind ja mittlerweile in ihren Einsatzstellen, und meine Mitbewohner Kyle und Jonathan sind zu Hause in Grahamstown - sowie auch Sipho der in Dimbaza wohnt.
Robin, die Frau von Rommel ist in Woodie Cape - schon seit Donnerstag um da ein paar Sachen zur erledigen.
Naja wie dem auch sei heute morgen haben wir uns erstmal ein gemeinsames Weetbix/Müsli Frühstück gegönnt und haben uns dann auf den Weg nach East London gemacht. 2 Sachen waren auf der Checkliste Vodacom, Ethienne (Imobilienmakler) außerdem auf dem Rückweg die fehlenden Dinge nach NeedsCamp bringen.

Nachdem wir gerade auf der Dirtroad losgefahren sind mussten wir feststellen das wir die Sachen für Needscamp und Ethienne nicht eingepackt hatten aber die Pumpe für die Wasserversorgung im Auto hatten.

Also sind wir in EastLondon ersteinmal nach Farmerama - nein nicht das onlinespiel farmville - sondern einen Laden mit ganz vielen Sachen für eine Farm - in Berea einem nördlichen Stadteil gefahren.

In Farmerama haben wir uns dann beraten lassen für einen Schnellverschluss der uns das leben mit der Pumpe leichter machen soll und außerdem eine leichtere Pumpe zur Reperatur gebracht, damit es nicht immer so ein großer Aufwand ist, wenn mal wieder das Wasser ausgegangen ist.

Fego Caffé ist in der Hemmingways Mall, dort mussten wir und PIN und PUK einer unserer Datensimcards geben lassen, da bei der PIN Abfrage leider ein zahlendreher unterlaufen ist. Naja auf jeden Fall speichern die Shops in Südafrika so sensible Daten wie eine PUK - die im Prinzip zugang zu einem gesperrten Phone verschafft.

Im Cafe hat Rommel dann ein paar online Transaktionen mit dem 3G Modem durchgeführt, welches ja jetzt wieder funktionierte - und übrigens mittlerweile in Ubuntu meinem Linuxsystem perfekt klappt.

Nebenbei bin ich nach 8ta gegangen um dort etwas über deren neue Deals zu erfahren, da wir in einigen von Hilltops Centern Internet anbieten wollen. Das Bild zeigt die Abdeckung UMTS in NeedsCamp Katjas Ort. 8ta ist ein neuer Telefonanbieter von Telkom, nachdem Vodafone sich Telkoms anteile an Vodacom gekauft hat müssen die komplett neu aufbauen.
Wir haben uns dazu entschieden mal eine kostenlose Prepaidkarte zu testen zu organisieren, denn wer weis wie genau die Karte oben ist ... Bevor man einen noch so billigen Deal für 24 Monate signed sollte man wissen obs geht.

Danach gings nach Checkers in Vincent, gesucht wurden zwei Hotplates für WoodyCape - R 129.00 eine - also ca. 12€ für eine zweigeteilte Kochplatte.
Wie man sieht im Hintergrund gibts schon Weihnachtsdeko - Anfang November und das in Südafrika bei sonnig warmen min- 25°C

Mittagessen haben wir uns bei einem Pakistanischen Curry Imbiss gegönnt - sehr empfehlenswert !
Naja und nachdem wir dann vor ner Stunde oder so zurück gekommen sind habe ich noch den Kühlschrank geputzt den ich heute morgen zum defrosten einfach nach draußen gestellt hatte. Jetzt ist der auch mal wieder sauber und ohne 3m dicke Eisschicht...

Morgen ... ja darüber werde ich gesondert berichten ... aber vermutlich erst am Montag, denn dann bin ich nochmal wieder in Quigney(East London) und habe kostenloses Internet.

© benste CC NC SA

05 Nov 2011 4:33pm GMT

31 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Sterkspruit Computer Center

Sterkspruit is one of Hilltops Computer Centres in the far north of Eastern Cape. On the trip to J'burg we've used the opportunity to take a look at the centre.

Pupils in the big classroom


The Trainer


School in Countryside


Adult Class in the Afternoon


"Town"


© benste CC NC SA

31 Oct 2011 4:58pm GMT

Benedict Stein: Technical Issues

What are you doing in an internet cafe if your ADSL and Faxline has been discontinued before months end. Well my idea was sitting outside and eating some ice cream.
At least it's sunny and not as rainy as on the weekend.


© benste CC NC SA

31 Oct 2011 3:11pm GMT

30 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Nellis Restaurant

For those who are traveling through Zastron - there is a very nice Restaurant which is serving delicious food at reasanable prices.
In addition they're selling home made juices jams and honey.




interior


home made specialities - the shop in the shop


the Bar


© benste CC NC SA

30 Oct 2011 4:47pm GMT

29 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: The way back from J'burg

Having the 10 - 12h trip from J'burg back to ELS I was able to take a lot of pcitures including these different roadsides

Plain Street


Orange River in its beginngings (near Lesotho)


Zastron Anglican Church


The Bridge in Between "Free State" and Eastern Cape next to Zastron


my new Background ;)


If you listen to GoogleMaps you'll end up traveling 50km of gravel road - as it was just renewed we didn't have that many problems and saved 1h compared to going the official way with all it's constructions sites




Freeway


getting dark


© benste CC NC SA

29 Oct 2011 4:23pm GMT

28 Oct 2011

feedPython Software Foundation | GSoC'11 Students

Benedict Stein: Wie funktioniert eigentlich eine Baustelle ?

Klar einiges mag anders sein, vieles aber gleich - aber ein in Deutschland täglich übliches Bild einer Straßenbaustelle - wie läuft das eigentlich in Südafrika ?

Ersteinmal vorweg - NEIN keine Ureinwohner die mit den Händen graben - auch wenn hier mehr Manpower genutzt wird - sind sie fleißig mit Technologie am arbeiten.

Eine ganz normale "Bundesstraße"


und wie sie erweitert wird


gaaaanz viele LKWs


denn hier wird eine Seite über einen langen Abschnitt komplett gesperrt, so das eine Ampelschaltung mit hier 45 Minuten Wartezeit entsteht


Aber wenigstens scheinen die ihren Spaß zu haben ;) - Wie auch wir denn gücklicher Weise mussten wir nie länger als 10 min. warten.

© benste CC NC SA

28 Oct 2011 4:20pm GMT