22 Aug 2019

Planet Python

Stack Abuse: Python for NLP: Creating Multi-Data-Type Classification Models with Keras

This is the 18th article in my series of articles on Python for NLP. In my previous article, I explained how to create a deep learning-based movie sentiment analysis model using Python's Keras library. In that article, we saw how we can perform sentiment analysis of user reviews regarding different movies on IMDB. We used the text of the review to predict the sentiment.

However, in text classification tasks, we can also make use of non-textual information to classify the text. For instance, gender may have an impact on the sentiment of a review. Furthermore, nationality may affect public opinion about a particular movie. This associated information, also known as metadata, can therefore be used to improve the accuracy of a statistical model.

In this article, we will build upon the concepts that we studied in the last two articles and will see how to create a text classification system that classifies user reviews regarding different businesses into one of three predefined categories, i.e. "good", "bad", and "average". However, in addition to the text of the review, we will use the associated metadata of the review to perform classification. Since we have two different types of inputs, i.e. textual input and numerical input, we need to create a multiple-input model. We will be using the Keras functional API since it supports multiple-input and multiple-output models.

After reading this article, you will be able to create a deep learning model in Keras that accepts multiple inputs, concatenates the outputs of the two branches, and then performs classification or regression using the aggregated input.

Before we dive into the details of creating such a model, let's first briefly review the dataset that we are going to use.

The Dataset

The dataset for this article can be downloaded from this Kaggle link. The dataset contains multiple files, but we are only interested in the yelp_review.csv file. The file contains more than 5.2 million reviews about different businesses, including restaurants, bars, dentists, doctors, beauty salons, etc. For our purposes we will only be using the first 50,000 records to train our model. Download the dataset to your local machine.

Let's first import all the libraries that we will be using in this article before importing the dataset.

from numpy import array
from keras.preprocessing.text import one_hot
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers.core import Activation, Dropout, Dense
from keras.layers import Flatten, LSTM
from keras.layers import GlobalMaxPooling1D
from keras.models import Model
from keras.layers.embeddings import Embedding
from sklearn.model_selection import train_test_split
from keras.preprocessing.text import Tokenizer
from keras.layers import Input
from keras.layers.merge import Concatenate

import pandas as pd
import numpy as np
import re

As a first step, we need to load the dataset. The following script does that:

yelp_reviews = pd.read_csv("/content/drive/My Drive/yelp_review_short.csv")
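The script above reads a pre-trimmed copy of the data stored on Google Drive. If you are working directly from the raw Kaggle download instead, a minimal sketch for producing an equivalent 50,000-record subset yourself (the file names here are assumptions, not from the original article):

yelp_reviews = pd.read_csv("yelp_review.csv", nrows=50000)   # keep only the first 50,000 reviews
yelp_reviews.to_csv("yelp_review_short.csv", index=False)    # optionally save the trimmed copy for reuse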

The dataset contains a column stars that holds the rating for each business. The stars column can have values between 1 and 5. We will simplify our problem by converting these numerical ratings into categorical ones. We will add a new column reviews_score to our dataset: if a review has a value of 1 in the stars column, the reviews_score column will contain the string value bad; if the rating is 2 or 3, the reviews_score column will contain average; finally, a rating of 4 or 5 will map to good.

The following script performs this preprocessing:

bins = [0,1,3,5]
review_names = ['bad', 'average', 'good']
yelp_reviews['reviews_score'] = pd.cut(yelp_reviews['stars'], bins, labels=review_names)

Next, we will check whether our dataframe contains any NULL values (removing them if it does) and then print the shape and the head of the dataset.

yelp_reviews.isnull().values.any()

print(yelp_reviews.shape)

yelp_reviews.head()
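The isnull check above only reports whether any NULL values exist. If it returns True, a minimal sketch (assumed, not part of the original script) for actually dropping the affected rows would be:

yelp_reviews = yelp_reviews.dropna()   # drop any rows that contain NULL values before further processing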

In the output you will see (50000, 10), which means that our dataset contains 50,000 records and 10 columns. The head of the yelp_reviews dataframe looks like this:

[Image: first five rows of the yelp_reviews dataframe]

You can see the 10 columns that our dataframe contains, including the newly added reviews_score column. The text column contains the text of the review, while the useful column contains a numerical value representing the count of people who found the review useful. Similarly, the funny and cool columns contain the counts of people who found the review funny or cool, respectively.

Let's look at one review as an example. The 4th review (the review with index 3) has 4 stars and is therefore marked as good. Let's view the complete text of this review:

print(yelp_reviews["text"][3])

The output looks like this:

Love coming here. Yes the place always needs the floor swept but when you give out  peanuts in the shell how won't it always be a bit dirty.

The food speaks for itself, so good. Burgers are made to order and the meat is put on the grill when you order your sandwich. Getting the small burger just means 1 patty, the regular is a 2 patty burger which is twice the deliciousness.

Getting the Cajun fries adds a bit of spice to them and whatever size you order they always throw more fries (a lot more fries) into the bag.

You can clearly see that this is a positive review.

Let's now plot the number of good, average, and bad reviews.

import seaborn as sns

sns.countplot(x='reviews_score', data=yelp_reviews)

[Image: count plot of good, average, and bad reviews]

It is evident from the above plot that the majority of the reviews are good, followed by the average reviews. The number of bad reviews is very small.

We have preprocessed our data and now we will create three models in this article. The first model will only use text inputs for predicting whether a review is good, average, or bad. In the second model, we will not use text. We will only use the meta information such as useful, funny, and cool to predict the sentiment of the review. Finally, we will create a model that accepts multiple inputs i.e. text and meta information for text classification.

Creating a Model with Text Inputs Only

The first step is to define a function that cleans the textual data.

def preprocess_text(sen):

    # Remove punctuations and numbers
    sentence = re.sub('[^a-zA-Z]', ' ', sen)

    # Single character removal
    sentence = re.sub(r"\s+[a-zA-Z]\s+", ' ', sentence)

    # Removing multiple spaces
    sentence = re.sub(r'\s+', ' ', sentence)

    return sentence

Since we are only using text in this model, we will filter all the text reviews and store them in a list. The text reviews will be cleaned using the preprocess_text function, which removes punctuation and numbers from the text.

X = []
sentences = list(yelp_reviews["text"])
for sen in sentences:
    X.append(preprocess_text(sen))

y = yelp_reviews['reviews_score']

Our X variable here contains the text reviews while the y variable contains the corresponding reviews_score values. The reviews_score column has data in text format. We need to convert the text labels into one-hot encoded vectors. We can use the to_categorical method from the keras.utils module. However, first we have to convert the text into integer labels using the LabelEncoder class from the sklearn.preprocessing module.

from sklearn import preprocessing

# label_encoder object knows how to understand word labels.
label_encoder = preprocessing.LabelEncoder()

# Encode the string labels in the reviews_score column as integers.
y = label_encoder.fit_transform(y)

Let's now divide our data into testing and training sets:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

Now we can convert both the training and test labels into one-hot encoded vectors:

from keras.utils import to_categorical
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

I explained in my article on word embeddings that textual data has to be converted into some sort of numeric form before it can be used by statistical algorithms like machine and deep learning models. One way to convert text to numbers is via word embeddings. If you are unaware of how to implement word embeddings via Keras, I highly recommend that you read this article before moving on to the next sections of the code.

The first step in word embeddings is to convert the words into their corresponding numeric indexes. To do so, we can use the Tokenizer class from the keras.preprocessing.text module.

tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(X_train)

X_train = tokenizer.texts_to_sequences(X_train)
X_test = tokenizer.texts_to_sequences(X_test)

Sentences can have different lengths, and therefore the sequences returned by the Tokenizer class also have variable lengths. We specify that the maximum length of a sequence will be 200 (although you can try any number). Sentences shorter than 200 tokens will be padded with zeros at the end, while sentences longer than 200 tokens will be truncated.

Look at the following script:

vocab_size = len(tokenizer.word_index) + 1

maxlen = 200

X_train = pad_sequences(X_train, padding='post', maxlen=maxlen)
X_test = pad_sequences(X_test, padding='post', maxlen=maxlen)

Next, we need to load the pretrained GloVe word embeddings (the glove.6B.100d.txt file) and build a dictionary that maps each word to its embedding vector:

from numpy import array
from numpy import asarray
from numpy import zeros

embeddings_dictionary = dict()

glove_file = open('/content/drive/My Drive/glove.6B.100d.txt', encoding="utf8")

for line in glove_file:
    records = line.split()
    word = records[0]
    vector_dimensions = asarray(records[1:], dtype='float32')
    embeddings_dictionary[word] = vector_dimensions

glove_file.close()

Finally, we will create an embedding matrix where the number of rows is equal to the number of words in the vocabulary (plus 1). The number of columns will be 100 since each word in the GloVe word embeddings that we loaded is represented as a 100-dimensional vector.

embedding_matrix = zeros((vocab_size, 100))
for word, index in tokenizer.word_index.items():
    embedding_vector = embeddings_dictionary.get(word)
    if embedding_vector is not None:
        embedding_matrix[index] = embedding_vector

Once the word embedding step is completed, we are ready to create our model. We will be using the Keras functional API. Although a single-input model like the one we are creating now could also be developed with the sequential API, the multiple-input model in the next section can only be built with the functional API, so we will stick to the functional API in this section too.

We will create a very simple model with one input layer (the embedding layer), one LSTM layer with 128 neurons, and one dense layer that will also act as the output layer. Since we have 3 possible output classes, the number of neurons will be 3 and the activation function will be softmax. We will use categorical_crossentropy as our loss function and adam as the optimizer.

deep_inputs = Input(shape=(maxlen,))
embedding_layer = Embedding(vocab_size, 100, weights=[embedding_matrix], trainable=False)(deep_inputs)
LSTM_Layer_1 = LSTM(128)(embedding_layer)
dense_layer_1 = Dense(3, activation='softmax')(LSTM_Layer_1)
model = Model(inputs=deep_inputs, outputs=dense_layer_1)

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

Let's print the summary of our model:

print(model.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 200)               0
_________________________________________________________________
embedding_1 (Embedding)      (None, 200, 100)          5572900
_________________________________________________________________
lstm_1 (LSTM)                (None, 128)               117248
_________________________________________________________________
dense_1 (Dense)              (None, 3)                 387
=================================================================
Total params: 5,690,535
Trainable params: 117,635
Non-trainable params: 5,572,900

Finally, let's print the block diagram of our neural network:

from keras.utils import plot_model
plot_model(model, to_file='model_plot1.png', show_shapes=True, show_layer_names=True)

The file model_plot1.png will be created in your local file path. If you open the image, it will look like this:

[Image: model_plot1.png, block diagram of the text-only model]

You can see that the model has 1 input layer, 1 embedding layer, 1 LSTM layer, and 1 dense layer which also serves as the output layer.

Let's now train our model:

history = model.fit(X_train, y_train, batch_size=128, epochs=10, verbose=1, validation_split=0.2)

The model will be trained on 80% of the training data and validated on the remaining 20%. The results for the 10 epochs are as follows:

Train on 32000 samples, validate on 8000 samples
Epoch 1/10
32000/32000 [==============================] - 81s 3ms/step - loss: 0.8640 - acc: 0.6623 - val_loss: 0.8356 - val_acc: 0.6730
Epoch 2/10
32000/32000 [==============================] - 80s 3ms/step - loss: 0.8508 - acc: 0.6618 - val_loss: 0.8399 - val_acc: 0.6690
Epoch 3/10
32000/32000 [==============================] - 84s 3ms/step - loss: 0.8461 - acc: 0.6647 - val_loss: 0.8374 - val_acc: 0.6726
Epoch 4/10
32000/32000 [==============================] - 82s 3ms/step - loss: 0.8288 - acc: 0.6709 - val_loss: 0.7392 - val_acc: 0.6861
Epoch 5/10
32000/32000 [==============================] - 82s 3ms/step - loss: 0.7444 - acc: 0.6804 - val_loss: 0.6371 - val_acc: 0.7311
Epoch 6/10
32000/32000 [==============================] - 83s 3ms/step - loss: 0.5969 - acc: 0.7484 - val_loss: 0.5602 - val_acc: 0.7682
Epoch 7/10
32000/32000 [==============================] - 82s 3ms/step - loss: 0.5484 - acc: 0.7623 - val_loss: 0.5244 - val_acc: 0.7814
Epoch 8/10
32000/32000 [==============================] - 86s 3ms/step - loss: 0.5052 - acc: 0.7866 - val_loss: 0.4971 - val_acc: 0.7950
Epoch 9/10
32000/32000 [==============================] - 84s 3ms/step - loss: 0.4753 - acc: 0.8032 - val_loss: 0.4839 - val_acc: 0.7965
Epoch 10/10
32000/32000 [==============================] - 82s 3ms/step - loss: 0.4539 - acc: 0.8110 - val_loss: 0.4622 - val_acc: 0.8046

You can see that the final training accuracy of the model is 81.10% while the validation accuracy is 80.46%. The difference is very small and therefore we assume that our model is not overfitting on the training data.

Let's now evaluate the performance of our model on the test set:

score = model.evaluate(X_test, y_test, verbose=1)

print("Test Score:", score[0])
print("Test Accuracy:", score[1])

The output looks like this:

10000/10000 [==============================] - 37s 4ms/step
Test Score: 0.4592904740810394
Test Accuracy: 0.8101

Finally, let's plot the values for loss and accuracy for both training and testing sets:

import matplotlib.pyplot as plt

plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])

plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train','test'], loc='upper left')
plt.show()

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])

plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train','test'], loc='upper left')
plt.show()

You should see the following two plots:

[Image: accuracy and loss plots for the text-only model]

You can see that the lines for training and test accuracy and loss are pretty close to each other, which means that the model is not overfitting.

Creating a Model with Meta Information Only

In this section, we will create a classification model that uses information from the useful, funny, and cool columns of the Yelp reviews. Since the data for these columns is well structured and doesn't contain any sequential or spatial pattern, we can use a simple densely connected neural network to make predictions.

Let's plot the average counts for useful, funny, and cool reviews against the review score.

import seaborn as sns
sns.barplot(x='reviews_score', y='useful', data=yelp_reviews)

[Image: bar plot of the average useful count per review score]

From the output, you can see that the average count for reviews marked as useful is the highest for the bad reviews, followed by the average reviews and the good reviews.

Let's now plot the average count for funny reviews:

sns.barplot(x='reviews_score', y='funny', data=yelp_reviews)

[Image: bar plot of the average funny count per review score]

The output shows that again, the average count for reviews marked as funny is highest for the bad reviews.

Finally, let's plot the average value for the cool column against the reviews_score column. We expect that the average count for the cool column will be the highest for good reviews since people often mark positive or good reviews as cool:

sns.barplot(x='reviews_score', y='cool', data=yelp_reviews)

[Image: bar plot of the average cool count per review score]

As expected, the average cool count for the good reviews is the highest. From this information, we can safely assume that the count values in the useful, funny, and cool columns have some correlation with the reviews_score column. Therefore, we will try to use the data from these three columns to train a model that predicts the value of the reviews_score column.
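One quick way to check this numerically (this step is not part of the original article) is to print the mean of each count column per review class:

print(yelp_reviews.groupby('reviews_score')[['useful', 'funny', 'cool']].mean())

The resulting table should mirror the bar plots shown above.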

Let's filter these three columns from our dataset:

yelp_reviews_meta = yelp_reviews[['useful', 'funny', 'cool']]

X = yelp_reviews_meta.values

y = yelp_reviews['reviews_score']

Next, we will convert our labels into one-hot encoded values and then split our data into train and test sets:

from sklearn import preprocessing

# label_encoder object knows how to understand word labels.
label_encoder = preprocessing.LabelEncoder()

# Encode the string labels in the reviews_score column as integers.
y = label_encoder.fit_transform(y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

from keras.utils import to_categorical
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

The next step is to create our model. Our model will consist of four layers (you can try any number): the input layer, two dense hidden layers with 10 neurons each and relu activation functions, and finally an output dense layer with 3 neurons and a softmax activation function. The loss function and optimizer will be categorical_crossentropy and adam, respectively.

The following script defines the model:

input2 = Input(shape=(3,))
dense_layer_1 = Dense(10, activation='relu')(input2)
dense_layer_2 = Dense(10, activation='relu')(dense_layer_1)
output = Dense(3, activation='softmax')(dense_layer_2)

model = Model(inputs=input2, outputs=output)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

Let's print the summary of the model:

print(model.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 3)                 0
_________________________________________________________________
dense_1 (Dense)              (None, 10)                40
_________________________________________________________________
dense_2 (Dense)              (None, 10)                110
_________________________________________________________________
dense_3 (Dense)              (None, 3)                 33
=================================================================
Total params: 183
Trainable params: 183
Non-trainable params: 0

Finally, the block diagram for the model can be created via the following script:

from keras.utils import plot_model
plot_model(model, to_file='model_plot2.png', show_shapes=True, show_layer_names=True)

Now, if you open the model_plot2.png file from your local file path, it looks like this:

[Image: model_plot2.png, block diagram of the meta-information model]

Let's now train the model and print the accuracy and loss values for each epoch:

history = model.fit(X_train, y_train, batch_size=16, epochs=10, verbose=1, validation_split=0.2)

Train on 32000 samples, validate on 8000 samples
Epoch 1/10
32000/32000 [==============================] - 8s 260us/step - loss: 0.8429 - acc: 0.6649 - val_loss: 0.8166 - val_acc: 0.6734
Epoch 2/10
32000/32000 [==============================] - 7s 214us/step - loss: 0.8203 - acc: 0.6685 - val_loss: 0.8156 - val_acc: 0.6737
Epoch 3/10
32000/32000 [==============================] - 7s 217us/step - loss: 0.8187 - acc: 0.6685 - val_loss: 0.8150 - val_acc: 0.6736
Epoch 4/10
32000/32000 [==============================] - 7s 220us/step - loss: 0.8183 - acc: 0.6695 - val_loss: 0.8160 - val_acc: 0.6740
Epoch 5/10
32000/32000 [==============================] - 7s 227us/step - loss: 0.8177 - acc: 0.6686 - val_loss: 0.8149 - val_acc: 0.6751
Epoch 6/10
32000/32000 [==============================] - 7s 219us/step - loss: 0.8175 - acc: 0.6686 - val_loss: 0.8157 - val_acc: 0.6744
Epoch 7/10
32000/32000 [==============================] - 7s 216us/step - loss: 0.8172 - acc: 0.6696 - val_loss: 0.8145 - val_acc: 0.6733
Epoch 8/10
32000/32000 [==============================] - 7s 214us/step - loss: 0.8175 - acc: 0.6689 - val_loss: 0.8139 - val_acc: 0.6734
Epoch 9/10
32000/32000 [==============================] - 7s 215us/step - loss: 0.8169 - acc: 0.6691 - val_loss: 0.8160 - val_acc: 0.6744
Epoch 10/10
32000/32000 [==============================] - 7s 216us/step - loss: 0.8167 - acc: 0.6694 - val_loss: 0.8138 - val_acc: 0.6736

From the output, you can see that our model doesn't really improve; the accuracy values remain between 66% and 67% across all the epochs.

Let's see how the model performs on the test set:

score = model.evaluate(X_test, y_test, verbose=1)

print("Test Score:", score[0])
print("Test Accuracy:", score[1])

10000/10000 [==============================] - 0s 34us/step
Test Score: 0.8206425309181213
Test Accuracy: 0.6669

We can print the loss and accuracy values for training and test sets via the following script:

import matplotlib.pyplot as plt

plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])

plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train','test'], loc='upper left')
plt.show()

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])

plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train','test'], loc='upper left')
plt.show()

[Image: accuracy and loss plots for the meta-information model]

From the output, you can see that the accuracy values are relatively low. Hence, we can say that our model is underfitting. The accuracy might be increased by adding more dense layers or by training for a higher number of epochs, however I will leave that to you (a possible starting point is sketched below).
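As one possible, untested starting point, here is a sketch of a slightly deeper meta-information model; the layer sizes and epoch count are assumptions, not values from the original article:

input2 = Input(shape=(3,))
dense_layer_1 = Dense(32, activation='relu')(input2)          # wider first hidden layer
dense_layer_2 = Dense(32, activation='relu')(dense_layer_1)
dense_layer_3 = Dense(16, activation='relu')(dense_layer_2)   # extra hidden layer
output = Dense(3, activation='softmax')(dense_layer_3)

model = Model(inputs=input2, outputs=output)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
history = model.fit(X_train, y_train, batch_size=16, epochs=30, verbose=1, validation_split=0.2)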

Let's move on to the final and most important section of this article where we will use multiple inputs of different types to train our model.

Creating a Model with Multiple Inputs

In the previous sections, we saw how to train deep learning models using either textual data or meta information. What if we want to combine textual information with meta information and use that as input to our model? We can do so using the Keras functional API. In this section we will create two submodels.

The first submodel will accept textual input in the form of text reviews. This submodel will consist of an input layer, an embedding layer, and an LSTM layer with 128 neurons. The second submodel will accept input in the form of meta information from the useful, funny, and cool columns. The second submodel also consists of three layers: an input layer and two dense layers.

The output from the LSTM layer of the first submodel and the output from the second dense layer of the second submodel will be concatenated and used as the input to another dense layer with 10 neurons. Finally, the output dense layer will have three neurons, one for each review type.

Let's see how we can create such a concatenated model.

First we have to create two different types of inputs. To do so, we will divide our data into a feature set and label set, as shown below:

X = yelp_reviews.drop('reviews_score', axis=1)

y = yelp_reviews['reviews_score']

The X variable contains the feature set, whereas the y variable contains the label set. We need to convert our labels into one-hot encoded vectors. We can do so using the label encoder and the to_categorical function of the keras.utils module. We will also divide our data into training and test sets.

from sklearn import preprocessing

# label_encoder object knows how to understand word labels.
label_encoder = preprocessing.LabelEncoder()

# Encode the string labels in the reviews_score column as integers.
y = label_encoder.fit_transform(y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

from keras.utils import to_categorical
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

Now our label set is in the required form. Since there will be only one output, we don't need to process our label set any further. However, there will be multiple inputs to the model, therefore we need to preprocess our feature set.

Let's first define the preprocess_text function that will be used to preprocess our dataset:

def preprocess_text(sen):

    # Remove punctuations and numbers
    sentence = re.sub('[^a-zA-Z]', ' ', sen)

    # Single character removal
    sentence = re.sub(r"\s+[a-zA-Z]\s+", ' ', sentence)

    # Removing multiple spaces
    sentence = re.sub(r'\s+', ' ', sentence)

    return sentence

As a first step, we will create textual input for the training and test set. Look at the following script:

X1_train = []
sentences = list(X_train["text"])
for sen in sentences:
    X1_train.append(preprocess_text(sen))

Now X1_train contains the textual input for the training set. Similarly, the following script preprocesses the textual input data for the test set:

X1_test = []
sentences = list(X_test["text"])
for sen in sentences:
    X1_test.append(preprocess_text(sen))

Now we need to convert textual input for the training and test sets into numeric form using word embeddings. The following script does that:

tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(X1_train)

X1_train = tokenizer.texts_to_sequences(X1_train)
X1_test = tokenizer.texts_to_sequences(X1_test)

vocab_size = len(tokenizer.word_index) + 1

maxlen = 200

X1_train = pad_sequences(X1_train, padding='post', maxlen=maxlen)
X1_test = pad_sequences(X1_test, padding='post', maxlen=maxlen)

We will again use GloVe word embeddings for creating word vectors:

from numpy import array
from numpy import asarray
from numpy import zeros

embeddings_dictionary = dict()

glove_file = open('/content/drive/My Drive/glove.6B.100d.txt', encoding="utf8")

for line in glove_file:
    records = line.split()
    word = records[0]
    vector_dimensions = asarray(records[1:], dtype='float32')
    embeddings_dictionary[word] = vector_dimensions

glove_file.close()

embedding_matrix = zeros((vocab_size, 100))
for word, index in tokenizer.word_index.items():
    embedding_vector = embeddings_dictionary.get(word)
    if embedding_vector is not None:
        embedding_matrix[index] = embedding_vector

We have preprocessed our textual input. The second input type is the meta information from the useful, funny, and cool columns. We will filter these columns from the feature set to create the meta input for training the model. Look at the following script:

X2_train = X_train[['useful', 'funny', 'cool']].values
X2_test = X_test[['useful', 'funny', 'cool']].values

Let's now create our two input layers. The first input layer will be used to input the textual input and the second input layer will be used to input meta information from the three columns.

input_1 = Input(shape=(maxlen,))

input_2 = Input(shape=(3,))

You can see that the first input layer input_1 is used for the textual input; its shape is set to maxlen, the length of the padded input sequences. For the second input layer, the shape corresponds to the three meta-information columns.

Let's now create the first submodel, which accepts data from the first input layer:

embedding_layer = Embedding(vocab_size, 100, weights=[embedding_matrix], trainable=False)(input_1)
LSTM_Layer_1 = LSTM(128)(embedding_layer)

Similarly, the following script creates a second submodel that accepts input from the second input layer:

dense_layer_1 = Dense(10, activation='relu')(input_2)
dense_layer_2 = Dense(10, activation='relu')(dense_layer_1)

We now have two submodels. What we want to do is concatenate the output from the first submodel with the output from the second submodel. The output from the first submodel is the output of LSTM_Layer_1 and, similarly, the output from the second submodel is the output of dense_layer_2. We can use the Concatenate class from the keras.layers.merge module to concatenate these two outputs.

The following script creates our final model:

concat_layer = Concatenate()([LSTM_Layer_1, dense_layer_2])
dense_layer_3 = Dense(10, activation='relu')(concat_layer)
output = Dense(3, activation='softmax')(dense_layer_3)
model = Model(inputs=[input_1, input_2], outputs=output)

You can see that now our model has a list of inputs with two items. The following script compiles the model and prints its summary:

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
print(model.summary())

The model summary is as follows:

Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 200)          0
__________________________________________________________________________________________________
input_2 (InputLayer)            (None, 3)            0
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, 200, 100)     5572900     input_1[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 10)           40          input_2[0][0]
__________________________________________________________________________________________________
lstm_1 (LSTM)                   (None, 128)          117248      embedding_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 10)           110         dense_1[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 138)          0           lstm_1[0][0]
                                                                 dense_2[0][0]
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 10)           1390        concatenate_1[0][0]
__________________________________________________________________________________________________
dense_4 (Dense)                 (None, 3)            33          dense_3[0][0]
==================================================================================================
Total params: 5,691,721
Trainable params: 118,821
Non-trainable params: 5,572,900

Finally, we can plot the complete network model using the following script:

from keras.utils import plot_model
plot_model(model, to_file='model_plot3.png', show_shapes=True, show_layer_names=True)

If you open the model_plot3.png file, you should see the following network diagram:

[Image: model_plot3.png, block diagram of the multiple-input model]

The above figure clearly shows how the two inputs are combined into a single model.

Let's now train our model and see the results:

history = model.fit(x=[X1_train, X2_train], y=y_train, batch_size=128, epochs=10, verbose=1, validation_split=0.2)

Here is the result for the 10 epochs:

Train on 32000 samples, validate on 8000 samples
Epoch 1/10
32000/32000 [==============================] - 155s 5ms/step - loss: 0.9006 - acc: 0.6509 - val_loss: 0.8233 - val_acc: 0.6704
Epoch 2/10
32000/32000 [==============================] - 154s 5ms/step - loss: 0.8212 - acc: 0.6670 - val_loss: 0.8141 - val_acc: 0.6745
Epoch 3/10
32000/32000 [==============================] - 154s 5ms/step - loss: 0.8151 - acc: 0.6691 - val_loss: 0.8086 - val_acc: 0.6740
Epoch 4/10
32000/32000 [==============================] - 155s 5ms/step - loss: 0.8121 - acc: 0.6701 - val_loss: 0.8039 - val_acc: 0.6776
Epoch 5/10
32000/32000 [==============================] - 154s 5ms/step - loss: 0.8027 - acc: 0.6740 - val_loss: 0.7467 - val_acc: 0.6854
Epoch 6/10
32000/32000 [==============================] - 155s 5ms/step - loss: 0.6791 - acc: 0.7158 - val_loss: 0.5764 - val_acc: 0.7560
Epoch 7/10
32000/32000 [==============================] - 154s 5ms/step - loss: 0.5333 - acc: 0.7744 - val_loss: 0.5076 - val_acc: 0.7881
Epoch 8/10
32000/32000 [==============================] - 154s 5ms/step - loss: 0.4857 - acc: 0.7973 - val_loss: 0.4849 - val_acc: 0.7970
Epoch 9/10
32000/32000 [==============================] - 154s 5ms/step - loss: 0.4697 - acc: 0.8034 - val_loss: 0.4709 - val_acc: 0.8024
Epoch 10/10
32000/32000 [==============================] - 154s 5ms/step - loss: 0.4479 - acc: 0.8123 - val_loss: 0.4592 - val_acc: 0.8079

To evaluate our model, we will have to pass both test inputs to the evaluate function as shown below:

score = model.evaluate(x=[X1_test, X2_test], y=y_test, verbose=1)

print("Test Score:", score[0])
print("Test Accuracy:", score[1])

Here are the results:

10000/10000 [==============================] - 18s 2ms/step
Test Score: 0.4576087875843048
Test Accuracy: 0.8053

Our test accuracy is 80.53%, which is slightly less than our first model that uses textual input only. This shows that meta information in yelp_reviews is not very useful for sentiment prediction.

Anyway, now you know how to create a multiple-input model for text classification in Keras!

Finally, let's now print the loss and accuracy for training and test sets:

import matplotlib.pyplot as plt

plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])

plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train','test'], loc='upper left')
plt.show()

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])

plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train','test'], loc='upper left')
plt.show()

[Image: accuracy and loss plots for the multiple-input model]

You can see that the differences in loss and accuracy values between the training and test sets are minimal, hence our model is not overfitting.

Final Thoughts and Improvements

In this article, we built a very simple neural network since the purpose of the article is to explain how to create a deep learning model that accepts multiple inputs of different types.

Following are some tips that you can follow to further improve the performance of the text classification model:

  1. We only used 50,000 of the 5.2 million records in this article due to hardware constraints. You can try training your model on a larger number of records and see if you can achieve better performance.
  2. Try adding more LSTM and dense layers to the model. If the model overfits, try adding dropout (a rough sketch is given after this list).
  3. Try changing the optimizer function and training the model for a higher number of epochs.
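For reference, here is a rough, untested sketch of how tip 2 could look for the text submodel, using a stacked LSTM with dropout; the layer sizes and dropout rate are assumptions, not values from the original article:

deep_inputs = Input(shape=(maxlen,))
embedding_layer = Embedding(vocab_size, 100, weights=[embedding_matrix], trainable=False)(deep_inputs)
LSTM_Layer_1 = LSTM(128, return_sequences=True)(embedding_layer)   # return sequences so a second LSTM can follow
LSTM_Layer_2 = LSTM(64)(LSTM_Layer_1)
drop_layer = Dropout(0.3)(LSTM_Layer_2)                            # dropout to reduce overfitting
dense_layer_1 = Dense(3, activation='softmax')(drop_layer)

model = Model(inputs=deep_inputs, outputs=dense_layer_1)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])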

Please share your results along with the neural network configuration in the comments section. I would love to see how well you performed.

22 Aug 2019 12:51pm GMT

PSF GSoC students blogs: Last Blog Post

Hey, everyone!!!

So, as I have already written my last blog post, you can read it by clicking this link: https://blogs.python-gsoc.org/en/iflameings-blog/. That blog post covers the entire period of GSoC. Since the title of this post is "Last", I will share my experience and how I got selected for GSoC. GSoC is a great platform if you want to get started with open source. It doesn't just teach you how to contribute to open source; it also gives you a brand which you will need at some point in your life. It will help you in getting internships and jobs. It also provides you with a community for a lifetime if you keep contributing to the project. The past 3 months have been the best days of my life, learning something new every day. They enhanced my coding and programming skills. I saw the power of tests and how they help in refactoring and adding new features to a project. I learnt some trending industry frameworks (Jest, GraphQL, React, WebSocket) and many more small libraries which I installed from npm and used in my project. I started contributing to gatsby-source-plone when Iodide had not yet come into GSoC. I solved some issues before the announcement of GSoC organisations, but when Google announced the organisations, Plone was not selected. I got very frustrated, but I kept contributing to gatsby-source-plone. On the 7th of March I found out that Plone comes under the PSF organisation and I became very happy. I kept contributing to it and finally got selected for this project :)

22 Aug 2019 12:24pm GMT

Matt Layman: Celery In A Shiv App - Building SaaS #31

In this episode, we baked the Celery worker and beat scheduler tool into the Shiv app. This is one more step on the path to simplifying the set of tools on the production server. I started the stream by reviewing the refactoring that I did to conductor/main.py. The main file is used to dispatch to different tools with the Shiv bundle. The refactored version can pass control to Gunicorn, the Django management tools, or Celery.

22 Aug 2019 7:42am GMT

10 Nov 2011

Planet Java

OSDir.com - Java: Oracle Introduces New Java Specification Requests to Evolve Java Community Process

From the Yet Another dept.:

To further its commitment to the Java Community Process (JCP), Oracle has submitted the first of two Java Specification Requests (JSRs) to update and revitalize the JCP.

10 Nov 2011 6:01am GMT

OSDir.com - Java: No copied Java code or weapons of mass destruction found in Android

From the Fact Checking dept.:

ZDNET: Sometimes the sheer wrongness of what is posted on the web leaves us speechless. Especially when it's picked up and repeated as gospel by otherwise reputable sites like Engadget. "Google copied Oracle's Java code, pasted in a new license, and shipped it," they reported this morning.



Sorry, but that just isn't true.

10 Nov 2011 6:01am GMT

OSDir.com - Java: Java SE 7 Released

From the Grande dept.:

Oracle today announced the availability of Java Platform, Standard Edition 7 (Java SE 7), the first release of the Java platform under Oracle stewardship.

10 Nov 2011 6:01am GMT

28 Oct 2011

Planet Ruby

O'Reilly Ruby: MacRuby: The Definitive Guide

Ruby and Cocoa on OS X, the iPhone, and the Device That Shall Not Be Named

28 Oct 2011 8:00pm GMT

14 Oct 2011

Planet Ruby

Charles Oliver Nutter: Why Clojure Doesn't Need Invokedynamic (Unless You Want It to be More Awesome)

This was originally posted as a comment on @fogus's blog post "Why Clojure doesn't need invokedynamic, but it might be nice". I figured it's worth a top-level post here.

Ok, there's some good points here and a few misguided/misinformed positions. I'll try to cover everything.

First, I need to point out a key detail of invokedynamic that may have escaped notice: any case where you must bounce through a generic piece of code to do dispatch -- regardless of how fast that bounce may be -- prevents a whole slew of optimizations from happening. This might affect Java dispatch, if there's any argument-twiddling logic shared between call sites. It would definitely affect multimethods, which are using a hand-implemented PIC. Any case where there's intervening code between the call site and the target would benefit from invokedynamic, since invokedynamic could be used to plumb that logic and let it inline straight through. This is, indeed, the primary benefit of using invokedynamic: arbitrarily complex dispatch logic folds away allowing the dispatch to optimize as if it were direct.

Your point about inference in Java dispatch is a fair one...if Clojure is able to infer all cases, then there's no need to use invokedynamic at all. But unless Clojure is able to infer all cases, then you've got this little performance time bomb just waiting to happen. Tweak some code path and obscure the inference, and kablam, you're back on a slow reflective impl. Invokedynamic would provide a measure of consistency; the only unforeseen perf impact would be when the dispatch turns out to *actually* be polymorphic, in which case even a direct call wouldn't do much better.

For multimethods, the benefit should be clear: the MM selection logic would be mostly implemented using method handles and "leaf" logic, allowing hotspot to inline it everywhere it is used. That means for small-morphic MM call sites, all targets could potentially inline too. That's impossible without invokedynamic unless you generate every MM path immediately around the eventual call.

Now, on to defs and Var lookup. Depending on the cost of Var lookup, using a SwitchPoint-based invalidation plus invokedynamic could be a big win. In Java 7u2, SwitchPoint-based invalidation is essentially free until invalidated, and as you point out that's a rare case. There would essentially be *no* cost in indirecting through a var until that var changes...and then it would settle back into no cost until it changes again. Frequently-changing vars could gracefully degrade to a PIC.

It's also dangerous to understate the impact code size has on JVM optimization. The usual recommendation on the JVM is to move code into many small methods, possibly using call-through logic as in multimethods to reuse the same logic in many places. As I've mentioned, that defeats many optimizations, so the next approach is often to hand-inline logic everywhere it's used, to let the JVM have a more optimizable view of the system. But now we're stepping on our own feet...by adding more bytecode, we're almost certainly impacting the JVM's optimization and inlining budgets.

OpenJDK (and probably the other VMs too) has various limits on how far it will go to optimize code. A large number of these limits are based on the bytecoded size of the target methods. Methods that get too big won't inline, and sometimes won't compile. Methods that inline a lot of code might not get inlined into other methods. Methods that inline one path and eat up too much budget might push out more important calls later on. The only way around this is to reduce bytecode size, which is where invokedynamic comes in.

As of OpenJDK 7u2, MethodHandle logic is not included when calculating inlining budgets. In other words, if you push all the Java dispatch logic or multimethod dispatch logic or var lookup into mostly MethodHandles, you're getting that logic *for free*. That has had a tremendous impact on JRuby performance; I had previous versions of our compiler that did indeed infer static target methods from the interpreter, but they were often *slower* than call site caching solely because the code was considerably larger. With invokedynamic, a call is a call is a call, and the intervening plumbing is not counted against you.

Now, what about negative impacts to Clojure itself...

#0 is a red herring. JRuby supports Java 5, 6, and 7 with only a few hundred lines of changes in the compiler. Basically, the compiler has abstract interfaces for doing things like constant lookup, literal loading, and dispatch that we simply reimplement to use invokedynamic (extending the old non-indy logic for non-indified paths). In order to compile our uses of invokedynamic, we use Rémi Forax's JSR-292 backport, which includes a "mock" jar with all the invokedynamic APIs stubbed out. In our release, we just leave that library out, reflectively load the invokedynamic-based compiler impls, and we're off to the races.

#1 would be fair if the Oracle Java 7u2 early-access drops did not already include the optimizations that gave JRuby those awesome numbers. The biggest of those optimizations was making SwitchPoint free, but also important are the inlining discounting and MutableCallSite improvements. The perf you see for JRuby there can apply to any indirected behavior in Clojure, with the same perf benefits as of 7u2.

For #2, to address the apparent vagueness in my blog post...the big perf gain was largely from using SwitchPoint to invalidate constants rather than pinging a global serial number. Again, indirection folds away if you can shove it into MethodHandles. And it's pretty easy to do it.

#3 is just plain FUD. Oracle has committed to making invokedynamic work well for Java too. The current thinking is that "lambda", the support for closures in Java 7, will use invokedynamic under the covers to implement "function-like" constructs. Oracle has also committed to Nashorn, a fully invokedynamic-based JavaScript implementation, which has many of the same challenges as languages like Ruby or Python. I talked with Adam Messinger at Oracle, who explained to me that Oracle chose JavaScript in part because it's so far away from Java...as I put it (and he agreed) it's going to "keep Oracle honest" about optimizing for non-Java languages. Invokedynamic is driving the future of the JVM, and Oracle knows it all too well.

As for #4...well, all good things take a little effort :) I think the effort required is far lower than you suspect, though.

14 Oct 2011 2:40pm GMT

07 Oct 2011

Planet Ruby

Ruby on Rails: Rails 3.1.1 has been released!

Hi everyone,

Rails 3.1.1 has been released. This release requires at least sass-rails 3.1.4

CHANGES

ActionMailer

ActionPack

ActiveModel

ActiveRecord

ActiveResource

ActiveSupport

Railties

SHA-1

You can find an exhaustive list of changes on github. Along with the closed issues marked for v3.1.1.

Thanks to everyone!

07 Oct 2011 5:26pm GMT

21 Mar 2011

Planet Perl

Planet Perl is going dormant

Planet Perl is going dormant. This will be the last post there for a while.

image from planet.perl.org

Why? There are better ways to get your Perl blog fix these days.

You might enjoy some of the following:

Will Planet Perl awaken again in the future? It might! The universe is a big place, filled with interesting places, people and things. You never know what might happen, so keep your towel handy.

21 Mar 2011 2:04am GMT

improving on my little wooden "miniatures"

A few years ago, I wrote about cheap wooden discs as D&D minis, and I've been using them ever since. They do a great job, and cost nearly nothing. For the most part, we've used a few for the PCs, marked with the characters' initials, and the rest for NPCs and enemies, usually marked with numbers.

With D&D 4E, we've tended to have combats with more and more varied enemies. (Minions are wonderful things.) Numbering has become insufficient. It's too hard to remember what numbers are what monster, and to keep initiative order separate from token numbers. In the past, I've colored a few tokens in with the red or green whiteboard markers, and that has been useful. So, this afternoon I found my old paints and painted six sets of five colors. (The black ones I'd already made with sharpies.)

D&D tokens: now in color

I'm not sure what I'll want next: either I'll want five more of each color or I'll want five more colors. More colors will require that I pick up some white paint, while more of those colors will only require that I re-match the secondary colors when mixing. I think I'll wait to see which I end up wanting during real combats.

These colored tokens should work together well with my previous post about using a whiteboard for combat overview. Like-type monsters will get one color, and will all get grouped to one slot on initiative. Last night, for example, the two halfling warriors were red and acted in the same initiative slot. The three halfling minions were unpainted, and acted in another, later slot. Only PCs get their own initiative.

I think that it did a good amount to speed up combat, and that's even when I totally forgot to bring the combat whiteboard (and the character sheets!) with me. Next time, we'll see how it works when it's all brought together.

21 Mar 2011 12:47am GMT

20 Mar 2011

Planet Perl

Perl Vogue T-Shirts

Is Plack the new Black? In Pisa I gave a lightning talk about Perl Vogue. People enjoyed it and for a while I thought that it might actually turn into a project.

I won't though. It would just take far too much effort. And, besides, a couple of people have pointed out to me that the real Vogue are rather protective of their brand.

So it's not going to happen, I'm afraid. But as a subtle reminder of the ideas behind Perl Vogue I've created some t-shirts containing the article titles from the talk. You can get them from my Spreadshirt shop.

20 Mar 2011 12:02pm GMT