
Deep Learning Series Part 1: Transfer Learning Using Keras with TIBCO® Enterprise Runtime for R (TERR™) on the CIFAR-10 Dataset

Last updated: 1:24pm Sep 23, 2019

Overview and Prerequisites

This example will use the Keras R package to build an image classifier in TIBCO® Enterprise Runtime for R (TERR™).

You will also see:

  1. how to use a subset of the CIFAR-10 dataset to compensate for computational resource constraints;
  2. how to retrain a neural network with pre-trained weights;
  3. how to do basic performance analysis on the models.

To get started, we need to install all the R packages in TERR™ first. We will use R.matlab to read the dataset, RinR and jpeg to visualize images, keras to build our neural network, EBImage for image resizing, ramify for its utilities, tibble and yardstick for performance analysis.

In addition to the packages, you also need Python and TensorFlow (the Python module) on your machine; the keras R package documentation covers the installation process, and one possible setup is sketched below.
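If you are starting from a fresh environment, the snippet below shows one possible way to install the dependencies. It is a sketch, not a definitive recipe: EBImage is distributed through Bioconductor rather than CRAN, RinR typically ships with TERR™, and install_keras() is a convenience helper in the keras package that sets up Python and TensorFlow for you.

# sketch of the package setup; adjust repositories to your environment
# (RinR typically ships with TERR, so it may already be available)
install.packages(c("R.matlab", "jpeg", "keras", "ramify", "tibble", "yardstick"))

# EBImage comes from Bioconductor, not CRAN
install.packages("BiocManager")
BiocManager::install("EBImage")

# let the keras package set up Python and TensorFlow
library(keras)
install_keras()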

NOTE: An R notebook (.rmd) version of this document is attached below.


The Dataset

We are going to use the CIFAR-10 dataset to train and test our model.

The CIFAR-10 dataset is a tiny image dataset with labels. It contains 10 different classes of objects/animals, such as airplanes, birds, and horses. We choose this dataset because it is a popular image classification benchmark, while also being very easy to load.

The dataset is available for download from the University of Toronto website. 

# download and extract the MATLAB version of CIFAR-10 (skipped if already present)
file <- "https://www.cs.toronto.edu/~kriz/cifar-10-matlab.tar.gz"
dest <- "cifar-10-matlab.tar.gz"
if (!dir.exists("cifar-10-batches-mat")) {
  download.file(file, destfile = dest)
  untar(dest)
}

 

Note: There are three versions of the dataset available for download. We arbitrarily choose to use the MATLAB version. You can download the other versions from the same website.

The dataset is organized into 6 files. Each file contains 1 batch of data. Each batch contains 10,000 images and labels. 5 batches of data are designated for training, and 1 batch is for testing.

The test batch contains exactly 1,000 images of each class. For our purpose, the amount of data in this batch is sufficient to train and test our model. 
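After the download-and-untar step above, a quick listing confirms what we have to work with. The file names in the comment are what the MATLAB version of the archive is expected to contain.

# list the extracted files; expected: batches.meta.mat,
# data_batch_1.mat through data_batch_5.mat, and test_batch.mat
list.files("cifar-10-batches-mat")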

Because the dataset files are built for MATLAB, we use the R.matlab package to load them.

library(R.matlab)
path <- file("./cifar-10-batches-mat/test_batch.mat")
raw <- readMat(path)



The images are contained in the variable raw$data as a matrix. Each row holds one flattened image.

Their labels are contained in raw$labels as a column vector. Each row contains the label corresponding to the image in the same row.

data <- raw$data
labels <- raw$labels

print(dim(data))
print(dim(labels))
[1] 10000  3072
[1] 10000     1

 

As the dimensions suggest, there are indeed 10,000 samples.

Each image is flattened, which means it is sliced and stitched to fit into one row of a matrix. The images are all 32 * 32 pixels, and each pixel is represented by 3 values in the RGB color space. Therefore, each row of raw$data has (32 * 32 * 3 =) 3072 elements.
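As a quick sanity check, you can recover a single color plane from a flattened row by hand. The sketch below assumes the layout described on the CIFAR-10 page: the first 1024 values of each row are the red channel, followed by green and then blue, each stored row by row.

# recover the red color plane of the first image from its flattened row
# (assumes the first 1024 values are the red channel, stored row-major)
red_plane <- matrix(data[1, 1:1024], nrow = 32, byrow = TRUE)
print(dim(red_plane))
[1] 32 32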

Each label is a number from 0 - 9, indicating the classification ground truth. To know what each number represents, we need to read the file batches.meta.mat. It contains the metadata that tells us the label-class mapping.

meta_path <- file("./cifar-10-batches-mat/batches.meta.mat")
meta <- readMat(meta_path)
classes <- unlist(meta$label.names)
classes
 [1] "airplane"   "automobile" "bird"       "cat"        "deer"       "dog"        "frog"       "horse"     
 [9] "ship"       "truck"     

 

For example, "airplane" is at index 0, indicating it is represented by the number "0".

It is unfortunate that the labels in this dataset are 0-indexed, whereas TERR™ is 1-indexed. We need to keep a mental note of this difference so that we don't make any off-by-one mistakes.
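A one-liner makes the shift explicit: a 0-based label k maps to the class name classes[k + 1].

# a 0-based label maps to classes[label + 1] in 1-indexed TERR
label <- 0
classes[label + 1]
[1] "airplane"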

In this example, we aim to classify only 3 classes of objects, so that we can still achieve good results with only a small subset of CIFAR-10. You can also easily modify the code to classify all 10 classes.

The dataset contains 3 means of ground transportation. Let's use those as our classes.

# these are the classes we picked
pick_classes <- c("automobile", "truck", "horse")

# the numerical representations of these classes
pick_repr <- match(pick_classes, classes) - 1

# create a filter condition
pick_indices <- labels %in% pick_repr

# filter labels
labels <- labels[pick_indices]

# map labels
mapping <- function(x) {
  match(x, pick_repr) - 1
}
labels <- sapply(labels, mapping)

# filter data
data <- data[pick_indices,]

n_samples <- dim(data)[1]
n_classes <- length(pick_classes)

print(dim(data))
print(length(labels))
[1] 3000 3072
[1] 3000

 

There are 1,000 samples per class, so we ended up with 3,000 samples in total.

We are going to use a convolutional neural network, so we need to reshape each flattened image to 32 * 32 * 3 (i.e. 32 * 32 pixels of RGB values). That's a 3000 * 32 * 32 * 3 tensor.

To do this, we fill a new multidimensional array (i.e. a tensor) with our data. We fill each image with values like we are filling an empty chocolate box with chocolates: from left to right, top to bottom. Except in this case, for each spot in the box, we stack three chocolates on top of one another. This is exactly what the code below does. But that's not all. If you were to visualize an image now, you would notice the image is sideways. We need to straighten it by swapping the middle 2 dimensions.

data <- array(data, dim = c(n_samples, 32, 32, 3))
# the image is rotated, make it the right orientation
data <- aperm(data, c(1,3,2,4))
print(dim(data))
[1] 3000   32   32    3

 

To recap, the newly arranged variable, data, has 4 dimensions: number of samples, width, height, and color space. 

Now, we use the jpeg package to see an example of the image.

# examine the 200th image
i <- 200
print(pick_classes[labels[i] + 1])

# show image
library(jpeg)
img <- data[i, , , ] / 255
writeJPEG(img, target="figure/viz1.jpeg")
[1] "truck"

 

Indeed, we see the image matches its label.

Now, we need to encode our labels as one-hot vectors. A one-hot vector is simply a binary vector in which only one element is set to 1 and the rest are set to 0. The index of the "1" value reflects which class the sample belongs to.

Keras' utility function to_categorical() can do the conversion for us. It takes a class vector and converts it to a binary one-hot matrix.

For example, a class vector of 3 samples may look like: [1, 0, 2]. Each element in this class vector corresponds to a sample and indicates the index of the "1" value in that one-hot vector. With 0-indexing, the vector is translated to [[0, 1, 0], [1, 0, 0], [0, 0, 1]].
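You can verify this translation on the toy class vector before applying it to the real labels:

# one-hot encode the toy class vector from the example above
keras::to_categorical(c(1, 0, 2), num_classes = 3)
     [,1] [,2] [,3]
[1,]    0    1    0
[2,]    1    0    0
[3,]    0    0    1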

library(keras)
labels <- to_categorical(labels, num_classes = n_classes)
dim(labels)
[1] 3000    3

 

We then split our samples into a test set and a training set. To make a good test set for evaluation, we randomly select the same number of images from each class and put them into the test set.

n_train <- 2700
n_test <- n_samples - n_train

# number of samples per class in test set
n_test_per_class <- rep(as.integer(n_test / n_classes), n_classes)
if (sum(n_test_per_class) != n_test) n_test_per_class[1] <- n_test_per_class[1] + 1

# reproducibility
set.seed(42)
shuffle_index <- sample(n_samples)
data <- data[shuffle_index,,,]
labels <- labels[shuffle_index,]

# our selection of test set
test_index <- rep(FALSE, n_samples)
for (c in 1:n_classes) {
  # select index
  c_index <- labels[, c] == 1
  # pick the first occurrences of each class
  c_index <- head(which(c_index), n_test_per_class[c])
  test_index[c_index] <- TRUE
}

# our selection of training set
train_index <- !test_index

data_train <- data[train_index,,,]
labels_train <- labels[train_index,]
data_test <- data[test_index,,,]
labels_test <- labels[test_index,]

print(dim(data_train))
print(dim(labels_train))
print(dim(data_test))
print(dim(labels_test))
[1] 2700   32   32    3
[1] 2700    3
[1] 300  32  32   3
[1] 300   3

The Model

Although it is quite easy to build our own convolutional neural network, the difficulty of model/hyperparameter selection, along with constraints on computational resources, severely limits what a homegrown model can achieve. On the other hand, neural networks trained by machine learning experts not only achieve better accuracy, but also save us a lot of time in training and implementing a classifier.

Spoilers: I also trained a model with randomly initialized weights. It took about 3 times as long to train and was 22% less accurate on the test set (pre-trained vs. randomly initialized: 90% accuracy in 8 minutes vs. 70% accuracy in 23 minutes).

These pre-trained models were not trained on the particular task we have at hand, but we can perform transfer learning to adapt them to our dataset. More specifically, pre-trained convolutional neural networks can detect high-level features in an image, such as hooves, windows, and wheels. Whether each feature is present in the image or not is determined before the signal reaches the network's last layers. The last layers use this information to decide the probability that a given image belongs to each class. Therefore, if we replace the last layers while keeping the feature-detection layers, we can train the network to use the feature-detection results to make predictions on our own dataset.

For more on this subject, you can check out this article.

Let's use a pre-trained Resnet50 model from Keras. If you want to see the model's architecture, uncomment the second line and it will be printed to your console.

base_model <- application_resnet50(include_top = FALSE)
#base_model

 

We choose to set include_top = FALSE in order to leave the last (top) layer out from our model, so that we can add our own layers to the end.

last_layer <- base_model$output %>%
              layer_global_average_pooling_2d() %>%
              layer_dense(units = 32, activation = "relu", name = "dense_32",
                          kernel_initializer = initializer_he_normal(seed = 44),
                          kernel_regularizer = regularizer_l2(0.02)) %>%
              layer_dense(units = n_classes, activation = "softmax", name = "dense_nc",
                          kernel_initializer = initializer_he_normal(seed = 45),
                          kernel_regularizer = regularizer_l2(0.02))

model <- keras_model(inputs = base_model$input, outputs = last_layer)
#model

 

Now we freeze the weights on all but the layers we just made. This prevents gradient updates from being applied to those weights. Here, the advantage of transfer learning shines. Because we freeze most of the layers, we are saving time updating their weights. 

We compile the graph and specify the optimization method and the loss function.

for (layer in base_model$layers) {
  layer$trainable <- FALSE
}
# also track accuracy so the CSV logger below records it alongside the loss
model$compile(optimizer='rmsprop', loss='categorical_crossentropy',
              metrics=list('accuracy'))
#model

Now the model is ready for training.


The Training

Let's first define the hyperparameters and callback functions.

epochs <- 10
batch_size <- 10
# Keras automatically holds out validation_split * training set size as a validation set.
validation_split <- 0.2
callbacks <- list(
  # saves model after each epoch
  callback_model_checkpoint("cifar10-resnet50-ckpt.h5"),
  # saves the training results to a csv file after each epoch
  callback_csv_logger("cifar10-resnet50-history.csv")
)

 

Before we start the training process, we need to return to the data: it has to be preprocessed first. This step is particular to each model, but in general it rescales the data for better model convergence.

In addition, CIFAR-10 images are too small for the pre-trained model to take as input. We need to rescale the images from 32 * 32 * 3 to 224 * 224 * 3, which we can do with the bilinear interpolation provided by the EBImage package. The input size can vary from model to model as well. To find out the input size for a pre-trained model, go to the Keras documentation.

library("EBImage")
data_fill <- array(rep(0, n_train * 224 * 224 * 3), dim = c(n_train, 224, 224, 3))
for (i in 1:n_train){
  # upscale from 32 x 32 to 224 x 224 (bilinear interpolation)
  data_fill[i,,,] <- resize(data_train[i,,,], w = 224, h = 224)
}
data_train <- data_fill

# visualize the image
img <- data_train[11, , , ] / 255
writeJPEG(img, target="figure/viz2.jpeg")

 

# preprocess
data_train <- imagenet_preprocess_input(data_train)
print(dim(data_train))
[1] 2700  224  224    3

 

Now let's go ahead and fit the model. It is one simple Keras call.

options(keras.view_metrics = FALSE)
model %>% fit(
  data_train,
  labels_train,
  epochs=epochs,
  batch_size=batch_size,
  validation_split = validation_split,
  callbacks = callbacks
)

 

Here are the graphs for the training metrics over epochs:

lib <- .libPaths()
RinR::REvaluate({
  .libPaths(c(.libPaths(), lib))

  jpeg(file="figure/viz3.jpeg")
  recorded_data <- read.csv("./cifar10-resnet50-history.csv")
  matplot(recorded_data[, 1], recorded_data[, c(2,4)], type="l",
          xlab = "Epochs", ylab="Accuracy", col=c(2,4))
  legend("bottomright", inset=.05, legend=c("train", "val"), pch='-', col=c(2,4), horiz=TRUE)
  title("Training accuracy and validation accuracy over epochs")
  dev.off()

  jpeg(file="figure/viz4.jpeg")
  matplot(recorded_data[, 1], recorded_data[, c(3,5)], type="l",
          xlab = "Epochs", ylab="Loss", col=c(2,4))
  legend("topright", inset=.05, legend=c("train", "val"), pch='-', col=c(2,4), horiz=TRUE)
  title("Training loss and validation loss over epochs")
  dev.off()
}, data = c("lib"))


The Performance

Now let's use the yardstick package to analyze how well our model did. First, we make predictions on our test set. Again, we need to resize and preprocess the test set.

data_fill <- array(rep(0, n_test * 224 * 224 * 3), dim = c(n_test, 224, 224, 3))
for (i in 1:n_test){
  # upscale each test image to the model's 224 x 224 input size
  data_fill[i,,,] <- resize(data_test[i,,,], w = 224, h = 224)
}

data_test <- imagenet_preprocess_input(data_fill)
dim(data_test)
[1] 300 224 224   3

 

Then we can make predictions.

library(ramify)
# Predicted Class
pred_prob <- predict(object = model, x = data_test)
pred_class <- argmax(pred_prob) - 1

# ground truth
true_class <- argmax(labels_test) - 1

 

Let's print out a classification summary.

library(tibble)

# change the factor from numerical representation to string representation
label_to_class <- function(x){
  factor(x, levels=(1:n_classes) - 1, labels=pick_classes)
}

estimates_keras_tbl <- tibble(
    truth = as.factor(true_class) %>% label_to_class(),
    estimate = as.factor(pred_class) %>% label_to_class(),
    class_prob = apply(pred_prob, 1, max) %>% as.vector()
)

print(estimates_keras_tbl)

 

truth      estimate  class_prob
automobile truck      0.6721346
horse      horse      0.6455105
horse      horse      0.9994220
horse      horse      0.9999963
horse      horse      0.9927280
truck      truck      0.9991770
automobile truck      0.5472345
horse      horse      0.9998840
horse      horse      0.8616011
truck      truck      0.9998305
...        ...        ...

At a glance, you can see it is doing well most of the time. The model is quite confident for samples that are classified correctly.

Accuracy

A basic metric for performance is the accuracy of the model. As we can see, the test set accuracy is pretty close to the training set and validation set accuracy. This is ideal because it means our model did not overfit.

library(yardstick)
estimates_keras_tbl %>% metrics(truth, estimate)
0.89
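As a quick cross-check, the same accuracy figure can be computed directly from the class vectors we built earlier:

# fraction of test samples whose predicted class matches the truth
mean(pred_class == true_class)
[1] 0.89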

 

Confusion Matrix

The yardstick package also provides a function for making a confusion matrix. It allows us to see the number of samples the model predicted correctly and incorrectly from each class. More importantly, it also tells us, for the incorrect ones, what the model confuses them with.

estimates_keras_tbl %>% conf_mat(truth, estimate)
            Truth
Prediction   automobile truck horse
  automobile         69     0     1
  truck              30   100     1
  horse               1     0    98

 

Trucks and automobiles share more features with each other than with horses; therefore, it makes intuitive sense that the model makes fewer mistakes when it comes to horses, but may confuse trucks and automobiles with each other.
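To quantify this observation, you can compute per-class recall from the confusion matrix. The sketch below assumes the conf_mat object exposes its raw counts in its table element with truth along the columns, which is how yardstick structures it.

# per-class recall: correct predictions for each class divided by its true count
cm <- conf_mat(estimates_keras_tbl, truth, estimate)$table
diag(cm) / colSums(cm)
automobile      truck      horse 
      0.69       1.00       0.98 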


Attachments

deep_learning_series_part_1.zip (5.95 KB)

Feedback (1)

Hi zewang, very good learning material for Keras applications.

I have a basic question here, how do you install the keras environment for TERR? Did you install the basic "keras" package from the cran source? I am new to keras, and I was under the impression that one needs to install keras through:

devtools::install_github("rstudio/keras")

to get most of the keras functions to work. However, I can't get TERR to run the above line; it seems there is a compatibility issue between devtools and TERR. If I only use install.packages("keras") to install the default keras, many keras functions are unavailable, such as to_categorical. Could you provide some insight on correctly installing keras in TERR?

qing2001 2:11pm Sep. 19, 2018