
This article is an introduction to TensorFlow Hub, using a processed dataset from Kaggle. The well-known problem of Sentiment Analysis (Text Classification) is used to demonstrate it. Transfer Learning is a great method of storing the knowledge gained from previous learning and reusing it. Transfer learning happens to us all the time. An easy analogy is our school days: we learn addition, subtraction, multiplication, and division at an early stage. It then becomes part of us; we don't relearn it again and again each year. Later stages of study apply those operations to more complex problems, for example HCF, LCM, GCD, and so on. The same idea in the machine learning field is called Transfer Learning: train an algorithm on a huge amount of data, store all the weights and biases from that training, and then apply the learned weights and biases to problems in a similar domain.
The best part of transfer learning is that you take the learned weights and biases, introduce a new layer into the model, and make only that new layer learn from the dataset you possess. It's just wonderful thinking!! So what this does is take the weights learned on a generic dataset, add a new layer, and train only that new layer on the dataset you actually want to work on. This largely eliminates the problem of not having huge data.
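To make the idea concrete, here is a minimal sketch of that "freeze the old, train only the new" setup in Keras, using the same TF Hub embedding that appears later in this article. Setting trainable=False is what keeps the reused weights fixed; note that the actual model built later in this article sets trainable=True, i.e. it also fine-tunes the embedding.

import tensorflow as tf
import tensorflow_hub as hub

# Pre-trained embedding layer from TF Hub; trainable=False freezes its
# weights so only the newly added layers learn from our small dataset.
base = hub.KerasLayer(
    "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1",
    input_shape=[], dtype=tf.string, trainable=False)

model = tf.keras.Sequential([
    base,                                          # frozen, reused knowledge
    tf.keras.layers.Dense(16, activation="relu"),  # new layer, learns afresh
    tf.keras.layers.Dense(1)                       # output for binary sentiment
])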
Until recently, Computer Vision was the field leveraging transfer learning the most, since there was a huge amount of labelled data from ImageNet. Now, with the emergence of BERT, even NLP is able to taste the recipe of transfer learning. I do like such competition between two different branches of AI. My next task is to build a text classifier using BERT; there are a lot of references to share on that.
Okay, now it is time for code. The code for this article is available on GitHub and Colab. Stressing it again, the TensorFlow documentation rocks!!! As usual, I am following that article with a new dataset, because learning comes from replicating. It is not exactly the same though: the article I referred to uses TFDS (TensorFlow Datasets), whereas here the data comes from a CSV, and a separate useful article helped me convert the CSV into a TensorFlow dataset.
Loading Data
import csv

sentences = []
labels = []

with open("/tmp/data.csv") as fp:
    lines = csv.reader(fp)
    for row in lines:
        sentences.append(row[0])
        labels.append(int(row[1]))

# Splitting data into 60-40 (train+validation vs. test)
split_percent = 0.6
split_to = int(split_percent * len(sentences))

train_sentences = sentences[0:split_to]

# Further dividing the 60% training data into 60-40 for validation data
val_split = int(split_percent * len(train_sentences))
val_sentences = train_sentences[val_split:]
train_sentences = train_sentences[0:val_split]
test_sentences = sentences[split_to:]

train_labels = labels[0:split_to]
val_labels = train_labels[val_split:]
train_labels = train_labels[0:val_split]
test_labels = labels[split_to:]
Nothing special is happening in the above code snippet: it loads the data from the CSV and then splits it into training, validation, and test sets.
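If scikit-learn happens to be available in your environment, an equivalent (and arguably less error-prone) way to get the same 60-40 splits is sketched below; shuffle=False mirrors the manual slicing above, which keeps the original order.

from sklearn.model_selection import train_test_split

# 60% train+val vs. 40% test, then 60-40 again inside the training portion
train_sentences, test_sentences, train_labels, test_labels = train_test_split(
    sentences, labels, test_size=0.4, shuffle=False)
train_sentences, val_sentences, train_labels, val_labels = train_test_split(
    train_sentences, train_labels, test_size=0.4, shuffle=False)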
Data Preparation
import tensorflow as tf

def convert_to_dfts(df, labels):
    ds = tf.data.Dataset.from_tensor_slices(
        (
            tf.cast(df, tf.string),
            tf.cast(labels, tf.int32)
        )
    )
    return ds

# Creating a Dataset for the training, validation and testing dataframes
train_ds = convert_to_dfts(train_sentences, train_labels)
val_ds = convert_to_dfts(val_sentences, val_labels)
test_ds = convert_to_dfts(test_sentences, test_labels)
As said in my previous article, tf.data is the core API for converting data into a TensorFlow dataset. This code simply converts our data into objects similar to the ones TFDS produces. I found the above code snippet at this link. Now that data preparation is done, it's time to move to the main topic of this article.
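As a quick sanity check (a small sketch, not part of the original snippet), you can peek at a few elements of the resulting tf.data.Dataset to confirm the sentence/label pairing came out right:

# Print the first two (sentence, label) pairs from the training dataset
for text, label in train_ds.take(2):
    print(text.numpy(), label.numpy())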
Building the Model with Tensorflow Hub
import tensorflow_hub as hub

# Embedding Layer
embedding = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
hub_layer = hub.KerasLayer(embedding, input_shape=[],
                           dtype=tf.string, trainable=True)

# Building Model
model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Dense(16, activation='relu'))
model.add(tf.keras.layers.Dense(1))

model.summary()

# Compiling
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Training
history = model.fit(train_ds.shuffle(100).batch(32),
                    epochs=100,
                    validation_data=val_ds.batch(32))

# Evaluation
results = model.evaluate(test_ds.batch(32), verbose=2)
for name, value in zip(model.metrics_names, results):
    print("%s: %.3f" % (name, value))
As you can see in the above snippet, a pre-trained model is used for the embeddings. That pre-training is a token-based text embedding trained on the 130GB English Google News corpus.
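To get a feel for what the hub layer produces, you can call it directly on a couple of raw strings (the example sentences below are made up); each sentence comes back as a 20-dimensional embedding vector:

sample = tf.constant(["what a wonderful movie", "this was a waste of time"])
print(hub_layer(sample).shape)  # expected: (2, 20) for the 20-dim gnews-swivel embedding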
There is not much difference in the code compared to the regular way of learning the embeddings from scratch. To give a contrast, I will show the regular method, i.e. without transfer learning, below. The key point is that whatever an algorithm has learned can be stored and reused to solve problems in similar domains.
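Illustrating that "store and reuse" point, here is a minimal sketch (the path below is just a placeholder) of persisting the trained model in the TF2 SavedModel format and loading it back later:

# Save the trained model (weights, biases and architecture) to disk
model.save("/tmp/sentiment_tfhub_model")  # hypothetical path

# Load it back later and reuse it for predictions
reloaded = tf.keras.models.load_model("/tmp/sentiment_tfhub_model")
print(reloaded.predict(tf.constant(["loved every minute of it"])))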
The other way of solving the above problem, without transfer learning, is shown next. The complete code can be found in this Colab. Do follow that Colab till the end; there is a cool way of visualizing the embeddings in the Embedding Projector.
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Hyperparameters
vocab_size = 10000
max_len = 120
oov_tok = "<OOV>"
embed_dim = 16
trunc_type = "post"

# training_sentences/testing_sentences and training_labels_final/
# testing_labels_final come from the data-loading step in the linked Colab.
tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_tok)
tokenizer.fit_on_texts(training_sentences)
word_index = tokenizer.word_index

sequences = tokenizer.texts_to_sequences(training_sentences)
padded = pad_sequences(sequences, maxlen=max_len, truncating=trunc_type)

testing_sequences = tokenizer.texts_to_sequences(testing_sentences)
testing_padded = pad_sequences(testing_sequences, maxlen=max_len, truncating=trunc_type)

# Model definition
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim, input_length=max_len),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(6, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
# model.summary()

num_epochs = 10
model.fit(padded, training_labels_final, epochs=num_epochs,
          validation_data=(testing_padded, testing_labels_final))
So here the embedding is learned from scratch, with only the limited data available to us. This may not give the best possible results, as deep learning is a data-hungry friend of ours: give it more data and it will be a good friend.
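For the Embedding Projector visualization mentioned above, a common sketch (the file names here are just placeholders) is to dump the freshly learned embedding weights and the vocabulary into two TSV files and upload them at projector.tensorflow.org:

import io

# Weights of the learned Embedding layer: shape (vocab_size, embed_dim)
weights = model.layers[0].get_weights()[0]
reverse_word_index = {v: k for k, v in word_index.items()}

out_v = io.open("vecs.tsv", "w", encoding="utf-8")
out_m = io.open("meta.tsv", "w", encoding="utf-8")
for word_num in range(1, vocab_size):
    word = reverse_word_index.get(word_num, "?")
    vector = weights[word_num]
    out_m.write(word + "\n")
    out_v.write("\t".join(str(x) for x in vector) + "\n")
out_v.close()
out_m.close()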
So that's all I have for now. Another day, another concept, another bit of knowledge sharing, and more to come. The next article is again about transfer learning, but I am excited to explore BERT for my native language, Kannada, and to share my experience with it. Enjoy coding…