Face Recognition using Transfer Learning

Rakesh
9 min read · Oct 26, 2020


Transfer Learning and Fine Tuning

Transfer learning is a method through which we can reuse the weights and hyper-parameters of a pre-trained model, customize the network, and train it on our own dataset. How?

So to do that, we freeze the layers. What do I mean by freezing? We tell the pre-trained deep neural network that it does not have to relearn everything for our new dataset; it can reuse the knowledge it gained from its original dataset, because all images share the same kind of structure. Images, videos, and audio are static rather than dynamic kinds of data: every image is ultimately built out of object edges and colors, nothing more complex than that. "Knowledge" here means the learned weights and the hyper-parameters used while training the network, such as the kernels, pooling, optimizer, and learning rate.

So we can use a pre-trained model like ResNet50, VGG16 or VGG19, Inception V3, or MobileNet. The data science community has already invested the resources to train these networks and has published the weights, so we reuse their knowledge (weights and hyper-parameters) by pushing our data through their network. Training "through their network" means we do not back-propagate through the frozen layers, initialize random weights, or change the hyper-parameters; we simply feed our data to the network and let it compute features with its existing weights. The only change we make to the pre-created network is at the end: we drop its last layer (the output layer) and attach our own custom FCL (fully connected layers) to train on our dataset, with a new output layer at the end of the FCL. Then we combine our FCL with the pre-trained network and, finally, we train it.

Fine tuning is the method where we do not freeze the layers: we add our custom head at the end and train the whole model, meaning the pre-trained weights are only a starting point and every layer is updated for our dataset. Otherwise it is the same as transfer learning. It consumes far more resources, but we sometimes use it, depending on the dataset.
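As a rough illustration, here is a minimal Keras sketch of the difference between the two approaches (unfreezing only the last convolutional block is just one common choice, assumed here for illustration):

from keras.applications import VGG16

base = VGG16(weights='imagenet', include_top=False,
             input_shape=(224, 224, 3))

# Transfer learning: freeze every pre-trained layer so that only
# the new head we attach later gets trained.
for layer in base.layers:
    layer.trainable = False

# Fine tuning: leave some (or all) layers trainable. Here, as an
# example, we unfreeze only the layers of the last conv block.
for layer in base.layers:
    layer.trainable = layer.name.startswith('block5')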

Now let’s talk about the VGG16 deep learning model.

Basically, VGG16 is the shallower sibling of VGG19: it has 16 weight layers in total.

  • 13 convolutional layers
  • 3 dense (fully connected) layers
We also have max-pooling layers, which reduce the spatial dimensionality of the feature maps; pooling layers carry no weights, so they are not counted among the 16.

Architecture of the VGG16:

The input to conv1 layer is of fixed size 224 x 224 RGB image. The image is passed through a stack of convolutional (conv.) layers, where the filters were used with a very small receptive field: 3×3 (which is the smallest size to capture the notion of left/right, up/down, center). In one of the configurations, it also utilizes 1×1 convolution filters, which can be seen as a linear transformation of the input channels. The convolution stride is fixed to 1 pixel. The padding of conv. layer input is such that the resolution is preserved after convolution, i.e. the padding is 1-pixel for 3×3 conv. layers. Pooling is carried out by five max-pooling layers, which follow some of the conv. layers (not all the conv. layers are followed by max-pooling). Max-pooling is performed over a 2×2 pixel window, with stride 2.

Three Fully-Connected (FC) layers follow a stack of convolutional layers (which has a different depth in different architectures): the first two have 4096 channels each, the third performs 1000-way ILSVRC classification and thus contains 1000 channels (one for each class). The final layer is the soft-max layer. The configuration of the fully connected layers is the same in all networks.

All hidden layers are equipped with the rectification (ReLU) non-linearity.
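A quick sanity check on those numbers: with 1-pixel padding the 3×3 convolutions preserve resolution, so only the five 2×2, stride-2 max-pooling layers shrink the image. A small sketch of the arithmetic (plain Python, nothing model-specific):

# Each of the five max-pooling layers halves the spatial resolution;
# the stride-1, padding-1 3x3 convolutions leave it unchanged.
size = 224
for _ in range(5):
    size //= 2           # 224 -> 112 -> 56 -> 28 -> 14 -> 7
print(size)              # 7

# The last conv block has 512 channels, so the final feature map is
# 7 x 7 x 512, i.e. 7 * 7 * 512 = 25088 values once flattened, which
# is exactly the input size of the first fully connected layer.
print(7 * 7 * 512)       # 25088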

Now let’s write the code for VGG16 and train the model.

from keras.applications import VGG16

# VGG16 works on an image size of 224 x 224 pixels

img_rows = 224
img_cols = 224

Loading the model

model = VGG16(weights = 'imagenet',
              include_top = False,
              input_shape = (img_rows, img_cols, 3))
Using TensorFlow backend.
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
58892288/58889256 [==============================] - 865s 15us/step

Inspecting the layers of the Model

Here we inspect the layers of the pre-trained model, printing each layer's index, type, and whether it is trainable.

for (i, layer) in enumerate(model.layers):
    print(str(i) + " " + layer.__class__.__name__, layer.trainable)

0 InputLayer False
1 Conv2D True
2 Conv2D True
3 MaxPooling2D True
4 Conv2D True
5 Conv2D True
6 MaxPooling2D True
7 Conv2D True
8 Conv2D True
9 Conv2D True
10 MaxPooling2D True
11 Conv2D True
12 Conv2D True
13 Conv2D True
14 MaxPooling2D True
15 Conv2D True
16 Conv2D True
17 Conv2D True
18 MaxPooling2D True

Seeing our layer objects

These are the layer objects that make up the VGG16 model:

model.layers
[<keras.engine.input_layer.InputLayer at 0x1ae88cf7548>,
<keras.layers.convolutional.Conv2D at 0x1ae914ef448>,
<keras.layers.convolutional.Conv2D at 0x1ae914efec8>,
<keras.layers.pooling.MaxPooling2D at 0x1ae915504c8>,
<keras.layers.convolutional.Conv2D at 0x1ae91550e88>,
<keras.layers.convolutional.Conv2D at 0x1ae91560808>,
<keras.layers.pooling.MaxPooling2D at 0x1ae91560488>,
<keras.layers.convolutional.Conv2D at 0x1ae915682c8>,
<keras.layers.convolutional.Conv2D at 0x1ae9156fd48>,
<keras.layers.convolutional.Conv2D at 0x1ae91579c48>,
<keras.layers.pooling.MaxPooling2D at 0x1ae9157c548>,
<keras.layers.convolutional.Conv2D at 0x1ae91582688>,
<keras.layers.convolutional.Conv2D at 0x1ae91589cc8>,
<keras.layers.convolutional.Conv2D at 0x1ae91591ec8>,
<keras.layers.pooling.MaxPooling2D at 0x1ae91593cc8>,
<keras.layers.convolutional.Conv2D at 0x1ae91593448>,
<keras.layers.convolutional.Conv2D at 0x1ae9159e6c8>,
<keras.layers.convolutional.Conv2D at 0x1ae915a7e88>,
<keras.layers.pooling.MaxPooling2D at 0x1ae915aea48>]

Seeing the input dimensions

Here we can see that the input to VGG16 is of size 224 x 224 x 3 pixels, so we need a 3-channel (RGB) image.

model.layers[0].input
<tf.Tensor 'input_1:0' shape=(None, 224, 224, 3) dtype=float32>

Seeing the output dimensions

We can see that the output feature map is of size 7 x 7 with 512 channels, and that the last layer is the block-5 max-pooling layer, as in the architecture described above.

model.output
<tf.Tensor 'block5_pool/MaxPool:0' shape=(None, 7, 7, 512) dtype=float32>

Freezing the layers of the model

Here we freeze all the layers of the model. By default the layers of the model are trainable, so we set them all to frozen so that they will not be trained again.

All the layers are now set to False, which means their weights will not be updated during training.

for layer in model.layers:
    layer.trainable = False

# Let's print our layers
for (i, layer) in enumerate(model.layers):
    print(str(i) + " " + layer.__class__.__name__, layer.trainable)
0 InputLayer False
1 Conv2D False
2 Conv2D False
3 MaxPooling2D False
4 Conv2D False
5 Conv2D False
6 MaxPooling2D False
7 Conv2D False
8 Conv2D False
9 Conv2D False
10 MaxPooling2D False
11 Conv2D False
12 Conv2D False
13 Conv2D False
14 MaxPooling2D False
15 Conv2D False
16 Conv2D False
17 Conv2D False
18 MaxPooling2D False

Function for creating our new model HEAD

In this function we create the top (head) of the new model on top of the pre-trained VGG16 output: a Flatten layer, a dense ReLU layer, a Dropout layer, and a softmax output layer.

def TopModel(prev_model, num_classes, neurons):
    """creates the top or head of the model that will be
    placed on top of the bottom layers of the new model"""
    top_model = prev_model.output
    top_model = Flatten(name = "flatten")(top_model)
    top_model = Dense(neurons, activation = "relu")(top_model)
    top_model = Dropout(0.1)(top_model)
    top_model = Dense(num_classes, activation = "softmax")(top_model)
    return top_model

Putting our new model HEAD onto the pretrained VGG16

Now we attach the new HEAD to the pre-trained VGG16 model: the functional Model API takes VGG16's input as its input and our new head's softmax layer as its output.

from keras.models import Sequential
from keras.models import Model
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D, ZeroPadding2D
from keras.layers.normalization import BatchNormalization

num_classes = 2
neurons = 512

FC_Head = TopModel(model, num_classes, neurons)
newModel = Model(inputs=model.input, outputs=FC_Head)
print(newModel.summary())
Model: "model_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 224, 224, 3) 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
flatten (Flatten) (None, 25088) 0
_________________________________________________________________
dense_7 (Dense) (None, 256) 6422784
_________________________________________________________________
dropout_4 (Dropout) (None, 256) 0
_________________________________________________________________
dense_8 (Dense) (None, 5) 1285
=================================================================
Total params: 21,138,757
Trainable params: 6,424,069
Non-trainable params: 14,714,688
_________________________________________________________________
None

Loading our Image Data Set

Here we load our dataset. ImageDataGenerator rescales and augments the training images, and flow_from_directory reports how many images it found in each class.

from keras.preprocessing.image import ImageDataGenerator

train_data_dir = 'vgg16/train/'
validation_data_dir = 'vgg16/test/'

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

validation_datagen = ImageDataGenerator(rescale=1./255)

# Change the batchsize according to your system RAM
train_batchsize = 16
val_batchsize = 5

train_set = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_rows, img_cols),
    batch_size=train_batchsize,
    class_mode='categorical')

test_set = validation_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_rows, img_cols),
    batch_size=val_batchsize,
    class_mode='categorical',
    shuffle=False)
Found 198 images belonging to 2 classes.
Found 30 images belonging to 2 classes.

Here is the data set

Training our Layers

Now we will train our new layers to rebuild the VGG16 model using transfer learning. Here we set the number of training samples, the number of test samples, and the batch size.

Then we compile the model and save the best weights as "vgg16.h5".

from keras.optimizers import RMSprop
from keras.callbacks import ModelCheckpoint, EarlyStopping

checkpoint = ModelCheckpoint("vgg16.h5",
                             monitor = "val_loss",
                             mode = "min",
                             save_best_only = True,
                             verbose = 1)

earlystop = EarlyStopping(monitor = 'val_loss',
                          min_delta = 0,
                          patience = 4,
                          verbose = 1,
                          restore_best_weights = True)

# Note we use a small learning rate
newModel.compile(loss = 'categorical_crossentropy',
                 optimizer = RMSprop(lr = 0.001),
                 metrics = ['accuracy'])

train_samples = 198
test_samples = 30
batch_size = 8

history = newModel.fit_generator(
    train_set,
    steps_per_epoch = train_samples // batch_size,
    epochs = 5,
    callbacks = [earlystop, checkpoint],
    validation_data = test_set,
    validation_steps = test_samples // batch_size)

newModel.save("vgg16.h5")
Epoch 1/5
24/24 [==============================] - 123s 5s/step - loss: 0.7411 - accuracy: 0.8214 - val_loss: 0.0742 - val_accuracy: 0.8667
Epoch 00001: val_loss improved from inf to 0.07422, saving model to vgg16.h5
Epoch 2/5
24/24 [==============================] - 122s 5s/step - loss: 0.3080 - accuracy: 0.8709 - val_loss: 0.5536 - val_accuracy: 0.8667
Epoch 00002: val_loss did not improve from 0.07422
Epoch 3/5
24/24 [==============================] - 122s 5s/step - loss: 0.4427 - accuracy: 0.8957 - val_loss: 2.2524e-04 - val_accuracy: 1.0000
Epoch 00003: val_loss improved from 0.07422 to 0.00023, saving model to vgg16.h5
Epoch 4/5
24/24 [==============================] - 120s 5s/step - loss: 0.2791 - accuracy: 0.8983 - val_loss: 0.0577 - val_accuracy: 1.0000
Epoch 00004: val_loss did not improve from 0.00023
Epoch 5/5
24/24 [==============================] - 119s 5s/step - loss: 0.1861 - accuracy: 0.9313 - val_loss: 0.0147 - val_accuracy: 0.9333
Epoch 00005: val_loss did not improve from 0.00023

So here our final training accuracy is 93.13%.
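If you want to inspect the training curves, the history object returned by fit_generator keeps the per-epoch metrics. A minimal sketch, assuming matplotlib is available and the metric keys are 'accuracy' and 'val_accuracy' as in the logs above:

import matplotlib.pyplot as plt

# Plot training vs. validation accuracy per epoch
plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='val accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()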

Loading our model

Now we load the model that we just trained and saved.

from keras.models import load_model

classifier = load_model('vgg16.h5')

Recognizing the face from an image:

import os
import cv2
import numpy as np
from os import listdir
from os.path import isfile, join

# Maps the predicted class index to a person's name
face = {"[0]": "deepak",
        "[1]": "rakesh"}
face_name = {"deepak": "deepak",
             "rakesh": "rakesh"}

def draw_test(name, pred, im):
    """Draws the predicted name on the image and shows it."""
    Face = face[str(pred)]
    cv2.putText(im, Face, (20, 60), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
    cv2.imshow(name, im)

def getRandomImage(path):
    """Loads a random image from a random folder in our test path."""
    folders = list(filter(lambda x: os.path.isdir(os.path.join(path, x)), os.listdir(path)))
    random_directory = np.random.randint(0, len(folders))
    path_class = folders[random_directory]
    print("Class - " + face_name[str(path_class)])
    file_path = path + path_class
    file_names = [f for f in listdir(file_path) if isfile(join(file_path, f))]
    random_file_index = np.random.randint(0, len(file_names))
    image_name = file_names[random_file_index]
    return cv2.imread(file_path + "/" + image_name)

for i in range(5):
    input_im = getRandomImage("vgg16/test/")
    input_original = input_im.copy()
    input_original = cv2.resize(input_original, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_LINEAR)

    # Preprocess exactly as during training: resize and rescale to [0, 1]
    input_im = cv2.resize(input_im, (224, 224), interpolation=cv2.INTER_LINEAR)
    input_im = input_im / 255.
    input_im = input_im.reshape(1, 224, 224, 3)

    # Get prediction (note: use the trained classifier, not the headless base model)
    res = np.argmax(classifier.predict(input_im, 1, verbose=0), axis=1)

    # Show image with predicted class
    draw_test("Prediction", res, input_original)
    cv2.waitKey(0)

cv2.destroyAllWindows()
Class - rakesh
Class - rakesh
Class - deepak
Class - deepak
Class - deepak

Here deepak is assigned index value 0 while rakesh is assigned index value 1; flow_from_directory numbers the classes alphabetically by folder name.
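Rather than hard-coding this mapping, you can read it straight off the training generator. A small sketch (class_indices is a standard attribute of Keras directory iterators):

# Shows the folder-name -> class-index mapping the generator used
print(train_set.class_indices)    # e.g. {'deepak': 0, 'rakesh': 1}

# Invert it to go from a predicted index back to a name
index_to_name = {v: k for k, v in train_set.class_indices.items()}
print(index_to_name)              # e.g. {0: 'deepak', 1: 'rakesh'}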

Recognizing the face from the LIVE VIDEO

Here I recognize the face from the live video. For this, first import the libraries:

import cv2
import numpy as np
from PIL import Image

Now load the haarcascade_frontalface_default.xml cascade:

from keras.preprocessing import image
face_cascade = cv2.CascadeClassifier('F://haarcascade_frontalface_default.xml')

Now we make the function for face detection. It draws a rectangle on each detected face and returns the cropped face region.

def face_detect(img):
    faces = face_cascade.detectMultiScale(img, 1.3, 5)
    if len(faces) == 0:
        return None

    for (x, y, w, h) in faces:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 255), 2)
        crop = img[y:y + h, x:x + w]
    return crop

Now, finally, with the help of cv2 we recognize the face in the live video.

cap = cv2.VideoCapture(0)

while True:
    _, frame = cap.read()
    face = face_detect(frame)
    if type(face) is np.ndarray:
        face = cv2.resize(face, (224, 224))
        img_array = Image.fromarray(face, 'RGB')
        img_array = np.array(img_array)
        # Rescale to [0, 1], matching the training preprocessing
        img_array = img_array / 255.
        img_array = np.expand_dims(img_array, axis=0)
        result = classifier.predict(img_array)

        # Default to "unknown" so name is always defined
        name = "unknown"
        if result[0][1] > 0.8:
            name = "rakesh"
        elif result[0][0] > 0.8:
            name = "deepak"
        cv2.putText(frame, name, (50, 50), cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
    else:
        cv2.putText(frame, "Nothing found", (50, 50), cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)

    cv2.imshow('video', frame)
    if cv2.waitKey(1) == 13:  # 13 is the Enter key
        break

cap.release()
cv2.destroyAllWindows()
