In this project, we’ll show you how transfer learning can be applied to retrain a Deep Learning network to classify chest X-ray images as Covid-19 or Non-Covid-19.
In this series of articles, we’ll apply a Deep Learning network, ResNet50, to diagnose Covid-19 in chest X-ray images. We’ll use Python’s TensorFlow library to train the neural network on a Jupyter Notebook.
The tools and libraries you’ll need for this project are:

- Python 3
- TensorFlow (with its Keras API)
- Jupyter Notebook
We assume that you are familiar with deep learning in Python and with Jupyter notebooks. If you're new to Python, start with this tutorial. And if you aren't yet familiar with Jupyter, start here.
COVID-19 has had a dramatic impact on our lives. It was declared a pandemic by the World Health Organization (WHO) on March 11, 2020 and spread rapidly around the world. Quick diagnosis of infections is critical to treating patients and limiting the spread of the virus. The most common laboratory detection method is real-time reverse transcription polymerase chain reaction (RT-PCR). However, this technique is not efficient enough: it's time-consuming and often has low sensitivity.
Fortunately, deep learning provides an effective supplementary method for diagnosing Covid-19 in chest X-rays and differentiating it from other bacterial and viral diseases. Several peer-reviewed studies have shown that deep learning models can detect Covid-19 infection in chest X-rays when trained on enough images. These studies mainly relied on transfer learning, fine-tuning pretrained networks to perform a new task: Covid-19 detection.
In this project, we’ll show you how transfer learning can be applied to retrain a Deep Learning network to classify chest X-ray images as Covid-19 or Non-Covid-19. By the end of the series, we'll have a neural network that can diagnose COVID-19 with greater than 95% accuracy. It will even be able to show heatmaps of the areas in a chest X-ray that led to a suspected diagnosis of COVID-19:
Deep Learning (DL) is a subset of Artificial Intelligence (AI) based on network architectures with more than one hidden layer. DL networks learn hierarchically, meaning that features are learned at different levels, from low to high, through the successive layers of the network.
Many DL networks are trained on a huge dataset called ImageNet, which gives them very powerful feature extraction capabilities. It is therefore more efficient to reuse the learned weights and filters of these networks than to build a network from scratch, which could require a huge number of images and a long training time to gain similar feature extraction power.
This approach is called "transfer learning": a pre-trained model is adapted to a new classification task by freezing its learned weights and replacing its fully connected layers with new ones that match the new task.
In this project, we’ll use ResNet50 – a very deep network – which we expect to perform well when classifying Covid-19 and Non-Covid-19 chest X-rays.
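The transfer-learning recipe described above can be sketched in tf.keras as follows. This is a minimal illustration, not the project's final configuration: the input size, head layer sizes, and dropout rate are assumed values chosen for the sketch.

```python
# A minimal transfer-learning sketch with tf.keras (TensorFlow 2.x assumed).
# The head sizes (256 units, 0.5 dropout) are illustrative choices.
import tensorflow as tf
from tensorflow.keras import layers, models

# Load ResNet50 pretrained on ImageNet, dropping its original
# 1000-class fully connected "top" (note: this downloads the weights).
base = tf.keras.applications.ResNet50(
    weights="imagenet",
    include_top=False,
    input_shape=(224, 224, 3),
)
base.trainable = False  # freeze the learned weights and filters

# Replace the fully connected layers with a new head for the
# two-class task: Covid-19 vs. Non-Covid-19.
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # probability of Covid-19
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```

Only the new head is trained; the frozen ResNet50 base acts as a fixed feature extractor, which is what makes training feasible on a relatively small X-ray dataset.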
Residual Learning: ResNet50
In recent years, DL networks, and in particular convolutional neural networks (CNNs), have been applied in various areas to solve problems with impressive performance. CNNs underwent significant updates in terms of working principles, inner structure, and number of layers. AlexNet was first proposed in 2012 with 8 layers, followed by VGG in 2014 with 16 to 19 layers, and GoogleNet in 2015 with a deeper structure of 22 layers built around its inception block.
These networks have become very deep indeed, like GoogleNet. But this extra depth created some issues: very deep networks become difficult to optimize during training. Counterintuitively, simply stacking more layers can make accuracy degrade, and this can also hurt the network's generalization performance - the network performs well on its training data set but doesn't handle data it didn't see during training.
To overcome this problem, residual learning was proposed in 2016 for training very deep DL networks. Residual learning, implemented in Residual networks (ResNets), adds "skip" (a.k.a. residual) connections over some layers (instead of only the consecutive connections over stacked layers found in plain DL networks like AlexNet).
Figure 1: Skip connection approach
During training of a DL network with backpropagation, the gradient of the error is calculated and propagated backward toward the shallow layers. In very deep networks, this gradient becomes smaller with each layer it passes through until it finally vanishes. This is the vanishing gradient problem, and it can be mitigated by residual learning as proposed by He et al. (2016), as shown in Figures 1 and 2.
Figure 2: Original residual unit proposed by He et al., (2016).
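To make the residual unit in Figure 2 concrete, here is a minimal sketch of one such block in tf.keras: the block computes a residual function F(x) and adds the unchanged input x back in before the final activation. The filter count and input shape are illustrative assumptions, not ResNet50's actual configuration.

```python
# A minimal residual ("skip connection") unit in tf.keras.
# The block outputs relu(F(x) + x), where x passes through unchanged
# on the shortcut path. Filter count and input shape are illustrative.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    shortcut = x                                  # identity "skip" path
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])               # F(x) + x
    return layers.ReLU()(y)

inputs = tf.keras.Input(shape=(32, 32, 64))
outputs = residual_block(inputs)
block = tf.keras.Model(inputs, outputs)
```

Because the shortcut is an identity mapping, the gradient can flow directly through the Add layer to earlier layers, which is why such blocks allow much deeper networks to train.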
In the next article, we’ll discuss materials and methods for this project. Stay tuned!