CycleGAN Architecture and Paper Walkthrough

March 10th, 2022

What is CycleGAN

The CycleGAN is a technique that involves the automatic training of image-to-image translation models without paired examples. The models are trained in an unsupervised manner using a collection of images from the source and target domain that do not need to be related in any way.

The CycleGAN is an extension of the GAN architecture that involves the simultaneous training of two generator models and two discriminator models.

One generator takes images from the first domain as input and outputs images for the second domain, and the other generator takes images from the second domain as input and generates images for the first domain.

Discriminator models are then used to determine how plausible the generated images are and update the generator models accordingly.

CycleGan is used to transfer characteristic of one image to another or can map the distribution of images to another. In CycleGAN we treat the problem as an image reconstruction problem. We first take an image input (x) and using the generator G to convert into the reconstructed image. Then we reverse this process from reconstructed image to original image using a generator F. Then we calculate the mean squared error loss between real and reconstructed image.

What problem CycleGAN solved

Imgur

Traditionally, training an image-to-image translation model requires a dataset comprised of paired examples. That is, a large dataset of many examples of input images X (e.g. summer landscapes) and the same image with the desired modification that can be used as an expected output image Y (e.g. winter landscapes).

The requirement for a paired training dataset is a limitation. These datasets are challenging and expensive to prepare, e.g. photos of different scenes under different conditions.

While there has been a great deal of research into this task, most of it has utilized supervised training, where we have access to (x, y) pairs of corresponding images from the two domains we want to learn to translate between.

The genius insight of this UC Berkeley group was that we do not, in fact, need perfect pairs.[1] Instead, we simply complete the cycle: we translate from one domain to another and then back again. For example, we go from summer picture (domain A) of a park to a winter one (domain B) and then back again to summer (domain A).

Now we have essentially created a cycle, and, ideally, the original picture (a) and the reconstructed picture () are the same.

If they are not, we can measure their loss on a pixel level, thereby getting the first loss of our CycleGAN: cycle-consistency loss.

The most important feature of CycleGAN architecture is that, if we have un-paired images from different domains without direct mapping between them then a cyclegan model should still be able to translate images between these two domains.


The model architecture is comprised of two generator models: one generator (Generator-A) for generating images for the first domain (Domain-A) and the second generator (Generator-B) for generating images for the second domain (Domain-B).

Generator-A -> Domain-A Generator-B -> Domain-B

The generator models perform image translation, meaning that the image generation process is conditional on an input image, specifically an image from the other domain. Generator-A takes an image from Domain-B as input and Generator-B takes an image from Domain-A as input.

Domain-B -> Generator-A -> Domain-A Domain-A -> Generator-B -> Domain-B

The first discriminator model (Discriminator-A) takes real images from Domain-A and generated images from Generator-A and predicts whether they are real or fake.

The second discriminator model (Discriminator-B) takes real images from Domain-B and generated images from Generator-B and predicts whether they are real or fake.

Domain-A -> Discriminator-A -> [Real/Fake] Domain-B -> Generator-A -> Discriminator-A -> [Real/Fake] Domain-B -> Discriminator-B -> [Real/Fake] Domain-A -> Generator-B -> Discriminator-B -> [Real/Fake]

The discriminator model is responsible for taking a real or generated image as input and predicting whether it is real or fake.

The discriminator model is implemented as a PatchGAN model.

The difference between a PatchGAN and regular GAN discriminator is that, given for example 256x256 images, the regular GAN discriminator maps from a 256x256 image to a single scalar output, which signifies "real" or "fake", whereas the PatchGAN maps from 256x256 to an NxN array of outputs X, where each X_ij signifies whether the patch ij (in X) in the image is real or fake. Here NxN can be different depending on the dimension of an input image

And what would be the patch ij in the input? Well, output X_ij is just a neuron in a convnet, and we can trace back its receptive field to see which input pixels it is sensitive to.

What is PatchGAN

Imgur

In the CycleGAN architecture, the receptive fields of the discriminator turns out to be 70x70 patches in the input image!

What that means is that, PatchGAN architecture outputs a feature map of roughly 30x30 points. Each of these points on the feature map can see a patch of 70x70 pixels on the input space (this is called the receptive field size,

So to be precise, the PatchGAN architecture is equivalent to chopping up the image into 70x70 patches, making a big batch out of these patches, and running a discriminator on each patch, with batchnorm applied across the batch, then averaging the results.

The advantage of using a patchGAN over a normal GAN discriminator is, it has fewer parameters than normal discriminator also it can be applied to input images of different sizes, e.g. larger or smaller than 256×256 pixels.

The output of the model depends on the size of the input image but may be one value or a square activation map of values. Each value is a probability for the likelihood that a patch in the input image is real. These values can be averaged to give an overall likelihood or classification score if needed.

Imgur


Cycle Consistency AND the concept of Translation

The CycleGAN uses an additional extension to the Normal architecture called cycle consistency. This is the idea that an image output by the first generator could be used as input to the second generator and the output of the second generator should match the original image.

The reverse is also true: that an output from the second generator can be fed as input to the first generator and the result should match the input to the second generator.

Cycle consistency is a concept from machine translation where a phrase translated from English to French should translate from French back to English and be identical to the original phrase. The reverse process should also be true.

To be able to use the cycle-consistency loss, we need to have two Generators: one translating from A to B, called GAB, sometimes referred to as simply G, and then another one translating from B to A, called GBA, referred to as F for brevity. There are technically two losses—forward cycle-consistency loss and backward cycle-consistency loss


Forward Cycle Consistency Loss

  • Input photo of summer (collection 1) to GAN 1
  • Output photo of winter from GAN 1
  • Input photo of winter from GAN 1 to GAN 2
  • Output photo of summer from GAN 2
  • Compare photo of summer (collection 1) to photo of summer from GAN 2

Backward Cycle Consistency Loss

  • Input photo of winter (collection 2) to GAN 2
  • Output photo of summer from GAN 2
  • Input photo of summer from GAN 2 to GAN 1
  • Output photo of winter from GAN 1
  • Compare photo of winter (collection 2) to photo of winter from GAN 1

IDENTITY LOSS

The idea of identity loss is simple: we want to enforce that CycleGAN preserves the overall color structure (or temperature) of the picture. So we introduce a regularization term that helps us keep the tint of the picture consistent with the original image. Imagine this as a way of ensuring that even after applying many filters onto your image, you still can recover the original image.

This is done by feeding the images already in domain A to the Generator from B to A (G ), because the CycleGAN should BAunderstand that they are already in the correct domain. In other words, we penalize unnecessary changes to the image: if we feed in a zebra and are trying to “zebrafy” an image, we get the same zebra back, as there is nothing to do.


The Objective Function

There are two components to the CycleGAN objective function, an adversarial loss and a cycle consistency loss. Both are essential to getting good results. If you are familiar with GANs, the adversarial loss should come as no surprise. Both generators are attempting to “fool” their corresponding discriminator into being less able to distinguish their generated images from the real versions. We use the least squares loss (found by Mao et al to be more effective than the typical log likelihood loss) to capture this.

Imgur

However, the adversarial loss alone is not sufficient to produce good images, as it leaves the model under-constrained. It enforces that the generated output be of the appropriate domain, but does not enforce that the input and output are recognizably the same. For example, a generator that output an image y that was an excellent example of that domain, but looked nothing like x, would do well by the standard of the adversarial loss, despite not giving us what we really want. The cycle consistency loss addresses this issue. It relies on the expectation that if you convert an image to the other domain and back again, by successively feeding it through both generators, you should get back something similar to what you put in. It enforces that F(G(x)) ≈ x and G(F(y)) ≈ y.

Imgur

We can create the full objective function by putting these loss terms together, and weighting the cycle consistency loss by a hyperparameter λ. We suggest setting λ = 10.

Imgur

Other articles

March 28th, 2022

Wasserstein GAN Implementation from Scratch with PyTorch

Implementation of Wasserstein GAN Architecture from Scratch read more...

March 20th, 2022

CycleGAN Paper Implementation from Scratch with PyTorch

Implementation of CycleGAN Architecture from Scratch read more...

March 10th, 2022

DCGAN Implementation From Scratch with PyTorch

DCGAN Implementation From Scratch with PyTorch on MNIST Dataset read more...

March 10th, 2022

GoogLeNet or Inception v1 Paper Implementation From Scratch with PyTorch

GoogLeNet Inception v1 Architecture Implementation From Scratch with PyTorch on CIFAR10 Dataset read more...

February 22nd, 2022

ResNet Paper Implementation From Scratch with PyTorch

Residual Network Architecture Implementation From Scratch with PyTorch on CIFAR10 Dataset read more...

February 21st, 2022

Input Shape of Tensor in Neural Network for PyTorch

Understanding nn.Linear() Layer and nn.Conv2D() Layer read more...

February 23rd, 2022

Label Smoothing And CrossEntrophy Loss in PyTorch

Quantization refers to techniques for computing and accessing. read more...

February 20th, 2022

Weight decay in PyTorch and its relation with Learning Rate | L2 Regularization

Weight decay is a regularization technique by adding a small penalty, usually the L2 norm of the weights (all the weights of the model), to the loss function. read more...

February 19th, 2022

Quantization in PyTorch & Mixed Precision Training

Quantization refers to techniques for computing and accessing memory with lower-precision data. read more...

February 14th, 2022

Build LeNet5 from Scratch with PyTorch

LeNet5 is one of the classic Neural Network and great to start with if you are a beginner read more...

February 12th, 2022

EfficientNet Pre-Trained with PyTorch - Covid-19 X-Ray Dataset

EfficientNet is a convolutional neural network architecture and scaling method developed by Google in 2019. read more...

February 4th, 2022

Understanding the Math behind Gram Matrix and why its needed for Neural Style Transfer

Understanding the Math behind Gram Matrix and why its needed for Neural Style Transfer read more...

January 29th, 2022

DCGAN - Generator Function - Understanding Filter Size and Input Shape

Reason for using 4 * 4 * 515 Shape for the Input Dense Layer in the Generator Function. read more...

January 21st, 2022

Deep Neural Network - Base Pytorch Model - FMNIST Dataset

PyTorch Implementation of Classification on Fashion-MNIST dataset which consists of a training set of 60,000 images and test set of 10,000 images read more...

December 30th, 2021

Learn Tensorflow Fundamentals in 45 Mints

A number of tips and techniques useful for daily usage read more...

December 30th, 2021

Numpy Useful Techniques

A number of tips and techniques useful for daily usage read more...

December 28th, 2021

Image Matching & Stitching with Image Descriptors and Creating Panoramic Scene

Given a pair of images I want to stitch them to create a panoramic scene. read more...

December 24th, 2021

ML-Algorithms from scratch in pure python without using sklearn

TFIDF, GridSearchCV, RandomSearchCV, Decision Function of SVM, RBF Kernel, Platt Scaling to find P(Y==1|X) and SGD Classifier with Logloss and L2 regularization read more...

December 22nd, 2021

Computing performance metrics in Pure Python without scikit-learn

Here I will compute Confusion Matrix, F1 Score, AUC Score without using scikit-learn read more...

December 21st, 2021

Understanding Shi-Tomasi Corner Detection Algorithm with OpenCV Python

Shi-Tomasi Corner Detection is an improved version of the Harris Corner Detection Algorithm. read more...

December 21st, 2021

Understanding Harris Corner Detection Algorithm with OpenCV Python

Harris Corner Detection uses a score function to evaluate whether a point is a corner or not. First it computes the horizontal and vertical derivatives (edges) of an image, read more...

December 19th, 2021

Feature-Engineering - Naive-Bayes with Bag-of-Words on Donor Choose Dataset

Understanding Naive Bayes Mathematically and applying on Donor Choose Dataset read more...

December 17th, 2021

Decision Tree on Donors Choose Dataset Kaggle

In this post, I will implement Decision Tree Algorithm on Donor Choose Dataset read more...

December 14th, 2021

Decision Function of SVM (Support Vector Machine) RBF Kernel - Building From Scratch

Decision function is a method present in classifier{ SVC, Logistic Regression } class of sklearn machine learning framework. This method basically returns a Numpy array, In which each element represents whether a predicted sample for x_test by the classifier lies to the right or left side of the Hyperplane and also how far from the HyperPlane. read more...

December 13th, 2021

DCGAN from Scratch with Tensorflow Keras — Create Fake Images from CELEB-A Dataset

A DCGAN (Deep Convolutional Generative Adversarial Network) is a direct extension of the GAN. read more...

December 12th, 2021

Kaggle Haberman's Survival Data Set - Exploratory Data Analysis

The Haberman's survival dataset covers cases from a study by University of Chicago's Billings Hospital done between 1958 and 1970 on the subject of patients-survival who had undergone surgery for breast cancer. read more...

December 11th, 2021

Kaggle House Prices Prediction Competition - Mathematics of Linear Regression

This blog on Linear Regression is about understanding mathematically the concept of Gradient Descent function and also some EDA. read more...

December 10th, 2021

Common Pytorch Snippets

In this notebook I will go over some regular snippets and techniques of it. read more...

December 9th, 2021

Understanding in Details the Mathematics of Logistic Regression and implementing From Scratch with pure Python

Logistic regression is a probabilistic classifier similar to the Naïve Bayes read more...

December 8th, 2021

Most Common Python Challenges asked in Data Science Interview

In this Notebook I shall cover the following most common Python challenges for Data Science Interviews. read more...

December 4th, 2021

Multi ArmedBandit - Kaggle Santa Competition

Multi-armed bandit problems are some of the simplest reinforcement learning (RL) problems to solve. read more...

December 2nd, 2021

Dimensionality Reduction with t-SNE

In this post I will talk about Dimensionality Reduction with t-SNE (t-Distributed Stochastic Neighbor Embedding) using the famous **Digit Recognizer Dataset** (also known as MNIST data) read more...

December 2nd, 2021

Dimensionality Reduction with PCA

In this post I will be using Kaggle's famous **Digit Recognizer Dataset** (also known as MNIST data) to implement Dimensionality Reduction with PCA (Principle Component Analysis). read more...

November 30th, 2021

TF-IDF Model and its implementation with Scikit-learn

In this post, I shall go over TF-IDF Model and its implementation with Scikit-learn. read more...

November 30th, 2021

TFIDF from scratch in pure python without using sklearn

Here I shall implementing TFIDF from scratch in pure python without using sklearn or any other similar packages read more...

November 28th, 2021

Why Platt Scaling and implementation from scratch

Platt Scaling (PS) is probably the most prevailing parametric calibration method. It aims to train a sigmoid function to map the original outputs from a classifier to calibrated probabilities. read more...

November 26th, 2021

Why Bootstrapping is useful and Implementation of Bootstrap Sampling in Random Forests

Bootstrapping resamples the original dataset with replacement many thousands of times to create simulated datasets. This process involves drawing random samples from the original dataset. read more...

November 23rd, 2021

What K-Fold Cross Validation really is in Machine Learning in simple terms

k-fold cross-validation is one of the most popular strategies widely used by data scientists. It is a data partitioning strategy so that you can effectively use your dataset to build a more generalized model. read more...

November 20th, 2021

Neural Network — Implementing Backpropagation using the Chain Rule

In this post, I will go over the mathematical need and the derivation of Chain Rule in a Backpropagation process. read more...

November 18th, 2021

Constant-Q transform with Gravitational Waves Data

The constant-Q transform transforms a data series to the frequency domain. It is related to the Fourier transform. read more...

November 15th, 2021

All the basic Matrix Algebra you will need in Data Science

Most used Matrix Maths in a Nutshell read more...

November 13th, 2021

Batch Normalization in Deep Learning

A batch normalization layer calculates the mean and standard deviation of each of its input channels across the batch and normalizes by subtracting the mean and dividing by the standard deviation. read more...

November 10th, 2021

Why F1 Score uses Harmonic Mean and not an Arithmetic Mean

The F1 score is the harmonic mean of precision and recall, taking both metrics into account read more...

November 8th, 2021

Why we do log transformation of variables and interpretation of Logloss

Original number = x and Log Transformed number x=log(x) read more...

November 6th, 2021

Location-Sensitive-Hashing-for-cosine-similarity

What is cosine distance and cosine similarity read more...

November 4th, 2021

Euclidean Distance and Normalization of a Vector

Euclidean distance is the shortest distance between two points in an N-dimensional space also known as Euclidean space. read more...

November 4th, 2021

Data Representations For Neural Networks - Understanding Tensor, Vector and Scaler

At its core, a tensor is a container for data — almost always numerical data. read more...

November 2nd, 2021

Axes and Dimensions in Numpy and Pandas Array

Understanding the shape and Dimension will be one of the most crucial thing in Machine Learning and Deep Learning Project. This blog makes it clear. read more...

November 1st, 2021

Vectorizing Gradient Descent — Multivariate Linear Regression and Python implementation

Vectorized Gradient-Descent formulae for the Cost function of the for Matrix form of training-data Equations. read more...

October 25th, 2021

Understanding Bias-Variance Trade-off

In this post I shall discuss the concept around and the Mathematics behind the below formulation of Bias-Variance Tradeoff. read more...

October 19th, 2021

Fundamentals of Multivariate Calculus for DataScience and Machine Learning

as soon as you need to implement multi-variate Linear Regression, you hit multivariate-calculus which is what you will have to use to derive the Gradient of a set of multi-variate Linear Equations i.e. Derivative of a Matrix. read more...

October 15th, 2021

Concepts of Differential Calculus for understanding the derivation of Gradient Descent in Linear Regression

The most fundamental definition of Derivative can be stated as — derivative measures the steepness of the graph of a function at some particular point on the graph. read more...

October 9th, 2021

Discrete vs Continuous Probability Distributions in context of Data Science

Discrete and Continuos Random Variable and related Probabilities read more...

October 6th, 2021

yfinance to get historical financials from free and Tesla Stock Prediction

Using yfinance open source library access great amount of historical financial data for Free read more...

October 6th, 2021

The mean, the median, and mode of a Random Variable

The mean is called a measure of central tendency because it tells us something about the center of a distribution, specifically its center. read more...

October 4th, 2021

BitCoin Price Prediction using Deep Learning (LSTM Model)

Using the historical data, I will implement a recurrent neural netwok using LSTM (Long short-term memory) layers to predict the trend of cryptocurrency values in the future. read more...

September 30th, 2021

Moving Averages with Bitcoin Historical Data

Moving averages are one of the most often-cited data-parameter in the space of Stock market trading, technical analysis of market and is extremely useful for forecasting long-term trends. read more...

September 29th, 2021

Categorical Data Handling in Machine Learning Projects

Typically, any data attribute which is categorical in nature represents discrete values which belong to a specific finite set of categories or classes. read more...

September 26th, 2021

Tensorflow-Keras Custom Layers and Fundamental Building Blocks For Training Model

In TensorFlow, we can define custom data augmentations(e.g. mixup, cut mix) as a custom layer using subclassing, which I will talk about in this blog read more...

September 23rd, 2021

Tensorflow Mixed Precision Training — CIFAR10 dataset Implementation

Mixed precision training is a technique used in training a large neural network where the model’s parameter are stored in different datatype precision (FP16 vs FP32 vs FP64). It offers significant performance and computational boost by training large neural networks in lower precision formats. read more...

September 21st, 2021

Predict Transaction Value of a Bank’s Potential Customers — Kaggle Santander Competition

For this Project — I applied LightGBM + XGBoost + CatBoost Ensemble to achieve a Top 11% score in the Kaggle Competition ( Santander Value… read more...

September 18th, 2021

Python Image Processing - Grayscale Conversion and Brightness Increase from Scratch

Understanding from scratch how to convert an image to Grayscale and increase the Brightness of an image, working at the pixel level and some related fundamentals of image processing with Python read more...

September 13th, 2021

Gravitational Waves Kaggle Competition Part-2 - EDA and Baseline TensorFlow / Keras Model

In this Kaggle Competition, data-science helping to find Gravitational Waves by building models to filter out noises from data-streams read more...

September 12th, 2021

Gravitational Waves Detection - Kaggle Competition Part-1

The great Kaggle Competition for G2Net Gravitational Wave Detection. Here, I shall go through the fundamental introduction on Gravitational waves, and some related concepts required for this competition. read more...

September 8th, 2021

Microsoft Malware Detection Kaggle Challenge

Here I apply a large number of Feature Engineering to extract features from the 500GB dataset of Microsoft Malware Classification Kaggle Competiton and then apply XGBoost to achieve a LogLoss of 0.007. read more...

September 5th, 2021

Implementing Stochastic Gradient Descent Classifier with Logloss and L2 regularization without sklearn

In SGD while selecting data points at each step to calculate the derivatives. SGD randomly picks one data point from the whole data set at each iteration to reduce the computations enormously read more...

September 1st, 2021

Implementing RandomSearchCV from scratch (without scikit-learn)

Random Search sets up a grid of hyperparameter values and selects random combinations to train the model and score. read more...

August 30th, 2021

Implementing Custom GridSearchCV from scratch (without scikit-learn)

Implementing Custom GridSearchCV without scikit-learn. read more...

August 24th, 2021

Probability Problem-Solution series for Data Science -Part-1

A series solving some fundamental Probability Problems in the context of DataScience and Machine Learning read more...