Generative Adversarial Networks – An Experiment with Training Improvement Techniques (in Tensorflow)

Well, first: even if you're interested in Deep Learning but don't bother reading this post at all, I recommend you take a look at the Deep Learning Nanodegree offered by Udacity. It looks like a well-designed series of courses that covers all major aspects of deep learning.


Generative Adversarial Networks (GANs) have become one of the most popular advancements in deep learning. The method is unique in that it pairs a generative model, which produces synthetic data from noise, with a discriminative model that tries to distinguish the synthetic data from real data. For example, a well-trained GAN could generate images that look far more realistic than those produced by other deep learning models. Beyond image generation, generative models are also useful for tasks such as predicting rare events, as GANs can be used to enlarge the target sample size so that a predictive model yields better performance. This post is not intended to serve as an introduction to GANs, as there are many great articles covering that already. For instance, this article uses a simple example to demonstrate an implementation of GANs in Tensorflow. The objective is to train a model that learns to generate a Gaussian distribution like this:

As a vanilla GAN model is hard to train, the article explored ways to improve training, one highlight being minibatch discrimination (the article explains it well). Since the article was published, there has been further development in GAN training techniques, so I took a look at a couple of them and applied them to this simple example to see how the new models perform. All the new code is written on top of the original code here.

Original Models

The original article (which you’re recommended to read first) showed examples of generated distributions by models with or without minibatch technique. I re-ran the two model training processes on my laptop:

No minibatch


The results look slightly different from the article's. Minibatch discrimination is supposed to make the model better at generating a similar distribution, but here it didn't work as well as intended.
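For reference, the minibatch layer appends per-example "closeness to the rest of the batch" features, so the discriminator can spot a generator that collapses onto a few points. A NumPy-only sketch of minibatch discrimination (following Salimans et al.; the projection matrix `T` is a stand-in for the learned linear layer in the original code):

```python
import numpy as np

def minibatch_features(h, T, num_kernels=5, kernel_dim=3):
    """Minibatch discrimination, NumPy sketch.
    h: (batch, d) hidden activations; T: (d, num_kernels * kernel_dim) projection."""
    M = (h @ T).reshape(-1, num_kernels, kernel_dim)   # (batch, K, D)
    diffs = M[:, None, :, :] - M[None, :, :, :]        # pairwise differences, (batch, batch, K, D)
    l1 = np.abs(diffs).sum(axis=3)                     # L1 distances, (batch, batch, K)
    feats = np.exp(-l1).sum(axis=1)                    # similarity to the rest of the batch, (batch, K)
    return np.concatenate([h, feats], axis=1)          # append features to the input

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))
T = rng.normal(size=(8, 15))
out = minibatch_features(h, T)
print(out.shape)  # (4, 13): 8 original units + 5 minibatch features
```

If every sample in the batch is identical, the minibatch features are maximal, which is exactly the signal the discriminator uses to penalize mode collapse.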

Adding Noise

As explained here and here, adding Gaussian noise (with zero mean and tiny variance) to the inputs of the discriminative network, i.e. both the synthetic points produced by the generator and the points sampled from the real Gaussian distribution, spreads out the two distributions so that they overlap more, which makes training easier. I tweaked the original code so that the class DataDistribution can now be used not only to sample data from the target distribution, but also to sample noise, by setting mu = 0 and sigma = 0.001 (or some other small number):

import numpy as np

class DataDistribution(object):
    def __init__(self, mu=4, sigma=0.5): = mu
        self.sigma = sigma

    def sample(self, N):
        samples = np.random.normal(, self.sigma, N)
        return samples

In the train method, we can now add noise to the inputs of the discriminator:

# noise = DataDistribution(mu=0, sigma=0.001)
for step in range(params.num_steps + 1):
    # update discriminator
    x = data.sample(params.batch_size)
    z = gen.sample(params.batch_size)
    # sample noise to add to both discriminator inputs
    n_x = noise.sample(params.batch_size)
    n_z = noise.sample(params.batch_size)
    loss_d, _ =[model.loss_d, model.opt_d], {
        model.x: np.reshape(x + n_x, (params.batch_size, 1)),
        model.z: np.reshape(z + n_z, (params.batch_size, 1))
    })

The results are as follows:

No minibatch, added noise (std = 0.001)

Minibatch, added noise (std = 0.001)

The model without minibatch is able to mimic the bell shape pretty well, but notice that it also leaves a long tail to the left, and the generator's training loss actually increased relative to the first example. The minibatch model does look much improved over the first example: its output distribution is no longer so tightly centered around the mean.

Feature Matching

This post explains pretty well how feature matching works in training GANs. The basic idea is that, instead of minimizing the generator's loss using only the discriminator's output layer, it uses information from a hidden layer of the discriminator for better optimization. To implement this, we need to expose a hidden layer (h2) of the discriminator:

def discriminator(input, h_dim, minibatch_layer=True):
    h0 = tf.nn.relu(linear(input, h_dim * 2, 'd0'))
    h1 = tf.nn.relu(linear(h0, h_dim * 2, 'd1'))
    # without the minibatch layer, the discriminator needs an additional layer
    # to have enough capacity to separate the two distributions correctly
    if minibatch_layer:
        h2 = minibatch(h1)
        h2 = tf.nn.relu(linear(h1, h_dim * 2, scope='d2'))

    h3 = tf.sigmoid(linear(h2, 1, scope='d3'))
    # also return h2 so the generator's feature-matching loss can use it
    return h3, h2

h2 will then be fed into the generator's loss function:

# Original loss function: self.loss_g = tf.reduce_mean(-log(self.D2))
self.loss_g = tf.sqrt(tf.reduce_sum(tf.pow(self.D1_h2 - self.D2_h2, 2)))

where D1_h2 and D2_h2 are the h2 activations of the discriminator when it is fed real samples and the generator's output, respectively. Here are the results:
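The loss above is just the Euclidean distance between the two sets of hidden activations, which is easy to verify in plain NumPy (the random matrices below are stand-ins for the actual h2 activations):

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-ins for the discriminator's h2 activations: batch of 12, 8 hidden units
D1_h2 = rng.normal(size=(12, 8))  # activations on real samples
D2_h2 = rng.normal(size=(12, 8))  # activations on generator output

# same formula as the TensorFlow loss above:
# sqrt of the sum of squared element-wise differences
loss_g = np.sqrt(np.sum((D1_h2 - D2_h2) ** 2))
print(loss_g)
```

The loss is zero exactly when the generator induces the same hidden-layer statistics as the real data, which is the feature-matching objective: match internal discriminator features rather than fool the final sigmoid output directly.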

No minibatch, added noise (std = 0.001), feature matching

Minibatch, added noise (std = 0.001), feature matching

The model without minibatch improved on the last attempt, as the fat tail has disappeared, though it may not be an obvious improvement over the vanilla method. In contrast, the model with minibatch and added noise did not perform well.


The experiments yielded mixed results, but this really is just a toy project. The updated code can be found here. If you're interested in learning more about GANs, the articles linked in this post are all really good starting points, provided you have prior knowledge of traditional deep networks:

GANs introduction and example: http://
Improvement techniques: http://
Adding noise: http://
Feature matching: http://
