Generative Adversarial Networks – An Experiment with Training Improvement Techniques (in Tensorflow)

First of all, if you’re interested in Deep Learning but don’t feel like reading this post, I recommend taking a look at the deep learning nano degree offered by Udacity. It looks like a well-designed series of courses that covers all the major aspects of deep learning.

Introduction

Generative Adversarial Networks (GANs) have become one of the most popular advancements in deep learning. The approach is unique in that a generative model learns to produce synthetic data from noise, while a discriminative model tries to distinguish the synthetic data from real data. For example, a well-trained GAN for image generation can produce images that look far more realistic than those generated by other deep learning models. Beyond images, generative models are also useful for tasks such as predicting rare events, since GANs can be used to enlarge the target sample size so that a predictive model yields better performance. This post is not intended to serve as an introduction to GANs, as there are many great articles covering them. For instance, this article uses a simple example to demonstrate the implementation of GANs in Tensorflow. The objective is to train a model that learns to generate a Gaussian distribution like this:
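
The target here is a Gaussian with mean 4 and standard deviation 0.5 (the defaults used in the code below). As a quick, minimal sketch of what the generator should learn to reproduce (assuming matplotlib is available for plotting):


import numpy as np
import matplotlib.pyplot as plt

# Draw samples from the target distribution (mu = 4, sigma = 0.5, matching the
# defaults of DataDistribution further down) and plot a histogram of them.
samples = np.random.normal(4, 0.5, 10000)
plt.hist(samples, bins=100)
plt.xlabel('value')
plt.ylabel('count')
plt.show()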

As a vanilla GAN model is hard to train, the article explored ways to improve training, one highlight being minibatch discrimination (the article explains it well). Since the article was published, there has been further development in GAN training techniques, so I took a look at a couple of techniques and applied them to this simple example to see how the new models perform. All the new code is written based on the original code here.
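
For reference, the vanilla setup in the original code pits the generator and discriminator against each other with the standard GAN losses. A rough sketch (log is a small numerically-safe wrapper around tf.log in that code, and D1 and D2 are the discriminator outputs on real and generated samples; the names follow the original code, but exact details may differ):


# Sketch based on the original code; exact names and details may differ.
# Discriminator: push D1 (real) towards 1 and D2 (fake) towards 0.
self.loss_d = tf.reduce_mean(-log(self.D1) - log(1 - self.D2))
# Generator: push D2 (fake) towards 1, i.e. fool the discriminator.
self.loss_g = tf.reduce_mean(-log(self.D2))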

Original Models

The original article (which you’re recommended to read first) showed examples of distributions generated by models with and without the minibatch technique. I re-ran the two training processes on my laptop:

No minibatch

Minibatch

The results look slightly different from the article’s. Minibatch discrimination is supposed to make the model better at reproducing the target distribution, but here it didn’t work quite as intended.
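
For reference, the minibatch-discrimination layer in the original code looks roughly like the sketch below. It projects each sample into several low-dimensional kernels, computes L1 distances between every pair of samples in the batch, and appends these closeness features to the discriminator’s input, which helps it spot a generator that collapses onto a single mode. (The linear helper is the fully connected layer used throughout the original code; exact hyperparameters may differ.)


# Roughly the minibatch discrimination layer from the original code (sketch).
def minibatch(input, num_kernels=5, kernel_dim=3):
    # Project each sample into num_kernels vectors of size kernel_dim.
    x = linear(input, num_kernels * kernel_dim, scope='minibatch', stddev=0.02)
    activation = tf.reshape(x, (-1, num_kernels, kernel_dim))
    # Pairwise L1 distances between all samples in the batch, per kernel.
    diffs = tf.expand_dims(activation, 3) - \
        tf.expand_dims(tf.transpose(activation, [1, 2, 0]), 0)
    abs_diffs = tf.reduce_sum(tf.abs(diffs), 2)
    # Similarity features: large when many other samples in the batch are close by.
    minibatch_features = tf.reduce_sum(tf.exp(-abs_diffs), 2)
    return tf.concat([input, minibatch_features], 1)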

Adding Noise

As explained here and here, adding Gaussian noise (with zero mean and tiny variance) to the inputs of the discriminative network, i.e. both the synthetic data points produced by the generative model and the data points sampled from the real Gaussian distribution, spreads the generator’s output and the real distribution out and creates more overlap between them, which makes training easier. I tweaked the original code so that the class DataDistribution can be used not only to sample data from the target distribution, but also to sample noise by setting mu = 0 and sigma = 0.001 (or some other small number):


import numpy as np


class DataDistribution(object):
    def __init__(self, mu=4, sigma=0.5):
        self.mu = mu
        self.sigma = sigma

    def sample(self, N):
        # Draw N sorted samples from a Gaussian with the given mean and std.
        samples = np.random.normal(self.mu, self.sigma, N)
        samples.sort()
        return samples
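
With this in place, the training code needs two instances of the class: one for the real data and one for the noise. A hypothetical setup matching the variable names used in the loop below:


data = DataDistribution()                    # real target distribution (mu=4, sigma=0.5)
noise = DataDistribution(mu=0, sigma=0.001)  # tiny Gaussian noise added to both inputs (hypothetical names)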

In the train method, we can now add noise to the inputs of the discriminator:


for step in range(params.num_steps + 1):
    # update discriminator
    x = data.sample(params.batch_size)
    z = gen.sample(params.batch_size)
    # sample Gaussian noise and add it to both discriminator inputs
    n_x = noise.sample(params.batch_size)
    n_z = noise.sample(params.batch_size)
    loss_d, _ = session.run([model.loss_d, model.opt_d], {
        model.x: np.reshape(x + n_x, (params.batch_size, 1)),
        model.z: np.reshape(z + n_z, (params.batch_size, 1))
    })
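
The generator update that follows inside the same loop is left unchanged; in the original code it looks roughly like this (shown here only for context; exact details may differ):


    # update generator (unchanged from the original code, shown for context)
    z = gen.sample(params.batch_size)
    loss_g, _ = session.run([model.loss_g, model.opt_g], {
        model.z: np.reshape(z, (params.batch_size, 1))
    })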

The results are as follows:

No minibatch, added noise (std = 0.001)

Minibatch, added noise (std = 0.001)

The model without minibatch mimics the bell shape pretty well, but notice that it also leaves a long tail to the left, and the generator’s training loss actually increased compared with the first example. The minibatch model does look to have improved a lot over the first example: its output distribution is no longer as tightly concentrated around the mean.

Feature Matching

This post explains pretty well how feature matching works in training GANs. The basic idea is that, instead of optimizing the generator directly against the discriminator’s final output, the generator is trained to match the activations of an intermediate (hidden) layer of the discriminator between real and generated data, which gives it a richer training signal. To implement this, we need to expose a hidden layer (h2) of the discriminator:


def discriminator(input, h_dim, minibatch_layer=True):
    h0 = tf.nn.relu(linear(input, h_dim * 2, 'd0'))
    h1 = tf.nn.relu(linear(h0, h_dim * 2, 'd1'))
    print("h0:{}".format(h0.shape))
    print("h1:{}".format(h1.shape))
    # without the minibatch layer, the discriminator needs an additional layer
    # to have enough capacity to separate the two distributions correctly
    if minibatch_layer:
        h2 = minibatch(h1)
    else:
        h2 = tf.nn.relu(linear(h1, h_dim * 2, scope='d2'))

    h3 = tf.sigmoid(linear(h2, 1, scope='d3'))
    print("h3:{}".format(h3.shape))
    return h3, h2

h2 will be fed into the generator’s loss function:


# Original loss function: self.loss_g = tf.reduce_mean(-log(self.D2))
self.loss_g = tf.sqrt(tf.reduce_sum(tf.pow(self.D1_h2 - self.D2_h2, 2)))
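
Since the discriminator function now returns h2 alongside its output, the model constructor has to keep the h2 tensors of both discriminator copies around. A minimal sketch, assuming the variable-scope sharing pattern of the original code (names such as self.x, self.G, and h_dim are taken from that code but may differ in detail):


# Sketch only: wiring assumed to follow the original code's variable sharing.
with tf.variable_scope('D') as scope:
    # discriminator on real samples
    self.D1, self.D1_h2 = discriminator(self.x, h_dim, minibatch_layer)
    # reuse the same weights for the discriminator on generated samples
    scope.reuse_variables()
    self.D2, self.D2_h2 = discriminator(self.G, h_dim, minibatch_layer)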

D1_h2 and D2_h2 are the h2 activations from the two discriminator copies shown in the sketch above, one fed real samples and the other fed the generator’s output. Here are the results:

No minibatch, added noise (std = 0.001), feature matching

Minibatch, added noise (std = 0.001), feature matching

The model without minibatch improved over the last attempt, as you can tell the long tail has disappeared, though it may not be an obvious improvement over the vanilla method. In contrast, the model with minibatch, added noise, and feature matching did not perform well.

Conclusion

The experiments yielded mixed results, but this really is just a toy project. The updated code can be found here. If you’re interested in learning more about GANs, the articles linked in this post are all really good starting points, provided you have prior knowledge of traditional deep neural networks:

GANs introduction and example: http://blog.aylien.com/introduction-generative-adversarial-networks-code-tensorflow/
Improvement techniques: http://blog.aylien.com/introduction-generative-adversarial-networks-code-tensorflow/
Adding noise: http://blog.aylien.com/introduction-generative-adversarial-networks-code-tensorflow/
Feature matching: http://blog.aylien.com/introduction-generative-adversarial-networks-code-tensorflow/
