“Generative Adversarial Nets” Summarized


https://arxiv.org/abs/1406.2661 (2014-6-10)

1. Generative Model vs. Discriminative Model

Generative model G: tries to simulate the data distribution

Discriminative model D: estimates the probability that a sample came from the real data rather than G

The generative model and the discriminative model try to beat each other and improve in the process; ideally, the trained generative model ends up perfectly replicating the true data distribution, and the discriminative model outputs 0.5 everywhere.

(Figure: the generator G vs. the discriminator D)

2. Training Objective

\[\min_G \max_D V(D,G) = \mathbb{E}_{data \sim p_{DATA}}[\log D(data)] + \mathbb{E}_{z \sim p_Z}[\log(1 - D(G(z)))]\]
  • $z$: noise (vector), ~$p_{Z}(z)$
  • $data$: real data point (vector) ~$p_{DATA}(data)$
  • $x$: data point (vector)
  • $G(z; \theta_g)$: differentiable function represented by a multilayer perceptron with parameters $\theta_g$; it maps noise to data space
  • $D(x; \theta_d)$: function that returns probability that the input $x$ came from the real data
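
As a concrete reading of this objective, here is a minimal sketch (PyTorch; `D`, `G`, and the batches are assumptions — a discriminator module returning probabilities, a generator module, and tensors) of a Monte-Carlo estimate of $V(D, G)$ on a minibatch:

```python
import torch

def value_function(D, G, real_batch, noise_batch, eps=1e-8):
    """Minibatch estimate of V(D, G) = E_data[log D(x)] + E_z[log(1 - D(G(z)))]."""
    # First term: discriminator's log-probability that real samples are real.
    real_term = torch.log(D(real_batch) + eps).mean()
    # Second term: log-probability that generated samples are (correctly) judged fake.
    fake_term = torch.log(1.0 - D(G(noise_batch)) + eps).mean()
    return real_term + fake_term
```

D tries to make this value large, while G tries to make it small, which is the two-player minimax game described above.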

First, for any given generator G, we want to find the D that maximizes $V(D, G)$. Since

\[\int_z p_Z(z)\log(1 - D(G(z)))\,dz = \int_g p_G(g)\log(1 - D(g))\,dg,\]

we can write

\[V(D,G) = \int_x \big[p_{DATA}(x)\log D(x) + p_G(x)\log(1 - D(x))\big]\,dx\]

where the support of $x$ is $Supp(p_{DATA}) \cup Supp(p_G)$.

Maximizing this integral is equivalent to maximizing the integrand pointwise for each $x$: for fixed $a = p_{DATA}(x)$ and $b = p_G(x)$, the function $a\log(y) + b\log(1-y)$ attains its maximum on $[0, 1]$ at $y = \frac{a}{a+b}$. So we get

\[D^*(x) = \frac{p_{DATA}(x)}{p_{DATA}(x) + p_G(x)}\]
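
A quick numerical sanity check of this pointwise maximizer (NumPy; the density values $a$ and $b$ are just illustrative numbers, not from the paper):

```python
import numpy as np

a, b = 0.3, 0.7                  # hypothetical values of p_DATA(x) and p_G(x)
y = np.linspace(1e-4, 1 - 1e-4, 100_000)
integrand = a * np.log(y) + b * np.log(1 - y)

print(y[np.argmax(integrand)])   # ~= 0.3, the numerically found maximizer
print(a / (a + b))               # 0.3, the closed-form D*(x)
```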

Now we can define

\[C(G) = \max_D V(G,D) = \mathbb{E}_{x \sim p_{DATA}}[\log D^*(x)] + \mathbb{E}_{x \sim p_G}[\log(1 - D^*(x))]\]

and find the G that minimizes this value.

The equation can be rewritten as follows

\[C(G) = -\log(4) + 2 \cdot JSD(p_{DATA}\,\|\,p_G)\]
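
Following the paper, the rewrite comes from substituting $D^*$ into $C(G)$, adding and subtracting $\log 2$ in each term, and recognizing the two resulting KL divergences:

\[C(G) = \int_x p_{DATA}(x)\log\frac{p_{DATA}(x)}{p_{DATA}(x) + p_G(x)} + p_G(x)\log\frac{p_G(x)}{p_{DATA}(x) + p_G(x)}\,dx\]

\[= -\log(4) + KL\Big(p_{DATA}\,\Big\|\,\frac{p_{DATA} + p_G}{2}\Big) + KL\Big(p_G\,\Big\|\,\frac{p_{DATA} + p_G}{2}\Big) = -\log(4) + 2 \cdot JSD(p_{DATA}\,\|\,p_G)\]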

Since the Jensen-Shannon divergence between two distributions is always non-negative and zero only when they are equal, $C^* = -\log(4)$ is the global minimum of $C(G)$, and the only solution is $p_G = p_{DATA}$. So we can say that our training objective, when optimized perfectly, succeeds in making the generative model perfectly replicate the true data-generating process.

3. Algorithm

(Figure: Algorithm 1 from the paper — minibatch stochastic gradient descent training of GANs, alternating $k$ discriminator updates with one generator update)
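
Below is a minimal sketch of this training procedure (PyTorch; `D`, `G`, `real_loader`, `noise_dim`, and the hyperparameters are hypothetical placeholders, and the paper leaves the exact gradient-based optimizer open): alternate $k$ ascent steps on $V$ for the discriminator with one descent step for the generator.

```python
import torch

def train_gan(D, G, real_loader, noise_dim, d_steps=1, epochs=1, lr=2e-4, eps=1e-8):
    opt_d = torch.optim.SGD(D.parameters(), lr=lr)
    opt_g = torch.optim.SGD(G.parameters(), lr=lr)
    for _ in range(epochs):
        for real in real_loader:
            # k discriminator updates: ascend V, i.e. minimize -V on the minibatch.
            for _ in range(d_steps):
                z = torch.randn(real.size(0), noise_dim)
                d_loss = -(torch.log(D(real) + eps).mean()
                           + torch.log(1 - D(G(z).detach()) + eps).mean())
                opt_d.zero_grad()
                d_loss.backward()
                opt_d.step()
            # One generator update: descend E_z[log(1 - D(G(z)))].
            z = torch.randn(real.size(0), noise_dim)
            g_loss = torch.log(1 - D(G(z)) + eps).mean()
            opt_g.zero_grad()
            g_loss.backward()
            opt_g.step()
```

Only backpropagation is needed to obtain both gradients, which is one of the advantages listed below.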

4. Experiment

The authors estimated the probability of the test-set data under $p_G$ by fitting a Gaussian Parzen window to samples generated with G and reporting the log-likelihood of the test set under this distribution. The higher the number, the better G simulates real-world samples.
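
A rough sketch of this evaluation metric (NumPy/SciPy; the function and variable names are mine, and the bandwidth $\sigma$ is assumed to be chosen by cross-validation on a held-out set, as the paper describes):

```python
import numpy as np
from scipy.special import logsumexp

def parzen_log_likelihood(test_x, gen_samples, sigma):
    """Mean log p_G(test_x) under a Gaussian Parzen window fit to gen_samples."""
    n, d = gen_samples.shape
    # Squared distances between every test point and every generated sample.
    diffs = test_x[:, None, :] - gen_samples[None, :, :]
    sq_dist = (diffs ** 2).sum(-1)
    # log of (1/n) * sum_i N(test_x | gen_i, sigma^2 I), computed stably.
    log_kernel = -sq_dist / (2 * sigma ** 2)
    log_norm = np.log(n) + 0.5 * d * np.log(2 * np.pi * sigma ** 2)
    return (logsumexp(log_kernel, axis=1) - log_norm).mean()
```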

(Figure: Parzen window-based log-likelihood estimates reported in the paper)

5. Advantages & Disadvantages

1. Disadvantages

  • There is no explicit representation of $p_G(x)$
  • D must be kept well synchronized with G during training; if not, "the Helvetica scenario" happens, in which G collapses too many values of $z$ to the same value of $x$

2. Advantages

  • Markov chains are never needed
  • Only backprop is used to obtain gradients
  • No inference is needed during learning
  • Wide variety of functions can be incorporated into the model
  • It can represent very sharp distributions, while methods based on Markov chains require that the distribution be somewhat blurry
