07 / 09, 2023
Translation
A TensorFlow pix2pix conditional GAN for paired image-to-image translation.
Role
Researcher, cGAN architecture, training loop, loss design, evaluation
Stack
TensorFlow · U-Net generator · PatchGAN discriminator · TensorBoard · V100 GPU · Facades dataset
The Problem
The README describes pix2pix as a conditional GAN that learns a mapping from input images to corresponding output images, following the paper 'Image-to-Image Translation with Conditional Adversarial Networks.'
The same architecture supports label-map-to-photo synthesis, grayscale colorization, map-to-aerial-photo conversion, and sketch-to-photo generation, so the learning objective is structured translation between paired images rather than generic image generation.
The Architecture
01 Modified U-Net generator
The generator is an encoder-decoder U-Net. Each downsampling block applies convolution, batch normalization, and Leaky ReLU; each upsampling block applies transposed convolution, batch normalization, and ReLU, with dropout in the first decoder blocks. Skip connections pass encoder features to the matching decoder blocks.
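As a rough illustration of how the skip connections line up, the sketch below traces spatial sizes through the encoder and decoder in plain Python, with no TensorFlow required. It assumes a 256 x 256 input and eight stride-2 downsampling blocks, which matches the TensorFlow pix2pix tutorial but is not stated in this write-up. `unet_shape_trace` is a hypothetical helper, not part of the project code.

```python
def unet_shape_trace(size=256, depth=8):
    """Trace feature-map sizes through a U-Net built from stride-2 blocks.

    Assumptions (not from the write-up): a square input of `size` pixels and
    `depth` downsampling blocks, each halving the spatial size.
    """
    encoder = []
    for _ in range(depth):
        size //= 2                  # stride-2 convolution halves the map
        encoder.append(size)

    # Every encoder output except the bottleneck feeds a skip connection,
    # and the decoder consumes them deepest-first.
    skips = encoder[-2::-1]

    decoder = []
    for skip in skips:
        size *= 2                   # stride-2 transposed convolution doubles it
        if size != skip:
            raise ValueError("skip connection sizes do not line up")
        decoder.append(size)        # concatenated with the skip along channels

    output = size * 2               # last transposed convolution restores input size
    return encoder, decoder, output


encoder, decoder, output = unet_shape_trace()
print("encoder sizes:", encoder)
print("decoder sizes:", decoder)
print("output size:", output)
```

The point of the trace is that each decoder block sees a skip tensor of exactly its own spatial size, which is what lets fine input structure bypass the bottleneck.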
02 PatchGAN discriminator
The discriminator concatenates the input image with either the target or the generated image, then judges realism patch by patch. Its convolutional output is a 30 x 30 grid, and each cell in that grid classifies one 70 x 70 patch of the input as real or generated.
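The 70 x 70 and 30 x 30 figures can be checked with a little arithmetic. The sketch below assumes the layer layout used in the TensorFlow pix2pix tutorial (three 4 x 4 stride-2 convolutions followed by two 4 x 4 stride-1 convolutions, on a 256 x 256 input); that layout and the helper names are assumptions, since this write-up does not list the layers.

```python
import math

# (kernel, stride) per convolution. Assumed layout: three stride-2 blocks
# followed by two stride-1 convolutions.
PATCHGAN_LAYERS = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]


def receptive_field(layers):
    """Input pixels seen by one output cell, computed from the last layer back."""
    field = 1
    for kernel, stride in reversed(layers):
        field = (field - 1) * stride + kernel
    return field


def output_grid(size, layers):
    """Spatial size of the discriminator output for a square input.

    Assumes 'same' padding on strided layers and one pixel of zero padding
    per side before each stride-1 'valid' convolution.
    """
    for kernel, stride in layers:
        if stride > 1:
            size = math.ceil(size / stride)
        else:
            size = (size + 2) - kernel + 1
    return size


print(receptive_field(PATCHGAN_LAYERS))   # patch size seen by each output cell
print(output_grid(256, PATCHGAN_LAYERS))  # side length of the decision grid
```

Under these assumptions the receptive field works out to 70 pixels and the output grid to 30 cells per side, consistent with the description above.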
03 Step-based TensorFlow training
The training loop counts steps rather than epochs. It logs losses to TensorBoard, prints progress every 10 steps, shows generated images every 1,000 steps, and saves a checkpoint every 5,000 steps.
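A minimal sketch of that step-based schedule, in plain Python with the actual training step stubbed out. The intervals come from the description above; the 1-based step counter, the event names, and the `run` helper are illustrative assumptions rather than the project's code.

```python
from collections import Counter

PRINT_EVERY = 10
SHOW_IMAGES_EVERY = 1_000
CHECKPOINT_EVERY = 5_000


def events_for_step(step):
    """Return the housekeeping events due after completing a 1-based step."""
    events = []
    if step % PRINT_EVERY == 0:
        events.append("print_progress")
    if step % SHOW_IMAGES_EVERY == 0:
        events.append("show_images")
    if step % CHECKPOINT_EVERY == 0:
        events.append("checkpoint")
    return events


def run(total_steps, train_step=lambda step: None):
    """Drive the loop by step count and tally which events fired."""
    counts = Counter()
    for step in range(1, total_steps + 1):
        train_step(step)            # placeholder for one generator/discriminator update
        counts.update(events_for_step(step))
    return counts


counts = run(10_000)
print(dict(counts))
```

Keying everything off the step counter keeps logging and checkpoint cadence independent of dataset size, which is the practical benefit of not organizing the loop around epochs.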
Decisions that mattered
Combine adversarial and L1 losses
The generator's adversarial term is sigmoid cross-entropy between the discriminator's output on generated images and all-real labels. An L1 reconstruction loss against the target is added on top: total generator loss = GAN loss + LAMBDA x L1 loss, with LAMBDA = 100.
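To make the weighting concrete, here is a small plain-Python sketch of the combined generator loss on flat lists of floats. It mirrors the formula above but is not the project's TensorFlow code; the function names and the toy inputs are assumptions chosen for illustration.

```python
import math

LAMBDA = 100  # weight on the L1 term, as stated above


def sigmoid_cross_entropy(logit, label):
    """Numerically stable binary cross-entropy on a raw logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def generator_loss(disc_logits_on_generated, generated, target):
    """Return (total, gan, l1) for flat lists of floats."""
    # Adversarial term: the generator wants its outputs labelled real (1.0).
    gan_loss = mean(sigmoid_cross_entropy(x, 1.0) for x in disc_logits_on_generated)
    # Reconstruction term: mean absolute error against the paired target.
    l1_loss = mean(abs(t - g) for t, g in zip(target, generated))
    return gan_loss + LAMBDA * l1_loss, gan_loss, l1_loss


# Toy example: an undecided discriminator (logit 0) and one pixel off by 0.5.
total, gan, l1 = generator_loss([0.0, 0.0], generated=[0.5, 0.5], target=[0.5, 1.0])
print(total, gan, l1)
```

With LAMBDA at 100, even a modest per-pixel error outweighs the adversarial term, which is why the reconstruction loss dominates the total in this toy case.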
Watch generator and discriminator balance
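The README calls out a subtlety of reading GAN logs: if either the generator's GAN loss or the discriminator's loss gets very low, that model is dominating the other. A value of log(2), about 0.69, is the reference point for these losses, because it is the cross-entropy of a discriminator that is, on average, equally uncertain whether an image is real or generated.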
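The log(2) reference point falls straight out of the cross-entropy formula. The snippet below is a plain-Python illustration with made-up probabilities, not values taken from the training logs.

```python
import math

LOG2 = math.log(2)


def binary_cross_entropy(prob_real, label):
    """Cross-entropy for one prediction; label is 1 for real, 0 for generated."""
    return -(label * math.log(prob_real) + (1 - label) * math.log(1 - prob_real))


# A discriminator that outputs 0.5 cannot tell real from generated.
undecided = binary_cross_entropy(0.5, 1)
# A discriminator that is confidently right scores below log(2) ...
confident_right = binary_cross_entropy(0.9, 1)
# ... and one that is confidently wrong scores above it.
confident_wrong = binary_cross_entropy(0.1, 1)

print(LOG2, undecided, confident_right, confident_wrong)
```

A loss hovering near log(2) therefore means neither side is winning outright, while a loss far below it means that side is beating its opponent.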
Train long enough to see translation quality
The README shows Facades outputs after 200 epochs and notes roughly 15 seconds per epoch on a single V100 GPU, so a full run is short enough that iteration speed becomes part of the experiment.
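These figures fit together as simple arithmetic. The sketch below takes the 200 epochs and roughly 15 seconds per epoch from the paragraph above and the 80K steps listed under The Numbers. The derived values are back-of-envelope estimates, not measurements from the README.

```python
EPOCHS = 200
TOTAL_STEPS = 80_000
SECONDS_PER_EPOCH = 15  # approximate, single V100

steps_per_epoch = TOTAL_STEPS // EPOCHS
total_minutes = EPOCHS * SECONDS_PER_EPOCH / 60
ms_per_step = SECONDS_PER_EPOCH / steps_per_epoch * 1000

print(f"{steps_per_epoch} steps per epoch")
print(f"about {total_minutes:.0f} minutes for the full run")
print(f"about {ms_per_step:.1f} ms per training step")
```

That works out to 400 steps per epoch and a full run of about 50 minutes, at a little under 40 ms per step.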
The Numbers
200
training epochs
80K
training steps
~15s
per epoch on V100
70 x 70
PatchGAN receptive field
What it taught me
The important pix2pix trick is not adversarial realism alone: the L1 loss keeps generated images structurally faithful to the paired target.
GAN training has to be read as a two-player system, where healthy loss curves matter more than chasing one number down.