Translation

A TensorFlow pix2pix conditional GAN for paired image-to-image translation.

Role

Researcher, cGAN architecture, training loop, loss design, evaluation

Stack

TensorFlow · U-Net generator · PatchGAN discriminator · TensorBoard · V100 GPU · Facades dataset

Links

GitHub ↗

The Problem

The README describes pix2pix as a conditional GAN that learns to map input images to corresponding output images, following the paper 'Image-to-Image Translation with Conditional Adversarial Networks.'

The same architecture can support label-map-to-photo, grayscale-to-color, map-to-aerial-imagery, and sketch-to-photo tasks, so the learning objective is structured translation rather than generic image generation.

The Architecture

01 Modified U-Net generator

The generator is an encoder-decoder U-Net. Each downsampling block applies convolution, batch normalization, and Leaky ReLU. Each upsampling block applies transposed convolution and ReLU, with dropout in the first decoder blocks. Skip connections carry encoder features across to the matching decoder blocks.
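To make the encoder-decoder shape concrete, here is a small shape trace in plain Python. The 256 x 256 input, the filter counts, and the choice of three dropout blocks are assumptions taken from the common pix2pix configuration, not details confirmed by this write-up, and the function only tracks sizes rather than building a model.

```python
# Illustrative shape trace for a U-Net generator. The 256 x 256 input and the
# filter counts below are assumptions, not values taken from the project.
DOWN_FILTERS = [64, 128, 256, 512, 512, 512, 512, 512]  # assumed encoder widths
UP_FILTERS = [512, 512, 512, 512, 256, 128, 64]         # assumed decoder widths
DROPOUT_BLOCKS = 3                                      # assumed "first decoder blocks"


def trace_unet(size=256, out_channels=3):
    """Return (stage, spatial size, channels) tuples for each block."""
    trace, skips = [], []
    for i, filters in enumerate(DOWN_FILTERS):
        size //= 2  # a stride-2 convolution halves height and width
        trace.append((f"down{i + 1}", size, filters))
        skips.append(filters)
    # The bottleneck is not reused as a skip; pair the rest deepest-first.
    for i, (filters, skip) in enumerate(zip(UP_FILTERS, reversed(skips[:-1]))):
        size *= 2  # a stride-2 transposed convolution doubles height and width
        note = "+dropout" if i < DROPOUT_BLOCKS else ""
        # Concatenating the skip adds the encoder's channels to the decoder's.
        trace.append((f"up{i + 1}{note}", size, filters + skip))
    size *= 2  # final transposed convolution back to image resolution
    trace.append(("output", size, out_channels))
    return trace


for stage in trace_unet():
    print(stage)
```

Under these assumptions the image shrinks to a 1 x 1 bottleneck and grows back to 256 x 256, and each decoder block's channel count doubles where a skip connection is concatenated.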

02 PatchGAN discriminator

The discriminator concatenates the input image with either the target or the generated image, then judges each 70 x 70 patch as real or generated. Its convolutional PatchGAN output is a grid of roughly 30 x 30 patch decisions.
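The 70 x 70 and 30 x 30 figures can be reproduced with a little arithmetic. The sketch below assumes a layer stack of three stride-2 convolutions followed by two stride-1 convolutions, all with 4 x 4 kernels, and a 256 x 256 input. Those settings are assumptions based on the usual PatchGAN layout, not details stated in this write-up.

```python
# (kernel, stride) for each convolution in an assumed 70 x 70 PatchGAN:
# three stride-2 downsampling convolutions, then two stride-1 convolutions.
LAYERS = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]


def receptive_field(layers):
    """Walk backwards from one output unit to the input pixels it can see."""
    field = 1
    for kernel, stride in reversed(layers):
        field = (field - 1) * stride + kernel
    return field


def output_size(size=256):
    """Spatial size of the patch grid for an assumed 256 x 256 input."""
    for _ in range(3):
        size = -(-size // 2)       # "same" padding, stride 2: ceil(size / 2)
    for _ in range(2):
        size = (size + 2) - 4 + 1  # zero-pad 1 per side, then a 4 x 4 "valid" conv
    return size


print(receptive_field(LAYERS), output_size())  # 70 30
```

Each of the 30 x 30 output values therefore scores one overlapping 70 x 70 window of the concatenated input pair.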

03 Step-based TensorFlow training

Training is written around steps rather than epochs, logs losses to TensorBoard, prints progress every 10 steps, shows generated images every 1,000 steps, and checkpoints every 5,000 steps.
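The step-based schedule can be sketched as a plain Python loop. The names `actions_for`, `fit`, and `on_action` are hypothetical, the steps are assumed to be counted from 1, and `train_step` stands in for the real TensorFlow update, which would also write its losses to TensorBoard.

```python
PRINT_EVERY = 10          # progress output
SAMPLE_EVERY = 1_000      # show generated images
CHECKPOINT_EVERY = 5_000  # save a checkpoint


def actions_for(step):
    """Return the side effects due after completing a 1-indexed step."""
    actions = []
    if step % PRINT_EVERY == 0:
        actions.append("print")
    if step % SAMPLE_EVERY == 0:
        actions.append("sample")
    if step % CHECKPOINT_EVERY == 0:
        actions.append("checkpoint")
    return actions


def fit(train_step, batches, total_steps, on_action=print):
    """Run train_step on up to total_steps batches, firing scheduled actions."""
    for step, batch in zip(range(1, total_steps + 1), batches):
        losses = train_step(batch)  # stand-in for the real update and logging
        for action in actions_for(step):
            on_action(action, step, losses)


# Usage with a dummy train step that returns fixed losses.
fit(lambda batch: {"gen_total": 0.0, "disc": 0.0}, range(100), total_steps=30)
```

Counting steps instead of epochs keeps the schedule independent of dataset size, so the same intervals apply to any paired dataset.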

Decisions that mattered

Combine adversarial and L1 losses

The generator's adversarial term is the sigmoid cross-entropy of the discriminator's output on generated images, scored against real labels. An L1 reconstruction loss is added to it: total generator loss = GAN loss + LAMBDA × L1 loss, with LAMBDA = 100.
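As a minimal sketch of that combined loss, the function below works on flat Python lists instead of tensors. The helper names are hypothetical, and the project itself would compute the same quantities with TensorFlow operations.

```python
import math

LAMBDA = 100  # weight on the L1 term, as stated above


def sigmoid_cross_entropy(logit, label):
    """Numerically stable binary cross-entropy on a raw logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def generator_loss(disc_logits_on_fake, generated, target):
    """Return (total, gan, l1) over flat lists of logits and pixel values."""
    # Adversarial term: the generator wants every patch scored as real (label 1).
    gan = mean(sigmoid_cross_entropy(x, 1.0) for x in disc_logits_on_fake)
    # Reconstruction term: mean absolute error against the paired target.
    l1 = mean(abs(t - g) for t, g in zip(target, generated))
    return gan + LAMBDA * l1, gan, l1


total, gan, l1 = generator_loss([0.0, 0.0], [0.5, 0.5], [0.5, 0.0])
print(gan, l1, total)
```

With LAMBDA at 100, even a small per-pixel error outweighs the adversarial term, which is why the L1 loss dominates early training.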

Watch generator and discriminator balance

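The README calls out a subtlety of reading GAN logs: if either model's loss gets too low, that model may be dominating the other. The value log(2) ≈ 0.69 serves as the reference point, since it is the cross-entropy of a discriminator that is equally uncertain between real and generated.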
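The log(2) reference point falls straight out of the cross-entropy formula. The sketch below assumes the discriminator loss is the sum of a real-pair term and a generated-pair term, which is the usual formulation but is not spelled out in this write-up. The function names are hypothetical.

```python
import math


def bce(prob, label):
    """Binary cross-entropy for one predicted probability."""
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))


def discriminator_loss(prob_real, prob_fake):
    """Sum of the real-pair loss (label 1) and generated-pair loss (label 0)."""
    return bce(prob_real, 1) + bce(prob_fake, 0)


print(round(bce(0.5, 1), 4))                     # 0.6931, i.e. log(2)
print(round(discriminator_loss(0.5, 0.5), 4))    # 1.3863, i.e. 2 * log(2)
print(round(discriminator_loss(0.99, 0.01), 4))  # 0.0201, discriminator dominating
```

A loss hovering near log(2) per term suggests the two models are evenly matched, while a value near zero on one side suggests that side is winning.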

Train long enough to see translation quality

The README shows Facades outputs after 200 epochs and notes roughly 15 seconds per epoch on a single V100 GPU, which makes iteration speed part of the experiment.
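A quick back-of-the-envelope check ties these figures together. It assumes that the 80K training steps listed under The Numbers and the 200 epochs describe the same run. The resulting 400 steps per epoch would be consistent with one image per step on a 400-image training split, which is also an assumption.

```python
EPOCHS = 200
TOTAL_STEPS = 80_000    # from the figures in this write-up
SECONDS_PER_EPOCH = 15  # approximate figure quoted above

steps_per_epoch = TOTAL_STEPS // EPOCHS
total_minutes = EPOCHS * SECONDS_PER_EPOCH / 60
ms_per_step = 1000 * SECONDS_PER_EPOCH / steps_per_epoch

print(steps_per_epoch, total_minutes, ms_per_step)  # 400 50.0 37.5
```

At about 50 minutes for a full run, several training configurations can be compared in a single working day.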

The Numbers

200

training epochs

80K

training steps

~15s

per epoch on V100

70x70

PatchGAN receptive field

What it taught me

The important pix2pix trick is not only adversarial realism; L1 loss keeps generated images structurally faithful to the paired target.

GAN training has to be read as a two-player system, where healthy loss curves matter more than chasing one number down.