Pix2Pix Implementation for Semantic-to-Real Image Translation

This repository contains an implementation of the Pix2Pix model using PyTorch. The Pix2Pix model leverages Conditional Generative Adversarial Networks (CGANs) to learn a mapping from an input image (e.g., a semantic segmentation map) to a corresponding output image (e.g., a photorealistic cityscape). This approach was first introduced in the paper:

Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros (2017). Image-to-Image Translation with Conditional Adversarial Networks. CVPR.

Table of Contents

  • Introduction
  • Theoretical Background
  • Dataset
  • Installation & Requirements
  • Usage
  • Hyperparameters and Settings
  • Results
  • References

Introduction

This project implements Pix2Pix to translate segmented images of the kind used by self-driving cars (e.g., Cityscapes label maps) into realistic images. Such a model can aid in tasks like data augmentation, improved visualization, and sim-to-real adaptation in autonomous driving pipelines.

Theoretical Background

Conditional GANs

Unlike traditional GANs that generate data from random noise, Conditional GANs (CGANs) incorporate conditional inputs. The model takes a given input (such as a segmentation map) and tries to produce an output that looks realistic and aligns with the provided condition. The discriminator thus evaluates pairs of (input, output) to determine if they are "real" or "fake."
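
As a minimal sketch (the tensor names are illustrative, not taken from the repository code), the condition and the candidate image are typically stacked along the channel dimension before being shown to the discriminator:

    import torch

    # Illustrative shapes: a 3-channel label map and a 3-channel photo at 256x256
    segmentation_map = torch.randn(1, 3, 256, 256)  # the condition
    candidate_photo = torch.randn(1, 3, 256, 256)   # real photo or generator output

    # The discriminator judges the (input, output) pair as one 6-channel tensor
    disc_input = torch.cat([segmentation_map, candidate_photo], dim=1)
    print(disc_input.shape)  # torch.Size([1, 6, 256, 256])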

U-Net Generator

The generator architecture is based on a U-Net: an encoder-decoder network with skip connections.

  • Encoder: Extracts features and reduces spatial resolution.
  • Decoder: Reconstructs the image from latent features back to the original spatial size.
  • Skip Connections: Preserve fine-grained details from early layers, improving the quality and sharpness of generated images.
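
The sketch below shows a minimal U-Net generator in PyTorch with only three encoder/decoder levels (the full Pix2Pix generator uses eight); the class name is illustrative, not the repository's:

    import torch
    import torch.nn as nn

    class TinyUNetGenerator(nn.Module):
        """Minimal U-Net: three encoder and three decoder levels with skip connections."""
        def __init__(self, in_ch=3, out_ch=3, base=64):
            super().__init__()
            self.down1 = nn.Sequential(nn.Conv2d(in_ch, base, 4, 2, 1), nn.LeakyReLU(0.2))
            self.down2 = nn.Sequential(nn.Conv2d(base, base * 2, 4, 2, 1),
                                       nn.BatchNorm2d(base * 2), nn.LeakyReLU(0.2))
            self.down3 = nn.Sequential(nn.Conv2d(base * 2, base * 4, 4, 2, 1),
                                       nn.BatchNorm2d(base * 4), nn.LeakyReLU(0.2))
            self.up1 = nn.Sequential(nn.ConvTranspose2d(base * 4, base * 2, 4, 2, 1),
                                     nn.BatchNorm2d(base * 2), nn.ReLU())
            # Input channels double below because the skip connection is concatenated
            self.up2 = nn.Sequential(nn.ConvTranspose2d(base * 4, base, 4, 2, 1),
                                     nn.BatchNorm2d(base), nn.ReLU())
            self.up3 = nn.Sequential(nn.ConvTranspose2d(base * 2, out_ch, 4, 2, 1), nn.Tanh())

        def forward(self, x):
            d1 = self.down1(x)                       # 1/2 resolution
            d2 = self.down2(d1)                      # 1/4 resolution
            d3 = self.down3(d2)                      # 1/8 resolution
            u1 = self.up1(d3)                        # back to 1/4
            u2 = self.up2(torch.cat([u1, d2], 1))    # skip connection from d2
            return self.up3(torch.cat([u2, d1], 1))  # skip connection from d1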

PatchGAN Discriminator

Instead of evaluating the entire image holistically, the PatchGAN discriminator classifies each N×N patch of the image as real or fake. This helps the model focus on local texture details, leading to sharper and more realistic outputs.
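
The following sketch illustrates the idea with a shallow PatchGAN; the layer count and class name are illustrative rather than the repository's exact architecture:

    import torch
    import torch.nn as nn

    class PatchDiscriminator(nn.Module):
        """Outputs a grid of real/fake logits, one per image patch, instead of one scalar."""
        def __init__(self, in_ch=6, base=64):  # 6 channels: condition + image concatenated
            super().__init__()
            self.model = nn.Sequential(
                nn.Conv2d(in_ch, base, 4, 2, 1), nn.LeakyReLU(0.2),
                nn.Conv2d(base, base * 2, 4, 2, 1), nn.BatchNorm2d(base * 2), nn.LeakyReLU(0.2),
                nn.Conv2d(base * 2, base * 4, 4, 2, 1), nn.BatchNorm2d(base * 4), nn.LeakyReLU(0.2),
                nn.Conv2d(base * 4, 1, 4, 1, 1),  # one logit per receptive-field patch
            )

        def forward(self, condition, image):
            return self.model(torch.cat([condition, image], dim=1))

    # For a 256x256 input pair this produces roughly a 31x31 grid of patch logits.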

Loss Functions

  • Adversarial Loss (GAN Loss): Encourages the generator to produce outputs indistinguishable from real images.
  • L1 Loss: Ensures that the generated image is closely aligned with the target image at a pixel level, improving structural fidelity.
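
A sketch of how the two terms are typically combined for the generator; the weight of 100 matches the Hyperparameters section below, and the function and variable names are illustrative:

    import torch
    import torch.nn as nn

    criterion_gan = nn.BCEWithLogitsLoss()  # adversarial loss on raw patch logits
    criterion_l1 = nn.L1Loss()              # pixel-level reconstruction loss
    lambda_l1 = 100                         # L1 weight (see Hyperparameters and Settings)

    def generator_loss(disc_fake_logits, fake_image, real_image):
        # The generator wants every patch of its output to be classified as real (1)
        adversarial = criterion_gan(disc_fake_logits, torch.ones_like(disc_fake_logits))
        reconstruction = criterion_l1(fake_image, real_image)
        return adversarial + lambda_l1 * reconstruction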

Dataset

We use a Cityscapes-based Pix2Pix dataset, which contains pairs of:

  • Input (Segmented) Images: Semantic label maps.
  • Target (Real) Images: Corresponding realistic cityscape photographs.

Download and Preparation

Download the dataset from Kaggle. Extract it into a directory like:

cityscapes_pix2pix/
  train/
    {image_number}.jpg
  val/
    {image_number}.jpg

Structure

  • train/: Training pairs of images (segmented and real).
  • val/: Validation pairs of images.
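
A sketch of a paired dataset loader, assuming each .jpg stores the photo and its label map side by side in a single image; swap the two crop boxes if your copy of the dataset arranges them the other way around. The class name is illustrative, not from the repository:

    import os
    from PIL import Image
    from torch.utils.data import Dataset
    from torchvision import transforms

    class CityscapesPairDataset(Dataset):
        def __init__(self, root):
            self.paths = sorted(
                os.path.join(root, f) for f in os.listdir(root) if f.endswith(".jpg")
            )
            self.to_tensor = transforms.Compose([
                transforms.ToTensor(),
                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # scale to [-1, 1]
            ])

        def __len__(self):
            return len(self.paths)

        def __getitem__(self, idx):
            pair = Image.open(self.paths[idx]).convert("RGB")
            w, h = pair.size
            photo = pair.crop((0, 0, w // 2, h))      # left half
            label_map = pair.crop((w // 2, 0, w, h))  # right half
            return self.to_tensor(label_map), self.to_tensor(photo)

    # e.g. train_dataset = CityscapesPairDataset("cityscapes_pix2pix/train")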

Installation & Requirements

  1. Clone the repository:

    git clone https://github.com/pooriyasafaei/cityscapes_pix2pix.git
  2. Install dependencies (Python 3.7+ recommended):

    pip install -r requirements.txt

    Key Dependencies:

    • PyTorch
    • Torchvision
    • PIL (Pillow)
    • Matplotlib
    • NumPy
  3. Ensure you have GPU support for training, as it will be significantly faster.

Usage

Training

  1. Download and prepare the Cityscapes dataset and load the images by running the first cells.
  2. Adjust the hyperparameters and paths in the training section as needed; default parameters are already set in the notebook.
  3. Run the training cells. The training script will periodically display generated samples and save model checkpoints (a single training step is sketched below).
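
The sketch below shows what one alternating update looks like; it assumes the models, optimizers, and loss functions from the sections above and is not a copy of the notebook's training cell:

    import torch

    def train_step(label_map, real_photo, generator, discriminator,
                   opt_g, opt_d, criterion_gan, criterion_l1, lambda_l1=100):
        fake_photo = generator(label_map)

        # Discriminator update: real pairs -> 1, generated pairs -> 0
        opt_d.zero_grad()
        logits_real = discriminator(label_map, real_photo)
        logits_fake = discriminator(label_map, fake_photo.detach())
        loss_d = 0.5 * (criterion_gan(logits_real, torch.ones_like(logits_real)) +
                        criterion_gan(logits_fake, torch.zeros_like(logits_fake)))
        loss_d.backward()
        opt_d.step()

        # Generator update: fool the discriminator while staying close to the target in L1
        opt_g.zero_grad()
        logits_fake = discriminator(label_map, fake_photo)
        loss_g = (criterion_gan(logits_fake, torch.ones_like(logits_fake)) +
                  lambda_l1 * criterion_l1(fake_photo, real_photo))
        loss_g.backward()
        opt_g.step()

        return loss_g.item(), loss_d.item()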

Inference

  1. After training, use the trained generator to translate new segmented images with the show_generated_images function. This will produce realistic-looking images corresponding to your segmented inputs (a rough standalone sketch follows).
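
If you prefer to run inference outside the notebook, a rough sketch follows; the checkpoint path, output filename, and the generator and dataset objects are assumed from the earlier sketches, not defined by the repository:

    import torch
    from torchvision.utils import save_image

    def translate(generator, label_map, out_path="generated_sample.jpg"):
        """Run a trained generator on one segmented input (a 3xHxW tensor scaled to [-1, 1])."""
        generator.eval()
        with torch.no_grad():
            fake_photo = generator(label_map.unsqueeze(0))
        save_image(fake_photo * 0.5 + 0.5, out_path)  # undo the [-1, 1] normalization
        return fake_photo

    # e.g. after generator.load_state_dict(torch.load("checkpoint.pth")):
    # translate(generator, val_dataset[0][0])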

Hyperparameters and Settings

  • Learning Rate: 2e-4
  • Batch Size: 4
  • Epochs: 50
  • Lambda_L1 (L1 Loss Weight): 100
  • Optimizer: Adam (β1=0.5, β2=0.999)

These values follow recommendations from the Pix2Pix paper and are known to produce stable training dynamics and realistic outputs.
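
Wired into PyTorch, these settings look roughly like this (the generator and discriminator objects are assumed from the earlier sketches):

    import torch

    lr, betas, lambda_l1 = 2e-4, (0.5, 0.999), 100
    opt_g = torch.optim.Adam(generator.parameters(), lr=lr, betas=betas)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr, betas=betas)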

Results

During training, generated images are periodically displayed alongside the input segmented map and the real target image. Over time, the generated outputs should gain detail and more closely resemble the target distribution.

You can expect results where:

  • Early epochs: Blurry and less detailed outputs.
  • Later epochs: Increasingly realistic images with sharper boundaries and textures.

Loss Plots

After training completes, the loss curves for both the Generator and Discriminator can be plotted using the last three cells in the notebook. You should see the Discriminator loss stabilizing and the Generator loss converging.
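
A minimal plotting sketch, assuming the training loop appended per-step losses to two Python lists (the g_losses and d_losses names are illustrative):

    import matplotlib.pyplot as plt

    def plot_losses(g_losses, d_losses):
        """Plot per-step generator and discriminator losses recorded during training."""
        plt.plot(g_losses, label="Generator loss")
        plt.plot(d_losses, label="Discriminator loss")
        plt.xlabel("Training step")
        plt.ylabel("Loss")
        plt.legend()
        plt.show()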

Generated Sample

Below is a sample generated from a segmented input using the default hyperparameters after 50 epochs of training.

[Sample generated image]

References

If you find this repository helpful or use it in your research, consider citing the original Pix2Pix paper.


Enjoy experimenting with Pix2Pix!
