emlov3-session-02

PyTorch Docker Assignment

Welcome to the PyTorch Docker Assignment. This assignment is designed to help you understand and work with Docker and PyTorch.

Assignment Overview

This project trains a neural network on the MNIST dataset using PyTorch. The project is containerized with Docker, making it easy to reproduce the environment. The assignment consists of the following tasks:

Create a Dockerfile for a PyTorch (CPU version) environment.
Keep the size of your Docker image under 1GB (uncompressed).
Train any model on the MNIST dataset inside the Docker container.
Save the trained model checkpoint to the host operating system.
Add an option to resume model training from a checkpoint.
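The size requirement above can be checked after a build. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and it assumes "1GB" means 10**9 bytes (use 2**30 if the grader means GiB). It relies on the "Size" field, in bytes, that docker image inspect reports for a local image.

```python
import json
import subprocess

# Assumption: "1GB" is read as 10**9 bytes; use 2**30 if the limit is meant as 1 GiB.
SIZE_LIMIT_BYTES = 10**9


def image_size_bytes(image: str) -> int:
    """Return the size in bytes that Docker reports for a local image."""
    # `docker image inspect` prints a JSON array with one object per image;
    # the "Size" field of that object is the image size in bytes.
    result = subprocess.run(
        ["docker", "image", "inspect", image],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(json.loads(result.stdout)[0]["Size"])


def is_within_limit(size_bytes: int, limit_bytes: int = SIZE_LIMIT_BYTES) -> bool:
    """Return True if the image is strictly smaller than the limit."""
    return size_bytes < limit_bytes
```

Once an image has been built, calling is_within_limit(image_size_bytes(tag)) with that image's tag reports whether it meets the limit.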

Starter Code

The starter code in train.py provides a basic structure for loading data, defining a model, and running training and testing loops. This submission completes that code.

How to Run the Code Using Docker

Below are the instructions to build and run the code using Docker.

Requirements

Docker installed on your machine.

Dockerfile Overview

The provided Dockerfile does the following:

Base Image: Uses python:3.9-slim as the base image.
Working Directory: Sets /workspace as the working directory inside the container.
Package Installation: Installs specific versions of numpy, torch, and torchvision using pip.
Copy Files: Copies train.py to the working directory.
Command to Execute: The default command to run the training script is python train.py.

How to Build and Run the Docker Container

Step 1: Build the Docker Image

Navigate to the directory containing the Dockerfile and run the following command to build the Docker image:

docker build -t mnist-trainer:latest .

This command builds the Docker image and tags it as mnist-trainer:latest.

Step 2: Run the Docker Container

Once the image is built, you can run the container using the following command:

docker run --rm -it -v $(pwd)/data:/workspace/data mnist-trainer:latest

Explanation:

--rm: Automatically removes the container once it exits.
-it: Runs the container interactively, allowing you to see the training output in real time.
-v $(pwd)/data:/workspace/data: Mounts the data directory from your host system into the container at /workspace/data, allowing MNIST data and model checkpoints to persist between runs.
mnist-trainer:latest: Specifies the Docker image to run.
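For scripted runs, the same command can be assembled as an argument list. This is a sketch only: build_run_command is a hypothetical helper, not something the repository provides. It resolves the data directory to an absolute path, which is the role $(pwd) plays in the shell command above.

```python
from pathlib import Path


def build_run_command(data_dir, image="mnist-trainer:latest", extra_args=()):
    """Build the `docker run` argument list described in this README."""
    # An absolute host path is the unambiguous form for a bind mount;
    # in the shell command this is what $(pwd) supplies.
    host_path = Path(data_dir).resolve()
    return [
        "docker", "run", "--rm", "-it",
        "-v", f"{host_path}:/workspace/data",
        image,
        *extra_args,  # anything placed after the image name, e.g. "--resume"
    ]
```

The resulting list can be passed to subprocess.run(..., check=True) from a terminal session; because of -it, Docker expects a TTY on standard input.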

Step 3: Running with Checkpoint Resume

To resume training from a checkpoint, first make sure a model checkpoint exists at ./model_checkpoint.pth. Then, add the --resume flag when running the container:

docker run --rm -it -v $(pwd)/data:/workspace/data mnist-trainer:latest --resume

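This loads the existing checkpoint and continues training. Arguments placed after the image name are appended to the container command only when the Dockerfile defines python train.py as an ENTRYPOINT; if it is defined with CMD, they replace it, so pass the full command after the image name instead: python train.py --resume.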
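How a training script might handle the --resume flag can be sketched as follows. This is an illustration under stated assumptions, not the code in train.py: the function names are hypothetical, and JSON stands in for torch.save and torch.load so that the sketch runs without PyTorch installed.

```python
import argparse
import json
from pathlib import Path


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="MNIST training (sketch)")
    parser.add_argument(
        "--resume",
        action="store_true",
        help="continue from an existing checkpoint instead of starting fresh",
    )
    return parser.parse_args(argv)


def save_checkpoint(state, path):
    # Stand-in for torch.save(state, path); JSON keeps the sketch dependency-free.
    Path(path).write_text(json.dumps(state))


def load_start_state(resume, path):
    """Return the saved state when resuming, otherwise a fresh state."""
    path = Path(path)
    if resume and path.exists():
        # Stand-in for torch.load(path).
        return json.loads(path.read_text())
    # No resume requested, or no checkpoint found: start from epoch 0.
    return {"epoch": 0}
```

Falling back to a fresh state when the checkpoint file is missing is a design choice of this sketch; a stricter script could raise an error instead so that a mistyped path is not silently ignored.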

Additional Docker Commands

To view the logs: Use the following command to check the logs of a running container (docker ps lists the IDs of running containers):

docker logs <container-id>

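To save the model: After training, the model checkpoint is saved as ./model_checkpoint.pth. With the run command shown in Step 2, only files written under /workspace/data inside the container persist on the host, so the checkpoint must be written there (or to an additional mounted volume) to remain on your local machine after the container is removed.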

Notes

The model architecture and training script can be modified in train.py.
The container will automatically download the MNIST dataset during the training process if not already present.
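Whether a download will be needed can be checked from the host before starting a run. The sketch below is hypothetical and makes one labeled assumption: that the training script uses torchvision's MNIST dataset, which stores its downloaded files under <root>/MNIST/raw.

```python
from pathlib import Path


def mnist_raw_dir(root):
    # Assumption: torchvision's MNIST dataset keeps downloads under <root>/MNIST/raw.
    return Path(root) / "MNIST" / "raw"


def needs_download(root):
    """Return True when no raw MNIST files are present under root."""
    raw = mnist_raw_dir(root)
    return not raw.is_dir() or not any(raw.iterdir())
```

Calling needs_download("data") on the host inspects the directory that is mounted into the container at /workspace/data.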

Test Results

All tests run by the script tests/grading.sh completed successfully on Gitpod.

Submission

After completing the assignment, push the code to the GitHub repository. The GitHub Actions workflow automatically builds the Docker image, runs the training script, and checks whether the assignment requirements have been met. Check the GitHub Actions tab for the results of these checks. All checks were verified to pass before the assignment was submitted.

