Module 3: Containers and Reproducible Builds

Module Overview

"Works on my machine" is a common state for code written without a software engineering background. For code (and the science built on it) to be useful, it must be reproducible: it has to run the same way on other people's machines.

We've already learned about pipenv as a Python packaging tool, which goes a long way towards giving reproducible builds. For even greater reproducibility (and deployability), containers are the tool of choice. A container is a lightweight, isolated environment that packages an application together with all the software it needs to run (libraries, language runtime, system tools) while sharing the host's kernel. Because everything is packed together, a container behaves the same way regardless of which host runs it.

Docker is a common standard and tool for containers, and we will use it to build and run Linux containers with Python code.

Learning Objectives

1. Launch Docker containers and access/execute programs on them

Understanding Docker container basics
Running pre-built Docker containers
Executing commands within containers
Managing container lifecycle
Accessing container resources
Interacting with container processes
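
The topics above map onto a handful of Docker CLI commands. The following is a minimal sketch in Python that only assembles those commands so they can be inspected before anything is run. The image tag python:3.12-slim, the container name "demo", and the helper function names are assumptions chosen for illustration, not part of the course materials.

```python
import shlex
import subprocess
import sys

# Assumption: a public Python image from Docker Hub; any Python image would do.
IMAGE = "python:3.12-slim"


def run_once(image, *command):
    """Start a throwaway container, run `command`, and remove it afterwards (--rm)."""
    return ["docker", "run", "--rm", image, *command]


def start_detached(image, name, *command):
    """Start a named container in the background (-d)."""
    return ["docker", "run", "-d", "--name", name, image, *command]


def exec_in(name, *command):
    """Execute a program inside an already running container."""
    return ["docker", "exec", name, *command]


def stop_and_remove(name):
    """End a container's lifecycle: stop it, then delete it."""
    return [["docker", "stop", name], ["docker", "rm", name]]


if __name__ == "__main__":
    steps = [
        run_once(IMAGE, "python", "-c", "print('hello from a container')"),
        start_detached(IMAGE, "demo", "sleep", "300"),
        exec_in("demo", "python", "--version"),
        ["docker", "ps"],  # list running containers
        *stop_and_remove("demo"),
    ]
    execute = "--run" in sys.argv  # commands are only printed unless --run is given
    for step in steps:
        print("$", shlex.join(step))
        if execute:
            subprocess.run(step, check=True)
```

Run without arguments, the script prints each command. Passing --run executes them, which assumes Docker is installed and its daemon is running.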

2. Create/customize a Dockerfile to build a basic custom container

Writing Dockerfile instructions
Setting up container environments
Installing dependencies
Configuring container settings
Building custom images
Managing container configurations
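
To make these steps concrete, here is a hedged sketch in Python that renders a small Dockerfile for a pipenv-managed project and the matching build command. The base image tag, the /app working directory, the entry point main.py, and the image name my-python-app are hypothetical placeholders, and the layout shown is one common pattern rather than the one the guided project necessarily uses.

```python
# Assumptions for illustration only: base image, /app, and main.py are
# hypothetical and should be replaced with the project's real values.
DOCKERFILE_TEMPLATE = """\
FROM {base_image}
# Set up the container environment
WORKDIR /app
# Install dependencies from the lock file into the image's system Python
RUN pip install --no-cache-dir pipenv
COPY Pipfile Pipfile.lock ./
RUN pipenv install --system --deploy
# Copy the application code and configure the default command
COPY . .
CMD ["python", "{entry_point}"]
"""


def render_dockerfile(base_image="python:3.12-slim", entry_point="main.py"):
    """Return the Dockerfile text for the given base image and entry point."""
    return DOCKERFILE_TEMPLATE.format(base_image=base_image, entry_point=entry_point)


def build_command(tag, context="."):
    """Return the docker build command that tags the resulting image."""
    return ["docker", "build", "-t", tag, context]


if __name__ == "__main__":
    print(render_dockerfile())
    print("$", " ".join(build_command("my-python-app")))
```

Copying the Pipfile and Pipfile.lock before the rest of the code lets Docker reuse the cached dependency layer when only the application code changes.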

Guided Project

In this guided project, we'll learn how to create Docker containers for reproducible Python environments.

Resources

GitHub Repo
Docker Hub Signup Page
Docker Sandbox Applet

Guided Project File:

guided-project.md

Module Assignment

Please read the assignment.md file in the GitHub repository for detailed instructions.

Assignment File:

assignment.md

Solution Video

Additional Resources

Documentation & Tutorials

An introduction to Docker (brownbag presentation)
Hands-on Machine Learning in Docker
Docker Classroom for Dockerfiles
VMs versus Containers
Kubernetes
Docker Compose