Module 3: Containers and Reproducible Builds
Module Overview
"Works on my machine" is a common state for code written without a software engineering background. For code (and science) to be trusted, its results must be reproducible on other machines.
We've already learned about pipenv as a Python packaging tool, which goes a long way towards giving reproducible builds - but for even greater reproducibility (and deployability), containers are the tool of choice. A container packages a minimal operating system environment together with all the software needed to run the desired application. Because everything is packed together, a container behaves the same regardless of the host it runs on.
Docker is a common standard and tool for containers, and we will use it to build and run Linux containers with Python code.
Learning Objectives
- Launch Docker containers and access/execute programs on them
- Create and customize a Dockerfile to build a basic custom container
Objective 01 - Launch Docker containers and execute programs on them
Overview
The purpose of containers is to run code, reliably and reproducibly. Even something as simple as “Hello World!”, achieved identically and independently of platform, is a remarkable and powerful thing.
What is a container? Just something that holds other things - in the context of computation, a system that holds programs. The difference between a container and the computer you're using right now is that a container is abstracted and virtualized - it is independent of the hardware and (external) operating system it runs on.
Follow Along
Once installed, Docker “Hello World!” is as simple as:
docker run hello-world
Try it out! You should see something like:
docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
ca4f61b1923c: Pull complete
Digest: sha256:ca0eeb6fb05351dfc8759c20733c91def84cb8007aa89a5bf606bc8b315b9fc7
Status: Downloaded newer image for hello-world:latest
Hello from Docker!
This message shows that your installation appears to be working correctly.
...
But what's really happening?
- Docker looks for a local image named hello-world
- It doesn't find it!
- But not to worry - it looks for and pulls it from Docker Hub, the default public registry
- After some network transfer and hash checking (basically “making sure it gets the right thing”)…
- The image is downloaded and run as a new container!
- The result - well, it prints “Hello from Docker!” This is hello-world, after all.
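The steps above can also be run one at a time. The following is a minimal sketch, assuming the Docker CLI is installed and the daemon is running; the guard around the commands is an addition for this example so the script exits quietly on a machine without Docker.

```shell
#!/bin/sh
# Sketch: the steps `docker run hello-world` performs, made explicit.
# Assumes Docker is installed and running; the guard is an addition for this example.
IMAGE="hello-world"

if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found; skipping"
else
    # 1. Download the image from the registry (or confirm the local copy is current)
    docker pull "$IMAGE"
    # 2. Show the content digest that identifies exactly what was downloaded
    docker image inspect --format '{{index .RepoDigests 0}}' "$IMAGE"
    # 3. Start a container from the image; --rm deletes the container once it exits
    docker run --rm "$IMAGE"
fi
```

The digest printed in step 2 should match the Digest line shown during the pull, which is how Docker confirms that the image content is the same on every machine.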
But the subtle and remarkable thing here is that printing this message is the single purpose of an entire self-contained image, reproduced exactly (byte-for-byte) whenever and wherever you run it.
So, while “Hello World!” may not exactly pay the bills, the general idea that it demonstrates is clearly powerful. And it's up to us to use that power to make our code reliable and reproducible.
Challenge
Read up on Docker documentation, and execute something on a Docker container beyond “Hello World!”
Objective 02 - Create and customize a Dockerfile to build a basic custom container
Overview
Running containers that other people make is useful - there are a lot of premade containers out there. But the real power of Docker is in customizing your own container, and running your own reproducible code in a variety of environments.
To build a custom container, you write a Dockerfile - a text file “recipe” that specifies the base image (typically a Linux distribution) your container builds on, and then adds further environment setup steps.
Follow Along
Example MVP Dockerfile for Python:
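# Pin the base image tag so rebuilds start from the same Debian release
FROM debian:bookworm
# So logging/io works reliably w/Docker
ENV PYTHONUNBUFFERED=1
# UTF Python issue for Click package (pipenv dependency)
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
# Need to explicitly set this so `pipenv shell` works
ENV SHELL=/bin/bash
# Basic Python dev dependencies
# (Debian 12+ marks the system Python as externally managed, so pip needs
# --break-system-packages to install pipenv system-wide)
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y python3-pip curl && \
    pip3 install --break-system-packages pipenv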
- Put the above in a file named Dockerfile
- docker build . -t python
- Wait (may take a while, especially if it's your first container)
- docker run -it python /bin/bash
You're now in a reproducible environment with pipenv! When you're done, exit - but note that if you run it again you get a new copy of the container, i.e. you won't see what you did in the earlier one.
If you want to reuse a single container, read up on docker restart and docker attach - but the idea of a container always starting clean is actually quite powerful, as that is where reproducible builds come from. With a more elaborate Dockerfile (one that specifies an actual package to install and run), you can have a contained and reproducible app that you know runs the same everywhere.
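The difference between a fresh container per run and one reused container can be sketched as a few shell functions. This assumes the image tagged python from the build step above; the container name pydev is hypothetical, chosen only for this example, and docker start -ai is used here as one way to restart and reattach in a single step.

```shell
#!/bin/sh
# Sketch: a fresh container per run versus reusing one named container.
# Assumes the image tagged `python` built earlier; the name `pydev` is hypothetical.
IMAGE="python"
NAME="pydev"

fresh_run() {
    # Every `docker run` creates a new container; --rm removes it on exit,
    # so nothing done inside the shell survives.
    docker run --rm -it "$IMAGE" /bin/bash
}

reusable_create() {
    # Naming the container (and omitting --rm) keeps it around after exit.
    docker run --name "$NAME" -it "$IMAGE" /bin/bash
}

reusable_resume() {
    # Start the stopped container again and attach to its shell;
    # files created in the earlier session are still there.
    docker start -ai "$NAME"
}

reusable_delete() {
    # Remove the container once it is no longer needed.
    docker rm "$NAME"
}

echo "defined: fresh_run reusable_create reusable_resume reusable_delete"
```

After sourcing the file, try reusable_create, create a file, exit, and then call reusable_resume to find the file still present. Doing the same with fresh_run shows a clean filesystem every time.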
Challenge
Build on the Dockerfile above - add to the RUN command to install specific useful packages and execute them. You may have to read up on docker run, in particular the -p argument to expose a port from the Docker container to localhost - this lets you run a Jupyter notebook server inside Docker and access it from your external operating system.
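As one possible starting point, the sketch below writes a second Dockerfile that layers Jupyter on top of the image built earlier and prints the commands to build and run it. The directory jupyter-image, the tag python-jupyter, and the use of port 8888 (Jupyter's default) are assumptions for illustration. It also assumes the local image tagged python exists; if it does not, Docker would instead pull the official Docker Hub image of the same name, which does not include pipenv.

```shell
#!/bin/sh
# Sketch: extend the course image with Jupyter and publish its port.
# Assumptions: directory `jupyter-image`, tag `python-jupyter`, port 8888.
mkdir -p jupyter-image

cat > jupyter-image/Dockerfile <<'EOF'
# Start from the locally built image tagged `python` in the steps above
FROM python
WORKDIR /work
# Install Jupyter into a pipenv-managed environment
RUN pipenv install jupyter
# Listen on all interfaces so the published port is reachable from the host;
# --allow-root is needed because the container runs as root by default
CMD ["pipenv", "run", "jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
EOF

# Build the image, then map host port 8888 to container port 8888 (-p HOST:CONTAINER)
echo "docker build jupyter-image -t python-jupyter"
echo "docker run --rm -p 8888:8888 python-jupyter"
```

Running the two printed commands should start the notebook server, which prints a tokenized URL that can be opened in a browser on the host.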
To see how deep the rabbit hole goes, check out Docker Compose.
Guided Project
In this guided project, we'll learn how to create Docker containers for reproducible Python environments. Open guided-project.md in the GitHub repository below to follow along with the guided project.
Module Assignment
For this assignment, you'll create Docker containers and Dockerfiles to demonstrate your understanding of containerization and reproducible builds.
Solution Video
Additional Resources
Docker Fundamentals
- An introduction to Docker (brownbag presentation)
- Docker Classroom for Dockerfiles
- VMs versus Containers