Runpod Platform

DIY Deep Learning Docker Container

Zhen Lu

May 7, 2022 • 4 min read

Are you tired of using someone else's container, only to find that it has the wrong versions of your tools installed? Maybe you've installed everything from scratch every time you wanted to start over and thought to yourself, "this is a waste of time"? I've personally gone through this, because I'd rather do whatever gets me working than fuss with tooling I don't need. Honestly, though, it's pretty easy to build your own Docker image and customize it to your needs. If you do, you can start fresh, with your tools, every time.

In this blog post, we'll go over the fundamentals of how to build your own Docker image for machine learning and push it to Docker Hub. I'll use the custom TensorFlow image that I built for RunPod as an example. Finished Dockerfiles can be found on GitHub. Let's get started.

How to Start

First, you'll want to sign up for an account on Docker Hub. If you aren't familiar with Docker Hub, it's like GitHub for Docker container images. Once you push your image, you'll be able to pull it and use it wherever you want. Save your credentials in your favorite password manager for later.

Next, you'll want to find a suitable base image. If you're a purist, you can start with a minimal image like ubuntu, or something with CUDA already installed like one of the nvidia/cuda images. I'm going to start with the tensorflow/tensorflow:latest-gpu image, as it already comes with TensorFlow installed and I know that I want to use TF 2.8.0. Keep in mind that latest-gpu is a moving tag; if you need a specific version, pin a versioned tag such as tensorflow/tensorflow:2.8.0-gpu instead.

To start a Dockerfile with a base image, you want to create a file called "Dockerfile" and add the following lines to it:

## Dockerfile

ARG BASE_IMAGE=tensorflow/tensorflow:latest-gpu

FROM ${BASE_IMAGE} AS dev-base

You could also just use the following, if you don't think that you will want to refer to the base image name later.

## Dockerfile

FROM tensorflow/tensorflow:latest-gpu
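If you went with the ARG form, the base image can be swapped at build time without editing the Dockerfile. Here is a minimal sketch of how that might look. The function name build_with_base, the DOCKER override variable, and the example tags are my own illustrative choices, not part of the original setup.

```shell
#!/bin/sh
# Sketch: override the Dockerfile's BASE_IMAGE build argument at build time.
# DOCKER lets you swap in a compatible CLI, or "echo" for a dry run (assumption).
DOCKER="${DOCKER:-docker}"

# build_with_base BASE_IMAGE TARGET_TAG [CONTEXT_DIR]
build_with_base() {
    base_image="$1"
    target_tag="$2"
    context_dir="${3:-.}"
    "$DOCKER" build \
        --build-arg "BASE_IMAGE=${base_image}" \
        --tag "${target_tag}" \
        "${context_dir}"
}

# Example (not run automatically):
#   build_with_base tensorflow/tensorflow:2.8.0-gpu runpod/tensorflow:2.8.0
```

Without --build-arg, Docker falls back to the default value given on the ARG line.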

The following lines instruct Docker to use bash as the default shell instead of sh, and tell apt not to prompt for input during the build:

## Dockerfile

SHELL ["/bin/bash", "-o", "pipefail", "-c"]

# Use key=value pairs so that both variables are set by one ENV instruction
ENV DEBIAN_FRONTEND=noninteractive \
    SHELL=/bin/bash

Now for the fun part: we get to customize the stuff that gets installed in our docker image!

In this image, I am going to do a few things:

Work around NVIDIA's recent rotation of the signing key for its CUDA apt repository
Run apt-get update/upgrade to patch Ubuntu vulnerabilities
Install utilities like wget/openssh
Upgrade pip
Install Jupyter Lab

## Dockerfile

# Replace NVIDIA's old CUDA repository key with the current one
RUN apt-key del 7fa2af80
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
RUN apt-get update --yes && \
    # apt-get upgrade patches known vulnerabilities in installed packages, since
    # the Ubuntu base image is sometimes rebuilt less than once a month
    apt-get upgrade --yes && \
    apt-get install --yes --no-install-recommends \
    wget \
    bash \
    openssh-server && \
    apt-get clean && rm -rf /var/lib/apt/lists/* && \
    echo "en_US.UTF-8 UTF-8" > /etc/locale.gen
# Upgrade pip, then install Jupyter Lab and its widgets
RUN /usr/bin/python3 -m pip install --upgrade pip
RUN pip install jupyterlab
RUN pip install ipywidgets

As you can see, it's super easy to automate what you would otherwise have to install manually. Just put each install command after the Dockerfile RUN keyword. The benefit here is that your installed utilities are baked into your Docker image, so you won't have to wait for them to install the next time you want to use this development environment.

The last thing that we'll do is give Docker a start command. This defines what your container will do when it starts. In this case, I define a start script (start.sh) in the same directory as the Dockerfile:

COPY start.sh /
RUN chmod +x /start.sh
CMD [ "/start.sh" ]

The COPY instruction copies the script into the root of the container file system (ADD would also work, but COPY is the recommended choice for plain local files), the RUN chmod command makes it executable, and the CMD instruction tells Docker to run start.sh when the container starts.

Here's what start.sh, which lives in the same directory as your Dockerfile, looks like:

#!/bin/bash

echo "pod started"

# Start the SSH daemon only when a public key is provided
if [[ -n "$PUBLIC_KEY" ]]
then
    mkdir -p ~/.ssh
    chmod 700 ~/.ssh
    echo "$PUBLIC_KEY" >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys
    service ssh start
fi

# Run Jupyter Lab in the foreground when a password is provided;
# otherwise sleep forever so the container does not exit
if [[ -n "$JUPYTER_PASSWORD" ]]
then
    cd /
    jupyter lab --allow-root --no-browser --port=8888 --ip='*' --ServerApp.terminado_settings='{"shell_command":["/bin/bash"]}' --ServerApp.token="$JUPYTER_PASSWORD" --ServerApp.allow_origin='*' --ServerApp.preferred_dir=/workspace
else
    sleep infinity
fi

In short: if a public key is provided in the environment, the script starts the OpenSSH daemon, which runs in the background. If a Jupyter password is provided, it runs Jupyter Lab in the foreground, which keeps the container alive. If no password is provided, nothing else would keep the script running, so we fall back to sleep infinity to stop the Docker container from exiting immediately.
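To try the start script on your own machine, you pass those environment variables to docker run. The following is a rough sketch rather than the way RunPod launches pods: the function name run_pod, the DOCKER override variable, host port 2222, and mounting the current directory at /workspace are all assumptions of mine, and --gpus all only works if the NVIDIA Container Toolkit is installed on the host.

```shell
#!/bin/sh
# Sketch: run the image locally with the variables that start.sh looks for.
# DOCKER can be overridden, for example with "echo" for a dry run (assumption).
DOCKER="${DOCKER:-docker}"

# run_pod IMAGE JUPYTER_PASSWORD
# Also forwards PUBLIC_KEY if it is set in the caller's environment.
run_pod() {
    image="$1"
    password="$2"
    # Build the argument list in the positional parameters (POSIX-friendly).
    # The mount provides the /workspace directory that start.sh points Jupyter at.
    set -- run --rm --gpus all \
        -p 8888:8888 \
        -v "${PWD}:/workspace" \
        -e "JUPYTER_PASSWORD=${password}"
    if [ -n "${PUBLIC_KEY:-}" ]; then
        # Expose the container's SSH port on host port 2222 (arbitrary choice).
        set -- "$@" -p 2222:22 -e "PUBLIC_KEY=${PUBLIC_KEY}"
    fi
    "$DOCKER" "$@" "$image"
}

# Example (not run automatically):
#   PUBLIC_KEY="$(cat ~/.ssh/id_ed25519.pub)" run_pod runpod/tensorflow:latest my-secret
```

With this running, Jupyter Lab should be reachable on port 8888 of the host, using the password as the token.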

To build your image, go to the folder that contains your Dockerfile and run

docker build . -t repo/name:tag

In this case, my repo is runpod, my image name is tensorflow, and my tag is latest.
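Once the build finishes, a quick smoke test can confirm that the tools you installed actually made it into the image. This is only a sketch: check_image and the DOCKER override variable are hypothetical names, and it relies on the fact that a command given to docker run replaces the image's CMD.

```shell
#!/bin/sh
# Sketch: print the TensorFlow and Jupyter Lab versions baked into an image.
# DOCKER can be overridden, for example with "echo" for a dry run (assumption).
DOCKER="${DOCKER:-docker}"

# check_image IMAGE
check_image() {
    image="$1"
    # The trailing command overrides CMD, so start.sh is not run here.
    "$DOCKER" run --rm "$image" python3 -c 'import tensorflow as tf; print(tf.__version__)' &&
    "$DOCKER" run --rm "$image" jupyter lab --version
}

# Example (not run automatically):
#   check_image runpod/tensorflow:latest
```

If the TensorFlow version printed is not the one you expected, that is a hint that the base image tag has moved since you last built.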

Once your image is built, you can push it by first logging in

docker login

Then running

docker push repo/name:tag

Your image should get uploaded to Docker Hub, where you can check it out!
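If you rebuild often, the build and push steps above can be wrapped in a small helper so that the repo/name:tag reference is only written once. The sketch below assumes you have already run docker login; image_ref, build_and_push, and the DOCKER override variable are illustrative names of mine.

```shell
#!/bin/sh
# Sketch: compose a repo/name:tag reference, then build and push it.
# DOCKER can be overridden, for example with "echo" for a dry run (assumption).
DOCKER="${DOCKER:-docker}"

# image_ref REPO NAME [TAG]: print "repo/name:tag", defaulting TAG to "latest".
image_ref() {
    printf '%s/%s:%s\n' "$1" "$2" "${3:-latest}"
}

# build_and_push REPO NAME [TAG]: build from the current directory, then push.
# The push only happens if the build succeeds.
build_and_push() {
    ref="$(image_ref "$@")"
    "$DOCKER" build --tag "$ref" . &&
    "$DOCKER" push "$ref"
}

# Example (not run automatically):
#   build_and_push runpod tensorflow latest
```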

This just scratches the surface of what you can do with Docker containers, but it's a good example to get your feet wet.