
Quick Guide to Speeding Up Docker Builds for ML Applications

Learn how Docker caching and best practices can speed up your ML application builds with detailed code examples.


Docker builds are essential for deploying ML applications in a consistent and portable way. However, slow build times can significantly hinder development and deployment workflows and bog down CI/CD pipelines. Here are some strategies I follow to speed up Docker builds.

  1. Docker Layer Caching:

    Docker builds are composed of layers, where each instruction creates a new layer. Docker cleverly caches layers that haven't changed, significantly reducing build time on subsequent runs.

    Order Matters: In your Dockerfile, place frequently modified instructions (like copying your application code) towards the end, and put stable steps (like copying requirements.txt and installing libraries with pip) earlier. This way, changes to your code don't invalidate the cached layers for dependency installation. Below is an example of this ordering.
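A minimal sketch of this ordering (the requirements.txt name and the final CMD are illustrative assumptions about the project layout):

```dockerfile
# Stable layers first: these are cached across rebuilds.
FROM python:3.10-slim
WORKDIR /app

# Copy only the dependency manifest and install packages.
# This layer is invalidated only when requirements.txt changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Frequently changing application code goes last, so edits here
# leave all of the layers above untouched in the cache.
COPY . .

CMD ["python", "main.py"]
```

On a rebuild after editing only application code, Docker reports the earlier steps as CACHED and re-runs just the final COPY and anything after it.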

    Notice how this ordering lets Docker reuse the cached layers for the dependency-installation steps, so a code change only rebuilds the layers that follow it.

  2. Minimize Package Installation using --no-install-recommends:

    By default, package managers like apt-get install both essential and recommended packages. Use the --no-install-recommends flag to install only the strictly necessary packages for your application. This reduces download size and build time.

     RUN apt-get update && apt-get install -y --no-install-recommends libgomp1 \
         && rm -rf /var/lib/apt/lists/*
     # libgomp1 provides the OpenMP runtime, which many ML libraries use for parallelism
    
  3. Ignore unnecessary files with .dockerignore:

    The .dockerignore file works like .gitignore but for Docker builds. Use it to exclude unnecessary files and folders from the build context, which avoids copying them during the build and speeds things up. One common artifact that is not needed in training or inference pipelines is the .ipynb_checkpoints directory that Jupyter creates. Here is an example of how to exclude it.

     **/*.ipynb_checkpoints/
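Beyond checkpoints, a slightly fuller .dockerignore for a typical ML project might look like this (the data/, models/, and .venv/ entries are illustrative assumptions about the repository layout):

```
# Jupyter checkpoints
**/*.ipynb_checkpoints/

# Version control and Python caches
.git/
**/__pycache__/

# Large local artifacts the image doesn't need (illustrative paths)
data/
models/
.venv/
```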
    
  4. Choose a slim base image:

    Many base images like python:3.10 come with various pre-installed packages. Consider slimmer alternatives such as python:3.10-slim to minimize image size and build time; these slim variants contain little beyond Python itself and essential libraries. Be careful with python:3.10-alpine for ML workloads: Alpine uses musl libc, so many scientific packages lack prebuilt wheels for it and may fall back to slow source builds.

     FROM python:3.10-slim
     # instead of FROM python:latest or FROM python:3.10
    

    I hope these strategies—optimizing layer caching, minimizing unnecessary package installations, excluding irrelevant files, and choosing a slim image—will help you save time and enable faster, more frequent ML deployments.