January 27, 2021

Dockerfile Optimization for Fast Builds and Light Images

By Rui Trigo | 8 min read

Docker builds images automatically by reading the instructions from a Dockerfile -- a text file that contains all commands, in order, needed to build a given image.

The explanation above was extracted from Docker’s official docs and summarizes what a Dockerfile is for. Dockerfiles are important to work with because they are our blueprint, our record of layers added to a Docker base image.

We will learn how to take advantage of BuildKit features, a set of enhancements introduced in Docker v18.09. Integrating BuildKit will give us better performance, storage management, and security.

Objectives

  • decrease build time;
  • reduce image size;
  • gain maintainability;
  • gain reproducibility;
  • understand multi-stage Dockerfiles;
  • understand BuildKit features.

Pre-requisites

  • knowledge of Docker concepts
  • Docker installed (currently using v19.03)
  • a Java app (for this post I used a sample Jenkins Maven app)

Let's get to it!

Simple Dockerfile example

Below is an example of an unoptimized Dockerfile containing a Java app. This example was taken from this DockerCon conference talk. We will walk through several optimizations as we go.

FROM debian
COPY . /app
RUN apt-get update
RUN apt-get -y install openjdk-11-jdk ssh emacs
CMD ["java", "-jar", "/app/target/my-app-1.0-SNAPSHOT.jar"]

Here, we may ask ourselves: how long does it take to build at this stage? To answer it, let's create this Dockerfile on our local development computer and tell Docker to build the image.

# enter your Java app folder
cd simple-java-maven-app-master
# create a Dockerfile
vim Dockerfile
# write content, save and exit
docker pull debian:latest # pull the source image
time docker build --no-cache -t docker-class . # time the build, ignoring the layer cache
# notice the build time

0,21s user 0,23s system 0% cpu 1:55,17 total

Here’s our answer: our build takes 1m55s at this point.

But what if we just enable BuildKit with no additional changes? Does it make a difference?

Enabling BuildKit

BuildKit can be enabled with two methods:

  1. Setting the DOCKER_BUILDKIT=1 environment variable when invoking the docker build command, such as:
time DOCKER_BUILDKIT=1 docker build --no-cache -t docker-class .
  2. Enabling BuildKit by default, by setting the buildkit feature to true in the daemon configuration file /etc/docker/daemon.json and restarting the daemon:
{ "features": { "buildkit": true } }

BuildKit Initial Impact

DOCKER_BUILDKIT=1 docker build --no-cache -t docker-class .

0,54s user 0,93s system 1% cpu 1:43,00 total

On the same hardware, the build took ~12 seconds less than before. This means the build got roughly 10.6% faster with almost no effort.
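
Since a percentage like this is easy to get wrong, here is a quick recomputation from the two wall-clock totals above (a shell sketch; the values are transcribed from the time output, 1:55.17 and 1:43.00, in seconds):

```shell
# Recompute the relative speedup from the two wall-clock totals above.
before=115.17  # 1:55.17 without BuildKit
after=103.00   # 1:43.00 with BuildKit
speedup=$(awk -v b="$before" -v a="$after" 'BEGIN { printf "%.1f", (b - a) * 100 / b }')
echo "Enabling BuildKit made this build ${speedup}% faster."
```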

But now let’s look at some extra steps we can take to improve our results even further.

Order from least to most frequently changing

Because order matters for caching, we'll move the COPY command closer to the end of the Dockerfile.

FROM debian
RUN apt-get update
RUN apt-get -y install openjdk-11-jdk ssh emacs
COPY . /app
CMD ["java", "-jar", "/app/target/my-app-1.0-SNAPSHOT.jar"]

Avoid "COPY ."

Opt for more specific COPY arguments to limit cache busts. Only copy what’s needed.

FROM debian
RUN apt-get update
RUN apt-get -y install openjdk-11-jdk ssh vim
COPY target/my-app-1.0-SNAPSHOT.jar /app
CMD ["java", "-jar", "/app/my-app-1.0-SNAPSHOT.jar"]
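
A complementary measure, not part of the original example, is a .dockerignore file, so that files irrelevant to the image never reach the build context and therefore can never bust a COPY layer's cache (the entries below are illustrative):

```shell
# Illustrative .dockerignore: keep version-control metadata and docs
# out of the build context so they cannot invalidate COPY layers.
cat > .dockerignore <<'EOF'
.git
*.md
.idea
EOF
cat .dockerignore
```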

Couple apt-get update & install

Coupling these two commands prevents installing packages from an outdated package index: they are either cached together or re-run together.

FROM debian
RUN apt-get update && \
	apt-get -y install openjdk-11-jdk ssh vim
COPY target/my-app-1.0-SNAPSHOT.jar /app
CMD ["java", "-jar", "/app/my-app-1.0-SNAPSHOT.jar"]

Remove unnecessary dependencies

Don’t install debugging and editing tools up front; you can install them later, in a running container, if you actually need them.

FROM debian
RUN apt-get update && \
	apt-get -y install --no-install-recommends \
	openjdk-11-jdk
COPY target/my-app-1.0-SNAPSHOT.jar /app
CMD ["java", "-jar", "/app/my-app-1.0-SNAPSHOT.jar"]

Remove package manager cache

Your image does not need this cache data. Take the chance to free some space.

FROM debian
RUN apt-get update && \
	apt-get -y install --no-install-recommends \
	openjdk-11-jdk && \
	rm -rf /var/lib/apt/lists/*
COPY target/my-app-1.0-SNAPSHOT.jar /app
CMD ["java", "-jar", "/app/my-app-1.0-SNAPSHOT.jar"]

Use official images where possible

There are some good reasons to use official images, such as reducing the time spent on maintenance and reducing the size, as well as having an image that is pre-configured for container use.

FROM openjdk
COPY target/my-app-1.0-SNAPSHOT.jar /app
CMD ["java", "-jar", "/app/my-app-1.0-SNAPSHOT.jar"]

Use specific tags

Don’t use latest as it’s a rolling tag. That’s asking for unpredictable problems.

FROM openjdk:8
COPY target/my-app-1.0-SNAPSHOT.jar /app
CMD ["java", "-jar", "/app/my-app-1.0-SNAPSHOT.jar"]

Look for minimal flavors

You can reduce the base image size further: pick the lightest flavor that suits your purpose. Below is a short list of openjdk image sizes.

Repository  Tag           Size
openjdk     8             634MB
openjdk     8-jre         443MB
openjdk     8-jre-slim    204MB
openjdk     8-jre-alpine  83MB
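
To put those numbers in perspective, here is the saving from picking the lightest flavor in the table (sizes as listed above, integer arithmetic):

```shell
# Savings from switching the base image from openjdk:8 (634MB)
# to openjdk:8-jre-alpine (83MB), using the sizes from the table above.
full=634
alpine=83
saved=$((full - alpine))
pct=$((saved * 100 / full))
echo "openjdk:8-jre-alpine shaves off ${saved}MB, ~${pct}% of the original size."
```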

Build from source in a consistent environment

Maybe you do not need the whole JDK on your machine. If you only need the JDK to run Maven, you can use a Maven Docker image as the base for your build.

FROM maven:3.6-jdk-8-alpine
WORKDIR /app
COPY pom.xml .
COPY src ./src
RUN mvn -e -B package
CMD ["java", "-jar", "/app/target/my-app-1.0-SNAPSHOT.jar"]

Fetch dependencies in a separate step

The step that fetches dependencies can be cached in its own layer: as long as pom.xml does not change, our builds will skip the download and get faster.

FROM maven:3.6-jdk-8-alpine
WORKDIR /app
COPY pom.xml .
RUN mvn -e -B dependency:resolve
COPY src ./src
RUN mvn -e -B package
CMD ["java", "-jar", "/app/target/my-app-1.0-SNAPSHOT.jar"]

Multi-stage builds: remove build dependencies

Why use multi-stage builds?

  • separate the build from the runtime environment
  • DRY
  • tailoring stages to dev, test, and lint environments
  • delinearizing dependencies (concurrency)
  • having platform-specific stages

FROM maven:3.6-jdk-8-alpine AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn -e -B dependency:resolve
COPY src ./src
RUN mvn -e -B package

FROM openjdk:8-jre-alpine
COPY --from=builder /app/target/my-app-1.0-SNAPSHOT.jar /
CMD ["java", "-jar", "/my-app-1.0-SNAPSHOT.jar"]

Checkpoint

If you build the application at this point,

time DOCKER_BUILDKIT=1 docker build --no-cache -t docker-class .

0,41s user 0,54s system 2% cpu 35,656 total

you'll notice it takes ~35.66 seconds to build, a pleasant improvement. From here on, we will focus on BuildKit features that cover further scenarios.

Multi-stage builds: different image flavors

The Dockerfile below has separate release stages for a Debian-based and an Alpine-based image.

FROM maven:3.6-jdk-8-alpine AS builder
…
FROM openjdk:8-jre-jessie AS release-jessie
COPY --from=builder /app/target/my-app-1.0-SNAPSHOT.jar /
CMD ["java", "-jar", "/my-app-1.0-SNAPSHOT.jar"]

FROM openjdk:8-jre-alpine AS release-alpine
COPY --from=builder /app/target/my-app-1.0-SNAPSHOT.jar /
CMD ["java", "-jar", "/my-app-1.0-SNAPSHOT.jar"]

To build a specific image on a stage, we can use the --target argument:

time docker build --no-cache --target release-jessie .

Different image flavors (DRY / global ARG)

ARG flavor=alpine
FROM maven:3.6-jdk-8-alpine AS builder
…
FROM openjdk:8-jre-$flavor AS release
COPY --from=builder /app/target/my-app-1.0-SNAPSHOT.jar /
CMD ["java", "-jar", "/my-app-1.0-SNAPSHOT.jar"]

The ARG instruction controls which image gets built. In the example above, alpine is the default flavor, but we can pass --build-arg flavor=<flavor> to the docker build command.

time docker build --no-cache --target release --build-arg flavor=jessie .

Concurrency

Concurrency matters when building Docker images because it makes the most of the available CPU threads. In a linear Dockerfile, all stages are executed in sequence. With multi-stage builds, smaller dependency stages can be built in parallel and be ready when the main stage needs them.

BuildKit even brings another performance bonus. If stages are not used later in the build, they are directly skipped instead of processed and discarded when they finish. This means that in the stage graph representation, unneeded stages are not even considered.

Below is an example Dockerfile where a website's assets are built in an assets stage:

FROM maven:3.6-jdk-8-alpine AS builder
…
FROM tiborvass/whalesay AS assets
RUN whalesay "Hello DockerCon!" > out/assets.html

FROM openjdk:8-jre-alpine AS release
COPY --from=builder /app/target/my-app-1.0-SNAPSHOT.jar /
COPY --from=assets /out /assets
CMD ["java", "-jar", "/my-app-1.0-SNAPSHOT.jar"]

And here is another Dockerfile where C and C++ libraries are separately compiled and take part in the builder stage later on.

FROM maven:3.6-jdk-8-alpine AS builder-base
…

FROM gcc:8-alpine AS builder-someClib
…
RUN git clone … && ./configure --prefix=/out && make && make install

FROM gcc:8-alpine AS builder-someCpplib
…
RUN git clone … && cmake …

FROM builder-base AS builder
COPY --from=builder-someClib /out /
COPY --from=builder-someCpplib /out /

BuildKit Application Cache

BuildKit has a special feature regarding package managers cache. Here are some examples of cache folders typical locations:

Package manager  Path
apt              /var/lib/apt/lists
go               ~/.cache/go-build
go-modules       $GOPATH/pkg/mod
npm              ~/.npm
pip              ~/.cache/pip

Compare the Dockerfile below with the one presented in the section Build from source in a consistent environment: that earlier Dockerfile had no special cache handling. BuildKit lets us mount a persistent cache into a build step with --mount=type=cache.

# syntax = docker/dockerfile:experimental
FROM maven:3.6-jdk-8-alpine AS builder
WORKDIR /app
RUN --mount=target=. --mount=type=cache,target=/root/.m2 \
	mvn package -DoutputDirectory=/

FROM openjdk:8-jre-alpine
COPY --from=builder /my-app-1.0-SNAPSHOT.jar /
CMD ["java", "-jar", "/my-app-1.0-SNAPSHOT.jar"]

BuildKit Secret Volumes

To bring in some of BuildKit’s security features, let’s see how secret mounts are used and which cases they are meant for. The first scenario shows an example where we need a secrets file during the build, such as ~/.aws/credentials, without it ending up in the image.

FROM <baseimage>
RUN …
RUN --mount=type=secret,id=aws,target=/root/.aws/credentials,required \
./fetch-assets-from-s3.sh
RUN ./build-scripts.sh

To build this Dockerfile, pass the --secret argument like this:

docker build --secret id=aws,src=$HOME/.aws/credentials .
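
One shell gotcha worth noting: tilde expansion does not happen after the second = inside a word like id=aws,src=~/..., so a literal ~ would reach Docker; spelling out $HOME avoids that. A small demonstration:

```shell
# In a word like id=aws,src=~/..., the shell leaves '~' untouched,
# while $HOME is always expanded; prefer $HOME for --secret paths.
broken=id=aws,src=~/.aws/credentials
works=id=aws,src=$HOME/.aws/credentials
echo "$broken"
echo "$works"
```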

The second scenario is a method to avoid commands like COPY ./keys/private.pem /root/.ssh/private.pem, as we don't want our SSH keys stored on the Docker image after they are no longer needed. BuildKit has an ssh mount type to cover that:

FROM alpine
RUN apk add --no-cache openssh-client
RUN mkdir -p -m 0700 ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
ARG REPO_REF=19ba7bcd9976ef8a9bd086187df19ba7bcd997f2
RUN --mount=type=ssh,required git clone git@github.com:org/repo /work && cd /work && git checkout $REPO_REF

To build this Dockerfile, load your private SSH key into your ssh-agent and add --ssh=default, with default telling Docker to use the default SSH agent socket.

eval $(ssh-agent)
ssh-add ~/.ssh/id_rsa # this is the SSH key default location
docker build --ssh=default .

Conclusion

This concludes our demo on using Docker BuildKit to optimize your Dockerfiles and consequently speed up your images’ build time.

These speed gains result in much-needed savings in time and computational power, which should not be neglected.

As Charles Duhigg wrote in The Power of Habit: "small victories are the consistent application of a small advantage". You will definitely reap the benefits if you build good practices and habits.

Author
Rui Trigo is a DevOps Engineer who gets sparkling eyes when in contact with new technology. After networking fundamentals and web/mobile development, he found his place between SRE and automation.