Optimizing Docker Images with Multi-stage Builds
Introduction
Building a Dockerfile produces a Docker image. An image is a template for creating containers. All layers of a Docker image, including the last (topmost) one, are read-only.
A layer in a Docker image is nothing but the filesystem differences from the previous layer. Each layer is recorded as an entry in the image's history, which you can list with docker history <IMAGE NAME>. The image named in the FROM instruction of a Dockerfile is pulled (if it is not already available locally) and becomes the base. Each subsequent instruction in the Dockerfile produces either a new layer or a temporary intermediate layer on top of the previous one. Note that adding or removing files results in a new layer; removing a file in a later layer does not shrink the earlier layer that still contains it.
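The following minimal sketch lets you inspect these layers yourself. It prints the size each history entry added and the instruction that created it, newest first. It assumes the Docker CLI is installed and that an image tagged skywalker:latest exists locally. The helper name list_layers and the tag are hypothetical.

```shell
# list_layers IMAGE  (list_layers is a hypothetical helper name)
# Print one line per history entry of IMAGE, newest first: the size it added,
# a tab, and the full instruction that created it. Instructions that only
# change metadata (ENV, CMD, ...) show up with a size of 0B.
list_layers() {
  docker history --no-trunc --format '{{.Size}}\t{{.CreatedBy}}' "$1"
}

# Usage (skywalker:latest is an assumed local tag):
#   list_layers skywalker:latest
```

--no-trunc keeps long RUN commands readable. Without it, docker history shortens the CREATED BY column.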
Running a Docker image creates a container. The last CMD instruction in a Dockerfile provides the defaults for an executing container. A Docker container has a writable layer, often called the container layer, and this container layer gives the container its superpower: every addition or modification made inside the container is stored in its container layer. The container layer does not affect the image (the image layers beneath it remain read-only), which keeps all containers isolated from each other. Deleting a container deletes its container layer as well.
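A quick way to observe the container layer is docker diff. It lists the paths a container has added (A), changed (C) or deleted (D) relative to its image. The sketch below assumes the Docker CLI and the public alpine:3.16.2 image. The helper name container_layer_demo and the container name layer-demo are hypothetical.

```shell
# container_layer_demo [IMAGE]  (hypothetical helper; defaults to alpine:3.16.2)
# Start a throwaway container named layer-demo (an assumed, unused name) that
# writes one file, show what its writable container layer recorded, then delete it.
container_layer_demo() {
  image=${1:-alpine:3.16.2}
  # The command runs to completion; the stopped container keeps its layer.
  docker run --name layer-demo "$image" sh -c 'echo hello > /tmp/hello.txt'
  # List paths added (A), changed (C) or deleted (D) compared with the image.
  docker diff layer-demo
  # Removing the container also discards its container layer; the image is untouched.
  docker rm layer-demo
}
```

The change lives only in that container's layer. A fresh container started from the same image therefore has no /tmp/hello.txt.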
Optimization
Let's assume we have a simple Elixir (Phoenix) app named skywalker with the following Dockerfiles.
Sample existing old Dockerfile
# full image
# FROM elixir:1.13.2
# slim image
# FROM elixir:1.13.2-slim
FROM elixir:1.13.2-alpine
ENV MIX_ENV=prod
WORKDIR /app
COPY . .
RUN mix local.hex --force
RUN mix local.rebar --force
RUN mix archive.install hex phx_new 1.6.6 --force
RUN mix deps.get
RUN mix compile
CMD ["mix", "phx.server"]
Optimized new Dockerfile
FROM elixir:1.13.2-alpine as builder
ENV MIX_ENV=prod
RUN mix local.hex --force && \
    mix local.rebar --force
WORKDIR /app
COPY config ./config
COPY lib ./lib
COPY priv ./priv
COPY mix.exs .
COPY mix.lock .
RUN mix deps.get --only prod && \
    mix deps.compile && \
    mix archive.install hex phx_new 1.6.6 --force && \
    mix release

FROM alpine:3.16.2 as runner
RUN apk update && \
    apk add openssl ncurses-libs libgcc libstdc++ && \
    adduser -h /home/app -D skywalker_user
WORKDIR /home/app
COPY --from=builder /app/_build .
RUN chown -R skywalker_user: ./prod
USER skywalker_user
CMD ["./prod/rel/skywalker/bin/skywalker", "start"]
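To measure the effect, you can build both images and compare their sizes. This sketch rests on several assumptions. The old Dockerfile is saved as Dockerfile.old next to the new Dockerfile in the project root. The tags skywalker:old and skywalker:new and the helper name build_and_compare are hypothetical.

```shell
# build_and_compare  (hypothetical helper)
# Assumes Dockerfile.old (single-stage) and Dockerfile (multi-stage) sit in the
# current directory; skywalker:old and skywalker:new are hypothetical tags.
build_and_compare() {
  docker build -f Dockerfile.old -t skywalker:old . || return 1
  docker build -f Dockerfile -t skywalker:new . || return 1
  for tag in skywalker:old skywalker:new; do
    # .Size in docker image inspect is the image size in bytes.
    printf '%s %s bytes\n' "$tag" "$(docker image inspect --format '{{.Size}}' "$tag")"
  done
}
```

To debug the build itself, docker build --target builder -t skywalker:builder . stops after the builder stage of a multi-stage Dockerfile.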
A walkthrough of the Dockerfiles
- Docker guidelines recommend Alpine images because they are tiny (about 5 MB) while still being a complete Linux distribution. The first optimization is to download only the Linux image and the dependencies required at runtime. Both Dockerfiles do this, which is a considerable improvement over the full or slim Elixir images. Execution starts at the FROM statement, which creates a base layer from Elixir's Alpine image (elixir:1.13.2-alpine).
- Next, we set ENV variables. Each ENV instruction creates a new intermediate layer, so its value is available to all subsequent instructions and persists into containers started from the image.
- Only the instructions RUN, COPY and ADD create layers. Other instructions create temporary intermediate images and do not increase the size of the build.
- The WORKDIR instruction sets the working directory for the RUN, CMD, ENTRYPOINT, COPY and ADD instructions that follow it. If the directory does not exist, Docker creates it, even if no subsequent instruction uses it. The paths /app and /home/app in the Dockerfiles are absolute, that is, relative to the root directory.
- In the old Dockerfile, we copy the entire codebase into /app, which keeps files that are not needed at runtime, e.g. README or test files. The new Dockerfile copies only the files and directories needed for the release: config, lib, priv, mix.exs and mix.lock.
- Then we use the RUN instruction to install and compile the dependencies. We use mix release instead of mix compile. Besides compiling the application and its dependencies, it assembles a self-contained release that bundles the Erlang runtime and Elixir with an executable start script (bin/skywalker). The deployment uses this script. Merging multiple RUN instructions with && avoids creating a new layer for each command. In the build output, the merged commands of the runner stage appear as a single step:
=> [runner 2/5] RUN apk update && apk add openssl ncurses-libs libgcc libstdc++ && adduser -h /home/app -D skywalker_user
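- Now comes the power of the multi-stage build, which drastically reduces the size of the final image. The runner stage starts from a plain alpine:3.16.2 image. COPY --from=builder copies only the build output (/app/_build) from the builder stage into the /home/app directory. Everything else we do not want in the final image, such as the Elixir toolchain and the source code, stays behind.
- Docker guidelines recommend avoiding sudo and running as a non-root user when a service does not need root privileges, which limits the impact of potential vulnerabilities in the container at runtime. Here, adduser creates skywalker_user, chown gives it ownership of the copied build, and USER switches to it before CMD starts the release.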
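You can check the built images for the effect of merging RUN instructions and for the switch to a non-root user. This minimal sketch assumes the Docker CLI and the hypothetical tags skywalker:old and skywalker:new. It relies on two docker image inspect template fields. len .RootFS.Layers counts the image's filesystem layers, including those inherited from the base image. .Config.User is the user set by USER, and an empty value means root. The helper names are hypothetical.

```shell
# layer_count IMAGE  (hypothetical helper)
# Number of filesystem layers in IMAGE, base image layers included.
layer_count() {
  docker image inspect --format '{{len .RootFS.Layers}}' "$1"
}

# runtime_user IMAGE  (hypothetical helper)
# User that containers from IMAGE run as; an empty Config.User means root.
runtime_user() {
  user=$(docker image inspect --format '{{.Config.User}}' "$1")
  echo "${user:-root}"
}

# Usage with the hypothetical tags:
#   layer_count skywalker:old; layer_count skywalker:new
#   runtime_user skywalker:new   # skywalker_user, set by USER in the runner stage
```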
Directory comparison between the images built from the two Dockerfiles
- For a complete directory comparison, we create a container from each image and export its filesystem as a tar archive:
docker create --name <IMAGE_NAME> <IMAGE_NAME>:latest
docker export <IMAGE_NAME> > <IMAGE_NAME>.tar
The app directory in the old Dockerfile’s image contains the full codebase along with its dependencies and compiled build. The new Dockerfile’s image has no app directory at all and stores only the compiled release in the home directory (/home/app).
# Old Dockerfile’s image
du -sh app
48M app
du -sh home
0B home
--------------------------------------------------------------------
# New Dockerfile’s image
du -sh app
du: app: No such file or directory
du -sh home
49M home
In the old Dockerfile’s image, the usr/local/lib directory contains the Erlang and Elixir installations. The new Dockerfile’s image saves this space because the runner stage never copies them: the release already bundles the Erlang runtime it needs.
# Old Dockerfile’s image
du -hd1 usr/local/lib
7.2M usr/local/lib/elixir
59M usr/local/lib/erlang
66M usr/local/lib
--------------------------------------------------------------------
# New Dockerfile’s image
du -hd1 usr/local/lib
0B usr/local/lib
The root directory in the old Dockerfile’s image contains the .hex and .mix directories that Hex and Mix create while installing tooling and fetching dependencies. The new Dockerfile’s image leaves them behind in the builder stage.
# Old Dockerfile’s image
du -hd1 root
5.7M root/.hex
2.4M root/.mix
8.1M root
--------------------------------------------------------------------
# New Dockerfile’s image
du -hd1 root
0B root
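The du checks above can be repeated in one pass over the two exported archives. This sketch assumes the archives came from the docker create and docker export commands in the directory comparison, saved under the hypothetical names old.tar and new.tar. It needs only tar and du, not Docker. When run as a non-root user, tar may warn about special files such as device nodes, but regular directories are still extracted and measured. The helper name compare_exports is hypothetical.

```shell
# compare_exports OLD_TAR NEW_TAR DIR...  (hypothetical helper)
# Extract both exported filesystems into a temporary directory and print, for
# each DIR (relative to the filesystem root), "side<TAB>dir<TAB>size".
compare_exports() {
  old_tar=$1
  new_tar=$2
  shift 2
  work=$(mktemp -d) || return 1
  mkdir "$work/old" "$work/new"
  tar -xf "$old_tar" -C "$work/old"
  tar -xf "$new_tar" -C "$work/new"
  for dir in "$@"; do
    for side in old new; do
      if [ -e "$work/$side/$dir" ]; then
        size=$(du -sh "$work/$side/$dir" | cut -f1)
      else
        size=missing
      fi
      printf '%s\t%s\t%s\n' "$side" "$dir" "$size"
    done
  done
  # Make everything writable so read-only directories can be removed.
  chmod -R u+w "$work"
  rm -rf "$work"
}

# Usage (old.tar and new.tar are assumed file names):
#   compare_exports old.tar new.tar app home usr/local/lib root
```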
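Following the recommendations, both Dockerfiles use an Alpine base image, and the new Dockerfile also merges RUN instructions to keep the number of layers small. The multi-stage build adds a builder stage with its own intermediate layers, but only what the runner stage copies from it ends up in the final image, which is what reduces the final image size. The builder's layers remain in the build cache, which can speed up subsequent builds.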