Optimizing Docker images with multi-stage builds

Surya Singh
5 min read · Oct 5, 2022
Photo by Ian Taylor on Unsplash

Introduction

Execution of a Dockerfile creates a Docker image. An image is like a template for creating containers. In a Docker image, every layer, including the last or topmost one, is read-only.

A layer in a Docker image is nothing but the filesystem differences from the previous layer. Each layer is itself an image (docker history <IMAGE NAME> lists them). When we fetch an image using the FROM instruction in a Dockerfile, it is pulled and set as the base layer. Each subsequent instruction in the Dockerfile produces a layer, or a temporary intermediate layer, on top of the previous one. Note that both adding and removing files result in a new layer.
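
For example, the layers of any local image can be listed with docker history; a quick sketch using the Alpine image referenced later in this post:

# Each row is one layer and the size it adds to the image.
docker history alpine:3.16.2
# Show the full, untruncated instruction that produced each layer.
docker history --no-trunc alpine:3.16.2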

Execution of a Docker image creates a container. The last CMD instruction in a Dockerfile provides the defaults for an executing container. A Docker container has a writable layer, often called the container layer, and this layer gives the container its superpower: each container stores any addition or modification it makes in its own container layer. The container layer does not affect the image (whose layers stay read-only), which keeps all containers isolated from one another. Deleting a container deletes its container layer as well.
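
As a quick illustration (the container name and file path are hypothetical), docker diff shows exactly what a container has added or changed in its writable container layer:

# Start a container and create a file inside it; the change lives only in
# the container layer, not in the image.
docker run --name demo -d alpine:3.16.2 sh -c "touch /tmp/hello && sleep 300"
# Inspect the container layer: A = added, C = changed, D = deleted.
docker diff demo
# Removing the container discards its container layer, and the file with it.
docker rm -f demo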

Optimization

Let's assume we have a simple Elixir app named skywalker with the following Dockerfiles.

Sample existing old Dockerfile

# full image
# FROM elixir:1.13.2
# slim image
# FROM elixir:1.13.2-slim
FROM elixir:1.13.2-alpine
ENV MIX_ENV=prod
WORKDIR /app
COPY . .
RUN mix local.hex --force
RUN mix local.rebar --force
RUN mix archive.install hex phx_new 1.6.6 --force
RUN mix deps.get
RUN mix compile
CMD ["mix", "phx.server"]

Optimized new Dockerfile

FROM elixir:1.13.2-alpine as builder
ENV MIX_ENV=prod
RUN mix local.hex --force && \
mix local.rebar --force
WORKDIR /app
COPY config ./config
COPY lib ./lib
COPY priv ./priv
COPY mix.exs .
COPY mix.lock .
RUN mix deps.get --only prod && \
mix deps.compile && \
mix archive.install hex phx_new 1.6.6 --force && \
mix release

FROM alpine:3.16.2 as runner
RUN apk update && \
apk add openssl ncurses-libs libgcc libstdc++ && \
adduser -h /home/app -D skywalker_user
WORKDIR /home/app
COPY --from=builder /app/_build .
RUN chown -R skywalker_user: ./prod
USER skywalker_user
CMD ["./prod/rel/skywalker/bin/skywalker", "start"]

A walkthrough of the Dockerfiles

  • Docker guidelines recommend using Alpine images as they are tiny (about 5 MB) while still being a full Linux distribution. The first optimization is to pull only the base image and the dependencies required at runtime. We do this in both Dockerfiles, which is already a considerable improvement. Execution starts at the FROM instruction, which creates the base layer from Elixir's Alpine image (elixir:1.13.2-alpine).
  • Next, we set ENV variables. ENV statements create a new intermediate layer, which means their values are available to all subsequent layers for reuse.
  • Only the RUN, COPY and ADD instructions create layers. Other instructions produce temporary intermediate layers/images and do not increase the build size.
  • The WORKDIR instruction sets the working directory for subsequent instructions such as RUN, CMD, ENTRYPOINT, COPY and ADD. Docker always creates a default WORKDIR even if the subsequent instructions do not use it. The directory paths /app and /home/app in the Dockerfiles are relative to the root directory.
  • In the old Dockerfile, we copy the entire codebase into /app, which keeps files that are unused at runtime, e.g. the README or test files. In the new Dockerfile, we optimize this by copying only the specific files and directories needed for the release.
  • Then we use the RUN instruction to install and compile the dependencies. We use mix release rather than mix compile: it not only compiles the application together with the required dependencies, Erlang and Elixir, but also creates an executable binary, and the deployment will use this binary file. Merging multiple RUN instructions into one avoids creating extra layers.
  • In the runner stage, we install only the packages the release needs at runtime and create a non-root user. In the build output this step shows up as:
=> [runner 2/5] RUN apk update && apk add openssl ncurses-libs libgcc libstdc++ &&     adduser -h /home/app -D skywalker_user
  • Now comes the power of the multi-stage build, which drastically reduces the size of the final image. We COPY the build created by the builder stage into the /home/app directory, leaving behind everything we do not want in the final image. Since the stages are named, each one can also be built on its own; see the sketch after this list.
  • Docker guidelines recommend avoiding sudo and running as a non-root user, to reduce the risk of vulnerabilities in the daemon and in the container at runtime.
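
Because the builder stage is named, a single stage can be built and inspected on its own with --target. A minimal sketch; the tag skywalker:builder is just an illustrative name:

# Build only the builder stage, e.g. to inspect the compiled release.
docker build --target builder -t skywalker:builder .
# List the release produced by mix release (WORKDIR is /app in that stage).
docker run --rm skywalker:builder ls _build/prod/rel/skywalker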

Directory comparison between the two images

  • For a complete directory comparison, export each image's filesystem to a tarball:
docker create --name <CONTAINER_NAME> <IMAGE_NAME>:latest
docker export <CONTAINER_NAME> > <CONTAINER_NAME>.tar
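The exported tarballs can then be unpacked and inspected with du; a quick sketch, with illustrative file and directory names:
# Unpack each exported root filesystem into its own directory.
mkdir old-rootfs new-rootfs
tar -xf skywalker-old.tar -C old-rootfs
tar -xf skywalker-new.tar -C new-rootfs
# The du commands below are then run from inside each unpacked directory.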
  • The app directory in the old Dockerfile's image contains the source code and compiled artifacts, while the new Dockerfile's image removes it. The new Dockerfile's image stores the compiled binary in the home directory instead.
# Old Dockerfile’s image
du -sh app
48M app
du -sh home
0B home
--------------------------------------------------------------------
# New Dockerfile’s image
du -sh app
du: app: No such file or directory
du -sh home
49M home
  • The usr directory contains Erlang and Elixir under usr/local/lib in the old Dockerfile's image. The new Dockerfile's image reduces the size by leaving them out, since they are not needed at runtime.
# Old Dockerfile’s image
du -hd1 usr/local/lib
7.2M usr/local/lib/elixir
59M usr/local/lib/erlang
66M usr/local/lib
--------------------------------------------------------------------
# New Dockerfile’s image
du -hd1 usr/local/lib
0B usr/local/lib
  • The root directory in the old Dockerfile's image contains hex and mix files. The new Dockerfile's image reduces the size by leaving them out, since they are not needed at runtime.
# Old Dockerfile’s image
du -hd1 root
5.7M root/.hex
2.4M root/.mix
8.1M root
--------------------------------------------------------------------
# New Dockerfile’s image
du -hd1 root
0B root

Following the recommendations, we already have a minimal number of layers and an Alpine image in the old Dockerfile. In the new Dockerfile, we use a multi-stage build, which increases the number of intermediate layers but drastically reduces the final image size, while still letting us leverage the build cache.
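
As a rough illustration of the build cache (commands only; the tag is illustrative): rebuilding without changing the copied files lets Docker reuse the layers it already built.

# The first build populates the layer cache.
docker build -t skywalker:new .
# A rebuild with no changes to the copied files reuses the cached layers;
# with BuildKit, such steps are marked as CACHED in the output.
docker build -t skywalker:new .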
