Tips and tricks for optimizing container builds

How many iterations does it take to get a container configuration just right? And how long does each iteration take? Well, if you answered "too many times and too long," then my experiences are similar to yours. On the surface, creating a configuration file seems like a straightforward exercise: implement the same steps in a configuration file that you would perform if you were installing the system by hand. Unfortunately, I've found that it usually doesn't quite work that way, and a few "tricks" are handy for such DevOps exercises.

In this article, I'll share some techniques I've found that help minimize the number and length of iterations. In addition, I'll outline a few good practices beyond the standard ones.

In the tutorial repository from my earlier article about containerizing build systems, I've added a folder called /tutorial2_docker_tricks with an example covering some of the tricks I'll walk through in this post. If you want to follow along and you have Git installed, you can pull it locally with:

$ git clone https://github.com/ravi-chandran/dockerize-tutorial

The tutorial has been tested with Docker Desktop Edition, although it should work with any compatible Linux container system (like Podman).

Save time on container image build iterations

If the Dockerfile involves downloading and installing a 5GB file, each iteration of docker image build could take a lot of time even with good network speeds. And forgetting to include one item to be installed can mean rebuilding all the layers after that point.

One way around that challenge is to use a local HTTP server to avoid downloading large files from the internet multiple times during docker image build iterations. To illustrate this by example, say you need to create a container image with Anaconda 3 under Ubuntu 18.04. The Anaconda 3 installer is a ~0.5GB file, so this will be the "large" file for this example.

Note that you don't want to use the COPY instruction, as it creates a new layer. You should also delete the large installer after using it to minimize the container image size. You could use multi-stage builds, but I've found the following approach sufficient and quite effective.

The basic idea is to use a Python-based HTTP server locally to serve the large file(s) and have the Dockerfile wget the large file(s) from this local server. Let's explore the details of how to set this up effectively. As a reminder, you can access the full example.
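To see the moving parts in isolation, here is a minimal, self-contained sketch of the idea. The file name, payload, and loopback address are placeholders invented for this demo; the real example binds to the VM's address and serves the actual installer:

```shell
#!/bin/bash
# Minimal sketch of the local-server trick, using a dummy "installer".
# Assumptions for this demo: port 8888 is free and wget is available.
cd "$(mktemp -d)"
mkdir installer
echo "dummy installer payload" > installer/big_file.sh

# Serve the installer/ folder in the background, as build_docker_image.sh does
cd installer
python3 -m http.server --bind 127.0.0.1 8888 &
SERVER_PID=$!
cd ..
sleep 1  # give the server a moment to start

# This wget stands in for the RUN wget inside the Dockerfile
wget -q http://127.0.0.1:8888/big_file.sh -O fetched.sh

# Stop the background server
kill $SERVER_PID
```

During a real build, the wget runs inside docker image build, so the server must bind to an address reachable from the build container (not the loopback address used in this demo).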

The essential contents of the folder tutorial2_docker_tricks/ in this example repository are:

tutorial2_docker_tricks/
├── build_docker_image.sh                   # builds the docker image
├── run_container.sh                        # instantiates a container from the image
├── install_anaconda.dockerfile             # Dockerfile for creating our target docker image
├── .dockerignore                           # used to ignore contents of the installer/ folder from the docker context
├── installer                               # folder with all our large files required for creating the docker image
│   └── Anaconda3-2019.10-Linux-x86_64.sh   # from https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
└── workdir                                 # example folder used as a volume in the running container

The key steps of the approach are:

  • Place the large file(s) in the installer/ folder. In this example, I have the large Anaconda installer file Anaconda3-2019.10-Linux-x86_64.sh. You won't find this file if you clone my Git repository because only you, as the container image creator, need this source file. The end users of the image don't. Download the installer to follow along with the example.
  • Create the .dockerignore file and have it ignore the installer/ folder to avoid Docker copying all the large files into the build context.
  • In a terminal, cd into the tutorial2_docker_tricks/ folder and execute the build script as ./build_docker_image.sh.
  • In build_docker_image.sh, start the Python HTTP server to serve any files from the installer/ folder:

    cd installer
    python3 -m http.server --bind 10.0.2.15 8888 &
    cd ..

  • If you're wondering about the strange internet protocol (IP) address, I'm working with a VirtualBox Linux VM, and 10.0.2.15 shows up as the address of the Ethernet adapter when I run ifconfig. This IP seems to be the convention used by VirtualBox. If your setup is different, you'll need to update this IP address to match your environment and then update build_docker_image.sh and install_anaconda.dockerfile. The server's port number is set to 8888 for this example. Note that the IP and port numbers could be passed in as build arguments, but I've hard-coded them for brevity.
  • Since the HTTP server is set to run in the background, stop the server near the end of the script with the kill -9 command using an elegant approach I found:
    kill -9 `ps -ef | grep http.server | grep 8888 | awk '{print $2}'`
  • Note that this same kill -9 is also used earlier in the script (before starting the HTTP server). In general, when I iterate on any build script that I might deliberately interrupt, this ensures a clean start of the HTTP server every time.
  • In the Dockerfile, there's a RUN wget instruction that downloads the Anaconda installer from the local HTTP server. It also deletes the installer file and cleans up after the installation. Most importantly, all these actions are performed within the same layer to keep the image size to a minimum:

    # install Anaconda by downloading the installer via the local http server
    ARG ANACONDA
    RUN wget --no-proxy http://10.0.2.15:8888/$ANACONDA -O ~/anaconda.sh \
        && /bin/bash ~/anaconda.sh -b -p /opt/conda \
        && rm ~/anaconda.sh \
        && rm -fr /var/lib/apt/lists/{apt,dpkg,cache,log} /tmp/* /var/tmp/*

  • This file runs the wrapper script, anaconda.sh, and cleans up large files by removing them with rm.
  • After the build is complete, you should see an image anaconda_ubuntu1804:v1. (You can list the images with docker image ls.)
  • You can instantiate a container from this image using ./run_container.sh at the terminal while in the folder tutorial2_docker_tricks/. You can verify that Anaconda is installed with:

    $ ./run_container.sh
    $ python --version
    Python 3.7.5
    $ conda --version
    conda 4.8.0
    $ anaconda --version
    anaconda Command line client (version 1.7.2)

  • You'll note that run_container.sh sets up a volume workdir. In this example repository, the folder workdir/ is empty. This is a convention I use to set up a volume where I can have my Python and other scripts that are independent of the container image.
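Putting the steps above together, a skeleton of build_docker_image.sh might look roughly like this. Treat it as a sketch under stated assumptions, not the repository's exact script: the image name and installer file name come from the example above, and the -f/-t flags are how I'd wire them up.

```shell
#!/bin/bash
# Sketch of build_docker_image.sh: clean start, serve installers, build, stop server.

# Kill any HTTP server left over from a previously interrupted run
kill -9 `ps -ef | grep http.server | grep 8888 | awk '{print $2}'` 2>/dev/null

# Serve the large installer file(s) locally in the background
cd installer
python3 -m http.server --bind 10.0.2.15 8888 &
cd ..

# Build the image, passing the installer file name as a build argument
docker image build \
    --build-arg ANACONDA=Anaconda3-2019.10-Linux-x86_64.sh \
    -f install_anaconda.dockerfile \
    -t anaconda_ubuntu1804:v1 .

# Stop the background HTTP server
kill -9 `ps -ef | grep http.server | grep 8888 | awk '{print $2}'`
```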

Minimize container image size

Each RUN command is equivalent to executing a new shell, and each RUN command creates a layer. The naive approach of mimicking installation instructions with separate RUN commands may eventually break at one or more interdependent steps. If it happens to work, it will typically result in a larger image. Chaining multiple installation steps in one RUN command and including the autoremove, autoclean, and rm commands (as in the example below) is useful to minimize the size of each layer. Some of these steps may not be needed, depending on what's being installed. However, since these steps take an insignificant amount of time, I always throw them in for good measure at the end of RUN commands invoking apt-get:

RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive \
       apt-get -y --quiet --no-install-recommends install \
       # list of packages being installed goes here \
    && apt-get -y autoremove \
    && apt-get clean autoclean \
    && rm -fr /var/lib/apt/lists/{apt,dpkg,cache,log} /tmp/* /var/tmp/*

Also, ensure that you have a .dockerignore file in place to ignore items that don't need to be sent to the Docker build context (such as the Anaconda installer file in the earlier example).
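For this example, the .dockerignore file can be as simple as a single pattern excluding the installer folder (the exact contents of the repository's file may differ slightly):

    installer/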

For software build systems, the build inputs and outputs (all the scripts that configure and invoke the tools) should be outside the image and the eventually running container. The container itself should remain stateless so that different users will have identical results with it. I covered this extensively in my previous article but wanted to emphasize it because it's been a useful convention for my work. These inputs and outputs are best accessed by setting up container volumes.
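Setting up such a volume is a one-liner when instantiating the container. As a sketch (the mount point inside the container is an assumption for illustration; the image name is from the earlier example), run_container.sh does something along these lines:

```shell
# Mount the host's workdir/ into the container so scripts and build
# outputs live outside the (stateless, disposable) container
docker container run --rm -it \
    -v "$(pwd)/workdir":/home/user/workdir \
    anaconda_ubuntu1804:v1
```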

I once had to use a container image that provided data in the form of source code and large pre-built binaries. As a software developer, I was expected to edit the code in the container. This was problematic, because containers are by default stateless: they don't save data within the container, because they're designed to be disposable. But I worked on it, and at the end of each day, I stopped the container and had to be careful not to remove it, because the state had to be maintained so I could continue work the next day. The drawback of this approach was that there would be a divergence of development state had there been more than one person working on the project. The value of having identical build systems across developers is somewhat lost with this approach.

Generate output as a non-root user

An important aspect of I/O concerns the ownership of the output files generated when running the tools in the container. By default, since Docker runs as root, the output files would be owned by root, which is unpleasant. You typically want to work as a non-root user. Changing the ownership after the build output is generated can be done with scripts, but it's an additional and unnecessary step. It's best to set the USER argument in the Dockerfile at the earliest point possible:

ARG USERNAME
# other commands...
USER ${USERNAME}

The USERNAME can be passed in as a build argument (--build-arg) when executing the docker image build. You can see an example of this in the example Dockerfile and corresponding build script.
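For instance, the build script can pass your own user name so that files created in the container match your identity on the host. This is a sketch (the Dockerfile would also need corresponding user-creation logic, which isn't shown here):

```shell
# Pass the host user name into the build so USER ${USERNAME} resolves to it
docker image build \
    --build-arg USERNAME="$(id -un)" \
    -f install_anaconda.dockerfile \
    -t anaconda_ubuntu1804:v1 .
```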

Some parts of the tools may also need to be installed as a non-root user. So the sequence of installations in the Dockerfile may need to be different from the way it's done if you are installing manually and directly under Linux.

Non-interactive installation

Interactivity is the opposite of container automation. I've found the

DEBIAN_FRONTEND=noninteractive apt-get -y --quiet --no-install-recommends

options for the apt-get install instruction (as in the example above) necessary to prevent the installer from opening dialog boxes. Note that these options should be used as part of the RUN instruction. The DEBIAN_FRONTEND=noninteractive should not be set as an environment variable (ENV) in the Dockerfile, as this FAQ explains, as it will be inherited by the containers.

Log your build and run output

Debugging why a build failed is a common task, and logs are a great way to do that. Save a typescript of everything that happened during the container image build or container run session using the tee utility in a Bash script. In other words, add |& tee $BASH_SOURCE.log to the end of the docker image build and the docker image run commands in your scripts. See the examples in the image build and container run scripts.

What this tee-ing technique does is generate a file with the same name as the Bash script but with a .log extension appended to it so that you know which script it originated from. Everything you see printed to the terminal when running the script gets logged to this file with a similar name.
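Here is a tiny runnable illustration of the pattern, with echo standing in for the docker commands (the build step text is made up for the demo):

```shell
#!/bin/bash
# Everything the command prints goes both to the terminal and to a log
# file named after this script ("<scriptname>.log").
LOGFILE="${BASH_SOURCE:-demo}.log"   # fallback name if not run as a script
echo "Step 1/5 : FROM ubuntu:18.04" |& tee "$LOGFILE"
```

In a real script, the echo line would instead be something like docker image build ... |& tee $BASH_SOURCE.log, so both stdout and stderr of the build are captured.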

This is especially valuable for users of your container images to report issues to you when something doesn't work. You can ask them to send you the log file to help diagnose the problem. Many tools generate so much output that it easily overwhelms the default size of the terminal's buffer. Relying only on the terminal's buffer capacity to copy-paste error messages may not be sufficient for diagnosing issues because earlier errors may have been lost.

I've found this to be useful, even in the container image-building scripts, especially when using the Python-based HTTP server discussed above. The server generates so many lines during a download that it typically overwhelms the terminal's buffer.

Deal with proxies elegantly

In my work environment, proxies are required to reach the internet for downloading the resources in RUN apt-get and RUN wget commands. The proxies are typically inferred from the environment variables http_proxy or https_proxy. While ENV commands can be used to hard-code such proxy settings in the Dockerfile, there are several issues with using ENV for proxies directly.

If you're the only one who will ever build the container, then perhaps this will work. But the Dockerfile couldn't be used by someone else at a different location with a different proxy setting. Another issue is that the IT department could change the proxy at some point, resulting in a Dockerfile that won't work any longer. Furthermore, the Dockerfile is a precise document specifying a configuration-controlled system, and every change will be scrutinized by quality assurance.

One simple approach to avoid hard-coding the proxy is to pass your local proxy setting as a build argument in the docker image build command:

docker image build \
    --build-arg MY_PROXY=http://my_local_proxy.proxy.com:xx

And then, in the Dockerfile, set the environment variables based on the build argument. In the example shown here, you can still set a default proxy value that can be overridden by the build argument above:

# set a default proxy
ARG MY_PROXY=http://my_default_proxy.proxy.com:nn/
ENV http_proxy=$MY_PROXY
ENV https_proxy=$MY_PROXY

Summary

These techniques have helped me significantly reduce the time it takes to create container images and debug them when they go wrong. I continue to be on the lookout for additional best practices to add to my list. I hope you find the above techniques useful.
