Bring PDFtk back to life in a container

A colleague recently told me about one of his favourite utilities, PDFtk. Among other things, it lets you merge, split, and burst PDF documents, with or without encryption. You can learn more about it in this Opensource.com article.

Unfortunately, PDFtk was last packaged in Fedora 20 because of its build requirements. While numerous alternatives are available, you may still want to use PDFtk. Fortunately, there’s a simple solution: just package it in a container and run it on a newer Fedora version.

Rather than reinventing the wheel, I did some quick research and found a GitHub repository with a README and a Dockerfile to build such a container.

First, make sure your Docker environment is configured by installing, enabling, and starting the Docker service:

$ sudo dnf install -y docker
$ sudo systemctl enable docker
$ sudo systemctl start docker
$ sudo systemctl status docker
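
If you want a quick scripted check in addition to the status output, something like the following sketch may help. The docker_ready function is a hypothetical helper of mine, not part of Docker. It relies only on the docker info command, which fails when the daemon cannot be reached. On a default Fedora setup you will probably need to run it as root or through sudo.

```shell
#!/bin/sh
# docker_ready is a hypothetical helper: it succeeds only if the docker
# client is installed and can reach a running daemon.
docker_ready() {
    command -v docker >/dev/null 2>&1 || return 1
    docker info >/dev/null 2>&1
}

if docker_ready; then
    echo "Docker is installed and the daemon is reachable"
else
    echo "Docker is missing or the daemon is not reachable"
fi
```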

The README file shows how to define an alias to run the container:

# alias pdftk='docker run -it --privileged -v $PWD:/workdir -w /workdir/ /pdftk'

Note that the --privileged option grants the container extended privileges, and the container runs as root by default. While this allows the container to access the files we’re working with (using the -v option), it also causes new files to be owned by root. Running a container as root isn’t a security best practice. Before I get into that, revise the Docker build configuration as follows:

$ cat Dockerfile
# Container build for pdftk (last packaged in Fedora 20)

FROM       fedora:20
MAINTAINER [email protected]

# Update and install pdftk

RUN yum update -y && \
    yum install -y pdftk && \
    yum clean all

# Working directory

WORKDIR /workdir

# Set pdftk as our entry point

ENTRYPOINT ["/usr/bin/pdftk"]

CMD ["--help"]

This starts by pulling down the official Fedora 20 image. Change the MAINTAINER email address to your own if you like. Next, it updates all packages, installs PDFtk, and removes cached files. Putting these three separate commands in a single RUN instruction creates only one additional layer in the image instead of three.
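
To make the layer point concrete, here is a small sketch that counts the RUN instructions in two variants of the build file. The count_run_instructions helper and the temporary file names are my own illustration, and the sketch assumes instructions are written in uppercase, as is conventional.

```shell
#!/bin/sh
# count_run_instructions is a hypothetical helper: it counts the RUN
# instructions in a Dockerfile, each of which adds one image layer.
count_run_instructions() {
    grep -c '^[[:space:]]*RUN[[:space:]]' "$1"
}

tmpdir=$(mktemp -d)

# One chained RUN, as in the Dockerfile above.
cat > "$tmpdir/chained" <<'EOF'
FROM fedora:20
RUN yum update -y && \
    yum install -y pdftk && \
    yum clean all
EOF

# The same steps written as three separate RUN instructions.
cat > "$tmpdir/separate" <<'EOF'
FROM fedora:20
RUN yum update -y
RUN yum install -y pdftk
RUN yum clean all
EOF

count_run_instructions "$tmpdir/chained"    # prints 1
count_run_instructions "$tmpdir/separate"   # prints 3

rm -rf "$tmpdir"
```

Chaining with && also means the build stops at the first command that fails.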

The WORKDIR keyword defines the working directory inside the container where new files will be created.

Finally, it sets the PDFtk binary as the entry point to the container, with CMD providing the --help option should no arguments be passed to the container.
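
The way ENTRYPOINT and CMD combine can be modelled in a few lines of shell. This is only a sketch of the behaviour, not how Docker is implemented, and effective_command is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch of how ENTRYPOINT and CMD combine for this image.
# effective_command is a hypothetical helper, not part of Docker.
effective_command() {
    entrypoint="/usr/bin/pdftk"   # ENTRYPOINT from the Dockerfile
    default_args="--help"         # CMD from the Dockerfile

    if [ "$#" -eq 0 ]; then
        # No arguments after the image name: CMD supplies the defaults.
        echo "$entrypoint $default_args"
    else
        # Arguments after the image name replace CMD entirely.
        echo "$entrypoint $*"
    fi
}

effective_command                  # prints /usr/bin/pdftk --help
effective_command in.pdf burst     # prints /usr/bin/pdftk in.pdf burst
```

So running the image with no arguments shows the PDFtk help text, while any arguments you supply go straight to the pdftk binary.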

You can now build the container with your revised Dockerfile as follows:

$ sudo docker build -t fedora/pdftk .

and examine the new image:

$ sudo docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
fedora/pdftk        latest              f2eaa35d31c8        3 seconds ago       595 MB
docker.io/fedora    20                  ba74bddb630e        20 months ago       291 MB

To run the PDFtk container in the same way as the standalone binary, use the following wrapper script:

$ cat ~/bin/pdftk
#!/bin/bash

# Run the pdftk container and pass all arguments given to this script to the container:
#
#       --rm remove instantiated container after execution
#        -u  run with current UID/GID to give new files correct ownership
#        -v  attach current working directory to /workdir inside the container
#            and modify SELinux security context ("z") to allow container access to files

sudo docker run                    \
        --rm                       \
         -u "$(id -u):$(id -g)"    \
         -v "$PWD":/workdir:z      \
        fedora/pdftk "$@"

# Files will now have SELinux type container_file_t so we need to restore context:

restorecon "$PWD"/*.pdf

exit

Instead of running the container as root with the --privileged option, this script passes your user identifier (UID) and group identifier (GID), allowing new files to have the correct owner and group. Your current working directory is mapped to /workdir inside the container, and appending :z to the internal directory allows the SELinux context to be modified so that the container can access your current working directory. This, of course, assumes you have SELinux enabled. If you disable SELinux, you’ll make Dan Walsh unhappy; Dan is a nice guy, so please don’t do it.
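
If you are curious what the wrapper actually hands to Docker, a dry run can print the command instead of executing it. The print_pdftk_command function below is a hypothetical helper for illustration only. It never calls Docker, so it is safe to run anywhere.

```shell
#!/bin/sh
# Dry-run sketch: print the docker command the wrapper would run,
# without invoking Docker. print_pdftk_command is a hypothetical helper.
print_pdftk_command() {
    # id -u and id -g expand to the numeric UID and GID of the caller;
    # $PWD expands to the directory that will be mounted at /workdir.
    echo "docker run --rm -u $(id -u):$(id -g) -v $PWD:/workdir:z fedora/pdftk $*"
}

print_pdftk_command A=pdf1.pdf B=pdf2.pdf cat A B output pdf12.pdf
```

The numeric UID:GID pair in the output is what gives files created in /workdir your ownership rather than root’s.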

After the container terminates, the SELinux context is restored from container_file_t to user_home_t (assuming you’re in your home folder structure). While you can still access the files with the new context, using restorecon will tidy things up.
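
To see which type a file currently carries, you can pull the type field out of its SELinux context. In this sketch, context_type is a hypothetical helper and the two context strings are illustrative examples, not output captured from a real system.

```shell
#!/bin/sh
# context_type is a hypothetical helper: it extracts the type field
# (the third colon-separated field) from an SELinux context string.
context_type() {
    echo "$1" | cut -d: -f3
}

# Illustrative context strings before and after restorecon:
context_type "system_u:object_r:container_file_t:s0"    # prints container_file_t
context_type "unconfined_u:object_r:user_home_t:s0"     # prints user_home_t

# With GNU stat on an SELinux-enabled host, %C prints a file's context:
#   context_type "$(stat -c %C pdf12.pdf)"
```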

With the container built and the wrapper script in place, you can now run PDFtk as you did before.

For example:

$ pdftk A=pdf1.pdf B=pdf2.pdf cat A B output pdf12.pdf

Bringing older applications back to life is a great use case for containers. What other problems have you solved using containers? Let me know in the comment section below.
