Create bookmarks for your PDF with pdftk

In introducing pdftk-java, I explained how I use the pdftk-java command to make quick, often scripted, modifications to PDF files.

However, one of the things pdftk-java is most useful for is when I’ve downloaded a big PDF file, sometimes with hundreds of pages of reference text, and discovered that the PDF creator didn’t include a table of contents. I don’t mean a printed table of contents in the front matter of the book; I mean the table of contents you get down the side of your PDF reader, which the PDF format officially calls “bookmarks.”

Without bookmarks, finding the chapter you need to reference is cumbersome and involves either lots of scrolling or frustrating searches for words you think you remember seeing in the general area.

Another minor annoyance of many PDF files is the lack of metadata, such as a proper title and author in the PDF properties. If you’ve ever opened up a PDF and seen something generic like “Microsoft Word - 04_Classics_Revisited.docx” in the window title bar, you know this issue.

I don’t have to deal with this problem anymore because I have pdftk-java, which lets me create my own bookmarks.

Install pdftk-java on Linux

As its name suggests, pdftk-java is written in Java, so it works on all major operating systems as long as you have Java installed.

Linux and macOS users can install Java from AdoptOpenJDK.net.

Windows users can install Red Hat’s Windows build of OpenJDK.

To install pdftk-java on Linux:

  1. Download the pdftk-all.jar release from its GitLab repository and save it to ~/.local/bin/ or some other location in your path.
  2. Open ~/.bashrc in your favorite text editor and add this line to it: alias pdftk='java -jar $HOME/.local/bin/pdftk-all.jar'
  3. Load your new Bash settings: source ~/.bashrc

Data dump

The first step in correcting the metadata of a PDF is to extract the data file that the PDF currently contains.

There’s probably not much to the data file (that’s the problem!), but it gives you a good starting place.

$ pdftk mybigfile.pdf \
dump_data \
output bookmarks.txt

This produces a file called bookmarks.txt, and it contains all the metadata assigned to the input file (in this example, mybigfile.pdf), plus a lot of bloat.

Editing metadata

To edit the metadata of the PDF, open your bookmarks.txt file in your favorite text editor, such as Atom or Gedit.

The format is mostly intuitive, and the data contained within it is predictably neglected:

InfoBegin
InfoKey: Creator
InfoValue: Word
InfoBegin
InfoKey: ModDate
InfoValue: D:20151221203353Z00'00'
InfoBegin
InfoKey: CreationDate
InfoValue: D:20151221203353Z00'00'
InfoBegin
InfoKey: Producer
InfoValue: Mac OS X 10.10.4 Quartz PDFContext
InfoBegin
InfoKey: Title
InfoValue: Microsoft Word - 04_UA_Classics_Revisited.docx
PdfID0: f049e63eaf3b4061ddad16b455ca780f
PdfID1: f049e63eaf3b4061ddad16b455ca780f
NumberOfPages: 42
PageMediaBegin
PageMediaNumber: 1
PageMediaRotation: 0
PageMediaRect: 0 0 612 792
PageMediaDimensions: 612 792
[...]

You can edit InfoValue fields to contain data that makes sense for the PDF you’re repairing. For instance, instead of setting the Creator key to the value Word, you could set it to the actual author’s name or the publishing house releasing the PDF file. Rather than giving the document the default export string of the application that produced it, give it the book’s actual title.
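If you would rather script this edit than open an editor, the sketch below shows one possible approach. The function name set_info is hypothetical (it is not part of pdftk), and it assumes each InfoKey line is immediately followed by its InfoValue line, as in the dump shown above.

```shell
# set_info FILE KEY VALUE
# Hypothetical helper, not a pdftk feature: prints FILE with the InfoValue
# line that directly follows "InfoKey: KEY" replaced by VALUE.
# Caveat: awk -v interprets backslash escapes, so avoid backslashes in VALUE.
set_info() {
    awk -v key="$2" -v val="$3" '
        $0 == ("InfoKey: " key) { print; hit = 1; next }
        hit && /^InfoValue:/    { print "InfoValue: " val; hit = 0; next }
                                { hit = 0; print }
    ' "$1"
}
```

For example, set_info bookmarks.txt Title "Classics Revisited" > edited.txt writes a corrected copy and leaves the original dump untouched.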

There’s also some cleanup work you can do. Everything below the NumberOfPages line is unnecessary, so remove those lines.
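The cleanup can also be done from the command line. This is a minimal sketch; trim_dump is a hypothetical helper name, and it assumes the dump holds no existing bookmark entries you want to keep, because anything after the NumberOfPages line is dropped.

```shell
# trim_dump FILE
# Hypothetical helper: prints FILE up to and including the NumberOfPages
# line, then stops, so the PageMedia records that follow are discarded.
trim_dump() {
    sed '/^NumberOfPages:/q' "$1"
}
```

Redirect the result to a new file (for instance, trim_dump bookmarks.txt > trimmed.txt) rather than back onto the file being read.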

Adding bookmarks

PDF bookmarks follow this format:

BookmarkBegin
BookmarkTitle: My first bookmark
BookmarkLevel: 1
BookmarkPageNumber: 2

  • BookmarkBegin indicates that you’re creating a new bookmark.
  • BookmarkTitle contains the text that’s visible in the PDF viewer.
  • BookmarkLevel sets the inheritance level of this bookmark. If you set a BookmarkLevel to 2, it appears within a disclosure triangle of the previous bookmark. If you set a BookmarkLevel to 3, it appears within a disclosure triangle of the previous bookmark, as long as the previous bookmark is set to 2. This setting gives you the ability to bookmark, for example, a chapter title as well as section headings within that chapter.
  • BookmarkPageNumber determines what PDF page the user is taken to when they click the bookmark.

Create bookmarks for each section of the book that you think is important, then save the file.
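For a long book, typing every four-line record by hand gets tedious. As a sketch, the hypothetical helper below (make_bookmarks, not part of pdftk) turns a compact outline of level|page|title lines into bookmark records in the format described above. The outline format itself is an assumption made for this example.

```shell
# make_bookmarks
# Hypothetical helper: reads "level|page|title" lines on standard input
# and prints one bookmark record per line.
make_bookmarks() {
    while IFS='|' read -r level page title; do
        # Skip blank lines in the outline.
        [ -n "$level" ] || continue
        printf 'BookmarkBegin\n'
        printf 'BookmarkTitle: %s\n' "$title"
        printf 'BookmarkLevel: %s\n' "$level"
        printf 'BookmarkPageNumber: %s\n' "$page"
    done
}
```

Given an outline file containing a line such as 1|2|My first bookmark, running make_bookmarks < outline.txt >> bookmarks.txt appends the generated records to your metadata file.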

Updating bookmark information

Now that you have your metadata and bookmarks set, you can apply them to your PDF. Actually, you’ll apply them to a new PDF that contains the same content as the old PDF:

$ pdftk mybigfile.pdf \
update_info bookmarks.txt \
output mynewfile.pdf

This produces a file called mynewfile.pdf, containing all of your metadata and bookmarks.
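To check the result without opening a PDF viewer, you can dump the new file’s data again and print its bookmarks as an indented outline. This is only a sketch: show_outline is a hypothetical helper, and it assumes each record lists BookmarkTitle, BookmarkLevel, and BookmarkPageNumber in that order, as in the format shown earlier.

```shell
# show_outline
# Hypothetical helper: reads pdftk metadata on standard input and prints
# each bookmark title, indented two spaces per level, with its page number.
show_outline() {
    awk -F': ' '
        /^BookmarkTitle:/ { title = substr($0, index($0, ": ") + 2) }
        /^BookmarkLevel:/ { level = $2 }
        /^BookmarkPageNumber:/ {
            indent = ""
            for (i = 1; i < level; i++) indent = indent "  "
            print indent title " (page " $2 ")"
        }
    '
}
```

With pdftk installed, pdftk mynewfile.pdf dump_data | show_outline prints the outline to your terminal.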

Professional publishing

The difference between a PDF with generic metadata and no bookmarks and a PDF with custom metadata values and useful bookmarks probably isn’t going to make or break a sale.

However, paying attention to small details like metadata shows that you value quality assurance, and providing bookmarks to your users is helpful and takes advantage of the technology available.

Use pdftk-java to make this process easy, and your users will thank you.
