Recently, I discovered that my great-great-grandfather wrote two books near the turn of the 20th century: one about sailing and the other about his career as New York City’s fire chief. The books have a niche audience, but since they’re part of my family history, I wanted to preserve a digital copy of each. But, I wondered, what portable document format is best suited to such an endeavor?
I decided early on that PDF was not an option. The format, while great for print preflight, seems condemned to nonstop feature bloat, and it produces documents that are difficult to introspect and edit. I wanted a simpler format with similar features. Two came to mind: comic book archive and DjVu.
Comic book archive
Comic book archive is a simple format most often used, as the name suggests, for comic books. You can see examples of comic book archives on sites like Comic Book Plus and The Digital Comic Museum.
The greatest feature of a comic book archive is also its weakest: it’s so simple, it’s almost more of a convention than a format. In fact, a comic book archive is just a ZIP, TAR, 7Z, or RAR archive given the extension .cbz, .cbt, .cb7, or .cbr, respectively. It has no standard for storing metadata.
They are, however, very easy to create.
Creating comic book archives
- Create a directory full of image files, and rename the images so that they have an inherent order:
$ n=0 && for i in *.png ; do mv "$i" "$(printf %04d $n).png" ; n=$((n+1)) ; done
- Archive the files using your favorite archive tool. In my experience, CBZ is the best supported.
$ zip comicbook.zip -r *.png
- Finally, rename the file with the appropriate extension.
$ mv comicbook.zip comicbook.cbz
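The three steps above can be wrapped in a pair of shell functions. This is a minimal sketch, assuming bash and the Info-ZIP zip command; the names stage_pages and make_cbz are hypothetical. Instead of renaming files in place, it copies them into a staging directory, so a new name such as 0001.png can never overwrite an original file that happens to share it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for building a .cbz file; assumes bash and Info-ZIP zip.

# stage_pages SRC DEST: copy SRC/*.png into DEST as 0000.png, 0001.png, ...
# in the order the shell glob lists them, so the page order survives archiving.
stage_pages() {
    local src=$1 dest=$2 n=0 img
    mkdir -p "$dest" || return 1
    for img in "$src"/*.png; do
        [ -e "$img" ] || continue          # no PNGs: the glob stays literal
        cp -- "$img" "$dest/$(printf '%04d' "$n").png" || return 1
        n=$((n + 1))
    done
}

# make_cbz SRC OUT.cbz: stage the pages in a temporary directory, zip them,
# and move the archive to its .cbz name.
make_cbz() {
    local src=$1 out=$2 tmp status
    tmp=$(mktemp -d) || return 1
    stage_pages "$src" "$tmp/pages" &&
        (cd "$tmp/pages" && zip -q "$tmp/book.zip" *.png) &&
        mv -- "$tmp/book.zip" "$out"
    status=$?
    rm -rf -- "$tmp"
    return "$status"
}
```

For example, make_cbz scans comicbook.cbz would package every PNG in a hypothetical scans directory, leaving the original files untouched.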
Uncompressing comic book archives
Getting your data back out of a comic book archive is also easy: just unarchive the CBZ file.
Since your favorite archive tool may not recognize the .cbz extension as a valid archive, it’s best to rename it back to its native extension:
$ mv comicbook.cbz comicbook.zip
$ unzip comicbook.zip
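Because the extension only records which archiver produced the file, you can also skip the rename and pick the extraction tool from the extension. A sketch, assuming bash 4 or later and that unzip, tar, 7z (from p7zip), and unrar are installed for whichever formats you actually use; extract_comic is a hypothetical name. The unzip command does not require a .zip extension, so it reads a .cbz file as-is.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract a comic book archive into a directory, choosing
# the tool from the extension (.cbz ZIP, .cbt TAR, .cb7 7Z, .cbr RAR).
# usage: extract_comic ARCHIVE [DEST]
extract_comic() {
    local archive=$1 dest=${2:-.}
    mkdir -p -- "$dest" || return 1
    case ${archive,,} in                     # lowercase, so .CBZ matches too
        *.cbz) unzip -q "$archive" -d "$dest" ;;
        *.cbt) tar -xf "$archive" -C "$dest" ;;
        *.cb7) 7z x "$archive" -o"$dest" ;;
        *.cbr) unrar x "$archive" "$dest/" ;;
        *) echo "extract_comic: unrecognized extension: $archive" >&2
           return 2 ;;
    esac
}
```

For example, extract_comic comicbook.cbz pages unpacks the archive from the previous step into a directory named pages.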
DjVu
A more advanced format, developed more than 20 years ago by AT&T, is DjVu (pronounced “déjà vu”). It’s a digital document format with advanced compression technology and is viewable in more applications than you probably realize, including Evince, Okular, DjVu.js online, the DjVu.js viewer Firefox extension, GNU Emacs, Document Viewer on Android, and the open source, cross-platform DjView viewer on Sourceforge.
You can read more about DjVu and find sample .djvu files at djvu.org.
DjVu has several appealing features, including image compression, outline (bookmark) structure, and support for embedded text. It’s easy to introspect and edit using free and open source tools.
The open source toolchain is DjVuLibre, which you can find in your distribution’s software repository. For example, on Fedora:
$ sudo dnf install djvulibre
Creating a DjVu file
A .djvu is an image that has been encoded as a DjVu file. A .djvu can contain one or more images (stored as “pages”).
To produce a DjVu manually, you can use one of two encoders: c44 for high-quality images or cjb2 for simple bi-tonal images. Each encoder accepts different image formats: c44 can process .pnm or .jpeg files, while cjb2 can process .pbm or .tiff images.
If you need to preprocess an image, you can do that in a terminal with ImageMagick, using the -density option to define your desired resolution:
$ convert -density 200 foo.png foo.pnm
Then you can convert it to DjVu:
$ c44 -dpi 200 foo.pnm foo.djvu
If your image is simple, like black text on a white page, you can try to convert it using the simpler encoder. If necessary, use ImageMagick first to convert it to a compatible intermediate format:
$ convert -density 200 foo.png foo.pbm
And then convert it to DjVu:
$ cjb2 -dpi 200 foo.pbm foo.djvu
You now have a simple, single-page .djvu document.
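Both paths follow the same two steps, so they can share one function. This is a sketch, assuming bash, ImageMagick’s convert, DjVuLibre’s c44 and cjb2, and GNU mktemp (for --suffix); page_to_djvu and its photo/bitonal mode argument are hypothetical. The mode picks the intermediate format and the encoder exactly as in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: convert IMAGE to a single-page DjVu.
# "photo" goes through .pnm and c44; "bitonal" goes through .pbm and cjb2.
# usage: page_to_djvu IMAGE OUT.djvu [DPI] [photo|bitonal]
page_to_djvu() {
    local img=$1 out=$2 dpi=${3:-200} mode=${4:-photo}
    local ext encoder tmp status
    case $mode in
        photo)   ext=pnm encoder=c44 ;;
        bitonal) ext=pbm encoder=cjb2 ;;
        *) echo "page_to_djvu: mode must be photo or bitonal" >&2; return 2 ;;
    esac
    # The suffix tells convert which intermediate format to write.
    tmp=$(mktemp --suffix=".$ext") || return 1
    convert -density "$dpi" "$img" "$tmp" &&
        "$encoder" -dpi "$dpi" "$tmp" "$out"
    status=$?
    rm -f -- "$tmp"
    return "$status"
}
```

For example, page_to_djvu foo.png foo.djvu 200 bitonal is equivalent to the convert and cjb2 pair shown above, without leaving foo.pbm behind.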
Creating a multi-page DjVu file
While a single-page DjVu can be useful, given DjVu’s often excellent compression, it’s most commonly used as a multi-page format.
Assuming you have a directory of many .djvu files, you can bundle them together with the djvm command, giving the output file first and the pages after it:
$ djvm -c mybook.djvu pg_1.djvu two.djvu 003.djvu
Unlike in a CBZ archive, the names of the bundled images have no effect on their order in the DjVu document; djvm preserves the order you provide on the command line. If you had the foresight to name them in a natural sort order (001.djvu, 002.djvu, 003.djvu, 004.djvu, and so on), you can use a wildcard:
$ djvm -c mybook.djvu [0-9]*.djvu
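If your pages are numbered but not zero-padded (1.djvu, 2.djvu, 10.djvu), a plain wildcard sorts them as text, which puts 10.djvu before 2.djvu. One workaround is to sort the names numerically before handing them to djvm. This sketch assumes bash 4 or later (for mapfile) and GNU sort (for -V, version sort); bundle_pages is a hypothetical name, and file names must not contain newlines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bundle every *.djvu page in DIR into OUT, ordered by
# the numbers in their names (1, 2, ..., 10) instead of plain text order.
# usage: bundle_pages OUT.djvu [DIR]
bundle_pages() {
    local out=$1 dir=${2:-.} f
    local -a pages=()
    for f in "$dir"/*.djvu; do
        # Skip the literal glob (no matches) and the output file itself.
        if [ -e "$f" ] && ! [ "$f" -ef "$out" ]; then
            pages+=("$f")
        fi
    done
    if [ "${#pages[@]}" -eq 0 ]; then
        echo "bundle_pages: no .djvu pages in $dir" >&2
        return 1
    fi
    mapfile -t pages < <(printf '%s\n' "${pages[@]}" | sort -V)
    djvm -c "$out" "${pages[@]}"
}
```

Because the output file is excluded by identity (-ef) rather than by name, rerunning bundle_pages mybook.djvu in the same directory does not feed the previous bundle back in as a page.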
Manipulating a DjVu document
It’s easy to edit DjVu documents with djvm. For instance, you can insert a page into an existing DjVu document:
$ djvm -i mybook.djvu newpage.djvu 2
In this example, the page newpage.djvu becomes the new page 2 in the file mybook.djvu.
You can also delete a page. For example, to delete page 4 from mybook.djvu:
$ djvm -d mybook.djvu 4
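Combining the two commands gives you a way to replace a page, for instance after rescanning it. A minimal sketch, assuming bash and djvm; replace_page is a hypothetical name. It inserts first, so the new page takes position N and the old one shifts to N+1, and then deletes that shifted page; this way N always refers to an existing page position.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace page N of BOOK with NEWPAGE using djvm.
# usage: replace_page BOOK.djvu NEWPAGE.djvu N
replace_page() {
    local book=$1 page=$2 n=$3
    case $n in
        ''|*[!0-9]*) echo "replace_page: N must be a positive integer" >&2; return 2 ;;
    esac
    n=$((10#$n))                     # force base 10 so "08" is not read as octal
    if [ "$n" -lt 1 ]; then
        echo "replace_page: N must be a positive integer" >&2; return 2
    fi
    # Insert the new page at N (old page N moves to N+1), then delete N+1.
    djvm -i "$book" "$page" "$n" && djvm -d "$book" "$((n + 1))"
}
```

For example, replace_page mybook.djvu newpage.djvu 4 swaps a corrected scan into page 4 of mybook.djvu.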
Setting an outline
You can add metadata to a DjVu file, such as an outline (sometimes called “bookmarks”). To do this manually, create a plaintext file with the document’s outline. A DjVu outline is expressed in a Lisp-like structure, with an opening bookmarks element followed by bookmark names and page numbers:
(bookmarks
 ("Front cover" "#1")
 ("Chapter 1" "#3")
 ("Chapter 2" "#18")
 ("Chapter 3" "#26")
)
The parentheses define levels in the outline. The outline currently has only top-level bookmarks, but any section can have a subsection by delaying its closing parenthesis. For example, to add a subsection to Chapter 1:
(bookmarks
 ("Front cover" "#1")
 ("Chapter 1" "#3"
   ("Section 1" "#6"))
 ("Chapter 2" "#18")
 ("Chapter 3" "#26")
)
Once the outline is complete, save the file and apply it to your DjVu file using the djvused command:
$ djvused -e 'set-outline outline.txt' -s mybook.djvu
Open the DjVu file to see the outline.
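Typing the parentheses by hand gets tedious for a long book. The sketch below, assuming bash and a POSIX awk, turns a hypothetical two-column, tab-separated table of contents (title, then page number) into a flat, top-level outline in the (bookmarks ...) form that djvused’s set-outline reads; nested sections would still need hand editing. make_outline is a hypothetical name. It escapes backslashes and double quotes in titles, assuming djvused strings use C-style backslash escapes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "Title<TAB>page" lines on stdin and print a flat
# (top-level only) DjVu outline suitable for djvused's set-outline command.
make_outline() {
    awk -F '\t' '
        BEGIN { print "(bookmarks" }
        NF >= 2 {
            title = $1
            gsub(/[\\"]/, "\\\\&", title)   # prefix each \ or " with a backslash
            printf " (\"%s\" \"#%d\")\n", title, $2
        }
        END { print ")" }
    '
}
```

A possible workflow: make_outline < toc.tsv > outline.txt, followed by the same djvused -e 'set-outline outline.txt' -s mybook.djvu command shown above.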
Embedding text
If you want to store the text of a document you’re creating, you can embed text elements (“hidden text” in djvused terminology) in your DjVu file so that applications like Okular or DjView can select and copy the text to a user’s clipboard.
This is a complex operation because, in order to embed text, you must first have text. If you have access to a good OCR application (or the time and dedication to transcribe the printed page), you may have that data, but then you must map the text to the bitmap image.
Once you have the text and the coordinates for each line (or, if you prefer, for each word), you can write a djvused script with blocks for each page:
select; remove-ant; remove-txt
select "p0004.djvu" # page 4
set-txt
(page 0 0 2550 3300
 (line 1661 2337 2235 2369 "Fires and Fire-fighters")
 (line 1761 2337 2235 2369 "by John Kenlon"))
.
select "p0005.djvu" # page 5
set-txt
(page 0 0 2550 3300
 (line 294 2602 1206 2642 "Some more text here, blah blah blah."))
.
The integers for each line represent the minimum and maximum X and Y coordinates of that line (xmin, ymin, xmax, ymax). Each line is a rectangle measured in pixels, with the origin at the bottom-left corner of the page. The page element uses the same scheme for the whole page: 0 0 2550 3300 corresponds to an 8.5 × 11 inch page at 300 dpi.
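If your OCR tool can export one line per row, you can generate such a script instead of writing it by hand. This sketch assumes bash, a POSIX awk, and a hypothetical tab-separated input with the columns page, xmin, ymin, xmax, ymax, text, grouped by page; make_txt_script is a hypothetical name. It selects pages by number rather than by component name, and relies on djvused’s set-txt command, which reads the hidden text from the script up to a line containing a single period.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "page<TAB>xmin<TAB>ymin<TAB>xmax<TAB>ymax<TAB>text"
# rows (grouped by page) into a djvused script that sets each page's hidden text.
# usage: make_txt_script PAGE_WIDTH PAGE_HEIGHT < ocr.tsv > text.dsed
make_txt_script() {
    local width=$1 height=$2
    awk -F '\t' -v w="$width" -v h="$height" '
        # Close the open (page ...) expression and end the set-txt input.
        function close_page() { if (page != "") print ")\n." }
        NF >= 6 {
            if ($1 != page) {
                close_page()
                page = $1
                printf "select %d\nset-txt\n(page 0 0 %d %d\n", page, w, h
            }
            text = $6
            gsub(/[\\"]/, "\\\\&", text)    # escape backslashes and quotes
            printf " (line %d %d %d %d \"%s\")\n", $2, $3, $4, $5, text
        }
        END { close_page() }
    '
}
```

You could then apply the generated file with djvused -f text.dsed -s mybook.djvu.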
You can define embedded text elements as words, lines, and hyperlinks, and you can map complex areas with shapes other than just rectangles. You can also embed specially defined metadata, such as BibTeX keys, which are expressed in lowercase (year, booktitle, editor, author, and so on), and DocInfo keys, borrowed from the PDF spec, which always start with an uppercase letter (Title, Author, Subject, Creator, Producer, CreationDate, ModDate, and so on).
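DocInfo-style metadata can be applied with djvused as well. This sketch assumes djvused’s set-meta command reads one key per line followed by a double-quoted value (the same layout its print-meta command prints, as I understand the djvused(1) manual page; check it against your DjVuLibre version). set_docinfo is a hypothetical name, and it escapes backslashes and double quotes in values with sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write Key "Value" lines to a temporary file and apply
# them to BOOK with djvused's set-meta command (assumed input format).
# usage: set_docinfo BOOK.djvu Key Value [Key Value ...]
set_docinfo() {
    local book=$1 meta value status
    shift
    if [ "$#" -eq 0 ] || [ $(( $# % 2 )) -ne 0 ]; then
        echo "set_docinfo: expected Key Value pairs" >&2
        return 2
    fi
    meta=$(mktemp) || return 1
    while [ "$#" -gt 0 ]; do
        # Prefix each backslash or double quote in the value with a backslash.
        value=$(printf '%s' "$2" | sed 's/[\\"]/\\&/g')
        printf '%s\t"%s"\n' "$1" "$value" >> "$meta"
        shift 2
    done
    djvused -e "set-meta $meta" -s "$book"
    status=$?
    rm -f -- "$meta"
    return "$status"
}
```

For example: set_docinfo mybook.djvu Title "Fires and Fire-fighters" Author "John Kenlon".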
Automating DjVu creation
While it’s nice to be able to handcraft a finely detailed DjVu document, if you adopt DjVu as an everyday format, you’ll find that your applications lack some of the conveniences available for the more ubiquitous PDF. For instance, few (if any) applications offer a convenient Print to DjVu or Export to DjVu option, as they do for PDF.
However, you can still use DjVu by leveraging PDF as an intermediate format.
Unfortunately, the library required for easy, automated DjVu conversion is licensed under the CPL, which has requirements that cannot be satisfied by the GPL code in the toolchain. For this reason, it can’t be distributed as a compiled library, but you’re free to compile it yourself.
The process is relatively simple thanks to an excellent build script provided by the DjVuLibre team.
- First, prepare your system with software development tools. On Fedora, the quick-and-easy way is with a DNF group:
$ sudo dnf group install @c-development
On Debian-based distributions, install the build-essential package instead:
$ sudo apt-get install build-essential
- Next, download the GSDjVu source code from Sourceforge. Be sure to download GSDjVu, not DjVuLibre (in other words, don’t click on the big green button at the top of the file listing, but on the latest file instead).
- Unarchive the file you just downloaded, and change directory into it:
$ cd ~/Downloads
$ tar xvf gsdjvu-X.YY.tar.gz
$ cd gsdjvu-X.YY
- Create a directory called BUILD. It must be called BUILD, so quell your creativity:
$ mkdir BUILD
- Download the additional source packages required to build the GSDjVu application. Specifically, you must download the source for Ghostscript (you almost certainly have it installed already, but you need its source to build against). Additionally, your system must have source packages for jpeg, libpng, openjpeg, and zlib. If you think your system already has the source packages for these projects, you can run the build script; if the sources are not found, the script will fail and let you correct the error before trying again.
- Run the interactive build-gsdjvu build script included in the download. This script unpacks the source files, patches Ghostscript with the gdevdjvu driver, compiles Ghostscript, and prunes unnecessary files from the build results.
$ ./build-gsdjvu
- You can install GSDjVu anywhere in your path. If you don’t know what your PATH variable is, you can see it with echo $PATH. For example, to install it to the /usr/local prefix:
$ sudo cp -r BUILD/INST/gsdjvu /usr/local/lib64
$ cd /usr/local/bin
$ sudo ln -s ../lib64/gsdjvu/gsdjvu gsdjvu
Converting a PDF to DjVu
Now that you have built the Ghostscript driver, converting a PDF to DjVu requires just one command:
$ djvudigital --words mydocument.pdf mydocument.djvu
This transforms all pages, bookmarks, and embedded text in a PDF into a DjVu file. The --words option maps all embedded PDF text to the corresponding points in the DjVu file. If there is no embedded text in the PDF, then no embedded text is carried over. With this tool, you can use the convenient PDF functions in your applications and still end up with DjVu files.
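Since djvudigital converts one file per invocation, a short loop handles a whole folder of exported PDFs. A sketch, assuming bash and that djvudigital can find the gsdjvu driver installed in the previous steps; convert_pdfs is a hypothetical name. It writes each result next to its PDF and skips any PDF that already has a .djvu counterpart.

```shell
#!/usr/bin/env bash
# Hypothetical batch wrapper: convert every PDF in DIR to a DjVu file beside it
# with djvudigital --words; returns non-zero if any conversion failed.
# usage: convert_pdfs [DIR]
convert_pdfs() {
    local dir=${1:-.} pdf out failed=0
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue              # no PDFs: the glob stays literal
        out=${pdf%.pdf}.djvu
        if [ -e "$out" ]; then
            echo "skip: $out already exists" >&2
            continue
        fi
        if ! djvudigital --words "$pdf" "$out"; then
            echo "failed: $pdf" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

Skipping existing outputs means you can rerun convert_pdfs ~/Documents/exports after adding new files without reconverting everything.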
Why DjVu and CBZ?
DjVu and comic book archive are great additional document formats for your archival arsenal. It seems silly to stuff a series of images into a PostScript format, like PDF, or a format clearly intended mostly for text, like EPUB, so it’s nice to have CBZ and DjVu as additional options. They might not be right for all of your documents, but it’s good to get comfortable with them so you can use one when it makes the most sense.