A glance inside an EPUB file

eBooks provide a great way to read books, magazines, and other content on the go. Readers can enjoy eBooks to pass the time during long flights and train rides. The most popular eBook file format is the EPUB file, short for “electronic publication.” EPUB files are supported across a variety of eReaders and are effectively the standard for eBook publication today.

The EPUB file format is an open standard based on XHTML for content and XML for metadata, contained in a zip file archive. And because everything is based on open standards, we can use common tools to create or examine EPUB files. Let’s explore an EPUB file to learn more about it. A guide to tips and tricks for C programming, published earlier this year on Opensource.com, is available in PDF or EPUB format.

Because EPUB files are XHTML content and XML metadata in a zip file, you can start with the unzip command to examine the EPUB from the command line:

$ unzip -l osdc_Jim-Hall_C-Programming-Tips.epub
Archive:  osdc_Jim-Hall_C-Programming-Tips.epub
  Length      Date    Time    Name
---------  ---------- -----   ----
       20  06-23-2022 00:20   mimetype
     8259  06-23-2022 00:20   OEBPS/styles/stylesheet.css
     1659  06-23-2022 00:20   OEBPS/toc.xhtml
     4460  06-23-2022 00:20   OEBPS/content.opf
    44157  06-23-2022 00:20   OEBPS/sections/section0018.xhtml
     1242  06-23-2022 00:20   OEBPS/sections/section0002.xhtml
    22429  06-23-2022 00:20   OEBPS/sections/section0008.xhtml
[...]
     9628  06-23-2022 00:20   OEBPS/sections/section0016.xhtml
      748  06-23-2022 00:20   OEBPS/sections/section0001.xhtml
     3370  06-23-2022 00:20   OEBPS/toc.ncx
     8308  06-23-2022 00:21   OEBPS/images/image0011.png
     6598  06-23-2022 00:21   OEBPS/images/image0009.png
[...]
    14492  06-23-2022 00:21   OEBPS/images/image0005.png
      239  06-23-2022 00:20   META-INF/container.xml
---------                     -------
   959201                     41 files
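
The same listing can be produced programmatically. As a minimal sketch, Python's standard zipfile module can open an EPUB like any other zip archive. The helper name list_epub_entries is made up for this example, and the demo builds a tiny stand-in archive in memory rather than reading the real eBook.

```python
import io
import zipfile


def list_epub_entries(epub):
    """Return (size_in_bytes, name) pairs for every file in the archive.

    `epub` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(epub) as archive:
        return [(info.file_size, info.filename) for info in archive.infolist()]


if __name__ == "__main__":
    # Build a small stand-in archive in memory so the sketch runs on its own.
    sample = io.BytesIO()
    with zipfile.ZipFile(sample, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("META-INF/container.xml", "<container/>")
    sample.seek(0)

    for size, name in list_epub_entries(sample):
        print(f"{size:9d}  {name}")
```

To inspect a real file, pass its path instead of the in-memory buffer, for example list_epub_entries("osdc_Jim-Hall_C-Programming-Tips.epub").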

This EPUB contains a lot of files, but much of this is content. To understand how an EPUB file is put together, follow the process flow of an eBook reader:

  1. eBook readers need to verify that the EPUB file is really an EPUB file. They verify the file by examining the mimetype file at the root of the EPUB archive. This file contains just one line that describes the MIME type of the EPUB file:

    application/epub+zip
  2. To locate the content, eBook readers start with the META-INF/container.xml file. This is a brief XML document that indicates where to find the content. For this EPUB file, the container.xml file looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
      <rootfiles>
        <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
      </rootfiles>
    </container>

    To make the container.xml file easier to read, I split the single line into multiple lines and added some spacing to indent each line. XML files don't really care about extra white space like new lines and spaces between elements, so this extra spacing doesn't affect the XML file.

  3. The container.xml file says the root of the EPUB begins with the content.opf file in the OEBPS directory. The OPF extension is because EPUB is based on the Open Packaging Format, but the content.opf file is really just another XML file.

  4. The content.opf file contains a complete manifest of the EPUB contents, plus an ordered table of contents, with references to find each chapter or section. The content.opf file for this EPUB is quite long, so I’ll show just a bit of it here as an example.

    The XML data is contained within a <package> block, which itself has a <metadata> block, the <manifest> data, and a <spine> block that contains the eBook’s table of contents:

    <?xml version="1.0" encoding="UTF-8"?>
    <package unique-identifier="unique-identifier" version="3.0" xmlns="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:opf="http://www.idpf.org/2007/opf">
    <metadata>
    <dc:identifier id="unique-identifier">osdc002</dc:identifier>
    <dc:title>Tips and Tricks for C Programming</dc:title>
    <dc:creator>Jim Hall</dc:creator>
    <dc:language>English</dc:language>
    <meta property="dcterms:modified">2022-06-23T12:09:13Z</meta>
    <meta content="LibreOffice/7.3.0.3$Linux_X86_64 LibreOffice_project/0f246aa12d0eee4a0f7adcefbf7c878fc2238db3 (libepubgen/0.1.1)" name="generator"/>
    </metadata>
    <manifest>
    ...
    <item href="sections/section0001.xhtml" id="section0001" media-type="application/xhtml+xml"/>
    <item href="images/image0003.png" id="image0003" media-type="image/png"/>
    <item href="styles/stylesheet.css" id="stylesheet.css" media-type="text/css"/>
    <item href="toc.ncx" id="toc.ncx" media-type="application/x-dtbncx+xml"/>
    ...
    </manifest>
    <spine toc="toc.ncx">
    <itemref idref="section0001"/>
    <itemref idref="section0002"/>
    <itemref idref="section0003"/>
    ...
    </spine>
    </package>

    You can match up the data to see where to find each section. That’s how EPUB readers do it. For example, the first item in the table of contents references section0001, which is defined in the manifest as located in the sections/section0001.xhtml file. The file doesn’t need to be named the same as the idref entry, but that’s how LibreOffice Writer’s automated process created the file. (You can see in the metadata that this EPUB was created with LibreOffice version 7.3.0.3 on Linux, which can export content as EPUB files.)
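
The four steps above can be sketched in code. This is an illustration of the lookup chain using only Python's standard library, not a description of how any particular eBook reader is implemented. The function name reading_order is an assumption for this example, and it skips cases a real reader would handle, such as multiple rootfiles or a spine entry with no matching manifest item.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by container.xml and the OPF package file.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(epub):
    """Return the archive paths of the spine items, in reading order.

    `epub` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(epub) as archive:
        # Step 1: the mimetype file must identify the archive as an EPUB.
        mimetype = archive.read("mimetype").decode("ascii").strip()
        if mimetype != "application/epub+zip":
            raise ValueError(f"not an EPUB: mimetype is {mimetype!r}")

        # Step 2: container.xml names the package (OPF) file.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        if rootfile is None:
            raise ValueError("container.xml has no rootfile entry")
        opf_path = rootfile.get("full-path")

        # Steps 3 and 4: map manifest ids to hrefs, then walk the spine.
        package = ET.fromstring(archive.read(opf_path))
        hrefs = {
            item.get("id"): item.get("href")
            for item in package.findall("opf:manifest/opf:item", OPF_NS)
        }
        # Manifest hrefs are relative to the directory holding the OPF file.
        base = posixpath.dirname(opf_path)
        return [
            posixpath.join(base, hrefs[ref.get("idref")])
            for ref in package.findall("opf:spine/opf:itemref", OPF_NS)
        ]
```

Calling reading_order("osdc_Jim-Hall_C-Programming-Tips.epub") on the eBook examined here should return paths under OEBPS/sections/, since both the manifest and the spine point there.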

The EPUB format

EPUB files are a great way to publish content using an open format. The EPUB file format is XML metadata with XHTML content, inside a zip container. While most technical writers use tools to create EPUB files, because EPUB is based on open standards, you can also create your own EPUB files in some other way.
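
As one hedged illustration of creating an EPUB "some other way," the sketch below writes a one-chapter EPUB with Python's standard zipfile module. The function name write_minimal_epub, the file names chapter1.xhtml and nav.xhtml, the identifier, and the modification date are all placeholders invented for this example. The result has not been checked against a validator, so treat it as a starting point rather than a finished tool. The one firm rule it follows is that the mimetype entry is written first and stored uncompressed.

```python
import io
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Identifier and modification date below are placeholder values.
CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package unique-identifier="book-id" version="3.0" xmlns="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <metadata>
    <dc:identifier id="book-id">example-0001</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2000-01-01T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item href="nav.xhtml" id="nav" media-type="application/xhtml+xml" properties="nav"/>
    <item href="chapter1.xhtml" id="chapter1" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="chapter1"/>
  </spine>
</package>
"""

NAV_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>Contents</title></head>
  <body>
    <nav epub:type="toc">
      <ol><li><a href="chapter1.xhtml">{title}</a></li></ol>
    </nav>
  </body>
</html>
"""

CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>{title}</title></head>
  <body><p>{paragraph}</p></body>
</html>
"""


def write_minimal_epub(target, title, paragraph):
    """Write a one-chapter EPUB to `target` (a path or binary file object)."""
    # Escape user text so characters like & and < stay valid inside XML.
    fields = {"title": escape(title), "paragraph": escape(paragraph)}
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        # The mimetype entry goes first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML)
        archive.writestr("OEBPS/content.opf", CONTENT_OPF.format(**fields))
        archive.writestr("OEBPS/nav.xhtml", NAV_XHTML.format(**fields))
        archive.writestr("OEBPS/chapter1.xhtml", CHAPTER_XHTML.format(**fields))


if __name__ == "__main__":
    buffer = io.BytesIO()
    write_minimal_epub(buffer, "My First EPUB", "Hello, EPUB!")
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        print(archive.namelist())
```

The layout mirrors the structure examined above: a mimetype file, a container.xml that points to content.opf, and a package file whose manifest and spine reference the chapter.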
