
How ODT files are structured

Word processing files used to be closed, proprietary formats. In some older word processors, the document file was essentially a memory dump from the word processor. While this made for faster loading of the document into the word processor, it also made the document file format an opaque mess.

Around 2005, the Organization for the Advancement of Structured Information Standards (OASIS) group defined an open format for office documents of all types, the Open Document Format for Office Applications (ODF). You may also see ODF referred to as simply “OpenDocument Format” because it is an open standard based on the OpenOffice.org XML file specification. ODF includes several file types, including ODT for OpenDocument Text documents. There’s a lot to explore in an ODT file, and it starts with a zip file.

Zip structure

Like all ODF files, ODT is actually an XML document and other files wrapped in a zip file container. Using zip means files take less room on disk, but it also means you can use standard zip tools to examine an ODF file.

I have an article about IT management called “Nibbled to death by ducks” that I saved as an ODT file. Since this is an ODF file, which is a zip file container, you can use unzip from the command line to examine it:

$ unzip -l 'Nibbled to death by ducks.odt'
Archive:  Nibbled to death by ducks.odt
  Length      Date    Time    Name
       39  07-15-2022 22:18   mimetype
    12713  07-15-2022 22:18   Thumbnails/thumbnail.png
   915001  07-15-2022 22:18   Pictures/10000201000004500000026DBF6636B0B9352031.png
    10879  07-15-2022 22:18   content.xml
    20048  07-15-2022 22:18   styles.xml
     9576  07-15-2022 22:18   settings.xml
      757  07-15-2022 22:18   meta.xml
      260  07-15-2022 22:18   manifest.rdf
        0  07-15-2022 22:18   Configurations2/accelerator/
        0  07-15-2022 22:18   Configurations2/toolpanel/
        0  07-15-2022 22:18   Configurations2/statusbar/
        0  07-15-2022 22:18   Configurations2/progressbar/
        0  07-15-2022 22:18   Configurations2/toolbar/
        0  07-15-2022 22:18   Configurations2/popupmenu/
        0  07-15-2022 22:18   Configurations2/floater/
        0  07-15-2022 22:18   Configurations2/menubar/
     1192  07-15-2022 22:18   META-INF/manifest.xml
   970465                     17 files
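
The same listing can be produced without the unzip tool. The following is a minimal sketch using Python's standard zipfile module. The function names list_entries and make_sample_odt are hypothetical, and the sample archive is a tiny stand-in built in memory, not the article's actual ODT file.

```python
import io
import zipfile


def list_entries(odt_file):
    """Return (size, name) pairs for every entry in an ODF zip container."""
    with zipfile.ZipFile(odt_file) as archive:
        return [(info.file_size, info.filename) for info in archive.infolist()]


def make_sample_odt():
    """Build a tiny stand-in ODT in memory (placeholder content, demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        archive.writestr("content.xml", "<office:document-content/>")
        archive.writestr("META-INF/manifest.xml", "<manifest:manifest/>")
    buffer.seek(0)
    return buffer


# Print a size/name table similar to `unzip -l`.
for size, name in list_entries(make_sample_odt()):
    print(f"{size:>9}  {name}")
```

To inspect a real document, pass its path instead of the in-memory sample, for example list_entries("Nibbled to death by ducks.odt").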

I want to highlight a few elements of the zip file structure:

  1. The mimetype file contains a single line that defines the ODF document. Programs that process ODT files, such as a word processor, can use this file to verify the MIME type of the document. For an ODT file, this should always be:
application/vnd.oasis.opendocument.text
  2. The META-INF directory has a single manifest.xml file in it. This file contains all the information about where to find other components of the ODT file. Any program that reads ODT files starts with this file to locate everything else. For example, the manifest.xml file for my ODT document contains this line that defines where to find the main content:
<manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>
  3. The content.xml file contains the actual content of the document.

  4. My document includes a single screenshot, which is contained in the Pictures directory.
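
A program could check these pieces roughly as follows. This is only a sketch, not how any particular word processor does it: the helper names read_mimetype and read_manifest are hypothetical, the sample manifest is a shortened stand-in, and the namespace URI is assumed to be the one ODF manifests normally declare for the manifest: prefix.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace assumed to be bound to the "manifest:" prefix in manifest.xml.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def read_mimetype(archive):
    """Return the single line stored in the mimetype entry."""
    return archive.read("mimetype").decode("ascii").strip()


def read_manifest(archive):
    """Map each full-path listed in META-INF/manifest.xml to its media type."""
    root = ET.fromstring(archive.read("META-INF/manifest.xml"))
    entries = {}
    for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
        path = entry.get(f"{{{MANIFEST_NS}}}full-path")
        entries[path] = entry.get(f"{{{MANIFEST_NS}}}media-type")
    return entries


def make_sample_odt():
    """Build a tiny stand-in ODT in memory (placeholder content, demo only)."""
    manifest = (
        f'<manifest:manifest xmlns:manifest="{MANIFEST_NS}">'
        f'<manifest:file-entry manifest:full-path="/" manifest:media-type="{ODT_MIMETYPE}"/>'
        '<manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>'
        "</manifest:manifest>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", ODT_MIMETYPE)
        archive.writestr("META-INF/manifest.xml", manifest)
        archive.writestr("content.xml", "<office:document-content/>")
    buffer.seek(0)
    return buffer


with zipfile.ZipFile(make_sample_odt()) as archive:
    print("Is ODT:", read_mimetype(archive) == ODT_MIMETYPE)
    for path, media_type in read_manifest(archive).items():
        print(f"{path}: {media_type}")
```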

Because the ODT document is just a zip file with a specific structure to it, you can extract files from it. You can start by unzipping the entire ODT file, such as with this unzip command:

$ unzip -q 'Nibbled to death by ducks.odt' -d Nibbled
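
If unzip is not available, the same unpacking step can be approximated with Python's zipfile module. This is a sketch under assumptions: the unpack function is a hypothetical helper, and the archive used here is a small in-memory stand-in extracted into a temporary directory.

```python
import io
import os
import tempfile
import zipfile


def unpack(odt_file, target_dir):
    """Extract every entry into target_dir and return its top-level names."""
    with zipfile.ZipFile(odt_file) as archive:
        archive.extractall(target_dir)
    return sorted(os.listdir(target_dir))


def make_sample_odt():
    """Build a tiny stand-in ODT in memory (placeholder content, demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        archive.writestr("content.xml", "<office:document-content/>")
        archive.writestr("META-INF/manifest.xml", "<manifest:manifest/>")
    buffer.seek(0)
    return buffer


with tempfile.TemporaryDirectory() as target:
    print(unpack(make_sample_odt(), target))
```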

A colleague recently asked for a copy of the image that I included in my article. I was able to find the exact location of any embedded image by looking in the META-INF/manifest.xml file. The grep command can display any lines that describe an image:

$ cd Nibbled
$ grep image META-INF/manifest.xml
<manifest:file-entry manifest:full-path="Thumbnails/thumbnail.png" manifest:media-type="image/png"/>
<manifest:file-entry manifest:full-path="Pictures/10000201000004500000026DBF6636B0B9352031.png" manifest:media-type="image/png"/>
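
The grep step can also be scripted so that only the image entries are pulled out of the container. The sketch below filters manifest entries whose media type starts with image/ and extracts just those files. The names image_paths and extract_images are hypothetical, the sample archive holds placeholder bytes instead of a real PNG, and the namespace URI is assumed to be the usual one for ODF manifests.

```python
import io
import os
import tempfile
import zipfile
import xml.etree.ElementTree as ET

# Namespace assumed to be bound to the "manifest:" prefix in manifest.xml.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def image_paths(archive):
    """Return the full-path of every manifest entry with an image/* media type."""
    root = ET.fromstring(archive.read("META-INF/manifest.xml"))
    paths = []
    for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
        media_type = entry.get(f"{{{MANIFEST_NS}}}media-type", "")
        if media_type.startswith("image/"):
            paths.append(entry.get(f"{{{MANIFEST_NS}}}full-path"))
    return paths


def extract_images(archive, target_dir):
    """Extract only the image entries and return the paths written to disk."""
    return [archive.extract(path, target_dir) for path in image_paths(archive)]


def make_sample_odt():
    """Build a tiny stand-in ODT in memory (placeholder content, demo only)."""
    manifest = (
        f'<manifest:manifest xmlns:manifest="{MANIFEST_NS}">'
        '<manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>'
        '<manifest:file-entry manifest:full-path="Pictures/sample.png" manifest:media-type="image/png"/>'
        "</manifest:manifest>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("META-INF/manifest.xml", manifest)
        archive.writestr("content.xml", "<office:document-content/>")
        archive.writestr("Pictures/sample.png", b"not a real PNG")
    buffer.seek(0)
    return buffer


with zipfile.ZipFile(make_sample_odt()) as archive:
    with tempfile.TemporaryDirectory() as target:
        for saved in extract_images(archive, target):
            print("extracted", os.path.relpath(saved, target))
```

Reading the manifest rather than guessing at the Pictures directory follows the article's point that the manifest is the place that records where each component lives.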

The image I’m looking for is stored in the Pictures folder. You can verify that by listing the contents of the directory:

$ ls -F
Configurations2/ manifest.rdf meta.xml Pictures/ styles.xml
content.xml META-INF/ mimetype settings.xml Thumbnails/

And here it is:

(Jim Hall, CC BY-SA 4.0)

OpenDocument Format

OpenDocument Format (ODF) files are an open file format that can describe word processing files (ODT), spreadsheet files (ODS), presentations (ODP), and other file types. Because ODF files are based on open standards, you can use other tools to examine them and even extract data from them. You just need to know where to start. All ODF files start with the META-INF/manifest.xml file, which is the “root” or “bootstrap” file for the rest of the ODF file format. Once you know where to look, you can find the rest of the content.
