Word processing files were once closed, proprietary formats. In some older word processors, the document file was essentially a memory dump from the word processor. While this made for faster loading of the document into the word processor, it also made the document file format an opaque mess.
Around 2005, the Organization for the Advancement of Structured Information Standards (OASIS) group defined an open format for office documents of all types, the Open Document Format for Office Applications (ODF). You may also see ODF referred to as simply “OpenDocument Format” because it’s an open standard based on the OpenOffice.org XML file specification. ODF includes several file types, including ODT for OpenDocument Text documents. There’s a lot to explore in an ODT file, and it starts with a zip file.
Zip structure
Like all ODF files, ODT is actually an XML document and other files wrapped in a zip file container. Using zip means files take less room on disk, but it also means you can use standard zip tools to examine an ODF file.
I have an article about IT management called “Nibbled to death by ducks” that I saved as an ODT file. Since this is an ODF file, which is a zip file container, you can use unzip from the command line to examine it:
$ unzip -l 'Nibbled to death by ducks.odt'
Archive:  Nibbled to death by ducks.odt
   Length      Date    Time    Name
       39  07-15-2022 22:18   mimetype
    12713  07-15-2022 22:18   Thumbnails/thumbnail.png
   915001  07-15-2022 22:18   Pictures/10000201000004500000026DBF6636B0B9352031.png
    10879  07-15-2022 22:18   content.xml
    20048  07-15-2022 22:18   styles.xml
     9576  07-15-2022 22:18   settings.xml
      757  07-15-2022 22:18   meta.xml
      260  07-15-2022 22:18   manifest.rdf
        0  07-15-2022 22:18   Configurations2/accelerator/
        0  07-15-2022 22:18   Configurations2/toolpanel/
        0  07-15-2022 22:18   Configurations2/statusbar/
        0  07-15-2022 22:18   Configurations2/progressbar/
        0  07-15-2022 22:18   Configurations2/toolbar/
        0  07-15-2022 22:18   Configurations2/popupmenu/
        0  07-15-2022 22:18   Configurations2/floater/
        0  07-15-2022 22:18   Configurations2/menubar/
     1192  07-15-2022 22:18   META-INF/manifest.xml
   970465                     17 files
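If unzip is not available, a similar listing can be produced with Python's standard zipfile module. This is only a minimal sketch: the function names list_members and print_listing are my own, and the output is a simplified version of what unzip -l prints (sizes and names only, no dates).

```python
import zipfile


def list_members(odt_path):
    """Return (uncompressed size, name) pairs for every entry, in stored order."""
    with zipfile.ZipFile(odt_path) as archive:
        return [(info.file_size, info.filename) for info in archive.infolist()]


def print_listing(odt_path):
    """Print sizes and names, followed by a total line like `unzip -l` does."""
    members = list_members(odt_path)
    for size, name in members:
        print(f"{size:>9}  {name}")
    total = sum(size for size, _ in members)
    print(f"{total:>9}  {len(members)} files")


# Example call (the file name is the one used in this article):
# print_listing("Nibbled to death by ducks.odt")
```

Both functions accept either a path or an open binary file object, because zipfile.ZipFile accepts both.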
I want to highlight a few elements of the zip file structure:
- The mimetype file contains a single line that defines the ODF document. Programs that process ODT files, such as a word processor, can use this file to verify the MIME type of the document. For an ODT file, this should always be:
application/vnd.oasis.opendocument.text
- The META-INF directory has a single manifest.xml file in it. This file contains all the information about where to find other components of the ODT file. Any program that reads ODT files starts with this file to locate everything else. For example, the manifest.xml file for my ODT document contains this line that defines where to find the main content:
<manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>
- The content.xml file contains the actual content of the document.
- My document includes a single screenshot, which is contained in the Pictures directory.
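The mimetype check described above can be sketched in a few lines of Python. This is an illustration, not how any particular word processor does it: the names read_mimetype and is_odt are hypothetical, and the sketch only compares the contents of the mimetype entry against the ODT value quoted in the list.

```python
import zipfile

# MIME type that the mimetype entry of an ODT file should contain.
ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def read_mimetype(odt_path):
    """Return the contents of the archive's mimetype entry as text."""
    with zipfile.ZipFile(odt_path) as archive:
        raw = archive.read("mimetype")  # raises KeyError if the entry is missing
    return raw.decode("utf-8", errors="replace").strip()


def is_odt(odt_path):
    """True if the file is a zip archive whose mimetype entry says ODT."""
    try:
        return read_mimetype(odt_path) == ODT_MIMETYPE
    except (KeyError, zipfile.BadZipFile):
        # No mimetype entry, or not a zip file at all.
        return False
```

Swapping the constant for another ODF MIME type would let the same check recognize spreadsheets or presentations instead.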
Because the ODT document is just a zip file with a specific structure to it, you can extract files from it. You can start by unzipping the entire ODT file, such as with this unzip command:
$ unzip -q 'Nibbled to death by ducks.odt' -d Nibbled
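The same extraction can be done from Python with zipfile, which may be handy on a system without the unzip command. This is a rough equivalent, assuming you want every member written under a destination directory; the function name extract_odt is my own.

```python
import zipfile
from pathlib import Path


def extract_odt(odt_path, dest_dir):
    """Extract every member of the archive into dest_dir.

    Returns the relative paths of the extracted files, sorted by name.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(odt_path) as archive:
        archive.extractall(dest)
    return sorted(
        p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file()
    )


# Example call, mirroring the unzip command above:
# extract_odt("Nibbled to death by ducks.odt", "Nibbled")
```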
A colleague recently asked for a copy of the image that I included in my article. I was able to find the exact location of any embedded image by looking in the META-INF/manifest.xml file. The grep command can display any lines that describe an image:
$ cd Nibbled
$ grep image META-INF/manifest.xml
<manifest:file-entry manifest:full-path="Thumbnails/thumbnail.png" manifest:media-type="image/png"/>
<manifest:file-entry manifest:full-path="Pictures/10000201000004500000026DBF6636B0B9352031.png" manifest:media-type="image/png"/>
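Where grep matches any line containing the word "image", a script can parse the manifest as XML and filter on the media type itself. The sketch below assumes the manifest uses the standard ODF manifest namespace (urn:oasis:names:tc:opendocument:xmlns:manifest:1.0); the function name find_entries and its media_prefix parameter are hypothetical names of my own.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace bound to the "manifest:" prefix in META-INF/manifest.xml.
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def find_entries(odt_path, media_prefix="image/"):
    """Return the full-path of every manifest entry whose media type
    starts with media_prefix, in document order."""
    with zipfile.ZipFile(odt_path) as archive:
        root = ET.fromstring(archive.read("META-INF/manifest.xml"))

    matches = []
    for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
        media_type = entry.get(f"{{{MANIFEST_NS}}}media-type", "")
        if media_type.startswith(media_prefix):
            matches.append(entry.get(f"{{{MANIFEST_NS}}}full-path"))
    return matches
```

Reading the manifest straight from the archive means the ODT file does not need to be unzipped first. For my document, I would expect this to return the same two paths that grep found, though I have only reasoned through the script rather than run it against that file.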
The image I’m looking for is stored in the Pictures folder. You can verify that by listing the contents of the directory:
$ ls -F
Configurations2/  manifest.rdf  meta.xml  Pictures/     styles.xml
content.xml       META-INF/     mimetype  settings.xml  Thumbnails/
And here it is:
OpenDocument Format
OpenDocument Format (ODF) files are an open file format that can describe word processing files (ODT), spreadsheet files (ODS), presentations (ODP), and other file types. Because ODF files are based on open standards, you can use other tools to examine them and even extract data from them. You just need to know where to start. All ODF files start with the META-INF/manifest.xml file, which is the “root” or “bootstrap” file for the rest of the ODF file format. Once you know where to look, you can find the rest of the content.