Convert documents with Pandoc like a pro

Has anyone ever sent you a document in a format that just isn't quite right for you? Maybe you don't have access to the application used to create the document, or maybe you don't need the document so much as you need what's in it, or maybe you just flat out don't like the format. There's no wrong reason for disliking a file format. If it's not your preferred format, whether you find it cumbersome to use or you just don't like how its metadata is organized, then that's enough of a reason for you to convert it. However, there's rarely a reason to convert a document manually, and Pandoc is here to ensure you never have to.

Install pandoc

If you're on Linux, you can install pandoc from your software repository.

On Fedora, CentOS, or similar:

$ sudo dnf install pandoc

On Ubuntu, Elementary, Debian, or similar:

$ sudo apt install pandoc

If you're on Windows or macOS, you can use third-party installers. For Windows, there's Chocolatey, and on macOS, you can use MacPorts or Homebrew.

Once you have it installed, you can verify the installation with a simple version check:

$ pandoc --version
pandoc x.y.z
[...]

Pandoc basics

At its most basic, the pandoc command is among the easiest commands to use. You type pandoc into a terminal, give it the file you want to convert, then type --output and a name for the output file you want. Pandoc can usually auto-detect both formats from their filename extensions and convert from one to the other.

Here's a simple example to convert from a .docx file to .odt:

$ pandoc ~/Documents/example.docx --output ~/Documents/example.odt

If you're not used to working in a terminal, keep in mind that in most modern terminal applications, you can drag and drop a file from your desktop into the terminal to have it translated into a full path that your computer understands.

You can specify nearly any format you can think of:

$ pandoc ~/Documents/example.docx --output ~/public_html/example.html

That's right: Pandoc allows you to output many different formats from a single source format.
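When a whole folder of files needs the same treatment, the basic command can be wrapped in a loop. The following is a minimal sketch, not a definitive script: the function names out_name and convert_all are made up for illustration, and the directory and target extension are whatever you pass in.

```shell
#!/bin/sh
# Sketch: convert every .docx file in a directory to another format.
# out_name and convert_all are hypothetical helper names.

# Derive the output filename by swapping the extension.
# $1 = input path, $2 = new extension (without the dot)
out_name() {
    printf '%s.%s\n' "${1%.*}" "$2"
}

# $1 = directory to scan, $2 = target extension
convert_all() {
    for src in "$1"/*.docx; do
        [ -e "$src" ] || continue   # skip when the glob matched nothing
        pandoc "$src" --output "$(out_name "$src" "$2")"
    done
}

# Example call (assumes your files live in ~/Documents):
# convert_all "$HOME/Documents" odt
```

Because Pandoc picks the output format from the extension, changing the second argument from odt to html is enough to target a different format.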

Find your source format

It doesn't take long to realize that Pandoc is possibly more flexible than you are, or at least, it's more flexible than you care to be. Because it's just a piece of software, Pandoc doesn't care whether you've written your latest thesis paper in LaTeX, DocBook, Markdown, or even JSON (warning: don't write your thesis paper in JSON). It can process whatever you have handy and turn it into whatever format you need. As with so many open source projects, you have the freedom to choose which tool you like best.

If you know basic HTML and want to write everything in that, then grab an HTML editor and start writing. Pandoc will convert it to whatever your boss or client or professor needs. Or maybe you prefer DocBook, LaTeX, CommonMark, Org mode, or just a plain old LibreOffice .odt. It doesn't matter to Pandoc. Find your favorite format, the one that lets you focus on getting your work done, and let Pandoc do the hard part.

Pandoc options

It may not seem like it, but you now know all the basics of Pandoc. It's a straightforward command that converts from one document format to another. If that's all you need, you're done with this article.

However, Pandoc is a big application with lots of options for every format it can process. If you're already a Pandoc user or you want to delve deeper into what Pandoc can do, you need to look at its command options.

From and to

The first options you need to know are the --from and --to flags. These explicitly tell Pandoc what format to convert from and to, and you can use them when Pandoc's output doesn't match what you expected, or when you need to differentiate between formats that may share the same extension.

For example, CommonMark, Markdown, markdown_phpextra, markdown_strict, and markdown_github may all use either the .md or .txt extension. Both HTML and HTML5 use the .html extension, and EPUB versions 2 and 3 both use the .epub extension. Specifying exactly what format conversion you want ensures Pandoc provides you with the expected output:

$ pandoc --from docx example.docx --to commonmark --output example.md
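If you're unsure whether a format name is one Pandoc accepts, you can ask Pandoc itself. This sketch assumes a Pandoc release that provides the --list-input-formats and --list-output-formats options; the helper name has_format is invented for the example.

```shell
#!/bin/sh
# Sketch: confirm a format name is supported before passing it to --from or --to.
# has_format is a hypothetical helper name.

# Succeeds if the name in $1 appears as a whole line on standard input.
has_format() {
    grep -qxF -- "$1"
}

# Only query Pandoc when it is actually installed.
if command -v pandoc >/dev/null 2>&1; then
    pandoc --list-input-formats  | has_format docx       && echo "can read docx"
    pandoc --list-output-formats | has_format commonmark && echo "can write commonmark"
fi
```

Matching whole lines matters here, because a partial match would let a name like markdown slip through as a match for markdown_strict.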

Table of contents

It varies from format to format, but Pandoc doesn't always provide a table of contents. The --table-of-contents option, or --toc for short, ensures that a document with chapter breaks (or subheading markers such as h2 in HTML, ## in Markdown, and so on) is prepended with a list of chapters.

If you have chapters with subsections and sections within those subsections, then you can use --toc-depth to set how many levels of subheadings are listed under each chapter.
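Here's a small sketch of both options together. The file names, headings, and title are invented for the example, and --standalone is included on the assumption that you want a complete HTML page: per the Pandoc manual, --toc has no effect on HTML output without it.

```shell
#!/bin/sh
# Sketch: build an HTML page whose table of contents stops at two levels.
# File names, headings, and the title are illustrative assumptions.

workdir=$(mktemp -d)

# A sample source file with three levels of headings.
cat > "$workdir/book.md" <<'EOF'
# Chapter one

## Section 1.1

### Subsection 1.1.1

# Chapter two
EOF

if command -v pandoc >/dev/null 2>&1; then
    # --toc-depth=2 lists chapters and sections, but not subsections.
    pandoc --standalone --toc --toc-depth=2 \
        --metadata=title:"Sample book" \
        "$workdir/book.md" --output "$workdir/book.html"
fi
```

Raising --toc-depth to 3 would add the subsection to the list as well.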

Epub for eBooks

Epub, an open standard, is one of the most popular formats for eBooks. You can generate them from applications like LibreOffice, Calibre, Scribus, and many others, or you can just convert to Epub using Pandoc. If you know a little bit of CSS, you can easily style your Epub by providing a stylesheet when running Pandoc:

$ pandoc --css my.css foo.md --output foo.epub

Additionally, you can set your own metadata so that Epub readers know how to sort the book. To do this, create a simple XML file in any text editor:

<dc:rights>Creative Commons</dc:rights>
<dc:creator>Seth Kenlon</dc:creator>
<dc:title>Be a Pandoc Pro</dc:title>

Save the file, and then use it as your metadata source when converting:

$ pandoc --css my.css \
--epub-cover-image cover_front.jpg \
--epub-metadata information.xml \
foo.md --output foo.epub

PDF options

Most POSIX systems have the ability to "print" to PDF. This makes generating PDFs easy, but sometimes it results in some quirks, like incorrect metadata. If you purchase indie and RPG eBooks, then you've surely come across an otherwise professional-quality PDF with an embedded title of "Word Document.docx" or a PDF with hyperlinks rendered in bright blue regardless of the document style (and they often aren't even active).

One way to control how your PDF renders is to use Pandoc. With Pandoc, you can use LaTeX commands in your source document to affect PDF output, and you can add your own metadata keys and values:

$ pandoc --metadata=title:"My Professional Report" foo.odt --output foo.pdf
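To illustrate the LaTeX side of that, here's a hedged sketch using a Markdown source that contains a raw \newpage command. The file names and metadata values are invented, and the conversion only runs if both Pandoc and a LaTeX engine (pdflatex is Pandoc's default) are installed.

```shell
#!/bin/sh
# Sketch: Markdown with a raw LaTeX command, converted to PDF.
# File names and metadata values are illustrative assumptions.

workdir=$(mktemp -d)

# \newpage is passed through to LaTeX and forces a page break in the PDF.
cat > "$workdir/report.md" <<'EOF'
# Summary

Findings go here.

\newpage

# Appendix

Supporting tables go here.
EOF

# PDF output needs a LaTeX engine in addition to Pandoc.
if command -v pandoc >/dev/null 2>&1 && command -v pdflatex >/dev/null 2>&1; then
    pandoc --metadata=title:"My Professional Report" \
        --metadata=author:"A. Writer" \
        "$workdir/report.md" --output "$workdir/report.pdf"
fi
```

The --metadata option can be repeated, so each key you care about gets its own flag.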

Pandoc is a powerhouse for anyone who needs to convert document formats. Even when it fails to give you exactly what you want, it's almost always able to get you closer to what you need. Use open and standardized formats when writing content, and rest assured that Pandoc can convert to whatever else you need. The more you use Pandoc, the more you're sure to discover.

To help you along with your exploration, we've developed an updated Pandoc cheat sheet as a handy reference. The cheat sheet hardly covers everything Pandoc is capable of, but it provides some common commands in common contexts and gives a sense of the general workflow you can expect.
