Convert files on the command line with Pandoc

Pandoc is a command-line tool for converting files from one markup language to another. Markup languages use tags to annotate sections of a document. Commonly used markup languages include Markdown, reStructuredText, HTML, LaTeX, ePub, and Microsoft Word DOCX.

In plain English, Pandoc lets you convert files from one markup language into another. Typical examples include converting a Markdown file into a presentation, LaTeX, PDF, or even ePub.

This article explains how to produce documentation in multiple formats from a single markup language (in this case Markdown) using Pandoc. It guides you through Pandoc installation, shows how to create several types of documents, and offers tips on how to write documentation that is easy to port to other formats. It also explains the value of using meta-information files to separate the content from the meta-information (e.g., author name, template used, bibliographic style, etc.) of your documentation.

Installation and requirements

Pandoc is installed by default in most Linux distributions. This tutorial uses pandoc-2.2.3.2 and pandoc-citeproc-0.14.3. If you don’t intend to generate PDFs, these two packages are enough. However, I recommend installing texlive as well, so you have the option to generate PDFs.

To install these packages on Linux (a Debian-based distribution is shown here), type the following on the command line:

sudo apt-get install pandoc pandoc-citeproc texlive

You can find installation instructions for other platforms on Pandoc’s website.

I highly recommend installing pandoc-crossref, a “filter for numbering figures, equations, tables, and cross-references to them.” The easiest option is to download a prebuilt executable, but you can also install it from Haskell’s package manager, cabal, by typing:

cabal update
cabal install pandoc-crossref

Consult pandoc-crossref’s GitHub repository if you need more Haskell installation information.
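
Before moving on, it can help to confirm that the tools are actually on your PATH. The following is a minimal sketch, not part of Pandoc: the helper name missing_tools is hypothetical, and it only checks that each command can be found, not which version is installed. It assumes pdflatex is the LaTeX engine that texlive provides.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on PATH.
# missing_tools is a hypothetical helper name, not something Pandoc ships.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Tools used in this tutorial (pdflatex is assumed to come from texlive).
missing=$(missing_tools pandoc pandoc-citeproc pandoc-crossref pdflatex)

if [ -z "$missing" ]; then
    echo "All tools found."
else
    echo "Not found on PATH:"
    echo "$missing"
fi
```

Anything listed as not found can be installed with the commands above before you try the examples.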

Some examples

I will demonstrate how Pandoc works by explaining how to produce three types of documents:

  • A website from a LaTeX file containing math formulas
  • A Reveal.js slideshow from a Markdown file
  • A contract agreement document that mixes Markdown and LaTeX

Create a website with math formulas

One of the ways Pandoc excels is displaying math formulas in different output file formats. For instance, let’s generate a website from a LaTeX document (named math.tex) containing some math symbols (written in LaTeX).

The math.tex document looks like:

% Pandoc math demos

$a^2 + b^2 = c^2$

$v(t) = v_0 + \frac{1}{2}at^2$

$\gamma = \frac{1}{\sqrt{1 - v^2/c^2}}$

$\exists x \forall y (Rxy \equiv Ryx)$

$p \wedge q \models p$

$\Box\diamond p\equiv\diamond p$

$\int_{0}^{1} x dx = \left[ \frac{1}{2}x^2 \right]_{0}^{1} = \frac{1}{2}$

$e^x = \sum_{n=0}^\infty \frac{x^n}{n!} = \lim_{n\rightarrow\infty} (1+x/n)^n$

Convert the LaTeX document into a website named mathMathML.html by entering the following command:

pandoc math.tex -s --mathml -o mathMathML.html

The -s flag tells Pandoc to generate a standalone website (instead of a fragment, so it will include the head and body HTML tags), and the --mathml flag forces Pandoc to convert the math in LaTeX to MathML, which can be rendered by modern browsers.

Take a look at the website result and the code; the code repository contains a Makefile to make things even simpler.

Make a Reveal.js slideshow

It’s easy to generate simple presentations from a Markdown file using Pandoc. The slides contain top-level slides and nested slides underneath. The presentation can be controlled from the keyboard, and you can jump from one top-level slide to the next top-level slide or show the nested slides on a per-top-level basis. This structure is typical in HTML-based presentation frameworks.

Let’s create a slide document named SLIDES (see the code repository). First, add the slides’ meta-information (e.g., title, author, and date) prepended by the % symbol:

% Case Study
% Kiko Fernandez Reyes
% Sept 27, 2017

This meta-information also creates the first slide. To add more slides, declare top-level slides using Markdown heading H1 (line 5 in the example below, heading 1 in Markdown, designated by #).

For example, if we want to create a presentation with the title Case Study that starts with a top-level slide titled Wine Management System, write:

% Case Study
% Kiko Fernandez Reyes
% Sept 27, 2017

# Wine Management System

To put content (such as slides that explain a new management system and its implementation) inside this top-level section, use a Markdown header H2. Let’s add two more slides (lines 7 and 9 below, heading 2 in Markdown, designated by ##):

  • The first second-level slide has the title Idea and shows an image of the Swiss flag
  • The second second-level slide has the title Implementation

% Case Study
% Kiko Fernandez Reyes
% Sept 27, 2017

# Wine Management System

## <img src="http://opensource.com/img/SwissFlag.png" style="vertical-align:middle"/> Idea

## Implementation

We now have a top-level slide (# Wine Management System) that contains two slides (## Idea and ## Implementation).

Let’s put some content in these two slides using incremental bulleted lists by creating a Markdown list prepended by the symbol >. Continuing from above, add two items in the first slide (lines 9–10 below) and five items in the second slide (lines 16–20):

% Case Study
% Kiko Fernandez Reyes
% Sept 27, 2017

# Wine Management System

## <img src="http://opensource.com/img/SwissFlag.png" style="vertical-align:middle"/> Idea

>- Swiss love their **wine** and cheese
>- Create a *simple* wine tracker system

![](img/matterhorn.jpg)

## Implementation

>- Bottles have an RFID tag
>- RFID reader (emits and reads signal)
>- **Raspberry Pi**
>- **Server (online store)**
>- Mobile app

We added an image of the Matterhorn mountain. Your slides can be improved by using plain Markdown or adding plain HTML.

To generate the slides, Pandoc needs to point to the Reveal.js library, so it must be in the same folder as the SLIDES file. The command to generate the slides is:

pandoc -t revealjs -s --self-contained SLIDES \
-V theme=white -V slideNumber=true -o index.html

The above Pandoc command uses the following flags:

  • -t revealjs specifies we’re going to output a revealjs presentation
  • -s tells Pandoc to generate a standalone document
  • --self-contained produces HTML with no external dependencies
  • -V sets the following variables:
    theme=white sets the theme of the slideshow to white
    slideNumber=true shows the slide number
  • -o index.html generates the slides in the file named index.html

To make things simpler and avoid typing this long command, create the following Makefile:

all: generate

generate:
	pandoc -t revealjs -s --self-contained SLIDES \
	-V theme=white -V slideNumber=true -o index.html

clean: index.html
	rm index.html

.PHONY: all clean generate

You can find all the code in this repository.

Make a multi-format contract

Let’s say you’re preparing a document and (as things are nowadays) some people want it in Microsoft Word format, others use free software and would like an ODT, and others need a PDF. You do not have to use OpenOffice or LibreOffice to generate the DOCX or PDF file. You can create your document in Markdown (with some bits of LaTeX if you need advanced formatting) and generate any of these file types.

As before, begin by declaring the document’s meta-information (title, author, and date):

% Contract Agreement for Software X
% Kiko Fernandez-Reyes
% August 28th, 2018

Then write the document in Markdown (and add LaTeX if you require advanced formatting). For example, create a table that needs fixed separation space (declared in LaTeX with \hspace{3cm}) and a line where a client and a contractor should sign (declared in LaTeX with \hrulefill). After that, add a table written in Markdown.

The code to create this document is:

% Contract Agreement for Software X
% Kiko Fernandez-Reyes
% August 28th, 2018

...

### Work Order

\begin{table}[h]
\begin{tabular}{ccc}
The Contractor & \hspace{3cm} & The Customer \\
& & \\
& & \\
\hrulefill & \hspace{3cm} & \hrulefill \\
%
Name & \hspace{3cm} & Name \\
& & \\
& & \\
\hrulefill & \hspace{3cm} & \hrulefill \\
...
\end{tabular}
\end{table}

\vspace{1cm}

+--------------------------------------------+----------+-------------+
| Type of Service                            | Cost     |     Total   |
+:===========================================+=========:+:===========:+
| Game Engine                                | 70.0     | 70.0        |
|                                            |          |             |
+--------------------------------------------+----------+-------------+
|                                            |          |             |
+--------------------------------------------+----------+-------------+
| Extra: Comply with defined API functions   | 10.0     | 10.0        |
|        and expected returned format        |          |             |
+--------------------------------------------+----------+-------------+
|                                            |          |             |
+--------------------------------------------+----------+-------------+
| **Total Cost**                             |          | **80.0**    |
+--------------------------------------------+----------+-------------+

To generate the three different output formats needed for this document, write a Makefile:

DOCS=contract-agreement.md

all: $(DOCS)
	pandoc -s $(DOCS) -o $(DOCS:md=pdf)
	pandoc -s $(DOCS) -o $(DOCS:md=docx)
	pandoc -s $(DOCS) -o $(DOCS:md=odt)

clean:
	rm *.pdf *.docx *.odt

.PHONY: all clean

Lines 4–6 contain the commands to generate the different outputs.
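
If you would rather not depend on make, the same three conversions can be sketched as a plain shell loop. This is only an illustration under stated assumptions: output_name is a hypothetical helper, and the script prints the command instead of running it when pandoc or the source file is missing.

```shell
#!/bin/sh
# Convert one Markdown source into several formats, mirroring the Makefile.
src="contract-agreement.md"

# Derive "name.ext" from "name.md" by stripping the .md suffix.
# output_name is a hypothetical helper, not a Pandoc feature.
output_name() {
    printf '%s.%s\n' "${1%.md}" "$2"
}

for ext in pdf docx odt; do
    out=$(output_name "$src" "$ext")
    if command -v pandoc >/dev/null 2>&1 && [ -f "$src" ]; then
        # Pandoc picks the output format from the file extension.
        pandoc -s "$src" -o "$out"
    else
        echo "would run: pandoc -s $src -o $out"
    fi
done
```

A Makefile remains the better fit once you want to rebuild only the outputs whose source has changed.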

If you have several Markdown files and want to merge them into one document, issue a command with the files in the order you want them to appear. For example, when writing this article, I created three documents: an introduction document, three examples, and some advanced uses. The following tells Pandoc to merge these files together in the specified order and produce a PDF named doc.pdf.

pandoc -s introduction.md examples.md advanced-uses.md -o doc.pdf

Writing a complex document is no easy task. You need to stick to a set of rules that are independent from your content, such as using a specific template, writing an abstract, embedding specific fonts, and maybe even declaring keywords. All of this has nothing to do with your content: simply put, it’s meta-information.

Pandoc uses templates to generate different output formats. There is a template for LaTeX, another for ePub, etc. These templates have unfulfilled variables that are set with the meta-information given to Pandoc. To find out what meta-information is available in a Pandoc template, type:

pandoc -D FORMAT

For example, the template for LaTeX would be:

pandoc -D latex

Which outputs something along these lines:

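$if(title)$
\title{$title$$if(thanks)$\thanks{$thanks$}$endif$}
$endif$
$if(subtitle)$
\providecommand{\subtitle}[1]{}
\subtitle{$subtitle$}
$endif$
$if(author)$
\author{$for(author)$$author$$sep$ \and $endfor$}
$endif$
$if(institute)$
\providecommand{\institute}[1]{}
\institute{$for(institute)$$institute$$sep$ \and $endfor$}
$endif$
\date{$date$}
$if(beamer)$
$if(titlegraphic)$
\titlegraphic{\includegraphics{$titlegraphic$}}
$endif$
$if(logo)$
\logo{\includegraphics{$logo$}}
$endif$
$endif$

\begin{document}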
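
The full template is long, so it can be handy to list only the variable names it tests. The sketch below is one way to do that, assuming the template marks optional variables with $if(name)$ conditionals as in the excerpt above; template_vars is a hypothetical helper, not a Pandoc command, and it relies on grep -o, which GNU and BSD grep both provide.

```shell
#!/bin/sh
# List the variable names tested by $if(...)$ conditionals in a template
# read from standard input. template_vars is a hypothetical helper name.
template_vars() {
    grep -o '\$if([^)]*)\$' | sed 's/^\$if(//; s/)\$$//' | sort -u
}

if command -v pandoc >/dev/null 2>&1; then
    # Print the sorted, de-duplicated variable names of the LaTeX template.
    pandoc -D latex | template_vars
else
    echo "pandoc not found on PATH" >&2
fi
```

Variables that a template uses without a conditional (such as date in the excerpt) will not appear in this list, so treat it as a quick overview rather than a complete inventory.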

As you can see, there are title, thanks, author, subtitle, and institute template variables (and many others are available). These are easily set using YAML metablocks. In lines 1–5 of the example below, we declare a YAML metablock and set some of these variables (using the contract agreement example above):

---
title: Contract Agreement for Software X
author: Kiko Fernandez-Reyes
date: August 28th, 2018
---

(continue writing document as in the previous example)

This works like a charm and is equivalent to the previous code:

% Contract Agreement for Software X
% Kiko Fernandez-Reyes
% August 28th, 2018

However, this ties the meta-information to the content; i.e., Pandoc will always use this information to output files in the new format. If you know you need to produce multiple file formats, you had better be careful. For example, what if you need to produce the contract in ePub and in HTML, and the ePub and HTML need specific and different styling rules?

Let’s consider the cases:

  • If you simply try to embed the YAML variable css: style-epub.css, you would be excluding the one from the HTML version. This does not work.
  • Duplicating the document is obviously not a good solution either, as changes in one version would not be in sync with the other copy.
  • You can add variables to the Pandoc command line as follows:

pandoc -s -V css=style-epub.css doc.md -o doc.epub
pandoc -s -V css=style-html.css doc.md -o doc.html

My opinion is that it is easy to overlook these variables from the command line, especially when you need to set tens of these (which can happen in complex documents). Now, if you put them all together under the same roof (a meta.yaml file), you only need to update or create a new meta-information file to produce the desired output. You would then write:

pandoc -s meta-pub.yaml doc.md -o doc.epub
pandoc -s meta-html.yaml doc.md -o doc.html

This is a much cleaner version, and you can update all the meta-information from a single file without ever having to update the content of your document.
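
As a rough sketch of what such meta-information files might contain, the script below writes the two files named in the commands above. The field values are taken from the contract example, and the css entries are assumptions for illustration; whether a given field is honored depends on the template of the output format, so check it with pandoc -D first. Each file is wrapped in --- delimiters so that Pandoc reads it as a YAML metadata block when it is passed ahead of the Markdown source.

```shell
#!/bin/sh
# Write one metadata file per output format in the current directory.
# The stylesheet names are illustrative assumptions.
cat > meta-html.yaml <<'EOF'
---
title: Contract Agreement for Software X
author: Kiko Fernandez-Reyes
date: August 28th, 2018
css: style-html.css
---
EOF

cat > meta-pub.yaml <<'EOF'
---
title: Contract Agreement for Software X
author: Kiko Fernandez-Reyes
date: August 28th, 2018
css: style-epub.css
---
EOF

# Run the conversions only when pandoc and the source document exist.
if command -v pandoc >/dev/null 2>&1 && [ -f doc.md ]; then
    pandoc -s meta-html.yaml doc.md -o doc.html
    pandoc -s meta-pub.yaml doc.md -o doc.epub
fi
```

With this layout, doc.md itself carries no metablock, and switching the styling of one format means editing only its own YAML file.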

Wrapping up

With these basic examples, I’ve shown how Pandoc can do a really good job at converting Markdown documents into other formats.
