Pandocr Manual

Use Pandoc in Crystal Programs

Huub de Beer

August 2024

Pandocr is a pandoc wrapper written in the crystal programming language. Use pandocr to unleash the power of pandoc in your crystal programs.

Pandocr is free software; pandocr is released under the EUPL 1.2 or later. Find pandocr’s source code hosted on Codeberg at https://codeberg.org/CampineComputing/pandocr.

If you have any questions or comments about pandocr, suggestions for improvement, or you found a bug, please create a new issue on Codeberg, or write me an email.

Find pandocr’s API documentation at: https://campinecomputing.eu/pandocr/docs/.

1 Installation

Add pandocr to the dependencies section of your crystal program’s shard.yml configuration file:

dependencies:
  # ...
  pandocr:
    git: git@codeberg.org:CampineComputing/pandocr.git

Then run shards install to download pandocr and install it into your project’s lib directory. This does not install pandoc itself; you have to install pandoc separately.
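Because pandoc is installed separately, a program may want to check that it is available before converting anything. The following is a minimal sketch that uses only Crystal's standard library; the helper name pandoc_path is hypothetical and not part of pandocr.

```crystal
# Hypothetical helper, not part of pandocr: look for a pandoc
# executable in PATH using Crystal's standard library.
def pandoc_path : String?
  Process.find_executable("pandoc")
end

if path = pandoc_path
  puts "Found pandoc at #{path}"
else
  STDERR.puts "pandoc not found in PATH; install pandoc separately"
end
```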

2 Usage

2.1 First example: Hello World!

The next “Hello, World!” example program shows the typical four-step approach to using pandocr in your programs:

require "pandocr"

pandoc = Pandocr::Converter.new

pandoc.from = Pandocr::OptionType::FromFormat::Markdown
pandoc.to = Pandocr::OptionType::ToFormat::Latex

result = pandoc.convert("Hello, **World**!")

puts result
# => Writes "Hello, \textbf{World}!" to STDOUT

  1. Create a new pandoc converter;
  2. Configure the converter by setting pandoc options;
  3. Convert a string or file with pandoc;
  4. Use the result.

Pandocr has no bearing on step four; that step is up to you! I discuss the first three steps in separate sections next.

2.2 Create a new pandoc converter

Create a new pandoc converter with Pandocr::Converter.new.

By default, the Converter uses the pandoc program in your PATH. It’ll call that pandoc program when you run one of Converter’s convert methods.

Sometimes you want to use another pandoc executable, for example when a conversion has to be reproducible with a specific version of pandoc. You can change the pandoc executable that the Converter uses by passing its path to the constructor.

Demonstration:

require "pandocr"

pandoc = Pandocr::Converter.new
puts pandoc.command # => pandoc

pandoc = Pandocr::Converter.new "/opt/pandoc/1.2.1-1/pandoc"
puts pandoc.command # => /opt/pandoc/1.2.1-1/pandoc

On the other hand, if you want all Converters in your program to use a different pandoc executable, override getter Pandocr::Converter.command instead:

require "pandocr"

def Pandocr::Converter.command()
    "/an/alternative/pandoc"
end

pandoc = Pandocr::Converter.new
puts pandoc.command # => /an/alternative/pandoc

2.3 Use pandoc’s command-line options in pandocr

Pandocr::Converter supports all pandoc command-line options. See the Options section in pandoc’s manual for a complete overview of pandoc’s command-line options and their use.

To convert a pandoc command-line option to the corresponding option in pandocr, apply the following procedure:

  1. When pandoc’s command-line option has both a long and a short version, pick the long version.
  2. Remove the prefix “--” from the command-line option.
  3. Replace all dashes (“-”) by underscores (“_”).

For example, “-f” becomes from; “--to” becomes to; “--base-header-level” becomes base_header_level.
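The renaming procedure can be sketched as a small function. This is an illustration only: the helper pandocr_name and the SHORT_TO_LONG table are hypothetical and not part of pandocr, and the table lists just a few of pandoc's short options.

```crystal
# Hypothetical illustration of the renaming procedure; not part of pandocr.

# Step 1 needs to know which long option belongs to a short one.
# Only a few of pandoc's short options are listed here.
SHORT_TO_LONG = {
  "-f" => "--from",
  "-t" => "--to",
  "-s" => "--standalone",
  "-c" => "--css",
}

def pandocr_name(cli_option : String) : String
  # Step 1: pick the long version if a short one was given.
  long = SHORT_TO_LONG.fetch(cli_option, cli_option)
  # Step 2: remove the "--" prefix.
  without_prefix = long.lchop("--")
  # Step 3: replace all dashes by underscores.
  without_prefix.tr("-", "_")
end

puts pandocr_name("-f")                  # => from
puts pandocr_name("--to")                # => to
puts pandocr_name("--base-header-level") # => base_header_level
```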

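All options share some behavior: #set? tells you whether an option has been set and #value returns its current value. As the switch option example shows, #delete! unsets an option.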

Pandocr distinguishes four types of pandoc command-line options, each with a slightly different interface. I discuss each type and its programming interface next.

2.3.1 Switch options

Switch options: command-line options that toggle a feature. For example, “--version” or “--standalone”.

In pandocr, these options have methods #on and #off to switch them on or off. When switched on, their #value method responds with true; otherwise false.

Use in pandocr:

require "pandocr"

pandoc = Pandocr::Converter.new
puts pandoc.standalone.set?  # => false

pandoc.standalone.on
puts pandoc.standalone.set?  # => true
puts pandoc.standalone.value # => true

pandoc.standalone.off
puts pandoc.standalone.set?  # => false
puts pandoc.standalone.value # => false

pandoc.standalone.delete!
puts pandoc.standalone.set?  # => false
puts pandoc.standalone.value # => false

2.3.2 Value options

Value options: command-line options that set a value. For example, “--pdf-engine=lualatex” or “--data-dir ~/my/data/dir”.

In pandocr, set these options by assigning a value:

require "pandocr"

pandoc = Pandocr::Converter.new
puts pandoc.data_dir.set?  # => false

pandoc.data_dir = Path.new "~/my/data/dir"
puts pandoc.data_dir.set?  # => true
puts pandoc.data_dir.value # => ~/my/data/dir

pandoc.data_dir = Path.new "/an/other/dir"
puts pandoc.data_dir.value # => /an/other/dir

When you reassign an option’s value, the earlier value is overwritten.

2.3.3 Array options

Array options: command-line options that you can use multiple times to set a value. For example, in pandoc you can include multiple CSS files in an HTML document by using the “--css” option multiple times, like “--css=my-style.css --css=assets/header.css”.

In pandocr, these array options have some array-like behavior:

require "pandocr"

pandoc = Pandocr::Converter.new
puts pandoc.css.set?  # => false

pandoc.css << "my-style.css"
pandoc.css << "assets/header.css"
puts pandoc.css.set?  # => true
puts pandoc.css.value # => ["my-style.css", "assets/header.css"]

pandoc.css = ["other_style.css"]
puts pandoc.css.value # => ["other_style.css"]

pandoc.css << "assets/footer.css"
pandoc.css << "assets/tables.css"
pandoc.css.remove! "assets/footer.css"
puts pandoc.css.includes? "assets/footer.css"  # => false
puts pandoc.css.value # => ["other_style.css", "assets/tables.css"]

Set an array option either by assigning an array of values or by appending a single value to the option with #<<. To see if an array option has already been set with a specific value, query with #includes?. Remove specific values with method #remove!.

2.3.4 Hash options

Hash options: command-line options that you can use multiple times to set key-value pairs. For example, “--metadata=author:"Huub de Beer" --metadata=title:"Pandocr manual"”.

In pandocr, these options have some hash-like behavior:

require "pandocr"

pandoc = Pandocr::Converter.new
puts pandoc.metadata.set?  # => false

pandoc.metadata["author"] = "Huub de Beer"
pandoc.metadata["title"] = "Pandocr manual"
puts pandoc.metadata.set?  # => true
puts pandoc.metadata.value # => {"author" => "Huub de Beer", "title" => "Pandocr manual"}

puts pandoc.metadata.has_key? "title" # => true
pandoc.metadata.remove! "title"
puts pandoc.metadata.value # => {"author" => "Huub de Beer"}

Put key-value pairs into a hash option with #[]=. To see if a key has been set, query with #has_key?. Remove a key with method #remove!.

2.4 Conversion variations

Pandocr offers different methods to convert your source input to target output. Pick one that suits your situation best.

To convert an input string in one format to an output string in another format, use #convert. For example:

require "pandocr"

pandoc = Pandocr::Converter.new
pandoc.from = Pandocr::OptionType::FromFormat::Latex
pandoc.to = Pandocr::OptionType::ToFormat::Html
pandoc.standalone.on

# Backslashes are escaped; a bare "\e" would be read as the escape character.
tex_input = "Some \\LaTeX \\emph{input} string"
html_output = pandoc.convert tex_input

puts html_output

Alternatively, you can let pandoc write the output to a file by using method #convert!. This conversion method is convenient when your program does not postprocess the output any further. The above example becomes:

require "pandocr"

pandoc = Pandocr::Converter.new
pandoc.from = Pandocr::OptionType::FromFormat::Latex
pandoc.to = Pandocr::OptionType::ToFormat::Html
pandoc.standalone.on

# Backslashes are escaped; a bare "\e" would be read as the escape character.
tex_input = "Some \\LaTeX \\emph{input} string"

pandoc.convert!(
  tex_input,
  Path.new("~/Documents/My_Document.html")
)

Both conversion methods also have a variant that lets pandoc read the files to convert. These variants are convenient when your program does not preprocess or generate the input source strings. For example:

require "pandocr"

pandoc = Pandocr::Converter.new
pandoc.from = Pandocr::OptionType::FromFormat::Latex
pandoc.to = Pandocr::OptionType::ToFormat::Html
pandoc.standalone.on

tex_input_file_ch1 = Path.new("~/Documents/My_Document/chapter1.tex")
tex_input_file_ch2 = Path.new("~/Documents/My_Document/chapter2.tex")

html_output = pandoc.convert tex_input_file_ch1, tex_input_file_ch2

puts html_output

See the API documentation for Pandocr::Converter for more information.

2.5 Querying pandoc

You can query pandoc for information about its setup. For example, you can discover a user’s default data directory by calling pandoc --version. You can also ask pandoc to list its supported output formats, input formats, highlighting styles, and so on.

Typically, the result of querying pandoc does not change from one call to the next. Because calling pandoc is expensive, pandocr caches these pandoc queries via Pandocr::Converter.info. Find API documentation at Pandocr::PandocInfo.
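To illustrate the idea of caching query results, here is a minimal memoization sketch. It is not pandocr's actual implementation: the class QueryCache and its method fetch are hypothetical, and the block stands in for an expensive call to pandoc.

```crystal
# Hypothetical sketch of caching query results; this is not how
# Pandocr::PandocInfo is necessarily implemented.
class QueryCache
  def initialize
    @cache = {} of String => String
  end

  # Return the cached answer for a query, or run the block once
  # and remember its result.
  def fetch(query : String, & : -> String) : String
    if cached = @cache[query]?
      return cached
    end
    @cache[query] = yield
  end
end

calls = 0
cache = QueryCache.new

2.times do
  cache.fetch("--list-output-formats") do
    calls += 1
    "html\nlatex\n" # stand-in for the output of an expensive pandoc call
  end
end

puts calls # => 1
```

The second lookup is answered from the cache, so the expensive block runs only once.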

3 Development

Pandocr uses make. Build pandocr with

make build

It will run all tests, generate documentation, and clean up the crystal source code.

Tests are both unit tests and system tests. You need a pandoc installation to run the system tests.

3.1 Roadmap

In no particular order:

Note. Typically, I add features when I need them for my own projects. As a result, this roadmap will take years to complete.