Pandocr Manual
Use Pandoc in Crystal Programs
Huub de Beer
August 2024
Pandocr is a pandoc wrapper written in the crystal programming language. Use pandocr to unleash the power of pandoc in your crystal programs.
Pandocr is free software; pandocr is released under the EUPL 1.2 or later. Find pandocr’s source code hosted on Codeberg at https://codeberg.org/CampineComputing/pandocr.
If you have any questions or comments about pandocr, suggestions for improvement, or you found a bug, please create a new issue on Codeberg, or write me an email.
Find pandocr’s API documentation at: https://campinecomputing.eu/pandocr/docs/.
1 Installation
Add pandocr to the dependencies section of your crystal program’s shard.yml configuration file:
    dependencies:
      # ...
      pandocr:
        git: git@codeberg.org:CampineComputing/pandocr.git
Then run shards install to download pandocr and install it into your project’s lib directory. This will not install pandoc; you have to install pandoc separately.
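Because pandoc is installed separately, a program may want to check that it is available before converting anything. The following is a minimal sketch that uses only Crystal's standard library; the helper name pandoc_status is hypothetical and not part of pandocr.

```crystal
# Hypothetical helper (not part of pandocr): turns the result of an
# executable lookup into a short status message.
def pandoc_status(path : String?) : String
  path ? "pandoc found at #{path}" : "pandoc not found; install it separately"
end

# Process.find_executable searches the PATH and returns nil when
# no executable with that name is found.
puts pandoc_status(Process.find_executable("pandoc"))
```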
2 Usage
2.1 First example: Hello World!
The next “Hello, World!” example program shows the typical four-step approach to using pandocr in your programs:
    require "pandocr"

    pandoc = Pandocr::Converter.new
    pandoc.from = Pandocr::OptionType::FromFormat::Markdown
    pandoc.to = Pandocr::OptionType::ToFormat::Latex

    result = pandoc.convert("Hello, **World**!")

    puts result
    # => Writes "Hello, \textbf{World}!" to STDOUT
- Create a new pandoc converter;
- Configure the converter by setting pandoc options;
- Convert a string or file with pandoc;
- Use the result.
Pandocr has no bearing on step four; that step is up to you! I discuss the first three steps in separate sections next.
2.2 Create a new pandoc converter
Create a new pandoc converter with Pandocr::Converter.new.
By default, the Converter uses the pandoc program in your PATH. It’ll call that pandoc program when you run one of Converter’s convert methods.
Sometimes you want to use another pandoc executable, for example, when you want a conversion to be reproducible using a specific version of pandoc. You can change the pandoc executable that the Converter uses by passing its path to the constructor.
Demonstration:
    require "pandocr"

    pandoc = Pandocr::Converter.new
    puts pandoc.command # => pandoc

    pandoc = Pandocr::Converter.new "/opt/pandoc/1.2.1-1/pandoc"
    puts pandoc.command # => /opt/pandoc/1.2.1-1/pandoc
On the other hand, if you want all Converters in your program to use a different pandoc executable, override getter Pandocr::Converter.command instead:
    require "pandocr"

    def Pandocr::Converter.command()
      "/an/alternative/pandoc"
    end

    pandoc = Pandocr::Converter.new
    puts pandoc.command # => /an/alternative/pandoc
2.3 Use pandoc’s command-line options in pandocr
Pandocr::Converter supports all pandoc command-line options. See the Options section in pandoc’s manual for a complete overview of pandoc’s command-line options and their use.
To convert a pandoc command-line option to the corresponding option in pandocr, apply the following procedure:
- When pandoc’s command-line option has both a long and a short version, pick the long version.
- Remove the prefix “--” from the command-line option.
- Replace all dashes (“-”) by underscores (“_”).
For example, “-f” becomes from; “--to” becomes to; “--base-header-level” becomes base_header_level.
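The naming procedure above can be sketched as a small Crystal function. This is an illustration only, not code from pandocr: the function name option_method_name and the SHORT_TO_LONG table are hypothetical, and the table lists just two of pandoc's short options.

```crystal
# Hypothetical lookup table with two of pandoc's short options.
SHORT_TO_LONG = {
  "-f" => "--from",
  "-t" => "--to",
}

# Converts a pandoc command-line option to the name used in pandocr,
# following the three steps of the procedure.
def option_method_name(option : String) : String
  # Step 1: prefer the long version when a short one was given.
  long = SHORT_TO_LONG.fetch(option, option)
  # Step 2: remove the "--" prefix. Step 3: replace dashes by underscores.
  long.lchop("--").gsub('-', '_')
end

puts option_method_name("-f")                  # => from
puts option_method_name("--to")                # => to
puts option_method_name("--base-header-level") # => base_header_level
```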
All options share the following behavior:
- To see if an option has been set or not, use query #set?.
- Unset, or remove, an option with #delete!.
- Assign a value to an option with setter #value=; get an option’s value with getter #value.
Pandocr distinguishes four types of pandoc command-line options, each with a slightly different interface. I discuss each type and its programming interface next.
2.3.1 Switch options
Switch options: command-line options that toggle a feature. For example, “--version” or “--standalone”.
In pandocr, these options have methods #on and #off to switch them on or off. When switched on, their #value method responds with true; otherwise false.
Use in pandocr:
    require "pandocr"

    pandoc = Pandocr::Converter.new
    puts pandoc.standalone.set? # => false

    pandoc.standalone.on
    puts pandoc.standalone.set? # => true
    puts pandoc.standalone.value # => true

    pandoc.standalone.off
    puts pandoc.standalone.set? # => false
    puts pandoc.standalone.value # => false

    pandoc.standalone.delete!
    puts pandoc.standalone.set? # => false
    puts pandoc.standalone.value # => false
2.3.2 Value options
Value options: command-line options that set a value. For example, “--pdf-engine=lualatex” or “--data-dir ~/my/data/dir”.
    require "pandocr"

    pandoc = Pandocr::Converter.new
    puts pandoc.data_dir.set? # => false

    pandoc.data_dir = Path.new "~/my/data/dir"
    puts pandoc.data_dir.set? # => true
    puts pandoc.data_dir.value # => ~/my/data/dir

    pandoc.data_dir = Path.new "/an/other/dir"
    puts pandoc.data_dir.value # => /an/other/dir
When you reassign an option’s value, the earlier value is overwritten.
2.3.3 Array options
Array options: command-line options that you can use multiple times to set a value. For example, in pandoc you can include multiple CSS files in an HTML document by using the “--css” option multiple times, like “--css=my-style.css --css=assets/header.css”.
In pandocr, these array options have some array-like behavior:
    require "pandocr"

    pandoc = Pandocr::Converter.new
    puts pandoc.css.set? # => false

    pandoc.css << "my-style.css"
    pandoc.css << "assets/header.css"
    puts pandoc.css.set? # => true
    puts pandoc.css.value # => ["my-style.css", "assets/header.css"]

    pandoc.css = ["other_style.css"]
    puts pandoc.css.value # => ["other_style.css"]

    pandoc.css << "assets/footer.css"
    pandoc.css << "assets/tables.css"
    pandoc.css.remove! "assets/footer.css"
    puts pandoc.css.includes? "assets/footer.css" # => false
    puts pandoc.css.value # => ["other_style.css", "assets/tables.css"]
Set an array option by either assigning an array of values or appending a single value to the option with #<<. To see if an array option has already been set with a specific value, query with #includes?. Remove specific set values with method #remove!.
2.3.4 Hash options
Hash options: command-line options that you can use multiple times to set key-value pairs. For example, “--metadata=author:"Huub de Beer" --metadata=title:"Pandocr manual"”.
In pandocr, these options have some hash-like behavior:
    require "pandocr"

    pandoc = Pandocr::Converter.new
    puts pandoc.metadata.set? # => false

    pandoc.metadata["author"] = "Huub de Beer"
    pandoc.metadata["title"] = "Pandocr manual"
    puts pandoc.metadata.set? # => true
    puts pandoc.metadata.value # => {"author" => "Huub de Beer", "title" => "Pandocr manual"}
    puts pandoc.metadata.has_key? "title" # => true

    pandoc.metadata.remove! "title"
    puts pandoc.metadata.value # => {"author" => "Huub de Beer"}
Put key-value pairs into a hash option with #[]=. To see if a key has been set, query with #has_key?. Remove a key with method #remove!.
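To relate the four option types back to the command line, the sketch below shows how each type could be turned into pandoc arguments, following the example flags used in this section. It is an illustration under assumptions, not pandocr's actual implementation; all four function names are hypothetical.

```crystal
# Illustration only (hypothetical helpers, not pandocr's internals):
# how each option type could map to pandoc command-line arguments.

# A switch contributes its flag only when switched on.
def switch_args(flag : String, on : Bool) : Array(String)
  on ? [flag] : [] of String
end

# A value option contributes a single flag=value argument.
def value_args(flag : String, value : String) : Array(String)
  ["#{flag}=#{value}"]
end

# An array option repeats the flag once per value.
def array_args(flag : String, values : Array(String)) : Array(String)
  values.map { |value| "#{flag}=#{value}" }
end

# A hash option repeats the flag once per key-value pair.
def hash_args(flag : String, pairs : Hash(String, String)) : Array(String)
  pairs.map { |key, value| "#{flag}=#{key}:#{value}" }
end

args = switch_args("--standalone", true) +
       value_args("--pdf-engine", "lualatex") +
       array_args("--css", ["my-style.css", "assets/header.css"]) +
       hash_args("--metadata", {"title" => "Pandocr manual"})

# Print one argument per line.
args.each { |arg| puts arg }
```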
2.4 Conversion variations
Pandocr offers different methods to convert your source input to target output. Pick one that suits your situation best.
To convert an input source string in one format to an output target string in another format, use #convert. For example:
    require "pandocr"

    pandoc = Pandocr::Converter.new
    pandoc.from = Pandocr::OptionType::FromFormat::Latex
    pandoc.to = Pandocr::OptionType::ToFormat::Html
    pandoc.standalone.on

    # Backslashes are doubled so the string keeps its LaTeX commands
    tex_input = "Some \\LaTeX \\emph{input} string"
    html_output = pandoc.convert tex_input

    puts html_output
Alternatively, you can let pandoc write the output to file by using method #convert!. This conversion method is convenient when your program does not postprocess the output any further. The above example becomes:
    require "pandocr"

    pandoc = Pandocr::Converter.new
    pandoc.from = Pandocr::OptionType::FromFormat::Latex
    pandoc.to = Pandocr::OptionType::ToFormat::Html
    pandoc.standalone.on

    # Backslashes are doubled so the string keeps its LaTeX commands
    tex_input = "Some \\LaTeX \\emph{input} string"

    pandoc.convert!(
      tex_input, target_file: Path.new("~/Documents/My_Document.html")
    )
Both conversion methods also have a variant that lets pandoc read the files to convert. These variants are convenient when your program does not preprocess or generate the input source strings. For example:
    require "pandocr"

    pandoc = Pandocr::Converter.new
    pandoc.from = Pandocr::OptionType::FromFormat::Latex
    pandoc.to = Pandocr::OptionType::ToFormat::Html
    pandoc.standalone.on

    tex_input_file_ch1 = Path.new("~/Documents/My_Document/chapter1.tex")
    tex_input_file_ch2 = Path.new("~/Documents/My_Document/chapter2.tex")

    html_output = pandoc.convert tex_input_file_ch1, tex_input_file_ch2

    puts html_output
See the API documentation for Pandocr::Converter for more information.
2.5 Querying pandoc
You can query pandoc for information about its setup. For example, you can discover a user’s default data directory by calling pandoc --version. You can also ask pandoc to list its supported output formats, input formats, highlighting styles, and so on.
Typically, the result of querying pandoc does not change from one call to the next. Because calling pandoc is expensive, pandocr caches these pandoc queries via Pandocr::Converter.info. Find API documentation at Pandocr::PandocInfo.
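The idea of caching query results can be illustrated with a small memoizing helper. This is a sketch of the general technique only and says nothing about how Pandocr::PandocInfo is implemented; the class QueryCache is hypothetical, and the block returns fixed stand-in data so the example runs without pandoc installed.

```crystal
# Hypothetical helper (not pandocr's implementation): remembers the
# answer to each query so the expensive call runs only once per flag.
class QueryCache
  def initialize(&@run : String -> Array(String))
    @cache = {} of String => Array(String)
  end

  # Returns the cached answer, running the query only on the first call.
  def fetch(flag : String) : Array(String)
    @cache[flag] ||= @run.call(flag)
  end
end

calls = 0
cache = QueryCache.new do |flag|
  calls += 1
  # A real program would run pandoc with the given flag here and split
  # its output into lines; this stand-in returns fixed data instead.
  ["html", "latex"]
end

cache.fetch("--list-output-formats")
cache.fetch("--list-output-formats")
puts calls # => 1
```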
3 Development
Pandocr uses make. Build pandocr with:
    make build
This runs all tests, generates documentation, and cleans up the crystal source code.
Tests are both unit tests and system tests. You need a pandoc installation to run the system tests.
3.1 Roadmap
In no particular order:
- Add a facility to log calls to pandoc.
- Add support for extensions to types FromFormat and ToFormat.
- Add all query-like command-line options to PandocInfo, such as --list-input-formats, --list-highlight-styles, etc.
- Add support for writing pandoc filters in crystal.
Note. Typically, I add features when I need them for my own projects. As a result, this roadmap will take years to complete.