module HexaPDF

HexaPDF API Documentation

Here are some pointers to more in-depth information:

Constants

DefaultDocumentConfiguration

The default document specific configuration object.

Modify this object if you want to globally change document specific options or if you want to introduce new document specific options.

The following options are provided:

acro_form.appearance_generator

The class that should be used for generating appearances for AcroForm fields. If the value is a String, it should contain the name of a constant referring to such a class.

See HexaPDF::Type::AcroForm::AppearanceGenerator
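Many options here accept either a class or a String that names a constant. The following is a minimal sketch of how such a String could be resolved with Ruby's Object.const_get. The helper name resolve_option_class is hypothetical, and HexaPDF may resolve these values differently internally.

```ruby
# Hypothetical helper: resolve a configuration value that is either a class
# (returned unchanged) or a String naming a constant, possibly nested.
def resolve_option_class(value)
  value.is_a?(String) ? Object.const_get(value) : value
end

p resolve_option_class('File::Stat')   # prints File::Stat
p resolve_option_class(Comparable)     # prints Comparable
```

An unknown name raises NameError, so a misspelled class name is caught as soon as the value is resolved.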

acro_form.create_appearances

A boolean specifying whether an AcroForm field’s appearances should automatically be generated if they are missing.

acro_form.default_font_size

A number specifying the default font size of AcroForm text fields which should be auto-sized.

acro_form.fallback_font

The font that should be used when a variable text field references a font that cannot be used.

Can be one of the following:

  • The name of a font, like ‘Helvetica’.

  • An array consisting of the font name and a hash of font options, like [‘Helvetica’, variant: :italic].

  • A callable object receiving the field and the font object (or nil if no valid font object was found) and returning either a font name or an array consisting of the font name and a hash of font options. This allows the response to depend on the original font, and it also makes it possible, for example, to modify the configured fonts to add custom ones.

If set to nil, the use of the fallback font is disabled.

Default is ‘Helvetica’.
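A sketch of the callable form: it keeps a bold look when the original font appears to be bold and otherwise falls back to plain Helvetica. It assumes that a found font object can be indexed like a PDF dictionary (font[:BaseFont]); the constant name FALLBACK_FONT is hypothetical.

```ruby
# Hypothetical 'acro_form.fallback_font' callable. It receives the field and
# the font object (nil if no valid font was found) and returns either a font
# name or [font name, options hash].
FALLBACK_FONT = lambda do |_field, font|
  # Assumption: the font object responds to [] like a PDF dictionary.
  base_font = font.nil? ? '' : font[:BaseFont].to_s
  if base_font.include?('Bold')
    ['Helvetica', variant: :bold]   # preserve the bold appearance
  else
    'Helvetica'
  end
end

# With HexaPDF loaded, it would be registered like this:
#   doc.config['acro_form.fallback_font'] = FALLBACK_FONT
```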

acro_form.on_invalid_value

Callback hook when an invalid value is set for certain types of AcroForm fields.

The value needs to be an object that responds to #call(field, value) where field is the AcroForm field on which the value is set and value is the invalid value. The returned value is used instead of the invalid value.

The default implementation raises an error.

acro_form.text_field.default_width

A number specifying the default width of AcroForm text fields which should be auto-sized.

debug

If set to true, enables debug output.

document.auto_decrypt

A boolean determining whether the document should be decrypted automatically when parsed.

If this is set to false and the PDF document should later be decrypted, the method Encryption::SecurityHandler.set_up_decryption(document, decryption_opts) has to be called to set and retrieve the needed security handler. Note, however, that already loaded indirect objects have to be decrypted manually!

In nearly all cases this option should not be changed from its default setting!

document.on_invalid_string

A callable object that takes the invalid UTF-16BE encoded string and returns a valid UTF-8 encoded string.

The default is to remove all invalid characters.
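If dropping characters silently is undesirable, an alternative callable can replace each invalid sequence with U+FFFD instead. The sketch below uses Ruby's String#encode with an explicit UTF-16BE source encoding; the constant name ON_INVALID_STRING is hypothetical.

```ruby
# Alternative 'document.on_invalid_string' callable: replace invalid
# sequences with U+FFFD instead of removing them.
ON_INVALID_STRING = lambda do |str|
  str.encode(Encoding::UTF_8, Encoding::UTF_16BE,
             invalid: :replace, undef: :replace, replace: "\uFFFD")
end

# "A" followed by an unpaired high surrogate (0xD800) is invalid UTF-16BE.
broken = "\x00A\xD8\x00".b
p ON_INVALID_STRING.call(broken).valid_encoding?   # prints true

# With HexaPDF loaded:
#   doc.config['document.on_invalid_string'] = ON_INVALID_STRING
```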

encryption.aes

The class that should be used for AES encryption. If the value is a String, it should contain the name of a constant referring to such a class.

See HexaPDF::Encryption::AES for the general interface such a class must conform to and HexaPDF::Encryption::RubyAES as well as HexaPDF::Encryption::FastAES for implementations.

encryption.arc4

The class that should be used for ARC4 encryption. If the value is a String, it should contain the name of a constant referring to such a class.

See HexaPDF::Encryption::ARC4 for the general interface such a class must conform to and HexaPDF::Encryption::RubyARC4 as well as HexaPDF::Encryption::FastARC4 for implementations.

encryption.filter_map

A mapping from a PDF name (a Symbol) to a security handler class (see Encryption::SecurityHandler). If the value is a String, it should contain the name of a constant referring to such a class.

PDF defines a standard security handler, which is implemented by HexaPDF::Encryption::StandardSecurityHandler and registered under the name :Standard.

encryption.sub_filter_map

A mapping from a PDF name (a Symbol) to a security handler class (see HexaPDF::Encryption::SecurityHandler). If the value is a String, it should contain the name of a constant referring to such a class.

The sub filter map is used when the security handler defined by the encryption dictionary is not available, but a compatible implementation is.

filter.map

A mapping from a PDF name (a Symbol) to a filter object (see Filter). If the value is a String, it should contain the name of a constant that contains a filter object.

The most often used filters are implemented and readily available.

See PDF2.0 s7.4.1, ADB sH.3 3.3

font.fallback

An array of fallback font names to be used when replacing invalid glyphs.

The values can be anything that can be passed to Document::Fonts#add. Note that the variant of a fallback font is determined by the variant of the font in which an invalid glyph should be replaced.

The default value consists of the built-in fonts ZapfDingbats and Symbol.

font.map

Defines a mapping from font names and variants to font files.

The value needs to be a hash of the form:

{"font_name" => {variant: file_name, variant2: file_name2, ...}, ...}

Once a font is registered in this way, the font name together with a variant name can be used with the HexaPDF::Document::Fonts#add method to load the font.

For best compatibility, the following variant names should be used:

  • none: for the normal variant of the font

  • bold: for the bold variant of the font

  • italic: for the italic or oblique variant of the font

  • bold_italic: for the bold and italic/oblique variant of the font
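A sketch of a 'font.map' value using these variant names, together with the kind of name-then-variant lookup a font loader could perform on it. The font name, the file paths and the helper font_file_for are hypothetical placeholders.

```ruby
# Hypothetical 'font.map' value; the font name and file paths are placeholders.
FONT_MAP = {
  'MyFont' => {
    none: '/path/to/MyFont-Regular.ttf',
    bold: '/path/to/MyFont-Bold.ttf',
    italic: '/path/to/MyFont-Italic.ttf',
    bold_italic: '/path/to/MyFont-BoldItalic.ttf',
  },
}

# Hypothetical lookup: font name first, then variant; nil if not registered.
def font_file_for(map, name, variant = :none)
  map.dig(name, variant)
end

# With HexaPDF loaded:
#   doc.config['font.map'] = FONT_MAP
#   doc.fonts.add('MyFont', variant: :bold)
```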

font.on_invalid_glyph

Callback hook when a character cannot be mapped to a glyph and one or more glyphs from a different font should be used. Only applies when using high-level text creation facilities.

The value needs to be an object that responds to #call(codepoint, invalid_glyph) where codepoint is the Unicode codepoint that cannot be mapped to a valid glyph. The invalid_glyph argument is the HexaPDF::Font::InvalidGlyph object that was the result of the initial mapping. The return value has to be an array of glyph objects which can be from any font but all need to be from the same one.

The default implementation is provided by ::font_on_invalid_glyph and uses the ‘font.fallback’ configuration option. It is usually not necessary to change this configuration option or the ‘font.on_missing_glyph’ one.

Note: The ‘font.on_missing_glyph’ configuration option does something similar but is restricted to returning a single glyph from the same font. Whenever a glyph is not found, ‘font.on_missing_glyph’ is invoked first and if an invalid glyph instance is returned, this callback hook is invoked when using the layout engine.

A typical implementation would use one or more fallback fonts (probably choosing one in the correct font variant) for providing the necessary glyph(s):

other_font = doc.fonts.add('Times')   # font wrapper supplying the replacement glyphs
doc.config['font.on_invalid_glyph'] = lambda do |codepoint, invalid_glyph|
  [other_font.decode_codepoint(codepoint)]
end

font.on_missing_glyph

Callback hook when a UTF-8 character cannot be mapped to a glyph of a font.

The value needs to be an object that responds to #call(character, font_wrapper), where character is the Unicode character for the missing glyph, and that returns a substitute glyph to be used instead. This substitute glyph needs to be from the same font, i.e. it needs to be created through the provided font_wrapper instance.

The font_wrapper argument is the used font wrapper object, e.g. HexaPDF::Font::TrueTypeWrapper. To access the HexaPDF::Document instance from which this hook was called, you can use font_wrapper.pdf_object.document.

The default implementation returns an object of class HexaPDF::Font::InvalidGlyph which, when not removed before encoding, will raise a HexaPDF::MissingGlyphError.

If a replacement glyph should be displayed instead of an error, the following provides a good starting implementation:

doc.config['font.on_missing_glyph'] = lambda do |character, font_wrapper|
  # Type 1 fonts address glyphs by name, TrueType fonts by ID (0 is .notdef)
  font_wrapper.custom_glyph(font_wrapper.font_type == :Type1 ? :question : 0, character)
end

font.on_missing_unicode_mapping

Callback hook when a character code point cannot be converted to a Unicode character.

The value needs to be an object that responds to #call(code, font_dict) where code is the decoded code point and font_dict is the font dictionary which was used for the conversion. The returned value is used as the Unicode character and should be a string.

The default implementation raises an error.
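A sketch of a more forgiving hook that records each unmappable code and substitutes U+FFFD, so text extraction can continue. The constant names are hypothetical.

```ruby
# Alternative 'font.on_missing_unicode_mapping' callable: record the
# unmappable code and substitute U+FFFD instead of raising.
UNMAPPED_CODES = []
ON_MISSING_UNICODE_MAPPING = lambda do |code, _font_dict|
  UNMAPPED_CODES << code
  "\uFFFD"   # must be a String; it is used as the Unicode character
end

# With HexaPDF loaded:
#   doc.config['font.on_missing_unicode_mapping'] = ON_MISSING_UNICODE_MAPPING
```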

font_loader

An array with font loader implementations. When a font should be loaded, the array is iterated in sequence and the first valid font returned by a font loader is used.

If a value is a String, it should contain the name of a constant that is a font loader object.

See the HexaPDF::FontLoader module for information on how to implement a font loader object.
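The "first valid font wins" rule can be sketched as below. It assumes that a loader responds to call(document, name, **options) and returns nil when it cannot handle the request; the two loaders are hypothetical stand-ins that return Strings instead of real font wrappers.

```ruby
# Hypothetical loaders; real ones return font wrapper objects, not Strings.
BUILTIN_LOADER = lambda do |_document, name, **_options|
  %w[Helvetica Times].include?(name) ? "standard:#{name}" : nil
end

FILE_LOADER = lambda do |_document, name, **options|
  name.end_with?('.ttf') ? "file:#{name}:#{options.fetch(:variant, :none)}" : nil
end

# Try each loader in sequence and use the first non-nil result.
def load_font(loaders, document, name, **options)
  loaders.each do |loader|
    font = loader.call(document, name, **options)
    return font if font
  end
  nil
end

p load_font([BUILTIN_LOADER, FILE_LOADER], nil, 'Times')   # prints "standard:Times"
```

Because the search stops at the first hit, the order of the array decides which loader takes precedence when several could handle the same name.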

graphic_object.arc.max_curves

The maximum number of curves used for approximating a complete ellipse using Bezier curves.

The default value is 6. Higher values result in better approximations but also take longer to compute. It should not be set to values lower than 4, otherwise the approximation of a complete ellipse is visibly inaccurate.
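To get a feel for the accuracy trade-off, the sketch below measures how far n cubic Bezier curves deviate from a unit circle. It uses the common control-point factor 4/3·tan(θ/4) for an arc of angle θ = 2π/n. HexaPDF's arc implementation may compute its control points differently, so the numbers are only indicative.

```ruby
# Maximum radial deviation from a unit circle when it is approximated by
# n cubic Bezier curves, each spanning theta = 2*PI/n.
def max_radial_error(n, samples: 500)
  theta = 2 * Math::PI / n
  k = 4.0 / 3 * Math.tan(theta / 4)   # control point distance from the endpoints
  c = Math.cos(theta)
  s = Math.sin(theta)
  # One segment from angle 0 to theta; all n segments are congruent.
  xs = [1.0, 1.0, c + k * s, c]
  ys = [0.0, k, s - k * c, s]
  (0..samples).map do |i|
    t = i.fdiv(samples)
    u = 1 - t
    weights = [u**3, 3 * u**2 * t, 3 * u * t**2, t**3]
    x = weights.zip(xs).sum { |w, v| w * v }
    y = weights.zip(ys).sum { |w, v| w * v }
    (Math.hypot(x, y) - 1).abs
  end.max
end

(3..8).each { |n| puts format('%d curves: max error %.2e', n, max_radial_error(n)) }
```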

graphic_object.map

A mapping from graphic object names to graphic object factories.

See HexaPDF::Content::GraphicObject for more information.

image_loader

An array with image loader implementations. When an image should be loaded, the array is iterated in sequence to find a suitable image loader.

If a value is a String, it should contain the name of a constant that is an image loader object.

See the HexaPDF::ImageLoader module for information on how to implement an image loader object.

image_loader.pdf.use_stringio

A boolean determining whether images specified via file names should be read into memory all at once using a StringIO object.

Loading a PDF as an image requires keeping the IO object of the image PDF open until the PDF document in which it is used has been written. So there is a choice: either use memory to load the image PDF all at once, or use a File object that needs to be closed manually.

To avoid leaking file descriptors, using a StringIO object is the default setting. If you set this option to false, it is strongly advised to use ObjectSpace.each_object(File) (or IO instead of File) to traverse the list of open file descriptors and close the ones that have been used for PDF images.
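A minimal sketch of that cleanup: a hypothetical helper close_files_for walks all File objects via ObjectSpace.each_object and closes the open ones whose path matches the image PDF. It should only be called after the document using the image has been written.

```ruby
# Hypothetical helper for use with 'image_loader.pdf.use_stringio' set to
# false: close every open File object whose path matches the given path.
def close_files_for(path)
  target = File.expand_path(path)
  closed = 0
  ObjectSpace.each_object(File) do |file|
    next if file.closed? || file.path.nil?
    next unless File.expand_path(file.path) == target
    file.close
    closed += 1
  end
  closed
end
```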

io.chunk_size

The size of the chunks that are used when reading IO data.

This can be used to limit the memory needed for reading or writing PDF files with huge stream objects.

layout.boxes.map

A mapping from layout box names to box classes. If the value is a String, it should contain the name of a constant referring to such a class.

See HexaPDF::Layout::Box for more information.

page.default_media_box

The media box that is used for new pages that don’t define a media box. Default value is A4. See HexaPDF::Type::Page::PAPER_SIZE for a list of predefined paper sizes.

This configuration option (together with ‘page.default_media_orientation’) is also used when validating pages and a page without a media box is found.

The value can either be a rectangle defining the paper size or a Symbol referencing one of the predefined paper sizes.

page.default_media_orientation

The page orientation that is used for new pages that don’t define a media box. It is only used if ‘page.default_media_box’ references a predefined paper size. Default value is :portrait. The other possible value is :landscape.
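The two options work together: a Symbol names a predefined paper size that is then oriented, while an explicit rectangle is used as given. The sketch below shows how a paper size in millimetres maps to a media box in PDF points (1/72 inch) for either orientation. The helper media_box_from_mm is hypothetical; HexaPDF::Type::Page::PAPER_SIZE remains the authoritative list of predefined sizes.

```ruby
# Hypothetical helper: media box [llx, lly, urx, ury] in PDF points from a
# paper size in millimetres (25.4 mm = 1 inch = 72 points).
def media_box_from_mm(width_mm, height_mm, orientation = :portrait)
  short, long = [width_mm, height_mm].map { |mm| mm / 25.4 * 72 }.minmax
  orientation == :landscape ? [0, 0, long, short] : [0, 0, short, long]
end

p media_box_from_mm(210, 297).map { |v| v.round(2) }   # A4 portrait: prints [0, 0, 595.28, 841.89]

# With HexaPDF loaded, either form could be configured:
#   doc.config['page.default_media_box'] = :A5
#   doc.config['page.default_media_orientation'] = :landscape
#   doc.config['page.default_media_box'] = media_box_from_mm(210, 297)
```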

parser.on_correctable_error

Callback hook when the parser encounters an error that can be corrected.

The value needs to be an object that responds to #call(document, message, position) and returns true if an error should be raised.
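Two sketches of this hook: a lenient one that records each message and lets the parser correct the problem, and a strict one that always asks for an error. The constant names and the input file name are hypothetical. Since the parser runs while loading, the sketch assumes the option is supplied at construction time.

```ruby
# Hypothetical lenient hook: record the error and return false (do not raise).
CORRECTABLE_ERRORS = []
LENIENT_ON_CORRECTABLE_ERROR = lambda do |_document, message, position|
  CORRECTABLE_ERRORS << "#{message} (at position #{position})"
  false
end

# Hypothetical strict hook: every correctable error is raised.
STRICT_ON_CORRECTABLE_ERROR = lambda { |_document, _message, _position| true }

# With HexaPDF loaded:
#   HexaPDF::Document.new(io: File.open('input.pdf', 'rb'),
#                         config: { 'parser.on_correctable_error' => LENIENT_ON_CORRECTABLE_ERROR })
```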

parser.try_xref_reconstruction

A boolean specifying whether non-recoverable parsing errors should lead to reconstructing the main cross-reference table.

The reconstructed cross-reference table might make damaged files usable but there is no way to ensure that the reconstructed file is equal to the undamaged original file (though generally it works out).

There is also the possibility that reconstructing doesn’t work because the algorithm has to assume that the PDF was written in a certain way (which is recommended by the PDF specification).

Defaults to true.

signature.signing_handler

A mapping from a Symbol to a signing handler class (see HexaPDF::Document::Signatures::DefaultHandler). If the value is a String, it should contain the name of a constant referring to such a class.

signature.sub_filter_map

A mapping from a PDF name (a Symbol) to a signature handler class (see HexaPDF::DigitalSignature::Handler). If the value is a String, it should contain the name of a constant referring to such a class.

The sub filter map is used for mapping specific signature algorithms to handler classes. The filter value of a signature dictionary is ignored since we only support the standard signature algorithms.

sorted_tree.max_leaf_node_size

The maximum number of nodes that should be in a leaf node of a node tree.

style.layers_map

A mapping from style layer names to layer objects.

See HexaPDF::Layout::Style::Layers for more information.

task.map

A mapping from task names to callable task objects. See HexaPDF::Task for more information.

GlobalConfiguration

The global configuration object, providing the following options:

color_space.map

A mapping from a PDF name (a Symbol) to a color space class (see HexaPDF::Content::ColorSpace). If the value is a String, it should contain the name of a constant that contains a color space class.

Classes for the most often used color space families are implemented and readily available.

See PDF2.0 s8.6

filter.flate.compression

Specifies the compression level that should be used with the FlateDecode filter. The level can range from 0 (no compression), 1 (best speed) to 9 (best compression, default).

filter.flate.memory

Specifies the memory level that should be used with the FlateDecode filter. The level can range from 1 (minimum memory usage; slow, reduces compression) to 9 (maximum memory usage).

The HexaPDF default value of 6 has been found in tests to be nearly equivalent to the Zlib default of 8 in terms of speed and compression level but uses less memory.
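The two levels correspond to zlib's compression level and memory level. This sketch calls Ruby's Zlib directly (not HexaPDF's filter) to compare output sizes for a few combinations on repetitive content-stream-like data. The helper deflate_with and the sample data are hypothetical, and actual sizes depend on the data.

```ruby
require 'zlib'

# Compress with an explicit compression level (0-9) and memory level (1-9).
def deflate_with(data, level:, mem_level:)
  deflater = Zlib::Deflate.new(level, Zlib::MAX_WBITS, mem_level)
  compressed = deflater.deflate(data, Zlib::FINISH)
  deflater.close
  compressed
end

SAMPLE = "BT /F1 12 Tf 72 712 Td (Hello World) Tj ET\n" * 1000

[[9, 6], [9, 8], [1, 6]].each do |level, mem_level|
  size = deflate_with(SAMPLE, level: level, mem_level: mem_level).bytesize
  puts "level=#{level} mem_level=#{mem_level}: #{size} bytes"
end

# With HexaPDF loaded, the global options would be set like this:
#   HexaPDF::GlobalConfiguration['filter.flate.compression'] = 9
#   HexaPDF::GlobalConfiguration['filter.flate.memory'] = 6
```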

filter.flate.on_error

Callback hook when a potentially recoverable Zlib error occurs in the FlateDecode filter.

The value needs to be an object that responds to #call(stream, error) where stream is the Zlib stream object and error is the thrown error. The method needs to return true if an error should be raised.

The default implementation prevents errors from being raised.
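Because the default suppresses these errors, a strict variant is the more interesting alternative. This sketch records the error and returns true so that it is raised. The constant names are hypothetical.

```ruby
require 'zlib'

# Hypothetical strict 'filter.flate.on_error' callable: record the Zlib error
# and return true so that it is raised.
FLATE_ERRORS = []
STRICT_FLATE_ON_ERROR = lambda do |_stream, error|
  FLATE_ERRORS << "#{error.class}: #{error.message}"
  true
end

# With HexaPDF loaded:
#   HexaPDF::GlobalConfiguration['filter.flate.on_error'] = STRICT_FLATE_ON_ERROR
```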

filter.predictor.strict

Specifies whether the predictor algorithm used by LZWDecode and FlateDecode should operate in strict mode, i.e. adhering to the PDF specification without correcting for common deficiencies of PDF writer libraries.

object.type_map

A mapping from a PDF name (a Symbol) to PDF object classes which is based on the /Type field. If the value is a String, it should contain the name of a constant that contains a PDF object class.

This mapping is used to provide automatic wrapping of objects in the HexaPDF::Document#wrap method.

object.subtype_map

A mapping from a PDF name (a Symbol) to PDF object classes which is based on the /Subtype field. If the value is a String, it should contain the name of a constant that contains a PDF object class.

This mapping is used to provide automatic wrapping of objects in the HexaPDF::Document#wrap method.

VERSION

The version of HexaPDF.

Public Class Methods

data_dir()

Returns the data directory for HexaPDF.

font_on_invalid_glyph(codepoint, invalid_glyph)

Provides the default implementation for the configuration option ‘font.on_invalid_glyph’.

It uses the first font in the list provided by the ‘font.fallback’ configuration option that contains a glyph for the codepoint (taking the font variant into account). If no fallback font contains such a glyph, invalid_glyph is used.
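A simplified model of this search: fallback fonts are tried in order, the first one with a glyph for the codepoint is used, and otherwise the invalid glyph is kept. The FallbackFont struct and its String glyphs are hypothetical stand-ins for HexaPDF's font wrappers and glyph objects. The model also ignores the font-variant handling that the real method performs.

```ruby
# Hypothetical stand-in for a font: a name plus the codepoints it covers.
FallbackFont = Struct.new(:name, :codepoints) do
  def glyph_for(codepoint)
    codepoints.include?(codepoint) ? "#{name}:U+#{codepoint.to_s(16).upcase}" : nil
  end
end

# First fallback font containing the glyph wins; else keep the invalid glyph.
def on_invalid_glyph(codepoint, invalid_glyph, fallback_fonts)
  fallback_fonts.each do |font|
    glyph = font.glyph_for(codepoint)
    return [glyph] if glyph
  end
  [invalid_glyph]
end
```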