module HexaPDF::Utils::PDFDocEncoding

Implements encoding conversion functions for the PDFDocEncoding.

PDFDocEncoding is used, together with UTF-16BE, for text strings outside content streams. When a PDF file is loaded and a text string in a PDF object does not start with the UTF-16BE BOM (U+FEFF, i.e. the bytes 0xFE 0xFF), it is treated as PDFDocEncoding and automatically converted to UTF-8 on access.

Text strings in UTF-16BE encoding are likewise converted to UTF-8 on access. Therefore all text strings can be assumed to be in UTF-8.
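The load-time decision described above can be sketched in plain Ruby. This is a minimal illustration, not HexaPDF's actual code: the constant UTF16BE_BOM and the helpers text_string_encoding and decode_utf16be are hypothetical names introduced only for this example.

```ruby
# Hypothetical helper names; this only illustrates the BOM check described above.
UTF16BE_BOM = "\xFE\xFF".b

# Returns :utf16be if the raw bytes start with the UTF-16BE BOM,
# otherwise :pdfdoc (the string is assumed to be in PDFDocEncoding).
def text_string_encoding(raw)
  raw.b.start_with?(UTF16BE_BOM) ? :utf16be : :pdfdoc
end

# Strips the two BOM bytes and transcodes the remainder to UTF-8.
def decode_utf16be(raw)
  raw.b[2..-1].force_encoding(Encoding::UTF_16BE).encode(Encoding::UTF_8)
end
```

A string classified as :pdfdoc would then go through a table lookup such as the one this module provides.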

When a PDF file is written, text strings are automatically encoded in either PDFDocEncoding or UTF-16BE depending on the characters in the text string.
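As a rough sketch of the write-side choice, the following uses a deliberately conservative rule: strings consisting only of printable ASCII (which PDFDocEncoding maps identically) are written as-is, and everything else is written as UTF-16BE with a leading BOM. This is an assumption made to keep the example short; the actual criterion HexaPDF applies may differ, and encode_text_string is a hypothetical name.

```ruby
# Hypothetical helper; the printable-ASCII test is a simplified stand-in
# for "all characters are representable in PDFDocEncoding".
def encode_text_string(str)
  if str.match?(/\A[\x20-\x7E]*\z/)
    # Printable ASCII is identical in PDFDocEncoding, so the bytes can be used directly.
    str.b
  else
    # Fall back to UTF-16BE, prefixed with the BOM bytes 0xFE 0xFF.
    "\xFE\xFF".b + str.encode(Encoding::UTF_16BE).b
  end
end
```

With this rule, "Hello" stays a five-byte string, while a string containing U+4E2D is written as the BOM followed by its two UTF-16BE bytes.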

See: PDF2.0 s7.9.2, D.1, D.3

Constants

CHARACTER_MAP

Public Class Methods

convert_to_utf8(str)

Converts the given string to UTF-8, assuming it contains bytes in PDFDocEncoding.
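To illustrate the idea of a table-based conversion, here is a minimal sketch that maps each byte to a Unicode code point. It does not reproduce the module's implementation or the structure of CHARACTER_MAP; PARTIAL_MAP is a hypothetical excerpt containing only a handful of PDFDocEncoding entries, and pdfdoc_to_utf8_sketch is a hypothetical name.

```ruby
# Hypothetical, partial excerpt of PDFDocEncoding (byte => Unicode code point).
PARTIAL_MAP = {
  0x80 => 0x2022, # bullet
  0x84 => 0x2014, # em dash
  0x85 => 0x2013, # en dash
  0x92 => 0x2122, # trademark sign
  0xA0 => 0x20AC, # euro sign
}.freeze

# Maps every byte through the table and packs the code points as UTF-8.
# Bytes missing from this excerpt fall back to their own value, which is
# only correct for ASCII and the bytes that PDFDocEncoding shares with Latin-1.
def pdfdoc_to_utf8_sketch(str)
  str.each_byte.map { |byte| PARTIAL_MAP.fetch(byte, byte) }.pack("U*")
end
```

For example, the bytes 0xA0 0x20 0x35 become the UTF-8 string consisting of a euro sign, a space and the digit 5.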