Represents one PDF document.
A PDF document consists of (indirect) objects, so the main job of this class is to provide methods for working with these objects. However, since a PDF document may also be incrementally updated and can therefore contain one or more revisions, there are also methods to work with these revisions.
Note: This class provides everything to work on PDF documents on a low-level basis. This means that there are no convenience methods for higher PDF functionality whatsoever.
The configuration for the document.
The revisions of the document.
Public Class Methods
Creates a new PDF document, either an empty one or one read from the provided IO object.
When an IO object is provided and it contains an encrypted PDF file, it is automatically decrypted behind the scenes. The decryption options argument has to be set appropriately in this case.
If an IO object is provided, then this document can read PDF objects from this IO object, otherwise it can only contain created PDF objects.
A hash with options for decrypting the PDF objects loaded from the IO.
A hash with configuration options that is deep-merged into the default configuration (see DefaultDocumentConfiguration), meaning that direct sub-hashes are merged instead of overwritten.
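To illustrate what "direct sub-hashes are merged instead of overwritten" means, here is a minimal sketch in plain Ruby. The merge_config helper and the configuration keys are hypothetical and are not taken from the library; they only demonstrate the merge semantics described above.

```ruby
# Hypothetical helper (not the library's implementation): merges +overrides+
# into +defaults+. When both sides hold a hash for the same key, the two
# hashes are merged one level deep instead of the default being replaced.
def merge_config(defaults, overrides)
  defaults.merge(overrides) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      old_value.merge(new_value)
    else
      new_value
    end
  end
end

# Illustrative keys only; these are assumptions, not real configuration options.
defaults = {
  'example.limit' => 10,
  'example.map' => {a: 'A', b: 'B'},
}
overrides = {'example.map' => {a: 'X'}}

merged = merge_config(defaults, overrides)
# The :b entry of the sub-hash survives because the sub-hash was merged.
```

With a plain Hash#merge and no block, the whole 'example.map' sub-hash would have been replaced and the :b entry lost.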
Creates a new PDF Document object for the given file.
Depending on whether a block is provided, the functionality is different:
If no block is provided, the whole file is instantly read into memory and the PDF Document created for it is returned.
If a block is provided, the file is opened and a PDF Document is created for it. The created document is passed as an argument to the block and when the block returns the associated file object is closed. The value of the block will be returned.
The block version is useful, for example, when you are dealing with a large file and you only need a small portion of it.
The provided keyword arguments (except io) are passed on unchanged to ::new.
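The two modes described above can be sketched in plain Ruby as follows. SketchDocument is a hypothetical stand-in class, not the real document class; the sketch only shows the difference between reading the whole file into memory and keeping the file open for the duration of a block.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical stand-in for a document class; it only remembers its IO.
class SketchDocument
  attr_reader :io

  def initialize(io: nil, **_options)
    @io = io
  end

  def self.open(filename, **kwargs)
    if block_given?
      # Block form: the file stays open only while the block runs, and the
      # value of the block is returned.
      File.open(filename, 'rb') do |file|
        yield(new(**kwargs, io: file))
      end
    else
      # Non-block form: the whole file is read into memory immediately.
      new(**kwargs, io: StringIO.new(File.binread(filename)))
    end
  end
end

# A tiny placeholder file, just enough to demonstrate both call styles.
file = Tempfile.new('sketch')
file.write("%PDF-1.4\n")
file.close

doc = SketchDocument.open(file.path)
header = SketchDocument.open(file.path) { |d| d.io.read(8) }
```

In the block form only the first eight bytes are read from disk, which is the kind of saving the block version is meant to provide for large files.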
Public Instance Methods
Adds the object to the specified revision of the document and returns the wrapped indirect object.
If the revision option is :current, the current revision is used. Otherwise, revision should identify the revision to use.
Caches the value or the return value of the given block using the given Object::PDFData and key arguments as composite hash key. If a cached value already exists, it is just returned.
This facility can be used to cache expensive operations in PDF objects that are easy to compute again.
Use clear_cache to clear the cache if necessary.
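The caching behaviour described above can be sketched in plain Ruby. SketchCache is a hypothetical class that mimics the composite key idea (data object plus key); it is an illustration under that assumption, not the library's implementation.

```ruby
# Hypothetical cache keyed first by a data object and then by +key+.
class SketchCache
  def initialize
    @store = {}
  end

  # Returns the cached value if one exists. Otherwise stores +value+ or,
  # if +value+ is nil, the result of the block, and returns it.
  def cache(data, key, value = nil)
    entries = (@store[data] ||= {})
    return entries[key] if entries.key?(key)
    entries[key] = value.nil? ? yield : value
  end

  # True if a value is cached for the composite key of +data+ and +key+.
  def cached?(data, key)
    @store.key?(data) && @store[data].key?(key)
  end

  # Clears everything, or only the entries for +data+ if it is given.
  def clear_cache(data = nil)
    data ? @store.delete(data) : @store.clear
  end
end

cache = SketchCache.new
data = Object.new # stands in for an Object::PDFData instance
calls = 0
first = cache.cache(data, :width) { calls += 1; 42 }
second = cache.cache(data, :width) { calls += 1; 43 }
# The second block is never executed because the value is already cached.
```

Clearing the cache for a single data object leaves the cached values of all other objects untouched.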
Returns true if there is a value cached for the composite key consisting of the given Object::PDFData and key arguments.
Also see: cache
Returns the document's catalog, the root of the object tree.
Clears all cached data or, if an Object::PDFData object is given, just the cache for this one object.
It is not recommended to clear the whole cache! It is better to clear the cache for individual PDF objects!
Also see: cache
Deletes the indirect object specified by an exact reference or by an object number from the document.
Specifies from which revisions the object should be deleted:
Delete the object from all revisions.
Delete the object only from the current revision.
If true, objects are only marked as free objects instead of being actually deleted.
Dereferences the given object.
Returns the object itself if it is not a reference, or the indirect object specified by the reference.
Dispatches the message name with the given arguments to all registered listeners.
Calls the given block once for every object in the PDF document or, if iteration is restricted to loaded objects, once for every loaded object. The block may either accept only the object or the object and the revision it is in.
By default, only the current version of each object is returned, which implies that each object number is yielded exactly once. If the current option is false, all stored objects from newest to oldest are returned, not only the current version of each object.
The current option can make a difference because the document can contain multiple revisions:
Multiple revisions may contain objects with the same object and generation numbers, e.g. two (different) objects with oid/gen [3,0].
Additionally, there may also be objects with the same object number but different generation numbers in different revisions, e.g. one object with oid/gen [3,0] and one with oid/gen [3,1].
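The effect of the current option can be sketched with a simplified model in plain Ruby. The data layout (one hash per revision, mapping an object number to a generation number and a value) and the stored_objects helper are assumptions made for this illustration only.

```ruby
# Hypothetical model: each revision maps an object number to [gen, value].
# The revisions are ordered from oldest to newest.
revisions = [
  {1 => [0, 'catalog'], 3 => [0, 'old page']},
  {3 => [1, 'new page'], 4 => [0, 'font']},
]

# Collects [oid, gen, value, revision index] tuples from the newest to the
# oldest revision. With current: true, an object number that was already
# seen in a newer revision is skipped.
def stored_objects(revisions, current: true)
  seen = {}
  result = []
  revisions.each_with_index.reverse_each do |revision, index|
    revision.each do |oid, (gen, value)|
      next if current && seen.key?(oid)
      seen[oid] = true
      result << [oid, gen, value, index]
    end
  end
  result
end

current_only = stored_objects(revisions).map { |oid, gen, _value, _index| [oid, gen] }
all_stored = stored_objects(revisions, current: false).map { |oid, gen, _value, _index| [oid, gen] }
```

In this model object number 3 exists as [3,0] in the older revision and as [3,1] in the newer one, so it shows up once with current: true and twice with current: false.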
Encrypts the document.
This is done by setting up a security handler for this purpose and populating the trailer's Encrypt dictionary accordingly. The actual encryption, however, is only done when writing the document.
The security handler used for encrypting is selected via the name argument. All other arguments are passed on to the security handler.
If the document should not be encrypted, the name argument has to be set to nil. This removes the security handler and deletes the trailer's Encrypt dictionary.
See: SecurityHandler#set_up_encryption and HexaPDF::Encryption::StandardSecurityHandler::EncryptionOptions for possible encryption options.
Returns true if the document is encrypted.
Imports the given PDF object, which is associated with a different document, and returns the imported object.
If the same argument is provided in multiple invocations, the import is done only once and the previously imported object is returned.
Returns the current version of the indirect object for the given exact reference or for the given object number.
For references to unknown objects, nil is returned, but free objects are represented by a PDF Null object, not by nil.
See: PDF1.7 s7.3.9
Returns true if the document contains an indirect object for the given exact reference or for the given object number.
Registers the given listener for the message name.
Returns the security handler that is used for decrypting or encrypting the document, or nil if none is set.
If the document was created by reading an existing file and the document was automatically decrypted, then this method returns the handler for decrypting.
Once the encrypt method is called, the specified security handler for encrypting is returned.
Returns the trailer dictionary for the document.
Validates all objects or, if validation is restricted to loaded objects, only the loaded ones, with optional auto-correction, and returns true if everything is fine.
If a block is given, it is called on validation problems.
See Object#validate for more information.
Returns the PDF document's version as string (e.g. '1.4').
This method takes the file header version and the catalog's /Version key into account. If a version has been set manually and the catalog's /Version key refers to a later version, the later version is used.
See: PDF1.7 s7.2.2
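The rule that the later of the two versions wins can be sketched in plain Ruby. The effective_version helper is hypothetical and only illustrates the comparison; it assumes both inputs are strings in the format 'M.N'.

```ruby
# Hypothetical helper: picks the later of the header version and the
# catalog's /Version value. Both are strings in the format 'M.N'.
def effective_version(header_version, catalog_version = nil)
  return header_version if catalog_version.nil?

  # Compare numerically as [major, minor] pairs rather than as strings.
  to_pair = ->(version) { version.split('.').map { |part| Integer(part, 10) } }
  if (to_pair.call(catalog_version) <=> to_pair.call(header_version)) > 0
    catalog_version
  else
    header_version
  end
end

later_from_catalog = effective_version('1.4', '1.7')
later_from_header = effective_version('2.0', '1.7')
without_catalog = effective_version('1.4')
```

Comparing the parsed number pairs instead of the raw strings avoids surprises if a minor version ever has more than one digit.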
Sets the version of the PDF document. The argument must be a string in the format 'M.N' where M is the major version and N the minor version (e.g. '1.4' or '2.0').
Wraps the given object inside a HexaPDF::Object class which allows one to use convenience functions to work with the object.
The obj argument can also be a HexaPDF::Object object so that it can be re-wrapped if needed.
The class of the returned object is always a subclass of HexaPDF::Object (or of HexaPDF::Stream if a stream is given). Which subclass is used depends on the values of the type and subtype options as well as on the 'object.type_map' and 'object.subtype_map' global configuration options:
First, type is used to try to determine the class. If it is not provided and if obj is a hash with a :Type field, the value of this field is used instead. If the resulting object is already a Class object, it is used, otherwise the type is looked up in 'object.type_map'.
If subtype is provided or can be determined because obj is a hash with a :Subtype or :S field, the type and subtype together are used to look up a special subtype class in 'object.subtype_map'.
Additionally, if there is no type but a subtype, all required fields of the subtype class need to have values; otherwise the subtype class is not used. This is done to better prevent invalid mappings when only partial knowledge (:Type key is missing) is available.
If there is no valid class after the above steps, HexaPDF::Stream is used if a stream is given, HexaPDF::Dictionary if the given object is a hash, HexaPDF::PDFArray if it is an array or else HexaPDF::Object is used.
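The lookup order described in the steps above can be sketched in plain Ruby. All class names, the two maps and the resolve_class helper are hypothetical stand-ins, and the sketch deliberately omits the check for required fields that applies when only a subtype is known.

```ruby
# Hypothetical stand-ins for the wrapper classes named above.
class SketchObject; end
class SketchDictionary < SketchObject; end
class SketchArray < SketchObject; end
class SketchStream < SketchDictionary; end
class SketchPages < SketchDictionary; end
class SketchImage < SketchStream; end

# Hypothetical maps playing the role of the type and subtype configuration.
TYPE_MAP = {Pages: SketchPages}.freeze
SUBTYPE_MAP = {XObject: {Image: SketchImage}}.freeze

def resolve_class(obj, type: nil, subtype: nil, stream: nil)
  hash = obj.is_a?(Hash) ? obj : {}

  # Step 1: use the given type or fall back to the :Type field.
  type ||= hash[:Type]
  klass = type.is_a?(Class) ? type : TYPE_MAP[type]

  # Step 2: a subtype class, if one is mapped, takes precedence.
  subtype ||= hash[:Subtype] || hash[:S]
  klass = SUBTYPE_MAP.dig(type, subtype) || klass if subtype

  # Step 3: fall back to a generic class based on the kind of value.
  return klass if klass
  return SketchStream if stream
  case obj
  when Hash then SketchDictionary
  when Array then SketchArray
  else SketchObject
  end
end

pages_class = resolve_class({Type: :Pages})
image_class = resolve_class({Type: :XObject, Subtype: :Image}, stream: 'data')
fallback_class = resolve_class({Foo: 1})
```

Passing a class as the type, for example resolve_class({}, type: SketchPages), bypasses the map lookup entirely, mirroring the "already a Class object" case.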
(Symbol or Class) The type of a PDF object that should be used for wrapping. This could be, for example, :Pages. If a class object is provided, it is used directly instead of the type detection system.
(Symbol) The subtype of a PDF object which further qualifies a type. For example, image objects in PDF have a type of :XObject and a subtype of :Image.
(Integer) The object number that should be set on the wrapped object. Defaults to 0 or the value of the given object's object number.
(Integer) The generation number that should be set on the wrapped object. Defaults to 0 or the value of the given object's generation number.
(String or StreamData) The stream object which should be set on the wrapped object.
Writes the document to the given file (in case io is a String) or IO stream.
Before the document is written, it is validated using validate and an error is raised if the document is not valid. However, this step can be skipped if needed.
Use the incremental writing mode which just adds a new revision to an existing document. This is needed, for example, when modifying a signed PDF and the original signature should stay valid.
See: PDF1.7 s7.5.6
Validates the document and raises an error if an uncorrectable problem is found.
Updates the /ID field in the trailer dictionary as well as the /ModDate field in the trailer's /Info dictionary so that it is clear that the document has been updated.
Optimize the file size by using object and cross-reference streams. This will raise the PDF version to at least 1.5.