Represents a PDF document.
A PDF document essentially consists of (indirect) objects, so the main job of this class is to provide methods for working with these objects. However, since a PDF document may also be incrementally updated and can therefore contain one or more revisions, there are also methods for working with these revisions (see Revisions for details).
Additionally, there are many convenience methods for easily accessing the most important PDF functionality, like encrypting a document (encrypt), working with digital signatures (signatures), accessing the interactive form data (acro_form), working with the pages (pages), fonts (fonts) and images (
Note: This class provides the basis for working with a PDF document. The higher-level PDF functionality is not implemented here but either in the appropriate PDF type classes or in special convenience classes. All this functionality can be accessed via the convenience methods described above.
Following messages are used by
This message is called before the first step of writing a document. Listeners should complete PDF objects that are missing some information.
For example, the font system uses this message to complete the font objects with information that is only available once all the used glyphs are known.
This message is called before a document is actually serialized and written.
Public Class Methods
Creates a new PDF document, either an empty one or one read from the provided
When an IO object is provided and it contains an encrypted PDF file, it is automatically decrypted behind the scenes. The decryption_opts argument has to be set appropriately in this case. In case this is not wanted, the configuration option 'document.auto_decrypt' needs to be used.
If an IO object is provided, then this document can read PDF objects from this IO object, otherwise it can only contain created PDF objects.
A hash with options for decrypting the PDF objects loaded from the IO. The PDF standard security handler expects a :password key to be set to either the user or owner password of the PDF file.
A hash with configuration options that is deep-merged into the default configuration (see HexaPDF::DefaultDocumentConfiguration), meaning that direct sub-hashes are merged instead of overwritten.
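The merge behaviour described for the configuration hash can be pictured with a small plain-Ruby sketch. It only illustrates the "direct sub-hashes are merged" semantics and is not HexaPDF's actual implementation; the helper name merge_config and the option names are made up for this example.

```ruby
# Hypothetical helper illustrating a merge in which direct sub-hashes are
# merged key by key instead of being replaced wholesale.
def merge_config(defaults, overrides)
  defaults.merge(overrides) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      old_value.merge(new_value)
    else
      new_value
    end
  end
end

# Made-up option names, used only for illustration.
defaults  = {'example.flag' => true, 'example.map' => {a: 1, b: 2}}
overrides = {'example.flag' => false, 'example.map' => {b: 20, c: 30}}

merged = merge_config(defaults, overrides)
```

With a plain Hash#merge the whole 'example.map' sub-hash would be replaced; here the entry for :a from the defaults survives.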
Creates a new PDF Document object for the given file.
Depending on whether a block is provided, the functionality is different:
If no block is provided, the whole file is instantly read into memory and the PDF Document created for it is returned.
If a block is provided, the file is opened and a PDF Document is created for it. The created document is passed as an argument to the block and when the block returns the associated file object is closed. The value of the block will be returned.
The block version is useful, for example, when you are dealing with a large file and you only need a small portion of it.
The provided keyword arguments (except io) are passed on unchanged to
Public Instance Methods
Caches and returns the given value or the value of the given block using the given key arguments as composite cache key.
If a cached value already exists and update is false, the cached value is just returned. If update is set to true, an update of the cached value is forced.
This facility can be used to cache expensive operations in PDF objects that are easy to compute again.
Use clear_cache to clear the cache if necessary.
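The caching semantics can be sketched in plain Ruby. The TinyCache class below is a hypothetical stand-in, not HexaPDF's implementation, and it only supports the block form; it shows how a composite key and the update flag interact.

```ruby
# Hypothetical stand-in that mimics the caching semantics described above.
class TinyCache
  def initialize
    @store = {}
  end

  # Returns the cached value for the composite key unless update is true,
  # in which case the block is evaluated again and its value is stored.
  def cache(*key, update: false)
    return @store[key] if @store.key?(key) && !update
    @store[key] = yield
  end

  # True if a value is stored for the composite key.
  def cached?(*key)
    @store.key?(key)
  end

  def clear_cache
    @store.clear
  end
end

calls = 0
cache = TinyCache.new
first  = cache.cache(:font, :widths) { calls += 1 }               # block runs
second = cache.cache(:font, :widths) { calls += 1 }               # cached value reused
forced = cache.cache(:font, :widths, update: true) { calls += 1 } # block runs again
```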
Returns true if there is a value cached for the composite key consisting of the given
Deletes the indirect object specified by an exact reference or by an object number from the document.
Dereferences the given object.
Returns the object itself if it is not a reference, or the indirect object specified by the reference.
Returns the Destinations object that provides convenience methods for working with destination objects.
Yields every object and the revision it is in.
true, only the current version of each object is yielded, otherwise all objects from all revisions. Note that it is normally not necessary or useful to retrieve all objects from all revisions; if this is done anyway, care has to be taken to avoid an invalid document state.
true, only the already loaded objects are yielded.
For details see
Encrypts the document.
Encryption is done by setting up a security handler for this purpose and populating the trailer's Encrypt dictionary accordingly. The actual encryption, however, is only done when writing the document.
The security handler used for encrypting is selected via the name argument. All other arguments are passed on to the security handler.
If the document should not be encrypted, the name argument has to be set to nil. This removes the security handler and deletes the trailer's Encrypt dictionary.
See Encryption::StandardSecurityHandler::EncryptionOptions for possible encryption options.
Returns true if the document is encrypted.
Returns the Files object that provides convenience methods for working with embedded files.
Returns the Fonts object that provides convenience methods for working with the fonts used in the PDF file.
Returns the Images object that provides convenience methods for working with images (e.g. adding them to the PDF or listing them).
Returns true if the document contains an indirect object for the given exact reference (see Reference) or for the given object number.
Returns the security handler that is used for decrypting or encrypting the document, or nil if none is set.
If the document was created by reading an existing file and the document was automatically decrypted, then this method returns the handler for decrypting.
If the encrypt method is called, the specified security handler for encrypting is returned.
Signs the document and writes it to the given file or IO object.
For details on the arguments
The signing handler to be used is determined by the handler argument together with the rest of the keyword arguments (see DigitalSignature::Signatures#signing_handler for details).
If not changed, the default signing handler is
Note: Once signing is done the document cannot be changed anymore since it was written during the signing process. If a document needs to be signed multiple times, it needs to be loaded again afterwards.
Returns the DigitalSignature::Signatures object that allows working with the digital signatures of this document.
Returns true if the document is signed, i.e. contains digital signatures.
Validates all current objects, or, if true, only loaded objects, with optional auto-correction, and returns true if everything is fine.
If a block is given, it is called on validation problems.
See Object#validate for more information.
Returns the PDF document's version as string (e.g. '1.4').
This method takes the file header version and the catalog's /Version key into account. If a version has been set manually and the catalog's /Version key refers to a later version, the later version is used.
See: PDF2.0 s7.2.2
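The rule that the later of the two versions wins can be sketched in plain Ruby. The helper effective_version is hypothetical and only illustrates the comparison; it is not how HexaPDF implements the method.

```ruby
# Hypothetical helper: picks the later of the header version and the
# catalog's /Version value, comparing major and minor numbers numerically.
def effective_version(header_version, catalog_version = nil)
  [header_version, catalog_version].compact.max_by do |version|
    version.split('.').map(&:to_i)
  end
end

from_catalog = effective_version('1.4', '1.7') # catalog entry is later
from_header  = effective_version('1.7', '1.4') # header version is later
header_only  = effective_version('1.5')        # no /Version key in the catalog
```

Comparing the numeric parts instead of the raw strings keeps the ordering correct even if a minor version ever has more than one digit.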
Sets the version of the PDF document.
value must be a string in the format 'M.N' where M is the major version and N the minor version (e.g. '1.4' or '2.0').
Wraps the given object inside a HexaPDF::Object (sub)class which allows one to use convenience functions to work with the object.
The obj argument can also be a HexaPDF::Object object so that it can be re-wrapped if necessary.
The class of the returned object is always a subclass of HexaPDF::Object (or of HexaPDF::Stream if a stream is given). Which subclass is used depends on the values of the type and subtype options as well as on the 'object.type_map' and 'object.subtype_map' global configuration options:
type is used to try to determine the class. If it is not provided and if obj is a hash with a :Type field, the value of this field is used instead. If the resulting object is already a Class object, it is used, otherwise the type is looked up in 'object.type_map'.
If subtype is provided or can be determined because obj is a hash with a :Subtype or :S field, the type and subtype together are used to look up a special subtype class in 'object.subtype_map'.
Additionally, if there is no type but a subtype, all required fields of the subtype class need to have values; otherwise the subtype class is not used. This is done to better prevent invalid mappings when only partial knowledge (:Type key is missing) is available.
If there is no valid class after the above steps, HexaPDF::Stream is used if a stream is given, HexaPDF::Dictionary if the given object is a hash, HexaPDF::PDFArray if it is an array or else
(Symbol or Class) The type of a PDF object that should be used for wrapping. This could be, for example, :Pages. If a class object is provided, it is used directly instead of determining the class through the type detection system.
(Symbol) The subtype of a PDF object which further qualifies a type. For example, image objects in PDF have a type of :XObject and a subtype of :Image.
(Integer) The object number that should be set on the wrapped object. Defaults to 0 or the value of the given object's object number.
(Integer) The generation number that should be set on the wrapped object. Defaults to 0 or the value of the given object's generation number.
StreamData) The stream object which should be set on the wrapped object.
Writes the document to the given file (in case io is a String) or IO stream.
Before the document is written, it is validated using validate and an error is raised if the document is not valid. However, this step can be skipped if needed.
Use the incremental writing mode which just adds a new revision to an existing document. This is needed, for example, when modifying a signed PDF and the original signature should stay valid.
See: PDF2.0 s7.5.6
Validates the document and raises an error if an uncorrectable problem is found.
Updates the /ID field in the trailer dictionary as well as the /ModDate field in the trailer's /Info dictionary so that it is clear that the document has been updated.
Optimize the file size by using object and cross-reference streams. This will raise the PDF version to at least 1.5.