Interactive Forms

PDF is mainly used as format that provides consistent output regardless of the output device. However, it also provides various interactive features, one of them being support for forms.

AcroForm vs XFA Forms

The PDF specification provides two different ways for representing forms: AcroForm and XFA forms:

XFA forms need much more functionality in a PDF reader application than AcroForm forms. Due to this support for XFA forms is only available in certain commercial software applications.

Since XFA forms are already deprecated, HexaPDF only has support for interactive forms.

Interactive Forms (AcroForm)

An interactive form consists of the main form dictionary, form fields and widget annotations. Together they define the structure and visible appearance of the form.

The main form dictionary references the root fields which in turn can reference child fields. This allows one to build a hierarchy of fields and to inherit attributes from parent fields. Fields without child fields are called terminal fields.

These terminal fields can have a visible appearance which is provided by a widget annotation. Each field can have zero, one or more associated widgets.

Main Interactive Form Dictionary

The main form dictionary can be referenced from the document catalog via the /AcroForm key (see HexaPDF::Type::Catalog#acro_form). It is implemented by the HexaPDF::Type::AcroForm::Form class.

It only provides a few entries, the most important of which are:

The form dictionary object is the main entry point for handling interactive forms with HexaPDF. It allows you to list, modify, create and delete the form fields. By relying on the provided convenience methods all the tedious but needed book-keeping is done behind the scenes.

Form Fields

A form field dictionary contains, among other things, the type of the field, its name and its value.

There are four main types of fields which are further sub-divided:

Button fields

These fields represent interactive controls that a user can manipulate with a mouse.

A button field may be a push button (something to click which produces a result immediately), a check box (for toggling between two states) or a radio button (typically one button in a set can be turned on).

See HexaPDF::Type::AcroForm::ButtonField.

Text fields

These fields allow the user to input text from the keyboard.

The text can be entered into a single-line or multi-line field and there is also the possibility for rich text strings which allow inline formatting of the text.

See HexaPDF::Type::AcroForm::TextField.

Choice fields

These fields contain several text items of which the user can select one or more.

A choice field may be presented as a scrollable list box or a combo box. The latter also allows the user to input a value other than the predefined ones.

See HexaPDF::Type::AcroForm::ChoiceField.

Signature fields

These fields represent digital signatures and optional data for authenticating the signer name and the document’s contents.

Each field has a unique full name consisting of all the partial names connected with dots, i.e. “parent.child.terminal”. This is possible because fields may be nested and the leafs of this field tree are called terminal fields. All other fields are non-terminal fields.

It is possible that two different field instances have the same full field name. In such a case those two field instances actually represent the same field. This is most often the case when the widget annotation is embedded in the field instances instead of using the /Kids.

The visual appearance is defined by associated widget annotations. Each terminal field can have zero, one or more associated widgets. For example, each widget annotation of a radio button field describes one possible selection value. Another use for multiple widget annotations is on a multi-page form where a name entered by the user should appear in a header or footer on every page.

Text and choice fields are so called variable text fields whose visual appearance mainly consists of their text value. To create such an appearance it must be known what font and font size should be used. This is handled by the /DA dictionary field of the field or, if not set, by the /DA dictionary field of the main AcroForm dictionary. The value of the /DA key has to at least specify the font name and font size, with the font name being resolved to a font object through the /DR key of the main AcroForm dictionary.

HexaPDF currently cannot handle variable text fields using an arbitrary font. The reason for this is that HexaPDF only uses the font information stored in the PDF itself and does not reference or load fonts stored on the host in case the font is not usable (e.g. because the character to glyph mapping was removed from the embedded font program).

The standard PDF fonts Helvetica, Times and Courier work correctly and those are used in most interacctive forms. For all other fonts a fallback font configured through the ‘acro_form.fallback_font` configuration option will be used.

JavaScript Actions

It is possible to use JavaScript actions together with form fields, for example for formatting a value in a certain way or for calculating the value of a form field.

Implementing full support for JavaScript actions would entail using a full-blown JavaScript engine. This is not supported by HexaPDF. However, field actions can use certain, pre-defined JavaScript functions that are supported by HexaPDF.

There are four special action types for form fields and of those four the “format” and “calculate” actions are currently supported:

The module HexaPDF::Type::AcroForm::JavaScriptActions contains the implementations of the supported JavaScript functions, like formatting numbers and calculating values based on a few pre-defined functions.

Widget Annotations

A widget annotation describes the visual appearance of a form field on a page. It is implemented by HexaPDF::Type::Annotations::Widget.

As with all other annotations the widgets placement on the page is specified by the /Rect key and the visual appearance by the /AS and /AP keys.

Additionally, each widget can specify a background color and border style and, depending on the type of the associated field, other properties.

When using HexaPDF you don’t have to worry about the visual appearance. HexaPDF creates the needed appearance streams automatically using a default style similar to those found in popular PDF reader applications (see HexaPDF::Type::AcroForm::AppearanceGenerator). This is done by setting the needed widget annotation and field properties when the widget is created. Later these properties are used during the creation of the appearance (like some PDF readers would do when the /NeedAppearances key on the main form object is set).

The appearance generator also sets the “print” flag on the widget annotations and removes a possibly set “hidden” flag. The first is done so that the field value appears on print outs and the second so that generated appearances are actually visible (some interactive forms toggle the “hidden” flag based on a JavaScript action but those aren’t universally supported by HexaPDF, so it is better to generally remove the “hidden” flag).

You can naturally provide the appearance streams yourself if needed since those are just Form XObjects.

It is also possible to force the creation of appearance streams even if there are existing ones. This is useful, for example, if an interactive form was filled out with a PDF reader that created bad or invalid visual appearances. Note that existing appearances for button fields are not deleted because they could potentially be reused somewhere.

Form Flattening

Form flattening is the process of converting the whole interactive form or only some fields into a static representation that is not changable anymore.

When a field is flattened all its widget annotations are flattened, meaning, their appearances are embedded into the page’s content and the widget annotations themselves are removed. Furthermore, the field itself is removed from the field tree.

Flattening can be achieved via HexaPDF::Type::AcroForm::Form#flatten.

General Sequence When Creating a Form

If you want to create a PDF containing an interactive form, the general sequence of instructions is describe below. Whether you create the non-form parts of the pages before, during or after form creation is your choice. Most commenly, however, the form fields and their widgets are created together with the rest of the document because that makes it easier to get the needed information like the annotation rectangle.

The sequence:

If you add additional widgets later, either manually call the field’s #update_widgets method afterwards or rely on the validation pass when writing out the PDF.

Note, however, that if you change the appearance settings of a widget later, you need to force the creation of new appearance streams as this is not done automatically.