PDF is mainly used as a format that provides consistent output regardless of the output device. However, it also provides various interactive features, one of them being support for forms.
AcroForm vs XFA Forms
The PDF specification provides two different ways for representing forms: AcroForm and XFA forms:
AcroForms are static forms where each form field is predefined with respect to its size, possible values and so on. These types of forms have been in the PDF specification since PDF 1.2 and have broad support among PDF reader applications. When speaking of an interactive form we always mean an AcroForm.
XFA forms (Adobe XML Forms Architecture) were introduced with PDF 1.5 and are much more advanced. For example, they allow fields to depend on other fields and text fields to vary in size, possibly adding pages to the document. XFA forms have been deprecated with PDF 2.0.
XFA forms need much more functionality in a PDF reader application than AcroForm forms. Because of this, support for XFA forms is only available in certain commercial software applications.
Since XFA forms are already deprecated, HexaPDF only has support for interactive forms.
Interactive Forms (AcroForm)
An interactive form consists of the main form dictionary, form fields and widget annotations. Together they define the structure and visible appearance of the form.
The main form dictionary references the root fields which in turn can reference child fields. This allows one to build a hierarchy of fields and to inherit attributes from parent fields. Fields without child fields are called terminal fields.
These terminal fields can have a visible appearance which is provided by a widget annotation. Each field can have zero, one or more associated widgets.
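The inheritance of attributes through the field hierarchy can be illustrated with a small, library-independent sketch. The SketchField struct and the add_child helper are hypothetical stand-ins invented for this illustration and are not HexaPDF classes; the sketch only assumes that a lookup walks from a field up through its parents until a value is found.

```ruby
# Simplified stand-in for a form field dictionary (hypothetical, not a HexaPDF class).
SketchField = Struct.new(:name, :attrs, :parent, :kids) do
  # A terminal field has no child fields.
  def terminal?
    kids.empty?
  end

  # Return the value set on this field or on the nearest ancestor that defines it.
  def inherited(key)
    field = self
    while field
      return field.attrs[key] if field.attrs.key?(key)
      field = field.parent
    end
    nil
  end
end

# Hypothetical helper that creates a child field and links it to its parent.
def add_child(parent, name, attrs = {})
  child = SketchField.new(name, attrs, parent, [])
  parent.kids << child
  child
end

address = SketchField.new("address", { FT: :Tx, DA: "/Helv 10 Tf 0 g" }, nil, [])
street = add_child(address, "street")
city = add_child(address, "city", { DA: "/Helv 12 Tf 0 g" })

puts street.inherited(:DA) # value comes from the parent field
puts city.inherited(:DA)   # value set on the field itself wins
puts address.terminal?     # the parent has child fields, so it is not terminal
```

Setting an attribute once on a parent and overriding it only where needed keeps the individual terminal fields small.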
Main Interactive Form Dictionary
The main form dictionary can be referenced from the document catalog via the /AcroForm key (see HexaPDF::Type::Catalog#acro_form). It is implemented by the HexaPDF::Type::AcroForm::Form class.
It only provides a few entries, the most important of which are:
- /Fields contains the array of root fields. See the various methods on the form class on how to access and modify form fields.
- /NeedAppearances defines whether appearances should be constructed by the PDF reader application. This is useful for libraries/applications which can’t do this themselves due to the added complexity. They just set this key to true and the reader application constructs all appearances.
- /DR and /DA: The former is a dictionary containing the default resources (like fonts, color spaces, …) that should be used when constructing appearances. The latter defines a “default appearance string” that defines, at least, the font and font size to be used when creating text field appearances. The two keys together allow a PDF reader application to convert text input by a user into a proper PDF content stream.
The form dictionary object is the main entry point for handling interactive forms with HexaPDF. It allows you to list, modify, create and delete the form fields. By relying on the provided convenience methods all the tedious but needed book-keeping is done behind the scenes.
A form field dictionary contains, among other things, the type of the field, its name and its value.
There are four main types of fields which are further sub-divided:
- Button fields
These fields represent interactive controls that a user can manipulate with a mouse.
A button field may be a push button (something to click which produces a result immediately), a check box (for toggling between two states) or a radio button (typically one button in a set can be turned on).
- Text fields
These fields allow the user to input text from the keyboard.
The text can be entered into a single-line or multi-line field and there is also the possibility for rich text strings which allow inline formatting of the text.
- Choice fields
These fields contain several text items of which the user can select one or more.
A choice field may be presented as a scrollable list box or a combo box. The latter also allows the user to input a value other than the predefined ones.
- Signature fields
These fields represent digital signatures and optional data for authenticating the signer name and the document’s contents.
Each field has a unique full name consisting of all the partial names connected with dots, e.g. “parent.child.terminal”. This is possible because fields may be nested. The leaves of this field tree are called terminal fields; all other fields are non-terminal fields.
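How a full field name is assembled from partial names can be sketched in a few lines of plain Ruby. NameNode and full_field_name are hypothetical names used only for this illustration; skipping nodes without a partial name is an assumption of the sketch.

```ruby
# Hypothetical minimal field node: a partial name plus a link to the parent field.
NameNode = Struct.new(:partial_name, :parent)

# Collect the partial names from the field up to the root and join them with dots.
# Nodes without a partial name are skipped (an assumption of this sketch).
def full_field_name(field)
  names = []
  while field
    names.unshift(field.partial_name) if field.partial_name
    field = field.parent
  end
  names.join(".")
end

parent = NameNode.new("parent", nil)
child = NameNode.new("child", parent)
terminal = NameNode.new("terminal", child)

puts full_field_name(terminal) # prints "parent.child.terminal"
```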
It is possible that two different field instances have the same full field name. In such a case those two field instances actually represent the same field. This is most often the case when the widget annotation is embedded in the field instances instead of using the /Kids key.
The visual appearance is defined by associated widget annotations. Each terminal field can have zero, one or more associated widgets. For example, each widget annotation of a radio button field describes one possible selection value. Another use for multiple widget annotations is on a multi-page form where a name entered by the user should appear in a header or footer on every page.
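The radio button case can be modelled with a simplified sketch in which every widget stands for one selectable value and setting the field value updates all widgets. RadioWidget and RadioField are hypothetical classes written for this illustration, not HexaPDF's implementation; the only fact taken from PDF itself is that the unselected state is named Off.

```ruby
# Hypothetical model of a widget: the value it stands for and its current state.
RadioWidget = Struct.new(:on_state, :appearance_state)

# Hypothetical model of a radio button field with one widget per possible value.
class RadioField
  attr_reader :widgets, :value

  def initialize(*on_states)
    @widgets = on_states.map { |state| RadioWidget.new(state, :Off) }
    @value = nil
  end

  # Setting the value switches the matching widget on and all other widgets off.
  def value=(new_value)
    unless @widgets.any? { |widget| widget.on_state == new_value }
      raise ArgumentError, "unknown value #{new_value.inspect}"
    end
    @value = new_value
    @widgets.each do |widget|
      widget.appearance_state = widget.on_state == new_value ? new_value : :Off
    end
  end
end

size = RadioField.new(:small, :medium, :large)
size.value = :medium
size.widgets.each { |widget| puts "#{widget.on_state}: #{widget.appearance_state}" }
```

The same idea of one value driving several widgets applies to the multi-page header or footer case, where every widget simply shows the same text.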
Text and choice fields are so-called variable text fields whose visual appearance mainly consists of their text value. To create such an appearance it must be known what font and font size should be used. This is handled by the /DA dictionary field of the field or, if not set, by the /DA dictionary field of the main AcroForm dictionary. The value of the /DA key has to at least specify the font name and font size, with the font name being resolved to a font object through the /DR key of the main AcroForm dictionary.
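As a rough illustration of this lookup, the following plain Ruby sketch extracts the font name and size from a default appearance string and resolves the name through a stand-in for the default resources. The helper parse_default_appearance, the sample strings and the resource hash are all assumptions made for this example; a real implementation would parse the string as a content stream instead of using a regular expression.

```ruby
# Extract the font name and font size from a default appearance string.
# Only the "/Name size Tf" part is inspected; everything else is ignored.
def parse_default_appearance(da)
  match = da.match(%r{/(\S+)\s+(\d*\.?\d+)\s+Tf})
  raise ArgumentError, "no font specified in #{da.inspect}" unless match
  { font_name: match[1].to_sym, font_size: match[2].to_f }
end

# Stand-ins for the /DA value of a field, the /DA value of the main form
# dictionary and the font part of the form's default resources.
field_da = nil
form_da = "0 g /Helv 12 Tf"
default_resources = { Font: { Helv: "Helvetica font object" } }

# Use the field's own value if set, otherwise fall back to the form's value.
info = parse_default_appearance(field_da || form_da)
font = default_resources[:Font][info[:font_name]]
puts "#{info[:font_name]} at size #{info[:font_size]} resolves to: #{font}"
```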
HexaPDF currently cannot handle variable text fields using an arbitrary font. The reason for this is that HexaPDF only uses the font information stored in the PDF itself and does not reference or load fonts stored on the host in case the font is not usable (e.g. because the character to glyph mapping was removed from the embedded font program).
The standard PDF fonts Helvetica, Times and Courier work correctly and those are used in most interactive forms. For all other fonts a fallback font configured through the ‘acro_form.fallback_font’ configuration option will be used.
A widget annotation describes the visual appearance of a form field on a page. It is implemented by HexaPDF::Type::Annotations::Widget.
As with all other annotations the widget’s placement on the page is specified by the /Rect key and the visual appearance by the /AP key.
Additionally, each widget can specify a background color and border style and, depending on the type of the associated field, other properties.
When using HexaPDF you don’t have to worry about the visual appearance. HexaPDF creates the needed
appearance streams automatically using a default style similar to those found in popular PDF reader
applications (see HexaPDF::Type::AcroForm::AppearanceGenerator). This is done by setting the
needed widget annotation and field properties when the widget is created. Later these properties are
used during the creation of the appearance (like some PDF readers would do when the
/NeedAppearances key on the main form object is set).
You can naturally provide the appearance streams yourself if needed since those are just Form XObjects.
It is also possible to force the creation of appearance streams even if there are existing ones. This is useful, for example, if an interactive form was filled out with a PDF reader that created bad or invalid visual appearances. Note that existing appearances for button fields are not deleted because they could potentially be reused somewhere.
Form flattening is the process of converting the whole interactive form or only some fields into a static representation that can no longer be changed.
When a field is flattened all its widget annotations are flattened, meaning, their appearances are embedded into the page’s content and the widget annotations themselves are removed. Furthermore, the field itself is removed from the field tree.
Flattening can be achieved via HexaPDF::Type::AcroForm::Form#flatten.
General Sequence When Creating a Form
If you want to create a PDF containing an interactive form, the general sequence of instructions is described below. Whether you create the non-form parts of the pages before, during or after form creation is your choice. Most commonly, however, the form fields and their widgets are created together with the rest of the document because that makes it easier to get the needed information like the annotation rectangle.
First you create the main form dictionary using HexaPDF::Type::Catalog#acro_form(create: true). Either store the form object somewhere or just use the same method call to retrieve it later.
Next, use one of the various #create_<FIELD> methods of the form object to create a field. Some often needed properties can be set directly with this invocation. Set additionally needed field properties using the various field methods.
Create a widget using the #create_widget method of the field. When creating a button field, some properties are set by default to style the button field with a default appearance (otherwise it would be invisible).
By using HexaPDF::Type::Annotations::Widget#background_color, HexaPDF::Type::Annotations::Widget#border_style and HexaPDF::Type::Annotations::Widget#marker_style you can change the widget’s appearance.
Set the field’s value which will update all associated widgets to reflect that value.
If you add additional widgets later, either manually regenerate the field’s appearances afterwards or rely on the validation pass when writing out the PDF.
Note, however, that if you change the appearance settings of a widget later, you need to force the creation of new appearance streams as this is not done automatically.