HexaPDF 0.6.0 - Code Refinements

Published on

The last release brought some cool new features with respect to advanced text layout. This release builds on that and refines the implementation for future features.

Support HexaPDF —   Become a Patron!

The supported text and box styling properties have been greatly expanded (also see the expanded styling example):

  • Boxes now support CSS-like padding, margins and borders as well as background colors. And since the inline boxes are now based on a proper box implementation, they can use these new styles, too!

  • Text fragments now support text color and opacity, text rendering modes (e.g. only showing the text outlines), superscript, subscript, underlining and strikeout.

Additionally, boxes as well as text fragments now support pre-defined and custom underlays/overlays. This is the basis for the implemented support for links to other pages, external files and URIs.

Apart from these layout-related changes, support for some types of PDF actions and annotations has been added, e.g. the action for opening a URI and the link annotation type.

As always, have a look at the changelog for an overview of all changes. And if you have a request, drop me an e-mail or open an issue!

Last but not least, I have not yet decided which parts I will be working on in the coming months. So if you need some functionality sooner rather than later, head over to my Patreon page to fill out the poll!
And while you are already there, you might wanna become a patron! 😉

HexaPDF 0.5.0 - Advanced Text Layout

Published on

HexaPDF continues to grow and mature, with this release bringing advanced text layout as a first step toward providing full document layout features.

Advanced text layout means that HexaPDF is now able to:

  • Apply kerning and ligatures to text, with the possibility to easily add other positioning or substitution steps (e.g. for correctly positioning diacritical marks)

  • Apply different styles to different parts of the text of a paragraph (example)

  • Wrap lines while supporting special characters like non-breaking spaces, soft-hyphens and zero-width spaces (example)

  • Use arbitrarily shaped boxes for text layout (example)

  • Align text horizontally and vertically, e.g. left, center, right and justify; and top, center, bottom (example)

  • Mix text and inline boxes, e.g. for showing images or arbitrary drawings together with text (example)

  • Calculate the height of a text box without drawing it, or limit the height and retrieve the overflowing items
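
To make the line wrapping item above more concrete, here is a minimal sketch in plain Ruby of how the special characters affect break opportunities. It does not use HexaPDF's API: the names `BREAKABLE`, `finish` and `wrap` are made up for this illustration, and line width is simply counted in characters, whereas a real layout engine would use glyph widths.

```ruby
# Break opportunities: ASCII space, soft hyphen (U+00AD), zero-width space
# (U+200B). A non-breaking space (U+00A0) is deliberately NOT in this set.
BREAKABLE = /[^ \u00AD\u200B]*[ \u00AD\u200B]|[^ \u00AD\u200B]+/

# Returns the line as it would be shown: a trailing space is dropped, a soft
# hyphen at the line end becomes a visible hyphen, and all other soft hyphens
# and zero-width spaces are invisible.
def finish(line)
  line.sub(/ \z/, "").sub(/\u00AD\z/, "-").delete("\u00AD\u200B")
end

# Greedy wrapping; width is measured in characters (an assumption made only
# to keep the sketch short). A chunk longer than max_chars simply overflows.
def wrap(text, max_chars)
  lines = []
  line = +""
  text.scan(BREAKABLE).each do |chunk|
    if !line.empty? && finish(line + chunk).length > max_chars
      lines << finish(line)
      line = +""
    end
    line << chunk
  end
  lines << finish(line) unless line.empty?
  lines
end

p wrap("hyphen\u00ADation is fun", 8) # => ["hyphen-", "ation is", "fun"]
```

Note how the soft hyphen only becomes visible when the line is actually broken at it, and how text joined by a non-breaking space is always kept on one line.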

In essence, HexaPDF::Layout::TextBox together with the other classes in HexaPDF::Layout is similar to Prawn’s formatted text box implementation. However, HexaPDF still lacks some text box features like text colors, links or underlining. This will be fixed in a future release.

To see how HexaPDF’s implementation compares to Prawn’s in terms of performance, I adapted the text rendering benchmark to use the text box implementations. The results (see the text box benchmark for details and caveats) are rather promising, with HexaPDF being about 10 times faster than Prawn!

HexaPDF’s text box implementation can already be used to compose whole documents but it is still only another stepping stone on the way to full document layout features. There are major parts missing for this, like automatic page breaking, tables, column layout and a composition class to make using all these parts easier.

As always, have a look at the changelog for an overview of all changes. And if you have a request, drop me an e-mail or open an issue!

Simple Text Metrics

Published on

The first simple step for HexaPDF to gain advanced text functionality is to use the available font and glyph metrics to correctly determine the width and height of a piece of text. Note that I will only talk about text laid out horizontally, not vertically.

Each font provides two metrics for each glyph: its advance width, i.e. the distance between the current origin of the text coordinate system and the origin for the next glyph, and its bounding box, i.e. the tightest box around the visible glyph outlines. Note that when glyphs are positioned, the text coordinate system is transformed so that its origin coincides with the origin of the glyph coordinate system.

All these metrics are defined for a glyph of unit proportions. To make text larger or smaller, the metrics are scaled by the font size.
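
As a rough illustration of this scaling, the following plain Ruby sketch converts glyph metrics into text space units. It assumes the common PDF convention that metrics are stored in thousandths of a unit. The `Glyph` struct, the helper names and the numbers are invented for this example and are not HexaPDF's API.

```ruby
# Hypothetical glyph record; bbox is [llx, lly, urx, ury]. All values are
# assumed to be in thousandths of a unit.
Glyph = Struct.new(:name, :advance_width, :bbox)

# Scales a single metric value by the font size.
def scale(value, font_size)
  value * font_size / 1000.0
end

# Returns the advance width and bounding box for the given font size.
def scaled_metrics(glyph, font_size)
  {
    advance_width: scale(glyph.advance_width, font_size),
    bbox: glyph.bbox.map { |value| scale(value, font_size) },
  }
end

glyph_a = Glyph.new(:A, 722, [15, 0, 706, 674]) # made-up numbers
p scaled_metrics(glyph_a, 12)
```

Doubling the font size doubles both the advance width and every coordinate of the bounding box, which is why the metrics only need to be stored once per glyph.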

In addition the PDF specification provides four text modification operators that influence the width and height of text:

  • character spacing (additional spacing between characters)
  • word spacing (like character spacing but only for ASCII spaces, i.e. this additional spacing is applied only to glyphs that have a single byte encoding equal to ASCII space)
  • horizontal scaling (a stretch factor applied to all horizontal measurements, so not only to glyphs but also to character spacing, for example)
  • text rise (the vertical offset from the baseline)

Finally, kerning values (which work similarly to character spacing but only for adjacent glyphs) can change the width of text.
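
Putting these pieces together, the horizontal advance of a piece of text can be sketched as follows. This is plain Ruby, not HexaPDF's implementation. It follows the displacement formula of the PDF specification, where each glyph advances by (width × font size + character spacing + word spacing) × horizontal scaling and kerning adjustments are given in thousandths of a unit and subtracted. The `WIDTHS` table and its values are made up, and text rise is left out because it only affects the vertical position.

```ruby
# Hypothetical advance widths in thousandths of a unit.
WIDTHS = { "H" => 722, "i" => 278, " " => 278 }.freeze

# items: an array of single-character strings and Numeric kerning
# adjustments (thousandths of a unit; positive values pull glyphs together).
# h_scale is a factor, i.e. 1.0 means 100%.
def text_width(items, font_size:, char_spacing: 0, word_spacing: 0, h_scale: 1.0)
  items.sum do |item|
    if item.is_a?(Numeric)
      # Kerning: scaled by font size and horizontal scaling, then subtracted.
      -item * font_size / 1000.0 * h_scale
    else
      width = WIDTHS.fetch(item) * font_size / 1000.0 + char_spacing
      width += word_spacing if item == " " # only for ASCII spaces
      width * h_scale
    end
  end
end

p text_width(["H", "i"], font_size: 10)
p text_width(["H", 50, "i"], font_size: 10, char_spacing: 1, h_scale: 0.5)
```

Note that the result is the total advance, so character spacing is also counted after the last glyph.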

With this information it is possible to determine the width and height of a piece of text. Once these dimensions are available, box and text layout algorithms can be used to create lines of text laid out in a rectangular area.

Here is an image of what can currently be done (see the gist for details):

[Image: text metrics]

  • The green boxes show the text boxes themselves. The red boxes are the tightest boxes around all glyphs of each text. And the blue lines show the baselines of the texts.
  • The first row is just text, with no additional modifications or kerning.
  • The second row shows the application of some text modification operators.
  • The third row shows text with kerning values within the text and at the boundaries. This also shows that the text can spill over the text box boundaries.
  • The fourth row shows the result of applying various positive and negative text rise values, as indicated by the different positions of the baseline.
  • Finally, the last row shows how these text metrics can be used to correctly position text fragments along a continuous baseline.
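
The idea behind the last row can be sketched in a few lines of plain Ruby: once the width of each fragment is known, every fragment starts where the previous one ends, and all of them share the same baseline. The `Fragment` struct and the `place_on_baseline` helper are hypothetical names for this illustration, not HexaPDF classes, and the widths are made up.

```ruby
# A text fragment with an already computed width (in text space units).
Fragment = Struct.new(:text, :width)

# Returns the start position of each fragment on a shared baseline.
def place_on_baseline(fragments, start_x, baseline_y)
  x = start_x
  fragments.map do |fragment|
    position = { text: fragment.text, x: x, y: baseline_y }
    x += fragment.width # the next fragment starts where this one ends
    position
  end
end

fragments = [
  Fragment.new("Some ", 10.0),
  Fragment.new("styled ", 5.5),
  Fragment.new("text", 7.25),
]
place_on_baseline(fragments, 50, 700).each { |position| p position }
```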

Next step: Use the functionality of the now-implemented text fragments to implement line fragment objects, and implement a basic text shaping class.