HexaPDF 0.8.0 - Box Layout


The last release, 0.7.0, was done to fix some issues and didn’t include any of the major changes since 0.6.0. With the 0.8.0 release these major changes are now incorporated into HexaPDF and lay the groundwork for document layouting.

So what is new for HexaPDF?

The 0.6.0 release enhanced the base box layout class with many more styling properties, but it still wasn’t possible to easily position such boxes. So naturally the next step was to design how they would be laid out on a page.

In HexaPDF, layout boxes represent all things that should be put on a page, be it text (like headings and paragraphs), images or other content. Typically (at least with western languages), the boxes are laid out from top to bottom and left to right. So the easiest thing would be to define a “cursor” position that represents the vertical position at which to place the current box and, after placing it, move the cursor downwards.
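To make the cursor idea concrete, here is a minimal sketch in plain Ruby. It is not HexaPDF code: the class name `CursorPage` and its methods are made up for illustration, and every box is assumed to span the full page width.

```ruby
# Toy cursor model (hypothetical names, not part of HexaPDF).
# The cursor starts at the top of the page and only ever moves downwards.
class CursorPage
  attr_reader :placements

  def initialize(width:, height:)
    @width = width
    @cursor_y = height # PDF coordinates: y grows upwards, so the top is at `height`
    @placements = []
  end

  # Places a box of the given height at the cursor and moves the cursor down.
  # Returns false if the box does not fit into the remaining space.
  def place(name, box_height)
    return false if box_height > @cursor_y

    @cursor_y -= box_height
    @placements << { name: name, x: 0, y: @cursor_y, width: @width, height: box_height }
    true
  end
end

page = CursorPage.new(width: 595, height: 842)
page.place(:heading, 40)
page.place(:paragraph, 100)
```

Since the only state is a single vertical position, there is no way to describe space to the left or right of an already placed box.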

However, with such a model you cannot easily do things like having a picture and flowing text around it. Or putting a box into the middle of the page and then flowing the boxes around this hole. Or having non-rectangular regions to put boxes in.

Therefore I decided to do things differently:

  • To lay out boxes, HexaPDF uses frames, where a frame is a set of rectilinear polygons (i.e. polygons consisting only of vertical and horizontal line segments).

  • When a box gets placed in a frame, the occupied region is subtracted from the frame’s polygons, resulting in new polygons for the frame.

  • Each frame knows how to find the position where the next box should be placed (the top-most, left-most point) and how big the available rectangular region at that position is.

  • If a box doesn’t fit, the frame also knows how to calculate the next possible position and region.
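The four points above can be sketched with a much simplified model. This is an illustration under stated assumptions, not HexaPDF's implementation: the frame is stored as a list of disjoint rectangles instead of real rectilinear polygons, the available region is simply one of those rectangles (not necessarily the largest possible one), and the names `Rect` and `ToyFrame` are hypothetical.

```ruby
# Toy model only: rectangles stand in for rectilinear polygons.
# All names are hypothetical and not part of HexaPDF's API.
Rect = Struct.new(:left, :bottom, :right, :top) do
  def width
    right - left
  end

  def height
    top - bottom
  end

  def overlaps?(other)
    left < other.right && other.left < right &&
      bottom < other.top && other.bottom < top
  end

  # Returns the parts of this rectangle that are not covered by +other+.
  def subtract(other)
    return [self] unless overlaps?(other)

    mid_bottom = [bottom, other.bottom].max
    mid_top = [top, other.top].min
    pieces = []
    pieces << Rect.new(left, other.top, right, top) if other.top < top          # above
    pieces << Rect.new(left, bottom, right, other.bottom) if other.bottom > bottom # below
    pieces << Rect.new(left, mid_bottom, other.left, mid_top) if other.left > left # left
    pieces << Rect.new(other.right, mid_bottom, right, mid_top) if other.right < right # right
    pieces
  end
end

class ToyFrame
  attr_reader :regions

  def initialize(left, bottom, width, height)
    @regions = [Rect.new(left, bottom, left + width, bottom + height)]
  end

  # Candidate regions, top-most first, then left-most (the PDF y axis points up).
  def candidates
    @regions.sort_by { |r| [-r.top, r.left] }
  end

  # Places a box in the first candidate region it fits into and subtracts the
  # occupied rectangle from the frame. Returns the occupied Rect or nil.
  def place(width, height)
    region = candidates.find { |r| r.width >= width && r.height >= height }
    return nil unless region

    box = Rect.new(region.left, region.top - height, region.left + width, region.top)
    @regions = @regions.flat_map { |r| r.subtract(box) }
    box
  end
end

frame = ToyFrame.new(0, 0, 100, 100)
frame.place(30, 20) # top-left corner of the frame
frame.place(50, 20) # to the right of the first box
frame.place(50, 20) # does not fit beside them, so the next region below is tried
```

Working with real polygons instead of rectangle lists is what allows arbitrary rectilinear shapes and holes, but the sequence of steps (find a position, check the fit, subtract the occupied region) is the same.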

Doing things in this way seems very complicated at first but it certainly makes some things very easy:

  • A frame can have any form and can contain holes. This is the reason a frame can potentially consist of multiple polygons.

  • Boxes can be styled using the new position and position_hint style properties to remove the whole vertical stripe they are in, imitating the basic cursor model.

  • And these style properties can also make a box “float” to the left or right, like blocks in HTML, removing only their occupied region.
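The difference between removing the whole stripe and floating can be shown with a small sketch. The property names position and position_hint come from the text above; the concrete values used here (`:default`, `:float`, `:left`, `:center`, `:right`), the `Region` struct and the `removed_region` helper are assumptions made for this illustration and may not match HexaPDF exactly.

```ruby
# Toy illustration only; names and values are assumptions, not HexaPDF's API.
Region = Struct.new(:left, :bottom, :right, :top)

# Returns the region that would be removed from the available region when a
# box of the given size is placed at its top.
def removed_region(available, width, height, position: :default, hint: :left)
  box_left =
    case hint
    when :left then available.left
    when :right then available.right - width
    when :center then available.left + (available.right - available.left - width) / 2.0
    else raise ArgumentError, "unknown hint: #{hint.inspect}"
    end
  bottom = available.top - height

  case position
  when :default
    # Cursor-like behaviour: everything beside the box is removed as well.
    Region.new(available.left, bottom, available.right, available.top)
  when :float
    # Only the box itself is removed, other boxes may flow beside it.
    Region.new(box_left, bottom, box_left + width, available.top)
  else
    raise ArgumentError, "unknown position: #{position.inspect}"
  end
end

available = Region.new(0, 0, 100, 100)
removed_region(available, 30, 20)                                  # full-width stripe
removed_region(available, 30, 20, position: :float, hint: :right)  # only the box
```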

I also hope that this design will make future additions straightforward (for example, multi-column layout).

To make working with the frame’s polygons easier, I created a new rubygem called geom2d. There were some Ruby libraries available that define basic 2D geometric objects and some algorithms, but none really fit my needs. The goal of geom2d was to provide a polygon class and an algorithm for intersecting two polygons. To this end I read several papers on boolean operations on polygons and then implemented one of them.
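To give a feeling for what intersecting two polygons involves, here is a self-contained sketch. It does not use geom2d and it is not the algorithm implemented there: it substitutes the much simpler Sutherland-Hodgman clipping, which only works when the clipping polygon is convex. Polygons are assumed to be arrays of [x, y] points in counter-clockwise order, and all function names are made up for this example.

```ruby
# Sutherland-Hodgman polygon clipping (a simpler stand-in, not geom2d's algorithm).
# Assumes counter-clockwise point order and a convex clipping polygon.

# True if +point+ lies on the left of (or on) the directed edge a -> b.
def inside?(point, a, b)
  (b[0] - a[0]) * (point[1] - a[1]) - (b[1] - a[1]) * (point[0] - a[0]) >= 0
end

# Intersection of the line through p1/p2 with the line through a/b.
# Only called when p1 and p2 lie on different sides of a -> b, so the
# lines are never parallel here.
def line_intersection(p1, p2, a, b)
  dx1 = p2[0] - p1[0]
  dy1 = p2[1] - p1[1]
  dx2 = b[0] - a[0]
  dy2 = b[1] - a[1]
  denom = (dx1 * dy2 - dy1 * dx2).to_f
  t = ((a[0] - p1[0]) * dy2 - (a[1] - p1[1]) * dx2) / denom
  [p1[0] + t * dx1, p1[1] + t * dy1]
end

# Returns the part of +subject+ that lies inside the convex polygon +clip+.
def clip_polygon(subject, clip)
  output = subject
  clip.each_with_index do |a, i|
    b = clip[(i + 1) % clip.size]
    input = output
    break if input.empty?

    output = []
    input.each_with_index do |cur, j|
      prev = input[j - 1]
      if inside?(cur, a, b)
        output << line_intersection(prev, cur, a, b) unless inside?(prev, a, b)
        output << cur
      elsif inside?(prev, a, b)
        output << line_intersection(prev, cur, a, b)
      end
    end
  end
  output
end

# Shoelace formula, handy for checking results.
def polygon_area(points)
  sum = points.each_with_index.sum do |(x1, y1), i|
    x2, y2 = points[(i + 1) % points.size]
    x1 * y2 - x2 * y1
  end
  (sum / 2.0).abs
end

square_a = [[0, 0], [2, 0], [2, 2], [0, 2]]
square_b = [[1, 1], [3, 1], [3, 3], [1, 3]]
overlap = clip_polygon(square_a, square_b) # the unit square from (1, 1) to (2, 2)
```

A general boolean operation also has to cope with concave polygons, holes and multiple result polygons, which is why a dedicated algorithm from the literature is needed for frames.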

The HexaPDF::Layout::TextLayouter class has been enhanced as well to allow placing text in an arbitrary polygon. This is needed because otherwise placing text in a frame and flowing it according to the frame’s polygon wouldn’t be possible.

So, now that I have bored you with the technical details, have a look at the new examples to see how this can be put to good use.

Mind you that doing things like this is still not really “high level” since some essential things are missing, like splitting a box if it doesn’t fit (think of text at the bottom of a page), not to mention high-level constructs like tables.

The next step is to provide a class that abstracts the composition aspects so that one can say: Here are N styled boxes, here are the definitions of the frames that should be used (e.g. a special frame definition for the first page and a common definition for all other pages), lay out the boxes while creating pages as needed.
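What such a composition step might do can be sketched in a few lines. This is only a guess at the general shape, not a preview of the planned class: boxes are reduced to their heights, a frame definition is reduced to the usable height of a page, and the name `compose` and its parameters are hypothetical.

```ruby
# Toy composition loop (hypothetical, not HexaPDF's API).
# Distributes boxes, given only by their heights, over pages. The first page
# uses its own frame height, all following pages share a common one.
def compose(box_heights, first_page_height:, other_page_height:)
  pages = [[]]
  remaining = first_page_height

  box_heights.each do |height|
    if height > remaining
      # The box does not fit any more: create a new page as needed.
      pages << []
      remaining = other_page_height
    end
    # Splitting boxes is not handled in this sketch.
    raise ArgumentError, "box of height #{height} does not fit on an empty page" if height > remaining

    pages.last << height
    remaining -= height
  end
  pages
end

compose([300, 300, 300, 300], first_page_height: 500, other_page_height: 700)
# One box on the first page, two on the second, one on the third.
```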

As always, have a look at the changelog for an overview of all changes.

HexaPDF 0.6.0 - Code Refinements


The last release brought some cool new features with respect to advanced text layout. This release builds on that and refines the implementation for future features.


The supported text and box styling properties have been greatly expanded (also see the expanded styling example):

  • Boxes now support CSS-like padding, margins and borders as well as background colors. And since the inline boxes are now based on a proper box implementation, they can use these new styles, too!

  • Text fragments now support text color and opacity, text rendering modes (e.g. only showing the text outlines), superscript, subscript, underlining and strikeout.
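As with CSS, padding and borders reduce the area that is left for the actual content of a box. The following sketch shows the arithmetic. It assumes that the given width and height include padding and border, and the names `Edges` and `content_size` are made up for this example rather than taken from HexaPDF.

```ruby
# Toy calculation only (hypothetical names, not HexaPDF's API).
# Edge values are given CSS-style: top, right, bottom, left.
Edges = Struct.new(:top, :right, :bottom, :left)

# Returns [content_width, content_height] for a box whose outer size
# includes padding and border widths.
def content_size(width, height, padding:, border:)
  [
    width - padding.left - padding.right - border.left - border.right,
    height - padding.top - padding.bottom - border.top - border.bottom
  ]
end

content_size(200, 100, padding: Edges.new(10, 10, 10, 10), border: Edges.new(1, 1, 1, 1))
```

Margins are not part of this calculation because they describe the distance to neighbouring boxes, not the inside of the box.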

Additionally, boxes as well as text fragments now support pre-defined and custom underlays/overlays. This is the basis for the implemented support for links to other pages, external files and URIs.

Apart from these layout-related changes, support for some types of PDF actions and annotations has been added, e.g. the action for opening a URI and the link annotation type.

As always, have a look at the changelog for an overview of all changes. And if you have a request, drop me an e-mail or open an issue!

Last but not least, I have not yet decided which parts I will be working on in the coming months. So if you need some functionality sooner rather than later, head over to my Patreon page to fill out the poll!
And while you are already there, you might wanna become a patron! 😉

HexaPDF 0.5.0 - Advanced Text Layout


HexaPDF continues to grow and mature, with this release bringing advanced text layout as a first step toward providing full document layout features.


Advanced text layout means that HexaPDF is now able to:

  • Apply kerning and ligatures to text, with the possibility to easily add other positioning or substitution steps (e.g. for correctly positioning diacritical marks)

  • Apply different styles to different parts of the text of a paragraph (example)

  • Wrap lines while supporting special characters like non-breaking spaces, soft-hyphens and zero-width spaces (example)

  • Use arbitrarily shaped boxes for text layout (example)

  • Align text horizontally and vertically, e.g. left, center, right and justify; and top, center, bottom (example)

  • Mix text and inline boxes, e.g. for showing images or arbitrary drawings together with text (example)

  • Calculate the height of a text box without drawing it, or limit the height and retrieve the overflowing items
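To illustrate just one of these points, the role of non-breaking spaces in line wrapping, here is a deliberately tiny sketch. It is not how HexaPDF wraps lines: widths are counted in characters instead of glyph widths, soft-hyphens and zero-width spaces are ignored, and the name `wrap` is made up.

```ruby
# Toy greedy line wrapping (not HexaPDF's algorithm).
# Lines may only be broken at ordinary spaces (U+0020). A non-breaking
# space (U+00A0) is treated like any other character, so the words around
# it always stay on the same line.
def wrap(text, max_chars)
  lines = []
  current = ""

  text.split(/ +/).each do |word|
    candidate = current.empty? ? word : "#{current} #{word}"
    if candidate.length <= max_chars || current.empty?
      current = candidate
    else
      lines << current
      current = word
    end
  end

  lines << current unless current.empty?
  lines
end

# "HexaPDF" and "0.5.0" are joined by a non-breaking space.
wrap("HexaPDF\u00A00.5.0 adds advanced text layout", 14)
```

A word that is longer than the maximum line length is put on a line of its own in this sketch, since there is no hyphenation step.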

In essence HexaPDF::Layout::TextBox together with the other classes in HexaPDF::Layout is similar to Prawn’s formatted text box implementation. However, HexaPDF still lacks some text box features like text colors, links or underlining. This will be fixed with a future release.

To see how HexaPDF’s implementation compares to Prawn’s in terms of performance I adapted the text rendering benchmark to use the text box implementations. The results (see the text box benchmark for details and caveats) are rather promising, with HexaPDF being about 10 times faster than Prawn!

HexaPDF’s text box implementation can already be used to compose whole documents but it is still only another stepping stone on the way to full document layout features. There are major parts missing for this, like automatic page breaking, tables, column layout and a composition class to make using all these parts easier.

As always, have a look at the changelog for an overview of all changes. And if you have a request, drop me an e-mail or open an issue!