With the groundwork for document layout management done in the 0.8.0 release, the focus shifted
to refining these features and to the actual document layout functionality.
The major changes for this release are the document composition functionality, incremental PDF
writing, a CLI command for splitting PDF files and compatibility with Ruby 2.6.
One of the goals of HexaPDF is to provide high-level functionality for document composition. With
this release the initial working version of this goal has been achieved with the new
HexaPDF::Composer class.
This class uses the functionality introduced in the last release (e.g. frames, text boxes, …) to
make creating a PDF document as easy as it is with other libraries, for example Prawn. Here is a
small example:
HexaPDF::Composer.create('hello-world.pdf') do |pdf|
  pdf.text("Hello World!", font_size: 50, align: :center, valign: :center)
end
Text (or more generally every box) is laid out from top to bottom and can be flowed around objects
that have been placed on the page before. This makes it easy, for example, to flow text around
images (see the new composer example). Also, if some box doesn’t fit on a page or can’t be
split, a new page is automatically created.
Arbitrary drawing operations can still be performed by using the HexaPDF::Content::Canvas object
that is provided by the composer.
Starting with this version HexaPDF supports incremental writing. Writing a PDF document in
incremental mode means that the new or changed content is just appended to the original PDF. This is
used, for example, if the original document was cryptographically signed so as to not invalidate the
signature.
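The following toy sketch illustrates why appending matters for signed documents. It does not use HexaPDF and the byte strings are made-up stand-ins, not valid PDF files (a real incremental update also carries a cross-reference section and trailer, and a real signature covers specific byte ranges). It only shows that appending leaves the original bytes, and therefore a digest computed over them, untouched, whereas rewriting the file does not.

```ruby
require 'digest'

# Made-up stand-in for the bytes of a signed document (not a valid PDF).
original = "%PDF-1.7\n1 0 obj\n(original)\nendobj\n%%EOF\n".b
signed_digest = Digest::SHA256.hexdigest(original)

# Incremental mode: the changed object is appended after the original bytes.
update = "1 0 obj\n(changed)\nendobj\n%%EOF\n".b
incremental = original + update

# Full rewrite: the changed object replaces the original one in place.
rewritten = original.sub("(original)", "(changed)")

# Digest over the byte range that was originally signed.
incremental_digest = Digest::SHA256.hexdigest(incremental.byteslice(0, original.bytesize))
rewritten_digest = Digest::SHA256.hexdigest(rewritten.byteslice(0, original.bytesize))

puts incremental_digest == signed_digest # the signed bytes are still intact
puts rewritten_digest == signed_digest   # the signed bytes were modified
```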
Incremental writing in HexaPDF is not perfect in the sense that it doesn’t completely minimize the
amount of data that gets written. The reason for this is HexaPDF’s automatic conversion of hash
values. For example, PDF dates (which are stored as strings) are automatically converted to Ruby
date objects on access, making the comparison fail even though there are no differences when
serialized.
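To make the date problem concrete, here is a small stand-alone sketch. The helper methods parse_pdf_date and serialize_pdf_date are hypothetical and are not HexaPDF's implementation. They only handle dates with an explicit time zone offset. The point is that comparing the stored string with the converted object reports a change, while comparing the serialized forms does not.

```ruby
# Hypothetical helpers for illustration only, not part of HexaPDF.
PDF_DATE_RE = /\AD:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})([+-])(\d{2})'(\d{2})'?\z/

# Converts a PDF date string with an explicit offset into a Time object.
def parse_pdf_date(str)
  m = PDF_DATE_RE.match(str) or raise ArgumentError, "unsupported date: #{str}"
  Time.new(m[1].to_i, m[2].to_i, m[3].to_i, m[4].to_i, m[5].to_i, m[6].to_i,
           "#{m[7]}#{m[8]}:#{m[9]}")
end

# Converts a Time object back into the PDF date string form.
def serialize_pdf_date(time)
  time.strftime("D:%Y%m%d%H%M%S%:z").sub(/([+-]\d{2}):(\d{2})\z/, "\\1'\\2'")
end

stored = "D:20181231235959+01'00'"
converted = parse_pdf_date(stored)

# A String never equals a Time, so a naive comparison sees a change.
naive_changed = (stored != converted)
# Comparing the serialized forms shows that nothing actually changed.
serialized_changed = (serialize_pdf_date(converted) != stored)

puts naive_changed
puts serialized_changed
```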
Splitting PDF Files
The hexapdf command line application already has a command for merging files but the reverse was
missing. So this version brings the split command that can do exactly that.
As an example, consider the following command:
hexapdf split input.pdf out_%02d.pdf
This would split the input.pdf file page by page and generate files of the form out_01.pdf,
out_02.pdf and so on.
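The output pattern looks like a printf-style format string. Assuming it is expanded with the page number starting at 1 (an assumption based on the file names above, not on the command's source), the resulting names can be reproduced with plain Ruby:

```ruby
# Assumption: the pattern is filled in printf-style, counting from 1.
pattern = "out_%02d.pdf"
names = (1..3).map { |page_number| format(pattern, page_number) }

puts names
```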
There were some other changes and bug fixes, the most noteworthy are:
Usage of some undocumented stdlib behaviour was fixed to make HexaPDF compatible with Ruby 2.6.
Text boxes now respect width/height/padding/border when fitting.
Variable width line wrapping now correctly considers line spacing when determining line width.
As always, have a look at the changelog for an overview of all changes.
And Happy New Year!
The last release, 0.7.0, was done to fix some issues and didn’t include any of the major changes
since 0.6.0. With the 0.8.0 release these major changes are now incorporated into HexaPDF and lay
the groundwork for document layout.
So what is new for HexaPDF?
The 0.6.0 release enhanced the base box layout class with many more styling properties but it still
wasn’t possible to easily position such boxes. So naturally the next step was to design how they
would be laid out on a page.
In HexaPDF, layout boxes represent all things that should be put on a page, be it text (like
headings and paragraphs), images or other content. Typically (at least with western languages), the
boxes are laid out from top to bottom and left to right. So the easiest thing would be to define a
“cursor” position that represents the vertical position to place the current box and after placing
it, move the cursor downwards.
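A minimal sketch of such a cursor model could look like the following. The method place_with_cursor and its arguments are made up for illustration and do not exist in HexaPDF. Boxes are reduced to their heights and coordinates grow upwards, as in PDF.

```ruby
# Hypothetical helper: places boxes below each other using a single cursor.
def place_with_cursor(box_heights, page_top:)
  cursor = page_top
  box_heights.map do |height|
    top = cursor
    cursor -= height # move the cursor downwards after placing the box
    { top: top, bottom: cursor }
  end
end

placements = place_with_cursor([100, 50], page_top: 800)
placements.each { |placement| puts placement.inspect }
```

Every box implicitly occupies the full page width here, which is exactly why flowing text around an image is hard to express in this model.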
However, with such a model you cannot easily do things like having a picture and flowing text around
it. Or putting a box into the middle of the page and then flowing the boxes around this hole. Or
having non-rectangular regions to put boxes in.
Therefore I decided to do things differently:
To lay out boxes, HexaPDF uses frames, where a frame is a set of
rectilinear polygons (i.e. only consisting of vertical and horizontal lines).
When a box gets placed in a frame, the occupied region is subtracted from the frame’s polygons,
resulting in new polygons for the frame.
Each frame knows how to find the position where the next box should be placed (the top-most,
left-most point) and how big the available rectangular region at that position is.
If a box doesn’t fit, the frame also knows how to calculate the next possible position and region.
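The steps above can be sketched with a heavily simplified toy model. Instead of general rectilinear polygons it stores the free area as a list of rectangles, and it uses a naive rectangle subtraction rather than a proper boolean polygon operation. The names Rect, subtract and ToyFrame are made up for this sketch and are not HexaPDF classes.

```ruby
# Rectangle with PDF-like coordinates (y grows upwards).
Rect = Struct.new(:left, :bottom, :width, :height) do
  def right; left + width; end
  def top; bottom + height; end
end

# Returns the parts of +rect+ that are not covered by +hole+.
def subtract(rect, hole)
  return [rect] if hole.left >= rect.right || hole.right <= rect.left ||
                   hole.bottom >= rect.top || hole.top <= rect.bottom
  pieces = []
  # Full-width pieces above and below the hole.
  pieces << Rect.new(rect.left, hole.top, rect.width, rect.top - hole.top) if hole.top < rect.top
  pieces << Rect.new(rect.left, rect.bottom, rect.width, hole.bottom - rect.bottom) if hole.bottom > rect.bottom
  # Pieces to the left and right of the hole, limited to its vertical extent.
  mid_bottom = [rect.bottom, hole.bottom].max
  mid_top = [rect.top, hole.top].min
  pieces << Rect.new(rect.left, mid_bottom, hole.left - rect.left, mid_top - mid_bottom) if hole.left > rect.left
  pieces << Rect.new(hole.right, mid_bottom, rect.right - hole.right, mid_top - mid_bottom) if hole.right < rect.right
  pieces
end

class ToyFrame
  attr_reader :regions

  def initialize(left, bottom, width, height)
    @regions = [Rect.new(left, bottom, width, height)]
  end

  # Subtracts the occupied region from the free area of the frame.
  def remove(occupied)
    @regions = @regions.flat_map { |region| subtract(region, occupied) }
  end

  # The free region whose top-left corner is top-most, then left-most.
  def next_region
    @regions.min_by { |region| [-region.top, region.left] }
  end
end

frame = ToyFrame.new(0, 0, 100, 100)
frame.remove(Rect.new(0, 80, 30, 20)) # a 30x20 box placed in the top-left corner
region = frame.next_region
puts [region.left, region.top, region.width, region.height].inspect
```

After the box is placed, the next position is right beside it at the top of the frame, with a 70x20 region available. A cursor model would have moved straight below the box instead.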
Doing things in this way seems very complicated at first but it certainly makes some things very
easy:
A frame can have any form and can contain holes. This is the reason a frame can potentially
consist of multiple polygons.
Boxes can be styled using the new position and position_hint style properties to remove the whole
vertical stripe they are in, imitating the basic cursor model.
And these style properties can also make a box “float” to the left or right, like blocks in HTML,
removing only their occupied region.
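The difference between the two behaviours can be sketched as follows. The helper region_to_remove and the symbols :default and :float are assumptions chosen for this illustration and should not be read as the exact HexaPDF API. The sketch only computes which region would be taken away from the frame in each case.

```ruby
Area = Struct.new(:left, :bottom, :width, :height)

# Hypothetical helper: which part of the frame is removed after placing a box?
def region_to_remove(frame_area, box_area, position)
  case position
  when :default
    # Cursor-like behaviour: the full-width stripe the box sits in is removed.
    Area.new(frame_area.left, box_area.bottom, frame_area.width, box_area.height)
  when :float
    # Floating behaviour: only the region occupied by the box is removed.
    box_area
  else
    raise ArgumentError, "unknown position: #{position.inspect}"
  end
end

frame_area = Area.new(0, 0, 100, 100)
box_area = Area.new(0, 80, 30, 20)

stripe = region_to_remove(frame_area, box_area, :default)
floated = region_to_remove(frame_area, box_area, :float)

puts stripe.to_a.inspect
puts floated.to_a.inspect
```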
I also hope that this design will make future additions straightforward (like, for example,
multi-column layouts).
To make working with the frame’s polygons easier, I created a new rubygem called geom2d. There
were some Ruby libraries available that define basic 2D geometric objects and some algorithms but
none really fit the needs. The goal of geom2d was to provide a polygon class and an algorithm for
intersecting two polygons. To this end I read several papers on boolean operations on polygons and
then implemented one of them.
The HexaPDF::Layout::TextLayouter class has been enhanced as well to allow placing text in an
arbitrary polygon. This is needed because otherwise placing text in a frame and flowing it
according to the frame’s polygon wouldn’t be possible.
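As a rough illustration of what placing text in a non-rectangular area involves, here is a greedy line wrapper where the available width depends on the vertical position of the line. This is not how HexaPDF::Layout::TextLayouter is implemented. The method wrap_variable_width is made up, and widths are counted in characters as if the font were monospaced.

```ruby
# Hypothetical helper: greedy wrapping with a width that varies per line.
# The block receives the vertical offset of the current line and returns
# the width (in characters) that is available there.
def wrap_variable_width(words, line_height:, &width_at)
  lines = []
  current = []
  y = 0
  words.each do |word|
    candidate = (current + [word]).join(" ")
    if current.empty? || candidate.length <= width_at.call(y)
      current << word
    else
      lines << current.join(" ")
      y += line_height # the next line may have a different width
      current = [word]
    end
  end
  lines << current.join(" ") unless current.empty?
  lines
end

words = %w[aaaa bbbb cccc dddd eeee ffff]
# The first line is narrow (e.g. next to an image), later lines are wider.
lines = wrap_variable_width(words, line_height: 1) { |y| y < 1 ? 9 : 14 }
puts lines
```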
So, now that I have bored you with the technical details, have a look at the new examples to see how
this can be put to good use.
Mind you that doing things like this is still not really “high level” since there are some essential
things missing, like splitting a box if it doesn’t fit (think of text at the bottom of a page), not
to mention high level constructs like tables.
The next step is to provide a class that abstracts the composition aspects so that one can say: Here
are N styled boxes, here are the definitions of the frames that should be used (e.g. a special
frame definition for the first page and a common definition for all other pages), lay out the boxes
while creating pages as needed.
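The idea can be sketched in a few lines of plain Ruby. The method paginate is hypothetical and reduces everything to heights: each box has a height, the first page offers a frame of one height and all other pages a frame of another height. It is only meant to show the "create pages as needed" logic, not the planned class.

```ruby
# Hypothetical helper: distributes boxes over pages, creating pages as needed.
def paginate(box_heights, first_page_height:, other_page_height:)
  pages = [[]]
  remaining = first_page_height
  box_heights.each do |height|
    # Start a new page if the box doesn't fit. A box that is taller than a
    # whole frame is still placed on an empty page (it simply overflows).
    if height > remaining && !pages.last.empty?
      pages << []
      remaining = other_page_height
    end
    pages.last << height
    remaining -= height
  end
  pages
end

pages = paginate([300, 300, 300, 300], first_page_height: 500, other_page_height: 700)
puts pages.inspect
```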
As always, have a look at the changelog for an overview of all changes.
The last release brought some cool new features with respect to advanced text layout. This release
builds on that and refines the implementation for future features.
The supported text and box styling properties have been greatly expanded (also see the expanded
Boxes now support CSS-like padding, margins and borders as well as background colors. And
since the inline boxes are now based on a proper box implementation, they can use these new
properties, too.
Text fragments now support text color and opacity, text rendering modes (e.g. only showing the
text outlines), superscript, subscript, underlining and strikeout.
Additionally, boxes as well as text fragments now support pre-defined and custom
underlays/overlays. This is the basis for the implemented support for links to other pages,
external files and URIs.
Apart from these layout related changes, support for some types of PDF actions and annotations
has been added, e.g. the action for opening a URI and the link annotation type.
As always, have a look at the changelog for an overview of all changes. And if you have a
request, drop me an e-mail or open an issue!
Last but not least, I have not yet decided on which parts I will be working on in the coming
months. So if you need some functionality sooner rather than later, head over to my Patreon
page to fill out the poll!
And while you are already there, you might wanna become a patron! 😉