HexaPDF 0.10.0 and New Website Published on

This minor release brings some enhancements to the hexapdf command line application: more information when listing images with the hexapdf images command and a completely revamped hexapdf inspect command with much more functionality. If you want to see the inspect command in action, have a look at the “Analysing PDFs” how-to guide.

There are also bug fixes for various reported problems, most notably for loading and saving of encrypted and signed PDFs and for handling unknown/wrongly structured PDF objects better.

As always, have a look at the changelog for an overview of all changes.

In addition to these changes to the library, you might have noticed already that HexaPDF also got a new website:

  • HexaPDF now finally has a logo that was inspired by Adobe’s PDF logo as well as hexagons (naturally) - yeah!

  • The documentation part of the website has been restructured and extended to make it easier to find what you need. There are now tutorials, key topics and how-to guides as well as a reference section. These sections will be filled with more content in the future.

  • The styling of the website has also been adjusted. For example, there are now table of contents menus for most pages and a breadcrumb trail for navigation.

All in all the website should provide a better experience now. If you find problems or have suggestions, please report them - thank you!

HexaPDF 0.9.0 - Document Layout Published on

With the groundwork for document layout management done in the 0.8.0 release, the focus shifted to refining these features and to the actual document layout functionality.

The major changes for this release are the document composition functionality, incremental PDF writing, a CLI command for splitting PDF files and compatibility with Ruby 2.6.

Document Composition

One of the goals of HexaPDF is to provide high-level functionality for document composition. With this release the initial working version of this goal has been achieved with the new HexaPDF::Composer class.

This class uses the functionality introduced in the last release (e.g. frames, text boxes, …) to make creating a PDF document as easy as it is with other libraries, for example Prawn. Here is a simple example:

require 'hexapdf'

HexaPDF::Composer.create('hello-world.pdf') do |pdf|
  pdf.text("Hello World!", font_size: 50, align: :center, valign: :center)
end

Text (or more generally every box) is laid out from top to bottom and can be flowed around objects that have been placed on the page before. This makes it easy, for example, to flow text around images (see the new composer example). Also, if some box doesn’t fit on a page or can’t be split, a new page is automatically created.

Arbitrary drawing operations can still be performed by using the HexaPDF::Content::Canvas object that is provided by the composer.

Incremental Writing

Starting with this version HexaPDF supports incremental writing. Writing a PDF document in incremental mode means that the new or changed content is just appended to the original PDF. This is used, for example, if the original document was cryptographically signed so as to not invalidate the signature.
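
To illustrate why appending keeps a signature intact, here is a minimal sketch in plain Ruby. It does not use HexaPDF, and the byte strings are made-up stand-ins for real PDF data; the only point is that an incremental update leaves every byte of the original revision untouched.

```ruby
require 'digest'

# Hypothetical stand-ins for real PDF data; only the byte-level idea matters.
original = "%PDF-1.7\n1 0 obj\n(original content)\nendobj\n%%EOF\n".b
revision = "1 0 obj\n(changed content)\nendobj\n%%EOF\n".b

# An incremental update appends the new revision instead of rewriting the file.
updated = original + revision

# A digest computed over the original bytes still matches the same byte
# range of the updated file, so a signature over that range stays valid.
signed_digest = Digest::SHA256.hexdigest(original)
prefix_digest = Digest::SHA256.hexdigest(updated.byteslice(0, original.bytesize))

puts signed_digest == prefix_digest # prints "true"
```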

Incremental writing in HexaPDF is not perfect in the sense that it doesn’t completely minimize the amount of data that gets written. The reason for this is HexaPDF’s automatic conversion of hash values. For example, PDF dates (which are stored as strings) are automatically converted to Ruby date objects on access, making the comparison fail even though there are no differences when serializing.
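
The effect can be reproduced without HexaPDF. The following sketch uses two hypothetical helper methods (they are not part of HexaPDF's API) and assumes a simplified PDF date string without a time zone part, interpreted as UTC. The stored string and the converted object compare as different even though serializing the object yields the very same string.

```ruby
# Hypothetical helpers, not HexaPDF API. Assumes the simplified form
# "D:YYYYMMDDHHmmSS" and treats the value as UTC.
def parse_pdf_date(str)
  match = /\AD:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})\z/.match(str)
  Time.utc(*match.captures.map(&:to_i))
end

def serialize_pdf_date(time)
  time.utc.strftime('D:%Y%m%d%H%M%S')
end

stored    = 'D:20190101120000'      # value as read from the file
converted = parse_pdf_date(stored)  # value after automatic conversion

puts stored == converted                      # prints "false" (String vs. Time)
puts stored == serialize_pdf_date(converted)  # prints "true"
```

A naive "has this value changed?" check on the in-memory objects therefore reports a change, and the object gets written again although its serialized form is identical.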

Splitting PDF Files

The hexapdf command line application already has a command for merging files but the reverse was missing. So this version brings the split command that can do exactly that.

As an example, consider the following: hexapdf split input.pdf out_%02d.pdf. This would split the input.pdf file page by page and generate files of the form out_01.pdf, out_02.pdf and so on.
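
The %02d part is a printf-style placeholder for the page number. Assuming the pattern behaves like Ruby's own format method (which matches the file names given above), the generated names can be previewed like this:

```ruby
pattern = 'out_%02d.pdf'

# One output name per page; a three-page input is assumed for illustration.
names = (1..3).map { |page_number| format(pattern, page_number) }

puts names
# out_01.pdf
# out_02.pdf
# out_03.pdf
```

With this pattern, numbers with more than two digits are not truncated; they simply use more digits.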

Other Changes

There were some other changes and bug fixes; the most noteworthy are:

  • Usage of some undocumented stdlib behaviour was fixed to make HexaPDF compatible with Ruby 2.6.

  • Text boxes now respect width/height/padding/border when fitting.

  • Variable width line wrapping now correctly considers line spacing when determining line width.

As always, have a look at the changelog for an overview of all changes.

And Happy New Year!

HexaPDF 0.8.0 - Box Layout Published on

The last release, 0.7.0, was done to fix some issues and didn’t include any of the major changes since 0.6.0. With the 0.8.0 release these major changes are now incorporated into HexaPDF and lay the groundwork for document layout.

So what is new for HexaPDF?

The 0.6.0 release enhanced the base box layout class with many more styling properties but it still wasn’t possible to easily position such boxes. So naturally the next step was to design how they would be laid out on a page.

In HexaPDF, layout boxes represent all things that should be put on a page, be it text (like headings and paragraphs), images or other content. Typically (at least with western languages), the boxes are laid out from top to bottom and left to right. So the easiest thing would be to define a “cursor” position that represents the vertical position to place the current box and after placing it, move the cursor downwards.
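
Such a cursor model fits in a few lines of plain Ruby. The Box struct, the box heights and the page height below are made up for illustration and are not HexaPDF classes or values; the y-coordinate grows upwards, as in PDF.

```ruby
# Hypothetical box type: only a name and a height are needed here.
Box = Struct.new(:name, :height)

page_top = 800
cursor = page_top
boxes = [Box.new('heading', 40), Box.new('paragraph', 120), Box.new('image', 200)]

# Place each box at the cursor, then move the cursor down by the box height.
placements = boxes.map do |box|
  top = cursor
  cursor -= box.height
  [box.name, top, cursor]
end

placements.each { |name, top, bottom| puts "#{name}: #{top} down to #{bottom}" }
# heading: 800 down to 760
# paragraph: 760 down to 640
# image: 640 down to 440
```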

However, with such a model you cannot easily do things like having a picture and flowing text around it. Or putting a box into the middle of the page and then flowing the boxes around this hole. Or having non-rectangular regions to put boxes in.

Therefore I decided to do things differently:

  • To lay out boxes, HexaPDF uses frames, where a frame is a set of rectilinear polygons (i.e. polygons consisting only of vertical and horizontal lines).

  • When a box gets placed in a frame, the occupied region is subtracted from the frame’s polygons, resulting in new polygons for the frame.

  • Each frame knows how to find the position where the next box should be placed (the top-most, left-most point) and how big the available rectangular region at that position is.

  • If a box doesn’t fit, the frame also knows how to calculate the next possible position and region.
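
The points above can be approximated with a much simpler model. The following sketch is not how HexaPDF or geom2d work internally: it assumes the free space is tracked as a list of plain rectangles instead of real polygons, and the class and method names (Rect, SimpleFrame, place) are invented for this example.

```ruby
# Axis-aligned rectangle; y grows upwards as in PDF.
Rect = Struct.new(:left, :bottom, :right, :top) do
  def width
    right - left
  end

  def height
    top - bottom
  end

  def overlaps?(other)
    left < other.right && other.left < right &&
      bottom < other.top && other.bottom < top
  end

  # Returns the parts of this rectangle that are not covered by +other+.
  def subtract(other)
    return [self] unless overlaps?(other)
    mid_bottom = [bottom, other.bottom].max
    mid_top = [top, other.top].min
    parts = []
    parts << Rect.new(left, other.top, right, top) if other.top < top
    parts << Rect.new(left, bottom, right, other.bottom) if other.bottom > bottom
    parts << Rect.new(left, mid_bottom, other.left, mid_top) if other.left > left
    parts << Rect.new(other.right, mid_bottom, right, mid_top) if other.right < right
    parts
  end
end

class SimpleFrame
  attr_reader :regions

  def initialize(rect)
    @regions = [rect]
  end

  # Places a box in the top-most, left-most region that is big enough and
  # removes the occupied area from the free regions. Returns nil if the
  # box fits nowhere.
  def place(width, height)
    candidates = @regions.sort_by { |r| [-r.top, r.left] }
    region = candidates.find { |r| width <= r.width && height <= r.height }
    return nil unless region
    occupied = Rect.new(region.left, region.top - height,
                        region.left + width, region.top)
    @regions = @regions.flat_map { |r| r.subtract(occupied) }
    occupied
  end
end

frame = SimpleFrame.new(Rect.new(0, 0, 100, 100))
image = frame.place(40, 30) # occupies the top-left corner
text  = frame.place(60, 30) # lands to the right of the image
wide  = frame.place(80, 20) # too wide for any gap at the top, goes below
```

Splitting into rectangles loses information about how the free areas connect, which is one reason why a real polygon representation is the more capable choice.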

Doing things in this way seems very complicated at first but it certainly makes some things very easy:

  • A frame can have any form and can contain holes. This is the reason a frame can potentially consist of multiple polygons.

  • Boxes can be styled using the new position and position_hint style properties to remove the whole vertical stripe in which they are, imitating the basic cursor model.

  • And these style properties can also make a box “float” to the left or right, like blocks in HTML, removing only their occupied region.

I also hope that this design will make future additions straightforward (like, for example, multi column layout).

To make working with the frame’s polygons easier, I created a new rubygem called geom2d. There were some Ruby libraries available that define basic 2D geometric objects and some algorithms but none really fit the needs. The goal of geom2d was to provide a polygon class and an algorithm for intersecting two polygons. To this end I read several papers on boolean operations on polygons and then implemented one of them.

The HexaPDF::Layout::TextLayouter class has been enhanced as well to allow placing text in an arbitrary polygon. This is needed because otherwise placing text in a frame and flowing it according to the frame’s polygon wouldn’t be possible.
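
The core idea, lines that each get their own available width, can be sketched with a greedy word wrapper. This is not the TextLayouter algorithm; the wrap_words method is invented for this example and, as a simplifying assumption, widths are counted in characters instead of glyph widths.

```ruby
# Greedy line wrapping where +width_for_line+ returns the available width
# (in characters) for the line with the given index.
def wrap_words(text, width_for_line)
  lines = []
  current = ''
  text.split.each do |word|
    candidate = current.empty? ? word : "#{current} #{word}"
    if current.empty? || candidate.length <= width_for_line.call(lines.size)
      current = candidate # a word on an empty line is always accepted
    else
      lines << current
      current = word
    end
  end
  lines << current unless current.empty?
  lines
end

# The first two lines are narrow (e.g. next to an image), later ones are wide.
widths = ->(line_index) { line_index < 2 ? 10 : 30 }
lines = wrap_words('Text flows around the image and then uses the full width', widths)

puts lines
# Text flows
# around the
# image and then uses the full
# width
```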

So, now that I have bored you with the technical details, have a look at the new examples to see how this can be put to good use.

Mind you that doing things like this is still not really “high level” since there are some essential things missing, like splitting a box if it doesn’t fit (think of text at the bottom of a page), not to mention high level constructs like tables.

The next step is to provide a class that abstracts the composition aspects so that one can say: Here are N styled boxes, here are the definitions of the frames that should be used (e.g. a special frame definition for the first page and a common definition for all other pages), lay out the boxes while creating pages as needed.
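
What such an abstraction might do can be sketched in plain Ruby. Everything here is hypothetical and heavily simplified: a frame definition is reduced to a usable height, a box to a name and a height, and boxes are never split.

```ruby
# Hypothetical box type: only a name and a height.
Box = Struct.new(:name, :height)

# Hypothetical frame definitions: a shorter frame on the first page (e.g.
# because of a letterhead) and a common one for all other pages.
def frame_height_for(page_index)
  page_index.zero? ? 500 : 700
end

# Distributes the boxes over pages, creating new pages as needed.
# Returns an array of pages, each an array of box names.
def compose(boxes)
  pages = [[]]
  remaining = frame_height_for(0)
  boxes.each do |box|
    # Start a new page if the box doesn't fit; a box on an empty page is
    # placed anyway because this sketch cannot split boxes.
    if box.height > remaining && !pages.last.empty?
      pages << []
      remaining = frame_height_for(pages.size - 1)
    end
    pages.last << box.name
    remaining -= box.height
  end
  pages
end

boxes = [Box.new('title', 100), Box.new('intro', 300), Box.new('figure', 250),
         Box.new('text', 400), Box.new('table', 200)]
pages = compose(boxes)

pages.each_with_index { |names, index| puts "Page #{index + 1}: #{names.join(', ')}" }
# Page 1: title, intro
# Page 2: figure, text
# Page 3: table
```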

As always, have a look at the changelog for an overview of all changes.