module HexaPDF::Layout::TextLayouter::SimpleTextSegmentation

Implementation of a simple text segmentation algorithm.

The algorithm breaks TextFragment objects into pieces wrapped by Box, Glue or Penalty items and inserts additional Penalty items where needed:

  • Any valid Unicode newline separator inserts a Penalty object describing a mandatory break.

    See www.unicode.org/reports/tr18/#Line_Boundaries

  • Spaces and tabulators are wrapped by Glue objects, allowing breaks.

  • Non-breaking spaces are wrapped by Penalty objects that prohibit line breaking.

  • Hyphens are attached to the preceding text fragment (or form a standalone text fragment) and are followed by a Penalty object that allows a break.

  • If a soft hyphen is encountered, a hyphen wrapped by a Penalty object is inserted to allow a break.

  • If a zero-width space is encountered, a Penalty object is inserted to allow a break.
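The rules in the list above can be illustrated with a minimal Ruby sketch. It is not HexaPDF's implementation: Box, Glue and Penalty here are hypothetical Struct stand-ins, the input is assumed to be already split into word tokens and single break characters, and the penalty values MANDATORY, PROHIBITED and ALLOWED are assumptions following the usual Knuth-Plass convention that a negative-infinite penalty forces a break and a positive-infinite one forbids it.

```ruby
# Hypothetical stand-ins for the real Box, Glue and Penalty item classes.
Box     = Struct.new(:text)
Glue    = Struct.new(:text)
Penalty = Struct.new(:value, :text) # text is what gets rendered, if anything

MANDATORY  = -Float::INFINITY # assumed value: forces a line break
PROHIBITED = Float::INFINITY  # assumed value: forbids a line break
ALLOWED    = 0                # assumed value: merely allows a line break

NEWLINE_RE = /\A\R\z/ # a token consisting of exactly one Unicode newline

# Maps pre-split tokens to items according to the segmentation rules.
def to_items(tokens)
  tokens.each_with_object([]) do |tok, items|
    case tok
    when NEWLINE_RE
      items << Penalty.new(MANDATORY, nil)
    when " ", "\t"
      items << Glue.new(tok)
    when "\u00A0"
      # Non-breaking space: rendered as a space, but no break allowed here.
      items << Penalty.new(PROHIBITED, tok)
    when "-"
      # Attach the hyphen to the preceding box, or make it standalone.
      if items.last.is_a?(Box)
        items.last.text += "-"
      else
        items << Box.new("-")
      end
      items << Penalty.new(ALLOWED, nil)
    when "\u00AD"
      # Soft hyphen: a hyphen that only appears if the line breaks here.
      items << Penalty.new(ALLOWED, "-")
    when "\u200B"
      # Zero-width space: an invisible break opportunity.
      items << Penalty.new(ALLOWED, nil)
    else
      items << Box.new(tok)
    end
  end
end
```

For example, the tokens ["bar", "-", "baz"] yield a Box for "bar-", an allowing Penalty and a Box for "baz", so the hyphen stays on the first line if a break is taken there.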

Constants

BREAK_RE

Breaks are detected at spaces, tabs, zero-width spaces, non-breaking spaces, hyphens, soft hyphens and any valid Unicode newline separator.
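A regular expression with the same coverage can be sketched in Ruby as follows. This is an assumed approximation, not the actual BREAK_RE constant: it relies on Ruby's \R escape, which matches \n, \v, \f, \r, \r\n, U+0085, U+2028 and U+2029, i.e. the newline set from the Unicode report linked above. Wrapping the pattern in a capture group makes String#split keep each break character as its own token; the names BREAK_SKETCH_RE and tokenize are hypothetical.

```ruby
# Hypothetical approximation of BREAK_RE (not copied from HexaPDF).
# \R: any Unicode newline sequence, with \r\n matched as a single unit.
# Character class: space, tab, NBSP (U+00A0), soft hyphen (U+00AD),
# zero-width space (U+200B) and hyphen-minus.
BREAK_SKETCH_RE = /(\R|[ \t\u00A0\u00AD\u200B\-])/

# Splits text into word tokens and single break tokens. The capture group
# keeps the separators; empty strings between adjacent separators are dropped.
def tokenize(text)
  text.split(BREAK_SKETCH_RE).reject(&:empty?)
end
```

With this sketch, tokenize("foo bar-baz") returns ["foo", " ", "bar", "-", "baz"], and a Windows-style "\r\n" comes back as one token rather than two.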

Public Class Methods

call(items)

Breaks the items (an array of InlineBox and TextFragment objects) into atomic pieces wrapped by Box, Glue or Penalty items, and returns those as an array.
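The array-in, array-out shape of call(items) can be sketched as follows, under several assumptions: SimpleSegmentationSketch, InlineBox, Box, Glue and Penalty are hypothetical stand-ins rather than HexaPDF classes, plain Ruby strings take the place of TextFragment objects, MANDATORY_BREAK is an assumed penalty value, and only spaces and newlines are treated as breaks to keep the example short. Each inline box becomes a single Box item, while text is split into words, Glue and mandatory Penalty items.

```ruby
# Hypothetical stand-ins, not HexaPDF classes.
InlineBox = Struct.new(:width) # stands in for an InlineBox
Box       = Struct.new(:content)
Glue      = Struct.new(:content)
Penalty   = Struct.new(:value, :content)

# Assumed penalty value that forces a line break (Knuth-Plass convention).
MANDATORY_BREAK = -Float::INFINITY

module SimpleSegmentationSketch
  # Returns one flat array of Box, Glue and Penalty items.
  # Only spaces and Unicode newlines are handled as breaks in this sketch.
  def self.call(items)
    items.flat_map do |item|
      case item
      when InlineBox
        [Box.new(item)] # an inline box is a single atomic piece
      when String
        item.split(/( |\R)/).reject(&:empty?).map do |token|
          if token == " "
            Glue.new(token)
          elsif token.match?(/\A\R\z/)
            Penalty.new(MANDATORY_BREAK, nil)
          else
            Box.new(token)
          end
        end
      else
        raise ArgumentError, "unsupported item: #{item.inspect}"
      end
    end
  end
end
```

Calling SimpleSegmentationSketch.call(["Hello world", InlineBox.new(20), "a\nb"]) produces seven items: Box("Hello"), Glue(" "), Box("world"), the Box wrapping the inline box, Box("a"), a mandatory Penalty and Box("b"). Using flat_map keeps the result a single flat array even though each input item may expand into several pieces.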