module HexaPDF::Layout::TextLayouter::SimpleTextSegmentation
Implementation of a simple text segmentation algorithm.
The algorithm breaks TextFragment objects into objects wrapped by Box, Glue or Penalty items, and inserts additional Penalty items when needed:
- Any valid Unicode newline separator inserts a Penalty object describing a mandatory break.
- Spaces and tabulators are wrapped by Glue objects, allowing breaks.
- Non-breaking spaces are wrapped into Penalty objects that prohibit line breaking.
- Hyphens are attached to the preceding text fragment (or are a standalone text fragment) and followed by a Penalty object to allow a break.
- If a soft hyphen is encountered, a hyphen wrapped by a Penalty object is inserted to allow a break.
- If a zero-width space is encountered, a Penalty object is inserted to allow a break.
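The rules above can be illustrated with a minimal sketch that works on a plain string instead of TextFragment objects. Everything in it is an assumption made for illustration: the `Box`, `Glue` and `Penalty` structs, the penalty constants, the `segment` method and the handling of `"\r\n"` as a single newline are hypothetical stand-ins, not HexaPDF's classes or its actual implementation.

```ruby
# Hypothetical stand-ins for the layout items; NOT HexaPDF's own classes.
Box     = Struct.new(:text)
Glue    = Struct.new(:text)
Penalty = Struct.new(:value, :text)

# Assumed penalty values, following the usual Knuth-Plass convention.
MANDATORY  = -Float::INFINITY # a break must happen here
PROHIBITED = Float::INFINITY  # a break must not happen here
ALLOWED    = 0                # a break may happen here

# Characters at which the text is split. Treating "\r\n" as one newline
# is an assumption of this sketch.
BREAK_RE = /\r\n|[ \t\u00A0\u00AD\u200B\n\v\f\r\u0085\u2028\u2029\-]/

def segment(text)
  items = []
  pos = 0
  while (idx = text.index(BREAK_RE, pos))
    match = Regexp.last_match(0)
    if match == "-"
      # The hyphen stays attached to the preceding text (or stands alone).
      items << Box.new(text[pos..idx])
      items << Penalty.new(ALLOWED, nil)
    else
      # Text before the break character becomes its own box.
      items << Box.new(text[pos...idx]) if idx > pos
      case match
      when " ", "\t" then items << Glue.new(match)                # break allowed
      when "\u00A0"  then items << Penalty.new(PROHIBITED, match) # no break
      when "\u00AD"  then items << Penalty.new(ALLOWED, "-")      # soft hyphen
      when "\u200B"  then items << Penalty.new(ALLOWED, nil)      # zero-width space
      else                items << Penalty.new(MANDATORY, nil)    # newline
      end
    end
    pos = idx + match.length
  end
  items << Box.new(text[pos..-1]) if pos < text.length
  items
end
```

For example, `segment("well-known text")` yields a box for `"well-"`, a penalty that allows a break, a box for `"known"`, a glue item for the space and a box for `"text"`. The real module operates on styled fragments and inline boxes, so it also has to carry over style information, which this sketch ignores.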
Constants
- BREAK_CHARS
Breaks are detected at: space, tab, zero-width space, non-breaking space, hyphen, soft hyphen and any valid Unicode newline separator.
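As a rough illustration of the character set listed above, the following sketch finds the indices in a string where a break character occurs. The pattern and the `break_positions` helper are hypothetical and written for this example only; the actual value of BREAK_CHARS in the library may be defined differently.

```ruby
# Assumed character class covering the break characters listed above:
# space, tab, zero-width space, non-breaking space, soft hyphen, hyphen,
# and the Unicode newline separators (LF, VT, FF, CR, NEL, LS, PS).
BREAK_CHARS_PATTERN = /[ \t\u200B\u00A0\u00AD\n\v\f\r\u0085\u2028\u2029\-]/

# Hypothetical helper: returns the character indices of all break characters.
def break_positions(text)
  positions = []
  text.each_char.with_index do |char, index|
    positions << index if BREAK_CHARS_PATTERN.match?(char)
  end
  positions
end
```

Calling `break_positions("a b-c")` returns the indices of the space and the hyphen; a string without any of the listed characters returns an empty array.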
Public Class Methods
Breaks the items (an array of InlineBox and TextFragment objects) into atomic pieces wrapped by Box, Glue or Penalty items, and returns those as an array.