module HexaPDF::Layout::TextLayouter::SimpleTextSegmentation
Implementation of a simple text segmentation algorithm.
The algorithm breaks TextFragment objects into objects wrapped by Box, Glue or Penalty items, and inserts additional Penalty items when needed:
- Any valid Unicode newline separator inserts a Penalty object describing a mandatory break.
- Spaces and tabulators are wrapped by Glue objects, allowing breaks.
- Non-breaking spaces are wrapped into Penalty objects that prohibit line breaking.
- Hyphens are attached to the preceding text fragment (or are a standalone text fragment) and followed by a Penalty object to allow a break.
- If a soft hyphen is encountered, a hyphen wrapped by a Penalty object is inserted to allow a break.
- If a zero-width space is encountered, a Penalty object is inserted to allow a break.
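The rules above can be illustrated with a minimal sketch that works on plain Ruby strings instead of TextFragment objects. It is not HexaPDF's implementation: the Item struct, the SEGMENT_BREAK_RE constant and the segment method are hypothetical names introduced only for this example, and the set of newline separators (LF, VT, FF, CR, CR+LF, NEL, LS, PS) is an assumption about what "valid Unicode newline separator" covers.

```ruby
# Hypothetical stand-in for the Box, Glue and Penalty items; the real classes carry more state.
Item = Struct.new(:type, :text, :note)

# Assumed break characters. CR+LF comes first so it is consumed as a single newline.
SEGMENT_BREAK_RE = /\r\n|[ \t\u00A0\u00AD\u200B\-\n\r\u000B\u000C\u0085\u2028\u2029]/

# Splits +str+ into a flat list of :box, :glue and :penalty items.
def segment(str)
  items = []
  pos = 0
  while pos < str.length
    idx = str.index(SEGMENT_BREAK_RE, pos)
    if idx.nil?
      # No further break character: the rest is one box.
      items << Item.new(:box, str[pos..-1], nil)
      break
    end
    match = Regexp.last_match(0)

    if match == "-"
      # The hyphen stays attached to the preceding text (or stands alone),
      # followed by a penalty that allows a break.
      items << Item.new(:box, str[pos..idx], nil)
      items << Item.new(:penalty, nil, :allowed)
    else
      items << Item.new(:box, str[pos...idx], nil) if idx > pos
      items << case match
               when " ", "\t" then Item.new(:glue, match, nil)
               when "\u00A0"  then Item.new(:penalty, match, :prohibited)
               when "\u00AD"  then Item.new(:penalty, "-", :allowed)
               when "\u200B"  then Item.new(:penalty, nil, :allowed)
               else                Item.new(:penalty, nil, :mandatory) # newline
               end
    end
    pos = idx + match.length
  end
  items
end
```

Calling segment("a b-c\n") in this sketch yields a box for "a", glue for the space, a box for "b-" followed by a break-allowing penalty, a box for "c", and a final mandatory-break penalty.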
Constants
- BREAK_CHARS
Breaks are detected at: space, tab, zero-width space, non-breaking space, hyphen, soft hyphen and any valid Unicode newline separator.
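To make the character set concrete, here is a small lookup-table sketch that names each break character by code point and reports where such characters occur in a string. The BREAK_CHAR_NAMES and BREAK_CHAR_RE constants and the break_points method are hypothetical; the actual definition of BREAK_CHARS is not shown in this documentation and may differ, and the list of newline separators is an assumption.

```ruby
# Hypothetical table of break characters; "\r\n" is listed first so that
# a CR+LF pair is reported once rather than as two separate newlines.
BREAK_CHAR_NAMES = {
  "\r\n"   => :newline,            # CR+LF
  " "      => :space,
  "\t"     => :tab,
  "\u200B" => :zero_width_space,
  "\u00A0" => :non_breaking_space,
  "-"      => :hyphen,
  "\u00AD" => :soft_hyphen,
  "\n"     => :newline,            # LF
  "\r"     => :newline,            # CR
  "\u000B" => :newline,            # VT
  "\u000C" => :newline,            # FF
  "\u0085" => :newline,            # NEL
  "\u2028" => :newline,            # line separator
  "\u2029" => :newline             # paragraph separator
}.freeze

# Regexp.union keeps the key order, so "\r\n" is tried before "\r" and "\n".
BREAK_CHAR_RE = Regexp.union(BREAK_CHAR_NAMES.keys)

# Returns [character index, kind] pairs for every break character in +str+.
def break_points(str)
  points = []
  str.scan(BREAK_CHAR_RE) do
    m = Regexp.last_match
    points << [m.begin(0), BREAK_CHAR_NAMES[m[0]]]
  end
  points
end
```

For example, break_points("a b") reports a single :space at index 1.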