module HexaPDF::Layout::TextLayouter::SimpleTextSegmentation
Implementation of a simple text segmentation algorithm.
The algorithm breaks TextFragment objects into objects wrapped by Box, Glue or Penalty items, and inserts additional Penalty items when needed:
- Any valid Unicode newline separator inserts a Penalty object describing a mandatory break.
- Spaces and tabulators are wrapped by Glue objects, allowing breaks.
- Non-breaking spaces are wrapped into Penalty objects that prohibit line breaking.
- Hyphens are attached to the preceding text fragment (or are a standalone text fragment) and followed by a Penalty object to allow a break.
- If a soft hyphen is encountered, a hyphen wrapped by a Penalty object is inserted to allow a break.
- If a zero-width space is encountered, a Penalty object is inserted to allow a break.
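The rules above can be illustrated with a minimal sketch that works on a plain Ruby string instead of TextFragment objects. It is not the HexaPDF implementation: the Item struct, the segment method, the penalty values and the treatment of "\r\n" as a single newline are all assumptions made for illustration.

```ruby
# Hypothetical stand-in for the Box, Glue and Penalty items (not HexaPDF classes).
Item = Struct.new(:type, :text, :penalty)

# Assumed set of Unicode newline separators: LF, VT, FF, CR, NEL, LS, PS.
NEWLINES = ["\n", "\v", "\f", "\r", "\u0085", "\u2028", "\u2029"].freeze

# Splits +text+ into an array of Item objects following the rules listed above.
# Penalty values are illustrative: -Infinity forces a break, +Infinity prohibits one.
def segment(text)
  items = []
  word = +""
  flush = lambda do
    items << Item.new(:box, word.dup, nil) unless word.empty?
    word.clear
  end

  # Assumption: "\r\n" counts as one newline, so it is matched before single characters.
  text.scan(/\r\n|./m) do |ch|
    case ch
    when "\r\n", *NEWLINES   # mandatory break
      flush.call
      items << Item.new(:penalty, nil, -Float::INFINITY)
    when " ", "\t"           # glue, break allowed
      flush.call
      items << Item.new(:glue, ch, nil)
    when "\u00A0"            # non-breaking space, break prohibited
      flush.call
      items << Item.new(:penalty, ch, Float::INFINITY)
    when "-"                 # hyphen stays with the preceding text, break allowed after it
      word << ch
      flush.call
      items << Item.new(:penalty, nil, 50)
    when "\u00AD"            # soft hyphen: a hyphen that only appears if the line breaks here
      flush.call
      items << Item.new(:penalty, "-", 50)
    when "\u200B"            # zero-width space, break allowed
      flush.call
      items << Item.new(:penalty, nil, 0)
    else
      word << ch
    end
  end
  flush.call
  items
end
```

For example, segment("well-known fact") would yield a box for "well-", a penalty, a box for "known", a glue item for the space and a box for "fact". A hyphen that follows a space ends up as a standalone box, mirroring the "standalone text fragment" case.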
Constants
- BREAK_CHARS
Breaks are detected at: space, tab, zero-width space, non-breaking space, hyphen, soft hyphen and any valid Unicode newline separator.
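As a rough illustration of how such a character set might be used for detection, the following sketch collects the indices of break characters in a string. The constant BREAK_CHARS_SKETCH and the method break_positions are hypothetical names; the character list is assembled from the description above and is not copied from the HexaPDF source.

```ruby
# Hypothetical character list: space, tab, zero-width space, non-breaking space,
# hyphen, soft hyphen, followed by the newline separators LF, VT, FF, CR, NEL, LS, PS.
BREAK_CHARS_SKETCH = " \t\u200B\u00A0-\u00AD\n\v\f\r\u0085\u2028\u2029".freeze

# Returns the character indices in +text+ at which a break character occurs.
def break_positions(text)
  text.each_char.with_index
      .select { |ch, _index| BREAK_CHARS_SKETCH.include?(ch) }
      .map { |_ch, index| index }
end
```

A string without any of these characters yields an empty array, in which case the whole fragment could be kept as a single box.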