Show Character Bounding Boxes
This example shows how to process the contents of a page. It finds all characters on a page and surrounds each one with its bounding box. Each consecutive text run is also surrounded by a box.
The code provides two ways of generating the boxes. The commented part of ShowTextProcessor#show_text uses a polyline since some characters may be transformed (rotated or skewed). The un-commented part uses rectangles, which is faster and correct for most but not all cases.
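To illustrate why the rectangle approach is only correct for untransformed text, here is a minimal sketch in plain Ruby that does not use HexaPDF at all. The `GlyphBox` struct and the `rotated_box`, `polygon_area`, and `rectangle_area` helpers are hypothetical names introduced for this illustration; only the corner accessor names mirror the ones used in `show_text`. It compares the area enclosed by the four corner points (what the polyline would trace) with the area of the rectangle built from the lower-left and upper-right corners alone.

```ruby
# Hypothetical stand-in for a character box: four corner points, named like
# the accessors used in show_text. This is not a HexaPDF class.
GlyphBox = Struct.new(:lower_left, :lower_right, :upper_right, :upper_left) do
  def points
    [lower_left, lower_right, upper_right, upper_left]
  end
end

# Builds a width x height box whose lower-left corner is at the origin and
# rotates it counter-clockwise by the given angle (illustrative helper).
def rotated_box(width, height, degrees)
  rad = degrees * Math::PI / 180
  cos = Math.cos(rad)
  sin = Math.sin(rad)
  rotate = ->(x, y) { [(x * cos - y * sin).round(6), (x * sin + y * cos).round(6)] }
  GlyphBox.new(rotate.(0, 0), rotate.(width, 0),
               rotate.(width, height), rotate.(0, height))
end

# Area enclosed by the corner points (shoelace formula), i.e. the shape a
# closed polyline through all four corners would outline.
def polygon_area(points)
  sum = points.each_with_index.sum do |(x1, y1), i|
    x2, y2 = points[(i + 1) % points.size]
    x1 * y2 - x2 * y1
  end
  sum.abs / 2.0
end

# Area of the axis-aligned rectangle derived from only two corners, as the
# un-commented drawing code does.
def rectangle_area(box)
  x, y = box.lower_left
  tx, ty = box.upper_right
  ((tx - x) * (ty - y)).abs
end

upright = rotated_box(10, 20, 0)
tilted = rotated_box(10, 20, 30)

puts "upright: polygon=#{polygon_area(upright.points)} rectangle=#{rectangle_area(upright)}"
puts "tilted:  polygon=#{polygon_area(tilted.points).round(2)} rectangle=#{rectangle_area(tilted).round(2)}"
```

For the upright box both areas agree. For the box rotated by 30 degrees the polygon keeps the area of the original box, while the two-corner rectangle covers a much smaller and differently placed region, which is the case the polyline variant handles.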
- Usage:
ruby show_char_boxes.rb INPUT.PDF
Code
require 'hexapdf'

class ShowTextProcessor < HexaPDF::Content::Processor

  def initialize(page)
    super()
    @canvas = page.canvas(type: :overlay)
  end

  def show_text(str)
    boxes = decode_text_with_positioning(str)
    return if boxes.string.empty?

    @canvas.line_width = 1
    @canvas.stroke_color(224, 0, 0)
    # Polyline for transformed characters
    #boxes.each {|box| @canvas.polyline(*box.points).close_subpath.stroke}
    # Using rectangles is faster but not 100% correct
    boxes.each do |box|
      x, y = *box.lower_left
      tx, ty = *box.upper_right
      @canvas.rectangle(x, y, tx - x, ty - y).stroke
    end

    @canvas.line_width = 0.5
    @canvas.stroke_color(0, 224, 0)
    @canvas.polyline(*boxes.lower_left, *boxes.lower_right,
                     *boxes.upper_right, *boxes.upper_left).close_subpath.stroke
  end
  alias :show_text_with_positioning :show_text

end

doc = HexaPDF::Document.open(ARGV.shift)
doc.pages.each_with_index do |page, index|
  puts "Processing page #{index + 1}"
  processor = ShowTextProcessor.new(page)
  page.process_contents(processor)
end
doc.write('show_char_boxes.pdf', optimize: true)