hexapdf - A Versatile PDF Manipulation Application
SYNOPSIS
hexapdf
[OPTIONS
] command
[COMMAND OPTIONS
]…
DESCRIPTION
hexapdf is an application for PDF manipulation. It is part of the hexapdf library which also allows PDF creation, among other things.
Using the hexapdf application the following tasks can be performed with PDF files:
- Modifying an existing PDF file (see the
modify
command) - Merging multiple PDF files into one (see the
merge
command) - Splitting a PDF file into subsets (see the
split
command) - Optimizing the file size of a PDF file (see the
optimize
command) - Watermarking/Stamping a PDF onto another one (see the
watermark
command) - Filling out an interactive PDF form (see the
form
command) - Extracting embedded files (see the
files
command) - Extracting images (see the
images
command) - Converting images to PDF (see the
image2pdf
command) - Showing general information of a PDF file (see the
info
command) - Listing all fonts of a PDF file (see the
fonts
command) - Show space usage information of a PDF file (see the
usage
command) - Batch execution of a command on multiple PDF files (see the
batch
command) - Inspecting the internal structure of a PDF file (see the
inspect
command)
The application contains a built-in help
command that can be used to provide a quick reminder of a
command’s purpose and its options.
OPTIONS
The following options can only be used when no command is specified:
-v
,--version
-
Show the version of the hexapdf application and exit.
These options are available on every command (except if they are overridden):
--[no-]force
-
Force overwriting existing files. Default: false.
--strict
-
Enable strict parsing and validation. By default, correctable parse error and validation problems are treated as warnings which allows processing most PDF files, even many corrupt ones and ones not strictly following the PDF specifcation. If this option is used, correctable parse errors and uncorrectable validation problems are treated as errors.
Note that a PDF file may have validation errors and still be usable since most viewing applications are very forgiving.
--verbose
,-v
-
Enable more verbose output. There are three verbosity levels: 0 (no output), 1 (warning output) and 2 (warning and informational output). The default level is 1, specifying this option increases it to 2.
--quiet
-
Suppress any output by setting the verbosity level to 0. Also see the description of
--verbose
above. -h
,--help
-
Show the help for the application if no command was specified, or the command help otherwise.
Optimization Options
Theses options can only be used with the merge
, modify
and optimize
commands and control
optimization aspects when writing an output PDF file. Note that the defaults maybe different
depending on the command.
--[no-]compact
-
Delete unnecessary PDF objects. This includes merging the base revision and all incremental updates into a single revision. Default: yes.
--object-streams
MODE-
Defines how object streams should be treated: generate will remove all exisiting object streams and generate new ones, delete will only remove existing object streams and preserve will do nothing. Default: preserve.
--xref-streams
MODE-
Defines how cross-reference streams should be treated: generate will add them, delete will remove them and preserve will do nothing. Default: preserve.
--streams
MODE-
Defines how streams should be treated: compress will compress them when possible, uncompress will uncompress them when possible and preserve will do nothing to them. Default: preserve.
--[no-]compress-pages
-
Recompress page content streams. This is a very expensive operation in terms of processing time and won’t lead to great file size improvements in many cases. Default: no.
--[no-]prune-page-resources
-
Removes unused objects from the page resources dictionaries. This is a very expensive operation in terms of processing time but can yield drastic size reductions in certain cases (e.g. for PDFs that contain pages from other PDFs). Default: no.
--[no-]optimize-fonts
-
Optimize embedded font files by removing normally unneeded font data. Note that this may have a negative effect on PDFs with forms since form entry usually requires fully embedded font files. Default: no.
Encryption Options
These options can only be used with the merge
and modify
commands and control if and how an
output PDF file should be encrypted. All options except --decrypt
automatically enable
--encrypt
.
Note that if a password is needed to open the input file and if encryption parameters are changed, the provided password is not automatically used for the output file!
--decrypt
-
Remove any encryption.
If neither
--decrypt
nor--encrypt
are specified, the existing encryption configuration is preserved. --encrypt
-
Encrypt the OUTPUT.
If neither
--decrypt
nor--encrypt
are specified, the existing encryption configuration is preserved. --owner-password
PASSWORD-
The owner password to be set on the output file. This password is needed when operations not allowed by the permissions need to be done. It can also be used when opening the PDF file.
If an owner password is set but no user password, the output file can be opened without a password but the operations are restricted as if a user password were set.
Use - for PASSWORD for reading it from standard input.
--user-password
PASSWORD-
The user password to be set on the output file. This password is needed when opening the PDF file. The application should restrict the operations to those allowed by the permissions.
Use - for PASSWORD for reading it from standard input.
--algorithm
ALGORITHM-
The encryption algorithm to use on the output file. Allowed algorithms are aes and arc4 but arc4 should only be used if it is absolutely necessary for compatibility reasons. Default: aes.
--key-length
BITS-
The length of the encryption key in bits. The allowed values differ based on the chosen algorithm: A number divisible by eight between 40 to 128 for arc4 and 128 or 256 for aes. Default: 128.
Note: Using 256bit AES encryption can lead to problems viewing the PDF in many applications on various platforms!
--force-V4
-
Force the use of PDF encryption version 4 if key length is 128 and algorithm is arc4. This option is probably only useful for testing the implementation of PDF libraries’ encryption handling.
--permissions
PERMS-
A comma separated list of permissions to be set on the output file:
- allow printing
- modify_content
- allow modification of the content of pages
- copy_content
- allow text extraction and similar operations
- modify_annotation
- allow creation and modification of annotations and filling in of forms
- fill_in_forms
- allow filling in of forms even if modify_annotation is not set
- extract_content
- allow text and graphics extraction in accessibility cases
- assemble_document
- allow page modifications and bookmark creation
- high_quality_print
- allow high quality printing
COMMANDS
hexapdf uses a command-style interface. This means that it provides different functionalities depending on the used command, and each command can have its own options.
There is no need to write the full command name for hexapdf to understand it, the only requirement
is that is must be unambiguous. So using b
for the batch
command is sufficient. The same is
true for long option names and option values.
Any command that reads and writes a PDF file may do in-place processing of the file. This is
automatically done if an input file name is the same as the output file name. Note that the option
--force
has to be used in this case.
batch
Synopsis: batch
COMMAND FILES…
This command allows executing a single command for multiple input files, thereby reducing the overall execution time.
The first argument COMMAND is used as a hexapdf command line and must not contain the binary name, just everything else. The rest of the arguments are the input files. The specified command will be executed for each input file, with all occurences of {} being replaced by the file name.
files
Synopsis: files
[OPTIONS
] PDF [OUTPUT]
This command can list embedded files of the PDF, extract them or attach new files and save the
result to OUTPUT. If neither the --attach
nor the --extract
option is specified, the indices
and names of the embedded files are listed.
-a
FILE,--attach
FILE-
Attaches the given FILE to the PDF. If this option is specified, the OUTPUT must be provided. It is possible to use this option multiple times to attach multiple files in one go.
-d
DESCRIPTION,--description
DESCRIPTION-
Adds a description to the last attached file via
--attach
. This description is usually shown together with the filename of the attached file. -e
[A,B,C,…],--extract
[A,B,C,…]-
The indices of the embedded files that should be extracted. The value 0 can be used to extract all embedded files.
-s
,--[no-]search
-
Search the whole PDF file instead of the standard locations, that is files attached to the document as a whole or to an individual page. Defaults to false.
-p
PASSWORD,--password
PASSWORD-
The password to decrypt the PDF. Use - for PASSWORD for reading it from standard input.
fonts
Synopsis: fonts
[OPTIONS
] PDF
This command list fonts of the PDF file. If the --pages
option is not specified, all fonts in
the whole file are listed. Otherwise all fonts occuring on the specified pages are listed (fonts may
be listed multiple times, i.e. for each page).
-i
PAGES,--pages
PAGES-
The pages from the PDF for which the fonts should be listed. See the PAGES SPECIFICATION below for details on the allowed format of PAGES.
-p
PASSWORD,--password
PASSWORD-
The password to decrypt the PDF. Use - for PASSWORD for reading it from standard input.
The following information is shown for each font:
- page
- The page number on which the font appears.
- name
- The name of the font as found in the PDF.
- type
- The type of the font. Can be ‘Type 1, ‘Type 1C’ (Type 1 font in Compact Font Format [CFF]), ‘Type 3’, ‘Truetype’, ‘CID CFF’ or ‘CID TrueType’.
- encoding
- The font’s encoding.
- emb
- ‘yes’ if the font is embedded
- sub
- ‘yes’ if the font is subset
- size
- The size of the embedded font file. Only valid if the font file is actually embedded.
- oid
- The PDF internal object identifier consisting of the object and generation numbers.
form
Synopsis: form
[OPTIONS
] INPUT [OUTPUT]
This command allows working with interactive forms. If the OUTPUT file is not specified, all form fields are listed in page order. Note that a form field appears only once in the output, namely for the first widget annotation.
By default the field name followed by a help text in parentheses (if available) is shown, followed
on the next line by the current value. Using the global --verbose
option will show all widget
annotations instead of just the first one from a field as well as additional information like field
type, possible field values, and size plus location on the page.
If OUTPUT is provided, the fields can be filled out interactively, via a template file or the fields can just be flattened. Form field flattening can also be activated in addition to filling out the form.
When filling out the form interactively (the default), the command prompts for the values of the form fields and stores the updated PDF file in OUTPUT. The values for the form fields are asked in the same order as when listing the fields. If no input for a field is given, the field’s value is not changed from its current value.
By using the --template
option, the data for the fields is read from the given template file
instead of the standard input. See the --template
option for details.
If the --flatten
is specified but neither --fill
nor --template
, the form is just flattened.
Otherwise the form is filled out and flattened in addtion.
There exist two different types of PDF forms: The standard interactive forms (AcroForm) and the more advanced but proprietary and in PDF 2.0 deprecated XFA forms. HexaPDF only supports the standard AcroForm forms. It is possible to work with XFA forms to a certain degree but since the advanced features are not supported, the result may not be correct.
--fill
-
Fill out the form fields interactively. This is also the default if neither
--fill
nor--template
nor--flatten
is specified. -t TEMPLATE_FILE
,--template TEMPLATE_FILE
-
Use the given template file for filling out the values of the PDF form. This can be used to fill out a form without any further interaction.
The TEMPLATE_FILE has to be a text file following a simple format:
-
Field names have to start at the first column and have to be followed by a colon. If a field name contains a colon, prefix it with a backslash.
-
Everything after the colon until a line with a non-whitespace character in the first column is considered the field’s value. Leading and trailing whitespace as well as whitespace at the beginning of lines is stripped from the value.
Here is an example for a template file:
page1.field1: A simple value page1.field3: Another value spanning more than on line. Another form field: Value for this form field.
-
--flatten
-
Flattens the form fields by making them part of the content of the page. This option can be used standalone or in addition to
--fill
or--template
. --[no-]fill-read-only-fields
-
Specifies whether read only fields can be filled in. Defaults to false.
--[no-]viewer-override
-
Specifies whether the PDF viewer should override the generated visual appearance. Note that not all viewers respect this setting. Defaults to using the setting from input PDF.
--[no-]incremental-save
-
Specifies whether an incremental save should be done instead of a full save. When using incremental save, the INPUT is written as is to OUTPUT and only the changes are appended. Defaults to true.
-p
PASSWORD,--password
PASSWORD-
The password to decrypt the INPUT. Use - for PASSWORD for reading it from standard input.
help
Synopsis: help
[COMMAND…]
This command prints the application help if no arguments are given. If one or more command names are given as arguments, these arguments are interpreted as a list of commands with sub-commands and the help for the innermost command is shown.
images
Synopsis: images
[OPTIONS
] PDF
This command extracts images from the PDF. If the --extract
option is not specified, the images
are listed with their indices and additional information, sorted by page number. Note that if an
image is used multiple times on a page, only the first occurence of it will be included.
The --extract
option can then be used to extract one or more images, saving them to files called
PREFIX-N.EXT
where the prefix can be set via --prefix
, N is the image index and EXT is
either png, jpg or jpx.
-e
[A,B,C,…],--extract
[A,B,C,…]-
The indices of the images that should be extracted. Use 0 or no value to extract all images.
--prefix
PREFIX-
The prefix to use when saving images. May include directories. Defaults to image.
-s
,--[no-]search
-
Search the whole PDF file instead of the standard locations, that is, images referenced by pages. Defaults to false.
-p
PASSWORD,--password
PASSWORD-
The password to decrypt the PDF. Use - for PASSWORD for reading it from standard input.
The following information is shown for each image when listing images:
- index
- The image index needed when this image should be extracted.
- page
- The page number on which this image appears.
- oid
- The PDF internal object identifier consisting of the object and generation numbers.
- width
- The width of the image in pixels.
- height
- The height of the image in pixels.
- color
- The color space used for the image. Either gray, rgb, cmyk or other.
- comp
- The number of color components.
- bpc
- The number of bits per color component.
- x-ppi
- The pixels per inch (PPI) of the x-direction of the image, as found on the page.
- y-ppi
- The pixels per inch (PPI) of the y-direction of the image, as found on the page.
- size
- The file size of the image as stored in the PDF.
- type
- The image type. Either jpg (JPEG), jp2 (JPEG2000), ccitt (CCITT Group 3 or 4 Fax), jbig2 (JBIG2) or png (PNG).
- writable
- Either true or false depending on whether hexapdf supports the image format.
image2pdf
Synopsis: image2pdf
[OPTIONS
] [IMAGES…] OUTPUT
This command converts one or more images into a single PDF file, one image per page. The various options allow setting a page size, scaling the images and defining margins. Images are always centered on the pages.
Supported image formats are JPEG, PNG and PDF. Images in PNG format may take longer to process due to the way they are stored inside a PDF.
-p
PAGE_SIZE,--page-size
PAGE_SIZE-
The PDF page size. The default value of auto chooses the page size based on the image dimensions. Either auto which chooses a size based on the image size or a valid page size like A4, A4-landscape or 595x842. The -landscape suffix can be added to any predefined page size.
Common page sizes are A4, A5, A3, Letter and Legal.
--[no-]auto-rotate
-
If enabled (the default) pages are automatically rotated so that the pages and images always have the same orientation. I.e. landscape-oriented images go on landscape page, portrait-oriented images on portrait pages.
Note that pages won’t be rotated if scaling is used and the image would fit into the requested page size.
-s
SCALE,--scale
SCALE-
Defines how the images should be scaled. The default value of fit scales the images so that they optimally fit the pages. Otherwise SCALE is interpreted as the minimum number of pixels per inch (PPI) that the images should have.
-m
MARGINS,--margins
MARGINS-
Defines the margins around the images. The argument MARGINS can either be a single number specifying the margin on all four sides, or four numbers separated by commas (like
10,20,30,40
) specifying the top, right, bottom and left margins. Default: 0.
Additionally, the Optimization Options and Encryption Options can be used.
info
Synopsis: info
[OPTIONS
] FILE
This command reads the FILE and shows general information about it, like author information, PDF version used, encryption information and so on.
-c
,--check
-
Checks the PDF FILE for parse and validation errors and prints them out. If the process doesn’t abort, HexaPDF is still able to handle the file by correcting the errors. This means that the other commands can use the FILE as input although it is damaged.
-p
PASSWORD,--password
PASSWORD-
The password to decrypt the PDF FILE. Use - for PASSWORD for reading it from standard input.
inspect
Synopsis: inspect
[OPTIONS
] FILE [[CMD [ARGS]]…]
This command is useful when one needs to inspect the internal object structure or a stream of a PDF file.
If no arguments are given, the interactive mode is started. This interactive mode allows you to execute inspection commands without re-parsing the PDF file, leading to better performance for big PDF files.
Otherwise the arguments are interpreted as interactive mode commands and executed. It is possible to specify more than one command in this way by separating them with semicolons, or whitespace in case the number of command arguments is fixed.
-p
PASSWORD,--password
PASSWORD-
The password to decrypt the PDF FILE. Use - for PASSWORD for reading it from standard input.
If an interactive mode command or argument is OID[,GEN]
, object and generation numbers are
expected. The generation number defaults to 0 if not given. PDF objects are always shown in the
native PDF syntax.
The available commands are:
OID[,GEN] | o[bject] OID[,GEN]
-
Print the given indirect object.
r[ecursive] OID[,GEN]
-
Print the given indirect object recursively. This means that all references found in the object are resolved and the resulting objects themselves recursively printed.
To make it easier to compare such structures between PDF files, the entries of dictionaries are printed in sorted order and the original references are replaced by custom ones. Once an indirect object is first encountered, it is preceeded by either
{obj INDEX}
or{obj page PAGEINDEX}
whereINDEX
is an increasing number andPAGEINDEX
is the index of the page. Later references are replaced by{ref INDEX}
and{ref page PAGEINDEX}
respectively.Here is a simplified example output:
<< /Info {obj 1} << /Producer (HexaPDF version 0.9.3) >> /Root {obj 2} << /Pages {obj 3} << /Count 1 /Kids [{obj page 1} << /MediaBox [0 0 595 842 ] /Parent {ref 3} /Type /Page >> ] /Type /Pages >> /Type /Catalog >> /Size 4 >>
On line 2 the indirect object for the key
/Info
is shown, preceeded by the custom reference. On line 8 is an example for a page object with the special reference key. And on line 10 there is a back reference to the object with index 3 which is started on line 6. s[tream] OID[,GEN]
-
Print the filtered stream, i.e. the stream with all filters applied. This is useful, for example, to view the contents of content streams.
raw[-stream] OID[,GEN]
-
Print the raw stream, i.e. the stream as it appears in the file. This is useful, for example, to extract streams into files.
rev[ision] [NUMBER]
-
If no argument is given, prints information about all revisions of the document. The information includes the number of objects in the revision, whether it was signed and the byte range. A PDF document contains at least one revision but may contain more if it was updated incrementally.
If NUMBER is specified, the specified revision is output. This is useful, for example, to extract a signed revision to view it in the state as it has been signed.
x[ref] OID[,GEN]
-
Print the cross-reference entry for the given indirect object.
c[atalog]
-
Print the catalog dictionary.
t[railer]
-
Print the trailer dictionary.
p[ages] [RANGE]
-
Print the pages with their object and generation numbers and their associated content streams. If a range is specified, only those pages are listed. See the PAGES SPECIFICATION below for details on the allowed format of RANGE.
po PAGE
-
Print the dictionary object for the given page. See the PAGES SPECIFICATION below for details on the allowed format of PAGE. Note that only the first page is printed, even if a page range is specified.
ps PAGE
-
Print the whole content stream for the given page. If the content stream consists of mulitple stream objects, all will be printed. See the PAGES SPECIFICATION below for details on the allowed format of PAGE. Note that only the content stream of the first page is printed, even if a page range is specified.
psd PAGE
-
Print the content stream for the given page in decoded form, i.e. using more descriptive operator names as well as decoding the text parts. Otherwise it works the same as
ps
. pc | page-count
-
Print the number of pages.
search REGEXP
-
Print all objects matching the pattern. Each object is preceeded by
obj OID GEN
and followed byendobj
to make it easier to further explore the data. h[elp]
-
Print the available commands with a short description.
q[uit]Quit
-
Quit the interactive mode.
merge
Synopsis: merge
[OPTIONS
] { INPUT | --empty
} [INPUT]… OUTPUT
This command merges pages from multiple PDFs into one output file which can optionally be encrypted/decrypted and optimized in various ways.
The first input file is the primary file from which meta data like file information, outlines, etc.
are taken from. Alternatively, it is possible to start with an empty PDF file by using --empty
.
The order of the input files is important as the pages are added in that order. Note that the
--password
and --pages
options always apply to the last preceeding input file.
An input file can be specified multiple times, using a different --pages
option each time. The
--password
option, if needed, only needs to be used the first time.
-p
PASSWORD,--password
PASSWORD-
The password to decrypt the last input file. Use - for PASSWORD for reading it from standard input.
-i
PAGES,--pages
PAGES-
The pages (optionally rotated) from the last input file that should be included in the OUTPUT. See the PAGES SPECIFICATION below for details on the allowed format of PAGES. Default: 1-e (i.e. all pages with no additional rotation applied).
-e
,--empty
-
Use an empty file as primary file. This will lead to an output file that just contains the included pages of the input file and no other data from the input files.
--interleave
-
Interleave the pages from the input files: Takes the first specified page from the first input file, then the first specified page from the second input file, and so on. After that the same with the second, third, … specified pages. If fewer pages were specified for an input file, the input file is just skipped for the rest of the rounds.
Additionally, the Optimization Options and Encryption Options can be used.
modify
Synopsis: modify
[OPTIONS
] INPUT OUTPUT
This command modifies a PDF file. It can be used to select pages that should appear in the output file and/or rotate them. The output file can also be encrypted/decrypted and optimized in various ways.
-p
PASSWORD,--password
PASSWORD-
The password to decrypt the INPUT. Use - for PASSWORD for reading it from standard input.
-i
PAGES,--pages
PAGES-
The pages (optionally rotated) from the INPUT that should be included in the OUTPUT. See the PAGES SPECIFICATION below for details on the allowed format of PAGES. Default: 1-e (i.e. all pages with no additional rotation applied).
-e
FILE,--embed
FILE-
Embed the given file into the OUTPUT using built-in features of PDF. This option can be used multiple times to embed more than one file.
--annotations
MODE-
Handle the annotations of the included pages by either removing them (remove) or flattening them (flatten). Either way there are no annotations left afterwards.
Additionally, the Optimization Options and Encryption Options can be used.
optimize
Synopsis: optimize
[OPTIONS
] INPUT OUTPUT
This command uses several optimization strategies to reduce the file size of the PDF file.
By default, all strategies except page compression are used since page compression may take a very long time without much benefit.
-p
PASSWORD,--password
PASSWORD-
The password to decrypt the INPUT. Use - for PASSWORD for reading it from standard input.
The Optimization Options can be used with this command. Note that the defaults are changed to provide good compression out of the box.
split
Synopsis: split
[OPTIONS
] INPUT [OUTPUT_SPEC]
This command splits the input file into multiple output files, using different strategies:
-
The default strategy is to split the input file into output files with each containing one page. So splitting is done by page number.
-
The other available strategy is to split by page size where pages with the same page size get put into the same output file.
The OUTPUT_SPEC argument defines the naming scheme for the output files. If it is not provided, the default value of INPUT_WITHOUT_EXT_%04d.pdf is used where INPUT_WITHOUT_EXT is the INPUT without the file extension. A printf-style format string like the default ‘%04d’ can (should) be included so that different output files are created.
How the printf-style format string is interpreted depends on the strategy:
-
When splitting into individual pages (i.e. per page number), the format string is replaced by the formatted page number. So with the default OUTPUT_SPEC files of the form INPUT_0001.pdf, INPUT_0002.pdf, … and so on are created.
-
When splitting by page size, the format string itself is ignored and is replaced with the name of the page size, e.g. A4 or Letter. If the name of the page size can’t be determined, the name WIDTHxHEIGHT is used.
-s
STRATEGY,--strategy
STRATEGY-
Defines how the PDF file should be split: page_number (the default) splits into individual pages and page_size splits by page size.
-p
PASSWORD,--password
PASSWORD-
The password to decrypt the INPUT. Use - for PASSWORD for reading it from standard input.
Additionally, the Optimization Options and Encryption Options can be used. Those options are applied to each output file.
usage
Synopsis: usage
[OPTIONS
] FILE
This command reads the FILE and shows space usage statistics, i.e. which parts of the PDF take how much space in the file.
Each statistic line shows the space used followed by the number of indirect objects in parentheses. If some of those objects are in object streams, that number is displayed after a slash. Here is an example:
Fonts 218.6K (63/42)
This line shows that fonts take up 218.6K of space inside the file and that there are 63 indirect objects having to do with fonts. Furthermore, of those 63 indirect objects 42 are stored more compactly in object streams
Objects in object streams do only count towards the size of the object streams category in the file but not towards a more specific category like fonts.
-p
PASSWORD,--password
PASSWORD-
The password to decrypt the PDF FILE. Use - for PASSWORD for reading it from standard input.
Notes:
- Space usage and object count is only approximate and represents the lower bound for each category.
- PDF comments, cross-reference tables and other such syntax is not represented in the statistic. This means that the shown total space usage is always lower than the file size.
watermark
Synopsis: watermark
[OPTIONS
] INPUT OUTPUT
This command uses one ore more pages from a PDF file and applies them as background or stamp
(depending on the --type
option) on another PDF file. If multiple pages are selected from the
watermark PDF, the --repeat
option can be used to specify how they should be applied.
-w
WATERMARK,--watermark-file
WATERMARK-
The PDF file that should be used for watermarking.
-i
PAGES,--pages
PAGES-
The pages from the WATERMARK PDF that should be used. The first WATERMARK page is applied to the first INPUT page, the second WATERMARK page to the second INPUT page and so on. If there are fewer WATERMARK pages than INPUT pages, the
--repeat
option comes into play.See the PAGES SPECIFICATION below for details on the allowed format of PAGES. Default: 1.
-r
REPEAT_MODE,--repeat
REPEAT_MODE-
Specifies how the WATERMARK pages should be repeated: last (the default) will only repeat the last WATERMARK page whereas all will cyclically repeat all WATERMARK pages.
-t
WATERMARK_TYPE,--type
WATERMARK_TYPE-
Specifies how the WATERMARK pages are applied to the INPUT pages: background (the default) applies them below the page contents and stamp applies them above the page contents.
-p
PASSWORD,--password
PASSWORD-
The password to decrypt the INPUT. Use - for PASSWORD for reading it from standard input.
Additionally, the Optimization Options and Encryption Options can be used.
version
This command shows the version of the hexapdf application. It is an alternative to using the global
--version
option.
PAGES SPECIFICATION
Some commands allow the specification of pages using a PAGES argument. This argument is expected to be a comma separated list of single page numbers or page ranges of the form START-END. The character ‘e’ represents the last page and can be used instead of a single number or in a range. If a number is preceded by an ‘r’, the pages are counted from the end (i.e. r1 would be the last page). The pages are used in the order in which the are specified.
If the start number of a page range is higher than the end number, the pages are used in the reverse order.
Single page numbers that are not valid are ignored. If a page number in a page range is higher than the page number of the last page, the page number of the last page is used instead.
Step values can be used with page ranges. If a range is followed by /STEP, STEP - 1 pages are skipped after each used page.
Additionally, the page numbers and ranges can be suffixed with a rotation modifier:
- l
- Rotate the page left, that is 90 degrees counterclockwise
- r
- Rotate the page right, that is 90 degrees clockwise
- d
- Rotate the page 180 degrees
- n
- Remove any set page rotation
Note that this additional functionality may not be used by all commands (it is used, for example, by
the modify
command).
Examples:
- 1,2,3: The pages 1, 2 and 3.
- 11,4-9,1,e,r3: The pages 11, 4 to 9, 1, the last page and the third last page, in exactly this order.
- 1-e: All pages of the document.
- 1-r1: Same as above.
- 1-r4: All pages of the document except the last three.
- e-1: All pages of the document in reverse order.
- 1-5/2: The pages 1, 3 and 5.
- 10-1/3: The pages 10, 7, 4 and 1.
- 1l,2r,3-5d,6n: The pages 1 (rotated left), 2 (rotated right), 3 to 5 (all rotated 180 degrees) and 6 (any possibly set rotation removed).
EXAMPLES
merge
hexapdf merge input1.pdf input2.pdf input3.pdf output.pdf
hexapdf merge -e input1.pdf input2.pdf input3.pdf output.pdf
Merging: In the first case use input1.pdf
as primary input file and merge the pages from
input2.pdf
and input3.pdf
into it. In the second case an empty PDF file is used for merging the
pages from the three given input files into it; the resulting output file will not have an meta data
or other additional data from the first input file.
hexapdf merge odd.pdf even.pdf --interleave combined.pdf
Page interleaving: Takes alternately a page from odd.pdf
and even.pdf
to create the output file.
This is very useful if you only have a simplex scanner: First you scan the front sides, creating
odd.pdf
, and then you scan the back sides, creating even.pdf
. With the command the pages can be
ordered in the correct way.
modify
hexapdf modify input.pdf -i 1,7-10 output.pdf
Page selection: Select only the pages 1 and 7 to 10 from the input.pdf
.
hexapdf modify input.pdf -i 1-5,7-10,12-e output.pdf
Page removal: Remove the pages 6 and 11 from the input.pdf
.
hexapdf modify input.pdf -i 1r,2-ed output.pdf
Page rotation: Rotate the first page to the right, that is 90 degrees clockwise, and all other pages 180 degrees.
hexapdf modify input.pdf --user-password my_pwd --permissions print output.pdf
Encryption: Create the output.pdf
from the input.pdf
so that a password is needed to open it,
and only allow printing.
hexapdf modify input.pdf -p input_password --decrypt output.pdf
Encryption removal: Create the output.pdf
as copy of input.pdf
but with the encryption removed.
If the --decrypt
was not used, the output file would retain the encryption specification of the
input file.
optimize
hexapdf optimize input.pdf output.pdf
Optimization: Compress the input.pdf
to get a smaller file size.
split
hexapdf split input.pdf out_%02d.pdf
Split the input.pdf
into individual pages, naming the output files out_01.pdf
, out_02.pdf
, and
so on.
hexapdf split input.pdf --strategy page_size
Split the input.pdf
into files based on their page size, with output file names like
input_A4.pdf
or input_Letter.pdf
.
watermark
hexapdf watermark -w watermark.pdf -t stamp input.pdf output.pdf
Applies the first page of the watermark.pdf
as stamp on input.pdf
.
hexapdf watermark -w watermark.pdf -i 2-5 -r all input.pdf output.pdf
Cyclically applies the pages 2 to 5 of the watermark.pdf
as background on input.pdf
.
form
hexapdf form input_form.pdf -v
List all form fields of the input_form.pdf
with additional information.
hexapdf form input_form.pdf output.pdf
Interactively fill out the input_form.pdf
PDF form and save the result in output.pdf
.
hexapdf form --flatten --fill input_form.pdf output.pdf
Interactively fill out the input_form.pdf
PDF form, flatten it and save the result in output.pdf
.
files
hexapdf files input.pdf
hexapdf files input.pdf -e 1
Embedded files: The first command lists the embedded files in the input.pdf
, the second one then
extracts the embedded file with the index 1.
hexapdf files -a invoice.xml -d "Invoice data" -a custom.xml input.pdf output.pdf
Attaches the files invoice.xml
and custom.xml
to the input.pdf
and stores the result in
output.pdf
. The first file invoice.xml
is stored together with a short description.
images
hexapdf images input.pdf
hexapdf images input.pdf -e --prefix images/image
Image info and extraction: The first command lists the images of the input.pdf
, the second one
then extracts the images into the subdirectory images
with the prefix image
.
image2pdf
hexapdf image2pdf image1.jpg image2.pdf image3.png output.pdf
Create a PDF file output.pdf
containing three pages with one image per page and the image fitted
to the page.
info
hexapdf info input.pdf
File information: Show general information about the PDF file, like PDF version, number of pages, creator, creation date and encryption related information.
inspect
hexapdf inspect input.pdf -o 3
Show the object with the object number 3 of the given PDF file.
hexapdf inspect input.pdf
Start the interactive inspection mode.
batch
hexapdf batch 'info {}' input1.pdf input2.pdf input3.pdf
Execute the info command for all input files.
hexapdf batch 'optimize --object-streams delete {} done-{}' input1.pdf input2.pdf input3.pdf
Optimize the given input files, creating the three output files done-input1.pdf
, done-input2.pdf
and done-input3.pdf
.
EXIT STATUS
The exit status is 0 if no error happened. Otherwise it is 1.
SEE ALSO
The hexapdf website for more information.
AUTHOR
hexapdf was written by Thomas Leitner t_leitner@gmx.at.
This manual page was written by Thomas Leitner t_leitner@gmx.at.