Convert a PDF into a series of images using C# and GhostScript
An application I was recently working on received PDF files from a web service, which it then needed to store in a database. I wanted the ability to display previews of these documents within the application. While there are a number of solutions for creating PDF files from C#, options for viewing a PDF within your application are much more limited, unless you purchase expensive commercial products or use COM interop to embed Acrobat Reader into your application.
This article describes an alternative solution, in which the pages in a PDF are converted into images using GhostScript, which you can then display in your application.
In order to avoid huge walls of text, this article has been split into two parts: the first deals with the actual conversion of a PDF, and the second demonstrates how to extend the ImageBox control to display the images.
Caveat emptor
Before we start, some quick points.
- The method I'm about to demonstrate converts each page of the PDF into an image. This means that it is very suitable for viewing, but interactive elements such as forms, hyperlinks and even good old text selection are not available.
- GhostScript has a number of licenses associated with it, but I can't find any information on the pricing of commercial licenses.
- The GhostScript API Integration library used by this project isn't complete and I'm not going to go into the bells and whistles of how it works in this pair of articles - once I've completed the outstanding functionality I'll create a new article for it.
Getting Started
You can download the two libraries used in this article from the links below. These are:
- Cyotek.GhostScript - core library providing GhostScript integration support
- Cyotek.GhostScript.PdfConversion - support library for converting a PDF document into images
Please note that the native GhostScript DLL is not included in these downloads; you will need to obtain it from the GhostScript project page.
Using the GhostScriptAPI class
As mentioned above, the core GhostScript library isn't complete yet, so I'll just give a description of the basic functionality required by the conversion library.
The GhostScriptAPI class handles all communication with
GhostScript. When you create an instance of the class, it
automatically calls gsapi_new_instance in the native GhostScript
DLL. When the class is disposed, it automatically releases any
handles and calls the native gsapi_exit and
gsapi_delete_instance methods.
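To make that lifetime concrete, here is a minimal P/Invoke sketch of the same pattern. It is not the GhostScriptAPI class from the library: GhostScriptSession is a hypothetical name, and the DLL name gsdll32.dll is an assumption (64-bit builds of GhostScript ship as gsdll64.dll). Only the four native function names come from the GhostScript API itself.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper for illustration; not the article's GhostScriptAPI class.
public sealed class GhostScriptSession : IDisposable
{
    // Assumption: the 32-bit Windows build of the native library.
    private const string NativeLibrary = "gsdll32.dll";

    [DllImport(NativeLibrary)]
    private static extern int gsapi_new_instance(out IntPtr instance, IntPtr callerHandle);

    [DllImport(NativeLibrary, CharSet = CharSet.Ansi)]
    private static extern int gsapi_init_with_args(IntPtr instance, int argc, string[] argv);

    [DllImport(NativeLibrary)]
    private static extern int gsapi_exit(IntPtr instance);

    [DllImport(NativeLibrary)]
    private static extern void gsapi_delete_instance(IntPtr instance);

    private IntPtr _instance;
    private bool _initialized;

    public GhostScriptSession()
    {
        // Create the native interpreter instance as soon as the wrapper is constructed.
        int result = gsapi_new_instance(out _instance, IntPtr.Zero);
        if (result < 0)
        {
            throw new InvalidOperationException("gsapi_new_instance failed with code " + result);
        }
    }

    // Runs GhostScript with the given arguments. The native API ignores argv[0],
    // so the caller should put a placeholder in the first element.
    public int Execute(string[] args)
    {
        if (_instance == IntPtr.Zero)
        {
            throw new ObjectDisposedException(nameof(GhostScriptSession));
        }
        _initialized = true;
        return gsapi_init_with_args(_instance, args.Length, args);
    }

    public void Dispose()
    {
        if (_instance == IntPtr.Zero)
        {
            return;
        }
        // gsapi_exit pairs with gsapi_init_with_args; the instance is deleted afterwards.
        if (_initialized)
        {
            gsapi_exit(_instance);
        }
        gsapi_delete_instance(_instance);
        _instance = IntPtr.Zero;
    }
}
```

Wrapping the session in a using block gives the same guarantee described above: the native instance is released even if the conversion throws.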
In order to actually call GhostScript, you call the Execute
method, passing in either a string array of all the arguments to
pass to GhostScript, or a typed dictionary of commands and
values. The GhostScriptCommand enum contains most of the
commands supported by GhostScript, which may be preferable to
trying to remember the parameter names themselves.
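The relationship between the two call styles can be sketched as a simple translation from enum keys to switch strings. The SketchCommand enum and ArgumentBuilder class below are hypothetical stand-ins that declare only the members this example needs; the real GhostScriptCommand enum may use different names. The switch strings themselves are standard GhostScript switches.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical stand-in for the library's GhostScriptCommand enum.
public enum SketchCommand
{
    Device,
    Resolution,
    FirstPage,
    LastPage,
    OutputFile
}

public static class ArgumentBuilder
{
    // Maps each enum member to the switch prefix GhostScript expects.
    private static readonly Dictionary<SketchCommand, string> Prefixes =
        new Dictionary<SketchCommand, string>
        {
            { SketchCommand.Device, "-sDEVICE=" },
            { SketchCommand.Resolution, "-r" },
            { SketchCommand.FirstPage, "-dFirstPage=" },
            { SketchCommand.LastPage, "-dLastPage=" },
            { SketchCommand.OutputFile, "-sOutputFile=" }
        };

    // Turns a typed dictionary of commands into raw argument strings.
    public static string[] ToArguments(IDictionary<SketchCommand, object> commands)
    {
        var args = new List<string>();
        foreach (KeyValuePair<SketchCommand, object> pair in commands)
        {
            string value = Convert.ToString(pair.Value, CultureInfo.InvariantCulture);
            args.Add(Prefixes[pair.Key] + value);
        }
        return args.ToArray();
    }
}
```

With this in place, a dictionary containing Device = "png16m" and Resolution = 150 yields the arguments -sDEVICE=png16m and -r150, which is the form the string array overload would take directly.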
Defining conversion settings
The Pdf2ImageSettings class allows you to customize various
properties of the output image. The following properties are
available:
- AntiAliasMode - specifies the antialiasing level between Low, Medium and High. This internally will set the dTextAlphaBits and dGraphicsAlphaBits GhostScript switches to appropriate values.
- Dpi - dots per inch. Internally sets the r switch. This property is not used if a paper size is set.
- GridFitMode - controls the text readability mode. Internally sets the dGridFitTT switch.
- ImageFormat - specifies the output image format. Internally sets the sDEVICE switch.
- PaperSize - specifies a paper size from one of the standard sizes supported by GhostScript.
- TrimMode - specifies how the image should be sized. Your mileage may vary if you try and use the paper size option. Internally sets either the dFIXEDMEDIA and sPAPERSIZE or the dUseCropBox or the dUseTrimBox switches.
Typical settings could look like this:
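As a rough sketch of what such settings might look like, the block below declares small stand-in types that mirror the property names listed above. SketchSettings and its enums are hypothetical and are not the library's Pdf2ImageSettings class; apart from Low, Medium and High, the enum member names are assumptions. ImageFormat is modelled as a GhostScript device name and GridFitMode is left out for brevity.

```csharp
// Hypothetical stand-ins that mirror the properties described above.
public enum SketchAntiAliasMode { Low, Medium, High }
public enum SketchTrimMode { PaperSize, CropBox, TrimBox }

public sealed class SketchSettings
{
    public SketchAntiAliasMode AntiAliasMode { get; set; }
    public int Dpi { get; set; }
    public string ImageFormat { get; set; } // GhostScript device name, e.g. "png16m"
    public string PaperSize { get; set; }   // only relevant when trimming to a paper size
    public SketchTrimMode TrimMode { get; set; }
}

public static class SettingsExample
{
    // High quality antialiasing, 150 DPI, 24-bit PNG output, sized to the crop box.
    public static SketchSettings CreateTypical()
    {
        return new SketchSettings
        {
            AntiAliasMode = SketchAntiAliasMode.High,
            Dpi = 150,
            ImageFormat = "png16m",
            TrimMode = SketchTrimMode.CropBox
        };
    }
}
```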
Converting the PDF
To convert a PDF file into a series of images, use the Pdf2Image
class. The following properties and methods are offered:
- ConvertPdfPageToImage - converts a given page in the PDF into an image which is saved to disk
- GetImage - converts a page in the PDF into an image and returns the image
- GetImages - converts a range of pages in the PDF into images and returns an image array
- PageCount - returns the number of pages in the source PDF
- PdfFilename - returns or sets the filename of the PDF document to convert
- PdfPassword - returns or sets the password of the PDF document to convert
- Settings - returns or sets the settings object described above
A typical example to convert the first page in a PDF document:
The inner workings
Most of the code in the class is taken up with the
GetConversionArguments method. This method looks at the various
properties of the conversion such as output format, quality,
etc, and returns the appropriate commands to pass to
GhostScript:
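A minimal sketch of that kind of mapping is shown here, using small hypothetical stand-in types rather than the library's own classes. The alpha bits values (1, 2 and 4 are the values GhostScript accepts) and their assignment to Low, Medium and High are assumptions, as are all type and member names; the switches each entry corresponds to are the ones listed in the settings section.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins; the real library types may differ.
public enum SketchAntiAliasMode { Low, Medium, High }
public enum SketchTrimMode { PaperSize, CropBox, TrimBox }
public enum SketchCommand
{
    TextAlphaBits, GraphicsAlphaBits, Device, Resolution,
    FixedMedia, PaperSize, UseCropBox, UseTrimBox
}

public sealed class SketchSettings
{
    public SketchAntiAliasMode AntiAliasMode { get; set; }
    public int Dpi { get; set; }
    public string ImageFormat { get; set; }
    public string PaperSize { get; set; }
    public SketchTrimMode TrimMode { get; set; }
}

public static class ConversionSketch
{
    public static Dictionary<SketchCommand, object> GetConversionArguments(SketchSettings settings)
    {
        var args = new Dictionary<SketchCommand, object>();

        // Assumed mapping from quality level to alpha bits.
        int alphaBits;
        switch (settings.AntiAliasMode)
        {
            case SketchAntiAliasMode.High: alphaBits = 4; break;
            case SketchAntiAliasMode.Medium: alphaBits = 2; break;
            default: alphaBits = 1; break;
        }
        args[SketchCommand.TextAlphaBits] = alphaBits;
        args[SketchCommand.GraphicsAlphaBits] = alphaBits;
        args[SketchCommand.Device] = settings.ImageFormat;

        switch (settings.TrimMode)
        {
            case SketchTrimMode.PaperSize:
                // A fixed paper size replaces the DPI setting.
                args[SketchCommand.FixedMedia] = true;
                args[SketchCommand.PaperSize] = settings.PaperSize;
                break;
            case SketchTrimMode.CropBox:
                args[SketchCommand.UseCropBox] = true;
                args[SketchCommand.Resolution] = settings.Dpi;
                break;
            case SketchTrimMode.TrimBox:
                args[SketchCommand.UseTrimBox] = true;
                args[SketchCommand.Resolution] = settings.Dpi;
                break;
        }
        return args;
    }
}
```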
As you can see from the method above, the commands are being
returned as a strongly typed dictionary - the GhostScriptAPI
class will convert these into the correct GhostScript commands,
but the enum is much easier to work with from your code! The
following is an example of the typical GhostScript commands to
convert a single page in a PDF document:
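One plausible set of arguments is sketched below as a C# string array. The switches are standard GhostScript command line switches, but the particular combination, the png16m device and the 150 DPI value are illustrative assumptions and not necessarily what the library emits; SinglePageArguments is a hypothetical helper name.

```csharp
public static class SinglePageArguments
{
    // Builds the arguments for rendering one page of a PDF to a 24-bit PNG.
    public static string[] Create(string pdfFileName, string outputFileName, int pageNumber)
    {
        return new[]
        {
            "-q",                              // suppress startup messages
            "-dSAFER",                         // restrict file access
            "-dBATCH",                         // exit when processing is finished
            "-dNOPAUSE",                       // do not prompt between pages
            "-sDEVICE=png16m",                 // output format
            "-r150",                           // resolution in dots per inch
            "-dTextAlphaBits=4",               // text antialiasing
            "-dGraphicsAlphaBits=4",           // graphics antialiasing
            "-dFirstPage=" + pageNumber,       // first and last page are equal,
            "-dLastPage=" + pageNumber,        // so only one page is rendered
            "-sOutputFile=" + outputFileName,
            pdfFileName                        // the input file comes last
        };
    }
}
```

When these are passed to the native API rather than the gs executable, a placeholder first element is needed because GhostScript ignores argv[0].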
The next step is to call GhostScript and convert the PDF which
is done using the ConvertPdfPageToImage method:
As you can see, this is a very simple call - create an instance
of the GhostScriptAPI class and then pass in the list of
parameters to execute. The GhostScriptAPI class takes care of
everything else.
Once the file is saved to disk, you can then load it into a
Bitmap or Image object for use in your application. Don't
forget to delete the file when you are finished with it!
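One detail worth knowing when doing this by hand: Image.FromFile keeps the file locked until the image is disposed, so deleting the file straight away will fail. A minimal sketch of one way around this, assuming System.Drawing is available (PageImageLoader is a hypothetical helper, not part of the library), is to copy the pixels into a new Bitmap first:

```csharp
using System.Drawing;
using System.IO;

public static class PageImageLoader
{
    // Loads an image from disk, detaches it from the file, then deletes the file.
    // The caller owns the returned bitmap and must dispose it.
    public static Bitmap LoadAndDelete(string fileName)
    {
        Bitmap copy;

        // Image.FromFile holds a lock on the file for the lifetime of the image,
        // so copy it into a new in-memory bitmap and dispose the original.
        using (Image source = Image.FromFile(fileName))
        {
            copy = new Bitmap(source);
        }

        File.Delete(fileName);
        return copy;
    }
}
```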
Alternatively, the GetImage method will convert the file and
return the bitmap image for you, automatically deleting the
temporary file. This saves you from having to worry about
providing and deleting the output file, but it does mean you are
responsible for disposing of the returned bitmap.
You could also convert a range of pages at once using the
GetImages method:
In conclusion
The above methods provide a simple way of providing basic PDF viewing in your applications. In the next part of this series, we describe how to extend the ImageBox component to support conversion and navigation.
Update History
- 2011-09-04 - First published
- 2012-07-10 - Added follow up article links
- 2020-11-21 - Updated formatting