Blog

News and announcements including new releases, bug fixes and anything newsworthy.

UniDoc v2 Released

Today we are happy to release UniDoc version 2.0.0, a comprehensive open source PDF library for Go. In this announcement, we cover some of the key changes with a bit of background on how they came about, outline the main new features, and point to relevant examples.

Architectural changes

It has been almost a year since we started planning version 2 and created the v2 branch. At that time, as we kept adding functionality to UniDoc v1, we foresaw that the package would grow huge as we supported more and more of PDF. Thus, to make the project more maintainable going forward, we decided to split it up into two packages with separate roles. As a result, we refactored the project into the following packages:

  • unidoc/pdf/core: The core package defines the primitive PDF object types and handles the file reading I/O and parsing the primitive objects.

  • unidoc/pdf/model: The model package builds on the core package to represent the PDF as a structured model of the PDF primitive types. It has a reader and a writer to read and process a PDF file based on the structured model. This serves as a basis for a wide range of tasks and can be used to work with a PDF through a medium- to high-level interface, although it does require an understanding of the PDF format and structure.

In essence, the core package is more or less what we had in v1, except the data models have been moved to model. We have also done extensive work to support more of the data models in the PDF standard.

Advanced processing capability

For example, one of our customers contacted us regarding a project they were working on that required converting entire PDF documents from color to grayscale. For those who do not know the internals of PDF, colors and colorspaces are defined very generally and many color models exist. In other words, it is pretty darn complex!

In order to support this, we had to add support for PDF functions, which also come in multiple types (one of which involves a PostScript parser). In addition, we had to read and parse the PDF contents, which are represented in a content stream, which is essentially a stream of commands and their operands. We also added support for Patterns, Shadings, and many more things which are probably not of interest to everyone :) but contact us if you are interested to learn more! Anyway, the point is: we have added tons of data models for PDF processing, and we are at the point where we can do some quite involved processing and manipulation of PDF contents.
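
To make the idea of a content stream concrete, here is a minimal sketch in Go that splits a simplified stream into operations, each consisting of an operator and the operands that precede it. It is not UniDoc's parser: the names Operation and ParseOperations are made up for this illustration, and the sketch only understands numbers and names, whereas real content streams also contain strings, arrays, dictionaries and inline images.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Operation is one content stream command: an operator and the operands
// that were listed before it. (Hypothetical type, for illustration only.)
type Operation struct {
	Operator string
	Operands []string
}

// isOperand reports whether a token looks like a name (/Name) or a number.
// Simplification: strings, arrays, dictionaries and inline images are ignored.
func isOperand(tok string) bool {
	if strings.HasPrefix(tok, "/") {
		return true
	}
	_, err := strconv.ParseFloat(tok, 64)
	return err == nil
}

// ParseOperations groups whitespace-separated tokens into operations.
// Operands accumulate until an operator token closes the group.
func ParseOperations(stream string) []Operation {
	var ops []Operation
	var operands []string
	for _, tok := range strings.Fields(stream) {
		if isOperand(tok) {
			operands = append(operands, tok)
			continue
		}
		ops = append(ops, Operation{Operator: tok, Operands: operands})
		operands = nil
	}
	return ops
}

func main() {
	// Save state, set a red fill color, draw and fill a rectangle, restore state.
	for _, op := range ParseOperations("q 1 0 0 rg 10 10 100 50 re f Q") {
		fmt.Println(op.Operator, op.Operands)
	}
}
```

Processing such as the grayscale conversion then amounts to walking the list of operations and rewriting the ones that set colors.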

For those who are interested in the grayscale conversion, we have provided some example code that demonstrates conversion of PDF to grayscale: pdf_grayscale_transform.go.
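
For the simplest device colorspaces, the arithmetic involved can be sketched in a few lines of Go. The formulas below are the DeviceRGB and DeviceCMYK to DeviceGray conversions given in the PDF specification; the function names are made up for this illustration, and the sketch does not claim to mirror what pdf_grayscale_transform.go does, which also has to deal with many other colorspaces, patterns, shadings and images.

```go
package main

import (
	"fmt"
	"math"
)

// RGBToGray converts DeviceRGB components (each in [0, 1]) to a gray level
// using the weights from the PDF specification.
func RGBToGray(r, g, b float64) float64 {
	return 0.3*r + 0.59*g + 0.11*b
}

// CMYKToGray converts DeviceCMYK components (each in [0, 1]) to a gray level.
// The result is clamped so that heavy ink coverage maps to black (0).
func CMYKToGray(c, m, y, k float64) float64 {
	return 1.0 - math.Min(1.0, 0.3*c+0.59*m+0.11*y+k)
}

func main() {
	fmt.Printf("red as gray:   %.2f\n", RGBToGray(1, 0, 0))
	fmt.Printf("black as gray: %.2f\n", CMYKToGray(0, 0, 0, 1))
}
```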

Reporting functionality

A few of our users also requested the capability to insert images and text into PDFs. At the same time we were working on creating an interactive PDF editor for FoxyUtils.com (available here), which has an Angular2 frontend but uses UniDoc in the backend for processing and generating the PDFs. As a result, we added the data models needed for processing images to the project. These models also work when reading an input PDF. To illustrate, our example pdf_extract_images.go shows how to extract all images from a PDF, and image insertion is illustrated in pdf_add_image_to_page.go.

At this point, we had a lot of cool capabilities, but simple tasks like creating a PDF with an image required pretty complex code and an understanding of PDF. Most people probably don't know the command for drawing an image in PDF, "/Image1 Do", or the others like it, and would probably like to avoid reading through the extensive PDF standard to figure out how to do a simple task. Thus, we wanted to hide away this complexity and create a higher level API for the common tasks in PDF creation. As a result we created a separate package called creator for handling this:
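
To give a feeling for the low-level details that a higher level API hides, here is a small Go sketch that builds the content stream commands needed to place an image on a page. It is only an illustration: ImageDrawOps and num are hypothetical helpers, not part of UniDoc, and the sketch assumes the image has already been registered as an XObject under the given name in the page resources.

```go
package main

import (
	"fmt"
	"strconv"
)

// num formats a number without an exponent, since content streams do not
// allow exponential notation.
func num(v float64) string {
	return strconv.FormatFloat(v, 'f', -1, 64)
}

// ImageDrawOps returns the commands that paint the image XObject `name`
// with its lower-left corner at (x, y) and the given size in points.
// An image is painted into a 1x1 unit square, so the "cm" command first
// scales and translates the coordinate system; q/Q save and restore the
// graphics state so the change does not leak into later commands.
func ImageDrawOps(name string, x, y, width, height float64) string {
	return fmt.Sprintf("q\n%s 0 0 %s %s %s cm\n/%s Do\nQ\n",
		num(width), num(height), num(x), num(y), name)
}

func main() {
	fmt.Print(ImageDrawOps("Image1", 50, 600, 200, 100))
}
```

Note that PDF coordinates start at the bottom-left corner of the page, which is one more detail most users would rather not think about.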

  • unidoc/pdf/creator: The PDF creator makes it easy to create new PDFs or modify existing PDFs. It can load a template PDF, add text and images, and generate an output PDF, which makes it suitable for generating text and graphical reports. It is designed with simplicity in mind, with the goal of making it easy to create reports without needing any knowledge about the PDF format or specifications.

As we worked on the creator we realized that it could be a cool tool for creating PDF reports etc. We are still working on improving the creator, but we already have a few key components:

  • The Drawable interface. Each visual element needs to implement the Drawable interface and have functionality to draw the component, and handle wrapping over multiple pages in some cases.

  • Paragraph. The paragraph is simply text which can wrap over multiple lines and pages (unless wrapping is disabled). The text has a specified font, size, color and other style properties.

  • Image. Can be loaded from file and drawn either at a specific position (absolute mode) or relative to the current context (relative mode).

  • Chapter and Subchapter: Used for arranging paragraphs and other drawables into chapters and subchapters.

  • Table. Can be used for arranging Drawables into a grid.
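
The idea behind these components can be sketched in plain Go. The code below is a toy model, not the creator package's actual API: the Drawable interface, Page and Paragraph types here are hypothetical, widths are measured in characters rather than points, and a page simply holds a fixed number of lines. It only illustrates how a component that knows how to draw itself can wrap over lines and spill onto new pages.

```go
package main

import (
	"fmt"
	"strings"
)

// Page is a toy page that holds lines of text.
type Page struct {
	Lines []string
}

// Drawable is a hypothetical stand-in for a visual component: it draws
// itself onto the pages, adding new pages when the current one is full.
type Drawable interface {
	Draw(pages []Page, linesPerPage int) []Page
}

// Paragraph is text that wraps at MaxChars characters per line.
type Paragraph struct {
	Text     string
	MaxChars int
}

// wrap breaks the text into lines at word boundaries.
func (p Paragraph) wrap() []string {
	var lines []string
	line := ""
	for _, w := range strings.Fields(p.Text) {
		if line != "" && len(line)+1+len(w) > p.MaxChars {
			lines = append(lines, line)
			line = w
			continue
		}
		if line != "" {
			line += " "
		}
		line += w
	}
	if line != "" {
		lines = append(lines, line)
	}
	return lines
}

// Draw places the wrapped lines, starting a new page whenever the last
// page already holds linesPerPage lines.
func (p Paragraph) Draw(pages []Page, linesPerPage int) []Page {
	for _, l := range p.wrap() {
		if len(pages) == 0 || len(pages[len(pages)-1].Lines) >= linesPerPage {
			pages = append(pages, Page{})
		}
		last := &pages[len(pages)-1]
		last.Lines = append(last.Lines, l)
	}
	return pages
}

func main() {
	var d Drawable = Paragraph{Text: "The quick brown fox jumps over the lazy dog", MaxChars: 10}
	for i, pg := range d.Draw(nil, 2) {
		fmt.Printf("page %d: %q\n", i+1, pg.Lines)
	}
}
```

Because every component goes through the same interface, containers such as chapters or tables can lay out their children without knowing what kind of component each one is.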

The goal with the creator is to be able to create good-looking reports without a ton of effort, with simple and understandable code.

We have created an example to illustrate the PDF creation capability. The source code for the example is in pdf_report.go, and the resulting generated PDF is available here: unidoc-report.pdf.

Example highlights

We have prepared a fairly extensive set of examples for getting started with UniDoc. While we feel they are all very exciting and encourage everyone to take a look, we would like to highlight a few of those that may be of most benefit to the bulk of our users:

The full set has many more examples for specific tasks. If you have ideas for new examples, let us know!

License simplification

UniDoc is dual-licensed under AGPLv3 and a commercial license, which allows use in closed source and non-AGPLv3 products.

When we first released UniDoc, it was released under AGPLv3 with a few additional terms. Thanks to our interaction with Cathal Garvey, we determined that the additional terms were unnecessary and that some were in a gray area. As a result, we changed to standard AGPLv3 without any additional constraints.

We want the project to be open source, and anyone should be able to read and try the code. Developers can try UniDoc to see if it fits their needs and test whether it works in a production environment. When going live with UniDoc in a closed source (or non-AGPLv3 licensed) product, the commercial license is needed. Our pricing model is pretty simple, and a Business Unlimited License allows unlimited developers/servers and includes support. The support can consist of making an example for performing a certain task, adding features that are missing in UniDoc, and/or prioritized bug fixes. The development and maintenance of UniDoc is financed by those fees, so they are what keeps the engine running.

Conclusion

In summary, UniDoc v2 is out and is now on the UniDoc master branch (finally!). It has many new features, and we suggest you go get it and test out our examples.

If you have any questions or need an example that is not provided, please contact us or email us via support@unidoc.io. If you find a bug, you can file an issue on GitHub or shoot us an email.

Launching UniDoc

Today we are releasing UniDoc version 1.0, a comprehensive open source PDF toolkit written in Go.

Background

At FoxyUtils we have been using various libraries for PDF manipulation over the years and have never been completely happy with what we have used. In the last couple of years we have been migrating our code base to Go and have completed porting our existing Python code. In order to use the same libraries as we used in Python, we had to shell out and call external APIs. As a result we have been developing a PDF toolkit in Go, and we are pleased to announce that our baby has been born and is ready for prime time. FoxyUtils.com has been updated to entirely use the new library for the following services:

UniDoc is starting out as a PDF toolkit for Go, but will be expanding into a general document processing library with support for reading and writing PDF, Doc, DocX and more formats. Contribution from the community is crucial to help us achieve our goal.

Installation


go get github.com/unidoc/unidoc/...

Overview and features

  • Read and extract PDF metadata.
  • Merge PDF (example).
  • Split PDF (example).
  • Protect PDF (example).
  • Unlock PDF (example).
  • The library aims to be fast and we process large PDFs in large quantities.
  • Self contained and depends only on the Go standard library.
  • Developer friendly.

Examples

See the examples folder.

Roadmap

We have big plans to improve it to support a lot of functionality:

  • Compress PDF.
  • JPG to PDF.
  • PDF to JPG.
  • High-level API to create PDF.
  • Search/replace PDFs - Edit functionality.
  • OCR engine to generate searchable PDF from scanned data.
  • Conversions from any format to another that makes sense e.g. PDF to Word, Word to PDF.
  • Create a nice interface for generating reports with export capabilities to PDF and DocX.

We feel we are about to start an exciting journey of bringing the Go community a capable document manipulation library.

Languages

Go has an excellent toolchain which makes it possible to create language bindings for:

  • Go - The library is written in Go so it works out of the box.
  • Python - We are considering GoPy for this. This will be the first step towards language bindings.
  • Java and C# if there is interest and demand.
  • Ruby if there is interest and demand.
  • Other languages will be considered if there is interest and demand.

Licensing/Pricing

We are releasing this under a dual AGPL/Commercial license as we need to help fund further development to achieve our goals. See pricing.

More information

For more information, see GitHub and the project website.