Learn How to Reduce Your PDF Document Size with Golang

Introduction

In the modern digital world, PDF documents have become a staple for sharing and preserving information. From business reports and legal documents to ebooks and user manuals, PDFs are widely used due to their compatibility, security, and ease of use. However, as the size of PDF files increases, it can become challenging to manage and share them efficiently. This is where PDF compression comes into play. In this blog post, we will explore the importance of PDF compression, its benefits, and two popular solutions for PDF compression in Golang: UniPDF and pdfcpu.

The Need for PDF Compression

PDF compression is the process of reducing the size of a PDF file while maintaining its visual quality and readability. Compressed PDFs take up less disk space, load faster, and are easier to share and transfer over the internet. For businesses and individuals dealing with large volumes of PDF documents, compression offers several advantages, including:

  • Optimized storage: Compressed PDFs consume less disk space, allowing you to store more files without worrying about storage limitations.

  • Faster loading times: Smaller file sizes result in quicker loading times, enhancing the user experience for accessing and viewing PDFs.

  • Bandwidth conservation: Compressed PDFs require less bandwidth when transferring or sharing online, reducing data usage and costs.

  • Improved efficiency: Compressed PDFs are easier to handle, upload, and distribute, increasing productivity and streamlining document management processes.

To achieve efficient PDF compression, it is essential to understand the techniques and tools available for the task. In the following sections, we will delve into the details of PDF compression and explore two powerful Golang solutions: UniPDF and pdfcpu.

Understanding PDF Compression

Before we dive into the specific tools, let’s gain a better understanding of what PDF compression entails.

PDF compression involves reducing the file size of a PDF document by eliminating redundant or unnecessary data. This data can include redundant images, unused fonts, duplicated content, and embedded objects that are not crucial to the document’s visual representation or integrity. Compression techniques aim to strike a balance between reducing file size and maintaining the document’s visual quality and readability.
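One common way to spot duplicated content is to hash each embedded payload and store identical payloads only once. The sketch below illustrates that idea with Go's standard library. It is not how UniPDF or pdfcpu are implemented; the `dedupe` function and the sample payloads are hypothetical names introduced for the example.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// dedupe is a hypothetical helper. It returns the unique payloads in
// first-seen order and, for each input position, the index of the unique
// payload that represents it.
func dedupe(payloads [][]byte) (unique [][]byte, refs []int) {
	seen := make(map[[sha256.Size]byte]int)
	refs = make([]int, len(payloads))
	for i, p := range payloads {
		sum := sha256.Sum256(p)
		idx, ok := seen[sum]
		if !ok {
			// First time this content appears: keep it.
			idx = len(unique)
			seen[sum] = idx
			unique = append(unique, p)
		}
		refs[i] = idx
	}
	return unique, refs
}

func main() {
	// Made-up payloads standing in for embedded objects such as images.
	payloads := [][]byte{
		[]byte("logo-image-bytes"),
		[]byte("chart-image-bytes"),
		[]byte("logo-image-bytes"), // the same logo repeated on another page
	}
	unique, refs := dedupe(payloads)
	fmt.Printf("stored %d of %d payloads, references: %v\n", len(unique), len(payloads), refs)
	// prints: stored 2 of 3 payloads, references: [0 1 0]
}
```

The repeated logo is stored once and referenced twice, which is the same principle behind the "combine duplicate" options discussed later in this post.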

There are two main types of PDF compression techniques:

  1. Lossless Compression: Lossless compression reduces the size of a PDF file without compromising the document’s visual quality. This technique achieves compression by removing redundancies, eliminating unused data, and applying data compression algorithms. Lossless compression ensures that the document’s content remains intact and can be decompressed to its original form without any loss of information.

  2. Lossy Compression: Lossy compression sacrifices some visual quality to achieve higher compression ratios. It selectively removes less noticeable details from images, reduces color depth, and applies other techniques to reduce file size. While lossy compression results in smaller file sizes, it may slightly impact image clarity and fine details.

When compressing PDFs, it is crucial to strike the right balance between file size reduction and visual quality preservation. The resolution and image quality settings play a significant role in achieving this balance.
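To get a feel for how an image quality setting affects size, the following sketch encodes the same synthetic image as JPEG at two quality levels using Go's `image/jpeg` package and compares the results. The test image, the `encodedSize` helper, and the chosen quality values are assumptions for illustration only; real documents will behave differently depending on their images.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/jpeg"
	"log"
)

// testImage builds a synthetic image with both smooth gradients and fine detail.
func testImage(w, h int) *image.RGBA {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.Set(x, y, color.RGBA{R: uint8(x), G: uint8(y), B: uint8(x ^ y), A: 255})
		}
	}
	return img
}

// encodedSize returns the number of bytes the image occupies as a JPEG
// at the given quality (1 to 100).
func encodedSize(img image.Image, quality int) (int, error) {
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, img, &jpeg.Options{Quality: quality}); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	img := testImage(256, 256)

	high, err := encodedSize(img, 95)
	if err != nil {
		log.Fatalf("encode at quality 95: %v", err)
	}
	low, err := encodedSize(img, 50)
	if err != nil {
		log.Fatalf("encode at quality 50: %v", err)
	}

	fmt.Printf("quality 95: %d bytes, quality 50: %d bytes\n", high, low)
}
```

The lower quality setting produces a smaller file at the cost of discarded detail, which is the trade-off that lossy PDF image settings expose.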

Exploring Solutions for PDF Compression

In the Golang ecosystem, two prominent solutions for PDF compression stand out: UniPDF and pdfcpu. Let’s take a closer look at both options and explore their features, pros, and cons.

UniPDF

UniPDF is a comprehensive PDF library for Golang that provides a wide range of features for working with PDF documents. It supports operations such as reading, writing, editing, and manipulating PDF files, and it includes an optimizer that lets developers compress PDF documents effectively.

Pros of UniPDF for PDF Compression

  • Feature-rich: UniPDF offers an extensive set of features beyond compression, making it a versatile solution for all PDF-related tasks.

  • Easy to use: UniPDF provides a user-friendly API and clear documentation, making it accessible even for developers new to PDF processing.

  • Cross-platform compatibility: UniPDF supports multiple platforms, including Windows, macOS, and Linux, enabling seamless integration into various development environments.

Cons of UniPDF for PDF Compression

  • Lack of advanced compression settings: While UniPDF offers efficient compression algorithms, it may not provide fine-grained control over compression settings compared to specialized compression tools.

  • Resource consumption: UniPDF is a powerful library that may require higher computational resources, especially when dealing with large PDF files.

To demonstrate UniPDF’s capabilities, let’s walk through a tutorial on how to use UniPDF for PDF compression.

Getting Started with UniPDF

Step 1: Installation

To begin using UniPDF, you need to install the library. Open your terminal and run the following command:

go get github.com/unidoc/unipdf/v3/...

Step 2: Importing UniPDF

Import the UniPDF packages in your Go code and load your license key in an init function:

package main

import (
	"fmt"
	"log"
	"os"

	"github.com/unidoc/unipdf/v3/common/license"
	"github.com/unidoc/unipdf/v3/model"
	"github.com/unidoc/unipdf/v3/model/optimize"
)

func init() {
	// Make sure to load your metered License API key prior to using the library.
	// If you need a key, you can sign up and create a free one at https://cloud.unidoc.io
	err := license.SetMeteredKey(os.Getenv("UNIDOC_LICENSE_API_KEY"))
	if err != nil {
		panic(err)
	}
}

Step 3: PDF Compression

Now, let’s compress a PDF document using UniPDF:

func main() {
	inFile := "request.pdf"
	outFile := "output_compressedUniPdf.pdf"

	// Opening the input file.
	inputFile, err := os.Open(inFile)
	if err != nil {
		log.Fatalf("Fail: %v\n", err)
	}
	defer inputFile.Close()

	// Creating a new PDF Reader using the input file.
	reader, err := model.NewPdfReader(inputFile)
	if err != nil {
		log.Fatalf("Fail: %v\n", err)
	}

	// Generating a PDFWriter from PDFReader.
	pdfWriter, err := reader.ToWriter(nil)
	if err != nil {
		log.Fatalf("Fail: %v\n", err)
	}

	// Setting the optimizer that will adjust the optimization options.
	pdfWriter.SetOptimizer(optimize.New(optimize.Options{
		CombineDuplicateDirectObjects:   true,
		CombineIdenticalIndirectObjects: true,
		CombineDuplicateStreams:         true,
		CompressStreams:                 true,
		UseObjectStreams:                true,
		ImageQuality:                    80,
		ImageUpperPPI:                   100,
	}))

	// Create output file.
	err = pdfWriter.WriteToFile(outFile)
	if err != nil {
		log.Fatalf("Fail: %v\n", err)
	}

	fmt.Printf("PDF compression successful. Compressed PDF saved as %s\n", outFile)

	// Then print out the statistics of both files to compare the sizes.
	// Get input file stat.
	inputFileInfo, err := os.Stat(inFile)
	if err != nil {
		log.Fatalf("Fail: %v\n", err)
	}
	// Get output file stat.
	outputFileInfo, err := os.Stat(outFile)
	if err != nil {
		log.Fatalf("Fail: %v\n", err)
	}

	// Print basic optimization statistics.
	inputSize := inputFileInfo.Size()
	outputSize := outputFileInfo.Size()
	ratio := 100.0 - (float64(outputSize) / float64(inputSize) * 100.0)

	fmt.Printf("[Original file]: %s [size]: %d\n", inFile, inputSize)
	fmt.Printf("[Optimized file]: %s [size]: %d\n", outFile, outputSize)
	// The value is the percentage by which the file shrank, not a ratio.
	fmt.Printf("[Size reduction]: %.2f%%\n", ratio)
}

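This code snippet demonstrates the basic process of compressing a PDF using UniPDF. It opens the input PDF, creates a reader, derives a writer from the reader with ToWriter, attaches an optimizer configured through optimize.Options, and writes the result to the output file. The options enable lossless steps (combining duplicate objects and streams, compressing streams, and using object streams) together with two lossy image settings: ImageQuality set to 80 and ImageUpperPPI set to 100. Finally, the program prints the size of both files and the percentage by which the output is smaller than the input.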
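The size comparison at the end of the listing can be factored into a small helper so that it is easy to reuse and test. The sketch below is one way to do it; `reductionPercent` is a hypothetical name, and the guard against a zero-byte input is an addition that the original listing does not have.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// reductionPercent reports by how many percent outputSize is smaller than
// inputSize. A negative result means the output grew.
func reductionPercent(inputSize, outputSize int64) (float64, error) {
	if inputSize <= 0 {
		// Avoid dividing by zero for empty or invalid inputs.
		return 0, errors.New("input size must be positive")
	}
	return 100.0 - (float64(outputSize)/float64(inputSize))*100.0, nil
}

func main() {
	// Made-up sizes: a 2,000,000-byte input optimized down to 500,000 bytes.
	pct, err := reductionPercent(2000000, 500000)
	if err != nil {
		log.Fatalf("Fail: %v\n", err)
	}
	fmt.Printf("[Size reduction]: %.2f%%\n", pct)
	// prints: [Size reduction]: 75.00%
}
```

In a real program, the two sizes would come from os.Stat as in the listing above.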

Case Study: UniPDF’s Compression Capabilities

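To showcase the effectiveness of UniPDF’s compression capabilities, consider a scenario where you have a large PDF file that needs to be compressed for easier storage and sharing. By applying UniPDF’s optimizer, you can often reduce the file size significantly. How much you save depends on the document’s content, and lossy image settings such as ImageQuality and ImageUpperPPI trade some image fidelity for a smaller file, so check the output against your quality requirements.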

pdfcpu

pdfcpu is another popular PDF library for Golang that provides a comprehensive set of features, including PDF compression. While UniPDF offers a wide range of functionalities, pdfcpu focuses primarily on PDF processing and optimization.

Pros of pdfcpu for PDF Compression

  • Dedicated PDF processing: pdfcpu is designed specifically for PDF processing, making it a lightweight and efficient solution for tasks such as compression.

  • Customizability: pdfcpu offers more fine-grained control over compression settings, allowing developers to tailor the compression algorithm to their specific needs.

  • Command-line interface: pdfcpu includes a command-line interface (CLI) that simplifies the compression process, making it accessible to non-programmers as well.

Cons of pdfcpu for PDF Compression

  • Limited feature set: Compared to UniPDF, pdfcpu’s feature set is more focused on PDF processing and optimization, which means it may lack certain advanced features required for complex PDF manipulation.

  • Less user-friendly API: While pdfcpu provides extensive documentation, its API may have a steeper learning curve for developers unfamiliar with PDF processing.

Now, let’s walk through a tutorial on how to use pdfcpu for basic PDF compression.

Getting Started with pdfcpu

Step 1: Installation

To begin using pdfcpu, you need to install the library. Open your terminal and run the following command:

go get github.com/pdfcpu/pdfcpu/pkg/api

Step 2: PDF Compression

Now, let’s compress a PDF document using pdfcpu:

package main

import (
	"fmt"
	"log"
	"os"

	"github.com/pdfcpu/pdfcpu/pkg/api"
)

func main() {
	inFile := "request.pdf"
	outFile := "output_compressedPdfCpu.pdf"

	// Create an optimized version of inFile.
	if err := api.OptimizeFile(inFile, outFile, nil); err != nil {
		fmt.Printf("Error optimizing file: %v\n", err)
		return
	}

	fmt.Println("Optimized file saved in:", outFile)

	// Then the statistics of both files are obtained to compare sizes.
	// Get input file stat.
	inputFileInfo, err := os.Stat(inFile)
	if err != nil {
		log.Fatalf("Fail: %v\n", err)
	}

	// Get output file stat.
	outputFileInfo, err := os.Stat(outFile)
	if err != nil {
		log.Fatalf("Fail: %v\n", err)
	}

	// Print basic optimization statistics.
	inputSize := inputFileInfo.Size()
	outputSize := outputFileInfo.Size()
	ratio := 100.0 - (float64(outputSize) / float64(inputSize) * 100.0)

	fmt.Printf("[Original file]: %s [size]: %d\n", inFile, inputSize)
	fmt.Printf("[Optimized file]: %s [size]: %d\n", outFile, outputSize)
	// The value is the percentage by which the file shrank, not a ratio.
	fmt.Printf("[Size reduction]: %.2f%%\n", ratio)
}

This code snippet demonstrates the basic process of compressing a PDF using pdfcpu. It passes the input and output file names to api.OptimizeFile with a nil configuration, so pdfcpu applies its default settings. It then prints the size of both files and the percentage by which the output is smaller than the input.

Comparing UniPDF and pdfcpu

Both UniPDF and pdfcpu offer effective solutions for PDF compression in Golang. While UniPDF provides a feature-rich library with a comprehensive set of PDF functionalities, pdfcpu focuses specifically on PDF processing and optimization. The choice between the two depends on your specific requirements and preferences.

| Feature | UniPDF | pdfcpu |
| --- | --- | --- |
| Compression/Optimization Options | Comprehensive (lossy and lossless options) | Limited |
| PDF Functionality | Wide range of PDF manipulation options | PDF processing and optimization |
| Focus | Versatile PDF library | PDF processing and optimization |
| API | Easy-to-use | Customizable |
| Documentation | Clear | Detailed |
| Cross-platform compatibility | Yes | Yes |
| Fine-grained control | Yes (for compression settings) | Yes |
| Lightweight | - | Yes (lightweight footprint) |
| Command-line interface | - | Yes (suitable for non-programmers) |
| Combine Duplicate Streams | Lossless | Not available |
| Combine Duplicate Direct Objects | Lossless | Not available |
| Image Upper PPI | Lossy | Not available |
| Image Quality | Lossy | Not available |
| Use Object Streams | Lossless | Not available |
| Combine Identical Indirect Objects | Lossless | Not available |
| Compress Streams | Usually lossless, potentially lossy | Not available |
| Clean Fonts | Potentially lossy | Not available |
| Subset Fonts | Lossy | Not available |
| Clean Contentstream | Lossless | Not available |

Conclusion

In the digital era, where PDF documents are prevalent, the need for efficient PDF compression is paramount. Compressed PDFs enable faster loading times, optimized storage, and easier sharing, improving overall productivity and document management.

In this blog post, we explored the importance of PDF compression and its benefits. We delved into two popular solutions for PDF compression in Golang: UniPDF and pdfcpu. UniPDF, with its feature-rich capabilities, offers a comprehensive PDF library that includes efficient compression algorithms. On the other hand, pdfcpu, focusing primarily on PDF processing and optimization, provides more customizability and control over compression settings.

We encourage you to explore both UniPDF and pdfcpu, evaluate their features and trade-offs, and choose the solution that best suits your specific needs. Remember, effective PDF compression is crucial for optimizing storage, improving loading times, and enhancing the overall user experience. Start compressing your PDF documents today and unlock the benefits of streamlined PDF management.