XMP metadata in PDF documents

What is XMP

The Extensible Metadata Platform (XMP) is an ISO standard, originally created by Adobe Systems Inc., for the creation, processing and interchange of standardized and custom metadata for digital documents and data sets.

XMP standardizes a data model, a serialization format, and core properties for the definition and processing of extensible metadata. It also defines how an XMP document should be embedded into common image, video and document file formats, such as PDF.

The XMP data model, serialization format and core properties are published by the International Organization for Standardization as the ISO 16684-1:2012 standard.

XMP Namespaces

The XMP specification defines core properties stored in well-known namespaces. Each namespace defines a set of properties that standardizes the metadata needed for a specific use.

An XMP metadata document may be composed of multiple namespace models. Some namespaces are generic and apply to many file types, while others are meant to be used only for specific file types.

Core Namespaces

The Adobe XMP specification defines 4 core namespaces:

  • Dublin Core - provides a set of commonly used properties. The names and usage shall be as defined in the Dublin Core Metadata Element Set, created by the Dublin Core Metadata Initiative (DCMI).
    • The namespace URI shall be http://purl.org/dc/elements/1.1/.
    • The preferred namespace prefix is ‘dc’.
  • XMP - contains properties that provide basic descriptive information.
    • The namespace URI shall be http://ns.adobe.com/xap/1.0/.
    • The preferred namespace prefix is ‘xmp’.
  • XMP Rights Management - contains properties that provide information regarding the legal restrictions associated with a resource.
    • The namespace URI shall be http://ns.adobe.com/xap/1.0/rights/.
    • The preferred namespace prefix is ‘xmpRights’.
  • XMP Media Management - contains properties that provide information regarding the identification, composition, and history of a resource.
    • The namespace URI shall be http://ns.adobe.com/xap/1.0/mm/.
    • The preferred namespace prefix is ‘xmpMM’.

PDF specialized namespaces

Some namespaces are specialized for PDF document metadata, such as:

  • Adobe PDF - specifies properties used with Adobe PDF documents.
    • The namespace URI shall be http://ns.adobe.com/pdf/1.3/.
    • The preferred namespace prefix is ‘pdf’.
  • PDF/A Identification - indicates that the file is a PDF/A-1 document and states its conformance level.
    • The namespace URI shall be http://www.aiim.org/pdfa/ns/id/.
    • The preferred namespace prefix is ‘pdfaid’.
  • PDF/A Extension - defines auxiliary XMP schema descriptions used by the PDF/A-1 standard. It is composed of 5 namespaces:
    • PDF/A Extension - http://www.aiim.org/pdfa/ns/extension/ - ‘pdfaExtension’ - root extension namespace
    • PDF/A Field type - http://www.aiim.org/pdfa/ns/field# - ‘pdfaField’ - defines auxiliary field types
    • PDF/A Property value type - http://www.aiim.org/pdfa/ns/property# - ‘pdfaProperty’ - defines auxiliary field type properties
    • PDF/A Schema value type - http://www.aiim.org/pdfa/ns/schema# - ‘pdfaSchema’ - defines auxiliary schema information
    • PDF/A ValueType value type - http://www.aiim.org/pdfa/ns/type# - ‘pdfaType’ - defines auxiliary types
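
To make the PDF/A Identification namespace more concrete, here is a minimal sketch that reads the pdfaid:part and pdfaid:conformance properties from an XMP document using only the Go standard library (encoding/xml). It assumes the properties are serialized as child elements of rdf:Description; XMP also permits an attribute form, which this sketch does not handle. The sample packet, the xmpPacket type and the pdfaID function are illustrative assumptions, not part of UniPDF.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
)

// samplePacket is a hand-written, simplified XMP document used only for
// illustration (the xpacket wrapper is omitted).
const samplePacket = `<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>1</pdfaid:part>
      <pdfaid:conformance>B</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>`

// xmpPacket matches elements by namespace URI rather than by prefix,
// so it keeps working if a document uses a non-preferred prefix.
type xmpPacket struct {
	RDF struct {
		Descriptions []struct {
			Part        string `xml:"http://www.aiim.org/pdfa/ns/id/ part"`
			Conformance string `xml:"http://www.aiim.org/pdfa/ns/id/ conformance"`
		} `xml:"http://www.w3.org/1999/02/22-rdf-syntax-ns# Description"`
	} `xml:"http://www.w3.org/1999/02/22-rdf-syntax-ns# RDF"`
}

// pdfaID returns the PDF/A part and conformance level found in the
// first rdf:Description that carries a pdfaid:part element.
func pdfaID(packet []byte) (part, conformance string, err error) {
	var p xmpPacket
	if err := xml.Unmarshal(packet, &p); err != nil {
		return "", "", err
	}
	for _, d := range p.RDF.Descriptions {
		if d.Part != "" {
			return d.Part, d.Conformance, nil
		}
	}
	return "", "", errors.New("no pdfaid:part property found")
}

func main() {
	part, conformance, err := pdfaID([]byte(samplePacket))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("PDF/A-%s, conformance level %s\n", part, conformance)
}
```

Matching on the namespace URI instead of the ‘pdfaid’ prefix reflects that the prefix is only preferred, while the URI is what identifies the namespace.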

UniPDF XMP utilities

Starting with release v3.31.0, UniPDF supports advanced read and write operations for PDF XMP metadata.

The new Golang package github.com/unidoc/unipdf/v3/model/xmputil defines document and model abstractions as well as utility functions that provide an easy way to read and store complex XMP documents.

Data models

For easy access, UniPDF provides data models that in many cases are composed of multiple namespaces. The models are meant to provide simple access to context-specific metadata.

Currently, there are 4 common metadata models defined:

  • PdfInfo - github.com/unidoc/unipdf/v3/model/xmputil.PdfInfo - a data model composed of parts of the Dublin Core, XMP and Adobe PDF namespaces. It contains a PdfInfo field that can be parsed into github.com/unidoc/unipdf/v3/model.PdfInfo, and a few other PDF-related fields such as Copyright. Its content should match the PDF document's Info dictionary.
  • XMP Media Management - github.com/unidoc/unipdf/v3/model/xmputil.MediaManagement - XMP Media Management namespace model.
  • PDF/A Identification - github.com/unidoc/unipdf/v3/model/xmputil/pdfaid.Model - PDF/A Identification namespace model.
  • PDF/A Extensions - github.com/unidoc/unipdf/v3/model/xmputil/pdfaextension.Model - PDF/A Extension namespace model.

Examples:

Custom data models

The UniPDF XMP serializer is based on the github.com/trimmer-io/go-xmp implementation. Any other metadata that is not defined in the UniPDF xmputil package can be extracted and written through direct access to the github.com/trimmer-io/go-xmp/xmp.Document.

A user can extract any registered model that implements the github.com/trimmer-io/go-xmp/xmp.Model interface. For some predefined models, take a look at: GoXMP Models

Examples:

XMP Licensing

XMP is a registered trademark of Adobe Systems Incorporated. The XMP specification became an ISO standard and is not proprietary anymore.

Initially, Adobe released the source code for the XMP SDK under a license called the ADOBE SYSTEMS INCORPORATED — OPEN SOURCE LICENSE. The compatibility of this license with the GNU General Public License has been questioned. The license does not appear on the list maintained by the Open Source Initiative, and it differs from the licenses Adobe uses for most of its other open source software.

On May 14, 2007, Adobe released the XMP Toolkit SDK under a standard BSD license.

On August 28, 2008, Adobe posted a public patent license for XMP. As of November 2016, Adobe continues to distribute these documents under the XMP Specification Public Patent License.

References