XMP metadata in PDF documents
What is XMP
The Extensible Metadata Platform (XMP) is an ISO standard, originally created by Adobe Systems Inc., for the creation, processing and interchange of standardized and custom metadata for digital documents and data sets.
XMP standardizes a data model, a serialization format, and core properties for the definition and processing of extensible metadata. It also defines how an XMP document should be embedded into common image, video, and document file formats, such as PDF.
The XMP data model, serialization format and core properties are published by the International Organization for Standardization as the ISO 16684-1:2012 standard.
XMP Namespaces
The XMP specification defines some core properties stored in well-known namespaces. Each namespace defines a set of properties that standardize the metadata needed for a specific usage.
An XMP metadata document can be composed of multiple namespace models. Some namespaces are generic and apply to multiple file types, while others are meant to be used only for specific file types.
Core Namespaces
The Adobe XMP specification defines 4 core namespaces:
- Dublin Core - provides a set of commonly used properties. The names and usage shall be as defined in the Dublin Core Metadata Element Set, created by the Dublin Core Metadata Initiative (DCMI).
  - The namespace URI shall be http://purl.org/dc/elements/1.1/
  - The preferred namespace prefix is ‘dc’
- XMP - contains properties that provide basic descriptive information.
  - The namespace URI shall be http://ns.adobe.com/xap/1.0/
  - The preferred namespace prefix is ‘xmp’
- XMP Rights Management - contains properties that provide information regarding the legal restrictions associated with a resource.
  - The namespace URI shall be http://ns.adobe.com/xap/1.0/rights/
  - The preferred namespace prefix is ‘xmpRights’
- XMP Media Management - contains properties that provide information regarding the identification, composition, and history of a resource.
  - The namespace URI shall be http://ns.adobe.com/xap/1.0/mm/
  - The preferred namespace prefix is ‘xmpMM’
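To make the prefix and URI pairs above concrete, here is a minimal sketch that uses only the Go standard library (not UniPDF) to list the namespace declarations found in a serialized XMP packet. The packet is a hand-written illustration, and the names `coreSamplePacket` and `declaredNamespaces` are assumptions made up for this example.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// coreSamplePacket is a hand-written, illustrative XMP packet that declares
// the four core namespaces. It is not taken from a real PDF file.
const coreSamplePacket = `<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xmp="http://ns.adobe.com/xap/1.0/"
    xmlns:xmpRights="http://ns.adobe.com/xap/1.0/rights/"
    xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/">
   <dc:format>application/pdf</dc:format>
   <xmp:CreatorTool>Example Tool</xmp:CreatorTool>
   <xmpRights:Marked>True</xmpRights:Marked>
   <xmpMM:DocumentID>uuid:00000000-0000-0000-0000-000000000000</xmpMM:DocumentID>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>`

// declaredNamespaces (hypothetical helper) returns a prefix -> URI map for
// every xmlns:prefix declaration found in the packet.
func declaredNamespaces(packet string) (map[string]string, error) {
	dec := xml.NewDecoder(strings.NewReader(packet))
	out := map[string]string{}
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		start, ok := tok.(xml.StartElement)
		if !ok {
			continue
		}
		for _, attr := range start.Attr {
			// encoding/xml reports xmlns:prefix="uri" as an attribute whose
			// Name.Space is "xmlns" and whose Name.Local is the prefix.
			if attr.Name.Space == "xmlns" {
				out[attr.Name.Local] = attr.Value
			}
		}
	}
}

func main() {
	ns, err := declaredNamespaces(coreSamplePacket)
	if err != nil {
		panic(err)
	}
	for _, prefix := range []string{"dc", "xmp", "xmpRights", "xmpMM"} {
		fmt.Printf("%s -> %s\n", prefix, ns[prefix])
	}
}
```

The prefix is only a local shorthand; what identifies a namespace is its URI, which is why the sketch keys the result by prefix but compares against the URIs listed above.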
PDF specialized namespaces
Some namespaces are specialized for PDF document metadata, such as:
- Adobe PDF - specifies properties used with Adobe PDF documents.
  - The namespace URI shall be http://ns.adobe.com/pdf/1.3/
  - The preferred namespace prefix is ‘pdf’
- PDF/A Identification - indicates that the file is a PDF/A-1 document and states its conformance level.
  - The namespace URI shall be http://www.aiim.org/pdfa/ns/id/
  - The preferred namespace prefix is ‘pdfaid’
- PDF/A Extension - defines auxiliary XMP schema descriptions used by the PDF/A-1 standard. It is composed of 5 namespaces:
  - PDF/A Extension - http://www.aiim.org/pdfa/ns/extension/ - ‘pdfaExtension’ - root extension namespace
  - PDF/A Field type - http://www.aiim.org/pdfa/ns/field# - ‘pdfaField’ - defines auxiliary field types
  - PDF/A Property value type - http://www.aiim.org/pdfa/ns/property# - ‘pdfaProperty’ - defines auxiliary field type properties
  - PDF/A Schema value type - http://www.aiim.org/pdfa/ns/schema# - ‘pdfaSchema’ - defines auxiliary schema information
  - PDF/A ValueType value type - http://www.aiim.org/pdfa/ns/type# - ‘pdfaType’ - defines auxiliary types
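As an illustration of the PDF/A Identification namespace, the following sketch reads the `pdfaid:part` and `pdfaid:conformance` properties from a packet using only the Go standard library. It is not the UniPDF API: the sample packet is hand-written, and the names `pdfaSamplePacket`, `rdfDescription`, `xmpPacket`, `firstNonEmpty` and `pdfaIdentification` are hypothetical helpers introduced for this example. The sketch assumes the properties are stored either as child elements or as attributes of an `rdf:Description` element.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// pdfaSamplePacket is a hand-written, illustrative XMP packet declaring
// PDF/A-1 with conformance level B. It is not taken from a real PDF file.
const pdfaSamplePacket = `<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
   <pdfaid:part>1</pdfaid:part>
   <pdfaid:conformance>B</pdfaid:conformance>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>`

// rdfDescription captures the two pdfaid properties in element form and in
// attribute form, since a serializer may emit either.
type rdfDescription struct {
	PartElem        string `xml:"http://www.aiim.org/pdfa/ns/id/ part"`
	PartAttr        string `xml:"http://www.aiim.org/pdfa/ns/id/ part,attr"`
	ConformanceElem string `xml:"http://www.aiim.org/pdfa/ns/id/ conformance"`
	ConformanceAttr string `xml:"http://www.aiim.org/pdfa/ns/id/ conformance,attr"`
}

// xmpPacket matches any root element and collects its RDF descriptions.
type xmpPacket struct {
	Descriptions []rdfDescription `xml:"RDF>Description"`
}

// firstNonEmpty returns a if it is set, otherwise b.
func firstNonEmpty(a, b string) string {
	if a != "" {
		return a
	}
	return b
}

// pdfaIdentification returns the declared PDF/A part and conformance level.
// Both values are empty when the packet carries no pdfaid properties.
func pdfaIdentification(packet string) (part, conformance string, err error) {
	var p xmpPacket
	if err := xml.Unmarshal([]byte(packet), &p); err != nil {
		return "", "", err
	}
	for _, d := range p.Descriptions {
		if part == "" {
			part = firstNonEmpty(d.PartElem, d.PartAttr)
		}
		if conformance == "" {
			conformance = firstNonEmpty(d.ConformanceElem, d.ConformanceAttr)
		}
	}
	return part, conformance, nil
}

func main() {
	part, conformance, err := pdfaIdentification(pdfaSamplePacket)
	if err != nil {
		panic(err)
	}
	if part == "" {
		fmt.Println("no PDF/A identification found")
		return
	}
	fmt.Printf("PDF/A-%s%s\n", part, conformance)
}
```

Note that these properties only record what the file claims about itself. Confirming actual conformance requires validating the document against the PDF/A standard.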
UniPDF XMP utilities
Starting with the v3.31.0 release, UniPDF supports advanced read and write operations for PDF XMP metadata.
The new Go package github.com/unidoc/unipdf/v3/model/xmputil defines document and model abstractions, as well as utility functions that provide an easy way to read and store complex XMP documents.
Data models
For easy access, UniPDF provides data models that in many cases are composed of multiple namespaces. The models are meant to provide simple access to context-specific metadata.
Currently, there are 4 common metadata models defined:
- PdfInfo - github.com/unidoc/unipdf/v3/model/xmputil.PdfInfo - a data model composed of multiple namespaces. It contains parts of the Dublin Core, XMP and Adobe PDF namespaces. It contains a PdfInfo field that can be parsed into github.com/unidoc/unipdf/v3/model.PdfInfo, and a few other PDF-related fields such as Copyright. Its content should match the PDF document Info dictionary.
- XMP Media Management - github.com/unidoc/unipdf/v3/model/xmputil.MediaManagement - XMP Media Management namespace model.
- PDF/A Identification - github.com/unidoc/unipdf/v3/model/xmputil/pdfaid.Model - PDF/A Identification namespace model.
- PDF/A Extensions - github.com/unidoc/unipdf/v3/model/xmputil/pdfaextension.Model - PDF/A Extension namespace model.
Examples:
- Extract XMP Media Management - extract information about XMP Media Management metadata.
- Extract PdfInfo - extract information for Adobe PDF, Dublin Core and XMP for PDF documents.
- Extract PDF/A Identification - check if a document is conformant with the PDF/A standard and to which conformance level.
- Set PdfInfo model - set up PDF information metadata.
- Set XMP Media Management - set XMP Media Management metadata.
Custom data models
The UniPDF XMP serializer is based on the github.com/trimmer-io/go-xmp implementation. Any other metadata that is not defined in the UniPDF xmputil package can be extracted and written through direct access to the github.com/trimmer-io/go-xmp/xmp.Document.
A user can extract any registered model that implements the github.com/trimmer-io/go-xmp/xmp.Model interface. For some predefined models, take a look at: GoXMP Models
Examples:
- Set Custom Namespace Metadata - set up custom metadata for an external model.
- Extract Custom Namespace Metadata - extract information for some custom metadata model.
XMP Licensing
XMP is a registered trademark of Adobe Systems Incorporated. The XMP specification became an ISO standard and is not proprietary anymore.
Initially, Adobe released source code for the XMP SDK under a license called the ADOBE SYSTEMS INCORPORATED — OPEN SOURCE LICENSE. The compatibility of this license with the GNU General Public License has been questioned. The license is not listed on the list maintained by the Open Source Initiative and is different from the licenses for most of their open source software.
On May 14, 2007, Adobe released the XMP Toolkit SDK under a standard BSD license.
On August 28, 2008, Adobe posted a public patent license for XMP. As of November 2016, Adobe continues to distribute these documents under the XMP Specification Public Patent License.