Customizing a Comparison
Introduction
Because XML Compare represents changes as XML, its API and Pipeline Configuration architecture allow standard XML technologies such as XSLT to be applied. Complex information pipelines can therefore be built from a set of simple, proven components.
Configuration of a typical custom comparison pipeline

Samples of Customised Comparisons
A set of samples is included with XML Compare; these include working code and documentation for a number of customized comparison scenarios.
Choosing the Comparator
When a comparison is invoked via the recommended com.deltaxml.cores9api API, you have the choice of three comparator classes: DocumentComparator, DataComparator, or PipelinedComparatorS9.
Note:
When invoking a comparison through the command-line interface (CLI), the comparator class used will depend on whether a DCP file ID (for DocumentComparator), DTCP file ID (for DataComparator), or DXP file ID (for PipelinedComparatorS9) is used.
Pipelined Comparator
Implemented via the PipelinedComparatorS9 Java class, this provides a very flexible form of comparison. It is best suited to cases where the input XML is not always document based or where you require low-level control of the processing pipeline. Except for restrictions associated with lexical preservation filters, input and output filters can be added to the processing pipeline at any point.
Document Comparator
Implemented through the DocumentComparator Java class, this has a pipeline specially optimized for document comparison. The figure below shows a simplified representation of this pipeline. Explicit extension points are available on the pipeline so that new filter steps or chains can be inserted in a managed way.
Filter steps or chains can be applied to specific extension points of the Document Comparator

Data Comparator
Implemented through the DataComparator Java class, this has an efficient pipeline specially optimized for data comparison. The figure below shows a simplified representation of this pipeline. Explicit extension points are available on the pipeline so that new filter steps or chains can be inserted in a managed way.
Filter steps or chains can be applied to specific extension points of the Data Comparator

Defining Pipelines
Pipelined Comparator
The Pipelined Comparator allows comparisons to be optimized for particular types of data or document structure. It also allows customization of the way detected differences are represented in the output. The pipeline for a Pipelined Comparator is defined using a set of filters managed in FilterStep and FilterChain objects, which can be added at both comparator inputs ('A' and 'B') or at the comparator output.
The guide Specifying a Comparison Pipeline provides an overview of how pipelines can be defined with the Pipelined Comparator, either in Java or in an XML pipeline descriptor file format called DXP.
More details on the use of DXP can be found in the document Pipeline Configuration using DXP.
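As a rough mental model only, the arrangement of filter chains around the comparator can be sketched in plain Java. Nothing below is XML Compare API: SketchPipeline, its fields, and the toy string "filters" and "comparator" are hypothetical names invented for this illustration, standing in for the roles that FilterStep, FilterChain, and PipelinedComparatorS9 play in the product.

```java
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.UnaryOperator;

// Hypothetical model for illustration only; none of these names are XML Compare API.
class SketchPipeline {
    private final List<UnaryOperator<String>> inputAFilters;
    private final List<UnaryOperator<String>> inputBFilters;
    private final BinaryOperator<String> comparator;
    private final List<UnaryOperator<String>> outputFilters;

    SketchPipeline(List<UnaryOperator<String>> inputAFilters,
                   List<UnaryOperator<String>> inputBFilters,
                   BinaryOperator<String> comparator,
                   List<UnaryOperator<String>> outputFilters) {
        this.inputAFilters = inputAFilters;
        this.inputBFilters = inputBFilters;
        this.comparator = comparator;
        this.outputFilters = outputFilters;
    }

    // Runs each filter of a chain in order, feeding one filter's result to the next.
    private static String applyChain(List<UnaryOperator<String>> chain, String document) {
        String result = document;
        for (UnaryOperator<String> filter : chain) {
            result = filter.apply(result);
        }
        return result;
    }

    // Input 'A' and input 'B' are filtered independently, then compared,
    // and the comparison result passes through the output filters.
    String compare(String a, String b) {
        String filteredA = applyChain(inputAFilters, a);
        String filteredB = applyChain(inputBFilters, b);
        String delta = comparator.apply(filteredA, filteredB);
        return applyChain(outputFilters, delta);
    }

    public static void main(String[] args) {
        UnaryOperator<String> trim = String::trim;
        UnaryOperator<String> lowerCase = s -> s.toLowerCase();
        UnaryOperator<String> wrap = s -> "<result>" + s + "</result>";
        BinaryOperator<String> toyComparator = (a, b) -> a.equals(b) ? "unchanged" : "changed";

        SketchPipeline pipeline = new SketchPipeline(
                List.of(trim, lowerCase), List.of(trim, lowerCase), toyComparator, List.of(wrap));
        System.out.println(pipeline.compare("  Hello ", "hello"));
    }
}
```

The point of the sketch is the ordering: input filters normalize each document before comparison, while output filters reshape how the detected differences are presented.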
Document Comparator
The Document Comparator differs from the Pipelined Comparator in that key parts of the pipeline are pre-defined with specialist document comparison features; this pipeline is modified by adding filters at certain named 'extension points'.
As in the Pipelined Comparator, filters are managed as FilterStep and FilterChain objects in Java. These are added to the pipeline using the DocumentComparator's setExtensionPoint method. An alternative way to configure a Document Comparator is to use a Document Comparator Pipelines configuration file (DCP).
The Document Comparator is described in the Comparator Guide. More details on using DCP can be found in the guide DCP User Guide.
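The idea of a predefined pipeline with named extension points can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not the DocumentComparator API: the class ExtensionPointSketch, the slot names in the Slot enum, the built-in whitespace step, and the signature of the setExtensionPoint method shown here are all hypothetical and chosen only to show the shape of the mechanism.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical model for illustration only; not the XML Compare API.
class ExtensionPointSketch {
    // Illustrative slot names; the real extension point names are defined by the product.
    enum Slot { BEFORE_BUILT_IN_STEP, AFTER_BUILT_IN_STEP }

    private final Map<Slot, UnaryOperator<String>> extensions = new EnumMap<>(Slot.class);

    // Registers a user filter at a named slot, replacing any filter already set there.
    void setExtensionPoint(Slot slot, UnaryOperator<String> filter) {
        extensions.put(slot, filter);
    }

    // The built-in step is fixed: callers cannot remove or reorder it.
    private static String builtInStep(String document) {
        return document.replaceAll("\\s+", " ");
    }

    private String applyIfSet(Slot slot, String document) {
        return extensions.getOrDefault(slot, UnaryOperator.identity()).apply(document);
    }

    // Runs the predefined pipeline, calling user filters only at the named slots.
    String process(String input) {
        String document = applyIfSet(Slot.BEFORE_BUILT_IN_STEP, input);
        document = builtInStep(document);
        return applyIfSet(Slot.AFTER_BUILT_IN_STEP, document);
    }

    public static void main(String[] args) {
        ExtensionPointSketch sketch = new ExtensionPointSketch();
        sketch.setExtensionPoint(Slot.AFTER_BUILT_IN_STEP, s -> s.toUpperCase());
        System.out.println(sketch.process("a   b"));
    }
}
```

Compared with a free-form pipeline, this design limits where custom filters can run, which is what allows the predefined stages to stay intact.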
Data Comparator
The Data Comparator differs from the Document Comparator in that key parts of the pipeline are pre-defined with specialist data comparison features; this pipeline is also modified by adding filters at certain named 'extension points', and the extension points are simplified.
As in the Pipelined Comparator, filters are managed as FilterStep and FilterChain objects in Java. These are added to the pipeline using the DataComparator's setExtensionPoint method. An alternative way to configure a Data Comparator is to use a Data Comparator Pipelines configuration file (DTCP).
The Data Comparator is described in the Comparator Guide. More details on using DTCP can be found in the guide DTCP User Guide.
JAXP Pipeline Comparator (legacy)
A lower-level method for creating pipelines, now regarded as legacy but still useful for advanced users, is also available to Java developers; it exploits JAXP interfaces. JAXP Pipeline Examples introduces a set of examples available for download, and the paper Powering Pipelines with JAXP provides further details on using JAXP.
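For readers unfamiliar with JAXP pipelining, the following minimal sketch chains two XSLT filters using only the JDK's standard javax.xml.transform classes. It does not use or reproduce any XML Compare class, and no comparator stage is shown; the class name JaxpPipelineSketch and the two toy renaming stylesheets are assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative JAXP filter chain; the stylesheets are toy examples.
class JaxpPipelineSketch {
    // Builds an identity stylesheet that additionally renames one element.
    static String renameStylesheet(String from, String to) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + "<xsl:template match='@*|node()'>"
             + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
             + "</xsl:template>"
             + "<xsl:template match='" + from + "'>"
             + "<" + to + "><xsl:apply-templates select='@*|node()'/></" + to + ">"
             + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Pushes the input through two XSLT stages connected by SAX events.
    static String run(String inputXml) {
        try {
            // Assumes the default factory supports SAX pipelines, as the JDK's built-in one does.
            SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler first = factory.newTransformerHandler(
                    new StreamSource(new StringReader(renameStylesheet("a", "b"))));
            TransformerHandler second = factory.newTransformerHandler(
                    new StreamSource(new StringReader(renameStylesheet("b", "c"))));

            StringWriter out = new StringWriter();
            second.setResult(new StreamResult(out)); // last stage serializes to text
            first.setResult(new SAXResult(second));  // first stage feeds the second

            // An identity transformer parses the input and drives the first stage.
            Transformer parserStage = factory.newTransformer();
            parserStage.transform(new StreamSource(new StringReader(inputXml)), new SAXResult(first));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Pipeline failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<a>text</a>"));
    }
}
```

Each stage receives SAX events from the previous one, so intermediate documents are never serialized; only the final stage writes text.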
Pipeline Diagnostics
When there is a need to diagnose stages in a pipeline, a debugFiles mode is available in which the inputs and outputs of each filter are written to separate files; a file naming convention indicates where each 'debug file' fits into the pipeline. The debugFiles mode is set either by the setDebugFiles method call or with a Configuration Property (see Configuration Properties) in a DeltaXML Configuration file named 'deltaXMLConfig.xml'. Sample XML for setting this property is shown below:
Configuration
Low-level XML Compare functionality is configured using different methods according to how the functionality is implemented. These different methods are summarized below:
Configuration Summary
Config Properties | Comparator Features & Properties | Parser Features | Output Properties |
---|---|---|---|
Diagnostics Settings | DeltaV Format | Configure XInclude | Indentation |
Catalog Settings | Diff/Patch Mode | JAXP/SAX Features | Doctype (DocType is affected by the LexicalPreservation configuration property) |
Matching Algorithm | | | |
Ordering Priority | | | |
Configuration Properties
Configuration Properties control aspects of a comparison operation that may have a wider scope than standard features and properties. More details can be found in the Configuration Properties guide.
Comparator Features and Properties
Features and properties are managed using the API or a DXP/DCP/DTCP definition. The Features and Properties document describes the features and properties available.
Parser Features
Features for the Apache Xerces parser can be set either from the API or from a DXP/DCP/DTCP configuration. A DXP example can be found in the sample XInclude and XML Compare.
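To show what XInclude and JAXP/SAX parser features look like at the parser level, here is a minimal sketch using the JDK's standard SAXParserFactory. It does not go through XML Compare's API or a DXP/DCP/DTCP file; the class name ParserFeatureSketch and the particular feature chosen are assumptions for illustration.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

// Illustrative parser configuration with plain JAXP; not XML Compare API.
class ParserFeatureSketch {
    static SAXParserFactory configuredFactory() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // Ask the parser to resolve xi:include elements while parsing.
            factory.setXIncludeAware(true);
            // A standard SAX feature URI: do not expand external general entities.
            factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
            return factory;
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Parser feature not supported", e);
        }
    }

    public static void main(String[] args) {
        SAXParserFactory factory = configuredFactory();
        System.out.println("XInclude aware: " + factory.isXIncludeAware());
    }
}
```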
Output Properties
Output properties control the serializer of XML Compare's internal Saxon processor. They are set from the API or using DXP, DCP, or DTCP. An example of how DocType and indentation are set using DXP can be found in the Pipeline Configuration using DXP document.
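The effect of serializer output properties can be seen with the JDK's own JAXP Transformer, which accepts the same standard property names (indent, doctype-system) defined for XSLT serialization. This is a sketch only: it uses the JDK's default transformer rather than XML Compare's internal Saxon processor, and the class name OutputPropertySketch and the DTD name example.dtd are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative use of standard JAXP output properties; not XML Compare API.
class OutputPropertySketch {
    static String serialize(String xml) {
        try {
            // An identity transformer: it only parses and re-serializes the input.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            // "example.dtd" is a made-up system identifier for the DOCTYPE declaration.
            transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "example.dtd");

            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize("<root><child/></root>"));
    }
}
```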