Data Comparison

Introduction

For XML resources that are more data-centric, the comparison pipeline may involve design considerations and priorities that differ from those for document-centric resources (described in the previous section). This section outlines the comparison features that matter most in this context, although many of the features described in the User Guide to Document Comparison page may also apply.

For optimized processing of data-centric resources, two approaches are recommended. The first is to use the built-in features of XML Compare's data comparator, augmented with custom XSLT filters where required. The second, more complex approach is to use the pipelined comparator with a specially configured pipeline that exploits a set of custom XSLT filters.

Most of the features outlined in this section are incorporated into the data comparator. However, links to samples are included for cases where you wish to customize your own pipelined comparator; these samples also provide useful insight into how the capabilities built into the data comparator work. Note that not all features are enabled by default in the data comparator.

Comparing Large Datasets

When comparing large datasets there are some extra factors to consider; these are covered in the Comparing Large Files guide.

Ignoring Changes

For cases where changes in data are expected but are not considered significant, those changes can be 'ignored' in the processing pipeline. A technique for this is explained in the sample Ignoring Changes.
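The Ignoring Changes sample works inside XML Compare's processing pipeline. The Python below is not that technique but a conceptual sketch of the same idea, assuming each record has already been flattened into a dictionary of field values and that the set of ignorable field names (here a hypothetical `timestamp` field) is known in advance. Differences in ignored fields are left out of the report while every other difference is kept.

```python
def diff_fields(old: dict, new: dict, ignore: frozenset = frozenset()) -> dict:
    """Return {field: (old_value, new_value)} for every field whose value differs,
    skipping any field named in `ignore` (changes there are expected, not significant)."""
    fields = (old.keys() | new.keys()) - ignore
    return {f: (old.get(f), new.get(f)) for f in sorted(fields) if old.get(f) != new.get(f)}


# Hypothetical record: the timestamp is expected to change on every export.
before = {"id": "42", "total": "10.00", "timestamp": "2023-01-01T10:00"}
after = {"id": "42", "total": "12.50", "timestamp": "2023-01-02T09:30"}
print(diff_fields(before, after, ignore=frozenset({"timestamp"})))
# {'total': ('10.00', '12.50')}
```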

Attribute Splitting

XML inputs may use attributes to store combined information, such as sets of values. For such cases, Attribute Splitting enables a highly granular comparison of the attribute. Using the attribute splitting configuration, the data in a single attribute can be treated as either an ordered or an unordered list, and a separator character can be specified to control how the attribute string is tokenized into meaningful units for comparison. See the Attribute Splitting guide for detailed information.
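To make the idea concrete, here is a minimal Python sketch of tokenizing an attribute value and comparing the tokens in either mode. It is not XML Compare's implementation or configuration syntax; the function names, the default space separator, and the use of `difflib` for the ordered case are assumptions chosen for illustration.

```python
import difflib
from collections import Counter


def split_attribute(value: str, separator: str = " ") -> list[str]:
    """Tokenize an attribute string on a separator character, dropping empty tokens."""
    return [token for token in value.split(separator) if token]


def compare_split_attribute(old: str, new: str, separator: str = " ", ordered: bool = True):
    """Return (deleted, added) token lists between two attribute values.

    ordered=True treats the tokens as a sequence, so a reordering shows up as changes.
    ordered=False treats them as a multiset, so only membership and repeat counts matter.
    """
    left, right = split_attribute(old, separator), split_attribute(new, separator)
    if not ordered:
        removed, gained = Counter(left) - Counter(right), Counter(right) - Counter(left)
        return sorted(removed.elements()), sorted(gained.elements())
    deleted, added = [], []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, left, right).get_opcodes():
        if tag in ("delete", "replace"):
            deleted.extend(left[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(right[j1:j2])
    return deleted, added


print(compare_split_attribute("red green blue", "blue red green", ordered=False))  # ([], [])
print(compare_split_attribute("red green blue", "blue red green", ordered=True))   # (['blue'], ['blue'])
print(compare_split_attribute("a,b,c", "a,c,d", separator=",", ordered=False))     # (['b'], ['d'])
```

Comparing the unordered case as a multiset (`Counter`) rather than a set means a repeated token that appears once fewer in the new value is still reported.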

Detecting and Handling Moves

Sometimes, when an element is deleted from one place in an XML data file and an identical element is inserted in a different place, you may wish to show this change as a move rather than as a deletion and an insertion. See the Detecting and Handling Moves guide for details of how to use this capability in the data comparator.
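The core idea, pairing a deletion with an identical insertion elsewhere, can be sketched in a few lines of Python. This is a conceptual illustration only, not how the data comparator detects moves: it assumes each element has already been reduced to a serialized string, so that "identical" simply means string equality, and it uses `difflib` to find the deletions and insertions.

```python
import difflib


def classify_changes(old: list[str], new: list[str]):
    """Pair each deleted item with an identical inserted item and report the pair as a move.

    Items stand in for serialized elements. Returns (moves, deletions, insertions), where
    moves are (item, old_position, new_position) and the others are (position, item).
    """
    deleted, inserted = [], []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, old, new).get_opcodes():
        if tag in ("delete", "replace"):
            deleted.extend((i, old[i]) for i in range(i1, i2))
        if tag in ("insert", "replace"):
            inserted.extend((j, new[j]) for j in range(j1, j2))
    moves = []
    for old_pos, item in list(deleted):  # iterate over a copy while removing from `deleted`
        match = next((ins for ins in inserted if ins[1] == item), None)
        if match is not None:
            deleted.remove((old_pos, item))
            inserted.remove(match)
            moves.append((item, old_pos, match[0]))
    return moves, deleted, inserted


old = ['<row id="1"/>', '<row id="2"/>', '<row id="3"/>']
new = ['<row id="2"/>', '<row id="3"/>', '<row id="1"/>']
print(classify_changes(old, new))
# ([('<row id="1"/>', 0, 2)], [], [])
```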

Ordered and Orderless Data

For cases where data lists should be treated as either ordered or unordered, the subtree processing configuration can be used to specify where in the XML document a list should be considered to have an order. The Unordered Comparison within a Subtree section of the Subtree Processing guide details how to use this capability.
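The difference between the two treatments can be illustrated with Python's standard `xml.etree.ElementTree`. This sketch is not the subtree processing configuration itself; the `signature` helper is a hypothetical, simplified notion of element identity that ignores tail text and always compares nested children in order.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def signature(elem: ET.Element) -> tuple:
    """Hashable summary of an element: tag, sorted attributes, trimmed text, and children in order."""
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(signature(child) for child in elem),
    )


def children_match(old: ET.Element, new: ET.Element, ordered: bool) -> bool:
    """Compare the children of two elements as an ordered sequence or as an unordered multiset."""
    old_sigs = [signature(child) for child in old]
    new_sigs = [signature(child) for child in new]
    return old_sigs == new_sigs if ordered else Counter(old_sigs) == Counter(new_sigs)


old = ET.fromstring("<phones><phone>123</phone><phone>456</phone></phones>")
new = ET.fromstring("<phones><phone>456</phone><phone>123</phone></phones>")
print(children_match(old, new, ordered=True))   # False: the order changed
print(children_match(old, new, ordered=False))  # True: same members
```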

Numeric Tolerances

When comparing floating-point numbers, there may be a requirement to ignore value differences within a specified tolerance. This tolerance can be implemented via output filters based on existing filter resources included in XML Compare; Numeric Tolerances is a worked example of this.
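The decision such a filter has to make for each changed value can be sketched in Python. This is only an illustration of the tolerance rule, not the XSLT filter resources shipped with XML Compare; the absolute and relative tolerance parameters and the fallback to exact text comparison for non-numeric values are assumptions.

```python
import math


def within_tolerance(old: str, new: str, abs_tol: float = 0.0, rel_tol: float = 0.0) -> bool:
    """Treat two text values as unchanged if they parse as numbers within the given tolerance.

    Values that are not numeric fall back to an exact string comparison.
    """
    try:
        a, b = float(old), float(new)
    except ValueError:
        return old == new
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


print(within_tolerance("3.14159", "3.1416", abs_tol=1e-3))  # True: differs by 0.00001
print(within_tolerance("3.14", "3.15", abs_tol=1e-3))       # False: differs by 0.01
print(within_tolerance("1.0", "1.00"))                       # True: textually different, numerically equal
```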