Image and Binary Comparison

Introduction

Many document formats provide mechanisms to refer to or include images, and possibly other forms of binary content, in documents. A typical example is the img element in HTML, where the src attribute is usually a URI that refers to an image. Many applications and publishing processes that handle documentation formats support a range of image types, for example .gif, .jpg and .png.

Referencing

Most formats support both relative and absolute references. For example, it is possible to refer to an image using a relative URI:

CODE
<img src="diagram1.png"/>

In this case the location of the image is relative to the location of the file containing the image element and src URI.

It is also possible to use an absolute URI. These come in various forms, or 'schemes', for example:

CODE
<img src="http://www.company.com/images/diagram2.gif" alt="flow diagram"/>

or

CODE
<img src="file:///c:/Users/Joe/images/diagram3.png" alt="flow diagram"/>

Comparing Images

Looking only at the XML files, it is possible to determine when the value of an attribute has changed. However, a more in-depth comparison of referenced images can be performed when:

  1. There is access to a filesystem or similar hierarchical store to resolve file URIs.

  2. The referenced files can be read so that their contents can be analysed.

With filesystem and similar access it is possible to resolve a relative URI into an absolute URI.

Consider the following example:

  • file1.html is located inside the /Users/Joe/Documents directory and contains the following image reference:

    CODE
    <img src="images/pic1.png"/>

     

  • file2.html is located in the same directory, but contains

    CODE
    <img src="../Documents/images/pic1.png"/>

If you are given just the two files without any knowledge of their location, it's only possible to say that the src attribute has changed. However, with the knowledge of where the files are located (Joe's Documents directory) it is possible to resolve the URIs and determine that both src attributes are actually referring to the same file.
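The resolution step in this example can be sketched with the standard java.net.URI class. This is only an illustration of the idea, not the product's implementation: the class name ImageRefResolver and the methods resolve and sameLocation are hypothetical, and the file: URIs are assumed from the directory named in the example.

```java
import java.net.URI;

public class ImageRefResolver {

    /** Resolves an image reference against the URI of the document that contains it. */
    public static URI resolve(URI documentUri, String src) {
        // resolve() applies relative references to the base; an absolute src is kept as it is.
        // normalize() removes "." and ".." segments so that equivalent paths compare equal.
        return documentUri.resolve(src).normalize();
    }

    /** True when both references resolve to the same absolute location. */
    public static boolean sameLocation(URI docA, String srcA, URI docB, String srcB) {
        return resolve(docA, srcA).equals(resolve(docB, srcB));
    }

    public static void main(String[] args) {
        // Assumed locations of the two input files from the example above.
        URI file1 = URI.create("file:///Users/Joe/Documents/file1.html");
        URI file2 = URI.create("file:///Users/Joe/Documents/file2.html");

        System.out.println(resolve(file1, "images/pic1.png"));
        System.out.println(resolve(file2, "../Documents/images/pic1.png"));
        System.out.println(sameLocation(file1, "images/pic1.png",
                                        file2, "../Documents/images/pic1.png"));
    }
}
```

Without the two base URIs the only available check is a string comparison of the src values, which reports a change here even though both references point at the same file.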

The example above demonstrates that a change to an image attribute does not necessarily imply a change to the image. The converse is also true: an attribute value can be unchanged while the image itself changes. This can occur, for example, when the two XML input files are stored in different directories and each has its own associated images, referenced with local relative references.

To summarize our processing approach:

  1. If we don't have access to the filesystem or navigation tree, we can only compare attribute values.

  2. When we do have tree access, we resolve the references against the base locations of the two input files.

    1. When the references resolve to the same location, we know the image is the same at that point. The comparison result will contain the URI given in the input B document, with the proviso that when relative references are used the result file should be located in the tree such that the relative references still work.

    2. If the references resolve to different locations, the images could be identical copies or they could be different, so we perform a byte-by-byte comparison of the images. If every byte is identical, the images are identical and we only need to provide one of them in the result. If they differ, or if we cannot fully compare them byte-wise, we report them as changed and provide both image elements in the result (one marked A, or deleted, and the other marked B, or added).

We have tried to provide a conservative implementation: we always assume change unless we can be absolutely certain that the images or other binary content are identical. At the same time, we want the implementation to be fast. Here are some implementation notes:

  • If we have filesystem access, we can ask for the sizes of the files without reading their contents; if the sizes differ, we assume the files are different.

  • If there are any failures in the process (file permissions etc.), we assume the worst: that the files differ.

  • The byte comparison extension function is fail-fast: it reports not-equal as soon as the first differing byte is found. Correspondingly, it can only report equal once the last bytes of both files have been read.
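The three notes above can be combined into a single check, sketched here with the standard java.nio.file API. This is a minimal illustration under stated assumptions, not the extension function shipped with the product: the class name BinaryCompare and the method identical are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class BinaryCompare {

    /**
     * Returns true only when both files were read to the end and every byte matched.
     * Any doubt (size mismatch, missing file, I/O or permission failure) yields false.
     */
    public static boolean identical(Path a, Path b) {
        try {
            // Cheap check first: files of different sizes cannot be identical.
            if (Files.size(a) != Files.size(b)) {
                return false;
            }
            try (InputStream inA = new BufferedInputStream(Files.newInputStream(a));
                 InputStream inB = new BufferedInputStream(Files.newInputStream(b))) {
                while (true) {
                    int byteA = inA.read();
                    int byteB = inB.read();
                    if (byteA != byteB) {
                        return false; // fail fast: first difference, or one stream ended early
                    }
                    if (byteA == -1) {
                        return true;  // both streams ended together with no difference found
                    }
                }
            }
        } catch (IOException | SecurityException e) {
            // Conservative choice: a failure is treated as "the files differ".
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path first = Files.createTempFile("image-a", ".bin");
        Path second = Files.createTempFile("image-b", ".bin");
        Files.write(first, new byte[] {1, 2, 3, 4});
        Files.write(second, new byte[] {1, 2, 3, 4});
        System.out.println(identical(first, second));
        Files.delete(first);
        Files.delete(second);
    }
}
```

On Java 12 or later, Files.mismatch(a, b) returning -1 expresses a similar check; the explicit loop is used here only to make the fail-fast behaviour visible.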

Running the sample

Download the sample from here: https://bitbucket.org/deltaxml/imagecompare/src

The README.md file gives instructions on how to run the sample.

Binary Image Configuration

Image comparison can be configured using the imageConfiguration element in the DCP. It includes the following elements:

  • isEnabled - set to true or false to enable or disable binary image comparison. The default is true.

  • imageXpath - an XPath to the attribute, on an image element, that holds the image location.

CODE
<standardConfig>
    <imageConfiguration>
        <isEnabled literalValue="true"/>
        <imageXpath literalValue="*:img/@src"/>
    </imageConfiguration>
</standardConfig>

Image comparison is enabled by default, but it is not triggered unless an XPath is specified for imageXpath.

To configure image comparison through the Java API, use the following:

CODE
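// dc is assumed to be the comparator object being configured, created elsewhere.
// Enable binary image comparison (already the default).
dc.getImageConfiguration().setEnabled(true);
// XPath selecting the attribute that holds the image location.
dc.getImageConfiguration().setImageXPath("*:img/@src");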

If SVG comparison is enabled, SVG images are compared using SVG comparison; otherwise they are compared using binary image comparison.

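Example comparison - same path, different image

In this example both documents contain the same relative reference, but they are stored in different directories (path1 and path2 in the result), each with its own img/image.png.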

documentA.xml

CODE
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>Image Example</title>
    </head>
    <body>
        <img src="img/image.png" alt="Sample Image"/>
    </body>
</html>

documentB.xml

CODE
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>Image Example</title>
    </head>
    <body>
        <img src="img/image.png" alt="Sample Image"/>
    </body>
</html>

result.xml

CODE
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html >
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1" xmlns:dxx="http://www.deltaxml.com/ns/xml-namespaced-attribute" xmlns:dxa="http://www.deltaxml.com/ns/non-namespaced-attribute" deltaxml:deltaV2="A!=B" deltaxml:content-type="full-context" deltaxml:version="2.0">
    <head deltaxml:deltaV2="A=B">
        <title>Image Example</title>
    </head>
    <body deltaxml:deltaV2="A!=B">
        <img deltaxml:deltaV2="A" src="file:/Users/ryan.kirwan/Documents/DeltaXML-XML-Compare-16_1_0_j/samples/imagecompare/resources/path1/img/image.png" alt="Sample Image"/>
        <img deltaxml:deltaV2="B" src="file:/Users/ryan.kirwan/Documents/DeltaXML-XML-Compare-16_1_0_j/samples/imagecompare/resources/path2/img/image.png" alt="Sample Image"/>
    </body>
</html>