Input Multithreading
What is input multithreading?
Input multithreading is a feature of https://apidocs.deltaxml.com/xml-compare/current/docs/api/com/deltaxml/cores9api/DocumentComparator.html, https://apidocs.deltaxml.com/xml-compare/current/docs/api/com/deltaxml/cores9api/PipelinedComparatorS9.html and DataComparator. With input multithreading, the two input pipelines of a comparison run simultaneously rather than one after the other. This reduces the time taken for a comparison to run by using more of a machine's cores.
How to turn input multithreading off?
Input multithreading is turned on by default - you do not have to explicitly turn it on. If you wish to turn off input multithreading, you can do so via a system property:
When using non-REST Server command lines, com.deltaxml.xmlcompare.api.inputMultiThreading should be set to false. You can specify this using the following Java property via the command line:
-Dcom.deltaxml.xmlcompare.api.inputMultiThreading=false
This property sets the default value for input multithreading. If you provide a ThreadingConfig with input multithreading set to true, then that will take precedence and input multithreading will still be on.
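The precedence rule can be pictured with a small, library-independent sketch. The class and method names below (InputThreadingDefault, resolve) are hypothetical and are not part of the DeltaXML API; only the property name comes from this page. It assumes that an explicit setting, when present, always wins over the system property, and that the default is on when the property is unset.

```java
import java.util.Optional;

/** Hypothetical illustration only; not part of the DeltaXML API. */
final class InputThreadingDefault {
    // Property name taken from the documentation above.
    static final String PROPERTY = "com.deltaxml.xmlcompare.api.inputMultiThreading";

    /**
     * Resolves the effective setting. An explicit value (standing in for one
     * supplied through a ThreadingConfig) takes precedence; otherwise the
     * system property is used, defaulting to true when it is unset.
     */
    static boolean resolve(Optional<Boolean> explicitSetting) {
        boolean propertyDefault =
            Boolean.parseBoolean(System.getProperty(PROPERTY, "true"));
        return explicitSetting.orElse(propertyDefault);
    }
}
```

With the property set to false, resolve(Optional.empty()) would return false, while resolve(Optional.of(true)) would still return true.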
When starting the REST Server from the command line, com.deltaxml.xmlcompare.restserver.inputMultiThreading should be set to false. You can specify this using the following Java property via the command line:
-Dcom.deltaxml.xmlcompare.restserver.inputMultiThreading=false
Switching Input Multithreading on and off on Comparator instances using the Java API
You can also turn off input multithreading via the Java API:
For DocumentComparator
DocumentComparator documentComparator = new DocumentComparator();
// Create a new ThreadingConfig with input multithreading set to false.
ThreadingConfig threadingConfig = ThreadingConfig.createInstance(false, Optional.empty());
documentComparator.setThreadingConfig(threadingConfig);
For PipelinedComparatorS9
PipelinedComparatorS9 pipelinedComparator = new PipelinedComparatorS9();
// Create a new ThreadingConfig with input multithreading set to false.
ThreadingConfig threadingConfig = ThreadingConfig.createInstance(false, Optional.empty());
pipelinedComparator.setThreadingConfig(threadingConfig);
How do I use input multithreading correctly?
Input multithreading can noticeably reduce the time taken to perform a comparison. However, multithreaded code brings extra considerations. Check that your use of the API follows the guidelines on this page, to prevent unwanted behaviour such as deadlocks and intermittent bugs from appearing in your program. With careful implementation, input multithreading can significantly increase the efficiency of your comparisons.
If you are seeing deadlocks or inconsistent results, please try turning off input multithreading to see if that solves your problem. If this solves your problem, you may be misusing the API. Please see the information below and the general page Multithreading with the Java API to help.
If you are having issues with input multithreading and you would like us to help, please open a support case and send us a small isolated sample that reproduces the problem.
Avoiding pitfalls when using input multithreading
You need to make sure that you are following the general guidelines for multithreading and comparators in Multithreading with the Java API. Input multithreading still requires those guidelines to be followed.
In previous versions without input multithreading, it was possible to rely on the input pipeline for A finishing before the input pipeline for B was started. However, this is not the case when input multithreading is on, as the execution of the two input pipelines will occur simultaneously. This can cause problems if your pipeline relies on this behaviour.
Generally, if you are supplying Java code via the Java API to https://apidocs.deltaxml.com/xml-compare/current/docs/api/com/deltaxml/cores9api/DocumentComparator.html, https://apidocs.deltaxml.com/xml-compare/current/docs/api/com/deltaxml/cores9api/PipelinedComparatorS9.html or DataComparator, you need to make sure that the code is thread safe if it is executed in the context of either of the input pipelines.
One way in which we prevent thread safety issues is by turning off input multithreading when we detect configurations that are very likely to be thread unsafe. For example, we detect when an instance of an org.xml.sax.helpers.XMLFilterImpl subclass is supplied to us via https://apidocs.deltaxml.com/xml-compare/current/docs/api/com/deltaxml/cores9api/FilterStepHelper.html#newFilterStep-org.xml.sax.helpers.XMLFilterImpl-java.lang.String-. Since we cannot rely on the code in the supplied org.xml.sax.helpers.XMLFilterImpl instance behaving in a thread safe way, we issue a warning and turn off input multithreading if this API is used. It is preferable to use the alternative class-based API https://apidocs.deltaxml.com/xml-compare/current/docs/api/com/deltaxml/cores9api/FilterStepHelper.html#newFilterStep-java.lang.Class-java.lang.String-, which creates a separate instance of the org.xml.sax.helpers.XMLFilterImpl subclass for each input pipeline as needed. However, it is still possible to write an org.xml.sax.helpers.XMLFilterImpl subclass that is not thread safe when used in this way. One way would be to share mutable object references between instances, for example by using static variables. This is not thread safe.
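The point about static variables can be illustrated with a minimal sketch. ElementCountingFilter is a hypothetical filter written for this page, not a class shipped with the product. It keeps its mutable state in an instance field, so separate instances created for each input pipeline do not interfere with one another; the commented-out static field shows the pattern to avoid.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter that counts the elements passing through it. */
class ElementCountingFilter extends XMLFilterImpl {
    // Unsafe alternative (avoid): a static field would be shared by the
    // instances created for both input pipelines.
    // private static int sharedCount;

    // Per-instance state: each pipeline's instance has its own counter.
    private int elementCount;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        elementCount++;
        // Pass the event on to the next handler in the pipeline, if any.
        super.startElement(uri, localName, qName, atts);
    }

    public int getElementCount() {
        return elementCount;
    }
}
```

The class is declared without the public modifier only so that the snippet compiles in any file; whether the class-based API requires a public class with a no-argument constructor is an assumption to check against the API documentation.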
Another case in which code you supply via the Java API is executed by both input pipelines concurrently is if you supply a URIResolver or an EntityResolver via https://apidocs.deltaxml.com/xml-compare/current/docs/api/com/deltaxml/cores9api/FilterStepHelper.html#FilterStepHelper-net.sf.saxon.s9api.Processor-org.xml.sax.EntityResolver-javax.xml.transform.URIResolver-, https://apidocs.deltaxml.com/xml-compare/current/docs/api/com/deltaxml/cores9api/PipelinedComparatorS9.html#setURIResolver-javax.xml.transform.URIResolver-, https://apidocs.deltaxml.com/xml-compare/current/docs/api/com/deltaxml/cores9api/DocumentComparator.html#setURIResolver-javax.xml.transform.URIResolver-, https://apidocs.deltaxml.com/xml-compare/current/docs/api/com/deltaxml/cores9api/PipelinedComparatorS9.html#setEntityResolver-org.xml.sax.EntityResolver-, https://apidocs.deltaxml.com/xml-compare/current/docs/api/com/deltaxml/cores9api/DocumentComparator.html#setEntityResolver-org.xml.sax.EntityResolver- or their two-argument equivalents, or if you construct your own FilterStepHelper via https://apidocs.deltaxml.com/xml-compare/current/docs/api/com/deltaxml/cores9api/FilterStepHelper.html#FilterStepHelper-net.sf.saxon.s9api.Processor-org.xml.sax.EntityResolver-javax.xml.transform.URIResolver-. The code you supply in these classes may be called by the two input processing threads concurrently, so it needs to be written in a thread safe way, or you need to switch off input multithreading as described above in "How to turn input multithreading off?".
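As an illustration of a resolver that tolerates concurrent calls, here is a minimal sketch using only JDK classes. CachingUriResolver is a hypothetical name, and the caching behaviour is an assumption chosen to show shared state handled safely; it is not how the product resolves URIs.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical resolver whose only shared state is a concurrent map. */
final class CachingUriResolver implements URIResolver {
    // ConcurrentHashMap allows safe access from both input pipeline threads.
    private final Map<String, String> resolvedSystemIds = new ConcurrentHashMap<>();

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        String key = base + "|" + href;
        String systemId =
            resolvedSystemIds.computeIfAbsent(key, k -> toSystemId(href, base));
        // A fresh Source is created per call, so no Source is shared between threads.
        return new StreamSource(systemId);
    }

    // Resolves href against base when a base URI is available.
    private static String toSystemId(String href, String base) {
        if (base == null || base.isEmpty()) {
            return URI.create(href).toString();
        }
        return URI.create(base).resolve(href).toString();
    }
}
```

A plain HashMap in place of the ConcurrentHashMap would make this class unsafe once both input pipelines call resolve at the same time.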
When supplying XSLT via https://apidocs.deltaxml.com/xml-compare/current/docs/api/com/deltaxml/cores9api/FilterStepHelper.html#newFilterStep-java.io.File-java.lang.String- or one of the equivalent methods, you should take care to ensure that any Java extension functions called within the XSLT are thread safe. If you do not make use of Java extension functions within your XSLT, then you should not have to make any changes to your XSLT to make it work correctly with input multithreading.
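As a sketch of what a thread safe extension function might look like, the hypothetical class below exposes a static method that could be called from XSLT. The class name, method name and date pattern are all invented for illustration. It relies on java.time.format.DateTimeFormatter, which is immutable and thread safe, rather than a shared java.text.SimpleDateFormat, which is not.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical extension-function holder; not part of the product. */
final class DateExtensions {
    // Safe to share between threads: DateTimeFormatter is immutable.
    private static final DateTimeFormatter OUTPUT_FORMAT =
        DateTimeFormatter.ofPattern("dd/MM/uuuu");

    private DateExtensions() {
    }

    /** Converts an ISO date such as 2024-03-05 to dd/MM/yyyy form. */
    public static String normalise(String isoDate) {
        // No mutable shared state is read or written here.
        return LocalDate.parse(isoDate).format(OUTPUT_FORMAT);
    }
}
```

Because the method touches no mutable shared state, both input pipelines can call it at the same time without coordination.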
Advanced configuration of input multithreading
For input multithreading to function, the comparator object needs to be able to send the processing of one of the input pipelines to another thread. By default https://apidocs.deltaxml.com/xml-compare/current/docs/api/com/deltaxml/cores9api/DocumentComparator.html, https://apidocs.deltaxml.com/xml-compare/current/docs/api/com/deltaxml/cores9api/PipelinedComparatorS9.html and DataComparator will create their own single thread pool to do this, and it will automatically be shut down when the comparator is garbage collected after it goes out of scope.
It may be that you wish to control how input multithreading creates and pools threads. Threads can be an expensive resource and you may wish to minimise how many are created, as well as how and when. For example, you may wish to use the new ‘virtual’ threads available in more recent JDKs, or you may wish to share a pool of threads across comparators. To configure how threads are managed, you can pass an optional ExecutorService when creating a new ThreadingConfig via ThreadingConfig.createInstance. The comparator will then submit any work to the provided ExecutorService.
Care must be taken when the ExecutorService you provide is shared. If there is not enough ‘capacity’ in an ExecutorService (for example if created using Executors.newFixedThreadPool(...)), the comparison will hang until there is ‘capacity’ available. When executing, a comparator is already running inside one thread, let us call it the ‘main thread’, so it will only require one extra thread to run the second input pipeline concurrently. However, the ‘main thread’ will block once it has completed its input processing whilst it waits for the extra thread to complete. If the ExecutorService you supply has run out of threads, for example if each one of them is being used as a ‘main thread’ to execute a comparison, there may never be enough extra threads and comparisons will deadlock.
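One way to avoid this kind of starvation is to keep the threads that run comparisons separate from the threads that run the second input pipeline. The sketch below uses only java.util.concurrent and stand-in tasks; SeparatePoolsSketch, compare and runAll are hypothetical names, and no DeltaXML classes are involved. With the real API, the input pool would presumably be passed as the second argument of ThreadingConfig.createInstance wrapped in an Optional; that usage is inferred from the examples earlier on this page rather than confirmed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for running several comparisons concurrently. */
final class SeparatePoolsSketch {
    /**
     * Stand-in for one comparison: input B is submitted to inputPool, input A
     * is processed on the calling ('main') thread, which then blocks until B
     * has finished.
     */
    static String compare(ExecutorService inputPool, int id) throws Exception {
        Future<String> inputB = inputPool.submit(() -> "B" + id);
        String inputA = "A" + id;
        return inputA + "+" + inputB.get(); // the 'main thread' waits here
    }

    /** Runs jobCount stand-in comparisons and returns their results in order. */
    static List<String> runAll(int jobCount) throws Exception {
        // 'Main' threads and input pipeline threads come from different pools,
        // so main threads cannot use up the capacity the input pipelines need.
        ExecutorService mainPool = Executors.newFixedThreadPool(2);
        ExecutorService inputPool = Executors.newFixedThreadPool(2);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (int i = 0; i < jobCount; i++) {
                final int id = i;
                pending.add(mainPool.submit(() -> compare(inputPool, id)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                // The timeout turns an unexpected hang into a visible failure.
                results.add(future.get(10, TimeUnit.SECONDS));
            }
            return results;
        } finally {
            mainPool.shutdownNow();
            inputPool.shutdownNow();
        }
    }
}
```

If compare instead submitted input B to the same fixed pool that its own 'main thread' came from, two simultaneous comparisons on a two-thread pool would each wait for a thread that never becomes free, which is the deadlock described above.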