Dealing with Slow XSLT Transformations

I have some XSLT stylesheets that I use to integrate information from Amazon with the reading list on my website. This weekend I decided to create another list using Amazon’s “People Who Bought this Book also Bought” feature. After a couple of generations of downloading books that were related to books I had read, I ended up with an XML file with about 1,500 books in it. I tried running the transform using Xerces and it took 13 hours.

Obviously this wasn’t going to work, so I refactored parts of the XSLT and was able to cut that time in half. Six hours still seemed far too long. I considered doing away with the stylesheets and doing the transforms by hand, using Java to write directly to a file. After some searching I discovered that apache.org has a good solution for this problem as part of the Xalan-J project. Basically, it lets you take an XSLT stylesheet and compile it into Java bytecode called a “translet”. You can then run the transform using Xalan and the compiled translet, which makes things run much quicker. This appears to be because the transform no longer needs to parse and interpret the XSLT file at run time, and because of other optimizations made by the compiler.
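
The compile-once idea can also be sketched from Java instead of the command line. The example below is a minimal sketch, not the setup used in this post. It relies on the standard JAXP API (javax.xml.transform), where a Templates object holds a compiled stylesheet that can be reused for many input documents. The JDK's built-in transformer is itself derived from XSLTC, and with Xalan-J on the classpath its XSLTC factory (org.apache.xalan.xsltc.trax.TransformerFactoryImpl) could be selected instead. The class name, the tiny stylesheet and the book titles are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CompiledStylesheetDemo {

    // Hypothetical stand-in stylesheet: prints each book title on its own line.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/books'>"
      + "<xsl:for-each select='book'>"
      + "<xsl:value-of select='@title'/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Compile the stylesheet once. The returned Templates object is reusable.
    static Templates compile(String xsl) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        return factory.newTemplates(new StreamSource(new StringReader(xsl)));
    }

    // Run one transform with the already compiled stylesheet.
    static String transform(Templates compiled, String xml) throws TransformerException {
        Transformer transformer = compiled.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        Templates compiled = compile(XSL); // the expensive step, done only once
        System.out.print(transform(compiled,
            "<books><book title='Dune'/><book title='Emma'/></books>"));
        System.out.print(transform(compiled,
            "<books><book title='Ulysses'/></books>"));
    }
}
```

The point of the split into compile and transform is that the cost of processing the stylesheet is paid once, however many XML files are run through it afterwards.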

To do this you must have xercesImpl.jar, xml-apis.jar, and xalan.jar in your classpath. If you have problems, be sure to double-check this, because the error message you get doesn’t always indicate that a required class can’t be found. Use the Xalan command line program to turn your XSLT file into a translet.
java org.apache.xalan.xsltc.cmdline.Compile myXSLT.xsl

This will produce several class files:

myXSLT.class
myXSLT$0.class
myXSLT$1.class
myXSLT$2.class
myXSLT$3.class
myXSLT$4.class

To transform XML using this bytecode, pass the name of the compiled translet class (not the name of the .xsl file) to the Transform command:
java org.apache.xalan.xsltc.cmdline.Transform myFile.xml myXSLT
This will run the transform using the bytecode and write the results to the screen (unless you are redirecting the output from within the XSL file). To put the output in a file:
java org.apache.xalan.xsltc.cmdline.Transform myFile.xml myXSLT > myFile.html

One thing I discovered was that the redirect:write tag in my XSL file would not compile into bytecode. I switched to the xsltc:output tag instead and it worked, but the compiler always complained because I was using the select attribute instead of the file attribute. At first I thought I could just change the attribute name. However, select lets you use a variable, while file takes its value literally. So <xsltc:output file="$filename"> will write to a file named “$filename” instead of to whatever name is stored in the variable $filename.

There is a way around this: wrap the variable in curly braces so that it is evaluated as an attribute value template, as in <xsltc:output file="{$filename}">
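
The difference between the two forms can be seen without any Xalan extension at all, because curly braces in an attribute are a general XSLT feature (an attribute value template). The following is a small sketch under that assumption. It uses two plain output elements, hypothetically named literal and evaluated, in place of xsltc:output, and runs on the JDK's built-in transformer. The class name and the value books.html are also made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class AttributeValueTemplateDemo {

    // Hypothetical stylesheet: the same variable is referenced with and without braces.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<xsl:variable name='filename' select=\"'books.html'\"/>"
      + "<targets>"
      + "<literal file='$filename'/>"     // no braces: the text $filename is copied as-is
      + "<evaluated file='{$filename}'/>" // braces: the variable's value is substituted
      + "</targets>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Apply the stylesheet to a dummy document and return the serialized result.
    static String run() throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader("<doc/>")),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Print the result so the two file attributes can be compared.
        System.out.println(run());
    }
}
```

In the printed result, the literal element keeps the text $filename in its file attribute, while the evaluated element carries books.html.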

There are other options for the compile step, including -j, which puts the class files into a jar, and some options that let you compile multiple files at the same time.

When I finished figuring everything out, I tried the same transform that had previously taken 13 hours on a 1,500 item XML file. This time I used a 5,000 item file, and it completed in 2 to 3 hours. If you use large XML files as input for XSLT transformations, I highly recommend looking into translets.


2 Replies to “Dealing with Slow XSLT Transformations”

  1. I got an error when I ran the compile command:
    Compiler errors:
    org/apache/xml/serializer/Encodings
    After I added serializer.jar to the classpath, it started working and successfully created the class files in my folder.

    I then ran the command below to transform:
    java org.apache.xalan.xsltc.cmdline.Transform Employee.xml SGEFP_XSLT.xslt > output.xml
    But I get this exception:
    Translet errors:Cannot find class ‘SGEFP_XSLT.xslt’.

    Can you help me resolve this error?

    1. Sorry, it has been over a decade since I worked on this so I may not be much help. It sounds like the class files may be getting put somewhere it can’t find them. Are you sure the resulting class files are going to be on the classpath when you run the Transform?
