I have some XSLT stylesheets that I use to integrate information from Amazon with my reading list on my website. This weekend I decided to create another list using Amazon’s “People Who Bought this Book also Bought” feature. After a couple generations of downloading books that were related to books I read, I ended up with an XML file with about 1,500 books in it. I tried running the transform using Xerces and it took 13 hours.
Obviously this wasn’t going to work, so I refactored parts of the XSLT and was able to cut that time in half. Six hours still seemed far too long. I considered doing away with the stylesheets and doing the transforms manually, using Java to write directly to a file. After some searching I discovered that apache.org has a good solution for this problem as part of the Xalan-J project. Basically it allows you to take an XSLT stylesheet and compile it into Java bytecode called a “translet”. You can then run the transform using Xalan and the compiled translet, which makes things run much quicker. It appears that this is because the transform no longer needs to parse and interpret the XSLT file on every run, and because of other optimizations the compiler applies.
To do this you must have xercesImpl.jar, xml-apis.jar, and xalan.jar in your classpath. If you have problems, be sure to double-check this, because the error message you get doesn’t always indicate that a required class could not be found. Use the Xalan command line program to turn your XSLT file into a translet.
java org.apache.xalan.xsltc.cmdline.Compile myXSLT.xsl
This will produce several class files:
To transform XML using this bytecode, do the following. Note that the second argument is the name of the compiled translet class, not the stylesheet file:
java org.apache.xalan.xsltc.cmdline.Transform myFile.xml myXSLT
This will run the transform using the bytecode and output the results to the screen (unless you are redirecting the output from within the XSL file). To put it in a file:
java org.apache.xalan.xsltc.cmdline.Transform myFile.xml myXSLT > myFile.html
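Since the error messages don't always say which class is missing, a small check like the one below can help narrow down classpath problems before running the commands above. This is only a sketch: the class name ClasspathCheck and the helper isLoadable are made up for illustration, and the Xerces and DOM class names are my assumption about one representative class from each jar, not something the Xalan tools require you to check.

```java
public class ClasspathCheck {
    // The first two names come from the commands above. The last two are
    // assumed representatives of the Xerces and XML APIs jars (hypothetical choice).
    static final String[] REQUIRED = {
        "org.apache.xalan.xsltc.cmdline.Compile",
        "org.apache.xalan.xsltc.cmdline.Transform",
        "org.apache.xerces.parsers.SAXParser",
        "org.w3c.dom.Document" // also bundled with the JDK itself
    };

    // Returns true if the named class can be found on the current classpath.
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : REQUIRED) {
            System.out.println((isLoadable(name) ? "found    " : "MISSING  ") + name);
        }
    }
}
```

Run it with the same classpath you use for the Compile and Transform commands; any line marked MISSING points at a jar that is not being picked up.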
One thing I discovered was that the redirect:write tag in my XSL file would not compile into bytecode. I switched to the xsltc:output tag instead and it worked, but the compiler always complained because I was using the select attribute instead of the file attribute. At first I thought I could just change the attribute name. However, select allows you to use a variable name, but file does not. So <xsltc:output file="$filename"> will output to a file named “$filename” instead of whatever is in the variable $filename.
There is a way around this by using
There are other options for the compile step, including -j, which puts the class files into a jar, and some options that let you compile multiple stylesheets at the same time.
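The compile step can also be driven from Java rather than the command line. The sketch below uses the standard JAXP API: newTemplates compiles the stylesheet once, and the resulting Templates object is reused for each input document. It assumes whatever TransformerFactory is the default on your JVM; to force Xalan's compiler specifically, you would presumably set the javax.xml.transform.TransformerFactory system property to org.apache.xalan.xsltc.trax.TransformerFactoryImpl with xalan.jar on the classpath. The class name TransletSketch and the tiny inline stylesheet are invented for illustration and are not my real reading-list stylesheets.

```java
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

public class TransletSketch {
    // Compile the stylesheet once; the returned Templates object is reusable.
    static Templates compile(String xslt) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        return factory.newTemplates(new StreamSource(new StringReader(xslt)));
    }

    // Run one transform with the already-compiled stylesheet.
    static String transform(Templates compiled, String xml) throws TransformerException {
        Transformer transformer = compiled.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Hypothetical stylesheet: print each book title followed by a semicolon.
        String xslt =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/books'><xsl:for-each select='book'>"
            + "<xsl:value-of select='@title'/><xsl:text>;</xsl:text>"
            + "</xsl:for-each></xsl:template></xsl:stylesheet>";

        Templates compiled = compile(xslt);
        // The same compiled stylesheet handles several input documents.
        System.out.println(transform(compiled, "<books><book title='A'/><book title='B'/></books>"));
        System.out.println(transform(compiled, "<books><book title='C'/></books>"));
    }
}
```

To write to a file instead of a string, pass new StreamResult(new java.io.File("myFile.html")) to transform.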
When I finished figuring everything out I reran the transform that had previously taken 13 hours on a 1,500-item XML file. This time I used a 5,000-item file, and it still completed in 2 to 3 hours. If you use large XML files as input for XSLT transformations, I highly recommend looking into translets.