
Top Ten Java and XSLT Tips

by Eric M. Burke
08/29/2001

My new book, Java and XSLT, examines techniques for using XSLT (Extensible Stylesheet Language Transformations) with Java (of course!). This article highlights ten tips that I feel are important, although limiting the list to ten items only scratches the surface of what is possible. Most of these tips focus on the combination of Java and XSLT, rather than on specific XSLT techniques. For more detailed information, there are pointers to other valuable resources at the end of this article.

The basics of XSL transformations are pretty simple: one or more XSLT stylesheets contain instructions that define how to transform XML data into some other format. XSLT processors do the actual transformations; Sun Microsystems' Java API for XML Processing (JAXP) provides a standard Java interface to various processors. Here is some sample code that performs an XSL transformation using the JAXP API:

import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.transform.stream.StreamResult;
import java.io.*;

public class Transform {

    /**
     * Performs an XSLT transformation, sending the results
     * to System.out.
     */
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println(
                "Usage: java Transform [xmlfile] [xsltfile]");
            System.exit(1);
        }

        File xmlFile = new File(args[0]);
        File xsltFile = new File(args[1]);

        // JAXP reads data using the Source interface
        Source xmlSource = new StreamSource(xmlFile);
        Source xsltSource = new StreamSource(xsltFile);

        // the factory pattern supports different XSLT processors
        TransformerFactory transFact =
                TransformerFactory.newInstance();
        Transformer trans = transFact.newTransformer(xsltSource);

        trans.transform(xmlSource, new StreamResult(System.out));
    }
}

You can click here to download a small ZIP file containing this example, along with an XSLT stylesheet and XML data file. The included README file explains how to compile and run this example.

Although this example utilizes StreamSource to read data from files, JAXP can also read XML data from SAX parsers or DOM trees. Here are my ten tips:

  1. Cache whenever possible.

    Performing transformations using XSLT is CPU- and memory-intensive, so it makes sense to optimize whenever possible. Caching is one of the best ways to improve runtime performance in XSLT-driven Web applications.

    Figure 1 illustrates a typical XSL transformation for a database-driven Web application:

    Figure 1. Typical XSL transformation



    Unlike dynamically-generated XML, XSLT stylesheets are generally stored as static files. Since these files rarely change, they can be parsed into memory and cached using JAXP's javax.xml.transform.Templates interface. The following code fragment shows how this is done:

    Source xsltSource = new StreamSource(xsltFile);
    TransformerFactory transFact = TransformerFactory.newInstance();
    Templates cachedXSLT = transFact.newTemplates(xsltSource);
    Transformer trans = cachedXSLT.newTransformer();
    


    Now that the XSLT stylesheet is cached in memory using the Templates interface, it can be reused for many different transformations. Most importantly, this avoids repeatedly parsing the XSLT source into memory. It also gives XSLT processors an opportunity to optimize the transformation instructions, in much the same way that compilers optimize software.
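
    As a rough sketch of how such a cache might look, the helper below keeps one Templates object per stylesheet file and recompiles only when the file's timestamp changes. The class name StylesheetCache and the timestamp check are assumptions made for illustration, not code from the book.

```java
import java.io.File;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: caches one compiled Templates object per stylesheet file. */
class StylesheetCache {

    /** A compiled stylesheet plus the file timestamp it was compiled from. */
    private static final class Entry {
        final long lastModified;
        final Templates templates;

        Entry(long lastModified, Templates templates) {
            this.lastModified = lastModified;
            this.templates = templates;
        }
    }

    private static final Map<String, Entry> CACHE = new ConcurrentHashMap<>();

    static Templates get(File xsltFile) throws TransformerConfigurationException {
        String key = xsltFile.getAbsolutePath();
        long modified = xsltFile.lastModified();
        Entry entry = CACHE.get(key);
        if (entry == null || entry.lastModified != modified) {
            // Parse the stylesheet only when it is new or has changed on disk.
            // Two threads may occasionally both parse it; the last one wins.
            TransformerFactory factory = TransformerFactory.newInstance();
            Templates templates = factory.newTemplates(new StreamSource(xsltFile));
            entry = new Entry(modified, templates);
            CACHE.put(key, entry);
        }
        return entry.templates;
    }
}
```

    Each request would then call StylesheetCache.get(xsltFile).newTransformer() to obtain a fresh Transformer. Templates objects are meant to be shared between threads, while individual Transformer instances are not.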

    One might wonder if the XML data can also be cached in memory. For highly dynamic or personalized applications, the XML is generated dynamically with each client request and is constantly changing. For these types of applications, caching is not practical. For many other types of applications, however, the XML data may not change all that often.

    When the XML data does not change often, it makes more sense to cache the transformation result, rather than the input XML data. This is the fastest possible solution, and should be used whenever feasible.

  2. Test before deploying.

    Clean separation between data, programming logic, and presentation is a key reason to choose XML and XSLT for Web application development projects. Java code interacts with back-end data sources and generates XML data, XSLT stylesheets convert this XML data into XHTML (or WML, or anything else), and the browser displays the result.

    A unique, sometimes overlooked, benefit of this architecture is its ability to support automated unit tests. Tools like JUnit encourage programmers to write suites of automated unit tests. These tests greatly reduce the possibility of introducing new bugs as features are added to systems. Consider these components of a typical Java+XML+XSLT Web site:

    • Implement business logic using Java. Since Java code is not commingled with presentation logic, it can be tested just like any other Java code.

    • Convert application data into XML. This step is particularly easy to test - just generate the XML then validate it using a DTD or an XML Schema.

    • Transform XML data into XHTML. Once again, the generated XHTML can be automatically validated against one of the XHTML DTDs. Although this does not prove that the information is correct, it does ensure that the XHTML is well formed and valid.

    Unlike many other Web development techniques, none of these unit tests requires deployment to a Web server. This makes automated unit tests, a key component of Extreme Programming (XP), much easier to implement.
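
    The following is a minimal sketch of the kind of check a unit test could run without a Web server: it transforms XML held in memory and then confirms that the result parses as well-formed XML. The class and method names are hypothetical, and the sketch checks only well-formedness; validating against an XHTML DTD would additionally require a validating parser and access to the DTD.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical test helper; no servlet container is involved. */
class TransformCheck {

    /** Applies the stylesheet text to the XML text and returns the result. */
    static String transform(String xml, String xslt) throws Exception {
        Transformer trans = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        trans.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString();
    }

    /** Returns true if the markup can be parsed as well-formed XML. */
    static boolean isWellFormed(String markup) throws Exception {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the markup, so it is not well formed.
            return false;
        }
    }
}
```

    A JUnit test method could call transform() with a small sample document and assert that isWellFormed() returns true for the result.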

  3. Keep XSLT stylesheets simple.

    There are at least two reasons to keep XSLT stylesheets simple. First, XSLT is not a rich programming language like Java. While XSLT is good at transformations, it gets quite complicated when too much application logic is embedded in stylesheets. For this reason, it makes sense to implement as much business logic as possible using Java before creating your XML. This XML should then be much simpler to transform using XSLT.

    The second reason to keep stylesheets simple is that XSLT syntax is difficult to read. XML tags make XSLT easy to parse and manipulate programmatically, but all those XML tags can make stylesheets quite difficult to read. There are several things programmers can do to make XSLT stylesheets easier to read and more maintainable:

    • Use syntax highlighting editors, such as Altova's XML Spy.

    • Add distinctive comment blocks before each XSLT template. This helps break the monotony when scanning through page after page of '<' and '>' characters.

    • Adopt naming conventions for top-level variables and stylesheet parameters.

    • Break out common functionality into secondary stylesheets, using <xsl:import> to reuse code.

  4. Use CSS with XSLT.

    This tip goes hand-in-hand with the previous tip, in the sense that it can greatly reduce the complexity of XSLT stylesheets.

    XSLT and CSS perform different tasks that complement each other. XSLT transforms XML into other formats, such as XHTML or WML, while CSS only defines presentation styles. Sometimes the lines are blurred because XSLT can produce style elements as part of the generated XHTML.

    Instead of embedding lots of fonts, colors, and other style elements into XSLT stylesheets, try writing stand-alone CSS files. The XSL transformation produces XHTML files that merely include the standalone CSS files. This makes the XHTML smaller, simplifies the XSLT, and makes pages faster to download to browsers.

    This same technique also applies to JavaScript, which should be placed in stand-alone files rather than embedded into the transformations.

  5. Be careful with nonbreaking spaces.

    Author's note: In response to reader comments, I have rewritten this tip to reflect what I have learned recently about nonbreaking spaces. Thank you to my readers for the feedback. [Editor's Note: We've inserted a Reader Response link at the end of this article for additional comments.]

    A nonbreaking space is a useful feature of XHTML that prevents browsers from introducing line breaks between words. It also makes it possible to force two or more consecutive spaces; browsers collapse sequences of ordinary spaces (and other whitespace characters) into a single space. Here is some XHTML that includes a nonbreaking space:

    Aidan&nbsp;Burke

    When people create XHTML Web pages, they typically insert the characters "&nbsp;" into their source as shown above. All browsers should interpret this character sequence as a nonbreaking space and display the page properly. When using XSLT to generate XHTML, however, things must be handled differently.

    XSLT stylesheets must be well-formed XML. Since "&nbsp;" is not one of the five predefined XML entities, it cannot be directly included in the stylesheet. For example, the following XSLT fragment does not work:

    <!-- won't work... -->
    <xsl:text>Aidan&nbsp;Burke</xsl:text>

    This typically leads XSLT programmers to utilize something slightly different:

    <xsl:text>Aidan&#160;Burke</xsl:text>

    As it turns out, this works just fine in almost all cases. When the stylesheet's output method is "html", processors like Xalan automatically convert the character reference &#160; into the character sequence "&nbsp;". From the perspective of the Web browser, this looks exactly like any other nonbreaking space.

    Here is a complete XSLT stylesheet that does just that:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" 
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html" encoding="UTF-8"/>
      <xsl:template match="/">
        <xsl:text>Aidan&#160;Burke</xsl:text>
      </xsl:template>
    </xsl:stylesheet>

    When using Xalan, the output from this transformation looks like this:

    Aidan&nbsp;Burke

    This is nice, because browsers know how to display "&nbsp;". Unfortunately, the XSLT spec does not require XSLT processors to convert "&#160;" into "&nbsp;". Test your own XSLT processor before counting on this behavior.

    Some programmers do not like having to remember that "160" stands for a nonbreaking space. So they define an entity in their XSLT stylesheet using an internal DTD subset:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE xsl:stylesheet [
      <!ENTITY nbsp "&#160;">
    ]>
    <xsl:stylesheet version="1.0" ...

    Now, "&nbsp;" can be used instead of "&#160;". This is mostly a stylesheet author convenience because the XML parser converts the entity into "&#160;" before the XSLT processor ever sees it. A word of caution: some XML-related tools will try to validate the XSLT stylesheet if they see the DOCTYPE. Since the DTD subset does not include definitions for all the XSLT elements, validation errors will be reported.

    If popular XSLT processors automatically convert "&#160;" into "&nbsp;", what is the problem? Well, problems occur when the stylesheet's output method is "xml" instead of "html".

    When the XSLT output method is "html", most XSLT processors modify their output to accommodate Web browsers. For instance, tags like "<br />", which are valid XML, may be converted to "<br>". This is more likely to work in older browsers, but is not well-formed XML.

    XHTML is the current recommendation from the World Wide Web Consortium for Web page authoring. Since XHTML documents must be well-formed XML, XSLT stylesheet authors may wish to use the "xml" output method instead of "html" when producing XHTML. Here is the first part of an XSLT stylesheet that produces XHTML:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" 
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml"
        doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
        doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" 
        encoding="UTF-8"/>
      <xsl:template match="/">
        <html xmlns="http://www.w3.org/1999/xhtml">
        ...remainder omitted
    

    When the output method is "xml", Xalan does not convert "&#160;" into "&nbsp;". Instead, it inserts a single character code 160 into the result tree. This causes problems in some cases. For example, Figure 2 is a screen shot from Microsoft Internet Explorer 5.5 running on Windows 2000. Notice the funny letter "A" with the symbol above it:

    Download the example to try it out.
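
    One plausible explanation for the stray character, assuming the browser decoded the UTF-8 output as ISO-8859-1 (the article does not say which encoding the browser actually chose), is sketched below. In UTF-8, character 160 is written as two bytes; read back one byte at a time as ISO-8859-1, those bytes become an accented "A" followed by the nonbreaking space. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo of a UTF-8 nonbreaking space read as ISO-8859-1. */
class NbspEncodingDemo {

    /** Encodes the text as UTF-8, then decodes the bytes as ISO-8859-1. */
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String shown = misdecode("Aidan\u00A0Burke");
        // Print each character as a code point so the extra one is visible.
        for (char c : shown.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

    Between "Aidan" and "Burke", the decoded string contains U+00C2 (capital A with circumflex) followed by U+00A0, rather than the single U+00A0 that was written.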

    The bottom half of Figure 2 shows an alternate technique that works in most cases. Here is how that approach works:

    <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>

    Once the stylesheet has been parsed, "&amp;nbsp;" is simply the text "&nbsp;". Normally the XSLT processor would escape the ampersand again when writing its output; disable-output-escaping="yes" tells it to write the text unchanged, so the exact character sequence "&nbsp;" reaches the browser, which then displays the nonbreaking space properly.

    It should be noted that the XSLT specification does not require XSLT processors to support disable-output-escaping, so this technique should be tested using your particular tools.

    Figure 2. Results of using the "xml" output method (top) versus using the disable-output-escaping method (bottom)

    Here is a summary of the techniques presented here:

    • Use the "&#160;" character entity to represent nonbreaking spaces. This works when the output method is "html" because most XSLT processors convert the entity to the literal characters "&nbsp;". The XSLT specification does not mandate this behavior, but Xalan works this way.

    • Define an entity for "&nbsp;" and use that. This is effectively identical to the first approach, but may look nicer for stylesheet authors. It may introduce problems when certain tools mistakenly try to validate the stylesheet against the nonexistent DTD.

    • Use <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text> as an alternative to "&#160;". This is particularly useful when the output method is "xml". The XSLT specification does not mandate that processors support disable-output-escaping.

  6. Write XML Producer classes.

    In order to apply XSL transformations, Java objects must somehow be converted into XML data. This can be done in several ways:

    1. Add a getXML() method to each class.
    2. Write XML producer classes that know how to convert specific Java objects into XML.
    3. Use a sophisticated Java-to-XML API that automates the conversion to XML.

    The first approach might look something like this:

    public class Customer {
        public Element getXML(Document doc) {
            // use the DOM API to create an Element 
            // representing this object
            ...
        }
        ...
    }
    

    This approach is easy to explain and understand, but suffers from a key design flaw. The primary problem is the way that a particular XML representation is now tightly coupled to each class. When new XML representations are required, new methods must be written. This means classes get larger and larger as more XML "views" are added.

    It is not hard to imagine scenarios where more than one XML representation of an object is desirable. For a summary report showing hundreds of customers, only a few key pieces of each customer are present in the XML data. For a detail view of a particular customer, however, the XML should contain all data related to that customer.

    The second approach breaks out XML production into separate utility classes. An XML producer for a Customer might look something like this:

    public class CustomerDOMProducer {
        public static Element toXML(Customer cust, Document doc) {
            // ... use the DOM API to create a fragment of XML
        }
    }
    

    This simple change decouples XML production from the Customer class; adding new XML representations is simply a matter of writing additional XXXDOMProducer classes. Moving to non-DOM APIs such as JDOM is even possible, all without changes to the Customer code.
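
    To make the producer idea concrete, here is one way the elided body might be filled in. The Customer fields (id and name) and the element names are assumptions chosen for illustration; a real producer would emit whatever XML the stylesheets expect.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical domain class; it knows nothing about XML. */
class Customer {
    private final String id;
    private final String name;

    Customer(String id, String name) {
        this.id = id;
        this.name = name;
    }

    String getId() { return id; }
    String getName() { return name; }
}

/** Produces one particular XML view of a Customer using the DOM API. */
class CustomerDOMProducer {

    /** Builds <customer id="..."><name>...</name></customer> owned by doc. */
    static Element toXML(Customer cust, Document doc) {
        Element customer = doc.createElement("customer");
        customer.setAttribute("id", cust.getId());

        Element name = doc.createElement("name");
        name.appendChild(doc.createTextNode(cust.getName()));
        customer.appendChild(name);
        return customer;
    }

    /** Convenience factory for an empty DOM Document. */
    static Document newDocument() throws ParserConfigurationException {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
    }
}
```

    A summary view and a detail view could live in two separate producer classes, each reading the same Customer object.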


    For more information on JDOM, don't miss Brett McLaughlin's recently released Java & XML, 2nd Edition.


    It is worth mentioning the third approach, which is to use a product that automatically converts Java objects into XML. Although these types of tools are great for persistence and for exchanging data with other apps, they may not be ideal for XSL transformations. This is because the generated XML may be more complex than that provided by a hand-coded solution, potentially resulting in more complex XSLT stylesheets.

  7. Assume that cookies are disabled.

    The servlet API supports session tracking using the HttpSession interface. This makes features like shopping carts possible. By default, session tracking relies on browser cookies to identify each user; however, users may disable cookies.

    When browser cookies are disabled, Web applications must rely on some other mechanism to identify users. URL rewriting is the technique used by the servlet API. For various reasons, URL rewriting does not happen automatically. In order to support session tracking when cookies are disabled, programmers must remember to encode each URL emitted by applications. This is done by appending ;jsessionid=nnnnn to each hyperlink, form action, or redirect URL. The following table illustrates URLs with and without this identifier.

    Ordinary URL             Encoded URL
    <a href="mylink">        <a href="mylink;jsessionid=129j2fjs87l156">
    <form action="mylink">   <form action="mylink;jsessionid=129j2fjs871156">

    When the user clicks on an encoded hyperlink or submits a form with an encoded action, the servlet container can determine his or her identity by looking at the value of jsessionid.

    When using XSLT to generate XHTML, this session identifier must somehow be embedded into each page. Since the identifier is dynamic and different for every user, it should be passed as a stylesheet parameter. Here is how this parameter is declared at the top of each XSLT stylesheet:

    <xsl:stylesheet version="1.0" 
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
      <!--
      *********************************************************
      ** global.sessionID : Used for URL-rewriting to implement 
      **                    session tracking without cookies.
      *********************************************************-->
      <xsl:param name="global.sessionID"/>
    
      ...
    

    On the servlet side of the application, the Java code passes the session identifier to the XSLT processor using JAXP's Transformer class. The code sets the parameter only when the session was not identified through a cookie:

    protected void doGet(
            HttpServletRequest req, 
            HttpServletResponse res) 
            throws IOException, ServletException {
    
        Transformer trans = ... // obtain Transformer from JAXP
        HttpSession session = req.getSession(true);
    
        // allow cookieless session tracking
        if (!req.isRequestedSessionIdFromCookie()) {
            String sessionID = session.getId();
            trans.setParameter("global.sessionID", 
                    ";jsessionid=" + sessionID);
        }

        ...
    }
    

    Back on the XSLT side of the application, the global.sessionID parameter can then be appended to all hyperlinks and form actions as each page is generated. This technique is covered in its entirety in Chapter 8, "Additional Techniques," of Java and XSLT.
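
    As a small illustration of the XSLT side, the sketch below embeds a one-template stylesheet in a Java string and appends the parameter to a link through an attribute value template. The class name, the sample link, and the session value are made up for the example; the book's chapter covers the complete technique.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo: appends a session parameter to a generated link. */
class SessionLinkDemo {

    // {$global.sessionID} expands to the parameter value, or to nothing
    // when the servlet did not set it (cookies are in use).
    private static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:param name='global.sessionID'/>"
        + "<xsl:template match='/'>"
        + "<a href='mylink{$global.sessionID}'>next</a>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Renders the link; pass null to simulate a cookie-based session. */
    static String render(String sessionSuffix) throws Exception {
        Transformer trans = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        if (sessionSuffix != null) {
            trans.setParameter("global.sessionID", sessionSuffix);
        }
        StringWriter out = new StringWriter();
        trans.transform(new StreamSource(new StringReader("<page/>")),
                new StreamResult(out));
        return out.toString();
    }
}
```

    Because the parameter already carries the leading ";jsessionid=", the stylesheet never needs to know whether cookies are enabled.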

  8. Use XSLT as a code generator.

    Although XSLT is most commonly used for Web-based transformations, it is not limited to XHTML output. XSLT can transform XML into any text format, making it an ideal choice for many types of code generation and other developer utilities.

    When using XSLT as a rudimentary code generator, it is best to focus on applications that are repetitive and highly structured. Many classes related to Enterprise JavaBeans are highly structured and somewhat repetitive, making this an ideal choice for code generation.
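
    Here is a toy sketch of the idea, using the "text" output method to turn a small XML description into Java source. The bean and property vocabulary is invented for this example and is far simpler than anything needed for real Enterprise JavaBeans classes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical generator: XML description in, Java source text out. */
class BeanGenerator {

    // &#10; is a newline; method='text' writes plain text, not markup.
    private static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='bean'>"
        + "public class <xsl:value-of select='@name'/> {&#10;"
        + "<xsl:apply-templates select='property'/>"
        + "}&#10;"
        + "</xsl:template>"
        + "<xsl:template match='property'>"
        + "    private <xsl:value-of select='@type'/>"
        + "<xsl:text> </xsl:text>"
        + "<xsl:value-of select='@name'/>;&#10;"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Generates a class skeleton from a <bean> description. */
    static String generate(String beanXml) throws Exception {
        Transformer trans = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        trans.transform(new StreamSource(new StringReader(beanXml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(generate(
              "<bean name='Customer'>"
            + "<property name='name' type='String'/>"
            + "<property name='age' type='int'/>"
            + "</bean>"));
    }
}
```

    Running main should print a Customer class skeleton with one private field per property element.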


    Look for the third edition of O'Reilly's Enterprise Java Beans, due to be released this September.


  9. Use <xsl:import> for i18n.

    Figure 3 shows how XSLT stylesheets can be modularized to support internationalization:

    Figure 3. XSLT internationalization

    This is an interesting trick that capitalizes on the <xsl:import> feature of XSLT. With <xsl:import>, one stylesheet can import one or more other stylesheets. If stylesheet "A" imports stylesheet "B", templates and variable definitions in stylesheet "A" take precedence over those found in stylesheet "B".

    The language-specific stylesheet might look something like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" 
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:import href="common.xslt"/>
      
      <xsl:variable name="lang.pageTitle">Welcome to XSLT!</xsl:variable>
    </xsl:stylesheet>
    

    And the generic stylesheet might look like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" 
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html" encoding="UTF-8"/>
      
      <xsl:template match="/">
        <html>
          <head>
            <title><xsl:value-of 
                   select="$lang.pageTitle"/></title>
          </head>
        ...etc
    

    As shown here, the generic stylesheet does not hard-code text (the page title) that is displayed to the user. Instead, it relies on variable definitions found in the language-specific stylesheet. In this fashion, adding support for new languages is merely a matter of creating new language-specific stylesheets.

    This is very similar to "ordinary Java" internationalization, in which various property files define language-specific text.
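
    On the Java side, choosing a language then comes down to choosing which language-specific stylesheet to hand to JAXP. The sketch below assumes a made-up file naming convention (lang_en.xslt, lang_de.xslt) and writes the stylesheets to a temporary directory so that the example is self-contained; a real application would ship them as static files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo: one generic stylesheet, one small sheet per language. */
class I18nDemo {
    private static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";

    // Generic stylesheet: layout only, no user-visible text.
    private static final String COMMON =
          "<xsl:stylesheet version='1.0' xmlns:xsl='" + XSL_NS + "'>"
        + "<xsl:output method='html' encoding='UTF-8'/>"
        + "<xsl:template match='/'>"
        + "<html><head><title><xsl:value-of select='$lang.pageTitle'/>"
        + "</title></head><body/></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Language-specific stylesheet: imports the layout, supplies the text.
    private static String languageSheet(String pageTitle) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='" + XSL_NS + "'>"
             + "<xsl:import href='common.xslt'/>"
             + "<xsl:variable name='lang.pageTitle'>" + pageTitle
             + "</xsl:variable>"
             + "</xsl:stylesheet>";
    }

    static String render(Locale locale) throws Exception {
        Path dir = Files.createTempDirectory("i18n");
        write(dir.resolve("common.xslt"), COMMON);
        write(dir.resolve("lang_en.xslt"), languageSheet("Welcome to XSLT!"));
        write(dir.resolve("lang_de.xslt"), languageSheet("Willkommen bei XSLT!"));

        // Assumed convention: lang_<language code>.xslt
        Path sheet = dir.resolve("lang_" + locale.getLanguage() + ".xslt");
        Transformer trans = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(sheet.toFile()));
        StringWriter out = new StringWriter();
        trans.transform(new StreamSource(new StringReader("<page/>")),
                new StreamResult(out));
        return out.toString();
    }

    private static void write(Path file, String text) throws Exception {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }
}
```

    Constructing the StreamSource from a File gives the processor a system ID, which is what lets it find common.xslt next to the language-specific stylesheet.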

  10. Set up StreamSource to resolve relative URIs.

    Consider the following JAXP code (the problem lines are the two that construct the StreamSource objects):

        
        // Stream containing XML data 
        InputStream xmlStream = ... 
        // Stream containing XSLT stylesheet
        InputStream xsltStream = ... 
    
        Source xmlSource = new StreamSource(xmlStream);
        Source xsltSource = new StreamSource(xsltStream);
    
        TransformerFactory transFact = 
           TransformerFactory.newInstance();
        Transformer trans = transFact.newTransformer(xsltSource);
        trans.transform(xmlSource, new StreamResult(System.out));
    

    And now suppose the XSLT stylesheet imports another stylesheet like this:

    <xsl:import href="formatName.xslt"/>
    

    This causes problems because the XSLT processor does not know where to find formatName.xslt. This same problem occurs when the XML data contains references to other files. The code is fixed by changing the way the StreamSource objects are constructed:

        
        Source xmlSource = new StreamSource(xmlStream,
            "file:///C:/data/xml/");
        Source xsltSource = new StreamSource(xsltStream,
            "file:///C:/data/xslt/");
    

    The second parameter provides the URI of the directories containing the XML and XSLT files. Now, the XSLT processor knows where to look when resolving URI references inside of the XML data and XSLT stylesheets.
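
    Rather than hard-coding a file: URL, the system ID can be derived from a directory path. The following sketch assumes the stylesheet arrives as a stream and that any imported stylesheets live in a known directory; the class and method names are hypothetical.

```java
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.file.Path;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: transforms streams while keeping imports resolvable. */
class BaseUriTransform {

    /**
     * @param xmlStream  stream containing the XML data
     * @param xsltStream stream containing the XSLT stylesheet
     * @param xsltDir    existing directory against which relative hrefs
     *                   such as <xsl:import href="formatName.xslt"/> resolve
     */
    static String transform(InputStream xmlStream, InputStream xsltStream,
            Path xsltDir) throws Exception {
        // For an existing directory, toUri() yields a URI ending in "/",
        // so relative references resolve to files inside that directory.
        String systemId = xsltDir.toUri().toString();

        Source xmlSource = new StreamSource(xmlStream);
        Source xsltSource = new StreamSource(xsltStream, systemId);

        Transformer trans = TransformerFactory.newInstance()
                .newTransformer(xsltSource);
        StringWriter out = new StringWriter();
        trans.transform(xmlSource, new StreamResult(out));
        return out.toString();
    }
}
```

    The same two-argument constructor can be used for the XML source when the data itself refers to other files.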

Learning More

XSLT is not a difficult language, although it does work quite differently from Java. Diving in and writing stylesheets is probably the best way to get over the initial learning curve. Here are some additional sources of information on XSLT: