
15.4 Processing XML Data

XSLT is great for transforming an XML source into another format, but sometimes you need to process the XML data in other ways. For instance, you may want to use part of the XML data in a database query to get additional information and compose a response that merges the two data sources, or reformat date and numeric information in the XML source according to the user's preferred locale. To process XML data in this way, the JSTL XML library includes a number of actions for picking out pieces of an XML document, as well as iteration and conditional actions similar to the ones in the core library, but adapted to work specifically with XML data.

In this section, we look at an example that uses most of the JSTL XML actions. The XML data comes from the O'Reilly Meerkat news feed. Meerkat frequently scans a large set of Rich Site Summary (RSS) sources and makes the aggregated data available in a number of formats, including a superset of the RSS format that includes category, source, and date information for each story. (RSS is an XML application suitable for news, product announcements, and similar content.) You can learn more about Meerkat and how to use it at http://www.oreillynet.com/pub/a/rss/2000/05/09/meerkat_api.html. Example 15-5 shows a sample of the XML data that Meerkat can deliver.

Example 15-5. Meerkat XML news feed format
<?xml version="1.0"?>
<!DOCTYPE meerkat_xml_flavour 
  SYSTEM "http://meerkat.oreillynet.com/dtd/meerkat_xml_flavour.dtd">
  
<meerkat>
 
  <title>Meerkat: An Open Wire Service</title> 
  <link>http://meerkat.oreillynet.com</link> 
  <description>
    Meerkat is a Web-based syndicated content reader providing
    a simple interface to RSS stories.  While maintaining the original 
    association of a story with a channel, Meerkat's focus is on 
    chronological order -- the latest stories float to the top, 
    regardless of their source.
  </description>
  <language>en-us</language> 
  
  <image> 
    <title>Meerkat Powered!</title> 
    <url>http://meerkat.oreillynet.com/icons/meerkat-powered.jpg</url>
    <link>http://meerkat.oreillynet.com</link> 
    <width>88</width> 
    <height>31</height> 
    <description>
      Visit Meerkat in full splendor at meerkat.oreillynet.com
    </description> 
  </image> 
  
  <story id="881051">
    <title>
      Clay Shirky: What Web Services Got Right ... and Wrong
    </title>
    <link>
      http://www.oreillynet.com/pub/a/network/2002/04/22/clay.html
    </link>
    <description>
      Web Services represent not just a new way to build Internet 
      applications, says Clay Shirky in this interview, but the second 
      stage of peer-to-peer, in which distinctions between clients and 
      servers are all but eliminated. 
    </description>
    <category>General</category>
    <channel>O'Reilly Network</channel>
    <timestamp>2002-04-23 17:02:50</timestamp>
  </story>
  ...
</meerkat>

The example application processes this XML data in a number of ways. First, it extracts some information about the Meerkat service itself and adds it to the page, so the user can see where the data comes from. It then gets all <category> elements and builds a list of unique category names. This list is used to build an HTML select list, from which the user can pick one category to filter the data. The XML data is then filtered accordingly, and an HTML table with matching stories is generated. Just for fun and to illustrate the use of the conditional XML actions, all stories in the General category are displayed against a light green background. The result is shown in Figure 15-2.

Figure 15-2. The XML-based news service application
figs/Jsp3_1502.gif

Example 15-6 shows the JSP page that does all the processing.

Example 15-6. Processing XML data (news.jsp)
<%@ page contentType="text/html" %>
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
<%@ taglib prefix="x" uri="http://java.sun.com/jsp/jstl/xml" %>
  
<%-- 
  Get new XML data if the cached version is older than
  1 hour.
--%>
<c:set var="cachePeriod" value="${60 * 60 * 1000}" />
<jsp:useBean id="now" class="java.util.Date" />
<c:if test="${(now.time - cacheTime) > cachePeriod}">
  <c:import url="http://meerkat.oreillynet.com/?&p=4999&_fl=xml&t=ALL" 
    varReader="xmlSource">
    <x:parse var="doc" doc="${xmlSource}" scope="application" />
  </c:import>
  <c:set var="cacheTime" value="${now.time}" scope="application" />
</c:if>
  
<html>
  <head>
    <title>O'Reilly News</title>
  </head>
  <body bgcolor="white">
    <h1>O'Reilly News</h1>
    <img src="<x:out select="$doc/meerkat/image/url" />">
    This service is based on the news feed from
    <a href="<x:out select="$doc/meerkat/link" />">
      <x:out select="$doc/meerkat/title" /></a>.
    <p>
    <x:out select="$doc/meerkat/description" />
  
    <%-- 
      Create a list of unique categories present in the XML feed
    --%>
    <jsp:useBean id="uniqueCats" class="java.util.TreeMap" />
    <x:forEach select="$doc/meerkat/story/category" var="category">
      <%-- Need to convert the XPath node to a Java String --%>
      <x:set var="catName" select="string($category)" />
      <c:set target="${uniqueCats}" property="${catName}" value="" />
    </x:forEach>
  
    <form action="news.jsp">
      Category:
      <select name="selCat">
        <option value="ALL">All
        <c:forEach items="${uniqueCats}" var="current">
          <option value="<c:out value="${current.key}" />"
            <c:if test="${param.selCat == current.key}">
              selected
            </c:if>>
            <c:out value="${current.key}" />
          </option>
        </c:forEach>
      </select>
      <input type="submit" value="Filter">  
    </form>
  
    <%-- Filter the parsed document based on the selection --%>
    <c:choose>
      <c:when test="${empty param.selCat || param.selCat == 'ALL'}">
        <x:set var="stories" select="$doc//story" />
      </c:when>
      <c:otherwise>
        <x:set var="stories" 
          select="$doc//story[category = $param:selCat]" />
      </c:otherwise>
    </c:choose>
  
    <%-- Generate a table with data for the selection --%>
    <table>
      <x:forEach select="$stories">
        <tr>
          <x:choose>
            <x:when select="category[. = 'General']">
              <td bgcolor="lightgreen">
            </x:when>
            <x:otherwise>
              <td>
            </x:otherwise>
          </x:choose>
            <a href="<x:out select="link" />">
              <x:out select="title" /></a>
            <br>
            <i><x:out select="timestamp" /></i>:
            <b>Category:</b><x:out select="category" />,
            <b>Reported by:</b><x:out select="channel" />
            <br><x:out select="description" />
          </td>
        </tr>
      </x:forEach>
    </table>
  </body>
</html>

At the top of the page, the XML source is retrieved from the Meerkat server using the same <c:import> action used in the previous examples. There are two noteworthy differences, though: the url attribute specifies an absolute URL and the imported data is exposed as a Reader instead of as a String. I mentioned both these features earlier. In this example, using a Reader is appropriate because the data may be large, and it's only of interest to the nested <x:parse> action.

15.4.1 Caching Data

Before we look at the <x:parse> action in detail, I'd like to say a few words about the caching technique used in this example. The Meerkat data is updated only on an hourly basis, so it's pointless to ask for it more frequently. It's also expensive in terms of time and computing resources to import and parse the XML data. By caching the parsed data for an hour, the web application gets more responsive and avoids putting load on the Meerkat server unnecessarily.

The caching technique used here simply creates a timestamp for the data, taken as a millisecond value from a java.util.Date object, and saves it together with the data itself in the application scope, using standard JSP actions and JSTL core actions. When a new request is received, the page tests whether the cache is older than the predefined cache period (one hour in this example). If it is, a fresh copy is imported, parsed, and saved in the application scope again, along with the timestamp. Otherwise, the cached data is used. You can use this technique for any type of processing that's expensive, for instance retrieving data from a database or performing complex calculations.
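The same timestamp-based idea can be expressed in plain Java, which may be handy if you move the caching into a servlet or a listener. This is only a minimal sketch: the TimedCache class and its method names are made up for this illustration and aren't part of JSP or JSTL, and the current time is passed in as an argument to keep the logic easy to test.

```java
import java.util.function.Supplier;

/** Hypothetical helper: caches one expensive-to-build value for a fixed period. */
class TimedCache<T> {
    private final long cachePeriodMillis;
    private final Supplier<T> loader;   // does the expensive work, e.g. import and parse
    private T value;
    private long cacheTime;             // when the value was last loaded
    private boolean loaded;             // false until the first load

    TimedCache(long cachePeriodMillis, Supplier<T> loader) {
        this.cachePeriodMillis = cachePeriodMillis;
        this.loader = loader;
    }

    /** Returns the cached value, reloading it first if it is older than the cache period. */
    synchronized T get(long nowMillis) {
        if (!loaded || nowMillis - cacheTime > cachePeriodMillis) {
            value = loader.get();
            cacheTime = nowMillis;
            loaded = true;
        }
        return value;
    }
}
```

A caller would typically create one instance per cached resource and call get(System.currentTimeMillis()) for each request.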

15.4.2 Parsing XML Data

Before you can access the XML data with the JSTL XML actions, the imported document must be parsed and converted to a data structure the actions can read. That's what the <x:parse> action does (see Table 15-3).

Table 15-3. Attributes for JSTL <x:parse>

| Attribute name | Java type | Dynamic value accepted | Description |
|---|---|---|---|
| doc | String or java.io.Reader | Yes | Mandatory, unless specified as the body. The XML document to parse. |
| systemId | String | Yes | Optional. The system identifier for the XML document. |
| filter | org.xml.sax.XMLFilter | Yes | Optional. An XMLFilter to be applied to the XML document. |
| var | String | No | Optional. The name of the variable to hold the result as an implementation-dependent type. |
| scope | String | No | Optional. The scope for the variable, one of page, request, session, or application. page is the default. |
| varDom | String | No | Optional. The name of the variable to hold the result as a org.w3c.dom.Document. |
| scopeDom | String | No | Optional. The scope for the DOM variable, one of page, request, session, or application. page is the default. |

The XML document to parse can be specified as the body or as a String or Reader variable. In Example 15-6, I use the Reader exposed by the <c:import> action to get the best performance. A base URI for interpretation of relative URIs in the document can be specified by the systemId attribute, the same way as for the <x:transform> action.

The parse result can be saved either as an implementation-dependent data structure (named by the var attribute) or as a standard org.w3c.dom.Document object (named by the varDom attribute), in any scope. You should use the latter only if you need to process the parse result with a custom action or other custom code. The implementation-dependent type is typically optimized in terms of memory use and ease of access, and it's supported by all the other JSTL XML actions that use a parse result. In Example 15-6, the implementation-dependent data structure is saved as an application scope variable, where it's picked up by the other XML actions in the page.
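If you do need an org.w3c.dom.Document in custom code, the standard JAXP API can produce one from a Reader. The following is a minimal sketch of that idea, not a description of how any particular JSTL implementation works internally. The ParseSketch class name is invented for the illustration.

```java
import java.io.Reader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper: parses XML from a Reader into a DOM Document. */
class ParseSketch {
    static Document parse(Reader xml, String systemId) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            InputSource source = new InputSource(xml);
            if (systemId != null) {
                // Base URI used to resolve relative URIs in the document
                source.setSystemId(systemId);
            }
            return builder.parse(source);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```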

If the XML document is large and you're only interested in a very small part of it, you can provide an implementation of the org.xml.sax.XMLFilter interface to the action, typically created and configured by a servlet, a filter, or a listener (the filter and listener component types are described in Chapter 19). As the name implies, an XMLFilter can remove the parts you don't need, making the parsing process more efficient. For more about XML filters, I suggest you look at the documentation of the interface or read a book about Java and XML, such as Brett McLaughlin's Java and XML (O'Reilly).
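To give a feel for what such a filter can look like, here is a minimal sketch built on the standard org.xml.sax.helpers.XMLFilterImpl class. It drops every element with a given name, including its content. The SkipElementFilter class and the parseWithout( ) helper are hypothetical names for this illustration, and the helper uses a JAXP identity transformation to build a DOM so the effect can be inspected. In a JSP page, you would instead pass a filter instance to the filter attribute of <x:parse>.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: removes all elements named skipName, with their subtrees. */
class SkipElementFilter extends XMLFilterImpl {
    private final String skipName;
    private int skipDepth = 0;   // greater than 0 while inside a skipped subtree

    SkipElementFilter(XMLReader parent, String skipName) {
        super(parent);
        this.skipName = skipName;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) throws SAXException {
        if (skipDepth > 0 || qName.equals(skipName)) {
            skipDepth++;             // swallow the event
            return;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (skipDepth > 0) {
            skipDepth--;             // leaving a swallowed element
            return;
        }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) {
            super.characters(ch, start, length);
        }
    }

    /** Parses xml through the filter and returns the resulting DOM Document. */
    static Document parseWithout(String xml, String skipName) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            SkipElementFilter filter = new SkipElementFilter(reader, skipName);
            DOMResult result = new DOMResult();
            TransformerFactory.newInstance().newTransformer().transform(
                new SAXSource(filter, new InputSource(new StringReader(xml))), result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException("Could not filter XML", e);
        }
    }
}
```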

15.4.3 Accessing XML Data Using XPath Expressions

With the parsing out of the way, we can turn to how to access parts of the XML data. The JSTL XML library contains a number of actions for this purpose, similar to the ones you're familiar with from the JSTL core library: <x:out>, <x:set>, <x:if>, <x:choose>, <x:when>, <x:otherwise>, and <x:forEach>. The main difference between the XML and the core flavors is that the XML actions use a special language for working with XML data, named XPath, instead of the standard JSP EL. XPath 1.0 is a W3C recommendation that has been around since 1999, and it's used in XSLT stylesheets and other XML applications.[1] The language details are beyond the scope of this book, but here's a brief summary.

[1] Available at http://www.w3.org/TR/xpath.

An XPath expression identifies one or more nodes (root, elements, attributes, namespaces, comments, text, and processing instructions) in an XML document. The simplest expression type is a plain location path, similar to a Unix filesystem path, to a set of nodes in the document. For instance, the path /meerkat/image/url identifies the <url> element in the Meerkat XML document:

...
<meerkat>
 
  <title>Meerkat: An Open Wire Service</title> 
  <link>http://meerkat.oreillynet.com</link> 
  <description>
    Meerkat is a Web-based syndicated content reader providing
    a simple interface to RSS stories.  While maintaining the original 
    association of a story with a channel, Meerkat's focus is on 
    chronological order -- the latest stories float to the top, 
    regardless of their source.
  </description>
  <language>en-us</language> 
  
  <image> 
    <title>Meerkat Powered!</title> 
    <url>http://meerkat.oreillynet.com/icons/meerkat-powered.jpg</url>
    ...

A location path that starts with double forward slashes identifies all matching nodes, regardless of their position in the document hierarchy. For instance, //description identifies all <description> elements, including these two in the sample XML document (the <description> element of each <story> element is matched as well):

...
<meerkat>
  ...
  <description>
    Meerkat is a Web-based syndicated content reader providing
    a simple interface to RSS stories.  While maintaining the original 
    association of a story with a channel, Meerkat's focus is on 
    chronological order -- the latest stories float to the top, 
    regardless of their source.
  </description>
  ...
  <image> 
    ...
    <description>
      Visit Meerkat in full splendor at meerkat.oreillynet.com
    </description> 
    ...
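If you want to experiment with location paths outside a JSP page, the standard javax.xml.xpath API evaluates the same XPath 1.0 expressions. The sketch below is only an illustration: the LocationPathSketch class is a made-up name, and the XML string is a trimmed-down stand-in for the Meerkat document with placeholder description text.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical demo of plain and "//" location paths. */
class LocationPathSketch {
    // Trimmed-down stand-in for the Meerkat document in Example 15-5
    static final String XML =
        "<meerkat>"
        + "<title>Meerkat: An Open Wire Service</title>"
        + "<description>Top-level description</description>"
        + "<image>"
        + "<url>http://meerkat.oreillynet.com/icons/meerkat-powered.jpg</url>"
        + "<description>Image description</description>"
        + "</image>"
        + "<story id=\"881051\"><description>Story description</description></story>"
        + "</meerkat>";

    /** Evaluates a plain location path and returns the string value of the match. */
    static String imageUrl() {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("/meerkat/image/url",
                new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts the elements matched by //description, at any level. */
    static int descriptionCount() {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate("//description",
                new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this stand-in document, the //description path matches the top-level, image, and story descriptions alike.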

A path is always interpreted relative to a specific context, such as the complete document or a subset of its nodes. When you use XPath expressions as JSTL XML action attributes, the context can be represented by a variable and can also be adjusted by actions such as the <x:forEach> action. Besides the type of paths described here, an XPath expression can also include function calls, literals, operators, and special syntax for identifying attributes. Some of these features are used in Example 15-6, but I recommend that you learn more about them if you're going to use the JSTL XML actions. Check out the XPath chapter from Elliotte Rusty Harold and W. Scott Means's XML in a Nutshell (O'Reilly), available online at http://www.oreilly.com/catalog/xmlnut/chapter/ch09.html, and Robert Eckstein's XML Pocket Reference (O'Reilly). The XPath tutorial by Miloslav Nic and Jiri Jirat, available at http://www.zvon.org/xxl/XPathTutorial/General/examples.html, is another good resource.

Let's look at how XPath expressions are used with the JSTL <x:out> action (see Table 15-4) to add the general Meerkat information that appears at the beginning of the page:

<img src="<x:out select="$doc/meerkat/image/url" />">
This service is based on the news feed from
<a href="<x:out select="$doc/meerkat/link" />">
  <x:out select="$doc/meerkat/title" /></a>.
<p>
<x:out select="$doc/meerkat/description" />

Table 15-4. Attributes for JSTL <x:out>

| Attribute name | Java type | Dynamic value accepted | Description |
|---|---|---|---|
| select | String | No | Mandatory. An XPath expression to be evaluated. |
| escapeXml | boolean | Yes | Optional. true if special characters in the value should be converted to character entity codes. Default is true. |

All JSTL actions that accept XPath expressions do so only for their select attribute, to avoid confusion with other attributes that accept JSP EL expressions. For the first <x:out> action, the select attribute contains an XPath expression that starts with the doc variable (containing the parse result) followed by a location path for the <url> element. The <x:out> action converts the XPath evaluation result to a Java String and adds it to the response.

The way the doc variable is used here establishes the context for the XPath expression. Variables can appear anywhere in an XPath expression and always start with a dollar sign, followed by the name of the variable. XPath expressions used with the JSTL actions have access to almost the same type of dynamic data as an EL expression. Any application variable in any JSP scope can be accessed by its name, just as in an EL expression. The doc variable is an example of this. Important differences are that in an XPath expression, all variable names start with a dollar sign, and the EL property and element access operators (. and []) aren't recognized, so you can't use syntax like bean.propertyName in an XPath expression. A workaround is to save the property or element value in a new variable, and use it in the XPath expression:

<c:set var="myProperty" value="${myBean.myProperty}" />
<x:out select="$doc/root/myElement[@myAttribute = $myProperty]" />

Here the XPath expression finds elements with an attribute value that matches the bean property value. Also note that the XPath expression itself is not identified by any special syntax, as opposed to an EL expression that must always be enclosed by ${ and }.
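The JSTL actions resolve XPath variables against the JSP scopes for you. In standalone Java code, the javax.xml.xpath API leaves that job to an XPathVariableResolver that you supply. The following minimal sketch shows the mechanism with a plain Map as the variable source. The VariableSketch class is a hypothetical name, and the document is passed as the evaluation context instead of through a $doc variable.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical demo: XPath variables looked up in a Map. */
class VariableSketch {
    static String select(String xml, String expression, Map<String, Object> vars) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Called for every $name in the expression; name is a QName
            xpath.setXPathVariableResolver(name -> vars.get(name.getLocalPart()));
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling select( ) with the expression /root/myElement[@myAttribute = $myProperty] and a map that contains a myProperty entry returns the text of the first matching element.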

In addition to application data, most of the information represented by EL implicit variables is available to an XPath expression with a slightly different syntax, most noticeably that a colon is used as a separator instead of a dot (see Table 15-5).

Table 15-5. XPath implicit variables

| XPath expression | Description |
|---|---|
| $param:myParam | The myParam request parameter |
| $header:Accept | The Accept request header |
| $cookie:password | The password cookie |
| $initParam:myConfig | The myConfig context parameter |
| $pageScope:myVariable | The myVariable variable from the page scope |
| $requestScope:myVariable | The myVariable variable from the request scope |
| $sessionScope:myVariable | The myVariable variable from the session scope |
| $applicationScope:myVariable | The myVariable variable from the application scope |

The JSTL <x:forEach> action (Table 15-6) lets you loop through the nodes that match an XPath expression.

Table 15-6. Attributes for JSTL <x:forEach>

| Attribute name | Java type | Dynamic value accepted | Description |
|---|---|---|---|
| select | String | No | Mandatory. An XPath expression to be evaluated. |
| var | String | No | Optional. The name of the variable to hold the value of the current element. |
| varStatus | String | No | Optional. The name of the variable to hold a LoopTagStatus object. |
| begin | int | Yes | Optional. The first index, 0-based. |
| end | int | Yes | Optional. The last index, 0-based. |
| step | int | Yes | Optional. Index increment per iteration. |

This action is used in Example 15-6 to extract the text from all <category> elements and build a sorted list of unique category names, which is then used to generate an HTML selection list:

<jsp:useBean id="uniqueCats" class="java.util.TreeMap" />
<x:forEach select="$doc/meerkat/story/category" var="category">
  <%-- Need to convert the XPath node to a Java String --%>
  <x:set var="catName" select="string($category)" />
  <c:set target="${uniqueCats}" property="${catName}" value="" />
</x:forEach>

A <jsp:useBean> action creates a java.util.TreeMap to hold the list. By using a map, the list of category names is automatically trimmed to unique names, since the keys in a map must be unique.[2] The TreeMap is a map type that sorts its keys, taking care of the sorting requirement. The XPath expression used for the <x:forEach> action matches all <category> elements. The action then evaluates its body once per element node, where the <c:set> action adds a map entry with the text value as the key and an empty string as the value.

[2] A java.util.TreeSet would actually be more appropriate, but there is no JSTL action that can add elements to a set.

An important detail here is that the value of the loop variable (category) contains an instance of an XPath node object, not the string needed for the Map. One way to convert an XPath node to a string is to use the XPath string( ) function. That's what I do here. The <x:set> action (Table 15-7) converts the current node to a Java String and saves it as a variable that is then used by <c:set> to set the Map entry. Tricks like this are unfortunately needed to bridge the XPath and Java domains in some cases.
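The same combination of a node loop, the XPath string( ) function, and a TreeMap can be tried out in plain Java with the javax.xml.xpath API. This is a minimal sketch of the technique under that assumption, not the code the JSTL actions run. The UniqueCategoriesSketch class name is invented for the illustration.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical demo: sorted, unique category names from story elements. */
class UniqueCategoriesSketch {
    static Map<String, String> uniqueCategories(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList categories = (NodeList) xpath.evaluate(
                "/meerkat/story/category",
                new InputSource(new StringReader(xml)), XPathConstants.NODESET);

            // A TreeMap keeps its keys unique and sorted
            Map<String, String> uniqueCats = new TreeMap<>();
            for (int i = 0; i < categories.getLength(); i++) {
                // Convert the current node to a Java String with string()
                String catName = xpath.evaluate("string(.)", categories.item(i));
                uniqueCats.put(catName, "");
            }
            return uniqueCats;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```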

Table 15-7. Attributes for JSTL <x:set>

| Attribute name | Java type | Dynamic value accepted | Description |
|---|---|---|---|
| select | String | No | Mandatory. An XPath expression to be evaluated. |
| var | String | No | Mandatory. The name of the variable to hold the result of the XPath expression. |
| scope | String | No | Optional. The scope for the variable; one of page, request, session, or application. page is the default. |

You can look at Example 15-6 to see how a <c:forEach> action is then used to loop over all map entries and use the key values to build the HTML select list.

Next we need to decide which stories to display. This is also accomplished with help from the <x:set> action:

<c:choose>
  <c:when test="${empty param.selCat || param.selCat == 'ALL'}">
    <x:set var="stories" select="$doc//story" />
  </c:when>
  <c:otherwise>
    <x:set var="stories" 
      select="$doc//story[category = $param:selCat]" />
  </c:otherwise>
</c:choose>

The JSTL core <c:choose> action with nested <c:when> and <c:otherwise> actions tests the value of the selCat request parameter. The first time the page is requested, this parameter is not present. In this case, as well as when it has the value ALL, an <x:set> action with an XPath expression that matches all <story> elements (and their subnodes) extracts the data to be displayed and saves it in a variable named stories.

If the user selects a specific category and clicks the Filter button, however, the selCat parameter is received with the request. In this case, another <x:set> action extracts only the <story> elements that match the selected category. It does this by using an XPath expression that contains a predicate with a Boolean expression:

$doc//story[category = $param:selCat]

XPath processes this expression by first collecting all nodes matching //story in the context represented by the doc variable, and then removing all nodes where the Boolean expression evaluates to false. In the Boolean expression, the text for the <category> element of each selected node is compared to the value represented by the $param:selCat variable: the selCat request parameter value.
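Here is a minimal Java sketch of the same selection logic, using the javax.xml.xpath API. It rests on a few assumptions that I should point out: the StoryFilterSketch class is a made-up name, the selected category is passed in as a method argument, and the variable is named $selCat without the param: prefix, because the prefixed form is something the JSTL actions resolve for you.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical demo: select all stories, or only those in one category. */
class StoryFilterSketch {
    static List<String> storyIds(String xml, String selCat) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Resolve $selCat to the selected category name
            xpath.setXPathVariableResolver(
                name -> "selCat".equals(name.getLocalPart()) ? selCat : null);

            boolean all = selCat == null || selCat.isEmpty() || "ALL".equals(selCat);
            String expression = all ? "//story" : "//story[category = $selCat]";

            NodeList stories = (NodeList) xpath.evaluate(expression,
                new InputSource(new StringReader(xml)), XPathConstants.NODESET);

            // Collect the id attribute of each selected story
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < stories.getLength(); i++) {
                ids.add(xpath.evaluate("@id", stories.item(i)));
            }
            return ids;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```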

The final part of the sample application loops over the selected nodes and generates an HTML table, with a light green background for the cells that contain stories in the General category:

<table>
  <x:forEach select="$stories">
    <tr>
      <x:choose>
        <x:when select="category[. = 'General']">
          <td bgcolor="lightgreen">
        </x:when>
        <x:otherwise>
          <td>
        </x:otherwise>
      </x:choose>
        <a href="<x:out select="link" />">
          <x:out select="title" /></a>
        <br>
        <i><x:out select="timestamp" /></i>:
        <b>Category:</b><x:out select="category" />,
        <b>Reported by:</b><x:out select="channel" />
        <br><x:out select="description" />
      </td>
    </tr>
  </x:forEach>
</table>

The <x:forEach> action is again used to loop through the set of nodes, but as opposed to when it was used to create the category name list, the current element is not exposed to the body as a variable. This illustrates another <x:forEach> feature, namely that the action adjusts the current XPath context seen by nested JSTL XML actions. When the body is evaluated, the current context for XPath expressions is the current node. An expression such as category[. = 'General'], used by the nested <x:when> action, is therefore evaluated in the context of the current story node, checking the value of its <category> element. The expression evaluates to true if the text in the <category> element equals the string "General". The <x:out> actions use similar XPath expressions to extract data from the current <story> element.
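The idea of a changing context node can be illustrated in plain Java as well. In the following minimal sketch, each <story> node is passed as the context for relative expressions, much like the nested actions see it inside the <x:forEach> body. The ContextNodeSketch class is a hypothetical name, and the generated markup is reduced to a single table cell per story.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical demo: relative XPath expressions evaluated per story node. */
class ContextNodeSketch {
    static List<String> cells(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList stories = (NodeList) xpath.evaluate("//story",
                new InputSource(new StringReader(xml)), XPathConstants.NODESET);

            List<String> cells = new ArrayList<>();
            for (int i = 0; i < stories.getLength(); i++) {
                Node story = stories.item(i);   // the context node for this pass
                boolean general = (Boolean) xpath.evaluate(
                    "category[. = 'General']", story, XPathConstants.BOOLEAN);
                String title = xpath.evaluate("title", story);
                String open = general ? "<td bgcolor=\"lightgreen\">" : "<td>";
                cells.add(open + title + "</td>");
            }
            return cells;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```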

The last part of Example 15-6 also illustrates the use of most of the conditional JSTL XML actions: <x:choose>, <x:when>, and <x:otherwise>. They have the same function as the corresponding JSTL core actions: <x:choose> groups a number of <x:when> actions and optionally one <x:otherwise> action. The body of the first <x:when> action with a select attribute that evaluates to true is processed, or the <x:otherwise> body if none of them do. Only the <x:when> action has attributes, described in Table 15-8.

Table 15-8. Attributes for JSTL <x:when>

| Attribute name | Java type | Dynamic value accepted | Description |
|---|---|---|---|
| select | String | No | Mandatory. An XPath expression to be evaluated as a Boolean. |

The result of the XPath expression in the select attribute is converted to a Boolean using the XPath boolean( ) function; any valid number except 0, a nonempty string, and an expression that matches at least one node is converted to true. All other values are converted to false. Note that this means that the string "false" evaluates to true.
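These conversion rules are easy to verify with the javax.xml.xpath API, which applies the same boolean( ) conversion when you ask for a Boolean result. The sketch below is only an illustration. The XPathBooleanSketch class and the tiny sample document are made up for the purpose.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical demo: how XPath results convert to Boolean values. */
class XPathBooleanSketch {
    static final String XML = "<meerkat><story/></meerkat>";

    /** Evaluates the expression and converts the result as boolean() would. */
    static boolean test(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Boolean) xpath.evaluate(expression,
                new InputSource(new StringReader(XML)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For instance, test("'false'") returns true because the string literal is nonempty, while test("0"), test("''"), and test("//missing") all return false.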

The only JSTL XML action I don't use in this example is <x:if>, described in Table 15-9.

Table 15-9. Attributes for JSTL <x:if>

| Attribute name | Java type | Dynamic value accepted | Description |
|---|---|---|---|
| select | String | No | Mandatory. An XPath expression to be evaluated as a Boolean value. |
| var | String | No | Optional. The name of the variable to hold the Boolean result. |
| scope | String | No | Optional. The scope for the variable; one of page, request, session, or application. page is the default. |

It works exactly like the corresponding action in the JSTL core library, except that the select attribute is evaluated as a Boolean using the XPath boolean( ) function, the same way as for <x:when>.

The examples in this chapter show how the JSTL XML actions let you process XML documents pretty much any way you can think of. You can transform a document using a stylesheet, parse and access parts of the document in many ways, save a part as a variable, or add it to the response. As illustrated by the examples in this chapter, you can mix the JSTL XML actions with the other JSTL actions (or custom actions) and use application variables and request data in XPath expressions to select parts based on runtime conditions.
