Sunday, November 15, 2009

EXPath Packaging System: the on-disk repository layout

While working on the implementation of the EXPath Packaging System for Calabash, I found myself writing, once again, a repository manager, this time dedicated to Calabash, exactly as I had done for Saxon one month earlier. Why? Both repositories provide the same features. It should therefore be possible to make Calabash and Saxon share the same repository, as long as Saxon simply ignores the components in that repository other than XSLT and XQuery (for instance XProc pipelines). One then has to maintain only a single repository for a whole computer (or one repository dedicated to a single project, like a Java EE application).

Going further, I think the layout of such an on-disk repository should be part of the packaging specification itself. An implementation does not have to use such a standard repository, but if it does, it does not have to worry about package installation, repository management software, or even the mechanism that resolves a component URI to the actual file containing that component. One repository layout, one set of tools for all those tasks.

This introduces a new concept: each kind of component (XSLT, XQuery, XML Schema, etc.) has its own URI space. For instance, when Saxon runs a transform, it resolves xsl:import URIs only in the XSLT space, while Calabash uses the right space for each step. The resolving machinery is based on OASIS XML Catalogs. The repository has a top-level catalog for each URI space.
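To make the idea concrete, here is a minimal sketch of what a catalog for the XSLT URI space could look like. It uses the standard uri element from OASIS XML Catalogs; the public URI and the file path are hypothetical, and the catalogs actually generated by the repository manager may be organized differently.

```xml
<!-- Hypothetical catalog for the XSLT URI space (illustration only). -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
   <!-- Map the public URI used in xsl:import to a local file.
        Both the URI and the relative path are made up for this example. -->
   <uri name="http://example.org/lib1/style.xsl"
        uri="path/to/lib1/style.xsl"/>
</catalog>
```

With such an entry, an xsl:import of http://example.org/lib1/style.xsl would be resolved to the local file. The same URI looked up in the XQuery space would not match, since each space has its own catalog.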

Seen as a whole, the repository is a set of subdirectories, one per installed package. Each package is unzipped exactly as it was created (with the exact same files and the exact same structure). One of those direct subdirectories is special: its name is .expath-pkg/ and it contains the catalogs and other administrative files. It can also contain config files dedicated to a specific processor; for instance, the extensions written in Java for Saxon need some config file to be stored there. There is one top-level catalog for each URI space in the repository and, for each package, one catalog for each URI space that package contains. The top-level catalogs just point to all the existing package-level catalogs.

repo/
   .expath-pkg/
      xquery-catalog.xml
      xslt-catalog.xml
      .saxon/
         ...        [Saxon-specific stuff at the repository level]
      lib1/
         xquery-catalog.xml
         xslt-catalog.xml
         saxon/
            ...     [Saxon-specific stuff in lib1]
      lib2/
         ...
   lib1/
      query.xq
      style.xsl
   lib2/
      ...
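
As a sketch of how a top-level catalog could point to the package-level ones in the layout above, the standard nextCatalog element from OASIS XML Catalogs is one option. This is an assumption made for illustration: the catalogs the repository manager really writes may differ.

```xml
<!-- Possible content of repo/.expath-pkg/xslt-catalog.xml (assumption). -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
   <!-- One entry per installed package that has XSLT components;
        the path is relative to this catalog file. Other packages
        would add their own nextCatalog entries here. -->
   <nextCatalog catalog="lib1/xslt-catalog.xml"/>
</catalog>
```

The package-level catalog, here repo/.expath-pkg/lib1/xslt-catalog.xml, would then hold the actual uri entries mapping each public stylesheet URI to a file under repo/lib1/, for instance ../../lib1/style.xsl.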

There is a specific project aimed only at managing such a repository. For now there is only a command-line interface, but there should be a graphical interface in the near future. The same project provides helpers that let other Java-based applications use repositories. For instance, the implementations for Saxon and Calabash use this JAR file to get resolving support for some URI spaces, based on Norman Walsh's resolver for XML Catalogs. It could then be used in applications like Kernow and oXygen, or even in eXist. The following steps set up the repository management application, Saxon and Calabash, to get a usable packaging system.

  • 1/ download expath-pkg-repo-0.1.jar. I created a shell script on my system so I can use it by typing just xrepo, but this is a plain JAR file that you can execute with java -jar expath-pkg-repo-0.1.jar. Hereafter I simply use xrepo to refer to this application.
  • 2/ set $EXPATH_REPO, for instance to ~/share/expath/repo or to /usr/local/share/expath/repo or to c:/expath/repo
  • 3/ initialize the repository with xrepo create $EXPATH_REPO
  • 4/ put the saxon and calabash scripts into your $PATH; they rely on the environment variables set in the next two steps
  • 5/ set SAXON_CP to the classpath required to execute Saxon; it must contain the following JARs: saxon9he.jar (or any other version), resolver.jar, expath-pkg-repo-0.1.jar and expath-pkg-saxon-0.2.jar
  • 6/ set CALABASH_CP to the classpath required to execute Calabash; it must contain the following JARs: my modified version of Calabash, saxon9he.jar (or any other 9.2 version), resolver.jar, expath-pkg-repo-0.1.jar, expath-pkg-saxon-0.2.jar and expath-pkg-calabash-0.1.jar
  • 4b/ instead of steps 4, 5 and 6 (for example if you do not have a Unix shell), you can just create a simple script with the appropriate classpath and Java command to launch Saxon, and another one for Calabash. The only drawback is that the JAR files for extensions written in Java for Saxon won't be picked up automatically from the repository

We are now going to test the EXPath HTTP Client, delivered as a XAR file. First, we create three test files: an XSLT stylesheet, an XQuery main module and an XProc pipeline. All three are simple and use the extension function http:send-request() to send an HTTP request to a website, get the result, and extract the HTML title (the pipeline does so by running the stylesheet). Save them somewhere as, say, http-client-test.xsl, http-client-test.xq and http-client-test.xproc:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
                xmlns:http="http://www.expath.org/mod/http-client"
                xmlns:h="http://www.w3.org/1999/xhtml"
                exclude-result-prefixes="http h"
                version="2.0">

   <xsl:import href="http://www.expath.org/mod/http-client.xsl"/>

   <xsl:template name="main">
      <xsl:variable name="request" as="element()">
         <http:request href="http://www.fgeorges.org/" method="get"/>
      </xsl:variable>
      <title>
         <xsl:value-of select="http:send-request($request)
                                 / h:html/h:head/h:title"/>
      </title>
   </xsl:template>

</xsl:stylesheet>

import module namespace http = "http://www.expath.org/mod/http-client";
declare namespace h = "http://www.w3.org/1999/xhtml";

http:send-request(
   <http:request href="http://www.fgeorges.org/" method="get"/>
)
  / h:html/h:head/h:title

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step">

   <p:input  port="source"/>
   <p:output port="result"/>

   <p:xslt template-name="main">
      <p:input port="stylesheet">
         <p:document href="http-client-test.xsl"/>
      </p:input>
      <p:input port="parameters">
         <p:empty/>
      </p:input>
   </p:xslt>

</p:declare-step>

If you try to evaluate those test files before installing the package, you will get errors from Saxon and Calabash (disclaimer: I rewrote the output of both processors, just to make it easier to read, but the meaning is intact):

$ saxon -xsl:http-client-test.xsl -it:main
File not found: http://www.expath.org/mod/http-client.xsl

$ saxon --xq http-client-test.xq
Cannot locate module for namespace http://www.expath.org/mod/http-client

$ calabash http-client-test.xproc
File not found: http://www.expath.org/mod/http-client.xsl

Now, install the package directly from the Internet (just press ENTER at both questions from the installer, to keep the default values), then try the test files again:

$ xrepo install http://www.cxan.org/tmp/expath-http-client-0.1.xar
Install module EXPath HTTP Client? [true]: 
Install it to dir [expath-http-client-0.1]: 

$ saxon -xsl:http-client-test.xsl -it:main
<title>Florent Georges</title>

$ saxon --xq http-client-test.xq
<title xmlns="http://www.w3.org/1999/xhtml">Florent Georges</title>

$ calabash http-client-test.xproc
<title>Florent Georges</title>

While I think the runtime support for packaging is best handled in each processor's internals, having a common repository layout (and actually shared repositories) could make it easier for processors to implement it and, above all, would allow a single set of independent applications to manage repositories and packages.

The next step is, finally, to release a new version of the specification, including this repository layout. See the EXPath Packaging page for more information, and subscribe to the EXPath mailing list to stay tuned.


EXPath Packaging System prototype implementation for Calabash

An interesting piece of code I worked on during the past few weeks is the implementation of the EXPath Packaging System for Norman Walsh's XProc processor, Calabash. It was interesting in itself, as a coding experience, but also for the still-in-development packaging system, as XProc gives access to all the core XML technologies within a single language. Implementing the packaging system for Calabash therefore meant implementing it for RNC, RNG, Schematron, XProc (for XProc pipelines themselves), XQuery, XSD and XSLT. This was enlightening about the relationships between those technologies, and a proof of concept that packaging can be applied to all of them.

Unfortunately, Calabash does not provide any way for the user to fine-tune the configuration of the underlying processors (for instance Saxon for XSLT, Jing for RNG, etc.). So I first needed to add this feature to Calabash itself. Instead of plugging the EXPath stuff directly into the Calabash code base, I decided to add only a simple API that lets an external user plug configuration code into Calabash. I hope Norm will agree to integrate such changes into Calabash, so that the packaging support can be written entirely outside of the Calabash code base at first (and maybe be included in Calabash later). In the meantime, you can just use an alternative JAR file for Calabash that includes my changes (it is based on the latest Subversion revision, so this is really beta stuff; some classes have also been disabled, due to dependency issues). You can also have a look at the following email on XProc Dev with explanations on how to patch the Calabash code base.

To install the packaging support for Calabash, you need to put the following JAR files into your classpath: my modified Calabash JAR file, the EXPath repository management JAR, the EXPath packaging support for Calabash and the EXPath packaging support for Saxon. Then run Calabash the usual way, except that you also set the Java property org.expath.pkg.calabash.repo to the location of the repository you want to use. For repository management, please see the next blog entry I will post here...

If you are on Unix (including Linux, Mac OS X, or Cygwin under Windows) you can use this shell script to launch Calabash from the command line. Just set the environment variable CALABASH_CP to the above classpath, and EXPATH_REPO to the repository directory. In addition to setting Calabash up, the script will also add the JAR files with extensions for Saxon to the classpath.

To test whether the installation is OK, install this sample package (wait for the next blog entry for details about installing a package into the repository) and save the following pipeline in a file, say invoice-test.xproc:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:i="http://www.fgeorges.org/test/invoice-steps">

   <p:import href="http://www.fgeorges.org/test/invoice.xpl"/>

   <p:output port="result"/>

   <p:input port="source">
      <p:inline>
         <invoice xmlns="http://www.fgeorges.org/test/invoice"
                  date="2009-10-12">
            <line price="15" quantity="10" unitary="1.5">
               <desc>Some stuff.</desc>
            </line>
            <line price="100">
               <desc>Bigger stuff.</desc>
            </line>
            <total tax-excl="115" tax-incl="139.15"/>
         </invoice>
      </p:inline>
   </p:input>

   <i:validate/>

   <i:transform/>

</p:declare-step>

Then run it as described above. If you saved the shell script under the name "calabash" in your $PATH, just type:

calabash invoice-test.xproc

And that's all! See the EXPath Packaging page for more information, and subscribe to the EXPath mailing list to stay tuned.
