How does an XML parser locate the DTD?

Below is an example of the first line of a JSP Tag Library Descriptor (TLD):

<!DOCTYPE taglib PUBLIC "-//Sun Microsystems, Inc.//DTD JSP Tag Library 1.1//EN" "web-jsptaglib_1_1.dtd">

Note that the declaration does not give the full URL of the DTD, such as “http://abc.com/xyz/web-jsptaglib_1_1.dtd”. So how does the XML parser locate the DTD? And even when the full URI is specified, does the parser always fetch the DTD from the web server?

Short Answer:
In Tomcat, the XML parser resolves the public identifier “-//Sun Microsystems, Inc.//DTD JSP Tag Library 1.1//EN” to the resource javax/servlet/jsp/resources/web-jsptaglibrary_1_1.dtd, which is loaded from jsp-api.jar.

Long Answer:
DOCTYPE declaration format:
<!DOCTYPE rootElementName PUBLIC "publicIdentifier" "systemIdentifier">
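
To make the three parts concrete, here is a rough bash sketch that splits a one-line PUBLIC DOCTYPE declaration into them using the shell's built-in regular expression matching. The function name parse_doctype is hypothetical, and this is only an illustration, not an XML parser: it ignores the SYSTEM form, internal subsets and single-quoted literals.

```shell
#!/usr/bin/env bash
# parse_doctype (hypothetical helper): split a one-line declaration of the form
#   <!DOCTYPE root PUBLIC "publicId" "systemId">
# into its three parts. Illustration only; not a real XML parser.
parse_doctype() {
    local line=$1
    local re='^<!DOCTYPE[[:space:]]+([^[:space:]]+)[[:space:]]+PUBLIC[[:space:]]+"([^"]*)"[[:space:]]+"([^"]*)"[[:space:]]*>'
    if [[ $line =~ $re ]]; then
        # BASH_REMATCH holds the capture groups of the last successful match
        printf 'rootElementName=%s\n' "${BASH_REMATCH[1]}"
        printf 'publicIdentifier=%s\n' "${BASH_REMATCH[2]}"
        printf 'systemIdentifier=%s\n' "${BASH_REMATCH[3]}"
    else
        echo "not a one-line PUBLIC DOCTYPE declaration" >&2
        return 1
    fi
}
```

Running it on the TLD line above prints taglib, the Sun public identifier and web-jsptaglib_1_1.dtd as the three fields.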

For the DOCTYPE declaration in the TLD above:

  • rootElementName=taglib
  • publicIdentifier="-//Sun Microsystems, Inc.//DTD JSP Tag Library 1.1//EN"
  • systemIdentifier="web-jsptaglib_1_1.dtd"
  • In Tomcat, DigesterFactory.java (in catalina.jar) registers the public identifier “-//Sun Microsystems, Inc.//DTD JSP Tag Library 1.1//EN” with the resource URL “/javax/servlet/jsp/resources/web-jsptaglibrary_1_1.dtd” on its entity resolver, SchemaResolver, which implements org.xml.sax.EntityResolver.

  • From the Javadoc of org.apache.commons.digester.Digester#register():

    Digester contains an internal EntityResolver implementation. This maps PUBLICID’s to URLs (from which the resource will be loaded). A common use case for this method is to register local URLs (possibly computed at runtime by a classloader) for DTDs. This allows the performance advantage of using a local version without having to ensure every SYSTEM URI on every processed xml document is local. This implementation provides only basic functionality. If more sophisticated features are required, using setEntityResolver(org.xml.sax.EntityResolver) to set a custom resolver is recommended.

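  • Digester.java (in tomcat-coyote.jar) sets this entity resolver on the XMLReader.
  • While parsing, the XMLReader passes each external entity's public and system identifiers to the entity resolver. For a registered public identifier, the resolver returns the local resource, so the DTD is not fetched from the web server. If the resolver does not recognize the entity and returns null, the parser falls back to its default behavior of opening the system identifier itself.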
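
The lookup described above boils down to a table from public identifiers to local resources, with the system identifier as the fallback. The bash sketch below mimics that behavior with an associative array (bash 4 or later). The names register_dtd, resolve_entity and DTD_REGISTRY are hypothetical; this is an analogy for illustration, not Tomcat's or Digester's code.

```shell
#!/usr/bin/env bash
# Hypothetical registry mapping PUBLIC identifiers to local resource URLs,
# analogous to what Digester#register() maintains internally.
declare -A DTD_REGISTRY=()

# register_dtd PUBLIC_ID URL: remember where the local copy of a DTD lives
register_dtd() {
    DTD_REGISTRY["$1"]=$2
}

# resolve_entity PUBLIC_ID SYSTEM_ID: print the registered local resource if
# PUBLIC_ID is known; otherwise print SYSTEM_ID, mirroring a SAX EntityResolver
# that returns null so the parser opens the system identifier itself.
resolve_entity() {
    local public_id=$1 system_id=$2
    if [[ -n $public_id && -n ${DTD_REGISTRY[$public_id]+set} ]]; then
        printf '%s\n' "${DTD_REGISTRY[$public_id]}"
    else
        printf '%s\n' "$system_id"
    fi
}

# Registration similar in spirit to the one described for DigesterFactory
register_dtd "-//Sun Microsystems, Inc.//DTD JSP Tag Library 1.1//EN" \
    "/javax/servlet/jsp/resources/web-jsptaglibrary_1_1.dtd"
```

With this registration, resolving the TLD's identifiers yields the local /javax/servlet/jsp/resources/web-jsptaglibrary_1_1.dtd path, while an unregistered public identifier falls through to its system identifier.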


Bash Tips and Tricks #1

Below are a few aliases that I use almost every day at work:

  • Navigating directories
    alias ..='cd ..'
    alias ...='cd ../..'
    alias ....='cd ../../..'
    alias .....='cd ../../../..'
    

    With these aliases in place, use “..” to navigate one level up, “...” to navigate two levels up, and so on.

  • Navigating to project-related directories
    alias cdtomcat='cd "/cygdrive/c/appservers/tomcat"'
    alias cdlogs='cd "/cygdrive/c/appservers/tomcat/logs"'
    alias cdws='cd "/cygdrive/c/satish/eclipse/workspace"'
    

    To navigate to the Tomcat logs directory, type “cdlogs” at the command prompt; to navigate to the project workspace, type “cdws”. You get the idea, right?

  • Converting a Cygwin directory path into Windows format
    alias cpwd='echo "$PWD" | sed -e "s/^\/cygdrive\/c/c:/" -e "s/\//\\\/g"'
    

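    For example, when the current directory is /cygdrive/c/Satish/Software, cpwd prints c:\Satish\Software. Note that it only handles paths under /cygdrive/c.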

  • Opening Windows Explorer
    alias winexp='explorer.exe `cpwd`'
    

    Typing “winexp” at the command prompt opens Windows Explorer in the current working directory. This alias relies on the cpwd alias set up above.

  • The cd command in Cygwin expects a directory in Cygwin path format. For example, to navigate to C:\Satish\Software, the command is “cd /cygdrive/c/Satish/Software”. Below is a tiny function that changes to a directory given its path in Windows format (e.g. c:\Satish\Software):
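    function wincd(){
    	# Turn backslashes into forward slashes (C:\Satish -> C:/Satish);
    	# Cygwin's cd also accepts DOS-style paths written with forward slashes
    	local dir
    	dir=$(printf '%s\n' "$1" | sed -e 's/\\/\//g')
    	cd "$dir"
    }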
    

    The command wincd “C:\Satish\Software” changes the working directory to /cygdrive/c/Satish/Software. Note that quotes around the path are required: without them, the shell removes the backslashes before wincd receives the argument.
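
The dot aliases for navigating directories stop at a fixed depth. A small function can take the number of levels as an argument instead; because a shell function runs in the current shell (unlike a script), its cd changes the caller's working directory. The name up is a hypothetical choice, not a standard command.

```shell
#!/usr/bin/env bash
# up [N]: move N directory levels up (default 1). "up" is a hypothetical name.
up() {
    local levels=${1:-1}
    # Accept only positive integers
    if ! [[ $levels =~ ^[1-9][0-9]*$ ]]; then
        echo "usage: up [N]  (N >= 1)" >&2
        return 1
    fi
    # Build "..", "../..", "../../..", ... with one ".." per level
    local path=".."
    local i
    for ((i = 1; i < levels; i++)); do
        path+="/.."
    done
    cd "$path"
}
```

For example, “up 3” does the same as the “....” alias.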
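
The cpwd alias is tied to the C: drive and to the current directory. A slightly more general sketch, written as a function so it can take an optional path argument, is shown below. The name winpath is hypothetical, it still only understands paths under /cygdrive/<drive>, and on Cygwin the bundled cygpath -w utility is the more robust choice.

```shell
#!/usr/bin/env bash
# winpath [PATH]: print a Cygwin path (default: current directory) in Windows
# form, e.g. /cygdrive/d/Projects -> d:\Projects. Hypothetical helper; it only
# handles paths under /cygdrive/<drive>, as cpwd only handles /cygdrive/c.
winpath() {
    local p=${1:-$PWD}
    printf '%s\n' "$p" | sed \
        -e 's|^/cygdrive/\([a-zA-Z]\)|\1:|' \
        -e 's|^\([a-zA-Z]:\)$|\1/|' \
        -e 's|/|\\|g'
    # 1st expression: /cygdrive/x -> x:
    # 2nd expression: a bare drive "x:" becomes "x:/" so it ends up as "x:\"
    # 3rd expression: every forward slash becomes a backslash
}
```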