How does an XML parser locate a DTD?

Below is the first line of an example JSP Tag Library Descriptor (TLD):

<!DOCTYPE taglib PUBLIC "-//Sun Microsystems, Inc.//DTD JSP Tag Library 1.1//EN" "web-jsptaglib_1_1.dtd">

Note that the declaration does not give a full path to the DTD, such as "http://abc.com/xyz/web-jsptaglib_1_1.dtd". So how does the XML parser locate the DTD? And even when a full URI is specified, does the parser always fetch the DTD from the web server?

Short Answer:
In Tomcat, the XML parser resolves the public identifier "-//Sun Microsystems, Inc.//DTD JSP Tag Library 1.1//EN" to the resource javax/servlet/jsp/resources/web-jsptaglibrary_1_1.dtd. This DTD is loaded from jsp-api.jar, not from a web server.

Long Answer:
Doctype declaration format:
<!DOCTYPE rootElementName PUBLIC "publicIdentifier" "systemIdentifier">
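To see how a parser splits a declaration into these three parts, here is a minimal sketch using the DOM parser that ships with the JDK. The class name DoctypeParts and the method read are hypothetical names chosen for illustration. The sketch hands the parser an empty DTD for every external entity, an assumption made only so that nothing is fetched while the DOCTYPE is inspected.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Tomcat or the JDK.
class DoctypeParts {

    /** Parses xml and returns {rootElementName, publicIdentifier, systemIdentifier}. */
    static String[] read(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Answer every external-entity request with an empty DTD, so the
            // parser never goes to disk or the network in this sketch.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            DocumentType doctype =
                    builder.parse(new InputSource(new StringReader(xml))).getDoctype();
            return new String[] {
                doctype.getName(), doctype.getPublicId(), doctype.getSystemId()
            };
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE taglib PUBLIC "
                + "\"-//Sun Microsystems, Inc.//DTD JSP Tag Library 1.1//EN\" "
                + "\"web-jsptaglib_1_1.dtd\"><taglib/>";
        // Prints the root element name, then the public and system identifiers.
        for (String part : read(xml)) {
            System.out.println(part);
        }
    }
}
```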

For the DOCTYPE declaration in TLD:

  • rootElementName=taglib
  • publicIdentifier="-//Sun Microsystems, Inc.//DTD JSP Tag Library 1.1//EN"
  • systemIdentifier="web-jsptaglib_1_1.dtd"
  • The public identifier "-//Sun Microsystems, Inc.//DTD JSP Tag Library 1.1//EN" is registered against the resource URL "/javax/servlet/jsp/resources/web-jsptaglibrary_1_1.dtd" through an EntityResolver (SchemaResolver implements EntityResolver) in DigesterFactory.java in catalina.jar.

  • From the Javadoc of the method org.apache.commons.digester.Digester#register():

    Digester contains an internal EntityResolver implementation. This maps PUBLICID’s to URLs (from which the resource will be loaded). A common use case for this method is to register local URLs (possibly computed at runtime by a classloader) for DTDs. This allows the performance advantage of using a local version without having to ensure every SYSTEM URI on every processed xml document is local. This implementation provides only basic functionality. If more sophisticated features are required, using setEntityResolver(org.xml.sax.EntityResolver) to set a custom resolver is recommended.

  • Digester.java in tomcat-coyote.jar sets this entity resolver on the XMLReader.
  • The XMLReader then uses the entity resolver to resolve the registered public identifiers to resource URIs.
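The mechanism described above can be sketched with the standard org.xml.sax.EntityResolver interface. This is not Tomcat's SchemaResolver, only a simplified illustration of the same idea: the class LocalDtdResolver, its register method, and the one-line toy DTD are all assumptions made for this example. Where Tomcat maps the public identifier to a resource URL inside a jar, the sketch maps it to an in-memory string so that it runs without any extra files.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver for illustration; Tomcat's real one is SchemaResolver.
class LocalDtdResolver implements EntityResolver {
    static final String TLD_PUBLIC_ID =
            "-//Sun Microsystems, Inc.//DTD JSP Tag Library 1.1//EN";
    // Toy stand-in for the real DTD, which Tomcat loads from a jar.
    static final String TOY_DTD = "<!ELEMENT taglib EMPTY>";
    static final String SAMPLE_TLD = "<!DOCTYPE taglib PUBLIC \"" + TLD_PUBLIC_ID
            + "\" \"web-jsptaglib_1_1.dtd\"><taglib/>";

    // Registry: public identifier -> local DTD text.
    private final Map<String, String> dtdByPublicId = new HashMap<>();
    private String lastResolvedPublicId;

    void register(String publicId, String dtdText) {
        dtdByPublicId.put(publicId, dtdText);
    }

    String getLastResolvedPublicId() {
        return lastResolvedPublicId;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String dtdText = dtdByPublicId.get(publicId);
        if (dtdText == null) {
            // Not registered: returning null tells the parser to fall back
            // to its default behaviour and open the system identifier.
            return null;
        }
        lastResolvedPublicId = publicId;
        InputSource source = new InputSource(new StringReader(dtdText));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }

    /** Parses xml with the given resolver installed on the parser. */
    static Document parse(String xml, EntityResolver resolver) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(resolver);
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        LocalDtdResolver resolver = new LocalDtdResolver();
        resolver.register(TLD_PUBLIC_ID, TOY_DTD);
        Document doc = parse(SAMPLE_TLD, resolver);
        System.out.println("root element: " + doc.getDocumentElement().getNodeName());
        System.out.println("DTD served locally for: " + resolver.getLastResolvedPublicId());
    }
}
```

Because the lookup is keyed on the public identifier, the system identifier in the document can be a bare file name or an unreachable URL and the parse can still succeed. For an unregistered identifier the resolver returns null, and the parser then tries to open the system identifier itself.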
