Using XML Schema and
Namespaces in XML
v1, March 13, 2007

AUTHOR:
  Peter Lacey
    placey@burtongroup.com

TECHNOLOGY THREAD:
  Data Access Strategies
Publishing Information
     Burton Group is a research and consulting firm specializing in network and applications
     infrastructure technologies. Burton works to catalyze change and progress in the network computing
     industry through interaction with leading vendors and users. Publication headquarters, marketing,
     and sales offices are located at:

     Burton Group
     7090 Union Park Center, Suite 200
     Midvale, Utah USA 84047-4169
     Phone: +1.801.566.2880
     Fax: +1.801.566.3611
     Toll free in the USA: 800.824.9924
     Internet: info@burtongroup.com; www.burtongroup.com

     Copyright 2007 Burton Group. ISSN 1048-4620. All rights reserved. All product, technology and
     service names are trademarks or service marks of their respective owners.

     Terms of Use: Burton customers can freely copy and print this document for their internal use. Customers
     can also excerpt material from this document provided that they label the document as “Proprietary and
     Confidential” and add the following notice in the document: “Copyright © 2007 Burton Group. Used
     with the permission of the copyright holder. Contains previously developed intellectual property and
     methodologies to which Burton Group retains rights. For internal customer use only.”

     Requests from non-clients of Burton for permission to reprint or distribute should be addressed to
     the Marketing Department at +1.801.304.8119.

     Burton Group’s Application Platform Strategies service provides objective analysis of networking
     technology, market trends, vendor strategies, and related products. The information in Burton’s
     Application Platform Strategies service is gathered from reliable sources and is prepared by
     experienced analysts, but it cannot be considered infallible. The opinions expressed are based on
     judgments made at the time, and are subject to change. Burton offers no warranty, either expressed
     or implied, on the information in Burton’s Application Platform Strategies service, and accepts no
     responsibility for errors resulting from its use.

     If you do not have a license to Burton’s Application Platform Strategies service and are interested in
     receiving information about becoming a subscriber, please contact Burton.

BURTON GROUP 7090 Union Park Center · Suite 200 · Midvale, Utah · 84047 · P 801.566.2880 · F 801.566.3611 · www.burtongroup.com
Table of Contents
Foreword
  Audience
  Scope
Introduction
Understanding Namespaces in XML
  The Target Namespace
  Qualifying Elements and Attributes
  Choosing a Name for a Namespace
     Using URNs for Namespace Names
     Using URLs for Namespace Names
  Versioning
  Best Practices: Namespaces and Versioning
     Namespaces
     Versioning
Creating XML Schema Documents
  Structuring an XML Schema Document
     Elements
       Global Elements
       Local Elements
       Element Reuse
       Element Groups
     Attributes
     Types
       Custom Types
          Simple Types
          Complex Types
       Named and Anonymous Types
     Best Practices: Document Structure
       Elements vs. Types
       Elements vs. Attributes
     Best Practices: Naming Conventions
  Modularity
     Include
     Import
     Redefine
     Best Practices: Modularity
  Extensibility
     Extending Constructs Using the “Any” Element
     The anyType Type
     Extensibility via Composite Schemas
     Best Practices: Extensibility
  Character Encoding
  Schema Documentation
     Annotation
     XML-Style Comments
     Notation
     Best Practices: Schema Documentation
  Content Modeling
     Element Cardinality
     Empty, Null, and Missing
     Simple Types
       Facets (Restricted Simple Types)
       List Types
       Union Types
       Best Practices: Simple Types
     Complex Types
       Element Content
       Mixed Content
       Best Practices: Complex Types
     Derived Types
       Derived by Extension
       Derived by Restriction
       Final
       Abstract Types
       Best Practices: Derived Types
     Other Content Modeling Constructs
       Substitution Groups
       Block
       Default or Fixed Values
       Best Practices: Substitution Groups, Block, and Fixed Values
     Identity Constraints
       Unique
       Key/Keyref
       Best Practices: Identity Constraints
Notes

Foreword
  Namespaces in Extensible Markup Language (XML) is a specification published by the World Wide
  Web Consortium (W3C) that addresses the potential for naming collisions between like-named
  elements in composite XML instance documents. XML Schema is a specification, also from the W3C,
  that “provide[s] a means for defining the structure, content and semantics of XML documents.”¹ The
  Namespaces in XML specification (from here on referred to by its more common name, “XML
  namespaces”) predates, and is thus orthogonal to, the XML Schema specification. The XML Schema
  specification, on the other hand, is tightly entwined with XML namespaces, and one cannot be fully
  understood without the other.

  Together these two specifications underpin most XML-oriented technologies. XML namespaces is
  used seemingly everywhere and XML Schema is in broad usage. Of particular interest is that XML
  Schema is core to document-oriented, SOAP-based web services and to the web services framework
  (WSF). And, because the WSF underpins nearly all service oriented architecture (SOA) efforts, XML
  Schema is a critical component of SOA.

  But XML Schema is a large and complex specification, so much so that no single piece of software
  implements it in its entirety. Furthermore, ambiguities in the specification can lead to different
  interpretations in different implementations. And, of course, outright errors appear in XML Schema
  implementations, as well. It is therefore all too easy to create an XML Schema document that works
  perfectly in one implementation but differently or not at all in another.

  Problems with the XML Schema specification and its implementations are not the only source of
  concern. Many areas of XML and XML Schema usage can benefit from consistently applied best
  practices—areas such as the use of namespaces, extensibility, modularity, versioning, and the use of
  attributes. The fact is that there are many ways to accomplish a particular goal in XML Schema, not
  all of which are equally viable. Even the XML namespaces specification, as short and concise as it is,
  routinely trips up experienced developers.

Audience
  The best practices contained herein provide a set of guidelines for designing XML Schema
  documents that are effective, interoperable, and extensible. The target audiences for this information
  are enterprise and application architects and software developers. While it contains much
  explanatory content, this Methodologies and Best Practices (MBP) document is not a tutorial; it
  assumes at least a passing familiarity with XML namespaces and XML Schema.

  Furthermore, this MBP document assumes that, within the reader’s organization, at least one group
  (and possibly more) has responsibility for creating, maintaining, and retiring XML Schema
  documents within a particular namespace. And, finally, it assumes that governance and incentives are
  in place to ensure that users (developers) employ these corporate schemas rather than creating new
  schemas that model the same business entities. As will be shown, corporate schemas should only
  model data types. Corporate schemas should not allow for the direct creation of instance documents
  (i.e., they should have no globally declared elements). A need will always exist for developers to
  create actionable schemas, but such schemas should be derived from corporate data types.

Scope
     This MBP document covers those XML Schema features useful for modeling strongly typed data and
     describing XML-based messages. It does not provide best practices for using XML Schema to
     describe “textual” content (such as Extensible Hypertext Markup Language [XHTML] or Atom). In
     fact, the best practices for doing so are very different from those presented here. (It’s also true that
     XML Schema is not particularly well suited for representing non-data-oriented documents—
     consider RELAX NG or Schematron instead.)

     Therefore, this MBP document assumes that the primary use case of XML Schema is to describe data
     and not text, and that the schemas will be used by developer tools to:

            •    Create code artifacts in a given programming language that map to the schema-described
                 data elements
            •    Provide the information necessary to serialize native language data types to XML and
                 back again, whether at development time or runtime
            •    Provide a description of data as it exists “on the wire”
            •    Provide for the automatic validation of XML instance documents

     Many interoperability concerns presented here apply to XML-based messaging systems, such as
     SOAP web services and the standard artifacts (e.g., Web Services Description Language [WSDL]
     documents) and processing engines (e.g., web service platforms [WSPs]) that implement them. These
     interoperability concerns may or may not be applicable to developers who process XML directly
     using the Document Object Model (DOM) or Simple API for XML (SAX) application programming
     interfaces (APIs). However, if developers are planning to migrate applications to WSDL-described
     web services and support a variety of XML binding tools, these recommendations will help to
     transition those applications with minimal effort.

Introduction
     In XML Schema there are many ways of accomplishing a particular goal. However, the variations in
     approach are not simply a matter of syntax. Instead, choosing one mechanism over another can have
     significant ramifications regarding interoperability, reusability, maintainability, and how a schema
     document constrains an instance document. In order to make the most effective use of XML Schema,
     a schema author should understand the consequences of the choices made in designing that schema,
     and that’s what this MBP document provides: best practices for using XML Schema.

  This MBP document assumes a basic familiarity with XML Schema; that is, the reader knows how to
  create a schema document that describes types, elements, and attributes. Even so, this MBP document
  describes many aspects of XML Schema that the reader is probably already familiar with. The
  purpose of this review is to explain the many possible ways that data can be described with XML
  Schema, in order to help the reader choose which, if any, is best. The goals of this MBP document are
  to steer the XML Schema designer away from the specification’s many potential pitfalls regarding
  real-world usage and to identify some of the more esoteric components of XML Schema.

Understanding Namespaces in XML
  The W3C released the first working draft of XML namespaces in March 1998, just one month after
  the core XML specification reached recommendation (1.0) status. Although XML namespaces
  predates the first working draft of XML Schema by more than one year, the XML namespaces
  working drafts anticipated the need for namespace-aware schemas by noting, “We envision
  applications of Extensible Markup Language where a document contains markup defined in
  multiple schemas.” At the time of that writing, the standard schema language for XML was
  Document Type Definition (DTD), which, being an integral component of the XML specification,
  was namespace unaware.

  The most recent version of the specification is the Namespaces in XML 1.0 (Second Edition), released
  in August 2006. (A 1.1 version of XML namespaces also exists that corresponds with the 1.1 version
  of XML. It primarily adds support for internationalized resource identifiers [IRIs], which allow
  Unicode characters in resource identifiers [e.g., http://société.com].)

  Though narrowly scoped and remarkably brief, the Namespaces in XML specification still trips up
  even experienced developers. The specification describes a way of qualifying element and attribute
  names from one schema so that they can be made unique in the presence of like-named elements and
  attributes in another schema.

  For example, imagine that a program, “A,” could process an XML document containing a list of
  media (e.g., books and movies). This program takes all elements named Title and prints out the
  contents. Thus, when processing the following document:

      The Sign of the Four
      Sir Arthur Conan Doyle
    
      The Hound of the Baskervilles
      Sir Arthur Conan Doyle
    
Program A would generate the following output:

    The Sign of the Four
    The Hound of the Baskervilles
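
The behavior of program A can be sketched in a few lines of Python using the standard xml.etree.ElementTree module. This is an illustration only: the wrapper element names Media and Book are assumptions, since the text names only the Title and Author elements, and program_a is a hypothetical stand-in for the program described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the wrapper names Media and Book are assumptions;
# only Title and Author are named in the text.
MEDIA_XML = """
<Media>
  <Book>
    <Title>The Sign of the Four</Title>
    <Author>Sir Arthur Conan Doyle</Author>
  </Book>
  <Book>
    <Title>The Hound of the Baskervilles</Title>
    <Author>Sir Arthur Conan Doyle</Author>
  </Book>
</Media>
"""


def program_a(xml_text):
    """Return the text of every element named Title, at any depth."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter("Title")]


titles = program_a(MEDIA_XML)
print("\n".join(titles))
```

Because the program matches on the bare name Title, it has no way of telling whose Title it is looking at, which is the weakness the following paragraphs explore.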

    In this case, the Title element can be said to be in program A’s namespace (even though no schema
    exists). Now imagine a similar schemaless program, “B,” that processes XML documents containing
    print media (e.g., books and magazines) and prints out all Author elements. However, program B
    anticipates that Author elements are composed of nested elements as follows:

    Program B has essentially made the Title element part of its namespace. If a user wanted to use both
    programs, he or she might format a document as follows:

        The Sign of the Four
        
          Sir
          Arthur Conan
          Doyle
        
      …
    
    The result of running this document through program A would be erroneous if the program were
    looking for any instance of the Title element. The result of running it through program B is
    undefined. The two distinct instances of the Title element have collided and thrown both programs
    off the rails.
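
The collision can be reproduced with a short Python sketch. The element names Media, Book, FirstName, and LastName are assumptions made for illustration; only Title and Author come from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical combined document. Element names other than Title and Author
# (Media, Book, FirstName, LastName) are assumptions for illustration.
COMBINED_XML = """
<Media>
  <Book>
    <Title>The Sign of the Four</Title>
    <Author>
      <Title>Sir</Title>
      <FirstName>Arthur Conan</FirstName>
      <LastName>Doyle</LastName>
    </Author>
  </Book>
</Media>
"""

root = ET.fromstring(COMBINED_XML)

# Program A collects every Title, wherever it appears in the document,
# so the author's honorific is reported as if it were a media title.
titles_seen_by_a = [element.text for element in root.iter("Title")]
print(titles_seen_by_a)
```

Program A now reports the honorific alongside the book title, because nothing in the markup distinguishes one Title from the other.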

    XML namespaces addresses this issue of naming collisions. It does so by allowing a collection of
    element and attribute names to be associated with a Uniform Resource Identifier (URI), for that URI
    to be associated with an arbitrary prefix, and for that prefix to be prepended to all names from that
    namespace.

    If program A were using the namespace URI http://company.com/schemas/allmedia, and
    program B were using the namespace URI urn:example.com:schemas:printmedia, then the
    above instance document could be rewritten as follows:

        The Sign of the Four
      
        Sir
        Arthur Conan
        Doyle
      
    …
  
  Each program could then be rewritten to process only those elements associated with its own
  namespace, thus solving the problem of naming collisions.
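
A minimal Python sketch of namespace-aware processing follows. The two namespace URIs are the ones given above; the prefixes am and pm and the element names other than Title and Author are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

ALLMEDIA_NS = "http://company.com/schemas/allmedia"
PRINTMEDIA_NS = "urn:example.com:schemas:printmedia"

# The prefixes am/pm and the names Media, Book, FirstName, and LastName
# are assumptions; the namespace URIs come from the text.
QUALIFIED_XML = f"""
<am:Media xmlns:am="{ALLMEDIA_NS}" xmlns:pm="{PRINTMEDIA_NS}">
  <am:Book>
    <am:Title>The Sign of the Four</am:Title>
    <pm:Author>
      <pm:Title>Sir</pm:Title>
      <pm:FirstName>Arthur Conan</pm:FirstName>
      <pm:LastName>Doyle</pm:LastName>
    </pm:Author>
  </am:Book>
</am:Media>
"""

root = ET.fromstring(QUALIFIED_XML)

# ElementTree spells a qualified name as "{namespace-uri}local-name",
# so each program can ask for Title in its own namespace only.
titles_for_a = [el.text for el in root.iter(f"{{{ALLMEDIA_NS}}}Title")]
titles_for_b = [el.text for el in root.iter(f"{{{PRINTMEDIA_NS}}}Title")]

print(titles_for_a)
print(titles_for_b)
```

Each lookup matches on the namespace URI rather than the prefix, so the two Title elements no longer interfere with one another.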

  Note that the XML namespaces specification does not define a means for declaring that one owns a
  particular namespace nor what entities are in it. In fact, someone can just state by fiat (as is done here)
  that the elements X, Y, and Z are in the namespace http://me.my.mine.com, and that would be just
  fine as far as the XML namespaces specification is concerned. This, of course, does not lend itself well to
  automation or ease of use. It would be far better to have a means of explicitly stating namespace
  names and the entities within them.

The Target Namespace
  An XML Schema document describes a vocabulary. For instance, a schema document may describe a
  book as follows:
  
  In this instance, the vocabulary consists of the parent Book element and, within that, the elements
  Title and Author. The Book element is referred to as a global element because its definition is a
  direct child of the schema element. Global elements can live on their own in an instance document (a
  document conforming to and constrained by a particular schema). The other elements, referred to as
  local elements, cannot exist outside the scope of the Book element.
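
The distinction between global and local elements can be sketched in Python. The schema below is an assumption reconstructed from the description (a Book element containing Title and Author, both typed here as xsd:string for simplicity); it is not necessarily the author's listing.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema matching the description: a global Book element
# with local Title and Author children. The string types are assumed.
BOOK_XSD = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""

schema = ET.fromstring(BOOK_XSD)
element_tag = f"{{{XSD_NS}}}element"

# Global declarations are direct children of the schema element.
global_decls = schema.findall(element_tag)

# Every other element declaration is nested somewhere below, hence local.
local_decls = [el for el in schema.iter(element_tag) if el not in global_decls]

global_names = [el.get("name") for el in global_decls]
local_names = [el.get("name") for el in local_decls]
print(global_names, local_names)
```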

  Other schema documents describe other vocabularies to meet the needs of their audiences. It is
  entirely likely that some other schema document also defines an element called Author (perhaps for
  describing magazine articles). Because instance documents can make use of multiple schemas, an
  XML processor might not be able to tell that one instance of the Author element is
  derived from vocabulary 1 and another is derived from vocabulary 2.

      XML namespaces is the means for preventing these naming collisions. XML namespaces makes sure
      that names in one XML vocabulary don’t conflict with names in another XML vocabulary. In many
      ways, XML namespaces are similar to other namespace mechanisms, such as packages in Java or
      Domain Name System (DNS) for the Internet.

      To prevent naming collisions, it is necessary to assign each vocabulary a unique namespace in the
      form of a URI. To do this in XML Schema, use the targetNamespace attribute of the root schema
      element, as shown here:
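
As a hedged sketch, the following Python snippet embeds a schema that declares a target namespace and reads the declaration back. The namespace URI urn:example.com:schemas:book is the one this document uses for the Book vocabulary; the body of the schema is assumed for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: the targetNamespace value comes from the text,
# the element declarations are assumed.
BOOK_XSD = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="urn:example.com:schemas:book">
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""

schema = ET.fromstring(BOOK_XSD)

# targetNamespace is an ordinary, unqualified attribute of the root element.
target_ns = schema.get("targetNamespace")
print(target_ns)
```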

Qualifying Elements and Attributes
      When a schema declares a target namespace, all globally defined elements, attributes, and types
      become members of that namespace. Globally defined components are those that are immediate
      children of the schema element. In the example above, the only member of the
      urn:example.com:schemas:book namespace is Book.

      Should any XML document make use of a component contained in a namespace, schema validation
      will require that the document qualify the component’s name with a reference to its namespace. This
      is true not only of instance documents, but also of other XML Schema documents, and even of the
      same schema document that defines the namespace-qualified component. In other words, a schema
      processor must be told which namespace a given element, attribute, or definition belongs to.

      This is accomplished by declaring the namespace using the xmlns (XML namespace) attribute and
      associating the element in question with the namespace using a qualified name (QName). Indeed,
      this namespace qualification appears in many of the earlier examples where the XML Schema
      namespace is declared and associated with all of the components in use from that namespace:

        …
      
  In this example, the xmlns:xsd="http://www.w3.org/2001/XMLSchema" attribute declares the
  http://www.w3.org/2001/XMLSchema namespace and associates it with the prefix string xsd. This
  binds the prefix string to the namespace. Components can then use the prefix string as a shorthand for
  the namespace URI. The xsd:schema element name is a QName. A QName comprises two parts: the
  namespace prefix and the component’s local name. The two parts are separated by a colon. A
  nonqualified component name, containing only the local name, is known as a non-colonized name (i.e.,
  lacking a colon) or NCName.
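
The two-part structure of a QName can be illustrated with a small Python helper. The function expand_qname and the bindings dictionary are hypothetical names introduced here; the sketch deliberately ignores default namespace declarations and simply reports no namespace for a name without a prefix.

```python
def expand_qname(qname, prefix_map):
    """Resolve a QName to a (namespace URI, local name) pair.

    prefix_map holds the prefix-to-URI bindings currently in scope.
    """
    prefix, separator, local_name = qname.partition(":")
    if not separator:
        # No colon: the name is an NCName with no prefix to resolve.
        return (None, qname)
    return (prefix_map[prefix], local_name)


# The binding established by xmlns:xsd="http://www.w3.org/2001/XMLSchema".
bindings = {"xsd": "http://www.w3.org/2001/XMLSchema"}

print(expand_qname("xsd:schema", bindings))
print(expand_qname("schema", bindings))
```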

  In the example below, the same technique is used to qualify the Book element.
  
    The Sign of the Four
    Sir Arthur Conan Doyle
  
  So, Book can be said to be “in” the urn:example.com:schemas:book namespace. But what about
  the other elements, Title and Author? Are they in the same namespace? The answer is no. By
  default, only global, top-level definitions in a schema are in the target namespace and, therefore, must
  be qualified. Elements that are not in the target namespace must not be qualified.
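
A short Python sketch makes the difference visible. The instance document below is an assumption consistent with the description: Book is qualified with a prefix (bk is chosen here for illustration), while Title and Author are left unqualified.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: only the global Book element is qualified.
BOOK_XML = """
<bk:Book xmlns:bk="urn:example.com:schemas:book">
  <Title>The Sign of the Four</Title>
  <Author>Sir Arthur Conan Doyle</Author>
</bk:Book>
"""

book = ET.fromstring(BOOK_XML)

# ElementTree reports namespaced names as "{uri}local" and names in no
# namespace as the bare local name.
tags = [book.tag] + [child.tag for child in book]
print(tags)
```

The parser reports Book with its namespace URI attached, while Title and Author appear as bare local names that belong to no namespace.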

  However, schema authors, if they so desire, can assert that all elements and/or attributes in the schema
  by default belong to the target namespace. Why would they choose to do so? Two reasons: one, it gives
  the schema author the leeway to move elements from a local to a global context in later revisions of the
  schema without breaking backward compatibility. And two, it allows schema users to be less
  knowledgeable about the internals of the schema if they know they must simply qualify every element.

  To specify that all elements should belong to the target namespace, the schema author uses the
  elementFormDefault attribute of the schema element and sets it to qualified. Alternatively, it can
  be set to (the default) unqualified, which means that local elements in instance documents must not
  be qualified. Using QNames for each element future-proofs instance documents against changes in
  the schema. (The default setting—either qualified or unqualified—can also be overridden in an
  individual element definition via the form attribute, but this technique is not commonly used.)
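
  As a rough illustration, the following Python sketch reads the elementFormDefault setting from a
  schema document. The schema text and the helper name local_elements_qualified are hypothetical, and
  the sketch does not account for a form attribute on individual element definitions.

```python
import xml.etree.ElementTree as ET


def local_elements_qualified(schema_text):
    """True if local elements must be qualified in instance documents."""
    root = ET.fromstring(schema_text)
    # A missing attribute means the default value, "unqualified".
    return root.get("elementFormDefault", "unqualified") == "qualified"


# Hypothetical schema: Book is global; Title and Author are local.
schema_qualified = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="urn:example.com:schemas:book"
    elementFormDefault="qualified">
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

# The same schema with the attribute omitted falls back to the default.
schema_default = schema_qualified.replace('elementFormDefault="qualified"', "")
```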

  Here’s a schema that demonstrates the use of elementFormDefault:
  
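     The corresponding instance document now looks like this:

       <bk:Book xmlns:bk="urn:example.com:schemas:book">
         <bk:Title>The Sign of the Four</bk:Title>
         <bk:Author>Sir Arthur Conan Doyle</bk:Author>
       </bk:Book>
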
     In this instance document, bk is specified as the prefix string, and that string is used to qualify all
     elements in the document. Note that the prefix string is completely arbitrary. Instance authors are
     free to use any prefix string they care to. For example, the following instance document is
     semantically equivalent to the previous one.

       <b:Book xmlns:b="urn:example.com:schemas:book">
         <b:Title>The Sign of the Four</b:Title>
         <b:Author>Sir Arthur Conan Doyle</b:Author>
       </b:Book>

     A namespace declaration is available to the element in which it is declared and to all of that element’s
     children (referred to as the namespace scope). In this instance, the book namespace is declared in the
     Book element, and it can be referenced by all children (and their children) of the Book element.
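
  Both points, the arbitrary prefix and the scope of a declaration, can be checked with a
  namespace-aware parser. In this Python sketch the prefixes bk and x are arbitrary choices, and
  expanded_names is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = "urn:example.com:schemas:book"


def expanded_names(text):
    """Return the namespace-expanded name of every element, in document order."""
    return [el.tag for el in ET.fromstring(text).iter()]


# The namespace is declared once, on Book, and is in scope for the children.
with_bk = (
    f'<bk:Book xmlns:bk="{NS}"><bk:Title>The Sign of the Four</bk:Title>'
    "<bk:Author>Sir Arthur Conan Doyle</bk:Author></bk:Book>"
)
# Same document with a different prefix string.
with_x = (
    f'<x:Book xmlns:x="{NS}"><x:Title>The Sign of the Four</x:Title>'
    "<x:Author>Sir Arthur Conan Doyle</x:Author></x:Book>"
)
```

  Because the parser resolves each prefix to its namespace URI, both documents yield identical
  element names.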

     From the instance document author’s point of view, a number of equally valid mechanisms exist for
     qualifying a component’s name. Of particular interest is the default namespace declaration, in which
     no prefix string is specified in the namespace declaration; for example:

     xmlns="urn:example.com:schemas:book"

     Notice how the xmlns attribute is followed directly by the equal sign and not by a colon/string pair.
     As the name implies, a default namespace declaration establishes a default namespace. All elements
     within the scope of the default namespace declaration that are not explicitly qualified are assigned to
     the default namespace.

     The following examples demonstrate valid mechanisms for qualifying names. For simplicity’s sake,
     the schema is changed here to have only one local element:
     
Valid instances of this schema include the following documents:

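    <bk:Book xmlns:bk="urn:example.com:schemas:book">
      <bk:Title>The Sign of the Four</bk:Title>
    </bk:Book>

  All elements are qualified using a prefix declared in the root element.

    <Book xmlns="urn:example.com:schemas:book">
      <Title>The Sign of the Four</Title>
    </Book>

  All elements are qualified using the default namespace declared in the root element.

    <Book xmlns="urn:example.com:schemas:book">
      <Title xmlns="urn:example.com:schemas:book">The Sign of the Four</Title>
    </Book>

  Each element declares a default namespace for itself and its children.

    <bk:Book xmlns:bk="urn:example.com:schemas:book">
      <b:Title xmlns:b="urn:example.com:schemas:book">The Sign of the Four</b:Title>
    </bk:Book>

  Each element declares an explicit namespace for itself. Even though both elements are in the same
  namespace, the prefix strings are different.

    <bk:Book xmlns:bk="urn:example.com:schemas:book">
      <Title xmlns="urn:example.com:schemas:book">The Sign of the Four</Title>
    </bk:Book>

  Each element declares its own namespace. The root element assigns its namespace a prefix, but the
  child element (and its children, if any) declares a default namespace.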
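
  The five mechanisms described above can be compared mechanically. The Python sketch below writes
  one document per mechanism (the prefixes bk, a, and b are arbitrary) and collects the
  namespace-expanded element names that the parser reports for each.

```python
import xml.etree.ElementTree as ET

NS = "urn:example.com:schemas:book"
T = "The Sign of the Four"

variants = [
    # 1. Prefix declared in the root element.
    f'<bk:Book xmlns:bk="{NS}"><bk:Title>{T}</bk:Title></bk:Book>',
    # 2. Default namespace declared in the root element.
    f'<Book xmlns="{NS}"><Title>{T}</Title></Book>',
    # 3. Each element declares a default namespace.
    f'<Book xmlns="{NS}"><Title xmlns="{NS}">{T}</Title></Book>',
    # 4. Each element declares its own, different prefix.
    f'<a:Book xmlns:a="{NS}"><b:Title xmlns:b="{NS}">{T}</b:Title></a:Book>',
    # 5. Prefixed root element, default namespace on the child.
    f'<bk:Book xmlns:bk="{NS}"><Title xmlns="{NS}">{T}</Title></bk:Book>',
]


def expanded_names(text):
    """Return the namespace-expanded name of every element, in document order."""
    return tuple(el.tag for el in ET.fromstring(text).iter())


# A set keeps one entry per distinct result.
distinct = {expanded_names(v) for v in variants}
```

  All five documents collapse to a single result, which is why they are interchangeable from the
  schema's point of view.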

  If elementFormDefault is set or defaulted to unqualified, then instance authors must be careful
  not to qualify elements that don’t belong to a namespace. This mistake is especially easy to make when using
  default namespaces, which implicitly qualify child elements. For instance, in this schema none of the
  local elements belongs to a namespace.
  
The following document is not a valid instance of this schema because the Title element is
     implicitly qualified by the default namespace, though it should be in no namespace. (The question
     marks are a notation made to clearly illustrate that the example XML is invalid.)

     ? <Book xmlns="urn:example.com:schemas:book">
     ?   <Title>The Sign of the Four</Title>
     ? </Book>

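     To correct this error, the instance author must override the default namespace in the Title
     element and set it to an empty string (no namespace), like so:

       <Book xmlns="urn:example.com:schemas:book">
         <Title xmlns="">The Sign of the Four</Title>
       </Book>

     Or not use default namespaces at all:

       <bk:Book xmlns:bk="urn:example.com:schemas:book">
         <Title>The Sign of the Four</Title>
       </bk:Book>
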
     Schema authors can also force the qualification of attributes by using the attributeFormDefault
     attribute of the schema element. However, the additional clutter of qualifying every attribute is
     generally not considered worth the effort unless the attribute is intended to be used in multiple
     schemas.
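
  The default-namespace pitfall and its two corrections can be observed with a namespace-aware
  parser, as can the way unprefixed attributes behave. In this Python sketch, child_tag is a
  hypothetical helper and the lang attribute is invented purely for illustration.

```python
import xml.etree.ElementTree as ET

NS = "urn:example.com:schemas:book"
T = "The Sign of the Four"


def child_tag(text):
    """Return the namespace-expanded name of the root element's first child."""
    return ET.fromstring(text)[0].tag


# Pitfall: Title silently inherits the default namespace from Book.
inherits_default = child_tag(f'<Book xmlns="{NS}"><Title>{T}</Title></Book>')
# Correction 1: reset the default namespace on Title with xmlns="".
reset_to_none = child_tag(f'<Book xmlns="{NS}"><Title xmlns="">{T}</Title></Book>')
# Correction 2: avoid the default namespace and prefix only the root.
prefixed_root = child_tag(f'<bk:Book xmlns:bk="{NS}"><Title>{T}</Title></bk:Book>')

# Unprefixed attributes never pick up the default namespace, so a qualified
# attribute always needs an explicit prefix.
attrs = dict(ET.fromstring(f'<Book xmlns="{NS}" lang="en"/>').attrib)
```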

     With all this in mind, an instance document with two Author (and Title) elements from different
     schemas would look something like this:

         The Sign of the Four
         Sir Arthur Conan Doyle
       
         A Scandal in Bohemia
         Sir Arthur Conan Doyle
       
     An XML Schema-aware processor, when parsing this document, can now disambiguate the two
     instances of the Title and Author elements.
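
  A minimal Python sketch of this disambiguation follows. Only urn:example.com:schemas:book comes
  from the examples above; the Library and Story element names, the other two namespace URIs, and
  the lib and ss prefixes are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

doc = """
<lib:Library xmlns:lib="urn:example.com:schemas:library"
             xmlns:bk="urn:example.com:schemas:book"
             xmlns:ss="urn:example.com:schemas:shortstory">
  <bk:Book>
    <bk:Title>The Sign of the Four</bk:Title>
    <bk:Author>Sir Arthur Conan Doyle</bk:Author>
  </bk:Book>
  <ss:Story>
    <ss:Title>A Scandal in Bohemia</ss:Title>
    <ss:Author>Sir Arthur Conan Doyle</ss:Author>
  </ss:Story>
</lib:Library>
"""

root = ET.fromstring(doc)
# Prefix map used for the queries; it is independent of the prefixes in doc.
ns = {
    "bk": "urn:example.com:schemas:book",
    "ss": "urn:example.com:schemas:shortstory",
}
# The same local name, Title, is selected separately per namespace.
book_titles = [t.text for t in root.findall(".//bk:Title", ns)]
story_titles = [t.text for t in root.findall(".//ss:Title", ns)]
```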

     Note that the instance document below using default namespaces is semantically equivalent to the
     instance document above using explicit namespaces.

The Sign of the Four
       Sir Arthur Conan Doyle
     
       A Scandal in Bohemia
       Sir Arthur Conan Doyle
     
Choosing a Name for a Namespace
   Namespace URIs are just strings. One namespace is distinct from another if the namespace strings differ
   in any way, including case and trailing slashes. However, a namespace must be a URI in order to help
   guarantee uniqueness. URIs can be either Uniform Resource Locators (URLs) or URNs. URLs are
   familiar, and namespaces are often specified as URLs (e.g., http://www.w3.org/2001/XMLSchema), but
   that is not a requirement. A URN, in contrast, is a Uniform Resource Name: a unique name for
   something, but not a pointer to it. Examples of real-world URNs include an International Standard Book
   Number (ISBN), a Committee on Uniform Securities Identification Procedures (CUSIP) number, or a
   DNS name (without a corresponding Internet Protocol [IP] address). How a schema processor maps URIs
   to actual XML Schema documents is processor specific.
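
   That namespace names are compared as plain strings can be confirmed with a short Python sketch.
   The example.com URLs and the root_namespace helper are hypothetical.

```python
import xml.etree.ElementTree as ET


def root_namespace(text):
    """Return the namespace name of the root element, or None if it has none."""
    tag = ET.fromstring(text).tag
    # ElementTree reports qualified names as "{namespace}local".
    return tag[1:].split("}")[0] if tag.startswith("{") else None


a = root_namespace('<Book xmlns="http://example.com/schemas/book"/>')
b = root_namespace('<Book xmlns="http://example.com/schemas/book/"/>')  # trailing slash
c = root_namespace('<Book xmlns="http://example.com/Schemas/book"/>')   # different case
```

   The parser performs no normalization, so the three Book elements belong to three different
   namespaces.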

   Given that namespaces in XML Schema can be either URLs or URNs, which should schema authors
   use? Both are viable, but an enterprise should consistently use one or the other, and not have some
   schema documents use URLs and others use URNs. The best practices for using URLs and URNs
   are presented here.

   Note: Though either URLs or URNs may be used for namespace names, the bulk of this MBP
   document has adopted URNs.

Using URNs for Namespace Names
   The format of a URN is as follows: urn:global_namespace:local_namespace*:element, where
   urn is the URI scheme, global_namespace is universally unique, local_namespace is optional and
   locally unique, and element is a reference to something in this namespace. Schema authors can also
   use slashes instead of colons, though slashes typically indicate an actual path to a resource.

   Because URNs define names, it is common to use URNs as namespace names. Internal URNs can be
   of the form urn:company.com:schemaname. However, “Company” may like to use the universally
   unique company.com namespace for other purposes, so namespace names should be refined even
   further. Examples include:

          •   urn:company.com:schemas:products:books:genres
          •   urn:company.com:schemas:products/books/genres
          •   urn:schemas.company.com:products/books/genres
          •   urn:com.company.schemas:products:books:genres

      Note how a DNS name is used as the globally unique namespace component (inverted in the last
      example)—this is only a convention, but it’s a good one. Also, notice how these URNs contain
      taxonomical information; that is, the “genres” schema is classified as being in the “products” category of
      “schemas” and further classified as being in the “books” category of the “products” category. Because
      URNs only need to be unique and not actually point to anything, it is arbitrary as to whether subdomains
      are used (colon separated) or paths are used (slash separated). However, colons are preferred.
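
      One way to keep such names consistent is to generate them from the taxonomy rather than type
      them by hand. The build_urn helper below is a hypothetical Python sketch that follows the
      colon-separated convention; forcing lowercase is an added assumption, not a URN requirement.

```python
def build_urn(domain, *taxonomy):
    """Join a DNS name and taxonomy levels into a colon-separated URN."""
    parts = [domain.lower(), *(level.lower() for level in taxonomy)]
    for part in parts:
        # Reject empty components and components that contain a separator.
        if not part or ":" in part or "/" in part:
            raise ValueError(f"invalid URN component: {part!r}")
    return ":".join(["urn", *parts])


genres = build_urn("company.com", "schemas", "products", "books", "genres")
```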

Using URLs for Namespace Names
      Schema authors may elect to use URLs for namespace names instead of URNs. While no requirement
      exists that the URL actually point to a resource on the Web, and many don’t, it is a best practice to have
      the URL point to something. The question is, what should the URL point to? In the case of XML
      Schema, the answer is, typically, the schema file itself. However, the answer is not always so obvious.
      The Namespaces in XML specification is completely distinct from the XML Schema specification, so it
      is perfectly valid for a namespace URI to point to a DTD, to a RELAX NG or Schematron schema, or
      even to a text document that describes the namespace in prose. In fact, even schema authors may wish
      to provide alternative schemas, documentation, style sheets, or source or object code.

      The answer, then, to the question of what should a namespace URL point to is everything—but indirectly,
      via a Resource Directory Description Language (RDDL) document. An RDDL document provides
      references to any and all information regarding a namespace that the schema author cares to include.
      RDDL includes a single element, resource, that has a number of descriptive attributes. For example, an
      RDDL document that points to an XML Schema file and English documentation might look like this:
      
      Notice that the href attribute of the first resource element points at the actual XML Schema file
      and the second href attribute points at an HTML file. Each resource element also makes use of
      role, arcrole, and title attributes (more allowable attributes exist). The role and arcrole attributes
      are used to describe the “nature” and “purpose” of the resource, and generally have “well known”
      values. For instance, the first resource element states that the referenced document is an XML
      Schema document, and the arcrole attribute states its use for inclusion in other schema documents.
      The title attribute gives a short, non-normative description of the resource.

  Also notice that the resource elements above are not enclosed in a root element, nor are the rddl
  and xlink prefixes declared. This information has been elided solely for the sake of clarity. In
  actuality, however, the root element of an RDDL document is <html>. That’s right, an RDDL
  document is really just an HTML document, XHTML to be precise, and RDDL is defined as an
  extension of XHTML Basic. This means that schema authors can have content in an RDDL
  document designed to be rendered by a browser, and they can add an rddl:resource element
  basically wherever they can put a paragraph (<p>) tag.

  A more accurate (and superior) version of the above, then, might be something like this:
  
    RDDL Document for Book genres
  
    Genre Type Resource

     This document describes the complex type representing a book’s genre.

     XML Schema

     The schema is available at
       
       http://schemas.company.com/products/books/genres.xsd
     
     Documentation for this type is available at
       
       http://schemas.company.com/products/books/genres.html
     
Now, when schema users enter the namespace URL into a browser, they are greeted with an
      explanatory HTML page, but when an RDDL-aware piece of software follows the link, it can easily
      parse out the rddl:resource elements for processing.
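
      What such software might do can be sketched in Python. The page below is a simplified,
      hypothetical RDDL document built around the two addresses used above; the role values are
      assumptions, the arcrole attribute is omitted for brevity, and resources is an invented helper.

```python
import xml.etree.ElementTree as ET

RDDL = "http://www.rddl.org/"
XLINK = "http://www.w3.org/1999/xlink"

page = """
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rddl="http://www.rddl.org/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <head><title>RDDL Document for Book genres</title></head>
  <body>
    <h1>Genre Type Resource</h1>
    <rddl:resource xlink:title="XML Schema"
        xlink:role="http://www.w3.org/2001/XMLSchema"
        xlink:href="http://schemas.company.com/products/books/genres.xsd">
      <p>The schema for the genre type.</p>
    </rddl:resource>
    <rddl:resource xlink:title="Documentation"
        xlink:role="http://www.w3.org/TR/html4/"
        xlink:href="http://schemas.company.com/products/books/genres.html">
      <p>Documentation for the genre type.</p>
    </rddl:resource>
  </body>
</html>
"""


def resources(text):
    """Return (title, href) for every rddl:resource element in the page."""
    root = ET.fromstring(text)
    return [
        (el.get(f"{{{XLINK}}}title"), el.get(f"{{{XLINK}}}href"))
        for el in root.iter(f"{{{RDDL}}}resource")
    ]


found = resources(page)
```

      A browser renders the same page as ordinary XHTML, while the helper skips the prose and keeps
      only the machine-readable resource entries.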

Versioning
      XML Schema has no inherent notion of versioning. The schema element does have a version
      attribute, but applications are free to ignore it, and many do. The fact is, no universal mechanism
      exists for signaling version change to an application and having the application take appropriate
      steps. Even best practices help only those who follow them. However, it’s
      important to have some repeatable practice around this subject so that those who honor it can deal
      with the inevitable fact of version change.

      It is rare for an XML Schema document to remain static from the moment of its creation. Mistakes
      and oversights will need to be corrected and changes in technology and business will need to be
      accounted for. From a big-picture point of view, however, a schema can change in only two ways: In
      a way that breaks backward compatibility and in a way that doesn’t.

      Every schema should set the version attribute. Three dot-separated numbers should be used (e.g.,
      1.0.3). Every change to a published schema should be reflected in the schema’s version attribute. If
      a change breaks backward compatibility (e.g., a new mandatory element is added), then the first
      number should be incremented. If a change alters the capabilities of a schema without breaking
      backward compatibility (e.g., an optional element has been added), then the second number should be
      incremented. If a change does not affect functionality at all (e.g., comments
      have been added), then the third number should be incremented. The first number should be set to
      zero if the schema is not yet ready for production use.
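
      The numbering rule can be expressed as a small Python sketch. The bump_version helper and its
      change labels are hypothetical, and resetting the lower-order numbers to zero after a bump is
      an assumption borrowed from common versioning practice; the rule above does not specify it.

```python
def bump_version(version, change):
    """Return the next version string for a given kind of schema change."""
    first, second, third = (int(n) for n in version.split("."))
    if change == "breaking":      # e.g., a new mandatory element
        return f"{first + 1}.0.0"
    if change == "compatible":    # e.g., a new optional element
        return f"{first}.{second + 1}.0"
    if change == "cosmetic":      # e.g., comments added
        return f"{first}.{second}.{third + 1}"
    raise ValueError(f"unknown kind of change: {change!r}")


after_breaking = bump_version("1.0.3", "breaking")
after_compatible = bump_version("1.0.3", "compatible")
after_cosmetic = bump_version("1.0.3", "cosmetic")
```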

      Examples:

      Using just the version attribute, however, is not enough to ensure that applications using this
      schema don’t break, because applications rarely pay attention to the version attribute. When a schema
      author makes compatibility-breaking changes, he or she must consider the schema to be entirely new.
      The schema must therefore occupy its own namespace. And the best way for the author to do that is
      to append to the namespace name the date that the schema was moved into production. Even though
      the namespace name has changed, do not reset the version field to 1.0.0; rather, increment as normal
      to show that this schema is closely related to some earlier schema.
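
      Appending the date can be automated. In the Python sketch below, versioned_namespace is a
      hypothetical helper; the base names and the date are the sample values used in the
      best-practices list later in this document.

```python
from datetime import date


def versioned_namespace(base, production_date):
    """Append an ISO date (YYYY-MM-DD) to a URN- or URL-valued namespace name."""
    # URLs extend the path with a slash; URNs add another colon-separated part.
    separator = "/" if base.startswith(("http://", "https://")) else ":"
    return f"{base}{separator}{production_date.isoformat()}"


released = date(2006, 6, 15)
urn_ns = versioned_namespace("urn:company.com:schemas:books:textbooks:subjects", released)
url_ns = versioned_namespace("http://schemas.company.com/books/textbooks/subjects", released)
```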

URN examples:

    URL examples:

    The schema author should not remove earlier schemas from production until he or she is certain that
    no applications are relying on them. Note, though, that schemas are not always (or often) accessed
    from the network on each use; therefore, making schemas physically unavailable is not enough to
    keep them from being used. Frequently schemas are used once to, for instance, generate some static
    code, or a schema will be copied locally for continued use. In short, there really is no way to remove a
    schema from production.

Best Practices: Namespaces and Versioning
What follows is a succinct list of best practices that pertain to namespace usage and schema versioning.

Namespaces
    In all cases, schemas should be assigned a target namespace for the simple reason of avoiding naming
    collisions. Ensure that the components that make up the namespace are logically related.

    Best Practice: Give all schema files their own namespace.

    When a schema is used by another schema or an instance document, it is necessary for the schema
    user to qualify only global elements. However, schema authors should use the elementFormDefault
    attribute to force users to qualify all elements. This allows the original schema author to move an
    element from a local context to a global context at some later time without breaking those documents
    that are dependent on the schema. It also frees the instance author from having to empirically
    determine whether or not local elements need to be qualified. Finally, if developers are using .NET
    serialization, then elementFormDefault is mandatory.

     Best Practice: Use elementFormDefault to force all elements to be qualified.

     However, schema authors should not force users to qualify attributes. Forcing the qualification of
     attributes just adds clutter.

     Best Practice: Do not force attributes to be qualified via attributeFormDefault.

     It is possible to set a namespace to be the default namespace for an element and all its children until
     overridden. Schema authors, though, should not use a default namespace. Fewer errors are
     introduced if all components are explicitly qualified. Schema authors should assign a prefix string to
     all declared namespaces. Frequently a schema references components defined within itself, so schema
     authors must also declare a prefix for the target namespace. It is common to assign the target
     namespace the prefix “tns,” which is short for “this namespace.” The following contrived example
     shows this usage:

     Best Practice: Don’t use a default namespace. Assign the “tns” prefix to the target
     namespace.
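
     How a tns-qualified reference resolves back to the target namespace can be sketched in Python.
     The schema below is hypothetical (the Genre element and GenreType type are invented names); the
     parser's start-ns events are used to recover the prefix bindings.

```python
import io
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema in which the tns prefix is bound to the target namespace.
schema = b"""<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:tns="urn:company.com:schemas:books"
    targetNamespace="urn:company.com:schemas:books">
  <xsd:simpleType name="GenreType">
    <xsd:restriction base="xsd:string"/>
  </xsd:simpleType>
  <xsd:element name="Genre" type="tns:GenreType"/>
</xsd:schema>"""

# Each start-ns event carries a (prefix, namespace URI) pair.
prefixes = dict(
    binding for _, binding in ET.iterparse(io.BytesIO(schema), events=("start-ns",))
)

root = ET.fromstring(schema)
genre = root.find(XSD + "element")
# The type attribute holds a QName that refers to a component in this schema.
prefix, local = genre.get("type").split(":")
resolved = (prefixes[prefix], local)
```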

     Namespaces must be URIs. URIs can be either URLs or URNs. Organizations should standardize on
     only one format and schema authors should conform to this standard.

Best Practice: Standardize on one URI format for namespace names.

  Namespace URIs are case sensitive and are also sensitive to trailing slashes (URLs and URNs) and
  semicolons (URNs).

  Best Practice: Namespace URIs should be all lowercase and not contain trailing
  slashes or semicolons.
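
  A simple check for this practice could look like the following Python sketch. The helper name
  namespace_name_problems and the sample names are hypothetical.

```python
def namespace_name_problems(uri):
    """Return the ways in which a namespace name departs from the practice above."""
    problems = []
    if uri != uri.lower():
        problems.append("contains uppercase characters")
    if uri.endswith("/"):
        problems.append("ends with a slash")
    if uri.endswith(";"):
        problems.append("ends with a semicolon")
    return problems


clean = namespace_name_problems("urn:company.com:schemas:books")
messy = namespace_name_problems("http://schemas.company.com/Books/")
```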

  How URNs are structured is entirely up to the schema author, but the globally unique component
  should be a DNS name (e.g., company.com). The locally unique component should be taxonomical.
  Colons should be used as separators unless the locally unique bit is an actual path to the document. A
  representative URN, then, looks something like this:

  urn:company.com:schemas:products:books:fiction:genres

  Best Practice: The globally unique component of a URN should be a DNS name.

  Even though schema authors are free to structure a URN as they please, an organization (via an
  XML data management group) should standardize on the locally unique component of a namespace.
  Start by collecting all corporate schemas into a single component, thus keeping the root namespace
  uncluttered:

  urn:company.com:schemas

  Spend some time creating a shallow taxonomy that represents the enterprise’s information
  technology (IT) or business needs. For instance, group schemas by business unit:

  urn:company.com:schemas:all_company
  urn:company.com:schemas:magazines
  urn:company.com:schemas:books

  Continue to categorize schemas into domain components as necessary, but don’t go too deep.

  urn:company.com:schemas:books:textbooks

  Finally, tack on the schema name, then follow the best practices given immediately below concerning
  schema versioning.

  urn:company.com:schemas:books:textbooks:subjects

  Think hard about namespace names early in the game. Once deployed, they will stick around for
  a while.

Best Practice: The locally unique component of a URN-valued namespace name
     should reflect an organized taxonomy. Collect all schemas together, then branch out
     as many levels deep as necessary.

     Best Practice: If URLs are used as a namespace, the domain name should be
     prefaced by a host name that collects all schemas. The URL path should follow the
     same taxonomical practices given for URNs.

     Example: http://schemas.company.com/books/textbooks/subjects

     When a URL is used as a namespace, it is not strictly necessary for the URL to point to anything—it
     must only be unique among all other namespace names. However, users of a schema will expect to be
     able to resolve a URL-valued namespace, and will be surprised if they can’t.

     Best Practice: When a URL is used as a namespace name, the URL should resolve to
     some resource.

     When a URL is used for a namespace, many different resources can legitimately be put at the URL
     address, but the URL can resolve to only one of them.

     Best Practice: A URL-valued namespace should resolve to an RDDL document. If this
     is not practical, then it should resolve to the schema itself.

     It is presumed that an enterprise has some central schema authority that is responsible for creating
     and managing corporate schemas and namespaces. However, it is impossible for such a group to
     create all schemas; local groups and individuals will also need this capability. These schema authors
     should be assigned a namespace of their own that reflects a group over which the authors can exercise
     control. That is, user schemas should have namespaces that look something like these:

     urn:company.com:schemas:human_resources_schemas:*
     http://schemas.company.com/human_resources_schemas/*

     The http example above implies that local schema authors must be able to publish to their portion of the
     web server or content management system that controls this address.

     Best Practice: Locally created schemas should be collected into a namespace that the
     developer can control.

     Versioning
         As described above, the final component of the target namespace should be the date on which the
         schema was created. Should the schema be updated in a way that breaks backward compatibility,
         the namespace name should be changed by updating this date component.

urn:company.com:schemas:books:textbooks:subjects:2006-06-15
        http://schemas.company.com/books/textbooks/subjects/2006-06-15

        Best Practice: Append the schema creation date to the namespace name using
        the ISO date standard of YYYY-MM-DD. Update this date to reflect incompatible
        changes to the schema. Don’t update the namespace URI if changes are
        backward compatible.

        Best Practice: Use the schema/version attribute. Adopt a three-part, dot-separated
        version number. Update the first number for compatibility-breaking
        changes, update the second number for functionality-altering changes, and update
        the third number for inconsequential changes. Do not reset the version to 1.0.0
        when the date component of the namespace changes.

        Best Practice: Keep earlier versions of a schema available until all use of them
        has been eliminated.

Creating XML Schema Documents
   XML Schema is made up of two principal components: a means to describe the structure of an XML
   document and a means to provide type information for elements and attributes. Accordingly, the
   information provided in this section follows roughly the same organization.

Structuring an XML Schema Document
   The principal structural components of an XML Schema document are the element and attribute
   elements, which determine the appearance of elements and attributes in an instance document (a
   document conforming to a particular schema). Elements and attributes can be typed, and complex
   types can have embedded elements or attributes.

Elements
   The element element is used to dictate the appearance of elements in an instance document.
   Elements can be defined globally or local to a specific type.

Global Elements
   When the element element is a direct child of the schema element (i.e., not a child of any other
   element), it is said to be a global element. Global elements may be used as root elements in an instance
   document, and they can be referenced (via the ref attribute) by other schema elements.

The example below is a complete schema document. Both elements, Author and Title, are global
      elements.
      
      And below is a conforming instance document. Notice that root elements use the xmlns attribute to
      declare what namespace the element belongs to. Because the Author element is global, it can be used
      as a root element in this instance document.

      <Author xmlns="…">Brian Kernighan</Author>

      The following is another conforming instance document. In this case, the global element Title is
      used as the root element.

      <Title xmlns="…">The C Programming Language</Title>

      The third example of a conforming instance document, below, shows that an instance document can
      have as many root elements as it wants—a schema document cannot constrain root elements to a
      specified number of occurrences. Notice that both root elements must declare their namespace
      because neither element is within the scope of the other.

      <Author xmlns="…">Brian Kernighan</Author>
      <Author xmlns="…">Dennis Ritchie</Author>

      This final example is similar to the previous one, except that it shows both root elements in use in a
      single document.

      <Author xmlns="…">Brian Kernighan</Author>
      <Title xmlns="…">The C Programming Language</Title>

Local Elements
      When an element element is a child of some other element, it is said to be a local element. In an instance
      document, a local element can exist only within the scope of its parent. It cannot be a root element. By
      default, local elements are not part of a schema’s target namespace. As discussed earlier, schema developers
      can effectively override this default by specifying elementFormDefault="qualified" in the schema
      element or by specifying form="qualified" on a specific element definition.
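
      The difference between global and local declarations is purely positional, which a short Python
      sketch can show. The schema below is hypothetical: Author and Title are declared globally, as
      in the earlier example, while the Book and Isbn declarations are invented for illustration.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

schema = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="urn:example.com:schemas:book">
  <xsd:element name="Author" type="xsd:string"/>
  <xsd:element name="Title" type="xsd:string"/>
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Isbn" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

root = ET.fromstring(schema)
# Global elements are direct children of the schema element.
global_names = [el.get("name") for el in root.findall(XSD + "element")]
# Every other element declaration is nested inside a type and is therefore local.
all_names = [el.get("name") for el in root.iter(XSD + "element")]
local_names = [name for name in all_names if name not in global_names]
```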

      In this example, the complexType and sequence elements are introduced in order to maintain
      correct syntax; these tags are explained in the “Custom Types” section of this MBP document. For
