Copyright ©2000 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
XML Schema Part 0: Primer is a non-normative document intended to provide an easily readable description of the XML Schema facilities and is oriented towards quickly understanding how to create schemas using the XML Schema language. XML Schema Part 1: Structures and XML Schema Part 2: Datatypes provide the complete normative description of the XML Schema definition language, and the primer describes the language features through numerous examples which are complemented by extensive references to the normative texts.
The XML Schema Part 0: Primer is a part of the W3C XML Activity.
This is a public working draft of XML Schema 1.0 for review by the public and by members of the World Wide Web Consortium. The XML Schema Working Group has agreed to its publication. Note that some sections of this draft may not be up-to-date with the XML Schema language described in Parts 1 and 2 of the XML Schema specification. Known discrepancies are noted in the text.
The Working Group does not anticipate further substantial changes to the syntax described here. However, this is still a working draft, and it is subject to change based on experience and on comments from the public and from other W3C working groups.
A list of current W3C working drafts can be found at http://www.w3.org/TR/. They may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress".
2. Basic Concepts: The Purchase Order
2.1 The Purchase Order Schema
2.2 Complex Type Definitions, Element & Attribute Declarations
2.3 Simple Types
2.4 Anonymous Type Definitions
2.5 Types of Element Content
2.6 Annotations, Versions & Comments
2.7 Groups
2.8 Attribute Groups
2.9 Null Values
3. Advanced Concepts I: The International Purchase Order
3.1 A Schema in Multiple Documents
3.2 Deriving Types by Extension
3.3 Using Derived Types in Instance Documents
3.4 Deriving Types by Restriction
3.5 Equivalence Classes
3.6 Abstract Elements
3.7 Preventing Type Derivations
4. Advanced Concepts II: The Quarterly Report
4.1 Specifying Uniqueness
4.2 Defining Keys and their References
4.3 XML Schema Constraints vs. XML 1.0 ID Attributes
4.4 Importing Types
4.5 Any Element, Any Attribute
4.6 schemaLocation
4.7 Conformance
A. Acknowledgements
B. Simple Types & Their Facets
C. Regular Expressions
D. Index
E. Document History
This document, XML Schema Part 0: Primer, provides an easily approachable description of the XML Schema definition language, and should be used alongside the formal descriptions of the language contained in Parts 1 and 2 of the XML Schema specification. The intended audience of this document includes application developers whose programs read and write schema documents, and schema authors who need to know about the features of the language, especially features that provide functionality above and beyond what is provided by DTDs. The text assumes that you have a basic understanding of XML 1.0 and XML-Namespaces. Each major section of the primer introduces new features of the language, and describes the features in the context of concrete examples.
Section 2 covers the basic mechanisms of XML Schema. It describes how to declare the elements and attributes that appear in XML documents, the distinctions between simple and complex types, defining complex types, the use of simple types for element and attribute values, schema annotation, a simple mechanism for re-using element and attribute definitions, and null values.
Section 3 covers some of XML Schema's advanced features, and in particular, it describes mechanisms for deriving types from existing types, and for controlling these derivations. The section also describes mechanisms for merging together fragments of a schema from multiple sources, and for element substitution.
Section 4 covers more advanced features, including a powerful mechanism for specifying uniqueness among attributes and elements, a mechanism for using types across namespaces, a mechanism for extending types based on namespaces, and a description of how documents are checked for conformance.
In addition to the sections just described, the primer has a number of appendices that contain detailed reference information on simple types and an associated regular expression language.
The primer is a non-normative document, which means that it does not provide a definitive (from the W3C's point of view) specification of the XML Schema language. The examples and other explanatory material in this document are provided to help you understand XML Schema, but they may not always provide definitive answers. In such cases, you will need to refer to the XML Schema specification, and to help you do this, we provide many links pointing to the relevant parts of the specification.
Ed. Note: At this time, there are only links from section 2.3 and appendix B to XML Schema Part 2: Datatypes. Links to XML Schema Part 1: Structures and links from more sections to Part 2 will be provided in future working drafts.
The purpose of a schema is to define a class of XML documents, and so the term "instance document" is often used to describe an XML document that conforms to a particular schema. In fact, neither instances nor schemas need to exist as documents per se -- they may exist as streams of bytes sent between applications, as fields in a database record, or as collections of XML Infoset "Information Items" -- but to simplify the primer, we have chosen to always refer to instances and schemas as if they are files.
Let us start by considering an instance document in a file called po.xml. It describes a purchase order generated by an application for ordering and billing home products:
The Purchase Order, po.xml

```xml
<?xml version="1.0"?>
<purchaseOrder orderDate="1999-10-20">
  <shipTo country="US">
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </shipTo>
  <billTo country="US">
    <name>Robert Smith</name>
    <street>8 Oak Avenue</street>
    <city>Old Town</city>
    <state>PA</state>
    <zip>95819</zip>
  </billTo>
  <comment>Hurry, my lawn is going wild!</comment>
  <items>
    <item partNum="872-AA">
      <productName>Lawnmower</productName>
      <quantity>1</quantity>
      <price>148.95</price>
      <comment>Confirm this is electric</comment>
    </item>
    <item partNum="926-AA">
      <productName>Baby Monitor</productName>
      <quantity>1</quantity>
      <price>39.98</price>
      <shipDate>1999-05-21</shipDate>
    </item>
  </items>
</purchaseOrder>
```
The purchase order consists of a main element, purchaseOrder, and the subelements shipTo, billTo, comment and items. Some of these subelements in turn contain other subelements, and so on, until a subelement such as price contains a number rather than any subelements. Elements that contain subelements or carry attributes are said to have complex types, whereas elements that contain numbers (and strings, and dates, etc.) but do not contain any subelements are said to have simple types. Some elements have attributes; attributes always have simple types.
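The distinction between simple and complex types can be made concrete with a short Python sketch. It is only an illustration built on the standard library's xml.etree.ElementTree and a trimmed copy of po.xml. It is not an XML Schema processor, and the function names are our own. It labels an element "complex" if it has subelements or attributes and "simple" otherwise, which is exactly the rule stated above.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of po.xml (one address, one item), used only for illustration.
PO_XML = """<purchaseOrder orderDate="1999-10-20">
  <shipTo country="US">
    <name>Alice Smith</name>
    <zip>90952</zip>
  </shipTo>
  <items>
    <item partNum="872-AA">
      <price>148.95</price>
    </item>
  </items>
</purchaseOrder>"""


def kind_of(elem):
    """'complex' if the element has subelements or attributes, else 'simple'."""
    return "complex" if len(elem) > 0 or elem.attrib else "simple"


def classify(xml_text):
    """Map each element name to 'simple' or 'complex'.
    If a name occurs more than once, the last occurrence wins."""
    root = ET.fromstring(xml_text)
    return {elem.tag: kind_of(elem) for elem in root.iter()}


if __name__ == "__main__":
    for name, kind in classify(PO_XML).items():
        print(f"{name}: {kind}")
```

Run on this fragment, it reports purchaseOrder, shipTo, items and item as complex, and name, zip and price as simple.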
The complex types in the instance document and some of the simple types are defined in the schema for purchase orders. The other simple types are defined as part of XML Schema's repertoire of built-in simple types.
Before going on to examine the purchase order schema, we digress briefly to mention the association between the instance document and the purchase order schema. As you can see by inspecting the instance document, the purchase order schema is not mentioned. To keep this first section simple, we assume that any processor of the instance document can obtain the purchase order schema without requiring any information from the instance document. In later sections, we will examine mechanisms that provide explicit information about the schema.
The purchase order schema is contained in the file po.xsd:
The Purchase Order Schema, po.xsd

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">

  <xsd:annotation>
    <xsd:documentation>
      Purchase order schema for Example.com.
      Copyright 2000 Example.com. All rights reserved.
    </xsd:documentation>
  </xsd:annotation>

  <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>

  <xsd:element name="comment" type="xsd:string"/>

  <xsd:complexType name="PurchaseOrderType">
    <xsd:element name="shipTo" type="Address"/>
    <xsd:element name="billTo" type="Address"/>
    <xsd:element ref="comment" minOccurs="0"/>
    <xsd:element name="items" type="Items"/>
    <xsd:attribute name="orderDate" type="xsd:date"/>
  </xsd:complexType>

  <xsd:complexType name="Address">
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="street" type="xsd:string"/>
    <xsd:element name="city" type="xsd:string"/>
    <xsd:element name="state" type="xsd:string"/>
    <xsd:element name="zip" type="xsd:decimal"/>
    <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
  </xsd:complexType>

  <xsd:complexType name="Items">
    <xsd:element name="item" minOccurs="0" maxOccurs="*">
      <xsd:complexType>
        <xsd:element name="productName" type="xsd:string"/>
        <xsd:element name="quantity">
          <xsd:simpleType base="xsd:positive-integer">
            <xsd:maxExclusive value="100"/>
          </xsd:simpleType>
        </xsd:element>
        <xsd:element name="price" type="xsd:decimal"/>
        <xsd:element ref="comment" minOccurs="0"/>
        <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
        <xsd:attribute name="partNum" type="Sku"/>
      </xsd:complexType>
    </xsd:element>
  </xsd:complexType>

  <xsd:simpleType name="Sku" base="xsd:string">
    <xsd:pattern value="\d{3}-[A-Z]{2}"/>
  </xsd:simpleType>

</xsd:schema>
```
The purchase order schema consists of a schema element and a variety of subelements, most notably element, complexType and simpleType, which determine the appearance of elements and their content in instance documents.
Each of the elements in the schema has a prefix xsd:, which is associated with the XML Schema namespace through a declaration (xmlns:xsd="http://www.w3.org/1999/XMLSchema") that appears in the schema element. The same prefix, and hence the same association, also appears on the names of built-in simple types. The purpose of the association is to identify the elements and simple types as belonging to the vocabulary of XML Schema rather than to the vocabulary of the schema author. For the sake of clarity in the text, we just mention the names of elements and simple types (e.g. simpleType) and omit the prefix.
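Only the namespace URI matters, not the prefix, so a namespace-aware parser reports the same names whatever prefix the schema author chooses. The Python sketch below illustrates this with xml.etree.ElementTree. The prefix s is an arbitrary choice for the comparison. The sketch parses a small fragment twice, once with xsd: and once with s:, and compares the expanded names.

```python
import xml.etree.ElementTree as ET

# Namespace URI declared in po.xsd for this draft of XML Schema.
XSD_NS = "http://www.w3.org/1999/XMLSchema"

# The same fragment written with two different prefixes bound to that URI.
WITH_XSD = f'<xsd:schema xmlns:xsd="{XSD_NS}"><xsd:simpleType name="Sku"/></xsd:schema>'
WITH_S = f'<s:schema xmlns:s="{XSD_NS}"><s:simpleType name="Sku"/></s:schema>'


def expanded_names(xml_text):
    """Element names as a namespace-aware parser reports them: '{uri}local'."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]


if __name__ == "__main__":
    print(expanded_names(WITH_XSD))
    # The prefix is only a shorthand: both fragments name the same elements.
    print(expanded_names(WITH_XSD) == expanded_names(WITH_S))
```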
In XML Schema, there is a basic difference between complex types, which allow elements in their content and may carry attributes, and simple types, which can neither have element content nor carry attributes. There is also a major distinction between definitions, which create new types (both simple and complex), and declarations, which enable elements or attributes with specific names and types (both simple and complex) to appear in document instances. In this section, we focus on defining complex types and declaring the elements and attributes that appear within them.
New complex types are defined using the complexType element, and such definitions typically contain a set of element and attribute declarations. These declarations are not themselves types. Instead, each one associates a name with the constraints that govern the appearance of that name in documents governed by the associated schema. For example, Address is defined as a complex type, and within the definition of Address we see five element declarations and one attribute declaration:
Defining the Address Type

```xml
<xsd:complexType name="Address">
  <xsd:element name="name" type="xsd:string"/>
  <xsd:element name="street" type="xsd:string"/>
  <xsd:element name="city" type="xsd:string"/>
  <xsd:element name="state" type="xsd:string"/>
  <xsd:element name="zip" type="xsd:decimal"/>
  <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
</xsd:complexType>
```
The consequence of this definition is that any element appearing in an instance (e.g. po.xml) whose type is declared to be Address must consist of five elements, in the order given, and may carry one attribute. These elements must be called name, street, city, state and zip. The first four of these elements will each contain a string, and the fifth will contain a decimal number. The element whose type is declared to be Address may appear with an attribute called country, whose value must be the string "US".
The Address definition contains only declarations involving simple types: string, decimal and NMTOKEN. In contrast, the PurchaseOrderType definition contains element declarations involving complex types, e.g. Address. Note, however, that both kinds of declaration use the same type= attribute to identify the type, regardless of whether the type is simple or complex.
Defining the PurchaseOrderType Type

```xml
<xsd:complexType name="PurchaseOrderType">
  <xsd:element name="shipTo" type="Address"/>
  <xsd:element name="billTo" type="Address"/>
  <xsd:element ref="comment" minOccurs="0"/>
  <xsd:element name="items" type="Items"/>
  <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
```
In defining PurchaseOrderType, two of the element declarations, shipTo and billTo, associate different element names with the same complex type, namely Address. The consequence of this definition is that any element appearing in an instance (e.g. po.xml) whose type is declared to be PurchaseOrderType must contain elements called shipTo and billTo, each containing the five subelements (name, street, city, state and zip) that were declared as part of Address. The shipTo and billTo elements may also carry the country attribute that was declared as part of Address.
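To make the consequences of the Address definition concrete, here is a minimal Python sketch of the checks a processor would perform on an element of type Address. It is hand-written for this one type, and the function and constant names are hypothetical. It assumes that the five subelements must appear in the declared order. It also assumes the lexical form of decimal used in the final XML Schema Recommendation (an optional sign, digits, and an optional fraction). Because shipTo and billTo share the Address type, the same function serves for both.

```python
import re
import xml.etree.ElementTree as ET

# Children required by Address, in declaration order.
ADDRESS_CHILDREN = ["name", "street", "city", "state", "zip"]
# Assumed lexical form for decimal: optional sign, digits, optional fraction.
DECIMAL_RE = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def check_address(elem):
    """Return a list of problems; an empty list means elem fits Address."""
    problems = []
    names = [child.tag for child in elem]
    if names != ADDRESS_CHILDREN:
        problems.append(f"expected children {ADDRESS_CHILDREN}, got {names}")
    zip_elem = elem.find("zip")
    if zip_elem is not None and not DECIMAL_RE.fullmatch((zip_elem.text or "").strip()):
        problems.append("zip is not a decimal")
    for attr, value in elem.attrib.items():
        if attr != "country":
            problems.append(f"unexpected attribute {attr!r}")
        elif value != "US":
            # country is optional, but its value is fixed to "US".
            problems.append(f"country must be 'US', got {value!r}")
    return problems


SHIP_TO = """<shipTo country="US">
  <name>Alice Smith</name><street>123 Maple Street</street>
  <city>Mill Valley</city><state>CA</state><zip>90952</zip>
</shipTo>"""

if __name__ == "__main__":
    print(check_address(ET.fromstring(SHIP_TO)))  # [] (no problems)
```

A billTo element with country="UK", a missing state, and a non-numeric zip would produce three problems, one for each violation.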
The PurchaseOrderType definition contains an orderDate attribute declaration which, like the country attribute declaration, involves a simple type (date). In fact, all attribute declarations must reference simple types because, unlike elements, attributes cannot contain other elements or attributes.
The element declarations we have described so far have each associated a name with an existing type definition. Sometimes it is preferable to use an existing element rather than to declare a new element, for example:
<xsd:element ref="comment" minOccurs="0" />
This declaration references an existing element, comment, that was declared elsewhere in the purchase order schema. In general, the value of the ref attribute must reference a global element, i.e. one that is declared at the top level of the schema as an immediate subelement of the schema element. The consequence of this declaration is that an element called comment may appear in an instance document, and its content must be consistent with that element's type, in this case string.
The comment element is optional within PurchaseOrderType, on account of minOccurs="0". The maximum number of times an element may appear is set by its maxOccurs attribute: maxOccurs="1" allows at most one occurrence, and maxOccurs="*" allows any number of occurrences, as in the declaration of item. The default value for minOccurs is 1 but there is no default value for maxOccurs, so element declarations that omit both minOccurs and maxOccurs attributes must occur exactly once. minOccurs and maxOccurs may also be applied to attribute declarations, although there their default values are 0 and 1 respectively, which makes attributes optional unless declared otherwise. A particular attribute may appear only once in an element, and so the maximum value of maxOccurs for an attribute is 1.
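The occurrence rules above reduce to a simple comparison. The following Python sketch uses hypothetical names throughout. It models a declaration that omits both attributes as minOccurs 1 and maxOccurs 1, as the text states, and it represents the draft's unbounded value as the string "*".

```python
import xml.etree.ElementTree as ET

UNBOUNDED = "*"  # the draft's spelling for "no upper limit"


def occurrences_ok(count, min_occurs=1, max_occurs=1):
    """Check how many times something occurred against minOccurs/maxOccurs.
    The defaults model an element declaration that omits both attributes."""
    if count < min_occurs:
        return False
    return max_occurs == UNBOUNDED or count <= int(max_occurs)


def count_children(parent, tag):
    """Number of direct children of parent named tag."""
    return sum(1 for child in parent if child.tag == tag)


ITEMS = "<items><item partNum='872-AA'/><item partNum='926-AA'/></items>"

if __name__ == "__main__":
    items = ET.fromstring(ITEMS)
    # item is declared with minOccurs="0" maxOccurs="*".
    print(occurrences_ok(count_children(items, "item"), 0, UNBOUNDED))  # True
    # An absent attribute (defaults: minOccurs 0, maxOccurs 1) is acceptable.
    print(occurrences_ok(0, 0, 1))  # True
```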
In this section we have described how to define new complex types (e.g. PurchaseOrderType) and how to declare elements (e.g. purchaseOrder) and attributes (e.g. orderDate). These activities generally involve naming, and the question naturally arises: What happens if two things have the same name? The answer depends upon the two things in question, although in general the more similar the two things are, the more likely there is to be a conflict.
Here are some examples to illustrate when identical names cause problems. If the two things are both types, say we define a complex type called US-States and a simple type called US-States, there is a conflict. If the two things are a type and an element or attribute, say we define a complex type called purchaseOrder and declare an element called purchaseOrder, there is no conflict. If the two things are elements within different types (i.e. not global elements), say we declare one element called name as part of the Address type and a second element called name as part of the Item type, there is no conflict. Finally, if the two things are both types and you define one while XML Schema has defined the other, say you define a simple type called decimal, there is no conflict. The reason for the apparent contradiction in the last example is that the two types belong to different namespaces. We'll explore the use of schemas and namespaces in a later section.
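The naming rules above can be modelled as a set of separate symbol spaces. The following Python sketch is a hypothetical model, not the normative definition from XML Schema. It treats each (namespace, kind, scope) triple as its own space, puts simple and complex types in one shared "type" space, and scopes local elements to their enclosing type. The namespace label urn:example:po is invented to stand for the schema author's names.

```python
class SymbolSpaces:
    """Hypothetical model of the naming rules described above. Each
    (namespace, kind, scope) triple is a separate symbol space. All type
    definitions, simple or complex, share the 'type' kind; elements declared
    inside a type use that type's name as their scope, while global
    elements use the empty scope."""

    def __init__(self):
        self._spaces = {}

    def define(self, namespace, kind, name, scope=""):
        names = self._spaces.setdefault((namespace, kind, scope), set())
        if name in names:
            raise ValueError(f"conflict: {kind} {name!r} already defined in {namespace!r}")
        names.add(name)


PO_NS = "urn:example:po"  # hypothetical namespace for the schema author's names
XSD_NS = "http://www.w3.org/1999/XMLSchema"

if __name__ == "__main__":
    spaces = SymbolSpaces()
    spaces.define(PO_NS, "type", "US-States")         # a complex type
    try:
        spaces.define(PO_NS, "type", "US-States")     # a simple type: conflict
    except ValueError as err:
        print(err)
    spaces.define(PO_NS, "type", "purchaseOrder")     # a type and an element ...
    spaces.define(PO_NS, "element", "purchaseOrder")  # ... do not conflict
    spaces.define(PO_NS, "element", "name", scope="Address")  # local elements in
    spaces.define(PO_NS, "element", "name", scope="Item")     # different types
    spaces.define(XSD_NS, "type", "decimal")          # the built-in type ...
    spaces.define(PO_NS, "type", "decimal")           # ... and a user-defined one
```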
The purchase order schema declares several elements and attributes that have simple types. Some of these simple types, such as string and decimal, are built into XML Schema, while others are derived from the built-ins. For example, the partNum attribute has a type called Sku that is derived from string. Both built-in simple types and their derivations can be used in all element and attribute declarations. Table 1 lists all the simple types built into XML Schema, along with examples of each type.
Table 1. Simple Types Built-In to XML Schema

Simple Type | Example |
---|---|
string | "Confirm this is electric" |
boolean | true, false, 1, 0 |
float | -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN ("not a number"), equivalent to single-precision 32-bit floating point |
double | -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN, equivalent to double-precision 64-bit floating point |
decimal | -1.23, 0, 123.4, 1000.00 |
timeInstant | 1999-05-31T13:20:00.000-05:00 (May 31st 1999 at 1.20pm Eastern Standard Time which is 5 hours behind Co-Ordinated Universal Time) |
timeDuration | P1Y2M3DT10H30M12.3S (1 year, 2 months, 3 days, 10 hours, 30 minutes, 12.3 seconds) |
recurringInstant | --05-31T13:20:00 (May 31st every year at 1.20pm Co-Ordinated Universal Time, format similar to timeInstant) |
binary | 100010 |
uri-reference | http://www.example.com/, http://www.example.com/doc.html#ID5 |
ID | is an XML 1.0 ID attribute type |
IDREF | is an XML 1.0 IDREF attribute type |
ENTITY | is an XML 1.0 ENTITY attribute type |
NOTATION | is an XML 1.0 NOTATION attribute type |
language | en-GB, en-US, fr, and other valid values for xml:lang as defined in XML 1.0 |
IDREFS | is an XML 1.0 IDREFS attribute type |
ENTITIES | is an XML 1.0 ENTITIES attribute type |
NMTOKEN | US, is an XML 1.0 NMTOKEN attribute type |
NMTOKENS | "US UK", is an XML 1.0 NMTOKENS attribute type |
Name | shipTo (is an XML 1.0 Name type) |
QName | Address (is an XML Namespace QName) |
NCName | Address (is an XML Namespace NCName, i.e. is a QName without the prefix and colon) |
integer | -126789, -1, 0, 1, 126789 |
non-positive-integer | -126789, -1, 0 |
negative-integer | -126789, -1 |
long | -1, 12678967543233 |
int | -1, 126789675 |
short | -1, 12678 |
byte | -1, 126 |
non-negative-integer | 0, 1, 126789 |
unsigned-long | 0, 12678967543233 |
unsigned-int | 0, 1267896754 |
unsigned-short | 0, 12678 |
unsigned-byte | 0, 126 |
positive-integer | 1, 126789 |
date | 1999-05-31, ---05 (5th day of every month) |
time | 13:20:00.000, 13:20:00.000-05:00 |

Note that to retain compatibility between XML Schema and XML 1.0 DTDs, the simple types ID, IDREF, IDREFS, ENTITY, ENTITIES, NOTATION, NMTOKEN and NMTOKENS should only be used in attributes.
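The integer-derived types in Table 1 differ only in the range of values they allow. The following Python sketch records those ranges. It assumes the conventional two's-complement bounds for 8-, 16-, 32- and 64-bit integers, which is our reading of the byte, short, int and long families and should be checked against Part 2: Datatypes. The table and function names are hypothetical.

```python
# (min, max) for each integer-derived type in Table 1; None means unbounded.
# The fixed-size bounds are an assumption of this sketch (two's complement).
INTEGER_RANGES = {
    "integer": (None, None),
    "non-positive-integer": (None, 0),
    "negative-integer": (None, -1),
    "long": (-2**63, 2**63 - 1),
    "int": (-2**31, 2**31 - 1),
    "short": (-2**15, 2**15 - 1),
    "byte": (-2**7, 2**7 - 1),
    "non-negative-integer": (0, None),
    "unsigned-long": (0, 2**64 - 1),
    "unsigned-int": (0, 2**32 - 1),
    "unsigned-short": (0, 2**16 - 1),
    "unsigned-byte": (0, 2**8 - 1),
    "positive-integer": (1, None),
}


def in_range(type_name, value):
    """True if the integer value lies within the bounds of type_name."""
    low, high = INTEGER_RANGES[type_name]
    return (low is None or value >= low) and (high is None or value <= high)


if __name__ == "__main__":
    print(in_range("byte", 126), in_range("byte", 128))  # True False
    print(in_range("unsigned-short", -1))                # False
```

Every example value listed in Table 1 for these types falls within the corresponding range.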
New simple types are defined by derivation from existing simple types (built-ins and derived types). A new type must have a name different from the existing type, and the new type may constrain the legal range of values obtained from the existing type. We use the simpleType element to define a new simple type, and there is a wide variety of so-called facets that may be used to constrain the values of the new type (a complete listing of facets is provided in Appendix B). For example, in the schema po.xsd, we defined a new simple type called Sku that is derived from the simple type string. Furthermore, we constrain the values of Sku using a facet called pattern in conjunction with the regular expression "\d{3}-[A-Z]{2}", which is read "three digits followed by a hyphen followed by two upper-case letters":
Defining the Simple Type "Sku"

```xml
<xsd:simpleType name="Sku" base="xsd:string">
  <xsd:pattern value="\d{3}-[A-Z]{2}"/>
</xsd:simpleType>
```
This regular expression language is described more fully in Appendix C.
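For a pattern this simple, Python's re module reads the expression the same way, so the Sku facet can be tried out directly. This sketch assumes, as in the final XML Schema Recommendation, that a pattern must match the entire value, which is why it uses re.fullmatch. In general the two regular-expression languages differ, so Appendix C remains the authority. The function name is hypothetical.

```python
import re

# Pattern facet of Sku. The pattern must match the whole value, so the
# sketch uses re.fullmatch rather than re.search.
SKU_PATTERN = re.compile(r"\d{3}-[A-Z]{2}")


def is_sku(value):
    """True if value is three digits, a hyphen, and two upper-case letters."""
    return SKU_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["872-AA", "926-AA", "872-aa", "8720-AA", "872-AAB"]:
        print(candidate, is_sku(candidate))
```

Only the first two candidates, the part numbers from po.xml, are accepted.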
XML Schema defines thirteen facets (see Appendix B). Among these, the enumeration facet is one of the most useful, and it can be used to constrain the values of every simple type except boolean. The enumeration facet limits a simple type to a set of distinct values. For example, we can use the enumeration facet to define a new simple type called US-State, derived from string, whose value must be one of the standard US state abbreviations:
Using the Enumeration Facet

```xml
<xsd:simpleType name="US-State" base="xsd:string">
  <xsd:enumeration value="AK"/>
  <xsd:enumeration value="AL"/>
  <xsd:enumeration value="AR"/>
  <!-- and so on ... -->
</xsd:simpleType>
```
US-State would be a good replacement for the string type currently used in the state element declaration. By making this replacement, the legal values of a state element, i.e. the state subelements of billTo and shipTo, would be limited to one of AK, AL, AR, etc.
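An enumeration facet is, in effect, a membership test. The following Python sketch, which uses hypothetical names, builds a checker from a set of allowed strings. It uses only the three abbreviations listed in the example above, so it is deliberately incomplete.

```python
# The three values spelled out in the US-State example; the example's
# "and so on" is not filled in, so this set is deliberately incomplete.
US_STATE_VALUES = frozenset({"AK", "AL", "AR"})


def make_enumerated_type(allowed):
    """Return a checker for a string type restricted by enumeration facets."""
    allowed = frozenset(allowed)

    def check(value):
        return isinstance(value, str) and value in allowed

    return check


is_us_state = make_enumerated_type(US_STATE_VALUES)

if __name__ == "__main__":
    print(is_us_state("AK"), is_us_state("ak"), is_us_state("XX"))  # True False False
```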
Ed. Note: Need to describe List simple types
Schemas can be constructed by defining sets of named types such as PurchaseOrderType and then declaring elements such as purchaseOrder that reference the types using the type= construction. This style of schema construction is straightforward, but it can be unwieldy, especially if you create many types that are referenced only once and contain very few constraints. In these cases, a type can be defined more succinctly as an anonymous type, which saves the overhead of naming it and referencing it through type=.
The definition of the type Items contains two element declarations that have anonymous types. In general, you can identify anonymous types by the lack of a type= in the element (or attribute) declaration, and by an immediately following unnamed type definition:
Two Anonymous Type Definitions

```xml
<xsd:complexType name="Items">
  <xsd:element name="item" minOccurs="0" maxOccurs="*">
    <xsd:complexType>
      <xsd:element name="productName" type="xsd:string"/>
      <xsd:element name="quantity">
        <xsd:simpleType base="xsd:positive-integer">
          <xsd:maxExclusive value="100"/>
        </xsd:simpleType>
      </xsd:element>
      <xsd:element name="price" type="xsd:decimal"/>
      <xsd:element ref="comment" minOccurs="0"/>
      <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
      <xsd:attribute name="partNum" type="Sku"/>
    </xsd:complexType>
  </xsd:element>
</xsd:complexType>
```
The item element has an unnamed complex type consisting of the elements productName, quantity, price, comment and shipDate, and an attribute called partNum. The quantity element has an unnamed simple type derived from positive-integer whose value ranges from 1 to 99, because the maxExclusive facet excludes 100 itself.
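The anonymous quantity type combines a base type with one facet. This Python sketch spells out the two checks. It assumes a lexical form for positive-integer of an optional + sign followed by digits, and the function names are hypothetical.

```python
import re

# Assumed lexical form for positive-integer: an optional '+' and digits.
POSITIVE_INTEGER_RE = re.compile(r"\+?[0-9]+")


def is_positive_integer(text):
    """Base type check: integer syntax and a value of at least 1."""
    return POSITIVE_INTEGER_RE.fullmatch(text) is not None and int(text) >= 1


def is_quantity(text):
    """positive-integer restricted by <maxExclusive value="100"/>: 1 to 99."""
    return is_positive_integer(text) and int(text) < 100


if __name__ == "__main__":
    for candidate in ["1", "99", "100", "0", "1.5"]:
        print(candidate, is_quantity(candidate))
```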
The purchase order schema has many examples of elements containing other elements (e.g. items), elements having attributes and containing other elements (e.g. shipTo), and elements containing only a simple type of value (e.g. price). However, we have not seen an element that has attributes and contains only a simple type of value, an element that contains both other elements and simple values, or an element that has no content at all. In this section we'll examine these variations in the content of element types.
Let us first consider how to declare an element that has an attribute and contains a simple value. In an instance document, such an element might appear as:
<price currency='EU'>423.46</price>
The purchase order schema declares a price element that gives us a starting point:
<xsd:element name="price" type="xsd:decimal"/>
Now, how do we add an attribute to this element? As we have said before, simple types cannot have attributes, and decimal is a simple type. Therefore, we must create a complex type to carry the attribute declaration. We also want the content to be simple type decimal. So our original question becomes: How do we create a complex type that is based on the simple type decimal? The answer is to derive a new complex type from the simple type decimal:
Deriving a Complex Type from a Simple Type

```xml
<xsd:element name='price'>
  <xsd:complexType base='xsd:decimal' derivedBy='extension'>
    <xsd:attribute name='currency' type='xsd:string'/>
  </xsd:complexType>
</xsd:element>
```
We use the complexType element to define a new (anonymous) type, and we refer to decimal in the base attribute to indicate that it is the simple type from which we are deriving the new type. We add a currency attribute using a standard attribute declaration, and because we want to add this attribute to the simple type, we must signal our intent by stating derivedBy='extension'. (We cover type derivation in detail in Section 3.) For the sake of brevity, we have derived an anonymous complex type from decimal. The price element declared in this way appears in an instance exactly as shown in the example above.
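The following Python sketch shows what the extended price type accepts: decimal character content, no subelements, and an optional currency attribute holding any string. It is an illustration with hypothetical names that uses xml.etree.ElementTree, and it assumes the decimal syntax of the final Recommendation.

```python
import re
import xml.etree.ElementTree as ET

# Assumed lexical form for decimal: optional sign, digits, optional fraction.
DECIMAL_RE = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def check_price(elem):
    """price: decimal content, no subelements, optional currency attribute."""
    decimal_content = DECIMAL_RE.fullmatch((elem.text or "").strip()) is not None
    no_subelements = len(elem) == 0
    only_currency = set(elem.attrib) <= {"currency"}
    return elem.tag == "price" and decimal_content and no_subelements and only_currency


if __name__ == "__main__":
    print(check_price(ET.fromstring("<price currency='EU'>423.46</price>")))  # True
    print(check_price(ET.fromstring("<price currency='EU'>cheap</price>")))   # False
```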
Now suppose that we want the price element to convey both the unit of currency and the price as attribute values rather than as separate attribute and content values. For example:
<price currency='EU' value='423.46' />
Such an element has no content at all; we say that its content model is empty:
An Empty Complex Type

```xml
<xsd:element name='price'>
  <xsd:complexType content='empty'>
    <xsd:attribute name='currency' type='xsd:string'/>
    <xsd:attribute name='value' type='xsd:decimal'/>
  </xsd:complexType>
</xsd:element>
```
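By contrast, an element with an empty content model may carry attributes but no content. The Python sketch below adopts the strict reading that any character data, even whitespace, counts as content. That reading, the decimal syntax, and the function name are all assumptions of this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumed lexical form for decimal: optional sign, digits, optional fraction.
DECIMAL_RE = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def check_empty_price(elem):
    """Empty content model: no subelements and no character data at all;
    the information travels in the currency and value attributes."""
    empty = len(elem) == 0 and not elem.text
    known_attrs = set(elem.attrib) <= {"currency", "value"}
    value = elem.get("value")
    value_ok = value is None or DECIMAL_RE.fullmatch(value.strip()) is not None
    return empty and known_attrs and value_ok


if __name__ == "__main__":
    print(check_empty_price(ET.fromstring("<price currency='EU' value='423.46' />")))  # True
    print(check_empty_price(ET.fromstring("<price currency='EU'>423.46</price>")))     # False
```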
The purchase order schema is constructed in a style that can be broadly described as elements containing subelements, with the deepest subelements containing character data. XML Schema also provides for constructing schemas in a style in which character data can appear alongside subelements at multiple levels of embedding, and such data is not confined to the deepest subelements. The latter style of constructing schemas is enabled through the mixed value of the content attribute. To illustrate, consider the following snippet from a customer letter that uses some of the same elements as the purchase order:
Snippet of Customer Letter

```xml
<letterBody>
<salutation>Dear Mr.<name>Robert Smith</name>.</salutation>
Your order of <quantity>1</quantity> <productName>Baby Monitor</productName>
shipped from our warehouse on <shipDate>1999-05-21</shipDate>. ....
</letterBody>
```
Notice the text appearing between elements at different levels. Specifically, text appears between the elements salutation, quantity, productName and shipDate, which are all children of letterBody, and text appears around the element name, which is the child of a child of letterBody.
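In ElementTree's data model, character data that precedes an element's first child is stored in the parent's .text, and character data that follows a child is stored in that child's .tail. The Python sketch below relies on this behavior, and its function names are hypothetical. It lists where text appears in an abridged copy of the letter, and it tests whether an element has mixed content, i.e. both subelements and non-blank text of its own.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the customer letter (the trailing "...." is omitted).
LETTER = """<letterBody>
<salutation>Dear Mr.<name>Robert Smith</name>.</salutation>
Your order of <quantity>1</quantity> <productName>Baby Monitor</productName>
shipped from our warehouse on <shipDate>1999-05-21</shipDate>.
</letterBody>"""


def text_chunks(elem):
    """Yield (owner, text) for each non-blank run of character data, where
    owner is the element that directly contains the text."""
    if elem.text and elem.text.strip():
        yield elem.tag, elem.text.strip()
    for child in elem:
        yield from text_chunks(child)
        if child.tail and child.tail.strip():
            yield elem.tag, child.tail.strip()


def is_mixed(elem):
    """True if elem has subelements and also non-blank text of its own."""
    own_text = [elem.text] + [child.tail for child in elem]
    return len(elem) > 0 and any(t and t.strip() for t in own_text)


if __name__ == "__main__":
    root = ET.fromstring(LETTER)
    for owner, text in text_chunks(root):
        print(f"{owner}: {text!r}")
    print(is_mixed(root), is_mixed(root.find("salutation")))  # True True
```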
The following snippet of a schema declares letterBody:
Snippet of Schema for Customer Letter

```xml
<xsd:element name='letterBody'>
  <xsd:complexType content='mixed'>
    <xsd:element name='salutation'>
      <xsd:complexType content='mixed'>
        <xsd:element name='name' type='xsd:string'/>
      </xsd:complexType>
    </xsd:element>
    <xsd:element name='quantity' type='xsd:positive-integer'/>
    <xsd:element name='productName' type='xsd:string'/>
    <xsd:element name='shipDate' type='xsd:date' minOccurs='0'/>
    <!-- etc -->
  </xsd:complexType>
</xsd:element>
```
Ed. Note: Clarify the default and possible values of min/maxOccurs on elements in mixed content model.
By now, we hope you understand that the different values for the content attribute represent different content models. In previous sections, we have defined new complex types without reference to the content attribute, and so it is reasonable to ask what content model was used in those definitions. The default content model for a complex type is called elementOnly, i.e. the complex type may contain elements and attributes. elementOnly is the content model that applies when we derive complex types from other complex types, but when we derive a complex type from a simple type (as we did at the beginning of this section), the content model is called textOnly. In fact, we can define a complexType in terms of textOnly:
A textOnly Complex Type

```xml
<xsd:element name='price'>
  <xsd:complexType content='textOnly'>
    <xsd:attribute name='currency' type='xsd:string'/>
  </xsd:complexType>
</xsd:element>
```
The content of the anonymous type defined in this way is unconstrained, so the element value may be 423.46, but it may legitimately be any other sequence of characters as well. In general it is probably better to avoid such unconstrained type definitions in favour of the any construction described in a later section, or of constrained type definitions such as decimal and string.
XML Schema provides a set of elements for annotating schemas for the benefit of both human readers and applications. In the purchase order schema, we put a basic schema description and copyright information inside the documentation element, which is the recommended location for human-readable material. Another element, appinfo, which we did not use in the purchase order schema, can be used to provide information for tools, stylesheets and other applications. Both documentation and appinfo appear as subelements of annotation, which may itself appear anywhere in a schema.
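Annotations are ordinary elements in the XML Schema namespace, so applications can extract them with any namespace-aware XML tool. As an illustration, the following Python sketch uses xml.etree.ElementTree to collect the whitespace-normalized leading text of documentation or appinfo elements. The function name is hypothetical, and the sketch only considers elements that are direct children of an annotation.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used in the search paths below.
XSD = {"xsd": "http://www.w3.org/1999/XMLSchema"}

SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <xsd:annotation>
    <xsd:documentation>
      Purchase order schema for Example.com.
      Copyright 2000 Example.com. All rights reserved.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:element name="comment" type="xsd:string"/>
</xsd:schema>"""


def annotation_text(schema_xml, part):
    """Whitespace-normalized leading text of each xsd:<part> child of an
    annotation, where part is "documentation" or "appinfo"."""
    root = ET.fromstring(schema_xml)
    found = root.findall(f".//xsd:annotation/xsd:{part}", XSD)
    return [" ".join(elem.text.split()) for elem in found if elem.text]


if __name__ == "__main__":
    print(annotation_text(SCHEMA, "documentation"))
    print(annotation_text(SCHEMA, "appinfo"))  # [] since po.xsd uses no appinfo
```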
The definitions of complex types in the purchase order schema all declare a sequence of elements that must appear in the instance document. The occurrence of individual elements declared in the so-called content models of these types may be optional, as in the case of the comment element where the value of minOccurs is 0, or otherwise constrained depending upon the values of minOccurs and maxOccurs. XML Schema also provides constraints that apply to groups of elements appearing in a content model. Note that these constraints do not apply to attributes. The constraints provided by the group element mirror those available in XML 1.0, and add some further constraints.
Ed. Note: The syntax for groups has changed. There are now <choice>, etc elements. This section will be rewritten.
The default for a group is that all the group's elements must appear in the order given. Alternatively, a group may be defined such that only one of the elements within the group may appear in the instance. Groups can also be nested, and may take minOccurs and maxOccurs attributes. To illustrate, we can use two groups in the purchase order schema to allow documents containing separate shipping and billing addresses, or single addresses in cases where the shipper and biller are co-located:
Nested Groups

```xml
<xsd:complexType name="PurchaseOrderType">
  <xsd:group order="choice">
    <xsd:group>
      <xsd:element name="shipTo" type="Address"/>
      <xsd:element name="billTo" type="Address"/>
    </xsd:group>
    <xsd:element name="singleAddress" type="Address"/>
  </xsd:group>
  <xsd:element ref="comment" minOccurs="0"/>
  <xsd:element name="items" type="Items"/>
  <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
```
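One common way to check a content model like this is to write it as a regular expression over the sequence of child-element names. This technique was chosen for the sketch; the primer does not prescribe it. The Python sketch below encodes the choice between a shipTo, billTo pair and a singleAddress, followed by an optional comment and a mandatory items. The names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The revised PurchaseOrderType content model as a regular expression over
# child-element names, each name followed by a single space:
#   choice(sequence(shipTo, billTo), singleAddress), comment?, items
PURCHASE_ORDER_MODEL = re.compile(r"(shipTo billTo |singleAddress )(comment )?items ")


def matches_model(elem, model):
    """True if the names of elem's children, in order, match the model."""
    child_names = "".join(child.tag + " " for child in elem)
    return model.fullmatch(child_names) is not None


if __name__ == "__main__":
    for doc in [
        "<purchaseOrder><shipTo/><billTo/><comment/><items/></purchaseOrder>",
        "<purchaseOrder><singleAddress/><items/></purchaseOrder>",
        "<purchaseOrder><shipTo/><singleAddress/><items/></purchaseOrder>",
    ]:
        print(matches_model(ET.fromstring(doc), PURCHASE_ORDER_MODEL), doc)
```

The first two documents match. The third is rejected because it mixes the two alternatives of the choice.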
The third option for ordering elements in a group specifies that all the elements in the group must appear once, in any order. Usage of the "all" option (which provides a simplified version of the SGML &-Connector) is restricted in three ways. The group must appear at the top level of a content model. The items listed must all be individual elements (no groups). Each element in the content model may appear only once, i.e. every element in the content model must have minOccurs="1" and maxOccurs="1".
For example, if it were important to allow the child elements of purchaseOrder to appear in any order, we could redefine PurchaseOrderType as:
An 'All' Group

```xml
<xsd:complexType name="PurchaseOrderType">
  <xsd:group order="all">
    <xsd:element name="shipTo" type="Address"/>
    <xsd:element name="billTo" type="Address"/>
    <xsd:element ref="comment"/>
    <xsd:element name="items" type="Items"/>
  </xsd:group>
  <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
```
Note that the comment element in this example is no longer optional (as previously indicated by minOccurs="0"). This meets the stipulation that every element in an "all" group must appear exactly once. Furthermore, the comment element cannot be placed outside the "all" group as a means of making it optional, because the "all" group must appear at the top level of the content model. In other words, the following is illegal:
Illegal Example with an 'All' Group
<xsd:complexType name="PurchaseOrderType">
  <xsd:group order="all">
    <xsd:element name="shipTo" type="Address"/>
    <xsd:element name="billTo" type="Address"/>
    <xsd:element name="items" type="Items"/>
  </xsd:group>
  <xsd:element ref="comment" minOccurs="0"/>
  <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
The preceding examples describe groups defined inline; in other words, the groups are not named and do not exist outside the context of their surrounding type definitions. However, groups can also be named and used in multiple locations, in much the same way as the complexType and attributeGroup elements. Used this way, named groups replicate a common use of Parameter Entities in XML 1.0.
Suppose we want to provide more information about the items in a purchase order by adding attributes to the item element. These attributes would indicate whether or not the item is in stock, its weight, and its preferred shipping method. One way to add them is to add more attribute declarations to the inline item complexType definition:
Adding Attributes to the Inline Type Definition
<xsd:element name="item" minOccurs="0" maxOccurs="*">
  <xsd:complexType>
    <xsd:element name="productName" type="xsd:string"/>
    <xsd:element name="quantity">
      <xsd:simpleType base="xsd:positive-integer">
        <xsd:maxExclusive value="100"/>
      </xsd:simpleType>
    </xsd:element>
    <xsd:element name="price" type="xsd:decimal"/>
    <xsd:element ref="comment" minOccurs="0"/>
    <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
    <xsd:attribute name="partNum" type="Sku"/>
    <xsd:attribute name="weight" type="xsd:decimal"/>
    <xsd:attribute name="shipBy">
      <xsd:simpleType base="xsd:string">
        <xsd:enumeration value="air"/>
        <xsd:enumeration value="land"/>
        <xsd:enumeration value="any"/>
      </xsd:simpleType>
    </xsd:attribute>
  </xsd:complexType>
</xsd:element>
Alternatively, we can create a named Attribute Group
containing these attributes and reference this group by name in
the item
element declaration:
Adding Attributes Using an Attribute Group
<xsd:element name="item" minOccurs="0" maxOccurs="*">
  <xsd:complexType>
    <xsd:element name="productName" type="xsd:string"/>
    <xsd:element name="quantity">
      <xsd:simpleType base="xsd:positive-integer">
        <xsd:maxExclusive value="100"/>
      </xsd:simpleType>
    </xsd:element>
    <xsd:element name="price" type="xsd:decimal"/>
    <xsd:element ref="comment" minOccurs="0"/>
    <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
    <xsd:attributeGroup ref="ItemDelivery"/>
  </xsd:complexType>
</xsd:element>
<xsd:attributeGroup name="ItemDelivery">
  <xsd:attribute name="partNum" type="Sku"/>
  <xsd:attribute name="weight" type="xsd:decimal"/>
  <xsd:attribute name="shipBy">
    <xsd:simpleType base="xsd:string">
      <xsd:enumeration value="air"/>
      <xsd:enumeration value="land"/>
      <xsd:enumeration value="any"/>
    </xsd:simpleType>
  </xsd:attribute>
</xsd:attributeGroup>
Using an Attribute Group in this way can improve the readability of schemas and make them easier to update, because an Attribute Group can be defined and edited in one place and referenced in multiple definitions and declarations. These characteristics make Attribute Groups similar to Parameter Entities in XML 1.0. Note that both attribute declarations and Attribute Group references must appear at the end of complex type definitions.
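The benefit of a single point of definition can be pictured as a lookup table that every referencing type expands. This Python sketch is a hypothetical model, not a schema API. ATTRIBUTE_GROUPS and effective_attributes are illustrative names. The shipBy type is written as a descriptive string because its type is anonymous in the schema. Rejecting a duplicated attribute name is a choice made for this sketch.

```python
# Hypothetical model: an attribute group is a named table of attribute -> type.
ATTRIBUTE_GROUPS = {
    "ItemDelivery": {
        "partNum": "Sku",
        "weight": "xsd:decimal",
        "shipBy": "anonymous enumeration: air | land | any",
    },
}

def effective_attributes(local_attributes, group_refs):
    """Expand attributeGroup references into a single attribute table."""
    attributes = dict(local_attributes)
    for ref in group_refs:
        for name, type_name in ATTRIBUTE_GROUPS[ref].items():
            if name in attributes:
                raise ValueError(f"attribute {name!r} declared twice")
            attributes[name] = type_name
    return attributes

# The item element declares no attributes of its own; it only references the group.
item_attributes = effective_attributes({}, ["ItemDelivery"])
print(sorted(item_attributes))  # ['partNum', 'shipBy', 'weight']
```

Because every expansion reads ATTRIBUTE_GROUPS, editing the group's entry once changes the result for every declaration that references it.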
One of the purchase order items, the Lawnmower, does not have a shipDate element. Within the context of our scenario, the schema author may have intended such an absence to indicate that the Lawnmower has not shipped. But in general, the absence of an element does not have any particular meaning; it may indicate that the information is unknown, or not applicable, or the element may be absent for some other reason. Sometimes the absence of an element may have the same meaning as a "null" value in a relational database, although there will be other times when it is desirable to explicitly represent such a null value.
XML Schema provides a mechanism for explicitly representing nulls in an XML format. This mechanism involves an "out of band" null signal. In other words, no actual null value appears as element content; instead, an attribute indicates that the element content is null. To illustrate, we can modify the shipDate element declaration so that nulls can be signalled:
<xsd:element name="shipDate" type="xsd:date" nullable="true"/>
And to explicitly represent that shipDate has a null value in the instance document, we set the null attribute (from the XML Schema namespace for instances) to true:
<shipDate xsi:null="true"></shipDate>
The null attribute is defined as part of the XML Schema namespace for instances (http://www.w3.org/1999/XMLSchema/instance), and so it must appear in the instance document with a prefix (such as xsi) associated with that namespace. Note that the null mechanism applies only to element values, not to attribute values. An element with xsi:null="true" may not have any element content, but it may still carry attributes.
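An application that reads instances has to look for the null signal in the attribute rather than in the content. Below is a minimal Python sketch using the standard xml.etree.ElementTree module. It assumes the 1999 instance namespace URI given above. The helpers is_null and ship_date_value are hypothetical.

```python
import xml.etree.ElementTree as ET

# Instance namespace used by this draft of XML Schema.
XSI = "http://www.w3.org/1999/XMLSchema/instance"

def is_null(element):
    """True if the element carries xsi:null="true"."""
    return element.get(f"{{{XSI}}}null") == "true"

def ship_date_value(element):
    """Return the shipDate text, or None when the element is explicitly null."""
    if is_null(element):
        # A nulled element may carry attributes but must have no content.
        if len(element) > 0 or (element.text or "").strip():
            raise ValueError("an element with xsi:null='true' must be empty")
        return None
    return element.text

item = ET.fromstring(
    '<item xmlns:xsi="http://www.w3.org/1999/XMLSchema/instance">'
    '<shipDate xsi:null="true"></shipDate>'
    "</item>"
)
print(ship_date_value(item.find("shipDate")))                        # None
print(ship_date_value(ET.fromstring("<shipDate>1999-12-05</shipDate>")))  # 1999-12-05
```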
In this section, we consider some of the advanced features available in XML Schema.
As schemas become larger, it is often desirable to divide their content among several schema documents for purposes such as ease of maintenance, access control, and readability. For these reasons, we have taken the schema constructs concerning addresses out of po.xsd and put them in a new file called address.xsd. The modified purchase order schema file is now called ipo.xsd:
The International Purchase Order Schema, ipo.xsd
<schema targetNamespace="http://www.example.com/IPO"
        xmlns="http://www.w3.org/1999/XMLSchema"
        xmlns:ipo="http://www.example.com/IPO">
  <annotation>
    <documentation>
      International Purchase order schema for Example.com
      Copyright 2000 Example.com. All rights reserved.
    </documentation>
  </annotation>
  <!-- include address constructs -->
  <include schemaLocation="http://www.example.com/schemas/address.xsd"/>
  <element name="purchaseOrder" type="ipo:PurchaseOrderType"/>
  <element name="comment" type="string"/>
  <complexType name="PurchaseOrderType">
    <element name="shipTo" type="ipo:Address"/>
    <element name="billTo" type="ipo:Address"/>
    <element ref="ipo:comment" minOccurs="0"/>
    <element name="items" type="ipo:Items"/>
    <attribute name="orderDate" type="date"/>
  </complexType>
  <complexType name="Items">
    <element name="item" minOccurs="0" maxOccurs="*">
      <complexType>
        <element name="productName" type="string"/>
        <element name="quantity">
          <simpleType base="positive-integer">
            <maxExclusive value="100"/>
          </simpleType>
        </element>
        <element name="price" type="decimal"/>
        <element ref="ipo:comment" minOccurs="0"/>
        <element name="shipDate" type="date" minOccurs="0"/>
        <attribute name="partNum" type="ipo:Sku"/>
      </complexType>
    </element>
  </complexType>
  <simpleType name="Sku" base="string">
    <pattern value="\d{3}-[A-Z]{2}"/>
  </simpleType>
</schema>
The file containing the address constructs is:
Addresses for International Purchase Order schema, address.xsd
<schema targetNamespace="http://www.example.com/IPO"
        xmlns="http://www.w3.org/1999/XMLSchema"
        xmlns:ipo="http://www.example.com/IPO">
  <annotation>
    <documentation>
      Addresses for International Purchase order schema
      Copyright 2000 Example.com. All rights reserved.
    </documentation>
  </annotation>
  <complexType name="Address">
    <element name="name" type="string"/>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
  </complexType>
  <complexType name="US-Address" base="ipo:Address" derivedBy="extension">
    <element name="state" type="ipo:US-State"/>
    <element name="zip" type="positive-integer"/>
  </complexType>
  <complexType name="UK-Address" base="ipo:Address" derivedBy="extension">
    <element name="postcode" type="ipo:UK-Postcode"/>
    <attribute name="export-code" type="positive-integer" fixed="1"/>
  </complexType>
  <!-- other Address derivations for more countries -->
  <simpleType name="US-State" base="string">
    <enumeration value="AK"/>
    <enumeration value="AL"/>
    <enumeration value="AR"/>
    <!-- and so on ... -->
  </simpleType>
  <!-- simple type definition for UK-Postcode -->
</schema>
The reader will have noticed that we have changed namespace conventions between the original purchase order schema and the international purchase order schema. In particular, the XML Schema namespace is now the default, according to the default namespace declaration on the schema element (xmlns="http://www.w3.org/1999/XMLSchema"). As a result, the elements and built-in simple types belonging to XML Schema no longer require a prefix. In contrast, when references are made in the schema to types that have been defined in the schema, for example:
<element name="purchaseOrder" type="ipo:PurchaseOrderType"/>
then the type must appear with a prefix (ipo:) that is associated with the purchase order schema's namespace. Actually, we make this association in two steps: one statement (xmlns:ipo="http://www.example.com/IPO") associates the prefix with a particular URI, and a second statement (targetNamespace="http://www.example.com/IPO") asserts that this URI is the purchase order schema's namespace.
Instead of the various address constructions being in the ipo.xsd file, they are located in address.xsd. To include these constructions as part of the international purchase order schema, in other words to include them in the international purchase order's namespace, ipo.xsd contains the include element:
<include schemaLocation="http://www.example.com/schemas/address.xsd"/>
The net effect of this include is equivalent to replacing the include element with all the definitions and declarations from address.xsd.
Note that for the address constructions to be accessible as part of the international purchase order's schema, the namespace of the included constructions must be the same as the namespace of the international purchase order's schema. This is accomplished by making the (target) namespace of the included schema file the same as the (target) namespace of the including schema file, i.e. http://www.example.com/IPO. In this example, we have shown only one including document and one included document. In practice, it is possible to include multiple documents using multiple include elements, and documents can include documents that themselves include other documents. Such nesting is legal only if all the included parts of the schema are declared to have the same target namespace.
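The include rule can be modelled as a merge of named definitions that is permitted only when the target namespaces agree. This is a hypothetical Python model, not a real schema loader. Schema documents are represented as dictionaries and components as placeholder strings. For illustration, the sketch also assumes that two definitions with the same name are an error.

```python
IPO_NS = "http://www.example.com/IPO"

def include(including, included):
    """Model of <include>: add the included document's named components to the
    including document; allowed only for the same target namespace."""
    if included["targetNamespace"] != including["targetNamespace"]:
        raise ValueError("include needs the same target namespace")
    components = dict(including["components"])
    for name, kind in included["components"].items():
        if name in components:
            raise ValueError(f"{name} is defined twice")  # illustrative assumption
        components[name] = kind
    return {"targetNamespace": including["targetNamespace"], "components": components}

ipo = {"targetNamespace": IPO_NS,
       "components": {"purchaseOrder": "element", "comment": "element",
                      "PurchaseOrderType": "complexType", "Items": "complexType",
                      "Sku": "simpleType"}}
address = {"targetNamespace": IPO_NS,
           "components": {"Address": "complexType", "US-Address": "complexType",
                          "UK-Address": "complexType", "US-State": "simpleType"}}

schema = include(ipo, address)
print(sorted(schema["components"]))
```

Nested includes fit the same model: each include step merges into one dictionary whose namespace never changes.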
Instance documents that conform to a schema whose definitions span multiple schema documents need only reference the 'topmost' document and the common namespace. It is the responsibility of the XML processor to gather together all the definitions specified in the various included documents. So in our example, the instance document ipo.xml (see section 3.3) references only the common namespace, http://www.example.com/IPO, and the one schema file http://www.example.com/schemas/ipo.xsd.
In a later section we'll examine the situation when there is more than one schema namespace.
To create our address constructs, we start by creating a complex type called Address in the usual way (see address.xsd). The Address type contains the basic elements of an address: a name, a street and a city. From this starting point we derive two new complex types that contain all the elements of the original type plus additional elements that are specific to addresses in the U.S. and the U.K.
We create the two new complex types, US-Address and UK-Address, using the complexType element along with values for the base and derivedBy attributes. When a complex type is derived by extension, its effective content model is the content model of the base type plus the content model specified in the type derivation. The additional content is always appended at the end of the base type's content. In the case of UK-Address, the content model of UK-Address is the content model of Address plus the declarations for a postcode element and an export-code attribute. This is equivalent to defining UK-Address from scratch as follows:
Example
<complexType name="UK-Address">
  <!-- content model of Address -->
  <element name="name" type="string"/>
  <element name="street" type="string"/>
  <element name="city" type="string"/>
  <!-- appended declarations -->
  <element name="postcode" type="ipo:UK-Postcode"/>
  <attribute name="export-code" type="positive-integer" fixed="1"/>
</complexType>
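Derivation by extension only appends, so the effective content model can be computed by concatenation. Below is a hypothetical Python model in which a complex type is represented simply by its element and attribute names. The helper derive_by_extension is illustrative.

```python
# Hypothetical model: a complex type as its element and attribute names.
ADDRESS = {"elements": ["name", "street", "city"], "attributes": []}

def derive_by_extension(base, extra_elements, extra_attributes=()):
    """Effective content = base content with the extension appended at the end.
    New lists are built, so the base type is left unchanged."""
    return {
        "elements": base["elements"] + list(extra_elements),
        "attributes": base["attributes"] + list(extra_attributes),
    }

UK_ADDRESS = derive_by_extension(ADDRESS, ["postcode"], ["export-code"])
US_ADDRESS = derive_by_extension(ADDRESS, ["state", "zip"])

print(UK_ADDRESS["elements"])  # ['name', 'street', 'city', 'postcode']
print(US_ADDRESS["elements"])  # ['name', 'street', 'city', 'state', 'zip']
```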
In our example scenario, purchase orders are generated in response to customer orders, which may involve shipping and billing addresses in different countries. The international purchase order, ipo.xml, illustrates one such case, where goods are shipped to England and the bill is sent to a US address. It is very useful if the schema for international purchase orders does not have to spell out every possible combination of international addresses for billing and shipping. It is more useful still if we can add new complex types of international address simply by creating new derivations of Address.
XML Schema allows us to define the billTo and shipTo elements as Address types (see ipo.xsd) and to use instances of international addresses in place of instances of Address. In other words, an instance document that contains markup conforming to the UK-Address type will be valid if that markup appears within the document at a location where an Address is expected (assuming the UK-Address markup itself is valid). To make this feature of XML Schema work, and to disambiguate exactly which derived type is intended, the derived type should be identified in the instance document. The type is identified using the type attribute, which is part of the XML Schema instance namespace. In the example, ipo.xml, use of the UK-Address and US-Address derived types is identified through the values assigned to the xsi:type attributes.
An International Purchase order, ipo.xml
<?xml version="1.0"?>
<ipo:purchaseOrder xmlns:xsi='http://www.w3.org/1999/XMLSchema/instance'
                   xmlns:ipo="http://www.example.com/IPO"
                   orderDate="1999-12-01">
  <shipTo export-code="1" xsi:type="ipo:UK-Address">
    <name>Helen Zoe</name>
    <street>47 Eden Street</street>
    <city>Cambridge</city>
    <postcode>CB1 1JR</postcode>
  </shipTo>
  <billTo xsi:type="ipo:US-Address">
    <name>Robert Smith</name>
    <street>8 Oak Avenue</street>
    <city>Old Town</city>
    <state>PA</state>
    <zip>95819</zip>
  </billTo>
  <items>
    <item partNum="833-AA">
      <productName>Lapis necklace</productName>
      <quantity>1</quantity>
      <price>99.95</price>
      <ipo:comment>Want this for the holidays!</ipo:comment>
      <shipDate>1999-12-05</shipDate>
    </item>
  </items>
</ipo:purchaseOrder>
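To act on xsi:type, a processor must resolve the prefixed value to a namespace name and a local type name. The Python sketch below uses xml.etree.ElementTree on an abbreviated copy of ipo.xml, with the items element omitted. A tree built by ET.fromstring does not keep the prefix declarations in scope, so the sketch assumes a fixed prefix map taken from the root element. resolve_xsi_type is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/1999/XMLSchema/instance"
IPO = "http://www.example.com/IPO"

# Abbreviated copy of ipo.xml: the items element is omitted.
IPO_XML = """<ipo:purchaseOrder xmlns:xsi='http://www.w3.org/1999/XMLSchema/instance'
    xmlns:ipo="http://www.example.com/IPO" orderDate="1999-12-01">
  <shipTo export-code="1" xsi:type="ipo:UK-Address">
    <name>Helen Zoe</name><street>47 Eden Street</street>
    <city>Cambridge</city><postcode>CB1 1JR</postcode>
  </shipTo>
  <billTo xsi:type="ipo:US-Address">
    <name>Robert Smith</name><street>8 Oak Avenue</street>
    <city>Old Town</city><state>PA</state><zip>95819</zip>
  </billTo>
</ipo:purchaseOrder>"""

# Assumption: these are the only prefix declarations in scope (true for ipo.xml).
PREFIXES = {"ipo": IPO, "xsi": XSI}

def resolve_xsi_type(element):
    """Return (namespace URI, local name) for the element's xsi:type, or None."""
    value = element.get(f"{{{XSI}}}type")
    if value is None:
        return None
    prefix, local = value.split(":", 1)
    return PREFIXES[prefix], local

root = ET.fromstring(IPO_XML)
for child in root:
    print(child.tag, resolve_xsi_type(child))
```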
Ed. Note: Describe here the use of namespaces in the instance document
Ed. Note: This section is not final, awaiting editorial decisions regarding exact syntax. Will provide a table detailing restrictions at that time.
As you have probably guessed, in addition to deriving new complex types by extending content models, it is also possible to derive new types by restricting the content models of existing types. A type derived by restriction looks just like an ordinary type definition, but it is constrained to only have declarations that are the same as or more limited than the corresponding declarations in the base type.
For example, suppose we want to update our definition of the list of items in an international purchase order so that it must contain at least one item on order (the schema in ipo.xsd currently allows an items element to appear without any child item elements). To create our new ConfirmedItems type, we define the new type in the usual way and indicate three things: that it is derived from the base type Items, that we are deriving the new type by restriction, and a new value for the minimum number of item element occurrences:
Deriving ConfirmedItems by Restriction from Items
<complexType name="ConfirmedItems" base="ipo:Items" derivedBy="restriction">
  <element name="item" minOccurs="1"/>
</complexType>
This change, requiring at least one child element rather than allowing zero or more child elements, narrows the range of allowable child elements.
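One part of checking a restriction is confirming that each occurrence range in the derived type lies within the corresponding range in the base type. Below is a hypothetical Python sketch for the item element. It assumes that ConfirmedItems keeps the unbounded maxOccurs of Items, as "at least one" suggests, and it uses float("inf") to stand in for maxOccurs="*". is_narrower is an illustrative name.

```python
UNBOUNDED = float("inf")  # stands in for maxOccurs="*"

def is_narrower(base_range, derived_range):
    """True if the derived (minOccurs, maxOccurs) lies within the base range."""
    base_min, base_max = base_range
    derived_min, derived_max = derived_range
    return base_min <= derived_min <= derived_max <= base_max

ITEMS_ITEM = (0, UNBOUNDED)            # item in Items
CONFIRMED_ITEMS_ITEM = (1, UNBOUNDED)  # item in ConfirmedItems (assumed maxOccurs)

print(is_narrower(ITEMS_ITEM, CONFIRMED_ITEMS_ITEM))  # True
print(is_narrower(CONFIRMED_ITEMS_ITEM, ITEMS_ITEM))  # False
```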
As another example, instead of extending Address to create US-Address as earlier, we could use a generic World-Address type and derive US-Address by restriction:
Example
<complexType name="World-Address">
  <element name="name" minOccurs="0" maxOccurs="*"/>
  <element name="street" minOccurs="0" maxOccurs="*"/>
  <element name="city" minOccurs="0"/>
  <element name="region" minOccurs="0"/>
  <element name="country" type="string" minOccurs="0"/>
  <element name="postal" minOccurs="0"/>
</complexType>
<complexType name="US-Address" base="ipo:World-Address" derivedBy="restriction">
  <element name="name" type="string" minOccurs="0"/>
  <element name="street" type="string" minOccurs="0" maxOccurs="*"/>
  <element name="city" type="string" minOccurs="1"/>
  <element name="state" type="string" minOccurs="1"/>
  <element name="country" type="string" minOccurs="0" default="USA"/>
  <element name="zip" minOccurs="0"/>
</complexType>
Note that US-Address declares types for untyped elements in the base World-Address, tightens minOccurs and maxOccurs constraints, sets or changes default values, and even renames region to state and postal to zip.
XML Schema provides a mechanism, called equivalence classes, that allows some elements to be used in place of others. Specifically, elements can be made members of a special class of elements that are said to be equivalent to a particular named element, which is called the exemplar. Note that the exemplar must be a global element. For example, we can declare two elements called customerComment and shipComment that are equivalent to the comment element. As a result, customerComment and shipComment can be used anywhere that we are able to use comment. Elements in an equivalence class must have the same type as the exemplar, or a type that has been derived from the exemplar's type. To declare these two new elements, and to make them equivalent to the comment element, we use the following syntax:
Declaring Elements Equivalent to comment
<element name='shipComment' type='string' equivClass='ipo:comment'/>
<element name='customerComment' type='string' equivClass='ipo:comment'/>
When these declarations are added to the ipo.xsd schema file, shipComment and customerComment can be substituted for comment in the instance document, for example:
Snippet of ipo.xml Containing Substituted Elements
....
<items>
  <item partNum="833-AA">
    <productName>Lapis necklace</productName>
    <quantity>1</quantity>
    <price>99.95</price>
    <ipo:shipComment>Use blue wrap if possible</ipo:shipComment>
    <ipo:customerComment>Want this for the holidays!</ipo:customerComment>
    <shipDate>1999-12-05</shipDate>
  </item>
</items>
....
The existence of an equivalence class does not require any of the elements in that class to be used, nor does it preclude use of the exemplar. It simply provides a mechanism for allowing elements to be used interchangeably.
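Below is a hypothetical Python model of the substitution check. The equivClass attributes become a member-to-exemplar map. An element is acceptable where comment is declared if it is comment itself or a member of comment's class. Chains of classes (a member of a member's class) are deliberately not modelled, and may_appear_as is an illustrative name.

```python
# equivClass declarations from the text, as member -> exemplar.
EQUIV_CLASS = {
    "shipComment": "comment",
    "customerComment": "comment",
}

def may_appear_as(name, declared):
    """True if an element called `name` may appear where `declared` is declared."""
    return name == declared or EQUIV_CLASS.get(name) == declared

print(may_appear_as("shipComment", "comment"))  # True
print(may_appear_as("comment", "comment"))      # True: the exemplar is still allowed
print(may_appear_as("productName", "comment"))  # False
```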
XML Schema provides a mechanism that can preclude particular elements from being used. When an element is declared to be "abstract", it cannot be used in an instance document.
In the equivalence class scenario we have just described, it would be useful to specifically disallow use of the comment element so that instances must make use of the customerComment and shipComment elements. To declare the comment element abstract, we modify its original declaration in the international purchase order schema, ipo.xsd, as follows:
<element name="comment" type="string" abstract='true'/>
With comment declared as abstract, instances of international purchase orders are now valid only if they use customerComment and shipComment elements wherever a comment would otherwise appear.
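Below is a hypothetical Python model of the abstract rule, not a schema processor API. The equivClass declarations become a member-to-exemplar map, and abstract elements are kept in a set. An abstract element is rejected even where it is itself declared, while members of its class are still accepted. allowed_in_instance is an illustrative name.

```python
# equivClass declarations from the text, as member -> exemplar.
EQUIV_CLASS = {"shipComment": "comment", "customerComment": "comment"}
ABSTRACT = {"comment"}  # comment is now declared with abstract='true'

def allowed_in_instance(name, declared):
    """True if an element called `name` may appear where `declared` is declared."""
    if name in ABSTRACT:
        return False  # an abstract element can never appear in an instance
    return name == declared or EQUIV_CLASS.get(name) == declared

print(allowed_in_instance("comment", "comment"))          # False
print(allowed_in_instance("customerComment", "comment"))  # True
```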
Ed. Note: Need to describe abstract types as well.
So far, we have been able to derive new types without any restrictions. Schema authors will sometimes want to prevent derivations based on particular types, to avoid bad practice and for other reasons. Probably the simplest form of prevention is to specify that, for a particular type (simple or complex), new types may not be derived from it by restriction, by extension, or at all. To illustrate, suppose we want to prevent any derivation of the Address type by restriction, because we decree that an Address must consist of at least a name, a street and a city (which is how it is defined in address.xsd). To prevent such derivations, we would slightly modify the original definition of Address as follows:
Preventing Derivations by Restriction of Address
<complexType name="Address" final="restriction">
  <element name="name" type="string"/>
  <element name="street" type="string"/>
  <element name="city" type="string"/>
</complexType>
The restriction value of the final attribute prevents derivations by restriction. The values #all and extension prevent derivations altogether and derivations by extension, respectively.
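The final attribute can be read as a lookup from its value to the derivation methods it forbids. Below is a hypothetical Python sketch that covers only the three values described here, with an empty string standing for "no final attribute". can_derive is an illustrative name.

```python
# Derivation methods forbidden by each value of final ("" = attribute absent).
FORBIDDEN_BY_FINAL = {
    "": set(),
    "restriction": {"restriction"},
    "extension": {"extension"},
    "#all": {"restriction", "extension"},
}

def can_derive(base_final, method):
    """True if a type whose final attribute is base_final may be derived by method."""
    return method not in FORBIDDEN_BY_FINAL[base_final]

# Address is declared with final="restriction".
print(can_derive("restriction", "extension"))    # True: US-Address, UK-Address still allowed
print(can_derive("restriction", "restriction"))  # False
```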
Another prevention mechanism controls which derivations and equivalence classes may and may not be used in instance documents. In section 3.3, we described how the derived types, US-Address and UK-Address, could be used by the shipTo and billTo elements in instance documents. These derived types can replace the content model provided by the Address type, with which the shipTo and billTo elements were originally declared, because they are derived from the Address type. However, replacement by derived types can be controlled using the block attribute in a type definition. For example, we may want to block any derivation-by-restriction from being used in place of Address, perhaps for the same reason we defined Address with final='restriction'. To do so, we can modify the original definition of Address as follows:
Preventing Derivations by Restriction of Address in the Instance
<complexType name="Address" block="restriction">
  <element name="name" type="string"/>
  <element name="street" type="string"/>
  <element name="city" type="string"/>
</complexType>
The restriction value on the block attribute prevents derivations-by-restriction from replacing Address in an instance. However, it would not prevent UK-Address and US-Address from replacing Address, because they were derived by extension. The values #all and extension prevent replacement by any derivation and by derivations-by-extension, respectively.
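Unlike final, block is checked against the instance. When xsi:type names a derived type, the declared type must not block the method by which that type was derived. The hypothetical Python sketch below records the derivations of address.xsd by hand. It adds an invented restriction-derived type, Restricted-Address, only to show the blocked case. Only direct derivations are modelled.

```python
# Derived type -> (base type, derivation method), from address.xsd.
# "Restricted-Address" is hypothetical, added only to show a blocked case.
DERIVATIONS = {
    "US-Address": ("Address", "extension"),
    "UK-Address": ("Address", "extension"),
    "Restricted-Address": ("Address", "restriction"),
}

BLOCKED_BY = {
    "": set(),
    "restriction": {"restriction"},
    "extension": {"extension"},
    "#all": {"restriction", "extension"},
}

def may_replace(declared_type, xsi_type, block_value):
    """True if xsi_type may stand in for declared_type in an instance."""
    if xsi_type == declared_type:
        return True
    base, method = DERIVATIONS.get(xsi_type, (None, None))
    if base != declared_type:
        return False  # only direct derivations are modelled here
    return method not in BLOCKED_BY[block_value]

# Address is declared with block="restriction".
print(may_replace("Address", "UK-Address", "restriction"))          # True
print(may_replace("Address", "Restricted-Address", "restriction"))  # False
```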
The home-products ordering and billing application can generate ad-hoc reports that summarise how many of which types of products have been billed on a per-region basis. An example of such a report, one that covers the fourth quarter of 1999, is shown in 4Q99.xml.
Quarterly Report, 4Q99.xml
<r:purchaseReport xmlns:r='http://www.example.com/Report'
                  period="P3M" periodEnding="1999-12-31">
  <regions>
    <zip code="95819">
      <part number="872-AA" quantity="1"/>
      <part number="926-AA" quantity="1"/>
      <part number="833-AA" quantity="1"/>
      <part number="455-BX" quantity="1"/>
    </zip>
    <zip code="63143">
      <part number="455-BX" quantity="4"/>
    </zip>
  </regions>
  <parts>
    <part number="872-AA">Lawnmower</part>
    <part number="926-AA">Baby Monitor</part>
    <part number="833-AA">Lapis Necklace</part>
    <part number="455-BX">Sturdy Shelves</part>
  </parts>
</r:purchaseReport>
The report lists, by number and quantity, the parts billed to various zip codes, and it provides a description of each part mentioned. In summarising the billing data, the intention of the report is clear and the data is unambiguous because a number of constraints are in effect. For example, each zip code appears only once (uniqueness constraint). Similarly, the description of every billed part appears only once, although parts may be billed to several zip codes (referential constraint); see, for example, part number 455-BX. In the following sections, we'll see how to specify these constraints using XML Schema.
The Report Schema, report.xsd
<schema targetNamespace='http://www.example.com/Report'
        xmlns='http://www.w3.org/1999/XMLSchema'
        xmlns:r='http://www.example.com/Report'
        xmlns:xipo='http://www.example.com/IPO'>
  <!-- for Sku -->
  <import namespace='http://www.example.com/IPO'/>
  <annotation>
    <documentation>
      Report schema for Example.com
      Copyright 2000 Example.com. All rights reserved.
    </documentation>
  </annotation>
  <element name="purchaseReport">
    <complexType>
      <element name="regions" type="r:RegionsType"/>
      <element name="parts" type="r:PartsType"/>
      <attribute name="period" type="timeDuration"/>
      <attribute name="periodEnding" type="date"/>
    </complexType>
    <unique>
      <selector>regions/zip</selector>
      <field>@code</field>
    </unique>
    <key name="pNumKey">
      <selector>parts/part</selector>
      <field>@number</field>
    </key>
    <keyref refer="pNumKey">
      <selector>regions/zip/part</selector>
      <field>@number</field>
    </keyref>
  </element>
  <complexType name="RegionsType">
    <element name="zip" minOccurs="1" maxOccurs="*">
      <complexType>
        <element name="part" maxOccurs="*">
          <complexType content="empty">
            <attribute name="number" type="xipo:Sku"/>
            <attribute name="quantity" type="positive-integer"/>
          </complexType>
        </element>
        <attribute name="code" type="positive-integer"/>
      </complexType>
    </element>
  </complexType>
  <complexType name="PartsType">
    <element name="part" minOccurs="1" maxOccurs="*">
      <complexType content="textOnly">
        <attribute name="number" type="xipo:Sku"/>
      </complexType>
    </element>
  </complexType>
</schema>
XML Schema enables us to indicate that any attribute or element value must be unique; in fact, it enables us to indicate that combinations of attribute and element values must be unique. To indicate that one particular attribute or element value is unique, we use the unique element to identify the set of elements containing the attribute or element value, and within this set we identify the particular attribute or element. In the case of our report schema, report.xsd, the selector element contains an XPath expression, regions/zip, that returns a list of all the zip elements in a report instance. The field element contains a second XPath expression, @code, indicating that the code attribute of those elements must be unique. Note that the XPath expression limits the scope of what must be unique. The report might contain another code attribute, but its value does not have to be unique because it lies outside the scope defined by the XPath expression.
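The check a processor performs for this unique constraint can be sketched in Python with xml.etree.ElementTree, whose findall accepts simple relative paths such as regions/zip. The sketch runs on an abbreviated copy of 4Q99.xml with the parts element omitted. duplicate_values is a hypothetical helper, not a schema API.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of 4Q99.xml: the parts element is omitted.
REPORT = """<r:purchaseReport xmlns:r='http://www.example.com/Report'
    period="P3M" periodEnding="1999-12-31">
  <regions>
    <zip code="95819">
      <part number="872-AA" quantity="1"/>
      <part number="926-AA" quantity="1"/>
      <part number="833-AA" quantity="1"/>
      <part number="455-BX" quantity="1"/>
    </zip>
    <zip code="63143">
      <part number="455-BX" quantity="4"/>
    </zip>
  </regions>
</r:purchaseReport>"""

def duplicate_values(root, selector, attribute):
    """Values of `attribute` that repeat among the elements matched by `selector`
    (the selector plays the role of <selector>, the attribute of <field>)."""
    seen, duplicates = set(), []
    for element in root.findall(selector):
        value = element.get(attribute)
        if value in seen:
            duplicates.append(value)
        seen.add(value)
    return duplicates

root = ET.fromstring(REPORT)
print(duplicate_values(root, "regions/zip", "code"))  # []
```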
As we mentioned previously, we can indicate that combinations of values must be unique. To illustrate, suppose we relax the constraint that zip codes may only be listed once, although we still want to enforce the constraint that any product is listed only once within a given zip code. We could achieve such a constraint by specifying that the combination of zip code and product number must be unique. From the report document, 4Q99.xml, the combined values of zip and number would be: {95819 872-AA}, {95819 926-AA}, {95819 833-AA}, {95819 455-BX}, and {63143 455-BX}. These combinations do not distinguish between zip and number combinations derived from single or multiple listings of any particular zip. However, they would unambiguously reveal a product listed more than once within a single zip. In other words, a schema processor could detect violations of the uniqueness constraint.
To define combinations of values, we simply add field elements to identify all the values involved. So, to add the part number value to our existing definition, we add a new field element whose XPath expression, part/@number, identifies the number attribute of part elements that are children of the zip elements identified by regions/zip:
A Unique Composed Value
<unique>
  <selector>regions/zip</selector>
  <field>@code</field>
  <field>part/@number</field>
</unique>
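The composed constraint treats each (zip code, part number) pair as one value. The Python sketch below uses xml.etree.ElementTree on an abbreviated copy of 4Q99.xml with the parts element omitted. It builds the pairs listed in the text and reports any pair that repeats. zip_part_combinations and duplicate_combinations are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of 4Q99.xml: the parts element is omitted.
REPORT = """<r:purchaseReport xmlns:r='http://www.example.com/Report'
    period="P3M" periodEnding="1999-12-31">
  <regions>
    <zip code="95819">
      <part number="872-AA" quantity="1"/>
      <part number="926-AA" quantity="1"/>
      <part number="833-AA" quantity="1"/>
      <part number="455-BX" quantity="1"/>
    </zip>
    <zip code="63143">
      <part number="455-BX" quantity="4"/>
    </zip>
  </regions>
</r:purchaseReport>"""

def zip_part_combinations(root):
    """(zip code, part number) for every part listed under every zip."""
    return [(zip_el.get("code"), part.get("number"))
            for zip_el in root.findall("regions/zip")
            for part in zip_el.findall("part")]

def duplicate_combinations(root):
    """Combinations that occur more than once, i.e. uniqueness violations."""
    seen, duplicates = set(), []
    for combo in zip_part_combinations(root):
        if combo in seen:
            duplicates.append(combo)
        seen.add(combo)
    return duplicates

root = ET.fromstring(REPORT)
print(zip_part_combinations(root))
print(duplicate_combinations(root))  # []
```

Listing zip 95819 a second time with a new part still passes. Listing the same part again under 95819 produces a duplicate pair.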
The XPath language used in specifying uniqueness, keys and key references is a subset of the XML Path Language 1.0.
Ed. Note: Describe here the subset of XPath
In the 1999 quarterly report, the description of every billed part appears only once. We could enforce this constraint using unique. However, we also want to ensure that every part-quantity element listed under a zip code has a corresponding part description, and so we use the key and keyref elements instead. The report schema, report.xsd, shows that the key and keyref constructions are applied using almost the same syntax as unique. The key element applies to the number attribute value of part elements that are children of the parts element. This declaration of number as a key means that its value must be unique. The name associated with the key, pNumKey, makes the key referenceable from elsewhere.
To ensure that the part-quantity elements have corresponding part descriptions, we say that the number attribute (<field>@number</field>) of those elements (<selector>regions/zip/part</selector>) must reference the pNumKey key. This declaration of number as a keyref does not mean that its value must be unique, but it does mean there must exist a pNumKey with the same value.
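The key and keyref pair amounts to two checks: the key values are present and unique, and every referencing value is one of them. Below is a Python sketch using xml.etree.ElementTree on the full 4Q99.xml. check_key and dangling_keyrefs are hypothetical helpers, not a schema processor API.

```python
import xml.etree.ElementTree as ET

REPORT = """<r:purchaseReport xmlns:r='http://www.example.com/Report'
    period="P3M" periodEnding="1999-12-31">
  <regions>
    <zip code="95819">
      <part number="872-AA" quantity="1"/>
      <part number="926-AA" quantity="1"/>
      <part number="833-AA" quantity="1"/>
      <part number="455-BX" quantity="1"/>
    </zip>
    <zip code="63143">
      <part number="455-BX" quantity="4"/>
    </zip>
  </regions>
  <parts>
    <part number="872-AA">Lawnmower</part>
    <part number="926-AA">Baby Monitor</part>
    <part number="833-AA">Lapis Necklace</part>
    <part number="455-BX">Sturdy Shelves</part>
  </parts>
</r:purchaseReport>"""

def check_key(root):
    """pNumKey: every parts/part has a number, and the numbers are unique."""
    numbers = [p.get("number") for p in root.findall("parts/part")]
    if None in numbers:
        raise ValueError("a key field is missing")
    if len(numbers) != len(set(numbers)):
        raise ValueError("pNumKey values are not unique")
    return set(numbers)

def dangling_keyrefs(root):
    """keyref: each regions/zip/part/@number must match some pNumKey value.
    Repeated references (455-BX appears twice) are fine."""
    keys = check_key(root)
    return [p.get("number") for p in root.findall("regions/zip/part")
            if p.get("number") not in keys]

root = ET.fromstring(REPORT)
print(dangling_keyrefs(root))  # []
```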
As you may have figured out by analogy with unique, it is possible to define combinations of key and keyref values. Using this mechanism, we could go beyond simply requiring the product numbers to be equal, and define a combination of values that must be equal. Such values may involve combinations of multiple value types (string, integer, date, etc.), provided that the order of the field element references is the same in both the key and keyref definitions.
XML 1.0 provides a mechanism for ensuring uniqueness using the ID attribute type and the associated IDREF and IDREFS attribute types. So, how do XML Schema's mechanisms compare? In short, they are vastly more powerful. More specifically, XML Schema's mechanisms can be applied to any element and attribute content, regardless of its type. In contrast, ID is a type of attribute and so cannot be applied arbitrarily to attributes, elements and their content. Furthermore, XML Schema enables you to specify the scope within which uniqueness applies, whereas the scope within which an ID must be unique (the entire document) cannot be modified. Finally, XML Schema enables you to create keys or keyrefs from combinations of element and attribute content, whereas ID has no such facility.
The report schema, report.xsd, makes use of the simple type xipo:Sku, which is defined in another schema and, more specifically, in another namespace. Recall that we used include so that the schema in ipo.xsd could make use of definitions and declarations from address.xsd. We cannot use include here because it can only "import" definitions and declarations from a schema whose target namespace is the same as the including schema's target namespace. Hence, the include element does not identify a namespace (although it does require a schemaLocation).
To import the type Sku and use it in the report schema, we must identify the namespace in which Sku is defined, and associate that namespace with a prefix for use in the report schema. Specifically, we use the import element to identify Sku's namespace (http://www.example.com/IPO), and we associate the namespace with the prefix xipo using a standard namespace declaration. We use xipo rather than ipo to illustrate that the prefix is only used locally. The simple type Sku, defined in the namespace http://www.example.com/IPO, may then be referenced as xipo:Sku in any definitions and declarations.
In our example, we imported one simple type from one external namespace, and referred to it in an attribute declaration. XML Schema in fact permits multiple schema components to be imported from multiple namespaces, and they can be referred to in both definitions and declarations. We can reference an element in a declaration; for example, in report.xsd we can reuse the comment element declared in po.xsd:
<element ref='xpo:comment' minOccurs='1'/>
Note, however, that we cannot reuse the shipTo element from po.xsd; the following is not legal:
<element ref='xpo:shipTo'/>
The reason is that only global schema components can be imported. In po.xsd, comment is declared as a global element; in other words, it appears as a subelement of schema. In contrast, shipTo is declared locally; in other words, it is declared as part of something else, namely the PurchaseOrderType definition.
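To make the global/local distinction concrete, here is a simplified, hypothetical outline of the relevant parts of po.xsd. It is not the actual schema: namespace declarations and most declarations are omitted, and the type name Address is a placeholder:

```xml
<schema>
  <!-- Global: a direct child of schema, so it can be imported
       and referenced from report.xsd as xpo:comment -->
  <element name="comment" type="string"/>

  <complexType name="PurchaseOrderType">
    <!-- Local: declared inside PurchaseOrderType, so it cannot be
         referenced from another schema -->
    <element name="shipTo" type="Address"/>
    <!-- ... other local declarations ... -->
  </complexType>

  <!-- ... Address and other global definitions ... -->
</schema>
```

Note that PurchaseOrderType is itself global, so the complex type as a whole could be imported even though its local shipTo declaration cannot be referenced on its own.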
Complex types may also be imported, and they can be used as the base type for deriving new types. Suppose we want to include in our reports the name of an analyst, along with contact information. We can reuse the (globally defined) complex type US-Address from address.xsd, and extend it to include phone and email to define a new type called Analyst:
Defining Analyst by Extending US-Address

```xml
<complexType name='Analyst' base='xipo:US-Address' derivedBy='extension'>
  <element name="phone" type="string"/>
  <element name="email" type="string"/>
</complexType>
```
Using the new type, we declare an element called analyst (declaration not shown). A snippet of an instance document conforming to analyst is:
Snippet of Instance Document Conforming to Analyst

```xml
<analyst>
  <name>Wendy Uhro</name>
  <street>10 Corporate Towers</street>
  <city>San Jose</city>
  <state>CA</state>
  <zip>95113</zip>
  <phone>408-271-3366</phone>
  <email>uhro@example.com</email>
</analyst>
```
When schema components are imported from multiple namespaces, each namespace must be identified with a separate import element. The import elements themselves must appear as the first children of the schema element. Furthermore, each namespace must be associated with a prefix, using a standard namespace declaration, and that prefix must be used to qualify references to any schema components belonging to that namespace. Finally, import elements optionally contain a schemaLocation attribute to help locate resources associated with the namespaces. We discuss the schemaLocation attribute in more detail in a later section.
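A minimal sketch of a schema that imports from two namespaces, following the rules above. The PO namespace URI and both schemaLocation values are hypothetical placeholders, and the XML Schema namespace URI is assumed from drafts of this period:

```xml
<schema targetNamespace="http://www.example.com/Report"
        xmlns="http://www.w3.org/1999/XMLSchema"
        xmlns:r="http://www.example.com/Report"
        xmlns:xipo="http://www.example.com/IPO"
        xmlns:xpo="http://www.example.com/PO">

  <!-- One import element per namespace, as the first children of schema;
       schemaLocation is optional and is only a hint -->
  <import namespace="http://www.example.com/IPO"
          schemaLocation="http://www.example.com/schemas/ipo.xsd"/>
  <import namespace="http://www.example.com/PO"
          schemaLocation="http://www.example.com/schemas/po.xsd"/>

  <!-- Definitions and declarations follow the imports; references to
       imported components are qualified with the matching prefix,
       for example type='xipo:Sku' or ref='xpo:comment' -->

</schema>
```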
In previous sections we have seen several mechanisms for extending the content models of complex types. For example, a mixed content model can contain arbitrary character data in addition to elements, and a content model can contain particular elements whose types are imported from external namespaces. However, these mechanisms provide very broad and very narrow controls respectively, and the purpose of this section is to describe a flexible mechanism that enables content models to be extended by any elements and attributes belonging to specified namespaces.
To illustrate, consider a version of the quarterly report, 4Q99html.xml, in which we have embedded an HTML formatted representation of the XML parts data. The HTML content appears as the content of the element htmlExample, and the default namespace is changed on the outermost HTML element (table) so that all the HTML elements belong to the HTML namespace, http://www.w3.org/1999/XHTML:
Quarterly Report with HTML, 4Q99html.xml

```xml
<r:purchaseReport xmlns:r='http://www.example.com/Report'
                  period="P3M" periodEnding="1999-12-31">
  <regions>
    <!-- part sales listed by zipcode, data from 4Q99.xml -->
  </regions>
  <parts>
    <!-- part descriptions from 4Q99.xml -->
  </parts>
  <htmlExample>
    <table xmlns='http://www.w3.org/1999/XHTML' border="0" width="100%">
      <tr>
        <th align="left">Zip Code</th>
        <th align="left">Part Number</th>
        <th align="left">Quantity</th>
      </tr>
      <tr><td>95819</td><td> </td><td> </td></tr>
      <tr><td> </td><td>872-AA</td><td>1</td></tr>
      <tr><td> </td><td>926-AA</td><td>1</td></tr>
      <tr><td> </td><td>833-AA</td><td>1</td></tr>
      <tr><td> </td><td>455-BX</td><td>1</td></tr>
      <tr><td>63143</td><td> </td><td> </td></tr>
      <tr><td> </td><td>455-BX</td><td>4</td></tr>
    </table>
  </htmlExample>
</r:purchaseReport>
```
To permit the appearance of HTML in the instance document, we modify the report schema by declaring a new element htmlExample whose content is defined by the any element. In general, an any element specifies that any well-formed XML is permissible in a type's content model. In the example, we require the XML to belong to the namespace http://www.w3.org/1999/XHTML; in other words, it should be HTML. The example also requires at least one element from this namespace to be present, as indicated by the values of minOccurs and maxOccurs:
Modification to purchaseReport Declaration to Allow HTML in Instance

```xml
<element name="purchaseReport">
  <complexType>
    <element name="regions" type="r:RegionsType"/>
    <element name="parts" type="r:PartsType"/>
    <element name='htmlExample'>
      <complexType>
        <any namespace='http://www.w3.org/1999/XHTML'
             minOccurs='1' maxOccurs='*'
             processContents='skip'/>
      </complexType>
    </element>
    <attribute name="period" type="timeDuration"/>
    <attribute name="periodEnding" type="date"/>
  </complexType>
</element>
```
The modification permits some well-formed XML belonging to the namespace http://www.w3.org/1999/XHTML to appear inside the htmlExample element. Therefore 4Q99html.xml is permissible because there is one element which (with its children) is well-formed, the element appears inside the appropriate element (htmlExample), and the instance document asserts that the element and its content belong to the required namespace. However, because processContents is set to skip, the HTML is not checked and may not actually be valid; nothing in 4Q99html.xml by itself can provide that guarantee. If such a guarantee is required, the value of the processContents attribute should be set to strict (which is in fact the default value). In this case, an XML processor is obliged to obtain the schema associated with the required namespace, and validate the HTML appearing within the htmlExample element.
Alternatively, the value of the processContents attribute can be set to lax, in which case the processor will validate the HTML on a can-do basis: it will validate elements and attributes for which it can obtain schema information, but it will not signal errors for those for which it cannot obtain schema information.
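For comparison, here is a sketch of the htmlExample declaration shown earlier with processContents omitted, so that it takes the default value strict. Under strict processing, a processor would need to obtain a schema for http://www.w3.org/1999/XHTML in order to validate the content; the lax alternative is noted in a comment:

```xml
<element name='htmlExample'>
  <complexType>
    <!-- processContents omitted, so it defaults to 'strict': the HTML
         content must be validated against the schema for this namespace.
         Writing processContents='lax' instead would validate only the
         elements and attributes for which schema information is available. -->
    <any namespace='http://www.w3.org/1999/XHTML'
         minOccurs='1' maxOccurs='*'/>
  </complexType>
</element>
```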
Namespaces may be used to permit and forbid element content in various ways depending upon the value of the namespace attribute:

Values of the namespace Attribute

Value | Permitted Content |
---|---|
namespace='##any' | any well-formed XML from any namespace (default) |
namespace='##local' | any well-formed XML that is not qualified, i.e. not declared to be in a namespace |
namespace='##other' | any well-formed XML in a namespace different from the namespace of the type being defined |
namespace='http://www.w3.org/1999/XHTML ##targetNamespace' | any well-formed XML belonging to any namespace in the (whitespace separated) list; ##targetNamespace is shorthand for the target namespace of the enclosing schema |
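As an illustration of one of these values, the following sketch declares a hypothetical notes element (not part of the report schema) whose content may be any number of elements, including none, from namespaces different from the namespace of the type being defined:

```xml
<!-- Hypothetical element, not part of the report schema: its content
     may be zero or more elements from any namespace other than the
     namespace of the type being defined; they are validated where
     schema information is available (processContents='lax') -->
<element name='notes'>
  <complexType>
    <any namespace='##other' minOccurs='0' maxOccurs='*'
         processContents='lax'/>
  </complexType>
</element>
```

Per the table, replacing '##other' with '##any' would also admit elements from the type's own namespace, and replacing it with '##local' would admit only unqualified elements.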
In addition to the any element, which enables element content according to namespaces, there is a corresponding anyAttribute element which enables attributes to appear in elements. For example, we can permit any HTML attribute to appear as part of the htmlExample element by adding anyAttribute to its declaration:
Modification to htmlExample Declaration to Allow HTML Attributes

```xml
<element name='htmlExample'>
  <complexType>
    <any namespace='http://www.w3.org/1999/XHTML'
         minOccurs='1' maxOccurs='*'
         processContents='skip'/>
    <anyAttribute namespace='http://www.w3.org/1999/XHTML'/>
  </complexType>
</element>
```
This declaration permits an HTML attribute, such as href, to appear in the htmlExample element. For example:
An HTML attribute in the htmlExample Element

```xml
....
<htmlExample xmlns:h="http://www.w3.org/1999/XHTML"
             h:href="http://www.example.com/reports/4Q99.html">
  <!-- HTML markup here -->
</htmlExample>
....
```
The namespace attribute in an anyAttribute element can be set to any of the values listed for the any element. Unlike an any element, however, anyAttribute cannot restrict the number of attributes that may appear in an element.
Ed. Note: Decision pending whether or not anyAttribute has a processContents attribute.
XML Schema uses attributes named schemaLocation in three circumstances.

1. xsi:schemaLocation provides hints from the author to any reader regarding the location of schema documents. The author warrants that these schema documents are relevant to checking the validity of the material in the document, on a namespace-by-namespace basis. The presence of these hints does not require the reader to obtain or use the cited schema documents, and the reader is free to use other schemas obtained by any suitable means, or no schema at all.
2. The include element has a required schemaLocation attribute, which contains a URI reference that must identify a schema document. The effect is to compose a final effective schema by merging in the contents of the schema contained in the referenced schema document.
3. The import element has a required namespace attribute and an optional schemaLocation attribute. If present, the schemaLocation attribute is understood in a way which parallels the interpretation of xsi:schemaLocation in (1). Specifically, it provides a hint from the author to any reader regarding the location of a schema document that the author warrants supplies the required components for that namespace. The hint does not require the reader to obtain or use the cited schema document, but some schema components from that namespace are likely to be necessary for successful validation.

An XML instance document may be processed against a schema to verify whether the rules specified in the schema are honored in the instance. Typically, such processing actually does two things: it checks for conformance to the rules, called "validation," and it also adds supplementary information that is not immediately present in the instance, such as types and default values, called "InfoSet contributions."
The author of an XML instance, such as a particular purchase order, may claim, in the instance itself, that it conforms to the rules in a particular schema. The author does this using the xsi:schemaLocation attribute discussed above. But regardless of whether a schemaLocation attribute is present, an application is free to process the document against any schema. For example, a purchasing application may have the policy of always using a certain purchase order schema, regardless of any schemaLocation values.
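A sketch of how the report instance could make such a claim with xsi:schemaLocation, whose value pairs a namespace URI with the location of a schema document for that namespace. The schema document URL is hypothetical, and the xsi namespace URI is assumed from drafts of this period, so treat both as placeholders:

```xml
<!-- xsi:schemaLocation lists pairs: a namespace URI followed by the
     location of a schema document for that namespace. The schema URL
     below is hypothetical, and an application may ignore this hint. -->
<r:purchaseReport
    xmlns:r="http://www.example.com/Report"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xsi:schemaLocation="http://www.example.com/Report
                        http://www.example.com/schemas/report.xsd"
    period="P3M" periodEnding="1999-12-31">
  <!-- regions, parts, ... -->
</r:purchaseReport>
```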
Conformance checking can be thought of as proceeding in steps: first checking that the root element of the document instance has the right contents, then checking that each contained element conforms to its description in a schema, and so forth until the entire document is verified. Of course, it is possible to check only a portion of a document, either by not starting at the root or by stopping before the full depth has been reached. Support for such partial checking is optional, but processors are required to report what checking has been done.
To check an element for conformance, the processor first locates the declaration for the element in a schema, and then checks that the targetNamespace attribute in the schema matches the actual namespace URI of the element (or, alternatively, that the schema does not have a targetNamespace attribute and the instance element is not namespace-qualified).
Supposing the namespaces match, the processor then examines the type of the element, either as given by the declaration in the schema, or by an xsi:type attribute in the instance. If the latter, the instance type must be an allowed substitution for the type given in the schema; what is allowed is controlled by the block attribute in the schema. At this same time, default values and other InfoSet contributions are applied.
Next, the processor checks the immediate attributes and contents of the element, comparing these against the attributes and contents permitted by the element's type. For example, considering a shipTo element such as the one in section 2.1, the processor checks against what is permitted for an Address, since that is the shipTo element's type.
If the element has a simple type, the processor verifies that the element has no attributes or contained elements, and that its character content matches the rules for the simple type. This sometimes involves checking the character sequence against regular expressions or enumerated literals, and sometimes it involves checking that the character sequence represents a value in a permitted range.
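To illustrate the range check described above, here is a sketch of a hypothetical simple type, quantityType, derived from positive-integer with a maxExclusive facet (a facet that Table B.1b lists as applicable to positive-integer). The element name quantity is also used only for illustration:

```xml
<!-- Hypothetical type: positive integers less than 100 -->
<simpleType name="quantityType" base="positive-integer">
  <maxExclusive value="100"/>
</simpleType>

<!-- Given an element quantity of type quantityType:
     <quantity>4</quantity>             conforms
     <quantity>100</quantity>           fails: not less than 100
     <quantity>four</quantity>          fails: not a positive-integer
     <quantity unit="box">4</quantity>  fails: simple-typed elements
                                        have no attributes            -->
```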
If the element has a complex type, then the processor checks that any required attributes are present and that their values conform to the requirements of their simple types. It also checks that all required subelements are present, and that the sequence of subelements (and any mixed text) matches the content model declared for the complex type. Regarding subelements, schemas can either require exact name matching, permit substitution by an equivalent element or permit substitution by any element allowed by an 'any' particle.
Unless a schema indicates otherwise (as it can for 'any' particles) conformance checking then proceeds one level more deeply by looking at each subelement in turn, according to the process described above.
Many people have contributed ideas, material and feedback that have improved this document. In particular, the editor would like to acknowledge contributions from David Beech, Paul Biron, Allen Brown, David Cleary, Dan Connolly, Roger Costello, Dave Hollander, John McCarthy, Andrew Layman, Eve Maler, Ashok Malhotra, Noah Mendelsohn, and Henry Thompson.
The legal values for each simple type can be constrained through the application of one or more facets. Tables B.1a and B.1b list all of XML Schema's built-in simple types and the facets applicable to each type.
Table B.1a. Simple Types & Applicable Facets

Simple Types | length | minlength | maxlength | pattern | enumeration |
---|---|---|---|---|---|
string | y | y | y | y | y |
boolean | y | ||||
float | y | y | |||
double | y | y | |||
decimal | y | y | |||
timeInstant | y | y | |||
timeDuration | y | y | |||
recurringInstant | y | y | |||
binary | y | y | y | y | y |
uri-reference | y | y | y | y | y |
ID | y | y | y | y | y |
IDREF | y | y | y | y | y |
ENTITY | y | y | y | y | y |
NOTATION | y | y | y | y | y |
language | y | y | y | y | y |
IDREFS | y | y | y | y | |
ENTITIES | y | y | y | y | |
NMTOKEN | y | y | y | y | y |
NMTOKENS | y | y | y | y | |
Name | y | y | y | y | y |
QName | y | y | y | y | y |
NCName | y | y | y | y | y |
integer | y | y | |||
non-positive-integer | y | y | |||
negative-integer | y | y | |||
long | y | y | |||
int | y | y | |||
short | y | y | |||
byte | y | y | |||
non-negative-integer | y | y | |||
unsigned-long | y | y | |||
unsigned-int | y | y | |||
unsigned-short | y | y | |||
unsigned-byte | y | y | |||
positive-integer | y | y | |||
date | y | y | |||
time | y | y |
The facets listed in Table B.1b apply only to simple types which are ordered. Not all simple types are ordered, and so Table B.1b does not list all of the simple types.
Table B.1b. Simple Types & Applicable Facets

Simple Types | maxInclusive | maxExclusive | minInclusive | minExclusive | precision | scale | encoding | period |
---|---|---|---|---|---|---|---|---|
string | y | y | y | y | ||||
float | y | y | y | y | ||||
double | y | y | y | y | ||||
decimal | y | y | y | y | y | y | ||
timeInstant | y | y | y | y | ||||
timeDuration | y | y | y | y | ||||
recurringInstant | y | y | y | y | y | |||
binary | y | |||||||
integer | y | y | y | y | y | y | ||
non-positive-integer | y | y | y | y | y | y | ||
negative-integer | y | y | y | y | y | y | ||
long | y | y | y | y | y | y | ||
int | y | y | y | y | y | y | ||
short | y | y | y | y | y | y | ||
byte | y | y | y | y | y | y | ||
non-negative-integer | y | y | y | y | y | y | ||
unsigned-long | y | y | y | y | y | y | ||
unsigned-int | y | y | y | y | y | y | ||
unsigned-short | y | y | y | y | y | y | ||
unsigned-byte | y | y | y | y | y | y | ||
positive-integer | y | y | y | y | y | y | ||
date | y | y | y | y | y | |||
time | y | y | y | y | y |
XML Schema's <pattern> facet uses a regular expression language that supports Unicode. The language is similar to the regular expression language used in the Perl programming language [bibref to Perl], although expressions are matched against entire lexical representations rather than against user-scoped units such as lines and paragraphs. For this reason, the expression language does not contain the metacharacters ^ and $, although ^ is still used inside character classes to express exception, e.g. [^0-9]x.
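For example, a pattern for part numbers such as 872-AA (three digits, a hyphen, and two uppercase letters) needs no ^ or $, because the expression must match the whole value. The sketch below assumes the simpleType/base syntax used for other types in this draft; the name Sku matches the type referenced earlier, but this particular definition is illustrative:

```xml
<simpleType name="Sku" base="string">
  <!-- The whole value must match: 872-AA and 455-BX conform, but
       X872-AA and 872-AAB do not, even though each contains a
       matching substring -->
  <pattern value="\d{3}-[A-Z]{2}"/>
</simpleType>
```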
Table C1. Examples of Regular Expressions

Expression | Match(es) |
---|---|
Chapter \d | Chapter 0, Chapter 1, Chapter 2 .... |
Chapter\s\d | Chapter followed by a single whitespace character (space, tab, newline, etc), followed by a single digit |
Chapter\s\w | Chapter followed by a single whitespace character (space, tab, newline, etc), followed by a word character (XML 1.0 Letter or Digit) |
Española | Espan~ola (where n and tilde are combined into a single character) |
\p{Lu} | any uppercase character, the value of \p{} (e.g. "Lu") is defined by Unicode |
\p{IsGreek} | any Greek character, the 'Is' construction may be applied to any block name (e.g. "Greek") as defined by Unicode |
\P{IsGreek} | any non-Greek character, the 'Is' construction may be applied to any block name (e.g. "Greek") as defined by Unicode |
a*x | x, ax, aax, aaax .... |
a?x | ax, x |
a+x | ax, aax, aaax .... |
(a|b)+x | ax, bx, aax, abx, bax, bbx, aaax, aabx, abax, abbx, baax, babx, bbax, bbbx, aaaax .... |
[abcde]x | ax, bx, cx, dx, ex |
[a-e]x | ax, bx, cx, dx, ex |
[-ae]x | -x, ax, ex |
[ae-]x | ax, ex, -x |
[a-e-[bd]]x | ax, cx, ex |
[^0-9]x | any non-digit character followed by the character x |
\Dx | any non-digit character followed by the character x |
.x | any character followed by the character x |
.*abc.* | 1x2abc, abc1x2, z3456abchooray .... |
ab{2}x | abbx |
ab{2,4}x | abbx, abbbx, abbbbx |
ab{2,}x | abbx, abbbx, abbbbx .... |
(ab){2}x | ababx |
XML Schema Elements:
XML Schema Attributes:
XML Schema's simple types are described in Table 1.
February 25th draft submitted as public draft.
Rewrote "Section X covers .." in Introduction. Added xsd: to simple types in sec. 2 and textual explanation. In sec. 2.3 added URLs to text and Table 1 linking it to Datatypes spec. Corrected URI for xsi:. Added sec. 3.0 explanation for namespace convention change schema. Substantially rewrote sec. 3.5 to fix error, and created a new section 3.6 to cover Abstract Elements. Changed "exact" to "block" in sec. 3.6. Fixed error in location of unique/key defns in sec. 4.0, and XPath expressions. Fixed ##local and ##TargetNS, and clarified anyAttb in sec. 4.5. In Appendix B, added URLs to Table B1 linking to Datatypes spec. Fixed general typos.
February 23rd draft published to WG.
Added sections on import, schemaLocation, conformance, wildcard, and type content. Replaced "source" with "base". Modified HTML for W3C compliance. Moved Types of Content section from section 3 to section 2. Updated use of namespaces in instance and schema in all sections, reworked section 2 text to account for these changes. Fixed typos, added/deleted text at suggestion of WG members.
February 16th draft published to WG.
Added regular expression description as an appendix. Substantial re-ordering and rewrite of section 2. Added index as an appendix. Fixed large number of typos and adopted TypeName and elementName naming convention.
February 9th draft published to WG.