I do not quite understand this notion of ownership.
This document is the Specification Document for XS2DTD. Ownership of this document (identified by “author” on the cover sheet) implies responsibility for the Specification activities in the project. These activities are conducted in accordance with XRCE’s Software Process Improvement Policy.
The main goal of XS2DTD is to convert SchemaDoc XML Schema document models into DTD document models. Given the many differences between these two models and their respective limitations, it is critical to describe exactly what the conversion covers, what it does not cover, and how what is not covered is handled.
XS2DTD is part of SchemaDoc, whose objective is to provide a way to define an XML information model that is linked to, and managed with, its own documentation. With this in mind, the scope of the tool is to create, from a set of Schemas grouped under a SchemaDoc documentation, the corresponding textual DTDs and, later, the DTD information within the documentation output.
This development is still needed today because many people do not use Schemas yet and because many tools do not handle Schemas, especially in the structured document field.
Tools that convert XML Schema to DTD already exist. However, they are not sufficient for our needs because they keep only the basic semantics of the information and do not produce readable, documented DTDs. As soon as a model is not only a validation tool but also a project component used as part of specifications, it must be readable. Moreover, as soon as it is part of an editorial system, it should be possible to engineer it and to derive it, for example to go from a reference DTD to an editing DTD.
Therefore, the DTDs generated by a conversion must keep the order of the original schema file set and, more generally, the whole file architecture of the set of schemas and their relation to the SchemaDoc documentation.
Is it always possible to create a DTD? Surely not! The designer will have to decide whether to create a model compatible with both DTD and Schema, or with Schema only. The objective is therefore not a generic transformer but a transformer that meets the objectives defined above. For this reason, and because many things will not be handled, the specification must include a real and clear transformation policy that defines all the restrictions expected of the schemas.
All valid input XML Schemas must be transformed into valid DTDs, within the limits of DTD capabilities.
XML Schema has many features and can express many things that DTD cannot. We need to preserve as much as possible of the semantics and design intelligence of XML Schemas (modularity, embedded documentation, object-oriented architecture, groups, types ...).
In order to achieve this goal, three methods may be used:
use an equivalent DTD semantic in the transformation;
use Parameter Entities to allow reusability, modularity and derivation;
keep the information as comments whenever no equivalent can be found.
With this in mind, the declaration order of objects must be kept, as far as possible, in the transformation. Therefore, as DTDs are generated from Schemas, the model must respect the Schema’s constraints.
Because the model designer may want to control the generated DTDs, the Schemas to convert should not use complicated mechanisms that the conversion cannot handle or that would complicate the generated DTDs. The conversion therefore assumes that the Schemas are written accordingly.
Information about types and content definitions, such as type folding or the way a type is assigned to an element, must be used as much as possible. For this purpose, parameter entities (PEs) will be used heavily.
The objective of this project is to create readable DTDs, based on the readability of the defined Schemas.
Another objective is to provide modular DTDs that can be easily packed for a specific purpose. For example, people creating DTDs today are always defining a lot of information using parameter entities, in order to be able to redefine them in a specific context. This is also what is done in Schema using global types and groups. The objective is thus to be able to map this engineering while creating DTDs.
The resulting DTDs need to be complete, handy and conformant to the input XML Schemas. They must be documented, as we want to add information about the created DTDs to our documentation models.
DTDs can be generated with or without namespaces coded within object names.
The generated comments must be in a chosen language (French or English in the first implementation). These comments concern the order and the information lost during the conversion. Features that have no equivalent in DTD are not formatted but embedded as comments. All these comments are written to a log file and, if required, are also placed as close as possible to the relevant objects in the generated DTDs. This requirement is passed as a parameter.
PEs will often be used to express modularity and reusability. Conditional sections are not targeted for now, unless a special requirement appears during the specification steps.
In addition to the generated DTDs, an OASIS-Open catalog managing all generated DTDs will be provided. Because each XML Schema file has a title in SchemaDoc, an entry for each generated DTD will be added to the catalog to associate its public name with its URI.
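As an illustration of the catalog requirement above, the following sketch writes one catalog entry per generated DTD. It assumes the plain-text OASIS catalog syntax (PUBLIC "public identifier" "system identifier"); the class name, the method name and the sample identifiers are hypothetical, and the catalog format actually used by SchemaDoc may differ.

```java
import java.util.Map;

final class CatalogWriter {
    /**
     * Builds catalog text from a map of public name (the schema title)
     * to the URI of the generated DTD. Hypothetical helper, not XS2DTD code.
     */
    static String write(Map<String, String> publicNameToUri) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : publicNameToUri.entrySet()) {
            sb.append("PUBLIC \"").append(entry.getKey()).append("\" \"")
              .append(entry.getValue()).append("\"\n");
        }
        return sb.toString();
    }
}
```

Passing an ordered map keeps the catalog entries in the same order as the schema file set.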
Schemas must respect a quality charter in order to be properly converted into DTDs. Read Section 14.1.7, “Schema quality charter” for more details about this quality charter.
Because a SchemaDoc mechanism makes it possible to use an existing DTD in place of a Schema transformation, these DTDs must also respect a quality charter. Read Section 14.1.8, “DTD quality charter” for more details.
The XSD to DTD transformation goes through several steps:
Read and internalize a SchemaDoc structure
Reorganize the SchemaDoc structure
Generate an XML intermediate structure representing DTD needs
Dump the DTD structure to text files or to other structures
A program will internalize a SchemaDoc document and will generate a DTD Handler intermediate structure. This structure will then be “dumpable” as a whole, or reused in a documentation context to output DTD fragments within the documentation itself.
Standard Eclipse loading
Extend the Eclipse interface in order to resolve the information needed at schema level. The method used is a treeWalker generating all extended objects from Eclipse XSD
Generate a schema ID mechanism throughout the schema and SchemaDoc structure
Provide schema object information to SchemaDoc output generation.
This includes:
identification of all objects of the schema (see also 11.1 Development steps)
type resolution (making it possible to answer questions like “what is the ID of the complexType referenced by this schema?”).
Note that for this feature, exposing the XSD methods should be enough.
This breakdown makes it easy to integrate future conversions into other formats such as Relax NG. Indeed, only step 3 and the following steps would have to be changed in order to generate the appropriate format.
Transform the Eclipse XSD wrapped class structure into a DOM DTD Handler
Enable integration of DTD information, instead of schema information, in the SchemaDoc reference manual generation. This is done by providing access to the DOM structure of the DTD Handler, which keeps all the needed added value from the XSD wrapped structure.
Dump the DTD Handler structure to DTD. This is done using a parameterizable XSLT program.
Use the intermediate structure to provide the SchemaDoc output generation with the added-value information of the XSD wrapper, such as IDs.
Limitation: currently, the structure described above is not fully implemented. Processes 2 and 3 are wrapped together. Integration will remove this limitation.
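The dump step described above (DTD Handler to DTD through a parameterizable XSLT program) can be sketched with the standard JAXP transformation API. This is only a minimal sketch: the class name DtdDumper, the method name and the way parameters are passed are assumptions, not the actual XS2DTD code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class DtdDumper {
    /**
     * Applies an XSLT stylesheet to a serialized DTD Handler document.
     * The stylesheet parameters (for example a flag asking for comments)
     * are hypothetical; their names depend on the stylesheet used.
     */
    static String dump(String handlerXml, String stylesheet, Map<String, String> params) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            for (Map.Entry<String, String> p : params.entrySet()) {
                transformer.setParameter(p.getKey(), p.getValue());
            }
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(handlerXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("DTD dump failed", e);
        }
    }
}
```

Keeping the formatting in the stylesheet means the same DTD Handler document can be dumped with different layouts without touching the Java code.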
XS2DTD is a Java program mainly based on the schema library XSD (eclipse) and XML Java libraries such as DOM and XSLT.
It contains Java source code, Java libraries (internal & external), and scripts used to build and run the application.
Integrated within SchemaDoc V2, it uses 5 directories:
bin →
XS2DTD.sh (.bat): launcher. Takes a $filename.xsd file and produces the XML DTDHandler file and DTD files in the same directory as the $filename.xsd file or in the directory given as argument.
build.sh (.bat): run ‘build.sh (.bat) dist’ to build the full SchemaDoc application and update SchemaDocV2.jar
build.xml: builder configuration file. Several targets are available:
compile: compiles SchemaDoc and stores the generated classes in the ‘classes’ directory
clean: deletes the ‘classes’ directory
dist: compiles and updates the SchemaDocV2 jar file located in the lib/ directory
javadoc: generates the Javadoc in the javadoc directory
all: successively invokes targets clean, compile and dist
conf →
xs2dtd.conf → configuration file. Specifies the JVM path, … Should be replaced by more standard SchemaDoc mechanisms at the integration step.
properties/logging.properties → directories and/or classes logging configuration file
properties/log/ → internationalization files
logs → xs2dtd.log: log file in which XS2DTD errors and warnings are stored.
libs → external & internal libraries container.
Internals → SchemaDocV2.jar: updated when compiling
Externals →
XSD libraries (common_2.0.0.jar, ecore_2.0.0.jar, common.resources_2.0.0.jar, xsd.resources_1.1.0.jar, xsd_2.0.0.jar)
XML libraries (xercesImpl_2_5_0.jar, xml apis_2_5_0.jar, xml-apis_2_5_0.jar, saxon.jar)
Compilation library (ant.jar)
pgm
classes → XS2DTD Java class files
javadoc → XS2DTD javadoc container
xs2dtd → XML DTDHandler to DTD XSLT transformation files
src → XS2DTD Java source files (see below for more details).
XS2DTD Java packages are part of fr.tireme..SchemaDoc.
The main Java components are the DTDHandler, the DTDProcessor and the XSD interfaces. Refer to Figure 11, “XS2DTD main components diagram” for more details.
A Javadoc of SchemaDoc is available at the end of this document.
The classes are deployed as follows:
Xs2dtd
Prototype.java → main class
XS2DTD.java → instantiated by the main class. It launches the conversion. See Figure 12, “Main class class diagram” for more details.
SDSchema → Abstraction of the XSD package. See Figure 13, “Wrapped XSD class diagram” for more details.
● XSD → Wrapped Eclipse XSD package
Table 1.
Class | Description |
XSDAnnotationWrapp.java, XSDAttributeGroupDefinitionWrapp.java, XSDAttributeUseWrapp.java, XSDAttributteDeclarationWrapp.java, XSDComplexTypeDefinitionWrapp.java, XSDElementDeclarationWrapp.java, XSDModelGroupDefinitionWrapp.java, XSDModelGroupWrapp.java, XSDNotationDeclarationWrapp.java, XSDParticleWrapp.java, XSDSimpleTypeDefinitionWrapp.java | Wrapped XSD objects: each of them instantiates the corresponding XSD object and allows limited and/or modified access to XSD features. |
XSDHandler.java | Loads the schemas (xsd files or DOM) in XSD |
SDDTD
DTDProcessor.java → processes the loaded schema and drives the conversion
With respect to the architecture in 6.1, is the treeWalker described in step 3 located in here? Is this DTDProcessor tied to the notion of DTD (given the idea of integrating RelaxNG one day)? The tree walker is in the XSDSchemaHandler class (getNext method) of the XSD interfacing package. The DTDHandler generates the DTD Handler format (driven by the DTDProcessor). To generate RelaxNG, the DTDHandler will have to be changed. To keep this simple, the DTDHandler should be replaced by an abstract class implemented by the current DTDHandler and the future RelaxHandler. Some tweaking in the DTDProcessor will also be needed.
DTDHandler.java → populates the DOM output (DTD Handler)
Symbols.java → gathers shared static variables
utils
Configuration.java → reads and manages XS2DTD configuration
DateTime.java → internal Date object
FileManager.java → loading and saving utility class
DTDList.java → manages a sorted list of DTDHandler objects
PathResolver.java → resolves xsd location paths
exceptions
Xs2dtdException.java → XS2DTD exceptions
This section goes through the major Schema concepts and elements and describes the conversion with respect to the DTD limitations.
The DTD Handler is an intermediate XML format used either to separate the DTD generation process from its formatting within a serialized file, or to integrate DTD information within documentation. It holds:
a DTD-oriented structure, keeping the ‘Schema spirit’ …
records of which schema parts and objects it has been generated from
value-added information coming from the schema that cannot be held by a DTD
comments coming from the schema
Finally, this DTD Handler is the base structure on which the programs in charge of displaying the documentation rely in order to present the information to users in DTD format (instead of Schema format).
A documentation of the schema is part of the standard documentation of SchemaDoc itself ( http://www.tireme.fr/XMLSchema/documentation/V2/ar01s_xs2dtdInternals.html ).
As stated in the introduction, PEs will be heavily used in the conversion.
PEs are used to represent schema content models (from complexType, simpleType or group) or attribute declarations (from complexType, attribute or attributeGroup). They may also be external references mapping schema imports, includes or redefines.
<element
abstract = boolean : false
block = (#all | List of (extension | restriction | substitution))
default = string
final = (#all | List of (extension | restriction))
fixed = string
form = (qualified | unqualified)
id = ID
maxOccurs = ( nonNegativeInteger | unbounded ) : 1
minOccurs = nonNegativeInteger : 1
name = NCName
nillable = boolean : false
ref = QName
substitutionGroup = QName
type = QName
{any attributes with non-schema namespace . . .}
>Content: ( annotation ?, (( simpleType | complexType )? , (unique | key | keyref )*))
</element>
Schema elements are mapped to DTD ELEMENT declarations having the same name, with explicit content(s) or PE(s), and to DTD ATTLIST declarations with explicit attribute(s) or PE(s).
Local elements are made global (brought to the root level) and inserted just after the object they originate from. Conflicts may arise because of this globalization of local elements. XS2DTD V1 does not resolve these conflicts; resolution is planned for later releases. For the moment, local elements are globalized and a warning is raised whenever a conflict is detected.
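The conflict detection mentioned above can be sketched as follows. The sketch assumes that a conflict means two local elements with the same name but different content models; the class name and the representation of a local element as a {name, contentModel} pair are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LocalElementGlobalizer {
    /**
     * Each entry is {name, contentModel}. Returns one warning for every
     * local element whose name is already used with a different content model.
     */
    static List<String> findConflicts(List<String[]> localElements) {
        Map<String, String> firstModelByName = new LinkedHashMap<>();
        List<String> warnings = new ArrayList<>();
        for (String[] element : localElements) {
            String name = element[0];
            String model = element[1];
            String previous = firstModelByName.putIfAbsent(name, model);
            if (previous != null && !previous.equals(model)) {
                warnings.add("Conflict on globalized element '" + name + "': "
                        + previous + " vs " + model);
            }
        }
        return warnings;
    }
}
```

Two local elements sharing both name and content model are not reported, since a single global declaration can serve both.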
Empty schema elements are converted into EMPTY DTD elements.
The following schema features are not handled: nillable, form, fixed, default, unique, key and keyref.
From a DTDHandler point of view, a schema element is an elementDecl containing a qName, a contentModel and some attributes. Read Section 14.2, “DTD Handler documentation” for more details.
Example:
There is a problem of method here. I do not agree with this example because, for me, para must be declared as mixed and nothing more in order to be transformed into plain PCDATA. In this case, I should end up with an entity named string whose content is PCDATA… Bruno, your opinion? No! For me, xs:string becomes purely and simply PCDATA (or CDATA for attributes) without involving a PE. The same goes for the predefined simpleTypes (except ID, IDREF, … for attributes). This brings me to the method: how can I be sure that all cases are properly handled by reading this spec, which does not explain them? I have no problem looking at the program results, but then time must be planned to tell bugs apart from bad specification, and things will be discovered gradually rather than all at once by reading the spec. It is becoming complicated to know who says what in the comments. For pa23, what is the outcome?
Table 3.
Schema | DTD Handler | DTD |
<xs:element name="para" type="xs:string"> <xs:annotation> <xs:documentation> a paragraph </xs:documentation> </xs:annotation> </xs:element> | <elementDecl> <qName id="para" localName="para" namespace="http://…"> </qName> <comment> <xs:documentation> a paragraph </xs:documentation></comment> <contentModel> <mixed></mixed> </contentModel> </elementDecl> | <!-- a paragraph --> <!ELEMENT %para_EL; (#PCDATA)> |
<simpleType
final = (#all | (list | union | restriction))
id = ID
name = NCName
{any attributes with non-schema namespace . . .}>Content: (annotation ?, (restriction | list | union ))
</simpleType>
Schema simple types are represented as PEs and can be used in elements and/or attributes.
Do you agree with this idea? I am not sure I understand: would you want to declare a PE for each predefined type? That seems very, very heavy to me! We could possibly keep the notion of type in the comment, but do we really need to go beyond that? (… or CDATA, ID, IDREF, NMTOKEN, …) for the simpleTypes concerning attributes whose type exists in both Schema and DTD. I do not know where this sentence, which means nothing, comes from … let us drop it.
Every xs: built-in datatype used in the schema needs to be declared as PCDATA, for attribute use, content model use, or both. This guarantees, later on, that the corresponding PE exists.
Depending on their use, i.e. whether they are part of an element, an attribute or both, simple types are converted differently:
For simple types included in elements:
Always bound to PCDATA
Comments are added to the handler, giving the facet definition and/or derivation of the simple type.
Or a reference to the corresponding PE if appropriate
For simple types included in attributes:
XML built-in types are bound to CDATA, except for NMTOKEN, NMTOKENS, ID, IDREF, IDREFS, ENTITY, ENTITIES and NOTATION, which remain unchanged
Or a reference to the corresponding PE if appropriate
Enumeration constraining facets remain DTD enumerations
In order to simplify things, it has been decided that in all cases (use in content models, in attributes or in both), 2 PEs are generated:
$initialSimpleTypeName_contentmodel
$initialSimpleTypeName
Thus, any reference to a simple type will match one or the other of the PEs so declared.
From a DTDHandler point of view, a schema simpleType is of type contentModelType, which can be EMPTY, ANY, MIXED (= #PCDATA if there is no child), choice, sequence, or a reference to another contentModelType. Read Section 14.2, “DTD Handler documentation” for more details.
Derivations are allowed and applied to the PE. See Section 14.1.5.2, “Derivations” for more details.
Example:
Table 4.
Schema | DTD Handler | DTD |
<xs:simpleType name="STType"> <xs:restriction base="xs:string"> <xs:pattern value="\d\d(-\d\d(-\d\d)?)?"/> </xs:restriction> </xs:simpleType> <xs:element name="STElem" type="STType" maxOccurs="unbounded"/> <xs:attribute name="STAtt" type="STType" use="required"/> | <peDef> <qName id="STType_FORELT_1" localName="STType_FORELT" namespace=""/> <contentModel><mixed/></contentModel> </peDef> <peDef> <qName id="STType_FORATT_3" localName="STType_FORATT" namespace=""/> <attributeTypeContent> <dtd><CDATA/></dtd> </attributeTypeContent> </peDef> <elementDecl> <qName id="STElem_4" localName="STElem" namespace=""/> <contentModel> <peRef localName="STType_FORELT" namespace=""/> </contentModel> </elementDecl> <peDef> <qName id="STAtt_5" localName="STAtt" namespace=""/> <attributes> <peRef localName="STType_FORATT" namespace=""/> </attributes> </peDef> | <!ENTITY % STType_FORELT "%XSString_FORELT;"> <!ENTITY % STType_FORATT "%XSString_FORATT;"> <!ELEMENT %STElem_EL; %STType_FORELT;> <!ATTLIST %STElem_EL; STAtt %STType_FORATT; #REQUIRED> … |
Table 5.
<complexType
abstract = boolean : false
block = (#all | List of (extension | restriction))
final = (#all | List of (extension | restriction))
id = ID
mixed = boolean : false
name = NCName
{any attributes with non-schema namespace . . .}>Content: (annotation ?, (simpleContent | complexContent | ((group | all | choice | sequence )?, ((attribute | attributeGroup )*, anyAttribute ?))))
</complexType>
As for simple types, global complex types are represented by 2 global PEs, one for the content model and one for the attributes (if any). Groups are managed like complex type content models.
The possible conversions of a complex type content model are:
empty
choice
sequence
simple content (DTD: #PCDATA, DTDHandler:mixed without sub-elements).
A mixed content is composed of references to elements or PEs. Read Section 14.1.5.8, “Mixed content” for more details.
The ALL content model has no equivalent in DTD. It is converted into an unbounded choice and a warning message is raised.
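The conversion of an ALL content model into an unbounded choice can be sketched as below. The class and method names are hypothetical, and the warning is only suggested by a comment.

```java
import java.util.List;

final class AllGroupMapper {
    /**
     * Maps the children of an xs:all group to an unbounded DTD choice,
     * for example [a, b, c] becomes (a|b|c)*.
     * The caller is expected to raise the warning mentioned in the text.
     */
    static String toUnboundedChoice(List<String> childNames) {
        return "(" + String.join("|", childNames) + ")*";
    }
}
```

The resulting DTD is more permissive than the schema: it accepts repeated or missing children that xs:all would reject, which is why a warning is useful.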
Example:
Table 7.
<attribute
default = string
fixed = string
form = (qualified | unqualified)
id = ID
name = NCName
ref = QName
type = QName
use = (optional | prohibited | required) : optional
{any attributes with non-schema namespace . . .}>Content: (annotation?, (simpleType ?))
</attribute>
Global attributes are mapped as global PEs.
Local attributes are converted into DTD ATTLIST declarations.
An attribute type can be CDATA, ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, an enumeration or a PE reference.
Each attribute has a use and a default or fixed value.
The values of the ‘use’ attribute are handled as follows:
optional and required are mapped to #IMPLIED and #REQUIRED respectively.
If both required and fixed are set, the attribute is converted into #REQUIRED, plus some comments.
If a default value is set along with an optional use, only the default value is kept.
If both optional and fixed are set, the attribute is converted into #FIXED. What about the value of fixed?
If prohibited is set, the attribute is deleted and a comment is added. This mechanism is used when deriving attributes, either by extension or restriction.
The possible combinations are summarized below:
use="required" → #REQUIRED
use="required" fixed="value" → #REQUIRED + comments
use="optional" → #IMPLIED
use="optional" default="value" → default
use="optional" fixed="value" → #FIXED + value
use="prohibited" → delete attribute + comments
anyAttribute is not handled and a warning is raised.
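The summary above can be read as a small decision function. The following sketch is one possible reading, not the XS2DTD implementation: the class and method names are hypothetical, a null result stands for a deleted attribute, and values are assumed not to contain double quotes.

```java
final class AttributeUse {
    /**
     * Returns the DTD default declaration for an attribute,
     * or null when the attribute is deleted (use="prohibited").
     */
    static String toDtdDefault(String use, String defaultValue, String fixedValue) {
        switch (use) {
            case "required":
                return "#REQUIRED";   // a fixed value, if any, only survives as a comment
            case "prohibited":
                return null;          // attribute removed, a comment is added instead
            case "optional":
                if (fixedValue != null) return "#FIXED \"" + fixedValue + "\"";
                if (defaultValue != null) return "\"" + defaultValue + "\"";
                return "#IMPLIED";
            default:
                throw new IllegalArgumentException("Unknown use: " + use);
        }
    }
}
```

For example, toDtdDefault("optional", null, "1.0") gives #FIXED "1.0", while toDtdDefault("required", null, "1.0") gives #REQUIRED and leaves the fixed value to the comments.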
Example:
The particle Schema components ‘minOccurs’ and ‘maxOccurs’ set an interval for the number of occurrences of local elements and attributes. (Note that neither minOccurs nor maxOccurs may appear in the declarations of global elements and attributes.)
DTD is rather limited concerning occurrences. They are expressed as follows:
exactly 1 occurrence → nothing
1 or more occurrences → +
0 or more occurrences → *
0 or 1 occurrence → ?
The Schema to DTD correspondence is:
Table 9.
minOccurs | maxOccurs | DTD |
1 | 1 | nothing (default value) |
0 | 1 | ? |
0 | unbounded | * |
1 | unbounded | + |
0 | N | * + comment |
M | N | + + comment |
Any other occurrence range expressed in Schema, such as (minOccurs, maxOccurs) = (3, unbounded), is not converted. In such a situation, a warning is displayed.
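The correspondence table can be sketched as a lookup function. This is one reading of the table, under these assumptions: unbounded is encoded as -1, the (M, N) row covers any finite N with M at least 1, and a null result means "not converted, raise a warning". The class and method names are hypothetical.

```java
final class Occurrence {
    static final int UNBOUNDED = -1;

    /** Returns "", "?", "*" or "+", or null when the range is not converted. */
    static String indicator(int min, int max) {
        if (min < 0 || max == 0 || (max != UNBOUNDED && max < min)) return null;
        if (min == 0) {
            return max == 1 ? "?" : "*";   // (0,1) gives ?, (0,unbounded) and (0,N) give *
        }
        if (min == 1 && max == 1) return "";
        if (min == 1 || max != UNBOUNDED) return "+";   // (1,unbounded) and (M,N) give +
        return null;                                    // for example (3,unbounded)
    }

    /** True when the DTD indicator is looser than the schema range, so a comment is needed. */
    static boolean needsComment(int min, int max) {
        if (indicator(min, max) == null) return false;
        boolean exact = (min == 0 || min == 1) && (max == 1 || max == UNBOUNDED);
        return !exact;
    }
}
```

With this reading, (0, 5) yields * with a comment, (2, 5) yields + with a comment, and (3, unbounded) yields no indicator at all.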
Annotations within objects are stored as is in the DTD Handler, within the comment element. Global annotations are inserted at the exact same location in the DTD Handler.
The DTD dumper is responsible for formatting this information. For the initial release, the output will differentiate appinfo/toclevelsx, appinfo/para and documentation, the latter being treated as appinfo/para. All other annotation contents will not be handled.
Example:
Table 12.
<redefine
id = ID
schemaLocation = anyURI
{any attributes with non-schema namespace . . .}>Content: (annotation | (simpleType | complexType | group | attributeGroup ))*
</redefine>
The redefine mechanism makes it possible to redefine simple and complex types, groups and attribute groups obtained from external schema files. Like the include mechanism, redefine requires the external components to be in the same target namespace as the redefining schema.
The redefine element acts very much like the include element, as it includes all the declarations and definitions from the included schema. The main difference is that a redefined component takes the original component of the same name as its base type.
A redefinition is actually an include whose content is redefined inside the including schema (the one that redefines a declared component).
In terms of conversion, since the ‘redefinable’ components are all mapped through PEs, the redefinition is actually applied to the content of their PEs. Practically speaking, redefining an object consists in creating new PEs, called redefining PEs, which override other existing PEs. The redefining PEs have the same name as the overridden ones and must be declared before the inclusion call (the include).
In the DTDHandler, a redef attribute gives the ID of the redefined element and shows its redefinition.
It is possible to derive redefined elements. For extensions, the original PE’s content is used. For restrictions, attributes are kept, except the prohibited ones.
What do we do to overcome the recursion problem? Copy the redefined original content and rename the redefining PE?
Read Section 14.1.5.2.3, “Particularities: Deriving redefined objects” for more details.
Example:
*) Implementation idea
---------------------
For restriction, in my opinion there is not much to say: the standard mechanism must be used.
For extension, perhaps, to make things easier, it is possible to proceed in two stages:
1) use the classic extension mechanism:
write in the dtdHandler a peDef whose contentModel includes a peRef to itself.
Keep track somewhere that this is a redefine concerning a given schema.
2) finally, make a pass over the dtdHandler looking for these cases:
- for a peRef included in the peDef in question,
look up the peDef in the handler corresponding to the redefined schema,
replace the peRef with the content model of the peDef found.
It seems to me that this would be fairly simple, but I leave Mike to make his own choices on this,
since he is the one who best masters the algorithms involved.
Table 14.
<include
id = ID
schemaLocation = anyURI
{any attributes with non-schema namespace . . .}>Content: (annotation ?)
</include>
Includes and imports are mapped to external PEs. For example:
<xs:include schemaLocation="foo.xsd"/>
would give in DTD:
<!ENTITY % foo SYSTEM "foo.ent">
%foo;
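The mapping of an include to an external PE can be sketched as follows. The naming rule (file name without the .xsd extension, with an .ent file as target) is inferred from the foo example only, and the use of a SYSTEM identifier is an assumption; the class and method names are hypothetical.

```java
final class IncludeMapper {
    /**
     * Turns a schemaLocation such as "foo.xsd" into an external PE
     * declaration followed by its reference. Any directory part of the
     * location is dropped, which is a simplification of this sketch.
     */
    static String toExternalPe(String schemaLocation) {
        String file = schemaLocation.substring(schemaLocation.lastIndexOf('/') + 1);
        String name = file.endsWith(".xsd") ? file.substring(0, file.length() - 4) : file;
        return "<!ENTITY % " + name + " SYSTEM \"" + name + ".ent\">\n%" + name + ";";
    }
}
```

The sketch does not check that the derived name is a valid XML name; a real implementation would have to.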
As stated before, the conversion from Schema to DTD is not straightforward.
There are constraints related to Namespaces, Mixed content or Ordering, for example, that are due to DTD limitations. Others, such as Collisions or Structure Management, are rather SchemaDoc requirements. All the constraints that have to be taken into account during the conversion are described and discussed in the following chapter.
Schema namespaces are managed and should not be lost during the conversion.
XML instances created according to the DTD generated by the conversion can use these namespaces just as if they were created from the original Schemas, with two big restrictions: the default namespace cannot be changed several times within the same document, and the same prefix must be used throughout the created documents.
By the way, they should certainly be identified in the DTD Handler, because we may also be forced to include all the names used at these levels (the famous ns-defs).
In the DTD Handler, elements and attributes will have qualified names, with a namespace. While serializing, a namespace declaration file will be created and all files will first include this global file. This ensures that the information is known by all “roots” as defined before, but also by all “intermediate roots”: DTD files that may be seen as self-contained by a parser.
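Since a DTD has no notion of namespace, a qualified name can only be carried by coding a fixed prefix into the object name. The sketch below illustrates that idea only; the prefix table, the class name and the sample namespace URI are hypothetical, and the content of the namespace declaration file is not covered.

```java
import java.util.Map;

final class DtdNames {
    /**
     * Returns the name to use in the DTD for a qualified name.
     * When namespaces are not coded in object names, or when no prefix
     * is registered for the namespace, the local name is used alone.
     */
    static String dtdName(String namespaceUri, String localName,
                          Map<String, String> prefixByNamespace, boolean codeNamespaces) {
        if (!codeNamespaces) return localName;
        String prefix = prefixByNamespace.get(namespaceUri);
        return (prefix == null || prefix.isEmpty()) ? localName : prefix + ":" + localName;
    }
}
```

Because the prefix is part of the declared name, instance documents must use exactly that prefix, which is the restriction stated above.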
…
Schema allows types to be derived by extension or restriction. This derivation is converted differently depending on the base type.
Several cases are identified and described below:
Extending a simple type consists in adding attributes to the base type. The derived type is then a complex type, mapped as expected by 2 PEs:
PE1 → xx_CONTENTMODEL (#PCDATA)
PE2 → xx_ATTRIBUTE (attribute declaration)
Extending a complex type consists in adding sequences or choices of elements at the end of the base content model and/or attributes.
Examples: the following examples of complex type extension use 2 reference complex types, ct_sequence and ct_choice, such that:
ct_sequence :
<xs:complexType name="ct_sequence">
  <xs:sequence>
    <xs:element name="t1"/>
    <xs:element name="t2"/>
  </xs:sequence>
</xs:complexType>
ct_choice:
<xs:complexType name="ct_choice">
  <xs:choice>
    <xs:element name="t3"/>
    <xs:element name="t4"/>
  </xs:choice>
</xs:complexType>
Extending a ‘sequence’ complex type with a sequence:
<xs:complexType name="ctseq_ext_sequence">
  <xs:complexContent>
    <xs:extension base="ct_sequence">
      <xs:sequence>
        <xs:element ref="new1"/>
        <xs:element ref="new2"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
is converted as:
<!ENTITY % ctseq_ext_sequence '(%ct_sequence;,(new1,new2))'>
Extending a ‘sequence’ complex type with a choice:
<xs:complexType name="ctchoi_ext_sequence">
  <xs:complexContent>
    <xs:extension base="ct_sequence">
      <xs:choice>
        <xs:element ref="new1"/>
        <xs:element ref="new2"/>
      </xs:choice>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
is converted as:
<!ENTITY % ctchoi_ext_sequence '(%ct_sequence;,(new1|new2))'>
Extending a ‘choice’ complex type with a sequence:
<xs:complexType name="ctseq_ext_choice">
  <xs:complexContent>
    <xs:extension base="ct_choice">
      <xs:sequence>
        <xs:element ref="new1"/>
        <xs:element ref="new2"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
is converted as:
<!ENTITY % ctseq_ext_choice '(%ct_choice;,(new1,new2))'>
Extending a ‘choice’ complex type with a choice:
<xs:complexType name="ctchoi_ext_choice">
  <xs:complexContent>
    <xs:extension base="ct_choice">
      <xs:choice>
        <xs:element ref="new1"/>
        <xs:element ref="new2"/>
      </xs:choice>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
is converted as:
<!ENTITY % ctchoi_ext_choice '(%ct_choice;,(new1|new2))'>
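The four conversions above follow a single pattern: the base PE reference, a comma, then the added group. A minimal sketch of that pattern is given below; the class and method names are hypothetical.

```java
import java.util.List;

final class ExtensionMapper {
    /**
     * Builds the content of the PE for a complex type extended with a
     * sequence (choice = false) or a choice (choice = true) of elements.
     */
    static String extend(String basePeName, List<String> addedElements, boolean choice) {
        String group = "(" + String.join(choice ? "|" : ",", addedElements) + ")";
        return "(%" + basePeName + ";," + group + ")";
    }

    /** Wraps a content model into a parameter entity declaration. */
    static String entity(String name, String content) {
        return "<!ENTITY % " + name + " '" + content + "'>";
    }
}
```

For instance, entity("ctseq_ext_sequence", extend("ct_sequence", List.of("new1", "new2"), false)) reproduces the first declaration shown above.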
Extending a complex type by adding an attribute. This can be combined with the previous examples, i.e. extending both the content model and the attributes.
<xs:complexType name="ctseq_ext_attribute">
  <xs:complexContent>
    <xs:extension base="ct_sequence">
      <xs:attribute name="newA"/>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
is converted as:
<!ENTITY % ctseq_ext_attribute_CMODEL '(%ct_sequence_CONTENTMODEL;)'>
<!ENTITY % ctseq_ext_attribute_Atts '(%ct_sequence_Attributes;, newA)'>
Restricting a simple type results in the creation of 2 different PEs, depending on how the simple type is used, i.e. whether the derived type is used in content models or for attributes:
PE1 → xx_contentmodel (#PCDATA)
OR
PE2 → xx containing either
ID, NMTOKEN(S), IDREF or CDATA
OR
Enumeration
In both cases, the applied facets are dumped as comments.
Restricting a complex type consists in redefining a new complex type based on the restricted content model in such a way that the 2 PEs are not related at all.
Example:
<xs:complexType name="t2">
  <xs:complexContent>
    <xs:restriction base="t1"> Content Model </xs:restriction>
  </xs:complexContent>
</xs:complexType>
is seen as:
<xs:complexType name="t2"> Content Model </xs:complexType>
A comment showing the restriction is dumped.
Extension
Recall that redefinitions have the same name as their base type. The derivation, or the reference to groups, therefore needs to be resolved in order not to fall into a recursion dead end. The resolution consists in copying the original content model of the redefined element into the redefining element.
For example, when a classic extension looks like:
complexType t1
  extension complexType t2
    add content model cm1
complexType t2
  content model cm2
the generated DTD would be:
<!ENTITY % t1 '(%t2;,cm1)'>
But considering the re-entrance problem (recursion), the reference to the PE is replaced with the PE’s content model itself.
Hence the following:
schema root
redefine schema sub
complexType t1
extension complexType t1
adding content model cm1
schema sub
complexType t1
content model cm2
is converted into :
<!ENTITY % t1 '(cm2,cm1)'>
<!-- and not (%t1;,cm1) !!! -->
<!ENTITY % sub SYSTEM "sub.dtd">
%sub;
sub.dtd
<!ENTITY % t1 '(cm2)'>
Pierre wrote: Why not have the following instead:
root.dtd
<!ENTITY % t1 '(%t1_ToBeRedefined;,cm1)'>
<!ENTITY % sub SYSTEM "sub.dtd">
%sub;
sub.dtd
<!ENTITY % t1 '(cm2)'>
<!ENTITY % t1_ToBeRedefined '(cm2)'>
This only applies to extensions of complex types, to extensions of simple types with an enumeration, and to redefinitions of groups and attribute groups through explicit references to the included group (group ref=).
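The resolution described for redefinitions (copying the base content model instead of referencing the PE of the same name) can be sketched in Python. This is only an illustration under assumptions: the function names are hypothetical, and content models are handled as plain strings.

```python
def strip_outer_parens(model):
    """Remove one pair of parentheses, but only if it encloses the whole model."""
    model = model.strip()
    if not (model.startswith("(") and model.endswith(")")):
        return model
    depth = 0
    for i, ch in enumerate(model):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0 and i < len(model) - 1:
                # The first group closes early, e.g. '(a),(b)': keep as is.
                return model
    return model[1:-1]


def resolve_redefinition(name, base_model, added):
    """Build the PE of a redefining extension.

    The content model of the redefined type is copied in, so that the
    new PE never references a PE with its own name.
    """
    return f"<!ENTITY % {name} '({strip_outer_parens(base_model)},{added})'>"


print(resolve_redefinition("t1", "(cm2)", "cm1"))
```

With the values of the example (`t1` defined as `(cm2)` in sub, extended with `cm1` in root), the sketch yields the `(cm2,cm1)` form rather than a self-reference.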
Restrictions
Restricting a redefined object does not imply any specific treatment. Refer to general restrictions for more details.
An added value of Schemas is that they can be split into several files. This is how the Schemas produced by SchemaDoc are structured, and thus the structure proposed for the conversion into DTD. This set of Schemas (grove) shall be kept during the conversion, with every schema file having its equivalent DTD file. In addition, another file, which contains the namespace declarations, is associated with every schema document.
In Schemas, a declaration can come after its usage. For example, it is possible to declare an element of type foo although this type foo has not been defined yet. foo can be defined later in the same document or in an included document.
Unfortunately, this is not feasible in a DTD, as every PE must be declared before its use.
In terms of DTDHandler, this means that peDefs must be declared before their corresponding peRefs or, when appropriate, before the PE ‘include’ that includes the peRef. The very same model works for DTDs.
As for the DTDHandler, several cases, detailed below, need to be considered:
peRef is encountered before its definition (peDef) AND in the same schema →
peDef is moved and inserted just before peRef.
peRef is encountered before its definition (peDef) BUT in an included schema, i.e. peDef is in the including schema and peRef is included before its definition peDef →
Is there really any risk of deadlocks as long as peDefs are only moved locally?
peDef is moved and inserted just before the PE ‘include’ and the ordering process is restarted for this schema.
peRef is encountered before its definition (peDef) AND is in the including schema, i.e. peDef is included in the schema declaring peRef AFTER this peRef →
Nothing is done except raising and logging a warning; the DTDHandler is generated as usual.
The ordering process is then just a matter of re-ordering schemas locally. This is a clear and identified limitation.
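The local reordering (first case) can be pictured as a stable ordering in which every definition is placed just before its first use. The Python sketch below is an illustration only: the data layout (a list of `(name, refs)` pairs per schema) and the function name are assumptions, not the DTDHandler API, and references to PEs defined in other schemas are merely reported as warnings.

```python
def reorder_locally(decls):
    """Order the declarations of one schema so that definitions precede uses.

    `decls` is a list of (name, refs) pairs in source order, where `refs`
    lists the PE names referenced by the declaration.
    Returns (ordered_names, warnings).
    """
    by_name = {name: refs for name, refs in decls}
    ordered, placed, warnings = [], set(), []

    def place(name, stack):
        if name in placed:
            return
        if name in stack:
            raise ValueError(f"recursive PE reference through {name}")
        for ref in by_name[name]:
            if ref in by_name:
                # Defined locally: move the definition before its use.
                place(ref, stack | {name})
            else:
                # Defined elsewhere (or nowhere): only warn, do not move.
                warnings.append(f"{name}: %{ref}; is not defined in this schema")
        placed.add(name)
        ordered.append(name)

    for name, _ in decls:
        place(name, frozenset())
    return ordered, warnings


order, notes = reorder_locally([
    ("item", ["itemType"]),
    ("itemType", ["basicType"]),
    ("other", []),
])
print(order)
print(notes)
```

Declarations that need no move keep their source order, which matches the goal of preserving the layout of the original schema files.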
The following example falls into the unsupported third case:
root.xsd: <xs:include schemaLocation="sub.xsd"/> <xs:complexType name="itemType"> <xs:complexContent> <xs:extension base="basicType"> ... </xs:extension> </xs:complexContent> </xs:complexType>
sub.xsd: <xs:complexType name="basicType"> … </xs:complexType> <xs:element name="item" type="itemType"/>
It produces the following unorganised DTDs:
root: <!ENTITY % sub SYSTEM "sub.ent"> %sub; <!ENTITY % itemType '(%basicType;,.....)'>
sub: <!ENTITY % basicType '....'> <!ELEMENT item %itemType;>
Collisions will be treated in XS2DTD V2.
A collision occurs when several global elements happen to have the same name. This may happen either after the inclusion of another schema or after a globalization.
To overcome this conflict, a type resolution processing is applied on the content model and the attributes of the element. Depending on their type, different actions are taken …
There are 3 kinds of Content type:
empty
complex content
simple content (simple types or complex types with a simple content)
Several cases need to be considered:
The 2 elements are identical: The second one is removed.
The 2 elements have different content models:
If at least one of the types is a complex type, a content model summation is performed (see Section 14.1.5.7.1, “Content model summation”).
If there are only simple types, or simple types and empty element(s), a simple content generalisation is performed (see Section 14.1.5.7.3, “Simple content generalisation”). The empty element is skipped because it is mapped to CDATA.
If all elements are empty, the resulting element is empty.
Content model analysis is then mandatory to solve the collisions.
A warning is raised when collisions happen.
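The choice between the three treatments for elements with different content models can be summarised as a small decision function. The Python sketch below only restates the rules of this section; the function name and the string labels are illustrative assumptions.

```python
def collision_strategy(kinds):
    """Pick the treatment for colliding elements with different content models.

    `kinds` lists the content kind of each colliding element:
    'empty', 'complex' or 'simple'.
    """
    kinds = set(kinds)
    if kinds == {"empty"}:
        return "empty"
    if "complex" in kinds:
        return "content model summation"
    # Only simple types, possibly together with empty elements.
    return "simple content generalisation"


print(collision_strategy(["simple", "empty"]))
```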
Types must be resolved to solve some cases. We must know if:
a group is used in a mixedContent
a simpleType is used for elements, for attributes or for both.
Please explain …
By default, a type content summation will create a choice group element into which the different content models are inserted. But there are exceptions …
If one of the content models is mixed, a simple content or a simple type, the resulting summation type must respect the mixed content rules. See Section 14.1.5.8, “Mixed content” for more detailed information about these rules.
The resulting element after a type summation between complex type elements and empty elements is set optional if not already optional or unbounded.
One drawback is that conflicts with DTD semantics may arise.
For example, the choice ((a,b,c)|(a,d))* is not DTD compliant, since the a element starts both alternatives and the content model is therefore not deterministic.
A better solution based on content model analysis might be considered in the future.
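The default summation rule of this section can be sketched in Python. This is a simplified illustration under assumptions: content models are plain strings, `None` stands for an empty element, the mixed content exceptions are ignored, and turning a trailing `+` into `*` when an empty element takes part is a choice made for this sketch. No determinism check is attempted.

```python
def sum_content_models(models):
    """Combine colliding content models into one choice group.

    `models` holds DTD content model strings; None stands for an
    empty element.
    """
    distinct = []
    for model in models:
        if model is not None and model not in distinct:
            distinct.append(model)
    if not distinct:
        return "EMPTY"
    body = distinct[0] if len(distinct) == 1 else "(" + "|".join(distinct) + ")"
    if None in models:
        # An empty element took part: make the result optional unless it
        # is already optional (?) or unbounded (*).
        if body.endswith("+"):
            body = body[:-1] + "*"
        elif not body.endswith(("?", "*")):
            body += "?"
    return body


print(sum_content_models(["(a,b,c)", "(a,d)"]))
print(sum_content_models(["(a,b)", None]))
```

The first call reproduces the problematic choice discussed above, which shows why a plain summation may still need the content model analysis mentioned in the text.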
As for their attributes, an attributes generalisation is applied (see Section 14.1.5.7.2, “Attributes generalisation”).
By default, an attributes generalization is equivalent to a merge. The resulting list of attributes is the sum of the original attributes with some of them set to optional. An attribute is set optional if it was not part of every list of attributes.
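The merge rule can be illustrated with a short Python sketch. The representation (one dictionary per colliding element, mapping an attribute name to a "required" flag) and the function name are assumptions made for this example. The sketch also keeps an attribute optional when it was already optional in one of the lists, which the text does not state explicitly.

```python
def generalise_attributes(attribute_lists):
    """Merge the attribute lists of colliding elements.

    Each list is a dict mapping an attribute name to True (required)
    or False (optional). An attribute stays required only if it is
    present and required in every list.
    """
    merged = {}
    for attrs in attribute_lists:
        for name in attrs:
            merged.setdefault(name, True)
    for name in merged:
        merged[name] = all(attrs.get(name, False) for attrs in attribute_lists)
    return merged


print(generalise_attributes([
    {"id": True, "lang": False},
    {"id": True, "class": True},
]))
```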
By default, a simple type generalization is equivalent to unification, which means that the resulting element will contain only CDATA.
As for attributes, the default will be CDATA as well but there are 2 exceptions:
Enumerations can be merged into a larger enumeration, with only one occurrence of each value.
NMTOKEN(S) remains NMTOKEN(S).
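A possible reading of these rules for attribute types, sketched in Python. The representation is an assumption: an enumeration is a tuple of values, any other type is its DTD keyword. Widening a mix of NMTOKEN and NMTOKENS to NMTOKENS is also an assumption of this sketch.

```python
def generalise_attribute_types(types):
    """Generalise the types of one attribute across colliding elements."""
    if all(isinstance(t, tuple) for t in types):
        # Union of the enumerations, each value kept once, order preserved.
        values = []
        for enum in types:
            for value in enum:
                if value not in values:
                    values.append(value)
        return tuple(values)
    if all(t == "NMTOKEN" for t in types):
        return "NMTOKEN"
    if all(t in ("NMTOKEN", "NMTOKENS") for t in types):
        return "NMTOKENS"
    # Default: unification to CDATA.
    return "CDATA"


print(generalise_attribute_types([("left", "right"), ("right", "center")]))
```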
Schema mixed contents are complex types, anonymous or not, whose mixed attribute is set to true or 1, such as:
<xs:element name="para"> <xs:complexType mixed="true"> with a sequence or a choice and some attributes if any </> </>
which gives, in terms of DTDHandler:
<elemDecl> <contentModel> <mixed> with elemRefs and peRefs used in the sequence or the choice </> </> </>
This means that neither the sequence nor the choice tags are kept during the conversion; only their contents are. Note that their occurrences are dropped as well.
For example, the following Schema para element:
<xs:element name="para">
<xs:complexType mixed="true">
<xs:choice maxOccurs="unbounded">
<xs:element ref="italic"/>
<xs:group ref="emphasis.Grp"/>
</>
</>
</>
is converted in DTDHandler format as:
<elemDecl> <qname localName="para"/> <contentModel> <mixed> <elemRef localName="italic"/> <peRef localName="emphasis.Grp"/> </> </> </>
In DTD, it is considered as a choice starting with PCDATA, such as:
<!ELEMENT para (#PCDATA|italic|%emphasis.Grp;)*>
Note that here para cannot contain any PE that would be either mixed or anything else but a choice.
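The serialisation of a mixed content model can be sketched in Python. The input format (a list of `(kind, name)` pairs standing for the elemRef and peRef children of the mixed node) and the function name are assumptions made for this illustration.

```python
def mixed_content_model(items):
    """Serialise a mixed content model to DTD syntax.

    `items` is a list of ('elemRef', name) or ('peRef', name) pairs.
    """
    if not items:
        return "(#PCDATA)"
    parts = ["#PCDATA"]
    for kind, name in items:
        parts.append(f"%{name};" if kind == "peRef" else name)
    return "(" + "|".join(parts) + ")*"


print("<!ELEMENT para " + mixed_content_model(
    [("elemRef", "italic"), ("peRef", "emphasis.Grp")]) + ">")
```

Applied to the para example, the sketch produces the same declaration as the one given above.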
Attributes of mixed content types are converted normally.
But there are of course exceptions that have to be treated …
Case 1: Mixed complex types cannot make references to other mixed types
From the following example,
<!ENTITY % emphasis.Grp '(#PCDATA|emph|sub|sup)*'> <!ENTITY % paraType '(#PCDATA|%emphasis.Grp;)*'> <!ELEMENT title (%emphasis.Grp;)>
We need to create 2 PEs from emphasis.Grp, namely emphasis.Grp_formixed and emphasis.Grp, such as:
<!ENTITY % emphasis.Grp_formixed 'emph|sub|sup'> <!ENTITY % emphasis.Grp '(#PCDATA|emph|sub|sup)*'> <!ENTITY % paraType '(#PCDATA|%emphasis.Grp_formixed;)*'>
<!ELEMENT title (%emphasis.Grp;)>
Note that the occurrences have disappeared, as well as the parentheses around the formixed PE. The same conversion applies to PEs of PEs, such as:
<!ENTITY % guil_formixed 'quote'> <!ENTITY % emphasis.Grp_formixed 'emph|sub|sup|%guil_formixed;'>
Case 2: Mixed complex types cannot contain anything else but choices
Then the following is forbidden:
<!ENTITY % emphasis.Grp '(emph,sub,sup)'> <!ENTITY % paraType '(#PCDATA|%emphasis.Grp;)*'> <!ELEMENT title (%emphasis.Grp;)>
as well as:
<!ENTITY % emphasis.Grp 'emph,sub,sup'> <!ENTITY % emphasis.Grp '(emph|sub|sup)'> <!ENTITY % emphasis.Grp 'emph|sub*|sup'>
But it is possible to overcome this by using 2 PEs such that:
<!ENTITY % emphasis.Grp '(emph,sub,sup)*'> <!ENTITY % emphasis.Grp_formixed 'emph|sub|sup'> <!ENTITY % paraType '(#PCDATA|%emphasis.Grp_formixed;)*'>
Warning: this model can be propagated at every level, such that:
paraType could reference a group baseElements.Grp
baseElements.Grp could be a choice of elements emph and italic and of a group eqn
eqn could be a sequence (with occurrences) of intro and body
A wrong conversion would be:
<!ENTITY % eqn.Grp '(intro,body)'> <!ENTITY % baseElements.Grp '(emph|sub|sup|%eqn.Grp;)'> <!ENTITY % paraType '(#PCDATA|%baseElements.Grp;)*'>
The correct conversion is rather:
<!ENTITY % eqn.Grp '(intro,body)'> <!ENTITY % eqn.Grp_formixed '(intro|body)'> <!ENTITY % baseElements.Grp '(emph|sub|sup|%eqn.Grp_formixed;)'> <!ENTITY % paraType '(#PCDATA|%baseElements.Grp;)*'>
Case 3: Derived mixed complex types
It is possible to derive complex types – originally mixed or not - with a mixed content.
<xs:element name="para"> <xs:complexType mixed="true"> <xs:complexContent> <xs:extension base="paraType"> … extension … </> </> </> </>
In that case, the rules described above are applicable to paraType .
Case 4: Non simple choice content models used in mixed contents
This case is a simple add-on to the previous cases. Any content model other than a simple choice without occurrences is resolved as in case 2 (even a choice*).
Then, we cannot have:
<!ENTITY % emphasis.Grp '(emph|sub|sup)*'> <!ENTITY % paraType '(#PCDATA|%emphasis.Grp;)*'>
but rather:
<!ENTITY % emphasis.Grp '(emph|sub|sup)*'> <!ENTITY % emphasis.Grp_formixed 'emph|sub|sup'> <!ENTITY % paraType '(#PCDATA|%emphasis.Grp_formixed;)*'>
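Deriving the _formixed variant of a PE amounts to flattening its content model into a bare choice: parentheses, occurrence indicators, sequence separators and #PCDATA are dropped, and nested PE references are redirected to their own _formixed variant. The Python sketch below illustrates this on plain strings; the function name and the tokenising regular expression are assumptions made for this example.

```python
import re

# Matches #PCDATA, a PE reference such as %name;, or an element name.
TOKEN = re.compile(r"#PCDATA|%[^;\s]+;|[A-Za-z_:][\w.:-]*")


def formixed(model):
    """Flatten a content model into a bare choice usable in mixed content."""
    alternatives = []
    for token in TOKEN.findall(model):
        if token == "#PCDATA":
            continue
        if token.startswith("%"):
            # Point nested PE references to their own _formixed variant.
            token = f"%{token[1:-1]}_formixed;"
        if token not in alternatives:
            alternatives.append(token)
    return "|".join(alternatives)


print(formixed("(#PCDATA|emph|sub|sup)*"))
print(formixed("(emph,sub,sup)*"))
print(formixed("(emph|sub|sup|%guil;)"))
```

The three calls correspond to case 1, case 2 and the PEs-of-PEs example respectively.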
Case 5: Parentheses
A DTD does not allow 2 opening parentheses in a row in front of PCDATA, such as:
<!ENTITY % paraType '(#PCDATA|italic|%emphasis.Grp;)*'> <!ELEMENT para (%paraType;)>
This problem is handled by the XSLT transformer as follows:
<xsl:template match="dtd:mixed">
  <xsl:choose>
    <xsl:when test=".//dtd:*">
      <xsl:text>(#PCDATA</xsl:text>
      <xsl:apply-templates/>
      <xsl:text>)*</xsl:text>
    </xsl:when>
    <xsl:otherwise>
      <xsl:text>(#PCDATA)</xsl:text>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
<xsl:template match="dtd:mixed/dtd:elemRef|dtd:mixed/dtd:peRef">
  <xsl:text>|</xsl:text>
  <xsl:value-of select="the name of the object"/>
</xsl:template>
Recursive calls through PEs are forbidden. An error is raised if such a case is encountered. For example, the following sample will not be accepted:
<!ELEMENT ROOT (%toto;)>
<!ENTITY % toto '(A, B, %tutu;)'>
<!ENTITY % tutu '(A, C, (%toto;)?)'>
<!ELEMENT A ANY>
<!ELEMENT B ANY>
<!ELEMENT C ANY>
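Detecting such recursion is a cycle search in the graph of PE references. The Python sketch below shows one way to do it, assuming the PE definitions are available as a mapping from PE name to replacement text; the function name and data layout are illustrative, not the XS2DTD implementation.

```python
import re

PE_REF = re.compile(r"%([^;\s]+);")


def find_recursive_pes(pe_defs):
    """Return the sorted names of the PEs that can reach themselves.

    `pe_defs` maps a PE name to its replacement text.
    """
    graph = {name: set(PE_REF.findall(text)) for name, text in pe_defs.items()}

    def reaches_itself(start):
        seen, todo = set(), list(graph[start])
        while todo:
            current = todo.pop()
            if current == start:
                return True
            if current in seen:
                continue
            seen.add(current)
            todo.extend(graph.get(current, ()))
        return False

    return sorted(name for name in graph if reaches_itself(name))


print(find_recursive_pes({
    "toto": "(A, B, %tutu;)",
    "tutu": "(A, C, (%toto;)?)",
}))
```

A non-empty result would correspond to the error case described above.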
Given the limitations of DTDs, schema information or features might be lost during the conversion, either because they are too complicated to map to DTDs, because they are irrelevant, or because they would increase the complexity of the generated DTDs.
There are three kinds of lost information:
Schema information that can’t be converted, either partially or completely
Schema information that is not handled by DTDs and is of no interest for the reader even as comments
Schema information that is not handled by DTDs but can help the reader’s understanding
See Section 14.1.6, “Lost information” for more details.
When information is lost during the conversion and not stored as comments in the DTD, a warning message is displayed in the console.
Should it be stored in the log file as well?
Information that can be of interest for understanding the model is stored as DTD comments, in order to keep as much as possible of the schema ‘spirit’ in the DTDs.
Abstract types are not converted. When encountered, a low level warning is raised. However, the information is not stored as comments in the DTD.
Polymorphism is not converted because DTDs cannot define different content models via xsi:type.
Substitution groups are not supported.
Identity-constraint definition schema components (key, keyref and unique).
xs:any?
Since DTDs do not support namespaces, the conversion handles a mapping of namespaces by creating xxx.nsdefs.ent files, which are included in every generated DTD (see 5.10).
Therefore, the dumper is mainly used to dump DTD files. Nevertheless, the documentation generation process will also use it: as soon as reference documentation is provided, it must give the same DTD information as the DTD files.
However, managing documentation is not the responsibility of the xs2dtd program. Therefore, the people responsible for integrating the program within the SchemaDoc environment will decide on the best way to use the dumper for documentation needs.
In parallel, an XML Catalog is generated. This catalog maps the information in an XML external identifier to a URI reference for the desired resource. When a file is moved, only the catalog has to be modified. When DTDs are generated, the catalog is built from SchemaDoc information, using the model names in SchemaDoc.
Is this handled by XSLT? I’d need more info about this catalog. How do I link it with the DTD Handler?
In the DTD Handler, objects have qualified names, with namespace. These namespaces are all defined in the For each file, each used namespace will be associated with a prefix.
As namespaces are not managed in DTDs, PEs will be used to overcome this problem. More precisely, PEs will hold the prefixes used for namespaces. These ‘prefix’ PEs will be part of all the appropriate object names.
The ‘prefix’ PEs are defined in a separate DTD file, which is inserted at the beginning of each DTD file using the corresponding namespaces. Thus, object names depend on the value given to the ‘prefix’ PE.
If no namespace is specified, the ‘prefix’ PE must be set to an empty string.
If an included Schema has no namespace, its elements are considered to belong to the local target namespace.
Because the development is not trivial, it has been decided to follow an iterative development method, which makes it possible to test features as they are specified.
With this in mind, the different iterations are:
1- Conversion of a single schema file containing a few simple schema constructs
1b- Take into account mixed contents and derivation
2- Take into account includes and limited reordering management
3- Take into account redefinition and import
4- Take into account groves
5- Take into account collisions coming from local/global resolution and complex reordering management
6- Take into account local schema elements identification
XS2DTD V1 will manage features 1 to 3; XS2DTD V2 will come up to 6.
Since development work is involved, it has been decided to place restrictions on the initial version 1 development. These limitations are defined here:
Collisions are not managed, but warnings are raised;
Local/global resolution is performed, but no conflicts are solved;
Limited reordering … see 6.4
Currently, it is impossible to use xs2dtd only for the purpose of retrieving ids for schema output: processes 2 and 3 are tied together.
The people doing the integration will remove this limitation by doing the following:
Separate the process responsible for the wrapping of XSD objects from the process in charge of the transformation from the XSD wrapped classes to the output format (currently DTDHandler format):
Separate the wrapping process (tree walker) from the creation of the DTDHandler. In DTDProcessor, create a new method wrappSchema(SDSchema), based on the verticalBuild method, which would only parse the loaded XSD schema and wrap the objects on the fly. Modify verticalBuild so that it takes the output of the method wrappSchema as argument.
In order to be able to generate other formats such as RelaxNG (instead of the current DTDHandler format), a super class Processor should be created. This class would be implemented by the current DTDProcessor and by the future processors (RelaxNGProcessor, …). It would also declare the 2 important methods, wrappSchema(SDSchema) and generateIntermediateFormat(SDSchema) (currently xxxBuild(SDSchema)).
The same applies to DTDHandler (super class Handler and subclasses DTDHandler and RelaxNGHandler).
XS2DTD only sees Handler and Processor classes and has new methods related to the conversion towards RelaxNG.
For the prototype, and at runtime, the normal processing would go as follows:
The Prototype class instantiates XS2DTD with the xsd file to load.
XS2DTD instantiates the DTDProcessor and the DTDHandler. It loads the schema through the XSDHandler class and asks the DTDProcessor to start the conversion.
The DTDProcessor processes the elements one after the other using the XSDSchemaHandler#getNextObject method and populates the intermediate DOM representation via the DTDHandler class.
Once the conversion is over, Prototype launches the serialization and the XSLT transformation using the XS2DTD class.
Figure 5 : Main class class diagram
TBD
— Root of the structure, including n schema2dtd structures
Top level element of the handler structure. It shares all namespaces declaration before enabling to provide each individuals handler (xs2dtdHandler).
— structure handling information from a Schema : contains only parameter entity definitions and element declaration
— Properties extracted from XML Schema file relevant to the xs namespace itself
— Misc informations about qualification (elemFormDefault, attributeFormDefault)
— Properties extracted from XML Schema file relevant to the SchemaDoc namespace
— All declarations : parameter entities and element declarations, comment -from xs annotation- may be found.
— Parameter Entity declaration. May be a simple content model (from simpleType), a complex content model (from complexType, group), attributes declarations (complexType, attribute, attributeGroup)
— case of complex type redefinition, a new attribute is defined. In this cas, it generates a peDef and an attribute declaration -for an elemRef- with a peRef. The attributes decl refers to an element declared somewhere (referenced by elementRef)