Saturday, April 10, 2010

Choice or no choice?

Ever wonder how you would represent a person's profile in XML format? Lame question, isn't it? Anyone who has gone through XML 101 can do it. OK, let us try to create an XML representation for Mr John Doe:

<person>
<firstname>John</firstname>
<lastname>Doe</lastname>
<age>XYZ</age> <!--Bad design-->
</person>

Before we get into a discussion of the XML representation, two questions:

Q1: What is John Doe's age?

Q2: Why is age in this representation a bad design?

We share this with John. He likes it, but asks us to elaborate on:
- Personal details: passport information
- Professional details: education and job details


So, we elaborate based on John's feedback, and the new representation looks like this:
<person>
...
<passportdetails>...</passportdetails>
<education></education>
<job>
<employer></employer>
<start></start>
<end></end>
<designation></designation>
</job>
</person>
John likes this representation. Being the techie he is, he wants to take this further and asks us to define a schema (XSD) for it. Time for XSD 101!!
OK, we are ready for this challenge, Mr Doe.
We need to define three complex types (PassportDetail, EducationDetail and JobDetail) and use them to define the type of the person element:

<xs:element name="person">
<xs:complexType> <xs:all>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
<xs:element name="passportdetails" type="PassportDetail"/>
<xs:element name="education" type="EducationDetail"/>
<xs:element name="job" type="JobDetail"/>
</xs:all>
</xs:complexType>
</xs:element>
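The schema above references PassportDetail, EducationDetail and JobDetail without defining them. As a minimal sketch, JobDetail could be built from the child elements of <job> in John's representation; the xs:date types and the optional end date are our assumptions, not requirements from John:

```xml
<!-- Hypothetical definition: the element types and the optional <end> are assumptions -->
<xs:complexType name="JobDetail">
  <xs:sequence>
    <xs:element name="employer" type="xs:string"/>
    <xs:element name="start" type="xs:date"/>
    <!-- optional, assuming a current job has no end date yet -->
    <xs:element name="end" type="xs:date" minOccurs="0"/>
    <xs:element name="designation" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
```

PassportDetail and EducationDetail would follow the same pattern.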

Looks good, says John. Wait a sec, what about my job history? We need a representation for previous jobs as well.
Hmm, this should be a trivial change (and the XML Schema WG is smiling!!). So, what options do we have for changing the schema? Let us first understand John's requirements better and generalize a little, to avoid rework. What we need in the representation:

A person with:
- first name as a required element
- last name as a required element
- age as a required element
- passport details as an optional element
- education details as an optional element
- job details as an optional element that can occur multiple times

Since the job element can occur multiple times, we cannot use 'all' (it allows no more than one occurrence of each element). So the current schema definition needs a different order indicator, and the possible options are:

sequence - elements must occur in a specific order.

<xs:element name="person">
<xs:complexType> <xs:sequence>
<xs:element name="firstname" type="xs:string" minOccurs="1" maxOccurs="1" />
<xs:element name="lastname" type="xs:string" minOccurs="1" maxOccurs="1" />
<xs:element name="age" type="xs:integer" minOccurs="1" maxOccurs="1"/>
<xs:element name="passportdetails" type="PassportDetail" minOccurs="0" maxOccurs="1" />
<xs:element name="education" type="EducationDetail" minOccurs="0" />
<xs:element name="job" type="JobDetail" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
This works, but 'sequence' forces a fixed element order, a restriction our use case does not need.
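For example, the 'sequence' schema above rejects the following instance solely because <education> comes after <job>, an ordering John would consider perfectly fine (the ... placeholders stand for valid content):

```xml
<!-- Rejected by the 'sequence' schema: education must precede job -->
<person>
  <firstname>John</firstname>
  <lastname>Doe</lastname>
  <age>40</age>
  <job>...</job>
  <education>...</education>
</person>
```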

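choice - exactly one of the child elements occurs each time the choice is used; with maxOccurs="unbounded", the choice repeats, so the children can appear in any order and any number of times.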

<xs:element name="person">
<xs:complexType>
<xs:choice minOccurs="0" maxOccurs="unbounded" >
<xs:element name="firstname" type="xs:string" />
<xs:element name="lastname" type="xs:string" />
<xs:element name="age" type="xs:integer" />
<xs:element name="passportdetails" type="PassportDetail" />
<xs:element name="education" type="EducationDetail" />
<xs:element name="job" type="JobDetail" />
</xs:choice>
</xs:complexType>
</xs:element>


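The elements defined within 'choice' are not alternatives to each other, so using choice here is not logical. Worse, a repeating choice drops our occurrence constraints: required elements may be missing and single-occurrence elements may repeat.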
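To make this concrete, the 'choice' schema above accepts the following instance, although it has no lastname and two ages (an empty <person/> would be accepted too):

```xml
<!-- Accepted by the 'choice' schema despite breaking John's requirements -->
<person>
  <age>40</age>
  <firstname>John</firstname>
  <age>41</age>
</person>
```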

Ideally we would like to use the 'all' order indicator, with the occurrence indicator 'maxOccurs' for the 'job' element set to "unbounded". This, however, is not an option (the XSD WG needs to be notified).
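That holds for XML Schema 1.0. For reference, XML Schema 1.1, which became a W3C Recommendation in 2012 (after this post was written), relaxes the rule and allows maxOccurs greater than 1 on elements inside 'all'. A sketch of the "ideal" schema under that assumption; only an XSD 1.1 processor will accept it:

```xml
<!-- XSD 1.1 only: XSD 1.0 processors reject maxOccurs > 1 inside xs:all -->
<xs:element name="person">
  <xs:complexType>
    <xs:all>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
      <xs:element name="age" type="xs:integer"/>
      <xs:element name="passportdetails" type="PassportDetail" minOccurs="0"/>
      <xs:element name="education" type="EducationDetail" minOccurs="0"/>
      <xs:element name="job" type="JobDetail" minOccurs="0" maxOccurs="unbounded"/>
    </xs:all>
  </xs:complexType>
</xs:element>
```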

So, what options do we have for serving Mr Doe? There are two, both of which require us to revisit the XML representation:

- Use a group element - xs:group lets us define a named, reusable group of related elements.
- Use wrapper elements - in case we do not want to learn one more thing about XSD, this approach also accomplishes the task.
The XML representation is similar with either approach:

<person>
<firstname>John</firstname>
<lastname>Doe</lastname>
<age>40</age> <!--Bad design-->
<passportdetails>...</passportdetails>
<education></education>
<jobs>
<job>
<employer></employer>
<start></start>
<end></end>
<designation></designation>
</job>
<job> ... </job>
</jobs>
</person>
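A minimal sketch of a schema that accepts the representation above, with the person type back on 'all'. The jobs wrapper occurs at most once, so 'all' is allowed again, and the repetition moves into the wrapper's content. The group name JobHistory is hypothetical; both declarations sit at the top level of the schema:

```xml
<!-- Hypothetical named group holding one or more job elements -->
<xs:group name="JobHistory">
  <xs:sequence>
    <xs:element name="job" type="JobDetail" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:group>

<xs:element name="person">
  <xs:complexType>
    <xs:all>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
      <xs:element name="age" type="xs:integer"/>
      <xs:element name="passportdetails" type="PassportDetail" minOccurs="0"/>
      <xs:element name="education" type="EducationDetail" minOccurs="0"/>
      <!-- wrapper occurs at most once, which 'all' permits -->
      <xs:element name="jobs" minOccurs="0">
        <xs:complexType>
          <xs:group ref="JobHistory"/>
        </xs:complexType>
      </xs:element>
    </xs:all>
  </xs:complexType>
</xs:element>
```

For the wrapper-only approach, replace <xs:group ref="JobHistory"/> with the inline <xs:sequence> from the group definition; the instance document stays the same.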

To summarize: if an element contains a combination of required and optional elements, some of which can occur only once while others can occur multiple times, do not be tempted to use choice; use groups (or wrapper elements) instead.

Q1: What is John Doe's age?

Ans: The first known usage of the term "John Doe" dates back to 1659, so Mr Doe is 351 years old.

Q2: Why is age in this representation a bad design?

Ans: Age changes every year, so a stored age goes stale. We should use absolute information instead: the date (and time) of birth, with timezone.
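A minimal sketch of that idea, using a hypothetical dateofbirth element in place of age. Note that xs:dateTime accepts, but does not require, a timezone offset; enforcing one would need an extra pattern facet (or, in XSD 1.1, the xs:dateTimeStamp type). The value shown is only an example:

```xml
<!-- Schema: hypothetical replacement for the age element -->
<xs:element name="dateofbirth" type="xs:dateTime"/>

<!-- Instance: date and time of birth with an explicit -05:00 offset -->
<dateofbirth>1970-01-01T08:30:00-05:00</dateofbirth>
```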