Specializing DITA – the Relax(i)NG way

It takes a specialized tool to make your work most effective. DITA offers technical authors what a fine screwdriver offers a handyman: An instrument specifically designed for the job.

Text by Jang F.M. Graat


Image: © Feverpitched/istockphoto.com

When someone asks me how many elements there are in DITA, I answer "too many and too few". This is because DITA, like other XML standards, defines its elements semantically. And how many semantic tags do you need to describe the content in your content domain?

The DITA standard contains the most used elements across most of the current content domains where it is being applied. But your content does not require all of those elements. At the same time, there is always something very special about your domain that others do not need at all.

DITA was designed for just such a diverse universe of content domains. And because of its design, it makes little sense to use DITA without customizing it. Using DITA without customizing it means missing what DITA is all about. This article aims to take away your misconceptions and, most of all, your fears about customizing DITA. The first article (which appeared in the March issue of this magazine [1]) was about reducing the available elements. This follow-up article is about adding your own semantic tags to DITA without breaking away from the standard. Together, these two articles show you all you need to make DITA fit your content like a glove.

Specialization – adding your own elements

Some jobs, although they can be done with a stick, a brick and a piece of rope, are much easier when you have special tools for them. In that sense, a screwdriver is a specialized stick, a hammer a specialized brick, and a chain a specialized rope. And there is no end to specialization. When you walk into a home improvement shop or a garage, you get the idea.

Specialist (or specialized) tools make your work more effective. This is true for authoring content as well. Technical authors have moved to using XML because of this: Instead of just making every other word bold or italic, it is much more informative to mark up some content as <term> or <apiname> or <keyword>. It also allows much better indexing and other processing of your content. Most of the elements in DITA are these special cases of more generic elements.

As an example, I will create semantics for measurement units [2]. With special tags for <length>, <weight>, <pressure> etc., automatic localization becomes a possibility. This article shows how the <length> element can be added to DITA using Relax NG.

The steps involved in specializing DITA are:

  1. Create a new file to hold your specialized elements
  2. Copy existing element definitions to the new file
  3. Rename the copied element patterns (use unique names)
  4. Constrain the content models of the new elements (optional)
  5. Edit the @class to allow generalization by DITA processing tools
  6. Inject the new element as alternative for the base
  7. Include the new file in your document shells

 

Step 1: Create a new file

It is advisable to place your specialization file(s) in a new directory within the DITA 1.3 plugin for the DITA Open Toolkit. Make the new directory a sibling to the existing ones, using the same naming conventions. For the measurement units specialization, I am going to create the file:

[DITA-OT dir]/org.oasis-open.dita.v1_3/rng/units/rng/unitsDomain.rng
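
A new domain file can start out as an empty Relax NG grammar. The following is a minimal sketch, assuming the namespace declarations that the standard DITA 1.3 Relax NG modules use; the comments only mark where the patterns from the later steps would go.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
         xmlns:dita="http://dita.oasis-open.org/architecture/2005/">

      <!-- Domain extension patterns (step 6) go here -->

      <!-- Element patterns for length, amount and unit (steps 2 to 5) go here -->

</grammar>
```

The "a" prefix is needed for the a:defaultValue annotations and the "dita" prefix for the dita:longName annotations that appear in the patterns of the following steps.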
 

Step 2: Copy existing element definitions

Basically, specializing an element means copying an existing element definition, changing its name to a unique name, and optionally constraining its content model. This allows DITA processing tools to generalize your specialized element back to its ancestor and not choke on it. This means that you need to pick the right base element for your specialization. It needs to have at least all the child elements that you need in your specialized content model. Also, your base element must be valid wherever you want to make your new element valid.

The new <length> will have mandatory <amount> and <unit> children, each of which will only have text. As I want to have <length> become valid anywhere in my topics, I am basing it on the <ph>. Both <amount> and <unit> will be based on <ph> as well. In my unitsDomain file, I will therefore create three copies of the <ph> element definitions.

Step 3: Rename the patterns

In the copies of the <ph> patterns, I replace each occurrence of "ph" with "length" (and similarly for "amount" and "unit"). Showing only some patterns for <length>, the result is:

      <define name="length">
            <ref name="length.element"/>
      </define>

      <define name="length.element">
            <element name="length" dita:longName="Length">
                  <ref name="length.attlist"/>
                  <ref name="length.content"/>
            </element>
      </define>

      <define name="length.attlist" combine="interleave">
            <ref name="length.attributes"/>
      </define>

      <define name="length.content">
            <zeroOrMore>
                  <choice>
                        <ref name="ph.cnt"/>
                  </choice>
            </zeroOrMore>
      </define>

You can guess the remaining pattern changes from the above examples. Note that the content model – which references a "ph.cnt" pattern – is not changed yet. This is done in the next step.
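
For illustration, the same renaming applied to the <amount> copy might look as follows. This is a sketch obtained by plain substitution of "amount" for "ph"; the dita:longName value "Amount" is my assumption, and the patterns for <unit> would follow the same scheme.

```xml
<define name="amount">
      <ref name="amount.element"/>
</define>

<define name="amount.element">
      <element name="amount" dita:longName="Amount">
            <ref name="amount.attlist"/>
            <ref name="amount.content"/>
      </element>
</define>

<define name="amount.attlist" combine="interleave">
      <ref name="amount.attributes"/>
</define>

<!-- Content model still identical to the ph base at this point -->
<define name="amount.content">
      <zeroOrMore>
            <choice>
                  <ref name="ph.cnt"/>
            </choice>
      </zeroOrMore>
</define>
```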

Step 4: Constrain the content models

The new element should not allow everything that the <ph> base allows. I want to constrain the content model of <length> to only allow my new <amount> and <unit> elements (both mandatory). Note: Instead of a <unit> child element, I could have chosen an attribute. The advantage of an attribute is the option to limit accepted values, but rendering would require added code to make the unit appear in the output.

Each of these elements in turn only allows text. This means editing the content models as follows:

      <define name="length.content">
            <ref name="amount"/>
            <ref name="unit"/>
      </define>

      <define name="amount.content">
            <text/>
      </define>

      <define name="unit.content">
            <text/>
      </define>

To define the attributes allowed on <length>, I need to change the model for the attlist. I am going to remove some of the unwanted attributes and add a mandatory attribute for the measurement system, with a limited set of values to choose from. (Note: I could have chosen other values, such as "metric", "imperial" and "us_custom", or any other set of defining values. The values chosen here are convenient because they already appear in locales.)

      <define name="length.attributes">
            <ref name="univ-atts"/>
            <attribute name="units">
                  <choice>
                        <value>EU</value>
                        <value>UK</value>
                        <value>US</value>
                  </choice>
            </attribute>
      </define>

The "univ-atts" pattern makes sure the attributes for the conref mechanism are kept on <length>. This allows me to reuse a <length> throughout my content. As I do not want to have the conref done on the <amount> and <unit> children, I will edit the attributes model for those elements to remove the "univ-atts" pattern. If you only want to have part of the "univ-atts" pattern included, you will have to add individual attributes from that pattern to your model.
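
As a sketch of what the attribute models for the children could look like once "univ-atts" is removed: which attributes remain is an assumption on my part. Here, @outputclass is kept on both elements, and <unit> gets @xml:lang added back individually as an example of taking a single attribute from "univ-atts".

```xml
<!-- amount: no univ-atts, so no id or conref attributes on this element -->
<define name="amount.attributes">
      <optional>
            <attribute name="outputclass"/>
      </optional>
</define>

<!-- unit: no univ-atts either, but one attribute from it is added back -->
<define name="unit.attributes">
      <optional>
            <attribute name="xml:lang"/>
      </optional>
      <optional>
            <attribute name="outputclass"/>
      </optional>
</define>
```

Any attribute kept on a specialized element must also be allowed on its base, which is the case for both attributes shown here on <ph>.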

Step 5: Edit the class attribute

The magic of DITA (allowing you to add new elements without breaking the rules or tools) lies in the class attribute, which points to the ancestry of a specialized element. In the case of <length>, the value for @class lists a + sign (for domain specialization), the domain plus element name of the ancestor and the domain plus element name of the specialization.

      <define name="length.attlist" combine="interleave">
            <ref name="global-atts"/>
            <optional>
                  <attribute name="class"
                              a:defaultValue="+ topic/ph units-d/length "/>
            </optional>
      </define>
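
The <amount> and <unit> children need a class value of their own. A sketch following the same scheme, with <ph> as the ancestor and "units-d" as the domain name; the trailing space inside each value follows the DITA convention for class values, so that processors can match whole tokens.

```xml
<define name="amount.attlist" combine="interleave">
      <ref name="global-atts"/>
      <optional>
            <attribute name="class"
                       a:defaultValue="+ topic/ph units-d/amount "/>
      </optional>
</define>

<define name="unit.attlist" combine="interleave">
      <ref name="global-atts"/>
      <optional>
            <attribute name="class"
                       a:defaultValue="+ topic/ph units-d/unit "/>
      </optional>
</define>
```

A processing tool that does not know the units domain can read "topic/ph" from these values and treat all three elements as plain <ph> elements.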

Step 6: Inject the specialization

At this point, the specialized elements are completely defined, but there is no way to enter them into the DITA content. Before I can start using the specializations, I have to extend the definition of the base element, so that the specialized elements appear as alternatives for the base wherever that base is valid. This is done in two patterns that are usually added at the top of the specialization file:

      <define name="unit-d-ph">
            <ref name="length"/>
      </define>

      <define name="ph" combine="choice">
            <ref name="unit-d-ph"/>
      </define>

The first pattern can be extended to allow more specializations in this domain (such as <pressure>, <weight> etc.) – in this case, all the alternatives must be wrapped in a <choice>. The second pattern activates the specialization. Wherever <ph> is valid in the existing content models, the alternatives from the new units domain become valid, too.
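
A sketch of the first pattern extended in that way, assuming that "weight" and "pressure" patterns have been defined in the same domain file along the lines of "length":

```xml
<define name="unit-d-ph">
      <choice>
            <ref name="length"/>
            <ref name="weight"/>
            <ref name="pressure"/>
      </choice>
</define>
```

The second pattern stays as it is: it references the whole group, so every alternative listed in the <choice> becomes valid wherever <ph> is valid.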

Step 7: Include the specialization

Before the new specialization can be used in actual DITA content, it has to be included in a so-called document shell. This is the starting point for validation of XML content. Each topic type for which you want the specialization to become available needs to reference the new domain file:

      <include href="../../units/rng/unitsDomain.rng"/>

For completeness, the @domains of the root element for the topic should also be extended with a reference to the new domain:

      <attribute name="domains"
            a:defaultValue="[...]
                            (topic units-d)"/>

However, to my knowledge, none of the current DITA processing tools actually interprets @domains, so this step is not necessary to make the specialization work.

Using the specialization

Once the document shell is extended with the new units domain (and assuming your XML editor allows direct validation from Relax NG files or you have transformed the Relax NG files into DTDs for validation), you can start using the new <length> wherever a <ph> is valid in your content. As the @units on <length> as well as its children <amount> and <unit> are mandatory, your editor will complain until you have the markup completed, as in this example:

<?xml version="1.0" encoding="UTF-8"?>
<concept id="TestUnits">
      <title>Testing length specialization</title>
      <conbody>
           <p>This is a test for measurement units</p>
           <p>The wall is <length units="UK">
                       <amount>10</amount><unit>ft</unit></length>
            long and <length units="UK">
                  <amount>5</amount><unit>ft</unit></length> high.
           </p>
      </conbody>
</concept>

Apart from forcing correct and complete markup, this specialization also allows automatic localization of the content. An XSL transform can change a <length> from the imperial (UK) to the metric (EU) system: for a value in feet, multiply the amount by 0.3048 and change the <unit> to “m”. Of course, rounding has to be added to avoid long runs of decimals, but the technique can be fully automated based on the specialized markup.
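
A minimal sketch of such a transform in XSLT 1.0. It makes a few assumptions: it only handles lengths given in feet, it matches on the element names rather than on @class (which a production DITA transform would normally use), and it rounds to two decimals. Everything else is copied unchanged.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- Identity template: copy everything that is not handled below -->
      <xsl:template match="@*|node()">
            <xsl:copy>
                  <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
      </xsl:template>

      <!-- Lengths in feet: convert to metres, rounded to two decimals -->
      <xsl:template match="length[@units='UK'][normalize-space(unit)='ft']">
            <length units="EU">
                  <!-- Keep all other attributes of the original element -->
                  <xsl:copy-of select="@*[name() != 'units']"/>
                  <amount>
                        <xsl:value-of select="format-number(amount * 0.3048, '0.##')"/>
                  </amount>
                  <unit>m</unit>
            </length>
      </xsl:template>

</xsl:stylesheet>
```

Applied to the test topic above, the two lengths of 10 ft and 5 ft should come out as roughly 3.05 m and 1.52 m. Other units, such as inches or yards, would each need their own template and conversion factor.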

Specializing attributes

In the above example, every occurrence of <length> requires entering the measurement system in its mandatory @units. Usually the entire topic, or at least a section in the topic, will be using the same measurement system for all its <length> elements. This prompts me to create a specialized attribute, which can then be made available on a large number of base elements at the same time.

Each specialized attribute must be defined in a separate file. The naming conventions use the new attribute name followed by "AttDomain". This file must be included in the document shell to make the attribute available.

To enable additional attributes on base elements, two extension points are included in the standard DITA files: These allow extending the @base and the @props attributes. The @props is meant to allow conditional content. If you wanted to filter your content on a specialized attribute, you would normally choose to extend @props. In most other cases, you would choose to extend @base.

For the @units specialization, I am using the predefined extension point for @base. This makes the new @units available on all elements that include @base. The easiest way to create the new attribute is copying an existing attribute specialization file. I have copied the file “deliveryTargetAttDomain.rng” from the “base/rng” subdirectory to the file “unitsAttDomain.rng” in the “units/rng” subdirectory. The edits I need to make to this file are the following:

      <define name="unitsAtt-d-attribute">
            <optional>
                  <attribute name="units">
                        <choice>
                              <value>EU</value>
                              <value>UK</value>
                              <value>US</value>
                        </choice>
                  </attribute>
            </optional>
      </define>

      <define name="base-attribute-extensions" combine="interleave">
            <ref name="unitsAtt-d-attribute"/>
      </define>

Including this file in the document shell makes the @units available on every element that has @base included in its attribute list.
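
A sketch of how content might look with the measurement system set once on a section. This assumes that the separate mandatory @units declaration has been removed from the "length.attributes" pattern, so that <length> gets the attribute only through the @base extension and no longer requires it.

```xml
<section units="EU">
      <title>Dimensions</title>
      <p>The wall is <length><amount>3</amount><unit>m</unit></length>
      long and <length><amount>1.5</amount><unit>m</unit></length> high.</p>
</section>
```

A transform can then find the measurement system that applies to a given <length> by looking for the nearest element that carries the attribute, for example with the XPath expression ancestor-or-self::*[@units][1]/@units.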

Closing remarks

With the change to Relax NG for the definition of the standard DITA files, customizing DITA has become a very easy and straightforward job. The main part of the specializing job is choosing the base element to specialize from: Its content model has to accommodate at least all of the elements you need.

One of the key ingredients in creating good specializations is naming: Choose names that are easy to remember for the authors. At the same time, the element names must be unique and, if you are planning to pass your specializations to others outside your own organization, you should plan the names carefully. Otherwise, another specialization might cause naming conflicts with yours and limit the options to use multiple specializations in the same set of topics.

And finally, you should really think your specialization needs through before putting them into practice: Try to look further than your immediate requirements and see if you have included all conceivable options in your content model. Once your specializations are used in actual content, it becomes harder to change the model. Getting it right the first time is worth the extra effort of imagining all the possible use scenarios before you nail down the naming and content model of your specialized DITA elements. If you want a geek philosopher’s advice, drop me an email.

References

[1] www.tcworld.info/e-magazine/technical-communication/article/customizing-dita-the-relaxing-way

[2] I will present a more elaborate measurement units domain at this year’s DITA Europe conference in Brussels. That presentation will also show an automatic transform of values from one measurement system into another.