About XML modes, XML formats and XML types

Build 1501 on 14/Nov/2017  This topic last edited on: 21/Mar/2016, at 18:34

The ‘story’ object type used for archived and ‘wire’ texts in GN4 and Tark4 contains the actual text in the xmlText attribute. This text can be in different XML formats, which are described by xmlFormat objects referenced in the stories by the xmlFormatRef attribute. So, for example, story texts can be in (X)HTML:

<story . . .>

  . . .

  <title>Google Denies Media Reports on Closure of China Site</title>

  . . .

  <xmlText>

    <div xmlns="">

      <span>Google Denies Media Reports on Closure of China Site</span>

      <span>, Office</span>

    </div>

    . . .

    <div xmlns="">

      <span>The story was not accessible on the </span>

      <span>ccidcom.com’s</span>

      <span> Web site this morning. </span>

    </div>

  </xmlText>

  <xmlFormatRef objectType="xmlFormat">

    <keyVal>XHTML</keyVal>

  </xmlFormatRef>

</story>

where the ‘XHTML’ XML format object is defined like this:

  <xmlFormat

    name="XHTML"

    namespaceUri="http://www.w3.org/1999/xhtml"

    isHtml="true"

    wordBreakTags="">

    <scopeRef>

      <keyVal>Default</keyVal>

    </scopeRef>

    <description>XHTML</description>

  </xmlFormat>

(This is similar to the system used in the old Tark, where the text is stored in the Txt column of TxtTable and its format is specified by the TypeId column.)

The story XML text must be converted into various formats depending on its use: to HTML to send to a Web site or preview in a browser, to ‘t’ GN4 typographical markup to create an article, and so on. These conversions are done with XSL transformations that depend on both the XML text format and the desired destination format. What follows is a description of what has been implemented to standardize this system.

xmlMode

The ‘xmlMode’ object type is defined in GN4Archive.xsd (so it is specific to Tark4 and GN4). Objects of this type are the different possible ‘rendering modes’ – i.e. outputs, or destinations, for XML text. They are basically just a name and a description identifying a rendering mode that the system ‘knows’ about. Here is the definition of two standard rendering modes for HTML and GN4 typographical markup outputs:

  <xmlMode name="HTML">

    <scopeRef>

      <keyVal>Default</keyVal>

    </scopeRef>

    <description>Rendering as HTML</description>

  </xmlMode>

 

  <xmlMode name="t">

    <scopeRef>

      <keyVal>Default</keyVal>

    </scopeRef>

    <description>Rendering as GN4 typographical markup</description>

  </xmlMode>

(they are in Config\Global\GN4_Tark4_Common\Tark4Data.xml).

‘renders’ attribute

The xmlFormat object type has the attribute ‘renders’, which is a multi-reference to xmlMode objects. Each reference has an extra attribute ‘xslt’ that contains the XSL transformation used to convert XML texts in that format to the referenced rendering mode – e.g.:

<xmlFormat

  name="XHTML"

  namespaceUri="http://www.w3.org/1999/xhtml"

  isHtml="true"

  wordBreakTags="">

  <scopeRef>

    <keyVal>Default</keyVal>

  </scopeRef>

  <description>XHTML</description>

  <renders>

    <ref>

      <keyVal>t</keyVal>

      <xslt>

        <xsl:stylesheet . . .>

 

         . . . XSLT converting XHTML texts into ‘t’ GN4 typographical markup . . 

       </xsl:stylesheet>

      </xslt>

    </ref>

    <ref>

      <keyVal>zot</keyVal>

      <xslt>

        <xsl:stylesheet . . .>

 

         . . . XSLT converting XHTML texts into the ‘zot’ rendering mode . . 

       </xsl:stylesheet>

      </xslt>

    </ref>

    . . .

  </renders>

</xmlFormat>

(This system corresponds roughly to the StyleSheetsTable of Tark and to the import filter and display filter used in GN3 Wires.)
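
As an illustration of how the ‘renders’ multi-reference can be read, here is a minimal Python sketch that looks up the XSLT for a given rendering mode in an xmlFormat definition. It is not GN4 code: the sample XML is a simplified stand-in (no namespaces, and the hypothetical stylesheet elements with an id attribute replace the real xsl:stylesheet contents), and the function name find_xslt is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for an xmlFormat object (assumption: no namespaces,
# placeholder <stylesheet> elements instead of real XSLT).
XML_FORMAT = """
<xmlFormat name="XHTML">
  <renders>
    <ref><keyVal>t</keyVal><xslt><stylesheet id="to-t"/></xslt></ref>
    <ref><keyVal>zot</keyVal><xslt><stylesheet id="to-zot"/></xslt></ref>
  </renders>
</xmlFormat>
"""

def find_xslt(xml_format, mode):
    """Return the stylesheet element registered for `mode`, or None."""
    for ref in xml_format.findall("./renders/ref"):
        if ref.findtext("keyVal") == mode:
            xslt = ref.find("xslt")
            # The stylesheet is the first child of <xslt>, if present.
            return xslt[0] if xslt is not None and len(xslt) else None
    return None

fmt = ET.fromstring(XML_FORMAT)
stylesheet = find_xslt(fmt, "t")
```

A mode without a matching reference (for example "HTML" in the sample above) yields None, which corresponds to the case where no suitable XSLT is available for that format.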

‘xmlText’ REST command

The edo.ashx editorial REST interface (see http://forum.teradp.com/topic.asp?TOPIC_ID=646) has a new command ‘xmlText’ that gets the XML text of one or more stories, applying the correct XSLT to render them for a specified mode, e.g.:

  .../edo.ashx?Cmd=XmlText&ids=2369&xmlattr=xmlText&formatAttr=xmlFormatRef&mode=t

outputs the XML contained in the ‘xmlText’ attribute (‘&xmlattr=xmlText’) of the object with id 2369 (‘&ids=2369’), rendering it for the mode ‘t’, i.e. GN4 typographical markup (‘&mode=t’), and getting its format information, including the XSLT to use, from the xmlFormat object referenced by the xmlFormatRef attribute (‘&formatAttr=xmlFormatRef’).
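
For scripted access, the query string can be assembled programmatically. The following Python sketch only builds the URL shown above; the base address "http://server/gn4/edo.ashx" and the function name xml_text_url are hypothetical, and authentication is not covered.

```python
from urllib.parse import urlencode

def xml_text_url(base_url, ids, mode=None,
                 xml_attr="xmlText", format_attr="xmlFormatRef"):
    """Build an xmlText command URL from the parameters described above."""
    params = {"Cmd": "XmlText", "ids": ids, "xmlattr": xml_attr}
    if format_attr:
        params["formatAttr"] = format_attr
    if mode:
        params["mode"] = mode
    return base_url + "?" + urlencode(params)

# Hypothetical server address, used only for illustration.
url = xml_text_url("http://server/gn4/edo.ashx", "2369", mode="t")
```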

This command works for ANY object that contains XML text and that references an xmlFormat object specifying the format of this XML. This is the case for the ‘story’ object in the standard schema, but it can apply to other object types in different schemas that follow the same structure.

It is possible to specify multiple objects; in that case the result is obtained by concatenating the individual XML texts under a single root element taken from the first object – e.g. if the XML text of two objects is:

  <root>

    <a></a>

    <b></b>

  </root>

and:

  <xml>

    <c></c>

    <d></d>

  </xml>

The output is:

  <root>

    <a></a>

    <b></b>

    <c></c>

    <d></d>

  </root>
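
The concatenation rule can be mimicked in a few lines of Python. This is only a sketch of the behaviour described above, not the server implementation; the function name concatenate_under_first_root is an assumption.

```python
import xml.etree.ElementTree as ET

def concatenate_under_first_root(xml_texts):
    """Append the children of every later document to the first root."""
    roots = [ET.fromstring(text) for text in xml_texts]
    merged = roots[0]
    for other in roots[1:]:
        merged.extend(list(other))
    return merged

merged = concatenate_under_first_root([
    "<root><a></a><b></b></root>",
    "<xml><c></c><d></d></xml>",
])
child_tags = [child.tag for child in merged]
```

Note that the root element of the second document (here xml) is dropped; only its children survive in the result.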

If either the format attribute or the mode is not specified in the URL, the command does not apply any XSL transformation and returns just the original content of the XML text attribute.

Drag&drop of a story onto an article

The story drag&drop script uses the xmlText command detailed above and the InsTedXml() scripting command (see http://forum.teradp.com/topic.asp?TOPIC_ID=633) to convert the text of all the stories into ‘t’ GN4 typographical markup and then insert it, so that the original formatting (in whichever format) is preserved as much as possible (provided, of course, that a suitable XSLT is available in the xmlFormat object of the stories being dropped).

xmlFormat

An example of an xmlFormat object can be found in config\Data\GN4\xmlFormat_XHTML.xml.

xmlText

The 'xmlText' command has a couple of additional parameters:

formatName: specifies directly the name of the xmlFormat object to use - e.g.:

  .../edo.ashx?Cmd=XmlText&ids=2369&xmlattr=xmlText&formatName=NITF&mode=t

renders the xmlText attribute of the object with id 2369 as if its format were 'NITF'. Note that when an xmlFormat name is specified explicitly in this way, the formatAttr option is ignored.

pars: specifies named parameters that are passed 'as is' to the XSL transformations used to render the XML.
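
The precedence between formatName and formatAttr can be summarised with a small Python sketch. The function name resolve_format and its arguments are hypothetical. The sketch also assumes that a missing mode disables the transformation regardless of how the format was given, which the text states only for the format attribute.

```python
def resolve_format(mode, format_name=None, referenced_format=None):
    """Return the name of the xmlFormat to render with, or None for raw output.

    referenced_format is the name of the xmlFormat object reached through
    formatAttr (None when formatAttr is not given in the URL).
    """
    if not mode:
        # Assumption: without a mode no XSL transformation is applied.
        return None
    if format_name:
        # An explicit formatName wins; formatAttr is ignored.
        return format_name
    return referenced_format

chosen = resolve_format("t", format_name="NITF", referenced_format="XHTML")
```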

XSLT to convert

The XSLT that converts from XHTML to 't' markup accepts various forms of XHTML and supports the same kind of character replacements that were done with ITB in GN3 to 'clean up' wire texts - e.g. replacing '--' with an en dash, generating smart quotes etc. These replacements are in a (fairly) straightforward format within the XSLT:

  . . .

  <!-- FIX DOUBLE HYPHENS TO DASHES -->

  <r p="\-\-" r="–"/>

  <!--  REMOVE ERRANT CHARACTER 160 -->

  <r p="\#160;" r=" "/>

  <!-- CHANGE 2 SINGLE QUOTES to DOUBLE QUOTE -->

  <r p="\'\'" r="&quot;"/>

  <!--  CHANGE SPACE SINGLE QUOTE to SPACE OPEN SINGLE QUOTE -->

  <r p="\ \'" r=" ‘"/>

  <!--  FIX QUOTE AFTER EXCLAMATION -->

  <r p="\!\&quot;" r="!”"/>

  . . . 

The value of the 'p' attribute is a regular expression; the value of the 'r' attribute is the corresponding replacement string.
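
To show the effect of such pattern/replacement pairs, here is a minimal Python sketch. It assumes the rules are applied one after another in document order, which the text does not state, and it uses Python's re module as a stand-in for whatever regular expression engine the XSLT actually relies on. The names RULES and apply_rules are made up for this example.

```python
import re

# (pattern, replacement) pairs taken from the sample rules above.
RULES = [
    (r"\-\-", "–"),    # double hyphen to en dash
    (r"\'\'", '"'),    # two single quotes to a double quote
    (r"\ \'", " ‘"),   # space + single quote to space + open single quote
    (r'\!\"', "!”"),   # quote after exclamation mark
]

def apply_rules(text, rules=RULES):
    """Apply every rule in order (assumed behaviour, see the note above)."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

cleaned = apply_rules("a -- b")
```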

It also handles HTML structured in various ways - e.g.:

   . . . first paragraph. . . .

   <br/>

   . . . second paragraph . . .

   <br/>

   . . . 

or:

   <p>. . . first paragraph. . . .</p>

   <p> . . . second paragraph . . .</p>

   . . . 

or:

   <div>. . . first paragraph. . . .</div>

   <div> . . . second paragraph . . .</div>

   . . . 

or:

   <html>

     . . . 

     <body>

       <p>. . . first paragraph. . . .</p>

       <p> . . . second paragraph . . .</p>

       . . . 

     </body>

   </html>

etc. Furthermore, HTML tags can be lowercase, uppercase or mixed-case, and they can be in the XHTML namespace (http://www.w3.org/1999/xhtml) or in no namespace. The XSLT handles the cases in the examples above, but with ad-hoc tricks, and there are many cases that are not handled correctly (e.g. <p> inside <div>).
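
To make the tag-case and namespace variants concrete, the following Python sketch reduces two of them to the same form. It is an illustration of the problem only, not part of GN4 and not what the XSLT itself does; the function name normalise_tags is an assumption.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

def normalise_tags(root):
    """Strip the XHTML namespace and lowercase every element name."""
    for element in root.iter():
        tag = element.tag
        if tag.startswith(XHTML_NS):
            tag = tag[len(XHTML_NS):]
        element.tag = tag.lower()
    return root

namespaced = normalise_tags(ET.fromstring(
    '<DIV xmlns="http://www.w3.org/1999/xhtml"><P>one</P></DIV>'))
plain = normalise_tags(ET.fromstring("<div><p>one</p></div>"))
```

After normalisation both inputs serialise identically, so a single set of templates could match them.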

This XSLT is in the definition of the 'XHTML' xmlFormat object, which is in a separate file:

Data\GN4\xmlFormat_XHTML.xml

You may also want to reload Global/GN4/xsl_AssetToEditorial.xml.

'transformXmlText' REST command

The edo.ashx editorial REST interface (see http://forum.teradp.com/topic.asp?TOPIC_ID=646) has a command ‘transformXmlText’ that receives POSTed XML, applies the correct XSLT to render it for a specified mode and returns the result, e.g. posting the XML text of a story to:

  .../edo.ashx?Cmd=transformXmlText&formatName=xhtml&mode=t

outputs the story XML rendered for the mode 't', assuming that it is in the 'xhtml' format.
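
A client could prepare such a POST as in the Python sketch below. The server address, the function name transform_request and the Content-Type header are assumptions made for illustration; the source does not specify which content type the command expects, and authentication is not covered. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

def transform_request(base_url, xml_text, format_name, mode):
    """Build (but do not send) a POST request for transformXmlText."""
    query = urlencode({
        "Cmd": "transformXmlText",
        "formatName": format_name,
        "mode": mode,
    })
    return Request(
        base_url + "?" + query,
        data=xml_text.encode("utf-8"),
        # Assumed content type; adjust to what the server requires.
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )

# Hypothetical server address, used only for illustration.
request = transform_request(
    "http://server/gn4/edo.ashx", "<div>Hello</div>", "xhtml", "t")
```

Passing the request to urllib.request.urlopen() would send it and return the rendered XML in the response body.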

'transformXmlText' XSLT extension function

The XSLT extension function 'transformXmlText' applies the configurable rendering of XML text from within an XSL transformation, e.g.:

  <xsl:stylesheet

    version="1.0"

    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

    xmlns:edfn="http://www.teradp.com/schemas/GN4/1/EditorialXslt"

    . . .>

 

    . . . 

    <xsl:template match="gn4:story">

       . . .

      <body name="{$artName}">

        <xsl:copy-of select="$artFolderRef"/>

        <xsl:variable name="xmlFormat" select="gn4:xmlFormatRef/nav:refObject/gn4:xmlFormat"/>

        <tText>

          <xsl:copy-of select="edfn:transformXmlText(gn4:xmlText,$xmlFormat/@name,'t',$pars)/node()"/>

        </tText>

      </body>

      . . .

    </xsl:template>

 

    . . .

 

  </xsl:stylesheet>

transforms a 'story' object, creating a 'body' object whose text is obtained by rendering the story text for the 't' output mode, based on the XML format of the story itself.

This is the preferred way to convert XML text in various formats to generate previews or new objects.

XML type

Both the xmlFormat and xmlMode object types have a new (optional) enumeration attribute 'xmlType' with possible values 'html', 'nitf' and 't'.

The XML type is used to group together XML formats that share the same underlying structure - for example, wire stories whose text is all in NITF, but where some contain special tabular data that requires a different format: there will be two formats, both using NITF markup.

Similarly, the XML type is used to group together XML modes that produce the same underlying structure - for example different versions of the 't' GN4 markup with more or less output detail.

On older systems, you may need to re-import your schema, the 'Config\Strings\ArchiveStrings.xml' strings file, and the 'Config\Data\GN4_Tark4_Common\Tark4Data.xml' and 'Config\Data\GN4\xmlFormat_XHTML.xml' data files (in this order - the latter modifies an object created by the former).