
Controlling xml sorting in tidy plugin


  • Controlling xml sorting in tidy plugin

    Can you control the depth level of the sorted XML option in the Tidy plugin, and if so, how? I have some complex XML with several levels of hierarchy. Each tag has a name= attribute. If the plugin sorted the tags by name within each level, BC would be able to correctly show which tags have been moved, removed, or modified.

    Here is a sample xml snippet:

    <?xml version="1.0" encoding="UTF-8"?>
    <recordType name="header" discriminator="FILEHEADER" >
    <field length="200" name="id" conversion="String" loggingRequired="false"/>
    <field length="10" name="messageTotalCount" conversion="Integer" loggingRequired="false"/>
    <field length="15" name="messageTotalDollar" conversion="Money" loggingRequired="false"/>
    <recordType name="operationDetail" discriminator="OPSDTLHDR" >
    <field length="4" name="companyNumber" conversion="Integer" loggingRequired="false"/>
    <field length="8" name="opsArea" conversion="String" loggingRequired="false"/>
    <field length="16" name="operator" conversion="String" loggingRequired="false"/>
    <recordType name="inputSet" discriminator="INPTSTHDR" >
    <field length="10" name="inputSetName" conversion="String" loggingRequired="false"/>
    <field length="50" name="source" conversion="String" loggingRequired="false"/>
    <field length="10" name="processDate" conversion="Date" loggingRequired="false"/>
    <field length="10" name="controlDebitTotalCount" conversion="Integer" loggingRequired="false"/>
    <field length="10" name="controlCreditTotalCount" conversion="Integer" loggingRequired="false"/>
    <recordType name="addUCO" discriminator="UNIFIEDCO" >
    <field length="10" name="chargeOffDefinitionCode" conversion="String" loggingRequired="false"/>
    <recordType name="chargeOff" discriminator="CHARGEOFF" >
    <field length="10" name="purpose" conversion="String" loggingRequired="false"/>
    <field length="10" name="accountStatus" conversion="String" loggingRequired="false"/>
    <field length="1" name="buydownIndicator" conversion="Boolean" loggingRequired="false"/>
    <field length="10" name="closedToPostingCode" conversion="String" loggingRequired="false"/>
    <recordType name="recentHistory" discriminator="--ignore--" >
    <field length="10" name="lastContactedDate" conversion="Date" loggingRequired="false"/>
    <field length="10" name="lastActivityCode" conversion="String" loggingRequired="false"/>
    <field length="10" name="lastActivityDate" conversion="Date" loggingRequired="false"/>
    <field length="100" name="lastDocumentSent" conversion="String" loggingRequired="false"/>
    <field length="10" name="term" conversion="String" loggingRequired="false"/>
    <recordType name="accountUAFAttributes" discriminator="ACCOUNTUAF" >
    <field length="10" name="activityCount1" conversion="Integer" loggingRequired="false"/>

  • #2
    The "Sorted" in the name refers to sorting the attributes, not the nodes themselves. The rule calls the open source project Tidy. You can alter the config or the command-line call to perform node sorting if tidy.exe allows it (the config is in the HTMLTidy folder, and the command-line call is in the Rules->Conversion tab). I have looked into it briefly and did not find a command-line option for sorting nodes. If you find one, please let us know.
    Aaron P Scooter Software


    • #3
      I write a stylesheet to sort the elements appropriately and script the transform into the conversion step using a command-line XSLT tool; I use Saxon.NET. Then I also put the result through the XML Tidy / Sorted attributes step.

      The downside to this approach is that you need to write a stylesheet for each schema that you want to compare. A generic schema-aware comparator would be lovely, but it's a hard thing to write, and even worse, won't always work properly, simply because a lot of schemas just don't express some of the rules inherent in the applications that produce the XML - for example, in some applications, order matters, but the schema may say "group" instead of "sequence". And in a lot of places you won't have a schema. So while this is an extra piece of work, it's a lot more expressive than a GUI app can be and a lot more specific and controllable than a generic schema-aware or schema-inferring compare could be.

      For a stylesheet, start with the identity transform and add templates that match each element whose content you wish to sort, along with the relevant sort elements. As for selecting the stylesheet: you could write a conversion script that detects which one to use, or use different file formats and let BC detect them.
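
      As a rough illustration of that approach, here is a minimal XSLT 1.0 sketch written against the sample in the first post. It assumes that recordType is the only element whose children need sorting, that name= is the sort key, and that putting field elements ahead of nested recordType elements is acceptable; none of that comes from a schema, so adjust it to your own data.

      ```xml
      <?xml version="1.0" encoding="UTF-8"?>
      <xsl:stylesheet version="1.0"
                      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

        <xsl:output method="xml" indent="yes"/>
        <!-- Drop whitespace-only text nodes so re-indentation is consistent. -->
        <xsl:strip-space elements="*"/>

        <!-- Identity transform: copy every node and attribute unchanged. -->
        <xsl:template match="@*|node()">
          <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
          </xsl:copy>
        </xsl:template>

        <!-- Override for recordType (assumed to be the only sorted container). -->
        <xsl:template match="recordType">
          <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <!-- Anything that is not a field or recordType keeps document order. -->
            <xsl:apply-templates select="node()[not(self::field or self::recordType)]"/>
            <!-- Fields first, sorted by their name attribute. -->
            <xsl:apply-templates select="field">
              <xsl:sort select="@name"/>
            </xsl:apply-templates>
            <!-- Then nested record types, also sorted by name. -->
            <xsl:apply-templates select="recordType">
              <xsl:sort select="@name"/>
            </xsl:apply-templates>
          </xsl:copy>
        </xsl:template>

      </xsl:stylesheet>
      ```

      Because the recordType template applies itself recursively, the sort happens at every level of the hierarchy. Attribute order is left to the XML Tidy / Sorted attributes step that follows.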