Odd XML Tidy / Rule Behavior

  • Odd XML Tidy / Rule Behavior

    BC Build 241
    XML/HTML Tidy Rule installed (Ver 1.0.2, rel 10-Sep-2007)

    When batch-comparing 2 folders of XMLs (all files with identically matching names):

    As the picture shows (note the red circles), it appears BC adds (in memory, not to the actual files)
    a matching closing tag to an empty element on only ONE side of the comparison and marks the files as different.

    So in the case above, both files ACTUALLY have the identical tag "<Class_Cd />", but when comparing, BC changes it to "<Class_Cd></Class_Cd >" and raises a difference.

    This does not happen when comparing single files one-to-one.

    Is there a way to fix or work around this behavior?

  • #2
    Re: Odd XML Tidy / Rule Behavior

    Is the concern that Tidy appears to change only one side of the compare and not the other?

    Tidy will restructure both sides, but it should do so equally. Would you be able to send your files to [email protected] along with your BC (Help menu -> Support; Export)?

    Another thing you could try is updating the program to 2.5.2. All minor version upgrades are free (2.x -> 2.x). It seems that you are using a fairly old version of BC2 with a fairly new version of Tidy.

    Tidy itself is not our program, we merely allow its use as an external conversion tool. More information on Tidy documentation can also be found here:
    Aaron P Scooter Software


    • #3
      Re: Odd XML Tidy / Rule Behavior

      Is the concern that Tidy appears to change only one side of the compare and not the other?....
      Yes, that's the PROBLEM.
      I just upgraded to 2.5.2 and still see the same problem.
      Sorry, I cannot send out the files as they're company property, and I don't think that's necessary for you to replicate the issue.

      If you understand the issue I'm trying to point out and given that you're in the file comparison business,
      I'd think it's a piece of cake for you to get hold of ONE test XML and replicate the issue...heck here is one

      Edit the XML:
      Just pick any element and turn '<someelement>blah blah</someelement>' into '<someelement />'

      put x number of edited copies in one folder
      put x number of unedited copies in another folder

      compare them via Tidy rules.....


      • #4
        Re: Odd XML Tidy / Rule Behavior

        So, it isn't that Tidy is changing the file on one side, it is that it is not changing either file.

        The issue is file A has:
        <xml></xml>

        and file B has:
        <xml />

        and they are not being detected as equal?
        I'm afraid we don't make Tidy; it is an open source project that we allow to interface with our application.

        I have been looking through the Tidy documentation (available at )
        and have not found an option for turning one type of empty node into the other. If the option exists, it can be added to the rule and used by altering the command line call, or the tidyConfig.txt that is called.
        Aaron P Scooter Software


        • #5
          False Alarm !

          Aaron, Sorry, my bad

          First -
          I should have said in my 2nd post:
          put x number of the edited copy in one folder_x
          put x number of the SAME edited copy in folder_y

          Second -
          But..this turns out to be "false alarm"...due to my ignorance.

          I printed one set of the XMLs out of Oracle, and I wasn't aware Oracle turns '<xml></xml>' into '<xml />'

          again, sorry about the trouble


          • #6
            Re: False Alarm !

            That's alright. Just so you know, it is possible Tidy has a configuration that will convert all <tag></tag> into <tag /> or vice versa. I was just unable to find the information on their web page.
            Aaron P Scooter Software
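For anyone who wants to check the two empty-element forms outside of Tidy, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not Tidy and not anything BC ships; the function name `normalize_empty` and the `<doc>` wrapper are made up for illustration, and `Class_Cd` is the element from the first post. It relies on the `short_empty_elements` keyword of `ET.tostring`, which picks one form for every empty element on output.

```python
import xml.etree.ElementTree as ET


def normalize_empty(xml_text: str, short: bool = True) -> str:
    """Re-serialize XML so every empty element uses the same form.

    short=True  writes empty elements as <tag />
    short=False writes empty elements as <tag></tag>
    """
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode", short_empty_elements=short)


# Hypothetical sample documents: the same content written two ways.
expanded = "<doc><Class_Cd></Class_Cd></doc>"
collapsed = "<doc><Class_Cd /></doc>"

# After normalizing, a plain text comparison no longer sees a difference.
same_after_normalizing = normalize_empty(expanded) == normalize_empty(collapsed)
print(same_after_normalizing)
```

Running both sides of a comparison through the same normalization is the general idea behind a conversion rule: the files on disk stay untouched, and only the converted text is compared.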


            • #7
              Should still be marking those files as identical.

              Are you sure the XML Tidy rule is actually engaged? Look in the rule dropdown for the check mark. There is also a basic "XML" rule in BC2 which only defines rules and not a conversion.

              Some cursory testing here confirms what I suspected, that Tidy with the config shipped in the archive will use implicit closure (<xml />) for empty elements.

              Regardless of how Oracle is spitting out the XML, Tidy should be rendering the examples you pasted as identical (which they are, semantically speaking). I suspect you're not actually using the "Tidy" rule here. See if it's the uppermost rule for the *.xml extension on your options page.

              The XML Tidy Sorted rule improves comparison further by sorting attributes within elements. For more complex cases where element order varies without changing the sense of the document, I have a conversion rule which runs the document through an XSLT transform to reorder the elements before passing it through XML Tidy Sorted. But this is a one-conversion-per-schema thing, and without a schema-aware XML sorter (and a schema that accurately reflects how element order affects the sense of the document) you can't do it generically. For these cases it can be helpful to cook up a schema-specific file extension so BC can select the correct transform (one rule per transform).
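As a rough stand-in for what attribute sorting buys you, the sketch below uses `xml.etree.ElementTree.canonicalize` from the Python standard library (Python 3.8 or later). This is an assumption-laden illustration, not how the XML Tidy Sorted rule is implemented: canonical XML (C14N) happens to write attributes in sorted order and empty elements in a single form, so two documents that differ only in those respects come out as identical text. The helper name `canonical` and the sample `<item>` documents are hypothetical.

```python
from xml.etree.ElementTree import canonicalize


def canonical(xml_text: str) -> str:
    """Return the C14N form of xml_text, ignoring whitespace-only text."""
    return canonicalize(xml_text, strip_text=True)


# Hypothetical samples: attribute order, indentation and the
# empty-element form differ, but the content is the same.
left = '<item b="2" a="1"><Class_Cd /></item>'
right = '<item a="1" b="2">\n  <Class_Cd></Class_Cd>\n</item>'
print(canonical(left) == canonical(right))

# Element order is NOT normalized, which is why reordering
# elements needs a schema-specific transform as described above.
reordered = '<item a="1" b="2"><Other /><Class_Cd /></item>'
original = '<item a="1" b="2"><Class_Cd /><Other /></item>'
print(canonical(reordered) == canonical(original))
```

The second comparison comes out unequal on purpose: a generic canonicalizer cannot know whether swapping two elements changes the meaning of the document, so that step stays schema-specific.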