Best file format for html with other stuff?

    I have html pages that I would like to compare revision control copies against sandbox copies against production copies. They all contain the same javascript, and the javascript includes strings containing reserved characters in HTML (the javascript composes a hyperlink tag, for example).

    In addition, all of the pages contain <PARAM> tags, each of which has a different VALUE attribute containing XML data, enclosed in single quotes. The XML data contains (not surprisingly) XML reserved characters, including "<" and ">" and so forth, which are also reserved in HTML.

    I've done comparisons using some of the HTML file formats and some of the XML formats (I can no longer remember which ones), and they seem to mangle the javascript and the VALUE attribute to varying degrees *behind the scenes*. That is, when I do the comparison, everything looks fine. But if I copy from one side to the other and then save the changes, the underlying file format algorithm escapes reserved characters, breaking the javascript (so far, I haven't detected any problems with the XML data, but I'm still looking). There is also an issue with how entities in the XML, such as the CR/LF characters ( &#13;&#10; ), are handled. In some cases, they are being written as their character values rather than as HTML entities.
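
A minimal sketch of the two effects described above, written in Python purely for illustration. The script line is a made-up example rather than one taken from the actual pages, and `html.escape` / `html.unescape` only stand in for whatever the file format's converter does on save:

```python
import html

# Hypothetical line of JavaScript that builds a hyperlink tag as a string.
script_line = "var link = '<a href=\"' + url + '\">' + label + '</a>';"

# Escaping the HTML-reserved characters, as a formatter might do on save.
# The angle brackets become entities, so the script no longer builds a tag.
escaped = html.escape(script_line, quote=False)
print(escaped)

# The opposite problem: numeric entities decoded to their character values.
entity = "&#13;&#10;"
decoded = html.unescape(entity)
print(repr(decoded))
```

Either transformation is harmless for display, but writing the result back to disk changes what the browser or the XML consumer actually receives.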

    I love the idea of displaying these files with the HTML formatted consistently, and I tried to hack together a file format that would also sort attributes the way the XML one does. But this cool feature is markedly less useful if attempting to save changes results in corrupting my files.

    Does anyone have any suggestions? Perhaps I'm just choosing the wrong file format, or perhaps there's a setting I can modify so it won't make these changes behind my back.


    Last edited by rprastein; 25-Oct-2010, 09:14 PM.

  • #2
    The default XML and HTML file formats won't reformat or mangle files. If you use the XML Tidy or HTML Tidy file formats available on our web site, they will standardize the formatting of files before they are compared. If your files are a mix of HTML, Javascript, and XML, then the Tidy file formats probably won't understand it all. In that case, using the unformatted HTML file format is probably your best option.

    To make BC use the default HTML file format, select "Tools > File Formats". Select the "HTML" format and move it to the top of the formats list.

    Both of the Tidy formats use a program named "HTML Tidy" to process text before it is compared. You can see the call to the program if you select "Tools > File Formats", select "XML Tidied", then go to the Conversion tab.

    For documentation on the options supported by HTML Tidy, see its web page:

    Chris K Scooter Software


    • #3
      Thanks for the link to the tidy web site. I found the page where they describe the configuration parameters, which should help me figure out what is going on. The javascript is within comment tags, so in the absence of some configuration setting that is escaping the quote marks and angle brackets, it *ought* to be left alone. If a single quote is a valid delimiter for an attribute value, then the XML ought to be left alone as well. But it looks like it's an issue with htmltidy, not specifically with BeyondCompare.
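
As a rough cross-check of that reasoning, here is a sketch using Python's standard `html.parser` (not HTML Tidy, which may behave differently). The sample markup and the `Collector` class are invented for illustration. It shows a lenient HTML parser passing comment text through untouched and accepting a single-quoted attribute value that contains angle brackets:

```python
from html.parser import HTMLParser


class Collector(HTMLParser):
    """Collects comment bodies and the VALUE attribute of PARAM tags."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.param_values = []

    def handle_comment(self, data):
        # Comment text is delivered as-is, with no entity processing.
        self.comments.append(data)

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased.
        if tag == "param":
            self.param_values.extend(v for k, v in attrs if k == "value")


# Hypothetical sample: script text inside a comment, XML inside a
# single-quoted VALUE attribute.
sample = (
    "<!-- var link = '<a href=\"x.html\">go</a>'; -->\n"
    "<PARAM NAME=\"data\" VALUE='<row id=\"1\">text</row>'>\n"
)

parser = Collector()
parser.feed(sample)
parser.close()
print(parser.comments)
print(parser.param_values)
```

If a parser reads both constructs intact, any escaping that shows up in the saved file is being added when the text is written back out, which is consistent with it being a Tidy output setting.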

      For now, I just have to remember to use the tidied comparisons solely for visualizing differences, and limit saving changes to the built-in file formats.
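
In case a tidied comparison does get saved by accident, a small check along these lines could flag the damage. This is only a sketch: the function name is made up, and it assumes the two versions still have the same number of lines in the same order:

```python
import html


def escaped_only_changes(before, after):
    """Return 1-based line numbers where `after` differs from `before`
    only because reserved characters were turned into entities."""
    flagged = []
    pairs = zip(before.splitlines(), after.splitlines())
    for number, (old, new) in enumerate(pairs, start=1):
        if old != new and html.unescape(new) == old:
            flagged.append(number)
    return flagged


# Hypothetical before/after contents of a page.
original = "<p>unchanged</p>\nvar s = '<b>' + name + '</b>';\n"
saved = "<p>unchanged</p>\nvar s = '&lt;b&gt;' + name + '&lt;/b&gt;';\n"

print(escaped_only_changes(original, saved))
```

Swapping the two arguments finds the reverse case, where entities in the original were written out as their character values.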



      • #4
        Originally posted by Rebeccah
        For now, I just have to remember to use the tidied comparisons solely for visualizing differences, and limit saving changes to the built-in file formats.
        I'd recommend updating the HTML Tidied file format to prevent you from accidentally saving the converted text.
        1. Pick Tools > File Formats...
        2. Position on the HTML Tidied file format.
        3. Switch to the Conversion page.
        4. In the Conversion section, check Disable editing.
        5. Click Save.
        6. Click Close.

        Erik Scooter Software


        • #5
          Thank you, that's an excellent idea.