HTML Comparison



    I have a problem comparing HTML files. One is not really standard HTML and, for example, does not use entity references, whereas the other does.
    I'd like to ignore these differences (i.e. &auml; should be the same as ä, and &apos; the same as '). So I defined a number of text replacements. That way I can mark these characters as unimportant, but strangely enough, the characters following the replaced ones are still displayed as different, even though they are actually identical.

    Any idea how to solve this?
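    The text replacements described above express an equivalence between entity references and literal characters. As a rough illustration outside BC3 (a Python sketch, not how BC3 implements Text Replacements), decoding entities on both sides before diffing makes lines that differ only in entity usage compare equal. The sample lines and the function names normalize and differing_lines are hypothetical.

```python
import difflib
import html


def normalize(text: str) -> str:
    # html.unescape decodes named (&auml;, &apos;) and numeric (&#39;) references.
    return html.unescape(text)


def differing_lines(a: list[str], b: list[str]) -> list[str]:
    # Diff the normalized lines; keep only lines ndiff marks as removed or added.
    diff = difflib.ndiff([normalize(x) for x in a], [normalize(x) for x in b])
    return [d for d in diff if d.startswith(("- ", "+ "))]


# Hypothetical sample lines: entity references on one side, literal characters on the other.
left = ["<p>L&apos;aptitude d&eacute;crite</p>", "<p>K&auml;se</p>"]
right = ["<p>L'aptitude décrite</p>", "<p>Käse</p>"]

print(differing_lines(left, right))  # prints []
```

    Note that html.unescape also decodes &lt;, &gt; and &amp;, which changes how markup reads; a per-entity replacement list like the one described in the question avoids that side effect.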


  • #2

    Defining the Text Replacement is the strategy I'd recommend. Could you go into more detail on what you mean by the 'following characters'? Are these characters that are not part of the replacement, or characters that look equal but are showing up as different?

    Could we get a full screen screenshot showing this comparison? You could post here or email us at [email protected] (please include a link to this forum thread for our reference).

    Two initial hunches:
    In the BC3 Text Compare, open the View menu and enable Alignment Details. Are the characters lining up as you would expect, or is the character alignment off?

    In the top status bar of each individual pane, we show the detected format name. Are both sides "HTML", or are they being detected as different formats due to different file extensions? If they are different formats, they may also detect different grammar elements for the same text; if the text is aligned and looks equal but is a different grammar element (Comment vs. No Comment), it would still be a difference.
    Aaron P Scooter Software


    • #3
      Hi Aaron,

      thanks for your reply.
      Yes, these characters are not part of the replacement, but they are displayed in red even though, as far as I can see, they are identical.

      Here is a screenshot:

      As you can see, I have alignment details enabled and the letters are aligned correctly. At the beginning of line 81 (also shown in the lower pane), you can see that replacing &eacute; with é works fine. The replacement of &apos; with ' only partly works, however: the apostrophe is no longer marked in red, but the characters following it (i.e. "aptitude d") are now marked in red, although they are not different at all.

      No, both files are HTML and both are detected as "HTML UTF-8".



      • #4
        Hi Aaron,

        any idea what went wrong with my HTML comparison?
        I get so many useless diffs that I feel a little bit lost!

        Thanks in advance!


        • #5

          If you click into the word in the interface (don't highlight; just click to place the blinking cursor between a couple of the red characters), the bottom status bar of each pane will show the detected grammar item.

          My hunch is that one side is a String (between ' characters) and the other is not. A difference in grammar type counts as a difference and will mark those characters as such. You could delete the String ' to ' grammar element for HTML to prevent this from being an important difference.
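          To make the hunch concrete, here is a small Python sketch of a simplified, hypothetical "String ' to '" rule (not BC3's actual grammar engine). Grammar is detected on the original text, so the literal apostrophe opens a String on one side while &apos; contains no quote character on the other. After the replacement the characters that follow read the same, but their grammar type differs. The names STRING_RULE and classify are illustrative only.

```python
import re

# Hypothetical, simplified "String ' to '" rule: from a ' up to the next '
# (or to the end of the line if there is no closing quote) counts as String.
STRING_RULE = re.compile(r"'[^']*'?")


def classify(line: str) -> list[str]:
    """Return a grammar label ("String" or "Text") for each character of line."""
    labels = ["Text"] * len(line)
    for match in STRING_RULE.finditer(line):
        for i in range(match.start(), match.end()):
            labels[i] = "String"
    return labels


with_entity = "l&apos;aptitude d"  # side using the entity reference
literal = "l'aptitude d"           # side using the literal apostrophe
suffix = "aptitude d"              # identical text following the apostrophe

left_labels = classify(with_entity)[-len(suffix):]
right_labels = classify(literal)[-len(suffix):]

print(set(left_labels), set(right_labels))  # prints {'Text'} {'String'}
```

          Every character of "aptitude d" matches textually but differs in grammar type, which is why removing the String element (or making it unimportant) should clear those red marks.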

          If that does not seem to be the issue, we could use a pair of example files and your settings. Please email these to [email protected] along with a link back to this forum thread for our reference. Please also include the export from the Help menu -> Support; Export. Thanks.
          Aaron P Scooter Software