Unicode LRM (Whitespace) - How To Ignore?

  • Unicode LRM (Whitespace) - How To Ignore?

    When I compare two versions of an MS Word document, several lines flagged as containing important differences appear to be identical. Viewed in hex, the differences turn out to be Unicode "left-to-right" marks (LRM, U+200E, encoded in UTF-8 as E2 80 8E) at the end of the line. For example:

    left = 69 73 73 75 65 73 E2 80 8E 0D
    right = 69 73 73 75 65 73 0D 0A

    How do I set a rule such that E2 80 8E 0D is seen to be identical to 0D 0A?

    Regards, AB
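
For readers who want to confirm what the extra bytes are, here is a minimal Python sketch that decodes the two hex dumps above as UTF-8 and names each code point. It is only an illustration run outside Beyond Compare; the helper name `describe` is hypothetical.

```python
import unicodedata

# Byte sequences copied from the hex dumps in the post above.
left = bytes.fromhex("69 73 73 75 65 73 E2 80 8E 0D")
right = bytes.fromhex("69 73 73 75 65 73 0D 0A")


def describe(raw: bytes) -> list[str]:
    """Decode UTF-8 bytes and label every code point.

    Control characters such as CR and LF have no Unicode name,
    so their repr() is used as the fallback label.
    """
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, repr(ch))}"
        for ch in raw.decode("utf-8")
    ]


if __name__ == "__main__":
    for label, raw in (("left", left), ("right", right)):
        print(label)
        for entry in describe(raw):
            print("   ", entry)
```

The three bytes E2 80 8E decode to a single invisible character, which is why the lines look identical on screen.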

  • #2

    In the Text Compare, line ending differences are unimportant by default. You can use the toolbar or View menu -> Visible Whitespace to show the characters in the main Text Compare window and see the difference graphically.

    Then, in the Session menu -> Session Settings, Importance tab, checked items are Important and unchecked items are Unimportant. If "Compare Line Endings" is enabled, you can uncheck it to make this type of difference unimportant. The change applies to the current session only, unless you switch from "Use for this view only" to "Also update session defaults".
    Aaron P Scooter Software


    • #3
      I checked the BC v4 Beta build 17905 session settings and they seem to be OK. All of the whitespace boxes (leading, embedded, trailing) are unchecked under Grammar Elements and Compare Line Endings is unchecked under Miscellaneous. Maybe something to do with the document type which is MS Word Documents Extended?

      Regards, AB


      • #4
        Could you post or email example files containing text that reproduces the issue, with these Unicode marks on only some lines and not others? If you email us at [email protected], please include a link back to this forum thread for our reference.

        What results do you see if you use the default MS Word format conversion?
        Aaron P Scooter Software


        • #5
          Aaron, unfortunately the underlying files are confidential so I have sent an email to Scooter support with a screen grab showing the issue.

          Re format conversion...

          With the exception of these strange Unicode whitespace mismatches, the differences picked up by BC4 using the MS Word Documents Extended template are the same as when I use MS Word's built-in Legal Black Line document compare. If I use BC's non-extended MS Word Documents template, BC4 (and the same was true with BC3) produces a completely different set of differences, so I always use the Extended template.

          As a general question/suggestion: I can already create my own text equivalences using left and right text definitions, e.g. left text "Customer" == right text "Acme". It would be useful to be able to specify hex strings as an alternative to text, e.g. left hex E2808E0D == right hex 0D.

          Regards, AB
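
Until something like a hex equivalence exists, one possible workaround is to normalize the text before it is compared. The Python sketch below is an assumption-laden illustration, not a Beyond Compare feature: it strips the UTF-8 encoded LRM and folds CR and CRLF line endings into LF, so the two sample lines from the first post compare as equal. The function names `normalize` and `lines_equivalent` are hypothetical.

```python
# UTF-8 encoding of U+200E LEFT-TO-RIGHT MARK: E2 80 8E
LRM = "\u200e".encode("utf-8")


def normalize(raw: bytes) -> bytes:
    """Drop LRM marks and fold CRLF / bare CR line endings into LF."""
    raw = raw.replace(LRM, b"")
    # Replace CRLF first so a CRLF pair does not become two newlines.
    raw = raw.replace(b"\r\n", b"\n")
    return raw.replace(b"\r", b"\n")


def lines_equivalent(left: bytes, right: bytes) -> bool:
    """True when the two byte strings match after normalization."""
    return normalize(left) == normalize(right)


if __name__ == "__main__":
    # Sample bytes taken from the first post in this thread.
    left = bytes.fromhex("69 73 73 75 65 73 E2 80 8E 0D")
    right = bytes.fromhex("69 73 73 75 65 73 0D 0A")
    print(lines_equivalent(left, right))
```

Removing every LRM is a blunt choice: in documents that mix left-to-right and right-to-left text the mark can affect rendering, so this is only reasonable when the marks are known to be noise.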