Incorrect mismatch indicated on a file containing 0xBF

  • Incorrect mismatch indicated on a file containing 0xBF

    (This is my first post, so if I've violated protocol in some way, please be kind!)

    I'm using BC 3.1.11, Build 12204 (the latest, right?) to look for diffs on a JavaScript file that I've just edited. The point of the edit was just to expand tabs and I did that using "expand -4".

    Within this file are two lines:

    format = format.replace(/\\_/g, "~ï¿½");
    :
    format = format.replace(/~ï¿½/g, "_");


    Then, when I compare the original and the edited versions, BC flags the last three characters within the strings "~ï¿½" as a significant difference. On one side (the "new file" side), it shows these four characters as shown above. On the other side (the "old file" side), it shows these characters as "~". (Actually, that character is shown as an empty red box, but the forum software interprets it differently.)

    But the hex details are where it really gets interesting! For both the old and the new versions, the bytes in that string are shown as 7E EF BF BD. That is, the exact same bytes but shown as mis-matching!

    Another interesting aspect of this is that the mismatch only seems to occur based on the presence of earlier stuff in the file. That is, if I try to reduce the 2177-line file to a simpler case (for example, just the 223 lines of that affected JavaScript routine), the former "mis-match" isn't flagged at all!

    Also, if I edit out the 0xBF character, the mis-match isn't flagged. That is, none of 0x7E, 0xEF, or 0xBD causes BC any trouble.

    Suggestions? It feels a little bit like BC is treating one side as Unicode or some such and the other as ASCII, but other than the two lines I've shown you, the file is entirely ordinary 7-bit ASCII; other than the three characters shown above, there are no characters anywhere else in the file from the 0x80-0xFF range and no C0 control characters other than <CR>, <LF>, and <TAB>.
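
The claim that a file is plain 7-bit ASCII apart from a few known bytes can be checked mechanically. The following is a minimal Python sketch of such a check, not anything BC provides; the function name `find_suspect_bytes` and the sample line are hypothetical and chosen only for illustration.

```python
# Control bytes the post treats as ordinary: TAB, LF, CR.
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}


def find_suspect_bytes(data: bytes):
    """Return (offset, byte) pairs for bytes in 0x80-0xFF or for C0
    control characters other than TAB, LF and CR."""
    suspects = []
    for offset, value in enumerate(data):
        is_high = value >= 0x80
        is_control = value < 0x20 and value not in ALLOWED_CONTROLS
        if is_high or is_control:
            suspects.append((offset, value))
    return suspects


# Hypothetical sample: one of the two affected lines, with the raw
# bytes 7E EF BF BD inside the regular expression.
sample = b'format = format.replace(/~\xef\xbf\xbd/g, "_");\r\n'

for offset, value in find_suspect_bytes(sample):
    print(f"offset {offset}: 0x{value:02X}")
```

Running the same function over the whole file (read with `open(path, "rb").read()`) would list every byte that could influence encoding detection.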

    Atlant

  • #2
    Hello,

    BC3 could be treating one side as one encoding and the other side as another. The top status bar of each pane should show the currently detected encoding. The detection may not work well for your specific files, but it can be manually overridden. If this is the case, we'd like to see your files to find out why.
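
The thread does not say how BC3 detects encodings, so the following Python sketch only shows the general idea of automatic detection with a manual override: try strict UTF-8 first, then fall back to a single-byte code page. The function names are hypothetical, and `cp1252` is assumed here as a stand-in for "ANSI" on a Western-European Windows system.

```python
from typing import Optional


def guess_encoding(data: bytes) -> str:
    """Very rough guess: 'ascii', 'utf-8', or 'cp1252' (assumed ANSI)."""
    if all(byte < 0x80 for byte in data):
        return "ascii"
    try:
        data.decode("utf-8")  # strict decode: raises on invalid sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"


def load_text(data: bytes, override: Optional[str] = None) -> str:
    """Decode with the guessed encoding unless the caller overrides it."""
    encoding = override or guess_encoding(data)
    # errors="replace" keeps the sketch from failing on bytes that the
    # chosen code page does not define.
    return data.decode(encoding, errors="replace")


raw = b"~\xef\xbf\xbd"
print(guess_encoding(raw))
print(load_text(raw) == load_text(raw, override="cp1252"))
```

Two panes that end up with different guesses will decode the same bytes into different text, which is why forcing both to one encoding is the first thing to try.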

    Could you attach a full screen screenshot demonstrating the issue? Or email us a screenshot, pair of example files, and your support package (Help menu -> Support; Export).
    Our email is [email protected]
    If you email us, please include a link back to this forum post.
    Aaron P Scooter Software

    • #3
      Yes, encoding sounds exactly right.

      I'll take a look on Monday and report back.

      And "Thanks!" for your prompt reply!

      Atlant

      • #4
        Aaron:

        Yes, encoding detection is definitely the problem.

        BC3 is deciding that the version of the file with hardware tabs is "UTF-8" and that the version of the file without tabs is ANSI. (Thanks for pointing out those fields at the top of the two panes!)

        When I overrode the entabbed pane to also be ANSI, the mismatches were no longer flagged as "significant" and the characters within the double-quoted strings were now displayed identically ("~ï¿½").
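
The effect described here, identical bytes that compare as different text, can be reproduced outside BC. This is a small Python sketch rather than a description of BC's internals; it assumes `cp1252` is the code page behind the "ANSI" label.

```python
# The four bytes reported in the hex view for both files.
raw = bytes([0x7E, 0xEF, 0xBF, 0xBD])

as_utf8 = raw.decode("utf-8")   # '~' followed by one character, U+FFFD
as_ansi = raw.decode("cp1252")  # '~' followed by three separate characters

for label, text in (("UTF-8", as_utf8), ("ANSI", as_ansi)):
    codepoints = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(f"{label}: {len(text)} characters: {codepoints}")

# Compared as decoded text, the two sides differ although the bytes match.
print("mixed encodings equal:", as_utf8 == as_ansi)

# With both sides forced to the same encoding, the difference disappears.
print("same encoding equal:", raw.decode("cp1252") == raw.decode("cp1252"))
```

Under UTF-8 the bytes EF BF BD form the single replacement character U+FFFD, while under a single-byte code page each byte becomes its own character, which matches the one-box versus three-character display in the two panes.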

        I'll E-mail you the files, screen-shot, and the support package. But meanwhile, I've got a usable work-around to my problem -- Thanks!

        Atlant
