(This is my first post, so if I've violated protocol in some way, please be kind!)
I'm using BC 3.1.11, Build 12204 (the latest, right?) to look for diffs on a JavaScript file that I've just edited. The point of the edit was just to expand tabs and I did that using "expand -4".
Within this file are two lines:
format = format.replace(/\\_/g, "~ï¿½");
:
format = format.replace(/~ï¿½/g, "_");
Then, when I compare the original and the edited versions, BC flags the last three characters within the strings "~ï¿½" as a significant difference. On one side (the "new file" side), it shows these four characters as shown above. On the other side (the "old file" side), it shows these characters as "~�". (Actually, that character is shown as an empty red box, but the forum software interprets it differently.)
But the hex details are where it really gets interesting! For both the old and the new versions, the bytes in that string are shown as 7E EF BF BD. That is, the exact same bytes but shown as mis-matching!
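One detail that may explain the display: EF BF BD is the UTF-8 encoding of U+FFFD, the Unicode REPLACEMENT CHARACTER. This JavaScript sketch (runnable in Node.js or a browser console) decodes the same four bytes two ways. Read as UTF-8 they are two characters, "~" and U+FFFD, which would match the single empty box on the old side. Read as a single-byte code page such as Latin-1, they are the four characters "~ï¿½" seen on the new side. That BC is really decoding the two sides with different encodings is an assumption; the hex view alone doesn't confirm it.

```javascript
// The four bytes the hex view reports on both sides of the comparison.
const bytes = new Uint8Array([0x7e, 0xef, 0xbf, 0xbd]);

// As UTF-8, EF BF BD encodes U+FFFD (REPLACEMENT CHARACTER), so the four
// bytes decode to two characters: "~" and U+FFFD.
const asUtf8 = new TextDecoder("utf-8").decode(bytes);

// As a single-byte code page (ISO-8859-1 here; Windows-1252 agrees for these
// byte values), every byte is its own character: "~", "ï", "¿", "½".
const asLatin1 = String.fromCharCode(...bytes);

console.log(asUtf8.length, asUtf8.codePointAt(1).toString(16)); // 2 fffd
console.log(asLatin1.length, asLatin1); // 4 ~ï¿½
console.log(asUtf8 === asLatin1); // false
```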
Another interesting aspect is that the mismatch seems to depend on what comes earlier in the file. If I try to reduce the 2177-line file to a simpler case (for example, just the 223 lines of the affected JavaScript routine), the mismatch isn't flagged at all!
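If the two sides really are being decoded differently, the position dependence could come from encoding detection that only examines the start of a file. Running `expand -4` makes every tab-indented line longer, so the same bytes sit further into the edited file than into the original. The sketch below assumes a hypothetical detector, `guessEncoding`, that scans a fixed-size prefix; the 1500-byte `WINDOW`, the filler lines, and the "ansi" fallback label are invented for illustration and are not Beyond Compare's documented behavior. With the tabbed file, EF BF BD falls inside the window and the file is treated as UTF-8; after expansion it falls outside, and the file is treated as ANSI.

```javascript
// HYPOTHETICAL encoding sniffer, for illustration only; not Beyond Compare's
// real algorithm. It inspects just the first `windowSize` bytes and reports
// "utf-8" if that prefix holds at least one well-formed multi-byte UTF-8
// sequence and nothing malformed; otherwise it falls back to "ansi".
// Simplified: overlong and surrogate-range sequences are not rejected.
function guessEncoding(bytes, windowSize) {
  const end = Math.min(bytes.length, windowSize);
  let sawMultiByte = false;
  let i = 0;
  while (i < end) {
    const b = bytes[i];
    let len;
    if (b < 0x80) len = 1;
    else if (b >= 0xc2 && b <= 0xdf) len = 2;
    else if (b >= 0xe0 && b <= 0xef) len = 3;
    else if (b >= 0xf0 && b <= 0xf4) len = 4;
    else return "ansi"; // stray continuation byte or impossible lead byte
    if (i + len > end) break; // sequence runs past the window: stop scanning
    for (let k = 1; k < len; k++) {
      if ((bytes[i + k] & 0xc0) !== 0x80) return "ansi"; // bad continuation
    }
    if (len > 1) sawMultiByte = true;
    i += len;
  }
  return sawMultiByte ? "utf-8" : "ansi";
}

const enc = new TextEncoder();
// ~ EF BF BD " ) ; \n : the tail of the affected line, byte for byte.
const special = Uint8Array.of(0x7e, 0xef, 0xbf, 0xbd, 0x22, 0x29, 0x3b, 0x0a);

// Builds a fake file: `count` copies of `line`, followed by the special bytes.
function makeFile(line, count) {
  const head = enc.encode(line.repeat(count));
  const out = new Uint8Array(head.length + special.length);
  out.set(head, 0);
  out.set(special, head.length);
  return out;
}

const WINDOW = 1500; // hypothetical sniffing window, in bytes
const oldFile = makeFile("\tvar x = 1;\n", 100);   // 12 bytes/line: special starts at 1200
const newFile = makeFile("    var x = 1;\n", 100); // 15 bytes/line: special starts at 1500

console.log(guessEncoding(oldFile, WINDOW)); // utf-8
console.log(guessEncoding(newFile, WINDOW)); // ansi
```

Under this assumption the split lines up with the report (one box on the old side, four characters on the new side), and in a trimmed-down file the bytes sit near the start of both versions, so both sides would be detected the same way.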
Also, if I edit out the 0xBF byte, the mismatch isn't flagged. That is, the 0x7E, 0xEF, and 0xBD bytes on their own don't cause BC any trouble.
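Removing 0xBF is consistent with the same encoding guess. Without it, EF BD is a truncated three-byte sequence, so the line is no longer valid UTF-8, and a detector would presumably fall back to a single-byte code page on both sides. A strict `TextDecoder` (the standard `fatal: true` option of the WHATWG Encoding API) makes the difference visible; `isValidUtf8` is a hypothetical helper name used only for this sketch.

```javascript
// A strict UTF-8 decoder throws a TypeError on malformed input instead of
// substituting U+FFFD.
const strict = new TextDecoder("utf-8", { fatal: true });

function isValidUtf8(bytes) {
  try {
    strict.decode(bytes);
    return true;
  } catch {
    return false; // the bytes are not well-formed UTF-8
  }
}

// Original bytes of the string literal: ~ EF BF BD "
const original = Uint8Array.of(0x7e, 0xef, 0xbf, 0xbd, 0x22);
// The same bytes with 0xBF edited out: ~ EF BD "
const withoutBF = Uint8Array.of(0x7e, 0xef, 0xbd, 0x22);

console.log(isValidUtf8(original));  // true
console.log(isValidUtf8(withoutBF)); // false
```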
Suggestions? It feels a little bit like BC is treating one side as Unicode (or some such) and the other as ASCII. But apart from the two lines I've shown you, the file is entirely ordinary 7-bit ASCII: other than the three characters shown above, there are no bytes anywhere in the file from the 0x80-0xFF range, and no C0 control characters other than <CR>, <LF>, and <TAB>.
Atlant