Summary: Beyond Compare 3 fails to match two files that differ only in CRLF/LF modes.
I have two files with identical textual content but differing EOL modes. They're attached as "a.txt" (PC: CRLF) and "b.txt" (Unix: LF).
Beyond Compare 3 fails to match these two files unless the settings are tweaked, which makes comparing directories that contain many such files extremely painful.
I've attached my result of the "a.txt <-> b.txt" comparison as "comparison.png". Note that "ANSI" on my system is the same as "Japanese (Shift-JIS)". To get the same result with an English locale OS, one would probably need to select "Japanese (Shift-JIS)" for the files' encodings.
The problem seems to occur because 1) the files are "Japanese (EUC)" but are decoded as Shift-JIS, and 2) the Shift-JIS decoder consumes the EOL as the "second byte" of a multi-byte Shift-JIS character.
These are the flaws that I think arise in this scenario:
1. The files are auto-detected as "ANSI" when they should be "Japanese (EUC)".
2. Code-page decoding and EOL handling are done in such a way that code-page decoding takes precedence. (Otherwise a failed decoding wouldn't matter, as is evident from a "b.txt <-> b.txt" comparison.)
3. The Shift-JIS decoder is able to consume an EOL as a second-byte when neither LF nor CR should be part of a double-byte character[1] to begin with.
I know that 1 is probably unfixable, but perhaps 2 or, preferably, 3 could be remedied?
[1] http://en.wikipedia.org/wiki/Shift_J...t_JIS_byte_map
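To make point 3 concrete, here is a minimal Python sketch of the lead-byte and trail-byte ranges from the byte map in [1]. The function names are hypothetical and this is not how Beyond Compare's decoder is implemented; it only illustrates that a Shift-JIS lead byte followed by CR or LF can be treated as a lone byte instead of swallowing the EOL. The sample bytes are my own example (the EUC-JP encoding of a single kanji followed by CRLF), not the contents of the attached files.

```python
def is_sjis_lead(b: int) -> bool:
    """True if b can start a double-byte Shift-JIS character."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def is_sjis_trail(b: int) -> bool:
    """True if b can be the second byte of a double-byte character.

    CR (0x0D) and LF (0x0A) fall outside both ranges.
    """
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFC


def split_sjis_units(data: bytes) -> list[bytes]:
    """Split data into 1- or 2-byte units (hypothetical helper).

    A lead byte only pairs with the next byte when that byte is a
    valid trail byte, so an EOL is never consumed as a second byte.
    """
    units = []
    i = 0
    while i < len(data):
        if (is_sjis_lead(data[i])
                and i + 1 < len(data)
                and is_sjis_trail(data[i + 1])):
            units.append(data[i:i + 2])
            i += 2
        else:
            units.append(data[i:i + 1])
            i += 1
    return units


# Example input (an assumption, not taken from a.txt): EUC-JP bytes
# C6 FC followed by CRLF. 0xFC is a Shift-JIS lead byte, but 0x0D is
# not a valid trail byte, so CR and LF survive as separate units.
sample = b"\xc6\xfc\r\n"
units = split_sjis_units(sample)
```

With a check like this the mis-decoded text would still be garbage, but the line breaks would stay intact, so the CRLF and LF versions would line up.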
The attached files are textually identical. Their modes are:
a.txt = Japanese EUC, CRLF
b.txt = Japanese EUC, LF
c.txt = Japanese Shift-JIS, CRLF
d.txt = Japanese Shift-JIS, LF
The last two are attached merely for convenience.
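For point 2, the alternative ordering I have in mind is roughly the following Python sketch: split on EOL bytes first, then compare (or decode) line by line. The helper names and the sample text are hypothetical, and this is only safe under the assumption that the encoding never uses 0x0A or 0x0D inside a multi-byte character, which holds for both EUC-JP and Shift-JIS but not for something like UTF-16.

```python
import re


def split_lines_raw(data: bytes) -> list[bytes]:
    """Split raw bytes on CRLF, CR, or LF before any code-page decoding."""
    return re.split(rb"\r\n|\r|\n", data)


def same_text_ignoring_eol(a: bytes, b: bytes) -> bool:
    """Compare two byte strings line by line, ignoring the EOL style."""
    return split_lines_raw(a) == split_lines_raw(b)


# Sample text standing in for the attachments (an assumption, not the
# actual contents of a.txt and b.txt).
text = "日本語\nテスト\n"
euc_crlf = text.replace("\n", "\r\n").encode("euc_jp")  # like a.txt
euc_lf = text.encode("euc_jp")                          # like b.txt

match = same_text_ignoring_eol(euc_crlf, euc_lf)
```

Because the comparison works on the raw bytes of each line, a wrong encoding guess would no longer affect whether the two files match.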