Character Encoding Question

  • Character Encoding Question

    I'm really liking Beyond Compare, but I have been having issues with character encoding as of late (in 2-way text comparison of course). I don't think BC is handling things badly, but it's confusing me. Here's the behavior I'm seeing:

    When I open a file encoded in cp936 (Simplified Chinese) in BC while the Windows regional settings use cp936 as the default for non-Unicode programs, BC reports the encoding as ANSI (which I think is incorrect), but displays the file correctly.

    When I open the same file with the Windows regional settings set to English (United States), BC shows the encoding as 00936 (ANSI/OEM) Chinese Simplified, which is correct, and the file looks the same as in the first example.

    So, a couple of questions on this. First, why does the encoding show as ANSI when I'm in Chinese regional settings? It doesn't seem like the correct designation. Second, why can I display the file in cp936 when BC detects the encoding, but I can't force the encoding to cp936 from the drop-down for other files? I see GB2312, 18030 and HK, as well as a Traditional Chinese entry in the drop-down, but I don't see anything specifically designated as cp936. I also don't know how the drop-down is generated, since it seems to change depending on which computer I'm using.

    Many thanks for any help you can give.

  • #2
    Hi Anthony,

    "ANSI" is an alias for the encoding defined in the Windows regional settings. When we detect the character encoding we explicitly change the display to "ANSI" if it matches that, to simplify the user interface for most users in most cases.
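
The labelling rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Beyond Compare's actual code: the function name `display_label` and its parameters are hypothetical, and the code page numbers are passed in by hand rather than read from Windows.

```python
def display_label(detected_codepage: int, system_ansi_codepage: int) -> str:
    """Return the label to show for a detected code page.

    Hypothetical helper: if the detected code page is the same as the
    system default for non-Unicode programs, show the alias "ANSI";
    otherwise show the code page number itself.
    """
    if detected_codepage == system_ansi_codepage:
        return "ANSI"
    return str(detected_codepage)


# The same cp936 file under two different regional settings:
print(display_label(936, 936))   # system default is 936 (Simplified Chinese)
print(display_label(936, 1252))  # system default is 1252 (English, United States)
```

Under this rule the file's bytes and how they are decoded do not change between the two machines; only the label does, which matches the behavior described in the question.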

    The dropdown list includes a limited subset of the installed codepages and matches the list that Internet Explorer has in its Encoding menu. The only reason it would vary is if one of the computers had code pages installed that the other didn't. If the current codepage (detected or overridden) isn't on the list we add it explicitly.

    The Session Settings dialog's "Format" tab has Encoding comboboxes that include every code page installed on the system, and if you select it there it will appear on the main window's dropdown list.

    According to Wikipedia, code page 936 has been superseded by 54936, which is included in the editor dropdown as "Chinese Simplified (GB18030)". I don't know enough about the relevant encodings to comfortably customize the list, so I'd say use 54936 if you can. If you have to use 936, you'll need to pick it from the full list.
    Zoë P Scooter Software
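
For readers who want to check how code page 936 and GB18030 (54936) relate before switching, here is a small Python sketch. It relies on Python's own codecs, which are an assumption here and may not match the Windows code page tables byte for byte (Python treats `cp936` as an alias of its `gbk` codec). The sample strings and the helper `can_encode` are made up for the example.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Common Simplified Chinese characters get the same bytes in both encodings,
# so a typical cp936 file also decodes correctly as GB18030.
common = "简体中文"
same_bytes = common.encode("cp936") == common.encode("gb18030")
round_trip = common.encode("cp936").decode("gb18030")
print(same_bytes, round_trip == common)

# GB18030 covers all of Unicode; cp936 does not. U+20000 is a CJK
# Extension B character that only GB18030 can represent (as four bytes).
rare = "\U00020000"
print(can_encode(rare, "cp936"), can_encode(rare, "gb18030"))
print(len(rare.encode("gb18030")))
```

If a file contains only characters like those in `common`, choosing "Chinese Simplified (GB18030)" in the dropdown should display it the same way as 936 would. The two can differ for a small number of code points, so picking 936 from the full list remains the safer choice when an exact match matters.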


    • #3
      Aha! So there is a full list after all. I couldn't find it anywhere. Thanks for the help.