Detect Character Encoding Problem

  • Detect Character Encoding Problem

    I was banging my head against a weird text comparison the other day. I had not had any encoding-related problems in Beyond Compare before, so it took me quite some time to figure out that encoding was the cause.

    I had two UTF-8 encoded files, but the second happened to contain only plain (non-special) characters in its first 65000 or so characters. This meant that the first file was correctly detected as UTF-8, while the second one was incorrectly detected as ANSI. When special characters started to appear further down, this of course looked very odd. The detection error is 100% repeatable: you just need to create a file with about 65000 non-special characters at the beginning and then any special UTF-8 encoded character after them.
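
For anyone who wants to reproduce this, here is a minimal Python sketch of the situation described above. It is not Beyond Compare's actual detection code. The helper names `make_test_bytes` and `guess_encoding`, the sample size, and the fall-back-to-ANSI rule are all assumptions chosen to mimic the symptom: a detector that inspects only a leading sample sees nothing but plain characters and falls back to ANSI.

```python
import codecs


def make_test_bytes(plain_count=65000, special="é"):
    """Build file content: a long plain-ASCII prefix, then one UTF-8 special character."""
    return b"a" * plain_count + special.encode("utf-8") + b"\n"


def guess_encoding(data, sample_size=None):
    """Guess 'utf-8' or 'ansi' from the first sample_size bytes (None = whole file).

    Hypothetical detector, for illustration only.
    """
    whole = sample_size is None or sample_size >= len(data)
    sample = data if whole else data[:sample_size]

    # A sample with no bytes >= 0x80 is valid in both encodings, so there is
    # nothing to go on; this sketch assumes the detector then defaults to ANSI.
    if all(byte < 0x80 for byte in sample):
        return "ansi"

    # An incremental decoder tolerates a multi-byte sequence that is cut off
    # at the end of a partial sample (final=False).
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=whole)
    except UnicodeDecodeError:
        return "ansi"
    return "utf-8"


if __name__ == "__main__":
    data = make_test_bytes()
    print("first 65000 bytes only:", guess_encoding(data, sample_size=65000))
    print("whole file:", guess_encoding(data))
```

Writing `make_test_bytes()` to a file in binary mode gives a test file of the kind described above. In the sketch, the only difference between the wrong and the right answer is how much of the file the detector looks at.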

    Even if this is fixed, I think the current highlighting of a mismatch in detected encoding between the compared files is not obvious enough. I must have missed it for at least 30 minutes, and had I seen it earlier I would not have had to spend that much time on it. It would be nice if the highlight were a lot more obvious and really attracted your attention, so that you cannot miss it.
    Last edited by CrouZ; 29-May-2012, 04:28 PM.

  • #2
    Hello CrouZ,

    Thanks for the feedback. Expanding our detection capabilities and buffer is on our wishlist; I'll add your notes to that entry.

    Altering the UI or marking the files as different is something we have considered and are still considering, but we also want to support comparing files of different encoding types without adding something akin to a warning dialog. It isn't easy to mark the files as different without making the marking distracting when the difference is a known case.
    Aaron P Scooter Software


    • #3
      Yep - I would like a command line option to detect the encoding of a file.
      Solaris has a great command line utility - auto_ef - but it's a lot of work to load a Solaris VM just to run a script on a bunch of files.
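
As a stopgap, a small script can cover the simplest cases from the command line. The following is a rough Python sketch, not a replacement for auto_ef and not a Beyond Compare feature. The function name `detect_encoding` and the label `unknown-8bit` are made up for illustration. It only checks for a byte order mark, then for pure ASCII, then whether the whole file decodes as UTF-8; it does not try to tell one ANSI code page from another.

```python
import codecs
import sys

# UTF-32 BOMs are checked before UTF-16 because the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_encoding(data):
    """Return a coarse encoding label for the given bytes (hypothetical helper)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        # Decode the whole content, not just a leading sample.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown-8bit"


def main(paths):
    for path in paths:
        with open(path, "rb") as handle:
            print(f"{path}: {detect_encoding(handle.read())}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a name of your choice, for example `detect_enc.py`, it can be run as `python detect_enc.py file1.txt file2.txt` and prints one label per file. Reading each file completely is slower on large files, but it avoids the sampling problem described in the first post.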