Generated html report has wrong encoding UCS-2 LE BOM

  • Generated html report has wrong encoding UCS-2 LE BOM

    Generating an HTML report from two XML files encoded as UTF-8 produces a report that Notepad++ identifies as UCS-2 LE BOM.

    Firefox is able to display the page correctly. However, I am first saving the file to the db as a CLOB via Python.

    Something like:
    where test3.html is the report generated by BC.
    Here is some output from the above code:

    *■< ! D O C T Y P E H T M L P U B L I C " - / / W 3 C / / D T D H T M L
    4 . 0 1 T r a n s i t i o n a l / / E N " " h t t p : / / w w w . w 3 . o
    g / T R / h t m l 4 / l o o s e . d t d " >
    < h t m l >

    < / h t m l >
    So there is definitely something in the front mucking things up.
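
The leading `*■` and the gaps between characters in the output above are consistent with a UTF-16 LE file being read byte by byte: a two-byte BOM (FF FE) at the front, then a NUL byte after each ASCII character. A minimal sketch for confirming that from Python by looking only at the first bytes of the file; `sniff_bom` and `sniff_file` are hypothetical helper names, not part of BC or of the code in this thread.

```python
import codecs

# BOM signatures, checked longest-first so that the UTF-32 LE BOM
# (FF FE 00 00) is not mistaken for the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data):
    """Return a codec name matching the BOM at the start of `data`, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def sniff_file(path):
    """Read the first four bytes of `path` in binary mode and check for a BOM."""
    with open(path, "rb") as fh:
        return sniff_bom(fh.read(4))
```

If `sniff_file` returns `"utf-16"` for the report, the file needs to be decoded with that codec before it is stored as text. A return value of `None` only means there is no BOM, not that the file is necessarily UTF-8.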

    I read that BC should output the report in the encoding of the left input file, and that works as expected for any files that are not .XML.

    I also double checked and Notepad++ opens the left and right XML files as UTF-8, so the encoding of the input files appears to be ok.

    How can I get my report to be generated in UTF-8?

    Version 3.3.12 b 18981

  • #2
    Hello Dennis,

    Which HTML report layout are you generating, and which encoding does the upper status bar in the interface show for the files?

    I tested with two files in the Text Compare that are detected as UTF-8, and the generated HTML report (Side-by-side layout) comes out as UTF-8.
    I tested this with BC3.3.13 and BC4.1.8. All minor updates are free, so you can update to BC3.3.13.

    If you can email [email protected] with your current support export (Help menu -> Support; Export) and a pair of sample files, let us know which report to generate so we can test. Please include a link back to this forum thread for our reference.
    Aaron P Scooter Software


    • #3
      I am not sure I would be allowed to send in the files under our company policy. I appreciate the assistance, but for now I just hacked around the issue. For others in the same boat, here is the hack I ended up creating.

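              try:
                  # Try UTF-16 first; the codec raises UnicodeError if the file has no UTF-16 BOM.
                  # (The opening call was cut off in the post; open(bc_path, ...) is assumed from the fallback below.)
                  concomitant_pk = qa_query().add_concomitant(beyond_compare=open(bc_path,
                                                              'r', encoding = 'utf16').read())
              except UnicodeError:
                  # Not UTF-16: read the report with the default encoding instead.
                  concomitant_pk = qa_query().add_concomitant(beyond_compare=open(bc_path,'r').read())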
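      So just try utf16 first and fall back if that fails (what Notepad++ calls UCS-2 is, for practical purposes, the same as UTF-16: the two are identical for characters in the Basic Multilingual Plane).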
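
For anyone adapting this, here is a minimal sketch of the same try-UTF-16-then-fall-back idea as a standalone helper, so the decoded text can be passed to whatever stores the CLOB. `read_report` and its `fallback` parameter are hypothetical names; `qa_query().add_concomitant` from the hack above is not reproduced here. It assumes the non-UTF-16 reports are UTF-8, which may not hold for every input.

```python
import io


def read_report(path, fallback="utf-8-sig"):
    """Return the report at `path` as text, trying UTF-16 first.

    Python's utf-16 codec raises UnicodeError when a file of two or more
    bytes does not start with a UTF-16 BOM, so a UTF-8 report ends up in
    the fallback branch. "utf-8-sig" reads UTF-8 with or without a BOM.
    """
    try:
        with io.open(path, "r", encoding="utf-16") as fh:
            return fh.read()
    except UnicodeError:
        with io.open(path, "r", encoding=fallback) as fh:
            return fh.read()


def to_utf8_bytes(path):
    """Return the report re-encoded as UTF-8 bytes (no BOM)."""
    return read_report(path).encode("utf-8")
```

Decoding once and handing a text string (or explicitly UTF-8 bytes) to the database layer avoids storing the BOM and the interleaved NUL bytes seen in the first post.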