"conversion error" with PDF plugin.


  • "conversion error" with PDF plugin.

    I'm trying to compare data sheets with the PDF addon. I used to just get a text comparison. After updating to the latest BC3 version and installing the plugin, all I get is a conversion error when I try to compare PDF files. Did this break, or is it that the PDF format has been updated once again...

  • #2
    PDF support is built in to BC3. You don't need to install anything to compare PDF files.

    To compare a PDF file in BC3, open it in the Text Compare.

    Check the "Tools > File Formats" dialog. The file format for PDF files is included with BC3 and should be listed in the bottom half of the format list. Check the top of the list for any new file formats you might have added that map to the *.PDF file extension. If there are any new PDF settings at the top of the list, uncheck the box next to the format to disable it.
    Chris K Scooter Software


    • #3
      PDF documents not converting for Text Comparison

      I am having the same problem - PDFs don't convert.
      Email Support advised that I check the Security of the documents in Adobe Reader.
      "Page Extraction" is set to not permitted, and I can't change it.
      The documents were created in Nuance Power PDF 2.x and when they are opened in that application, there is no Security restriction for page extraction.

      Workarounds I have tried include:

      1. Copy the document and try to compare on that - Support said that would get rid of the 'lock'. Didn't work.
      2. Print to XPS and use those documents instead of the originals - didn't work
      3. Print to Microsoft PDF - didn't work either

      When I open the documents in Nuance Power PDF v2.x and go to “Security, Manage Security”, Page Extraction shows as “Allowed”.
      So, why are they showing as “not allowed” in Adobe, and is this really the reason they aren’t converting?
      Given that the copy/paste of the file to create a new copy didn’t work…and given that the copy was made from a “Save As” copy of the original in Nuance Power PDF, the workaround isn’t happening.
      I thought I could change the Security in Nuance Power PDF, but they are already open…everything is ‘allowed’.

      I can't contact Nuance support because I can't remember my username and their website has no mechanism for recovering it. I will have to call Monday and hope they don't try to charge me.

      Other workarounds I've looked at include using third-party software such as CuteFTP, but I'm reluctant to muck around any more until I really understand what the problem is. Support suggests Linux, but I don't have it installed on my Windows machine and do not know how to use it.

      Support quotes:

      "Beyond Compare can open any file that has "Extractable text: True" when you check its properties, using Acrobat Reader. Or try our Linux product if that is available."

      "Open the pdf in a reader, such as Adobe Acrobat and check File -> Properties -> Security.
      The crucial property is Page Extraction. For example, in the screenshot below, the text cannot be extracted by Beyond Compare's 3rd party tool"

      "Are the pdf extractable? Many are locked and Beyond Compare for Windows or MacOS cannot touch the text inside. This means the third party module in our pipeline that converts .pdf to .txt failed. This is almost always because the owners of the pdf have locked it with a password. Nothing can be done except to find a copy of the file that is not locked. This problem has not been reported on Linux versions of our software as far as I know."



      • #4

        BC4 uses a command line utility called PDF2Text, while Adobe itself is the official PDF application. This sounds like a bug with Nuance Power PDF if Adobe, PDF2Text, and Microsoft Printing all see the Security set to prevent extraction. It looks like Nuance is the only utility reporting that extraction is allowed.

        To clarify, if you open the pdf file in Adobe Acrobat or Reader, and use the File menu -> Save As (or Export) -> as Plain Text (.txt), what error message is shown? If the official Adobe application blocks this, then I would expect most other applications to also fail, including BC4. If this process works, then we should get a copy of the problem PDFs emailed in to support along with your support export (Help menu -> Support; Export) so we can test directly against them. If you can reply to your original email thread, and include a link to this forum thread, it would allow us to join these reports and better track the issue (what we've done and what we can still try to do).
        Aaron P Scooter Software


        • #5
          I should add that when contacting Nuance, they are likely going to be concerned if you can show the behavior issue in an official Adobe product, so you'll still want to run the above Save As Text test in Adobe Reader to see if extraction is allowed and what error is presented.
          Aaron P Scooter Software


          • #6
            Is there another Third Party extraction tool that ignores the Page Extraction setting? I am finding that MOST PDF files do not allow page extraction, and it's a real nuisance. I will need to turn off the PDF format recognition and just compare those files as if they are binary, which only tells me if they're equal, and offers no intelligence about what the differences are.


            • #7

              I'm not familiar with any specific software to circumvent the PDF security settings. In general, it would be better to contact the provider generating the PDFs with the security settings enabled, as that is not default behavior and they are enabling it on purpose.
              Aaron P Scooter Software


              • #8
                Hi Aaron,

                Thanks for your reply. I think that the page extraction prohibition is a default in most PDF generators, and most people who create PDF files have no clue what the settings are, and never even think of adjusting them. In fact, I am the one who produced some of the PDF files that I am having problems with, and I have no idea how to correct them. A few of these files were produced by scanning hundreds of pages of text, and the individual page scans are now long gone, as well as the paper that produced them. I don't even know what software I used to spool all these pages into PDF form, it was so long ago... Anyway, given that there are hundreds of thousands of people spewing PDF files today, I think it's impossible to get them all to learn and apply a best practice for making files that can be compared via BC! :-) That being said, I still love your software. Adobe is at fault for claiming that they are open, but never has a company been so closed! I mean, come on, "page extraction" as a disallowed access right while printing and viewing are still allowed? Totally ridiculous. I think that Adobe does silly things simply because their SW engineers have their management supine over a barrel, and because "They Can." Sorry about my ranting and raving.


                • #9
                  FYI, since printing is still allowed from a PDF file that disallows page extraction, I tried printing to another PDF file using "PrimoPDF". I guess you can think of this as a PDF "proxy" when one PDF file feeds into another. Unfortunately, it copied the settings over from the previous file, so the file printed as PDF still had page extraction disabled. Not only is page extraction a stupid attribute, but it's one that's communicable. I think I need to contact the CDC.


                  • #10
                    Thanks, and you are right, I was mixing up Page Extraction with Security Method. Page Extraction shouldn't block BC4's Text Compare from extracting and viewing the text of the PDF. What error are you seeing, and are there any other Properties set on the PDF (check Document Properties from within Adobe)?
                    Aaron P Scooter Software
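
For anyone who wants a quick scripted look at whether a PDF has a security method applied at all, here is a minimal sketch. It assumes that an encrypted PDF carries an `/Encrypt` entry that shows up as plain bytes in the file, which is only a rough heuristic: it can give false positives, it says nothing about which individual permissions are set, and it is not how Beyond Compare or Adobe make the decision. The function names `looks_encrypted` and `file_looks_encrypted` are made up for this example.

```python
from pathlib import Path


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Rough heuristic: True if the data looks like a PDF with an /Encrypt entry.

    Assumption: the /Encrypt key is visible as plain bytes. This does not
    parse the PDF structure, so treat the result as a hint only.
    """
    # The %PDF- header is normally at the very start; allow some leading bytes.
    if b"%PDF-" not in pdf_bytes[:1024]:
        return False
    return b"/Encrypt" in pdf_bytes


def file_looks_encrypted(path: str) -> bool:
    """Convenience wrapper (hypothetical helper) that reads the file from disk."""
    return looks_encrypted(Path(path).read_bytes())
```

Calling `file_looks_encrypted(r"c:\temp\sourcefile.pdf")` on a problem file gives a first hint before opening Document Properties in Adobe; the Adobe dialog remains the better source for the actual settings.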


                    • #11

                      PdfToText.exe is the conversion program we're using in the background. You can run the command line directly to test:
                      pdftotext.exe -enc UTF-8 -table -nopgbrk "c:\temp\sourcefile.pdf" "c:\temp\output.txt"

                      Are there errors on the command line?

                      Also, from the screenshot, it looks like those PDFs (at least) start with a picture. The Text Compare can only compare text data within the PDF files (selectable and exportable text). You can also use Adobe's File menu -> Save As Text to see which text data is in the PDF. If the files are entirely scanned pictures (of text), then the output is actually empty, which would return a conversion error.
                      Aaron P Scooter Software
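
The command-line test above can also be scripted to tell a real converter failure apart from an empty result (the scanned-pictures case). This is a minimal sketch, assuming `pdftotext` is on the PATH and accepts the same flags as the command quoted above; the helper names `classify_conversion` and `convert_pdf` are invented for this example and are not part of Beyond Compare.

```python
import subprocess
from pathlib import Path


def classify_conversion(returncode: int, text_length: int) -> str:
    """Map a converter exit code and the extracted text length to a status."""
    if returncode != 0:
        return "error"   # the converter itself reported a failure
    if text_length == 0:
        return "empty"   # ran fine, but no extractable text (e.g. scanned pages)
    return "ok"


def convert_pdf(pdf_path: str, txt_path: str, exe: str = "pdftotext"):
    """Run the converter and return (status, stderr_text).

    Assumption: `exe` takes the flags shown in the forum post.
    """
    cmd = [exe, "-enc", "UTF-8", "-table", "-nopgbrk", pdf_path, txt_path]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except OSError as exc:
        # Typically the executable was not found or could not be started.
        return "error", str(exc)

    out = Path(txt_path)
    text = out.read_text(encoding="utf-8", errors="replace") if out.exists() else ""
    # Whitespace-only output is treated the same as no output.
    status = classify_conversion(result.returncode, len(text.strip()))
    return status, result.stderr.strip()
```

For example, `convert_pdf(r"c:\temp\sourcefile.pdf", r"c:\temp\output.txt")` returns a status of "error", "empty", or "ok" along with anything the tool wrote to stderr, which is the information worth including in a support email.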


                      • #12
                        Thanks. It's a little tricky to handle zero-size output without an error code, since a file could legitimately appear empty, but improving this feedback is something on our wishlist.
                        Aaron P Scooter Software