OCR Plugin "pdf to html.xml" fails when pointed to a local file, versus a web url

Hi, I am on:
Version Details
Version 2.4.0
WorkFusion Intelligent Automation Cloud Enterprise
OS: Windows 10, v.10.0, x86_64 / win32
Java version: 1.8.0_121

I am trying to use the custom script/plugin that is referenced at the very bottom of this page:

KB: OCR

I think this is an incredible tool. It makes extracting text from a large document much easier than using OCR action by action, or parsing in general, because I can pull out what I need with clean and powerful XPaths.

Anyway, I can get this script to work when I point it to a web url containing a .pdf file, example in screenshot:

But when I try to point it to a local .pdf file (the same file, just stored locally), it fails, example in screenshot:

I am wondering whether I am doing something wrong here, or whether the script just does not account for local files (if it did, that would be a game changer!)

Here is a direct link to the .pdf that I have been testing with

Here is a .zip of my WF Project folder, containing the pdf to html.xml file in the /configs folder.
WF Custom Bot Tasks.zip (6.8 KB)

@Cmoeller you cannot use a path to a local file in the OCR plugin. It needs an http/https URL.
As a workaround, you can move/copy the file to the Minio folder (data/public) and open it using a URL of the form http://localhost:15110/public/filename

@ashapkina ah, bummer. I will use that approach then, thanks! Might even spin up a local webserver for our own internal use to separate it from the WF install.

Is there any chance the OCR plugin will be expanded in the future to allow local files? And is there an obvious technical reason it wasn’t designed for that in the first place? (Just looking to learn more about it, if possible.) Thanks again!

@ashapkina

So, I am attempting to place the file I want to use the ocr-plugin on at:

C:\RPAExpress\minio\data\public\orderack.pdf

and then reference it in the script as: http://localhost:15110/public/orderack.pdf

but it is failing in the script run, and I am also unable to navigate to the document when entering that url in my web browser. I’m betting I’m making a simple mistake somewhere here…

Attached are some screenshots detailing what I am encountering:

Can you open the URL manually?

I cannot. When I enter "http://localhost:15110/public/orderack.pdf" into my web browser, I receive a “this page isn’t working - localhost didn’t send any data” error, as detailed in the screenshot above.
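For anyone debugging the same thing, here is a rough sketch of checking the URL outside the browser and the bot. It is plain Python standard library, not anything from WorkFusion, and `probe_url` is just a name I made up for illustration. It distinguishes an HTTP error status (the server answered, but not with the file) from no usable response at all, which is roughly what the browser message above describes.

```python
import http.client
import urllib.error
import urllib.request


def probe_url(url, timeout=5.0):
    """Try an HTTP GET and return (ok, detail).

    Hypothetical helper for illustration only; it is not part of
    WorkFusion or the OCR plugin.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            content_type = resp.headers.get("Content-Type")
            return True, f"HTTP {resp.status}, Content-Type: {content_type}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (403, 404, ...).
        return False, f"HTTP {exc.code} {exc.reason}"
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        # Connection refused, timeout, or the connection was closed
        # without a valid HTTP response.
        return False, f"no usable response: {exc!r}"
```

Calling `probe_url("http://localhost:15110/public/orderack.pdf")` on the machine where the bot runs should show whether anything is answering on that port, and with what status.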

I ended up installing Windows IIS on my machine and using it to host the file at “localhost/folder/filename.pdf”, and that has worked for me.

For reference: IIS
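If installing IIS is not an option, the same idea (serving a local folder over http so the plugin gets a URL) can be sketched with Python's built-in `http.server`. This is only a minimal sketch under my own assumptions: `make_server` is a hypothetical helper name, the folder and port are placeholders, and I have not tried this particular setup with the OCR plugin.

```python
import functools
import http.server


def make_server(directory, port=8000, host="127.0.0.1"):
    """Build an HTTP server that serves the files in `directory`.

    Hypothetical helper for illustration. Binding to 127.0.0.1 keeps
    the files reachable from this machine only.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer((host, port), handler)
```

Running `make_server(r"C:\pdfs").serve_forever()` (the folder name is a placeholder) would then expose a file in that folder at a URL such as http://127.0.0.1:8000/filename.pdf until the process is stopped.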


Glad it worked, but I am not really sure why it couldn’t open the file from File Storage.
Is the file located on the same pc (server) where the bot ran? Can you see it in the Public folder when opening the File Storage?

@ashapkina

The .pdf file is stored on the same PC where the bot ran. My laptop has WF installed on it, and I placed the file in my C:\RPAExpress\minio\data\public folder, which appears to be a subfolder of the WF application installed on my computer.

How do I check what is in my File Storage? I haven’t used that feature before.

Hello @Cmoeller.

How did you put your file into File Storage? Did you use this guide - https://kb.workfusion.com/display/RPAe/File+Storage? Can you check whether you set permissions on the bucket and the file?

Hello @Cmoeller,

Can you please advise whether this question is still relevant?

@Lera sorry for the late reply - we have found an alternative solution in the meantime.

I did not use the File Storage guide. I was not aware of it prior to this conversation, so I had just copied and pasted the file directly into that location under the WF install. Going forward I will use that guide. But you can consider this ticket closed for now. Thank you for your help.
