OCR in RPA Express

Hi @ashapkina,

I am trying out the OCR functionality in RPA Express and have two questions.

  1. How can I OCR images with Swedish text?

  2. How do I run the invoice through RPA Express and get output like in the screenshot below? (I tried a String variable, but a lot of information is lost compared to the original image, and the text is not correctly formatted in the output …)

Hi, @aleksandrs.bogdanovs!

You can use the OCR Plugin in this case. KB: https://kb.workfusion.com/display/WF/OCR+Plugin

Set the attribute ‘Language’ to ‘Swedish’, which ABBYY supports.

The OCR output for your document varies with several parameters, such as document quality (dpi) and noise. A good-quality document for OCR is 300 dpi. If you think your document is clear enough, try modifying other attributes such as customRegions and allowedRegionTypes. Do a few iterations and I believe the result should improve. Also try setting the export type to ‘xml’ and testing the output in a manual task.

Best,
Amol
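
To check the 300 dpi point before running OCR, here is a minimal Python sketch that reads the resolution stored in a PNG file. It is not part of RPA Express or the OCR plugin; the function name `png_dpi` is made up for illustration, and it only handles PNG files that carry a `pHYs` chunk (many screenshots and web images do not, in which case it returns `None`).

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dpi(data: bytes):
    """Return (x_dpi, y_dpi) from a PNG's pHYs chunk, or None if unavailable.

    Hypothetical helper for illustration only.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte length, 4-byte type, data, 4-byte CRC.
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"pHYs" and len(body) == 9:
            x_ppm, y_ppm, unit = struct.unpack(">IIB", body)
            if unit != 1:
                # Unit 0 means "unknown": only the aspect ratio is defined.
                return None
            # Unit 1 means pixels per metre; one inch is 0.0254 m.
            return round(x_ppm * 0.0254), round(y_ppm * 0.0254)
        if chunk_type in (b"IDAT", b"IEND"):
            break  # pHYs, if present, comes before the image data
        pos += 12 + length
    return None
```

For example, `png_dpi(open("invoice.png", "rb").read())` (with a file name of your own) returns a pair such as `(300, 300)` when the resolution is recorded. A value well below 300 suggests rescanning or re-exporting the document before tuning the plugin attributes.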


Thanks for the info! Can I use this in RPA Express 2.0.4?

@aleksandrs.bogdanovs Yes, you can use the OCR plugin with RPA Express.

However, there is a bug in version 2.0.4: the plugin doesn’t work in WF Studio, only in Control Tower. See the thread ‘OCR plugin not working in RPA Express 2.0 WF Studio’.

We have already fixed it, and the fix will be included in the next release.

But you can use it in Control Tower.


Hi @ashapkina,

I followed your suggestion to try OCR via Control Tower, but I am getting an error:

Step name ‘ocr’ has failed. Reason: 'IO error during HTTP execution for URL

Here is my bot task:

    <?xml version="1.0" encoding="UTF-8"?>
    <config xmlns="http://web-harvest.sourceforge.net/schema/1.0/config" scriptlang="groovy">
      <var-def name="ocrResult">
        <ocr>
          <ocr-image>
            <http url="https://i2.wp.com/invoice.2go.com/wp-content/uploads/2016/09/Sample-Invoice-Template-image.png"/>
          </ocr-image>
        </ocr>
      </var-def>
      <script return="ocrResult.get(0).wrappedObject.results['txt']"/>
    </config>

Please help me figure out whether the OCR functionality is at a good enough level to propose it to our power users…

@aleksandrs.bogdanovs could you share the whole text of the error? You can export it to Excel from the error log.

Hi @ashapkina,
Here you go (it is a different link, but the same problem…).

org.webharvest.exception.HttpException: IO error during HTTP execution for URL: https://blog.magestore.com/wp-content/uploads/2018/02/Ideal-POS-receipt-configure.png
at org.webharvest.runtime.web.HttpClientManager.execute(HttpClientManager.java:224)
at org.webharvest.runtime.processors.HttpProcessor.execute(HttpProcessor.java:106)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.BodyProcessor.execute(BodyProcessor.java:27)
at org.webharvest.runtime.processors.WebHarvestPlugin.executeBody(WebHarvestPlugin.java:246)
at com.freedomoss.crowdcontrol.webharvest.plugin.ocr.OcrImagePlugin.executePlugin(OcrImagePlugin.java:26)
at org.webharvest.runtime.processors.WebHarvestPlugin.execute(WebHarvestPlugin.java:125)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.BodyProcessor.execute(BodyProcessor.java:27)
at org.webharvest.runtime.processors.WebHarvestPlugin.executeBody(WebHarvestPlugin.java:246)
at com.freedomoss.crowdcontrol.webharvest.plugin.ocr.OcrPlugin.executePlugin(OcrPlugin.java:104)
at org.webharvest.runtime.processors.WebHarvestPlugin.execute(WebHarvestPlugin.java:125)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.BodyProcessor.execute(BodyProcessor.java:27)
at org.webharvest.runtime.processors.VarDefProcessor.execute(VarDefProcessor.java:59)
at com.freedomoss.crowdcontrol.webharvest.processors.VarDefProcessorValidated.execute(VarDefProcessorValidated.java:28)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.Scraper.execute(Scraper.java:169)
at org.webharvest.runtime.Scraper.execute(Scraper.java:182)
at com.freedomoss.crowdcontrol.webharvest.executor.LocalWebharvestTaskExecutor.executeWebHarvestTask(LocalWebharvestTaskExecutor.java:187)
at com.freedomoss.crowdcontrol.webharvest.executor.LocalWebharvestTaskExecutor.executeWebHarvestTask(LocalWebharvestTaskExecutor.java:97)
at com.workfusion.service.machine.BotRecordExecutionService.process(BotRecordExecutionService.java:168)
at com.workfusion.service.machine.BotRecordExecutionService.process(BotRecordExecutionService.java:139)
at com.workfusion.service.machine.BotRecordExecutionService.processSubmissionWithAllocationLogger(BotRecordExecutionService.java:118)
at com.workfusion.service.machine.BotRecordExecutionService.lambda$processRecord$0(BotRecordExecutionService.java:97)
at com.workfusion.utils.thread.NamedThreadTemplate.executeWithNamedThread(NamedThreadTemplate.java:10)
at com.workfusion.service.machine.BotRecordExecutionService.processRecord(BotRecordExecutionService.java:97)
at com.workfusion.service.machine.thread.RecordProcessThread.run(RecordProcessThread.java:28)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection timed out: connect

Looks like there is a problem opening the image you are trying to OCR: https://blog.magestore.com/wp-content/uploads/2018/02/Ideal-POS-receipt-configure.png

Can you open it manually in the browser?

@ashapkina

Yes, I can open it in the browser.

Do you open it in your default browser?
This error usually occurs when firewall or security settings block access to the website.

Can you try to upload the image to the File storage in RPA Express and use that link?
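
To tell a blocked connection apart from a broken link, the request can be repeated outside the bot from the same machine. The Python sketch below is only a diagnostic aid, not part of RPA Express. The function name `check_url` and its `direct` and `timeout` parameters are made up for illustration. With `direct=True` it ignores any system proxy settings; one possible explanation for "works in the browser, times out in the bot" (not confirmed in this thread) is that the browser goes through a proxy while the bot connects directly.

```python
import socket
import urllib.error
import urllib.request


def check_url(url: str, timeout: float = 10.0, direct: bool = True) -> str:
    """Fetch `url` and return a short diagnosis string (hypothetical helper).

    direct=True bypasses proxy settings from the environment; direct=False
    uses them, which is closer to what a browser typically does.
    """
    if direct:
        # An empty ProxyHandler mapping disables auto-detected proxies.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    else:
        opener = urllib.request.build_opener()
    try:
        with opener.open(url, timeout=timeout) as resp:
            return f"ok: HTTP {resp.status}, {resp.headers.get('Content-Type')}"
    except urllib.error.HTTPError as e:
        # The server answered, so the network path works.
        return f"http error: {e.code}"
    except urllib.error.URLError as e:
        if isinstance(e.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        if isinstance(e.reason, ConnectionRefusedError):
            return "connection refused"
        return f"network error: {e.reason}"
    except (socket.timeout, TimeoutError):
        # A timeout while waiting for the response is raised unwrapped.
        return "timeout"
    except OSError as e:
        return f"network error: {e}"
```

Calling `check_url(image_url)` and then `check_url(image_url, direct=False)` shows whether the result depends on the proxy settings. A "timeout" on the direct call only would point at the firewall or proxy rather than at the image link, and raising `timeout` helps rule out a merely slow server.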

Hi,

I have uploaded a couple of images to the local storage, but I cannot access them via the copied link (I put them in the public folder in my local Minio storage).

There is no error message, just a small grey square… Can this issue be related to the previous one?

@aleksandrs.bogdanovs What browser are you trying to open it in?
Can you open it manually, not via the bot?

Any browser, trying to open it manually.

How does the copied link look?
It should start with http://localhost:15110/public/ and then have the name of the file.
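
As a quick sanity check of the link format described above, here is a small Python sketch. The helper names `public_link` and `looks_like_public_link` are made up for illustration, and percent-encoding the file name (for example, spaces as `%20`) is an assumption based on general URL rules, not something confirmed for the RPA Express file storage.

```python
from urllib.parse import quote

# Prefix quoted in the thread for the local public bucket.
STORAGE_BASE = "http://localhost:15110/public/"


def public_link(filename: str) -> str:
    """Build the expected link for a file in the public bucket (hypothetical helper)."""
    # Assumption: characters such as spaces must be percent-encoded.
    return STORAGE_BASE + quote(filename)


def looks_like_public_link(url: str) -> bool:
    """True if `url` has the expected prefix followed by a file name."""
    return url.startswith(STORAGE_BASE) and len(url) > len(STORAGE_BASE)
```

For instance, `public_link("my invoice.png")` gives `http://localhost:15110/public/my%20invoice.png`, which can be compared with the link copied from the storage UI.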

Hi @ashapkina,

Yes, it has exactly that format, but when I try to access the file I still see this:

Could you please check whether the public bucket has a policy that allows Read and Write access to everyone (*)?

Hi @ashapkina,

Checked, I have the settings exactly as in your picture.

Hmm, are you trying to open the link from your office network?
The only reason I can think of right now is internal security limitations that don’t allow you to access the links.

Hi,
Yes, I am trying to open the file from the same computer that Minio is running on…

Do you have any suggestion on how to proceed? (am I the only one who has this problem?)

@aleksandrs.bogdanovs Can you try to open the URLs outside the office network, on your home PC for example, and see if they can be opened there?
Also try setting a longer timeout for website loading and opening the images in other browsers.