OCR Process timeout

When we pass different types of files (.msg, .mht, .pdf, .txt, and image files) to OCR, most of them are processed, but for a few we get the error below. Could you please suggest a resolution?

Description:
Step name ‘Process Emails and attachments using OCR’ has failed. Reason: ‘OCR task has failed : taskId=5bae4e870d7fe82809ae1701 status=ProcessingFailed tries=35 message=A process timed out’
Full Description:
"com.freedomoss.crowdcontrol.webharvest.plugin.ocr.impl.OcrException: OCR task has failed : taskId=5bae4e870d7fe82809ae1701 status=ProcessingFailed tries=35 message=A process timed out
at com.freedomoss.crowdcontrol.webharvest.plugin.ocr.impl.OcrClient$Waiting.fail(OcrClient.java:128)
at com.freedomoss.crowdcontrol.webharvest.plugin.ocr.impl.OcrClient.waitCompletion(OcrClient.java:112)
at com.freedomoss.crowdcontrol.webharvest.plugin.ocr.OcrPlugin.downloadResults(OcrPlugin.java:225)
at com.freedomoss.crowdcontrol.webharvest.plugin.ocr.OcrPlugin.processImage(OcrPlugin.java:181)
at com.freedomoss.crowdcontrol.webharvest.plugin.ocr.OcrPlugin.executePlugin(OcrPlugin.java:111)
at org.webharvest.runtime.processors.WebHarvestPlugin.execute(WebHarvestPlugin.java:125)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.BodyProcessor.execute(BodyProcessor.java:27)
at org.webharvest.runtime.processors.VarDefProcessor.execute(VarDefProcessor.java:59)
at com.freedomoss.crowdcontrol.webharvest.processors.VarDefProcessorValidated.execute(VarDefProcessorValidated.java:28)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.BodyProcessor.execute(BodyProcessor.java:27)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.CaseProcessor.execute(CaseProcessor.java:77)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.BodyProcessor.execute(BodyProcessor.java:27)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.CaseProcessor.execute(CaseProcessor.java:68)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.BodyProcessor.execute(BodyProcessor.java:27)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.LoopProcessor.execute(LoopProcessor.java:116)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.Scraper.execute(Scraper.java:169)
at com.freedomoss.crowdcontrol.webharvest.plugin.include.IncludeConfigPlugin.executePlugin(IncludeConfigPlugin.java:32)
at org.webharvest.runtime.processors.WebHarvestPlugin.execute(WebHarvestPlugin.java:125)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.BodyProcessor.execute(BodyProcessor.java:27)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.CaseProcessor.execute(CaseProcessor.java:68)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.Scraper.execute(Scraper.java:169)
at org.webharvest.runtime.Scraper.execute(Scraper.java:182)
at com.freedomoss.crowdcontrol.webharvest.executor.LocalWebharvestTaskExecutor.executeWebHarvestTask(LocalWebharvestTaskExecutor.java:187)
at com.freedomoss.crowdcontrol.webharvest.executor.LocalWebharvestTaskExecutor.executeWebHarvestTask(LocalWebharvestTaskExecutor.java:97)
at com.workfusion.service.machine.BotRecordExecutionService.process(BotRecordExecutionService.java:168)
at com.workfusion.service.machine.BotRecordExecutionService.process(BotRecordExecutionService.java:134)
at com.workfusion.service.machine.BotRecordExecutionService.processSubmissionWithAllocationLogger(BotRecordExecutionService.java:118)
at com.workfusion.service.machine.BotRecordExecutionService.lambda$processRecord$0(BotRecordExecutionService.java:97)
at com.workfusion.utils.thread.NamedThreadTemplate.executeWithNamedThread(NamedThreadTemplate.java:10)
at com.workfusion.service.machine.BotRecordExecutionService.processRecord(BotRecordExecutionService.java:97)
at com.workfusion.service.machine.thread.RecordProcessThread.run(RecordProcessThread.java:28)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"
Thanks & Regards,
Sambasivareddy

Hi @sambasivareddy, which tool do you use, SPA or RPA Express? For which files does OCR fail?

When we pass .msg, .mht, .pdf, and image files to the OCR step, most are converted successfully, but a few are not processed and fail with the error below. Could you please let me know the resolution for this error?
Description:
Step name ‘Process Emails and attachments using OCR’ has failed. Reason: ‘Content is not provided’.
Full Description:
"org.webharvest.exception.PluginException: Content is not provided
at com.freedomoss.crowdcontrol.webharvest.plugin.s3.S3PutAbstractPlugin.executePlugin(S3PutAbstractPlugin.java:73)
at org.webharvest.runtime.processors.WebHarvestPlugin.execute(WebHarvestPlugin.java:125)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.BodyProcessor.execute(BodyProcessor.java:27)
at org.webharvest.runtime.processors.WebHarvestPlugin.executeBody(WebHarvestPlugin.java:246)
at com.freedomoss.crowdcontrol.webharvest.plugin.s3.S3Plugin.executePlugin(S3Plugin.java:64)
at org.webharvest.runtime.processors.WebHarvestPlugin.execute(WebHarvestPlugin.java:125)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.BodyProcessor.execute(BodyProcessor.java:27)
at org.webharvest.runtime.processors.VarDefProcessor.execute(VarDefProcessor.java:59)
at com.freedomoss.crowdcontrol.webharvest.processors.VarDefProcessorValidated.execute(VarDefProcessorValidated.java:28)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.BodyProcessor.execute(BodyProcessor.java:27)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.CaseProcessor.execute(CaseProcessor.java:77)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.BodyProcessor.execute(BodyProcessor.java:27)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.CaseProcessor.execute(CaseProcessor.java:68)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.BodyProcessor.execute(BodyProcessor.java:27)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.LoopProcessor.execute(LoopProcessor.java:116)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.BodyProcessor.execute(BodyProcessor.java:27)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.CaseProcessor.execute(CaseProcessor.java:68)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.Scraper.execute(Scraper.java:169)
at com.freedomoss.crowdcontrol.webharvest.plugin.include.IncludeConfigPlugin.executePlugin(IncludeConfigPlugin.java:32)
at org.webharvest.runtime.processors.WebHarvestPlugin.execute(WebHarvestPlugin.java:125)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.BodyProcessor.execute(BodyProcessor.java:27)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.processors.CaseProcessor.execute(CaseProcessor.java:68)
at org.webharvest.runtime.processors.BaseProcessor.run(BaseProcessor.java:127)
at org.webharvest.runtime.Scraper.execute(Scraper.java:169)
at org.webharvest.runtime.Scraper.execute(Scraper.java:182)
at com.freedomoss.crowdcontrol.webharvest.executor.LocalWebharvestTaskExecutor.executeWebHarvestTask(LocalWebharvestTaskExecutor.java:187)
at com.freedomoss.crowdcontrol.webharvest.executor.LocalWebharvestTaskExecutor.executeWebHarvestTask(LocalWebharvestTaskExecutor.java:97)
at com.workfusion.service.machine.BotRecordExecutionService.process(BotRecordExecutionService.java:168)
at com.workfusion.service.machine.BotRecordExecutionService.process(BotRecordExecutionService.java:134)
at com.workfusion.service.machine.BotRecordExecutionService.processSubmissionWithAllocationLogger(BotRecordExecutionService.java:118)
at com.workfusion.service.machine.BotRecordExecutionService.lambda$processRecord$0(BotRecordExecutionService.java:97)
at com.workfusion.utils.thread.NamedThreadTemplate.executeWithNamedThread(NamedThreadTemplate.java:10)
at com.workfusion.service.machine.BotRecordExecutionService.processRecord(BotRecordExecutionService.java:97)
at com.workfusion.service.machine.thread.RecordProcessThread.run(RecordProcessThread.java:28)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"

@sambasivareddy please provide more details: which tool you are using, and for which file types the OCR actions fail.

You can find the list of supported file formats here: https://www.abbyy.com/en-us/support/frengine/11linux/info/formats/

Hi Ashapkina,

I am using SPA, and the OCR process timeout error occurs for .mht files.

Thanks & Regards,
Sambasivareddy

@sambasivareddy .mht files are not supported as input. Try converting them to one of the supported file formats before using OCR.
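An .mht file is a MIME container that usually carries the page as a text/html part, so in some cases the text can be read out directly instead of being sent to OCR. The sketch below is only an illustration of that idea using the Python standard library; it is not part of SPA or the OCR plugin, the name `mht_to_text` is hypothetical, and it assumes the file really contains a text/html part.

```python
import email
from email import policy
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text and skips the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def mht_to_text(raw_bytes):
    """Return the visible text of the first text/html part of an .mht file.

    Hypothetical helper: assumes the .mht data is a well-formed MIME message.
    """
    message = email.message_from_bytes(raw_bytes, policy=policy.default)
    for part in message.walk():
        if part.get_content_type() == "text/html":
            html = part.get_content()  # decoded to str by the default policy
            collector = _TextCollector()
            collector.feed(html)
            collector.close()
            return "\n".join(collector.chunks)
    raise ValueError("no text/html part found in the .mht data")
```

If the OCR step is still required (for example, for embedded images), the extracted HTML would first have to be rendered to a supported format such as PDF with a separate tool, which this sketch does not cover.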

Hi ashapkina,
I am using SPA. While processing a PDF file through OCR, I get a timeout exception. The PDF has a large number of pages (150); it is processed successfully when the PDF has fewer pages (<30). Is there any way to limit the number of pages that OCR processes?

Hi @kamalamma.mukkara, do you want the OCR to process a part of the document instead of the whole PDF? Or do you need to process the whole document without timing out, i.e. increase the timeout for OCR?

Hi ashapkina,
I need the OCR to process a part of the document instead of the whole PDF.

In OCR in SPA, you can use the pages parameter:

pages (array[string]): specifies the range of the pages which have to be processed (example: pages=1,2,3,10-15)
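To make the range syntax concrete, here is a small Python sketch that expands a value such as 1,2,3,10-15 into individual page numbers, and that splits a long document into smaller range strings. Both function names are hypothetical and this is not code from SPA; it assumes pages are numbered from 1, as in the example above, and that requests for separate page ranges can be submitted one after another.

```python
def expand_pages(spec):
    """Expand a spec such as "1,2,3,10-15" into a sorted list of page numbers."""
    pages = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas
        if "-" in token:
            start_text, end_text = token.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {token!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(token))
    # Assumption: page numbering is 1-based.
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)


def batch_specs(total_pages, batch_size):
    """Split pages 1..total_pages into range specs of at most batch_size pages."""
    if total_pages < 1 or batch_size < 1:
        raise ValueError("total_pages and batch_size must be positive")
    specs = []
    for start in range(1, total_pages + 1, batch_size):
        end = min(start + batch_size - 1, total_pages)
        specs.append(str(start) if start == end else f"{start}-{end}")
    return specs


print(expand_pages("1,2,3,10-15"))  # [1, 2, 3, 10, 11, 12, 13, 14, 15]
print(batch_specs(150, 25))  # ['1-25', '26-50', '51-75', '76-100', '101-125', '126-150']
```

For the 150-page PDF mentioned above, batches of 25 pages stay under the roughly 30-page size that was reported to process without a timeout.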

Hi ashapkina,
I am using the OCR plugin, and the “pages” attribute is not working for it.

Hi azinchuk,
We are using PDF documents for OCR, and I get this error:
plugin: fail: OCR task has failed : taskId=2 status=ProcessingFailed tries=1 message=The image file format has not been recognized:
Also, the MIME type of the PDF is recognized as application/xml.
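One thing worth ruling out when a PDF is reported as application/xml is that the bytes handed to OCR are not the PDF itself (for example, an XML error response saved in its place). The thread does not confirm that this is the cause; the Python sketch below is only a quick diagnostic, and the name `sniff_format` is hypothetical. It relies on the fact that a PDF file begins with the marker %PDF-.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_format(data):
    """Guess whether raw bytes look like a PDF or like XML/HTML markup."""
    head = data[:1024]
    # A PDF header is "%PDF-" followed by the version, e.g. "%PDF-1.4".
    # Some files carry a little junk before it, so search the first 1 KiB.
    if b"%PDF-" in head:
        return "pdf"
    # Strip an optional UTF-8 byte order mark and leading whitespace.
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    if head.lstrip().startswith(b"<"):
        return "markup"
    return "unknown"
```

Running this on the content just before it is passed to the OCR plugin would show whether the input is really a PDF or some XML document.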