OCR - long wait time


#1

When I run an OCR task, it almost always takes at least 12 seconds to process one request. It looks to me like there are two 5-second delays before the results are fetched.
I need to process many small text fields, so I need this to be as fast as possible.

Is there any way to shorten wait time?


#2

Please post your PC specs and the images you are trying to recognize.


#3

As you can see, there is a 5-second delay in asynchronous result fetching.

My logs:
2017-07-31 23:50:48 [http-nio-15580-exec-7] INFO c.f.w.w.s.j.JwtAuthenticationFilter - validated token for tenantId=tenantId
2017-07-31 23:50:48 [http-nio-15580-exec-7] INFO c.f.w.w.s.j.JwtAuthenticationFilter - validated token for tenantId=tenantId
2017-07-31 23:50:48 [http-nio-15580-exec-7] INFO c.f.w.w.s.j.JwtAuthenticationFilter - validated token for tenantId=tenantId
2017-07-31 23:50:48 [http-nio-15580-exec-7] DEBUG com.wf.ocr.service.AbbyApiService - processImage size=455
2017-07-31 23:50:48 [http-nio-15580-exec-7] DEBUG com.wf.ocr.service.TaskDirectory - Added file ‘C:\Users\PAWEL~1.MIS\AppData\Local\Temp\abbyy_task-84280833265348258304\input-image.bin3005097534651841496.tmp’ for processing
2017-07-31 23:50:48 [http-nio-15580-exec-7] INFO com.wf.ocr.service.OcrTaskFacade - Start processImages images for process=1 TaskParameters{correctSkew=true, writeRecognitionVariants=false, profile=‘documentConversion’, exportFormat=‘txt’, language=‘English’, correctOrientation=true, engine=‘abbyy’, customRegions=’’, skipPreprocessing=false, dictionary=null, originalText=null, useWordsFromDictionaryOnly=false, alphabetExtension=, engineDLLFolder=FineReaderEngine\Bin64, engineDataFolder=, engineTempFolder=, engineLogFolder=logs, activeLicenseLog=active-license.log, license=WFST-Bo+qoNa/cXsqjv33K6SB23kA41QStxAmDuzF6ESLp/s4JZq9AnPgfg==, licensePath=Licenses\SWAO-1121-0006-1977-2657-6278.ABBYY.LocalLicense, licensePassword=***, useOnlyCustomRegions=false, useDefaultPattern=false, removeGarbageSize=null, removeNoiseModels=null, allowedRegionTypes=[], debug=false, customAlphabet=null, skipTextLayerExtraction=false, discardColorImage=false, enhanceLocalContrast=false, priority=0, timeout=10800}
2017-07-31 23:50:53 [http-nio-15580-exec-7] DEBUG com.wf.ocr.service.BaseProcessFacade - 1 file(s) copied.
Warning. Invalid resolution 0 dpi. Using 70 instead.
1 file(s) copied.

2017-07-31 23:50:53 [http-nio-15580-exec-7] DEBUG com.wf.ocr.service.BaseProcessFacade - Loading result from C:\Users\PAWEL~1.MIS\AppData\Local\Temp\abbyy_task-84280833265348258304\input-6654267596470287435.json_result
2017-07-31 23:50:53 [http-nio-15580-exec-7] DEBUG com.wf.ocr.service.BaseProcessFacade - Result OcrProcessResult [ready=true, message=OK, result=[c:\progs\RPAExpress\OCR\img_data.txt]] for TaskConfig [taskParameters=TaskParameters{correctSkew=true, writeRecognitionVariants=false, profile=‘documentConversion’, exportFormat=‘txt’, language=‘English’, correctOrientation=true, engine=‘abbyy’, customRegions=’’, skipPreprocessing=false, dictionary=null, originalText=null, useWordsFromDictionaryOnly=false, alphabetExtension=, engineDLLFolder=FineReaderEngine\Bin64, engineDataFolder=, engineTempFolder=, engineLogFolder=logs, activeLicenseLog=active-license.log, license=WFST-Bo+qoNa/cXsqjv33K6SB23kA41QStxAmDuzF6ESLp/s4JZq9AnPgfg==, licensePath=Licenses\SWAO-1121-0006-1977-2657-6278.ABBYY.LocalLicense, licensePassword=***, useOnlyCustomRegions=false, useDefaultPattern=false, removeGarbageSize=null, removeNoiseModels=null, allowedRegionTypes=[], debug=false, customAlphabet=null, skipTextLayerExtraction=false, discardColorImage=false, enhanceLocalContrast=false, priority=0, timeout=10800}, images=[C:\Users\PAWEL~1.MIS\AppData\Local\Temp\abbyy_task-84280833265348258304\input-image.bin3005097534651841496.tmp], patternPath=null, taskDirectory=C:\Users\PAWEL~1.MIS\AppData\Local\Temp\abbyy_task-84280833265348258304]
2017-07-31 23:50:53 [http-nio-15580-exec-7] DEBUG com.wf.ocr.service.AbbyApiService - Loaded result fileName=c:\progs\RPAExpress\OCR\img_data.txt size=7
2017-07-31 23:50:53 [http-nio-15580-exec-7] INFO com.wf.ocr.service.AbbyApiService - Finished processImageSynchronously taskId=8 result=OcrProcessResult [ready=true, message=OK, result=[c:\progs\RPAExpress\OCR\img_data.txt]]
2017-07-31 23:50:54 [http-nio-15580-exec-7] DEBUG com.wf.ocr.api.AbbyyOcrController - processImage response=Response [tasks=[Task [id=8, status=COMPLETED, resultUrl=http://localhost:15580/api/v1/cloud/download?taskId=8&result=1, resultUrl2=http://localhost:15580/api/v1/cloud/download?taskId=8&result=2, resultUrl3=http://localhost:15580/api/v1/cloud/download?taskId=8&result=3, images=[], registrationTime=Mon Jul 31 23:50:48 CEST 2017, statusChangeTime=Mon Jul 31 23:50:54 CEST 2017, processStartTime=null, processEndTime=null, parameters=TaskParameters{correctSkew=true, writeRecognitionVariants=false, profile=‘documentConversion’, exportFormat=‘txt’, language=‘English’, correctOrientation=true, engine=‘abbyy’, customRegions=’’, skipPreprocessing=false, dictionary=null, originalText=null, useWordsFromDictionaryOnly=false, alphabetExtension=, engineDLLFolder=FineReaderEngine\Bin64, engineDataFolder=, engineTempFolder=, engineLogFolder=logs, activeLicenseLog=active-license.log, license=WFST-Bo+qoNa/cXsqjv33K6SB23kA41QStxAmDuzF6ESLp/s4JZq9AnPgfg==, licensePath=Licenses\SWAO-1121-0006-1977-2657-6278.ABBYY.LocalLicense, licensePassword=***, useOnlyCustomRegions=false, useDefaultPattern=false, removeGarbageSize=null, removeNoiseModels=null, allowedRegionTypes=[], debug=false, customAlphabet=null, skipTextLayerExtraction=false, discardColorImage=false, enhanceLocalContrast=false, priority=0, timeout=10800}, result=null, result2=null, result3=null, error=null, message=OK, host=http://localhost:15580/api/v1/cloud/download?taskId=%s&result=%s]]]
2017-07-31 23:50:59 [http-nio-15580-exec-9] INFO c.f.w.w.s.j.JwtAuthenticationFilter - validated token for tenantId=tenantId
2017-07-31 23:50:59 [http-nio-15580-exec-9] INFO c.f.w.w.s.j.JwtAuthenticationFilter - validated token for tenantId=tenantId
2017-07-31 23:50:59 [http-nio-15580-exec-9] INFO c.f.w.w.s.j.JwtAuthenticationFilter - validated token for tenantId=tenantId
2017-07-31 23:50:59 [http-nio-15580-exec-9] DEBUG com.wf.ocr.api.AbbyyOcrController - getTaskStatus response=Response [tasks=[Task [id=8, status=COMPLETED, resultUrl=http://localhost:15580/api/v1/cloud/download?taskId=8&result=1, resultUrl2=http://localhost:15580/api/v1/cloud/download?taskId=8&result=2, resultUrl3=http://localhost:15580/api/v1/cloud/download?taskId=8&result=3, images=[], registrationTime=Mon Jul 31 23:50:48 CEST 2017, statusChangeTime=Mon Jul 31 23:50:59 CEST 2017, processStartTime=null, processEndTime=null, parameters=TaskParameters{correctSkew=true, writeRecognitionVariants=false, profile=‘documentConversion’, exportFormat=‘txt’, language=‘English’, correctOrientation=true, engine=‘abbyy’, customRegions=’’, skipPreprocessing=false, dictionary=null, originalText=null, useWordsFromDictionaryOnly=false, alphabetExtension=, engineDLLFolder=FineReaderEngine\Bin64, engineDataFolder=, engineTempFolder=, engineLogFolder=logs, activeLicenseLog=active-license.log, license=WFST-Bo+qoNa/cXsqjv33K6SB23kA41QStxAmDuzF6ESLp/s4JZq9AnPgfg==, licensePath=Licenses\SWAO-1121-0006-1977-2657-6278.ABBYY.LocalLicense, licensePassword=***, useOnlyCustomRegions=false, useDefaultPattern=false, removeGarbageSize=null, removeNoiseModels=null, allowedRegionTypes=[], debug=false, customAlphabet=null, skipTextLayerExtraction=false, discardColorImage=false, enhanceLocalContrast=false, priority=0, timeout=10800}, result=null, result2=null, result3=null, error=null, message=OK, host=http://localhost:15580/api/v1/cloud/download?taskId=%s&result=%s]]]
2017-07-31 23:50:59 [http-nio-15580-exec-10] INFO c.f.w.w.s.j.JwtAuthenticationFilter - validated token for tenantId=tenantId
2017-07-31 23:50:59 [http-nio-15580-exec-10] INFO c.f.w.w.s.j.JwtAuthenticationFilter - validated token for tenantId=tenantId
2017-07-31 23:50:59 [http-nio-15580-exec-10] INFO c.f.w.w.s.j.JwtAuthenticationFilter - validated token for tenantId=tenantId
2017-07-31 23:50:59 [http-nio-15580-exec-10] DEBUG com.wf.ocr.api.AbbyyOcrController - download taskId=8 result=1 length=7
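The timestamps in this log can be split into phases. The sketch below assumes our reading of the log is right; the phase labels are an interpretation, not documented behavior. Roughly 5 s pass before the result file is loaded (engine start plus recognition). The processImage response already reports status=COMPLETED at 23:50:54, yet the first getTaskStatus call only arrives at 23:50:59, one full 5-second polling interval later.

```java
import java.time.Duration;
import java.time.LocalTime;

// Splits the log for task id=8 into phases using its HH:mm:ss timestamps.
// The phase labels are our interpretation of the log, not documented behavior.
public class OcrTimeline {

    // Whole seconds between two HH:mm:ss timestamps copied from the log.
    public static long secondsBetween(String from, String to) {
        return Duration.between(LocalTime.parse(from), LocalTime.parse(to)).getSeconds();
    }

    public static void main(String[] args) {
        String registered = "23:50:48";   // processImage request received
        String resultLoaded = "23:50:53"; // result file loaded after recognition
        String responded = "23:50:54";    // processImage response, status=COMPLETED
        String polled = "23:50:59";       // first getTaskStatus call

        System.out.println("engine start + recognition: " + secondsBetween(registered, resultLoaded) + " s");
        System.out.println("idle before first poll:     " + secondsBetween(responded, polled) + " s");
        System.out.println("total until download:       " + secondsBetween(registered, polled) + " s");
    }
}
```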


#4

The image says “New”.


#5

And the specs:
8 GB RAM
Intel Core i5-6300U CPU @ 2400 MHz
Disk: SSD


#6

Are you running the recording from Control Tower or from the RPA Recorder?

If from the RPA Recorder, try stopping Control Tower to free up resources.


#7

It was run from the Recorder; Control Tower was not started.

Are you sure there is no polling (waiting) for the results?


#8

Currently, this time is needed for the OCR engine to start. We will continue to improve this action, so thanks for your input.


#9

I took some time to verify it. Take a look at your class:
com.workfusion.desktop.driver.ocr.Ocr


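```java
public class Ocr {

    private static final Logger LOGGER = LoggerFactory.getLogger(Ocr.class);

    // Polling settings: wait up to 60 s for completion, checking every 5 s.
    private int completionPollingTimeoutSeconds = 60;
    private int completionPollingIntervalSeconds = 5;

    // ... (excerpt from the method that runs the OCR action)

        // Submit the image to the local OCR service.
        OcrTaskResponse response = client.processImage(biraryUploads, params);
        validateResponse(response, "processImage");

        // Wait for completion using the polling timeout and interval above.
        OcrTaskResponse response2 = client.waitCompletion(response, completionPollingTimeoutSeconds, completionPollingIntervalSeconds);

        // Download the recognized text from the task's result URL.
        byte[] download = client.download(response2.getTask().getResultUrl());

    // ...
}
```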
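With these fixed values, the client waits in 5-second steps for up to 60 s. We don't know how waitCompletion is implemented internally, but the log in #3 suggests the first status check happens only after one full interval, even when the task is already COMPLETED. A minimal sketch of a poller that checks once before it sleeps follows; CompletionPoller and its signature are hypothetical and not part of the WorkFusion client.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper, NOT part of the WorkFusion client API: it illustrates
// a completion poll that checks the status before its first sleep.
public class CompletionPoller {

    // Returns how many status checks were made before the task completed.
    // Throws IllegalStateException if the timeout elapses first.
    public static int waitCompletion(BooleanSupplier isCompleted, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        int checks = 0;
        while (true) {
            checks++;
            // Check first: a task that is already COMPLETED returns with no delay.
            if (isCompleted.getAsBoolean()) {
                return checks;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                throw new IllegalStateException("OCR task not completed within " + timeout);
            }
            // Sleep one interval, but never past the deadline.
            long sleepMillis = Math.min(interval.toMillis(), remainingNanos / 1_000_000);
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for OCR task", e);
            }
        }
    }

    public static void main(String[] args) {
        // Task already finished (as in the log from #3): one check, no sleep.
        System.out.println(waitCompletion(() -> true, Duration.ofSeconds(60), Duration.ofSeconds(5)));
        // Task finishing on the third check, polled every 200 ms.
        int[] calls = {0};
        System.out.println(waitCompletion(() -> ++calls[0] >= 3, Duration.ofSeconds(60), Duration.ofMillis(200)));
    }
}
```

With a check-first loop, a task that is already done costs no polling delay at all. The interval only matters while recognition is genuinely still running, which is where a shorter (sub-second) interval would help with many small fields.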


#10

We’ve logged an improvement request (WFS-1114) to make the polling interval and timeout configurable through the user interface.

Thanks for your feedback. Vote for this topic to raise its priority.
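Until WFS-1114 lands, here is one way such settings could be modeled. This is a minimal sketch with hypothetical names: OcrPollingSettings and the property keys ocr.polling.timeoutSeconds / ocr.polling.intervalMillis are assumptions, not real WorkFusion configuration. The defaults reproduce the hard-coded 60 s / 5 s from the Ocr class quoted in #9.

```java
import java.time.Duration;
import java.util.Properties;

// Hypothetical settings holder sketching what WFS-1114 asks for. The property
// names are assumptions, not real WorkFusion configuration keys.
public final class OcrPollingSettings {

    // Defaults match the values hard-coded in the Ocr class quoted in #9.
    static final long DEFAULT_TIMEOUT_SECONDS = 60;
    static final long DEFAULT_INTERVAL_MILLIS = 5_000;

    public final Duration timeout;
    public final Duration interval;

    public OcrPollingSettings(Duration timeout, Duration interval) {
        if (interval.isZero() || interval.isNegative()) {
            throw new IllegalArgumentException("interval must be positive");
        }
        if (timeout.compareTo(interval) < 0) {
            throw new IllegalArgumentException("timeout must not be shorter than interval");
        }
        this.timeout = timeout;
        this.interval = interval;
    }

    // Reads user-supplied values, falling back to the current defaults.
    // The interval is in milliseconds so that sub-second polling is possible.
    public static OcrPollingSettings fromProperties(Properties props) {
        long timeoutSeconds = Long.parseLong(
                props.getProperty("ocr.polling.timeoutSeconds", String.valueOf(DEFAULT_TIMEOUT_SECONDS)));
        long intervalMillis = Long.parseLong(
                props.getProperty("ocr.polling.intervalMillis", String.valueOf(DEFAULT_INTERVAL_MILLIS)));
        return new OcrPollingSettings(Duration.ofSeconds(timeoutSeconds), Duration.ofMillis(intervalMillis));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("ocr.polling.intervalMillis", "250");
        OcrPollingSettings settings = fromProperties(props);
        System.out.println("timeout=" + settings.timeout + ", interval=" + settings.interval);
    }
}
```

Expressing the interval in milliseconds rather than whole seconds is a deliberate choice. For small text fields, the useful polling range is well below one second.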


#11

Hi
Is there any way to speed up the OCR action? It takes around 10 seconds per OCR action, whether I scan a single character or a multi-character string from a PDF document. I process around 450 PDF documents and OCR-scan 5 values from each one, so the OCR part of the process alone takes more than 6 hours.
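The 6-hour figure follows directly from the per-action time of about 10 s:

```latex
450\ \text{documents} \times 5\ \text{values} = 2250\ \text{OCR actions}, \qquad
2250 \times 10\ \text{s} = 22\,500\ \text{s} = 6.25\ \text{h}
```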
Is there any solution? Maybe in future versions?

Thanks!
PS: I think I saw a similar topic in the past but couldn’t find it.


#12

@timriewe,

We will try to improve this action in upcoming releases.


#13

Ah, OK, thanks for pointing me to that thread. Is there any outlook for future versions yet?