Unable to extract data from scanned PDF

Hello Team,
I am not able to extract text from a scanned PDF.

The current OCR can extract data from PNG images, but it does not support handwritten text.

Could you please help me extract data from a scanned PDF file?

Thanks in advance for your consideration.

BR
Diptiranjan Panda

Hello @diptiranjanpanda.

Could you please provide more details? Did you see any exception? Where are you trying to extract the text: in a bot task or in the recorder?

Hello Valeriya,

I don’t see any option in the recorder to fetch data from a scanned PDF file. I have tried in the Recorder.

One thing to mention: the scanned PDF contains one or more pages, and it may contain a handwritten signature.

Do you have any option for this? I would be really happy if you could give me a solution.

I have also tried with PNG images, but the success rate is very low, and I get an image-not-found exception most of the time (the APNG image is not found).

Could you please provide a solution for this?

Thanks
Diptiranjan Panda

I assume that you need to add an OCR action to your recording. You can find more details about this action in our documentation: https://kb.workfusion.com/display/RPAe/OCR. Please note that OCR is not recorded automatically; you need to add it manually.
Also, please see the video about OCR on our YouTube channel: https://www.youtube.com/watch?v=YRWQAnt2Jl8

As for handwritten text, unfortunately our OCR cannot work with it.

The image-not-found exception can be caused by a different screen resolution or by a stored image that does not exactly match what is shown on screen.
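To illustrate why a different resolution or an inexact image leads to "Image not found": image-based actions accept a match only when the stored image is similar enough to some region of the screen (the playback log in this thread reports imageSimilarityThreshold=0.8). The sketch below is a simplified illustration in plain Python, not WorkFusion's actual matching algorithm; the functions `similarity` and `find_template` and the scoring formula are assumptions made up for this example.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale rows, pixel values 0..255


def similarity(template: Image, region: Image) -> float:
    """Return 1.0 for identical images, lower as pixel values drift apart.

    Hypothetical score: 1 - (mean absolute pixel difference / 255).
    """
    if len(template) != len(region) or any(
        len(t) != len(r) for t, r in zip(template, region)
    ):
        raise ValueError("template and region must have the same dimensions")
    total = sum(
        abs(t - r)
        for t_row, r_row in zip(template, region)
        for t, r in zip(t_row, r_row)
    )
    count = sum(len(row) for row in template)
    return 1.0 - total / (255.0 * count)


def find_template(
    screen: Image, template: Image, threshold: float = 0.8
) -> Optional[Tuple[int, int, float]]:
    """Slide the template over the screen and return (x, y, score) of the
    best region scoring at or above the threshold, or None if none does."""
    t_h, t_w = len(template), len(template[0])
    s_h, s_w = len(screen), len(screen[0])
    best = None
    for y in range(s_h - t_h + 1):
        for x in range(s_w - t_w + 1):
            region = [row[x:x + t_w] for row in screen[y:y + t_h]]
            score = similarity(template, region)
            if score >= threshold and (best is None or score > best[2]):
                best = (x, y, score)
    return best


if __name__ == "__main__":
    template = [[0, 255], [255, 0]]

    # Screen that contains the template pixel for pixel.
    exact = [[128] * 4 for _ in range(4)]
    exact[2][1:3] = [0, 255]
    exact[3][1:3] = [255, 0]
    print(find_template(exact, template))

    # Same pattern rendered with much lower contrast, as might happen
    # after rescaling; no region reaches the 0.8 threshold.
    faded = [[128] * 4 for _ in range(4)]
    faded[2][1:3] = [100, 155]
    faded[3][1:3] = [155, 100]
    print(find_template(faded, template))
```

In this toy model the second search returns None, which corresponds to the "Image not found" case: the pattern is still recognizable to a person, but its score falls below the threshold. Re-capturing the image at the resolution used during playback is the usual way to avoid this.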

As an alternative, you can try to use the OCR plugin in Studio. You can find a description and a code example in the Studio Help Contents.

I am facing the same issue after applying the same properties mentioned in the video.

The same image-not-found exception occurs.

Error executing OcrAction
com.workfusion.studio.rpa.recorder.playback.PlaybackException: Error executing TemplateAction[templateName=OcrAction.ftl,id=1,name=Optional[OcrAction],parent=-1,nextSibling=2,arguments=ActionArguments[varName=[test1],imageName=[C:\Users\preet.leelaramani\workfusion-workspace\rpae_project\RPA Demo\1549608716101-anchor-1549608716159.apng],fullImageName=[1549608716101.png],xsi:type=[recorder:OcrAction, recorder:OcrAction],pollingInterval=[300],active=[true],type=[CONTROL],offsetX=[72],delay=[60000],offsetY=[67],width=[154],actionDetails=[(to 'test1' rectangle 154 x 20)],height=[20],awaitTimeout=[5000]]]
at com.workfusion.studio.rpa.recorder.playback.flow.StandardControlFlow.execute(StandardControlFlow.java:54)
at com.workfusion.studio.rpa.recorder.playback.action.template.TemplateAction.execute(TemplateAction.java:28)
at com.workfusion.studio.rpa.recorder.playback.action.template.TemplateAction.execute(TemplateAction.java:15)
at com.workfusion.studio.rpa.recorder.playback.player.ActionPlayer.next(ActionPlayer.java:64)
at com.workfusion.studio.rpa.recorder.player.PlaybackLogic.playNextAction(PlaybackLogic.java:152)
at com.workfusion.studio.rpa.recorder.player.PlaybackLogic.run(PlaybackLogic.java:112)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.openqa.selenium.WebDriverException: Image not found : 1549608716101-anchor-1549608716159.apng
Command duration or timeout: 0 milliseconds
Build info: version: '9.2.0.4', revision: '1a10eeeced', time: '2018-11-29T10:44:59.891Z'
System info: host: 'MUDT317X6T2', ip: '192.168.202.42', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '1.8.0_121'
Driver info: com.freedomoss.crowdcontrol.webharvest.selenium.wrapper.RemoteDriverWrapper
Capabilities [{imageSimilarityThreshold=0.8, extra.executor.id={Name=RPA Recorder}, CLOSE_ALL_WINDOWS=false, browserName=universal, javascriptEnabled=true, extra.capabilities.context={"browserType":"universal","startInPrivate":false,"blockImages":false,"maximizeOnStartup":false,"customCapabilities":{"platform":"WINDOWS","javascriptEnabled":true,"SEARCH_ALL_WINDOWS":true,"CLOSE_ALL_WINDOWS":false,"imageSimilarityThreshold":"0.8"},"executorId":{"Name":"RPA Recorder"}}, platformName=WINDOWS, SEARCH_ALL_WINDOWS=true, platform=WINDOWS}]
Session ID: 1991e741-60e9-4ea4-a8e1-46f3b37c030d
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.openqa.selenium.remote.ErrorHandler.createThrowable(ErrorHandler.java:216)
at org.openqa.selenium.remote.ErrorHandler.throwIfResponseFailed(ErrorHandler.java:168)
at org.openqa.selenium.remote.http.JsonHttpResponseCodec.reconstructValue(JsonHttpResponseCodec.java:41)
at org.openqa.selenium.remote.http.AbstractHttpResponseCodec.decode(AbstractHttpResponseCodec.java:82)
at org.openqa.selenium.remote.http.AbstractHttpResponseCodec.decode(AbstractHttpResponseCodec.java:45)
at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:164)
at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:741)
at org.openqa.selenium.remote.RemoteWebDriver.executeScript(RemoteWebDriver.java:677)
at com.workfusion.rpa.helpers.ImageElement.findImageRectangle(ImageElement.java:184)
at com.workfusion.rpa.helpers.ImageElement.getRect(ImageElement.java:86)
at com.workfusion.rpa.helpers.UiElement.getRect(UiElement.java:1278)
at org.openqa.selenium.WebElement$getRect.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:120)
at Script2.run(Script2.groovy:9)
at com.workfusion.studio.rpa.recorder.playback.shell.GroovyShellWrapper.executeScript(GroovyShellWrapper.java:48)
at com.workfusion.studio.rpa.recorder.playback.player.PlaybackContext.executeScript(PlaybackContext.java:65)
at com.workfusion.studio.rpa.recorder.playback.action.template.TemplateAction.executeBehavior(TemplateAction.java:33)
at com.workfusion.studio.rpa.recorder.playback.flow.StandardControlFlow.execute(StandardControlFlow.java:46)
at com.workfusion.studio.rpa.recorder.playback.action.template.TemplateAction.execute(TemplateAction.java:28)
at com.workfusion.studio.rpa.recorder.playback.action.template.TemplateAction.execute(TemplateAction.java:15)
at com.workfusion.studio.rpa.recorder.playback.player.ActionPlayer.next(ActionPlayer.java:64)
at com.workfusion.studio.rpa.recorder.player.PlaybackLogic.playNextAction(PlaybackLogic.java:152)
at com.workfusion.studio.rpa.recorder.player.PlaybackLogic.run(PlaybackLogic.java:112)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.NoSuchElementException: Image not found : 1549608716101-anchor-1549608716159.apng
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:83)
at org.codehaus.groovy.reflection.CachedConstructor.doConstructorInvoke(CachedConstructor.java:77)
at org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrap.callConstructor(ConstructorSite.java:84)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:59)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:238)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:250)
at Script1$1.call(Script1.groovy:12)
at Script1$1.call(Script1.groovy)
at com.workfusion.common.utils.SynchUtils.withFocusLock(SynchUtils.java:47)
at com.workfusion.common.utils.SynchUtils$withFocusLock.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:136)
at Script1.run(Script1.groovy:5)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:444)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:482)
at groovy.lang.GroovyShell.evaluate(GroovyShell.java:453)
at com.workfusion.autoit.driver.script.GroovyExecutor.execute(GroovyExecutor.java:45)
at com.workfusion.autoit.driver.AutoItDriver.executeScriptInternal(AutoItDriver.java:235)
at com.workfusion.autoit.driver.AutoItDriver.executeScript(AutoItDriver.java:190)
at com.workfusion.universal.driver.UniversalDriver.executeScript(UniversalDriver.java:151)
at org.openqa.selenium.remote.server.handler.ExecuteScript.call(ExecuteScript.java:54)
at org.openqa.selenium.remote.server.handler.WebDriverHandler.handle(WebDriverHandler.java:41)
at org.openqa.selenium.remote.server.rest.ResultConfig.handle(ResultConfig.java:133)
at org.openqa.selenium.remote.server.JsonHttpCommandHandler.handleRequest(JsonHttpCommandHandler.java:205)
at org.openqa.selenium.remote.server.InMemorySession.execute(InMemorySession.java:98)
at org.openqa.selenium.remote.server.WebDriverServlet.lambda$handle$0(WebDriverServlet.java:231)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.j

Hello @diptiranjanpanda.
Which version of RPA Express do you have? Please note that version 2.2.1, with a new OCR license, was released last week. Does the exception occur in the new version or in the old one?

Thanks for your response. Currently I am using version 9.1.0.1.

Can you please check in Control Panel - Programs - Programs and Features? You should see something like this:

My current version is 2.1.3.858.
Please find the attached screenshot.

Thank you. I recommend that you update to the latest version with the new OCR license, as the OCR license in 2.1.3 has already expired. Please check how your script works in the new version.

Hi @diptiranjanpanda, did updating to the new RPA Express help solve this issue?