Problem with OCR for multiple PDF files

Hi experts,

I am new to RPA technology and currently using OCR for PDF extraction.

Let’s say I have a folder into which new invoice PDFs are placed. I am creating a bot that repeatedly extracts the text from these PDFs using OCR; all of them have the same format. Everything works for the first PDF file, the one I used to design the OCR action. However, when the second PDF file is processed, the OCR doesn’t recognize it and says the picture cannot be found. I assume the bot needs a new capture from the second PDF?

So in this case, how can the bot automatically extract the text from new PDFs and put it into an Excel file?

Thanks in advance.

To use the same OCR action for several PDF files, the files have to be of the same format, and the image that you use as the anchor area has to look the same in all files.

Hi,

I’m having the same issue as above. I tried running it on two PDF files: it was fine with the first one but gives an error on the second one, saying the anchor image can’t be found. I reordered the list and it won’t run on the second PDF file at all.

Please find below the images used. I believe the anchor images look alike.
NB: The invoice number has been erased.

Thanks in advance.

@ebenezero can you share a screenshot of the OCR action so we can see the anchor region?

sample1

sample2

I cannot see any anchor on the images. Can you share the whole image you use like this one?

Looks like the invoice number and the date that the bot needs to read are included in the anchor region. As they are different on the second file, the bot cannot find the target anchor image. In the anchor region, you should have the part of the image that is the same in all documents.
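To illustrate why this happens, here is a minimal sketch in plain Python. It is not how the OCR action is implemented internally; it only mimics exact image matching with a toy "page" made of text rows. The function names (`find_anchor`, `read_right_of`) and the sample invoice values are made up for the illustration.

```python
from typing import List, Optional, Tuple

# Toy stand-in for a page image: each string is one row of "pixels".
Page = List[str]


def find_anchor(page: Page, anchor: Page) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the first exact match of anchor in page, else None."""
    ah, aw = len(anchor), len(anchor[0])
    for r in range(len(page) - ah + 1):
        for c in range(len(page[r]) - aw + 1):
            if all(page[r + i][c:c + aw] == anchor[i] for i in range(ah)):
                return (r, c)
    return None


def read_right_of(page: Page, anchor: Page, width: int) -> Optional[str]:
    """Read `width` characters to the right of the anchor, or None if not found."""
    pos = find_anchor(page, anchor)
    if pos is None:
        return None
    r, c = pos
    start = c + len(anchor[0])
    return page[r][start:start + width].strip()


# Two hypothetical invoices with the same layout but different values.
invoice_1 = [
    "ACME Ltd.",
    "Invoice No: 1001",
    "Date:       2021-03-01",
]
invoice_2 = [
    "ACME Ltd.",
    "Invoice No: 2057",
    "Date:       2021-04-17",
]

# Anchor captured from the first file, including the variable invoice number.
bad_anchor = ["Invoice No: 1001"]
# Anchor that covers only the label, which is identical in every file.
good_anchor = ["Invoice No:"]

print(find_anchor(invoice_1, bad_anchor))        # found in the first file
print(find_anchor(invoice_2, bad_anchor))        # not found in the second file
print(read_right_of(invoice_2, good_anchor, 8))  # value read next to the static label
```

The anchor that contains the invoice number only matches the file it was captured from, while the anchor limited to the static label matches both files and lets the value next to it be read.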
