Image definition for OCR area when processing multiple files

ocr

#1

Hello
I am running OCR on an image of a PDF file: it contains numbered lines, and I have to extract the value for each line.
I was able to define the anchor area (the line number) and the OCR area. It worked well with the first file.

I then selected a second file and ran the process on it, and OCR failed.
The anchor area is the same (a line number), but it seems the OCR area is not recognized. From what I understand, the “image” of the OCR area is not the same, hence the error.

So can you explain how the OCR area is defined: is it just an area shape with coordinates defined relative to the anchor area? In that case, the image found in that area is not relevant, and it should work for any file with the same pattern.

Or does it really look for the captured image? In that case, I understand I have to capture a new image for OCR to work with each file.

I don’t know if my question is clear, so here is an example.

In file one I have these lines:

107 : 1.204.258,52
108 : 20.205,25
109 : 2.548,02

So I captured an image where I defined the line number “107” as the anchor and the OCR area to capture the value of that line. And that works!

But if I run the script with a second file where the lines are:

107 : 4.258,52
108 : 220.105,20
109 : 5.142.548,02
I receive an OCR error.

Can you please explain whether I correctly understand how OCR works, or should work, when processing multiple files that follow the same pattern?

Thanks, and sorry for the long post.


#2

@Pierre_Bernier3 - thanks for the question

You understood it right - the OCR area is just an area shape with coordinates defined relative to the anchor area.

So if the anchor area is the same and only the content of the OCR area changes, the OCR action should work.
But if the distance between the anchor and OCR areas varies, it will try to recognize other content.
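
To illustrate the idea, here is a minimal Python sketch. It is not the product's actual implementation: `Rect`, `Offset`, `ocr_region` and `contains` are hypothetical names, and all pixel values are made up. It only assumes what is described above, namely that the OCR area is a rectangle placed at a fixed offset from wherever the anchor is found.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in page pixels (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int


@dataclass(frozen=True)
class Offset:
    """Position and size of the OCR area relative to the anchor's top-left corner."""
    dx: int
    dy: int
    width: int
    height: int


def ocr_region(anchor: Rect, offset: Offset) -> Rect:
    """Place the OCR area relative to the anchor; the captured pixels play no role."""
    return Rect(anchor.x + offset.dx, anchor.y + offset.dy, offset.width, offset.height)


def contains(region: Rect, target: Rect) -> bool:
    """True if target lies entirely inside region."""
    return (
        target.x >= region.x
        and target.y >= region.y
        and target.x + target.width <= region.x + region.width
        and target.y + target.height <= region.y + region.height
    )


# Offset recorded once, when the anchor ("107") and the OCR area were captured.
offset = Offset(dx=50, dy=0, width=120, height=12)

# Anchor found at different positions in two files (made-up coordinates).
anchor_file1 = Rect(x=40, y=300, width=30, height=12)
anchor_file2 = Rect(x=40, y=320, width=30, height=12)

region1 = ocr_region(anchor_file1, offset)  # Rect(90, 300, 120, 12)
region2 = ocr_region(anchor_file2, offset)  # Rect(90, 320, 120, 12)

# A value at the usual distance from the anchor falls inside the region...
value_ok = Rect(x=95, y=300, width=100, height=12)
# ...but a value placed further from the anchor falls outside it.
value_shifted = Rect(x=230, y=320, width=60, height=12)

print(contains(region1, value_ok))       # True
print(contains(region2, value_shifted))  # False
```

If this model matches your case, it may be worth checking that the OCR area is wide enough for the longest value you expect (for example 5.142.548,02 versus 4.258,52) and that the values sit at the same distance from the line number in every file.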