Funny Text in OCR Result

ocr

#1

I tried out the OCR action, but there is some funny text in the result.

How can I get a clean result from the OCR action?


OCR appends  to the text captured from pdf
#2

There is already a topic on this problem here:


#3

Haha… that’s really funny. Please remember that the output from OCR is text, so the variable should be declared as a string.


#4

Hi @Qi_Zhou_Singtel
I get the same result when OCRing an editable PDF. The only solution I have found is to clean the OCR variable afterwards with the Text Substring action. In your example, that means taking the substring after the 3rd character. Put that result into a new variable (or back into the same one, octtext) and you get a clean ABC.
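A plausible explanation for the 3-character offset, stated as an assumption since the thread never shows the raw bytes: the OCR output starts with the UTF-8 byte-order mark (bytes EF BB BF), and when those bytes are read through a single-byte code page such as Windows-1252 they show up as the three characters ï»¿. RPA Express is configured through actions rather than Python, so the sketch below only illustrates the string logic; `strip_fixed_prefix` is a hypothetical helper, not an RPA Express action.

```python
# Assumption: the "funny text" is the UTF-8 BOM (bytes EF BB BF)
# decoded with the Windows-1252 code page.
BOM_BYTES = "\ufeff".encode("utf-8")  # b'\xef\xbb\xbf'


def strip_fixed_prefix(text: str, count: int = 3) -> str:
    """Hypothetical helper: keep everything after the first `count`
    characters, like a substring that starts after the 3rd character."""
    return text[count:]


garbled = (BOM_BYTES + b"ABC").decode("cp1252")
print(repr(garbled))                # 'ï»¿ABC'
print(strip_fixed_prefix(garbled))  # ABC
```

A fixed offset is only safe if every OCR result carries the same prefix; on a result that is already clean, it would cut off the first three real characters.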


#5

Thanks @timriewe


OCR appends  to the text captured from pdf
#6

Hi,

While using OCR to read data from a PDF, I found in execution-result.log.csv that it appends  to the text.

For example: Tilak

Could you please help us remove  from the text?

Thanks…


#7

If the same error always appears in the variable, you can try the substring function.


#8

Hello,

I am working with version 1.1.7 of RPAExpress.

I am trying to compare a value from one website with a value from another. I get the value “montant_cheque” from the first website using XPath, and the value from the second website with OCR. However, OCR returns %uFEFFvalue, so I can’t compare them: one value has length 4 and the other length 5. Following are some screenshots. Since the websites are local and the project is private, I can’t link the .zip of the project.

Does anyone have an idea why?
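%uFEFF is the escaped form of U+FEFF (ZERO WIDTH NO-BREAK SPACE, also used as a byte-order mark). It is invisible but still counts as a character, which matches the length 4 versus 5. A minimal Python sketch of the comparison, assuming the XPath value is clean and only the OCR value carries the mark; the sample values and `normalize_ocr` are hypothetical.

```python
BOM = "\ufeff"  # U+FEFF ZERO WIDTH NO-BREAK SPACE; escaped form is %uFEFF


def normalize_ocr(value: str) -> str:
    """Hypothetical helper: remove a leading U+FEFF if present."""
    return value[1:] if value.startswith(BOM) else value


montant_cheque = "1234"   # example value read via XPath
ocr_value = BOM + "1234"  # example value read via OCR

print(len(montant_cheque), len(ocr_value))         # 4 5
print(montant_cheque == ocr_value)                 # False
print(ocr_value.strip() == montant_cheque)         # False: U+FEFF is not whitespace
print(montant_cheque == normalize_ocr(ocr_value))  # True
```

Note that a plain trim does not help here, because U+FEFF is a format character, not whitespace.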


#9

I can’t use the substring method because the value sometimes contains 4 digits, sometimes 3, sometimes 5… Or can we put a variable in “end character position”?


#10

If you know it is always the last or always the first character that you need to delete, you can use substring, checking either the “counting from end” or the “counting from start” checkbox.
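Counting from the start means the length of the rest of the value does not matter, which addresses the 3/4/5-digit concern. The Python sketch below (illustration only; both helpers are hypothetical, and U+FEFF is assumed to be the extra character) compares skipping one character with deleting every U+FEFF, which also leaves clean input untouched.

```python
BOM = "\ufeff"


def drop_first_char(value: str) -> str:
    """Like a substring "counting from start" that skips 1 character:
    works for any value length, but cuts a real digit if no BOM is present."""
    return value[1:]


def remove_all_boms(value: str) -> str:
    """Alternative: delete every U+FEFF wherever it appears."""
    return value.replace(BOM, "")


for digits in ["123", "1234", "12345"]:
    raw = BOM + digits
    assert drop_first_char(raw) == digits
    assert remove_all_boms(raw) == digits

print(drop_first_char("1234"))  # 234 (clean input loses a digit)
print(remove_all_boms("1234"))  # 1234
```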


#11

Hi,

I am trying to fetch a string using OCR. It reads the value correctly, but the length of the string is always wrong.
For example, when I read 010101 the value is correct, but Get Length returns 7 instead of 6.
In another example I read 16-11-2017, whose length should be 10, but Get Length returns 11.

I checked the value captured by OCR: it is correct and there are no junk characters or spaces, yet the length is always 1 more than the actual length.

Can anyone help me with this?
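One way to find a character that is invisible on screen but counted by Get Length is to print each character's code point and Unicode name. A Python sketch for illustration; the input is assumed to carry U+FEFF, which the thread suggests but does not confirm for this case, and `describe` is a hypothetical helper.

```python
import unicodedata


def describe(text: str) -> list:
    """Return 'U+XXXX NAME' for every character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


ocr_value = "\ufeff010101"  # example of what the OCR output may contain
print(len(ocr_value))       # 7
for line in describe(ocr_value)[:2]:
    print(line)
# U+FEFF ZERO WIDTH NO-BREAK SPACE
# U+0030 DIGIT ZERO
```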


#12

Hi
It sounds like this issue:


#13

Hi @timriewe

I still have not found a solution for this issue. The length of every OCR output is still 1 more than expected.


#14

You can get a substring using https://kb.workfusion.com/display/RPAe/Text#Text-SubstringSubstring


#15

Yes, as a workaround I am doing the same.

Actually, the issue is that OCR reads a bogus first character, but when I print the value in Notepad the bogus character does not appear.
Example 1:
OCR reads COST as a 5-character string, yet when I print it in Notepad only COST appears, without the bogus first character.

It's very weird of OCR to do this.
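A likely reason Notepad hides the character, assuming it is U+FEFF: the character is zero-width, and at the very start of a UTF-8 file its bytes are read as a byte-order mark (an encoding signature) rather than as text. Python's `utf-8-sig` codec behaves the same way, dropping a leading BOM on decode, as this sketch shows.

```python
ocr_value = "\ufeffCOST"          # 5 characters, the first one invisible

data = ocr_value.encode("utf-8")  # what a UTF-8 text file would contain
print(data)                       # b'\xef\xbb\xbfCOST'

print(len(data.decode("utf-8")))      # 5: BOM kept as a character
print(len(data.decode("utf-8-sig")))  # 4: leading BOM treated as a signature
```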