OCR Parsing Errors

ocr
issues

#1

OCR is unable to correctly parse text from a Microsoft Remote Desktop Connection window. Specifically, the OCR result has the following issues:

  1. Spurious leading characters (reported earlier), which can be removed using the Substring action
  2. Blank characters inserted within the text string
  3. The ‘.l’ (period followed by lowercase L) sequence converted to ‘J’

Please see the uploaded screenshot for reference. I have adjusted the Capture Region but consistently get the same result. Please advise on how I can improve the accuracy of the OCR results.


#2

Hi David,

  1. We had a bug where OCR added junk symbols before the result; it has been fixed for the RPA Express 2.0 release.
  2. Could you check whether there are spaces between these characters when you type the OCR result into Notepad or some other application?
  3. Yes, such issues sometimes occur when the text is small or the image quality is poor. There is little you can do in such cases for now. We plan to improve the OCR component later this year, but we cannot provide a timeframe yet.

#3

Alesia, thanks for the updates on this issue.
I was able to consistently remove the junk symbols using the Substring action, always extracting the 2nd through the last character of the string. I will have to remove those steps after upgrading to RPA Express 2.0.
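For reference, the Substring workaround amounts to dropping the first character of the result. The Python sketch below is only an illustration outside RPA Express, not its API. It contrasts that unconditional form with a guarded variant that strips only leading characters that cannot start a valid value, which would keep working after the junk-symbol bug is fixed. The `ALLOWED_START` set (letters and digits) is an assumption and should be adjusted to the field being read.

```python
import string

# Assumption: a valid value starts with a letter or digit, so any other
# leading characters are treated as OCR junk. Adjust to match the field.
ALLOWED_START = set(string.ascii_letters + string.digits)


def substring_workaround(ocr_text: str) -> str:
    """Unconditional form: keep the 2nd through the last character."""
    return ocr_text[1:]


def strip_leading_junk(ocr_text: str) -> str:
    """Guarded form: remove only leading characters not in ALLOWED_START."""
    i = 0
    while i < len(ocr_text) and ocr_text[i] not in ALLOWED_START:
        i += 1
    return ocr_text[i:]


# Hypothetical OCR results, with and without a junk leading symbol.
print(strip_leading_junk("~Invoice 1024"))    # Invoice 1024
print(strip_leading_junk("Invoice 1024"))     # Invoice 1024
print(substring_workaround("Invoice 1024"))   # nvoice 1024
```

The last line shows why the unconditional form has to be removed after the fix: once no junk symbol is present, it cuts off a real character.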

Regarding your second question, I did a character-by-character check and verified that extra spaces were indeed inserted into the OCR output.
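A character-by-character check like this can also be scripted. The Python sketch below (an illustration, not part of RPA Express) lists each character's code point and Unicode name, which separates ordinary spaces (U+0020) from other blank characters. It also shows one possible cleanup step. The function names are hypothetical, and deleting every whitespace character is only safe for fields such as IDs or numbers that never legitimately contain spaces.

```python
import re
import unicodedata


def describe_chars(text: str) -> list[str]:
    """Return one line per character: repr, code point, and Unicode name."""
    return [
        f"{ch!r} U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


def remove_inserted_spaces(text: str) -> str:
    """Delete all whitespace (including non-breaking spaces).

    Assumption: the field never contains legitimate spaces.
    """
    return re.sub(r"\s+", "", text)


for line in describe_chars("AB 12"):
    print(line)
print(remove_inserted_spaces("AB 12 34"))  # AB1234
```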

I also found that, in this case, reducing the extraction area to just the essential portion of the line of text gave a consistent OCR read. In another scan, the digit ‘8’ was sometimes read as ‘B’, though not consistently. Looking forward to the improvements coming later this year.
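Until the OCR component improves, a misread such as ‘8’ → ‘B’ can sometimes be caught in post-processing when the field is known to be numeric. The Python sketch below is a hypothetical approach, not an RPA Express feature. Only the ‘B’ → ‘8’ entry comes from this thread; the other entries in `DIGIT_FIXES` are assumed common look-alike confusions. If the corrected value is still not all digits, the function returns `None` so the value can be flagged for review rather than accepted silently.

```python
import re
from typing import Optional

# Hypothetical confusion table: 'B' -> '8' is the misread reported above;
# the other entries are assumed look-alikes. Extend as needed.
DIGIT_FIXES = str.maketrans({"B": "8", "O": "0", "o": "0", "l": "1", "I": "1"})


def fix_numeric_field(ocr_text: str) -> Optional[str]:
    """Map look-alike letters to digits in a field expected to be numeric.

    Returns the corrected string, or None if it still contains non-digits.
    """
    candidate = ocr_text.strip().translate(DIGIT_FIXES)
    return candidate if re.fullmatch(r"[0-9]+", candidate) else None


print(fix_numeric_field("1B4"))  # 184
print(fix_numeric_field("12A"))  # None
```

Restricting the substitution to fields that must be numeric avoids corrupting free text, where ‘B’ is a legitimate character.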


#4

Thank you, David. We will look into the issue with extra spaces.


closed #7