OCR files where the language is Hindi

Hi Team,

I have a requirement to OCR image-based PDFs where the underlying text can be in English or Hindi. If I OCR a file containing Hindi text, I get gibberish. How do I identify that the language is Hindi? Is there any provision for this?

Thanks,
Kiran Talreja

@kiran.talreja do you use the OCR action in RPA Recorder to OCR the text?
Note that Hindi is not supported in this action.

Hi @ashapkina,

I am using WorkFusion's plugins to OCR the text. I know Hindi is not supported, but in my use case the bot won't know the language of the text when it passes a file to OCR, so if the file contains Hindi it will return gibberish. How do I identify that the text is not English so that I can route the file for manual processing?

Thanks,
Kiran Talreja
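One way to flag gibberish after OCR is a simple heuristic on the returned text: count how many tokens are common English words, and route the file to manual processing when that share is too low. The sketch below is a minimal illustration of that idea in Python, not a WorkFusion feature; the word list `COMMON_ENGLISH_WORDS`, the helper names and the `0.15` threshold are all hypothetical and would need tuning on real documents.

```python
import re

# Hypothetical, deliberately tiny list of common English words.
# A real check would use a much larger word list or dictionary file.
COMMON_ENGLISH_WORDS = {
    "the", "and", "of", "to", "in", "is", "for", "on", "with", "that",
    "this", "by", "as", "be", "are", "from", "at", "or", "an", "it",
}


def english_word_ratio(text: str) -> float:
    """Return the share of alphabetic tokens that appear in the word list."""
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    if not tokens:
        # No Latin letters at all: treat as non-English.
        return 0.0
    hits = sum(1 for token in tokens if token in COMMON_ENGLISH_WORDS)
    return hits / len(tokens)


def needs_manual_review(text: str, threshold: float = 0.15) -> bool:
    """Flag OCR output whose English-word share is below the (assumed) threshold."""
    return english_word_ratio(text) < threshold


print(needs_manual_review("The invoice is due on the first of the month"))
print(needs_manual_review("Xqz jkl vbn rty"))
```

Normal English prose contains a high share of such function words, while OCR output from Hindi pages tends to be random letter runs, so even a small list separates the two reasonably well. Short or table-heavy documents may need a lower threshold.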

Hi @ashapkina,

Awaiting your response.

Thanks,
Kiran

Could you please give an update on this question? This is an urgent requirement.

@kiran.talreja you cannot determine the language of the document.

The only thing I can recommend is using Exception Handling or If-Else to handle such a scenario.

Is there some part of the document that is always the same if it is in English and different if it is in Hindi?
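If such a fixed part exists, the If-Else check can look for it in the OCR output. The Python sketch below is a minimal illustration under the assumption that every English document contains certain fixed phrases; the phrases in `ENGLISH_ANCHORS` and the helper names are hypothetical placeholders, not part of any WorkFusion API.

```python
# Hypothetical anchor phrases assumed to appear on every English version
# of the document (for example, labels from a fixed form header).
ENGLISH_ANCHORS = ("invoice number", "date of issue")


def normalise(text: str) -> str:
    # Lower-case and collapse whitespace so OCR line breaks do not matter.
    return " ".join(text.lower().split())


def looks_english(ocr_text: str, anchors=ENGLISH_ANCHORS) -> bool:
    # True only if every anchor phrase is present in the OCR output.
    cleaned = normalise(ocr_text)
    return all(anchor in cleaned for anchor in anchors)


def route(ocr_text: str) -> str:
    # If-Else routing: anchors found -> automated flow, otherwise manual.
    return "automated" if looks_english(ocr_text) else "manual"


print(route("INVOICE   NUMBER: 123\nDate of\nissue: 01-01-2020"))
print(route("Xqz jkl vbn"))
```

Requiring all anchors rather than any one of them makes a false "automated" result less likely, at the cost of sending slightly damaged English scans to manual review.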