Parsing data from docx to excel in RPA Express

I have 50+ docx files from which I have to extract the labels and their corresponding data.
For example:

Here I have to extract data such as the ship address, sales person, quantity, etc. and add it to columns in Excel.
How do I go about automating this task?
Would OCR be helpful here, considering the number of files I have?
Any quick response would be appreciated.
Thank you so much.

Hi Rishabh,

It depends on how often you need to run this process.

With the free RPA Express, you get approximately 1,000 OCR pages for 3 months (1 OCR action counts as 1 page), so if you need to process 50 documents every day, you will not have enough pages. In that case you need to buy the Pro subscription, which includes 10,000 OCR pages annually.
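To make the quota arithmetic concrete, here is a rough sketch in Python (outside RPA Express). The function name and the `ocr_actions_per_doc` parameter are made up for illustration. It assumes each OCR action consumes exactly one page, as described above, and that every document needs the same number of OCR actions.

```python
def days_until_quota_used(quota_pages, docs_per_day, ocr_actions_per_doc=1):
    """Whole days of processing a page quota covers.

    Assumes 1 OCR action = 1 page and a fixed number of actions per document.
    """
    pages_per_day = docs_per_day * ocr_actions_per_doc
    return quota_pages // pages_per_day


# Free tier: about 1,000 pages, 50 documents a day, one OCR action each.
print(days_until_quota_used(1000, 50))       # 20
# Pro tier: 10,000 pages, same workload.
print(days_until_quota_used(10000, 50))      # 200
# Three captured fields per document (e.g. ship address, sales person, quantity).
print(days_until_quota_used(10000, 50, 3))   # 66
```

So a one-off run over 50 documents fits easily in the free quota, while a daily run of that size uses it up in a few weeks.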

If it is a docx file, it may be possible to save the data to a variable (using the clipboard, for example) and then parse it using Text actions.
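As an illustration of the "get the text, then parse it" idea, here is a minimal Python sketch. It is not RPA Express code and does not use its Text actions. It relies on the fact that a docx file is a zip archive with the body stored in `word/document.xml`. The function names are made up for the example, and it assumes each field appears as `Label: value` on a single line, which may not match your documents.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used for paragraphs (w:p) and text runs (w:t).
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Return the document body as plain text, one line per paragraph."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for para in root.iter(W_NS + "p"):
        lines.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(lines)


def extract_fields(text, labels):
    """Map each label to the text after 'Label:' on the same line.

    Assumption: fields are laid out as 'Label: value'. Missing labels map to ''.
    """
    fields = {}
    for label in labels:
        pattern = r"^[ \t]*" + re.escape(label) + r"[ \t]*:[ \t]*(.*)$"
        match = re.search(pattern, text, re.IGNORECASE | re.MULTILINE)
        fields[label] = match.group(1).strip() if match else ""
    return fields
```

Calling `extract_fields(docx_text("order.docx"), ["Ship Address", "Sales Person", "Quantity"])` (the file name is hypothetical) gives one dictionary per document, which can then be written out as one row per file.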


Thank you for your response.

If I use OCR on these documents, wouldn’t I have to perform the OCR target operations (selecting the anchor and capture regions) manually for every page of the 50+ documents?
I’m disregarding the number of OCR pages I’d need, as that won’t be an issue if I take a Pro subscription.

Thank you!

It makes sense to use OCR only if your documents are of the same format. In that case you can create the OCR actions once, for one document, using anchor regions that will be the same in all documents.

If all of your documents are of different formats, then you will have to define the anchor and capture regions for each document separately.

Hope it answers your question.

I tried that on two similar files; however, it gave an OCR error. I’ll take a look at it again. Thank you.

You need to make sure the anchor region doesn’t include any characters that change.
It is also possible to set a lower Image recognition threshold in the Preferences in WF Studio.

Okay got it! I’ll definitely try it out now. Thank you so much.

Hi! It worked. Thank you so much! :)
