Using Regular Expressions for text parsing

It’s possible that I’ve missed it, but how about using regular expressions to transform and/or extract text?

Quick example of a transformation

Results in

I think it would make the most sense to implement this in the replace text command. I appreciate that regexes aren’t the easiest of things to learn (I mean they even look like magic spells), but within rpa-x, they could replace quite a lot of text manipulation.
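
As a rough illustration of the kind of transformation meant here, this sketch uses Python's re module rather than rpa-x itself. The sample text and the date pattern are made up for the example; a regex-aware replace text command would presumably take a pattern and a replacement string in much the same way.

```python
import re

# Hypothetical input text containing day/month/year dates.
text = "Invoice dated 31/12/2017, due 15/01/2018"

# Three capture groups: day, month, year.
pattern = re.compile(r"(\d{2})/(\d{2})/(\d{4})")

# The backreferences \3, \2, \1 reorder the groups into year-month-day.
result = pattern.sub(r"\3-\2-\1", text)

print(result)  # Invoice dated 2017-12-31, due 2018-01-15
```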

Extracting data.
It would be cool to OCR an image and extract all the elements of the text into a list (or even a table if there’s more than one group), in just one line of code.
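
To show what the list-versus-table idea could look like, here is a small Python sketch. It assumes the OCR step has already produced plain text; the sample text and field layout are invented. With one capture group, re.findall returns a flat list, and with several groups it returns one tuple per match, which is effectively a table row.

```python
import re

# Hypothetical OCR output; in practice this would come from the OCR step.
ocr_text = """Order A123 qty 4 price 19.99
Order B456 qty 10 price 5.50"""

# One capture group: findall returns a flat list of strings.
order_ids = re.findall(r"Order (\w+)", ocr_text)

# Several capture groups: findall returns a list of tuples (table rows).
rows = re.findall(r"Order (\w+) qty (\d+) price (\d+\.\d{2})", ocr_text)

print(order_ids)  # ['A123', 'B456']
print(rows)       # [('A123', '4', '19.99'), ('B456', '10', '5.50')]
```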

Of course, if it can already do this, could someone please point me in the right direction?


What is your end goal? If you want to automatically extract information from a document, WorkFusion SPA already provides a solution using machine learning. That solution is more robust to OCR mistakes, since it can still detect a date containing errors (e.g. 12/3i/2O17), and post-processing can then recover the intended date.
Regex works perfectly well for what you want; the only issue is that it is not robust enough to be combined with OCR.
I think extracting in one line is a bit too optimistic since you have to:

  1. Specify what you want to extract, and maybe the type of what you want to extract (to improve the results)
  2. Configure OCR
  3. Handle edge cases (e.g. a date with bad OCR)
  4. Postprocess to the desired format
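
The steps above, minus the OCR configuration, can be sketched in plain Python for the date example. This is only a minimal illustration under stated assumptions: the table of look-alike characters, the month/day/year order, and the function name extract_dates are all invented here. A hand-written character class only covers the confusions you anticipate, which is the robustness limit mentioned above.

```python
import re
from datetime import date

# Assumed OCR confusions: letters that are commonly misread for digits.
OCR_FIXES = str.maketrans(
    {"O": "0", "o": "0", "i": "1", "l": "1", "I": "1", "S": "5", "B": "8"}
)

# Step 1: specify what to extract. A "digit" is a real digit or a look-alike.
D = r"[0-9OoilISB]"
DATE_RE = re.compile(rf"\b({D}{{1,2}})/({D}{{1,2}})/({D}{{4}})\b")

def extract_dates(text):
    """Return ISO dates found in text, assuming month/day/year order."""
    found = []
    for m in DATE_RE.finditer(text):
        # Step 3: handle bad OCR by mapping look-alike letters to digits.
        month, day, year = (int(g.translate(OCR_FIXES)) for g in m.groups())
        try:
            # Step 4: post-process to the desired format.
            found.append(date(year, month, day).isoformat())
        except ValueError:
            # Not a valid calendar date even after correction; skip it.
            continue
    return found

print(extract_dates("Shipped on 12/3i/2O17, received 01/O5/2018"))
# ['2017-12-31', '2018-01-05']
```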