Not able to use Extract Information tool



I’m trying to do the “2 - Information Extraction Task” on my RPAx, but I’m running into some difficulties.

When I use a website URL as the “document_link”, WorkSpace does its job and I can extract information.

When I use a URL to a PDF document, WorkSpace throws this error: “provided url is incorrect or cannot be resolved”.

I went to the Control Tower and noticed that the OCR status is unknown. Is this the problem? How can I extract information from a PDF file?

Pedro Mendes Pereira



Information extraction does not support PDF documents as input, only TXT, XML, and HTML.
You need to convert the PDF to one of those formats first.
Please see the detailed guide here:
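A minimal Python sketch of the conversion step, assuming the PDF's text has already been extracted by some separate tool (a PDF library or an OCR engine; none is shown here) and that the resulting HTML file can then be published at a URL usable as the “document_link”. The helper names `is_pdf_link` and `text_to_html` are hypothetical and are not part of RPA Express or WorkFusion.

```python
import html
from pathlib import PurePosixPath
from urllib.parse import urlparse


def is_pdf_link(url: str) -> bool:
    """Heuristic check (hypothetical helper): True if the URL path ends in .pdf.

    The query string is ignored, and the check is case-insensitive. It only
    looks at the extension, so it cannot detect PDFs served without one.
    """
    return PurePosixPath(urlparse(url).path).suffix.lower() == ".pdf"


def text_to_html(text: str, title: str = "Converted document") -> str:
    """Wrap already-extracted plain text in a minimal HTML page (hypothetical helper).

    Blank lines separate paragraphs; characters such as < and & are escaped
    so the text is not misread as markup.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        f"<body>\n{body}\n</body></html>\n"
    )


if __name__ == "__main__":
    link = "https://example.com/files/invoice.pdf?download=1"
    print(is_pdf_link(link))  # True
    print(text_to_html("Total: 5 < 10 & more\n\nSecond paragraph"))
```

The extension check is only a quick way to spot links that would hit the “provided url is incorrect or cannot be resolved” case; the actual PDF-to-text step still needs a dedicated tool.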


Thanks @azinchuk

I’m trying to use/activate the OCR feature so I can extract information from my PDFs, but I can’t find the UI to call the OCR API. Is this done via the Control Tower? How can I activate OCR?




Please let us know whether we can extract information from a PDF using OCR and, if so, how.
Please share a sample script.

Note: do we need to learn the Groovy language to use OCR?



Please create a separate topic for that.