Not able to use Extract Information tool

ocr
issues
#1

Hi,

I’m trying to do the “2 - Information Extraction Task” in my RPAx, but I’m running into some difficulties.

When I use a website URL as the “document_link”, WorkSpace does its job and I can extract information.

When I use a URL pointing to a PDF document (http://www.pdf995.com/samples/pdf.pdf), WorkSpace throws this error: “provided url is incorrect or cannot be resolved”.

I went to Control Tower and noticed that the OCR status is unknown. Is this the problem? How can I extract information from a PDF file?

Thanks,
Pedro Mendes Pereira

#2

@pedro.pereira,

Information Extraction does not support PDF documents as input; it only accepts TXT, XML, and HTML.
You need to convert your PDFs to one of those formats first.
Please see the detailed guide here: https://kb.workfusion.com/display/WF/Information+Extraction+-+IE
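Because only TXT, XML, and HTML are accepted, a quick pre-check of a “document_link” before submitting it would catch a PDF link like the one in the first post. The following is a minimal sketch, not part of WorkFusion: `check_document_link` and `SUPPORTED_EXTENSIONS` are hypothetical names, it assumes the file extension in the URL reflects the document format, and it treats `.htm` as HTML. WorkSpace's own validation may work differently (for example, by inspecting the server's Content-Type).

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical: formats Information Extraction accepts, per the reply above.
# ".htm" is assumed to be equivalent to ".html".
SUPPORTED_EXTENSIONS = {".txt", ".xml", ".html", ".htm"}


def check_document_link(url: str) -> str:
    """Classify a document_link URL by its file extension (hypothetical helper).

    Returns "supported", "unsupported", or "unknown" when the URL path has no
    extension (e.g. a web page, whose real type only the server can tell).
    """
    path = urlparse(url).path  # drops the query string and fragment
    suffix = PurePosixPath(path).suffix.lower()
    if not suffix:
        return "unknown"
    return "supported" if suffix in SUPPORTED_EXTENSIONS else "unsupported"


if __name__ == "__main__":
    for link in [
        "http://www.pdf995.com/samples/pdf.pdf",     # unsupported
        "https://www.example.com/",                  # unknown
        "https://www.example.com/notes/report.txt",  # supported
    ]:
        print(link, "->", check_document_link(link))
```

A link flagged as "unsupported" (like the PDF above) would need to be converted to TXT, XML, or HTML first, as described in the guide.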

#3

Thanks @azinchuk

I’m trying to use/activate the OCR feature so that I can extract information from my PDFs, but I can’t find the UI to call the OCR API. Is this done via Control Tower? How can I activate OCR?

Thanks,
Pedro

#4

Hi

Could you let us know whether we can extract information from a PDF using OCR? If so, how do we do that?
Please share a sample script.

Note: to use OCR, do we need to learn the Groovy language?
Thanks
Abhishek

#5

@pedro.pereira,

Please create a separate topic for that.