PDF to Excel conversion

We are designing a simple process to:

  1. Read an Outlook email
  2. Download attachment in zip
  3. Extract the zip file
  4. Read that pdf inside it with OCR
  5. Store the extracted output in Excel
  6. Insert the extracted data in SQL Server
  7. Send an acknowledgement email.

Please help me to find out whether this process can be automated by WorkFusion or not.

Hi,
The high-level answer is yes.

But it very much depends on the details. For example:

  • How do you identify the email to read?
  • What type of PDF is it? Is it editable, with detailed HTML code and fields, or just a plain image?
  • What information do you want to extract from the PDF, and is it clearly identified via fixed fields in the PDF?

Thanks for the reply.
It is a normal PDF that does not contain images, and I need to extract a table from it.

Good! Then instead of OCR you might be able to extract the data using XPath. It is much faster and more reliable.

Can XPath work with PDFs?

Yes

Thanks for the reply.
But XPath does not work with all PDFs. It works only with some PDFs, such as those generated from a browser.

Hi

I have multiple PDF invoices from which I need to extract information. The challenge is that the invoice formats differ (the position of the invoice date, invoice number, account details, etc.) and the files are scanned images.

Is there any way to extract these details?

Appreciate your help on this.

Thanks
Siva Ramakrishna

Try using OCR.

I tried using OCR, but the issue is that I have 1000 PDFs in different formats. If I use OCR, I have to capture the area in all 1000 PDFs.

If your invoices come in a great number of different formats, a plain RPA solution will not do the job because, as you say, you would need to specify the fields to scan individually for each format.
There are various commercial invoice scanning and processing tools on the market, some of them with ML capabilities. Maybe @ashapkina can also advise whether there are example use cases where the WorkFusion SPA solution was used for automatic invoice processing.

Hello team,
I have listed some points above. Can I use a direct activity for each point? Can I achieve these points using WorkFusion? If yes, then how?

Thanks

Hi @anon43321598
Please specify in which of the listed points you have doubts.

  1. Read an Outlook email
  2. Download attachment in zip
  3. Extract the zip file
  4. Read that pdf inside it with OCR
  5. Store the extracted output in Excel
  6. Insert the extracted data in SQL Server
  7. Send an acknowledgement email.

Please specify what your question is about each of these points.

  1. Read an Outlook email and download the zip attachment: how do I achieve this?

  2. Extract the zip file: how do I achieve this?

  3. Read the PDF inside it with OCR (the PDF contains a table): how do I read the PDF table and store the data in Excel?

  4. Insert the extracted data in SQL Server: how do I store the data from Excel in the database?

  5. Send an acknowledgement email: how do I send mail using Outlook?

For all points use keystrokes and mouse click actions. For PDF use OCR or xpath as indicated. Read the available documentation, start designing the process and feel free to ask specific questions when you run into a problem.
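For the database step specifically, a scripted alternative to keystrokes is a parameterized insert. The sketch below is not a WorkFusion activity and nothing in it comes from this thread: the table name `invoice_lines`, its columns, and the helper `insert_rows` are hypothetical. It is written against Python's generic DB-API and is exercised with the standard-library `sqlite3` module as a stand-in; for SQL Server you would pass a connection from a driver such as `pyodbc` (an assumption, it also uses `?` placeholders) and the table would normally exist already.

```python
import sqlite3


def insert_rows(connection, rows):
    """Insert (item, quantity, price) tuples using a parameterized statement.

    `connection` is any DB-API connection whose driver uses `?` placeholders.
    The table and column names are hypothetical.
    """
    cursor = connection.cursor()
    # Placeholders keep the extracted text out of the SQL string itself,
    # so odd characters in OCR output cannot break the statement.
    cursor.executemany(
        "INSERT INTO invoice_lines (item, quantity, price) VALUES (?, ?, ?)",
        rows,
    )
    connection.commit()


if __name__ == "__main__":
    # Stand-in database for the demo only; not SQL Server.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoice_lines (item TEXT, quantity INTEGER, price REAL)"
    )
    insert_rows(conn, [("Blue pen", 2, 1.5), ("Notebook", 1, 3.0)])
    print(conn.execute("SELECT COUNT(*) FROM invoice_lines").fetchone()[0])
```

Inserting all rows in one `executemany` call and committing once keeps the load to a single transaction, which is usually easier to retry than row-by-row UI actions.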

If I use keystrokes and mouse clicks, it will be time-consuming and unreliable.
For PDF to Excel, XPath is not working in my case. When I use OCR for table scraping instead, the bot writes the whole table into a single Excel cell.

Hi @anon43321598
There are several topics on Outlook automation on the forum. Here is one of them

Then you can open the archive, and extract the file using keystrokes and clicks on images and window controls. You need to add timeouts to these actions for them to be more reliable.
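If running a small script is an option in your setup (an assumption, not something confirmed in this thread), the extraction step can also be done without any UI interaction. The sketch below uses only Python's standard `zipfile` module; the function name `extract_pdfs` and the paths passed to it are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_pdfs(zip_path, out_dir):
    """Extract only the PDF members of an archive and return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            # Skip folders and anything that is not a PDF.
            if member.is_dir() or not member.filename.lower().endswith(".pdf"):
                continue
            # extract() returns the path of the file it created.
            extracted.append(Path(archive.extract(member, out_dir)))
    return extracted
```

A call such as `extract_pdfs("attachment.zip", "extracted")` would return the list of extracted PDF paths, which the next step (OCR or XPath) can then pick up. There are no windows to wait for, so no timeouts are needed for this part.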

If you read the whole table in one OCR action, it will be stored as a string and then copied to one cell in Excel. If you need to copy it in Excel as a table, you should read each cell in a separate OCR action.
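An alternative to one OCR action per cell is to post-process the single string. The sketch below is a generic Python illustration, not a WorkFusion feature. It assumes the OCR output has one table row per line and that columns are separated by a tab or by at least two spaces, while text inside a cell uses single spaces only; real OCR output may not satisfy this. The function names are hypothetical. The result is written as CSV, which Excel can open directly.

```python
import csv
import re


def ocr_text_to_rows(text):
    """Split OCR output into rows (one per line) and cells.

    Assumes columns are separated by tabs or by two or more spaces.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between rows
        cells = re.split(r"\t+|\s{2,}", line)
        rows.append([cell.strip() for cell in cells])
    return rows


def write_rows_to_csv(rows, csv_path):
    """Write the rows to a CSV file, one cell per column."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
```

For example, the line `Blue pen   2   1.50` becomes the three cells `Blue pen`, `2` and `1.50`. If the OCR engine separates columns with single spaces, this heuristic cannot tell cells apart, and reading each cell in its own OCR action remains the safer route.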

@sivaramakrishna.mangisetty
If all your invoices are of different formats, then you cannot use RPA Express to process them; you will need machine learning. Invoice processing is a popular use case for SPA. If you need more info on it, let me know.