PDF to Excel conversion


But XPath is not working with all PDFs. It only works with some of them, for example PDFs generated from a browser.


I have multiple PDF invoices from which I need to extract information. The challenge is that the invoice formats differ (the positions of the invoice date, invoice number, account details, etc.), and the invoices are scanned images.

Is there any way to extract these details?

Try using OCR.

I tried using OCR, but the issue is that I have 1000 PDFs in different formats. If I use OCR, I would have to define the capture area for all 1000 PDFs.

If your invoices come in a large number of different formats, a plain RPA solution will not do the job because, as you say, you would need to specify the fields to scan individually for each format.
There are various commercial invoice scanning and processing tools on the market, some of them with ML capabilities. Maybe @ashapkina can also advise whether there are example use cases where the WorkFusion SPA solution was used for automatic invoice processing.
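To illustrate why per-format setup does not scale, here is a minimal Python sketch of template-based extraction. It assumes the OCR engine returns each recognized word with its pixel position; the `TEMPLATES` dictionary, the layout names, and the coordinates are hypothetical. Every new invoice layout needs its own hand-made entry, which is exactly the effort described above.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: int  # left edge in pixels (as reported by a hypothetical OCR engine)
    y: int  # top edge in pixels


# Hypothetical per-layout field regions: (left, top, right, bottom) in pixels.
# Every new invoice layout needs its own entry here.
TEMPLATES = {
    "vendor_a": {
        "invoice_number": (400, 50, 600, 80),
        "invoice_date": (400, 90, 600, 120),
    },
    "vendor_b": {
        "invoice_number": (20, 200, 220, 230),
        "invoice_date": (20, 240, 220, 270),
    },
}


def extract_fields(words, layout):
    """Collect the OCR words whose top-left corner falls inside each field's region."""
    fields = {}
    for name, (left, top, right, bottom) in TEMPLATES[layout].items():
        inside = [w for w in words if left <= w.x <= right and top <= w.y <= bottom]
        # Join words top-to-bottom, then left-to-right, to rebuild the field value.
        fields[name] = " ".join(w.text for w in sorted(inside, key=lambda w: (w.y, w.x)))
    return fields


if __name__ == "__main__":
    page = [Word("INV-1042", 410, 60), Word("2021-03-05", 410, 100), Word("Total", 50, 500)]
    print(extract_fields(page, "vendor_a"))
```

The same page read with the wrong template (`"vendor_b"`) returns empty fields, which is why ML-based tools that locate fields without fixed coordinates are suggested for many layouts.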

Hello team,
I have listed some points above. Can I use a direct activity for each point? Can I achieve these points using WorkFusion? If yes, how?


Hi @anon43321598
Please specify in which of the listed points you have doubts.

  1. Read an Outlook email
  2. Download the zip attachment
  3. Extract the zip file
  4. Read the PDF inside it with OCR
  5. Store the extracted output in Excel
  6. Insert the extracted data in SQL Server
  7. Send the acknowledgement email.
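Steps 1 and 2 can also be scripted instead of recorded, provided the message is available as raw MIME (for example saved as an `.eml` file or fetched over IMAP; getting it out of Outlook is not shown). A minimal Python sketch using only the standard `email` package; `zip_attachments` is a hypothetical helper name.

```python
from email import message_from_bytes
from email.policy import default


def zip_attachments(raw_message: bytes):
    """Return (filename, content) pairs for every .zip attachment in a raw MIME message."""
    msg = message_from_bytes(raw_message, policy=default)
    found = []
    # walk() visits every part, including parts nested inside multipart containers.
    for part in msg.walk():
        name = part.get_filename()
        if name and name.lower().endswith(".zip"):
            # decode=True undoes the base64/quoted-printable transfer encoding.
            found.append((name, part.get_payload(decode=True)))
    return found
```

Usage: read a saved message with `Path("invoice_mail.eml").read_bytes()` (a hypothetical file name), pass it to `zip_attachments`, and write each returned `content` to disk under its `name`.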

Please specify what your question is about each of these points.

  1. Read an Outlook email and download the zip attachment: how do I achieve this?

  2. Extract the zip file: how do I achieve this?

  3. Read the PDF inside it with OCR (the PDF contains a table): how do I read the PDF table and store the data in Excel?

  4. Insert the extracted data in SQL Server: how do I store data from Excel in the database?

  5. Send the acknowledgement email: how do I send mail using Outlook?
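For point 4, one option is to save the Excel sheet as CSV and insert the rows with a short script. The sketch below uses Python's built-in `sqlite3` module as a stand-in so it runs anywhere; for SQL Server you would connect through a driver such as pyodbc instead. The `invoices` table, its columns, and `load_invoices` are hypothetical.

```python
import csv
import io
import sqlite3


def load_invoices(csv_text: str, conn: sqlite3.Connection) -> int:
    """Insert rows from CSV text (header: invoice_number,invoice_date,amount) into a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices ("
        "invoice_number TEXT PRIMARY KEY, invoice_date TEXT, amount REAL)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["invoice_number"], r["invoice_date"], float(r["amount"])) for r in reader]
    # Parameterized placeholders avoid building SQL by string concatenation.
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    data = "invoice_number,invoice_date,amount\nINV-1,2021-03-05,120.50\nINV-2,2021-03-06,99.00\n"
    print(load_invoices(data, sqlite3.connect(":memory:")))
```

The `?` placeholder style also works with pyodbc, so the insert statement would carry over; only the connection setup changes.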

For all points, use keystroke and mouse click actions. For PDFs, use OCR or XPath as indicated. Read the available documentation, start designing the process, and feel free to ask specific questions when you run into a problem.

If I use keystrokes and mouse clicks, it will be time-consuming and unreliable.
For PDF to Excel in my case, XPath is not working. When I use OCR for table scraping instead, the bot writes the whole table into a single Excel cell.

Hi @anon43321598
There are several topics on Outlook automation on the forum. Here is one of them

Then you can open the archive and extract the file using keystrokes and clicks on images and window controls. Add timeouts to these actions to make them more reliable.
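As a scripted alternative to GUI actions, Python's standard `zipfile` module can extract the PDFs directly. A minimal sketch; `extract_pdfs` and the output folder are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_pdfs(zip_path, out_dir):
    """Extract only the .pdf members of a zip archive and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if member.lower().endswith(".pdf"):
                # extract() strips absolute paths and ".." components from member names.
                extracted.append(Path(archive.extract(member, out)))
    return extracted
```

Filtering on the `.pdf` extension skips any other files bundled in the archive, so the OCR step only sees invoices.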

If you read the whole table in one OCR action, it will be stored as a string and then copied to one cell in Excel. If you need to copy it in Excel as a table, you should read each cell in a separate OCR action.
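If one OCR action per cell is too slow, another option is to post-process the single OCR string: split it into rows on line breaks and into cells on runs of whitespace, then write a CSV that Excel opens as a table. A minimal Python sketch, assuming the OCR output keeps each table row on its own line and separates columns with a tab or two or more spaces; real OCR output often needs more cleanup than this. The function names are hypothetical.

```python
import csv
import io
import re


def ocr_table_to_rows(ocr_text: str):
    """Split OCR text into rows (one per line) and cells (split on a tab or 2+ spaces)."""
    rows = []
    for line in ocr_text.splitlines():
        line = line.strip()
        if line:  # skip blank lines left by OCR
            rows.append(re.split(r"\t|\s{2,}", line))
    return rows


def rows_to_csv(rows) -> str:
    """Serialize rows as CSV text; saved with a .csv extension, Excel opens it as a table."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```

Cell values that contain single spaces (such as "Office Paper") stay intact because only wider gaps are treated as column breaks. To write `.xlsx` directly instead of CSV, a library such as openpyxl can append the same rows to a worksheet.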

If all your invoices are in different formats, you cannot use RPA Express to process them; you will need machine learning. Invoice processing is a popular use case for SPA. If you need more information on it, let me know.

Thanks @ashapkina and @timriewe.
@ashapkina, I will try your suggestion. If I run into any difficulties, I will ask.

One more question:
Do RPA Express and WorkFusion SPA contain different activities?

Well, WorkFusion SPA contains all components that RPA Express has (WorkFusion Studio, recorder), but it also has many more options in Control Tower and WorkSpace, as well as an analytics dashboard. But the main difference is machine learning.

I'm not getting any XPaths. Does this work for all PDFs?

No. If the PDF is image-only, you don't get any XPath information. You only get XPath information if the PDF is "true" (digitally created) or searchable.
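A rough way to check in advance which PDFs will expose XPath information is to look for font resources: a PDF with a text layer (digitally created or OCR-searchable) must reference fonts, while a pure scan usually contains only images. The Python sketch below is a heuristic, not a PDF parser. It searches the raw file and any zlib (Flate) compressed streams for the `/Font` key, so encrypted files or other filters can be misjudged; `has_text_layer` is a hypothetical name.

```python
import re
import zlib

# Match "stream" (but not "endstream") followed by its data up to the next "endstream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the PDF appears to reference any font resources."""
    if b"/Font" in pdf_bytes:
        return True
    # Fonts may be declared inside compressed object streams (PDF 1.5+),
    # so also try inflating each stream and searching its contents.
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates trailing bytes (kept in unused_data).
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-encoded (e.g. a JPEG image stream)
        if b"/Font" in data:
            return True
    return False
```

Running it over the folder of invoices gives a quick split between files where XPath can work and files that need OCR.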
I found this page from Abbyy that explains the types:


Hi Anon,

