Extract data from PDF invoices to Excel

Hi All,

I have an invoice with all the usual details: name, address, items, etc.
I need to extract the item details from the invoice. The items can be different every time, and so can the number of items.

I used XPath and tried pasting everything into a text file, but it does not come out in the correct format.

Can anyone suggest an efficient way to do this?

Hi @prateek_jain1a
When you say it does not come out in the right format, what do you mean? What do you want to do with the data after extracting it? Instead of pasting into a text file, maybe pasting each extracted item into an Excel file would be more useful. If you have managed to identify the items via XPath, you have already done the main part of the job :+1:
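Once each item is a separate record, writing one row per item is simple. A minimal Python sketch, assuming every item is already a dict of column name to text (the function name and column names are hypothetical). It writes a CSV file, which Excel opens directly; for a native .xlsx file, a library such as openpyxl would be the usual choice.

```python
import csv


def write_items_csv(path, items, columns):
    """Write one row per invoice item to a CSV file that Excel can open.

    items:   list of dicts, {column_name: text}; missing columns become empty cells
    columns: header order, e.g. ["Pos.", "Art.No", "Description", "Qty"]
    """
    # utf-8-sig adds a byte-order mark so Excel detects UTF-8 (umlauts etc.);
    # newline="" lets the csv module control line endings, and its default
    # quoting keeps text containing line breaks inside a single cell.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(items)
```

Keys that are not listed in `columns` are ignored, so extra fields picked up during extraction do not break the export.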

Hi @timriewe
Thanks for the response. Please find attached the invoice, where I have highlighted each column I need in Excel.

When I capture the XPath, I get a separate XPath for each text row in the PDF, not one for the whole cell. Please find attached how I am trying to capture the XPath.

I need everything that is highlighted and circled in a single Excel cell, and these rows are dynamic: their number changes from invoice to invoice.
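If the table text can be read line by line, the multi-line cells can be rebuilt by treating every line that has a value in the position column (such as `Pos.`) as the start of a new item and appending every other line to the item above it. A minimal Python sketch, assuming each visual line has already been split into a dict of column name to text; the function name and the choice of `Pos.` as the key column are assumptions.

```python
def merge_wrapped_lines(lines, key_column="Pos."):
    """Fold continuation lines into the item above them.

    lines: one dict per visual line, {column_name: text}, ordered top to bottom.
    A line with text in key_column starts a new item; any other line continues
    the previous item. Lines before the first item are skipped.
    Returns one dict per item; wrapped text in a cell is joined with a newline.
    """
    items = []
    for line in lines:
        if line.get(key_column, "").strip():
            items.append(dict(line))  # copy so the caller's data is not modified
        elif items:
            current = items[-1]
            for column, text in line.items():
                if current.get(column):
                    current[column] += "\n" + text  # keep the wrap inside one cell
                else:
                    current[column] = text
    return items
```

Because the number of items is whatever the loop finds, this works the same for an invoice with two items or twenty.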

Hi @prateek_jain1a
I think this is a tricky one. @ashapkinaonce once recommended that I use the style attributes of an XPath and store them in variables, which worked well for this kind of structured document. You could probably start with the style attributes of the column headers (Pos., Art.No, …). From them, extract the “left” and “top” values of the style attribute and store them in variables. This gives you the left position of each column and the top limit of the information you need to extract, since the items sit below the headers.
From there, maybe you can find a way to get to the information you need. Right now I cannot think of exactly how, but maybe this helps you along the way.
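To make the column-position idea more concrete, here is a minimal Python sketch. It assumes that each piece of text you capture comes with an inline style string containing `left` and `top` values, for example `left: 72px; top: 410px;`. That format is an assumption, so check what your XPath results actually contain and adjust the regular expression. The function names and sample positions are hypothetical. The header positions define where each column starts and where the table begins; every fragment below the headers is grouped into a visual line by its `top` value and assigned to the right-most column whose header starts at or before its `left` value.

```python
import bisect
import re


def style_value(style, key):
    """Read a numeric value such as 'left: 72px' from an inline style string (assumed format)."""
    match = re.search(rf"(?:^|;)\s*{key}\s*:\s*(-?\d+(?:\.\d+)?)", style)
    if match is None:
        raise ValueError(f"no {key!r} in style {style!r}")
    return float(match.group(1))


def table_lines(headers, fragments, x_tolerance=2.0, y_tolerance=2.0):
    """Split the text below the header row into visual lines and columns.

    headers:   (column_name, style) pairs for the header cells, e.g. ("Pos.", "left: 40px; top: 300px;")
    fragments: (text, style) pairs for the other text pieces on the page
    Returns one dict per visual line below the headers: {column_name: text}.
    """
    # Each column starts at the 'left' of its header; the table starts below the headers.
    columns = sorted((style_value(s, "left"), name) for name, s in headers)
    column_lefts = [left for left, _ in columns]
    header_top = max(style_value(s, "top") for _, s in headers)

    # Sort fragments top to bottom, then group those with (almost) the same 'top'.
    placed = sorted(
        (style_value(s, "top"), style_value(s, "left"), text) for text, s in fragments
    )
    lines = []  # (top of the line's first fragment, [(left, text), ...])
    for top, left, text in placed:
        if top <= header_top:
            continue  # header row, or invoice details above it (name, address, ...)
        if lines and top - lines[-1][0] <= y_tolerance:
            lines[-1][1].append((left, text))
        else:
            lines.append((top, [(left, text)]))

    result = []
    for _, pieces in lines:
        cells = {}
        for left, text in sorted(pieces):  # left to right within the line
            # Right-most column whose header starts at or before this fragment
            # (a fragment up to x_tolerance left of a header still counts).
            index = max(bisect.bisect_right(column_lefts, left + x_tolerance) - 1, 0)
            cells.setdefault(columns[index][1], []).append(text)
        result.append({name: " ".join(parts) for name, parts in cells.items()})
    return result
```

A description that wraps onto a second line comes back here as its own line with only the description column filled, so it still has to be merged with the item above it. Right-aligned columns such as quantities or prices can start to the left of their header; they may need a larger `x_tolerance` or a check against the header's right edge instead.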