How to open a file URL with the Open Website action?

rpa
xpath

#1

Hi experts,

I am trying to extract data from PDFs using the XPath method. I managed to extract the info from a single PDF file using the <file:///C:/Users/admin/Desktop/invoice/sample.pdf> URL. However, I’d like to process files in a batch, so I put multiple PDFs inside the C:/Users/admin/Desktop/invoice folder. My problem is how to get the file URLs of all the PDFs and put them into a List Variable. I tried the “Get Folder Contents” action, but it only returns the normal path (C:\Users\admin\Desktop…), and the “Open Website” action cannot open it.

Thanks in advance!


#2

Hi Josh, you can use text actions to change the path and write the URL like this:
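As an illustration of the conversion described here (not necessarily the exact text actions used in the workflow), the sketch below turns a Windows path of the kind returned by “Get Folder Contents” into a file URL that a browser can open. The function name `to_file_url` is hypothetical, and the sample path is the one from the question.

```python
from urllib.parse import quote


def to_file_url(windows_path: str) -> str:
    """Convert a Windows path such as C:\\dir\\file.pdf into a file:/// URL."""
    # Step 1: swap backslashes for forward slashes (a plain text replace).
    forward = windows_path.replace("\\", "/")
    # Step 2: percent-encode characters such as spaces, keeping "/" and ":" literal.
    # Step 3: prepend the file:/// scheme prefix.
    return "file:///" + quote(forward, safe="/:")


# Hypothetical list standing in for the output of "Get Folder Contents".
paths = [r"C:\Users\admin\Desktop\invoice\sample.pdf"]
urls = [to_file_url(p) for p in paths]
print(urls)
```

In the RPA tool itself, the same idea would amount to a replace on the path text plus a prefix, applied to each item before it is stored in the list variable.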


#3

Dear ashapkina,

Thanks so much, that’s a great tip.

I also have another problem with XPath search in a readable PDF. There is a form with description content like this:

Screenshot_1

And the XML looks like this:

<div style="left: 242.005px; top: 618.322px; font-size: 15px; font-family: sans-serif; transform: scaleX(1.02145);">PDU</div>
<div style="left: 242.005px; top: 635.002px; font-size: 15px; font-family: sans-serif; transform: scaleX(1.01315);">Rack basic PDU: </div>
<div style="left: 359.485px; top: 635.002px; font-size: 15px; font-family: sans-serif; transform: scaleX(1.0425);">16</div>
<div style="left: 376.165px; top: 635.002px; font-size: 15px; font-family: sans-serif; transform: scaleX(1.05892);">A IEC</div>

When I use an XPath search like //div[starts-with(@style, 'left: 102.083px')], it only returns the value of the first matching div, PDU. I want to get all of the content for that field.

How to fix this?

Thanks!
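One possible way to think about this: in the markup above, the pieces of a single visual line share the same `top` value, while `left` only identifies where the first piece starts. An XPath such as //div[contains(@style, 'top: 635.002px')] would select every div on that line, assuming the tool's XPath engine supports contains(). The Python sketch below shows the same grouping outside the RPA tool, using only the standard library; the wrapping `<root>` element and the function name `lines_by_top` are assumptions added for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# The four divs from the post, wrapped in a hypothetical <root> so they parse as XML.
SAMPLE = """<root>
<div style="left: 242.005px; top: 618.322px; font-size: 15px; font-family: sans-serif; transform: scaleX(1.02145);">PDU</div>
<div style="left: 242.005px; top: 635.002px; font-size: 15px; font-family: sans-serif; transform: scaleX(1.01315);">Rack basic PDU: </div>
<div style="left: 359.485px; top: 635.002px; font-size: 15px; font-family: sans-serif; transform: scaleX(1.0425);">16</div>
<div style="left: 376.165px; top: 635.002px; font-size: 15px; font-family: sans-serif; transform: scaleX(1.05892);">A IEC</div>
</root>"""

LEFT_RE = re.compile(r"left:\s*([\d.]+)px")
TOP_RE = re.compile(r"top:\s*([\d.]+)px")


def lines_by_top(markup: str) -> dict:
    """Group div texts by their 'top' coordinate and join them left to right."""
    rows = defaultdict(list)
    for div in ET.fromstring(markup).iter("div"):
        style = div.get("style", "")
        left, top = LEFT_RE.search(style), TOP_RE.search(style)
        if left and top:
            rows[float(top.group(1))].append((float(left.group(1)), div.text or ""))
    # Sort rows top to bottom, and the pieces within each row by 'left'.
    return {
        top: "".join(text for _, text in sorted(parts))
        for top, parts in sorted(rows.items())
    }


for top, text in lines_by_top(SAMPLE).items():
    print(top, repr(text))
```

This only works when each visual line has a distinct `top` value; where many fields share identical properties, the grouping cannot tell them apart.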


#4

Hi Joshua, could you share one file so I can have a closer look?


#5

7750200262cna.pdf (68.8 KB)

Dear ashapkina,

I attached a sample PDF for reference. My question is: how can I extract the content from those tables?

Thanks.


#6

Thanks a lot! I’ll have a look now :+1:


#7

@JoshDaone I tried extracting the data from the document using XPath, but it seems that a lot of the fields there have exactly the same properties. I think in this case it is better to use OCR for extracting the data.


#8

Thanks, I tried to use OCR but ran into some issues; for example, it cannot capture text across multiple pages. May I know if SPA is able to do that?


#9

I’m not sure about the OCR capabilities in SPA; I will get back to you with an answer a bit later.

Have you tried adding mouse scroll actions to scroll through several pages in the PDF?


#10

@JoshDaone sorry for the delay. Yes, the OCR capabilities in SPA are much broader than in RPA Express.