Parsing of unstructured PDFs (CVs)

Is there a way to extract the key information (university, degree, work experience, etc.) from unstructured CVs provided as PDFs? Of course, all the PDFs are different…

Hi Karen,

Are your PDF files in text or image format?
If they are text, try this approach: Working with PDF using Xpath as substitute for OCR
This topic may also be helpful: How to read HTML attributes into a variable and use it in XPath
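
For text-based PDFs whose layout changes from file to file, one option is to look for section headings by keyword instead of by position. The snippet below is only a rough sketch of that idea, not the XPath method from the linked topic. It assumes the text has already been extracted from the PDF into a plain string; the keyword list and the names split_cv_sections and match_heading are made up for illustration.

```python
import re

# Hypothetical heading keywords; real CVs will need a longer, tuned list.
SECTION_KEYWORDS = {
    "education": ["education", "academic background"],
    "experience": ["work experience", "professional experience", "employment"],
}


def match_heading(line):
    """Return the section name if the line is a known heading, else None."""
    # Lowercase and drop punctuation so "Work Experience:" matches "work experience".
    normalized = re.sub(r"[^a-z ]", "", line.lower()).strip()
    for name, keywords in SECTION_KEYWORDS.items():
        if normalized in keywords:
            return name
    return None


def split_cv_sections(text):
    """Group non-empty lines under the most recent recognised heading."""
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        heading = match_heading(stripped)
        if heading is not None:
            current = heading
            sections.setdefault(current, [])
        elif current is not None:
            # Lines before the first recognised heading are ignored.
            sections[current].append(stripped)
    return sections


if __name__ == "__main__":
    sample = (
        "Jane Doe\n\n"
        "EDUCATION\n"
        "MSc Physics, Example University\n\n"
        "Work Experience:\n"
        "Analyst, Example Corp\n"
    )
    print(split_cv_sections(sample))
```

This only works when the CV actually uses headings you have listed, so expect to extend the keyword list and to handle CVs with no clear headings separately.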


They can be both… Do you have a demo where you extract data from a PDF that has a different format every time? I’ve only seen the one with the address field, where the address needs to be at the same spot every time.


Yes, if the PDF is in image format, then you need to use OCR, and then the recognized text has to be at the same spot in every document.