Python Version: 3
Input: PDF file containing a purchase order
Input Example: http://gem.compaq.com/gemstore/sites/downloads/SLED_PO_Template.pdf
Note: This is an empty sample of the purchase order format; the actual format may vary. In real use the PDF will usually be filled in.
Desired Output: each key name and its value, extracted from the PDF.
Sample Output:
PO number: <its value in the PDF> (and likewise for the other keys)
Question: How can I extract the key names and their corresponding values from the given PDF file?
What I have tried:
I have tried tabula-py, pdfminer2, pdftotext, OCR, and pdf2json.
The main challenge I am facing is relating each key to its true value.
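To make the challenge concrete, here is a minimal sketch of the kind of pairing I am after. It assumes the extraction step can report a bounding box for each text fragment; the `Word` class, the `find_value` helper, and the sample coordinates are all hypothetical and not taken from any of the libraries above. The heuristic is only a guess at a starting point: take the nearest fragment to the right on the same line, otherwise the nearest fragment directly below.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """Hypothetical text fragment with its bounding box (origin at top-left)."""
    text: str
    x0: float      # left edge
    top: float     # top edge
    x1: float      # right edge
    bottom: float  # bottom edge


def find_value(key: Word, words: List[Word], line_tol: float = 3.0) -> Optional[str]:
    """Guess the value for a key label from the layout alone.

    Assumption: the value sits either to the right of the key on the same
    line, or directly below it. Real purchase orders may break this.
    """
    # Candidates on the same line, starting at or after the key's right edge.
    right = [
        w for w in words
        if w is not key
        and abs(w.top - key.top) <= line_tol
        and w.x0 >= key.x1
    ]
    if right:
        return min(right, key=lambda w: w.x0).text

    # Otherwise, candidates below the key that overlap it horizontally.
    below = [
        w for w in words
        if w is not key
        and w.top >= key.bottom
        and w.x0 < key.x1
        and w.x1 > key.x0
    ]
    if below:
        return min(below, key=lambda w: w.top).text

    return None


# Made-up fragments standing in for real extraction output.
words = [
    Word("PO number:", 10, 10, 60, 20),
    Word("12345", 70, 10.5, 100, 20),
    Word("Vendor", 10, 40, 50, 50),
    Word("Acme", 12, 55, 45, 65),
]

for key in (words[0], words[2]):
    print(key.text, find_value(key, words))
```

This returns a single fragment only, so multi-word values, empty fields, and two keys sharing one line would all need extra handling; that is exactly where I get stuck.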