A coworker has a task that involves taking PDF files (they are forms) received in various emails and entering their contents into a system. I'm trying to make her life a bit easier by reading the PDFs into a single file (.txt or .xls is fine).
The "Extract text from PDF" action doesn't read the values entered into the form fields, even when those fields are simple text values. Is there a way I can convert or flatten these files to make them readable? The data is sensitive, so I'd prefer not to upload to cloud services. I'd like to do the conversion on our desktop.
I've done a lot of searching here and on Google, but I keep getting a lot of results about Microsoft Forms instead of PDF forms. 🙄
I don't work much with PDF forms, so a nudge in the right direction would be much appreciated!
Thanks!
What if you print the form to PDF and save it as a different file? That PDF should still be readable, and it should be flattened.
Good thought! But unfortunately, it didn't work. I tested using Microsoft Print to PDF and the files that are created still "hide" the field text from the PDF text reader.
Can you send me a blank form to test?
Oh. What about, in Adobe: File -> Export To -> Plain Text?
Here is another one: the "Get details of a UI element in window" action.
If the PDF is still in form mode, you should be able to extract the "Own Text" attribute of the text field.
Best of luck!
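If scripting outside the flow is an option, the field values can also be read directly from the files on the desktop, with nothing uploaded anywhere. Below is a minimal sketch, assuming Python and the third-party pypdf library are installed and that the forms are ordinary AcroForm PDFs (XFA-based forms may not work this way). The function names, the folder path, and the output file name are made up for illustration. It writes one row per PDF to a CSV file, which Excel can open.

```python
import csv
from pathlib import Path


def read_form_fields(pdf_path):
    """Return {field name: entered value} for one fillable PDF."""
    # Third-party dependency (assumption): pip install pypdf
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    fields = reader.get_fields() or {}  # get_fields() returns None if the PDF has no form
    row = {}
    for name, field in fields.items():
        value = field.get("/V")  # "/V" holds the value typed into the field
        row[name] = "" if value is None else str(value)
    return row


def write_rows(rows, out_path):
    """Write a list of dicts to CSV, one column per field name seen."""
    columns = ["file"]
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)


def convert_folder(folder, out_path):
    """Read every PDF in a folder and collect the field values in one CSV."""
    rows = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        row = {"file": pdf_path.name}
        row.update(read_form_fields(pdf_path))
        rows.append(row)
    write_rows(rows, out_path)
    return rows
```

Calling something like convert_folder(r"C:\Forms\Inbox", "forms.csv") (a hypothetical path) would then produce the combined file. Checkbox and dropdown values may come through in raw form such as "/Yes", so they might need a little cleanup afterwards.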