I have a multi-page PDF in which every page has the same layout. I've been trying to extract both the text and the images and tag each with metadata so they can be searched and retrieved later.
I've tried the invoice extractor, which does half of what I want (extract text from a PDF or image and attach metadata), but I also want to do the same for the images embedded in that PDF. Is there a way to do that with OCR?
For example, I have a standard PDF form containing 3 images plus text. I would like to extract each image and the text separately and put them into a SharePoint document library, so that each item can be given its own metadata (e.g. Field Note Photo, Overall Map View, Field Note Location, etc.).
In the example form, the regions are marked as follows:
Images: Red, Yellow, Blue
Text: Green
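
To make the target concrete, here is a rough sketch of the kind of record I'd like to end up with for each region on a page. It does no extraction or OCR at all; it only shows the desired output shape. The names `Region`, `REGIONS`, and `build_records` are made up for illustration, the pairing of colors to labels is just an example, and "Field Note Text" is a placeholder label that is not an existing column in my library.

```python
from dataclasses import dataclass


@dataclass
class Region:
    color: str  # marker color of the region in the example form
    kind: str   # "image" or "text"
    label: str  # metadata value to attach in the document library


# Hypothetical pairing of regions to metadata labels (illustrative only).
REGIONS = [
    Region("red", "image", "Field Note Photo"),
    Region("yellow", "image", "Overall Map View"),
    Region("blue", "image", "Field Note Location"),
    Region("green", "text", "Field Note Text"),  # placeholder label
]


def build_records(page_number, regions=REGIONS):
    """Return one metadata record per region for the given page."""
    return [
        {
            "page": page_number,
            "kind": region.kind,
            "color": region.color,
            "label": region.label,
        }
        for region in regions
    ]


if __name__ == "__main__":
    # Every page shares the same layout, so the same regions apply to each.
    for record in build_records(page_number=1):
        print(record)
```

Since every page has the same layout, the same list of regions would be reused for each page, and only the page number would change.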