Solved: Re: How to extract text from PDF using PAD?

vaibhavtandon87 · ‎10-28-2020

Dear community,

How can I extract data from PDFs and store it in Excel using PAD?

My flow fails at the 'Extract text with OCR' step with the error message: Failed to extract text with OCR.

Steps used:

1-Create Tesseract OCR engine

2-Extract text with OCR

3-Write text to file (just for testing; eventually it will be an Excel sheet).

Please let me know if I need to do any specific configuration.

JamesP_MSFT · ‎10-29-2020

Hello @vaibhavtandon87,

Right now there is no way to extract text or images from a PDF file.
A group of actions for this will be available in Power Automate Desktop in the near future.

Best regards,
James

vaibhavtandon87 · ‎10-29-2020

Thanks James. How near is the near future? Tentative timelines would help.

JamesP_MSFT · ‎10-30-2020

@vaibhavtandon87

The team has almost completed work on the said feature, so it will be available really soon.

Best regards,
James

Alexanderderv · ‎12-03-2020

This sounds very interesting and will surely be useful!

When you say it will be available really soon, could that be before the end of 2020 or at the beginning of 2021?

Keep up the good work!

vaibhavtandon87 · ‎12-17-2020

@JamesP_MSFT,

I can now see actions like "Extract text from PDF", which is great!

Could you please advise: if I extract a table on the first page, along with headers containing useful information, how do I pull that into Excel as separate pieces of information?

At the moment it takes all the content from the PDF page and dumps it into a single cell. Can I break that content down further into useful information, and if so, how?
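One way to think about the decomposition asked about here is sketched below in Python, outside PAD. It is only an illustration under stated assumptions: the sample page text is hypothetical, header lines are assumed to look like "Key: Value", and table columns are assumed to be separated by two or more spaces. Real PDF output may need different rules, for example when table cells themselves contain colons.

```python
import csv
import io
import re

# Hypothetical sample of what a whole-page text extraction might return.
SAMPLE_PAGE_TEXT = """Invoice No: 1042
Date: 2020-12-17

Item        Qty   Price
Widget      2     9.50
Gadget      1     20.00
"""


def split_page(text):
    """Split dumped page text into header fields and table rows."""
    fields = {}
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key_value = re.match(r"^([^:]+):\s*(.+)$", line)
        if key_value:
            # Assumption: any "Key: Value" line is header information.
            fields[key_value.group(1).strip()] = key_value.group(2).strip()
        else:
            # Assumption: columns are separated by two or more spaces.
            rows.append(re.split(r"\s{2,}", line))
    return fields, rows


def rows_to_csv(rows):
    """Serialize table rows as CSV text that Excel can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


fields, rows = split_page(SAMPLE_PAGE_TEXT)
print(fields)
print(rows_to_csv(rows))
```

The same idea carries over to a flow: split the extracted text into lines, classify each line, and write each piece to its own cell instead of writing the whole page to one cell.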

Anonymous · ‎03-09-2021

Hi, @JamesP_MSFT! Good evening! 🙂

Any news on this CV working with PDF files? Do you know of a site where I can track future PAD updates? 🙂 For now I can work with the alternative, "Extract text from PDF", and just use RegEx with some extra steps, but I would love to use this feature as soon as it is released!

GK
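The RegEx approach mentioned in this post can be illustrated with a small Python sketch. The sample sentence, the field names, and the patterns are all hypothetical; they only show the general idea of running one pattern per value against the extracted text and recording a miss when nothing matches.

```python
import re

# Hypothetical text, standing in for the output of a PDF text extraction.
SAMPLE_TEXT = "Order 58213 was shipped on 03/09/2021. Total due: 149.90 EUR."

# Hypothetical field names, each mapped to a pattern with one capture group.
PATTERNS = {
    "order_id": r"Order\s+(\d+)",
    "ship_date": r"shipped on\s+(\d{2}/\d{2}/\d{4})",
    "total": r"Total due:\s*([\d.]+)",
}


def extract_fields(text):
    """Return one captured value per pattern, or None when it is absent."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        result[name] = match.group(1) if match else None
    return result


print(extract_fields(SAMPLE_TEXT))
```

Keeping the patterns in one table makes it easier to adjust them when the layout of the PDF changes.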