Re: Know where to split a PDF with multiple documents in it

JoeF-MSFT · 02-19-2022

This flow takes a PDF that contains multiple documents (for example, several invoices in a single PDF) and uses a delimiter word you provide to determine where the PDF should be split or processed. It uses AI Builder text recognition (OCR) to read all the text in the PDF and then derives the page range of each document in the PDF.

You can customize this flow to use a connector that performs the actual splitting of the PDF, such as Adobe PDF Services, Encodian, or Plumsail. Alternatively, you can specify the page range directly in supported AI Builder actions such as Invoice Processing and Form Processing. To use this flow:

You will need to have an AI Builder license to use this flow. Don’t have one? You can start a free trial at: https://aka.ms/tryaibuilder?utm_source=powerautomate-cookbook&utm_medium=post&utm_campaign=aib-split...
Import the .zip file attached to this message into your Power Automate environment.
After you import the flow, go to the ‘Initialize document delimiter variable’ action and define the text that marks the beginning of a new document. For example, in the sample PDF ‘Adatum multiple invoices.pdf’ we use the word ‘Invoice’ as the text that marks the start of a new invoice.
When you run the flow, the actions ‘Page range for split’ and ‘Last page range for split’ return the page range of each document within the PDF. You can add any action here to split or process the PDF by page range.
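The page-range logic described above can be sketched outside Power Automate to show the idea. This is a minimal Python sketch, not the flow's actual expressions. It assumes the OCR step has already produced one text string per page, that a new document starts on every page containing the delimiter (case-insensitive substring match), and that pages before the first delimiter form their own leading range. The function names `page_ranges` and `as_range_strings`, and the "start-end" string format, are hypothetical choices for illustration.

```python
from typing import List, Tuple


def page_ranges(pages: List[str], delimiter: str) -> List[Tuple[int, int]]:
    """Return 1-based (first_page, last_page) pairs, one per document.

    Assumption of this sketch: a page starts a new document if its text
    contains `delimiter`, compared case-insensitively.
    """
    needle = delimiter.lower()
    starts = [i + 1 for i, text in enumerate(pages) if needle in text.lower()]
    if not starts:
        # No delimiter found: treat the whole PDF as a single document.
        return [(1, len(pages))] if pages else []
    if starts[0] != 1:
        # Assumption: pages before the first delimiter become their own range.
        starts.insert(0, 1)
    ranges = []
    # Every range except the last ends one page before the next start
    # (the role of 'Page range for split' in the flow).
    for start, next_start in zip(starts, starts[1:]):
        ranges.append((start, next_start - 1))
    # The final range runs to the last page ('Last page range for split').
    ranges.append((starts[-1], len(pages)))
    return ranges


def as_range_strings(ranges: List[Tuple[int, int]]) -> List[str]:
    """Format ranges as hypothetical "start-end" strings."""
    return [f"{first}-{last}" for first, last in ranges]


if __name__ == "__main__":
    sample = ["INVOICE #1 ...", "line items ...", "Invoice #2 ...",
              "Invoice #3 ...", "totals ..."]
    print(as_range_strings(page_ranges(sample, "Invoice")))
    # ['1-2', '3-3', '4-5']
```

Because the match is a plain substring test, the delimiter should be text that appears only on the first page of each document; a continuation page that also mentions "Invoice" would start a new range. Check which page-range format the downstream connector or AI Builder action actually expects before passing these strings along.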

Don't hesitate to ask questions in the comments section below! 💬

lucascroxatto · 02-16-2023

Hi @JoeF-MSFT

Is the package (.zip) still working? I noticed that the action is now called "Recognize text in an image or a PDF document".
The link to the sample invoice PDF (Adatum multiple invoices.pdf) is not working. Can you restore it, please?

JoeF-MSFT · 02-18-2023

Hi @lucascroxatto - thanks for the heads up.

Yes, the package should still work.
Thanks! I've restored the link to the sample PDF.

JayJayRiv · 02-22-2023

Hello,

I am fairly new to Power Automate and I am having trouble uploading the flow.

JoeF-MSFT · 02-25-2023

Hi @JayJayRiv - thanks for the question. You will need to click on + Create new and provide your credentials. Hope this helps! 🙂

rishabhgupta · 04-29-2023

How can we rename each file based on the User ID in it and save it to another location instead of Dataverse? The flow above works fine, but I want to send the split files to OneDrive instead of Dataverse. I also want to name the files like this:

User ID abc.com

...........

User ID xuz.com...

So the flow would extract the User ID and save each PDF under the name abc.com or xuz.com.
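The core step in this question is pulling the value that follows "User ID" out of each document's OCR text and using it as the file name. This is a minimal Python sketch to illustrate the logic only; it assumes the ID is the first run of non-whitespace characters after the literal label "User ID". The function names `extract_user_id` and `file_name_for`, the `unknown` fallback, and the added `.pdf` extension are hypothetical choices, not part of the original flow.

```python
import re
from typing import Optional

# Assumed format: the literal label "User ID", whitespace, then the ID itself
# (e.g. "User ID abc.com"). The ID is captured as the next non-whitespace run.
USER_ID_PATTERN = re.compile(r"User ID\s+(\S+)")


def extract_user_id(text: str) -> Optional[str]:
    """Return the first User ID found in the OCR text, or None."""
    match = USER_ID_PATTERN.search(text)
    return match.group(1) if match else None


def file_name_for(text: str, fallback: str = "unknown") -> str:
    """Build a file name from the User ID, falling back if none is found."""
    user_id = extract_user_id(text)
    # Replace characters that are not allowed in OneDrive file names.
    safe = re.sub(r'[\\/:*?"<>|]', "_", user_id) if user_id else fallback
    return f"{safe}.pdf"


if __name__ == "__main__":
    print(file_name_for("Header\nUser ID abc.com\nBody text"))
    # abc.com.pdf
```

In the flow itself, the same idea would be applied to the text of each page range before the file is created in OneDrive; the regular expression above is only one way to locate the ID.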

JoeF-MSFT · 04-30-2023

Hi @rishabhgupta - this cookbook from @plarrue can help with your scenario: Renaming files in OneDrive after extracting a fiel... - Power Platform Community (microsoft.com)

minhvo · 06-12-2023

Thanks for sharing the workflow.

It works for me.

AlexEncodian · 07-25-2023

Encodian has a simple action that does all of this with no setup. It's called Split PDF by Text.

You can choose whether to use a set string as a split character or use a regular expression for more flexibility.

PDFs have to be searchable; otherwise, you can OCR a PDF Document first.

RookAils · 08-16-2023

Hi @JoeF-MSFT, how do I save the split PDFs to Google Drive?

JoeF-MSFT · 08-19-2023

Hi @RookAils, thanks for the question. You can use the Create file action in your flow, from the Google Drive connector.