LNeaga_Next
Frequent Visitor

AI Builder form processing multiple pages from one pdf file

Hello,

We are considering using AI Builder form processing to process multiple invoices and insert the data into an Excel file.

 

We have created a model, but when we run the flow on a single PDF file that contains multiple invoices (all in the same format), AI Builder reads and inserts the data for only the first invoice.

 

Is there a way to read the information from all pages and insert it into an Excel file in a single run?

Thanks!

CedrickB
Power Apps

Hi,

It depends on your documents.

If you have one invoice per page, you can split the PDF and perform the extraction on the split files.

Otherwise, you can use a Text Recognizer model to extract all the text, identify the page breaks based on patterns, and then split the PDF.

In the coming weeks, we are going to provide the ability to predict on a specific page, which would avoid having to split the PDF once you have identified the page breaks.

Feel free to send an email to aihelpen@microsoft.com so we can have a call and discuss the best option for your scenario.

wintechchen
Regular Visitor

Hi LNeaga,

I have a business case where a single PDF file contains multiple invoices across several pages. I use Encodian to split the PDF into multiple files, each containing only one invoice. I then use the List Files action to get those split files and an Apply to each action to pass each file's content to the AI model for form processing. I hope this gives you some ideas.

@CedrickB , @LNeaga_Next ,
I have a very similar scenario. We want to put multiple signed paper documents (same 'collection', exact same layout) on a scanner and let the computer separate out every document (the length of the documents varies: 1, 2, 3, 4... pages). Once that one long PDF is split up, we want to use the AI model to extract the data from every document.
I'm stuck on correctly separating each document.

Detailed help on how to solve this would be appreciated.

Hi Sebastien,

First of all, using AI to perform this intelligent splitting is on our radar, but I can't give an ETA so far.

In the meantime, the alternative is to use an "old school" pattern search approach.

If your invoices contain "Page 1, Page 2..." text or other noticeable page breaks, you can use Text Recognition to extract the text and then search for those patterns to detect where the first page of each invoice is located.
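As a rough illustration of the pattern search idea outside Power Automate, here is a minimal Python sketch. It assumes the text of each page has already been extracted (for example by Text Recognition) into a list with one string per page. The function name `find_first_pages` and the "Page 1" marker are hypothetical; adapt the pattern to whatever text marks the first page in your documents.

```python
import re

# Assumption: the first page of every invoice contains the text "Page 1".
# \b after the digit keeps "Page 10" or "Page 11" from matching.
FIRST_PAGE_MARKER = re.compile(r"\bPage\s+1\b", re.IGNORECASE)


def find_first_pages(page_texts):
    """Return the 1-based page numbers whose text contains the marker."""
    return [
        page_number
        for page_number, text in enumerate(page_texts, start=1)
        if FIRST_PAGE_MARKER.search(text)
    ]


if __name__ == "__main__":
    sample_pages = [
        "Invoice A  Page 1 of 2",
        "Invoice A  Page 2 of 2",
        "Invoice B  Page 1 of 1",
    ]
    print(find_first_pages(sample_pages))
```

In a flow, the Filter Array step plays the role of the regular expression search here.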

 

So here is the process in a nutshell:

1. Call AI Builder Text Recognition

2. "Apply to each" on results

3. Use the Filter Array action to match the text (choose "Edit in advanced mode" to build a custom search function)

4. Gather the page breaks in an Array variable that you have declared upfront

5. "Apply to each" on your array variable

6. Call Invoice Processing with "page range", using the page splits calculated above
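
Steps 4 to 6 boil down to turning the list of detected first pages into one page range per document. The Python sketch below shows that arithmetic only; it is not the flow itself. The function name `page_ranges` is hypothetical, and the "start-end" string format is an assumption about what the "page range" input accepts, so check it against the action you are calling.

```python
def page_ranges(first_pages, total_pages):
    """Turn 1-based first-page numbers into one page range per document.

    Each document runs from its first page up to the page before the
    next document's first page; the last document runs to the end of
    the PDF. Pages before the first detected first page are ignored.
    """
    starts = sorted(set(first_pages))
    ranges = []
    for index, start in enumerate(starts):
        if index + 1 < len(starts):
            end = starts[index + 1] - 1
        else:
            end = total_pages
        # Assumed format: "3" for a single page, "4-6" for several pages.
        ranges.append(str(start) if start == end else f"{start}-{end}")
    return ranges


if __name__ == "__main__":
    # Three documents in a 6-page PDF, starting on pages 1, 3 and 4.
    for page_range in page_ranges([1, 3, 4], total_pages=6):
        print(page_range)
```

Each resulting range would then be passed to one Invoice Processing call inside the second "Apply to each" loop.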

JoeF-MSFT
Power Apps

Hi!

 

For future reference, here you can download a sample flow that returns the page ranges that delimit the different documents within a PDF: Know where to split a PDF with multiple documents ... - Power Platform Community (microsoft.com)
