oldhamjr
New Member

AI Builder Training Error

I am encountering the error below when attempting to train a Form model using multipage PDF files as the source. These files have already been optimized to reduce file size. Each of the 5 PDFs used in the attempted training contains no more than 50 pages.

 

"Fields could not be loaded for this document. It looks like this PDF document has many pages to process. We recommend that you split the PDF with only the pages you need to extract data from and upload the reduced document instead."

 

Does manually extracting the pages with the desired data for the training model negatively impact the real-world use of the model, which won't have pages manually extracted for future documents?

3 REPLIES
JoeF-MSFT
Power Apps

Hi @oldhamjr.

 

Thank you for posting this question, and apologies that you are experiencing this.

 

Unfortunately, a 50-page PDF might currently be too large to train a model. Do you need the model to extract data from all of those pages? If not, you can generate 5 sample PDFs for training that contain only the pages with the data you want to extract.

 

Once the model is trained, you can use the model in a cloud flow in Power Automate and you will be able to specify which pages to process as described here. This will also help reduce the cost, as the cost of using AI Builder is per page. 
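
To illustrate the point about page ranges and per-page cost, here is a minimal Python sketch. It assumes the page range is entered as a single value or a single range such as "1" or "3-5"; please check that format against the AI Builder documentation. The helper name `covering_page_range` is hypothetical and not part of any Microsoft API.

```python
def covering_page_range(pages):
    """Return (range_string, page_count) for the smallest single
    range that covers every 1-based page number in `pages`.

    Assumption: the flow accepts one page ("7") or one range ("3-5").
    """
    if not pages:
        raise ValueError("at least one page number is required")
    first, last = min(pages), max(pages)
    if first < 1:
        raise ValueError("page numbers are 1-based")
    range_string = str(first) if first == last else f"{first}-{last}"
    # Every page inside the range is processed, so this is the page
    # count to compare against processing the whole document.
    return range_string, last - first + 1


range_string, pages_processed = covering_page_range([3, 5, 4])
print(range_string, pages_processed)  # 3-5 3
```

With a 50-page document, processing only "3-5" would mean 3 pages instead of 50, which is where the cost saving comes from.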

 

We're actively working so that, by the beginning of next year, PDFs with 50 or more pages won't be an issue when training a form processing model.

oldhamjr
New Member

Thank you for the quick response! Your comments regarding the training model make sense. Unfortunately, I won't know which pages contain the data, since these files are provided by many different third-party sources. Single-page training with single-page evaluation by Flow works well. Single-page training with a multi-page flow doesn't produce usable results in my table. Performing the upfront manual work to extract single pages for the Flow to scan seems like the only option with this tool. This doesn't seem to provide an easy, low-effort solution for my end users. Is there an option for the AI model to evaluate a .txt file, if the data from the PDF is converted to text?

 

JoeF-MSFT
Power Apps

Thank you @oldhamjr for the feedback. 

For documents with as many as 50 pages, the recommendation today is to use the page range option in the flow. We are working to enable processing of up to 500 pages at once by the beginning of next year.
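
For the case raised earlier in the thread, where the pages holding the data are not known in advance, one possible workaround is to locate candidate pages by keyword before setting the page range. The Python sketch below assumes the text of each page has already been extracted by some other tool (that step is not shown). The function name, keywords, and sample text are all hypothetical, and this is not an AI Builder feature.

```python
def pages_with_keywords(page_texts, keywords):
    """Return the 1-based numbers of pages whose text contains
    at least one of `keywords` (case-insensitive substring match)."""
    wanted = [keyword.lower() for keyword in keywords]
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if any(keyword in text.lower() for keyword in wanted)
    ]


# Hypothetical per-page text from a three-page document.
sample_pages = [
    "Cover letter",
    "Invoice Number: 1001\nTotal Due: 50.00",
    "Terms and conditions",
]
print(pages_with_keywords(sample_pages, ["invoice number", "total due"]))  # [2]
```

The resulting page numbers could then be supplied to the page range option in the flow. Keyword matching is only a rough filter, so it is worth spot-checking results on documents from each third-party source.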
