New Member

Extracting data from varying semi-structured PDF invoices

I would like to confirm whether AI Builder can extract data from non-standard PDF invoices. The semi-structured invoices can come in 200 different formats.

Solution Sage

I am using AI Builder to extract data from around 80 layouts, so 200 should not be a problem as long as they differ enough from each other.


In my case, some layouts were almost identical (they came from the same vendor system but had, for example, the buyer and seller data swapped).


I ended up splitting the data sets for those layouts into 2 separate models and then orchestrating them via a flow:

1. Extract with model 1 and check whether the extracted TAX number is allocated to model 1 (in an Excel table). If not:

2. Extract with model 2 and check whether the TAX number is allocated to model 2 (in the Excel table). If yes, put the extracted data into the Excel file/list. If the tax number is not allocated to any model (meaning a totally new format, so the data was not recognized correctly), cancel the flow.


The issue with this approach is that you consume your AI credits twice for the layouts that belong to model 2: first when you pass them through model 1, and then again when you pass them through model 2.
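
To make the routing logic concrete, here is a rough Python sketch of the two steps above. It is not Power Automate or AI Builder code: the extractor functions, the `TAX_NUMBER_TO_MODEL` table (standing in for the Excel allocation table), and the `"tax_number"` field name are all hypothetical placeholders I made up for illustration. The `calls` counter only shows how many model runs an invoice needed, which is where the double credit consumption comes from.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical stand-in for the Excel table: tax number -> model it is allocated to.
TAX_NUMBER_TO_MODEL: Dict[str, str] = {
    "TAX-001": "model_1",
    "TAX-002": "model_2",
}

# An extractor takes the raw document and returns the extracted fields.
Extractor = Callable[[bytes], dict]


@dataclass
class RoutingResult:
    model: Optional[str]    # model whose extraction was accepted, or None
    fields: Optional[dict]  # accepted fields, or None for an unknown format
    calls: int              # number of model runs (each run consumes credits)


def route_invoice(pdf: bytes,
                  extractors: Dict[str, Extractor],
                  allocation: Dict[str, str]) -> RoutingResult:
    """Try each model in order; accept the first one whose extracted
    tax number is allocated to that same model."""
    calls = 0
    for model_name, extract in extractors.items():
        fields = extract(pdf)
        calls += 1
        if allocation.get(fields.get("tax_number")) == model_name:
            return RoutingResult(model_name, fields, calls)
    # No model matched: treat as a new format and stop (the "cancel flow" case).
    return RoutingResult(None, None, calls)


# Fake extractors that mimic two near-identical layouts with swapped fields.
def fake_model_1(pdf: bytes) -> dict:
    return {"tax_number": pdf.decode().split("|")[0]}


def fake_model_2(pdf: bytes) -> dict:
    return {"tax_number": pdf.decode().split("|")[1]}


EXTRACTORS: Dict[str, Extractor] = {
    "model_1": fake_model_1,
    "model_2": fake_model_2,
}

first = route_invoice(b"TAX-001|BUYER-9", EXTRACTORS, TAX_NUMBER_TO_MODEL)
second = route_invoice(b"BUYER-9|TAX-002", EXTRACTORS, TAX_NUMBER_TO_MODEL)
unknown = route_invoice(b"FOO|BAR", EXTRACTORS, TAX_NUMBER_TO_MODEL)
print(first.model, first.calls)
print(second.model, second.calls)
print(unknown.model, unknown.calls)
```

In this sketch, an invoice that belongs to model 2 always costs two model runs, so if most of your volume falls into one layout group it is cheaper to try that group's model first.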

Thanks a lot. Will check and come back.


Frequent Visitor

You can explore the prebuilt invoice processing model in AI Builder. It returns values for almost all standard parameters. Training a custom model is practically not feasible for 200-odd templates.
