dc23
Helper I

Data extraction from Scanned PDF document

I have scanned invoices that have been converted to PDF. I want to extract data from the invoices and store it in an Excel sheet.

I have tried AI Builder, but not all of the data that needs to be extracted is being analysed.

 

Please provide some suggestions.

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions

Hi @dc23 

In my experience, you need to provide enough sample documents that are very similar in layout but contain different data, so that the fields are recognized for selection. Ref: https://docs.microsoft.com/en-us/ai-builder/form-processing-sample-data

I believe you need the AI model to detect the fields; you can't just select them.

HTH

Jay

 


8 REPLIES 8
abm
Most Valuable Professional

Hi @dc23 

 

Have you looked into this?

 

https://powerusers.microsoft.com/t5/Power-Automate-Community-Blog/Extract-data-from-documents-with-M...

 

@Jay-Encodian could help you with this.

 

Thanks




Yes, I have gone through the solution provided. It worked when I had to pick up a single element from the invoice. However, when the invoice has multiple rows for Quantity, Item, etc., I'm not able to get the desired result.

 

Thanks!

abm
Most Valuable Professional

Hi @dc23 

 

Thanks for your quick reply. That might be a product limitation. @Jay-Encodian could clarify this further.

 

Thanks



Jay-Encodian
Community Champion

Thanks @abm 

Hey @dc23 

The 'Extract Text from Regions' action is designed to extract text from pre-defined regions. If those regions are dynamic, it is more difficult. You have a few options to consider:

  1. Persist with the AI builder approach
  2. Use the Encodian action but add regions where data might exist and then handle null values in your Flow
  3. Use the Encodian 'Get PDF Text Layer' action combined with our 'Search Text - Regex' (Preview) action

I'd recommend option #1. It is the right tool for this scenario, especially where you have multiple differing invoice layouts. It's absolutely possible with the Encodian action, but that is a more basic tool (by design), and you will have to do more work up front to get it working.
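
To illustrate the idea behind option #3 (pull the text layer, then search it with a regex), here is a minimal Python sketch. It does not use the Encodian actions themselves. The sample text, the `extract_invoice` helper, and both patterns are assumptions for illustration, and a real invoice layout would need its own expressions. It shows how one pattern applied across the whole text can return every Quantity/Item row rather than a single value, and how a missing field can come back as a null.

```python
import re

# Hypothetical text layer, as an OCR step might return it for one invoice.
SAMPLE_TEXT = """\
Invoice No: INV-1042
Item        Quantity   Unit Price
Widget A    3          12.50
Widget B    10         4.00
Total                  77.50
"""

# Assumed row layout: item name, two or more spaces, integer quantity,
# then a price with two decimals. Adjust to the real document.
LINE_ITEM = re.compile(
    r"^(?P<item>\S.*?)[ \t]{2,}(?P<qty>\d+)[ \t]+(?P<price>\d+\.\d{2})[ \t]*$",
    re.MULTILINE,
)
INVOICE_NO = re.compile(r"Invoice No:\s*(?P<number>\S+)")


def extract_invoice(text):
    """Return the invoice number (or None) and every matching line item."""
    header = INVOICE_NO.search(text)
    rows = [
        {
            "item": m["item"],
            "quantity": int(m["qty"]),
            "unit_price": float(m["price"]),
        }
        for m in LINE_ITEM.finditer(text)
    ]
    return {"invoice_no": header["number"] if header else None, "rows": rows}


result = extract_invoice(SAMPLE_TEXT)
```

Each entry in `result["rows"]` maps naturally to one row in a spreadsheet, and a `None` invoice number is the kind of null value the flow would need to handle.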

If you want to try the Encodian approach let me know and I'll guide you through.

HTH

Jay

Hi Jay,

 

Thanks for the response. Before moving on to Encodian, I did try the AI Builder approach. The issue I encountered was that the data was fetched for some invoices, but for others with the same format the data was not analysed.

Also, I'm not sure whether AI Builder lets us select the data of our choice, rather than only suggesting fields we can select from.

Do you have any suggestions?

 

Thanks!

 


 

CFernandes
Most Valuable Professional

You can use the Muhimbi PDF Converter Power Automate action to extract data from a scanned PDF document.

Muhimbi PDF Converter supports a number of OCR (Optical Character Recognition) facilities, including the ability to make image-based PDFs (scans, faxes) fully searchable and indexable. It also supports extracting this text, so that information such as invoice numbers, purchase order numbers, or other identifying details can be pulled out.

 

You can find details.

 

 

I hope this helps.

takolota
Multi Super User

If anyone wants to extract data from a PDF or image without training a model for select documents, try this new GPT data extraction method: https://powerusers.microsoft.com/t5/Power-Automate-Cookbook/Extract-Data-From-PDFs-and-Images-With-G...

 

It doesn’t require specifying certain document areas, wordings, styles, etc. It just OCRs the file, converts it to a replica text (txt), and passes it to a GPT prompt where you can ask GPT to do whatever you want with the document data.
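
As a rough sketch of the "pass the text to a prompt" step, the Python below builds a prompt from OCR text and parses a JSON reply. It is not the flow from the linked post and it does not call any model. The helper names `build_extraction_prompt` and `parse_extraction_reply`, the field names, and the JSON reply format are all assumptions for illustration.

```python
import json


def build_extraction_prompt(ocr_text, fields):
    """Build a prompt asking for the given fields as one JSON object."""
    field_list = ", ".join(fields)
    return (
        "Extract the following fields from the invoice text below and "
        f"reply with a single JSON object using exactly these keys: {field_list}. "
        "Use null for any field that is not present.\n\n"
        f"Invoice text:\n{ocr_text}"
    )


def parse_extraction_reply(reply, fields):
    """Parse the model's JSON reply, filling missing fields with None."""
    data = json.loads(reply)
    return {name: data.get(name) for name in fields}


# Hypothetical field names and OCR output.
FIELDS = ["invoice_no", "total"]
prompt = build_extraction_prompt("Invoice No: INV-1042", FIELDS)

# Hypothetical reply text, standing in for whatever the model returns.
parsed = parse_extraction_reply('{"invoice_no": "INV-1042"}', FIELDS)
```

Asking for a fixed set of keys keeps the reply easy to map to spreadsheet columns, and defaulting absent keys to `None` avoids failures when a document lacks a field.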
