rswayne
New Member

Entire process from OCR to Process Extracted Data to Verify Processed Data

Hi. I am an experienced developer (primarily C#) but fairly new to Azure and to OCR. I work for a large insurance company that uses a legacy OCR extraction and processing tool. We are examining current technology alternatives.

 

I know this is a long post but I wanted to provide a total picture of what we are trying to accomplish.

 

- We process 100s of 1000s of documents per month, millions of pages.

- Most of our documents contain multiple pages.

- Many of our documents contain multiple document types (Policy, Cancellations, Bills, etc.).

- We service 1000s of clients, and 10s of 1000s of document layouts, and many of the document layouts change often.

- Building and maintaining models (templates) is not an option for us (we gave up on templates a long time ago)

 

We are working on a proof of concept (POC). Currently, we use the Form Recognizer (FR) API to extract data using the General Document model and save the data to SQL Server tables (keyvalue, pagelines, pagewords, paragraphs, tables) in our data center. Then we run stored procedures against the extracted data (the post-extraction process) to select the specific values we need. Sometimes the Key/Value pairs give us what we need, but often we must process the extracted data further. For example, we parse street address, city, state, and zip from a string that contains this information, or find effective and expiration dates that were not picked up in the Key/Value pair data.
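To make the post-extraction step concrete, here is a minimal sketch of the two examples above (splitting an address string and finding dates in free text). It is written in Python for brevity and is not the stored-procedure logic from the POC. The patterns `ADDRESS_RE` and `DATE_RE` and the function names are hypothetical, and they assume US-style "street, city, ST 12345" addresses and MM/DD/YYYY dates; real documents will need more patterns.

```python
import re
from datetime import datetime

# Hypothetical pattern: "street, city, ST 12345" or "street, city ST 12345-6789".
ADDRESS_RE = re.compile(
    r"^\s*(?P<street>.+?),\s*(?P<city>[^,]+?),?\s+"
    r"(?P<state>[A-Z]{2})\s+(?P<zip>\d{5}(?:-\d{4})?)\s*$"
)

# Hypothetical pattern: dates written as MM/DD/YYYY.
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def parse_address(text):
    """Split an address string into street, city, state, zip; None if no match."""
    match = ADDRESS_RE.match(text)
    return match.groupdict() if match else None


def find_dates(text):
    """Return every valid MM/DD/YYYY date in the text, in order of appearance."""
    dates = []
    for month, day, year in DATE_RE.findall(text):
        try:
            dates.append(datetime(int(year), int(month), int(day)).date())
        except ValueError:
            # Skip OCR noise such as 13/45/2024.
            continue
    return dates
```

With a policy period such as "Effective 01/15/2024 through 01/15/2025", one simple convention is to treat the earliest date found as the effective date and the latest as the expiration date, then leave the final call to the Verifier.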

 

Most of the examples I have found deal with the Prebuilt or Custom FR models, where the structure of the document is known and specific documents can be used to train the extraction model. The General Document articles and videos I have found only go as far as extracting the data; they do not cover post-extraction processing or a decent verification process. There is a Verify process using Cognitive Services, but it is not really usable for documents with dense content or for multi-page documents. Also, all the demos use pristine, single-page documents that produce way-too-clean results.

 

The process we need to replace:

 

- Extract text from documents (mostly pdf and tif, but possibly other image types such as jpg and png)

   Data is extracted and saved to on-prem (data centers) SQL Server tables.

   Data contains x/y coordinates that point to the location of the extracted data.

 

- Process the extracted data by searching for "KeyWords" that are mapped (associated) to our FieldNames in extracted text

   Example: Policy Number (FieldName) has many KeyWords mapped to it (Policy No, Policy No:, Policy Nbr, Policy Code etc.).

 

- Verify the "processed" data - a human process

   Our (legacy) OCR application has a module that shows the document and the FieldNames that have been "processed" for the document. 

   When a FieldName is selected (clicked), its source is brought into focus outlined on the document.

   The Verifier (human) may approve the processed data (FieldName) or select a different value from the document by highlighting the desired replacement text (which is now copied) and selecting Replace.
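The KeyWord-to-FieldName step described above could be sketched roughly as follows. This is Python for illustration only; `FIELD_KEYWORDS`, `build_pattern`, and `extract_fields` are hypothetical names, and the input is assumed to be the extracted page lines, each carrying its page number, text, and an (x, y, width, height) box. Keeping the box with each hit is what later lets a verification screen outline the source of a value.

```python
import re

# Hypothetical mapping of FieldNames to KeyWords, using the example from the post.
FIELD_KEYWORDS = {
    "PolicyNumber": ["Policy No", "Policy Nbr", "Policy Code"],
}


def build_pattern(keywords):
    """Build one regex that matches any keyword and captures the token after it."""
    # Longest keywords first so a short alias never shadows a longer one.
    ordered = sorted(keywords, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in ordered)
    # Keyword, optional trailing punctuation, then the next non-space token.
    return re.compile(rf"(?:{alternatives})\s*[:#.]?\s*(?P<value>\S+)", re.IGNORECASE)


PATTERNS = {field: build_pattern(words) for field, words in FIELD_KEYWORDS.items()}


def extract_fields(lines):
    """lines: iterable of dicts with 'page', 'text', and 'box' (x, y, w, h)."""
    found = {}
    for line in lines:
        for field, pattern in PATTERNS.items():
            if field in found:
                continue  # keep the first hit; a real system might rank candidates
            match = pattern.search(line["text"])
            if match:
                found[field] = {
                    "value": match.group("value"),
                    "page": line["page"],
                    "box": line["box"],
                }
    return found
```

Taking the first hit is a simplification. With multi-page, multi-document-type input, it is probably safer to collect every candidate and let the Verifier choose.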

 

We would like to keep this POC to Azure-provided features if possible.

 

If anyone is still reading (😊), I am hoping I can get pointed in the right direction.

 

THANK you.

 

4 REPLIES
Nived_Nambiar
Super User

Hi @rswayne 

Does this help you?

 

https://powerautomate.microsoft.com/en-us/blog/automate-document-processing-with-power-automate/

 

This is the document processing framework available in Power Automate.

 

Thanks & Regards,

Nived N 🚀


@Nived_Nambiar @rswayne 

 

I think that is a good example of the general workflow, but it may still require training models for different document formats.

https://learn.microsoft.com/en-us/ai-builder/form-processing-model-overview

 

If you want something pretty general to any Word, PDF, or image text scenario then you can check over this template to see if it will work for your scenario… https://powerusers.microsoft.com/t5/Power-Automate-Cookbook/Extract-Data-From-PDFs-and-Images-With-G...

This set-up extracts text from any PDF or image, replicates the document's text positioning in a plain-text format, then passes that text to an Azure GPT prompt where you can specify and customize the data extraction to whatever you need. If Microsoft keeps the charge at around 1-3x the credit equivalent of the $0.003 per 4,000 GPT prompt characters usually charged for GPT services, then it may also be about half as expensive as the other AI Builder data extraction methods. And if they start to charge more than that, then with an Azure account there is still a way to set up a GPT service in Azure to do the same thing at the $0.003 per 4,000 characters price; it will just take extra work.
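For readers wondering what "replicating the text positioning in a text format" can look like, here is a rough Python sketch. It is not the implementation used by the linked template. The function `render_layout` and its parameters `char_width` and `row_tolerance` are hypothetical, and the input is assumed to be OCR words with the x/y position of their top-left corner in page units such as inches.

```python
def render_layout(words, char_width=0.1, row_tolerance=0.05):
    """Render (text, x, y) words as plain text that roughly keeps their layout.

    char_width: assumed page units covered by one character column.
    row_tolerance: words whose y differs by at most this share a text row.
    """
    rows = []  # each entry: [reference_y, [(x, text), ...]]
    for text, x, y in sorted(words, key=lambda w: (w[2], w[1])):
        if rows and abs(y - rows[-1][0]) <= row_tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append([y, [(x, text)]])

    rendered = []
    for _, items in rows:
        line = ""
        for x, text in sorted(items):
            column = int(round(x / char_width))
            # Pad out to the target column, keeping at least one space between words.
            padding = max(column - len(line), 1 if line else 0)
            line += " " * padding + text
        rendered.append(line)
    return "\n".join(rendered)
```

The resulting text block can then be placed in a prompt so the model sees labels and values in roughly the same rows and columns as on the page.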

Hi Nived. Thank you for your reply. I already have the link you sent me and have reviewed it (I may need to spend more time with it). I found a video of a person actually showing some of this using Power Automate. It looks like we must use expected document layouts to train models for extraction. I understand models to be "templates" for known document layouts. Like I said in my original post, we have given up on templates because there are so many variations of so many documents that keeping up with all of them, and their changes, was too much. I may be misunderstanding what a model is, but from what I have been able to research, models are useful when you have a limited number of variations of a known document. We are dealing with a staggering number of variations of many different document layouts for many different document types from many different clients. I am hoping that further research will show me that I am not looking at the problem the right way!

 

Also, the example of a validation process (what we call verification) is a step in the right direction but does not appear to meet our requirements for this feature:

  • Navigate to the area on the displayed document from where the extracted value was retrieved.
  • Highlight the text on the document that was extracted. 
  • Allow the user to change the selected value by selecting a different value from the document
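The three requirements above mostly come down to coordinate handling, which a custom viewer could do from the stored x/y data. The sketch below is Python for illustration and is not an existing Azure feature. It assumes the OCR result gives each value a polygon as a flat list of x/y pairs in page units along with the page width and height, and that the page is displayed as an image of known pixel size. The function names are hypothetical.

```python
def polygon_to_pixel_rect(polygon, page_width, page_height, image_width_px, image_height_px):
    """Convert a flat [x1, y1, x2, y2, ...] polygon in page units to a pixel rectangle.

    Returns (left, top, width, height) suitable for drawing a highlight box.
    """
    xs = polygon[0::2]
    ys = polygon[1::2]
    scale_x = image_width_px / page_width
    scale_y = image_height_px / page_height
    left, top = min(xs) * scale_x, min(ys) * scale_y
    right, bottom = max(xs) * scale_x, max(ys) * scale_y
    return (round(left), round(top), round(right - left), round(bottom - top))


def words_in_rect(words, rect):
    """Return the text of the words whose box center lies inside the selection.

    words: list of (text, (x, y, w, h)) in pixels, in reading order.
    rect: the user's selection as (x, y, w, h) in pixels.
    """
    rx, ry, rw, rh = rect
    selected = []
    for text, (x, y, w, h) in words:
        cx, cy = x + w / 2, y + h / 2
        if rx <= cx <= rx + rw and ry <= cy <= ry + rh:
            selected.append(text)
    return " ".join(selected)
```

`polygon_to_pixel_rect` covers navigating to and highlighting the extracted value; `words_in_rect` covers the replacement case, turning a rectangle dragged by the Verifier back into text.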

A video I have seen that uses this feature also shows only a simple example, and it confirms the gaps I just described.

 

We must also be concerned with protecting very personal data. We really do not want to store our source documents outside of Azure.

 

As I get back to the business with research results, some requirements could change.

 

Thank you, again, for your reply.  It is MUCH appreciated. I am back in the office on Tue and will be continuing the research.

Hi @takolota.  Thank you for your reply.  I need to look at the "train your models" requirement.  Like I said earlier, we have so many variations of documents (by document type, by client ...) that maintaining models (I am looking at models being templates for known documents) will be very difficult.  Your suggestion of using Azure GPT is interesting and I will have to look into this more.

 

I have created a POC using C# to call the Form Recognizer API, save the extracted data to SQL Server, and then execute stored procedures to process the extracted data (described in my original post, in the paragraph starting with "We are working on a proof of concept (POC).").

 

Thanks, again, for your reply.  I am grateful for all the help I can get with this.  I'll update this post as research progresses!
