JWall
New Member

OCR input requirements

Hello,

 

I am attempting to create an AI model to recognise custom document information, and I am running into multiple problems with OCR correctly identifying text. I realise this is a known problem that is being addressed through regular updates to the OCR model, as per the Power Automate community forum post 'Problem with Model recognising Zero and letter O'.

 

I noticed that one of the input requirements for OCR is text of ~8pt font size in order to be read. When analysing the document type I am using to train the AI model, at standard size, the font is ~8pt.

2022-07-29_07-54-05.png

2022-07-29_07-35-59.png

Now, I can see where this could be a problem with a document that uses raster imaging, in which the number of pixels in an image is fixed, so the document appears to have lower resolution as you zoom in. The document type that I am attempting to analyse, however, appears to use vector imaging, in which shapes are defined by a set of geometric equations and stay sharp however far you zoom in.

2022-07-29_08-04-312.png
2022-07-29_08-06-12.png
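One way to reason about the vector vs raster question: OCR typically operates on pixels, so a vector page usually has to be rendered to an image at some resolution before it is read. A point is 1/72 of an inch, so the pixel height of 8pt text depends entirely on the rendering resolution. The sketch below is only that arithmetic; the DPI values are assumptions for illustration, and how AI Builder renders PDFs internally is not stated anywhere in this thread.

```python
# A point is a physical unit: 1 pt = 1/72 inch.
POINTS_PER_INCH = 72


def text_height_px(font_size_pt: float, dpi: float) -> float:
    """Approximate pixel height of the font's em box when rendered at `dpi`."""
    return font_size_pt * dpi / POINTS_PER_INCH


# Hypothetical rendering resolutions, chosen only for comparison.
for dpi in (72, 96, 150, 300):
    print(f"8 pt text at {dpi} DPI is about {text_height_px(8, dpi):.1f} px tall")
```

If the page is rendered at a low resolution, 8pt text ends up only a few pixels tall regardless of whether the source was vector, which would be consistent with larger letters in the same document being read more reliably.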

 

My question: does the imaging technique used for a document have an effect on the ability of OCR to correctly identify text? This issue is not as prevalent with larger letters within the same document (same imaging technique as the smaller letters).

 

2022-07-29_08-24-50.png

Appreciate the help!

JoeF-MSFT
Power Apps

Hi @JWall - thanks for the question and the detailed analysis.

 

A few things you can try to see if they have any impact:

  • If you navigate to the homepage of AI Builder (https://aka.ms/tryaibuilder), select Text recognition / Extract all the text in photos and PDF documents (OCR), and then select Upload new, do you get the data extracted as you would expect? The text recognition model has recently been updated with OCR improvements.

    JoeFMSFT_0-1659194581339.png

     

    JoeFMSFT_1-1659194592015.png

     

  • If you print the original PDF as a new PDF, and use the newly generated PDF on the text recognition model as the step before, do you see better results?

  • If you try to transform one of the pages of the PDF document into an image (for example by taking a screenshot) and try here again with text recognition, do you notice any improvements?

 

JWall
New Member

Hi @JoeF-MSFT

Thanks for the reply!

 

TL;DR - Trying the different methods appeared to have no impact on the results. I am not sure if there are any other methods or variables to test. One feature I would suggest is to allow manual entries in AI Builder: you would still highlight the field you want the model to read, but if the model is unable to read the text correctly, you would have the option to manually edit the value read for that field. Feeding those corrections back into the MS OCR model, as well as into end users' AI models, would greatly help the robustness and flexibility of AI Builder.

 

 

Unfortunately, I am unable to upload as detailed a report as last time due to sensitive information; however, I did run through the scenarios you suggested. Using the default 'Extract all the text in photos and PDF documents (OCR)' model, I uploaded my original document, a version of the document that was printed as a new PDF (Print -> Save As), and a version that was taken as a screenshot and saved as a JPEG. After seeing the results, I also tried a version that was taken as a screenshot and saved as a PDF.

From a character count perspective, the results from what the AI reads are as follows:

LEN(.pdforiginal) = 799
LEN(.jpgss) = 920
LEN(.pdfprint) = 799
LEN(.pdfss) = 500

The original PDF and the Print -> Save As PDF yielded the same results. Interestingly, the screenshot -> JPEG had the highest character count, while the screenshot -> PDF had the lowest.
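To put the four counts on a common scale, here is a small sketch that expresses each one as a percentage change from the original PDF. The numbers are the ones reported above; using the original PDF as the baseline is my own choice, and a higher or lower count says nothing by itself about how many characters were read correctly.

```python
# Character counts reported above for each version of the document.
char_counts = {
    "original PDF": 799,
    "screenshot -> JPEG": 920,
    "print -> PDF": 799,
    "screenshot -> PDF": 500,
}

baseline = char_counts["original PDF"]


def relative_change(count: int, reference: int) -> float:
    """Percentage change of `count` relative to `reference`."""
    return (count - reference) / reference * 100


for name, count in char_counts.items():
    print(f"{name:20s} {count:4d}  {relative_change(count, baseline):+6.1f}%")
```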

 

Now, when comparing to the actual data, I am unable to get an exact character count for the original PDF without meticulously counting it myself. What I can tell is that none of the AI read results extracted the data as I would expect. For example, the sample document has a total of 18 'A's in a table (similar to that of my previous post). None of the AI read results showed any consistency in 1. detecting a 'word', or 2. correctly identifying the 'word'. Surprisingly, the JPEG version performed best at this specific task, correctly identifying ~8 'A's, but that is still not an adequate result. Overall, the original PDF appeared to correctly identify the largest number of characters; the higher character count of the .jpg version seems to be partly explained by false detections, a prime example being the .jpg version identifying a column line as an 'l'.
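A possible way to measure this without counting the whole document by hand is to transcribe one small region manually and compare it with the OCR output for that region. The sketch below uses Python's standard `difflib.SequenceMatcher` for a rough similarity score and counts standalone letters; the two sample rows are hypothetical and are not taken from the real document.

```python
from difflib import SequenceMatcher


def similarity(expected: str, ocr_text: str) -> float:
    """Ratio in [0, 1]; 1.0 means the OCR text equals the hand-typed transcription."""
    return SequenceMatcher(None, expected, ocr_text, autojunk=False).ratio()


def count_standalone(text: str, letter: str) -> int:
    """Count whitespace-separated tokens that are exactly `letter`."""
    return sum(1 for token in text.split() if token == letter)


# Hypothetical data: one table row typed by hand, and a possible OCR reading of it.
expected_row = "A . A B . A"
ocr_row = "A . l B A"

print("expected 'A' count:", count_standalone(expected_row, "A"))
print("detected 'A' count:", count_standalone(ocr_row, "A"))
print("similarity:", round(similarity(expected_row, ocr_row), 2))
```

Comparing the expected and detected counts per letter separates missed characters from spurious ones, which a total character count cannot do.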

 

I am not sure if there are any other methods I could try to troubleshoot or test for better results, other than patiently waiting for improvements to the character recognition AI model. A suggestion I would make, though, is to allow manual entries in AI Builder: you would still highlight the field you want the model to read, but if the model is unable to read the text correctly, you would have the option to manually edit the value read for that field. Feeding those corrections back into the MS OCR model, as well as into end users' AI models, would greatly help the robustness and flexibility of AI Builder.

 

Appreciate the help, and let me know if you have any more thoughts. Thanks!

JoeF-MSFT
Power Apps

Hi @JWall - I really appreciate the detailed investigation! And thanks for the suggestion of allowing users to correct the detected words while tagging the documents. This is indeed something we don't have today.

 

I'm curious about those 'A's that are not detected. I understand that the documents contain sensitive information. Would it be possible to share just a screenshot of a word where an 'A' is not detected? Or maybe a partial screenshot of that word?

JWall
New Member

Hi @JoeF-MSFT - Sure.

 

For this specific example I have an array of letters in a table. The letters aren't always 'A', nor are they always aligned in a linear layout. The first screenshot shows an example of a letter not being detected. All 'B's are detected by the OCR software except for the 'B' highlighted in red. The other 'B's that are either missing from or misplaced in the table on the right can easily be fixed by moving the column line, and are correctly identified as text by OCR. You may also notice that the array has '.' in some of the fields. These are detected with varying degrees of success: sometimes they are picked up and sometimes they are not. I am not so concerned with this, as '.' can also be treated as a blank in my use case, but you may find it interesting for your purposes.

3.png

 

4.png

This is a slightly different example, where the OCR is detecting a column line as text ('|'). Sometimes OCR will instead detect column lines as 'l'.
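Since the cells in my tables only ever hold a single letter or a '.', one workaround I could apply after OCR is to drop any token that cannot be a valid cell. This is a minimal sketch, assuming every valid cell is exactly one uppercase letter or one '.'; `clean_cells` and the sample input are hypothetical, and it would not catch a column line misread as an uppercase 'I'.

```python
import re

# Assumption: a valid cell is exactly one uppercase letter or a single '.'.
VALID_CELL = re.compile(r"[A-Z.]")


def clean_cells(tokens):
    """Split tokens into plausible cells and likely border artifacts ('|', 'l')."""
    kept, dropped = [], []
    for token in tokens:
        if VALID_CELL.fullmatch(token):
            kept.append(token)
        else:
            dropped.append(token)
    return kept, dropped


# Hypothetical OCR output for one table row, split on whitespace.
kept, dropped = clean_cells("| A . l B | . A".split())
print("kept:", kept)
print("dropped:", dropped)
```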

 

 

 

 

 

Again, I want to highlight that I presumed this problem was possibly due to OCR not being accurate with font sizes <= ~8pt, but I was curious whether the imaging type (vector vs raster) would also have an effect. The PDFs used here were vector.

On a separate note, I am going to attempt to work around the need to use OCR. The documents I was trying to use with OCR are all standard tables internal to our company. The thing is that they are 1. versions of pivot tables laid out to be easier for a human to read, and 2. in PDF format and thus not as easy for a computer to read (the problem I was trying to solve with OCR). I am working with some people in my company to get more information, but I would think there is raw data driving these PDF documents (strings in an array, a tabular format or something like that), which a computer may have an easier time reading. If this data exists and I can get access to it, then I should be able to skip the process of running OCR and creating an AI model to recognise custom document information.
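As a rough idea of what that could look like: if the underlying data can be exported in a tabular text format, reading it needs only a few lines and no OCR at all. The sketch below assumes a CSV export with `row`, `column` and `value` fields; that layout is purely hypothetical, since I do not yet know what the source data looks like. It treats '.' as a blank, as in my use case.

```python
import csv
import io

# Hypothetical export; the real format of the source data is not known yet.
raw_export = """row,column,value
1,1,A
1,2,.
2,1,B
2,2,A
"""


def load_cells(csv_text: str) -> dict:
    """Return {(row, column): value}, treating '.' as a blank cell."""
    cells = {}
    for record in csv.DictReader(io.StringIO(csv_text)):
        value = record["value"].strip()
        key = (int(record["row"]), int(record["column"]))
        cells[key] = "" if value == "." else value
    return cells


cells = load_cells(raw_export)
print(cells[(2, 1)])
```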

 

Anyways, I appreciate the help, and hopefully I was able to help with your inquiry.
