cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Runner55552
Post Partisan
Post Partisan

Character recognition poor in 1930s vintage forms

I attempted to train a model using some type-written, but old, 1930s oil well permits. They are PDF scanned forms from a government agency, with typed answers generally in similar locations, so field location is ok. However, many of the answers were typed almost directly on the dotted lines of the forms, instead of slightly above. That seemed to cause poor character recognition, and a poor overall document processing model training score of 46%.

Any suggestions for improving this, such as removing all horizontal dotted lines within the PDF document? Or, are there simply some applications which need a person to extract the information.

I tried using "Enhance" in Adobe Pro, but that did not help.

Any suggestions are appreciated.

Runner55552_0-1657569238873.png

 

1 ACCEPTED SOLUTION

Accepted Solutions
Runner55552
Post Partisan
Post Partisan

Thanks for testing and suggesting this. I will give this a try.  We do have some tables in our files also, and I see the unstructured option does not yet have the ability to model tables. However, I think I can work around that for now, since most tables only have up to 4 rows, so I can "flatten" the records and consider them as individual fields for now.

View solution in original post

5 REPLIES 5
JoeF-MSFT
Power Apps
Power Apps

Hi @Runner55552 - this seems to be a fascinating project, bringing AI to documents from the 1930s! 🙂

 

I'd recommend training a new model and selecting Unstructured and free-form documents as the type of document. It might perform better with this type of document and uses a more up to date version of our OCR engine, which will also be available for structured and semi-structured documents by September.

 

JoeFMSFT_0-1657572363299.png

 

With the sample screenshot you provided, I can already see some improvements when selecting Unstructured and free-form documents.

 

When selecting Unstructured and free-form documents:

JoeFMSFT_3-1657572565475.png

 

When selecting structured and semi-structured documents:

JoeFMSFT_2-1657572549588.png

 

 

Runner55552
Post Partisan
Post Partisan

Thanks for testing and suggesting this. I will give this a try.  We do have some tables in our files also, and I see the unstructured option does not yet have the ability to model tables. However, I think I can work around that for now, since most tables only have up to 4 rows, so I can "flatten" the records and consider them as individual fields for now.

One other item I noticed while selecting/tagging text, in both the structured and unstructured model set-up. Even when highlighting only the text desired with a mouse, the text that appears as the value is the work/phrase that has already been recognized as text by the model.  So I cannot remove extraneous text or tic marks, etc. at the end of the word by sizing the selection box smaller. Will this be a feature for future enhancement?

Update:  I re-trained the model using the unstructured document option.  The accuracy of FINDING the location of the fields improved greatly. However, there were still many issues with actual character recognition, Biggest problems were:

1. type-written data was on or near a dotted line, producing phantom dots that were interpreted as periods or various characters.

2. The ' and " tick marks used in the form for feet and inches were often mis-interpreted as ! or 1 or other characters.

3. Many extra spaces inserted in various locations, although I could probably use Power Automate to remove "white space." 

4. Number values often wrong, again in part to the dotted lines, or other marks.

 

If there were the ability to manually train by telling the model during training that Oil equals Oil instead of 011, that would be very useful.

 

Thanks again for the helpful suggestions.

JoeF-MSFT
Power Apps
Power Apps

Hi @Runner55552, thanks a lot for taking the time to provide back this update! Great to hear that when using unstructured documents as an option the finding of the location improved greatly.

 

Today there is no option to adjust when selecting text to the character level, nor an option to provide feedback in the tagging process to change wrongly detected characters. As you suggested, one option is to do post-processing in a cloud flow in Power Automate with expressions to remove white spaces, dots, tick marks. 

Helpful resources

Announcements

Updates to Transitions in the Power Platform Communities

We're embarking on a journey to enhance your experience by transitioning to a new community platform. Our team has been diligently working to create a fresh community site, leveraging the very Dynamics 365 and Power Platform tools our community advocates for.  We started this journey with transitioning Copilot Studio forums and blogs in June. The move marks the beginning of a new chapter, and we're eager for you to be a part of it. The rest of the Power Platform product sites will be moving over this summer.   Stay tuned for more updates as we get closer to the launch. We can't wait to welcome you to our new community space, designed with you in mind. Let's connect, learn, and grow together.   Here's to new beginnings and endless possibilities!   If you have any questions, observations or concerns throughout this process please go to https://aka.ms/PPCommSupport.   To stay up to date on the latest details of this migration and other important Community updates subscribe to our News and Announcements forums: Copilot Studio, Power Apps, Power Automate, Power Pages

Your Moment to Shine: 2024 PPCC’s Got Power Awards Show

For the third year, we invite you, our talented community members, to participate in the grand 2024 Power Platform Community Conference's Got Power Awards. This event is your opportunity to showcase solutions that make a significant business impact, highlight extensive use of Power Platform products, demonstrate good governance, or tell an inspirational story. Share your success stories, inspire your peers, and show off some hidden talents.  This is your time to shine and bring your creations into the spotlight!  Make your mark, inspire others and leave a lasting impression. Sign up today for a chance to showcase your solution and win the coveted 2024 PPCC’s Got Power Award. This year we have three categories for you to participate in: Technical Solution Demo, Storytelling, and Hidden Talent.      The Technical solution demo category showcases your applications, automated workflows, copilot agentic experiences, web pages, AI capabilities, dashboards, and/or more. We want to see your most impactful Power Platform solutions!  The Storytelling category is where you can share your inspiring story, and the Hidden Talent category is where your talents (such as singing, dancing, jump roping, etc.) can shine! Submission Details:  Fill out the submission form https://aka.ms/PPCCGotPowerSignup by the end of July with details and a 2–5-minute video showcasing your Solution impact. (Please let us know you're coming to PPCC, too!)After review by a panel of Microsoft judges, the top storytellers will be invited to present a virtual demo presentation to the judges during early August. You’ll be notified soon after if you have been selected as a finalist to share your story live at PPCC’s Got Power!  The live show will feature the solution demos and storytelling talents of the top contestants, winner announcements, and the opportunity to network with your community.  It's not just a showcase for technical talent and storytelling showmanship, show it's a golden opportunity to make connections and celebrate our Community together! Let's make this a memorable event! See you there!   Mark your calendars! Date and Time: Thursday, Sept 19th Location: PPCC24 at the MGM Grand, Las Vegas, NV 

Update! June 13th, Community Ambassador Call for User Group Leaders and Super Users

Calling all Super Users & User Group Leaders     UPDATE:  We just wrapped up June's Community Ambassador monthly calls for Super Users and User Group Leaders. We had a fantastic call with lots of engagement. We are excited to share some highlights with you!    Big THANK YOU to our special guest Thomas Verhasselt, from the Copilot Studio Product Team for sharing how to use Power Platform Templates to achieve next generation growth.     A few key takeaways: Copilot Studio Cookbook Challenge:  Week 1 results are posted, Keep up the great work!Summer of Solutions:  Starting on Monday, June 17th. Just by providing solutions in the community, you can be entered to win tickets to Power Platform Community Conference.Super User Season 2: Coming SoonAll communities moving to the new platform end of July We also honored two different community members during the call, Mohamed Amine Mahmoudi and Markus Franz! We are thankful for both leaders' contributions and engagement with their respective communities. 🎉   Be sure to mark your calendars and register for the meeting on July 11th and stay up to date on all of the changes that are coming. Check out the Super User Forum boards for details.   We're excited to connect with you and continue building a stronger community together.   See you at the call!  

Win free tickets to the Power Platform Conference | Summer of Solutions

We are excited to announce the Summer of Solutions Challenge!    This challenge is kicking off on Monday, June 17th and will run for (4) weeks.  The challenge is open to all Power Platform (Power Apps, Power Automate, Copilot Studio & Power Pages) community members. We invite you to participate in a quest to provide solutions in the Forums to as many questions as you can. Answers can be provided in all the communities.    Entry Period: This Challenge will consist of four weekly Entry Periods as follows (each an “Entry Period”)   - 12:00 a.m. PT on June 17, 2024 – 11:59 p.m. PT on June 23, 2024 - 12:00 a.m. PT on June 24, 2024 – 11:59 p.m. PT on June 30, 2024 - 12:00 a.m. PT on July 1, 2024 – 11:59 p.m. PT on July 7, 2024 - 12:00 a.m. PT on July 8, 2024 – 11:59 p.m. PT on July 14, 2024   Entries will be eligible for the Entry Period in which they are received and will not carryover to subsequent weekly entry periods.  You must enter into each weekly Entry Period separately.   How to Enter: We invite you to participate in a quest to provide "Accepted Solutions" to as many questions as you can. Answers can be provided in all the communities. Users must provide a solution which can be an “Accepted Solution” in the Forums in all of the communities and there are no limits to the number of “Accepted Solutions” that a member can provide for entries in this challenge, but each entry must be substantially unique and different.    Winner Selection and Prizes: At the end of each week, we will list the top ten (10) Community users which will consist of: 5 Community Members & 5 Super Users and they will advance to the final drawing. We will post each week in the News & Announcements the top 10 Solution providers.  At the end of the challenge, we will add all of the top 10 weekly names and enter them into a random drawing.  Then we will randomly select ten (10) winners (5 Community Members & 5 Super Users) from among all eligible entrants received across all weekly Entry Periods to receive the prize listed below. If a winner declines, we will draw again at random for the next winner.  A user will only be able to win once overall. If they are drawn multiple times, another user will be drawn at random.  Individuals will be contacted before the announcement with the opportunity to claim or deny the prize.  Once all of the winners have been notified, we will post in the News & Announcements of each community with the list of winners.   Each winner will receive one (1) Pass to the Power Platform Conference in Las Vegas, Sep. 18-20, 2024 ($1800 value). NOTE: Prize is for conference attendance only and any other costs such as airfare, lodging, transportation, and food are the sole responsibility of the winner. Tickets are not transferable to any other party or to next year’s event.   ** PLEASE SEE THE ATTACHED RULES for this CHALLENGE**

Copilot Cookbook Challenge | Week 2 Results | Win Tickets to the Power Platform Conference

We are excited to announce the "The Copilot Cookbook Community Challenge is a great way to showcase your creativity and connect with others. Plus, you could win tickets to the Power Platform Community Conference in Las Vegas in September 2024 as an amazing bonus.   Two ways to enter: 1. Copilot Studio Cookbook Gallery: https://aka.ms/CS_Copilot_Cookbook_Challenge 2. Power Apps Copilot Cookbook Gallery: https://aka.ms/PA_Copilot_Cookbook_Challenge   There will be 5 chances to qualify for the final drawing: Early Bird Entries: March 1 - June 2Week 1: June 3 - June 9Week 2: June 10 - June 16Week 3: June 17 - June 23Week 4: June 24 - June 30     At the end of each week, we will draw 5 random names from every user who has posted a qualifying Copilot Studio template, sample or demo in the Copilot Studio Cookbook or a qualifying Power Apps Copilot sample or demo in the Power Apps Copilot Cookbook. Users who are not drawn in a given week will be added to the pool for the next week. Users can qualify more than once, but no more than once per week. Four winners will be drawn at random from the total qualifying entrants. If a winner declines, we will draw again at random for the next winner.  A user will only be able to win once. If they are drawn multiple times, another user will be drawn at random. Prizes:  One Pass to the Power Platform Conference in Las Vegas, Sep. 18-20, 2024 ($1800 value, does not include travel, lodging, or any other expenses) Winners are also eligible to do a 10-minute presentation of their demo or solution in a community solutions showcase at the event. To qualify for the drawing, templates, samples or demos must be related to Copilot Studio or a Copilot feature of Power Apps, Power Automate, or Power Pages, and must demonstrate or solve a complete unique and useful business or technical problem. Power Automate and Power Pagers posts should be added to the Power Apps Cookbook. Final determination of qualifying entries is at the sole discretion of Microsoft. Weekly updates and the Final random winners will be posted in the News & Announcements section in the communities on July 29th, 2024. Did you submit entries early?  Early Bird Entries March 1 - June 2:  If you posted something in the "early bird" time frame complete this form: https://aka.ms/Copilot_Challenge_EarlyBirds if you would like to be entered in the challenge.   Week 1 Results:  Congratulations to the Week 1 qualifiers, you are being entered in the random drawing that will take place at the end of the challenge. Copilot Cookbook Gallery:Power Apps Cookbook Gallery:1.  @Mathieu_Paris 1.   @SpongYe 2.  @Dhanush 2.   @Deenuji 3.  n/a3.   @Nived_Nambiar  4.  n/a4.   @ManishSolanki 5.  n/a5.    n/a   Week 2 Results:  Congratulations to the Week 2 qualifiers, you are being entered in the random drawing that will take place at the end of the challenge. Copilot Cookbook Gallery:Power Apps Cookbook Gallery:1. Kasun_Pathirana1. ManishSolanki2. cloudatica2. madlad3. n/a3. SpongYe4. n/a4. n/a5. n/a5. n/a

Celebrating the June Super User of the Month: Markus Franz

Markus Franz is a phenomenal contributor to the Power Apps Community. Super Users like Markus inspire others through their example, encouragement, and active participation.    The Why: "I do this to help others achieve what they are trying to do. As a total beginner back then without IT background I know how overwhelming things can be, so I decided to jump in and help others. I also do this to keep progressing and learning myself." Thank you, Markus Franz, for your outstanding work! Keep inspiring others and making a difference in the community! 🎉  Keep up the fantastic work! 👏👏   Markus Franz | LinkedIn  Power Apps: mmbr1606  

Top Solution Authors
Top Kudoed Authors
Users online (3,720)