Wittig
Frequent Visitor

Multi-page table spanning issues

One of our suppliers sends multi-page invoices in which the invoice summary information is repeated at the top of each page, followed by a table with 5 named columns (Description, Order Qty, Shipped Qty, Unit Price, Net). The Description column can span multiple rows and is overloaded with 3 pieces of information we wish to extract (manufacturer, model #, and the actual description of the item).

 

I have been using advanced tagging to identify all 7 pieces of information we wish to extract (3 pieces in the Description column and 1 piece in each of the remaining 4 columns). I have also been using the multi-page support, since the table can span multiple pages.

 

The issue I have now is that a single item can span two pages, i.e. the overloaded Description field has one line at the end of one page and the remainder on the next. Any ideas on how to properly tag this?

 

Here is a rough example of the issue. Item #3 is the issue and is split between pages:

------------------------------------------------------------

Supplier name                             order date

Address                                       order number

Phone

------------------------------------------------------------

  Description                    Order Qty         Ship Qty   Unit Price       Net

  MyMfg1  MyModel#1           1                      1            $10             $10

  MyDescription of #1

 

  MyMfg2  MyModel#2           1                      1            $10             $10

  MyDescription of #2

 

  MyMfg3  MyModel#3           1                      1            $10             $10

 

-------------------------- Page Break ------------------------------

Supplier name                             order date

Address                                       order number

Phone

------------------------------------------------------------

  Description                    Order Qty         Ship Qty   Unit Price       Net

 

  MyDescription of #3

 

  MyMfg4  MyModel#4           1                      1            $10             $10

  MyDescription of #4

 

  MyMfg5  MyModel#5           1                      1            $10             $10

  MyDescription of #5
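
For reference, here is a minimal sketch of how the layout above could be stitched back together outside the tagging tool, assuming the invoice has already been converted to plain text lines (for example by OCR). The regular expressions, the function names and the assumption that manufacturer and model are each a single token are all hypothetical and based only on the rough example above; a description line that happens to start with "Address" or "Phone" would be wrongly skipped.

```python
import re

# Hypothetical pattern for the first line of an item, e.g.
# "MyMfg3  MyModel#3   1   1   $10   $10" (single-token manufacturer and model assumed).
ITEM_LINE = re.compile(
    r"^\s*(?P<manufacturer>\S+)\s+(?P<model>\S+)"
    r"\s+(?P<order_qty>\d+)\s+(?P<ship_qty>\d+)"
    r"\s+\$(?P<unit_price>[\d.,]+)\s+\$(?P<net>[\d.,]+)\s*$"
)
# Hypothetical pattern for the summary block and column header repeated on every page.
HEADER_LINE = re.compile(r"^\s*(Supplier name|Address|Phone|Description\s+Order Qty)\b")


def is_page_furniture(line):
    """Return True for blank lines, rules, page breaks and repeated headers."""
    stripped = line.strip()
    if not stripped or "Page Break" in stripped or set(stripped) == {"-"}:
        return True
    return bool(HEADER_LINE.match(stripped))


def parse_items(lines):
    """Group text lines into items, carrying descriptions across page breaks."""
    items = []
    for line in lines:
        if is_page_furniture(line):
            continue
        match = ITEM_LINE.match(line)
        if match:
            item = match.groupdict()
            item["description"] = ""
            items.append(item)
        elif items:
            # Any other line continues the most recent item, even if a
            # page break and a repeated header appeared in between.
            joined = items[-1]["description"] + " " + line.strip()
            items[-1]["description"] = joined.strip()
    return items


SAMPLE = """\
Supplier name                             order date
------------------------------------------------------------
  Description                    Order Qty         Ship Qty   Unit Price       Net
  MyMfg2  MyModel#2           1                      1            $10             $10
  MyDescription of #2
  MyMfg3  MyModel#3           1                      1            $10             $10
-------------------------- Page Break ------------------------------
Supplier name                             order date
------------------------------------------------------------
  Description                    Order Qty         Ship Qty   Unit Price       Net
  MyDescription of #3
  MyMfg4  MyModel#4           1                      1            $10             $10
  MyDescription of #4
"""

items = parse_items(SAMPLE.splitlines())
for item in items:
    print(item["manufacturer"], item["model"], "|", item["description"])
```

The point of the sketch is only that the page furniture is dropped before rows are grouped, so the description of item #3 attaches to the row that started on the previous page.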

 

 

10 REPLIES 10
Antrod
Power Apps

Hi @Wittig ,

 

Thanks for reaching out on that.

Indeed, there is currently no way to continue tagging a field that starts on one page and continues on the next.

Have you tried leaving such documents untagged and tagging only documents whose descriptions don't span two pages? You could then check whether documents with split descriptions are predicted correctly at prediction time.

Wittig
Frequent Visitor

I found 12 invoices that span multiple pages but do not split the overloaded Description field. After tagging these files, as well as 12 more single-page invoices, the accuracy score is now 87% across the entire model, but the overloaded field is only at 39%. Is there a way to improve this?

Wittig_1-1681848584327.png

Wittig_2-1681848639543.png

 

 

 

 

Wittig
Frequent Visitor

Bad news. Once the model encounters one of these "page-spanned" Description fields, the accuracy goes way down for every entry thereafter. Even when it reaches a new page with no "page-spanned" description, it can't recover. In one document I tested, it even started to drop columns.

 

At this point it is unusable which is very disappointing.  

Antrod
Power Apps

Hi @Wittig ,

 

Sorry to hear about that.

Did you try tagging the Description as a single field (instead of 3 fields) to see whether you get better results? If that works, you may be able to add post-processing logic in a flow to separate the values.
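
To illustrate the kind of post-processing meant here, below is a minimal Python sketch of the splitting logic, assuming the single Description field comes back as one string in which the first two whitespace-separated tokens are the manufacturer and the model number. The function name `split_description` and that token assumption are hypothetical; in a flow the same logic would have to be rewritten with flow expressions.

```python
def split_description(combined):
    """Split 'Manufacturer Model# free text' into three named parts.

    Assumes manufacturer and model are each a single token with no spaces;
    everything after them is treated as the item description.
    """
    parts = combined.split(None, 2)          # at most 3 pieces, any whitespace
    parts += [""] * (3 - len(parts))         # pad when pieces are missing
    manufacturer, model, description = parts
    return {
        "manufacturer": manufacturer,
        "model": model,
        "description": description.strip(),
    }


# Line breaks inside the extracted field are treated like any other whitespace.
result = split_description("MyMfg3  MyModel#3\nMyDescription of #3")
print(result)
```

Manufacturer names containing spaces would need a lookup list or another delimiter, so this only fits invoices that follow the layout in the example.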

Wittig
Frequent Visitor

I have not tried that but if the AI model can't handle the complex parts of the document, I could just write a non-AI parser for the entire document.  Our goal is to add more collections which contain invoices from other suppliers so we could have one process for all.  Also, I would have to start the entire document tagging process over again or lose all the existing tagging.

 

I did try yet again to add even more training documents to the model, containing many of the special cases/anomalies that show up in the invoices. Unfortunately, the accuracy score went down as a result, and when I tested the model with a document I had used for training, it couldn't reproduce the training data.

 

Are there any plans for improving the types of issues I am encountering? If so, is there a timeline?

 

Wittig
Frequent Visitor

Antrod - I tried what you suggested, and I still get poor results. I trained a new model with 8 documents, and it has problems with both single and multi-page documents.

 

I have no idea how to make the tables any simpler.  

Antrod
Power Apps

Hi @Wittig ,

 

I'm really sorry to hear that the latest tests weren't positive.

Unfortunately, we have no date for covering this scenario.

 

Are you using the Structured version of the Document processing model? If so, the last thing I can suggest is to test with the Unstructured version. I understand it could be tedious to recreate the model, but perhaps you could create a very basic version just to see whether you get better results.

 

Charanjit
New Member

@Antrod @Wittig  I am facing the exact same issue. I need to extract information in tabular format from order confirmation PDFs received from suppliers. Each PDF has multiple items, and each item has a code, description, quantity and delivery date. So the table has four columns (Code, Description, Quantity, Delivery Date), with each row representing an item.
The problem arises when some details for an item are at the bottom of one page and the remaining details are on the next page.

The document cannot be tagged correctly in such a case, and the accuracy of the model is pretty bad. Suppose the document has 2 pages; then the tagged tables look like this:

Code   Description           Quantity   Delivery Date
101    this is first item    56         22.08.2023
102    this is second item   65         23.08.2023
103    this is third item

Code   Description           Quantity   Delivery Date
                             72         24.08.2023
104    this is fourth item   80         23.08.2023
105    this is fifth item    60         21.08.2023
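
One possible workaround, sketched below in Python, is to accept the two partial rows as extracted and merge them afterwards. This assumes each page's table is available as a list of rows with empty strings for missing cells, and that a first row on a page with an empty Code cell is the continuation of the last row on the previous page. The names `COLUMNS` and `stitch_pages` are hypothetical, and the data is the example above.

```python
COLUMNS = ["code", "description", "quantity", "delivery_date"]


def stitch_pages(pages):
    """Merge per-page tables, joining a row that was split at a page break."""
    merged = []
    for page in pages:
        for index, row in enumerate(page):
            # Heuristic: only the first row of a page can be a continuation,
            # and it is recognised by its empty 'code' cell.
            is_continuation = index == 0 and bool(merged) and not row["code"]
            if is_continuation:
                previous = merged[-1]
                for column in COLUMNS:
                    if not previous[column]:
                        previous[column] = row[column]
            else:
                merged.append(dict(row))
    return merged


def make_row(*cells):
    """Build a row dictionary from four cell values."""
    return dict(zip(COLUMNS, cells))


page_1 = [
    make_row("101", "this is first item", "56", "22.08.2023"),
    make_row("102", "this is second item", "65", "23.08.2023"),
    make_row("103", "this is third item", "", ""),
]
page_2 = [
    make_row("", "", "72", "24.08.2023"),
    make_row("104", "this is fourth item", "80", "23.08.2023"),
    make_row("105", "this is fifth item", "60", "21.08.2023"),
]

rows = stitch_pages([page_1, page_2])
for row in rows:
    print(row)
```

This only helps if the model extracts the partial rows reliably in the first place, which the accuracy problems described in this thread suggest it may not.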

takolota
Multi Super User

@Charanjit @Wittig 

 

You could also try this template that just uses text recognition to create a text replica of the given file & feeds it to GPT for data extraction:

https://powerusers.microsoft.com/t5/Power-Automate-Cookbook/Extract-Data-From-PDFs-and-Images-With-G...

Hi @Charanjit ,

Did you manage to resolve this issue?

If yes, could you please provide the solution?

 
