Syndicate_Admin
Administrator

NEED HELP PLEASE- PDF Report with varying number of pages and tables

I am trying to use Power Query to import tables from a PDF invoice into Excel. The data on the first page of the PDF has a consistent tabular format. The data on the following pages is also tabular, but its columns don't line up with the first page's. That in itself is not an issue. The problem is that the report can run from 2 to 50 pages, and each page imports as its own table, even though the format is consistent from page 2 on. Does anyone know how to write a query that I can use on this report regardless of the number of pages/tables I am importing?

 

Newbie.

ACCEPTED SOLUTION

Hi @joshpgeorge 

 

For the 3rd query, you can first filter on "Page" in Kind column, then keep the bottom 1 row. 

let
    Source = Pdf.Tables(File.Contents("C:\Users\Admin\Desktop\Data.pdf"), [Implementation="1.3"]),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page")),
    #"Kept Last Rows" = Table.LastN(#"Filtered Rows", 1)
in
    #"Kept Last Rows"

 

For the 2nd query, you can first filter on "Page" in Kind column and exclude the first page (Id "Page001"), then remove the last row.

let
    Source = Pdf.Tables(File.Contents("C:\Users\Admin\Desktop\Data.pdf"), [Implementation="1.3"]),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page") and ([Id] <> "Page001")),
    #"Removed Bottom Rows" = Table.RemoveLastN(#"Filtered Rows",1)
in
    #"Removed Bottom Rows"

 

On the Home tab, the Keep Rows and Remove Rows commands can also be used to filter rows through the user interface.


 

Best Regards,
Community Support Team _ Jing


11 REPLIES
Syndicate_Admin
Administrator

Does this help?

 

let
    Source = Pdf.Tables(File.Contents("YourPDFfilepath")),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page"))
in
    #"Filtered Rows"

 

Syndicate_Admin
Administrator

The page approach is correct (keep just the pages, filter out the tables). But with PDFs, I find that you should also sort by column count descending before expanding the tables. This helps prevent missing and/or null columns. Add the count column like this, where TableOrStepName is your previous step and [PdfTableColumn] is the column that holds the nested tables:

Columns = Table.AddColumn(TableOrStepName, "Count", each Table.ColumnCount([PdfTableColumn]))

Then sort, then expand. You can always re-sort by the page number column. 
--Nate
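A minimal sketch of the sort-then-expand idea, using made-up inline data in place of a real PDF so it can be pasted into a blank query. The `Pages` table only stands in for the filtered output of Pdf.Tables; the step names and sample values are assumptions for illustration, not taken from the report in question.

```powerquery
let
    // Stand-in for the filtered Pdf.Tables output: one row per page,
    // with a nested table in the Data column. Sample values are made up.
    Pages = #table(
        {"Id", "Data"},
        {
            {"Page002", #table({"Column1", "Column2"}, {{"A", 1}})},
            {"Page003", #table({"Column1", "Column2", "Column3"}, {{"B", 2, "x"}})}
        }
    ),
    // Count the columns of each nested table.
    WithCount = Table.AddColumn(Pages, "Count", each Table.ColumnCount([Data]), Int64.Type),
    // Sort so the widest nested table comes first.
    Sorted = Table.Sort(WithCount, {{"Count", Order.Descending}}),
    // Take the column list from the widest table and expand with it;
    // narrower pages get null in the columns they lack.
    WideColumns = Table.ColumnNames(Sorted{0}[Data]),
    Expanded = Table.ExpandTableColumn(Sorted, "Data", WideColumns),
    // Restore page order; the zero-padded Id sorts correctly as text.
    Restored = Table.Sort(Expanded, {{"Id", Order.Ascending}})
in
    Restored
```

If the row order inside each page matters, it may be safer to add an index column before the first sort and re-sort on that index instead of on Id.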

I followed Jakinta's suggestion, and that worked to import the pages. But since I normally use the Power Query user interface (instead of the Advanced Editor) to do the import, my only option there is to select the tables, and that leads me back to my original problem. This is where I am:

 

let
    Source = Pdf.Tables(File.Contents("C:\Users\Joshua George\OneDrive - Bestway Rental Inc\Desktop_Home\Indeed\20210723.pdf")),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page"))
in
    #"Filtered Rows"

Syndicate_Admin
Administrator

Hi @joshpgeorge 

 

When you connect to the PDF file, do not select any table or page in the Navigator window. Instead, right-click the file name and select Transform Data. This brings in a single table that lists all tables and pages in the file, which is the same result the Source step gives when you select a table or a page.


 

Then filter out "Table" in Kind column and filter out the first page. You can connect to the first page in another query if you want to have it. 

 

Remove other columns except for the Data column and expand Data column. Promote the first row as headers.

 

In this way, you can combine all pages from page 2 to the last page into a table no matter how many pages you have in the file. All these operations can be done by using the user interface.

let
    Source = Pdf.Tables(File.Contents("C:\Users\Admin\Desktop\Data.pdf"), [Implementation="1.3"]),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page") and ([Id] <> "Page001")),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Id", "Name", "Kind"}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Columns", "Data", {"Column1", "Column2", "Column3", "Column4", "Column5"}, {"Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4", "Data.Column5"}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Expanded Data", [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Item #", type text}, {"Month", type date}, {"Total#(lf)Purchase", Int64.Type}, {"Ending#(lf)Inventory", Int64.Type}, {"Turnover", Int64.Type}})
in
    #"Changed Type"
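The query above names Column1 through Column5 explicitly, so it would need editing if a page ever came through with more columns. One possible alternative, sketched here on made-up inline data, is to pass the Data column straight to Table.Combine, which appends the nested tables and keeps the union of their columns. The sample values and step names are assumptions for illustration only.

```powerquery
let
    // Stand-in for pages 2..N after filtering; sample values are made up.
    Pages = #table(
        {"Id", "Kind", "Data"},
        {
            {"Page002", "Page", #table({"Column1", "Column2"}, {{"Item #", "Month"}, {"A1", "Jan"}})},
            {"Page003", "Page", #table({"Column1", "Column2", "Column3"}, {{"A2", "Feb", "note"}})}
        }
    ),
    // Append every nested table; columns missing on a page are filled with null.
    Combined = Table.Combine(Pages[Data]),
    // Use the first row (the header row of the first kept page) as column names.
    Promoted = Table.PromoteHeaders(Combined, [PromoteAllScalars=true])
in
    Promoted
```

This drops the Id, Name and Kind columns implicitly, because only the nested tables are combined. If later pages repeat the header row, those rows would still need to be filtered out in an extra step.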

 

Let me know if you have any questions.

 

Regards,
Community Support Team _ Jing
If this post helps, please Accept it as the solution to help other members find it.

This is almost perfect, except when the last page contains data in a different table format. How do I select the "last" page regardless of its page number? And, alternatively, how do I exclude the "last" page?

Syndicate_Admin
Administrator

Table.RemoveLastN(Table, 1)

 

or to Select last row,

 

Table.LastN(Table, 1)

 

--Nate
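As a small illustration of those two functions, the sketch below splits a made-up list of pages by position rather than by page number. It assumes the pages arrive in document order; the `Pages` table and the step names are hypothetical.

```powerquery
let
    // Stand-in for the rows of Pdf.Tables where Kind = "Page".
    Pages = #table({"Id"}, {{"Page001"}, {"Page002"}, {"Page003"}, {"Page004"}}),
    // The last page, whatever its number happens to be.
    LastPage = Table.LastN(Pages, 1),
    // Everything except the last page.
    AllButLast = Table.RemoveLastN(Pages, 1),
    // Drop the first page as well, leaving only the middle pages.
    MiddlePages = Table.Skip(AllButLast, 1)
in
    // Return both pieces as a record so each can be inspected.
    [Last = LastPage, Middle = MiddlePages]
```

Because the split is positional, the same steps work whether the report has 3 pages or 50.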

@joshpgeorge 

Insert a step before expanding Data column to remove the last row. Go to Remove Rows > Remove Bottom Rows.


Jing

I need to remove the last page from the 2nd query, and then I need another (3rd) query that looks at ONLY the last page. Here is what I have in my first and second queries so far:

 

1st- 

let
    Source = Pdf.Tables(File.Contents("C:\Users\JG\OneDrive - BRInc\Desktop_Home\Custom Reports JG\Email_NewAgreeRpt.pdf"), [Implementation="1.3"]),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page") and ([Id] = "Page001")),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Id", "Name", "Kind"}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Columns", "Data", {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8"}, {"Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4", "Data.Column5", "Data.Column6", "Data.Column7", "Data.Column8"}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Expanded Data", [PromoteAllScalars=true]),
    #"Filtered Rows1" = Table.SelectRows(#"Promoted Headers", each ([Column3] = "-")),
    #"Reordered Columns" = Table.ReorderColumns(#"Filtered Rows1",{"Column3", "Column4", "Column5", "Column6", "Column7", "Column1", "Column2", "Column8"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Reordered Columns",{"Column1", "Column2", "Column8"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"Column1", "Total"}})
in
    #"Renamed Columns"

 

2nd 

let
    Source = Pdf.Tables(File.Contents("C:\Users\JG\OneDrive - BRInc\Desktop_Home\Custom Reports JG\Email_NewAgreeRpt.pdf"), [Implementation="1.3"]),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page") and ([Id] <> "Page001")),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Id", "Name", "Kind"}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Columns", "Data", {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8"}, {"Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4", "Data.Column5", "Data.Column6", "Data.Column7", "Data.Column8"}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Expanded Data", [PromoteAllScalars=true]),
    #"Filtered Rows1" = Table.SelectRows(#"Promoted Headers", each ([Column3] = "-")),
    #"Reordered Columns" = Table.ReorderColumns(#"Filtered Rows1",{"Column3", "Column4", "Column5", "Column6", "Column7", "Column1", "Column2", "Column8"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Reordered Columns",{"Column1", "Column2", "Column8"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"Column1", "Total"}})
in
    #"Renamed Columns"

 
