Syndicate_Admin
Administrator

NEED HELP PLEASE: PDF report with a varying number of pages and tables

I'm trying to use Power Query to import tables from an invoice in PDF format into Excel. The data on the first page of the PDF has a consistent tabular format. The data on the following pages is also tabular, but doesn't line up with the first page. That is not an issue. The problem is that the report can be anywhere from 2 to 50 pages, and each page imports as its own table, even though the format is consistent from page 2 on. Does anyone know how to write a Power Query query that works on this report regardless of the number of pages/tables I am importing?

 

Newbie.

1 ACCEPTED SOLUTION


Hi @joshpgeorge 

 

For the 3rd query, you can first filter the Kind column to "Page", then keep only the bottom row.

let
    Source = Pdf.Tables(File.Contents("C:\Users\Admin\Desktop\Data.pdf"), [Implementation="1.3"]),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page")),
    #"Kept Last Rows" = Table.LastN(#"Filtered Rows", 1)
in
    #"Kept Last Rows"
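The query above returns a one-row table whose Data column still holds the last page. If you want the last page's contents directly, one option is Table.Last, which returns the final row as a record so its Data field can be read. This is a sketch, assuming the same placeholder path and that the first row of the last page holds its headers; like the reply, it relies on Pdf.Tables listing pages in document order.

```powerquery
let
    // Placeholder path, as in the reply above
    Source = Pdf.Tables(File.Contents("C:\Users\Admin\Desktop\Data.pdf"), [Implementation="1.3"]),
    // Keep only whole-page entries
    Pages = Table.SelectRows(Source, each [Kind] = "Page"),
    // Table.Last returns the final row as a record; [Data] is that page's table
    LastPage = Table.Last(Pages)[Data],
    // Assumption: the last page's first row contains its column headers
    PromotedHeaders = Table.PromoteHeaders(LastPage, [PromoteAllScalars = true])
in
    PromotedHeaders
```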

 

For the 2nd query, you can first filter the Kind column to "Page" and exclude the first page (Id "Page001"), then remove the last row.

let
    Source = Pdf.Tables(File.Contents("C:\Users\Admin\Desktop\Data.pdf"), [Implementation="1.3"]),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page") and ([Id] <> "Page001")),
    #"Removed Bottom Rows" = Table.RemoveLastN(#"Filtered Rows",1)
in
    #"Removed Bottom Rows"
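Combining this with the 2nd query posted in this thread, a sketch of how its first steps might look with the last page removed. The key point is that Table.RemoveLastN runs while each row is still one page, before Data is expanded; after expansion it would remove only the last data row instead. The path and column list are copied from that query; its remaining steps (from #"Filtered Rows1" onward) would follow unchanged after #"Promoted Headers".

```powerquery
let
    Source = Pdf.Tables(File.Contents("C:\Users\JG\OneDrive - BRInc\Desktop_Home\Custom Reports JG\Email_NewAgreeRpt.pdf"), [Implementation="1.3"]),
    // Pages only, minus the first page (the 1st query handles Page001)
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page") and ([Id] <> "Page001")),
    // Added step: drop the last page (the 3rd query handles it) while rows are still pages
    #"Removed Bottom Rows" = Table.RemoveLastN(#"Filtered Rows", 1),
    #"Removed Columns" = Table.RemoveColumns(#"Removed Bottom Rows", {"Id", "Name", "Kind"}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Columns", "Data",
        {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8"},
        {"Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4", "Data.Column5", "Data.Column6", "Data.Column7", "Data.Column8"}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Expanded Data", [PromoteAllScalars=true])
in
    #"Promoted Headers"
```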

 

On the Home tab, Keep Rows and Remove Rows can also be used to filter rows.

[Screenshot: Keep Rows and Remove Rows on the Home tab]

 

Best Regards,
Community Support Team _ Jing


11 Replies
Syndicate_Admin
Administrator

Does this help?

 

let
    Source = Pdf.Tables(File.Contents("YourPDFfilepath")),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page"))
in
    #"Filtered Rows"

 

Syndicate_Admin
Administrator

The Page approach is correct (filter to just the pages, filter out the tables). But with PDFs, I find that you should also sort by column count, descending, before expanding the tables. This prevents missing and/or null columns. To add the count column:

Columns = Table.AddColumn(TableOrStepName, "Count", each Table.ColumnCount([Data]))

Then sort, then expand. You can always re-sort by the page number column. 
--Nate
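A sketch of the column-count idea as a full query, assuming a placeholder file path. One deliberate variation on the tip: the sort is used only to find the widest page table's column names, and the pages are then expanded in their original order, so no re-sort is needed afterwards. Pages with fewer columns get null in the extra columns.

```powerquery
let
    // Placeholder path: point this at your own PDF
    Source = Pdf.Tables(File.Contents("C:\Users\Admin\Desktop\Data.pdf")),
    Pages = Table.SelectRows(Source, each [Kind] = "Page"),
    // Column count of each page's table (the "Count" column from the tip)
    WithCount = Table.AddColumn(Pages, "Count", each Table.ColumnCount([Data]), Int64.Type),
    // Sort widest first, then read the column names of the widest page table
    SortedByCount = Table.Sort(WithCount, {{"Count", Order.Descending}}),
    WidestColumns = Table.ColumnNames(SortedByCount{0}[Data]),
    // Expand every page (still in document order) with that full column list
    Expanded = Table.ExpandTableColumn(WithCount, "Data", WidestColumns),
    // Keep Id so each row can be traced back to its page
    Cleaned = Table.RemoveColumns(Expanded, {"Name", "Kind", "Count"})
in
    Cleaned
```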

I followed Jakinta's suggestion, and that worked to import the pages. However, since I normally use the Power Query user interface (instead of the Advanced Editor) to do the import, my only option in the Navigator is to select the tables, which leads me back to my original problem. This is where I am:

 

let
    Source = Pdf.Tables(File.Contents("C:\Users\Joshua George\OneDrive - Bestway Rental Inc\Desktop_Home\Indeed\20210723.pdf")),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page"))
in
    #"Filtered Rows"

Syndicate_Admin
Administrator

Hi @joshpgeorge 

 

When you connect to the PDF file, do not select any table or page in the Navigator window. Instead, right-click the file name and select Transform Data. This brings in a table listing every table and page in the file, which is the same as the Source step you get when you select a single table or page.

[Screenshot: right-clicking the file name in the Navigator and choosing Transform Data]

Then filter out "Table" in the Kind column and filter out the first page. You can connect to the first page in a separate query if you want to keep it.

Remove all columns except the Data column, expand the Data column, and promote the first row to headers.

This way, you can combine all pages from page 2 through the last page into a single table, no matter how many pages the file has. All of these operations can be done in the user interface.

let
    Source = Pdf.Tables(File.Contents("C:\Users\Admin\Desktop\Data.pdf"), [Implementation="1.3"]),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page") and ([Id] <> "Page001")),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Id", "Name", "Kind"}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Columns", "Data", {"Column1", "Column2", "Column3", "Column4", "Column5"}, {"Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4", "Data.Column5"}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Expanded Data", [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Item #", type text}, {"Month", type date}, {"Total#(lf)Purchase", Int64.Type}, {"Ending#(lf)Inventory", Int64.Type}, {"Turnover", Int64.Type}})
in
    #"Changed Type"
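As an alternative to expanding Data with a fixed list (Column1 to Column5), Table.Combine can append the page tables directly; a page that lacks a column gets null in it, so the query does not need editing when one page is narrower or wider than the others. This is a sketch under the same assumptions as the query above (placeholder path, headers in the first row of page 2); the Changed Type step is left out because its column names depend on the report.

```powerquery
let
    Source = Pdf.Tables(File.Contents("C:\Users\Admin\Desktop\Data.pdf"), [Implementation="1.3"]),
    // Pages only, excluding the first page
    OtherPages = Table.SelectRows(Source, each [Kind] = "Page" and [Id] <> "Page001"),
    // Append the Data tables of all remaining pages into one table
    Combined = Table.Combine(OtherPages[Data]),
    // Assumption: the first row of page 2 holds the headers
    PromotedHeaders = Table.PromoteHeaders(Combined, [PromoteAllScalars = true])
in
    PromotedHeaders
```

If later pages repeat their header row, those rows remain as data after promotion and would need to be filtered out.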

 

Let me know if you have any questions.

 

Regards,
Community Support Team _ Jing
If this post helps, please Accept it as the solution to help other members find it.

This is almost perfect, except when the last page contains data in a different table format. How do I select the "last" page regardless of page number, and, alternatively, how do I exclude the "last" page?

Syndicate_Admin
Administrator

Table.RemoveLastN(Table, 1)

 

or, to select the last row:

 

Table.LastN(Table, 1)

 

--Nate

@joshpgeorge 

Insert a step before expanding the Data column to remove the last row: go to Remove Rows > Remove Bottom Rows.

[Screenshot: Remove Rows > Remove Bottom Rows]

Jing

I need to remove the last page from the 2nd query, and then I need another (3rd) query that looks at ONLY the last page. Here is what I have in my first and second queries so far:

 

1st:

let
    Source = Pdf.Tables(File.Contents("C:\Users\JG\OneDrive - BRInc\Desktop_Home\Custom Reports JG\Email_NewAgreeRpt.pdf"), [Implementation="1.3"]),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page") and ([Id] = "Page001")),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Id", "Name", "Kind"}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Columns", "Data", {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8"}, {"Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4", "Data.Column5", "Data.Column6", "Data.Column7", "Data.Column8"}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Expanded Data", [PromoteAllScalars=true]),
    #"Filtered Rows1" = Table.SelectRows(#"Promoted Headers", each ([Column3] = "-")),
    #"Reordered Columns" = Table.ReorderColumns(#"Filtered Rows1",{"Column3", "Column4", "Column5", "Column6", "Column7", "Column1", "Column2", "Column8"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Reordered Columns",{"Column1", "Column2", "Column8"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"Column1", "Total"}})
in
    #"Renamed Columns"

 

2nd:

let
    Source = Pdf.Tables(File.Contents("C:\Users\JG\OneDrive - BRInc\Desktop_Home\Custom Reports JG\Email_NewAgreeRpt.pdf"), [Implementation="1.3"]),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Page") and ([Id] <> "Page001")),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Id", "Name", "Kind"}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Columns", "Data", {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8"}, {"Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4", "Data.Column5", "Data.Column6", "Data.Column7", "Data.Column8"}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Expanded Data", [PromoteAllScalars=true]),
    #"Filtered Rows1" = Table.SelectRows(#"Promoted Headers", each ([Column3] = "-")),
    #"Reordered Columns" = Table.ReorderColumns(#"Filtered Rows1",{"Column3", "Column4", "Column5", "Column6", "Column7", "Column1", "Column2", "Column8"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Reordered Columns",{"Column1", "Column2", "Column8"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"Column1", "Total"}})
in
    #"Renamed Columns"

 

