This may not be the correct board. If so, please let me know and I'll make adjustments as necessary.
I work in an office that receives monthly invoices for our company's billing. We get our invoices as PDFs formatted as continuous tables. The data is laid out roughly like this:
- Line 1: member ID#, NAME
- Line 2: claim number, member ID# (again)
- Line 3: date of claim (twice), claim type code (three times), ID number, number of units for this claim, dollar amount requested, dollar amount authorized, dollar amount deducted, dollar amount paid...
Currently I'm reviewing these documents by hand with my little human eyeballs and fingers. It's not terribly slow, but I have literally hundreds of these and each can contain 50 or more claims.
Specifically, my goal is to (somehow) scan the PDF with RegEx (or something to similar effect), extract the text of the three lines shown above, and filter out the "noise" (everything else). The data I need always starts on page 2 and ends 3 pages before the last page of the PDF.
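A minimal Python sketch of that goal, assuming the PDF text has already been pulled out as one string per page (by PAD's Extract text from PDF or any PDF library; that step isn't shown). The invoices weren't shared, so the field shapes in RECORD (member IDs like M1001, amounts with two decimals, whitespace between fields) and the sample pages are hypothetical. data_pages keeps page 2 through page n-3, matching the page range the OP settles on later in the thread.

```python
import re

# Hypothetical field shapes: the real invoices weren't shared, so adjust these.
DATE = r"\d{1,2}/\d{1,2}/\d{2,4}"
AMOUNT = r"\$?[\d,]+\.\d{2}"
EOL = r"[ \t]*\r?\n"

# One match = one three-line claim record; any other line never matches.
RECORD = re.compile(
    r"^(?P<member_id>[A-Z]\d+)[ \t]+(?P<name>[A-Z][A-Z ,.'-]*?)" + EOL          # line 1
    + r"(?P<claim_no>[A-Z]?\d+)[ \t]+(?P=member_id)" + EOL                       # line 2 repeats the ID
    + r"(?P<date_from>" + DATE + r")[ \t]+(?P<date_to>" + DATE + r")[ \t]+"      # line 3
    + r"(?P<type1>\S+)[ \t]+(?P<type2>\S+)[ \t]+(?P<type3>\S+)[ \t]+"
    + r"(?P<id_number>\S+)[ \t]+(?P<units>\d+)[ \t]+"
    + r"(?P<requested>" + AMOUNT + r")[ \t]+(?P<authorized>" + AMOUNT + r")[ \t]+"
    + r"(?P<deducted>" + AMOUNT + r")[ \t]+(?P<paid>" + AMOUNT + r")[ \t]*\r?$",
    re.MULTILINE,
)


def data_pages(page_texts: list[str], first: int = 2, trailing: int = 3) -> list[str]:
    """Keep page 2 through page n-3 (1-based page numbers)."""
    n = len(page_texts)
    return page_texts[first - 1 : n - trailing]


def parse_claims(page_texts: list[str]) -> list[dict[str, str]]:
    """Join the data pages and return one dict per claim record."""
    text = "\n".join(data_pages(page_texts))
    return [m.groupdict() for m in RECORD.finditer(text)]


# Hypothetical 5-page invoice: only page 2 (= n-3) is inside the data range.
SAMPLE_PAGES = [
    "Cover page",                                      # page 1: skipped
    "Claims detail\n"
    "M1001 SMITTY MCGEE\n"
    "C77001 M1001\n"
    "01/23/2022 01/23/2022 T1 T2 T3 99213 2 100.98 80.00 5.00 75.00\n"
    "Page 2 footer",                                   # page 2: kept
    "M2002 DECOY ROW\n"
    "C88002 M2002\n"
    "02/01/2022 02/01/2022 T1 T1 T1 99214 1 10.00 10.00 0.00 10.00",  # page 3: skipped
    "Remittance summary",                              # page 4: skipped
    "Notices",                                         # page 5: skipped
]
```

Anchoring each record on `^` with `re.MULTILINE` and requiring all three lines in sequence is what filters out the noise: header and footer lines simply never match. The `(?P=member_id)` backreference enforces that line 2 repeats line 1's member ID.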
I'm using FileCenter to manage my documents. FileCenter Pro includes a neat OCR text-extraction tool that does something similar to what I want. I set up a little "Demo" module to prove the concept and it worked on a small scale, but I couldn't get it to export the data to a file or another program.
Ultimately, the extracted data needs to go into an Excel workbook so I can do some other data-related operations on it.
The target PDFs contain sensitive data so I can't share them.
Any ideas on how to go from selecting a PDF to dropping specific text from it into an Excel doc (or a list object in PAD) while pruning all the unnecessary stuff?
Update:
I found the solution using RegEx, with some highly specialized patterns a friend developed for this particular case. If you need to take apart the contents of a PDF and then combine several lists of matches by their index, please read on.
Step 1. Get all files in the designated target folder.
Step 1a. Make sure your page ranges are right. I had to adjust mine to +1 page from page 1 (target page 2) and -3 pages from the last page (target page n-3, not the last three pages).
Step 2. Create a list for the top-level unique value; mine is the member's name.
->Create new List: %YourListVar%
Step 3. Start a For each loop: for each %CurrentItem% in %Files%.
Step 3a. Assign your page range values to %PageStart#% and %PageEnd#%. Together they now cover the entire range of your data.
Step 4. Inside the For each loop, use Extract text from PDF on pages %PageStart#% to %PageEnd#% and store the result in %YourPDFTextVar%.
Step 5. Use Parse text on %YourPDFTextVar% with a regular expression (YourRegExPattern) to find the text you want. Store the matches in a variable of your choice; my first RegEx match is the member name, so mine is %Names%.
Step 5a. For each item in %Names%, strip out any junk data and extra string content, trim the resulting string %CurrentNames%, and add %CurrentNames% to %YourListVar%.
Step 6. Repeat steps 4 and 5 (or 5 to 5a) until your data is satisfactory.
Step 7. Use a Loop from index 0 to %YourListVar.count - 1% (this is always the upper-bound index).
Step 8. Inside the Loop, perform your operation. In my case, I write the values to an Excel workbook one at a time.
Step 8a (write to the Excel workbook). Use Write to Excel worksheet with the value %YourListVar[ListIndex]% (or whichever list you want to cycle through), column A, and row %FirstFreeRow% of %ExcelInstance%.
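The same steps, sketched in Python for readers outside PAD. This is an illustration, not the OP's flow: NAME_PATTERN is a hypothetical stand-in for YourRegExPattern, the input is text already extracted from each PDF, and a CSV writer substitutes for Write to Excel worksheet so the example runs with the standard library alone.

```python
import csv
import io
import re

# Hypothetical stand-in for "YourRegExPattern": a member ID followed by a name.
NAME_PATTERN = re.compile(r"^[A-Z]\d+[ \t]+([A-Z][A-Z ,.'-]+)\r?$", re.MULTILINE)


def collect_names(pdf_texts: list[str]) -> list[str]:
    """Steps 3 to 5a: loop over each file's text, match, trim, add to the list."""
    names: list[str] = []                        # Step 2: %YourListVar%
    for text in pdf_texts:                       # Step 3: for each %CurrentItem% in %Files%
        for match in NAME_PATTERN.findall(text): # Step 5: Parse text with RegEx
            cleaned = match.strip(" ,.")         # Step 5a: trim junk around the value
            if cleaned:
                names.append(cleaned)
    return names


def write_column_a(names: list[str], out) -> None:
    """Steps 7 to 8a: loop index 0 .. count - 1, one value per row in column A."""
    writer = csv.writer(out)
    for list_index in range(len(names)):         # 0 .. %YourListVar.count - 1%
        writer.writerow([names[list_index]])     # each call lands on the next free row
```

`range(len(names))` mirrors the PAD Loop from 0 to count - 1; in plain Python you would usually iterate over `names` directly.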
For any other use, enter %YourListVar[ListIndex]% wherever a PAD action asks for a value.
For example, to step through your list with a Display message box, set Message to display to %YourListVar[ListIndex]%.
You should now see your Excel doc fill up with all the values from %YourListVar[ListIndex]%.
The purpose of the Loop is to get the index number of the item in question, in the general form %Item[RowNumber]%. That index must always exist in the target list: a list of 2 items (indexes 0 and 1) has no item at index 2.
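In Python terms, reading `names[2]` on a two-item list raises `IndexError` instead of returning anything, so the valid range is 0 through `len(names) - 1`. The hypothetical helper `item_at` below makes that bound explicit; it also rejects negative indexes, which Python would otherwise count from the end of the list.

```python
from typing import Optional


def item_at(items: list[str], index: int) -> Optional[str]:
    """Return items[index] only if that index exists (0 .. len(items) - 1)."""
    if 0 <= index < len(items):
        return items[index]
    return None  # e.g. index 2 in a 2-item list: there is nothing there


names = ["Smitty McGee", "Jane Doe"]  # indexes 0 and 1
last_index = len(names) - 1           # the Step 7 upper bound
```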
Hope that helps anyone stuck with a similar issue.
That sounds like a job for the AI Builder:
AI Builder— Intelligent Automation | Microsoft Power Automate
Not sure how complex the PDFs are, but...
I've built some RegEx patterns to capture the desired data.
While investigating @Henrik_M's proposed solution, I discovered that AI Builder is a premium feature and that it takes some setup to get it off the ground and flying. I've already familiarized myself with PAD, so I'll keep working on a PAD-based solution.
I've created a flow that targets a folder, grabs all the PDFs, searches their content within a page range entered as text input, and then spits out all the matches into two running lists.
I'm hoping you can help me with this one:
I now need to pair up each Match 1 (name) result with its Match 2 (date) result, e.g. "Smitty McGee", "1/23/45".
I'm really new to manipulating strings and lists of text. Arrays have always baffled me, but I'm not completely unfamiliar with some of the concepts.
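A minimal sketch of pairing two match lists by position in Python, assuming both lists come out in the same order with exactly one date per name (`pair_matches` is a hypothetical helper). It is the same idea as an index Loop: item i of one list goes with item i of the other.

```python
def pair_matches(names: list[str], dates: list[str]) -> list[tuple[str, str]]:
    """Pair Match 1 (name) with Match 2 (date) by position: names[i] with dates[i]."""
    if len(names) != len(dates):
        # A missing match would silently shift every later pair, so stop here.
        raise ValueError(f"{len(names)} names but {len(dates)} dates")
    return list(zip(names, dates))
```

The length check matters: if a single date fails to match, every later pair shifts by one without any error. Capturing name and date in one pattern per record avoids that failure mode altogether.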
@UK_Mike , thank you for the encouragement. And the jokes! I needed that laugh really bad.
"
-member ID#, NAME - new line
-Claim number, member ID# again - new line
-Date of claim twice, claim type code, claim type code again, claim type code again again, ID number, number of units for this claim, dollar amount requested, dollar amount authorized, dollar amount deducted, dollar amount paid - new line...
"
Probably best to type out what these actually look like, such as...
Member ID: Mike
Claim type code: abc123
Date of claim: 9/4/22
Dollar amount requested: $100.98
ETC...
"
I'm hoping you can aid me here on this one,
I need to now pair up the Match 1 (name) result with the Match 2 (date) result, I.E.: "Smitty McGee", "1/23/45"
"
Each of these becomes an individual variable that we write to Excel inside a For each loop targeting the PDF folder.
That is, each loop pulls 5, 6, 7, etc. separate variables holding the values from the current item (PDF).
Basically they are already matched...
var 1 = Mike
var 2 = abc123
var 3 = 9/4/22
var 4 = $100.98
Slightly different on my end.
I write to Excel at the end of each loop rather than holding the values in a list for a later Excel write.
Let's say I'm after 10 values from each loop: if ALL values are populated into their respective variables, they get written at the end of the loop and the current PDF gets moved to a new "Processed" folder.
If even one value isn't found, the current loop is skipped and that PDF gets moved to a new "Unprocessed" folder.
At the end of the flow, if the Unprocessed folder's file count is >= 1, an email is sent to me calling me really, really bad names 🙄
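That all-or-nothing approach, sketched in Python under stated assumptions: the field labels come from the illustrative layout earlier in the thread (not a real invoice), the PDF text is assumed to be extracted already, and the email step is left as a comment. `FIELDS`, `extract_row`, `run`, and `demo` are hypothetical names.

```python
import re
import shutil
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical labels taken from the illustrative layout in this thread.
FIELDS = {
    "member": re.compile(r"Member ID:\s*(\S+)"),
    "claim_type": re.compile(r"Claim type code:\s*(\S+)"),
    "date": re.compile(r"Date of claim:\s*(\S+)"),
    "requested": re.compile(r"Dollar amount requested:\s*(\$?[\d,]+\.\d{2})"),
}


def extract_row(text: str) -> Optional[dict[str, str]]:
    """All-or-nothing: every field must be found, otherwise return None."""
    row = {}
    for field, pattern in FIELDS.items():
        match = pattern.search(text)
        if match is None:
            return None  # one missing value -> skip this PDF
        row[field] = match.group(1)
    return row


def run(texts_by_file: dict[Path, str], root: Path) -> tuple[list[dict[str, str]], int]:
    """Sort each PDF into Processed/Unprocessed; return the rows and the unprocessed count."""
    rows = []
    for pdf_path, text in texts_by_file.items():
        row = extract_row(text)
        dest = root / ("Processed" if row is not None else "Unprocessed")
        dest.mkdir(exist_ok=True)
        shutil.move(str(pdf_path), str(dest / pdf_path.name))
        if row is not None:
            rows.append(row)  # the flow writes this row to Excel at this point
    unprocessed_dir = root / "Unprocessed"
    unprocessed = len(list(unprocessed_dir.glob("*.pdf"))) if unprocessed_dir.exists() else 0
    # if unprocessed >= 1: send the notification email (not shown)
    return rows, unprocessed


def demo() -> tuple[list[dict[str, str]], int, bool]:
    """Usage example in a throwaway folder with two placeholder 'PDF' files."""
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        good, bad = root / "a.pdf", root / "b.pdf"
        good.write_text("placeholder")
        bad.write_text("placeholder")
        texts = {
            good: "Member ID: Mike\nClaim type code: abc123\n"
                  "Date of claim: 9/4/22\nDollar amount requested: $100.98\n",
            bad: "Member ID: Mike\n",  # missing fields -> Unprocessed
        }
        rows, unprocessed = run(texts, root)
        return rows, unprocessed, (root / "Processed" / "a.pdf").exists()
```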
Nice write up though, well done 👏
We have claims (invoices) in dispute with Walmart, at the million-dollar level, and they will not stop. Each claim PDF contains only the total invoice amount plus a long text list of BOL shipment IDs, which might run to 180 rows across 5-6 pages. After extracting the BOL numbers, the next step calls the Walmart website over HTTP to download item details, such as each item's refund amount, item ID, date, etc.
I'm almost there: the flow triggers on email, copies the PDF into SharePoint, and converts the PDF into an extracted, structured JSON object file (not an array).
Parse JSON succeeds. Now I need to run a For each loop to get the BOL numbers.
I'm not a Logic Apps expert and am still learning the f(x) expression functions, and this project is very urgent, so I've come here to ask for help.
I need "text" and "path", and the JSON structure is object > elements (array) > attributes. item() is not an array, so how do I use the elements[] array from the previous step's output as the input of the For each loop? Or am I heading in the wrong direction?
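In the cloud flow, the usual fix is to give Apply to each the array rather than the whole object, for example an expression like `body('Parse_JSON')?['elements']` (assuming the Parse JSON action kept its default name), and then read `items('Apply_to_each')?['Text']` inside the loop, using your loop's actual name. The Python sketch below shows the same walk over object > elements[] so the logic can be tested offline. The key names `Text` and `Path` and the BOL number format in `BOL_PATTERN` are assumptions; adjust them to what your JSON actually contains.

```python
import json
import re

# Hypothetical BOL format: "BOL" followed by at least six digits.
BOL_PATTERN = re.compile(r"\bBOL[#:\s]*(\d{6,})")


def bol_numbers(extract_json: str) -> list[tuple[str, str]]:
    """Loop over the top-level 'elements' array and collect (BOL number, path) pairs."""
    doc = json.loads(extract_json)             # the object, not an array
    results = []
    for element in doc.get("elements", []):    # the array the loop should iterate
        text = element.get("Text", "")
        path = element.get("Path", "")
        for match in BOL_PATTERN.finditer(text):
            results.append((match.group(1), path))
    return results
```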
Thanks in advance if anyone can help.