Frequent Visitor

Extract Text From Structured PDF

Hi RPA Community,

I have a PDF file whose text I want to extract. The PDF is structured, and the output text file should follow the same structure. Can someone advise which approach I should use to get the correct output?


I will share the sample PDF and the desired format of the extracted text.


Appreciate your time and assistance,

Thanks and Regards,



Responsive Resident

Hi @yoko2020 ,

Thank you for your suggestion. Does this mean I need to rely on AI Builder or another third-party software service to extract the text in my situation?

I was hoping there is a way to get my desired output using the actions available in PAD.


Thanks and Regards,


I never use the Parse text (regex) action or the Extract text from PDF action in PAD when dealing with invoice, sales order, or custom form document (PDF/image) extraction; I always use third-party software specialized for this purpose.


Things to consider when dealing with this kind of extraction:

1. Does the document always come as a text PDF?

2. What happens if the document comes as an image (scanned) PDF?

3. Are we dealing with 1,000 or more documents per month, or just 10?

4. What if one document contains multiple invoices that need to be separated?

    See this video for what I mean by document separation/invoice splitting.

5. Sometimes an invoice spans multiple pages, so the number of pages to process varies from document to document.



Most of these tools can handle invoice splitting, except Power Automate AI Builder.

If you only process a small number of documents, you can try the built-in PAD actions, but keep those five points in mind, or your project will get stuck later.






Super User

Hi @yoko2020 

If the PDF always has the same header, you can use regex directly.

First, use the Extract text from PDF action.

Then use the Parse text action with a regex for each piece of data you need.
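To illustrate the Extract text + Parse text approach outside PAD, here is a minimal Python sketch. It assumes the output of Extract text from PDF is already in a string; the sample text and the labels `Invoice No`, `Invoice Date` and `Total` are hypothetical. PAD's Parse text action uses .NET-style regular expressions, so simple patterns like these should carry over with little or no change.

```python
import re

# Hypothetical text standing in for the output of PAD's "Extract text from PDF".
pdf_text = """ACME SUPPLIES LTD
Invoice No: INV-00123
Invoice Date: 29/03/2022
Total: 1,250.00
"""

# One pattern per required field (hypothetical labels; adapt to the real header).
FIELD_PATTERNS = {
    "invoice_no": r"Invoice No:\s*(\S+)",
    "invoice_date": r"Invoice Date:\s*(\d{2}/\d{2}/\d{4})",
    "total": r"Total:\s*([\d,]+\.\d{2})",
}


def parse_fields(text):
    """Return a dict of field name -> first captured value, or None if not found."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        result[name] = match.group(1) if match else None
    return result


print(parse_fields(pdf_text))
```

Anchoring each pattern on its label, rather than on a position in the text, is what keeps this working when a field shifts up or down a line between documents.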



Ahammad Riyaz


Post Prodigy

"The PDF will be in a structured form and the output text file should follow the structure accordingly"

This makes no sense; surely you're just pulling particular values rather than the whole PDF text?


If it's particular values, yes, it can be done entirely in PAD...


[Screenshot attached: Screenshot 2022-03-29 140513.png]

Hi @yoko2020 ,

I previously used regex and string manipulation to extract data from this PDF format, but that was in Automation Anywhere (AA), which extracts the text in a structured layout, so it was easy to process the data line by line with string conditions. Now that I have had to migrate to PAD, I find that its text extraction is not the same as AA's, and the result is different from what I expected. I'll share the output I got from the PAD Extract text from PDF action.

Let me answer your questions:

1- Yes, this file will always be a readable (text) PDF, as it is generated by a system.

2- If an image PDF comes in, the extraction will return nothing, so error handling should cover that case.

3- Yes, we are dealing with 1,000+ documents per month.

4- No, invoices won't be combined, as the system generates one invoice per order.

5- If there are multiple pages, I should still be able to extract all the necessary information, as long as the text output is structured.
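For point 2, one possible error-handling check is sketched below in Python: treat a document whose extracted text is empty or nearly empty as an image PDF and route it to an exception path instead of the parser. The threshold `min_chars` and both function names are hypothetical; in PAD the equivalent could be an If condition on the text variable right after Extract text from PDF.

```python
def looks_like_image_pdf(extracted_text, min_chars=20):
    """Heuristic: a scanned (image-only) PDF yields little or no text.

    min_chars is an arbitrary threshold; tune it for your documents.
    """
    meaningful = "".join(extracted_text.split())  # drop all whitespace
    return len(meaningful) < min_chars


def route_document(name, extracted_text):
    """Send suspected image PDFs to an exception queue instead of the parser."""
    if looks_like_image_pdf(extracted_text):
        return (name, "exception: no text layer, needs OCR or manual review")
    return (name, "parse")


print(route_document("inv_001.pdf", ""))
print(route_document("inv_002.pdf", "Invoice No: INV-00123 Total: 1,250.00"))
```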


Thanks and Regards,


Hi @Ahammad_Riyaz ,

Yes, the PDF has a constant header, which repeats on every page of a multi-page invoice. I tried this approach, but the output of Extract text from PDF makes regex and string operations hard to apply. This is probably due to the PDF's layout: the extracted text is cluttered, and labels and values don't come out in pairs. I'll share the output I got from the PAD Extract text from PDF action.
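Because the header repeats on every page, one way to make a multi-page invoice parse like a single page is to drop the repeated header lines before applying the regex. A minimal Python sketch, assuming you know the exact header lines (the `header_lines` values here are hypothetical) and that each one comes out of the extractor on its own line:

```python
def drop_repeated_header(lines, header_lines):
    """Keep the first copy of each header line and drop later repeats.

    header_lines is a hypothetical set of lines printed at the top of every page.
    """
    seen = set()
    cleaned = []
    for line in lines:
        key = line.strip()
        if key in header_lines:
            if key in seen:
                continue  # a repeat from page 2 onwards: skip it
            seen.add(key)
        cleaned.append(line)
    return cleaned


pages_text = (
    "ACME SUPPLIES LTD\nInvoice No: INV-00123\nItem A 2 10.00\n"
    "ACME SUPPLIES LTD\nInvoice No: INV-00123\nItem B 1 5.00"
)
header = {"ACME SUPPLIES LTD", "Invoice No: INV-00123"}
print("\n".join(drop_repeated_header(pages_text.splitlines(), header)))
```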

Is there any way to extract the text without using third-party applications or AI Builder?


Thanks and Regards,


Hi @UK_Mike ,

Yes, I want to pull particular values, but the output of Extract text from PDF is not organized, and applying regex or string manipulation is difficult because the values don't come out paired with their labels for this PDF. If the extracted text followed the same layout as the PDF, I could extract both the invoice details and the item details. I'll share the text output I get from PAD; as you can see, the individual values are hard to tell apart.
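When the extractor puts a label and its value on separate lines, they can sometimes be re-paired. The Python sketch below shows two hypothetical helpers: `value_after_label` assumes each value is the first non-empty line after its label, and `pair_by_position` assumes the labels come out as one block and the values as another block of the same length. Which one fits, if either, depends on what PAD actually produces for this PDF.

```python
def value_after_label(lines, label):
    """Return the first non-empty line after the line equal to `label`, else None."""
    stripped = [line.strip() for line in lines]
    for i, line in enumerate(stripped):
        if line == label:
            for candidate in stripped[i + 1:]:
                if candidate:
                    return candidate
            return None  # label found but nothing follows it
    return None  # label not found


def pair_by_position(labels, values):
    """Zip a block of labels with a block of values of the same length."""
    if len(labels) != len(values):
        raise ValueError("label and value blocks differ in length")
    return dict(zip(labels, values))


lines = ["Invoice No", "", "INV-00123", "Invoice Date", "29/03/2022"]
print(value_after_label(lines, "Invoice Date"))
print(pair_by_position(["Invoice No", "Invoice Date"], ["INV-00123", "29/03/2022"]))
```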

There are plenty of fields I want to extract and evaluate, and if the extracted text comes out in this format, extracting the details will be quite troublesome. I was hoping to find a solution that doesn't rely on third-party applications or AI Builder, as those incur additional cost and there are over 1,000 PDF invoices to process per month.


Thanks and Regards,




Yes, I know that technique very well.

But I never use that method: it wastes time and creates double work later, when you are dealing with more than 2,000 documents and 100 vendors (document layouts).



What PAD version are you using? It looks like the Extract text from PDF action has a bug: it does not keep indentation.

In that case, you have to go with AI Builder; you can train it and use it for different vendors.



Yes, I finished that a long time ago, and not with AI Builder.

I extract the PDF text to a variable; what are you extracting to?

Line numbers cannot be relied on: one PDF could have the invoice date on line 20 and the next on line 21, which is too unreliable.

Once I have the PDF text in a variable, I parse that variable with regex.

One regex per required value: date, amount, customer, etc.

The variables resulting from the parse are then written to Excel.
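The workflow described above (text in a variable, one regex per value anchored on a label rather than a line number, results written to a spreadsheet) could look like this in Python. This sketch writes CSV with the standard `csv` module instead of Excel; the field names, patterns and sample texts are hypothetical.

```python
import csv
import io
import re

# Hypothetical patterns anchored on labels, not line numbers, so a field that
# moves from line 20 to line 21 is still found.
PATTERNS = {
    "Date": r"Invoice Date:\s*(\S+)",
    "Customer": r"Customer:\s*(.+)",
    "Amount": r"Total:\s*([\d,]+\.\d{2})",
}


def extract_row(text):
    """Apply one regex per field; missing fields become empty strings."""
    row = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        row[field] = match.group(1).strip() if match else ""
    return row


def write_rows(texts, out):
    """Write one CSV row per document text to the file-like object `out`."""
    writer = csv.DictWriter(out, fieldnames=list(PATTERNS))
    writer.writeheader()
    for text in texts:
        writer.writerow(extract_row(text))


buffer = io.StringIO()
write_rows([
    "Customer: Contoso\nInvoice Date: 29/03/2022\nTotal: 1,250.00",
    "Ref: 77\nCustomer: Fabrikam\nInvoice Date: 30/03/2022\nTotal: 80.00",
], buffer)
print(buffer.getvalue())
```

Note that the second sample has an extra line at the top, which shifts every field down by one line; the label-anchored patterns still find them.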

I'm sure others have their own way of doing it, but I'm trying to avoid AI Builder and any third parties.

I see the latest PAD update caters for PDF tables. I haven't tried it yet; I'm scared of updating 😂

I look at the AI Builder forum now and again, and it is not the Holy Grail of PDF data extraction.

I see @Ahammad_Riyaz referred to this method; it works for me.


@yoko2020, care to elaborate on what software you use? More than one? Costs?








Responsive Resident



Chronoscan Advanced version + Nuance Plug-Ins

ABBYY® FlexiCapture®

Artsyl’s docAlpha










Sorry @yoko2020, as soon as I posted I saw your earlier post mentioning the software you use.

Thanks x 2 😂

Responsive Resident



You can also test these two services, if you want a headache every minute. 😂


Azure Form Recognizer
Amazon Textract

Ermmmmmmmmmmmm...................... no thanks 😂

Hi @yoko2020 ,

I'm using the free (unlicensed) version of PAD. If this is a bug, I can seek support for it. Can you share the Extract text from PDF result you get from the file I shared?


Thanks and Regards,



Hi @UK_Mike ,

In my situation, I'm still in the research phase of this project, trying out the PDF extraction action. In Automation Anywhere (AA), which I used previously, the PDF-to-text extraction doesn't store the extracted data in a variable; it writes it to a text file instead. I can then read the details line by line and create conditions based on a counter variable.

From what I can see in PAD, the Extract text from PDF action stores everything in a single variable. I can convert that variable to a list and run my regex and string operations to find the necessary details in each line. Correct me if I'm wrong.
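Splitting the text variable into lines and testing each line against a pattern might look like this in Python. The item-line layout (code, description, quantity, unit price, amount) is a hypothetical example, and the sketch only works if each item row survives extraction on a single line, which the indentation problem described in this thread may break.

```python
import re

# Hypothetical item-line layout: code, description, quantity, unit price, amount.
ITEM_LINE = re.compile(
    r"^(?P<code>[A-Z]{2}\d{4})\s+(?P<desc>.+?)\s+(?P<qty>\d+)\s+"
    r"(?P<price>[\d,]+\.\d{2})\s+(?P<amount>[\d,]+\.\d{2})$"
)


def parse_item_lines(text):
    """Split extracted text into lines and keep only lines that look like items."""
    items = []
    for line in text.splitlines():
        match = ITEM_LINE.match(line.strip())
        if match:
            items.append(match.groupdict())
    return items


sample = (
    "Code Description Qty Price Amount\n"
    "AB1234 Widget large 2 10.00 20.00\n"
    "CD5678 Bolt 10 1.00 10.00\n"
    "Total: 30.00"
)
for item in parse_item_lines(sample):
    print(item)
```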

The issue is that when I use Extract text from PDF in PAD, the PDF's indentation is removed and the indented text is placed on a new line. This makes the data inconsistent even though all the PDFs share the same format.

Does this happen only to me? Can you share your Extract text from PDF result from PAD so I can confirm?


Thanks and Regards,

