Duck_Taper
New Member

Web Extraction action gives text and the background HTML

I have a Web Extraction that pulls data from a job posting site. It works for most of the text I need, but on 10%-15% of the rows it returns part of the content as plain text and the rest as raw source HTML, which makes the result useless. I need it to extract only the text, not half text and half HTML.

 

Here is an example of what the problem looks like:

Web Extract Image 3.png

 

This is the web extraction selector:

Web Extract Image 4.png

 

My CSS selectors are:

html > body > div:eq(0) > div > div > div:eq(1) > div > div:eq(2) > div

div:eq(0) > div > label

div:eq(0) > div > div

 

I think it must be something with the selector, because the problem shows up in the selector tool:

Web Extract Image 2.png

 

I need help extracting only the text, not half the text followed by HTML code.

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions

Hi @Duck_Taper 

 

As a workaround, you could replace the tags with nothing (an empty string):

tkuehara_0-1629144063540.png

This is the %NewVar% value, with tags:

tkuehara_1-1629144274212.png

And this is the %Replaced% value, without tags:

tkuehara_2-1629144360547.png
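For anyone reading without the screenshots, the same idea can be sketched outside Power Automate. This is only an illustration in Python, not the exact action or pattern from the images above: the regular expression `<[^>]*>`, the function name `strip_tags`, and the sample string are all assumptions made for the example.

```python
import html
import re

# Assumed pattern: anything from "<" up to the next ">" is treated as a tag.
TAG_PATTERN = re.compile(r"<[^>]*>")


def strip_tags(raw: str) -> str:
    """Remove HTML tags from a string and tidy what is left."""
    # Replace every tag with nothing, as in the workaround.
    without_tags = TAG_PATTERN.sub("", raw)
    # Turn entities such as &amp; back into plain characters.
    unescaped = html.unescape(without_tags)
    # Collapse runs of whitespace left behind by the removed markup.
    return re.sub(r"\s+", " ", unescaped).strip()


# Hypothetical value mixing plain text and raw HTML, like the problem rows.
new_var = 'Senior Analyst <div class="x"><b>Remote</b> &amp; hybrid</div>'
replaced = strip_tags(new_var)
print(replaced)
```

A simple pattern like this is fine for cleaning up leftover markup in short text fields, but it is not a full HTML parser. Because tags are replaced with nothing, text from adjacent elements with no whitespace between them will run together.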

 


4 REPLIES
NikosMoutzou
Microsoft

Hello @Duck_Taper !

 

This is strange behavior; it is probably related to the HTML code of the page.

 

Could you please try to extract the specific area with a separate action, either 'Extract data from web page' or 'Get details of element on web page'?

I've been using 'Extract data from web page' and have tried capturing it in different ways: extracting it as a table or list, extracting a specific section, and broadening out to extract a larger chunk of the web content. However, I still get the same HTML problem, no matter how I slice it.

 

I tried 'Get details of element on web page' and got the same results: text with the HTML showing in the middle.

Web Extract Image 5.PNG

 

Agreed that it's strange behavior. Any help is appreciated.

 

Thanks!

Awesome! I had to rework a few things, but this workaround got me where I need to be. Thanks.
