Duck_Taper
New Member

Web Extraction action gives text and the background HTML

I have a Web Extraction that pulls data from a job posting site. It works for most of the text I need, but on 10%-15% of the rows it extracts part of the value as plain text and the rest as raw HTML source, which makes the result useless. I need it to extract only the text, not half text and half HTML.

 

Here is an example of what the problem looks like:

Web Extract Image 3.png

 

This is the web extraction selector:

Web Extract Image 4.png

 

My CSS selectors are:

html > body > div:eq(0) > div > div > div:eq(1) > div > div:eq(2) > div

div:eq(0) > div > label

div:eq(0) > div > div

 

I think it must be something with the selector, because the problem shows up in the selector tool:

Web Extract Image 2.png

 

I need help extracting only the text, not half the text, then some HTML code. 

 

thanks! 

1 ACCEPTED SOLUTION

tkuehara
Solution Specialist

Hi @Duck_Taper 

 

As a workaround, you could strip the HTML tags by replacing them with nothing (an empty string):

tkuehara_0-1629144063540.png

This is the %NewVar% value, with tags:

tkuehara_1-1629144274212.png

And this is the %Replaced% value, without tags:

tkuehara_2-1629144360547.png
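The screenshots are not reproduced here, so the exact action and pattern used in the flow are unknown. As a rough illustration of the same idea outside Power Automate, here is a minimal Python sketch that replaces anything that looks like a tag. The names `TAG_PATTERN` and `strip_tags` and the sample string are made up for this example. One deliberate difference from the workaround as described: tags are replaced with a space rather than nothing, so that words on either side of a tag do not fuse, and the extra whitespace is collapsed afterwards.

```python
import html
import re

# Assumed pattern: "<", then one or more characters that are not ">", then ">".
# This is a heuristic, not a real HTML parser; it can misfire on a literal "<"
# in the text or on a ">" inside an attribute value.
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_tags(raw: str) -> str:
    """Remove tag-like substrings, decode entities, and collapse whitespace."""
    without_tags = TAG_PATTERN.sub(" ", raw)   # a space keeps neighbouring words apart
    decoded = html.unescape(without_tags)      # e.g. "&amp;" becomes "&"
    return " ".join(decoded.split())           # collapse runs of whitespace


# Made-up sample standing in for a half-text, half-HTML extracted value.
new_var = 'Senior Analyst<br><span class="desc">Full-time &amp; remote</span>'
replaced = strip_tags(new_var)
print(replaced)  # Senior Analyst Full-time & remote
```

A regular expression is enough for cleaning up short extracted strings like these; for arbitrary pages, a proper HTML parser would be the more robust choice.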

 


4 REPLIES
NikosMoutzou
Microsoft

Hello @Duck_Taper !

 

This is strange behavior; it is probably related to the page's HTML code.

 

Could you please try extracting that specific area with a separate action, either 'Extract data from web page' or 'Get details of element on web page'?

I've been using 'Extract data from web page' and have tried capturing it in different ways: extracting it as a table or list, extracting a specific section, and broadening out to extract a larger chunk of the web content. However, I still get the same HTML problem, no matter how I slice it.

 

I tried 'Get details of element on web page' and got the same result: text with HTML showing in the middle.

Web Extract Image 5.PNG

 

Agreed that it's strange behavior. Any help is appreciated.

 

Thanks!

Awesome! I had to rework a few things, but this workaround got me where I need to be. Thanks.
