lliu_western
Resolver II

Scrape a specific text from HTML

Hello community,

 

I have a flow that gets the html of a website.

lliu_western_0-1606974367313.png

 

Part of the HTML output that I got is as follows:

 

lliu_western_1-1606974437827.png

 

The "sessionDataKey" is the unique text within the HTML that stays consistent and is closest to the key that I need to scrape.

 

So I have a substring formula as follows:

 

substring
(
substring(outputs('html'), indexOf(outputs('html'), 'sessionDataKey')),
9,
indexOf(substring(outputs('html'), indexOf(outputs('html'), 'sessionDataKey')), '"')
)

 

I counted from the word "sessionDataKey" to the first letter of the key that I need to scrape: there are nine characters between them, and I want it to stop when it encounters the quotation mark.

 

But the flow fails. Can anyone see any obvious mistakes in my substring code?
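
One detail that may be worth checking in the formula above: the substring documentation describes the third argument as a length (a character count), not an end position. The Python sketch below only models that behaviour. The sample HTML is hypothetical, because the real page appears only as a screenshot, and the helper name `substring` is an illustration, not the Power Automate implementation.

```python
def substring(text: str, start: int, length: int) -> str:
    """Illustrative model of substring(text, startIndex, length)."""
    return text[start:start + length]

# Hypothetical sample; the real page is only shown as a screenshot.
html = '<input name="sessionDataKey" value="abc-123">'
tail = html[html.find("sessionDataKey"):]

# Passing the quote's index as the third argument treats it as a count,
# and a start of 9 still lands inside the word "sessionDataKey".
quote_index = tail.find('"')
wrong = substring(tail, 9, quote_index)

# Skip the marker plus the nine characters '" value="', then measure the
# length up to the next quotation mark.
start = len("sessionDataKey") + 9
length = tail.find('"', start) - start
right = substring(tail, start, length)

print(repr(wrong))
print(repr(right))
```

Under these assumptions, the start offset has to include the 14 characters of the marker itself, and the stop position has to be converted into a length by subtracting the start.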

 

Thanks in advance 

1 ACCEPTED SOLUTION

Accepted Solutions
Paulie78
Super User

It is because the expression I gave you was searching for a double quote as per original post. This string has single quotes so try this instead:

 

split(substring(outputs('htmlContent'), indexof(outputs('htmlContent'), 'name=''sessionDataKey''')),'''')[3]

If you get stuck, take the [3] off the end of the split expression and see what array it outputs. The [3] just selects the element at index 3 of the array (the fourth element, since indexes start at 0).
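
To see why index 3 is the one that holds the key, here is a rough Python model of the same split-then-index idea. The sample HTML is an assumption (the thread only shows screenshots), and Python's `find` and `split` stand in for `indexOf` and `split`. Note that in the expression itself a doubled single quote inside a string literal is an escaped single quote, so `''''` is a one-character delimiter.

```python
# Hypothetical sample HTML with single-quoted attributes.
html = "<form><input type='hidden' name='sessionDataKey' value='abc-123'/></form>"

# Cut the document down so it starts at the marker attribute.
marker = "name='sessionDataKey'"
tail = html[html.find(marker):]

# Splitting on the single quote alternates between text outside and
# text inside the quotes.
parts = tail.split("'")

for index, part in enumerate(parts):
    print(index, repr(part))

# Index 3 is the content of the second quoted string after the marker,
# which is the value attribute in this sample.
session_data_key = parts[3]
print(session_data_key)
```

If the page places other attributes between `name` and `value`, the index would shift, which is why inspecting the whole array first is a useful check.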

 

 


12 REPLIES
Paulie78
Super User

Try:

split(substring(outputs('html'), indexof(outputs('html'), 'name="sessionDataKey"')),'"')[3]

@Paulie78 

lliu_western_0-1607016853966.png

It is still not liking it. I can't figure out where it went wrong either.

Paulie78
Super User

Sounds like it is looking for a step called "html" and you don't have one.

@Paulie78, I did have the step "html" before that. I renamed it again and it worked: I was able to run the flow, but it failed with the error below:

 

Unable to process template language expressions in action 'Get_DataKey_Token' inputs at line '1' and column '7271': 'The template language function 'substring' parameter is out of range: 'start index' must be non-negative integer and should be less than the length of the string. Please see https://aka.ms/logicexpressions#substring for usage details.'.

Paulie78
Super User

That means it did not find 

name="sessionDataKey"

in the string. Try adding a compose before which just has:

indexof(outputs('html'), 'name="sessionDataKey"')

and see what it comes out with. Sounds like it is producing -1.
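
The same diagnostic can be sketched in Python, where `str.find` also returns -1 when the search text is missing. This is only a model of the check, with a hypothetical single-quoted sample page and a hypothetical helper name `tail_from`. It shows how a marker written with double quotes fails to match and why the result should be checked before it is used as a start index.

```python
from typing import Optional

# Hypothetical sample: the attributes use single quotes.
html = "<input type='hidden' name='sessionDataKey' value='abc-123'/>"

def tail_from(text: str, search: str) -> Optional[str]:
    """Return text starting at `search`, or None when it is not present."""
    start = text.find(search)
    if start < 0:
        # A negative start index is what triggers the "out of range" error,
        # so stop here instead of slicing.
        return None
    return text[start:]

double_quoted = 'name="sessionDataKey"'
single_quoted = "name='sessionDataKey'"

print(html.find(double_quoted))        # marker with the wrong quote style
print(tail_from(html, double_quoted))
print(tail_from(html, single_quoted))
```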

@Paulie78 I got the substring working just fine now. The split returns more than I need, but I should be able to figure it out from here. Thanks!!

@Paulie78 

@yashag2255 

 

This whole image is the return value I got:

lliu_western_0-1607018037728.png

by using this formula:

lliu_western_1-1607018068674.png

But I only need the highlighted text. How can I get the substring to stop at the first single quote it encounters?

Paulie78
Super User

It is because the expression I gave you was searching for a double quote as per original post. This string has single quotes so try this instead:

 

split(substring(outputs('htmlContent'), indexof(outputs('htmlContent'), 'name=''sessionDataKey''')),'''')[3]

If you get stuck, take the [3] off the end of the split expression and see what array it outputs. The [3] just selects the element at index 3 of the array (the fourth element, since indexes start at 0).

 

 

@Paulie78 It is still saying invalid expression, even if I take out the [3].

@Paulie78 

lliu_western_0-1607021599250.png

This is what I have currently, and it gets me exactly what I need. But it is not dynamic enough. I don't want to use "32"; I want to use an expression that stops when it encounters the quotation mark.

Paulie78
Super User

When I tested it, the expression I provided worked perfectly. But it is really hard to test it properly without actual sample data. 

@Paulie78 I still couldn't get it to work using split, but I was able to get it to work by using substring again, with an indexOf formula to calculate where I want the substring to stop. Thank you for your help, you have been very helpful!
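
The final formula is only shown as a screenshot, so the following Python sketch is a guess at the general approach described here, not the poster's actual expression: find the marker, find where the value starts, find the next quote, and take the text in between instead of hard-coding a length such as 32. The function name `extract_value`, the sample HTML, and the assumption that `value='...'` follows the marker are all hypothetical.

```python
from typing import Optional

def extract_value(html: str, marker: str = "sessionDataKey") -> Optional[str]:
    """Return the single-quoted value that follows `marker`, or None."""
    marker_pos = html.find(marker)
    if marker_pos < 0:
        return None

    # Assumption: the value attribute comes after the marker.
    opener = "value='"
    opener_pos = html.find(opener, marker_pos)
    if opener_pos < 0:
        return None
    value_start = opener_pos + len(opener)

    # Stop at the first single quote after the start of the value.
    value_end = html.find("'", value_start)
    if value_end < 0:
        return None

    # In substring terms the third argument would be value_end - value_start.
    return html[value_start:value_end]

sample = "<input type='hidden' name='sessionDataKey' value='abc-123'/>"
print(extract_value(sample))
print(extract_value("<p>no key here</p>"))
```

Computing the length as the difference between the two positions keeps the extraction working when the key changes size.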
