Hi All,
I need some help extracting the URL from the email body below. I have a flow which converts the received email body from HTML to text, and I tried to use split to extract the URL, but it's not working. Can anyone advise on the best way to achieve this?
Hey @aanyoti1
You can do this with expressions, but it is a little convoluted. Review this data flow:
The URL has been extracted, the actions are:
1) using substring to remove all content before the last instance of 'https' (the expression uses lastIndexOf)
2) converting the remaining string content to plain text (This removes any new line chars (\n))
3) using substring to remove trailing content by locating the first whitespace
Here is the config:
And the expressions:
substring(variables('Text'),lastIndexOf(variables('Text'),'https'))
substring(outputs('Html_to_text')?['body'],0,indexOf(outputs('Html_to_text')?['body'],' '))
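For readers who find the expression syntax hard to follow, the logic of the two expressions above can be sketched in Python. This is only an illustration, not Power Automate code: the function names and the sample text are made up for the example, `rfind` stands in for lastIndexOf and `find` for indexOf. Note that lastIndexOf keeps the content from the last 'https' onward, which matters if the body contains more than one link.

```python
def trim_start(text: str, marker: str = "https") -> str:
    """Keep everything from the last occurrence of marker onward."""
    start = text.rfind(marker)  # plays the role of lastIndexOf
    if start == -1:
        raise ValueError(f"{marker!r} not found in text")
    return text[start:]


def cut_at_first_space(text: str) -> str:
    """Keep everything before the first space character."""
    end = text.find(" ")  # plays the role of indexOf
    # If there is no space at all, return the text unchanged.
    return text if end == -1 else text[:end]


# Hypothetical sample with two links; only the last one is kept.
sample = "Docs: https://example.com/help Basket: https://example.com/basket Please approve."
url = cut_at_first_space(trim_start(sample))
print(url)
```

The second helper only stops at a literal space, so it relies on a space following the URL in the text it receives.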
You could also consider using the Encodian 'Search Text - Regex' action which would be a lot more robust:
Configuration:
Regex: (?:(?:https?|ftp):\/\/|\b(?:[a-z\d]+\.))(?:(?:[^\s()<>]+|\((?:[^\s()<>]+|(?:\([^\s()<>]+\)))?\))+(?:\((?:[^\s()<>]+|(?:\(?:[^\s()<>]+\)))?\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))?
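To illustrate the regex idea outside of a flow, here is a minimal Python sketch. It does not call the Encodian action and does not use the pattern above; it assumes a deliberately simplified pattern that only handles http and https links, and the function name is hypothetical.

```python
import re
from typing import Optional

# Deliberately simplified pattern (an assumption for this sketch):
# a scheme followed by any run of characters that are not whitespace,
# angle brackets, or quotes.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_first_url(text: str) -> Optional[str]:
    """Return the first http(s) URL found in text, or None if there is none."""
    match = URL_PATTERN.search(text)
    if match is None:
        return None
    # Drop punctuation that commonly trails a URL at the end of a sentence.
    return match.group(0).rstrip(".,;:!?")


body = (
    "Products in the Basket: https://mydomain.12345.com/a3G5I000000bxkP\n"
    "Please click link above to approve or reject this record."
)
print(extract_first_url(body))
```

Because the pattern stops at any whitespace, it does not matter whether the URL is followed by a space or by a line break.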
HTH
Hi @Jay-Encodian,
It looks like we are nearly there, but the last Compose action still seems to include the word 'Please'. I'm not sure why; maybe because it begins on a new line? What expression should be included in the substring to have this removed?
Hi @aanyoti1
I don't think you have copied the expressions correctly.
The 'HTML to Text' action is used to remove the line breaks. The last expression takes a substring of the 'HTML to Text' output, starting at the first character and ending at the first blank space.
You need to recheck what you have entered... can you also please post screenshots of the outputs from all the actions, as per my previous screenshot, and include the expressions you have entered?
Thanks
I checked again and this is what I entered:
Compose - Trim Start Action:
substring(variables('Text'),lastIndexOf(variables('Text'),'https'))
Compose Action:
substring(outputs('Html_to_text')?['body'],0,indexOf(outputs('Html_to_text')?['body'],' '))
See below flow:
@aanyoti1 Can you post a copy of the data you are trying to process? I don't think there is a space between the URL and 'Please'.
Hi @Jay-Encodian,
See below:
Hello,
Your Approval has been requested for
Products in the Basket: https://mydomain.12345.com/a3G5I000000bxkP
Please click link above to approve or reject this record.
Thank you!
Hi @aanyoti1 ... hmmm, I ran the same data through the expressions I have already provided to you:
substring(variables('Text'),lastIndexOf(variables('Text'),'https'))
substring(outputs('Html_to_text')?['body'],0,indexOf(outputs('Html_to_text')?['body'],' '))
Can you please click on the 'Raw outputs' from the Html to Text action... I think there is some extra data in the payload.
Hi @Jay-Encodian,
See below raw output of the HTML to Text:
{
  "statusCode": 200,
  "headers": {
    "Pragma": "no-cache",
    "Transfer-Encoding": "chunked",
    "Vary": "Accept-Encoding",
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "Timing-Allow-Origin": "*",
    "x-ms-apihub-cached-response": "false",
    "Cache-Control": "no-store, no-cache",
    "Date": "Fri, 18 Sep 2020 14:51:08 GMT",
    "Set-Cookie": "ARRAffinity=7007353e6908e52d8a882a0d248752a54a5a3b25dfdde97fabc8ecca38b8d51c;Path=/;HttpOnly;Domain=conversionservice-ne.azconn-ne.p.azurewebsites.net",
    "Content-Type": "text/html; charset=utf-8",
    "Expires": "-1",
    "Content-Length": "127"
  },
  "body": "https://12345.abcde.com/a3G5I000000bxkP\n\nPlease click link above to approve or reject this record. \n\nThank you!"
}
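The raw body above shows why 'Please' ends up in the result: the URL is followed by two line breaks rather than a space, so the first space in the text is the one after 'Please'. A small Python sketch of the same "cut at the first space" logic makes this visible (the variable names are just for illustration; `find` stands in for indexOf):

```python
# The "body" value from the raw output above, with \n as real line breaks.
body = (
    "https://12345.abcde.com/a3G5I000000bxkP\n\n"
    "Please click link above to approve or reject this record. \n\nThank you!"
)

first_space = body.find(" ")  # first literal space, which comes after "Please"
result = body[:first_space]
print(repr(result))  # 'https://12345.abcde.com/a3G5I000000bxkP\n\nPlease'
```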
Hi @aanyoti1
The '\n' characters aren't being removed, and there is no space between the URL and the line breaks... I've adjusted the flow as follows:
Full configuration:
Expressions:
substring(variables('Text'),lastIndexOf(variables('Text'),'https'))
replace(outputs('Html_to_text')?['body'],'\n',' ')
substring(outputs('Compose_-_Replace_Chars'),0,indexOf(outputs('Compose_-_Replace_Chars'),' '))
You can obviously consolidate the expressions, but I've kept them separate for ease of reading... personally I wouldn't consolidate, as it just makes the flow harder to read and support in future.
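As a rough illustration of the adjusted steps, the replace-then-substring logic can be sketched in Python. This is not Power Automate code; the function name is hypothetical, and the input is the body value from the raw output posted earlier in the thread.

```python
def extract_url(body: str) -> str:
    """Replace line breaks with spaces, then keep the text before the first space."""
    flattened = body.replace("\n", " ")  # same idea as the replace(...) step
    end = flattened.find(" ")            # same idea as indexOf(..., ' ')
    # If no space is found, the whole text is assumed to be the URL.
    return flattened if end == -1 else flattened[:end]


body = (
    "https://12345.abcde.com/a3G5I000000bxkP\n\n"
    "Please click link above to approve or reject this record. \n\nThank you!"
)
print(extract_url(body))  # https://12345.abcde.com/a3G5I000000bxkP
```

Once the line breaks have become spaces, the first space sits directly after the URL, so the substring no longer picks up 'Please'.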
HTH
Jay
I have followed your steps one by one and I am still having the same issue as the OP.
My last Compose raw output is:
Hello! I need help.
When I tried to reproduce your flow, I sometimes got this kind of error. What could be the possible causes?