Solved: Re: Extract URL from Email body

aanyoti1 · 09-16-2020

Hi All,

I need some help extracting the URL from the email body below. My flow converts the received email body from HTML to text, and I tried using split to extract the URL, but it's not working. Can anyone advise the best way to achieve this?

Jay-Encodian · 09-22-2020

Updated Sept 23:

There is now a much simpler way to extract one or more URLs from the body of an email (or any other text value): the Encodian 'Utility - Extract URL's from Text' action.

Consider this simple flow, which extracts all of the URLs contained in the email.

You can find a further flow example and an instructional video in this post: Extract URLs from Text and Documents with Power Automate

Jay-Encodian · 09-16-2020

Hey @aanyoti1

You can do this with expressions, but it is a little convoluted. Review this data flow:

The URL has been extracted. The actions are:

1) use substring to remove all content before the 'https' (the expression below uses lastIndexOf, so if the text contains several URLs, the last one is kept)

2) convert the remaining string content to plain text with 'HTML to Text' (intended to remove any new line characters (\n))

3) use substring to remove the trailing content by locating the first space

Here is the config:

And the expressions:

substring(variables('Text'),lastIndexOf(variables('Text'),'https'))

substring(outputs('Html_to_text')?['body'],0,indexOf(outputs('Html_to_text')?['body'],' '))
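
To make the two expressions easier to follow, here is a minimal Python sketch that mirrors them. It assumes Power Automate's documented behaviour that indexOf/lastIndexOf return -1 when the search text is not found, and that the third argument of substring is a length rather than an end index. The helper names (pa_substring, extract_url_like_flow) and the sample text are hypothetical, and the HTML to Text step is left out.

```python
from typing import Optional


def pa_substring(text: str, start: int, length: Optional[int] = None) -> str:
    """Rough stand-in for substring(text, startIndex, length).

    The third argument is a length, not an end index. Negative values are
    rejected here (an assumption about how the flow behaves), because Python
    slicing would otherwise silently count from the end of the string.
    """
    if start < 0 or (length is not None and length < 0):
        raise ValueError("substring() needs a non-negative start and length")
    if length is None:
        return text[start:]
    return text[start:start + length]


def extract_url_like_flow(text: str) -> Optional[str]:
    """Hypothetical helper mirroring the two Compose expressions."""
    # substring(variables('Text'), lastIndexOf(variables('Text'), 'https'))
    start = text.rfind("https")  # like lastIndexOf, -1 when not found
    if start == -1:
        return None  # assumption: bail out instead of erroring
    trimmed = pa_substring(text, start)

    # substring(body, 0, indexOf(body, ' ')): only a literal space ends the URL
    first_space = trimmed.find(" ")
    if first_space == -1:
        return trimmed  # assumption: nothing follows the URL, keep it all
    return pa_substring(trimmed, 0, first_space)


sample = "Products in the Basket: https://example.com/a3G5I000000bxkP please approve"
print(extract_url_like_flow(sample))
```

Unlike the sketch, the flow has no guards: if 'https' or a space is missing, indexOf/lastIndexOf return -1 and passing that to substring would likely make the action fail.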

You could also consider using the Encodian 'Search Text - Regex' action, which would be a lot more robust:

Configuration:

Regex: (?:(?:https?|ftp):\/\/|\b(?:[a-z\d]+\.))(?:(?:[^\s()<>]+|\((?:[^\s()<>]+|(?:\([^\s()<>]+\)))?\))+(?:\((?:[^\s()<>]+|(?:\(?:[^\s()<>]+\)))?\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))?
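
To see what this pattern returns, it can be run with Python's re module. This is only an approximation: the regex engine behind the Encodian action may treat some constructs differently. The sample is the email text posted later in this thread. Note that the pattern's second alternative, \b(?:[a-z\d]+\.), also matches a lowercase word followed by a dot, so "record." comes back as a match too; the sketch filters on the scheme to keep only links.

```python
import re

# Pattern copied from the post above. Every group is non-capturing, so
# findall() returns the full text of each match.
URL_PATTERN = re.compile(
    r"""(?:(?:https?|ftp):\/\/|\b(?:[a-z\d]+\.))(?:(?:[^\s()<>]+|\((?:[^\s()<>]+|(?:\([^\s()<>]+\)))?\))+(?:\((?:[^\s()<>]+|(?:\(?:[^\s()<>]+\)))?\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))?"""
)

# Sample email text as posted later in this thread.
body = (
    "Hello,\n\n"
    "Your Approval has been requested for \n"
    "Products in the Basket: https://mydomain.12345.com/a3G5I000000bxkP\n\n"
    "Please click link above to approve or reject this record. \n\n"
    "Thank you!"
)

matches = URL_PATTERN.findall(body)
# The bare-word alternative also picks up "record.", so keep only matches
# that start with a scheme.
urls = [m for m in matches if m.startswith(("http://", "https://", "ftp://"))]
print(urls)
```

Because the pattern stops at whitespace, newlines after the URL are not a problem here, which is the main advantage over the substring approach.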

HTH

aanyoti1 · 09-17-2020

Thanks a lot @Jay-Encodian, I'm trying this out and will update.

aanyoti1 · 09-18-2020

Hi @Jay-Encodian,

It looks like we are nearly there, but the last Compose action still seems to include the word 'please'. I'm not sure why; maybe because it begins on a new line? What should I add to the substring expression to remove it?

Jay-Encodian · 09-18-2020

Hi @aanyoti1

I don't think you have copied the expressions correctly.

The 'HTML to Text' action is used to remove the line breaks. The last expression takes a substring of the 'HTML to Text' output, starting at the first character and ending at the first blank space.

Please recheck what you have entered. Can you also post screenshots of the outputs from all the actions, as in my previous screenshot, and include the expressions you have entered?

Thanks

aanyoti1 · 09-19-2020

Hi @Jay-Encodian

I checked again and this is what I entered:

Compose - Trim Start Action:

substring(variables('Text'),lastIndexOf(variables('Text'),'https'))

Compose Action:

substring(outputs('Html_to_text')?['body'],0,indexOf(outputs('Html_to_text')?['body'],' '))

See the flow below:

Jay-Encodian · 09-20-2020

@aanyoti1 Can you post a copy of the data you are trying to process? I don't think there is a space between the URL and 'please'.

aanyoti1 · 09-20-2020

Hi @Jay-Encodian,

See below:

Hello,

Your Approval has been requested for 
Products in the Basket: https://mydomain.12345.com/a3G5I000000bxkP

Please click link above to approve or reject this record. 

Thank you!

Jay-Encodian · 09-20-2020

Hi @aanyoti1 ... hmmm, here is the same data run through the expressions I have already provided to you:

substring(variables('Text'),lastIndexOf(variables('Text'),'https'))

substring(outputs('Html_to_text')?['body'],0,indexOf(outputs('Html_to_text')?['body'],' '))

Can you please open the 'Raw outputs' of the HTML to Text action? I think there is some extra data in the payload.

aanyoti1 · 09-22-2020

Hi @Jay-Encodian,

See the raw output of the HTML to Text action below:

{
    "statusCode": 200,
    "headers": {
        "Pragma": "no-cache",
        "Transfer-Encoding": "chunked",
        "Vary": "Accept-Encoding",
        "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
        "X-Content-Type-Options": "nosniff",
        "X-Frame-Options": "DENY",
        "Timing-Allow-Origin": "*",
        "x-ms-apihub-cached-response": "false",
        "Cache-Control": "no-store, no-cache",
        "Date": "Fri, 18 Sep 2020 14:51:08 GMT",
        "Set-Cookie": "ARRAffinity=7007353e6908e52d8a882a0d248752a54a5a3b25dfdde97fabc8ecca38b8d51c;Path=/;HttpOnly;Domain=conversionservice-ne.azconn-ne.p.azurewebsites.net",
        "Content-Type": "text/html; charset=utf-8",
        "Expires": "-1",
        "Content-Length": "127"
    },
    "body": "https://12345.abcde.com/a3G5I000000bxkP\n\nPlease click link above to approve or reject this record. \n\nThank you!"
}
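
The raw output confirms the earlier suspicion: the URL is followed by two newline characters (\n\n), not a space, so indexOf(..., ' ') only finds the space after "Please". A minimal Python mirror of the last expression, using the body string from this output (str.find stands in for indexOf, slicing for substring), reproduces the problem and shows one way around it outside a flow: splitting on any whitespace.

```python
# Body string copied from the raw "HTML to Text" output above.
body = (
    "https://12345.abcde.com/a3G5I000000bxkP\n\n"
    "Please click link above to approve or reject this record. \n\nThank you!"
)

# Mirror of substring(body, 0, indexOf(body, ' ')): the cut happens at the
# first literal space, which here comes after "Please", not after the URL.
first_space = body.find(" ")
flow_result = body[:first_space]
print(repr(flow_result))

# str.split() with no argument splits on any run of whitespace
# (spaces, \n, \r, tabs), so the first token is the URL on its own.
url = body.split()[0]
print(url)
```

In a flow, split() takes a single delimiter string, so the newline characters usually have to be replaced with spaces before searching for the first space.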

microsoftie · 07-27-2021

I have followed your steps one by one and I am still having the same issue as the OP.

My last Compose action's raw output is:

"https://apps-lb.totalcloudpacs.com/r/cf86697abeeeeeee50b8b\n\n\n\nThis"

crt8735 · 04-08-2022

@microsoftie try this solution:
https://powerusers.microsoft.com/t5/Building-Flows/Replace-Newline-in-Flow-Expression/td-p/57333
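
The linked thread is about replacing newline characters in a flow expression. The idea can be sketched in Python: turn line breaks into spaces first, then cut at the first space as before. The input string is hypothetical but shaped like the raw output posted above, and the helper names are illustrative. In a flow, a newline character is commonly produced with decodeUriComponent('%0A') inside replace(); that is a widely used trick, not something confirmed from the linked post.

```python
def cut_at_first_space(text: str) -> str:
    """Mirror of substring(text, 0, indexOf(text, ' ')).

    Assumption: when there is no space at all, return the text unchanged
    instead of failing as the flow expression would.
    """
    idx = text.find(" ")
    return text if idx == -1 else text[:idx]


def newlines_to_spaces(text: str) -> str:
    """Replace CRLF, LF and stray CR line breaks with single spaces."""
    return text.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")


# Hypothetical input shaped like the raw output posted above.
raw = "https://example.com/r/cf86697abeeeeeee50b8b\n\n\n\nThis is the next line."

before = cut_at_first_space(raw)
after = cut_at_first_space(newlines_to_spaces(raw))
print(repr(before))
print(after)
```

Replacing "\r\n" before "\n" matters: emails often use Windows line endings, and replacing only "\n" would leave a carriage return glued to the end of the URL.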

Adomin · 02-01-2023

Hello! I need help.
When I tried to reproduce your flow, I sometimes got this kind of error. What could be the possible causes?

Adomin · 02-01-2023

@Jay-Encodian 👆

Adomin · 02-01-2023

@Jay-Encodian

carlilelance · 11-01-2023

This should be the accepted answer because it is far more dynamic than the others. The regex might look ugly, but it doesn't rely on any specific formatting (other than the text being a link). With email, I guarantee no one is going to follow formatting rules.

AlexEncodian · 03-22-2024

@aanyoti1

Encodian has since released a utility action (one that consumes only 0.05 credits per operation) to extract URLs reliably, quickly and with no expressions.

https://support.encodian.com/hc/en-gb/articles/11056297407261-Utility-Extract-URL-s-from-Text

See this example solution:

https://www.encodian.com/blog/extract-urls-from-text-and-documents-with-power-automate