lazy
Frequent Visitor

Extract data from a webpage to Excel

Hello

I am trying to extract data from a webpage (see the sample HTML below) into an Excel sheet.

Some "Details" sections have multiple lines, so I cannot handpick the values.

What is the best way to achieve that?

I am quite confused by CSS selectors, tags, etc. (suggested reading welcome).

 

 

<!DOCTYPE html>
<html>
<head>
	<title>Title</title>
</head>
<body>
	<div class="logo">
		<img src="/images/Banner.png"
			alt="alt name" title="ALT NAME" />			
	</div>
</body>
</html>
<div id="mainContent">
	<div>	</div>
	<h1 id="infoHeader">Information about </h1>
	<div>
		<a >xxxxx</a>
	</div>
	<div class="clearBoth"></div>
	<div class="Info">
	<fieldset>
		<legend>Details 1</legend>
			<table class="table">
				<tbody>
					<tr>
						<th>Label 1</th> <td>Data 1</td>
						<th>Label 2</th> <td>Data 2</td>
					</tr>
					<tr>
						<th>Label 3</th> <td>Data 3</td>
						<th>Label 4</th> <td>Data 4</td>
					</tr>
					<tr>
						<th>Label 5</th> <td>Data 5</td>
						<th>Label 6</th> <td>Data 6</td>
					</tr>
					<tr>
						<th>Label 7</th> <td>Data 7</td>
						<th>Label 8</th> <td>Data 8</td>
					</tr>
					<tr>
						<th>Label 9</th> <td colspan="3">Data 9</td>
					</tr>
				</tbody>
			</table>
	</fieldset>

	<fieldSet>          
		<legend>Details 2</legend>
		<ul>
			<li><span>Label 10</span><span>Label 11</span><span>Label 12</span><span>Label 13</span></li>
		</ul>
		<ol>
			<li><span>&nbsp;Data 10</span><span>&nbsp;Data 11</span><span>&nbsp;Data 12</span><span>&nbsp;Data 13</span></li>
		</ol>
	</fieldSet>                                 

	<fieldSet>          
		<legend>Details 3</legend>
		<ul>
			<li><span>Label 14</span><span>Label 15</span><span>Label 16</span><span>Label 17</span><span>Label 18</span><span>Label 19</span></li>
		</ul>
		<ol>
			<li><span>&nbsp;Data14-1</span><span>&nbsp;Data 15-1</span><span>&nbsp;Data 16-1</span><span>&nbsp;Data 17-1</span><span>&nbsp;Data 18-1</span><span>&nbsp;Data 19-1</span></li>
			<li><span>&nbsp;Data14-2</span><span>&nbsp;Data 15-2</span><span>&nbsp;Data 16-2</span><span>&nbsp;Data 17-2</span><span>&nbsp;Data 18-2</span><span>&nbsp;Data 19-2</span></li>
		</ol>
	</fieldSet>    


	</div>

	</body>
	</html>

 

9 REPLIES
ViditGholam
Continued Contributor

Hi @lazy, check out this YouTube video; it may help you extract data from a web page.

https://www.youtube.com/watch?v=QllyIdxm4H0

 

Hope this helps!

If this resolves your issue, please mark this post as answered and give me a thumbs up.

Thanks and Regards,

 Vidit

GeoffRen
Microsoft

What do you want to extract?

lazy
Frequent Visitor

Basically everything that is Data 1, Data 2, Data 3, etc.

While the values above can be handpicked and placed into cells in an Excel sheet, I have a particular problem with "Details 3": it is a dynamic table that can contain any number of rows (Data14-1, Data14-2 ... Data14-n), and this is where I am particularly stuck.

Because this webpage seems to be legacy and does not use a proper HTML table for that section, the live helper does not recognise it as a table.

TIA

Thanks for the reply. I have already gone over that video, and while it helped with certain parts (e.g. handpicking selected values), I am a bit stuck on the second part; see my reply to @GeoffRen below.

What format can the Data 1, Data 2, etc. strings take? If they all share the same format, you can use a text parser (probably regex) to get all the dynamic data. Alternatively, if the structure of the HTML is always going to be the same, you can parse out what you want based on the layout of the tags.
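As a rough illustration of the regex idea, here is a minimal Python sketch run against a shortened copy of the "Details 3" fragment from the question. The function name `extract_rows` and the variable names are made up for this example, and the patterns assume the markup really is as regular as the sample (plain `<li>` and `<span>` tags with no attributes and no nesting). Regex is fragile on HTML in general, so treat this as a quick fix for a page with a fixed structure rather than a general solution.

```python
import re
from html import unescape

# Shortened copy of the "Details 3" fieldset from the question.
HTML = """
<fieldSet>
  <legend>Details 3</legend>
  <ul>
    <li><span>Label 14</span><span>Label 15</span></li>
  </ul>
  <ol>
    <li><span>&nbsp;Data14-1</span><span>&nbsp;Data 15-1</span></li>
    <li><span>&nbsp;Data14-2</span><span>&nbsp;Data 15-2</span></li>
  </ol>
</fieldSet>
"""

def extract_rows(html):
    """Return one list of cell texts per <li>; the label row comes first."""
    rows = []
    # Each <li> is one row, however many of them the page contains.
    for li in re.findall(r"<li>(.*?)</li>", html, flags=re.S | re.I):
        cells = re.findall(r"<span>(.*?)</span>", li, flags=re.S | re.I)
        # &nbsp; decodes to a non-breaking space; normalise and trim it.
        rows.append([unescape(c).replace("\xa0", " ").strip() for c in cells])
    return rows

rows = extract_rows(HTML)
header, data = rows[0], rows[1:]
print(header)
for row in data:
    print(row)
```

Because every `<li>` becomes one row, the number of data rows does not need to be known in advance.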

fraenK
Memorable Member

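Fieldset 1 contains a table.

Fieldsets 2 and 3 each contain two lists: an unordered list holding the labels and an ordered list holding the data rows.

 

The easiest approach would be to extract the label cells (th, ul > li > span) and the data cells (td, ol > li > span) as two separate lists and then merge them into the desired table, depending on the output you need.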
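The label/data split described above can be sketched outside Power Automate as well. The following Python example uses only the standard library `html.parser` module; the class name `DetailsParser`, the helper `to_records`, and the shortened sample markup are assumptions made for illustration, not part of the original page or of any Power Automate action. It treats `th` and `ul > li > span` as labels, `td` and `ol > li > span` as data, and starts a new data row for every `li` inside an `ol`.

```python
from html.parser import HTMLParser

# Shortened copy of the sample page from the question.
SAMPLE = """
<div class="Info">
<fieldset>
  <legend>Details 1</legend>
  <table class="table"><tbody>
    <tr><th>Label 1</th> <td>Data 1</td> <th>Label 2</th> <td>Data 2</td></tr>
    <tr><th>Label 9</th> <td colspan="3">Data 9</td></tr>
  </tbody></table>
</fieldset>
<fieldSet>
  <legend>Details 3</legend>
  <ul><li><span>Label 14</span><span>Label 15</span></li></ul>
  <ol>
    <li><span>&nbsp;Data14-1</span><span>&nbsp;Data 15-1</span></li>
    <li><span>&nbsp;Data14-2</span><span>&nbsp;Data 15-2</span></li>
  </ol>
</fieldSet>
</div>
"""

class DetailsParser(HTMLParser):
    """Collect labels and data rows separately for each <fieldset>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.sections = []   # one dict per fieldset
        self._list = None    # "ul" or "ol" while inside a list
        self._cell = None    # tag of the cell currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased, so <fieldSet> matches too.
        if tag == "fieldset":
            self.sections.append({"legend": "", "labels": [], "rows": []})
            return
        if not self.sections:
            return           # ignore markup outside any fieldset
        if tag in ("ul", "ol"):
            self._list = tag
        elif tag == "li" and self._list == "ol":
            self.sections[-1]["rows"].append([])   # new data row
        elif tag in ("legend", "th", "td") or (tag == "span" and self._list):
            self._cell, self._text = tag, []

    def handle_data(self, data):
        if self._cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("ul", "ol"):
            self._list = None
        elif tag == self._cell:
            text = "".join(self._text).replace("\xa0", " ").strip()
            section = self.sections[-1]
            if tag == "legend":
                section["legend"] = text
            elif tag == "th" or self._list == "ul":
                section["labels"].append(text)
            else:            # td, or a span inside an ol
                if not section["rows"]:
                    section["rows"].append([])
                section["rows"][-1].append(text)
            self._cell = None

def to_records(section):
    """Merge the label list with each data row."""
    return [dict(zip(section["labels"], row)) for row in section["rows"]]

parser = DetailsParser()
parser.feed(SAMPLE)
tables = {s["legend"]: to_records(s) for s in parser.sections}
for legend, records in tables.items():
    print(legend, records)
```

In this sketch the table in fieldset 1 collapses into a single record, while the list-based fieldsets yield one record per `li`, so the number of rows in "Details 3" does not have to be known in advance.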

 

lazy
Frequent Visitor

Thanks

I'll give this a try and report back.

lazy
Frequent Visitor

By format, do you mean data type?

The values are either text or numbers, and the columns are not dynamic; only the rows are.

AnkushKrSingh
Frequent Visitor

Have you tried the "Extract data from web page" action to fetch the details? It can return your data in several forms, such as a list, a data table, or a single variable.

You can then manipulate the result as needed and write the data to Excel. If you need more information, please let me know.
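For the final "write the data to Excel" step, one simple option outside Power Automate is a CSV file, which Excel opens directly. The sketch below assumes the labels and rows have already been extracted into plain Python lists; the values, the function name `rows_to_csv`, and the file name in the comment are placeholders for illustration. Writing a real .xlsx workbook would need a third-party library, which is not shown here.

```python
import csv
import io

# Placeholder data shaped like the "Details 3" section: fixed columns, any number of rows.
labels = ["Label 14", "Label 15", "Label 16"]
rows = [
    ["Data14-1", "Data 15-1", "Data 16-1"],
    ["Data14-2", "Data 15-2", "Data 16-2"],
]

def rows_to_csv(labels, rows):
    """Serialise a header row plus data rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(labels)
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = rows_to_csv(labels, rows)
print(csv_text)

# To hand the result to Excel, save it to a file (placeholder file name):
# with open("details3.csv", "w", newline="", encoding="utf-8-sig") as f:
#     f.write(csv_text)
```

The `utf-8-sig` encoding in the commented-out part adds a byte order mark, which helps Excel detect UTF-8 when the data contains non-ASCII characters.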
