Hello Guys,
Is someone able to build a flow that can extract the contact information for all the companies listed on this website?
I can't find the right selector to grab it.
https://www.energaia.fr/visiter/liste-des-exposants/
Thanks for helping,
Regards,
Fred
@VJR @Henri @Henrik_M @Ankesh_49
Hi team, I'd appreciate your help finding the right selector and building the flow I'm asking about.
Could you help, please?
Thanks, Fred
@PAuserFromFranc Could you please share the flow you have developed? Which selector are you using?
@Ankesh_49 Nothing at the moment, but I want to grab information from
https://www.energaia.fr/visiter/liste-des-exposants/
For each company name found in the table, I want to open the little down arrow and get the name, phone, email, address and field of activity. This has to be done on every page, so pagination needs handling too.
Usually I know how to do this, but I can't here. Normally the selector would be:
body > main > section > div > div > form > div:eq(3) > table > tr.odd:nth-child(1) > td:nth-child(1) > span:nth-child(2)
or something like this, with an attribute filter such as tr[Class="odd"]
From my limited research on this, the most I can tell you is that you'll need to run JavaScript for this to work on the web page. Why? Because the company information is not actually on the web page itself but in an embedded document, an iframe. To access it, you need JavaScript that can somehow switch the CSS selector context from the main page to the iframe. I honestly have no idea how to do that, and I hope someone better versed in this will come along to help.
Figured it out, do the following:
- Launch the web page
- Use Extract data from web page
- In the advanced options, enter the CSS selector "iframe:eq(0)" with the attribute "src"
This will get you the URL of the iframe. Now simply launch a new Chrome instance with that URL, and from there you can extract everything as normal.
Enjoy.
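For readers who want to see what the "iframe:eq(0)" plus "src" extraction amounts to outside PAD, here is a minimal Python sketch using only the standard library. The sample markup, the example.com URLs and the function name first_iframe_src are made up for illustration; this is not how PAD implements the action, only the same idea expressed in code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcParser(HTMLParser):
    """Collect the src attribute of every <iframe>, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def first_iframe_src(html, base_url):
    """Return the absolute URL of the first iframe (like iframe:eq(0)), or None."""
    parser = IframeSrcParser()
    parser.feed(html)
    if not parser.sources:
        return None
    # Resolve relative src values against the URL of the hosting page.
    return urljoin(base_url, parser.sources[0])


# Made-up markup standing in for the host page (assumption, not the real site).
SAMPLE_HTML = """
<html><body>
  <main>
    <iframe src="/embedded/list"></iframe>
    <iframe src="/embedded/other"></iframe>
  </main>
</body></html>
"""

iframe_url = first_iframe_src(SAMPLE_HTML, "https://example.com/visit/list/")
print(iframe_url)
```

Once the iframe URL is known, it can be opened directly in a new browser instance, and ordinary selectors work against that document.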
Thank you, I made some progress, but I'm still stuck on the JavaScript needed to open the little arrow and get the data.
You shouldn't need JavaScript for that. Does Extract data from web page not work?
Alright, I can see now why you were struggling with the extraction part. I've got some good news and some bad news.
Good news: I made an automation that does what you want. It opens the arrow, extracts the text and moves to the next one.
Bad news? It takes 1 minute 30 seconds per page (in debug mode with a 1 ms delay).
I don't see a way of improving that time, other than maybe using an API call or something (I have no idea how to do that, so don't even ask).
However, if you want this automation, private message me (just click my profile, you should see the button on the right) and I'll send it over to you. It will need some editing on your end, though.
In summary it does the following:
- Creates a new Datatable
- Gets the number of pages to go through
- Creates a loop based on the number of pages
- Extracts the number of arrows on the page
- Goes through each arrow, extracting specific text (can be edited to extract whatever you need)
- Puts that information into the Datatable (will need editing if the above is changed)
- Once done, moves to the next page and repeats until finished.
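The control flow of the steps above can be sketched in Python as a nested loop. Everything here is a stand-in: FAKE_PAGES, Exhibitor and scrape_all are hypothetical names, and the in-memory lists replace the real browser actions (click arrow, extract text, click next page). It only illustrates the pages-then-arrows structure, not the actual PAD flow.

```python
from dataclasses import dataclass


@dataclass
class Exhibitor:
    name: str
    details: str


# Hypothetical stand-in for the site: each inner list is one page of entries.
FAKE_PAGES = [
    [Exhibitor("Company A", "Contact : Alice"), Exhibitor("Company B", "Contact : Bob")],
    [Exhibitor("Company C", "Contact : Carol")],
]


def scrape_all(pages):
    table = []                     # plays the role of the Datatable
    page_count = len(pages)        # "gets the number of pages to go through"
    for page_index in range(page_count):
        entries = pages[page_index]
        arrow_count = len(entries)  # "extracts the number of arrows on the page"
        for arrow_index in range(arrow_count):
            # In the real flow: click arrow number arrow_index, then extract text.
            entry = entries[arrow_index]
            table.append(
                {"page": page_index + 1, "name": entry.name, "details": entry.details}
            )
        # In the real flow: click "next page" here before the loop continues.
    return table


rows = scrape_all(FAKE_PAGES)
for row in rows:
    print(row)
```

Because every arrow is opened one at a time, the run time grows with pages times entries per page, which matches the slow per-page timing mentioned above.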
Step one should be to enter the iframe directly: https://exposants.energaia.fr/form/liste_exposant&lang=fr&session=EN22&langue_id=1
I've thought the program through in my head, and it should be possible. I'll see if I have time to build it over the weekend, and then I can share it.
This should work for you. Paste the whole text into an empty PAD flow.
Look through it and see if it makes sense to you.
You must have this page open when you run it: https://exposants.energaia.fr/form/liste_exposant&lang=fr&session=EN22&langue_id=1
Wait, Henrik, how'd you put a zip file attachment in your message? I can't seem to do it, it just tells me it's not supported.
I might have more privileges because of the Super User status. I only got the "not supported" message when I tried uploading the .txt file 🤔
Ah, fair enough.
Thank you so much @Henrik_M
I'm now trying to understand the flow you made, but it's too difficult for me. For instance, I don't get this:
table[Id="exposant"] > tbody > tr > td > span[Class*="fa-chevron"]:eq(%LoopIndex_Chevron%)
LoopIndex_Chevron is a variable, and you use it as the index to keep moving forward, right?
And what does the little * after Class mean?
The rest I think I get, but it's very complex for me to think through algorithms this way...
Thanks for everything,
Fred
Correct. Since we know that there are 25 entries on each page, we count from index 0 to 24.
*= is the way to write the contains operator between an attribute (the class) and the value (fa-chevron): it matches any element whose class contains that text.
So in this case, we are able to advance down through the list and open each description box, regardless of the chevron type.
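To make the two ideas in this answer concrete, here is a small Python sketch. It builds the selector for each index from 0 to 24 (what %LoopIndex_Chevron% does in the flow) and mimics the "contains" test that *= performs on the class attribute. The class values "fa fa-chevron-down" and "fa fa-chevron-up" are assumed examples of what the page might use, and the helper names are hypothetical.

```python
def chevron_selector(index):
    """Build the selector from the thread with the loop index filled in."""
    return (
        'table[Id="exposant"] > tbody > tr > td > '
        f'span[Class*="fa-chevron"]:eq({index})'
    )


def class_contains(class_attr, needle):
    """Mimic [Class*="needle"]: true when the attribute value contains needle."""
    return needle in class_attr


# 25 entries per page, so the loop index runs from 0 to 24.
selectors = [chevron_selector(i) for i in range(25)]
print(selectors[0])
print(selectors[24])

# Assumed class values: both chevron directions match, an unrelated icon does not.
for value in ("fa fa-chevron-down", "fa fa-chevron-up", "fa fa-arrow-right"):
    print(value, "->", class_contains(value, "fa-chevron"))
```

Matching on the shared "fa-chevron" part is what lets one selector hit the arrow whether it currently points up or down.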
Hi @Henrik_M, where can I learn regex like the one you used, (?<=Contact : ).+?(?=string) and so on? I don't get it, and I'm not into code or regex, so I can't understand it well enough to reuse it in similar flows where other texts need to be parsed.
Thanks
https://regexone.com/ takes you through many of the basics, but (a lot of) practice is what makes... proficient, at some point 😅
But actually you shouldn't even need regular expressions that much moving forward, since the Crop text action can do the whole "get text between two other texts" thing that I did with Parse text.
By the way, I only add the Replace text action because I find new lines annoying when it comes to parsing, so I tend to reduce them to regular spaces.
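As a rough illustration of the three techniques mentioned here, the Python sketch below flattens new lines to spaces, extracts a value with a lookbehind/lookahead regex of the same shape as (?<=Contact : ).+?(?=string), and then does the same job with a plain "text between two markers" helper. The sample text, the " Tel : " and " Email : " markers, and the function names are assumptions made up for the example, not taken from the actual site or flow.

```python
import re

# Made-up sample text standing in for one extracted description box.
RAW = "Contact : Jane Doe\nTel : 01 23 45 67 89\nEmail : jane@example.com"


def normalize(text):
    """Replace each run of line breaks with a single space."""
    return re.sub(r"[\r\n]+", " ", text)


def crop_between(text, start, end):
    """Return the text between start and end, or None if either is missing."""
    begin = text.find(start)
    if begin == -1:
        return None
    begin += len(start)
    stop = text.find(end, begin)
    if stop == -1:
        return None
    return text[begin:stop]


flat = normalize(RAW)

# (?<=...) requires the label just before the match; (?=...) requires the next
# label just after it; .+? takes as little text as possible in between.
match = re.search(r"(?<=Contact : ).+?(?= Tel : )", flat)
contact_regex = match.group(0) if match else None

# Same result without regex, in the spirit of a "crop between two texts" step.
contact_crop = crop_between(flat, "Contact : ", " Tel : ")
phone_crop = crop_between(flat, "Tel : ", " Email : ")

print(contact_regex)  # Jane Doe
print(contact_crop)   # Jane Doe
print(phone_crop)     # 01 23 45 67 89
```

The crop-style helper is easier to read and adapt, while the regex version becomes useful once the surrounding labels vary or are optional.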