pjwilson87
Helper I

Extract Data from Web Page - Custom CSS Selector

Hello!

 

I am using the action 'Extract data from web page' and trying to extract a 'Table' with 'Custom CSS Selectors'.

 

Website Example:  https://mckesson.wd3.myworkdayjobs.com/External_Careers?

 

 

So I added a custom CSS Selector which PAD will recognize when extracting for a 'Single Value':

 

single_value.png

 

However, PAD does not seem to recognize custom CSS Selectors when extracting a 'Table':

 

table.png

 

Is it possible to use custom CSS selectors when extracting a 'Table' in PAD?

Or am I doing something wrong?


6 REPLIES
AgniusBartninka
Advocate III

This will not work for a table or a list, because you need to provide the HTML structure for each item under the base selector. Here's what works for me:

AgniusBartninka_0-1635668233298.png

When extracting only a single value, specifying the very last item in the structure is enough (the a element), but if you want to extract all of them, you will need to be more specific.

 

Also, please note that your paging selector seems to be built incorrectly. Avoid any class that has 'WinAutomation' in its name. Such classes are auto-inserted by WinAutomation/PAD while you set up the CSS selectors for the Extract data from web page action. They disappear as soon as you refresh the page, so your paging will fail.

Anything that looks like this (see the purple dots around the arrow button):

AgniusBartninka_1-1635668383017.png

Is actually added by WA/PAD when interacting with the page. But it will not be there during runtime and thus you will not find the element if your class name is like that.
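As a rough illustration of the point above, here is a minimal Python sketch that drops any class token mentioning 'WinAutomation' from a recorded selector. The selector string and the class name in it are hypothetical, and the only assumption about the injected classes is that their names contain "WinAutomation", as described above.

```python
import re

# Assumption: injected class names contain the substring "WinAutomation".
# The exact names are not given in this thread.
_INJECTED_CLASS = re.compile(r"\.[A-Za-z0-9_-]*WinAutomation[A-Za-z0-9_-]*")


def has_injected_class(selector: str) -> bool:
    """Return True if the selector relies on a class added by the recorder."""
    return _INJECTED_CLASS.search(selector) is not None


def strip_injected_classes(selector: str) -> str:
    """Remove class tokens that mention WinAutomation from a CSS selector."""
    return _INJECTED_CLASS.sub("", selector)


# Hypothetical recorded selector, for illustration only.
recorded = "div.pager > button.next.WinAutomationVisibilityLandmark"
cleaned = strip_injected_classes(recorded)
```

The cleaned selector still has to match the real button on a freshly loaded page, so it is worth checking it in the browser's developer tools before using it in the action.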

 


@AgniusBartninka 

Thanks for the response. The CSS Selector works the way you showed me in the screenshot. I was not sure whether using the HTML structure was the only way.

 

Do you know how the 'Use paging' option works in PAD?

I made corrections to my CSS Selector and it clicks the 'next' button when I run it, but after page 2, it always reverts back to page 1 and then keeps going back and forth and never stops.

 

How do I need to structure the CSS Selectors for the 'Use paging' option in PAD so that it clicks the 'next' page button each time and then stops at the last page?

 

This is what I have:

 

paging.png

My experience shows that paging in this action, in both PAD and WA, usually assumes the page has loaded as soon as the paging element becomes available again. On most pages it thus does not really work the way you would expect. It presses the button, but then extracts the same page a few times before the next page actually loads, so you end up with duplicate data.

 

The reason it returns to the first page is because the functionality is built in a way where the action should return to the original page after extraction. This part of it actually works properly. But when I test it on this page and limit the extraction to the first 5 pages, I usually get 5 copies of the same page extracted before the next page loads. That is because the paging buttons actually do refresh right away after clicking (you can test that manually, too), while the actual results load a bit slower.

 

So, a brief version would be no - it does not work.
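To make the timing problem concrete, here is a minimal Python sketch of the check a hand-built paging loop would need: after clicking 'next', keep re-reading the results until they differ from the previous page. This is not something the Extract data from web page action exposes. `read_rows` is a hypothetical callable that returns the rows currently shown on the page.

```python
import time
from typing import Callable, List


def wait_for_new_rows(
    read_rows: Callable[[], List[str]],
    previous_rows: List[str],
    timeout_s: float = 10.0,
    poll_s: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll read_rows() until it returns rows that differ from previous_rows."""
    deadline = time.monotonic() + timeout_s
    while True:
        rows = read_rows()
        # The paging button refreshes first; only changed results mean the
        # next page has really loaded.
        if rows and rows != previous_rows:
            return rows
        if time.monotonic() >= deadline:
            raise TimeoutError("results did not change after clicking next")
        sleep(poll_s)
```

Comparing content rather than waiting for the button is what avoids the duplicate pages described above.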

 

The funky thing with this exact page, however, is that passing the page index to the URL is not allowed either. It will take you to the 1st page all the time. This might also possibly indicate that the pagination is built slightly differently here.

 

My suggestion is thus running an Invoke web service action to their exposed API. You can copy this action and this will return the same results in JSON format:

Web.InvokeWebService.InvokeWebService Url: $'''https://mckesson.wd3.myworkdayjobs.com/wday/cxs/mckesson/External_Careers/jobs''' Method: Web.Method.Post Accept: $'''application/json''' ContentType: $'''application/json''' RequestBody: $'''{\"limit\": 20, \"offset\": 0, \"searchText\": \"\", \"appliedFacets\": {}}''' ConnectionTimeout: 120 FollowRedirection: True ClearCookies: False FailOnErrorStatus: False EncodeRequestBody: False UserAgent: $'''Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.21) Gecko/20100312 Firefox/3.6''' Encoding: Web.Encoding.AutoDetect AcceptUntrustedCertificates: True ResponseHeaders=> WebServiceResponseHeaders Response=> WebServiceResponse StatusCode=> StatusCode

It looks like this:

AgniusBartninka_0-1635750664617.png
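For readers who want to try the same call outside PAD, a rough Python equivalent using only the standard library might look like the sketch below. The URL, headers and body fields are copied from the action above. The function names are my own, the custom User-Agent is left out, and whether the endpoint still accepts this request is an assumption.

```python
import json
import urllib.request

# URL copied from the Invoke web service action above.
JOBS_URL = "https://mckesson.wd3.myworkdayjobs.com/wday/cxs/mckesson/External_Careers/jobs"


def build_jobs_request(offset: int = 0, limit: int = 20) -> urllib.request.Request:
    """Build the POST request with the same JSON body as the PAD action."""
    body = {"limit": limit, "offset": offset, "searchText": "", "appliedFacets": {}}
    return urllib.request.Request(
        JOBS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def fetch_jobs(offset: int = 0, limit: int = 20, timeout_s: float = 120.0) -> dict:
    """Send the request and decode the JSON reply (performs a network call)."""
    request = build_jobs_request(offset, limit)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_jobs(0)` would request the first 20 postings and `fetch_jobs(20)` the next 20.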

The response will look like this (excluding the Facets, which are irrelevant anyway):

{
    "total": 962,
    "jobPostings": [
        {
            "title": "Casual Registered Nurse - Nanaimo",
            "externalPath": "/job/Nanaimo/Casual-Registered-Nurse---Nanaimo_JR0053724",
            "locationsText": "Nanaimo",
            "postedOn": "Posted Yesterday",
            "bulletFields": [
                "JR0053724"
            ]
        },
        {
            "title": "Sr. Profitability Analyst",
            "externalPath": "/job/Richmond-Metro/Sr-Profitability-Analyst_JR0051351",
            "locationsText": "Richmond Metro",
            "postedOn": "Posted 2 Days Ago",
            "bulletFields": [
                "JR0051351"
            ]
        },
        {
            "title": "Associate Accounts Payable Analyst",
            "externalPath": "/job/DallasFort-Worth-Metro/Associate-Accounts-Payable-Analyst_JR0053567",
            "locationsText": "Dallas/Fort Worth Metro",
            "postedOn": "Posted 2 Days Ago",
            "bulletFields": [
                "JR0053567"
            ]
        },
        {
            "title": "Customer Engagement - Pharmacy - Reconciliation Advisor-2",
            "externalPath": "/job/DallasFort-Worth-Metro/Reconciliation-Advisor-2_JR0051047",
            "locationsText": "2 Locations",
            "postedOn": "Posted 2 Days Ago",
            "bulletFields": [
                "JR0051047"
            ]
        },
        {
            "title": "Account Manager HME  (AL/West GA)",
            "externalPath": "/job/GA-Work-at-Home/Account-Manager-HME---AL-West-GA-_JR0053671",
            "locationsText": "2 Locations",
            "postedOn": "Posted 3 Days Ago",
            "bulletFields": [
                "JR0053671"
            ]
        },
        {
            "title": "Compliance Advisory & Monitoring Leader",
            "externalPath": "/job/DallasFort-Worth-Metro/Compliance-Advisory---Monitoring-Leader_JR0053502",
            "locationsText": "4 Locations",
            "postedOn": "Posted 3 Days Ago",
            "bulletFields": [
                "JR0053502"
            ]
        },
        {
            "title": "Senior Director - Compliance Monitoring & Advisory",
            "externalPath": "/job/DallasFort-Worth-Metro/Senior-Director---Compliance-Monitoring---Advisory_JR0054015-1",
            "locationsText": "3 Locations",
            "postedOn": "Posted 3 Days Ago",
            "bulletFields": [
                "JR0054015"
            ]
        },
        {
            "title": "Bilingual Patient Services Specialist",
            "externalPath": "/job/Mississauga/Bilingual-Patient-Services-Specialist_JR0053785",
            "locationsText": "Mississauga",
            "postedOn": "Posted 3 Days Ago",
            "bulletFields": [
                "JR0053785"
            ]
        },
        {
            "title": "Business Development Executive - Oncology",
            "externalPath": "/job/GA-Work-at-Home/Business-Development-Executive---Oncology_JR0053401",
            "locationsText": "2 Locations",
            "postedOn": "Posted 3 Days Ago",
            "bulletFields": [
                "JR0053401"
            ]
        },
        {
            "title": "Implementation Analyst",
            "externalPath": "/job/DallasFort-Worth-Metro/Implementation-Analyst_JR0051460",
            "locationsText": "2 Locations",
            "postedOn": "Posted 3 Days Ago",
            "bulletFields": [
                "JR0051460"
            ]
        },
        {
            "title": "Stagiaire - Programmeur Analyst, Développeur, Eté 2022 / Intern Software Developer, Summer 2022",
            "externalPath": "/job/Greater-Montreal-Area/Stagiaire---Programmeur-Analyst--Dveloppeur--Et-2022---Intern-Software-Developer--Summer-2022_JR0053942",
            "locationsText": "Greater Montreal Area",
            "postedOn": "Posted 3 Days Ago",
            "bulletFields": [
                "JR0053942"
            ]
        },
        {
            "title": "Sr. Associate Data Management Analyst REMOTE",
            "externalPath": "/job/Work-at-Home---New-Jersey-USA-All-Zones-WNJA/Sr-Associate-Data-Management-Analyst-REMOTE_JR0053713-1",
            "locationsText": "15 Locations",
            "postedOn": "Posted 3 Days Ago",
            "bulletFields": [
                "JR0053713"
            ]
        },
        {
            "title": "Duluth, GA - Warehouse Worker - Full Time - Night Shift",
            "externalPath": "/job/Duluth-GA-USA---2975-Evergreen-Drive-8148/Duluth--GA---Warehouse-Worker---Full-Time---Night-Shift_JR0054131",
            "locationsText": "Duluth, GA, USA - 2975 Evergreen Drive (8148)",
            "postedOn": "Posted 3 Days Ago",
            "bulletFields": [
                "JR0054131"
            ]
        },
        {
            "title": "Spécialiste en chef, opérations pharmacie  / Lead, Pharmacy Operations (Uniprix et Proxim)",
            "externalPath": "/job/Greater-Montreal-Area/Spcialiste-en-chef--oprations-pharmacie----Lead--Pharmacy-Operations--Uniprix-et-Proxim-_JR0044359-1",
            "locationsText": "2 Locations",
            "postedOn": "Posted 3 Days Ago",
            "bulletFields": [
                "JR0044359"
            ]
        },
        {
            "title": "Pricing & Contracts Analyst",
            "externalPath": "/job/TX-The-Woodlands/Pricing---Contracts-Analyst_JR0053596-1",
            "locationsText": "2 Locations",
            "postedOn": "Posted 3 Days Ago",
            "bulletFields": [
                "JR0053596"
            ]
        },
        {
            "title": "Gestionnaire principal de projets – Centre d’excellence Transport / Senior Project Manager – Transportation Centre of Excellence",
            "externalPath": "/job/Greater-Montreal-Area/Gestionnaire-principal-de-projets---Centre-d-excellence-Transport---Senior-Project-Manager---Transportation-Centre-of-Excellence_JR0043733",
            "locationsText": "2 Locations",
            "postedOn": "Posted 3 Days Ago",
            "bulletFields": [
                "JR0043733"
            ]
        },
        {
            "title": "Spécialiste, Taxes indirectes / Specialist, Indirect Tax (contrat 12 mois)",
            "externalPath": "/job/Greater-Montreal-Area/Spcialiste--Taxes-indirectes---Specialist--Indirect-Tax--contrat-12-mois-_JR0053819",
            "locationsText": "Greater Montreal Area",
            "postedOn": "Posted 3 Days Ago",
            "bulletFields": [
                "JR0053819"
            ]
        },
        {
            "title": "Sr. Director, Employee Experience and Technology Enablement",
            "externalPath": "/job/DallasFort-Worth-Metro/Sr-Director--Employee-Experience-and-Technology-Enablement_JR0053640-1",
            "locationsText": "2 Locations",
            "postedOn": "Posted 3 Days Ago",
            "bulletFields": [
                "JR0053640"
            ]
        },
        {
            "title": "Supervisor, Patient Services",
            "externalPath": "/job/Mississauga/Supervisor--Patient-Services_JR0053684",
            "locationsText": "Mississauga",
            "postedOn": "Posted 3 Days Ago",
            "bulletFields": [
                "JR0053684"
            ]
        },
        {
            "title": "Associate Product Support Representative",
            "externalPath": "/job/DallasFort-Worth-Metro/Associate-Product-Support-Representative_JR0052373",
            "locationsText": "2 Locations",
            "postedOn": "Posted 3 Days Ago",
            "bulletFields": [
                "JR0052373"
            ]
        }
    ],
    "userAuthenticated": false
}

You can convert this to a custom object using the Convert JSON to custom object action, and then you will be able to parse it.
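As an illustration of that parsing step, here is a small Python sketch that turns the response into flat rows. The sample contains two postings copied from the response above. Treating the first entry of "bulletFields" as the job ID is an assumption based on that sample, and the row key names are my own.

```python
import json

# Two postings copied from the response above; the rest are omitted for brevity.
sample = """
{
    "total": 962,
    "jobPostings": [
        {
            "title": "Casual Registered Nurse - Nanaimo",
            "externalPath": "/job/Nanaimo/Casual-Registered-Nurse---Nanaimo_JR0053724",
            "locationsText": "Nanaimo",
            "postedOn": "Posted Yesterday",
            "bulletFields": ["JR0053724"]
        },
        {
            "title": "Sr. Profitability Analyst",
            "externalPath": "/job/Richmond-Metro/Sr-Profitability-Analyst_JR0051351",
            "locationsText": "Richmond Metro",
            "postedOn": "Posted 2 Days Ago",
            "bulletFields": ["JR0051351"]
        }
    ],
    "userAuthenticated": false
}
"""


def to_rows(response_text: str) -> list:
    """Flatten the jobPostings array into one dict per posting."""
    data = json.loads(response_text)
    rows = []
    for posting in data.get("jobPostings", []):
        # Assumption: the first bullet field is the job ID, as in the sample.
        bullet_fields = posting.get("bulletFields") or [""]
        rows.append({
            "job_id": bullet_fields[0],
            "title": posting.get("title", ""),
            "location": posting.get("locationsText", ""),
            "posted": posting.get("postedOn", ""),
        })
    return rows


rows = to_rows(sample)
```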

The problem, however, is that it is limited to returning a maximum of 20 results per request (pagination). It will return an error if you pass anything other than 20 to the "limit" parameter in the body.

But when you run the initial request (with "offset":0), you will also retrieve the total number of results as the very first value in the response ("total"). So, you can divide that by 20, round up to get the number of pages, and then use pagination. In order to get the next page, you need to increase the value under "offset" in the request body by 20.
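The arithmetic can be sketched in a few lines of Python, using the "total" of 962 from the sample response above. The function name is my own, and the page size of 20 follows the limit described above.

```python
import math

PAGE_SIZE = 20  # the only "limit" value the endpoint is reported to accept


def page_offsets(total: int, page_size: int = PAGE_SIZE) -> list:
    """Return the "offset" value to send for each page of results."""
    pages = math.ceil(total / page_size)
    return [page * page_size for page in range(pages)]


offsets = page_offsets(962)
```

With 962 results this gives 49 requests, with offsets 0, 20, 40 and so on up to 960. The last page holds only the remaining 2 postings.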

So, for example, passing this to the body will return the first page:

{"limit": 20, "offset": 0, "searchText": "", "appliedFacets": {}}

but passing this to the body will return the second page:

{"limit": 20, "offset": 20, "searchText": "", "appliedFacets": {}}

 etc.

 

This will be much more stable than doing it via the UI.

@AgniusBartninka 

I see. Is it possible to add multiple passes in the 'Request body' to get all those pages in one 'Invoke web service' action? I tried using a comma and a semicolon, but that does not work.

AgniusBartninka
Advocate III

@pjwilson87 

Nope. That's the idea - it is limited to 20 results per page. If you pass in any other number instead of 20, an error will be returned. That's how their endpoint is built and that's why you don't have an option in their web page to change the number of search results per page.

It is still a much more efficient way than doing it via the UI, in my opinion. You just need to build it in a loop until you get no further results. You would need to do the same thing with the Extract data from web page action, too, because it does not work with the paging element anyway.
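The loop described here could look roughly like the Python sketch below. `fetch_page` is a hypothetical callable that posts the request body with the given "offset" and returns the decoded response. The `max_pages` guard is my own addition so the loop cannot run forever if the endpoint misbehaves.

```python
from typing import Callable, Dict, List


def collect_all(
    fetch_page: Callable[[int], Dict],
    page_size: int = 20,
    max_pages: int = 1000,
) -> List[Dict]:
    """Request page after page until one comes back with no job postings."""
    postings: List[Dict] = []
    offset = 0
    for _ in range(max_pages):
        batch = fetch_page(offset).get("jobPostings", [])
        if not batch:
            break  # no further results, so we are past the last page
        postings.extend(batch)
        offset += page_size
    return postings
```

In PAD the same shape would be a loop around the Invoke web service action, with the offset held in a variable that grows by 20 on each pass.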

@AgniusBartninka 

I agree, this way seems more efficient and should work just fine. Thank you!
